Redefining Data Science Roles: Beyond Technical Skills and Traditional Job Descriptions - ML 155 - Adventures in Machine Learning -

Redefining Data Science Roles: Beyond Technical Skills and Traditional Job Descriptions - ML 155

In today's episode, Michael Berk and Ben Wilson dive deep into the intricacies of technical interviews for machine learning roles. They discuss the importance of assessing candidates' genuine knowledge of traditional and deep learning models and the value of being candid about one's expertise.

Hosted by:

Ben Wilson •

Michael Berk

RSS Spotify Apple Podcasts YouTube Amazon Music

Show Notes

They explore how technical skills, particularly in applied machine learning, are evaluated with a focus on their impact on business outcomes. Michael and Ben also address the common misalignments between job descriptions and the actual skills required, stressing the need for problem-solving capabilities and critical thinking over memorized knowledge.

Additionally, they delve into the roles within data science—analysts, applied ML specialists, and researchers—highlighting the importance of fitting the right skills to the right job. They also touch on the evolving expectations and frustrations with the current hiring process, offering insights on how it can be improved.

Stay tuned as they unpack these topics and more, including valuable tips for showcasing your skills effectively on resumes, and the significance of asking insightful questions during interviews. Whether you’re an aspiring data scientist or a seasoned professional, this episode is packed with practical advice and industry insights you won’t want to miss!

Socials

Transcript

Michael Berk [00:00:05]:
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Burke, and I do data engineering and machine learning at Databricks. And I am joined by my co host.

Ben Wilson [00:00:14]:
Ben Wilson. I write blogs about MLflow, at Databricks.

Michael Berk [00:00:20]:
Really? That's that's your your primary use of time, Ben?

Ben Wilson [00:00:24]:
For this afternoon, yes. Not normally, but Nice.

Michael Berk [00:00:28]:
What's the blog on?

Ben Wilson [00:00:29]:
Unique every time. Right?

Michael Berk [00:00:31]:
Yeah. I've you've only duplicated once. I've been counting. What's the blog on?

Ben Wilson [00:00:39]:
New features in MO flow deep learning volume 2. Okay. So somebody in the field, one of our, products, specialists that we work with, basically wrote up instructions on how to do PEPF based fine tuning of transformers and then leveraging MLflow's new deep learning visualizations, which are very slick by the way, and stuff like automatic checkpointing of, like we're doing auto logging for epochs. So we'll take the actual model weights and checkpoint them periodically for a user, which is super usual

Michael Berk [00:01:17]:
Oh, yeah. Wow.

Ben Wilson [00:01:18]:
When you're doing fine tuning of LLMs because that stuff can take days to run. And if anything happens, like you use your g you lose your GPU midway through training, that sucks. So you can resume from where it failed.

Michael Berk [00:01:34]:
And it'll be model agnostic and training process agnostic?

Ben Wilson [00:01:39]:
Yeah. For PyTorch and TensorFlow with Keras. Yeah. Wow.

Michael Berk [00:01:44]:
That's super cool.

Ben Wilson [00:01:45]:
Yeah. I was showing, like, showing people how to do it with PyTorch Lightning because that's a nice high level API, and PETH is super cool.

Michael Berk [00:01:52]:
Yeah. Interesting. I might have to read that.

Ben Wilson [00:01:56]:
Yeah. You'll like it, man.

Michael Berk [00:01:59]:
So as you guys have astutely noticed by now, we do not have a guest. Today is a panelist episode, and so that means Ben and I are just gonna talk about random things. But there will be hopefully a lot of value over the next 45 minutes. Ben has a lot of wisdom in this space, and, I can ask a lot of dumb questions. And we'll see if we can sort of distill signal from noise and determine what skills in machine learning are actually valuable and what buzzwords are just buzzwords and crap. And so the prompt for this is we've been chatting for, like, I don't know, an hour now. And we've been going through lists of the top data science skills. So I googled top 100 data science skills.

Michael Berk [00:02:44]:
All the lists are a joke. And we are specifically looking at a ZipRecruiter article that, seems to be the best out of all the resources that we found, and it's still really, really bad. The reason it is really, really bad is it's so high level and so impossible to, like, actually complete that it sort of loses a lot of meaning, at least for me. So, currently, we're looking at the top skills mentioned in job descriptions, and I'll quickly shout them out. Python, machine learning, statistics, SQL, collaboration, technical, analysis, innovation, computer science, and data analytics. So to me, almost none of these have any meaning whatsoever except for maybe SQL. Ben, what's your high level take?

Ben Wilson [00:03:40]:
I mean, directionally, some of these are valid, but the problem I have with reports like this is that somebody in their sophomore year of undergrad is probably looking at this stuff. Like, how am I gonna get a job? How am I gonna get an internship? I wanna work someplace cool. And they go and look at something like this, and that informs their class loading. Exactly. Or even worse, somebody's 6 months out of school. They haven't had an opportunity to get a job offer yet, and they're still, you know, waiting table somewhere and just trying to to figure out, like, what company should I go and, you know, try to get hired at. And they get some feedback from an interview, and they're like, well, you don't you don't have experience in this area. The whole system is broken, in my opinion, and it starts with really crappy job descriptions and companies that are looking for people to get into the data science field that are are mapping what their current projects are and what skills that people use in that team or the the skills that the people in that team claim that they have at an expert level.

Ben Wilson [00:04:59]:
And then they just put that into a document and say, we're looking for people, you know, with these skills. Somebody coming straight out of school or even with, like, you know, less than 10 years experience, they're not usually gonna have all of these things. And it's it's worded so broadly that it does a disservice to candidates that are coming in. It makes, like, interviewers who don't know how to properly interview just start going through a checklist and saying, like, qualified, not qualified, or the recruiters that are going through and looking at resumes. So it just creates this game system that I think is broken where everybody's trying to game the system and try to be like you get people that are claiming they know something about these topics, but they actually don't, or they they know such a cursory level of that information that it's impossible for the company that's looking to acquire somebody with those skills to know if that person actually has the skills that they're looking for in that topic. So it's it's just disingenuous and misleading, and it annoys me.

Michael Berk [00:06:09]:
Alright. So I have seen a bunch of resumes I've hired at multiple companies and reviewed like, done resume screening, done interviews. And there's always that little skill section at the bottom of this, like Python, SQL, innovation.

Ben Wilson [00:06:27]:
And this is why. This is why it exists. This is because of stupid articles like this.

Michael Berk [00:06:31]:
Exactly. Yeah. So these are the top mentioned, things in job descriptions and below it, we can see the top skills mentioned in resumes. And there's definitely some overlap. But the so the real question is, like, what does Python mean? And how do you sort of showcase that you know Python for a given job description? So let's say I'm I'm applying. What should I say in my resume or my cover letter that shows that I actually know what Python is for this job description.

Ben Wilson [00:07:03]:
The first thing that comes to mind for me would be a definition of a Southeast Asian snake that lives in the jungle. That's about as useful as mentioning something like that on a resume or in a job description. Like, it's a a programming language, so mentioning that you understand its syntax and how to apply it or how to do things with it is better served by explaining projects that you've done.

Michael Berk [00:07:29]:
Exactly.

Ben Wilson [00:07:29]:
Like, I used I I I built project x that resulted in y amount of, you know, improvement to process or revenue or whatever it may be by using this library, these techniques, and this programming language. And that can tell somebody who's reading it, like, okay. They built they built a cool app in Python. Sweet. And that would instantly tell me what area of Python they know. So if it's data science, like, okay. They know they probably know how to use NumPy. They know Pandas.

Ben Wilson [00:08:06]:
They know scikit learn or XGBoost. They know how to, like, use these language like, these frameworks within Python in order to build something with applied ML. And if they mention, like, some hyperparameter tuning library, cool. It's gonna open up a a door for conversation for me with them to say, okay. I know how I would probably think about building a solution to this problem, maybe, or I'll ask them details about it. And then I'll ask them the things that I know are gotchas in implementing something like that. And sometimes it is, like, what was the nature of the data? How many features did you include? How did you determine which features were important and which ones to get rid of? How many versions of this did you deploy, and what was that process like? What when you went back and revisited it a month later or 2 weeks later after initial release, what were you looking at to make it better? And how did you fix that, or how did you correct that? How did you do version control of that? That would give me more information about the usage of Python than just saying I know Python. Because, I mean, I program in Python every single day.

Ben Wilson [00:09:30]:
Yeah. But do you know Python? I know parts of Python. Do I know every core library? Hell no. Like, there's no way. Do I know how to use its syntax? Do I know how to do the things that like, how to do certain tasks in Python? Yeah. Do I know how to do those same tasks in Scala and Java? And recently, I've been learning how to do that in JavaScript. Yeah. But it's just a languages implementation of a computer science fundamental.

Ben Wilson [00:10:05]:
Like, can you traverse a list efficiently? Well, Python has a way of doing that. Scala has a way of doing that. Java has a way of doing that. So it's saying that you know a language is completely irrelevant. You should know the foundation of what you can do in that language and what that language can and can't do or what it's optimized for and then what potential workarounds there are. But that those are technical skills. Like, there's nothing that knowing that regardless of how well you know it, unless you don't know it at all. If you're an intermediate level skill in that language at doing tasks that you've been working on versus you're an air quote expert in that language, that has no bearing on your performance in that job, like, from what I've seen.

Ben Wilson [00:10:57]:
You can know tons about the language and know all of its idiosyncrasies and know about, you know, how the the core implementations are are done in the language itself. You're not gonna use that stuff. You're using high level APIs because it makes your code simpler.

Michael Berk [00:11:16]:
Right. Okay. So what this has prompted me to do is go on the job descriptions for my current role at Databricks to see how much buzzword, how many buzzwords are leveraged. And, I mean, I guess, it's sort of a fundamental question. Right? Like, if the process of interviewing is typically a resume screener followed by a hiring manager phone screen followed by, like, an on-site of 6, like, deep dive interviews followed by, like, references. At least that's what it was for Databricks and and to be my prior company. How do you appease everybody in that process? Because a a technical recruiter probably just wants the buzzwords right. Is that is that a fair assumption?

Ben Wilson [00:12:03]:
I mean, the recruiter is the first line screen to make sure you're not wasting the time of the hiring manager and the interviewers. That's all they're there for.

Michael Berk [00:12:12]:
Right. I

Ben Wilson [00:12:12]:
mean, if you're a desirable company, they're the filter. If you're not a desirable company, they're the acquirer. Right. They're going out trying to find talent and convince them to come and have an interview. But they're knowing that they're a filter, you should be at least referencing some of these things that are in the job description if they are relevant to things that you've done and including them in the text that explains your impact to the company or to your team in the projects that you've worked on. Right. But just stating, like, I'm a level 4 Python developer. What does that mean? I I I don't know.

Ben Wilson [00:12:55]:
I wouldn't know how to rank anybody that I work with currently who write Python code all day every day. They also write in multiple different languages. We don't think like that. We try to solve the problem, like, the business problem that's that's arising or, like, build this feature because people want this feature, and it sounds like a good idea. We our mechanism for doing that is this language or one of these languages. So we just go and do that, and then we review each other's code and make sure that it's good and maintainable. Right.

Michael Berk [00:13:32]:
Okay. So got it. That makes sense. So I'm currently looking at the job description, and all of there's, like, no specifics, and that sorta makes sense. Job descriptions would look for high level buckets of skills, and then resume should sort of plug into those buckets with very specific projects. Matching the high level job description terms with resume, high level job description terms, that's not a good strategy because it doesn't really show, especially in the interview process, what you've done. Right. Okay.

Michael Berk [00:14:02]:
So that makes sense. So this article, we might be, like, bashing incorrectly at least on the job description side. Let's go down to the top skills mentioned in resumes for now. So, the first This is where the problem is. Exactly. Yeah. So for instance, just to put a face to the name, let's say I'm hiring for, ETL's ability, ability to write extract transform load transformations in Spark. Well, on a job description, you would say ability to write ETL in Spark because there are many different ways that can be performed in a proficient level.

Michael Berk [00:14:38]:
And then a good resume that would fill in that job description requirement would say, built a CDC system from x to y that transferred 10 terabytes of data a day in a scalable and efficient manner. There's something like that with some more technical jargon to know what actually happened. And there are many different versions of that bullet point that would check the box of being able to write ETL in Spark. So, going to this, like, buzzwords that are mentioned in resumes, let's just go through them really quick. Machine learning, SQL, Python, analysis, data analytics, Tableau, statistics, database, technical, and Amazon Web Services.

Ben Wilson [00:15:26]:
Cool. So machine learning, in case any listener hadn't been aware for the last, I don't know how many hundreds of podcasts we've done. It's a pretty broad topic. And Really?

Michael Berk [00:15:40]:
I thought it was just linear aggression.

Ben Wilson [00:15:42]:
I I've seen people on resumes claim stuff like that, like expert machine learning, and then start you know, I'm always curious, and I'll ask somebody in an interview. Like, so what what are you an expert on in machine learning? Like, I see these 2 projects that you put on your resume that are you know, sound interesting, but I'll I'll just ask them. Like, how do you what's your handle on this? And the good candidates will be very open and honest and be like, well, I've used these tool kits before, and and I've I'm familiar with, like, these sorts of things. Like, I'm really good with logistic regression models and and tree based models. And I I've done a couple of deep learning projects. So I have a a bit of an experience in that, but I'm not an expert. So if somebody's open and honest, I'm like, sweet. Okay.

Ben Wilson [00:16:41]:
I'm gonna verify their level of knowledge of the traditional stuff, you know, these linear based models and tree based models, and we'll we'll geek out about those algorithms and how to what the nuances are of them. And then we'll ask, you know, ask a couple of things about deep learning. But some like, for the other 60% of people that I've talked to, they just double down on that. Like, yeah. I'm an expert. And then I start asking about, like, how how does a a linear model get solved? Can you, like, walk me through that process of how that thing optimizes? Or or what is an optimizer, and and how does it work? And the vast majority of people that that claim they're an expert in a broad topic like that, they don't know. They're really good of using a high level API and copying code from the Internet and, you know, hacking it together to get something to output, and they equate the fact that their code executes without exception as being successful. So I call those those people, like, Kaggle developers, where they're interested in taking example code, and they get excited when it runs without blowing up, which has nothing to do with the concept of machine learning, like applied machine learning at a company.

Ben Wilson [00:18:06]:
Because the focus is not on, oh, I got this really I I got my accuracy really up on this model. It's like nobody cares, man, like at all. Like, I don't care. I don't care what that accuracy score is. The the thing that I would care about is how was that prediction used in order to influence the business? Like, how did your project work? The technical details are irrelevant when you're talking about applied ML. It's not about the accuracy. That's what you do when you're getting ready to ship something. You wanna make sure that you've tuned it and that you've gone through this iterative process of making it the best it can be and then responding over time to make it better and better by iterative development.

Ben Wilson [00:18:54]:
And you also I also go to into that stuff in interviews, but that has nothing to do with the technical skills of machine learning, in a broad sense.

Michael Berk [00:19:05]:
Alright. Here's so a bunch of questions, actually. I'll start off with a prompt, which is whenever I am doing technical interviews, my goal is to figure out the limits of the candidate's knowledge. And sometimes it will exceed my knowledge on a given topic, but I think it's a massive red flag, and it just won't fly in a 45 minute interview to basically abstract what you don't know and use general buzzword terms. Like, I will ask why or how until you say I don't know or until you prove that you do know it up to the, basically, levels that are required in the job description. Ben, it seems like you do something similar. Do you find it a do you think it's a red flag if, people don't admit what they don't know?

Ben Wilson [00:19:56]:
Oh, yeah. Yeah. Yeah. I I've never approved, a candidate moving on who's not honest about not knowing something. I've even, like, prevented people from being hired at Databricks because even though they knew an incredible amount of information about and had it seemed like they had an incredible skill in this one area. But then when I started poking into an area that I had a pretty pretty good hunch that what they claimed they knew wasn't what they actually knew, and they just would fight in an interview, not aggressively. And some people have fought aggressively and got very irritated. And that's that's a no brainer to move on, like, pass on somebody.

Ben Wilson [00:20:42]:
But they just it's like they have a a psychological block of not being able to admit. Like, yeah, I don't know that quite like, that topic that you just mentioned because that behavior is dangerous in a company. If you're building if you have customers that are gonna be using something that you're building and you are unable to admit that you don't know something, you're gonna have problems on, like, higher date plus 21, at least on the r and d side. It would be very bad because you're gonna you're gonna, like, produce stuff that you don't understand, that you don't know why it works, or you don't have context of why something you just built could be a problem. And it merges, it deploys, and then you're like, okay. We just created an incident here, and customers are furious that this is totally broken. It's dangerous. Alright.

Ben Wilson [00:21:46]:
So that's on the

Michael Berk [00:21:47]:
I don't know side. What about the I know side? What are the green flags and the red flags when someone's in it is explaining something? So for instance, to put a little context from my side, I am pretty bad at remembering, like, implementation notes. So I do know how OLS works. I do know, like, a bit about the other solvers for linear models, but I have intentionally not committed a lot of those things to memory and instead tried to learn the the principles and tenants of solving a convex shape or, like, whatever the if it's linearly tractable, for instance, you should do this. If it's not, you should use this. And those types of concepts. And I have intentionally not memorized the 5 steps of a lot of these solvers, for instance. So if you ask me that question in an interview, I could say with analogies how it works, I could say when to use it, but I probably couldn't walk through the math of anything but OLS.

Michael Berk [00:22:52]:
So would that be a red flag for you? Would that be a green flag for you? How do you demonstrate knowledge and capacity and skill without having a bunch of stuff memorized?

Ben Wilson [00:23:03]:
So definitely not a red flag. My red flag is somebody who's incapable of doing what you just explained. So even if you have deep knowledge of, like, you have an idiotic memory and you memorize the actual implementation details in in c or Fortran for a solver and you know, like, you can read the code in your head, that's absolutely useless if you're in the business of working with other humans. Right? So if I ask for an explanation of how something highly technical works, I'm not looking for and I I'm open with people when I say when I ask the question, like, I don't want the mathematical proof for this. I want how does this actually work? Explain it to me like I'm 5. And if somebody's incapable of doing that, that means they don't really know it. It means either well, it's one of 2 things. They either don't know it, or they don't understand what I'm asking for, which is also a red flag.

Ben Wilson [00:24:10]:
They don't know that, hey. I'm looking for how you would explain this to a colleague who's trying to ask you for advice on what to do. And if you can't explain something in abstract terms, in simple terms, in a brief, you know, concise way, how are you gonna interface with your colleagues? Got it. If you're the expert and you can't make it approachable, you're useless. Right? Nobody develops solutions in a vacuum.

Michael Berk [00:24:41]:
Yeah. Exactly. So I think you've hinted at one of my interview hacks, which is there's gonna be a lot of open ended questions. Right? It's like, how would you build a machine learning model to reduce churn or whatever it is? Those are typically found on on, like, a lot of a lot of the, like, deep dive on-site interviews. And one thing that I found really valuable in this space is steering into your sweet spot. And so, if you really know linear regression, like, spend an outsized amount of time understanding all the nuances of it. And then if you can talk through basically how it works, how it would be implemented. In a 45 minute interview, typically, you can't cover every base of something like, quote, unquote, machine learning.

Michael Berk [00:25:29]:
But if you can demonstrate skill in this one area, and then for the other stuff to have, like, a one sentence or a high level, that's typically enough to get you through the interview process. What are your thoughts, Ben?

Ben Wilson [00:25:44]:
Yeah. I mean, that's kinda what I'm looking for when, when I used to do the the interviews in the field for, like, data science people, or ETL data engineers. I was always asking just one question for the entire interview. Like, we would do a quick chitchat. Hey. How's it going? Thanks for taking your time to come and talk to me, and let's talk a little bit about, you know, something to break the ice, sort of. Right? And after that 3 minutes is up, I would tell people, like, here's the question I'm gonna be asking you for the next 42 minutes. There's no right or wrong answer here.

Ben Wilson [00:26:22]:
All I wanna do is here is just listen to how you think. And I'll I'll cut you off, and I'll ask you questions and want you to go a little bit deeper in into certain areas, but here's the problem that we're gonna solve together, and let's talk through it. And I would pick something different every time, and it was usually something that I had to actually build for a customer, and then just talk about it in abstract ways. But the people that were very successful were the ones that distilled down the actual problem, and we're focusing on the important aspects of it. So they were like, I need to talk to the business first. What are our goals? And then based on that, where where's our data? What is the nature of the data? How clean is it? Have we done has it been validated? If not, then we have to go and do that. And they would walk through this whole process of getting up to production deployment, and then I would throw them curve balls, like, during that. Like, well, what if we have this amount of, you know, this amount of data that we need to serve, or this is the nature of this this model that we're building, and here's the the attributes that we have available at inference time.

Ben Wilson [00:27:39]:
How do we build that? And the people that could step back and see the forest for the trees, those were the the strong thumbs up for me because we have the Internet. Right? You can look stuff up. You can research it as you're, you know, investigating something. That holistic view of being able to step back and look at a problem, that's something that the Internet is not gonna help you with. Experience, like, valid, useful, real world experience is what's gonna help you with that and being able to just think. And that's really what you should be looking for in a good candidate, somebody who can think. Do you think somebody who can think can learn anything?

Michael Berk [00:28:23]:
Do you think that's globally true for most interviews or a lot of interviews about memorizing flashcards?

Ben Wilson [00:28:31]:
Memorizing flashcards. I mean, I've been at the receiving end of those interviews before, where, I mean, I usually I've never put myself into a position where I'm interviewing for something that I'm completely unqualified for because I don't wanna waste that person's time or my own time. Right. Because if you get that interview, you're like, yeah, there's no way I'm passing this. Because if that person is going through a list of topics and they're like, explain to me how this thing works, and I've I've never done that before. I'm like, I'm just gonna tell them. Like, I don't know. Next.

Ben Wilson [00:29:11]:
And then you you just sit there and burn through their their 20 questions where you're like, I've never heard of that before. Next. And then you sit around for 40 minutes talking about irrelevant things because they're too uncomfortable to say, hey. You failed. Get out. I hate those interviews, unless they're relevant to what I've done. And, yeah, they're annoying where and sometimes I've asked people at the end of that, like, is this a company policy that you have to go through, this list of topics? And if so, does that policy extend to how you run your business? And I've had people flat out tell me, like, my manager asked me to ask these 20 questions and get your responses from them. I was like, well, let me ask some questions to you about how it it's like working here.

Ben Wilson [00:30:08]:
How much how much control do you have over your project when you're doing a design? I'd be like, well, we don't do designs. We just get told what to build. I'm like, thanks for your time. I'm not interested. Yeah. How important work for a place that does that.

Michael Berk [00:30:23]:
How important are the interviewee's questions to the interviewer, in your opinion?

Ben Wilson [00:30:28]:
Absolutely critical. You you're either for a really good candidate, you're the one who's selling the company Right. In that position. And if all you're doing is reading off of a flashcard of we need to ask this person these 20 questions, they're just gonna be like, okay. This is not the place that I wanna work. This is not a place of innovation or creativity. It's hiring somebody who is just reading off of a card.

Michael Berk [00:30:54]:
Right. But my question was about the person being interviewed. How much green or red signal do you get from the questions that the person being interviewed asks the interviewer?

Ben Wilson [00:31:06]:
Oh, it's a good question. I think it depends on the dynamism of the discussion that we had. If it if it feels like I can have a a technical conversation with somebody in a data science position where we're keeping it high level to be for, like, for brevity sake and discussing through complex topics because we have we share a common reference point of we can use that jargon, but the jargon is being applied properly for abstraction purposes, and we can move through things quickly. And we understand we're on the same wavelength. I don't really care if they ask a lot of questions. And a lot of times, those interviews, people don't ask that many because they kinda got a feel for what's I'm like, and I got a feel for what they're like. So it Mhmm. It's like the hitting it off interview.

Ben Wilson [00:31:58]:
But if it's pulling teeth, the whole interview long, I'll cut it short by 15 minutes just to give them some some white space, some silence in the room. I'll be like, what questions do you have for me? Even if they bombed everything up until that point, I'll give them that 15 minutes so that they can ask for feedback. They should know that they bombed at that point. And if they don't and they they just sit there uncomfortably in silence or they just ask questions that they memorized before the interview that they want they thought that I would wanna hear. And I'm like, yes. Person either really bad at interviewing, or they're really nervous, or they're just not qualified. And sometimes you get super insightful questions at that time. I've changed my opinion on people once they get the opportunity to ask questions.

Ben Wilson [00:33:00]:
And it's if it's stuff that I know they pulled off the Internet, and they're asking it because some recruiter somewhere wrote a blog post about these are the top ten questions to ask during your interview. That just gets a third thumb down from me. Like, okay. You're not even creative enough to come up with your own meaningful questions. Yeah. And they're like, what are what what are your thoughts on the profitability of the company over the next year? Really? Don't. This isn't a sales position. Yeah.

Ben Wilson [00:33:34]:
Like, why do you care?

Michael Berk [00:33:36]:
Yeah. Interesting. Okay.

Ben Wilson [00:33:39]:
What's what's the bonus structure like? I don't know. Talk to HR. I'm not the person to ask.

Michael Berk [00:33:45]:
Yeah. Yeah. The for questions after interviews, I typically really value that for small close teams. For distributed teams that are less, like, culture focused, I think it matters a little bit less. But at the my prior role, I play so so so much emphasis on what the interviewee would ask me at the end of an interview. And, like, what do you do day to day is, like, a good intro, but anytime that we're actually, like, insightful and genuinely, like, interesting and curious questions, that was always a massive green flag. And the lack of them isn't always a red flag, but if someone asks a good question showing just curiosity, ability to think critically, clarity on what they want, it's it's typically really good. And, I I always love to see that.

Ben Wilson [00:34:36]:
Mhmm.

Michael Berk [00:34:38]:
Cool. So going back to those skills, let's talk about what I mean, I don't even know what these like, what is technical? What is a technical skill? There's a category that's technical that is 9.3%. Yeah. Yeah. Like wait. Sorry. It's 6 so in the skills required by employers based on job descriptions, 9.3 13% of the keywords that are posted here is the word technical. And then on the resume side, it's only 6.88 percent.

Michael Berk [00:35:14]:
So there's a massive miss mismatch in in in the skills that are in the the the world today. So

Ben Wilson [00:35:22]:
So my question is for those 9% of companies that need to differentiate between technical and nontechnical data scientists and the 6 point 6% of data scientists that claim that they're technical data scientists. What's a nontechnical data scientist?

Michael Berk [00:35:41]:
An analyst? But that's still sort of technical.

Ben Wilson [00:35:45]:
Oh, you gotta I mean, so the I do see here, you know, Tableau listed. Right? Which is a it's a wonderful software package, wonderful platform. Data scientists use it. Analysts use it. It's a great graphical user interface to get business insights, build very pretty dashboards, can do some cool stuff. I wouldn't classify it as technical in nature for if that's the only interfacing tool that you use, that and SQL. I would say that that in addition to statistical modeling software, the incorporation of both of those together and then using Tableau for visualization would be Yeah. Like, technical analytics, and that's the toolkit of a lot of analysts.

Ben Wilson [00:36:43]:
You know, they're somebody up high is saying, I need to know the impact of x and y on our customer base. And then the analyst has to be like, where's this data? Okay. I gotta join these 17 tables, and then I have to clean this stuff up because this is irrelevant. And, okay, I need to distill down this statement about our business over this time period to answer this question, and here's the report. And then I have to get my artistic skills working, so I need to make these graphs tell the story that I want this like, the story to be told. And then you have to be a technical writer where you're writing a write up about the you know, what you noticed in this analysis. Yeah. Like, that that's an analyst role.

Ben Wilson [00:37:32]:
But I think that stuff like this is put onto a data science job description or somebody who holds the title of data scientist was at before we recorded, we were talking about the 3 types of data science, you know, that I've seen. There's probably more than that now. It's like, LOM is being, you know, used in production now. But you've got, historically, I used to see people who spend 90% of their time building reports and doing not ETL, but data cleanup tasks. So they're not so much extracting from source systems. They're pulling from a data lake or a data warehouse, transforming the data in some way to create a report that's efficient for computation and generating that report. So it's just TL. Right? And they're using tools like Tableau or Power BI or oh, what was that other one? Used to use it at Samsung.

Ben Wilson [00:38:38]:
I can't remember, but it's another competitor to Tableau.

Michael Berk [00:38:41]:
Looker?

Ben Wilson [00:38:42]:
TIBCO. TIBCO Spotfire. So those tools are people that do that work. That's an analyst, right, at most companies, and there's nothing wrong with that. Right? It's a super valuable position to have in a company. You should have a whole team of people doing that or embedded people in in departments doing that work. But just because somebody's really good at that and you just say, like, oh, because you work with data and and I don't know what you like, how you generate these things. That seems like science.

Ben Wilson [00:39:16]:
Okay. You're a data scientist. It's kind of stupid to me, that that happens. Like, just call the person by the title that they're actually doing the work. There's nothing wrong with being an analyst. It's sorta like, when I see stuff like that, it kinda makes me feel sorry for business analysts because it almost feels like they're being cheapened. They're like, oh, super, you know, fancy analyst get this job title data scientist. I'm like, why are you doing that to well, you gotta do analysts that way.

Michael Berk [00:39:52]:
Yeah. Yeah. And powerful analysts are powerful. Like, if you really know your way around the dashboard, around SQL, around inference, and around the product slash business, you are basically I mean, one of the most valuable people in the data org because you typically interface between so many different roles like a product manager, a engineer, a, like, engineering manager as well. Like, there's just so many people that you interface with and so many areas that you touch that I think that role is incredibly valuable and underutilized.

Ben Wilson [00:40:27]:
Yeah. They're the oracles of the organization. They're the ones providing truth to the people making decisions.

Michael Berk [00:40:32]:
Exactly.

Ben Wilson [00:40:33]:
So they're arguably the most valuable people in a technical role besides, you know, r and d or building the tool that everybody's using. Right. When you then look at another form of data science, which is the I'd say when the proliferation of that job title happened about 15 years ago, when all of a sudden it it was there were job applications going out with, like, seeking data scientist for this role, that's applied ML, and that's where both you and I used to work. You know, we were interfacing with business units using machine learning to build solutions for the business. That's applied ML, and I don't think there's a problem with that job title. It's it's applicable. It probably should be called applied data scientist at most places because you're using data and using frameworks that build models to solve problems. So you're applying that technology.

Ben Wilson [00:41:48]:
But the historic definition of data scientist that goes back way longer are, you know, computer science or statistics or mathematics PhDs who were the ones doing research or were doing, you know, high performance computing on mainframes that were applying statistical algorithms to build models that will solve very challenging problems that couldn't be solved in any other way. You know, a lot of people we've had a bunch on the podcast in the in the past. People have been doing this for 30, 40 years. That was the historic one, and the their their scope of job responsibilities and the Venn diagram of these charts here don't overlap much. Right. It's more pure computer science, mathematics, statistics, but it it's all algorithms.

Michael Berk [00:42:47]:
So we got analysts. We got algorithm specialists. 3rd type, you said?

Ben Wilson [00:42:54]:
Yeah. The 3rd type would be, like, the people building the algorithms, doing independent research. They work for places like Databricks on the r and d side. We have people in our our research organization that are working on, like, DBRX that just came out. We have these people, and they don't have the job title of data scientist, though.

Michael Berk [00:43:20]:
They're just software engineer. Right? Or researcher. Right? That's a common title.

Ben Wilson [00:43:25]:
Yeah. Sometimes you have titles like that, but at most big tech companies, they're just gonna be called, like, hey. You're a software engineer. Right? You have a specialization. We're not gonna ask the people that are doing that to do what I do. Right. And they're not gonna ask me to go and do what they're doing. We we have the same job title, but we we live in different circles.

Ben Wilson [00:43:51]:
We interface with one another. We help each other out, but it's different.

Michael Berk [00:43:56]:
So I'm

Ben Wilson [00:43:57]:
not gonna have we're not gonna get somebody with the job title of data scientist at Databricks to come and, like, build the infrastructure needed to train the next generation of large language model. It's just not applicable skill set for, like, somebody who's an applied data scientist. Right.

Michael Berk [00:44:19]:
Okay. So that lens is super helpful of sort of a analyst, a applied ML person, and then a almost researcher's algorithm developer, essentially. How do you think about, a, bidding yourself into one of these categories and then, b, matching yourself, your your, like, skill set and your abilities into a given job description? Like, how do you know which role you should be applying for?

Ben Wilson [00:44:49]:
I mean, you can typically look at a a well crafted job description and know, do I understand what all this stuff means? Right. Or have I done this before? So if you're applying for, like, a senior level role, the assumption of your interviewers is that you've done this before. You just wanna do it somewhere else. But if you wanna move into something else, you're like, okay. I'm doing applied ML right now. I don't really enjoy that. I wanna get experience so that I could move into the algorithm side, and I have a a deep enough understanding of the fundamentals because of, you know, went to enough classes in school or whatever. It's all about demonstrating that you're in your current position directionally moving to that by the type of projects that you're working on.

Michael Berk [00:45:49]:
Right. That makes sense. So, yeah, just going through the this list a little bit more, we can see, there's some nice visualizations of the gaps in keywords and what's actually shown on resumes. Just one question for you. So in the the visualization that we're looking at right now, I know it's audio, but, computer science and mathematics are at the top of the JD required list. Why do you think and economics, interestingly. Why do you think that that is the case? I never understood the value of, like, I don't know, education.

Ben Wilson [00:46:30]:
So I think these job descriptions are written by by companies that don't really know what they're trying to do with data science. Yeah. So they wanna hire top quality talent, and they assume that well, because because machine learning uses mathematics, we need a mathematician. Right? And I've met plenty of applied data scientists that have a degree in math, and they're great. They're awesome, but they learned it as almost a trade school after, you know, after having graduated and gotten into the workforce. They they picked up this this skill set over time. It's not something that you're gonna come straight out of school with with, like, both of these things. So you could dual major in computer science and math.

Ben Wilson [00:47:23]:
They play really well hand in hand. But

Michael Berk [00:47:30]:
Well, question to that, like, wouldn't because they play really well hand in hand together and you're learning the quote, unquote fundamentals of the building blocks of machine learning, it's math and computer science, wouldn't that make you really good at your job, and everybody go get a math and computer science dual major?

Ben Wilson [00:47:49]:
No. So what most companies are actually looking for when they're hiring a data scientist is either an analyst. So if they're not currently deploying models that can be used for inference, like, in real time systems where they're they're needing to rely on predictions, and they just want somebody to make them money with the data that they collect. They just need analysts. You don't need to hire data scientists. Hire somebody with an economics background who can really understand the scope of business problems and how to analyze complex data. That's economics. Right? The other 85% that are putting down a data science requirement, they're looking for an applied data scientist, right, where it's the same thing.

Ben Wilson [00:48:49]:
Like, you're learning your coding chops and interfacing with libraries on the job. So you learn Python, right, not as a even in computer science, you're not learning Python yet, learning a language. You're learning the fundamentals of software design and computing systems, information architecture. So these background degrees don't necessarily mean that you're gonna do really well as an applied data scientist. You could, you know, everybody's mutable and they can learn anything, but it's not like your foundation that you're getting from schooling is gonna prepare you for that job perfectly. Unlike you wanna start

Michael Berk [00:49:35]:
Yeah. Right over there?

Ben Wilson [00:49:37]:
Yeah. I just got a cold. But if you wanna start working as a software engineer straight out of school, if you're not coming from, you know, a double e or or a CS degree, it's gonna be quite a stretch to adapt to that environment and be able to productively solve problems quickly. Right. Because you just don't have the foundation. You don't know why these things are built the way they are. Like, what is going on underneath the covers when you use these things? So it's still ambiguous to me about, like, the data science thing. So I've seen people that come from an education background that is just wildly buried from you name a version of of an engineering discipline, any that you can think of from aeronautical to environmental.

Ben Wilson [00:50:29]:
And I've seen fantastic applied data scientists come from those backgrounds. Because the key to being a good applied data scientist is not your your code quality skills or how well you can build an algorithm that's irrelevant. Nobody cares. You're never gonna use it. The thing that's important is can you think through a problem and come up with hypotheses and then test them? And any science degree prepares you for that. That is the foundation of science in general, and that applies to pure sciences as well. So mathematics, physics, economics, you're studying processes and having to figure out solutions to those. That's applied data science.

Ben Wilson [00:51:15]:
Got it. But if you're going down the route of, hey. I want a job at OpenAI in LOM research for building the next one. You're not gonna come from anything but, like, a CS PhD or, like, doing some sort of math undergrad or physics undergrad and then getting a PhD at CS.

Michael Berk [00:51:38]:
Or industry experience. Because theoretically, you could have already made that jump at another company if you, like, were moved horizontally.

Ben Wilson [00:51:47]:
You're gonna have to spend a lot of time and get very lucky on building a lot of street cred so that they tap you on the shoulder and say, hey. Do you wanna come over? Got it. Which that's that's, you know, hiring lottery factor. Right? Right. Somebody's like, my goal is to, you know, work on this amazing team at this research organization, and they're currently doing, like, applied ML. You gotta do something, like, open source a package that people really care about or volunteer a ton for an existing package and, like, just start producing hundreds of commits that are useful, then maybe they'll tap you on the shoulder and be like, do you want a job? Yeah. But aside from that, like, it's it would be hard to get that.

Michael Berk [00:52:41]:
That makes sense. Okay. Cool. Well, I think we're we're coming up on time, and I I have some, like, high level rapping thoughts. Before I wrap, Ben, do you have anything else that you wanna say about buzzwords, interviewing, hiring, or anything of the like?

Ben Wilson [00:53:02]:
Just that I hope that it changes one day. It seems just to be the most inefficient way of interfacing with a potential hire.

Michael Berk [00:53:13]:
How would you do it?

Ben Wilson [00:53:19]:
To redesign the entire system? Yes. I don't know. That's probably a whole podcast worth of discussion. Like, how do we get rid of CVs and make the system work? Because I mean, it it's I don't think there's a way to make it faster. There's a way to make it slightly more efficient. But what the goal should be is to save people's time by not requiring them to declare or talk through irrelevant things and instead use that time to get to know somebody better who you potentially could be bringing onto your team.

Michael Berk [00:54:07]:
Yeah. Like, the fundamental problem is you're trying to predict if they will do well in this job, period. And that takes a lot of forms. So are they happy? Do they collaborate? Do they contribute to the org? Do they contribute to the team specifically? And a lot of other things. And

Ben Wilson [00:54:25]:
I think Are they gonna be happy here? Exactly. So if they're a good candidate, you want them to stick around. You want them to like working with the team. Like, that's the end goal. If you're a manager, that's all you want is your people to be happy and producing stuff that they're proud of.

Michael Berk [00:54:42]:
Yeah. It's it seems like a very like, historically, jobs are a one or have a single direction where the interviewer is in the seat of power and the interviewee is not. And I feel like that's sort of changing as, there are more, like, tip top professionals that, have a lot more power and and can say like, they basically have their pick of the litter. But, yeah, anyway, I think I'm starting to ramble a little bit.

Ben Wilson [00:55:08]:
Think that might be an artifact of just your work history is that perception. So you'll notice 15 years from now if I was to, like, make that same statement in front of you, you'd be like, no. I think that we're on equal footing, or I have a little bit more power here. It's not really power. It's more like I have choice. Right. But when you're in the earlier stage of your career, the perception to you is that you just want this job Yeah. Desperately.

Ben Wilson [00:55:39]:
But when you get a little bit older and you have that street cred built up, you're like, I don't need this. Right. It'd be nice if things work out, but I don't need this. Right.

Michael Berk [00:55:50]:
Yeah. That's definitely I've been I've been the college kid trying to get the job for a big portion of my professional career, and that ratio will change pretty soon. But yeah. Cool. So, couple things that stood out to me. 1st, job descriptions should be sort of a high level. We're looking for something within this broad category of skills. So for instance, if it's Spark, there's many different ways where a resume could plug in with a project and say, I have competency in Spark.

Michael Berk [00:56:23]:
And resume should be very specific. Job description should be pretty high level. And using keywords in your resume doesn't really inform the interviewer or the hiring manager what you've done. They wanna know the technical details, and they wanna know the extent to which you know something. Conversely on that, when you don't know something, please be transparent about it. Showing humility, showing the ability to collaborate, and also knowing your own limits is actually a really good indicator that you'll succeed in, growing because when you enter a new job, you're gonna have to grow no matter what the job is. And then for interviewing, some tips while interviewing or during the interview process is it's really important to distill the problem you're given into something useful into its component parts. Also, thinking about edge cases, and then finally, saving the solution till the very end once you've gathered a bunch of additional information about what the question is actually looking to solve.

Michael Berk [00:57:19]:
All of those are really helpful in displaying that you're able to think and not just pair it how OLS works. Anything else, Ben? No. That's good. Cool. All right. Well, until next time, it's been Michael Burke and my co host. Ben Wilson. And have a good day, everyone.

Ben Wilson [00:57:38]:
We'll catch you next time.

Redefining Data Science Roles: Beyond Technical Skills and Traditional Job Descriptions - ML 155

0:00

57:43

Playback Speed: