The Science-Engineering Blend - ML 146
Ben and Michael dive into the relationship between engineers and scientists in software engineering and the physical sciences. They explore the differences and similarities between the two roles, sharing insights on research and testing processes, the importance of doing your research, the value of teamwork, and the challenges of transitioning between engineering and science. With analogies and real-world examples, they shed light on the intricacies of these roles and on hiring scientists and engineers based on company size and market effects. Tune in for a thought-provoking discussion on finding the optimal path between efficiency and innovation!
Transcript
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I do data science and data engineering at Databricks. I'm joined by my co-host, Ben Wilson.
Ben Wilson [00:00:24]:
Yeah, I do software stuff for Databricks. That sounds fun. It is. Really smart people I work with.
Michael Berk [00:00:31]:
Yeah. So I've heard. So, actually, just to give a little bit of insight into how these recordings work: when it's just panelists, just myself and Ben, we typically just shoot the shit for at least an hour and then realize that, oh, we have to record this episode, and then quickly put together a plan. And that's what you're about to see today. So we've been talking about all of the illegal, NDA-driven things that can't be discussed, all the super secret secrets. And instead of discussing those, we're gonna be discussing science versus engineering. And specifically, if you're working in industry in the data profession, you might be coming from sort of 2 different backgrounds, typically.
Michael Berk [00:01:17]:
1 is sort of an academic PhD, like, researcher type of mindset. Another is more of the computer science implementation mindset. And then, of course, we'll have that catch-all category for everything that doesn't fit into that, such as myself. But these two roles and these two sort of mindsets are really, really valuable, but in different settings. So, Ben, to you, what is a scientist and what is an engineer?
Ben Wilson [00:01:45]:
I think the analogy that we were using before we started recording was me regaling you with the history of my previous life before I got into data science and ML and software engineering. So I worked at a company that made LEDs. And because of the nature of what we were doing, we were attempting to innovate on designs. And that's not, hey, we bought some IP from some company, and we're just manufacturing this thing. Those companies exist, but that's not where I worked. I worked at a place that was inventing the new technologies and coming up with new designs and building new structures that you're actually growing in these reactors. So we had 2 separate teams.
Ben Wilson [00:02:39]:
One team was R&D, and the other team was process engineers. And when you break that down, that's science and engineering. Each of those teams had people with different backgrounds. So we did have people on the process side, you know, the engineering side, that had doctorates, that were, you know, diploma-carrying scientists applying their knowledge of advanced material science or chemical engineering to the production of LEDs at large volumes, so industrial manufacturing. And then we had people that came from an engineering background that worked on R&D. They were there to help scientists run these reactors so they could run their experiments and test things out. But both of those teams were working towards the same end goal, the ultimate goal of capitalism, which is make this company more money so you can dominate an industry and you can produce, you know, a better product. And there are other lofty goals, like we can reduce carbon emissions, you know, throughout the world by making, you know, light bulb replacements.
Ben Wilson [00:04:00]:
Lofty goals that actually were met, which is kinda cool, thinking back on it. But aside from that end goal of profitability for the company and, you know, enhancing shareholder value, because it was a public company, that one goal is split up into 2 different, you know, sort of topics when we're talking about the operation within the company. So you have the engineers that are focused on solving a problem. And that problem is, how do I make this widget, this LED structure, as efficiently as possible while maintaining product quality and getting my yields up as high as possible? So you're focusing on operational excellence, effectively. And there's a bunch of science involved. You have to understand, like, how are these things grown, and why do I have 1600 knobs to turn here with every run that I put into this machine?
Michael Berk [00:05:00]:
Yeah. It sort of sounds like optimization with a closed set of criteria.
Ben Wilson [00:05:06]:
Not as closed as you might think. It was pretty wide open.
Michael Berk [00:05:10]:
Mhmm.
Ben Wilson [00:05:11]:
And you had to understand, when you make a change on something, you change the temperature of the growth, like, during a growth layer, or you change the gas flow rate, like, the volumetric flow rate, or you're changing mixture compositions between the carrier gas and the actual active gas, you have to understand what the implications are for that. Like, are we cracking the triethyl off of this, you know, molecule that we actually want to land on the wafer surface at the right time, at the right physical place inside of the reactor chamber? And what does that do to the crystal lattice? And can we preserve, you know, an effective, high-quality crystal as it's growing? There's a lot of things you have to understand about material science and about chemistry and about physics, and you learn that stuff if you're an engineer like I was. I was on the engineering side. But you're also focusing more on controlling that process. Like, hey, how do I get the light emitted from this thing to be within a target range? Trying to grow this thing to hit 450 nanometers plus or minus 3 nanometers or something.
Ben Wilson [00:06:30]:
Like, that's my target. Well, there's 17 different knobs that can affect that across 8 different layers. So you have to know your tooling, like, how it behaves. You have to know the science and the theory behind what's going on. But then you're approaching each of the runs that you're executing using the scientific process. You have a conjecture, you formulate your hypothesis, you run your experiment, which in MOCVD land, every run is an experiment, effectively, because so many things can change.
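To make that target concrete, here's a minimal Python sketch of the kind of pass/fail check being described, hitting 450 nanometers plus or minus 3; the function name and the example measurements are illustrative, not from the episode.

    # Minimal sketch: is a run's measured peak wavelength inside the spec window?
    TARGET_NM = 450.0
    TOLERANCE_NM = 3.0

    def within_spec(measured_nm: float) -> bool:
        """True if the measurement lands within 450 +/- 3 nm."""
        return abs(measured_nm - TARGET_NM) <= TOLERANCE_NM

    print(within_spec(452.4))  # True: 2.4 nm above target still passes
    print(within_spec(446.5))  # False: 3.5 nm below target is out of spec

The hard part, of course, is the 17 knobs across 8 layers that feed into that single measured number.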
Ben Wilson [00:07:11]:
The morphology inside the reactor changes. You know, nozzles get partially clogged. You know, heating elements age. There's loads of things that go into it. So you're fighting against variables and fighting against chaos in order to try to control something and make it consistent and reproducible. The other side of the house, the science side, they're not working on optimization hypotheses. They don't care, and they shouldn't care. It's not their realm of responsibility.
Ben Wilson [00:07:45]:
What they're trying to figure out is, what's that next structure that we need to build to meet our goals for the company? We need a brighter LED, or we need an LED of a different color. Or, hey, we're gonna be introducing a new product line next year that's, you know, in a completely different spectrum range. So I remember them working on stuff like near-UV LEDs, which was fascinating to me. And I remember sitting down with a couple of scientists that I knew. I was like, how are you guys gonna make the structure in order to do that so we don't have, like, leakage loss of current? Like, do we have to dope it differently? I remember asking this one guy that I was good buddies with all these questions. He just looks at me. He's like, I don't know, man.
Ben Wilson [00:08:36]:
Like, we haven't even, like, worked on it yet. I was like, well, so you're doing, like, pure scientific-method research here. He's like, yeah, man. Like, we have some, like, hypotheses that we discuss amongst our group, and then we have to run 6 months of experiments to figure out which one of those is gonna pay off. But he's like, the first 3 months is probably just gonna be shipping dead wafers that are used for metrology only, because we have to invent something new here. It's cool. And then he would start asking me questions about, like, hey, that structure that we made last year, I saw, you know, a couple of your reactors are at, like, 98% yield.
Ben Wilson [00:09:20]:
That's pretty cool. How'd you do that? And he starts asking me questions about, like, you know, do you focus on this thing, this part of the structure? I can't tell you what the structure is, but he starts asking me stuff about that. I was like, actually, no. That doesn't have any bearing on it. It's this component that makes it more successful, and we found this out through running experiments based on this to widen our process acceptability range, like, the threshold of how much our process can deviate. And he was like, wow, that's fascinating. Like, we never thought that that had an impact.
Ben Wilson [00:09:59]:
And those sorts of conversations, the reason I'm telling you this story, is we think about problems, and we're approaching them from a completely different perspective. So, you know, both those groups were doing science. We're developing hypotheses. We're testing them, but our goals are completely different. So their goals are, you know, build a manufacturable process and a new structure recipe template. And I'm in there fine-tuning that template to eke out as much yield as possible while still producing product that is as good as it can get.
Michael Berk [00:10:45]:
Okay. I think I'm following. Let me ask some questions. So the core difference of how you would describe what an engineer or a scientist is tasked with doing is engineers are determining the best way to implement something, and scientists are trying to determine what's the next big thing. Is that sort of the high level?
Ben Wilson [00:11:08]:
I would say that applies, like, in general. Yeah. Science is about the undiscovered country, right? Figuring out something new, a discovery, an invention. And the way you go about that in a controlled manner is using the scientific method. But engineers, like, by virtue of their job title usually, what they're tasked to do when they're employed is to produce something, right? Like, in an efficient manner.
Ben Wilson [00:11:46]:
It's not to go and invent something new. That can happen. If there's a problem that's presented, there is no solution, and you've exhaustively tested the available solutions to your problem and they just don't function, sometimes you do go into research mode and build a new invention to solve the problem. Nothing wrong with that. Nothing at all. But the reason we're talking about all of this in the realm of software engineering and development practices is a lot of people don't apply that rigor when they're trying to solve a problem. They just immediately go into research mode. They're like, I'm smart.
Ben Wilson [00:12:33]:
I can figure this out. I'll go and build a solution. And if you take that person and compare them with somebody who goes through the process of saying, is there prior art here? Is there something I can use that will make this solvable faster, more reliably, so I can move on to something else? That second person is gonna solve the problem really quickly and then go do other things that are more useful.
Michael Berk [00:13:05]:
Right. So let me explain how I think about the science-versus-engineering dichotomy. And I'll use your analogy of the undiscovered country, because this was how it was explained to me for the first time back at my prior company, Tubi. So I had just built this super fancy tool. I'd spent a lot of time on it. It worked. It was tested, and it solved a real use case that was important. And my manager came to me and said, hey.
Michael Berk [00:13:37]:
Great work. We're not gonna use this because the company doesn't care about it anymore. And my little brain was like, well, damn. That's super sad. I worked so hard. This is my child. This is my baby. I've nurtured it into an amazing state, and now it's just gonna be thrown into a dumpster.
Michael Berk [00:13:55]:
And I remember I was chatting with another data scientist about this. He was known as, like, the chill guy on our team. And I had seen this happen to his projects several times, and he always took it in stride. Whenever his project was canned, he just said, hey, that's fine. I'll work on the next thing. Give me the next ticket. And so I wanted to understand how his brain worked and how he could let go of his baby so easily.
Michael Berk [00:14:19]:
And what he told me was, our job as data scientists, and it's more in the decision science realm, so informing decisions with data, our job is to create maps. We want to see what is out there, and then we present those maps to key stakeholders and key decision makers. And our job is to make the map as accurate as possible. And, also, maybe if we discover some things, like, oh, there's a berry bush over here, there's water over here, we should probably have some intuition about what is valuable to our overlords. But we're making maps. And that was super, super transformative for me, because it made me realize that the maps are sort of just a data point, and we can get a better data point or a worse data point. And all that matters is the information contained in there, local to that time.
Michael Berk [00:15:09]:
So maybe within a week-or-two range. After that week-or-two range, your data point might not even be valid anymore. And so this process of building maps, I think, is one thing that is consistent between engineering and science-ing, I guess. And, Ben, I'd be curious about your take. So you sort of described science as this open floor for curiosity, where we say, how are we gonna make the company 10 times more money via this new innovation? And then conversely, engineering and sort of building is very focused, where we're looking to optimize a more concise and well-defined set of problems. But I think both share this relationship of finding out what the unknown is and then acting upon it. And that's the process of sort of filling out a map. If you've ever played a video game where you explore, and then you have a map, and suddenly the map becomes green or blue with your terrain, it's the exact same analogy that I think of.
Michael Berk [00:16:06]:
So what are your thoughts?
Ben Wilson [00:16:08]:
I kinda like that analogy, and I'm gonna steal it. As you were describing that, 2 things popped into my mind immediately. 1 is the video game map exploration. The scientists would probably be the ones figuring out what path we would take, like, hey, we wanna go through this mountain pass because we understand that that's gonna be probably more efficient to get to the other side and see what's over there. And they would also determine a direction. And that direction, like, the end goal of where we wanna get to, it's probably management at the company that's doing that. The C-suite is saying, here's where we need to get to. The continent is over there somewhere.
Ben Wilson [00:16:58]:
We have no idea how to get there. But we would have scientists that would do the analysis of what we estimate the terrain to be like at these different places, and what the optimal vectors are that we need to travel down in order to get there. The engineers are not involved in map making whatsoever. The engineers are the ones, a good engineer would say, what vehicles do we have at our disposal right now, and which ones match up with what the scientists are telling us the expected terrain will be like? Should we think about, you know, large vehicles that can get over any type of terrain but consume a lot of fuel, or smaller vehicles that need to skirt around difficult terrain but are very efficient with fuel, so we can go further without having to stop, or something? That's the engineer's role. The problem becomes when you have somebody in the engineering role who thinks that they're a scientist. So they approach the engineering problem like a scientist would, which is just trying to figure something out with no context. So if we were saying, hey.
Ben Wilson [00:18:17]:
We need to make a trip to the other side of where we think the map is. We don't know what's over there. An engineer who's got that thinking cap on of, like, I'm a genius and I can figure this out, they're gonna disassemble all the vehicles that are sitting in the garage, and they're gonna build something that's like, hey, it's amphibious, it's got big tires, and it's got freaking helicopter blades on the top, so we can go over everything. And then you start the journey, and everything breaks, and the vehicle, it's just broken.
Ben Wilson [00:18:52]:
Like, you can't make your journey. That's the danger. Another danger is somebody who's excellent at cartography and understanding geology, you know, one of those scientists. You tell them to drive the car on the road out there, and they, like, drive it straight into a rock or a tree or something and break the car. This was a perfectly functional vehicle that we were about to get in and use, that the engineers know how to use really well, because their job is to get to a location. And then somebody who doesn't know how to operate it breaks it. So, yeah, I like that analogy.
Ben Wilson [00:19:36]:
It's really good.
Michael Berk [00:19:37]:
Yeah. Thanks for running with it. But I just wanted to hone in on one point, and it's a working theory. We can debunk this in the next 3 minutes, potentially. But I would argue that it's a continuous scale between science and engineering in terms of this map-making concept. Like, both share the goal of defining a map and then acting on the map. So with engineering, it's typically a much more closed space. You have 3 sets of vehicles that you can choose from.
Michael Berk [00:20:10]:
And, alright, based on these six criteria of the current terrain, this vehicle is best, this vehicle is second best, this vehicle is third best. Cool. In science, though, you're still sort of doing that same thing, except, to take a data science analogy, your loss function is a bit broader. It's not optimizing to the terrain. It's optimizing to the scale of the innovation, and sort of letting your curiosity drive that loss function, and your intuition about what is valuable. So it sort of seems like it's all the same process, just with different amounts of specificity in the loss function.
Michael Berk [00:20:50]:
Does that make sense?
Ben Wilson [00:20:51]:
And completely different skill sets.
Michael Berk [00:20:54]:
Yeah. That's a good point.
Ben Wilson [00:20:56]:
So if you're working in a capacity as an engineer, your goals and the tasks that you're determining, like, what you need to do, are completely removed from research. You're 100% right. Research shouldn't have hard guardrails on how it's being conducted. You're gonna stifle innovation if you do that. You want people to try a bunch of crazy stuff. Sometimes the crazy stuff is the stuff that ends up panning out. And that was the type of thing that I saw at that other company a lot of times. We would sometimes see, like, what the scientists were testing out on some of their reactors. And you'd look at the recipe, and you'd be like, what the hell are they doing here? Like, there's no way this is gonna work.
Ben Wilson [00:21:46]:
And sometimes I would talk to, you know, people that I knew. It's like, what are you testing here? They're like, well, we need to know what the bounds are, like, what happens with this combination of parameters, so that we can know what the process bound is. Like, oh, you're intentionally making junk. They're like, yeah. We're expecting it to fail, but we need to validate that it's going to fail where we think it's gonna fail. Like, oh, so that's how we get our process controls for production. They're like, yeah. That's what we're doing.
Ben Wilson [00:22:19]:
We're prepping this for production. So they test a bunch of stuff. It's not exhaustive. They don't have, you know, infinite time and money and resources, but they would do a really good job of telling us, like, hey, don't grow this layer too thick, because this is what happens 99% of the time. And then if something goes wrong in the reactor in production during that layer growth, like, some hardware fails or something, and now gas is flowing in at an unrestricted rate during this growth layer, when we get the wafers out of the reactor, we look at them and then look at the data, and we're like, oh, yeah.
Ben Wilson [00:22:57]:
Yeah. R&D told us, like, this would happen. And now we know where to go in the tool to fix what went wrong, because we know the mechanism and how this happened, or at least where to go and verify. Right. That's not too different from, like, software development as well, except in our world, setting job titles aside and stuff, you know how much I don't like the term data scientist.
Michael Berk [00:23:28]:
I do.
Ben Wilson [00:23:29]:
Oh, there are people that are working on pure research that's gonna be used in industry by people with the job title data scientist. A lot of them are computer science PhDs. Most of them have job titles of software engineer, principal software engineer, distinguished software engineer, whatever, at a big tech company. And they're working on innovations for the next big thing. Or they're working on something that's total garbage, but they're gonna learn a bunch along the way and realize, this isn't really viable, but I know where to test next. And those innovations eventually make their way, if they're successful, into tools that everybody else uses. Fundamental frameworks, stuff like TensorFlow, PyTorch, Python, Transformers.
Ben Wilson [00:24:30]:
Mhmm. You know, you have these tools that are written for people to use and build things with. But when you're presented with a problem that might seem unique and you don't know if there's a solution for it, it's kind of a hard sell, if you have a manager that actually knows what's going on, convincing them, like, yeah, I can't do this in PyTorch. I can't do this in any of these libraries that I've looked up. You know, scikit-learn doesn't have this functionality. XGBoost doesn't, but we need to solve this problem. And I can't do it with PyTorch, so I'm gonna go build my own deep learning framework. You know? That's doing pure research in the realm of data science work, where you're trying to solve not just your business problem, but a fundamental problem.
Ben Wilson [00:25:28]:
That's scary unless you're staffed for that and you have an R&D team that has the rigor around, how do we do R&D for tooling? You know, look where these sorts of frameworks came from. They came from skunkworks teams at Google, at Meta, at Microsoft. You know, you have these core groups of people that are doing pure research and building tooling that people at their company can use to solve very challenging problems. And then they're just nice, and they open source it. But you need a specific group of people, and support staff around them, doing the stuff that nobody talks about, like, at Google, for development of the TensorFlow framework. It's not just that team of a bunch of CS research scientists that built that thing, and they all did it on their own and then just open sourced it. It's like, no.
Ben Wilson [00:26:34]:
You have software engineers helping build that and maintaining that, and you have test infrastructure engineers building, you know, other tooling that's used for doing CI for that, like Bazel, you know, the entire build tool that runs all the test orchestration. There's an engineering team that owns that and supports that. You have people that are doing the deployment of that and doing integration testing. And when you're trying to solve a business problem with tooling, it's important to realize the difference, like, what is our goal here? Are we trying to open source something to solve a nascent problem that nobody else has ever encountered before? Do we even have the ability to do that? Should we be going down that road, or should we say, is there something else we can work on? Or maybe we should go and do a little bit more research and see if there's a different way of solving this.
Michael Berk [00:27:44]:
Yeah. There was a related topic that we were discussing before recording that I think is super interesting and super relevant. And that is, how does the ratio of scientists to engineers differ as a function of company size and as a function of your industry? So let's use OpenAI as an example. They're probably the most hyped company per number of employees. And so I'd be curious, Ben, if you started OpenAI on day 1, knowing their product line and knowing where they were, would you hire scientists? Would you hire engineers? And then what would be the inflection points where you start hiring more or less of one category?
Ben Wilson [00:28:28]:
I mean, my first hire would be a CTO who has done that before and is, like, highly skilled at doing it, and then I'd talk with them every day.
Michael Berk [00:28:42]:
Would you call a CTO a scientist or an engineer, or can they live in both worlds, typically?
Ben Wilson [00:28:48]:
That's an engineer. A hundred percent. Even if they did do R&D research in the past, you wouldn't want somebody who doesn't have engineering experience running an engineering organization.
Michael Berk [00:29:03]:
Probably.
Ben Wilson [00:29:04]:
That would be scary. It doesn't work out well for anybody. So I would respond to the inflection point based on how fast we're getting to market and how far along R&D is. And that's something that you would adapt over time. So you would know when to make a change in hiring based on the market effect. Like, hey, if we're getting super popular, and, you know, our service can handle query loads just fine, and everything is built and designed in a way that it's stable, reliable.
Ben Wilson [00:29:46]:
Things just work. We're able to be productive over time and maintain a high velocity. We maybe have enough engineers, and good engineers at that. But if we have service stability issues and scalability problems, and, you know, we're an actual real-world company that has all of those things, it would be a blend of, like, where do we actually need people, what skill sets are we looking for, and how much experience do we need from candidates in order to come in and do this?
Michael Berk [00:30:24]:
So it sounds like the causal feature in this relationship is not company size; it's almost infrastructure stability. If you have low infrastructure stability, you probably want more engineers. If not, you want more scientists. Or do you think that engineers can do some of the science, almost, and innovate in creating new products?
Ben Wilson [00:30:46]:
I mean, if I'm the one hiring all these people, I don't want them doing that, because I've seen how that plays out in other industries. I've seen how it plays out in the software engineering industry as well. If I'm hiring an engineer to, like, make my product stable and reliable and build features that enhance the user experience or reduce the friction of using this product, I want them to do that. I wanna find people that wanna do that, are good at that, and wanna learn how to get even better at that. I don't wanna hire somebody who wants to do research but is getting paid, and being tasked on a sprint basis, to make something better, because they're gonna get really frustrated and quit, or they're gonna develop a really toxic, you know, personality while at work and bring everybody else down. I would not be interested in having that person. If they wanna go do research, go do research.
Ben Wilson [00:31:59]:
You know? Apply for that job at the company. And if you don't get accepted into that job, then go apply at another company for that job. Go find your peace. Go find the thing that you wanna do. Somebody out there will have you do it. But if you come in with that thought of, like, oh, I wanna go and innovate and invent and do all this cool stuff, and then you take a job as a software engineer, like, that could happen. You could file some patents. You could invent something new that nobody's ever done before, but that's not an expectation of that role.
Ben Wilson [00:32:43]:
Whereas if you're going in as an R&D scientist somewhere, that's your core job. Go find that undiscovered country. Find that blank part on the map, figure out what's there, and plant the flag there.
Michael Berk [00:32:59]:
Right. Yeah. I was actually giving a presentation at my alma mater, Penn, to a sports research group, and it was cool to see a bunch of undergrads and grad students and sort of discuss what their hopes and dreams are in the data space. All of them generally wanna work in sports research involving data, or are just selling their souls to finance companies, per usual. But it was really interesting to see this sort of divergence between the people that said, I really want to get into the data. I really want to solve real-world problems. I don't want a clean dataset. I want a messy dataset.
Michael Berk [00:33:36]:
I wanna explore and actually, like, deliver value. And then there are other people that sort of just wanna play around and, like, try things and tinker. And I've always been the person that's very solution driven. Like, if it ain't broke, don't fix it. But there's some people that can just spend hours and hours saying, oh, I wonder what this does. I wonder what this does. And, like, curiosity really drives them. And so it's been really cool to see this divergence sort of develop in myself, but also see it in a bunch of people that are about to enter industry, whether it's academia or the tech industry, and see where they're starting from and what their perceived outcome is gonna be.
Michael Berk [00:34:17]:
And I'm curious if it changes for some of them. I think it will.
Ben Wilson [00:34:21]:
I mean, it's got to. If you're entering the workforce, whether you wanna go and do research or you wanna go make product, in any industry, and this applies to every industry I've worked in, and I've worked in a number of them, you have to align your desires and what you wanna do with what that job position is. Otherwise, you're either gonna be very unhappy with what you're doing and you end up quitting, or you're gonna be really happy with what you're doing, and your company is gonna be very furious with what you're doing, and you're gonna be out of a job. So it's good to align your interests with what the job actually is. Right.
Ben Wilson [00:35:13]:
And make sure, before signing on that piece of paper, that that company, that hiring manager, has been honest with you. And that's why it's important to ask your potential future peers: what do you do day to day? What did you do today before we started talking? What are you gonna do after we stop talking? What are you doing next week? You should ask those questions. It always blew my mind doing interviews, and I've done thousands of them over my career now, the people that don't ask that question. I was like, that's the most important question to ask. And so few people actually ask it to understand, like, am I signing up for what I wanna sign up for?
Michael Berk [00:36:00]:
Yeah. No. I completely agree. But one nuance to that point is, if you don't know what you like, just try stuff. Like, you don't need the best job right out of college. It'll sometimes take 3, 5, 10, 15 years, maybe even a career, to find what you are truly passionate about. And so don't stress out if you don't know whether it's a perfect fit, number 1. And number 2, even if you know it's not a perfect fit, there are often aspects of a job, if you keep an open mind, that are really awesome and that you can sort of specialize toward.
Michael Berk [00:36:32]:
And sometimes, even within that organization, there's lateral movement available, where you can move into something that's more your speed. And so, yeah, bias for action is really valuable. That's actually a Databricks culture value, bias for action. But it's true. Like, if you sort of just, like, overplan and are paralyzed by choice, it's typically really problematic.
Ben Wilson [00:36:59]:
Yes.
Michael Berk [00:37:01]:
So, Ben, I had one final closing topic, which I think is really important. After seeing many different organizations and many different data scientists, data engineers, and just data professionals, I think it's really important to be able to merge the skill sets of scientist and engineer, and sort of not necessarily do both perfectly, but borrow from, like, the core tenets of these two professions. But before we get into that, you mentioned before we started recording that sometimes moving from engineering to science and science to engineering is not a smooth transition. And we talked about science to engineering, but what are your thoughts on engineering to science? Why is that sometimes challenging?
Ben Wilson [00:37:49]:
I mean, not just challenging, but in my experience of seeing that happen in a couple of different industries, it never works out, because the mindset of engineering is optimization. Not taking shortcuts, but finding the minimum amount of work to do in order to solve a problem. It's all about efficiency, particularly about time. And that time component bleeds into a lot of different decisions or outcomes that we can talk about. It's like, hey, I built this simple implementation of this thing so that 6 months from now, I can understand how it works and change it easily if I need to. So, maintainability. Or, hey.
Ben Wilson [00:38:40]:
I built this in the simplest way possible so that when we have to migrate away from this in the future, which we probably will have to do, I don't have to sink weeks of time into redoing all of this from scratch, because it's very easy to change. So that mindset is introducing a bias into your own thought processes that is not 100% conducive to how scientists approach problems. And everybody that I've seen go from that engineering mindset into R&D, if they don't consciously try to fight against the habits that they've built, if they've become a successful engineer, it's really hard for them to make that mental shift to say, I don't really have those guardrails anymore. I now need to see what's possible and think more creatively, but also apply a much more structured and rigid approach to what it is that I'm testing out. You need to collect a lot more evidence. It's not the, you know, LGTM result from a unit test passing on code, being like, yep. Cool.
Ben Wilson [00:40:02]:
Good. The research side of that might be, I need to do stress testing of this, and I need to write these very complex integration tests, and, you know, load testing. And I need to really understand what the implications are of this design decision that I've made. We're talking about software. If you're talking about physical science, like, you're making a product to put on the market, you're trying a bunch of stuff, but then you're collecting, you know, to give you an example, those R&D teams making those LEDs. Mhmm. They would do their tests in a given morning.
Ben Wilson [00:40:45]:
They would run all of their tests in the morning. The entire latter half of the day, they're in the metrology department, like, slicing up wafers, doing atomic force microscope measurements. They're doing SEMs, TEMs. They're running this set of wafers that they grew and trying to gather as much data as they can off of this experiment so that they understand what actually happened. They're controlling processes in special ways. They're, you know, doing specific things in order to answer the question that they've posed. Whereas on the engineering side, when we're making product, we would select one wafer out of a run of, say, 10 of them and send that out for a thickness measurement with nondestructive testing. You know, you shine a laser at an angle at the surface of the wafer, and that tells you the thickness of, you know, the different layers that you just grew. That's cool.
Ben Wilson [00:41:46]:
And, yeah, photoluminescence testing. So you would also shine a laser of a particular energy to light up certain aspects of the wafer, to get the LED to light up, and then you would measure the wavelength and the intensity, and measure the electrical characteristics of it. But you're just collecting the data you need to do process control engineering. Just like, hey, am I within the bounds of acceptability tolerances of the product that is made? Am I making what I think I'm making? And that's as far as the scientific aspect of it goes. That's now engineering process monitoring. And you are looking at data, but it's quick and dirty and fast.
Ben Wilson [00:42:34]:
And you're like, yep. Thickness is good on all the layers. My forward bias voltage is good. My leakage current's good. Light intensity is awesome. Okay. I have a winning recipe. I'm gonna do that one again with some slight tweaks.
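A minimal Python sketch of what that quick-and-dirty monitoring check amounts to, looping a run's measurements over acceptance windows; every parameter name and limit below is hypothetical, for illustration only.

    # Hypothetical acceptance windows for a production run's measurements.
    SPEC_LIMITS = {
        "layer_thickness_nm": (95.0, 105.0),
        "forward_voltage_v": (2.9, 3.3),
        "leakage_current_ua": (0.0, 1.0),
        "light_intensity_mw": (18.0, 25.0),
    }

    def run_passes(measurements: dict) -> bool:
        """True only if every measured value sits inside its spec window."""
        return all(
            lo <= measurements[name] <= hi
            for name, (lo, hi) in SPEC_LIMITS.items()
        )

    run = {
        "layer_thickness_nm": 101.2,
        "forward_voltage_v": 3.1,
        "leakage_current_ua": 0.4,
        "light_intensity_mw": 22.5,
    }
    print(run_passes(run))  # True: a winning recipe, run it again with slight tweaks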
Ben Wilson [00:42:49]:
Whereas scientists take a completely different, you know, approach to that.
Michael Berk [00:42:55]:
Yeah. So let's close out with some tips for engineers from a science perspective and for scientists from an engineering perspective. And then if we've got some tips in common around this realm, let's do that as well. But I'll kick it off for engineers from a science perspective, which is: stay in the problem space more. Typically, engineers rush to build a solution. And if you know how to build it, freaking go ahead. Like, don't spend time planning if you already know the solution. But often, spending even, like, an extra hour, or even sleeping on it overnight, you'll come up with a much simpler and a much more robust and a much more effective and creative solution.
Michael Berk [00:43:37]:
So prototype, but don't jump to building a final solution, and instead, explore the space.
Ben Wilson [00:43:44]:
Could not agree more.
Michael Berk [00:43:46]:
Cool. So you got one for either direction?
Ben Wilson [00:43:49]:
Yeah. For people in the engineering space, do your research. And this is a message to my younger self and my current self and my future self: do not forget to figure out if this is a solved problem. You know, you're looking at something that you don't understand in code. You're like, I need to do this thing where I take the data and change it to this other type, and then I have to make sure I can serialize that and then deserialize that and send it over the wire. Figure out if that's been solved. And if it's part of, like, the core language, please use that.
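As a minimal illustration of "use the core language": Python's standard library already covers the serialize/deserialize round trip described here, so nothing needs to be hand-rolled. The payload below is made up.

    import json

    # Hypothetical payload; json ships with Python, so there's no need to
    # invent a custom wire format for data like this.
    payload = {"run_id": 42, "yield_pct": 98.0, "passed": True}

    wire_bytes = json.dumps(payload).encode("utf-8")   # serialize for the wire
    restored = json.loads(wire_bytes.decode("utf-8"))  # deserialize on receipt

    assert restored == payload  # round trip is lossless for JSON-friendly types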
Ben Wilson [00:44:28]:
The way that I test things out now is I always test sort of the happy path first. Like, what's the easiest thing for me to do to solve this? And then test it and say, does this solve it? If so, awesome. Move on. Next problem. But if it's, yeah, this doesn't quite do what I need it to do, when you get to that point, somebody without very much experience would say, oh, time to reinvent the wheel. I gotta go and build something for this. That's dangerous. The best thing to do is ask, is there something else that exists that'll solve this, that's just slightly more complicated than what I'm doing right now? If so, test that.
Ben Wilson [00:45:20]:
If that works, use it. And, eventually, if you're going through that iteration cycle and saying, okay, I'm gonna test the next thing, see if that works, you can exhaust all the options. And at that point, it's not time to go off and build a new thing. That's the time to go talk to some friends. Get a peer review. Ask people.
Ben Wilson [00:45:44]:
They're like, hey, you ever solved anything like this? Do you know, am I thinking of this right, or am I being dumb? And most of the time, you're being dumb. But sometimes you ask a bunch of people, and they're like, yeah, man, I don't think that anybody's solved that before, because nobody's ever had to solve that. Or if they have, they're not sharing it with the rest of the world. Then it's time to go and ask people, like, should we build this? Is this worth our time? Let's go scope what it would take to build this. Right. And sometimes that scope is, you know, 2 days of work.
Ben Wilson [00:46:25]:
It's gonna be a pain in the ass if we ever have to change this thing, but let's write it in such a way that we understand it and it's easy to maintain. Or, this is gonna take us 6 months to build, and, like, what is the end goal here? Is this to solve this one problem that somebody asked for, that makes the company $100,000 a month at max? Is this worth 6 months of our time? Usually, the answer is no. Move on to something else.
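A back-of-envelope version of that scoping call; the $100,000-a-month and 6-month figures are from the conversation, and the team cost is a hypothetical stand-in.

    # Rough cost-benefit sketch for the "is this worth 6 months?" question.
    monthly_value = 100_000        # max revenue the feature brings in (from above)
    build_months = 6               # estimated time to build it (from above)
    team_cost_per_month = 150_000  # hypothetical loaded cost of the build team

    build_cost = build_months * team_cost_per_month
    break_even_months = build_cost / monthly_value
    print(f"Build cost ${build_cost:,}; break-even after {break_even_months:.0f} months")
    # Prints: Build cost $900,000; break-even after 9 months, before even
    # counting the opportunity cost of whatever the team didn't ship meanwhile.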
Michael Berk [00:47:00]:
Yeah. Yeah. The amount of times that I've invented something and then a week later realized that there's a million open source solutions, it's just mind blowing. And I was so proud of my invention, too. But at this point, I sort of expect it to happen anytime I write any code. So, on the flip side, for scientists from an engineering perspective: I think engineers are really, really good at knowing their toolset. So if you look at a lot of software engineers, if you look at their VS Code setup, their Vi key bindings, their vimrc files, like, all these things that help them produce in an efficient and scalable manner. It really is valuable.
Michael Berk [00:47:43]:
And if you're a scientist, a big part of creation is creative iteration. And if tools are blocking your mind from sort of wandering, it's really bad. And so just learn your tools. Like, spend a weekend playing around with them, or spend a week on the job learning them. It will pay dividends down the road.
Ben Wilson [00:48:07]:
Yeah. And learn the languages that you're writing your experiments in. Right. Like, if you're writing shit code, what are the results of your experiment? Are they a byproduct of your idea being good or bad, or, exactly, of your implementation being good or bad? So, yeah, if you're doing research in the realm of software engineering, you have to, like, learn software engineering. And most good people do. Like, they're pretty good developers. In the physical sciences, my experience was that embedding was the answer there.
Ben Wilson [00:48:49]:
There's not enough time in the human life span to have, like, postdoc-level experience with material science, understanding all of that stuff, which is a mountain of knowledge, understanding concepts and theory, and also be really good at operating the tools that they were using to grow those structures. They could write recipes, definitely. It's not that challenging. Like, they're smart people. But do you want them changing out valves on the machine and, like, soldering pipes together? Probably not. The engineers are the ones that know how to do that stuff. So engineers would be embedded in that team, like, hey, you're here as sort of support staff to help them out if they get stuck. And that works really well.
Ben Wilson [00:49:48]:
The same kind of thing works with data science, too. Some of the most effective data science teams that I saw pump out the most reliable and sustainable projects were those where it was, like, 4 data scientists and 2 software engineers. They're all working together on this project, and they produce something that works in prod. It's, like, great. And the model's good, and there's some mechanism to retrain it. But if you get a bunch of people who don't know how to do all of those different things now having to do all of those different things, it's probably not gonna be that great.
Ben Wilson [00:50:25]:
Yeah. Yeah. I heard. Alright.
Michael Berk [00:50:27]:
So I know we're coming up on time, so I will quickly wrap. Today, we talked about science versus engineering. And while they share a lot of skill sets, they are typically quite different in terms of job description and what makes a person good at each of them. In engineering, typically, you're solving an optimization problem. You're looking at how to best implement something in the least amount of time possible. And you must master both tooling and theory to do this, but, typically, you're less driven by curiosity and exploration. Science, on the other hand, is a lot more about what the next big thing is, and you let creativity and curiosity drive. And it's important to be able to build prototypes and scope, but you're more of a, sort of, undirected person.
Michael Berk [00:51:13]:
And then one thing that's common to both, though, is building maps, whether you're building a condensed map of vehicle parts or a large map of the entire continental US. Both of these are essential skills, and the underlying essential skill is map building. So, I think that's about it. Anything else, Ben?
Ben Wilson [00:51:35]:
No. Sounds good.
Michael Berk [00:51:37]:
Cool. Alright. Until next time, it's been Michael Berk and my co-host.
Ben Wilson [00:51:41]:
Ben Wilson.
Michael Berk [00:51:42]:
Have a good day, everyone.
Ben Wilson [00:51:43]:
We'll catch you next time.