The Journey to Expertise with Fernando Lopez - ML 152
Fernando Lopez is an AI Engineer at Google. They delve deep into the realms of machine learning, documentation challenges in open-source projects, and the transition from startup environments to tech giants like Google. They share their candid experiences with impostor syndrome, practical tips for continuous learning, and the nuances of scaling solutions in the dynamic tech landscape. Explore the nuances of software development, the complex interplay of learning strategies, and the realities of navigating large-scale organizations. Join them as the industry experts unravel the intricacies of prototyping, scaling challenges, and the value of hands-on experience in shaping successful tech careers. Get ready to immerse yourself in a wealth of knowledge and thought-provoking insights that underscore the essence of growth and innovation in the tech realm.
Transcript
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I do data engineering and machine learning at Databricks. And today, I am joined by my cohost
Ben Wilson [00:00:15]:
Ben Wilson. I do PR reviews at Databricks.
Michael Berk [00:00:18]:
And you're so good at it. It blows my mind every time I see one. So today, we have Fernando, and he was a guest on episode 89, during which we focused on software engineering, algorithms, and just generally developing skills. His background is as an ML engineer and also an MLT manager, but most recently, he joined Google as an AI engineer, and he specifically focuses on LLMs and computer vision. So, Fernando, before we kicked off the recording, we were discussing, sort of, the transition to Google scale. How has that been?
Fernando Lopez [00:00:55]:
It's been, yeah. As we were discussing previously, my background is based on just jumping between different startups. And now that I joined Google, the problems that we are solving here are pretty different. In theory they are sort of the same: we train models, we deploy models, but the scale is the difference. So it's been pretty funny, because this is something that I really wanted when I started my career, but I needed to prepare a lot.
Fernando Lopez [00:01:38]:
So I couldn't even imagine how different it is when you are working inside Google, because you only see Google from outside. Working inside is pretty different. But, yeah, it's been very, very different in comparison with the startups. The scale is the key point that I can say makes the difference. Scale. Yeah.
Michael Berk [00:02:09]:
Yeah. There's definitely this, at least, stereotype in business where from the outside, everything looks great. Things are moving along, you're hitting your revenue targets. And then on the inside, it's absolute chaos. There's just people yelling at each other. And, of course, Databricks is never like that. But there are lots of examples where that is the case. So, yeah, it definitely makes sense that there would be a different external perception. For, like, the same use cases, is it the size of the models? Is it the amount of data? Is it the amount of users? What do you think about when we're talking about scale?
Fernando Lopez [00:02:53]:
Everything. Yeah. Everything. I will explain some specific points. The models, for example: the models I trained in the past for these startup-based problems, we were trying to build with some sort of data, like, I don't know, thousands of rows. And we were trying to deploy the model on one endpoint to serve some, I would say, imaginary users. Like, we assumed they were going to send, I don't know, 1,000 requests per day or something like that. We didn't even have those users.
Fernando Lopez [00:03:46]:
We were just guessing that those users were going to use our model. And on that side, we would say, like, okay, maybe I will need to retrain the model because I want to see my result quickly. And then if it goes well, I will replace my model, because you do that in startups. It's something like that. That is not the case here. You need to be very careful with the suggestions that you make, with the next steps that you are suggesting.
Fernando Lopez [00:04:31]:
And when you say, yes, I can do it in one day, in 2 days, you need to be very responsible that it will be done in one or two days. Obviously, there is some sort of bandwidth. You are expected to say, like, okay, it would take me 3 days or so. But the difference is like that. Yeah. The models are complex. We have a ton of tools to use.
Fernando Lopez [00:04:58]:
We have a ton of technologies, distributed, millions of users. So this is the scale that we handle here in comparison with the startups. So yes. I've been here at Google for the last 6 or 7 months. So I am just, let's say, starting here. I'm sort of new here. I'm still a Noogler, as they call us.
Fernando Lopez [00:05:31]:
But, yeah, I think I still need to learn a lot in all the time that I will be here.
Ben Wilson [00:05:43]:
I've got sort of a meta question for you about this. As somebody who has done something similar to what you've done, which is go from a company that's using ML where, when you're building it, you're thinking, oh, we've got 100,000 customers and, woah, we got 100 requests in the last 10 minutes. When you're building something like that, you're sort of in that bubble world of scale. So 100 requests in a minute seems like a lot. And then moving from that to, like, deploying a model where you're like, hey. We've got 2,500,000 requests a second to this endpoint, and this is a big deal, and there's a lot of infrastructure that goes around that. A lot of thought and engineering has gone into supporting a system like that.
Ben Wilson [00:06:46]:
Do you find that there's advice that you would give to somebody who's making the move in the opposite direction in ML? So say they finish their PhD. They go straight into Google at that scale, and then they go on to a startup. And if they go in there and they're like, hey, I gotta deploy a model, I have to build it Google style, because that's what they know. Do you think that's effective in a startup, to have to build infrastructure and use complex tooling to support, you know, 100 requests an hour? And what advice would you give to somebody who's making that move?
Fernando Lopez [00:07:30]:
My first advice will be, like, ask for great compensation, first of all. Second, I think it could be a good move. But, yes, with the warning that in startups, at least in my experience, you need to prove that you're building something to serve thousands, millions of users, only to prove to your investors that you are doing that stuff, even though you don't really have the users. You don't really need it at that time, but you need to prove to your investors: I used the money you gave me, and I have built this pretty expensive stuff, ML models, endpoints, APIs, a lot of engineers, with the money that you gave me. So we have that in startups. I think we don't have that, obviously, at Google. We have other, like, metrics or something like that. I don't know how to say it. But in startups you need to say that you're going to be something really big, that you have trained a model, that you can serve thousands of requests per minute, because you need to make your investors happy. So I think there's a difference.
Fernando Lopez [00:09:07]:
And in conclusion, my advice will be, half joke, half no joke: ask for great compensation. Second of all, you need to keep in mind that you're going to be required to build things only to make the investors happy. And third, enjoy the freedom that you will have. You have the Google guidelines to build things, but then you put your feeling on that, your style on the work you're doing. So, yep, that would be my advice. Yeah.
Michael Berk [00:09:47]:
Ben, if you could redo your career, would you start off at a Google, or would you start off at a startup? And Fernando too. Yeah. I'm curious for both of you.
Fernando Lopez [00:09:58]:
Okay.
Ben Wilson [00:09:59]:
I mean, my career started in nuclear engineering, so, that was pretty much a waste of time.
Fernando Lopez [00:10:06]:
Oh, really?
Ben Wilson [00:10:07]:
But, I mean, I did so many different things in different industries. So I don't think I would change any of it because it forces me to think of things differently. But if we were to just take the last 15 years and say, hey. When you started messing around with software and ML and learning data science, I don't think I would change it either because I learned better by, like, doing dumb things and then figuring out how much I messed up. That resonates in my memory a lot better than learning the right way to do something and then being put in an environment that is so foreign to how I did things before, I think that would frustrate me. Like, if I started with the rigor of a large tech company that had a lot of stuff figured out, and I wasn't involved in seeing that stuff built. Not saying like, oh, I need to be the one building it. It's more like, I just wanna see what it was like before, then some smart person came in and built this tool, and then I start using it.
Ben Wilson [00:11:12]:
I'm like, man, this is awesome. And you get an appreciation for why it was built and what the impact was. But if you come in somewhere and everything already works, I mean, there's always problems. Fernando, I'm sure you can confirm. Even at a massive tech company, there's always stuff to work on. Right? And nobody has perfection in what they're doing. But going from someplace that largely has most things figured out to pure green field where it's like, here's a platform you can run your stuff on. And use whatever you want from the open source community or build your own if that doesn't exist.
Ben Wilson [00:11:54]:
I think that's overwhelming a bit. I think it would lead you to overengineering. It would lead me to overengineering stuff.
Michael Berk [00:12:03]:
What's your take, Fernando?
Fernando Lopez [00:12:07]:
Wait. Okay. I think, I don't know. In my case, I would prefer to learn first in startups: learning, having freedom to learn, to make mistakes. But, also, I'm not sure, because I see my peers that are, how to say, probably some juniors at Google. And I see them, and I see they're happy. So I don't know. Maybe the path that they will take could be, in my opinion, a happy path, because they have started from the beginning at Google. So they are learning, step by step, all the Google guidelines.
Fernando Lopez [00:13:12]:
In my case, I started at almost a senior level. So the expectation on me is pretty high. I know some things, but some of the things I know are in theory; I have never applied them in real life. And here, the expectation is that somehow I have the experience or the knowledge of how to do it in real life, on real, at-scale problems. So I think that is something that I would prefer to change, like, having started from the beginning, because from the beginning I would have the experience, learning step by step and then growing. Yeah. So I don't know if you got my point, but now that I have entered at this level, the expectation on me is, like, I will behave as a senior or something. Part of my experience proves that I have the level, but for some specific topics, I don't have that much experience.
Fernando Lopez [00:14:24]:
So I need to learn quickly and to realign with the other engineers. And I think these other engineers might be those that have started from the beginning in Google. So I need to, like, align my level with the others. Yeah. And so yeah.
Ben Wilson [00:14:44]:
This is a topic that I'd love to dive into, actually. You're talking about imposter syndrome, which exists everywhere. I've personally interacted with people that have been hired at Databricks in our department who are senior staff engineers. And then within their first, you know, 4 or 5 months, just talking to them and asking them questions. Like, hey. How's everything going? How are you enjoying it here? And almost uniformly, all of them kinda get a look on their face. They're like, I'm overwhelmed. There's so much stuff that has been built here.
Ben Wilson [00:15:24]:
There's so many processes, and they're, like, just looking at the build infrastructure. You know, they come from another big tech giant, and they were using similar tools, but, like, what we're building is fundamentally different. So learning a whole new tech stack and then learning about the differences in product development and the velocity. It's just different wherever you go. Like, some places, it's like move fast, fail fast, correct fast, and just go, go, go. Other places are more lots of upfront planning, making sure that prototypes work, and, you know, you do all your homework beforehand, and then you get into development mode. And then you, you know, you have, like, a mix between those 2. But that imposter syndrome, I think everybody shares that when you start working in, you know, engineering in a big tech company.
Ben Wilson [00:16:24]:
Everybody's like, am I the dumbest person here? Like, I don't get what's going on. And I've met it at every level. And the only solution that I found for that is open communication amongst engineers. People just admitting, like, yeah. I have no clue what's going on. Can somebody explain this to me? You know, the number of times that I've seen the ELI5, explain it like I'm 5, right, on a comment on something. Even from principal or distinguished engineers, they'll leave that comment in the design. You know, like, they realize that this is too complex, and we need to make it simpler.
Ben Wilson [00:17:05]:
But they're not afraid to say that. And when you start seeing that, you're like, Okay. I'm not the only one. Sweet.
Fernando Lopez [00:17:13]:
Yeah.
Michael Berk [00:17:14]:
Yeah. It's interesting that you bring that up. I've been an imposter my whole life. Like, in school, I was not always the, what's the word? I went to a very prestigious school and everybody around me, I felt, was smarter, specifically in the technical space. Like, you had the mathletes that had done combinatorics since they were 7 years old and can, like, take the square root of any prime number up to 10,000,000,000. And I was just not like that. I played a lot of sports in school. I hung out with my friends a lot.
Michael Berk [00:17:50]:
Played video games, of course. And so I'm starting to get to the point where that just feels normal and it's actually kinda cool. Like, just being around people where I'm just like, yeah, you could whoop my ass in, like, 90% of these technical fields. And that just is starting to feel right. It's actually sort of interesting as I'm sort of discovering it as I'm saying it. Do you guys still get sort of self conscious about it, or is it sort of a comforting feeling when you're feeling like an imposter?
Speaker D [00:18:24]:
Let me think.
Ben Wilson [00:18:25]:
Yeah. What a thought provoking question. I don't get bothered by that. I mean, I think I've moved on from the imposter syndrome and worrying about that so much. It's just gone to acceptance of knowing that it doesn't do me any benefit to feel like I'm inadequate because I don't understand something. It's like I've short circuited that pathway in my brain. Like, I don't know this.
Ben Wilson [00:18:56]:
I need to learn this, or I need to figure this out because I have a task that needs to be done in the next month. So I know how to get those answers and make sure that I understand it in the way that I understand things in the fastest way possible, and then learning to ignore everything that I really don't need to know. And that fire hose of knowledge that can come when you're like, okay, it's the start of a new fiscal year or start of a new quarter. What are all the things that, you know, everybody's working on? As an outsider, you'd probably look at that and be like, I wanna learn all this stuff. I wanna understand what's going on. But as somebody who's responsible for building a part of that, or guiding a team in building that, you're just like, I cannot pay attention to that other stuff, or worry about it.
Ben Wilson [00:19:51]:
Like, there's really smart people who are gonna solve that, and I can't wait to use what they're gonna build because it's gonna be awesome. But I need to focus on understanding the problem space that I'm working on as best I can so that I can do my role in the development of what we're responsible for. So it's just not worrying about, like, within Databricks. Right? Like, I don't worry about what's going on with Delta or Unity Catalog unless we have to integrate with it, and then I don't need to understand everything about that. I need to understand who do I need to ask, who's gonna point me in the direction so that I can learn what I need to know about this before making technical decisions.
Michael Berk [00:20:38]:
So, like, focused excellence and not worrying about things out of your purview?
Ben Wilson [00:20:44]:
Because, otherwise, it's everything is a distraction, I think. Yeah. It's overwhelming. It it's just too much information.
Fernando Lopez [00:20:52]:
Yeah.
Michael Berk [00:20:53]:
How do you think about it, Fernando?
Fernando Lopez [00:20:56]:
Yep. In my case, let me know what you think. In my case, I'm not conscious that I have the imposter syndrome. I'm only aware of it when I look back and I analyze my past: something happened there, I acted in this way in this situation, and so on and so forth. At that moment, I realize, okay, something is happening that is called imposter syndrome. But in the moment, in the loop, I am not aware, like, oh, I'm facing that now.
Fernando Lopez [00:21:38]:
I feel, like, stress. I feel worried, etcetera. I'm just a hard worker, and I would prefer to be a smart worker. I am a hard worker because if I have a problem, I'm just focusing on, like, how can I solve this problem? I try to, I don't know, touch all the possibilities before I contact someone to say, okay, can you give me a hand? I don't understand this. I'm pushed inside Google to do the opposite. As Ben said, if I cannot handle this, if I don't understand this, instead of being overwhelmed with the problem by myself, I need to first ask some guys in the channel that we have. And someone will reply.
Fernando Lopez [00:22:34]:
Someone will provide some guidance, like, hey, look at this documentation, look at that, etcetera. So my point is, I'm not aware of that impostor syndrome when I am in it. Only when I look back and I analyze my past do I say, okay, and I'm aware of that. Like what I'm saying: I'm a hard worker, but I prefer to be a smart worker.
Fernando Lopez [00:23:04]:
And, yeah. So that is my feeling. But there are many, many faces of this problem, because, on the other side, I really enjoy and am really passionate about the problem itself. So for example, now I'm working on GPU optimization. I mean, looking at how the operations go from CPUs to GPUs, how you optimize kernel launch time, and so on and so forth. So suddenly, I got passionate about it, and I just focused on this problem. But sometimes I've found, I don't know, some blockers. And instead of going to someone from, I don't know, the TensorFlow team, the Keras team, or someone, I try to go through the documentation, read the entire documentation, and see where my question could be solved.
Fernando Lopez [00:23:57]:
So where is the solution to my question? This is the way that I do it. But in the smarter way, I think it would be better going to those experts and asking, like, how is the problem? Where can I go? So something like, yeah. You see my point, right?
Ben Wilson [00:24:20]:
I would ask you, Michael, but I think I already know the answer. Are you a hard worker or a smart worker?
Michael Berk [00:24:27]:
Both. Of course, both. I'm a perfect worker. I am naturally inclined to be a hard worker, and I'm aware of that. And I try to be a smart worker when I can, but my style is, like, smashing my head through a wall. It's not walking around the wall. And I need to get better at walking around the wall. And that's actually been something that I've been focusing on, and I'm a lot better at it now. But there's something satisfying.
Michael Berk [00:25:00]:
At least, the psychology for me is I see a problem. I'm like, alright. This is not too bad. I try to solve it. I fail, and then I get interested. If I can solve it on the first iteration, it's not that satisfying to me. And so, yeah. Exactly.
Michael Berk [00:25:17]:
But there's a dangerous loop to that. It's a slippery slope. Like, on that 70th iteration, you're like, oh, it's gonna be so satisfying, but you're just so far away from the potential set of solutions that you just are not gonna do it, and then you, like, go and cry. But if you had asked on, like, the 7th iteration for steering, that's typically optimal for me. What about you guys?
Ben Wilson [00:25:40]:
I mean, I would agree with your self assessment and how that's changing over time as well. And one thing I've noticed after talking to thousands and thousands of ML professionals in industries, either 1 on 1 on this podcast or, you know, when I was working with ML teams in the field at Databricks, is that that whole work harder instead of work smarter is sort of the defining characteristic for really good data scientists and ML engineers, because that methodology is the most effective approach to solving machine learning problems, of saying, I'm gonna approach this in a scientific method. I have a bunch of hypotheses I need to test out, whether it's just an idea of, yeah, I think this should work this way. Let me test it. And that's sometimes applied to framework code, like, not even stuff that's involving training models or constructing an architecture for a model. So sometimes data scientists take longer to write, you know, boilerplate code and stuff because we're conditioned, or maybe it's just who we are working in that role, to try to brute force our way through and test a bunch of stuff and then see, you know, what's the most promising result and then iterate and make it better, and then finally, you have a product. Whereas pure software engineering is the exact opposite. It's, like, plan, intelligently think about the solution, get peer review, ask questions, and get that optimal answer as quickly as possible and then go and build it.
Ben Wilson [00:27:27]:
And that's why I think for you, Michael, you're seeing that transition in yourself, because you're making that conscious decision to be like, I'm thinking about things differently. I noticed that myself when I did that exact transition. I'm like, I approach things way differently now because of needs of velocity, product development life cycle, and the nature and type of what I'm working on now is rapid iteration, and I just can't do that old way of how I used to build things.
Michael Berk [00:28:02]:
Yeah. Fernando, at least from the outside, it seems like you have a very strong work ethic, and you spent a lot of time building up those skills via, like, LeetCode or whatever it might be, practicing algorithm implementation. What do you think are the nontechnical things you gained from those exercises?
Fernando Lopez [00:28:24]:
Could could you repeat your your question, please?
Michael Berk [00:28:26]:
Yeah. So you spent some amount of years, like, in a lot of your free time, working very hard on the technical. So, obviously, now you can implement a sorting algorithm or whatever it might be. You know your LightGBM versus XGBoost versus random forest. And I'm curious what all of that hard work has given you from a nontechnical perspective. Do you have a better work ethic? Are you better at understanding your limits? Are you better at planning out how long something will take? Are you better at prototyping? What are the things that you gained from all of those years of hard work that are not learning how to implement an algorithm?
Fernando Lopez [00:29:10]:
Yep. I got it. I think I feel better at prototyping, because I think I can move fast building, like, POCs. When you need to prove how this tool will work, I can do it quickly, and, yeah, I think that is the only thing that I gained. Because, for example, with LeetCode problems, as I said many times, the only thing that I learned, besides being really good at understanding how these problems are written and how to approach the interviews, and so on and so forth, is the understanding of the nature of these data structures. Something as simple as that. Yes. With nature, I mean, like, how does a hash map work? How does a linked list work? How do those things work? So, going back to your question, Michael, I think the skill that I worked on, and now I feel better at, is prototyping. And this helps me, for example, when I start a new project and I need to build a POC here at Google, and I can do it quickly.
Fernando Lopez [00:30:43]:
The time we have is, like, one week. I can do it in one day, and I do not do many iterations. Like, okay, I did it in one day, but the next day, I had to change almost everything? No. No. It's like, one day, then I'm just adding minor things, and that's it. So I think I'm good at it.
Fernando Lopez [00:31:06]:
Actually, I got feedback from my peers about it. Like, Fernando is sort of the fast guy doing POCs and building code quick. So I'm aware of it. I like it. I feel like, you know, an athlete that trained a lot of time, running or whatever. Now he's able to run really fast in comparison to some others that did not train as hard as the one that trained almost every day. So this is a sort of analogy that I have. But yeah.
Fernando Lopez [00:31:45]:
And also, I think, I don't know how we can measure that, but, a better understanding of the problems. Because before my hard training on ML and these LeetCode problems, I was overwhelmed with what the problem statement says in terms of machine learning, for example. The description of the problem is always a problem in real life. Like, we have this amount of data that is related to some sort of population. So there was a context there, and I was, like, blocked, in the sense of, I don't know what to do. And then training over and over again in Kaggle, for example, I was able to quickly synthesize in my mind what I have to do in order to overcome these tasks.
Fernando Lopez [00:32:48]:
So, yeah, I think that will be something similar to to the first one that I mentioned. Yep.
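As a rough illustration of the data structures Fernando mentions, here is a minimal sketch of a hash map that resolves collisions by chaining entries in a linked list. The class and method names (`Node`, `ChainedHashMap`, `put`, `get`) are made up for this example and are not from the episode; real hash maps also resize as they fill, which this sketch leaves out.

```python
from typing import Any, Optional


class Node:
    """One linked-list entry: a key/value pair plus a pointer to the next node."""

    def __init__(self, key: Any, value: Any, next_node: "Optional[Node]" = None) -> None:
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashMap:
    """Toy hash map: a fixed array of buckets, each bucket a singly linked list."""

    def __init__(self, num_buckets: int = 8) -> None:
        if num_buckets < 1:
            raise ValueError("num_buckets must be at least 1")
        self.buckets = [None] * num_buckets

    def _index(self, key: Any) -> int:
        # The hash picks the bucket; keys that collide share one linked list.
        return hash(key) % len(self.buckets)

    def put(self, key: Any, value: Any) -> None:
        i = self._index(key)
        node = self.buckets[i]
        # Walk the chain: overwrite the value if the key is already stored.
        while node is not None:
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Otherwise prepend a new node, which is O(1) for a linked list.
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key: Any, default: Any = None) -> Any:
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return default
```

Constructing the map with a very small `num_buckets` forces collisions, which makes it easy to see that lookups still work because each bucket's chain is walked key by key.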
Michael Berk [00:32:55]:
Yeah. One thing that stands out about your work ethic is that it creates a lot of confidence via reps. So confidence is I define it at least as sort of general certainty about your abilities. And if you've done something a bunch of times, well, you can go do it again. And so the fact that you've done all these little mini prototypes, it's probably really, really useful when you're building a, let's say, a complex version of those prototypes with maybe 8 components, like you said, your fast POC. If you have done each of those components a bunch of times, you're probably pretty confident about building them. And then also because of that, you can assemble them really nicely. So you essentially have, like, a lot of raw building blocks that are very solid that you can insert wherever you need.
Michael Berk [00:33:41]:
Does that make sense?
Fernando Lopez [00:33:42]:
Yep. Yep. Yep. And also one drawback, an important drawback, is the one that I mentioned at the beginning of this conversation: working at scale. Because I think I'm the guy that is pretty good doing, like, these POCs. And once you are moving to the part of making those scalable, taking care of how the user is going to use your tool or whatever, for me, it starts to become a little bit overwhelming. Like, I don't feel comfortable. How do you say that? I don't feel as sure in comparison to when I'm building POCs. I start feeling doubts, not feeling confident, but, yeah, I exchange ideas.
Fernando Lopez [00:34:33]:
I communicate with my peers, but I do not feel as comfortable in comparison with when I'm building POCs. When I'm building POCs, I feel like a rock star. I don't know how it sounds, but I hope it doesn't sound like
Michael Berk [00:34:45]:
No. It doesn't.
Fernando Lopez [00:34:47]:
But I feel good. I feel confident in it. But once I move into a territory that is not my comfort zone of building POCs, I start losing my confidence.
Michael Berk [00:35:00]:
That's good. And that's also something you can't really learn from Kaggle. Like, I will never forget the first time that I tried to run a SQL query against a table that was too big to run a SQL query against. And I was like, woah. I could sit here for 7 weeks or I could sample the data. And that sort of inflection point, you can't really get. You can read it. You can hear me say it. You hear someone else say it, but you need to experience it firsthand.
Michael Berk [00:35:24]:
It's like, why is this query taking 16 hours? And so I feel like you can't learn that outside of a large organization. Do you agree, or do you think you can potentially sort of intuit that from textbooks or anecdotal conversations?
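The trade-off Michael describes, waiting on a full scan versus sampling the data, can be sketched with Python's built-in `sqlite3` module. The `events` table, its columns, and the helper names are hypothetical, and the data is synthetic. SQLite has no `TABLESAMPLE` clause, so a modulo filter on the row id stands in for sampling here; that only approximates a random sample when ids are unrelated to the values being measured.

```python
import random
import sqlite3


def build_demo_table(conn: sqlite3.Connection, n_rows: int, seed: int = 0) -> None:
    """Create a hypothetical `events` table filled with synthetic latencies."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, latency_ms REAL)")
    conn.executemany(
        "INSERT INTO events (id, latency_ms) VALUES (?, ?)",
        ((i, rng.uniform(10.0, 200.0)) for i in range(n_rows)),
    )


def exact_mean(conn: sqlite3.Connection) -> float:
    """Aggregate over every row: exact, but the cost grows with table size."""
    return conn.execute("SELECT AVG(latency_ms) FROM events").fetchone()[0]


def sampled_mean(conn: sqlite3.Connection, keep_one_in: int) -> float:
    """Estimate the same aggregate from roughly 1/keep_one_in of the rows."""
    row = conn.execute(
        "SELECT AVG(latency_ms) FROM events WHERE id % ? = 0",
        (keep_one_in,),
    ).fetchone()
    return row[0]
```

Calling `sampled_mean(conn, keep_one_in=50)` on a 50,000-row demo table aggregates about 1,000 rows and should land close to `exact_mean(conn)`. How much time sampling actually saves on a large warehouse depends on the engine and on how the sample is taken.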
Fernando Lopez [00:35:40]:
That is a good one. And I think, yeah, totally biased by my experience, I think it's pretty difficult to learn these things outside large scale companies, because it is how I was doing it. I was learning from books, and I was translating this knowledge into a blog, into some blogs. And my blog says, like, how to deploy a model, whatever, whatever. But it's what the theory says. When you're really working on these problems, it's pretty different.
Fernando Lopez [00:36:19]:
It's not as the book says. Yeah. It's pretty different. You can say how to make a Docker container, how to deploy your Docker container. And this sounds good. You are deploying something. But doing this for large scale problems is pretty different. You need to consider many factors.
Fernando Lopez [00:36:43]:
A lot of factors that I wasn't aware of when I was writing my blog, like, how to deploy a model, blah blah blah. So I think the only way, in my experience, you can learn these things and get experience with them is working at a large-scale company. Whatever the company is, just a large-scale company. Because, again, I worked at startups a lot of times, and I worked hard at my job, but I was always curious about how to do this at real scale, like, at big scale. So I was the guy who was looking at tools to take your models to large scale. So at one point, when MLflow was launched, I was the guy who said, like, hey.
Fernando Lopez [00:37:39]:
There is a tool that we can use for versioning your models and deploying them, seeing whatever metrics, and handling plenty of models. But we didn't have plenty of models. We didn't have data. It was useless for our case, but I was, like, the guy who was trying to see how this problem would be handled at large scale. So it was the young Fernando thinking, like, I don't know, 6, 7 years ago. But, yeah, in conclusion, the only way to learn these large-scale things is working in a large-scale company. And you can read a lot of books.
Fernando Lopez [00:38:26]:
You will have the theory, but in real life, you will need to communicate with different departments. There are things that are not mentioned in the book. You will need to make agreements with your peers. There are a lot of things that you need to consider when you're working in large-scale companies.
Ben Wilson [00:38:50]:
Yeah. I can say from my time at smaller companies that didn't have a large ML presence that, on the teams I was in, I was able to ship prototype code to production. You know, write tests and stuff. It wasn't, like, junk code that was being shipped, but you don't have to worry about things like scale, because it's like, hey. We need to retrain the model every 2 weeks. And then if we wanna replace the model, it's just like you said before. It's like, let's just test it offline with production data that's coming in. Does it look good? Let's run through some statistical tests. Let's compare it to what's currently in production.
Ben Wilson [00:39:33]:
That's just analytic notebooks. I was doing that stuff in Jupyter. And you get an analytic report from some, you know, boilerplate analytics code that you wrote with Matplotlib, and you're like, yeah. This looks good. Look at my t-test results here. I'm good. Let's switch it off. Let's turn it on and shut production off.
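The offline comparison Ben describes, a candidate model checked against the production model with a t-test, might look roughly like the sketch below. The error values are made up for illustration, `welch_t` is a hypothetical helper, and Welch's unequal-variance form is an assumption; the conversation does not say which variant of the test was used.

```python
import math
import statistics
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical per-batch errors from replaying the same production data
# through the current production model and the candidate model.
prod_errors = [0.52, 0.48, 0.50, 0.55, 0.47, 0.51, 0.53, 0.49]
candidate_errors = [0.45, 0.44, 0.47, 0.43, 0.46, 0.42, 0.48, 0.45]

# A large positive value suggests the candidate's error is lower.
t_stat = welch_t(prod_errors, candidate_errors)
print(round(t_stat, 2))
```

A full test would also turn the statistic into a p-value using the degrees of freedom, for example with `scipy.stats.ttest_ind(..., equal_var=False)` if SciPy is available.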
Ben Wilson [00:39:55]:
Make sure the traffic is going through it. And for me, the biggest thing that opened my eyes was working with a couple of the customers in my first few months of working at Databricks in the field, where some of these customers, they're not household names. And you go into their office, and you're like, okay. It's like a 50 person company. I'd classify that as, you know, startup size. And then you look at their data, and you're like, hang on. This is geofence data for an entire cell provider that your app is installed on. How many trillions of rows is this per day? They're like, oh, yeah.
Ben Wilson [00:40:44]:
We get, like, exabytes of data per month coming in. You know, like, what do you do with all that data? Like, oh, we build models off of it. And I was like, what kind of models? They're like, oh, we write the algorithms ourselves in C++, and then, you know, they're compiled binaries that we're using for deployment, but we do all of our analytics and feature engineering on Databricks. I was like, that's cool, but why do you use C++? And they're like, Big O. Like, oh. So, yeah, at that scale, this actually does matter, and it gave me an appreciation, particularly in the feature engineering stage. And that was the light bulb that went on in my head. Like, algorithmic efficiency and data structures, particularly with traversals, you start to really get a true appreciation for what that stuff means.
Ben Wilson [00:41:44]:
You know? Because the, you know, computational time complexity with most things, where you're like, oh, should I use a loop or a comprehension? Makes sense to, like, map over it. But then when you start talking about these data structures, why we go through an algorithms interview, it becomes apparent when you're like, oh, that's why we use a hash map here. Yeah. It doesn't matter when you're storing 100,000 items. It's almost instantaneous. But when you're storing 100,000,000,000 items and you need to traverse them, you know, the difference between log n and n becomes very apparent very quickly. That was a big shift for me. Like, seeing that, and then also the concept of staging, where you're not just testing in dev, saying, like, does my code work? Does this solve the problem? I'm gonna fetch some prod data and then do an offline test to make sure that everything's working.
Ben Wilson [00:42:50]:
It's more like, I need to test this at production scale with junk data just to make sure that the service doesn't blow up.
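Ben's point about log n versus n, and about why a hash map gets used, can be made concrete by counting lookup steps instead of timing them. This is a small sketch on one million integers, far below the scale discussed; `linear_steps` and `binary_steps` are hypothetical helper names.

```python
from typing import List


def linear_steps(items: List[int], target: int) -> int:
    """Count comparisons made by a linear scan, O(n) in the worst case."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_steps(sorted_items: List[int], target: int) -> int:
    """Count probes made by a binary search over a sorted list, O(log n)."""
    lo, hi, steps = 0, len(sorted_items), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


n = 1_000_000
items = list(range(n))   # already sorted
target = n - 1           # worst case for the linear scan
lookup = set(items)      # hash-based membership, O(1) on average

print(linear_steps(items, target))  # grows linearly with n
print(binary_steps(items, target))  # grows with log2(n)
print(target in lookup)             # one hash lookup on average
```

Multiplying n by 100,000 multiplies the linear count by the same factor, while the binary search needs only about 17 more probes, which is the gap that stops being ignorable at the scale described.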
Michael Berk [00:43:02]:
Yeah. Fernando, do you feel your training prepared you for learning these big data concepts? Or do you think that you would have liked to have taken a class or read a book before joining, or gotten a PhD?
Fernando Lopez [00:43:23]:
I'm not sure. I would prefer to just jump directly into the territory and then be learning on the fly.
Michael Berk [00:43:33]:
Cool. Yeah. Same here. I definitely think that the fastest way to absorb knowledge is just, like, jumping in head first. Definitely the most painful way, and some people can really, like, excel in classroom settings. But for me, my main issue is twofold. It's interest.
Michael Berk [00:43:52]:
Like, I don't care about the theoretical example with the clean data. I wanna see, like, the real data and have this turn into a real use case that's actually deployed in production. And then also, for me, retention is not a strength of mine, I would say. I'm very dynamic when put on a new problem. And I remember where things are, but, like, if you ask me to explain how any algorithm works, I could do it at a high level. But I don't know that I could recount it from memory that well. Ben and Fernando, in your guys' experience, like, I know, Ben, your brain is a Rolodex. But for both of you, do you guys think you are strong in terms of maintaining memory of complex systems? Or how do you guys think about memorizing facts? Like, how is your brain structured?
Ben Wilson [00:44:45]:
For me? I don't even know, man. Like, I've had people ask that after I've spoken up during, like, design reviews and stuff, or I made a comment. Somebody's like, why did you even ask that? And I'm like, oh, it's because of this, this, this, this, this. And they're like, wow. We didn't even think of that. Thanks for bringing that up.
Michael Berk [00:45:06]:
Funny you say that, actually. I was working with a data scientist, like, a month back, and I was like, oh, yeah. I have to go do this podcast with Ben. They were like, oh, Ben Wilson. Yeah. He's a fountain of knowledge. So congratulations on that. Fountain of knowledge.
Ben Wilson [00:45:21]:
Fountain of BS. I don't know why I retain certain things. I think most of the stuff that I'll speak up about, and not just tell my brain to shut up and, like, don't leave a comment on something, is when I'm talking about, like, implementation design reviews and stuff, which is where that knowledge becomes useful, or during, like, a code review. The stuff that I'm aware of from memory is usually stuff that I screwed up. So I'm bringing it up not because I'm like, you didn't think of this and you should have thought of that. It's like, hey, this burned me, like, 5 years ago. Can we just look at this? Because this could potentially be something bad, and I don't want you to fail the same way I did.
Ben Wilson [00:46:18]:
And sometimes it's legit, and somebody's like, yeah. This is yeah. We definitely need to clean up the state of this because there's a memory leak here. Or it's, now we have this other mechanism or now this process is gonna be killed. We don't have to worry about that. And I didn't have context for that other part of it or it's part of the system that they're working on. I didn't know it was there. And it's like, cool.
Ben Wilson [00:46:41]:
Great. Thanks for looking at it. We're good to go. Or sometimes it's, oh, jeez. We need to rethink this. Let's use this other data structure, or let's use this, you know, complementary process that will do stuff like flush a queue if it starts growing, or, you know, use a TTL. Like, there's lots of things to think about with scale that you don't have to when you're, you know, working on smaller problems.
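The two safeguards Ben names, flushing a queue that keeps growing and using a TTL, can be sketched together in a few lines. `TTLQueue` is a hypothetical class written for illustration, not an API from any library mentioned in the episode, and the injected `clock` exists only so the example is deterministic.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List, Tuple


class TTLQueue:
    """In-memory queue that drops items older than ttl_seconds and caps its size."""

    def __init__(
        self,
        ttl_seconds: float,
        max_size: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_size = max_size
        self._clock = clock
        self._items: Deque[Tuple[float, Any]] = deque()

    def _evict_expired(self) -> None:
        # Items are appended in time order, so expired ones are at the left.
        cutoff = self._clock() - self.ttl_seconds
        while self._items and self._items[0][0] <= cutoff:
            self._items.popleft()

    def put(self, item: Any) -> None:
        self._evict_expired()
        if len(self._items) >= self.max_size:
            self._items.popleft()  # drop the oldest item instead of growing
        self._items.append((self._clock(), item))

    def drain(self) -> List[Any]:
        """Flush the queue, returning every item that has not expired."""
        self._evict_expired()
        out = [item for _, item in self._items]
        self._items.clear()
        return out

    def __len__(self) -> int:
        self._evict_expired()
        return len(self._items)


# A fake clock makes the behavior easy to follow.
now = [0.0]
queue = TTLQueue(ttl_seconds=10, max_size=3, clock=lambda: now[0])
for i in range(5):
    queue.put(i)
print(len(queue))  # capped at max_size
now[0] = 11.0
print(len(queue))  # everything has expired
```

Without the size cap and the expiry, a slow consumer would let the queue grow until the process runs out of memory, which is the kind of failure that only shows up at scale.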
Michael Berk [00:47:06]:
You didn't answer the question. The question is, how is information organized in your brain? And, I guess you did answer the question.
Ben Wilson [00:47:17]:
I think that did answer the question. Yeah. It's not organized, which is why it seemed like I didn't answer the question. No. I really don't know.
Michael Berk [00:47:29]:
It just it's just there.
Ben Wilson [00:47:31]:
When I see something, when I process that topic, it just pops up through some quantum event, like, hey. Remember how stupid you were back here at this time? And I'll remember that and then, you know, transcribe, like, hey. We should think about this, maybe.
Michael Berk [00:47:51]:
Got it. So your cache is optimized for emotional connection to that fact.
Ben Wilson [00:47:59]:
Oh, yeah. Definitely. It's about all the hubris checks I've had throughout my career. Even before software. Like, me thinking, like, yeah. I know this. I understand this concept. And then I find out, like, you remember that nuclear evaluation board that you went through and that NRC investigator who asked you that question that almost got you disqualified for your terrible answer? Think before you open your mouth. So, like, stuff like that just sticks with me.
Michael Berk [00:48:35]:
Got it. What sticks for you, Fernando?
Fernando Lopez [00:48:40]:
I think it's sort of similar to what Ben does. But I would say, in my case, when I'm learning different stuff, I try to change the question. Instead of learning the what, learn the how. Like, for example, what is TensorFlow? I don't wanna memorize what TensorFlow is, what something is. I wanna learn how TensorFlow works, like, in this scenario, under this context, how lead clicks works, how a decision tree works, instead of thinking, what is a decision tree? A decision tree is a set of whatever. So when you change to the how instead of the what, I think, in my personal experience, it helps me to organize my knowledge in a more manageable way for myself. So when I learn new things, it's like, how does JAX improve my code instead of, say, TensorFlow, for example?
Fernando Lopez [00:49:54]:
So instead of being like, what is JAX? What is this new function? What is this whatever? It's about how this thing replaces that. The how. I think this is it. Yeah.
Michael Berk [00:50:08]:
Could you walk us through an example? You shouted out a bunch of cool technologies, but could you show us a what and how for maybe one of those or something else? Just a simple example.
Fernando Lopez [00:50:19]:
Yeah. For example
Ben Wilson [00:50:20]:
What is JAX? How do we use JAX? That's the big one. Yeah. I mean, it's new. It's hot. It's pretty awesome, but low level and yeah.
Fernando Lopez [00:50:29]:
That is a good one. For example, I don't wanna, like, say, what is JAX? JAX is a new framework that was built by some guys. I don't wanna memorize that. I don't wanna memorize, like, the manual. I wanna find the things that I need and how those new things replace the things that I want to improve by using this new tool. So, for example, let's say I'm working with TensorFlow, with Keras, and I need to replace some piece of Keras code to optimize it with JAX. And since this is my first time with JAX, I don't wanna learn what JAX is, what this function is, what the way to make a model with JAX is. I wanna learn how to take advantage of the different functions and operations that JAX provides in order to optimize my code.
Fernando Lopez [00:51:26]:
So when I changed to that mindset, I think I learn the new thing a little bit faster and better, in this case, JAX, for example. So I don't know if my example is clear, but this is how I frame things in my mind in order to learn new stuff. So yeah.
Michael Berk [00:51:52]:
Is why relevant?
Ben Wilson [00:51:55]:
For me? Yeah.
Fernando Lopez [00:51:57]:
Yeah. Yeah. For example, why. But my advice would be to try to avoid the what. What is that thing? I think it's better to put the thing into context, like, how the thing will improve my code, how the thing will help me. I think that mindset is better than thinking about what the thing is.
Michael Berk [00:52:22]:
So how is what with context? Or it's, like, with context and implementation?
Ben Wilson [00:52:33]:
Like, how is, like, an application of what, but divorced from pure reference knowledge of that thing. So if we're talking about a software package, you're talking about TensorFlow and, you know, a JAX implementation. But if you were to shift to a tool that is more hierarchical, I guess, in nature, and restrictive. So with TensorFlow, you can do anything. But if you were to say, okay, let's talk about statsmodels. Like, what is statsmodels? You could memorize all of the different things you can do with that and all the different use cases for a package like that. That's not gonna help you to build anything.
Ben Wilson [00:53:21]:
It'll help you understand, should I use this for these types of tasks? But understanding how to use this thing helps you build stuff faster. And I think understanding why it was built helps you understand whether you should use it or not.
Ben Wilson [00:53:47]:
So if it's got a compelling why for being built, people are probably using it, and it's well maintained. It's probably well designed. It's somewhat easy to use, hopefully, and it proves valuable. And if you look at something and are trying to figure out the why, particularly, this is what I do with hype stuff, because I get sent hype stuff every week, like, hey, we should integrate this new package with MLflow. Why? I don't ask them that, but they'll send me a link to a GitHub repo. I'm like, cool. This was released 2 weeks ago.
Ben Wilson [00:54:30]:
I have no idea who this person is. I don't know a lot of people in the, you know, open source software community, but I'll look through the code, like, the source implementation. Usually, I start with the tests. I'll go into the test folder because that shows me the how really fast, and I'll look at what they're testing. I'm like, oh, that's what they're doing here. That's cool. Or I'll say, like, why? Like, why does this exist? Search through a couple more tests. Like, isn't this XGBoost with, like, a slightly different optimizer? Like, why does this exist? And if I can't get a justified reason for why this is a thing, I'll file it in my memory or leave a note for myself.
Ben Wilson [00:55:14]:
Check back in 3 months. See, are there now 5,000 stars on this repo? And check PyPI download stats. Are they getting 100,000 downloads a week on this? Maybe they're onto something. Maybe this is fundamentally better than what existed before for some reason. Better APIs, or it's faster. Who knows? For the other 99% of the time, though, if I ask that question, why, and I'm like, I don't know why this exists, it's usually junk. Move on. Not worth my time to understand.
Fernando Lopez [00:55:53]:
And also, I would like to rephrase my first answer. I would say, if I could sort the what, how, and why by which is most important or most relevant: first the why, understand why, and then how. And then, if it is the case, if you need it, the what. In this case of JAX, for example, why do I need JAX? Because it will help you to optimize your stuff. How? Using these compilers, using these things, whatever. And only if you need it, you learn the what. What is the function that I need? I need this function. I need this operation.
Fernando Lopez [00:56:34]:
But the whats are the documents. The whats are in your documents and bibliography references. But I think it's most important to learn the why, then the how, and only if needed, the whats. So this is my conclusion.
Ben Wilson [00:56:53]:
I like it a bit more.
Michael Berk [00:56:55]:
Yeah. If you just have this, like, think about a blob of a search space, the why narrows down the search space to the important things. Then from there, the how is the most efficient place to get from thing to prototype or thing to implementation. And then the what is sort of the filling in the knowledge around that prototype so you know that you're actually doing the right thing. And, maybe we don't even need what. Like, we should just grunt at things and not use their names.
Ben Wilson [00:57:25]:
You need docs. You do need docs. You do need docs.
Fernando Lopez [00:57:30]:
Yeah.
Ben Wilson [00:57:30]:
Because not everybody wants to do the self discovery how.
Michael Berk [00:57:35]:
That is true.
Ben Wilson [00:57:38]:
And I don't blame them. I mean, as somebody who is now responsible for maintaining docs on an open source project and doing iterative project design for integration with other services or using other frameworks, I have a very strong preference for, and unspoken thanks to, maintainers that spend a lot of time making good docs. So if I can answer the how and the why in less than 2 minutes of going to their docs page, I have profound respect for the person that put in the effort, the unbelievable effort, to answer those questions that quickly. I'm just like, man, what a great team of maintainers. And then sometimes you're like, hey. I know this package is important. It's like, everybody's using it.
Ben Wilson [00:58:37]:
Everybody's talking about it. I go to the docs, and I instantly have that negative why experience. Like, why did they not spend at least some amount of time answering this question? Like, these docs are so terrible, and I don't care how powerful this tool is. Like, I now have to read your source code to understand how to use your tool. And then by read your source code, I mean, I need to find where the public APIs are in your source code because they're not self documenting. And now I have to reverse engineer your implementation for your entire project. That pisses me off, when that happens. So, yeah, it makes finding the why harder.
Michael Berk [00:59:29]:
So, Fernando, before we close, do you have any tips for people who wanna make a similar transition from, let's say, the startup world into a larger organization like Google? How did you do it, and how have you seen other people who took a similar path do it?
Fernando Lopez [00:59:46]:
Yeah. First, I advise you to, like, study a lot in the way that you think is the best way for you. In my case, it's like repetition. It's a record, repeated every single day, solving 2 or 3 problems to be able to be faster at solving these sorts of problems. But this is my case. If someone who hears the podcast is trying to do the same thing, I would say first, like, try to identify the way that you learn best. If you need to read books, read the books.
Fernando Lopez [01:00:38]:
If you need to practice a lot, practice a lot. In my case, it was practicing. And then try to think at scale, because real problems in real life work at scale. And the second one, well, experience from some people I have met here at Google, 3 different stories. Some of them were just hired directly from the university, so they went directly to Google. So the happy path, I call it.
Fernando Lopez [01:01:28]:
But some of them have followed a similar path to mine, like practicing a lot, studying a lot, then passing interviews, and that's it. Yeah. My last advice would be, like, enjoy what you learn. Enjoy everything, because one day you will be hired by the company you want, and you will be, you know, focused on other things, like learning and trying to catch up as soon as possible on your projects, the tools that you need for doing your job. And then the past of solving problems is just the past. Now you are here, and you will need to learn other stuff. And think about what's next. So yeah.
Michael Berk [01:02:25]:
Cool. Crystal clear. Yeah. Alright. Well, I will summarize. We had a lot of very nontechnical topics, which I really enjoyed. It's fun to think about how to like, the technical honestly is the easy part in a lot of cases, and it's building your brain so that it can work and scale out technical solutions. I think that's the hard part.
Michael Berk [01:02:47]:
So some things that stood out to me were, first of all, transitioning from startups to a Google-like organization. In the startup world, you typically have to sort of pitch yourself more and pitch your products more. When you enter a Google-type organization, you're given a very predefined role where you need to execute. And then, of course, the core difference is scale. So data, users, infrastructure, everything at Google and those types of organizations, there's just a lot more stuff in every aspect. Imposter syndrome is a real thing that everybody experiences, like, legitimately everyone. But it's really good to communicate this. And if you're feeling it, someone else probably is too.
Michael Berk [01:03:26]:
And it's also important that you're a smart worker and you ask for help when you need it. Sometimes you can bang your head against the wall 50 times. And if you enjoy that, like I do, feel free. But it's often more efficient to ask for a redirect or at least a shift in perspective. And then, finally, on the learning side, focus on the how instead of the what. And if you're gonna prioritize the questions, first, think about why, then think about how, and then finally, think about what if you need it. Also, figure out your personal method for success, how your brain works and how you should be learning. Then finally, don't lose perspective.
Michael Berk [01:04:03]:
Enjoy where you're at now. The grass is always greener on the other side. So when you've moved on from your current situation, there will always be things that you will miss. So, Fernando, if people wanna learn more about you or any roles at Google, potentially, where should they go?
Fernando Lopez [01:04:23]:
My LinkedIn, slash ferneutron, as it sounds, ferneutron. On all my social networks, like GitHub, even Instagram, I am ferneutron.
Michael Berk [01:04:38]:
Yeah. Do you mind spelling that out just for fun?
Fernando Lopez [01:04:41]:
Yeah. F e r n e u t r o n. Cool.
Michael Berk [01:04:49]:
Awesome. Alright. Well, until next time, it's been Michael Berk and my cohost
Ben Wilson [01:04:55]:
Ben Wilson.
Michael Berk [01:04:56]:
And have a good day, everyone. We'll catch you next time.