ML at Netflix and How to Learn Deeply - ML 120
In today's episode, we speak with Netflix ML engineer Amir Ziai. Expect to learn about building ML tools for stakeholders, the pros and cons of a Netflix-like culture, and Amir's strategy for learning.
Special Guests:
Amir Ziai
Show Notes
Sponsors
- Chuck's Resume Template
- Developer Book Club starting
- Become a Top 1% Dev with a Top End Devs Membership
Socials
Transcript
Michael Berk:
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I do data engineering and machine learning at Databricks, and I'm joined by my beautiful cohost.
Ben Wilson:
Ben Wilson, I build MLOps tooling at Databricks.
Michael Berk:
And today we have an awesome guest. His name is Amir Ziai. He has three master's degrees, which we're gonna get into. They include systems engineering, data science, and computer science. And throughout his professional career, he's been an applied ML engineer in the fitness industry at Heart Incorporated, H-A-R-T, in the brand sort of marketing industry at Zephyr, and most recently in the streaming industry at Netflix. So Amir, you build large scale systems for multi-modal content understanding. I can't even pronounce that. Can you explain what that means?

Amir:
That's a mouthful.

Michael Berk:
Yeah. So what does that mean?
Amir:
That's a mouthful, yes. Well, thanks for having me. That's been what I've been doing for the past few years at Netflix. So I work with a team that handles basically everything that comes after we have either licensed a title or a show, or we have made the content through our studio. There's a series of steps that people need to go through. This includes things like creating subtitles, creating dubs, creating promotional assets, things like trailers and posters and artwork, all that stuff. So there's a whole village of people that are basically going through the steps required for launching the title. And my work, more or less for the past three plus years, has been to build assistive tooling for them so they can make their lives a little bit more efficient. They have to really understand what the content is about, and they have to process lots and lots of shows on a monthly basis. So the tooling we build helps them understand the content. That's where the content understanding part comes in. And we have a massive catalog, so we need to process all this information, index everything for them, and make it really accessible. And I guess that hopefully explains what the large scale and content understanding pieces mean.
Michael Berk:
Nice. Yeah. So I used to work at Tubi as a data scientist; Tubi is Fox's streaming service. And a few episodes ago we actually had an ex-Netflix, current Tubi employee on. It was really interesting to think about how scale impacts your tech stack and the types of problems you work on. After you reach a certain threshold, doing all these trailers and content understanding types of things manually just isn't tenable. So it's really cool to see how these companies pivot and hire ML engineers to automate that. So Ben, I know you were gonna say something.
Ben Wilson:
Yeah, it's just an interesting place to come into as a data scientist, ML engineer, or software engineer: going into a company, into a problem space where you never starve for great ideas and things to build as a product that you know somebody's going to use almost instantly.

Amir:
Yeah, that's a very interesting point, because since I've started this, and I've been at Netflix for five years now and in this role for three and a half, it's been a day and night difference in terms of how excited people are to work with ML technologies. People who are doing their day-to-day work, the creators, the trailer editors, the artwork artists, they really like ML stuff, and they really are helping us and working with us in very close proximity. We are having sessions with them. It's a level of excitement I generally haven't seen in my career before. Working with people who really are excited and not worried about all this new technology, it's very interesting. That just gives me a lot of drive to do the work. And, as you said, it's media assets, video files, images, text. It's really a data scientist's dream to work with all these different pieces of things. And there's real business impact to be had. So it's an interesting space because, I guess, kind of out of the box you get all of those boxes checked. So it's just doing the work, and with the level of excitement of the people you're working with, it just makes it much easier.
Ben Wilson:
I've got a potentially controversial question for you
Amir:
Go for it.
Ben Wilson:
related to what you just said and your history before coming into Netflix, working as sort of a traditional data scientist, right? Where we would be tasked by businesses: hey, can you solve this problem? We have data in a database somewhere that's collected either through pristine means or less-than-pristine means, and data quality issues related to tabular data formats. Did you find the transition from the chaotic, unstructured, or semi-structured data that exists in data warehouses into the pristine environment of dealing with standards-based data? Did you feel like with audio and video, the data is either correct or it doesn't work, right? Does that transition make it easier to build viable products? And do you enjoy what you do more because you sort of have guardrails and controls over the data that you're interfacing with?
Amir:
Yeah, that was actually my intuition as well going in. I did not really know much about media files and media processing, all of that stuff. So I went in thinking, hey, these are MP4 files, WAV files, JPEGs. They must know exactly where everything is. They must have a very clean system for keeping track of everything, because, hey, we're showing it to members and it's all working pretty well. But it turns out there are many, many steps before we get there, and the earlier in the process, the more chaotic the whole system can be. So at the end of the day, it's not very dissimilar to what you're describing with tabular data and data warehouses. It's just a different file format, and the set of challenges tends to be different. Now you have many, many deliveries of a file. Let's say you're looking at a two hour movie. There could be many, many deliveries of it, many, many cuts of it. So systems that keep track of the life cycle, that is where the challenges come in. There are different levels of QC. So, a long way of saying that it's not as clean as one might think, and while they're maybe not exactly the same challenges, there's a similar set of challenges that you have to deal with.
Ben Wilson:
Yeah, that was something I noticed in my career journey when I went from pure, traditional engineering into data science and ML. One of my final jobs before making that transition was working for a disc manufacturer; we were subcontractors for Warner Brothers Studio, and we were doing all of their DVD and Blu-ray manufacturing for them. But every couple of months they would have us come out to the mastering studios in Hollywood, and sometimes on the lot, and you get to meet all these famous people. What I was always most interested in, though, was asking super annoying questions of the people that were there doing this work. Like, how do you get that file? I thought that it was just, okay, somebody has a file, they master that into a Blu-ray. And they're like, let us show you the process. I've never seen data lineage like that firsthand since, and I've been working in tech for over a decade now. I'm sure Netflix is even more complex than what I saw at Warner, but it was insane how many steps there were. Everything from the menus, like all of that Java code that generates the interactive menus when you load a disc into a machine. And then all of the hidden extra features; those are all just hundreds of files that they have to compile together. And then I didn't realize that audio, when they get it, is raw, not compressed at all. It's these lossless formats where the audio is actually a hundred times bigger than the video file. And seeing... they were doing it manually, though. So you think about a studio with the releases they come out with, and seeing how many people were involved. It's like, wow, there's 15 people on this floor just doing subtitles. That's crazy. And they're all working on the same movie right now.

Everybody's doing a different five minute slice, but for what you're doing, your group has to scale in such a way that most people wouldn't even be able to comprehend. So what is the process when you're looking at the tooling that would potentially be needed for something that people are still doing manually? How do you and your team break down that problem and say, this is kind of how we would tackle this, or here's some theories? I'm really curious about that process and how you think through that.
Amir:
Yeah, I guess it generally doesn't tend to be very systematic, although we are getting a little bit more systematic about it. Usually a lot of these ideas come out organically as a part of working with these creatives. We just watch them work, or they complain about something a lot, and we get ideas. Oh, you're doing these five things and then you're delivering the file. Why do you need to do that? Can we apply something to this step? And for the first couple of years of me being in this role, there was a lot of that going on. It was just, oh, I noticed something, maybe there's something there. Let's go see if we can deliver some value with an algorithm. Maybe even, in a lot of cases, honestly, with some simple application, not even machine learning related. So that's the other interesting thing I've learned a lot in this role: you don't need to be fancy all the time. You can make a lot of really good impact with the simplest solution possible. So there's been a lot of those low hanging fruits, and there still are. But over time, as we have delivered some of those, made things a little bit more systematic, built platforms, we're getting to a point where we can look across the board and see what the really big opportunities are. What are things we can build that have longer term impact? Maybe things the creators are not even thinking about right now, maybe things they could be doing completely differently. And I'm happy that we're in that state now, because it's now a portfolio of both really low hanging fruit, where we can immediately deliver value and feel good about it, and longer term bets that we're really excited about. We pitch these to the creatives, and they're like, why would we do things this way? It doesn't really make sense. And then you have to go through it with them, see if it really does make sense, like what it would mean if we delivered something like that and they could maybe change their workflows. And that's one of the other things I really learned: changing workflows is really hard. But if you get people excited about a new piece of technology, and they can see that it's going to save them time, that they don't have to do the grunt work and can focus on the creative aspects of the work, they really become champions of the project. So I'm really liking our cadence now, a mix of both very short term and long term projects.
Ben Wilson:
That's a really interesting observation with regards to something that I've only seen in a couple of companies that I've interacted with, where it seems like it's an uphill battle. You're building ML-backed tooling, or, just as you said, it doesn't have to be fancy; it could be scripts, or just simple SQL that automates some annoying part of somebody's job. And when you start tackling all of those things, and having lived through that myself at a number of companies, you get to a point where you list out your initial landscape of low hanging fruit. Like, here's all the things that suck about what we're doing right now, and here's how we're going to tackle all this over the next two or three years. We have this big plan. And when you're six months away from hitting that point where you're kind of done with everything, it's almost like this demoralizing feeling that some people on the team get. What happens when we're done with all this? Are we done? Like done, done? Are they just going to dissolve our team or our department? And the people that have been there, done that are just like, no, no, don't worry. The fact that these tools now exist means there's going to be a whole new generation of low-hanging fruit that comes out, and you're going to enable creative use of the things that you built, and of how these things can interact with one another. When you free up the people that are doing the job and let their creativity shine, they're going to be your unofficial, you know, product managers. They're going to come up with these ideas of, hey, there's this amazing thing that we want to do. Did you find that surprising? Like, that behavior that happens?
Ben Wilson:
Like when you were going through it.

Amir:
Yeah, especially in contrast to other industries or roles that I've been in. As I said earlier, it's this level of excitement of people wanting you to build them stuff that may seemingly look like it's replacing parts of their job. I've always been worried about that; I've been in industries where I'm building tooling for people and they're not necessarily excited about those innovations. The first reaction of many people is, hey, where is this going? What's the logical extension of you building all this stuff? And I'm not really sensing that at all here. I think it comes out of people really understanding that the more you free them up to do the creative stuff, the more they can do. And there's always gonna be more. Like what you said with "are we gonna be done?": I think you're just never going to be done. A business area may not make a lot of sense to invest in, or invest in as much, at some point, but there are always going to be things that we need to build and improve. And that's the really exciting part. It's really fortunate that, in my position, the people I'm working with have that mindset, and you can work closely together to drive things forward. And as you said, they become your product managers. They tell you, hey, here's a feature, how about we add this? Honestly, it's kind of the other end of the spectrum. Sometimes I get overwhelmed, or the team gets overwhelmed. In our small team, there's like six of us; we can't possibly be building all these things this quarter. But that's a good problem to have, at least in my estimation. You want people to come to you. You want people to really use your stuff. And that's much better than, I guess, the opposite problem.
Ben Wilson:
There's also an aspect of, I think, team culture. From everybody that I've ever talked to that either currently works or has worked at the company that you now work for, there's something very special about how they construct teams, who they hire, and how people all collaborate together to build cool stuff. You could hire the finest individual contributors in the world, the best technical software engineers out there, but if nobody likes each other, and nobody believes in each other or wants to see each other succeed, then even that mountain of backlog of work becomes a burden, and you just kind of feel overwhelmed, you know, like, this sucks, I'm going to find a new job. But everybody I've talked to is just like, no, this is awesome. We have all this work, I love the people that I work with, we build awesome stuff together. So what do you think sets apart the culture at Netflix, not just at the company as a whole, but within an engineering team? What is it that makes it different than other places you've worked before? And do you think it's something that is in the hands and in the mind of each individual contributor that's part of that team?
Amir:
Yeah, that's a really good question. I guess it is self-selecting to an extent. I haven't worked at another big tech company, but I've been at a variety of different startups. And my experience has been that, even though I'm at a relatively big company now, I don't feel like I'm at a big company. I feel like I'm still at a startup, basically trying to figure things out, come up with ideas, see what is actually gonna stick. I do feel like, and this is very common for a lot of other ICs at Netflix, I play the product manager role quite a bit. And that's ownership of a space and really caring about where all of this is going. That is an interesting thing, and it's a little bit counterintuitive, because if every IC is in that position, how do you coordinate and work together? That was always my question before joining: what do you mean, there's no process? What do you mean, people are going to have freedom and responsibility, as I guess the culture saying goes? But it kind of works. I can't really explain how. There is the game of, yes, you can come up with ideas, but because it's so, I guess, not centrally coordinated, you then need to convince other people. It's very bottom-up and organic to figure out what the right team is. You can be very nimble and convince a couple of people to work with you on a project. And if you figure out that dynamic, it's a really satisfying place to be. Maybe it's just a really good fit for my personality, having all that freedom and being able to push things forward, versus environments I've been in that are a little bit more top-down, where you're told what to do. Which, I guess, is also good in a sense; it frees you up to not have to worry about everything. You worry more about the technical stuff.

I sometimes miss that, because here there are all these other things that I need to do that are not necessarily just machine learning. And I sometimes worry that it's maybe chipping away at my skills. Am I really learning? If I'm really trying to be the best machine learning engineer, is this the best role for me? But at the same time, I'm getting a lot of really interesting, I guess, business-level thinking that I might not have done otherwise.
Michael Berk:
So that's sort of what you mean by it still having a startup culture: you wear many hats, there's a lot of freedom, and people tend to delegate to lower levels.
Amir:
Yeah, that's exactly how it is. And there is this interesting dynamic where a lot of people have a lot of ideas, a lot of people at the IC level. The thinking is generally that ideas are a little bit more bottom-up, so the ICs tend to come up with the ideas, and the ICs tend to build the prototypes and try to convince people, and it kind of grows out of that. Now, there are obviously longer term projects that are a little bit more, I guess, centrally driven, a little bit more top-down. But by and large, and especially in our space, which is kind of a newer space, there are still a lot of those opportunities where you can start something from scratch and, I guess, rally people around you to push things forward.
Ben Wilson:
Yeah, it's actually not too dissimilar to how engineering works at Databricks. I don't think we have quite as much of the bottom-up focus at all times, because our company is run by engineers. Our entire executive staff are world-class software developers, so they have fantastic ideas; they come up with new ideas, large scale vision. But once they come up with the concept, they then know what it's like to be an IC or a startup-type engineer, and they sort of kick it over the wall. Not really over the wall; they task somebody, a TL, and they're like, hey, find somebody who you think would be a good fit to design this. Build a prototype, figure it out, do all the design docs, pitch it to people, get buy-in, get approvals, and then break it, and then fix it, and delegate work to junior people. Even when those ideas come from on high, they try to still have that culture of, hey, ICs own this, really own it. Sometimes you are working with product, if it's like, hey, there are 17 engineering teams that need to do stuff to make this happen; that's what product's for. But if it's something that a single team owns, the idea can come from anywhere, and then people work together. And something I found that really sets apart companies like Netflix and Databricks in the engineering space is that community feel that you get, that it doesn't matter what you're doing. It doesn't matter if, for two straight sprints, all you're cranking out is a bunch of bash scripts because that's what the project needs. You still feel fulfilled at the end of the day. Maybe you're going a little cross-eyed from just writing bash, but it doesn't matter what tech you're working on, it doesn't matter what problem, so long as you have your peers there.

They're like, man, this person's super smart. I'm really glad that I'm getting feedback from them, and they want feedback from me. And if I notice that they're going somewhere that might go off the rails, I can give them advice or feedback on their implementation, and they're going to do the same for me. It's like what you said about that self-selecting. The hiring process enforces a culture of that activity, where people just love working there. Were you prepared for that before you came into Netflix five years ago? Did you think it was going to be like that?
Amir:
I was honestly a little bit worried, especially given the perception of the company at that time. I didn't know exactly what to expect, but there's a bunch of stuff that's been written about the Netflix culture, a lot of it from the outside, and at least I perceived it to be negative. So I was worried about what kind of environment I was walking into, especially at a bigger tech company. I still love startups. At the time, I really loved being on a more nimble team, doing stuff and being able to actually have some impact. So I had all those ideas in my head, and I didn't know how to reconcile that with what I was hearing or perceiving about the Netflix culture. So I was worried that I was going to go into a big company and be given a tiny project in relative terms, and it wasn't going to matter at all. And if anything, it's been the opposite. The way I describe it to people is that I'm still at a startup; it's just that all the anxiety of being at a real startup has gone away. All the funding issues, all the worries about, are we going to survive the next month? That stuff has gone away, but it still feels like I'm operating at a startup.
Ben Wilson:
And it's interesting that the other companies that share that traditional acronym your company's in don't all behave the same way. I think there needs to be a new acronym created; I'm sure people have made one. But that culture, I've heard that now from so many people at Netflix, and it's great. I think it's 100% that startup mentality. It's hard for people that have worked in massive companies, like old school blue chip companies, as a data scientist. Okay, we hired a team of 30 because we make billions of dollars a year; we can hire these people, they're going to innovate. And they find that working in that environment requires a year and a half of security review and approval before they can even get a development environment stood up. So they get nothing done. Their best people leave; the people that are left behind are kind of demoralized. They're not able to grow because they're not in a wholesome environment that fosters creativity among engineers. But there are places where those massive companies say, okay, you guys are a skunk works group; behave like a startup, we'll even give you your own budget, and you have full autonomy. So how does it work, or how is it at Netflix, when you have 30 engineering groups that all do the same thing? They're basically their own little startups. How do you get things done?
Amir:
Yeah, so we talked about the positive stuff, which is definitely great: you get to push things forward, you get to come up with ideas. But because of, I guess, the lack of coordination, or the loose coordination, between teams, you can get into the situation where, hey, this other team is doing exactly what we're doing, and we didn't really talk to them, or maybe we didn't even know they existed until two days ago. So there is that problem that you have to be very mindful of. In my experience, you have to figure out generally what people are working on, what they are doing, because of the decentralized nature of the company and the culture. That's a real thing. Now, there are mechanisms in place, like forums, and we're very heavy on writing memos; the memos get circulated, all of that. That avoids some of this problem, but you still have the fundamental problem of, am I working on something that maybe someone else has done? There's, I guess, a graveyard of projects that you can look through, and maybe someone did exactly that two years ago and you're just not aware of it; the code is sitting somewhere. It's interesting, because initially when I started, I think for the first three months I was just trying to understand what people had done in my space. In my project at the time, I was working with streaming security folks, a very different world. I was trying to understand, hey, this person has done this part of the project, this other person has done this, they're maybe not working on it anymore. So I was spending three months trying to really understand, put together a picture of what had been done before, and what I needed to do that was really an improvement on top of that. And at some point my manager just said, just pretend that nothing exists, and let's just move forward and deliver something.

That's a real thing. Now, I think over the years I've gotten a little better at understanding when people are working on stuff, trying to team up earlier. And we really try to encourage people to document when projects end, like postmortems and that kind of thing. Because, I guess, the culture of the company is that you work on something, it doesn't work, you just move on. That moment where you can capture those learnings and put them somewhere in a way that's consumable for future people, it's very easy not to do that. So there are some inefficiencies that come as a part of this. I would say maybe not super negative, but if you're trying to figure out ways to really deliver additional value, it becomes a little bit difficult to navigate all the things that have been done, or that other teams are doing.
Michael Berk:
I can imagine another con is that this culture requires sort of a diversity of thinking and the ability to learn new things, and so you sort of select out super specialized people that can only do one implementation. Has that been your experience, or has everybody been super well versed and deep as well? Like, is there a difference between generalism and specialism?
Amir:
Yeah, I think that's a good question. My experience has been that it's a good mix, but the culture is set in such a way that it leans more to the generalist side of things than the specialist side. One example I'll give: we do hire research scientists. Now, I haven't worked at Google or Facebook, but my understanding is there are people there who purely focus on research. They publish papers; they maybe don't need to worry too much about delivering product impact. We don't really have that kind of role at Netflix. There is all this gravitational force to get people, whatever they're doing, maybe the best research in the world, to start thinking about how that could translate to product impact. People have to flex into actually building stuff, into engineering. I'm seeing that more and more. That's one example: someone with a very traditional academic background of doing really good research now has to figure out, okay, how do I set up pipelines? How do I actually put this stuff into production? They may have had zero interest in doing that when they started, but over time I'm seeing people pushed in that direction. Another example is just owning a space and being, I guess, kind of a product manager for what you're doing. That also pushes people into thinking a little bit more about: what am I doing? How is it impacting this other team? Do we really need to do this? What's the ROI? All those questions that, again, I'll pick on the research scientists, someone from the academic world traditionally maybe doesn't have to think about, like what the business impact of this thing is. You see people,

I guess, a year or two into the job starting to pick those up a lot and really become, at least in my estimation, more generalist, at least when it comes to being a product manager and also, in the case of a research scientist, an engineer.
Ben Wilson:
So would you say that the community culture within a team breeds that behavior pattern of conformity to generalism? Or is it explicitly told to people: hey, we're going to put you on a training path, you're going to learn languages A, B, and C, and we're going to make sure we're pairing you for feature development work with this senior engineer? Or is it more that a person gets in there, they're excited to be part of the team, and they just look around like, man, everybody's so much smarter than I am. But they don't have the hindsight of seeing that everybody on that team went through that same process, that moment of panic of, am I the dumbest person here? Why does everybody know so much more than I do? So do you think it's organic? Or, I should say, what do you think is more effective: an organic situation like that, or the explicit, structured form of getting somebody up to speed on a certain set of skills?
Amir:
Yeah, in my experience... I'm sure that there are people who've had a different experience, but in my experience, it's not explicit at all. But I guess incentives drive everything. So if you come in and see that all your, I guess, coworkers and teammates are mentioned in announcements, internal announcements about tools or productized work or whatnot, I would imagine that starts to get you thinking about what that means. If all the praise and all the accolades are coming as a part of that, if you're mentioned in big memos coming from leadership, that, I think, explains more of what people perceive to be the incentives of working in an environment and a culture like Netflix. And I guess the other part, again, kind of focusing on the research part: there are some teams that are a little bit different, but by and large, our research teams are not really fundamental research teams, especially if you compare to a Google or a Facebook, where there are very big teams that do fundamental research and can be very much detached. This is me understanding things from afar; again, I haven't worked at those companies. But since we don't really have that kind of infrastructure for research, what I've observed is that there is a lot of push for people to figure out how to actually make impact with whatever they're doing. Even if it's really, I guess, long-term research work, there needs to be some aspect of how this is going to help Netflix in the long run.
Michael Berk:
Got it. So it sounds like you guys focus on applied methods and it's not just for fun.
Amir:
Yeah, for sure. Yeah, for sure. And again, I should say there are teams that are doing more fundamental research. What I'm describing is mostly what I'm seeing around me, in the content understanding world. But by and large, I think that is true, yes.
Michael Berk:
Got it. And sort of piggybacking off of Ben's earlier question about whether you have to have a track of learning or people just learn automatically via culture: it seems like you take learning pretty seriously. So could you explain your strategy for learning and why it works for you?
Amir:
Yeah, I think, well, as everyone knows, the rapid pace of progress in the machine learning world is just getting crazy, especially in the past six months. I feel like there's no way I can catch up. And I've kind of had this feeling early in my career that I need to have a very systematic, long-term approach to learning. That mostly came out of understanding that a lot of this, I was talking about mostly my focus, which is machine learning, a lot of this stuff is very math based. You need to take your time to really work through it and understand it. It's great to read blog posts, it's great to try stuff and hack, but there is, I guess, a very different thing about learning math and being systematic about that and learning it over a long period of time. So my strategy there has been, I guess, a portfolio approach. All the blog reading and hacking and all of that is great, and you learn a lot doing that kind of thing. But there was something missing in that process, and that was: how do I systematically learn, let's say, discrete math, or get better at it? Because I kept hitting walls with trying to read papers, trying to implement stuff. Like, what is this thing? Why is everyone using this linear algebra thing? How is this improving things? There are just fundamental things that I realized I don't understand that were really slowing me down in terms of how quickly I can hack things together. So that's been my approach. I keep using the word portfolio, but it's doing a little bit of the short-term, more hacky stuff, and more long-term planning out of what I want to learn for the next few years. And the best way that I found to do that is to actually take courses where the bar is really high. So I'm taking, I guess, a bunch of courses at Stanford.
And I found those really strike a good balance between things that are very much newer and state of the art, and also things that are, I guess, more foundational. That, again, is the mathy stuff: what is convex optimization? What is systems? I'm learning all these little things that I think have been slowing me down in my career. There are times where I say, okay, I really don't understand operating systems, and I keep hitting walls when I'm trying to parallelize work, or there's this little thing that I don't understand. And then I go take a serious operating systems course, or a parallelization course. And then, well, it's three or six months of pain. And then I'm like,
like this little thing now clicked for me. So now I can, I guess, move forward. And I haven't really found another more, quote unquote, efficient way of doing that other than enduring, I guess, the pain of doing an actual course, for credit, turning in all the assignments, doing everything, spending 40 hours a week doing some operating systems assignment, which is not fun, I guess. Well, it depends on your feelings on the subject. But, long way of saying, I think that part I've really found to be useful. It's a little bit frustrating because, as I said, I don't know how to do that quickly, and I don't know if there's a way to have those learnings very quickly. And I'm always thinking, okay, I'm going to be a little bit better in five years. Maybe that's the kind of time horizon that I'm setting for myself. I don't have expectations that I'm going to be a much better systems engineer in, like, two years, but maybe in five years, if I take enough of these courses and enough of things click for me, maybe I get a little bit better.
Ben Wilson:
So building that sort of ammunition chest of knowledge and skills. What's the end goal, out of curiosity? So coming from data science and now ML engineering, taking classes and understanding how fundamental computer science theory and applications work, and you're at Netflix in an ML engineering role. I think listeners would be very curious: for somebody like you who's in that position, who still sees that horizon out there of where you want to go five or ten years from now, what would your advice be to people who feel like that's unattainable for them, and why is it so important to continue to look for that horizon?
Amir:
Yeah, I don't know if I have good advice. I can just say that, at least for me, the end goal is really, I guess, a little bit modest. I'm not thinking that in three years I'm gonna start a company and do X, Y, and Z, and building towards that. I'm really enjoying, I guess, getting better at my craft, and the real satisfaction of being able to build cooler things that are more useful, more interesting. And I kind of think about it like architecture: I'm maybe building houses now, and maybe at some point I can build skyscrapers, bigger and better and more interesting stuff. So that's really, I guess, where I am right now with my career. There was a time where I was thinking very deliberately about: I'm going to do these five things, and I'm going to become a VP at that point, and then I'll start a company, and then I'll do this. Right now that's not where I am. I've learned that that internal satisfaction is enough, and that's a good place to be in, because all the external stuff you can't really control. And yeah, I'm happy to keep it that way. I'm sure I may do stuff, but that's not really the main driver.
Ben Wilson:
It's just really interesting. I try to ask that question to everybody that I meet who's like kind of at the cutting edge of sort of career growth and, you know, performance potential. Everybody says the same thing. I have a very similar outlook as like, it's just, it's great to be able to sit back once a year for half an hour and think about what you did the previous day at work, how long that would have taken you two years ago. Or could you have even done that two years ago? And stuff that seems almost trivial to you now, it would have been incredibly daunting to try to tackle something like that. And I think that's the most sort of validating that you're in the right role, you're in the right profession at the right company for this time in your life, and you're working with the right people is that you can do that and realize like, yeah. I am growing, I'm getting better and I can't wait for what's next. And that's a really good analogy, by the way, the little tiny log cabin and then a skyscraper, and then eventually you're going to be building a city.
Amir:
Yep.
Ben Wilson:
It's just fascinating to me that so many different people from so many different companies see that, that question in the exact same way. Not everybody answers it in the exact same way. There are people who are quite
Amir:
Interesting.
Ben Wilson:
obviously miserable and they have like a really bad answer for that. But everybody who seems like they really love their job and that they're going to be continue to be super successful answers it in the same way.
Amir:
interesting. I thought I was unique, apparently not.
Michael Berk:
No, so sorry.
Ben Wilson:
I mean, you
Amir:
Hahaha
Ben Wilson:
are in the, it's the extreme minority of people that I've asked it to, but all the people that are all, you know, like this person's going places. They all answer that in the same way.
Michael Berk:
Alright, well out of curiosity then, what's the most recent thing that both of you guys are super proud of that you couldn't have done two years ago?
Amir:
Ooh, interesting question. Yeah, I think it looks very incremental as you're in the middle of it. I think it's just, I like Ben's framing of it. I do do that at the end of the year. I look back, usually I look back and kind of take stock of the year and like, what did I do? What kinds of things? I haven't done the exercise of... Sometimes I have these moments where I'm writing a piece of code. Like those are, I guess that that's the example that's coming to mind. I'm like, oh, this is like, I remember writing this or maybe if I'm going back to old code, I'm like, oh, I wrote this three years ago and like, look how dumb this is like that.
Ben Wilson:
And it got
Amir:
And
Ben Wilson:
merged into production.
Amir:
So I think those are, like... I have those little moments of: oh, now I have a much more refined understanding of what this thing is, what this thing is supposed to do. I don't really think of it in terms of, I guess, my own example of houses to skyscrapers. That kind of is, I guess, coming along for the ride. If I look at the kinds of projects that I'm doing, they're definitely getting more and more interesting, like more platform-level stuff. I'm able to think a little bit more systematically. I can't really give you a specific example that comes to mind that I guess I can share. But I think the more interesting thing is those moments, as I said, where I look at something I wrote. It could be code, it could be a memo, it could be just that I remembered how I thought about something two years ago. And especially if I can trace it back to, yeah, maybe I took that course, or maybe I really took the time to understand something that really changed my perception about the problem. Those are the really satisfying ones to me. Ben, what do you think?
Ben Wilson:
I mean, first, since I work mostly on open source software, I can talk about all the projects I work on. But one of the things earlier this week was I needed to build a prototype, like a demo, to prove out some ideas that we had about a reverse proxy service. And three years ago... I thought about it after I got it done. If this was proposed to me three years ago, it would have been like, what is a reverse proxy? How would you do that? How would you set that up to do secure authentication without having credentials provided? So I was sort of thinking, after I had built the demo, that this would have taken me probably two weeks of just Googling stuff and reading blogs and looking through the source code documentation of different packages that are out there. But earlier in the week, I got the entire end-to-end demo done in about three hours, because I knew: okay, I want this to be ASGI compliant. I need something to execute the server. I need a very simple CLI to be written around that. I hate argparse, so I'm using Click in Python. Here are the minimum requirements that I need for this, and I want a simple API interface, so I'm going to create these routes and then create dynamic route parsing. I'm going to provide a YAML config and it'll just parse it. And just building all of that really quickly, and then writing about 20 tests for it, just to prove out that, hey, this actually works, and it's not just a process running somewhere that can't be terminated and isn't actually serving traffic. Yeah, that would have taken me weeks.
So I kind of had that moment two days ago, and I was like, man, I really have gotten better at doing this and knowing where to go online to find what I need. If I need to get a question answered, like, hey, does this software package support these four things? I search within their docs. Yeah, it does. Okay. And not having to sit there and copy example code from somewhere. Just understanding that most APIs in a standard language, if they're used by a lot of people, all kind of feel similar, because that's how you get a lot of supporters for your open source packages: making it not suck. So you don't really have to spend too much time digging through the docs. I do that during implementation. But for an MVP, a prototype, yeah, it's crazy how fast it becomes over time. And that surprised me a little bit.
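The dynamic route parsing Ben describes can be sketched roughly like this. Everything here is an illustrative assumption, not his actual implementation: the config shape, the function names, and the upstream URLs are made up, and the Click CLI and ASGI server he mentions are omitted to keep the sketch self-contained.

```python
# Hypothetical sketch: a dynamic route table for a reverse proxy,
# driven by a config dict (as might be loaded from a YAML file).
import re

def compile_routes(config):
    """Turn '/api/{service}/health'-style patterns into regexes
    that capture the path parameters, paired with their upstreams."""
    compiled = []
    for pattern, upstream in config.items():
        # '{service}' becomes a named capture group '(?P<service>[^/]+)'
        regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
        compiled.append((re.compile(f"^{regex}$"), upstream))
    return compiled

def resolve(routes, path):
    """Return (upstream, captured params) for the first matching route."""
    for regex, upstream in routes:
        m = regex.match(path)
        if m:
            return upstream, m.groupdict()
    return None, {}

routes = compile_routes({
    "/api/{service}/health": "http://internal-health:8080",
    "/api/{service}/v1/{resource}": "http://internal-api:9000",
})
print(resolve(routes, "/api/billing/health"))
# → ('http://internal-health:8080', {'service': 'billing'})
```

In a real version of the demo, the compiled table would be built once at startup from the YAML config and consulted per request before forwarding to the upstream.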
Amir:
Yeah, I think the interesting part about what you said, or at least the part that really resonated with me, is that, especially in software engineering, over time you kind of know a lot of the libraries, you kind of have your own way of setting things up that, I guess, at least I take for granted. You have all this accumulated knowledge of: this is my starting point. If I'm starting a new project, I know exactly what the repo looks like. I know how to structure things. I know what all these different tools are. I guess argparse versus Click really resonated with me. That kind of thing, that little thing. And there are thousands of those little things that you accumulate into a template for how you're going to do the next thing. And that's something I've seen, especially with a lot of ML people: there is a tendency to start from scratch every time. And that library building, thinking about what are some low-level components that you can build for yourself and start building stuff on top of, I don't see a lot. I'm trying to really do that for myself, and it's a massive productivity boost for me to just, really slowly over time, refactor things that I'm doing a couple of times, put them in a library, use that library in the next project, and keep pushing things down the stack. At some point, a really interesting pattern that I'm kind of learning, because we're really an applied ML team and we work with machine learning infrastructure teams, is this kind of constant refactoring of projects. At some point I can just say, hey, I've done this thing across five projects, maybe a platform team can own this, or start thinking about owning this. And that's a really good, solid way of proposing projects for the infrastructure teams, in our case the machine learning platform team, to own, as they can see it.
It's not an abstract idea. They can see that it's been used in five different projects, so it may really make sense for them to provide that abstraction and take it off of our hands. So that incremental, I guess, pushing of things down the stack, and ultimately having the platform teams own things, is a really useful pattern that I've learned, and it's really helping my productivity.
Ben Wilson:
Yeah, it's a super valuable bit of advice for anybody listening in who's wondering about that. I think once you hit a certain number of years of experience as somebody in the ML space, you're going to get to that point where you're like, hey, how many times do I have to do this particular tabular data task? You know, I need to make sure that I don't have autoregressive components across these different features: are they correlated or not? The first three or four times you build that, you're going to write it from scratch. You're going to use like, oh, well, pandas has this thing, and then I can do this in NumPy, and I remember this equation, or I could look it up from this paper and implement it myself. And then after you've done that for four or five projects, you're like, man, this sucks, I just need to create a function that does this. And then, you know, you start to piecemeal that together, and you're like, hey, I've got a module here that's feature pre-processing, that uses all these different open source packages. And then you graduate eventually to the point where you're like, hey, maybe it's something I put on PyPI that I can actually install and make a dependency in our individual use cases. But I really liked the way that you talked about how you work on something like that. It's something that I don't hear enough people saying, which is taking the software engineering fundamentals of how a pure software engineer would do it: we'll write something, we'll rewrite it three or four times, before we say, okay, this is library code. We've got, you know, 95% test coverage on this. We know all the ways this can break. But the way that you learn all the ways it can break is you broke it, or you created an incident or something.
Before it gets to the point where you're like, yeah, this is stable. And I think that's incredibly important, even more so in ML than it is in software engineering in some respects, because the complexity is so much higher. And if it's not written correctly, and it's not maintainable and testable, when you try to hand it off to another team to own, like an infrastructure team, that might be the first thing they ask: how does this work? Where are the tests for this? Wait a minute, you don't have any tests? Oh, you just look at the data coming out of it? Yeah, we can't support that. But if it's properly built out and it's this sort of module, a lot of times they'll look at that and be like, all right, we got it. Thanks for doing this, by the way. You saved us, you know, three months of work.
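The "write it from scratch four times, then refactor it into a function" pattern Ben describes can be sketched with his correlated-features example. To be clear, this is a hypothetical illustration: `pearson`, `correlated_pairs`, the threshold, and the sample data are all invented here, not code from either speaker.

```python
# Hypothetical sketch: the "are these features correlated?" check,
# refactored from ad-hoc notebook code into a small reusable function.
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_pairs(features, threshold=0.9):
    """features: dict mapping column name -> list of values.
    Returns (name_a, name_b, r) for pairs with |r| above the threshold."""
    names = list(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged

data = {
    "spend": [1.0, 2.0, 3.0, 4.0],
    "spend_usd": [2.0, 4.0, 6.0, 8.0],  # exact linear copy of "spend"
    "noise": [5.0, -1.0, 3.3, 0.2],
}
print(correlated_pairs(data))  # flags only the ("spend", "spend_usd") pair
```

Once a helper like this has shown up in a handful of projects, it is exactly the kind of thing that can move into a shared library, and eventually to a platform team, as Amir describes.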
Amir:
Yeah, it's easy to put machine learning in the bucket of: hey, this stuff is stochastic, it returns outputs, I don't know how to test it, and it doesn't really fit into a unit testing framework. It's, I guess, an easy way to convince people that's the case and that we don't need any tests. But yeah, I'm right there with you. It is really hard, because then the infrastructure team doesn't know how to support it. Especially if you're giving it to someone and they're like, okay, whatever, you say this is ML, I don't need to test it, I'll put it there. The first time it breaks and you're out of the office: I don't know what this thing does. I have no way of knowing if this is even the right range of output that it should be providing.
Ben Wilson:
Yeah. And even with non-deterministic task outputs from particular implementations, you can still write tests for them. And the only people that are going to write that valid, you know, expected data range that's going to come out of it are the data scientists. You built this model; you're expecting, hey, if we get a vector on the output of this deep learning model and the index count is not what we're expecting... okay, we're creating an encoder and we're expecting 304, you know, indices coming out of this vector. Just write a test that says: are they floats? Everything in there should be a float. There should be 304 of them, or whatever. And if not, then throw and say, hey, somebody changed some code somewhere, and now anything that's consuming this is going to absolutely detonate. So it could be approaches like that. You're not testing determinism, which is what a lot of unit tests are designed to do. On the MLflow team, we're constantly writing integration tests and unit tests for ML implementations, and we're not expecting that packages are going to respect seed values. It's almost impossible to do that. What happens when, you know, the compiled C++ libraries running in your CI suite are a slightly different version? Well, your floating point rounding is now just slightly off. So do you do approximate closeness? And then that gets kind of messy. Sometimes we do that, but sometimes it's straight up just: is this the correct data structure, and are we not creating insanity on the outputs?
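The structural checks Ben describes, asserting a fixed index count and all-float values rather than exact numbers, can be sketched like this. The 304-dimension figure comes from his example; the function names and the fake model are illustrative assumptions.

```python
# Hypothetical sketch of structural tests for a non-deterministic model:
# assert the output's shape and type, not its seed-dependent values.
import math
import random

EXPECTED_DIM = 304  # the encoder's documented output size, per Ben's example

def validate_embedding(vector):
    """Raise ValueError unless the output is a 304-dim vector of finite floats."""
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dims, got {len(vector)}")
    for x in vector:
        if not isinstance(x, float) or not math.isfinite(x):
            raise ValueError("embedding contains a non-float or non-finite value")
    return True

# Simulate a non-deterministic model: the values differ run to run,
# but the structure is stable and therefore testable.
fake_output = [random.gauss(0.0, 1.0) for _ in range(EXPECTED_DIM)]
print(validate_embedding(fake_output))  # → True
```

A check like this fails loudly when someone changes the encoder upstream, which is exactly the "detonate early" behavior Ben is arguing for, without pretending the individual values are reproducible.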
Michael Berk:
Yeah, sounds about right to me. Cool, so we're about at time, so I will wrap. We talked about a lot of really, really cool things, and I have a fat set of notes that I will try to condense right now. So, a couple of miscellaneous things: data lifecycle management can be really tough. At Netflix, they probably have a good implementation for it, but thinking about how data evolve is just a challenge. And then also, when you're working with people, specifically building tools for stakeholders, people like it when you automate the boring stuff. And if they have sort of a growth mindset and are ready to pivot, they will realize that it won't take their job. Then some important pieces of culture. We used Netflix as a case study, but you can apply this to virtually any company. It's important to have ICs that can think critically and care about the product, that really care about the value it provides. Freedom and delegation and sort of wearing a lot of hats, all these startup concepts typically work well for fast-paced environments. And then there's this sort of feeling of community that is typically seen in a lot of the top engineering firms. But there are some negatives, as with any strategy: decentralization means you need to communicate a lot, there can be duplication of work, and sometimes you aren't as efficient as possible. It's sort of like democracy versus authoritarian regimes. And then Amir is big on learning. So his strategy, and sort of the value prop behind learning, is: you need to build stuff, right? You need to be successful. You need to know things. And how you can do that, what works for him, is being really systematic, taking classes, adding structure to his learning. And the way that he chooses these classes is sort of letting his implementations drive what he learns and what he works on. But also, curiosity is very important.
Then finally, over time, you sort of build up this tech stack and set of tooling that can really improve your efficiency and help you do something out of the gate very effectively. So Amir, if people want to learn more about you and your projects, where should they go?
Amir:
Well, first of all, I want to say that summary was amazing. You did that on the fly? That's great. I wish I was that coherent. Yeah, if people want to find me, I'm on LinkedIn. I'm no longer active on Twitter, so LinkedIn works. You can find me on LinkedIn. And also a plug for the Netflix blog series. There have been a few blog posts recently that we've written about content understanding in the media machine learning space at Netflix. If you're interested in that, definitely take a read. It's a series, and so far we've had like seven posts covering some of the areas that I kind of touched on. Yeah, that's where people can find me, and that's my plug.
Michael Berk:
Cool. And also on Medium, check out Consistent Hashing from First Principles, part one. It's
Amir:
Yes.
Michael Berk:
a joy. It's actually really, really cool. So if you guys have time, check it out.
Amir:
Yes, that is also my personal blog.
Michael Berk:
Right. Cool, well until next time, it's been Michael Burke and my co-host.
Ben Wilson:
Wilson.
Michael Berk:
And have a good day, everyone.
Ben Wilson:
Take it easy.
Amir:
Thank you.