Michael_Berk:
Hello everyone, welcome back to another episode of Adventures in Machine Learning. I'm Michael Berk and I'm a resident solutions architect at Databricks, and I'm joined by my co-host.
Ben_Wilson:
I write code for open source projects at Databricks.
Michael_Berk:
Nice. And today we have two guests. They both work at Databricks and I'm absolutely ecstatic to talk to them. They have really interesting backgrounds. And so I'll do a quick intro and then it'd be great if they can elaborate as well. But Ankit Mathur studied at Berkeley and Stanford and is currently, er, sorry, I totally messed that up. Also, Ankit, is that how you pronounce your name?
Ankit_Mathur:
It's Ankit, but yeah, pretty close.
Michael_Berk:
Ankit, okay. Cool. All right. Ankit Mathur studied at Berkeley and Stanford, and during his summers he interned at both Facebook and Apple. Currently he's a software engineer at Databricks where he builds ML infrastructure. And then Sue Ann Hong studied at both Caltech and CMU, and after getting her PhD she worked at Facebook, and now is a Databricks software engineer as well. So Ankit and Sue Ann, what do you actually do at Databricks?
Ben_Wilson:
Magic, lots of magic. I've seen their code.
Sue Ann_Hong:
We manage Kubernetes for you.
Ankit_Mathur:
True. I mean, to sum it up, we've been building this model serving feature for a long time, and there are a lot of different things related to that that we work on. Sue Ann has been working a lot on the API side. So it's been pretty fun overall the last couple years.
Ben_Wilson:
So I'm really interested
Sue Ann_Hong:
Yep.
Ben_Wilson:
to hear the story from both of you about the genesis of this product over the years. So going back in time to the early days of MLflow, we had this thing called Serving V1, which was a product that up until relatively recently was still being pretty actively used. And now I'm sort of telling people, like, hey, could you stop using this? But how did the scale and complexity of what that product was back then transform into what we're going to talk about today? And what was that process like, to organically watch something get more and more complex and bigger?
Sue Ann_Hong:
Yeah, you know, actually, let me answer the first part first. You asked about the history of this product, right? Model serving. It actually goes way back. I think the first instance I know of is in 2015 at the Spark Summit, we had a demo on model serving. I think Ali gave it. You know, it was impressive, but we didn't actually build it, right? We faked it. And over the years, we've definitely talked about it a lot, because it's difficult for people to do on their own. And then this V1, which is built on top of Spark clusters, really just a single node Spark cluster, came about as, at that time, a proof of concept: will people use this feature, and how actively will they use it, before we invest a lot of engineering resources to make it scalable. So that's how it started, but it had a lot of traction. We have many, many customers using it. So then I think maybe two or three years ago, we decided, you know, we really need to build a scalable version of this. And then we had to do a bunch of work, which I guess was the second part of your question. Maybe I'll kick this to Ankit, right? He built a lot of it. He's like our star engineer.
Ankit_Mathur:
Thank you. Uh, well, okay. Um, I think it was almost like a hackathon project or something initially.
Sue Ann_Hong:
V1, yeah.
Ben_Wilson:
Yeah.
Ankit_Mathur:
Yeah. So it was just like, hey, we have this thing in the model registry. Could we just serve that with one click? And I think that's where the one-click part of it came about. And I honestly
Sue Ann_Hong:
Mm-hmm.
Ankit_Mathur:
think that's why people started using it so much, because it was so easy to use. And it just launched directly into a cluster, right? And I think that was also kind of the driving principle. It's almost a different engineering problem to build this massively scalable version of model serving. There are a lot of basically non-model-serving-related things that you have to do to even get there. Honestly, we have been doing a lot of not-model-serving, not-ML-specific things for many years, because you have to build scalable real-time infrastructure. But I think the ease-of-use aspect has always been kind of the guiding product principle. We evaluated a bunch of different architectures. Should we go with a Kubernetes-based thing? Should we go with something else? I think ultimately we just decided to go with this Kubernetes-based approach. We had an idea in mind around other technologies within Kubernetes we could use to scale it out. Then we immediately ran into the wall of infrastructure. So we spent a lot of time just building that. I don't know, what do you think was the worst, most unfortunate thing that we came across when we started doing the design?
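To make the one-click serving idea concrete, here is a minimal sketch of the kind of JSON payload an MLflow-style scoring endpoint accepts once a registered model is served. The column names, values, and the commented endpoint path are illustrative placeholders, not details from the conversation:

```python
import json

def build_scoring_request(columns, rows):
    """Build the MLflow 2.x 'dataframe_split' JSON body for a scoring request."""
    return json.dumps({"dataframe_split": {"columns": columns, "data": rows}})

# Two example rows with two hypothetical features each.
payload = build_scoring_request(["feature_a", "feature_b"], [[1.0, 2.0], [3.0, 4.0]])

# You would POST this to the endpoint's invocations URL with a bearer token, e.g.:
# requests.post(f"{host}/serving-endpoints/{endpoint_name}/invocations",
#               headers={"Authorization": f"Bearer {token}"}, data=payload)
```

The point of the one-click flow is that everything behind that URL, the cluster, routing, and process management, is provisioned for you.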
Sue Ann_Hong:
During the design, I don't know, but I think the team didn't expect to have to write so many configurations.
Ankit_Mathur:
Yeah. For me, it was when we got into the weeds of managing a Kubernetes cluster. There's so much subtlety to it, and there are so many ways you can mess up. Every day or every week, we kind of found a new way to mess up the clusters. And sometimes people would just be like, yeah, this is pretty bad. I think you've just got to destroy this one and make a new one.
Sue Ann_Hong:
Yeah, I think another aspect is we spent a lot of time on security, right? Making sure everything is isolated, all the workloads are isolated, networking is secure in every way. So there's a lot of work that had to be done there that I think I didn't really think about in the beginning.
Ben_Wilson:
Hey,
Ankit_Mathur:
Yeah.
Ben_Wilson:
because the product didn't really do a gradual change from V1 to V2. It was
Sue Ann_Hong:
Right.
Ben_Wilson:
a quantum leap, going after what V1 taught everybody about people's usage patterns. Just as you said, Ankit, I remember being on the opposite side of it and working with customers at the time, and people were like, hang on, one click and I get an endpoint to serve this model? This is incredibly awesome. Particularly for smaller teams that were very agile on the ML development side, so they could get a model out there quickly. But a team like that, where there are few data scientists, maybe one or two ML engineers, if they're told, hey, we need to service a hundred thousand requests a minute, and how do we do that? How do we build that infrastructure? They're gonna say, we can't. We have to either hire somebody to do it, or this project is going to take years to build. So with the one click, and the fact that it was on Databricks, a lot of people assumed, hey, this will support, you know, infinite load. And then people quickly realized, oh yeah, it's not really built for that level. But it's interesting to hear you say that was its purpose, to gauge how people would use it. And then you take a step back and say, okay, we need to build this for real. And I'm most interested to hear about that concept, what both of you just said. We approached it thinking that we were focusing on model serving, because you work in ML engineering at Databricks. And a lot of people have that perception about the group, like, oh, you guys work on ML all the time. It's like, no, it's pure software engineering, infrastructure stuff. But how big of a pie was that, that you had to eat? Like, hey, we need to research how this stuff works. And how long did that take?
Ankit_Mathur:
But we've got a really talented group, so I feel like that made it a lot easier. Everyone on the team, top to bottom, a lot of people have machine learning experience. My background was in machine learning infrastructure, training and serving models in very specialized ways. But when you come and work on real problems, sometimes you have to just be like, okay, well, I have to go learn about network policies so that I can build secure infrastructure. The entire team just kind of rolled up their sleeves and did all the work. And now I think we're finally getting to the point where we can build really cool ML-specific stuff on top.
Michael_Berk:
What?
Ankit_Mathur:
Well, like, you know, for example, I'm thinking a lot about accelerators and how we can run really large deep learning models very quickly. So that's a pretty ML-specific thing. The cool thing is it's built on all of this work for, you know, scalable infrastructure. So you know you can get to massive scale if you do the right things at that layer, but you have to start with the basic stuff.
Michael_Berk:
Yeah, and model serving automatically scales up or down to meet demand. So what were some considerations when developing that feature?
Sue Ann_Hong:
I mean, the scale up and down, right? There are many benefits to it, but the one that's probably top of customers' minds is cost. So if you're not using it, you don't want to pay for it. The more elastic it is, the more cost efficient it is for people. And it also just gives us better capacity management of resources in the cloud.
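For flavor, this elasticity tradeoff is the kind of thing that surfaces in an endpoint's configuration. The fragment below follows the general shape of the Databricks serving-endpoints API, but the exact field names and values should be treated as illustrative:

```json
{
  "name": "my-model-endpoint",
  "config": {
    "served_models": [
      {
        "model_name": "my_registered_model",
        "model_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  }
}
```

Scale-to-zero means an idle endpoint costs nothing, at the price of a cold start on the first request after it spins down.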
Michael_Berk:
How do you know it's... Sorry, go ahead, Ankit.
Ankit_Mathur:
I mean, I think it's a really unique opportunity being a provider across customers, right? For an individual customer who's evaluating, you know, running their own Kubernetes cluster and stuff like that, the cost considerations become really awful as soon as you're like, oh, well, how do I trade off availability and quick autoscaling with paying for warm pools all the time? It's hard to do that unless you're serving a bunch of different people, which we can kind of take advantage of.
Ben_Wilson:
It's sort of scalability with respect to the industry, as a SaaS provider saying: we're going to take the human capital of the fantastic people on the team, and Databricks Engineering in general, and put all of our minds together to solve this for tens of thousands of customers, hundreds of thousands of people that need to do this. So we'll eat that cost. I've noticed, talking to even particularly bigger companies, that they don't specialize in computer software. They specialize in banking, or in selling goods to people, or, you know, making sure that people's doctor's appointments are scheduled properly, all these different industries. Their main focus is not on getting Kubernetes to run properly and scalably and support these use cases. Their focus is solving their problems. They do it with software and engineering teams, but really they shouldn't have to worry about: how do I do the most efficient cost optimization for serving in an elastic environment? What do I do when a pod goes into panic mode? How do I safely restart that? What happens when the service goes down? How do I do code deployments? How do I validate my CI checks for integration tests before I do a rollout deployment of a new version of something? How do I handle all of that? All that infrastructure, that's all human capital spend at a company. I mean, it took years here at Databricks to really build that system out. So for a company that doesn't specialize in that, is that really worth their time? That's a real big question.
Ankit_Mathur:
It's like an extension of the story of the cloud in the last 10, 15 years, right? What did the cloud do? It came in and said, you don't need to build a data center to have a scalable data platform, to have a compute platform. You don't need to build that yourself. We're just saying the same thing, going a little bit further: you focus on the business value that differentiates you from your competitors, and we focus on the problems that everyone's going to have.
Ben_Wilson:
That also brings a certain level of complexity that I don't think a lot of people appreciate who don't have to do this when they're writing software, which is designing for the common use case. So let's do a thought experiment for both of you. Take one particular use case, like, hey, I need image classification at scale. We work for a company where people are just taking pictures and we want to be able to auto-tag those. But we have an installed user base of 800 million users. If you're going to build a system that could support that scalability using some modern deep learning architecture, would the design considerations be easier for that one use case, or for building something generalized to support any use case?
Ankit_Mathur:
So Sue Ann is the API guru, so I think she should take it.
Sue Ann_Hong:
Yeah, I mean, I'm not sure about the API guru part, but... I think to support, for example, that use case well, it would be easier if you only had to do that. But we're more about supporting everything pretty well, if that makes sense. Yeah. But I think, you know, there are use cases coming up that are just so popular. I guess I can talk about it, but kind of a funny story is when we went to ungated public preview with model serving, suddenly we started seeing much larger models being served. Because I think what happened was everyone saw this new menu item, Serving, in Databricks, and then they clicked on it and they're like, oh, let me try serving a Hugging Face model, right? Like some transformer. That was a very different distribution than what we had seen going through a really long period of private preview, talking about specific use cases. But I think it just speaks to how popular these models are becoming, which is why, for example, Ankit is spending quite a bit of time making sure they're supported well. I don't know if that answered your question or not, but yeah.
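A back-of-envelope calculation shows why those Hugging Face transformers were such a different distribution to serve. The parameter counts below are illustrative round numbers, not figures from the conversation:

```python
def model_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Rough memory needed just to hold the weights (fp16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9

# A small tabular model: thousands of parameters, kilobytes of weights.
small = model_memory_gb(10_000)
# A 7B-parameter transformer: roughly 14 GB in fp16, before activations,
# batching overhead, or any KV caching.
large = model_memory_gb(7_000_000_000)
```

The difference is several orders of magnitude, which changes everything about instance sizing, autoscaling, and cold-start behavior.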
Ankit_Mathur:
Yeah, it is easier.
Ben_Wilson:
Yeah, I mean, maybe I used a poor example there, because that is a complex use case, like, hey, those models are huge and you have to make designs
Sue Ann_Hong:
Mm-hmm.
Ben_Wilson:
and considerations. But if that use case had been, hey, we just wanna support scikit-learn linear regression models, that's all of our use cases, you'd have design considerations there where you're like, well, these things are kilobytes in size, and
Sue Ann_Hong:
Mm-hmm.
Ben_Wilson:
we don't need to, you know, provision a bunch of memory or have access to GPUs; CPU is actually probably more than enough to support the workload there. But then, if you have to support CPU and GPU and autoscaling and, you know, memory allocation, that's a generalist sort of support framework that makes everything more complex, at least I think. At least the design
Sue Ann_Hong:
It's.
Ben_Wilson:
review process is much more complicated.
Sue Ann_Hong:
Yeah, for sure. For sure. Yeah. Because actually, when I was at Facebook, I worked on a particular product. It was an ads targeting product where we had our own ML platform, essentially, right? We had to build it because that was a long time ago, and even Facebook didn't have generalized infrastructure for machine learning. And we were serving millions of models, but they were all tiny. You store them in a database, and then you load them into memory on a bunch of model servers. That's very different from what we have to do. Yeah.
Michael_Berk:
Yeah, and it's interesting that you mention that Facebook hadn't completely solved this problem, or at least while you were there. What are some of the competitors to Databricks model serving? Are we the best, in your opinion, or do we have weaknesses?
Ankit_Mathur:
Well, I mean, you know, there's a whole host of competitors. I think there are like two companies created every week that are focused on model deployment these days. So, you know, obviously the biggest competitors are SageMaker, Azure ML, GCP Vertex, right? I would say all products have strengths and weaknesses. I would like to think ours is the best. I don't know, Sue Ann, what do you think?
Sue Ann_Hong:
I think ours is the nicest to use.
Ben_Wilson:
I would agree.
Sue Ann_Hong:
Uh... Yeah, I guess.
Ankit_Mathur:
I mean, it's a popular space, right? There's all sorts of different stuff going on. Machine learning is really, really popular these days, which is exciting for us, but also means that there's a ton of new competitors coming out every day.
Michael_Berk:
Okay, I was hoping for a more controversial answer, but we'll take it. Cool. Another question that I had for Sue Ann specifically, sort of diving into your prior experience. This might be going way back, but in your college days, you worked on a couple of, quote unquote, cool art projects that use machine learning.
Sue Ann_Hong:
Oh my god, what is this?
Michael_Berk:
Yeah, if it's on the internet, it's fair game. So,
Sue Ann_Hong:
Hmm.
Michael_Berk:
do you still think about those things? And I was curious just what they are.
Sue Ann_Hong:
Oh, that was a long time ago. I don't really think about them, I guess, but I think it would have been an interesting career path for me. Yeah, I did work on some art pieces that were based on machine learning. I think the one that was the most fun, maybe, was an exhibition we had in grad school, at the Children's Museum of Pittsburgh. We built this installation that was called the Curator. It was really just this clear box you put your art piece in. So, right, children come and they make little drawings. And you put it in there and it decides, based on computer vision, whether it's good art, whatever that means, right? And if it's good, it goes into this nice large, clear box where you can see the contents. If it's not, it gets shredded. Like a shredder
Ben_Wilson:
Wow.
Sue Ann_Hong:
comes up.
Ben_Wilson:
So you're just making kids cry the whole time.
Sue Ann_Hong:
Actually, no,
Ankit_Mathur:
Ha.
Sue Ann_Hong:
but they loved seeing it being shredded. That's the thing.
Ben_Wilson:
Oh, yeah.
Sue Ann_Hong:
So, you can imagine, which was also interesting. But there's a little bit of a cheesy conceptual art piece to it, which is: what is good art? How do you even decide? In this case we decided the algorithm would reward something that is different. But, yeah, conceptually it was just like, you know, who knows what good art is, it's kind of arbitrary. Yeah, so that was fun, yeah.
Ben_Wilson:
So you're using stuff like OpenCV to do the matrix extraction of the image and then doing like locality distance measurements of how different
Sue Ann_Hong:
Yeah.
Ben_Wilson:
is this compared to a saved state. That's pretty cool.
Sue Ann_Hong:
Yeah,
Ben_Wilson:
That's clever.
Sue Ann_Hong:
that sums it up. I think if we were to do this now, we would have much more powerful tools, but maybe it also would not be so cool anymore.
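As a rough illustration of the "how different is this drawing?" scoring Ben describes, here is a hypothetical sketch. In the real installation OpenCV would extract the pixel matrices from camera images; the scoring function and arrays here are invented for illustration:

```python
import numpy as np

def novelty_score(candidate: np.ndarray, reference: np.ndarray) -> float:
    """Mean absolute pixel difference between two grayscale images, in [0, 1]."""
    c = candidate.astype(np.float64) / 255.0
    r = reference.astype(np.float64) / 255.0
    return float(np.mean(np.abs(c - r)))

reference = np.zeros((8, 8), dtype=np.uint8)           # stand-in for previously seen art
identical = np.zeros((8, 8), dtype=np.uint8)           # scores 0.0: off to the shredder
very_different = np.full((8, 8), 255, dtype=np.uint8)  # scores 1.0: into the display box
```

A real version would compare against many saved pieces and threshold the distance, but the "difference from saved state" idea is the same.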
Ankit_Mathur:
That's such a cool concept, because there was actually an artistic piece to how you designed the thing.
Ben_Wilson:
Or now you can...
Sue Ann_Hong:
Mm-hmm.
Ankit_Mathur:
And these days, I think there's a lot of cool art as well. Like Midjourney is a popular model people use to create art. But I guess it's a different... Your thing sounds more like actual art that I would see in a museum. Like there's a deep concept behind it, which is pretty cool.
Sue Ann_Hong:
Yeah, deep or cheesy, I don't know, but there is a concept behind it.
Ankit_Mathur:
I mean, it's pretty cool. I'm surprised that it's in Primer.
Ben_Wilson:
Yeah, it's pretty cool. Like I would, I'd pay to see that. Yeah, it's awesome.
Michael_Berk:
Interesting.
Sue Ann_Hong:
But, but...
Michael_Berk:
So it seems like there are a bunch of different ways that you can learn about machine learning. And if you are good enough at it, you'll eventually work at Databricks as a software engineer. But I was wondering if you guys had advice for any of our listeners on side projects, or just things of interest that they can do outside of work to improve their machine learning skills. And ideally it's not like, build Facebook with an open tool. It would be something more specific to, ideally, your interests.
Ankit_Mathur:
Okay, I can go first on that. So I think the advice that you hear from everyone is that you should just work on something that you actually feel passionate about, right? For me, that for a long time was computer vision. And then the second piece I would say is that you should focus on solving constrained problems that are hard, but solvable.
Sue Ann_Hong:
Uh oh.
Ankit_Mathur:
But uh...
Michael_Berk:
Okay, I think you just dropped for like five seconds. Do you mind just starting from "a constrained problem"?
Ankit_Mathur:
Sure. Yeah. So I mean, I think people should be solving constrained problems. You can obviously focus on solving things like virtual reality or something, right, if you want to, but that's pretty hard. So for me, I try out all the new models that come out on a regular basis. That's fun for me. For example, there's all this stuff around AI art these days, which is interesting to see how the models are operating. So I know some folks who work in that space. And you can go pull an open source model and run it yourself on a GPU or something like that. It's actually pretty fun to do. And you learn a lot about what the tooling infrastructure looks like. You learn a lot about the models and their weird idiosyncrasies and things like that. And it's just fun to go, say, hey, here are a bunch of photos of myself, and maybe try generating something.
Ben_Wilson:
Have you seen the new version that dropped, the fifth gen of it? It's insane. If you give it the correct prompts, it's really challenging, provided you tell it, like, hey, don't try to render human hands. If you tell it not to do that, it's really hard to tell the difference from an actual image of somebody.
Ankit_Mathur:
It is, yeah.
Ben_Wilson:
The resolution is...
Ankit_Mathur:
It's more photorealistic. It's more photorealistic, right? But speaking to the earlier point, the art is probably a little bit more than photorealism. So that's another angle. But yeah, the hands thing, you probably wouldn't know that unless you'd actually spent a good amount of time trying out these models and being like, wow, these are weird hands. Not all the fingers, weird things.
Ben_Wilson:
But the thing that
really blew me away was I had ChatGPT 3.5 in one window, and an image generator open in another window. And I was basically asking, because I don't know that much about art, hey, can you tell me the top three random artists from each of the past five centuries? And, you know, it starts pumping out a list of names. I'm like, I've never heard of most of these people. I've heard of a couple of them. But then you go into the image generator and you could say, hey, could you generate something funny that would make me giggle to myself, in the style of this artist. And then you look up some of their other works online. And it's crazy how well it makes those associations, that sort of attention pattern association between what it's seen before and what's in its actual data set, to be able to generate something that a layperson wouldn't be able to tell apart. Of course, for the things that I was telling it to generate, people would be like, what the heck? Why is somebody holding a banana to their ear? That's really ridiculous. But if you were to say, hey, create this thing in this style, like, hey, it's a landscape artist, and I want the south of France, I want a river in the background or something, it'll do that. And then you look at the artist's body of work, and they've never even been to France, they've never painted that, but you can't really tell the difference unless you're an expert.
Ankit_Mathur:
Yeah, exactly.
Michael_Berk:
Have you guys seen art that's compelling that is generated by a machine?
Ankit_Mathur:
So there was actually an exhibit in San Francisco, in Dogpatch or something, that I went to, where people very seriously tried to make art that was AI generated. And it was quite compelling, quite cool. Because what people do, right, is by tuning the seed, you can actually tune the generation, and then create almost a GIF: you create an image, and you can evolve it into another image. You can just do that forever because the models are so good. So you kind of get this multimodal art, which starts with some beautiful image and then just evolves, for example, through the seasons, or toward some other location. And you can keep staring at it for as long as you feel like, because it keeps going forever. And it's unique constantly as well; it doesn't just repeat.
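The seed-tuning idea can be sketched in a toy way: map each seed to a latent vector, then interpolate between latents to get the smooth "evolution" from one image toward another. Real image models do this in a learned latent space; the plain Gaussian vectors here are a stand-in for illustration:

```python
import numpy as np

def latent_from_seed(seed: int, dim: int = 4) -> np.ndarray:
    """Deterministically map a seed to a latent vector (toy stand-in for a model's latent)."""
    return np.random.default_rng(seed).standard_normal(dim)

def interpolate(seed_a: int, seed_b: int, steps: int) -> list:
    """Linearly blend from one seed's latent to another's, like frames of an evolving image."""
    a, b = latent_from_seed(seed_a), latent_from_seed(seed_b)
    return [(1 - t) * a + t * b for t in np.linspace(0.0, 1.0, steps)]

frames = interpolate(seed_a=1, seed_b=2, steps=5)
```

Feeding each intermediate latent through a generator yields the continuously evolving, never-repeating imagery described in the exhibit.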
Michael_Berk:
Yeah, there was a really cool exhibit at the New York MoMA, where they didn't actually have AI-generated art, but they leveraged AI concepts in the art. So they would render the structure of a neural network for some model very prettily. And I think they had some actual Google models in there, visualizations of an actual neural network that Google serves. So yeah, there's lots of cool stuff being developed out there. But let me pivot a little bit. We've been talking about side projects, and you guys have built up your skills through side projects, school, and prior jobs. So I was wondering, and Ben, this is for you as well: what do you guys think makes a good software engineer, specifically for the machine learning world?
Ben_Wilson:
I mean, I'll take a first pass at that. Following on from what Ankit said, it's all about: how much can you learn? How quickly can you learn it? And how much can you sort of collaborate and work together with the other brilliant people that are around you? If you have those three cornerstones, if you can learn quickly, you're not afraid to fail, and you work well with others and listen more than you speak, you're going to be successful in a high functioning team, and you're going to learn way more than you even thought you were prepared for. That's Databricks Engineering, everyone. So, yeah.
Ben_Wilson:
What are your thoughts, the two of you?
Sue Ann_Hong:
I mean, you said it really well. It's kind of hard to top it. But I guess specifically for skills, I think it's really just core software engineering. But I think it does help to have this context around machine learning, having the experience of working with machine learning, so you understand what's lacking in tooling, like Ankit mentioned, and what the use cases are. Having a feel for it, I think, is helpful. It's not required to build an ML platform, but I think it is very useful. That said, we don't build products that way, right? We talk to a ton of customers, and we understand what they need. But it does give you a bit more intuition, I think, to think about how somebody would use what we build.
Ankit_Mathur:
Yeah, yeah, to build on that, right? That point around intuition is really critical. And I think in machine learning, it's really, really hard to get intuition unless you're willing to get your hands dirty. You just have to go try things and really understand what's going on. If I had to isolate two things for software engineering in a machine learning world, I'd say the number one thing, as with all software engineering, is to actually be a good high-level design thinker, because the big picture is changing so often that if you're too locked in on a particular vision, sometimes you might miss the bigger picture changing around you. The second is being willing to just go really, really deep and build quickly. Like what the folks did with Serving V1: put something out there, see whether it works. If it works, then build fast towards it. So I would say those two things.
Michael_Berk:
Gotta un-k it in. Sorry, go ahead, Ben.
Ben_Wilson:
I'd just add one final thing that wraps up everything that the three of us said, which is something that I don't think a lot of people realize happens in the ultra-high-end, high-tech software companies that are out there, which is ego-less work. And that's something that's super critical. It's the way these sort of FAANG companies operate, and the way the people writing software there operate. It's part of how it's evolved over the decades. But if you come from an outside position into that, there are a lot of people that write code out there who have an idea locked in their head, like, hey, I have this grand idea. And if something different comes along or changes, they almost take it as a personal attack. Or if they write code, they have almost an emotional connection to it. They're like, I built this thing and it's very important to me. But you never see that within Databricks Engineering. Usually people are more than happy to delete their own code. They're like, hey, I can trim off 40% of this. Yes, I wanna do that right now. It's just less complexity, so it's good. Everybody loves those PRs. You're like, you just deleted 12,000 lines of code, and people are throwing a party for that person.
Sue Ann_Hong:
I was about to say that too. Like we can't wait to get rid of V1 code.
Ben_Wilson:
Yeah, yeah. And there are certain things where you're like, hey, nobody's really using this, let's pull the data. Like, can we just delete this entire module? This sucks, or it's flaky, it breaks all the time, and nobody really uses it. So there's not that emotional, ego-based connection to anything that people build. What I've noticed is it's more of an emotional connection of a technical, almost spiritual kind, based on people being like, hey, I really rely on these people that I work with, and I think they're amazing. And then if you talk to those people, they're like, oh yeah, I think that person's really amazing too. So everybody just has this level of respect among peers, and they trust one another. And there's no ego involved, like, hey, how could you say that on my review document, why would you attack me like that? There's none of that. It's like, hey, thanks for pointing that out. That's a great idea, and I learned something today. So
Ankit_Mathur:
Yeah.
Ben_Wilson:
I think that attitude serves people pretty well as well.
Ankit_Mathur:
I guess if you had to sum it up, it's like intellectual humility, right? Which is
Ben_Wilson:
Oh yeah.
Ankit_Mathur:
very easy to be humble when everyone around you is so, so talented.
Ben_Wilson:
Yes.
Ankit_Mathur:
So.
Ben_Wilson:
It's like that saying: the more you know, the more you realize you don't know. So regardless of the level of somebody's hubris and ego, if you're put onto one of these teams, you're going to just instantly feel, I don't know, at least I do, like, man, I'm the dumbest person here. This is crazy. And I've asked other senior engineers and they're like, yeah, I feel like I'm underwater, like everybody else is smarter than me. And I'm like, I think you're smarter than a ton of people. So it's almost like a communal feeling that people have. Just like, man, everybody's a genius, I feel so dumb.
Michael_Berk:
Wait, let me just roll it back to a point that I'm still stuck on. So, it sounds like you guys are cold-blooded assassins that just remove code at will, and you actually don't care about what you're building. It's all about the community.
Ben_Wilson:
I mean, people care about what they're building, yeah, but you want something that's maintainable. And if you can trim the fat, yes.
Sue Ann_Hong:
No, we build... what? Yeah, yeah, yeah.
Michael_Berk:
Got it. So what's the difference?
Sue Ann_Hong:
And not only that, we care about the product we're building, right? Like we care about the end thing. So if we can make it better, we don't care about the old things that are slowing it down or making it worse. That's, I think...
Ben_Wilson:
Yes.
Ankit_Mathur:
like relentless focus on improvement, right? If you're focusing on improving,
Sue Ann_Hong:
Thank you.
Ankit_Mathur:
you shouldn't be caring about, like, oh, well, my code. There's no real "your code," it's our product. So.
Ben_Wilson:
Yes. And even if you come up with this really clever implementation, I think people here get away from thinking about their implementations like that. Sometimes you write something and you're like, huh, that was kind of neat how that got solved, or, this is really efficient. But for most people that are writing code at the frequency and the volume that we do, well, I don't know about either of you two, but I can't even remember what I wrote three months ago. Even if it was the super clever thing, I don't remember. It's like, hey, I'm focused on what we need to do this sprint, next sprint, and this quarter. So yeah, it's a paradigm shift, I think.
Ankit_Mathur:
Yeah, I don't remember what I wrote last week.
Ben_Wilson:
Ha!
Ankit_Mathur:
And probably like two weeks from now, I'll come back to it and be like, what, why, why did I do this? So.
Ben_Wilson:
No, I remember what you wrote last week. There's some really good PRs in the open-source MLflow. It's awesome stuff, man.
Ankit_Mathur:
Thanks.
Michael_Berk:
So are you guys proud of any of your prior projects, or are they just wiped from memory?
Ankit_Mathur:
What's the prior project you're most proud of, Sue Ann? I'm definitely still proud of the prior projects, to be clear.
Sue Ann_Hong:
Yeah, yeah, I don't know what I'm most proud of.
Michael_Berk:
Shredding kids' artwork.
Sue Ann_Hong:
I do feel like, okay, yeah, I'm pretty proud of that one. But I think I have some emotional attachment, for example, to model registry. Not to the code, like, whatever, but it's a little bit of a baby, you know. I think because I feel like we built a really good model registry. It was very popular. I think it's actually better than pretty much anything out there. Like, people started copying it. And it was really, really useful.
Ankit_Mathur:
It really established that category. It wasn't a word people used until...
Ben_Wilson:
Yeah.
Sue Ann_Hong:
Yeah.
Sue Ann_Hong:
Yeah, so I'm pretty proud of that. Yeah, I guess I also wrote the PRD for it, which was like my dark days of PMing a long time ago. So maybe,
Ben_Wilson:
Wait a minute, you guys don't write your own PRDs anymore?
Sue Ann_Hong:
Yeah, just...
Sue Ann_Hong:
I mean we do, but I was actually like officially the machine learning PM for like a year, a really long time ago.
Ben_Wilson:
So you're a dual-headed then, right?
Sue Ann_Hong:
Yeah, but I think I spent more time doing the PM work than engineering. And then I was like, I really need to build stuff. I can't just write Google Docs all day. It's killing me. Yeah.
Ankit_Mathur:
In a small company, you just do what has to get done, I guess.
Sue Ann_Hong:
Yes.
Sue Ann_Hong:
Yeah, I mean, people write PRDs all the time these days too, right? Engineers, that is. I just didn't want that to be my only job.
Ankit_Mathur:
Speaking of the earlier point though, like I think like previous projects, you're happy to get rid of them, but you're proud of them in the context that they came out, right? Because like at that point in time, it was important. At this point in time, you get to evolve it into something new because it was successful and now there's people who use it and you can do it better. So it's like, you always want to get better, but like in the context of that time, I'm super proud of the work we did on model registry for sure.
Michael_Berk:
Got it. So the takeaway is, if you need mindfulness, meditation, and presence, you can just be a software engineer. Noted.
Ben_Wilson:
I mean, it kind of changes the way that you think about things in general. Working in these teams, exactly as both of you said, you're focused on the end goal of what the fruits of your labor do. You're producing code that tells the computer to do a set of instructions that creates some product feature, but everybody's focused on what makes our company the most money and, more importantly, what makes our customers happiest. Like, hey, how do we solve this difficult problem for people? Across the entire department, at least, that's the one motivating thing that everybody shares in common. It's like, hey, we wanna build cool things. We don't care how we build it, that's irrelevant. And that might change, that might iterate a dozen times in a year: hey, we have to refactor all this, we have to change this, we need to build these new features, and these old ones nobody cares about. But at the end of the day, everybody's focused on how we delight our customers. To the point where they're like, hey, model registry is awesome, I would never want to use a platform that doesn't have this baked into it. I've heard people say exactly that: hey, we're sticking with ML on Databricks because we have this registry. Even big customers say that. It's like their number one favorite thing about MLflow. They're like, yeah, tracking's cool and everything, but we know what's running in production, and we know we have versions associated with it. And it's like, man, that's really cool. That's what delights people that are working on these things, I think: to hear that and realize we're doing this not just for one or two people, we're doing this for thousands of people.
Ankit_Mathur:
that's like a super unique thing you get to do at Databricks. Like
Ben_Wilson:
Yeah.
Ankit_Mathur:
I have a lot of friends in software engineering, right? Most people don't get to go talk to customers, but we get to do that all the time. And I think that is by far the most rewarding thing that I get to do. You hear customers say nice things, and mostly not-nice things, right? Because mostly when you're on the call, they're like, hey, we love this, but we would love this more. And that's awesome, because it means that they care enough to tell you that.
Ben_Wilson:
Yeah, I was just about to say something similar. I think everybody loves the negative feedback more, because it's almost like a challenge. It's like, hey, now we have something really cool to focus on that's going to be really challenging to build, so let's scope it and do a design doc around it. But
Ankit_Mathur:
Yeah.
Ben_Wilson:
that whole process is just exciting.
Ankit_Mathur:
99% of things are just born and then go away, right? Like products and software. And if someone is giving you feedback, that means they're using it.
Ben_Wilson:
Yeah.
Michael_Berk:
Yeah, that makes a lot of sense. Do you guys have any negative things to say about Databricks software engineering?
Ankit_Mathur:
Hmm.
Michael_Berk:
Because all of you guys have worked a variety of prior jobs. So are there some, maybe negative is the wrong word, but cultural differences from those prior jobs that are notable?
Ben_Wilson:
Are there cultural differences that are notable? Yeah, thousands. But they're all in Databricks' favor. It's not like, oh, I wish that Databricks was like this old company that I was at. I've never thought or said that.
Sue Ann_Hong:
I have.
Ben_Wilson:
Ha ha ha
Sue Ann_Hong:
Also, I used to work at Facebook when it was Facebook, right? And
Ben_Wilson:
Hmm.
Sue Ann_Hong:
like move fast and like not break things, whatever, was real. I think they're probably a little slower. I'm not sure actually have, but haven't been there right in like six years. But I do think at least at that time when I came to the But there's a reason. It's because we build, right? It's more B2B, like we build for the customer. Like you need to build the right thing. Whereas I think at Facebook, it was more like a lot of consumer features. It was very, very like experimentation based. You would just build out features really fast and see what sticks. So I think I missed that, but you know, I think I'm now more used to very like trying to build that right thing for the customer.
Ankit_Mathur:
Yeah, I would say something along the same lines, and the example I'll use is Apple. I think when I was there, you really got a sense of just a really, really strong and relentless focus on perfection and beauty, like building a beautiful product that is, just out of the box, the best experience. And everyone there is culturally aligned around that. And it honestly feels awesome when you ship something and, at their conference, WWDC, you see it on the screen and it looks beautiful. There is a real satisfaction in that, which you just don't get as much in B2B software, right? Because, like Sue Ann said, you gotta get it out the door, get it to people. If it has a little bit of a rough edge around the corner here, a little bit of something there, it might be okay, right? Because you're focused on really delivering business value. So I do miss, sometimes, that really, really sharp focus on perfection and shooting the moon on a vision. But I think it evens out over a broader period of time.
Ben_Wilson:
What do you think is the biggest difference between a company that focuses on that business value with respect to a product being released, even for the initial GA release of something? If you're a B2B software company, you're like, hey, we're kind of focusing on the back-end functionality here. So how would you differentiate that from when you have a commodity product that's all about fit and form and how things look?
Ankit_Mathur:
I mean, what I'll tell you is that at Apple, people can work on things for six, seven, eight years and only then see it come out. And, by the way, you can't talk to anyone about it that whole time. You just can't do that here, right? You have to ship it faster. Eight years from now, some other company is going to be out-competing you, in like one day, right? So you gotta just get stuff out when you're in a competitive space, and you don't have time for perfection, maybe.
Sue Ann_Hong:
But I think where you do have to be more perfect is stability and security. Those are like super, super important, which is why the UI may look a little bit janky here and there, but that's what we focused on, right?
Ankit_Mathur:
Exactly. Security is the biggest one by far.
Sue Ann_Hong:
Yeah, yeah.
Michael_Berk:
Yeah, it's because it's a data platform.
Sue Ann_Hong:
No.
Michael_Berk:
Yeah.
Ankit_Mathur:
Thank you for that.
Ben_Wilson:
Plus, on the stability aspects, we've danced around the topic a little bit on the podcast so far of what this V2 serving really is. It's enabling companies to have trust in a managed platform. You don't put models out there to do art generation; you put them out there to make your company money, or to protect your business, or to prevent fraud and theft. Some of the use cases that are going to be running on this service are so mission-critical to some companies and some industries that imagine what it would cost them if, in a 72-hour period, this went down and just was not serving predictions. So yeah, on the back-end part of it, okay, this actually has to work, and all the time. It's super impressive, in my opinion, what your team has built, how well it works, and how people can have the trust that, okay, it's safe and secure, and it's just going to run. So, well done.
Ankit_Mathur:
And what's funny is you said "all the time," and that's like three words and sounds super simple, but it's actually so much work to get something "all the time."
Ben_Wilson:
Oh yeah.
Sue Ann_Hong:
We've had customers also tell us they're really glad they're not on call for this, right? We are on call for that.
Michael_Berk:
Cool.
Michael_Berk:
All right, well, I have one final question, and it has two parts. Zooming out, what are you guys excited about in the model serving world, not just at Databricks? And then, second part, what are you guys excited about in ML more broadly?
Ankit_Mathur:
Okay. I think in the model serving world, we're starting to finally see wide-scale deployments of deep learning making a really, really big impact in businesses. And model serving is actually the number one aspect that is going to control how much business value those things are actually able to provide. You have to be able to serve those things really, really cost-effectively to realize the impact in the world that they're capable of creating. So I think there's a really, really exciting opportunity in this space in the next three, four years to build awesome, highly performant products that are taking the silicon that we have to its absolute maximum capacity. And that's going to be basically essential; otherwise, you can't create business value with these things. And then, similar answer if you take a step back and look at machine learning more broadly: even relatively small models are getting pretty damn smart. So seeing the progress on that, seeing how you can go and just train on trillions of tokens of data, and even a relatively smaller model can perform really effectively, that's awesome. You can run it on your phone, your laptop. There's endless opportunity to create awesome new customer experiences with machine learning.
Sue Ann_Hong:
Yeah, I guess that was said so well, I don't know how much more I can add. But yeah, I'm really excited to see what people are going to do with these models. I think we haven't even seen, and this is super cliche, but, like, the tip of the iceberg. People are going to be super creative, I think.
Ben_Wilson:
Yeah, I can harken back to my previous career prior to getting into software. On the serving front, I'm really excited for the commoditization of GPU compute, with AMD now focusing away from HPC and into the regular data center. When their silicon gets into more data centers and they make the strides that they haven't been making, which Nvidia has exclusively been making... but now they're doing it. The last three, four years, they're like, hey, we're gonna start supporting consumers so that you can use these particular cards and do deep learning with them. When that happens, it's gonna do the same thing as the original GPU war over video game GPUs that happened 10 years ago, where Nvidia got there first, with prices creeping up higher and higher for these cards, and then AMD was like, yeah, we got this too. But they could undercut the price. With the foundry operation behind AMD up in Upstate New York pumping these things out, the new GPU chipsets, and TSMC lowering its operating costs as they do sub-five-nanometer GPU lithography, there are going to be more opportunities for the proliferation of this. From the ML perspective, I can't wait for the next iterations of some of these deep learning models. I want to know what's after the Transformer's attention model. I know that there's a lot of research being done right now, a lot of universities are focusing on this, but I want to see what's going to be possible in the next three years. Maintaining the same capabilities we have right now while shrinking the size of a model by 100x, that's coming. It's inevitable. And I'm looking forward to that.
Michael_Berk:
Well said. All right, well, I will summarize and wrap. So today we talked about model serving and a bunch of other random things, per usual. Starting off, when thinking about model serving, there are a bunch of options: you can use SageMaker, Azure ML, GCP Vertex, or Databricks Model Serving. Everybody on this call is biased, so do your own research, but we like Databricks Model Serving. And then, when thinking about side projects to develop your skill set, for software or ML or software for ML, it's often really, really helpful to work on stuff you're passionate about, because getting something over the finish line that you don't care about is really, really hard. Try it and you'll see. Ankit also made a really good point about focusing on constrained problems: don't try to solve general AI unless you really, really want to; try to focus on smaller, niche things that you can actually tackle. And getting your hands dirty trying the newest models leads to developing intuition, which is extremely valuable as a software engineer. Then, some traits of good software engineers include learning fast, willingness to fail, and collaboration, and all of these sort of fall under the umbrella of extreme humility. And from a product development standpoint, it's important to talk to customers and also know the fundamentals of computer science. If people want to learn more about model serving or about you guys, where should they go?
Ankit_Mathur:
Reach out over email, LinkedIn, whatever you feel like: Ankit at Databricks. Happy to chat with anyone who has anything interesting to talk about, machine learning or otherwise.
Sue Ann_Hong:
Yes, and then you will talk to Ankit. No, I'm just kidding. Yeah, I'm still at Databricks, so, same thing.
Michael_Berk:
All right, well, thank you guys so much. Until next time, it's been Michael Berk and my co-host.
Ben_Wilson:
I'm Ben Wilson.
Michael_Berk:
And have a good day, everyone.
Ben_Wilson:
Take it easy. See you next time.
Ankit_Mathur:
Thanks folks.
Sue Ann_Hong:
Thanks.