Integrating Business Needs and Technical Skills in Effective Model Serving Deployments - ML 184

Show Notes

Welcome back to another episode of Adventures in Machine Learning, where hosts Michael Berk and Ben Wilson delve into the intricate process of implementing model serving solutions. In this episode, they explore a detailed case study focused on enhancing search functionality with a particular emphasis on a hot dog recipe search engine. The discussion takes you through the entire development loop, beginning with understanding product requirements and success criteria, moving through prototyping and tool selection, and culminating in team collaboration and stakeholder engagement. Michael and Ben share their insights on optimizing for quick signal in design, leveraging existing tools, and ensuring service stability. If you're eager to learn about effective development strategies in machine learning projects, this episode is packed with valuable lessons and behind-the-scenes engineering perspectives. Join us as we navigate the challenges and triumphs of building impactful search solutions.

Transcript

Michael Berk [00:00:05]:
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I do data engineering at Databricks. And I'm joined by my amazing cohost, Ben Wilson.

Ben Wilson [00:00:14]:
I do quarterly planning at Databricks.

Michael Berk [00:00:18]:
For the whole organization? Oh, most certainly not.

Ben Wilson [00:00:21]:
For one team. Nice. That's just what we're working on right now, in this sprint and the next sprints.

Michael Berk [00:00:30]:
Lots of great work. Like quarterly planning?

Ben Wilson [00:00:33]:
Do I like quarterly planning? I like the idea of being able to collaborate with a team of super smart people who have amazing ideas and then working through the process of distilling that down to what are we gonna work on that's gonna have the most impact in the least amount of time, and how can we sneak in all of the stuff that I worry about? Just like stability and robustness of the service and making sure that we get bug fixes and stuff done.

Michael Berk [00:01:08]:
Yeah. I heard signatures can break a lot of stuff, so probably want to test that.

Ben Wilson [00:01:14]:
A little bit. I think that was the biggest regression. It's definitely the biggest regression I ever caused in my entire career. It was, like, almost three years ago, with one of my first big PRs in MLflow, adding support for additional signatures, and then realizing after release, like, less than six hours after release, that we had no test coverage for any of that stuff. And then realizing that anytime you deployed something to model serving, it would just detonate. So a quick scramble to fix what I screwed up and a rerelease of MLflow, all of that done within twelve hours.
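
The kind of gap Ben describes here is usually caught by a round-trip check: log a model with a signature, reload it through the same pyfunc interface that serving exercises, and score the input the signature was inferred from. A minimal sketch, assuming an MLflow 2.x-style logging API and a scikit-learn model; the model, data, and column names are purely illustrative:

```python
# Round-trip check for a logged model: log with an inferred signature, then
# load it through the pyfunc interface (the path model serving exercises)
# and predict on the same schema. Names and data are illustrative.
import mlflow
import pandas as pd
from mlflow.models import infer_signature
from sklearn.linear_model import LogisticRegression

X = pd.DataFrame({"calories": [150, 300, 450], "toppings": [1, 2, 3]})
y = [0, 1, 1]

model = LogisticRegression().fit(X, y)
signature = infer_signature(X, model.predict(X))

with mlflow.start_run():
    info = mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)

loaded = mlflow.pyfunc.load_model(info.model_uri)  # same load path serving uses
assert len(loaded.predict(X)) == len(X)
```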

Michael Berk [00:02:03]:
Who reported it? Was it a customer?

Ben Wilson [00:02:07]:
No. That was somebody in the field who just happened to, luckily, be doing testing, and they hadn't pinned their install of MLflow that afternoon. And they were like, something's weird here. And I was like, what do you mean? And got the repro code, whipped something up, and I was like, oh, no. Everything is broken. This is not good. Mhmm. Yeah.

Ben Wilson [00:02:34]:
Panic. Nice. Extensive root cause analysis. It's the reason we do integration testing now. It's the reason that we have bug bashes. It's the reason we do release candidates. Yeah. Yeah.

Michael Berk [00:02:48]:
We'll get to the official topic in just a sec, but I was wondering this myself as I'm learning to write code: whether all of this has just been learned the hard way by someone before us, or learned the hard way by us. Do you think there are, like, axioms of design that you can take from other fields, or is it just sort of an iterative process where someone has created the exact same issue, learned from it, and now there are these practices like GitHub, tests, things like that?

Ben Wilson [00:03:26]:
I think it's human nature to kind of open the box, poke it, and see what happens. Right? With everything that we do as a species, we're very curious, and we don't always think about repercussions of things until after the fact. That reactionary response, and also what that extends into, sort of thinking through, post hoc, an event that happened and how it could apply to other things that we do, that is definitely something that happens, and it's pretty common. But the first time something really nasty happens, it's usually, you know, we poked it with a stick and found out, oh, no. Like, this is bad. So let's make sure we don't ever do that again. Yeah.

Ben Wilson [00:04:21]:
Exactly.

Michael Berk [00:04:22]:
Okay. Cool. So today, we are talking about model serving. And model serving has a bunch of different aspects to it, but we're gonna be using an interesting case study where we have a search engine. And the search engine is surfacing hot dog recipes. And we have a variety of ways to make a hot dog. We can add mustard. We can add ketchup.

Michael Berk [00:04:42]:
We can get crazy and add chocolate. And we basically want to have the most robust, stable, low latency, and, quote, unquote, best feature possible. And this is inspired by a case study that I had to work on a few months back. It was super successful, but, basically, the problem statement that we were given is exactly that: we have an existing search engine. We want you to make it better. Go. So, Ben, how do you start?

Ben Wilson [00:05:12]:
I mean, I think I start the same way that you did when we talked about this project a number of times. Like, well, we need design talks. We need to know... you can go, like, the Heilmeier approach, where you're like, what is it that we're trying to build here? Leave all the technical jargon out of this. Leave all the solutioning out of this. What is the product that we're building? What exists today? What do we need to do to make it better? And that can help formulate our process of design later on. So I always start there. It's like, why are we doing this? What is this gonna build for us or gain for us?

Michael Berk [00:05:56]:
Cool. So this is a really interesting and challenging and fun question. For our hot dog recipe lookup, we have a very popular website. It serves, let's say, a hundred thousand users, and, actually, let's just have that be the only constraint: we wanna make this better. So our line of business is selling hot dogs, and let's say we make money on our website via purchases of our hot dog products or kits or recipes or something like that. We just sell stuff.

Ben Wilson [00:06:31]:
Mhmm.

Michael Berk [00:06:32]:
So how would you go about defining what makes this better? And, again, as a data scientist, this is more of a product question, but, sometimes data scientists need to be involved in this design. And if you're given this make it better request, you often are doing a lot of the design. So how would you approach designing this to improve the search functionality?

Ben Wilson [00:06:53]:
Yeah. You always gotta start... I mean, to second what you said, as a data scientist working on a project that is product related, I don't think you can avoid putting on the product hat, and I don't think you ever should. Because it's super critical to know what it is that you're working on or what you're trying to improve. You have to understand the product. So the first thing that I would do is use the product. Like, okay, somebody said our search engine sucks. I'm gonna go and search some stuff that is very adjacent to things that our customers probably search for, because I'm probably not in the market for looking for hot dog recipes.

Ben Wilson [00:07:38]:
I don't eat them very often. I would probably pull the logs of the search and say, like, what are people using the search engine for right now, and what actually returns results? And you could do that, you know, analytically by pulling the raw data and seeing, like, oh, titles of things that return from the search. Hopefully, you are collecting those logs, and they're in a way that you can access them. If it's not that and, like, the logs are in a place that you just don't have access to, then I would just use the tool and make a bunch of educated guesses of, like, okay, I'm gonna search for hot dogs with pickles and, you know, best New York hot dog recipe. Just a bunch of random stuff. And then see what's returned and record that so that I have a mental model and a physical model, like, data that shows whether this is good or not. Mhmm.

Ben Wilson [00:08:38]:
And if this is a

Michael Berk [00:08:39]:
a pet.

Ben Wilson [00:08:40]:
Then you go and talk to the person who asked you to build it and say, what do you want improved? What's so bad about this?
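
To make that log-pulling step concrete, here is a minimal sketch of the kind of analysis Ben describes, assuming the search logs have already been exported somewhere tabular; the file name and the query/result_count columns are hypothetical:

```python
# Quick look at what people actually search for and which queries come back empty.
# Assumes a hypothetical export of search logs with columns: query, result_count.
import pandas as pd

logs = pd.read_csv("search_logs.csv")  # placeholder export location

top_queries = logs["query"].str.lower().value_counts().head(25)
zero_result_rate = (logs["result_count"] == 0).mean()

print("Top queries:")
print(top_queries)
print(f"Share of searches returning nothing: {zero_result_rate:.1%}")
```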

Michael Berk [00:08:49]:
Cool. So let's narrow it down a bit. Let's say they think that the search is too narrow. They have a very simple Elasticsearch that just does exact matching on prefixes of search terms. So if I type in h o t, it will find all things that mention hot. So with that, how would you go about... this is jumping the gun a bit, but I'm very curious about your take. How would you go about evaluating the different components of this piece of functionality? Would you do A/B testing? Would you do offline testing? Would you do integration testing? How would you approach the testing and evaluation aspect?

Ben Wilson [00:09:36]:
For, like, comparing what exists to whatever it is that we're gonna be building? Mhmm. I wouldn't do anything fancy. I would probably brute force it. So if we have two endpoints that are available, one is the existing search API, and then we're gonna be building something that might replace that, I would write, like, a simple script that just sends requests from a static dataset that I would curate and see the result difference between these. And then, depending on the nature of the data returned, do I need to get, like, relevancy scores? Maybe I'd use GenAI to do that if it's a large corpus of data. If it's, like, in the early stages, I'd probably start with, like, 20 or 30 questions or search terms and just use my own brain to determine, like, is this good or not, or how relevant is this, and just human-curated annotation. Yep. This original search query term, this sucks.

Ben Wilson [00:10:48]:
Like, the results are not helpful. And then you use that as a baseline and be like, okay, our new stuff is better.
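
A rough sketch of that brute-force comparison: send the same curated queries to the existing endpoint and the candidate one, then dump the responses side by side for human annotation. The endpoint URLs, query parameter, and output file are assumptions made for the example:

```python
# Send a curated query set to the old and new search endpoints and save the
# responses side by side for manual relevance annotation.
# Endpoint URLs and request/response shapes are placeholders.
import json
import requests

QUERIES = ["hot dogs with pickles", "best new york hot dog recipe", "chocolate hot dog"]
OLD_URL = "https://example.com/api/search"     # existing prefix-match endpoint
NEW_URL = "https://example.com/api/search_v2"  # candidate semantic endpoint

rows = []
for q in QUERIES:
    old = requests.get(OLD_URL, params={"q": q}, timeout=10).json()
    new = requests.get(NEW_URL, params={"q": q}, timeout=10).json()
    rows.append({"query": q, "old_results": old, "new_results": new, "annotation": None})

with open("side_by_side.json", "w") as f:
    json.dump(rows, f, indent=2)
```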

Michael Berk [00:10:56]:
Right. Okay. Cool. Yeah. For this project, we did something a little bit similar. We created a very simple front end, and we had side-by-side results of the prior system and the system we were working on. And that really helped business stakeholders understand the improvements to search. So if you type in... what's a semantically related word for hot dogs? Delicious, let's say.

Michael Berk [00:11:26]:
You don't want just a prefix match on the word delicious, because there aren't that many products that we have that say the word delicious in the title. Instead, we want to look at reviews semantically, let's say, and see where people are calling things tasty, fun to eat, whatever it might be. Any synonyms for delicious. And then return those results and those products. So by having a side-by-side app in the exact same user interface, it really highlighted the improvements to the search quality.

Ben Wilson [00:11:54]:
Yep. Cool. Yeah. I mean, I would do that later on, like, after I built something, if it warranted it. If the customers were people that are, like... oh, it's a bunch of software engineers that are just trying to build this thing, make it better, because there are issues with the implementation that exists. Mhmm. Maybe raw data is fine.

Ben Wilson [00:12:17]:
Like, side-by-side JSON, don't need to build a web UI. But if it's like, oh, this is the business. Like, there's some C-suite person that's like, I'm interested in this becoming better because this frustrates me. I want something where I can see this side by side. Then, yeah, build a simple React app that will show that sort of left and right comparison, maybe with, like, contextual highlights of what it actually found as a significant term as part of that search query. There's all sorts of clever things you can do and modules you can use.

Michael Berk [00:12:53]:
Right. Yeah. Exactly. Okay. Cool. So so far, we've decided what to build. And, again, hopefully, product will inform this to some degree, but you should also put on the product hat yourself to get intuition about it. And then we've also generally defined our success criteria.

Michael Berk [00:13:11]:
So latency is somewhere in there. Search quality is somewhere in there. And we've, let's say, defined how we're gonna measure this, whether it be JSON or offline testing or you name it.

Ben Wilson [00:13:20]:
Mhmm.

Michael Berk [00:13:21]:
What is your first prototype?

Ben Wilson [00:13:25]:
As dirty as possible. You wanna, like, fail hard and fast as early as possible when you're doing a proof of concept, not worrying about code quality. You're not worried about readability. You are going into full hacker mode and just making something work. You wanna do stuff like have a bunch of conjectures of what the actual systems architecture will look like for this. So you look at what is existing originally. Like, what is that stack? How are they actually serving responses here? What is the back end for the front end? Okay. You said before it's Elasticsearch.

Ben Wilson [00:14:09]:
Awesome. Great solution. If we're gonna make this better, do we go into using managed ES that has some of the more advanced, like, similarity characteristics? Do we take the entire search term and do, like, distance to find similar things? Does that work? So I would play around with that to see, do we have a quick win here that makes it significantly better than what is existing, other than, you know, an as-you-type, basically, completion search engine, which sounds like that's what it was before. So can we get it done with the existing tech and make it work? I don't think that's the case for this, because of, you know, the sort of project that we're talking about. So we probably wanna use something a little bit more advanced and more modern. So if we're going into the GenAI world, I'm like, okay, doing similarity on embeddings, we're gonna have that sort of comparison. We need some sort of, you know, RAG system.

Ben Wilson [00:15:16]:
It's like, I need to find the document that most resembles this, and rank order the search based on that relevancy. Right. So I would just do that as quick and dirty as possible. Okay, I have all of my data somewhere stored in ES, and that should be backed up somewhere on object store. Let's just embed, like, 5,000 articles that are all somewhat similar, but have enough samples that are very, very different in context. Load that up. Like, calculate all those embeddings, store them in a vector store, and then just start querying them.

Ben Wilson [00:15:54]:
Like, does this make sense?
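
A quick-and-dirty version of that embedding experiment might look like the sketch below, using a sentence-transformers model as the embedding source and a plain in-memory numpy array standing in for a real vector store; the documents and model choice are illustrative, not what the project actually used:

```python
# Embed a handful of recipe documents, then rank them against a query by
# cosine similarity. An in-memory numpy array stands in for a vector store.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Classic New York street cart hot dog with mustard and onions",
    "Chicago-style hot dog with pickles, tomato, and celery salt",
    "Chocolate-dipped hot dog for the adventurous eater",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode(["something tasty with pickles"], normalize_embeddings=True)[0]
scores = doc_vecs @ query_vec  # cosine similarity, since vectors are normalized

for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```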

Michael Berk [00:15:57]:
Right. So one probably very deep question for you that could be six episodes. I've been working with some really fancy engineers at Databricks and looking at their prototyping code. It's a private preview feature, and I've had the privilege to go and read the repository. And looking at the code, there's a bunch of discrepancies in style and, overall, it seems like in quality. Like, there are better ways to do some of the things that they're doing. But it's clear that they're just hacking stuff out in a sort of way that works, and they don't really care about the, like, intricacies of style. So... Right.

Michael Berk [00:16:40]:
I thought that was super important. And after seeing this, I was like, senior staff engineers writing up, like, 50 if statements in a row? Noted. Yep. So when you said quick and dirty, what should not be skipped, quality-wise?

Ben Wilson [00:16:58]:
So there's a difference between me filing a PR against the code base where I mark it as draft and tell people, don't review this. Like, this is a proof of concept. I'm just trying something out, but I want CI to run on it. That lets everybody else know that I have my dirty hacker hat on, and nobody's gonna look at it. Or if they look at it, it's not a judgment type thing of, like, what the hell did Ben do here? Why would he file this PR? They see that as a direct indicator, like, yeah, he's just trying to make this work. Cool. Maybe they'll look at the implementation and suggest, like, oh, we should probably add these, like, seven other if statements here for this edge case.

Ben Wilson [00:17:45]:
But nobody's like, why are you doing it this way? Why aren't we adhering to proper software design? Like, how is this even testable? Where are you gonna test? Nobody's doing that because we know what that phase of development is. Most of the time, we're doing that locally. We'll fork a branch. We'll test it locally. If CI can run on our local machine, we'll do it that way. And then once we have something that kinda works, then you have that open on one screen, and then you have a fresh branch on another, and you're converting that garbage code into something that is proper.

Michael Berk [00:18:24]:
Yeah. I still don't get it, though. For our use case, what are the things where you would really think about and design properly? And what are the things where you'd just be like, 50 if statements is fine?

Ben Wilson [00:18:37]:
I don't worry about implementation details when I'm getting something to work, because I know how to do that. What I want is a service that does this thing so I can determine, does this even work, like, based on the data that I have? And, like, I don't know if this semantic search, if this approach with using RAG, is the right thing to do. I'm not gonna sit there and write a whole bunch of code that properly interfaces with the vector store when I can test that out in a command prompt. I can just hit it with a curl request and be like, yep, that seems legit. Because the curl request is gonna take me thirty seconds. Writing all that code to make it look good is gonna take me six hours. And filing that PR, that's another ten minutes, and then pinging people for review. If I don't even know if that's gonna work, I just wasted six and a half hours, and I wasted other people's time too without even knowing if this works.

Ben Wilson [00:19:41]:
Because if it doesn't work, I now delete that PR and wonder why I spent the whole day of my life working on something that doesn't work.

Michael Berk [00:19:50]:
Got it. So I think it clicked in my brain. The thing that you are optimizing when building a component is the amount of time it takes for you to write stuff that will give you signal on whether that component is a valid design

Ben Wilson [00:20:07]:
choice. I think that perfectly encapsulates what a hack is.

Michael Berk [00:20:12]:
Cool. Alright. So let's say there are, in your brain, 15 different design points. Do we do this or do we do that? If we go down route x, then we'll have these other six things that we need to answer. If we go down route y, we have these four other things we need to answer. So let's say you've built out a prototype that generally maps this search space of design, and you have an optimal design, or at least a seemingly good design. What are your next steps?

Ben Wilson [00:20:42]:
There's a step that you're missing there. Mhmm. What tools do I have available to me? Not what tools exist, but what can I use? So when you're working on a project that is self-contained, where you control the entire environment and you can use any tool you want, any open source tool, any tech, doesn't matter. Startups are like this. Right? Like, you can just choose whatever works the best. That's a very different story than deploying a service at a company that already has a tech stack, already has services that have been vetted, that you know how to... like, there are other people that know how to run those. You might be interfacing with other teams.

Ben Wilson [00:21:27]:
There could be somebody else who's on call for this thing that you're using. You need to involve people in that decision process of the design and very clearly articulate why you can't use something that's already in existence at your company. I know a lot of people that get into this field from another field, like data scientists who move into, like, software engineering or MLE. They're like, why can't I use this, like, super cool open source project that does all this amazing stuff at my company for this production deployment? And the people that would be downvoting that, saying, no, we don't need to do this, they've been there, done that, and, like, introduced some new tech that doesn't check all the boxes, or is, like, super insecure, or is not stable, or has some maintenance burden on it that nobody understands, or they don't have capacity to bring it on board. So you have to be very careful about that stuff. Like, what do you have that's already been built that you can use? So sometimes that means you've got to get clever with your implementation to be like, well, yeah, for the data serving layer of this thing, we wanna use, like, MongoDB. We don't have a Mongo account.

Ben Wilson [00:22:49]:
We don't have that service at our company, but we have AWS. Can we do this with Dynamo? Because that's vetted and approved. Okay. Change it to use that instead.

Michael Berk [00:23:04]:
Yeah. That's a good point. And, also, if we're iterating on an existing system, I'm sure a lot of the existing system can be repurposed.

Ben Wilson [00:23:12]:
Exactly. And that's the key goal. Like, we do that at Databricks. Right? We spin up a new service that does this new cool thing, and a lot of it is reusing components.

Michael Berk [00:23:25]:
It's hilarious.

Ben Wilson [00:23:26]:
Yeah. I mean, it's critical. Like, if you saw what it was like to onboard a new service, you would realize why people are like that. Because that can take, like, a year of validating stuff and then building the infrastructure that allows that service to actually run. There are so many back end components too. Something that's like, well, I need to deploy this to every data center in the cloud. What's the mechanism for that? Do we have all of that set up? How do we do change management with that? When we need to bump the version of this and do a migration, what does that look like? Oh, somebody needs to build all that code that does a migration without a service disruption. How do we do that? You know, what seems like a really small and insignificant thing for your little project...

Ben Wilson [00:24:13]:
All of a sudden, it balloons out of control, where you're like, I need to involve 17 different engineering teams to just build this. Right? Like, how much capacity do we need? That's 200 people's time for six months. Not gonna happen. But you have something that's already there that you can just, like... oh, I can just call this API, and I can spin up a new service. It doesn't do exactly what I want, but it's pretty close. Okay. Let's do that.

Michael Berk [00:24:41]:
Exactly. Yeah. It's been funny to watch new Databricks features get released, and, like, half the time, it's just, like, a packaging around an old service. And it does something different, so it's super valuable, but the back end is the exact same thing. So it's, like, nice to be able to just transfer your knowledge over to that new feature.

Ben Wilson [00:24:59]:
Yes. And that is intelligent product design.

Michael Berk [00:25:02]:
Mhmm.

Ben Wilson [00:25:04]:
Look at all the cloud providers. They do the exact same thing. Like, a lot of the AWS services that are out there, they're building on the foundation layer of things that have come before, and they're building new products that are just... when you look at what it's doing, you're like, hang on, isn't this just using Fargate? Like, Kubernetes pod management, and then, you know, kubectl stuff is, like, orchestrating this. You look at it and you're like, yeah, that is what it's doing. And it's really clever that they built a new product that leverages this. So really smart engineering organizations do that extensively.

Michael Berk [00:25:42]:
Yeah. Just out of curiosity, what is the AWS backbone? Dynamo, EC2... what else?

Ben Wilson [00:25:51]:
S3.

Michael Berk [00:25:53]:
Isn't S3 on Dynamo, though?

Ben Wilson [00:25:58]:
No. S3 uses Dynamo for its location lookup, for, like, where... Exactly, a hash table. But it's a separate service, a separate technology from object store. Got it. But you look at, like, AWS Lambda. Right? That's their serverless architecture for executing arbitrary functions. So much stuff runs on that because it's intelligent design.

Ben Wilson [00:26:28]:
They even open sourced the thing too, which is crazy.

Michael Berk [00:26:31]:
Mhmm.

Ben Wilson [00:26:32]:
But there's, like, EKS, the Kubernetes backbone for AWS; so much stuff runs on that. Fargate is an abstraction layer on top of that, like a serverless abstraction. And it works at stupid scale, and Azure has the same thing. Right? Same sorts of infrastructure, and they're doing the same sort of build on these building blocks.

Michael Berk [00:26:59]:
Right. Yeah. Okay. Cool. Fun little aside. So back to the question at hand. We've sort of iterated... well, first of all, we selected some tool stacks, and then we've iterated through the design space and have a general idea of what and how we wanna build. I guess the next question is, how do we build it?

Ben Wilson [00:27:22]:
Yeah. What does the team makeup look like? Mhmm. Who's putting hands on keyboard?

Michael Berk [00:27:31]:
Right. So let's say it's two engineers.

Ben Wilson [00:27:37]:
And you're bringing in an interesting caveat to this, because the two people working on it don't work for the company. So this is consultancy work, which is a different dynamic than building it in house.

Michael Berk [00:27:53]:
Right. Let's say yes for this use case.

Ben Wilson [00:27:57]:
Yeah. So with consultants, you're on the hook for building something and delivering something that you're getting paid for, or your company's getting paid for. So the two extremes of that are: one, you're doing team embedding. So you're helping out a development team, and you're contributing code with them. And they're doing the product management side of the house. They're giving statements of work to you, like, hey, we need help with these five elements, and you own that end to end, but here's the contract. Like, the input is gonna be this, and this is what we want as an output of the service.

Ben Wilson [00:28:42]:
Go build that thing, and here are the specs. That's more pure consultancy work, or you're a coder for hire. The other extreme of that scenario is you get hired as a team of one to 10 people, and the company is like, we want this magical thing as a service. Go build it for us. And they have no direction other than, here's the end result that we want. Go figure that out, smart people.

Michael Berk [00:29:18]:
Mhmm.

Ben Wilson [00:29:19]:
And you usually have no interaction with the team at the company. You might be talking to some executive or somebody in upper management who's just handed you the statement of work of what they expect, and they don't care how you build it. They just want that thing to work.

Michael Berk [00:29:38]:
Right.

Ben Wilson [00:29:40]:
My question to you: where on those two extremes do you like to operate?

Michael Berk [00:29:51]:
They're actually fun for different purposes. I've been really enjoying design recently, but sometimes just banging out code is what you need emotionally. So I think it really depends. I don't know that I have a strong preference either way. Right now, slightly leaning design. But what about you?

Ben Wilson [00:30:16]:
I mean, I don't do any of that anymore. Everything we do is, like, immediately right in the middle of that. Mhmm. Right? We have a team. We know what our team's capacity is, what our charter is, and we own the design and the implementation and the maintenance of that. So there are a handful of situations where sometimes we'll get help from a consultant. Not a consultant, a contractor. It's like, hey.

Ben Wilson [00:30:49]:
We just need somebody who's really good at front end to help us out, because we just don't have capacity. But we do have a deadline of this date, so we just need a gun for hire to come in and bang on some code. We're doing the design for that and then handing over a spec. Like, here's what we need, and then we're reviewing every PR that's involved with that. Right. But when I was doing stuff like what you do now, I didn't enjoy the latter one that much, where it's more like, here's a vague list of requirements that we want, go figure this out. I preferred, on that side, like, engagement with the team, or engagement with subject matter experts at the company.

Ben Wilson [00:31:48]:
That's, like, what I used to do when I was a data scientist at companies, or doing ML engineering at companies. I had to go and talk to people and be like, what do you think we should do, and what do you like about this? What don't you like? What are your expectations? That's fun, and that's more like the product engineer sort of design side of the house. But the full extreme is very frustrating. It's like, yeah, I have access to your systems, but I don't understand your business completely. And Mhmm. I can't read your mind. And when you go in and try to talk to people, they just don't have anything to say.

Ben Wilson [00:32:27]:
You know? Yeah. Anything is better than this. Like, well, okay. And that leads you to the point of needing to learn very quickly what's the difference between good enough and perfect.

Michael Berk [00:32:42]:
Alright. Yeah. Before we define those, that was what I had to do for this prior project. They were basically like, our current system sucks. Everybody knows it. Can you guys make it better? And we hammered them for SLAs and success criteria, and we finally got a latency requirement under a certain load. And I think there was one other request, but beyond that, they were just like, make it better. And the backbone of the reason why this worked is we built that front end app that allowed side-by-side comparison and allowed a bunch of users to go and explore it themselves and submit feedback.

Michael Berk [00:33:24]:
We had a little, like, feedback bar in that front end that would submit to a Databricks table so we could actually look at the feedback. And then, also, it clicked in people's brains. They're like, oh, the old search didn't do this. Now it does this. That's so great. What about this feature? And it was pretty inefficient, but at the same time, it built a lot of rapport. And the business stakeholders were really happy that we were really taking what they said seriously, because, honestly, there wasn't a product person in the room, and so they didn't know... like, there was no product spec. So it was just intuition by business stakeholders.

Michael Berk [00:34:02]:
And so we iterated together and it worked, but, it would have been nice if there was a product spec somewhere along the road.

Ben Wilson [00:34:10]:
I mean, that can be exciting, where you're just given free rein to make something amazing. Exactly. The scary part for me historically in those situations is I know the limitations of my own creativity. And the fewer human minds are involved in that, the worse it's gonna be. Or Mhmm. You have a probability of overbuilding something. Because if you're not in that ecosystem, and you're not, like, a user of that platform or somebody who actually works there and understands the customers or whatever the use case is, you could start going off the rails of, like, well... I tried this really crazy thing, and this totally broke this. So I need to engineer something around this so that this doesn't happen.

Ben Wilson [00:35:04]:
And then you launch to production, you pull the logs, you start looking at what people were doing, and you're like, why did I even build that? Like, nobody uses this. We don't have a single instance, after a million queries, of anybody doing this stupid thing that came into my head of, like, how could I break this system? And you realize that you just built something that... you ain't gonna need it. Like, it's worthless.

Michael Berk [00:35:33]:
Definitely. Yeah. That's actually hilarious you bring that up. We load tested pretty stringently. And, like, after it launched into production, my counterpart and I on the Databricks side queried the logs, and it was, like, a hundredth of what we tested for. So it's a very seasonal query amount, but still, like, we could have half assed it a lot more.

Ben Wilson [00:36:08]:
I wouldn't... so that's one exception. Like, service stability is something that you don't typically half ass. You set a spec, and you guarantee performance within a certain probability range for that performance, and that's more for infrastructure needs. And then that's budgeting. So you build a REST API that can serve a hundred requests an hour, and you don't ever expect that to go above, like, say, a thousand requests an hour based on the history of this search tool for all time. You don't go out and set a spec for, you know, 10,000 queries per second, because the infrastructure for that is fundamentally different. The expense of that is fundamentally different. The complexity is fundamentally different.

Ben Wilson [00:37:01]:
So you don't build that if you don't need it. But the half ass comment, that can come and bite you if it's like, yeah, we searched the logs, there's, like, a thousand requests an hour, why don't we just spec this for, like, 10,000 requests an hour? Well, at 10,000 requests an hour, you can get by with a single VM running a Flask server. Just expose that behind an API gateway. It'll cost you $5 a month to run that service. It's nothing.

Ben Wilson [00:37:35]:
But what happens when your site goes viral and you get a hundred thousand people hitting it in ten minutes? That search functionality just broke, like, broke hard. That server probably crashed. And did you build all the infrastructure for, like, instant resumption of that? Probably not, if you were thinking traffic was gonna be really low and you were like, whatever. So for stuff like that, you would wanna have a minimum bar of quality and performance that you validated, to say, like, even at peak times... like, I'm gonna test 10x the all-time peak just to make sure, and then have some sort of failover mechanism where I can protect that server from crashing.
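
A crude way to sanity-check that "10x the all-time peak" bar before shipping is to hammer the endpoint with concurrent requests and look at tail latency and error counts. A minimal sketch; the URL, request volume, and concurrency are placeholders, not real targets:

```python
# Crude load check: fire concurrent requests at the search endpoint and report
# p50/p95 latency plus error count. URL and volumes are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://example.com/api/search?q=hot+dog"  # hypothetical endpoint
TOTAL_REQUESTS = 500
CONCURRENCY = 20

def one_request(_):
    start = time.perf_counter()
    try:
        ok = requests.get(URL, timeout=5).status_code == 200
    except requests.RequestException:
        ok = False
    return time.perf_counter() - start, ok

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(one_request, range(TOTAL_REQUESTS)))

latencies = sorted(t for t, _ in results)
errors = sum(1 for _, ok in results if not ok)
p50 = latencies[len(latencies) // 2]
p95 = latencies[int(len(latencies) * 0.95)]
print(f"p50={p50:.3f}s  p95={p95:.3f}s  errors={errors}/{TOTAL_REQUESTS}")
```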

Michael Berk [00:38:21]:
Interesting. Yeah. That's a good point. And, ideally, some sort of auto scaling so that you're not spending money when you don't have to. But when traffic comes in, it'll scale up.

Ben Wilson [00:38:31]:
Yeah. Like, a simple cloud managed Kubernetes setup that can run a container that has your stuff running on it. Yeah. Let somebody else worry about it for you. Yeah. But also set limits on that. Like, don't just say, yeah, defaults are good. Because the default, depending on what service you're using, could auto scale to something where you're like, well, we just blew the yearly budget in one hour.

Ben Wilson [00:38:54]:
Yeah. Because it horizontally scaled to support this burst of traffic, because somebody's being a jerk, seeing if they can take your site down.

Michael Berk [00:39:04]:
Yeah. Heard. Okay. Cool. So rounding out this list, we've designed it. We've prototyped it. We've prototyped within the design space to select tools and finalize what we think is a good approach. And then we've also started building and assembling this team, and the team has started actually developing this feature.

Michael Berk [00:39:27]:
So what's next?

Ben Wilson [00:39:31]:
Heads down. Crank out some code. No. Set up a cadence for the team to discuss. Not everybody likes stand ups and stuff like that. Figure out something that works for the team. You need somebody who's leading all of it and setting expectations for what the project delivery timeline is. That's also another thing that I think a lot of people don't really enjoy building or adhering to, but it's super critical to know: what is the scope of what we're doing, what are the components that we need to build, who's gonna do what, when are they gonna get it done by, and at what particular cadence in building that system can we all take a break for a day and test what we've built so far? Because if you're not doing that, and you're all just heads down, like, hey.

Ben Wilson [00:40:27]:
Let's go build this. Michael, you build the RAG thing. I'll do the front end to back end, like, translation layer. We're good. Meet again in a month. You don't know, like, whether it works, whether those components work together. The scope could be completely way off.

Ben Wilson [00:40:49]:
You could estimate, like, oh, yeah, I've built web servers before, it's no big deal. Mhmm. It could be way more complex than you thought. What if you build it, and then when you merge these two systems together, the latency is a thousand x what you expected? Oops. Yeah.

Ben Wilson [00:41:07]:
Should've figured that out six weeks ago.

Michael Berk [00:41:11]:
Yeah. No. A hundred percent. That was one of the... we did a great job building it and collaborating throughout. We had, not stand ups, but I think, like, check-ins every few weeks. I reviewed all of my counterpart's PRs. He reviewed all of mine. So there was great collaboration on that front.

Michael Berk [00:41:27]:
But the scariest moment was when the big boss came in the room, and there wasn't a ton of collaboration on the stakeholder side between the big boss and the people we were directly interacting with. And we were just like, here... we didn't get any design specs, we didn't get any details beyond the simple SLAs you provided. Do you like it? And, yeah, it went super well, but it was a bit unnerving at first. And, hopefully, you can have some of that back and forth and feedback, not just from within the team, but from the stakeholders and the leaders that you're looking to cater to.

Ben Wilson [00:42:05]:
Yeah. And making sure that's part of the project plan. So what you don't wanna do is involve an outside user in something like an internal bug bash. Yeah. That can go very poorly, and they can lose faith in you. We don't do that. Most serious engineering organizations never do that. You always test internally, because we all know, like, things go wrong.

Ben Wilson [00:42:29]:
Stuff is totally broken. You list out all the things that are just kinda messed up, and you get it to a state where you're like, okay, everything we've tested works pretty well. Now let's bring in people who can test how they're used to using it and see what they come up with. And then you record everything that's broken that they find. Or is it unintuitive? Does it not behave the way that they expect? Okay, how can we adjust it so that it's more intuitive? And then right after that, while you're doing those fixes, you start your next phase of development work. Okay.

Ben Wilson [00:43:10]:
Maybe we need to work on the performance of this. So now let's focus on that together. Or maybe it's the accuracy of what's coming back. What can we do there? Do we go back and fix, like, the ML side of this thing? Is it something with the data we need to change? And iteratively work on that and show incremental improvements within the team, and then have everybody say, like, yeah, this is way better. Let's get, like, the SMEs in here and have them see it. Sometimes you don't do that solely for the sake of giving them a good feel that, yeah, the project's on track and it's looking good.

Ben Wilson [00:43:52]:
Sometimes it's purely political, so that they can come in and see, like, oh, yeah, they made huge improvements since last time. And you can do tricks with that as well. Like, you can show the results of what it was in the past, even if it's just total garbage, and then have them look at what it is right now. And then their perception is like, whoa, this is so much better. Is it perfect? No. Is it ever gonna be perfect? Probably shouldn't be.

Ben Wilson [00:44:17]:
But you need to get it to a point where they can have an emotional response, where they're like, this is awesome. We're good. We can ship this. Yeah. And that's one way to kinda force that: by showing how bad it was before versus what the improvements have been.

Michael Berk [00:44:34]:
Yeah. Yeah. It's interesting. I've been learning how to do this properly, and the perception of deliverables is so malleable. Yeah. It's a very human thing. So the engineering team knows all the holes, typically. And when you're working internally, you're like, that's gonna break.

Michael Berk [00:44:51]:
That's gonna break. If you do that, these seven things will break. But within a thirty minute to one hour call, there's not enough time for them to completely vet your system, and they also typically don't have the technical capabilities to do that anyway. So it's important not to lie, but it's also important to frame and pitch so that the right things happen, whatever the right things are.

Ben Wilson [00:45:16]:
Yep. Yeah. You don't wanna hide something that's serious and is probable to actually happen. You wanna disclose that and say, like, how urgently should we fix this? Our process on doing stuff like that is, if it's a bug and we haven't released it yet, it's a P0. Like, that needs to be fixed. If it breaks the workflow of what somebody's accustomed to, that's a bug. It's a P0. Fix it.

Ben Wilson [00:45:49]:
But if it's something like, we think people are gonna want this feature, and we think this is important or critical, that almost always comes down to, like, a P2. Like, we're not building that unless we get feedback. We know what that feedback will probably be, but we don't know if we should build that yet. And the smart thing to do is, don't build it unless somebody asks for it. Yes. Because when you do go and build that... like, oh, I got this awesome idea to make this super, like, super great, and then you go spend a week building that thing. There's no worse feeling that you can have, aside from shipping, like, total shit broken code, than having this great idea of this amazing thing, and then you pull the data or talk to people, and they're like, yeah.

Ben Wilson [00:46:42]:
That sucks. Or, I don't use that. I don't need that. And you're like, man, why did I build that?

Michael Berk [00:46:49]:
Yeah. Yeah. It's funny. Ten days before the project ended, we got a new requirement that was, like, a legit... like, they called it a requirement. It wasn't defined at all, ever, and so we had to sprint to complete it. But, yeah, thinking through this, I'm wondering, if we had framed it differently, if we did, like, a different framing of the results, whether that requirement would not have come up. I don't know.

Michael Berk [00:47:24]:
But the key thing is, these stakeholder meetings are very human oriented, so don't ignore the human aspects. Put on your charisma hat, put on your sales hat, and really, like, pitch what you've built, because that can be night and day for the reception.

Ben Wilson [00:47:45]:
Yeah. Could not agree more. It's also a different ecosystem if you're building something that's, like, internal to a nontechnical team but is approachable enough that anybody can use it. Then I think the emotional aspect of the perception of quality is more important to sell. Mhmm. It's like, highlight all of the highlights as much as possible. Yeah. And maybe don't dwell too much on the thing that, like, has a couple of rough edges in this one part. Wait for them to discover that.

Ben Wilson [00:48:23]:
If they never see it or they never complain about it, then it might not be that important.

Michael Berk [00:48:28]:
Right.

Ben Wilson [00:48:29]:
But what you don't wanna do is walk into that room, hey, live demo time, people, and you're just sitting there like, man, I really hope they don't find this thing. If you didn't disclose that, and it's, like, the first thing that somebody finds, you now look like a complete idiot.

Michael Berk [00:48:44]:
Yes. That is true.

Ben Wilson [00:48:47]:
Always test your stuff yourself. You can have somebody who doesn't have context but is technical come and test your thing.

Michael Berk [00:48:54]:
Mhmm. Yes. Cool. So we're about at time. We brought it all the way up through implementation and presenting to stakeholders. The last step would be productionization, and that's a whole other can of worms. But, if you have this initial prototype that is working, you can typically push that into prod via whatever your deployment strategy is, and maybe that's a future episode.

Ben Wilson [00:49:20]:
Yeah. A part two. We should revisit this from this stopping point.

Michael Berk [00:49:25]:
Sounds good. Cool. So I will summarize. We revisited sort of how to deploy model serving capabilities specifically with a search case study. At a high level, the steps are, first, figure out what you're trying to build. Put on your product hat, understand the use case, understand how the business makes money or succeeds. Next, figure out how we're gonna measure success. So is there gonna be increased engagement? Are there SLAs, etcetera? Third, build a quick and dirty prototype.

Michael Berk [00:49:55]:
And the thing you should be optimizing for here is the minimum time to signal. And signal means an indicator of whether a given design decision is good or bad. Once you have that sort of initial prototype working, then, fourth, we want to look at the tools that you can leverage as the underlying structure for this overall functionality. Fifth, iterate with a bunch of prototypes to get more signal on the design, including these tools. And then, finally, once you are pretty confident that this is how it should be built, be heads down, develop it, and customize the working style to the team that's involved. So maybe you have stand ups, maybe you don't, but a lot of feedback is important. Once you have a v0.1, or even less, it's important to have an internal bug bash and try to break it. Be really creative.

Michael Berk [00:50:47]:
Use your engineering skills. Use your product skills. Combine them to break the product. And then finally, bring in stakeholders for feedback and see what they have to say. Anything else?

Ben Wilson [00:50:59]:
No. That covers the dev loop.

Michael Berk [00:51:02]:
Cool. I do have one more note: for service stability, don't half ass it. Ensure you can meet the SLAs.

Ben Wilson [00:51:14]:
Yes.

Michael Berk [00:51:15]:
Alright. Well, until next time, it's been Michael Berk and my cohost, Ben Wilson. Have a good day, everyone.

Ben Wilson [00:51:21]:
We'll catch you next time.