Navigating Build vs. Buy Decisions in Emerging AI Technologies - ML 180
In today's episode, we dive into the critical decision-making process of building versus buying technology solutions, especially when it comes to agentic, logic-based frameworks. With the industry still in its early stages, Michael recommends waiting for managed solutions to mature, while Ben highlights the educational value of simple project builds. They discuss the importance of understanding the technology thoroughly before diving into business-focused decisions, using tools like customer user journeys (CUJs) to evaluate scalability, cost-efficiency, and maintainability. They also highlight some common challenges and missteps in project management, and the necessity of pre-evaluation by tech teams.
Show Notes
Socials
Transcript
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Burke, and I do data engineering and machine learning at Databricks. And I'm joined by my amazing cohost, Ben Wilson.
Ben Wilson [00:00:17]:
I write LinkedIn posts about open source releases at Databricks.
Michael Burke [00:00:23]:
So, Ben, if people wanna contact you, should they do it through LinkedIn and what should they say?
Ben Wilson [00:00:31]:
Funny, that's what we were talking about before the recording. Yeah. I'm a friendly guy. Reach out. I do respond. If you're trying to offer me a position at another company, I probably won't respond to you. But if you just wanna talk, ask some questions, I always respond.
Michael Burke [00:00:54]:
Cool. So you audience have just been directly lied to. But if you ask the right questions and have it be funny, he might respond.
Ben Wilson [00:01:05]:
We'll see. Yeah. It's all about time. Like, if it's something that's clever, then I'll make time immediately to respond. Or if it's about something that I work on that's actually fundamentally broken, I will respond immediately and try to fix it. But if it's just general chitchat, then I don't. It doesn't spark joy. It does not give me a grin.
Ben Wilson [00:01:30]:
I might wait a couple of weeks to respond.
Michael Burke [00:01:33]:
Heard. Cool. So today's topic is something that I have been seeing a lot in the field, and I think if you work at a company, you've probably seen this sort of line of questioning as well. So there's this thing called Gen AI. It apparently is super cool. It does your job. It takes your job, and it also talks and processes images and all of these amazing feats of sci-fi. And so what we're gonna talk about today is: you're the CEO of a company.
Michael Burke [00:02:07]:
You heard about this magical thing called Gen AI, and you want to see how you can integrate it within your business. And, specifically, how do you scale this decision-making process so that you actually have usable, production-ready features that are not a nightmare to build and maintain? And you also wanna do this in a finite, spike-based research process so that your employees are not just spending hundreds of thousands of dollars and tons of time on prototyping and exploration. So, Ben, you're CEO of a hot dog stand. You have a bunch of hot dogs, and you have franchises throughout the New York City area. And you have a machine learning team to predict demand. You have a product team to handle branding. You have hot dog suppliers and all this amazing infrastructure, and you wanna use Gen AI.
Michael Burke [00:03:08]:
For hot dogs. Cool. How would you go about starting to answer this question, and, specifically, delegating key components of decision making to your leadership?
Ben Wilson [00:03:24]:
I mean, if I was CEO of this company, I would probably talk to the leaders in tech and on the business side to strategize about whether this makes sense for our business. And if I was CEO, I probably wouldn't know that much about the technical details, so I would defer to somebody who does know. Make sure they're in the room and give them full leeway to tell me if I'm being an idiot or not, and start the conversation. Be like, hey, explain this thing to me. What does it do? What doesn't it do? What are its capabilities? And how does this impact our business? Go think about this, research it, and come up with a plan for me about some options we could pursue. And I would wait for their responses and then discuss it with them. But I'm not in that role. I don't ever think I will be in that role.
Ben Wilson [00:04:26]:
I'm more on the nerd side, so it would be more of getting assigned a task like that. Say, like, hey, go build something with Gen AI. And that process would be on the receiving end of that previous statement of what could we build, and does it make sense?
Michael Burke [00:04:48]:
So let's call you CTO then. CEO says, hey, let's use Gen AI. I wanna be advanced and cutting edge. What do you do?
Ben Wilson [00:04:56]:
I go talk to whoever the people are that know most about that. Even as a CTO, you're not cranking out code all the time. You might. Yeah, our CTO does. But you go and build a working group to discuss this. Say, like, what are our options from a technical perspective of what theoretically could be done, make sure that everybody's on the same page with understanding what the limitations are, and work through some working group scenarios. Like, okay.
Ben Wilson [00:05:32]:
Theoretically, we're gonna try to tackle this project within, like, this type of scope. What would we need to build, and what are our unknowns? Like, out of these 37 things that we need to do for this hypothetical project, how many of these have we done before, and how long did it take to do them? So we need to figure out, like, deployment. Where would this thing run? If we're deploying our own custom model, that would probably be a large infrastructure discussion, and you would need a lot of people in the room to comment on the technical feasibility of doing that.
Michael Burke [00:06:14]:
Well, I think that we skipped an important step, like finding the business need and sort of understanding what Gen AI actually is. So let's assume that our company, it's a hot dog stand. We don't have that much technical capability. We, like, know what Gen AI is. How would you go about researching the different applications, the different use cases, and how it could actually benefit our business?
Ben Wilson [00:06:40]:
So that would be within the technical discussion. But for something that's new, I would always discuss that in tech first. You don't involve the business first, because the danger of that is you get some really fantastic ideas that the business comes up with. They're like, hey, we could solve these problems that are really hard for us to solve right now. But you don't know what the capabilities are of what you're gonna be doing, or how you would build it, or how you would actually deploy this thing. And what does the maintenance life cycle look like? What are the performance considerations? Like, what are the SLAs for this thing? What is the availability that we need to have? Is it capable of doing this? So having the technical understanding first, in sort of a simulated environment to discuss this stuff, prepares you for that next phase of, okay, now let's go meet with business and see what problems they want us to solve, and we can tell them uniformly as a group what we think is possible and what isn't possible.
Michael Burke [00:07:41]:
That's super fascinating. I've legitimately never heard that before, and it makes a lot of sense. So starting with a deep understanding of what building, maintaining, and using that tech would look like. And then you come with that knowledge into a business meeting and say, hey, what problems do you guys have? Can we apply this tech to those problems?
Ben Wilson [00:08:00]:
Yeah. Because otherwise, you're gonna be holding the bag as a tech organization. Like, hey, the business wants to do this. CEO, board of directors, whoever: they say we gotta build this Gen AI thing. And then the tech team is like, okay, you've assigned this to us. And then they spend weeks trying to learn this thing. Like, what could we use here? It's a cool problem that we should really solve, but you're preloading the tech team with the assumption that they need to build this thing and get it working at all costs, and they'll do that. But if you don't know whether it's possible or not, or what the possibilities of the tech are, you're now potentially setting yourself up for massive amounts of scope creep, or just blazing forward in solving a real business problem with so many unknowns that your initial, you know, best-effort guess at the complexity is, like, oh, this will probably take us 4 months to build.
Ben Wilson [00:09:05]:
We're gonna hire some consultants to help out for, like, staff augmentation to get this going. And then you find out at month 3 that, whoa, we didn't know that all of this was gonna turn out like this. We didn't know that it couldn't do x, y, and z. We didn't know that we needed to build a, b, and c. We had no clue that this is how much the system was gonna cost to run. And when you're at that point, you now have to have a really difficult conversation with, you know, C-level folks. Like, you have to tell the CEO, yeah, we just burned 3 months of our dev team's time on solving this problem, and now we realize that it's gonna take another 5 months.
Ben Wilson [00:09:52]:
And maybe we have to ingest or purchase additional data to make this work. We have to build all these ETL pipelines. We need to figure out how to deploy our solution in a more scalable manner. We've never done that before for this type of thing. So everything just becomes, like, a research project, and you just blow through your deadlines. You never have a product that makes it. And at some point, the executive staff is gonna say, why have we spent all this money and we don't see anything? You told us that this was gonna take 4 months. We're at month 7.
Ben Wilson [00:10:29]:
Like, why are you all so incompetent? And the tech team is like, we didn't know that it was this hard, or we didn't know what we didn't know at the time. So if the tech team is allowed to at least evaluate what it is that they could be building, so that they know, yeah, we can build this, and here's a bunch of stuff that we know is just not possible to do, then going into that business meeting for that ideation session of what could we do, you can head that off at the pass and focus on, let's just talk about the stuff that we know. Like, we've done some research. We know that this is possible. It's proven.
Michael Burke [00:11:12]:
As you were talking, I was thinking that there'd be a ton of scope creep when seeing the art of the possible from just a tech perspective. So if you're not looking to solve a specific problem, you can say, oh, we can try this, or we can do this, or we can serve it in our grandma's basement, or we can have it write all of our code and see what happens there. But as I was thinking through that, I think as long as you understand the atomic components of the technology and the basics of, as you said, how do you serve, how do you query, how do you interact with, how do you build with, a lot of those fundamentals can then anchor these decisions. Do you agree, or do you think that lacking a set of initial business problems would lead to too much ambiguity for a useful exercise?
Ben Wilson [00:11:58]:
You have to think about who's in the room during that discussion. So it's not a data science team or an ML team or AI team, whatever you wanna call it. It's not just a bunch of ICs sitting around the table, you know, brainstorming and thinking about all the art of the possible that's out there. You have a tech lead, a manager, a director, and the CTO in that room. That CTO is not interested in, well, what's the tech stack that we should use for actually serving this thing? Should we use, like, Kubernetes on managed AWS, or should we use, like, Fargate? That CTO is gonna tell that person, very politely or very not politely, please stop talking. We're not here to design. We're here to talk about the bigger picture of: do we know how to do x? Do we know how to take some code that's written in some language and deploy it in such a way that I can send JSON to it, and if I get burst traffic, it can horizontally scale?
Ben Wilson [00:13:05]:
Or if I need to deploy a more complex version, like an agent that needs to call, you know, Python tools. Where are those Python tools executing? Have we done something like that before? Are we gonna use, like, AWS Lambda to do that execution? Are we gonna use what Ben and Michael just released yesterday, Unity Catalog's Python function tool execution? You need to, like, understand that tech. And you're just going through a list of what the application development life cycle and then the deployment life cycle look like, and saying: do we know how to do this? Have we done it before? Has somebody else done it before? Is there tech out there that we can use that makes this easy? And what do we do when this goes wrong? How do we redeploy? How do we change the characteristics of this? What's the dev loop for building one of these things? How do we do monitoring? Like, there's a lot of questions that come up with that sort of analysis. But what you're really trying to figure out is: do we know how to do this, and are we comfortable providing general scoping of this problem? And it's not a, oh, if you have a no there, then we can't do this project. It's more like, okay, out of these 37 steps, we have 90% that we're golden on. We know. We've done all of this stuff before.
Ben Wilson [00:14:30]:
It's trivial. But, hey, maybe there's a couple things in here where we don't know how that works. So right after that meeting, you assign people sprint tasks. You're like, hey, go build a demo. Like, just try this out. For us, we call them customer user journeys, or CUJs. Like, hey.
Ben Wilson [00:14:49]:
Go and try to build this thing. Other places, you might call it a hackathon. Like, just go and try to build this. You're not writing tests. You're not making sure that you can actually deploy it. You're doing it in, like, staging environments or dev, and you're just trying to see if you can make it work. And in the process of that, you might be like, yeah. I didn't make it work, but I know what I need to do to make it work.
Ben Wilson [00:15:14]:
So it's a discovery process, which gives you information about what that scoping will be. Like, well, I couldn't figure out how to get this, like, proxy to work right, but I know how to do that. It just takes a week to do. So, okay, that's 1 person-week to do that part. The rest of it? Okay, it's 4 person-weeks to do all the rest of these components for this one task. That's your scope.
Michael Burke [00:15:38]:
Cool. As CEO of said hot dog company, I am very worried about my team sinking too much time into this. How should I think about the Pareto frontier of knowledge gathering: the completeness of the information versus the amount of time and money spent gathering that information?
Ben Wilson [00:16:02]:
If you're the CEO and you're worrying about that, you should not be the CEO. You should have your CTO be in charge of that. Your CTO should not be worrying about that either. They should have hired people who are now in positions of management and who are making sure that their teams are time-boxing all of their stuff. Like, there's deadlines associated with it: hey, I'm gonna give you a day to figure this out. And that professional engineer knows, like, I know what I need to do here in order to get the scope, and they should know what to do to learn that really quickly.
Michael Burke [00:16:42]:
Alright. Let me escalate this a bit. Let's say that I, as CEO, started this company a while back, and I hired a bunch of idiots. They have no idea how to write software. They are generally incompetent and don't know anything about Gen AI, serving models, none of it. So we don't know what we don't know. How do you operate? And you just joined the company.
Michael Burke [00:17:08]:
You're a pro. You know everything. How do you operate in a space where you don't know what you don't know? Like, how would you guide that team?
Ben Wilson [00:17:19]:
I would if I just got hired in and everybody was so risk averse to learning anything new, my first task would be, how do I actually replace the staff here?
Michael Burke [00:17:33]:
But, like, I guess the motivation for this question is, a lot of times, I work with relatively nontechnical teams. They've been handed this mandate of, you're supposed to use Gen AI to solve a bunch of problems. Great. They don't know everything, and they also, more importantly, don't know what they don't know. So in that scenario, let's say you're a software engineer. You've been incorrectly managed and given this task. How do you navigate in a space where you don't know what you don't know? How do you do that efficiently?
Ben Wilson [00:18:07]:
You do research spikes yourself in order to provide the evidence to go back up the chain and say, we should shift gears or down-scope this and release something that is doable. But you have to provide the evidence for that. So say you're good and you've done this before and you know how all this stuff is supposed to work, and everybody around you just doesn't know, or they're really eager, like, yeah, let's dive in, let's go as deep as we can and try to build this, and they end up building, like, super complex, unmaintainable abominations. You can be the person who's like, yeah, that's cool. Like, that's our end-state goal. But let's approach this from a position of, let's just build the minimum required features at first.
Ben Wilson [00:18:56]:
Like, let's build something simple and see how this works, and we can all learn from this together. And then we'll be ready to increment to the next layer of complexity. So if we're talking about Gen AI, maybe the first thing you do is just a very simple RAG app. Like, hey, we need additional context for question answering based on our business data. Go build that agent. Like, okay: encode all of these texts into vectors, store them in a vector search database, and then we'll connect it to an LLM. And we'll see how that works. We'll evaluate it. We'll do testing with the business. Like, is this good? Does this answer questions about our business? Then let's deploy it and see how it performs.
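The simple RAG flow Ben sketches, encode texts into vectors, store them, retrieve by similarity, and hand the hits to an LLM, can be mocked up in a few lines. This is a toy sketch: the bag-of-words `embed` stands in for a real embedding model, `VectorStore` for a real vector database, and `stub_llm` for a real LLM client; none of these names come from an actual library.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A real system would
    # call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    na, nb = norm(a), norm(b)
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    # Minimal in-memory stand-in for a vector search database.
    def __init__(self, docs):
        self.docs = [(doc, embed(doc)) for doc in docs]

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

def answer(question, store, llm):
    # Retrieve context, then pass it to the LLM with the question.
    context = "\n".join(store.search(question))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")

store = VectorStore([
    "Hot dog carts in Midtown sell out by 2pm on weekdays.",
    "Our mustard supplier delivers every Tuesday.",
])
# Stub LLM that just echoes the top retrieved document.
stub_llm = lambda prompt: prompt.splitlines()[1]
print(answer("When does the mustard supplier deliver?", store, stub_llm))
```

Swapping `embed` for a real embedding endpoint and `stub_llm` for an actual chat-completion call gives you the same shape of system Ben is describing, which is part of why it makes a good first increment.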
Ben Wilson [00:19:39]:
Are we gonna see stuff in staging, in our own testing, where we're like, oh, yeah, we need to fix that, or that's kinda broken? And then when you release it to production, you start slowly ramping up users, see how people are using it, analyze their patterns, and say: are they asking questions that align with the end goal everybody wanted to build in the first place, or are they asking things that are completely different? Do we need to adjust what our end goal is? So if you do incremental complexity increases over time, your team is learning how to do that. You're taking on way less risk, because you're just pushing something out that's a little bit simpler than that big project you wanted to do, but you're also able to analyze user behavior. So, like, do people have the right impression of what the capabilities of this are, and are they using it as we intended? If not, then let's go talk to them and say, what do you actually wanna use this for? And then collect your user feedback and adjust your road map accordingly to try to solve that problem if it's possible. And if not, and this project is going nowhere, then pop smoke and get out. Like, go do something else, or build something limited in scope that's just for this one thing.
Michael Burke [00:21:01]:
Got it. So it sounds like you just have to learn it. There's no shortcuts. Like, if you need okay. Cool.
Ben Wilson [00:21:09]:
But the only way to learn these, like, processes of how to build something at minimal risk... the only people I've ever met, myself included, who have learned this lesson usually learned it the hard way. And if you're, like, a team that's a startup straight out of college, say you're all, like, buddies in a PhD program, you go and build something, you're gonna make mistakes, and you're gonna make these exact mistakes. You're gonna go and build something because you think everybody wants it. The number one goal that you should have in a team like that, where you're young, ambitious, and just wanna build, is: get an advisor who's made those mistakes before. Somebody who has experience and can be like, listen.
Ben Wilson [00:21:56]:
I made this massive mistake 7 times before. Here's how I learned from it. And I worked with other people who also made mistakes, and I learned from them. And this is why we have this formula process of going forward, of not just trying to invent the end goal state of something that's super complex, because you don't know if people want that or if it's even worthwhile to build that. You could change directions midway through. You could find out that, yeah, this is a total waste of time. Let's cut our losses. We learned a lot, so there's a benefit there.
Ben Wilson [00:22:32]:
But should we pursue this to the end state because somebody told us to do it? No. Because what's gonna happen is you release it, and it's not really what people want. People might not tell you to your face. Even the C-suite might not say that. But those conversations will happen. Like, nobody's using this. Why did your team build this? Why is this such garbage? Turn it off. But then you've sunk a year of your life into this project that nobody cares about.
Michael Burke [00:23:06]:
Got it. Okay. Another question. If you were given this scenario: you read a blog, you heard about Gen AI, it's super cool, and you wanna use Gen AI. What do you think are the best applications currently, as of December 20, 2024?
Ben Wilson [00:23:27]:
The best bets right now that are proven, that I've seen running in production and have used, are, like, agentic RAG with a couple of tools doing deterministic execution. So you define a function that does one purposeful thing, and you're instructing the LLM, like, if you need to do this task, then this is the function you're going to use. This stuff works. It's been proven out, and it's awesome. Some of the more sci-fi stuff that isn't quite mature yet, but will be 6 months from now, is stuff like: I ask a very obscure question to an agent. This agent calls another agent to go and do some task, and then there's another agent that's reviewing that result and comparing it with the initial question. Another agent that's going and fetching some data, and then another agent that's doing something with that data. And then, you know, it's kinda like the agentic frameworks where you can have a multi-agent architecture.
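The "couple of tools doing deterministic execution" pattern Ben describes can be sketched like this. The tool names and the hot dog numbers are invented for illustration, and the model's tool choice is simulated with a hand-written dict; in a real deployment that dict would come from an LLM's function-calling response.

```python
# Each tool is a plain function with one deterministic purpose; the
# LLM is instructed which tool to use for which kind of task.

def forecast_demand(location: str) -> str:
    # Hypothetical demand lookup for a hot dog cart.
    return f"Projected demand for {location}: 450 hot dogs"

def check_inventory(location: str) -> str:
    # Hypothetical inventory lookup.
    return f"Inventory at {location}: 120 hot dogs"

TOOLS = {
    "forecast_demand": forecast_demand,
    "check_inventory": check_inventory,
}

def run_tool(tool_call: dict) -> str:
    # Dispatch whichever tool the model selected; an unknown tool
    # name fails loudly with a KeyError instead of guessing.
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# Simulated model decision for "How many hot dogs will the Times
# Square cart need tomorrow?":
tool_call = {"name": "forecast_demand", "arguments": {"location": "Times Square"}}
print(run_tool(tool_call))
```

The appeal of this shape is exactly what Ben says: the LLM only picks which function to run, while the execution itself stays deterministic and testable.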
Ben Wilson [00:24:35]:
But having that so that it's not so complex to build... I just think the tooling isn't quite there yet. Like, we're all working on this stuff right now to make it easier. Can you go and build, like, a very sophisticated, extremely complex agent yourself right now? Yeah, you can. But how much code is that? Like, it's a lot. Yeah, this stuff gets super complicated. So wait until tooling comes up to the point where it's like, yeah, you can kinda do this and it's not such a burden on the user.
Ben Wilson [00:25:14]:
Because the difference between Gen AI stuff and traditional ML or deep learning is, we know that process. We know we're not gonna be retraining and redoing feature engineering once a week on a model that we deployed. We could be retraining on new data, but we're not gonna be selecting a new, like, library to do optimizations. We're not gonna be like, oh, well, we were using scikit-learn last week; this week, let's just switch that to XGBoost, and we'll deploy that. Nobody does that. I mean, if people do that, my condolences. Good luck in your endeavors.
Ben Wilson [00:25:57]:
I wish you all the best, and I hope that one day you learn from your errors. So it's a slower process, and you're not gonna be like, hey, yeah, the model is doing okay today, but let's deploy a new version tomorrow that has, like, 40 extra features in it. We can do that. Right? Can you? Yeah. How well is that model gonna perform? Don't know. I wouldn't try it.
Ben Wilson [00:26:26]:
I would do a dev loop of, like, I need to validate this, and I need to test it, and I need to do A/B testing, and I need to see how this performs against the existing one. It's weeks of work to do all that. Right. For the super complex stuff, that could be an entire quarter of work to, like, release a new architecture of a solution that's already in production. With Gen AI, you could make that code change and redeploy in an hour. So it's very fast, and lots of things can go wrong very quickly. Right. And making changes in a complex code base that doesn't have good tooling built yet, because it's still kind of in development in the industry in general, becomes a big maintenance burden.
Ben Wilson [00:27:13]:
Yeah. What happens if we have to rewrite our system prompt for each agent? How many places in the code do we have to do that, and do we need to customize it? Is it a sentence that we have to add, or is it an entire page of text that we have to add and then validate? So that becomes a lot of work.
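One common way to keep the "how many places do we edit the system prompt" problem in check is to compose every agent's prompt from a single shared base in one module. This is just an illustrative sketch; the agent names and prompt text here are made up, not from any particular framework.

```python
# All prompt text lives in one place; editing BASE_PROMPT changes
# every agent's system prompt, instead of hunting through the code.

BASE_PROMPT = "You are an assistant for a hot dog franchise. Be concise."

SPECIALIZATIONS = {
    "demand": "You answer demand-forecasting questions.",
    "support": "You answer customer support questions.",
}

def system_prompt(agent: str) -> str:
    # Composed at call time, so a change to BASE_PROMPT takes
    # effect everywhere at once.
    return f"{BASE_PROMPT} {SPECIALIZATIONS[agent]}"

print(system_prompt("demand"))
```

Whether a prompt change is one sentence or a full page, centralizing it like this turns Ben's "how many places" question into a single edit plus re-validation.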
Michael Burke [00:27:36]:
Alright. Cool. So we're gonna keep this episode short and sweet. But before I summarize, one other thing that I've been feeling very strongly about, and I'm curious, Ben, your take, like, super fast: the buy versus build, and also the wait versus build-it-now. I think we're still in the infancy of a lot of these agentic, logic-based frameworks. And if you don't need it now, don't spend money on it until there's a good managed solution. There will be. There's, like, 70 startups all working on it.
Michael Burke [00:28:06]:
There's gonna be convergence in the next year on something that's pretty good for logic-based, LLM-based things. What are your thoughts on just waiting?
Ben Wilson [00:28:17]:
I'd say build simple stuff that can solve an exact business need, so that you gain experience and understanding of how this stuff works. I would not recommend going to GitHub, seeing a library that was released last week that seems, from its readme, like it's gonna do all the things that you want it to do. Mhmm. And then going all in on that, like, we're gonna learn this library, and we're gonna build the craziest thing that's gonna be amazing. Yeah, do that in a hackathon. Learn it. It's cool.
Ben Wilson [00:28:51]:
Like... but I wouldn't push something like that to prod without thorough evaluation and analysis of it. And who knows if that library is gonna be maintained a month from now? You don't know who's maintaining it right now. Are they working for a startup? Is it gonna get bought by somebody and that repo is going away, or is it gonna be abandoned because it's now gonna be rolled into some cloud provider service? You just don't know. So I'd be careful about building super complex stuff for purposes of production deployment. Agreed. But for purposes of the team doing hackathons, the team learning this tech, you know, go nuts. Like, learn it, play with it, break it, fix it. It's gonna benefit everybody who's doing it. Because whether the Luddites believe it or not, this stuff is here to stay.
Ben Wilson [00:29:45]:
It's not going away. It's not hype. It's not like, you know, people are just getting excited about this but it's actually garbage. No. No. People are deploying stuff. I've seen LangGraph deployments for customers, both open source users and Databricks customers, and you look at what they're doing. You're like, yeah.
Ben Wilson [00:30:05]:
That's legit. And, yeah, it's complicated, some of the stuff I've seen. But it actually works, because they have a team of, like, 30 people working on it, and they're serious about it. And they did their homework, and they proved that it would work. And they're building, like, a production version of it. Yep. I don't think LangGraph is going anywhere.
Michael Burke [00:30:26]:
Agreed. Cool. So, the shortest episode in the history of Adventures in Machine Learning, but I'll quickly summarize. If you're given this prompt of use Gen AI in the business and you're a technical person, start with the tech: figure out what it will take to build, maintain, and use this technology. And then from there, you can go into business-focused conversations with an anchor in reality. A great way to figure out what it would be like to build, use, and maintain is to create CUJs, or customer user journeys, to identify the typical user behaviors, and then try to hack solutions: see if they're scalable, see if they're cheap, see if they're easy to build and maintain. If you haven't done this before, you gotta learn it. There's just no way around it.
Michael Burke [00:31:13]:
So just do research, read books, look at articles, that type of thing. And then finally, with the current state of Gen AI, we're approaching production stability for really sci-fi use cases, but right now some of it is not as robust as it maybe could be and should be. So just wait a little bit for those more logic-based, agentic systems. But RAG, I agree with Ben, is very proven and super, super high ROI. If you're gonna do anything with Gen AI, focus on RAG. Anything else? None. Cool.
Michael Burke [00:31:49]:
Well, until next time, it's been Michael Burke and my cohost, Ben Wilson. And have a
Ben Wilson [00:31:53]:
good day, everyone. We'll catch you next time.