
Barzan Mozafari on Cloud Data Warehousing and Machine Learning Advances - DevOps 237
Welcome to another episode of "Top End Devs," where we're diving deep into the exciting world of data-intensive systems with our special guest, Barzan Mozafari. Hosted by Warren Parad and co-hosted by Jillian, this episode explores the intersections of academia and industry, touching on how innovative breakthroughs in data systems are reshaping the digital landscape.
Special Guests:
Barzan Mozafari

Show Notes
Welcome to another episode of "Top End Devs," where we're diving deep into the exciting world of data-intensive systems with our special guest, Barzan Mozafari. Hosted by Warren Parad and co-hosted by Jillian, this episode explores the intersections of academia and industry, touching on how innovative breakthroughs in data systems are reshaping the digital landscape. Our conversation with Barzan, an MIT alum and a University of Michigan associate professor, uncovers the challenges and triumphs of bridging the gap between theoretical research and practical application. We'll discuss the transformative power of AI in optimizing cloud infrastructure, especially for platforms like Snowflake, and how the evolution of cloud data warehousing is influencing various verticals. Whether you're a data enthusiast or an industry professional, this episode is packed with insights on leveraging AI and machine learning to make smarter, more efficient database systems. Join us as we unravel the complexities of data and learn from Barzan’s vast experience in the field. Tune in and prepare to expand your understanding of how data drives modern advancements!
Transcript
Warren Parad [00:00:01]:
And, we're live. Welcome back, everyone, to another episode of Adventures in DevOps. Hosting today with me is Jillian. And, Jillian, are you looking forward to today's episode? Did she freeze?
Jillian [00:00:13]:
I did I did freeze for a few seconds, but
Warren Parad [00:00:15]:
Oh, you're good.
Jillian [00:00:15]:
Now I'm gonna assume that I'm gonna say hi here. Hi, everybody. I hope I don't freeze again, but I don't really know what just happened.
Warren Parad [00:00:25]:
Yeah. I mean, when we planned this episode, I had a strong sense that this was gonna be a popular choice for you. And that's because today, I want to invite Barzan Mozafari as our guest, MIT alum and University of Michigan associate professor, working with data intensive systems for over fifteen years. Welcome.
Barzan [00:00:42]:
Thank you so much. Great to be here with you.
Warren Parad [00:00:45]:
Yeah. You know, I have worked in a lot of engineering organizations, and data has always been this area that no one wants to touch. There's stuff going on there. And it just always seems like there are bigger business problems at play or other challenges present. But I always got scared when the topic of interfacing with one of the data teams came up. And I noticed from your history, your profile, there's a lot of different aspects of data systems that you seem to have experience in. You wanna talk a little bit about that?
Barzan [00:01:25]:
I agree with you. I think data can be pretty intimidating depending on what people make of it and what the expectation is. But typically it's considered like the gold of the modern digital era. So there's usually a lot of potential, and like anything with a lot of potential, you know, sometimes it comes with anxiety for the data teams or those who have to interface with them. But yeah, throughout my career, as you pointed out, at this point almost the last two decades, I've been working at the intersection of machine learning and database systems. Essentially pursuing this idea of how we can leverage statistics or AI in general to build smarter data systems, where smarter could mean faster, more scalable, easier to use, easier to deploy, etcetera. Some of the work we've done is now part of open source transactional databases like, you know, MySQL and whatnot.
Barzan [00:02:18]:
Running on millions of servers, but that was all, like, open source work. Some of the other aspects of my work have been on analytical databases, cloud data warehousing and whatnot. Some of the work we did on approximate query processing, for example, is a good example of combining systems and statistical learning theory that, you know, got commercialized and eventually acquired as part of the SnappyData product. But, you know, the latest spin off is some of the work we're doing now with cloud data warehousing and building a data learning tier. So we've worked at a very high level, like, you know, from insight generation and root cause analysis all the way down to almost the metal, like how to run queries more efficiently at the CPU level or the GPU level. And, you know, the common theme is that it's all fun. Anything that has to do with data, with machine learning, I usually get excited over it. So I could be talking all day about those different aspects, but that's the common theme.
Warren Parad [00:03:21]:
How did you get into it? I mean, I don't think you get very far in the academia. Like, did you did you think that maybe when you were a younger student that this would always be an area that was most interesting, or did it fall into your lap one day based off of, you know, experiments or labs that you were working on and this seemed like the most interesting thing?
Barzan [00:03:45]:
That's a very good question. It's like most things that you end up liking: it's usually so subtle that you can't really tell where it started. But no, actually, my passion started in algorithms. I remember, from early days, I was into math and statistics and figuring out, like, you know, how many tries it's gonna take to get to a particular outcome with high confidence. So, to be honest with you, I was always intrigued by the power of statistics and, you know, seeing how you can get a lot further ahead in life if you know more about statistics than most people. Like, something as simple as people playing heads or tails with coins. Like, you know, if you know what you're doing, you can sort of come up with creative ways.
Barzan [00:04:35]:
But, no, I think a lot of it started when I went to grad school. I started my PhD program, and my adviser was a legend in relational database systems, but then he was also seeing the potential. At the time, data mining was the hot thing. Right? And then that was the foray into statistics and then, like, you know, later applied machine learning, learning theory. It was a progression. And I think the trends that we were seeing in the industry were also helping with that. But, to your point, there's a lot of people in academia who are just kind of content with coming up with cool ideas that just remain as that. They're ideas and they'll always remain as ideas.
Barzan [00:05:21]:
But I was always, like, you know, left a little bit disappointed that an idea that I thought had a lot of potential would never make it to production systems. So a lot of what I did early on in my career was working with industry partners, partnering with different companies, and trying to get adoption for free. Some of it we open sourced, and it got massive adoption. But some of it was like pulling teeth, to go and, you know, convince a bureaucratic enterprise why it's in their best interest to adopt it. And by the way, we don't expect any money from you in return. I'm just driven by the impact. Right? But at some point you realize, look, you've gotta put your money where your mouth is. And at some point, like a lot of entrepreneurs, you realize, hey, you know, an idea is nice, but it's the execution that matters.
Barzan [00:06:10]:
And then you just spin it off. And that's how I started commercializing some of these ideas, with the main motivation of closing that gap between, you know, theory and practice. Like, you know, taking something that's solid and works, and getting it into a place that's consumable by data teams, by products, so it has a real world impact. So I think that's what a lot of that early interest evolved into.
Jillian [00:06:35]:
I think that's really interesting that you were able to kind of bridge the gap between, research and academia and getting to really build stuff because I think that's that's a tough one. That's a tough one for people who are in academia and get kind of frustrated by the process and, you know, for the reasons that you described to to figure out what to do. And I think most people just end up jumping ship to industry. So
Barzan [00:06:58]:
No. That's true.
Jillian [00:06:59]:
It's really that's really cool that you could find a bridge there.
Barzan [00:07:01]:
Yeah. I won't lie to you. It's not that easy. A lot of people fall off that bridge as they try to cross it. A lot of them fall into the water.
Jillian [00:07:09]:
I'm one of those people. Like, it's okay.
Barzan [00:07:12]:
I think what happens is that in academia, we have this system which is kind of designed, how should I say this? It's designed to kind of reward complexity. Right? And, you know, if an idea works but it's simple, it's not as rewarded. I remember we came up with this algorithm that was improving the average performance of queries by a significant margin. I forgot the number, but it was something close to an order of magnitude. And then we submitted this paper to this extremely prestigious academic conference. I won't mention the conference name so people don't get offended. It's the top conference in our area. And then the feedback from one of the reviewers was like, hey, if this was an important thing, someone else would have done it by now. And that just, you know, kind of rubbed me the wrong way, and I was like, if we go with that mindset, nothing gets done.
Barzan [00:08:10]:
Because if you take that idea and actually apply it to life, nothing gets done. Because with anything that you do, people can say, hey, if this was an important enough problem, someone else would have solved it before. So that actually kind of encouraged us and motivated us in the right way, where we sort of went the extra mile. I remember one of my, or actually two of my PhD students at the time, they worked with the open source community. They went there and they said, hey, you guys have this transaction scheduling algorithm, we have a smarter version of it. We've worked it out in an academic setting, but here's why it's significantly more performant. We just want you to consider making this an option. So kudos to those open source developers in the MySQL community.
Barzan [00:08:53]:
They went and they did their own research, they tried our idea, they came back, and they said this is so much better than what our default is. We're not gonna add it as an option, we're gonna make this the new default and make the existing algorithm an option. So the next time around we submitted that same paper, exact same algorithm, exact same results, and we said, hey, by the way, it is pretty important, because now more than 2,000,000 servers in the world are using this as their default algorithm. And that's just one example. There's a lot of good ideas that get killed. But then again, there's a lot of important but boring problems in the industry as well. Like, you know, for lack of a better term, sometimes to make something work, you have to put up with doing the 90% of the work that's boring so that you can actually get a kick out of that 10% that's exciting to you and just completes that puzzle. So I think it's just a fine balance. I used to tell my former PhD students that, you know, when you pick a problem, you need to ask three questions.
Barzan [00:09:53]:
Is it important enough? Do you have the skills to solve it? And do you have what it takes to get it into the right hands? So I think if you sort of look at those three questions, you know, holistically, you can find your way from interesting, innovative, highly technical ideas, and then still have a real impact. I mean, that's
Jillian [00:10:14]:
I need some spite too. All my favorite stories feature, like, a bit of, you know, like, just that little bit of spite and petty. I think it's such a it's such a human motivator.
Warren Parad [00:10:23]:
I'm surprised though, because for a lot of conferences that I applied to, I don't get any advice back. But the feedback of someone would have done that already if it was meaningful, like, what is that? What is the purpose of saying those words? It's sorta like you can only go from spite there. And I feel like that's not necessarily a good place to be driven from, realistically. Like, why wouldn't they be specific? Like, hey, you know, it'd be great if it was being used in the industry already. Like, if this is so ingenious, if this is so great, show examples there.
Barzan [00:11:01]:
And I think, like, if you look at how academics excel. Right? The idea is you wanna find out what others have done and you just need to do something better than that. And it doesn't matter if that problem is actually realistic, if the assumptions are realistic; it has to be innovative. Right? Like if it's a simple idea. I mean, the example I can give you is Spark. Right? Apache Spark, a lot of your audience are probably familiar with it. So the initial idea was pretty small, pretty simple. You have a working set, you have a data set that you want to, you know, keep doing the same computation on.
Barzan [00:11:34]:
So, you know, back in the Hadoop days, right, for those of your audience who still remember, you had to basically take that intermediate result and write it back to disk. And then, if it was an iterative computation, only to read it back into main memory immediately after you'd written it. So there's a lot of redundant IO that's just wasted. And, you know, the authors of that Spark paper were actually my lab mates when I was at UC Berkeley. And the observation was pretty simple, but very meaningful: hey, if you have a piece of data that you have to do some iterative computation on, let's keep it in memory, pin it in memory, so you can finish those iterations, and then we can, you know, write it back to disk. The idea is sound, it makes perfect sense, well motivated, very practical, but they had a very hard time publishing that paper in academia, because I remember the early feedback on their paper was like, this idea is not novel enough. The keyword they used is novel enough, which means, like, it's too simple. Like, can you add a twist to it? Can you, you know, as if, like, it's a movie.
Barzan [00:12:36]:
Right? Like, you don't wanna be able to see the ending, you know, from the beginning. So that's the kind of mindset that's there. And I think there's good and bad to it as well. Like, that's how people become more creative. People learn how to take on open ended problems. I think academia does a lot of things really well, but there's certain areas where I think closer partnership with actual customers helps save a lot of smart brains from burning their calories on problems that no one cares about, or solutions that no one will actually ever adopt.
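The in-memory observation described in this answer can be sketched in a few lines of Python. This is a toy illustration and not Spark's or Hadoop's actual implementation: `FakeDisk`, `step`, and both `run_*` functions are hypothetical names invented here, and the "disk" only counts simulated reads and writes.

```python
# Toy illustration only: counts simulated disk I/O, no real Spark or Hadoop involved.

class FakeDisk:
    """Stands in for durable storage and counts reads and writes."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, data):
        self.writes += 1
        self.data = list(data)


def step(values):
    """One iteration of some iterative computation (hypothetical: halve each value)."""
    return [v / 2 for v in values]


def run_disk_based(disk, iterations):
    """Write every intermediate result to disk, then read it straight back."""
    for _ in range(iterations):
        values = disk.read()
        disk.write(step(values))
    return disk.read()


def run_in_memory(disk, iterations):
    """Load once, keep the working set in memory, write once at the end."""
    values = disk.read()
    for _ in range(iterations):
        values = step(values)
    disk.write(values)
    return values


disk_a = FakeDisk([8.0, 16.0])
disk_b = FakeDisk([8.0, 16.0])
result_a = run_disk_based(disk_a, iterations=3)
result_b = run_in_memory(disk_b, iterations=3)

print(result_a == result_b)           # True: same answer either way
print(disk_a.reads + disk_a.writes)   # 7 I/O operations when round-tripping through disk
print(disk_b.reads + disk_b.writes)   # 2 I/O operations when the data stays in memory
```

The gap between the two counts grows with the number of iterations, which is the redundant IO the answer refers to.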
Warren Parad [00:13:15]:
So maybe to Jillian's point, why stay like, what's the benefit of staying in academia?
Barzan [00:13:22]:
Oh, that's a that's a very good question. I think academia has certain things that you only get in academia. Like, you have access to extremely smart talent. And, like, you know, as as as they say, when you hang out with smart people, you also keep getting smarter too. Right? And and there's some truth to it. Right? Like, there's, it's not, you know, you can't go and like, let's say you have a how should I say this? Like, you know, when when you're operating on venture capital, there's a very specific timeline. There's a certain amount of risk that's encouraged to take, right? But like let's say that you're working on curing cancer, like people have been working on for a long time. Like, you know, incremental ideas will only lead to incremental results.
Barzan [00:14:08]:
So at some point you need to take some risk. You need to explore solutions that are so crazy, there's a good chance they're not gonna pan out. Right? And for doing that, you need a little bit of patience. That's hard to find outside academia. You need highly motivated, highly smart individuals at the, you know, beginning of their career, with that intellectual freedom to go and venture out, find those problems, explore those crazy ideas. And for every 10 crazy ideas we try out, one of them is gonna pan out, and that's a really good outcome for academia. In the industry, if you go to your backer, to your board, to your boss, whoever that is, and tell them, hey, I need you to give me 10 times more time because I wanna try and check 10 different crazy ideas. It's a high risk, high reward thing.
Barzan [00:14:58]:
By the time you're through your, you know, third iteration, you're probably gonna be terminated. They're gonna have some difficult performance conversation. So there's a time and place for both, right? If you're looking for really creative, really impactful ideas, to give you a very concrete example: at Keebo, which is the startup that I'm leading now, we're very successful. One of the main things that people love about our product is that it takes thirty minutes, and with thirty minutes of investment from your side, the AI kicks in and starts optimizing your cloud data warehouse, for example, your Snowflake. Within twenty four hours you're seeing on average a 25, 30 percent cut to your overall Snowflake bill, which is very meaningful. Like, you know, we have organizations or customers who are spending millions of dollars on their Snowflake. Now a lot of people are impressed, like, how did you guys build something that's so autonomous, works so well, and whatnot? Because, you know, seemingly, from an outside perspective, it's very difficult to build that exclusively in the industry. But what people forget is, hey, there's decades of other ideas that failed that people learned from, and they have all this know how from academia.
Barzan [00:16:09]:
And then when you finally see the final result, it feels very sudden. Same thing that you see with Gen AI. For a lot of people who are not in the academic side of it, it felt like, oh, there was a sudden breakthrough. But it wasn't a sudden breakthrough. It was, you know, two decades of constant work. People were publishing paper after paper and pushing the boundaries of what's possible until it got to a point where everyone could see the benefits. So I think that's the other side of it that I just want your audience to be aware of.
Warren Parad [00:16:42]:
I definitely wanna dive into that. But the duality is really interesting that you brought up, that in academia, having ideas that really have a business impact, and maybe more than that, have a world impact, are not paid attention to as much. Like, it's just do a little bit better than what you're doing and experiment a lot. Whereas in business, everything has to be immediately relevant. But on the flip side, that means that we aren't getting the time outside of academia to experiment effectively, and teams should actually be experimenting, because they may find a way to drastically improve the query speed or performance, or the resource usage of their database clusters. And so I think what you're saying is really that both areas that are separate need to learn from each other. More experimentation in the private space, and in academia, more attention to, like, what's relevant in the next, you know, one to ten years that has a business impact, you know, where the industry is going, what's relevant for them. Otherwise, an idea is just really an idea, and it's not gonna get accepted into any conference doc.
Barzan [00:17:53]:
No. I think that's a good way of summing it up. I think the kind of balance I've found very useful is: you find real world problems, by definition, in the industry. You can't go sit in your closet in academia and just say, hey, I think this is a really interesting problem to solve. You can find interesting problems, by the way. You don't need to talk to people to find interesting problems.
Barzan [00:18:13]:
But to find meaningful, impactful, important problems, you do need to talk to customers, you need to talk to actual users. And that has to come exclusively, in my biased mind, from the industry. And then there's different types of solutions. Like, if you want a really out of the box kind of solution for that problem, there's a lot of sharp minds in academia. If they're provided with, number one, real world problems, or well motivated real world problems, and a set of constraints that need to be accounted for when designing that solution, because otherwise, you know, the people who have that problem would not adopt that solution, then you can sort of expect amazing results from working with academics. So I agree with you. I think the problem has to come from industry, and so should the constraints. But then the solution space is, like, either academia or people who've sort of had that training.
Barzan [00:19:16]:
A lot of people, you know, a lot of engineers have never had that opportunity, or they've never been given that opportunity, to take on open ended problems and be given the guidance and mentorship that a lot of professors offer their students, who become professors who, you know, offer this to their own students, and so on. So there's a lot of wisdom about how to approach research and how to come up with creative, meaningful solutions that I think the industry could also benefit from.
Warren Parad [00:19:43]:
Benefit. I mean, I think in the industry, we actually have this counter perspective, which now seems like it actually has a lot of paradoxical negatives. I hear very frequently hit the ground running, like setting up onboarding docs and tooling and resources so that you can just get started on your first day working at a new, organization and a new company, and you should already know how how it's supposed to work and already start providing value. And now I'm getting the thought of, like, well, actually spending time learning the backwards way that an organization is working before you actually start delivering value may be an opportunity that we've squandered in in a desire to move quickly and get everyone on the same page as fast as possible. There's a much lower opportunity for learning and, I'd say, failure, which I think a lot of people agree is a strategy that really drives, future innovation.
Barzan [00:20:38]:
No. That's right. I think, you know, another way to look at it is, I think there's nothing wrong with moving fast. Like, that's the thing. That's my own motto. Whether I'm working in an academic setting or, you know, at Keebo, we want to move fast, but sometimes people just have the perception of moving fast. Right? If you're building a house but you're not taking the time to really understand the measurements and what you're doing, and one side of the wall is shorter than the other side, you're not really fast, because all that work is gonna be throwaway work.
Barzan [00:21:11]:
So I think the right speed is actually failing fast, because you can't know everything. One of the things I've actually seen that's quite prevalent across the board, especially with more junior engineers, is that desire to be a perfectionist. We used to have a senior engineer who used to refer to that as premature optimization. People have a tendency, especially in the earlier years of their career, to try to perfect things way too early, way before it's even proven to have any value. Right? Like, I usually go for the quickest, dirtiest, hackiest thing to prototype something and see if it holds water. And if it doesn't, that's perfect. That's called failing fast, because guess what? You didn't spend the whole year building it. You spent a sprint, two sprints trying it out. And now that you've failed, you just learned, you just ruled out one of the ways to fail.
Barzan [00:22:06]:
And and I strongly believe that the number of ways to fail is finite. So as long as you basically, you continue to fail, but very quickly, you're guaranteed to find the path to success.
Warren Parad [00:22:18]:
I sort of wanna go back. It's been mulling over in my head, about your AI agent that runs to reduce your Snowflake cost. Like, how does this actually work? Like, does it just go in and, like, is it deduplicating data? Is it improving query search performance? Is there some other magic going on?
Barzan [00:22:41]:
So, to sort of see how it works, I think it might be useful to your audience to think about, like, the bigger problem. Right? So, like, one of the biggest things that's happened in our industry over the last, I would say, decade, decade and a half is, like, the rise of cloud databases. Or in particular, like, you know, the success that the likes of Snowflake, BigQuery, Redshift, and more recently Databricks have seen. And if you think about what's happened there, it's that these cloud data warehouses, the likes of Snowflake, have really lowered that adoption barrier. Right? So now it's significantly easier for anyone, any organization of any size, any team with any level of skills, to go spin up a cloud data warehouse and start analyzing that data, querying that data, getting that data very quickly. Right? So that adoption barrier has gone down, but the byproduct of that, the side effect of that, is that because it's so much easier to leverage data, now you have more users with varying levels of database proficiency and skills writing queries.
Barzan [00:23:47]:
They're querying more data and they're combining more data sources. So as a result, I would argue that the data pipelines that organizations are dealing with right now are at least an order, if not two orders, of magnitude more complicated than, you know, what we used to have fifteen years ago. Right? Because it's just that much easier. You don't have to go through all these hoops to get to your data. Anyone can spin up a cloud data warehouse. And as a result, because it's so much easier to use, they're so much more complicated. And because of that, they're so much harder to optimize manually. Right? Now you're dealing with millions of queries.
Barzan [00:24:23]:
Some of them are coming from this analyst. Some of them are coming from this BI tool. Some of them come from this ETL job that this guy wrote, you know, a year and a half ago, and he's not with the company anymore. And then there's these reporting queries that hit the cloud data warehouse. Someone is doing data science, someone's training their models. You're looking at millions of queries. As someone who spent the last two decades of their life essentially with database systems, I would be intimidated to stare at some of these queries and figure out how to optimize them. It's just not humanly possible, right? But that's exactly where AI or machine learning really comes to the rescue, because, you know, you can think of Keebo or machine learning algorithms in general as an infinitely competent, infinitely patient DBA, right? They can analyze millions of these statistics.
Barzan [00:25:16]:
Look at, hey, how did these variations in the load correlate with the variations in the cost and performance? Right? So for instance, if we just take Snowflake as an example, you know, you have to pick a size for your warehouse. Right? You have to decide, for example, what's your partitioning key. You have to decide how long you wanna keep this warehouse running after the query has finished. If I shut it off right away, well, I saved money. I don't have to pay Snowflake for just keeping an idle warehouse running, because it's pay as you go. But then if the next query arrives and my warehouse is shut down, then I have to spin up a cold instance, and now a query that would have taken a couple of seconds otherwise has to take maybe a couple of minutes, because it has to read everything off of cold storage. Like, okay.
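The suspend-timeout trade-off described here can be made concrete with a small simulation. This is a minimal sketch under loud assumptions: the function name `simulate`, the arrival times, and the cold-start penalty are all hypothetical, cold-start time is treated as unbilled, and real Snowflake billing rules (such as minimum billing increments) are ignored.

```python
def simulate(arrivals, query_seconds, suspend_after, cold_start_seconds):
    """Return (billed_seconds, total_cold_start_delay) for one warehouse.

    arrivals: sorted query arrival times in seconds.
    suspend_after: idle seconds before the warehouse suspends itself.
    Assumption: every query takes query_seconds once the warehouse is up.
    """
    billed = 0.0
    delay = 0.0
    last_finish = None  # None means the warehouse has never been started
    for t in arrivals:
        if last_finish is None:
            # First query always hits a suspended warehouse.
            delay += cold_start_seconds
            start = t + cold_start_seconds
        else:
            idle = max(0.0, t - last_finish)
            if idle > suspend_after:
                # Warehouse idled until the timeout, suspended, and must cold start.
                billed += suspend_after
                delay += cold_start_seconds
                start = t + cold_start_seconds
            else:
                # Warehouse stayed warm, so the idle gap was billed.
                billed += idle
                start = max(t, last_finish)
        billed += query_seconds
        last_finish = start + query_seconds
    if arrivals:
        billed += suspend_after  # idle tail before the final suspend
    return billed, delay


# Hypothetical workload: three 10-second queries, 30-second cold starts.
arrivals = [0, 100, 1000]
for timeout in (0, 120, 1000):
    print(timeout, simulate(arrivals, 10, timeout, 30))
```

With these made-up numbers, a zero timeout bills the fewest seconds but pays a cold start on every query, while a very long timeout avoids most cold starts but bills mostly idle time. The best setting depends on the arrival pattern, which is why a fixed value chosen by hand tends to be wrong for part of the day.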
Barzan [00:26:00]:
So what's the optimal time to shut down a warehouse? And then this warehouse, you know, I bet most data teams say, hey, I need a medium for my, you know, BI workload. I need a large for this. But do you really need a large warehouse twenty four seven? Is your workload constantly, steadily at a level where it warrants a large? Maybe sometimes you need an x large. Maybe it's actually cheaper to use an x large, because you pay more per unit, but then the query finishes, let's say, in less than half the time that it would have otherwise. Maybe it's underutilized. You know, can you wake up your data team, can you page your DevOps team to go and reduce the size of your medium warehouse at 2AM to a small warehouse, and after seven minutes wake them up again and say, actually the workload increased again, go back to the default size? You can't do that, but you can actually train reinforcement learning models, for example, to do that, right? So you just
Warren Parad [00:26:57]:
I mean, you can do that. I feel like there's gonna be a bunch of very unhappy people at the end of the day, though.
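The claim that an x-large can be cheaper than a large comes down to simple arithmetic: cost is the hourly rate times the run time. The sketch below assumes each size up doubles the hourly credit rate (medium = 4, large = 8, x-large = 16 credits per hour); the run times are made up for illustration, and `query_credits` is a hypothetical helper.

```python
# Assumed hourly credit rates; each size up doubles the rate.
CREDITS_PER_HOUR = {"MEDIUM": 4, "LARGE": 8, "XLARGE": 16}


def query_credits(size, runtime_seconds):
    """Credits consumed by one query that keeps the warehouse busy for runtime_seconds."""
    return CREDITS_PER_HOUR[size] * runtime_seconds / 3600


large = query_credits("LARGE", 600)     # hypothetical: 10 minutes on a large
xlarge = query_credits("XLARGE", 240)   # hypothetical: 4 minutes on an x-large

print(round(large, 2))    # 1.33
print(round(xlarge, 2))   # 1.07
print(xlarge < large)     # True: twice the rate, but less than half the time
```

The break-even point is exactly half the run time. If the bigger warehouse finishes in less than half the time, it is cheaper; if the speedup is smaller than 2x, the smaller warehouse wins.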
Barzan [00:27:04]:
Well, there are things that humans can do. Yeah. And there are things that humans want to do. Right? Like, if you find the intersection of what it is that humans cannot do or don't wanna do, and automate that, that's how you've empowered your data team. Right? Like, I've never met a data engineer who's told me, my dream is to wake up at 2AM, reduce the size of a warehouse for seven minutes, and then go back to sleep. Right? Like, I've never seen anyone who tells me, I wish I could just squint my eyes and look at 2,000,000 queries and figure out which one should be routed to which warehouse. But the reinforcement learning agent is more than happy to do that. You just have to have the right reward function, where you penalize the agent every time that it causes a slowdown, and you basically reward that agent every time that, you know, it manages to make some configuration change or route a query to the right warehouse that actually saves money for that customer without actually impacting their performance.
Barzan [00:28:04]:
And that's how you can actually save a significant amount of money, because there's so much variation in your daily workload that if you actually know what you're doing, you can build models that significantly save you on cost.
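The reward function described above (penalize slowdowns, reward savings that leave performance intact) might look something like the following. This is a minimal sketch and not Keebo's actual reward function: the name `reward`, its parameters, the 5% tolerance, and the penalty weight are all assumptions chosen for illustration.

```python
def reward(baseline_cost, actual_cost, baseline_latency, actual_latency,
           slowdown_tolerance=0.05, slowdown_penalty=10.0):
    """Score one action taken by the agent (for example a resize or a routing choice).

    Returns the fractional cost saving if latency stayed within tolerance,
    otherwise a negative value proportional to the slowdown.
    All thresholds and weights here are hypothetical.
    """
    savings = (baseline_cost - actual_cost) / baseline_cost
    slowdown = (actual_latency - baseline_latency) / baseline_latency
    if slowdown > slowdown_tolerance:
        # Any saving is disregarded once the customer's queries get noticeably slower.
        return -slowdown_penalty * slowdown
    return savings


# Saved 30% with no slowdown: positive reward.
print(reward(100.0, 70.0, 10.0, 10.0))
# Saved 30% but queries got 50% slower: penalized.
print(reward(100.0, 70.0, 10.0, 15.0))
```

Making the penalty much larger than any achievable saving is one way to encode "without actually impacting their performance": the agent cannot buy its way out of a slowdown with cost cuts.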
Jillian [00:28:20]:
I know it's coming, but I'm still so freaked out by the idea of having these agents that are just doing stuff. I mean, I guess it's not really any different from your case isn't, like, that different from an auto scaler. And that's a known problem, but just in general, I just I'm not there yet. And I really like AI. I think, like, I'm all I'm all about the AI over here. Right?
Barzan [00:28:41]:
No. I think you're spot on, Jillian. I think the reason why we've been very successful is because, like, we address that elephant in the room. Right? I think there's actually four very common failure patterns with AI in general, with AI adoption. Right? One of them is implementation risk. Right? Like, if you have to convince a team to spend weeks and months iterating with you, doing a POC, learning new things, you know, rewriting their code, then you're already off to a very difficult start. The second risk, I would say, is the adoption risk. Like, people being scared of, oh, what if it's here to take my job? What if it slows things down? Like, I don't wanna be yelled at, you know, for telling my boss that, hey, I'm leveraging AI, and it made a poor choice in the middle of the night and the pipelines failed.
Barzan [00:29:39]:
There's the security risk, like, hey, I don't wanna have to justify why I'm giving access to this third party to come and train on our most precious data. There's pricing risk. This is actually a pretty big risk as well. Like, these days, it's a tangent, but there was a summit that we were attending, and there was a bunch of different vendors there. And our marketing team made an observation that every single booth had the word AI written on it. Even if it had nothing to do with AI, like, you know, they're just, like, selling cookies. I'm exaggerating here, but, like, everyone felt compelled to put AI in their positioning just because it's hype. And what it's done is that it's bad in two different ways.
Barzan [00:30:20]:
Number one, it muddies the water, so people can't tell who's actually leveraging AI and who's just using it as a buzzword. But the second thing is that you've got these CIOs who got really excited. They were sold a bill of goods, but then they were let down. They spent all this money and all this time and resources, and at the end, nothing was delivered. So there's that pricing risk, like, how do I know that this is gonna be successful? I'm not just trusting you. So the reason why Keebo was actually able to create these models and get real-world adoption and have a lot of happy customers is because we address these four risks, like, head on. The first example was the implementation risk. Like, before we even wrote the first line of code, we made this decision that whatever we do should not take more than thirty minutes of one engineer's time to onboard.
Barzan [00:31:11]:
Like, we used to joke that if we take more than thirty minutes of your time to implement Keebo, we're gonna give you an iPad for free. And here I'm sitting talking with you, and we've never had to buy a single iPad. The second thing we did was the adoption risk, right? We told people that our internal motto is: our first slowdown is our last slowdown. So if you think about, like, an autonomous car, like, let's say, you know, a self driving Tesla, to give you an analogy: if you're a vendor that's building autonomous cars, your first crash is gonna be your last crash. You can't just tell people, hey, this is a fully autonomous car, but, you know, almost autonomous, it just has a crash one in a thousand. It's not good enough. If there's a likelihood of a crash, then you're, you know, you're dead in the water.
Barzan [00:31:58]:
So what that means from a design perspective is that in our algorithmic work, whenever we hit a fork, whenever the agent has to choose between increasing the savings for the customer and protecting performance, we take the latter. Because no one yells at you for, why, instead of saving me $102,000, did you save me only a hundred thousand dollars this month? But if you cause one slowdown that makes their jobs fail and their boss yell at them, then they're never gonna trust this thing again. Right? For the security risk, we made this deliberate decision that we're gonna restrict our models to train only on metadata. So we don't even see customer data, right? We train on performance telemetry, all of that. For pricing risk, we came up with this idea. A lot of people were critical of us, like, why are you guys making it so hard on yourselves? But, like, we decided, you know what, we're only gonna charge customers a percentage of whatever we save them. So our incentives are aligned with customers.
Barzan [00:32:49]:
No savings, well, no charge. If we save you $3,000,000, we take a portion of it. If we save you $3, we take a portion. If we save you zero, we take nothing. Right? So those risks are there. You just have to be really intentional about how you design your software in a way that accounts for those risks and addresses them head on. You can't build something and then figure out, how do I convince customers that those risks are not there? You have to build it with these as your guiding principles in your design.
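Two of the design rules above lend themselves to a short sketch: always protecting performance when it conflicts with savings, and charging only a share of what was saved. The code below is a hypothetical illustration, not Keebo's implementation; the `Candidate` class, the `slowdown_risk` score, the zero-risk threshold, and the 25% fee rate are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_cost: float   # predicted spend under this configuration
    slowdown_risk: float    # hypothetical model output in [0, 1]


def choose_config(current: Candidate, candidates: list[Candidate],
                  max_risk: float = 0.0) -> Candidate:
    """Pick the cheapest candidate that is predicted to be safe.

    When savings and performance conflict, performance wins: any candidate
    whose slowdown risk exceeds `max_risk` is discarded, and if nothing safe
    is cheaper than the current configuration, the current one is kept.
    """
    safe = [c for c in candidates
            if c.slowdown_risk <= max_risk and c.predicted_cost < current.predicted_cost]
    return min(safe, key=lambda c: c.predicted_cost) if safe else current


def monthly_fee(baseline_spend: float, actual_spend: float, fee_rate: float = 0.25) -> float:
    """Charge a share of realized savings; no savings means no charge."""
    savings = max(baseline_spend - actual_spend, 0.0)
    return savings * fee_rate


current = Candidate("medium, default suspend", 100.0, 0.0)
options = [
    Candidate("small", 50.0, 0.2),                  # cheapest, but risky
    Candidate("medium, shorter suspend", 80.0, 0.0),
]
print(choose_config(current, options).name)
print(monthly_fee(4_000_000.0, 1_000_000.0))
```

The agent here gives up the cheaper "small" option because it carries some slowdown risk, which mirrors the "first slowdown is our last slowdown" rule.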
Warren Parad [00:33:23]:
So one of the things that I identified early on with the AI hype bandwagon is, I think a lot of companies were using AI on their marketing pages as a proxy for, I don't actually know how to talk about the value our product delivers. So I'm just gonna put these two letters on there and pretend that that means something to someone, and they'll bring their own ideas about how that could be valuable. And I think before that, we saw similar things happen in the past. I think just the speed and the velocity of change that's happened for the AI cycle has been so fast that it's really easy to see, like, innovation hitting the market, outside of academia, because we all know AI has been around for much longer than just the five years now. You know, we're going back twenty, thirty, forty years, where there's lots of papers out there, but in business, realistically, we can actually see the change from innovation all the way to exploitation. And I still think that we have the same number of companies that are start ups, or even big giant Fortune 50 companies, that honestly have no idea of how to actually convey their value effectively.
Barzan [00:34:39]:
No. That's fair. And I think that's a big challenge. I think it's it goes both ways. Right? You've got at the top of the food chain, right, like CIOs and CTOs who hear these buzzwords, and they feel like we have to do something about it. The board is asking about it. I'd like, you know, that we gotta do something about it. And then on the other side of the equation, you've got sometimes ICs who are worried about, like, their jobs.
Barzan [00:35:08]:
Like, hey. If we, you know, we adopt this thing, then what's gonna happen to my job? And, like, my reaction to that usually is, like, if you worry that AI is gonna take away your job, it's probably going to, to take away your job, right? Like if you like resisting it because it's just like laws of physics, right? If you're standing on the wrong side of history, it's just a matter of time, right? Like you cannot, it's happening, right? The best you can do is to sort of empower yourself with more knowledge of how to best leverage it. Like there's a there's a huge market for, engineers and and, DevOps folks who understand AI. They know how to best leverage it. Like, you know, MLOps is a thing, right? Like, you know, how to, like a lot of the machine learning experts that come out of academia don't have the faintest idea of how to deploy something to real world. Like, so like, you know, you need these engineers who can just who understand the high level concept and they can, you know, you can partner them closely with your, machine learning researchers and and experts to sort of build stuff that can actually get deployed, get trained at large scale and get trained and and have, the right level of robustness and and reliability. So there's a lot of things that people can do to protect their jobs. Just, you know, go take an online class and and brush up on your stats class and and, you know, take a machine learning course.
Barzan [00:36:33]:
Try the few tools that are out there. One of the biggest, anti patterns I'm seeing these days is, which I think has plagued the software industry on the consumer side of things, is is this, unreasonable urge for building versus, you know, for build versus buy. And I think significant amount of engineering cycles are getting wasted by people giving into the their own natural instinct of, oh, I just wanna build everything in house. And you'll be surprised, like, very few CIOs and leaders are able to sort of tell what's the right time, what's the what's the right thing to build versus buy. And I see people get that wrong all the time.
Warren Parad [00:37:17]:
I I I liked your call out here on where you should be concerned and how to train yourself or grow further. I I mean, the the idea that if you fixate on the fact that your job is gonna go away, then it probably is actually really reminisces for me a concept from, of all places, Hawaiian shamanism, which is, like, if you fixate on this thing, you're actually bringing it into reality. You are making it the the case. And I I do really think that manifesting. Yeah. For sure. So I do think that there is a lot there. Like, if you wanna be, like, you can figure out what your job should be and what you wanna be an expert in and how to achieve that.
Warren Parad [00:38:00]:
And maybe it's not a fit for your current company. But for sure, if you just worry about, the fact your your job may or may not be going away, there's definitely an aspect of, and this is something that I I've picked up recently, and I've been trying to live by. It's not necessarily the easiest thing, but I think it's ancient, Confucius wisdom here that if you if you worry about the future, then you you cry twice. You you you feel the pain twice. You know? It you know? There's something you can do about it right now. And rather than worry about a future that probably won't even come, do that thing. And and if it does come, then you're at least prepared.
Barzan [00:38:39]:
Oh, 100%. No, 100% agree.
Warren Parad [00:38:43]:
I'm sort of curious about the verticals that you see. I mean, we talk about data intensive systems a lot. And, like, what falls into that category? Like, concrete thing. Yeah. What kind of data?
Barzan [00:38:57]:
One of the interesting things that again has happened with the rise of, I mean, there's a reason why Snowflake had one of the largest software IPOs ever. One of the changes that this new breed of technology has made in the way that data is being consumed is that it's become, number one, size agnostic. Like, back in the day, if you were a bigger company, you had more data. If you were a smaller company, you probably had small data. And then there were certain industries, like, you know, tech was known to be much more data savvy than, for example, government, or, you know, healthcare was a lot more protective of their data, and, you know, there were certain segments or sectors of the industry that were more data driven. I think what we're seeing is that it's penetrating everywhere. Like, I was talking to a local government in one of the states where you wouldn't think they would be looking at Snowflake. And they're like, no, no, no, we gotta get on that.
Barzan [00:40:00]:
We gotta get on that, you know, cloud data warehouse for these five reasons. This is what we're trying to do. I was like, do you guys even have the budget? They were like, but that's irrelevant. We gotta do it for these reasons. So that's from a sector perspective. But the other thing, which I think is even more interesting, even from a sales and go to market perspective, is that you have no idea how much a customer is spending on their data infrastructure by looking at the size of that company. Keebo has customers who are spending north of $15 million a year just on their Snowflake bill. And they're a tiny company.
Barzan [00:40:41]:
Like, you know, not like two or three people, but they're less than, like, you know, 500 employees. And then we're working with, like, these massive multinational grocery stores where the entire cloud data warehouse bill is only $200,000, right? So I think that's interesting to see in both directions. Like, the data is growing vertically, but also horizontally across these different sectors. I wish I could give you an, you know, easier answer and say, oh, it's only, you know, fintech and retail, that's where we're seeing all this. But it's not like that. You know, we're seeing it in healthcare. We're seeing it in government. We're seeing it in, you know, all sorts of verticals. And the only common denominator is that companies have realized that they need data driven decisions.
Barzan [00:41:33]:
They've realized that they either have the data or they have part of the data, and they have to go and acquire other data, whether from their CRM, from their marketing tool, from their website traffic, from third party vendors. There was this insane, I don't remember, I don't wanna misquote this thing, but there was this insane report I read lately, which was saying companies now on average combine X number of data sources, and X was an insane number. Like, one of those numbers you're even embarrassed to quote. It was definitely north of 20, and it's insane if you think about it. Like, that's a lot of complexity right there, right? Like, companies should not have to deal with this with their own resources. If you're a bank, you gotta focus on what's making you a differentiated bank. If you're a marketing company, you have to focus on your core business. You shouldn't be in the business of building and optimizing your own data infrastructure. You gotta, you know, you gotta automate that part.
Warren Parad [00:42:31]:
I think part of the problem here is, I think it's sourced from humanity, this idea that growth equals good and that your total addressable market can actually increase in size over time, and you can make it happen. And these companies are lacking ways of growing still just a little bit bigger, and so they're spending a nontrivial amount of money pulling in almost nonsensical data, nonsensical sources, things that aren't so relevant, in order to increase their market share by, like, pips, you know, hundredths of a percentage point, because that's all they can do. But if you realize that your market is only so big and that's it, you know where you should optimize and potentially just stop there. Focus on cost reduction, on optimizations in what you're doing, rather than trying to add yet another product or another feature or service in a way that doesn't really add fundamental value to your users. You actually stepped in this and you opened the door, and I wanna ask you about this. I feel like since the exploitation of LLMs and the data that we have that's been created since the Internet was conceived of as an idea, we're losing public access data. Like, the datasets that are available just from scraping individual websites, or just freely available, I think, are actually decreasing. I dare say that the end of the Internet has come, or it's on its way.
Warren Parad [00:44:00]:
That connectivity is no longer what we're optimizing for. And I'm wondering where you see this going. Like, is it private datasets that are curated and data brokers? I I know that CloudFlare has jumped up and down and said, hey, look, we can do this. We're already blocking bots, and we know what data is being transferred and who has websites on our platform. We will sell you this data, AI scrapers, that you can go purchase from the data sources. Is that the future?
Barzan [00:44:27]:
I think people are going to move on. It's hard to predict the future, but it's also easy, because, like, no one's gonna remember to come back and hold you accountable for a misprediction. So I didn't call that out, but I think, you know, within the next decade... The thing is, like, if you spend too much time in any particular area, you can see things that are pretty obvious to you, but maybe they sound weird to others who have not been following that thing. A lot of things might be a surprise to others. For example, the success that ChatGPT had was a surprise to a lot of people, but not to those who were tracking the progress over the years. So I think in terms of data and selling data as an asset, I think we're actually already moving past that, right? Now people are selling agents that are trained on that data. Like, you know, there's a reason why there's all this excitement, like, you guys have seen the news about DeepSeek and what it means for, you know, the use of GPUs and, like, the investments that companies like OpenAI have made. But the bottom line is that there's an arms race. Like, you basically train these AI agents, instead of having companies just go and purchase this data and then clean the data and then combine the data and then build apps on it and then monetize it and then maintain it and tune it. Like, you just buy these agents.
Barzan [00:45:49]:
I think we're past selling data and we're in the place where we're selling agents that are already trained and ready to be deployed.
Warren Parad [00:45:57]:
Deployed. If the data goes private, though, no new agents are gonna be able to be spun up. So, you know, from that standpoint, we're at the road's end of where the AI innovation can take us. Like, I feel like fundamentally, in order to keep evolving and innovating, we still need new, fresh sources of data, combined with all of humanity's collection so far, in order to actually train on all of it and get the most effective agent being built. Or am I missing something here?
Barzan [00:46:25]:
I mean, in theory, but if you think about it, the majority of humanity's data is actually held by a minority of humanity. Right? There's, like, three big players. Like, I mean, that's the almost sad part of, like, how consolidation has been working. Like, you know, you have two or three major providers who are seeing and recording and monitoring 99%. I mean, that's a tough number to quote, but, like, if you think about it, you know, Google doesn't need me to send them a copy of my hard drive. Like, they see my emails. They see basically my, you know, usage pattern on my Android. They see, like, you know, the content that I'm consuming.
Barzan [00:47:12]:
They see the books that I'm basically searching for. Amazon knows the items I'm buying. They're looking at every book that I'm reading. Like, they have a lot of this data that, you know, at least in The US, like, I can't speak for Europe. I think they have much better laws, when it comes to privacy protection. You don't even think twice about, you know, clicking and saying I agree to these terms of use. And and I think they have the majority of that data. Like, will we be better off if everyone shares everything and then, you know, we build this stuff? I don't know.
Barzan [00:47:43]:
I think it easily gets into the area of security and privacy, which I don't know anything about. But I think if that was not a concern, probably the answer would be yes. But I know that is a concern. I also know that there's very few players who have already, you know, plenty of data. I mean, OpenAI has the data that they're actually scraping. But is it gonna plateau? Probably. I think these things are gonna become a commodity. These agents will become a commodity. You know? The arms race will not continue, and then we'll move on to the next thing after that.
Warren Parad [00:48:14]:
Well, that's an interesting point. Like, there's this idea in biology where you just need a limited set of unique individuals in order to propagate the species without too many mutations, of which it will, you know, collapse under inbreeding, basically. Like, maybe there is some set of data where we only need that much in order to be able to create even the best trained agents that we possibly can. Additional data won't help us in that way. And maybe we've gotten that. Maybe we'll get it
Barzan [00:48:46]:
That's actually the crux of learning theory. Right? That basically the error will go down, you know, one over n, where n is the size of the dataset. Right? So that basically means more training data at some point is not gonna significantly reduce your error. Obviously, that depends on the sparsity of the data. You know, the whole idea behind VC dimension and whatnot. But the main idea is this. Like, you know, I know we're not really good at, particularly, you know, predicting election outcomes. But the idea of, like, these election surveys is exactly the same thing, that you don't need to go and ask, you know, every one of the 300,000,000 voters.
Barzan [00:49:25]:
You know, if you have a sample that's large enough past that, you're not gonna significantly increase the accuracy. I think that's that's definitely true that there's a diminishing return. It doesn't mean more data is not gonna help. It means that there's there's a diminishing return.
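The election-survey point above can be made concrete with the textbook margin-of-error formula for a sampled proportion. This sketch assumes simple random sampling and a 95% confidence level; in this particular setting the error shrinks like one over the square root of n, and the exact rate in other learning problems depends on the assumptions. The function name and sample sizes are chosen only for illustration.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from n respondents.

    Assumes simple random sampling; p = 0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}: +/- {margin_of_error(n) * 100:.2f} points")
```

Going from 100 to 10,000 respondents cuts the margin of error by a factor of ten, but each further tenfold cut costs another hundredfold increase in sample size, which is the diminishing return being described.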
Warren Parad [00:49:43]:
I can't see Jillian, and I think that means she has some secret questions that she's just trying to figure out how to put into words.
Jillian [00:49:52]:
Yeah. Because I kept I kept freezing when I have my camera on, so I thought that I'd turn it off for a little bit. Yeah. Yeah. I definitely agree with you guys on the like, you can keep adding more data, and that doesn't necessarily make it better. But we're also always getting new data. Like, we're always producing, like, new and new and different data, and we need the new and different data too. So I work with, like, a lot of medical data, and we're kinda constantly changing just everything, the resolution that we can, you know, see the data at, the amount, just more insights, more more everything.
Jillian [00:50:27]:
So I don't know. I have very mixed feelings about this, because I've definitely been on projects where somebody's been pushing more to, like, we'll just make it better. Can't you just add more data? And I'm like, no. You see the last three datasets that we added to train it? They didn't actually do anything. Like, here's the graph. They're like, more data. And I'm like, well, you're my boss. So, like, okay. But this is silly.
Warren Parad [00:50:46]:
I mean, I'm I'm with you, and I also think that the medical industry, that vertical is actually more unique in this way. I think our lack of full understanding of even our human bodies, but organic material organisms in in general means that we could benefit from having more data there realistically. And I feel like there's there's there's so many things that we haven't figured out there. The other verticals, I I question a lot. Like, I've worked, I think, five different companies now in total, separate from all of the consulting that I've done and advising. And all of them were like, our data is precious. We must save all of it. And I'm like, you don't need that data from ten years ago where you were measuring the deviation on vibration tests of this one product that you don't even manufacture anymore.
Warren Parad [00:51:33]:
Like, do I I assure you, you can throw that away. It's not going to help you. And yet they're like, we gotta keep it. I'm like, okay, AWS, you know, Glacier. Yeah. I think you're right.
Barzan [00:51:47]:
Well, storage is cheap. Right? But I mean, you're right. To your point, I mean, medical data, I remember I was working with one of my, colleagues from the med school, and we were trying to predict, he was a cardiologist, and we're trying to predict, train models that predict the chances of an organ after a, I forgot the medical term. Like, you know, when they basically do an organ transfer, the body the the host body, there's a chance that might reject that organ and they use antibiotics or whatever to suppress the immunity system and whatnot. And there's, you know, complications, all of that. And and the idea was to predict the risk of an organ rejection. Yeah. It's called transplant, trans
Warren Parad [00:52:30]:
Transplant. Yeah.
Barzan [00:52:31]:
Transplant. I think that's the medical term for it. But I remember, you know, they were saying, like, you know, at UMich, we have one of the largest cancer datasets on the planet. And then when we looked at it, it was a number that, like, I forgot the exact number, but it was something, like, close to 300. And I was, like, how is this the largest dataset on the planet? It's just, like, medical data by definition is way more sparse, because there's, you know, there's only, whatever, a 6 or 7 billion cap on how many you can collect. And for any particular disease, there's a very small subset of them you have access to. So I don't think the law of large numbers applies to anything that's about this. I mean, with DNA and stuff, that's different.
Barzan [00:53:10]:
But, like, you know, when we talk about individual humans as data points, I agree. I think that's that's probably an exception. I don't think we're at the place where we don't need more data.
Warren Parad [00:53:19]:
I mean, we
Jillian [00:53:20]:
We need the data science companies to just go sit off in, like, a corner for this conversation when we're talking about, like, you know, building agents off of data and how much data should we have and when do we stop. Because, like, medical data, climatology, I don't think the answer is ever, or not right now, anyway. It's not anything I can see.
Warren Parad [00:53:39]:
Yeah. I mean, I'm with you. I think the problem in the medical field, though, is that it's not public. I feel like with climate data and tracking, like, there's a lot out there. Whereas in the medical field, like, that's controlled by private entities who are bound by local regulations on even sharing that, which is in a ridiculous way. And there are companies out there that do anonymized data exchange in the medical field specifically to sort of help overcome this problem. And, you know, like, there's not a benefit for the patients. There's not a benefit for the providers, for the government.
Warren Parad [00:54:11]:
Like, there's very little benefit here except for the end company, who may be able to use all this for the good of humanity. And that's a hard sell, I think, when there's dollars on the table on the other side.
Jillian [00:54:24]:
And a lot of medical data is supposed to like, if it's used for research, it's supposed to be public. I mean, it's not always or it's, maybe not in, like, organized in such a way that it's even usable or, like, there there's a lot that can go wrong with that. But there is a lot of medical data that's public.
Warren Parad [00:54:40]:
Well, I think part of it is legacy systems that aren't optimized for even storing the data in an electronic medical record format, like, if it's not electronic. Now you end up, before we were talking about hallucinations in the world, which is, you know, still something the AI field focuses on, we had the giraffe problem, where looking at an image from a medical document would likely render positive on whatever the diagnosis was that you were trying to track, just from the existence of a ruler, or because it was an X-ray, or things that had nothing to do with the actual information that was contained in the document. So I don't know. I'm with you. More data in the medical field, for sure. Anyone who's working on that, like, you know, don't stop.
Jillian [00:55:30]:
Always more data. Yeah. I don't know. Storing biological data is, like, such a problem too, but that's that's probably another, that's another topic for another show, I think.
Warren Parad [00:55:40]:
Well, now now you've got me.
Jillian [00:55:41]:
A lot of, like, the cloud things, we could probably have AI agents, because a lot of it is probably autoscaler problems. But I think, doesn't AWS kind of shut a lot of those down? Like, I was talking to, I forget what, like, a vendor through a client, and they told me, like, yeah, AWS basically made our, like, business model obsolete, because we're trying to save money. Although this is a lot of hearsay, so I'm not sure that I should be repeating this. But, anyways, it did seem like they had something where they had, like, agents or AI running around in the background to try to cut down on costs, and it was not well received.
Warren Parad [00:56:19]:
I mean, if you if you build something on a hyperscaler, there is a there is a chance that they will find a way to to recapture that value and claim it for themselves. Like, if every one of your customers needs to do something, it benefits everyone to bring that value that you can deliver back into the platform. So I and I know AWS is actually pretty good about doing that, rather than forcing everyone to use a third party company to achieve the the same, benefit. Like, you know, it it it's surprising to me that companies like Snowflake or Databricks, there's a couple other ones out there. I think Datadog's another good example. There's companies that just sit around and help customers spend less money on these platforms. And if and if that was me, like, if I'm Snowflake or Datadog, I'm just like, okay. I think it was, like, Coinbase was spending almost a hundred million dollars a year on, just data analytics coming from their platform.
Warren Parad [00:57:13]:
And they weren't very big when this got reported. And then they're like, we're gonna have to do something about this, because that's apparently too much money. And that is a lot of money to be spending on it. Mhmm. It's just a bit ridiculous, because if you know lots of customers have this problem, like, you would think that lowering the price point in some way, not by changing your pricing, but by doing those optimizations, helps all of your customers in some way. Otherwise, they're gonna pay a third party company to help them do the same thing anyway. So I think over time, as you get more and more customers who all have similar problems, they have no choice but to bring that effort in house, either by buying a company that is doing that for them or spinning up their own internal version of it to optimize.
Barzan [00:57:59]:
I think there's two parts to it. One of it is, like, why would a big vendor invest in reducing their own revenue? Right? Like, you know, Snowflake's, you know, stock price is a function of their revenue. Right? And if they wanna reduce their own profit margin or actively be in the business of reducing their revenue, I think that will not go very well with the shareholders. But the other part is, like, focus. Right? Like, you know, as a vendor, you always have to protect the main body of your revenue. Like, this is like the innovator's dilemma. Like, you can't work on niche opportunities. Like, your job is to build a database that anyone on the planet can use, right? Now, what's gonna optimize this kind of workload might be different than what's gonna optimize this other customer's particular use case.
Barzan [00:58:51]:
And that's where I think startups excel a lot. But I think they also realize that there is a reason why, like, we're partners with Snowflake. There is a reason for this: they see value in us serving their customers almost in an unpaid customer success capacity. I sometimes joke that Keebo is Snowflake's unpaid customer success department because, like, we prevent their customers from churning. Right? Like, at the end of the day, if I'm spending a lot of money and I'm not able to get all my use cases onboarded, and I'm under, you know, pressure and the CFO is yelling at me, I'm gonna look outside. So Keebo is helping customers get better performance per dollar for the budget that they have. So I think there's also that bigger picture, although, you know, sometimes the sales reps don't share that same compassion. No.
Barzan [00:59:44]:
I mean, they're thinking one quarter at a time. Right?
Warren Parad [00:59:46]:
Yep. Yeah. I think that's the biggest problem. If you look at the brand of a large data company, or even any large company, you have to look out multiple years. And you're absolutely right. Like, the value that you're providing them as part of the Snowflake network is higher than the amount that it would cost them to maintain that same piece of functionality internally, or the amount of revenue that they would lose if, say, all their customers had access to that functionality just straight away or it was automated in some way. So, I mean, if you look at that equation, then realistically, you know, you want everyone in the network to be happy in a way. And so if what makes them happy is that there's little startups out there that are helping them reduce their bill a little bit, then you let that be the case.
Warren Parad [01:00:31]:
I mean, the economics obviously change at larger scale when all of your customers have this problem, or they're all unhappy because of how it's going. So I think we're about at the hour now, and I feel like this is a good point to, you know, maybe say, like, okay. Is there maybe one last thing that you just wanna share? Something you feel like maybe we didn't touch on that could be an interesting topic. Something to close out the episode with.
Barzan [01:01:00]:
I think when it comes to software design, there's one thing I've recently seen explained very well. Sometimes, technical people like to have a lot of knobs because, you know, we usually think more flexibility means more options, means better adoption and all of that stuff. I think one of the things we've learned the hard way is that actually the fewer choices you give people, the more likely it is that you'll get adoption, right? This week, you know, I read this somewhere, and it was summarized pretty well. Apparently, there was a very successful shoe salesman in LA back in the fifties, and they interviewed him and asked him, like, what's your secret? And he said, my secret is the law of two, not three. And they asked him, what do you mean by that? He said, whenever a customer asks me to bring down a pair of shoes that they can try, and then they ask for a second one, I give those shoes to the customer as well. But if they ask for a third pair, then I ask them, which of these two would you like me to put away? And the reason is they figured out that if they give customers three choices or more, they're likely to buy none. But when they give them two choices, they're likely to pick one.
Barzan [01:02:22]:
And I think that actually applies in some really profound ways to software design and AI adoption. If you basically overwhelm people with, like, 20 different knobs, then essentially you're sending this message that I don't know how to tune this for you. I'm, you know, throwing all of this over the wall. You have to figure it out. And by the way, if you get any of them, you know, wrong and things go sideways, you have to own that decision. But if you can simplify that, you get a lot better adoption. Like, for example, at Keebo, we give a slider to customers where they can choose between best performance, you know, good performance, balanced, high savings, highest savings. These all, you know, turn into a vector, a bunch of numbers, and, you know, linear algebra and all these operations. But we give these high-level things to customers, and 99% of the time, we actually get better adoption with it.
Barzan [01:03:24]:
So I just thought this was an interesting quote from this, you know, shoe salesman from the fifties where, you know, what is it now? Seventy-five years later, a lot of us still, like, you know, overlook it in software design. I thought it's an interesting observation to share.
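The slider described above can be pictured as a small mapping from one named position to a vector of internal numbers. The following is only a minimal sketch under stated assumptions: the five position names come from the conversation, but the knob names, their ranges, and the linear interpolation are hypothetical illustrations and say nothing about how the actual product works.

```python
from dataclasses import dataclass

# Slider positions as named in the conversation, ordered from performance to savings.
PRESETS = [
    "best performance",
    "good performance",
    "balanced",
    "high savings",
    "highest savings",
]


@dataclass(frozen=True)
class TuningVector:
    # Hypothetical internal knobs; the episode does not describe the real parameters.
    max_queue_seconds: float        # how long a query may wait before more capacity is added
    idle_suspend_seconds: float     # how long an idle warehouse stays up
    downsize_aggressiveness: float  # 0.0 = never downsize, 1.0 = downsize as much as possible


def preset_to_vector(preset: str) -> TuningVector:
    """Translate one slider position into a vector of numbers by linear interpolation."""
    # list.index raises ValueError for an unknown preset name.
    position = PRESETS.index(preset) / (len(PRESETS) - 1)  # 0.0 .. 1.0
    return TuningVector(
        max_queue_seconds=position * 60.0,
        idle_suspend_seconds=600.0 - position * 540.0,
        downsize_aggressiveness=position,
    )


if __name__ == "__main__":
    for name in PRESETS:
        print(name, preset_to_vector(name))
```

The user only ever sees the five names; the numbers stay behind the slider, which is the "law of two, not three" point applied to configuration.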
Warren Parad [01:03:39]:
Yeah. I also forget that we're in the twenties now of the two thousands. So, you know, when doing math for last century, that still trips me up. No. I think it's a really great point. I think it's a really interesting perspective there, which goes in the direction of developer experience and user experience, for not just selling the product, but making sure people actually understand what they're doing. And there is an aspect of decision paralysis there that really drives what people are going to do or how they're gonna use the tool effectively. Okay.
Warren Parad [01:04:15]:
Well then, Jillian, should we move on to picks? Sure. What do you got for us today?
Jillian [01:04:22]:
I'm gonna pick Infinity Nikki. It's a video game, and it's just this open world game where you just run around and you just try on pretty dresses, and it's nice. And it has very, like, satisfying mechanics of jumping off buildings. That's it. That's the game. I think there is actually more that you could do in the game, but there's not more that I'm going to do with the game. So that's, like, the extent of my knowledge.
Warren Parad [01:04:45]:
I think what everyone needs an answer to is how much AI is in the game.
Jillian [01:04:51]:
I don't know. I don't know. Maybe I think it's all procedurally generated. I don't think it has, like, any AI anything.
Warren Parad [01:04:58]:
So what you're saying is there's some future DLC for the game studio that's coming?
Jillian [01:05:04]:
I have been wondering, like, if video games are gonna start to make NPCs, like, if they'll just have them be, like, agents, or not agents, but those will all be AI. So then you could, I don't know, ask it for a cake recipe or something. But, like, that does seem like some low-hanging fruit for the video game industry, to just do that. But I don't know if it would be, like, cost effective rather than just Yeah.
Warren Parad [01:05:26]:
That's script. Yeah. I could say, you know, the video game industry notoriously super high margins and lots of extra capital to spend. Yeah. They
Jillian [01:05:35]:
do. I don't know. So I don't see it happening there, but maybe it'll come up someplace else. Yeah. Because video games are interesting because, like, everything else can be procedurally generated. So I don't know where you put AI.
Warren Parad [01:05:45]:
I mean, I know some people that are using the foundational models that are out there to sort of make single-player games, as far as engagement, getting them to basically DM for you or be the game master for you as you sort of play the game. So things that would have normally required multiple other people, turning it into a single-person experience. So I think it's possible. But, yeah, I think the cost is gonna be okay. I like that pick. Barzan, what do you got for us?
Barzan [01:06:19]:
An interesting book, I think, that I would probably recommend. It's not really related to engineering, but I read a lot of books, you know, I really love books, and this month I think this is one of the good ones. It's called Never Split the Difference. It's,
Warren Parad [01:06:41]:
Chris Voss. Yes.
Barzan [01:06:42]:
I've read that book. It's amazing. It just talks about the different characters of people when it comes to negotiations. Like, it talks about, you know, you've got the analyst, you've got the accommodator, and, sorry, you've got the, you know, assertive type, right? So if you're an accommodator, and you talk to an assertive person, you're just giving them an opportunity to socialize with you and that's just offending them and things like that. So I thought it was pretty interesting, like, a lot of those things where you kind of learn from muscle memory, and I think if you're more intentional about it, it just makes you a lot more effective in day-to-day communications anyway, not just in negotiation. So I thought it was an interesting book that I have to learn to No.
Warren Parad [01:07:27]:
I actually really liked it. One of the things that I took away from it that's helped me a lot is, like, I always thought the idea of a win-win scenario was made-up nonsense. But the way he puts it in the book is that you're optimizing for certain things, and the other person's optimizing for different things. And you can both optimize for the things that you want as long as you make that information public and you share it and you converse about that. As long as you keep it hidden and secret, then you can't ever really get the other person to move on that potentially. So I think about, like, salary negotiations in engineering. I don't have them at my company for engineers that we hire. We don't just say, like, hey.
Warren Parad [01:08:10]:
You know, this is how much you get. If someone wants more money, we have a conversation about, like, what is that expectation that comes with the change in salary? It makes sense to talk about that. If you want this, then there's this other part that's important for us. Like, for instance, for people that wanna be, say, a senior engineer and we think they're more at, I mean, the engineer level two level, we would say, like, well, there's higher expectations. And that means that if you don't meet these expectations, there's a greater chance that we'll have to either reduce your level in the future or we'll have to let you go. So, you know, is that a risk that you wanna take? Increased risk for increased reward potentially. No. I really like the book.
Warren Parad [01:08:50]:
So Yeah. Yeah. I And
Barzan [01:08:51]:
then, yeah, like, you know, one of the examples they give about this win-win situation is, like, if you have a hostage situation where they have, like, you know, four hostages and they threaten to kill all four of them, and you say, how about, you know, we meet in the middle and you only kill two? He says this idea of meeting in the middle is ridiculous. Like, you know, it doesn't work that way. You really need to know what is the outcome you're driving towards. And I think there's a lot of interesting takeaways from that book, like the one that you mentioned. Yeah.
Barzan [01:09:25]:
I mean, I
Warren Parad [01:09:27]:
Yeah. No. I think it's a great pick. My pick today is gonna be the L8 conference, which this year was in Warsaw, and which I just got back from speaking at. I did a short talk about building highly reliable software and why having five nines is nearly impossible, more so than anyone thinks. So if the L8 conference is, you know, in your area and you're thinking about where to go, I also highly recommend this one along with what I said last week. So that's it for today's episode. I wanna thank Barzan for coming as our guest, and I wanna thank the audience and all our viewers for listening to this episode of the podcast. And that's it.
Warren Parad [01:10:12]:
And have a good rest of your week until next time.
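For a sense of scale on the "five nines" remark in the picks, the arithmetic behind an availability target can be sketched as follows. This is a minimal illustration that assumes a 365-day year and treats availability purely as a yearly downtime budget; the function name is hypothetical, and the sketch is not taken from the talk itself.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (5 means 99.999%)."""
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = allowed_downtime_seconds(n) / 60
        print(f"{n} nines: {minutes:.1f} minutes of downtime per year")
```

At five nines the whole year's budget is a little over five minutes, so a single slow incident response can use it up.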
Warren Parad [00:00:01]:
And, we're live. Welcome back, everyone, to another episode of Adventures in DevOps. Hosting today with me is Jillian. And, Jillian, are you looking forward to today's episode? Did she freeze?
Jillian [00:00:13]:
I did freeze for a few seconds, but
Warren Parad [00:00:15]:
Oh, you're good.
Jillian [00:00:15]:
Now I'm gonna assume that I'm gonna say hi here. Hi, everybody. I hope I don't freeze again, but I don't really know what just happened.
Warren Parad [00:00:25]:
Yeah. I mean, when we planned this episode, I had a strong sense that this was gonna be a popular choice for you. And that's because today, I want to invite Barzan Mozafari as our guest, an MIT alum and University of Michigan associate professor working with data-intensive systems for over fifteen years. Welcome.
Barzan [00:00:42]:
Thank you so much. Great to be here with you.
Warren Parad [00:00:45]:
Yeah. You know, I have worked in a lot of engineering organizations, and data has always been this area that no one wants to touch. There's stuff going on there. And it just always seems like there's bigger business problems that are at play or there's other challenges present. But I always got scared when the topic of interfacing with one of the data teams came up. And I noticed from your history, your profile, there's a lot of different aspects of data systems that you seem like you have experience in. You wanna talk a little bit about that?
Barzan [00:01:25]:
I agree with you. I think data can be pretty intimidating depending on what people make of it and what's the expectation. But typically it's considered like the gold of the modern digital era. So there's usually a lot of potential, and like anything with a lot of potential, you know, sometimes it comes with anxiety for the data teams or those who have to interface with them. But yeah, throughout my career, as you pointed out, like, at this point almost the last two decades, I've been working at the intersection of machine learning and database systems, essentially pursuing this idea of how we can leverage statistics or AI in general to build smarter data systems, where smarter could mean faster, more scalable, easier to use, easier to deploy, etcetera. Some of the work we've done is now part of open source transactional databases like, you know, MySQL and whatnot.
Barzan [00:02:18]:
Running on millions of servers, but that was all, like, open source work. Some of the other aspects of my work have been on analytical databases, cloud data warehousing and whatnot. Some of the work we did on approximate query processing, for example, is a good example of combining systems and statistical learning theory that, you know, got commercialized and eventually acquired as part of the SnappyData product. But, you know, the latest spin-off is some of the work we're doing now with cloud data warehousing and building a data learning tier. So we've worked at a very high level, like, you know, from insight generation and, you know, root cause analysis all the way to almost the metal, like how to, you know, run queries more efficiently at the CPU level or the GPU level. And, you know, the common theme is that it's all fun. Anything that has to do with data, with machine learning, I usually get excited over it. So I could be talking all day about those different aspects, but that's the common theme.
Warren Parad [00:03:21]:
How did you get into it? I mean, I don't think you get very far in academia. Like, did you think that maybe when you were a younger student that this would always be an area that was most interesting, or did it fall into your lap one day based off of, you know, experiments or labs that you were working on and this seemed like the most interesting thing?
Barzan [00:03:45]:
That's a very good question. Like most things that you end up liking, it's hard to tell when it actually started. It's usually so subtle that you can't really tell where it started. But no, actually, my passion started in algorithms. I remember, like, you know, from early days, I was into math and statistics and figuring out, like, you know, how many tries it's gonna take to get to a particular outcome with high confidence. So, to be honest with you, I was always intrigued by the power of statistics and, like, you know, seeing how you can get a lot further ahead in life if you know more about statistics than most people. Like, something as simple as, you know, people playing heads or tails, right, with coins. Like, you know, if you know what you're doing, you can sort of come up with creative ways.
Barzan [00:04:35]:
But, no, I think a lot of it started when I went to grad school. I started my PhD program, and my adviser was a legend in relational database systems, but he was also seeing the potential. At the time, data mining was the hot thing. Right? And that was the foray into statistics and then, like, you know, later applied machine learning, learning theory. It was a progression. And I think the trends that we were seeing in the industry were also helping with that. But, to your point, there's a lot of people in academia who are just kind of content with coming up with cool ideas that just remain as that. They're ideas and they'll always remain as ideas.
Barzan [00:05:21]:
But that always, like, you know, left me a little bit disappointed, that an idea that I thought, hey, has a lot of potential would never make it to production systems. So a lot of what I did early on in my career was working with industry partners, partnering with different companies, and trying to get adoption for free. Some of it was open sourced and got massive adoption. But some of it was, like, pulling teeth, to go and, like, you know, convince a bureaucratic enterprise why this is in your best interest to adopt it. And by the way, we don't expect any money from you in return. I'm just driven by the impact. Right? But at some point you realize, look, you've gotta put your money where your mouth is. And at some point, like a lot of entrepreneurs, you realize, hey, you know, an idea is nice, but it's the execution that matters.
Barzan [00:06:10]:
And then you just spin it off. And that's how I started commercializing some of these ideas, with the main motivation of closing that gap between, you know, theory and practice. Like, you know, taking something that's solid and works, and getting it into a place that's consumable by data teams, by products, and has a real-world impact. So I think that's where a lot of that early interest evolved into.
Jillian [00:06:35]:
I think that's really interesting that you were able to kind of bridge the gap between research and academia and getting to really build stuff, because I think that's a tough one. That's a tough one for people who are in academia and get kind of frustrated by the process, you know, for the reasons that you described, to figure out what to do. And I think most people just end up jumping ship to industry. So
Barzan [00:06:58]:
No. That's true.
Jillian [00:06:59]:
That's really cool that you could find a bridge there.
Barzan [00:07:01]:
Yeah. I won't lie to you. It's not as easy. A lot of people fall off that bridge as they try to cross it. A lot of them fall into the water.
Jillian [00:07:09]:
I'm one of those people. Like, it's okay.
Barzan [00:07:12]:
I think what happens is that in academia, we have this system which is kind of designed, how should I say this? It's designed to kind of reward complexity, right? And, you know, if an idea works but is simple, it's not as rewarded. I remember we came up with this algorithm that was improving the average performance of queries by a significant margin. I forgot the number, but it was something close to an order of magnitude. And then we submitted this paper to this extremely prestigious academic conference. I won't mention the conference name so people don't get offended. It's the top conference in our area. And then one of the reviewers, their feedback was like, hey, if this was an important thing, someone else would have done it by now. And that just, like, you know, kind of rubbed me the wrong way, and I was like, if we go with that mindset, nothing gets done.
Barzan [00:08:10]:
Because if you take that idea and actually apply it to life, nothing gets done. Because with anything that you do, people can say, hey, if this was an important enough problem, someone else would have solved it before. So that actually kind of encouraged us and motivated us in the right way, where we sort of went the extra mile. I remember one of my, or actually two of my PhD students at the time, they worked with the open source community. They went there and they said, hey, you guys have this transaction scheduling algorithm, we have a smarter version of it. We've worked it out in an academic setting, but here's why it's significantly more performant. We just want you to consider making this an option. So kudos to those open source developers in the MySQL community.
Barzan [00:08:53]:
They went and they did their own research, they tried our idea, they came back, and they said this is so much better than what our default is. We're not gonna add it as an option, we're gonna make this the new default and make the existing algorithm an option. So then the next time around we submitted that same paper, exact same algorithm, exact same result, and we said, hey, by the way, it is pretty important because now more than 2,000,000 servers in the world are using this as their default algorithm. And that's just one example. There's a lot of good ideas that get killed. But then again, there's a lot of important but boring problems in the industry as well. Like, you know, for lack of a better term, sometimes to make something work, you have to put up with doing 90% of work that's boring so that you can actually get a kick out of that 10% that's exciting to you and just, like, completes that puzzle. So I think it's just a fine balance. I used to tell my former PhD students that, you know, when you pick a problem, you need to ask three questions.
Barzan [00:09:53]:
Is it important enough? Do you have the skills to solve it? And do you have what it takes to get it into the right hands? So I think if you sort of look at those three questions, you know, holistically, you can find your way from interesting, innovative, highly technical ideas, and then still have a real impact. I mean, that's
Jillian [00:10:14]:
I need some spite too. All my favorite stories feature, like, just that little bit of spite and pettiness. I think it's such a human motivator.
Warren Parad [00:10:23]:
I'm surprised though, because, like, for a lot of conferences that I applied to, I don't get any advice back. But I feel like the feedback of someone would have done that already if it was meaningful, like, what is that? Like, what is the purpose of saying those words? It's sorta like you can only go from spite there. And I feel like that's not necessarily a good place to be driven from, realistically. Like, why wouldn't they be specific? Like, hey, you know, it'd be great if it was being used in the industry already. Like, if this is so ingenious, if this is so great, give examples there.
Barzan [00:11:01]:
And I think, like, if you look at sort of how academics excel, right? The idea is you wanna find out what others have done and you just need to do something better than that. And it doesn't matter if that problem is actually realistic, if the assumptions are realistic, it has to be innovative. Right? Even if it's a simple idea. I mean, the example I can give you is Spark. Right? Apache Spark, a lot of your audience are probably familiar with it. So the initial idea was pretty simple. You have a working set, you have a data set that you want to, you know, keep doing the same computation on.
Barzan [00:11:34]:
So, you know, back in the Hadoop days, right? Like, for those of your audience who still remember, you had to basically take that intermediate result and write it back to disk, and then, if there was any more computation, only to read it back into main memory immediately after you'd written it. So there's a lot of redundant IO that's just wasted. And, you know, the authors of that Spark paper were actually my lab mates when I was at UC Berkeley. And the observation was pretty simple, but very meaningful: hey, if you have a piece of data that you have to do some iterative computation on, let's keep it in memory, pin it in memory so you can finish those iterations, and then we can, you know, write it back to disk. The idea is sound, it makes perfect sense, well motivated, very practical, but they had a very hard time publishing that paper in academia because I remember the early feedback on their paper was like, this idea is not novel enough. The keyword they use is novel enough, which means, like, it's too simple. Like, can you add a twist to it? Can you, you know, as if, like, it's a movie.
Barzan [00:12:36]:
Right? Like, you don't wanna be able to see the ending, like, you know, from the beginning. So that's the kind of mindset that's there. And I think there's good and bad to it as well. Like, that's how people become more creative. People learn how to take on open-ended problems. I think academia does a lot of things really well, but there's certain areas where I think closer partnership with actual customers helps save a lot of smart brains from burning their calories on problems that no one cares about, or solutions that no one will actually ever adopt.
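The observation about iterative computation can be illustrated with a small experiment. The sketch below is plain Python, not Spark's or Hadoop's API: `CountingStore`, `step`, and both loop functions are hypothetical names, and a JSON file stands in for disk storage. It only counts how often the working set crosses the disk boundary in each style.

```python
import json
import os
import tempfile


class CountingStore:
    """Tiny stand-in for disk storage that counts reads and writes of the data set."""

    def __init__(self, values):
        fd, self.path = tempfile.mkstemp(suffix=".json")
        os.close(fd)
        self.reads = 0
        self.writes = 0
        self.write(values)  # the initial data set counts as one write

    def write(self, values):
        with open(self.path, "w") as f:
            json.dump(values, f)
        self.writes += 1

    def read(self):
        with open(self.path) as f:
            values = json.load(f)
        self.reads += 1
        return values

    def delete(self):
        os.remove(self.path)


def step(values):
    # Placeholder for one iteration of some iterative algorithm.
    return [v * 0.5 + 1.0 for v in values]


def iterate_via_disk(store, iterations):
    """Every iteration reads its input from disk and writes its output back."""
    for _ in range(iterations):
        store.write(step(store.read()))
    return store.read()


def iterate_in_memory(store, iterations):
    """Read once, keep the working set in memory, write the final result once."""
    values = store.read()
    for _ in range(iterations):
        values = step(values)
    store.write(values)
    return values


if __name__ == "__main__":
    for run in (iterate_via_disk, iterate_in_memory):
        store = CountingStore([1.0, 2.0, 3.0])
        run(store, 10)
        print(run.__name__, "reads:", store.reads, "writes:", store.writes)
        store.delete()
```

With 10 iterations, the disk-bound loop performs 11 reads and 11 writes (counting the initial write and the final read), while the in-memory loop performs 1 read and 2 writes, and both produce the same result.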
Warren Parad [00:13:15]:
So maybe to Jillian's point, why stay? Like, what's the benefit of staying in academia?
Barzan [00:13:22]:
Oh, that's a very good question. I think academia has certain things that you only get in academia. Like, you have access to extremely smart talent. And, like, you know, as they say, when you hang out with smart people, you also keep getting smarter too. Right? And there's some truth to it. Right? Like, you know, how should I say this? When you're operating on venture capital, there's a very specific timeline. There's a certain amount of risk that you're encouraged to take, right? But, like, let's say that you're working on curing cancer, which people have been working on for a long time. Like, you know, incremental ideas will only lead to incremental results.
Barzan [00:14:08]:
So at some point you need to take some risk. You need to explore solutions that are so crazy, there's a good chance they're not gonna pan out. Right? And for doing that, you need a little bit of patience. That's hard to find outside academia. You need highly motivated, highly smart individuals at the, you know, beginning of their career, with that intellectual freedom to go and venture out, find those problems, explore those crazy ideas. And for every 10 crazy ideas we try out, one of them is gonna pan out, and that's a really good outcome for academia. In the industry, if you go to your backer, to your board, to your boss, whoever that is, and tell them, hey, I need you to give me 10 times more time because I wanna try and check 10 different crazy ideas. It's a high-risk, high-reward thing.
Barzan [00:14:58]:
By the time you're through your, you know, third iteration, you're probably gonna be terminated. They're gonna have some difficult performance conversation. So there's a time and place for both, right? If you're looking for really creative, really impactful ideas... To give you a very concrete example, like, you know, at Keebo, which is the startup that I'm leading now, we're very successful. One of the main things that people love about our product is that it takes thirty minutes, and then, with thirty minutes of investment from your side, the AI kicks in and starts optimizing your cloud data warehouse, for example, your Snowflake. Within twenty-four hours you're seeing on average a 25 to 30 percent cut to your overall Snowflake bill, which is very meaningful. Like, you know, we have customers who are spending millions of dollars on their Snowflake. Now a lot of people are impressed, like, how did you guys build something that's so autonomous, works so well and whatnot? Because, you know, from an outside perspective, it seems very difficult to build that exclusively in the industry. But what people forget is, like, hey, there's decades of other ideas that failed that people learned from, and they have all this know-how from academia.
Barzan [00:16:09]:
And then when you finally see the final result, it feels very sudden. Same thing that you see with Gen AI. For a lot of people who are not on the academic side of it, it felt like, oh, there was a sudden breakthrough. But it wasn't a sudden breakthrough. It was, you know, two decades of constant work. People were publishing paper after paper and pushing the boundaries of what's possible until it got to a point where everyone could see the benefits. So I think that's the other side of it that I just want your audience to be aware of.
Warren Parad [00:16:42]:
I definitely wanna dive into that. But the duality you brought up is really interesting: in academia, ideas that really have a business impact, and maybe more than that, a world impact, are not paid attention to as much. Like, it's just do a little bit better than what you're doing and experiment a lot. Whereas in business, everything has to be immediately relevant. But on the flip side, that means that we aren't getting the time outside of academia to experiment effectively, and teams should actually be experimenting because they may find a way to drastically increase the query speed or performance, resource
Barzan [00:17:23]:
usage
Warren Parad [00:17:23]:
of their database clusters. So I think what you're saying is really that both areas that are separate need to learn from each other. More experimentation in the private space, and in academia, more attention to, like, what's relevant in the next, you know, one to ten years that has a business impact, you know, where the industry is going, what's relevant for them. Otherwise, an idea is just really an idea, and it's not gonna get accepted into any conference.
Barzan [00:17:53]:
No. I think that's a good way of summing it up. I think the kind of balance I found very useful was, like, you find real-world problems by definition in the industry. You can't go sit in your closet in academia and just say, hey, I think this is a really interesting problem to solve. You can find interesting problems, by the way. You don't need to talk to people to find interesting problems.
Barzan [00:18:13]:
But to find meaningful, impactful, important problems, you do need to talk to customers, you need to talk to actual users. And that has to come exclusively, in my biased mind, from the industry. And then there are different types of solutions. Like, if you want a really out-of-the-box kind of solution for that problem, there are a lot of sharp minds in academia. If they're provided with, number one, real-world, well-motivated problems, and number two, a set of constraints that need to be accounted for when designing that solution, because otherwise the people who had that problem would not adopt it, then you can expect amazing results from working with academics. So I agree with you. I think the problem has to come from industry, and so should the constraints. But then the solution space is where either academia or people who've had that training come in.
Barzan [00:19:16]:
A lot of people, you know, a lot of engineers have never had that opportunity, or they've never been given that opportunity, to take on open-ended problems and be given the guidance and mentorship that a lot of professors offer their students, who become professors who, you know, offer this to their own students, and so on. So there's a lot of wisdom about how to approach research and how to come up with creative, meaningful solutions that I think the industry could also benefit from.
Warren Parad [00:19:43]:
Benefit. I mean, I think in the industry, we actually have this counter perspective, which now seems like it actually has a lot of paradoxical negatives. I hear very frequently hit the ground running, like setting up onboarding docs and tooling and resources so that you can just get started on your first day working at a new, organization and a new company, and you should already know how how it's supposed to work and already start providing value. And now I'm getting the thought of, like, well, actually spending time learning the backwards way that an organization is working before you actually start delivering value may be an opportunity that we've squandered in in a desire to move quickly and get everyone on the same page as fast as possible. There's a much lower opportunity for learning and, I'd say, failure, which I think a lot of people agree is a strategy that really drives, future innovation.
Barzan [00:20:38]:
No. That's right. I think, you know, another way to look at it is, there's nothing wrong with moving fast. Like, that's the thing. That's my own motto. Whether I'm working in an academic setting or, you know, at Keebo, we want to move fast, but sometimes people only have the perception of moving fast. Right? If you're building a house but you're not taking the time to really understand the measurements and what you're doing, and one side of the wall is shorter than the other side, you're not really fast, because all that work is gonna be throwaway work.
Barzan [00:21:11]:
So I think the right speed is actually failing fast, because you can't know everything. One of the things I've actually seen that's quite prevalent across the board, especially with more junior engineers, is that desire to be a perfectionist. We used to have a senior engineer who used to refer to that as premature optimization. People have a tendency, especially in the earlier years of their career, to try to perfect things way too early, way before it's even proven to have any value. Right? Like, I usually go for the quickest, dirtiest, hackiest thing to prototype something and see if it holds water. And if it doesn't, that's perfect. That's called failing fast, because guess what? You didn't spend the whole year building it. You spent a sprint, two sprints trying it out. And now that you failed, you just learned, you just ruled out one of the ways to fail.
Barzan [00:22:06]:
And and I strongly believe that the number of ways to fail is finite. So as long as you basically, you continue to fail, but very quickly, you're guaranteed to find the path to success.
Warren Parad [00:22:18]:
I sort of wanna go back. It's been mulling over in my head, about your AI agent that runs to reduce your Snowflake cost. Like, how does this actually work? Like, does it just go in and, like, is it deduplicating data? Is it improving query performance? Is there some other magic going on?
Barzan [00:22:41]:
So so here's here's the like, to sort of see how it works. I think it might be useful to your audience to think about, like, the bigger problem. Right? So, like, one of the biggest things that's happened in our industry over the last, I would say, decade, decade and a half is, like, the rise of cloud databases. Or in particular, like, you know, the, success that the likes of Snowflake, BigQuery, Redshift, and and more recently, Databricks and and, you know, have seen. And if you think about what's happened there is that these cloud data warehouses, the likes of Snowflake have really lowered that adoption barrier. Right? So now it's significantly easier for anyone, any organization of any size, any team with any level of skills to go spin up a cloud data warehouse and start analyzing that data, querying that data, getting that data very quickly. Right? So that adoption barrier has gone down, but the byproduct of that, the side effect of that is that because it's so much easier to leverage data. Now you have more users with varying levels of database proficiency and skills writing queries.
Barzan [00:23:47]:
They're querying more data and they're combining more data sources. So as a result, I would argue that the data pipelines that organizations are dealing with right now are at least one, if not two, orders of magnitude more complicated than, you know, what we used to have fifteen years ago. Right? Because it's just that much easier. You don't have to jump through all these hoops to get to your data. Anyone can spin up a cloud data warehouse. And as a result, because they're so much easier to use, the pipelines are so much more complicated. And because of that, they're so much harder to optimize manually. Right? Now you're dealing with millions of queries.
Barzan [00:24:23]:
Some of them are coming from this analyst. Some of them are coming from this BI tool. Some of them are from this ETL job that this guy wrote, you know, a year and a half ago, and he's not with the company anymore. And then there's these reporting queries that hit the cloud data warehouse. Someone is doing data science, someone's training their models. You're looking at millions of queries. As someone who spent the last two decades of their life essentially with database systems, I would be intimidated to stare at some of these queries and figure out how to optimize them. It's just not humanly possible, right? But that's exactly where AI or machine learning really comes to the rescue, because, you know, you can think of Keebo, or machine learning algorithms in general, as an infinitely competent, infinitely patient DBA, right? They can analyze millions of these statistics.
Barzan [00:25:16]:
Look at, hey, how did these variations in the load correlate with the variations in the cost and performance? Right? So for instance, if we just take Snowflake as an example, you know, you have to pick a size for your warehouse. Right? You have to decide, for example, what's your partitioning key? You have to decide how long you wanna keep this warehouse running after the query has finished. If I shut it off right away, well, I saved money. I don't have to pay Snowflake for just keeping an idle warehouse running, because it's pay-as-you-go. But then if the next query arrives and my warehouse is shut down, then I have to spin up a cold instance, and now a query that would otherwise have taken a couple of seconds has to take maybe a couple of minutes, because it has to read everything off of cold storage. Like, okay.
Barzan [00:26:00]:
So what's the optimal time to shut down a warehouse? And then this warehouse, you know, I bet most data teams say, hey, I need a medium for my, you know, BI workload. I need a large for this. But do you really need a large warehouse twenty-four seven? Is your workload constantly, steadily at a level where it warrants a large? Maybe sometimes you need an X-Large. Maybe it's actually cheaper to use an X-Large, because you pay more per unit of time, but then the query finishes, let's say, in less than half the time that it would have otherwise. Maybe it's underutilized. You know, can you wake up your data team, can you page your DevOps team to go and reduce the size of your medium warehouse at 2AM to a small warehouse, and after seven minutes wake them up again and say, actually, the workload increased again, go back to the default size? You can't do that, but you can actually train reinforcement learning models, for example, to do that, right? So you just
Warren Parad [00:26:57]:
I mean, you can do that. I feel like there's gonna be a bunch of very unhappy people at the end of the day, though.
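The sizing trade-off described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the credits-per-hour table follows Snowflake's doubling per standard warehouse size and its 60-second minimum billing on resume, while the price per credit, the runtimes, and the function names `query_cost` and `idle_cost` are assumptions chosen for the example, not numbers from the conversation.

```python
# Credits per hour for standard warehouse sizes (each size doubles the previous one).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

MIN_BILLED_SECONDS = 60  # per-second billing with a 60-second minimum on each resume


def query_cost(size: str, runtime_seconds: float, price_per_credit: float = 3.0) -> float:
    """Dollar cost of running a warehouse of `size` for one query (price is assumed)."""
    billed = max(runtime_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed / 3600 * price_per_credit


def idle_cost(size: str, idle_seconds: float, price_per_credit: float = 3.0) -> float:
    """Dollar cost of leaving a warehouse running with nothing to do."""
    return CREDITS_PER_HOUR[size] * idle_seconds / 3600 * price_per_credit


# Assumed runtimes: the same query takes 10 minutes on a Large, 4 minutes on an X-Large.
large = query_cost("LARGE", 600)
xlarge = query_cost("XLARGE", 240)
print(f"Large: ${large:.2f}, X-Large: ${xlarge:.2f}")

# Assumed auto-suspend scenario: what ten idle minutes on a Medium cost.
print(f"10 idle minutes on a Medium: ${idle_cost('MEDIUM', 600):.2f}")
```

With these made-up runtimes the X-Large is cheaper, because it costs twice as much per second but finishes in less than half the time. If the speedup were smaller than 2x, the Large would win, which is why the right size depends on the workload at that moment.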
Barzan [00:27:04]:
Well, there are things that humans can do. Yeah. And there are things that humans want to do. Right? Like, if you find the intersection of what humans cannot do or don't wanna do and automate that, that's how you've empowered your data team. Right? Like, I've never met a data engineer who's told me, my dream is to wake up at 2AM, reduce the size of a warehouse for seven minutes, and then go back to sleep. Right? Like, I've never seen anyone who tells me, I wish I could just squint my eyes and look at 2,000,000 queries and figure out which one should be routed to which warehouse. But the reinforcement learning agent is more than happy to do that. You just have to have the right reward function, where you penalize the agent every time that it causes a slowdown, and you basically reward that agent every time that it manages to make some configuration change, or route a query to the right warehouse, that actually saves money for that customer without actually impacting their performance.
Barzan [00:28:04]:
And that's how you can actually save a significant amount of money, because there's so much variation in your daily workload that if you actually know what you're doing, you can build models that significantly save you on cost.
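The reward function described above, where the agent is rewarded for savings and penalized for slowdowns, could be sketched roughly as follows. This is not Keebo's actual reward function. The function name `reward`, the `slowdown_penalty` weight, and the `tolerance` threshold are all hypothetical, chosen only to show the shape of the idea: a slowdown should cost the agent far more than the savings it bought.

```python
def reward(baseline_cost: float, actual_cost: float,
           baseline_latency: float, actual_latency: float,
           slowdown_penalty: float = 10.0, tolerance: float = 0.05) -> float:
    """Hypothetical reward: dollars saved, minus a heavy penalty for any slowdown.

    baseline_* are what the workload cost and how long it took before the
    agent's change; actual_* are the values observed after the change.
    """
    savings = baseline_cost - actual_cost
    # Relative slowdown beyond a small tolerance band (0 if the query was not slower).
    relative_slowdown = max(0.0, actual_latency / baseline_latency - 1.0 - tolerance)
    # The penalty is scaled by the baseline cost so it dominates the savings term.
    return savings - slowdown_penalty * relative_slowdown * baseline_cost


# Saved $4 with no slowdown: positive reward.
print(reward(baseline_cost=10, actual_cost=6, baseline_latency=30, actual_latency=30))

# Saved the same $4 but doubled the latency: strongly negative reward.
print(reward(baseline_cost=10, actual_cost=6, baseline_latency=30, actual_latency=60))
```

Making the penalty much larger than any plausible saving is one way to encode a preference for protecting performance over squeezing out extra savings.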
Jillian [00:28:20]:
I know it's coming, but I'm still so freaked out by the idea of having these agents that are just doing stuff. I mean, I guess it's not really any different from your case isn't, like, that different from an auto scaler. And that's a known problem, but just in general, I just I'm not there yet. And I really like AI. I think, like, I'm all I'm all about the AI over here. Right?
Barzan [00:28:41]:
No. I think you're spot on, Jillian. I think the reason why we've been very successful is because, like, we address that elephant in the room. Right? I think there are actually four very common failure patterns with AI adoption in general. Right? One of them is implementation risk. Right? Like, if you have to convince a team to spend weeks and months and months iterating with you, doing a POC, learning new things, you know, rewriting their code, then you're already off to a very difficult start. The second risk, I would say, is the adoption risk. Like, people being scared of, oh, what if it's here to take my job? What if it slows things down? Like, I don't wanna be yelled at for, you know, telling my boss that, hey, I'm leveraging AI, and it made a poor choice in the middle of the night and the pipelines failed.
Barzan [00:29:39]:
There's the security risk, like, hey, I don't wanna have to justify why I'm giving access to this third party to come and train on our most precious data. There's pricing risk. This is actually a pretty big risk as well. Like, these days, it's a tangent, but there was a summit that we were attending, and there was a bunch of different vendors there. And our marketing team made an observation that every single booth had the word AI written on it. Even if it had nothing to do with AI, like, you know, they're just selling cookies. I'm exaggerating here, but everyone felt compelled to put AI in their positioning just because it's hype. And what it's done is bad in two different ways.
Barzan [00:30:20]:
Number one, it muddies the water, so people can't tell who's actually leveraging AI and who's just using it as a buzzword. But the second thing is that you've got these CIOs who got really excited. They were sold a bill of goods, but then they were let down. They spent all this money and all this time and resources, and at the end, nothing was delivered. So there's that pricing risk, like, how do I know that this is gonna be successful? I'm not just trusting you. So the reason why Keebo was actually able to create these models and get real-world adoption and have a lot of happy customers is because we address these four risks, like, head on. The first example was the implementation risk. Like, before we even wrote the first line of code, we made this decision that whatever we do should not take more than thirty minutes of one engineer's time to onboard.
Barzan [00:31:11]:
Like, we used to joke that if we take more than thirty minutes of your time to implement Keebo, we're gonna give you an iPad for free. And here I am sitting talking with you, and we've never had to buy a single iPad. The second thing we addressed was the adoption risk, right? We told people that our actual internal motto is: our first slowdown is our last slowdown. So if you think about, like, an autonomous car, like, let's say, you know, a self-driving Tesla, to give you an analogy: if you're a vendor that's building autonomous cars, your first crash is gonna be your last crash. You can't just tell people, hey, this is a fully autonomous car, well, almost autonomous. You know, it just has a crash one time in a thousand. It's not good enough. If there's a likelihood of a crash, then you're, you know, dead in the water.
Barzan [00:31:58]:
So what that means from a design perspective is that whenever, in our algorithmic work, we hit a fork, whenever the agent has to choose between increasing the savings for the customer and protecting performance, we take the latter. Because no one yells at you for, why, instead of saving me $102,000, did you save me only $100,000 this month? But if you cause one slowdown that makes their jobs fail and their boss yell at them, then they're never gonna trust this thing again. Right? For the security risk, we made this deliberate decision that we're gonna restrict our models to train only on metadata. So we don't even see customer data, right? We train on performance telemetry, all of that. For the pricing risk, we came up with this idea. A lot of people were critical of us, like, why are you guys making it so hard on yourselves? But we decided, you know what, we're only gonna charge customers a percentage of whatever we save them. So our incentives are aligned with customers.
Barzan [00:32:49]:
No savings, well, no charge. If we save you $3,000,000, we take a portion of it. If we save you $3, we take a portion. If we save you zero, we take nothing. Right? So those risks are there. You just have to be really intentional about how you design your software in a way that accounts for those risks and addresses them head on. You can't build something and then figure out, how do I convince customers that those risks are not there? You have to build it with these as your guiding principles in your design.
Warren Parad [00:33:23]:
So one of the things that I, identified early on with the AI hype bandwagon is I I think a lot of companies were using AI on their marketing pages as a proxy for, I don't actually know how to talk about the value our product delivers. So I'm just gonna put so I'm just gonna put these two letters on there and pretend that that means something to someone, and they'll bring their own ideas about how that could be valuable. And I think before that, we we saw similar things happen in the past. I think just the speed and the velocity of change that's happened for the AI cycle has been so fast that it's really easy to see, from innovate like, innovation hitting the market, not, like, outside of academia, because we all know AI has been around for much longer than just the five years now. You know, we're going back twenty, thirty, forty years, where there's lots of papers out there, but in in business, realistically. And I we can we can actually see the change from innovation all the way to exploitation. And I still think that we have the same number of companies that are start ups, or even big giant, Fortune Fortune 50 companies that honestly have no idea of how to actually convey their value effectively.
Barzan [00:34:39]:
No. That's fair. And I think that's a big challenge. I think it's it goes both ways. Right? You've got at the top of the food chain, right, like CIOs and CTOs who hear these buzzwords, and they feel like we have to do something about it. The board is asking about it. I'd like, you know, that we gotta do something about it. And then on the other side of the equation, you've got sometimes ICs who are worried about, like, their jobs.
Barzan [00:35:08]:
Like, hey. If we, you know, we adopt this thing, then what's gonna happen to my job? And, like, my reaction to that usually is, like, if you worry that AI is gonna take away your job, it's probably going to, to take away your job, right? Like if you like resisting it because it's just like laws of physics, right? If you're standing on the wrong side of history, it's just a matter of time, right? Like you cannot, it's happening, right? The best you can do is to sort of empower yourself with more knowledge of how to best leverage it. Like there's a there's a huge market for, engineers and and, DevOps folks who understand AI. They know how to best leverage it. Like, you know, MLOps is a thing, right? Like, you know, how to, like a lot of the machine learning experts that come out of academia don't have the faintest idea of how to deploy something to real world. Like, so like, you know, you need these engineers who can just who understand the high level concept and they can, you know, you can partner them closely with your, machine learning researchers and and experts to sort of build stuff that can actually get deployed, get trained at large scale and get trained and and have, the right level of robustness and and reliability. So there's a lot of things that people can do to protect their jobs. Just, you know, go take an online class and and brush up on your stats class and and, you know, take a machine learning course.
Barzan [00:36:33]:
Try the few tools that are out there. One of the biggest, anti patterns I'm seeing these days is, which I think has plagued the software industry on the consumer side of things, is is this, unreasonable urge for building versus, you know, for build versus buy. And I think significant amount of engineering cycles are getting wasted by people giving into the their own natural instinct of, oh, I just wanna build everything in house. And you'll be surprised, like, very few CIOs and leaders are able to sort of tell what's the right time, what's the what's the right thing to build versus buy. And I see people get that wrong all the time.
Warren Parad [00:37:17]:
I I I liked your call out here on where you should be concerned and how to train yourself or grow further. I I mean, the the idea that if you fixate on the fact that your job is gonna go away, then it probably is actually really reminisces for me a concept from, of all places, Hawaiian shamanism, which is, like, if you fixate on this thing, you're actually bringing it into reality. You are making it the the case. And I I do really think that manifesting. Yeah. For sure. So I do think that there is a lot there. Like, if you wanna be, like, you can figure out what your job should be and what you wanna be an expert in and how to achieve that.
Warren Parad [00:38:00]:
And maybe it's not a fit for your current company. But for sure, if you just worry about, the fact your your job may or may not be going away, there's definitely an aspect of, and this is something that I I've picked up recently, and I've been trying to live by. It's not necessarily the easiest thing, but I think it's ancient, Confucius wisdom here that if you if you worry about the future, then you you cry twice. You you you feel the pain twice. You know? It you know? There's something you can do about it right now. And rather than worry about a future that probably won't even come, do that thing. And and if it does come, then you're at least prepared.
Barzan [00:38:39]:
Oh, 100%. No. 100% agree.
Warren Parad [00:38:43]:
I'm sort of curious about the verticals that you see. I mean, we talk about data intensive systems a lot. And, like, what falls into that category? Like, concrete thing. Yeah. What kind of data?
Barzan [00:38:57]:
One of the interesting things that again has happened with the rise of... I mean, there's a reason why Snowflake had one of the largest software IPOs ever. One of the changes that this new breed of technology has made in the way that data is being consumed is that it's become, number one, size agnostic. Like, back in the day, if you had a bigger company, you had more data. If you were a smaller company, you probably had small data. And then there were certain industries, like, you know, tech was known to be much more data savvy than, for example, government, or, you know, healthcare was a lot more protective of their data, and there were certain segments or sectors of the industry that were more data driven. I think what we're seeing is that it's penetrating everywhere. Like, I was talking to a local government in one of the states where you wouldn't think they would be looking at Snowflake. And they're like, no, no, no, we gotta get on that.
Barzan [00:40:00]:
We gotta get on that, you know, cloud data warehouse for these five reasons. This is what we're trying to do. I was like, do you guys even have the budget? They were like, but that's irrelevant. We gotta do it for these reasons. So that's from a sector perspective. But the other thing, which I think is even more interesting, even from a sales and go-to-market perspective, is that you have no idea how much a customer is spending on their data infrastructure by looking at the size of that company. Keebo has customers who are spending north of 15 million dollars a year just on their Snowflake bill. And they're a tiny company.
Barzan [00:40:41]:
Like, you know, not like two or three people, but they have fewer than, like, you know, 500 employees. And then we're working with these massive multinational grocery stores where the entire cloud data warehouse bill is only $200,000, right? So I think it's interesting to see both directions. Like, the data is growing vertically, but also horizontally across these different sectors. I wish I could give you an easier answer and say, oh, it's only, you know, fintech and retail, that's where we're seeing all of this. But it's not. Like, you know, we're seeing it in healthcare. We're seeing it in government. We're seeing it in, you know, all sorts of verticals. And the only common denominator is that companies have realized that they need data-driven decisions.
Barzan [00:41:33]:
They've realized that they either have the data or they have part of the data, and they have to go and acquire other data, whether from their CRM, from their marketing tool, from their website traffic, or from third-party vendors. There was this insane, I don't wanna misquote this thing, but there was this insane report I read lately, which was saying that companies now on average combine X number of data sources, and X was an insane number. Like, one of those numbers you're even embarrassed to quote. It was definitely north of 20, and it's insane if you think about it. Like, that's a lot of complexity right there, right? Like, companies should not have to deal with this with their own resources. If you're a bank, you gotta focus on what's making you a differentiated bank. If you're a marketing company, you have to focus on your core business. You shouldn't be in the business of building and optimizing your own data infrastructure. You gotta, you know, you gotta automate that part.
Warren Parad [00:42:31]:
I I think part of the problem here is I think as it it's sourced from humanity, this this idea that growth equals good and that your total addressable market can actually increase in size over time, and you can make it happen. And these companies are lacking ways of growing still just a little bit bigger, and so they're spending a nontrivial amount of money pulling in almost nonsensical data, nonsensical sources, things that aren't so relevant in order to even increase their market share by percent, like, pips, you know, hundreds of percentage points because that's all they can do. But once if you realize that your market is only so big and that's it, you know where you should optimize for and potentially just stop there. Focus on cost reduction, on optimizations in what you're doing rather than trying to add yet another product or another feature or service in a way that doesn't really add fundamental value to to to your users. You actually opened the you you stepped in this and you opened the door, and I I wanna wanna ask you about this. I feel like since the exploitation of of of LLMs and the data that we have that's been created since the the Internet was conceived of an as an idea, we're losing public access data. Like, the the datasets that are available just from scraping individual websites or just freely available, I think, is actually decreasing. I I dare say that the end of the Internet has come, or it's it's on its way.
Warren Parad [00:44:00]:
That connectivity is no longer what we're optimizing for. And I'm wondering where you see this going. Like, is it private datasets that are curated and data brokers? I I know that CloudFlare has jumped up and down and said, hey, look, we can do this. We're already blocking bots, and we know what data is being transferred and who has websites on our platform. We will sell you this data, AI scrapers, that you can go purchase from the data sources. Is that the future?
Barzan [00:44:27]:
I think people are already moving on. It's hard to predict the future, but it's also easy because, like, no one's gonna remember to come back and hold you accountable for a misprediction. So, I didn't call that out, but I think, you know, within the next decade... The thing is, if you spend too much time in any particular area, you can see things that are pretty obvious to you, but maybe they sound weird to others who have not been following that thing. A lot of things might be a surprise to others. For example, the success that ChatGPT had was a surprise to a lot of others, but not to those who were tracking the progress over the years. So in terms of data and selling data as an asset, I think we're actually already moving past that, right? Now people are selling agents that are trained on that data. Like, you know, there's a reason why there's all this excitement. You guys have seen the news about DeepSeek and what it means for, you know, the use of GPUs and the investments that companies like OpenAI have made. But the bottom line is that there's an arms race. Like, you basically train these AI agents, instead of having companies go and purchase this data and then clean the data and then combine the data and then build apps on it and then monetize it and then maintain it and tune it. Like, you just buy these agents.
Barzan [00:45:49]:
I think we're past selling data and we're in the place where we're selling agents that are already trained and ready to be deployed.
Warren Parad [00:45:57]:
Deployed. If the data goes private, though, no new agents are gonna be able to be spun up. So, you know, from that standpoint, we're at the road's end of where the AI innovation can take us. Like, I feel like fundamentally, in order to keep evolving and innovating, we still need new, fresh sources of data, combined with all of humanity's collection so far, in order to actually train on all of it and get the most effective agent being built? Or am I missing something here?
Barzan [00:46:25]:
I mean, in theory, but if you think about it, the majority of humanity's data is actually held by a minority of humanity. Right? There's, like, three big players. Like, I mean, that's the almost sad part of, like, how consolidation has been working. Like, you know, you have two or three major providers who are seeing and recording and monitoring 99%. I mean, it's tough to quote a number, but, like, if you think about it, you know, Google doesn't need me to send them a copy of my hard drive. Like, they see my emails. They see basically my, you know, usage pattern on my Android. They see, like, you know, the content that I'm consuming.
Barzan [00:47:12]:
They see the books that I'm basically searching for. Amazon knows the items I'm buying. They're looking at every book that I'm reading. Like, they have a lot of this data that, you know, at least in The US, like, I can't speak for Europe. I think they have much better laws, when it comes to privacy protection. You don't even think twice about, you know, clicking and saying I agree to these terms of use. And and I think they have the majority of that data. Like, will we be better off if everyone shares everything and then, you know, we build this stuff? I don't know.
Barzan [00:47:43]:
I think it easily gets into the area of security and privacy, which I don't know anything about. But I think if that was not a concern, probably the answer would be yes. But I know that is a concern. I also know that there's very few players who have already, you know, plenty of data. I mean, OpenAI has the data that they're actually scraping. But is it gonna plateau? Probably. I think these things are gonna become a commodity. These agents will become a commodity. You know? The arms race will not continue, and then we'll move on to the next thing after that.
Warren Parad [00:48:14]:
Well, that's an interesting point. Like, there's this idea in biology where you just need a limited set of unique individuals in order to propagate the species without too many mutations, or it will, you know, collapse under inbreeding, basically. Like, maybe there is some set of data, and we only need that much in order to be able to create even the best-trained agents that we possibly can. Additional data won't help us in that way. And maybe we've gotten that. Maybe we'll get it.
Barzan [00:48:46]:
That's actually the crux of learning theory. Right? That basically the error will go down, you know, as one over n, where n is the size of the dataset. Right? So that basically means more data at some point is not gonna significantly reduce your error. More training data will not significantly reduce your error. Obviously, that depends on the sparsity of the data. You know, the whole idea behind VC dimension and whatnot. But the main idea is this. Like, you know, I know we're not really good at predicting election outcomes in particular. But the idea of these election surveys is exactly the same thing: you don't need to go and ask, you know, every one of the 300,000,000 voters.
Barzan [00:49:25]:
You know, if you have a sample that's large enough, past that, you're not gonna significantly increase the accuracy. I think that's definitely true that there's a diminishing return. It doesn't mean more data is not gonna help. It means that there's a diminishing return.
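The polling argument above can be made concrete with a minimal sketch. It assumes a simple random sample and the textbook margin-of-error formula for a proportion, under which the error shrinks like one over the square root of n; that exact rate, the 95% confidence level, and the function name `margin_of_error` are assumptions of this illustration, not something stated in the conversation.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion estimated from n respondents.

    p = 0.5 is the worst case (largest variance); z = 1.96 corresponds to a
    95% confidence level under a normal approximation.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Each tenfold increase in sample size buys less and less absolute accuracy.
    for n in [100, 1_000, 10_000, 100_000, 1_000_000]:
        points = margin_of_error(n) * 100
        print(f"n={n:>9,}  margin of error ~ {points:.2f} percentage points")
```

Under these assumptions, going from 1,000 to 10,000 respondents takes the margin from about 3.1 points to about 1 point, while the next tenfold increase only removes a further 0.7 points or so. More data still helps, but each increment helps less.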
Warren Parad [00:49:43]:
I can't see Jillian, and I think that means she has some secret questions that she's just trying to figure out how to put into words.
Jillian [00:49:52]:
Yeah. Because I kept freezing when I had my camera on, so I thought that I'd turn it off for a little bit. Yeah. I definitely agree with you guys that you can keep adding more data, and that doesn't necessarily make it better. But we're also always getting new data. We're always producing new and different data, and we need the new and different data too. So I work with a lot of medical data, and we're kinda constantly changing just everything: the resolution that we can see the data at, the amount, just more insights, more everything.
Jillian [00:50:27]:
So I don't know. I have very mixed feelings about this because I've definitely been on projects where somebody's been pushing more to, like, we'll just make it better. Can't you just add more data? And I'm like, no. You see the last three datasets that we added to train it? They didn't actually do anything. Like, here's the graph. They're like, more data, and I'm like, well, you're my boss. So, like, okay. But this is silly.
Warren Parad [00:50:46]:
I mean, I'm with you, and I also think that the medical industry, that vertical, is actually unique in this way. I think our lack of full understanding of even our human bodies, but also organic organisms in general, means that we could realistically benefit from having more data there. And I feel like there are so many things that we haven't figured out there. The other verticals, I question a lot. I've worked at, I think, five different companies now in total, separate from all of the consulting and advising that I've done. And all of them were like, our data is precious. We must save all of it. And I'm like, you don't need that data from ten years ago where you were measuring the deviation on vibration tests of this one product that you don't even manufacture anymore.
Warren Parad [00:51:33]:
Like, I assure you, you can throw that away. It's not going to help you. And yet they're like, we gotta keep it. I'm like, okay, AWS, you know, Glacier. Yeah. I think you're right.
Barzan [00:51:47]:
Well, storage is cheap. Right? But I mean, you're right. To your point on medical data, I remember I was working with one of my colleagues from the med school. He was a cardiologist, and we were trying to train models that predict the chances of an organ rejection after a, I forgot the medical term. You know, when they basically do an organ transfer, there's a chance that the host body might reject that organ, and they use antibiotics or whatever to suppress the immune system and whatnot. And there's, you know, complications, all of that. And the idea was to predict the risk of an organ rejection. Yeah. It's called transplant, trans
Warren Parad [00:52:30]:
Transplant. Yeah.
Barzan [00:52:31]:
Transplant. I think that's the medical term for it. But I remember they were saying, you know, at UMich, we have one of the largest cancer datasets on the planet. And then when we looked at it, it was a number that, I forgot the exact number, but it was something close to 300. And I was like, how is this the largest dataset on the planet? It's just that medical data by definition is way more sparse, because there's only, whatever, a 6 or 7 billion cap on how many you can collect. And for any particular disease, there's a very small subset of them you have access to. So I don't think the law of large numbers applies to anything like this. I mean, with DNA and stuff, that's different.
Barzan [00:53:10]:
But, you know, when we talk about individual humans as data points, I agree. I think that's probably an exception. I don't think we're at the place where we don't need more data.
Warren Parad [00:53:19]:
I mean, we
Jillian [00:53:20]:
We need the data science companies to just go sit off in, like, a corner for this conversation when we're talking about, you know, building agents off of data and how much data should we have and when do we stop. Because with medical data, climatology, I don't think the answer is ever, or not right now, anyway. It's not anything I can see.
Warren Parad [00:53:39]:
Yeah. I mean, I'm with you. I think the problem in the medical field, though, is that it's not public. With climate data and tracking, there's a lot out there. Whereas in the medical field, that's controlled by private entities who are bound by local regulations on even sharing that, which is ridiculous in a way. And there are companies out there that do anonymized data exchange in the medical field specifically to sort of help overcome this problem. And, you know, there's not a benefit for the patients. There's not a benefit for the providers or for the government.
Warren Parad [00:54:11]:
Like, there's very little benefit here except for the end company, who may be able to use all this for the good of humanity. And that's a hard sell, I think, when there's dollars on the table on the other side.
Jillian [00:54:24]:
And a lot of medical data, if it's used for research, is supposed to be public. I mean, it's not always, or it's maybe not organized in such a way that it's even usable. There's a lot that can go wrong with that. But there is a lot of medical data that's public.
Warren Parad [00:54:40]:
Well, I think part of it is legacy systems that aren't optimized for even storing the data in an electronic medical record format. Before we were talking about hallucinations, which is, you know, still something AI focuses on, we had the giraffe problem, where looking at an image from a medical document would likely render positive on whatever diagnosis you were trying to track just from the existence of a ruler, or because it was an X-ray, or things that had nothing to do with the actual information that was contained in the document. So I don't know. I'm with you. More data in the medical field for sure. Anyone who's working on that, you know, don't stop.
Jillian [00:55:30]:
Always more data. Yeah. I don't know. Storing biological data is, like, such a problem too, but that's probably another topic for another show, I think.
Warren Parad [00:55:40]:
Well, now you've got me.
Jillian [00:55:42]:
A lot of the cloud things, we could probably have AI agents for, because a lot of it is probably autoscaler problems. But I think, doesn't AWS kind of shut a lot of those down? I was talking to, I forget who, a vendor through a client, and they told me, yeah, AWS basically made our business model obsolete because we're trying to save money. Although this is a lot of hearsay, so I'm not sure that I should be repeating this. But, anyways, it did seem like they had something where they had agents or AI running around in the background to try to cut down on costs, and it was not well received.
Warren Parad [00:56:19]:
I mean, if you build something on a hyperscaler, there is a chance that they will find a way to recapture that value and claim it for themselves. If every one of your customers needs to do something, it benefits everyone to bring that value that you can deliver back into the platform. And I know AWS is actually pretty good about doing that, rather than forcing everyone to use a third-party company to achieve the same benefit. You know, it's surprising to me that for companies like Snowflake or Databricks, and there's a couple other ones out there, I think Datadog's another good example, there are companies that just sit around and help customers spend less money on these platforms. And if that was me, if I'm Snowflake or Datadog, I'm just like, okay. I think it was Coinbase that was spending almost a hundred million dollars a year on just data analytics coming from their platform.
Warren Parad [00:57:13]:
And they weren't very big when this got reported. And then they're like, we're gonna have to do something about this because that's apparently too much money. And that is a lot of money to be spending on it. Mhmm. It's just a bit ridiculous, because if you know lots of customers have this problem, you would think that lowering the price point in some way, not by changing your pricing but by doing those optimizations, helps all of your customers in some way. Otherwise, they're gonna pay a third-party company to help them do the same thing anyway. So I think over time, as you get more and more customers who all have similar problems, they have no choice but to bring that effort in house, either by buying a company that is doing that for them or spinning up their own internal version of it to optimize.
Barzan [00:57:59]:
I think there's two parts to it. One of them is, why would a big vendor invest in reducing their own revenue? Right? You know, Snowflake's stock price is a function of their revenue. Right? And if they wanna reduce their own profit margin or actively be in the business of reducing their revenue, I think that will not go very well with the shareholders. But the other part is focus. Right? As a vendor, you always have to protect the main body of your revenue. This is like the innovator's dilemma. You can't work on niche opportunities. Your job is to build a database that anyone on the planet can use, right? Now, what's gonna optimize this kind of workload might be different than what's gonna optimize this other customer's particular use case.
Barzan [00:58:51]:
And that's where I think startups excel a lot. But I think they also realize that there is a reason why we're partners with Snowflake. The reason is that they see value in us serving their customers almost in an unpaid customer success capacity. I sometimes joke that Keebo is Snowflake's unpaid customer success department because we prevent their customers from churning. Right? At the end of the day, if I'm spending a lot of money and I'm not able to get all my use cases onboarded, and I'm under, you know, pressure and the CFO is yelling at me, I'm gonna look outside. So Keebo is helping customers get better performance per dollar for the budget that they have. So I think there's also that bigger picture, although, you know, sometimes the sales reps don't share that same compassion. No.
Barzan [00:59:44]:
I mean, they're thinking one quarter at a time. Right?
Warren Parad [00:59:46]:
Yep. Yeah. I think that's the biggest problem. If you look at the brand of a large data company, or even any large company, you have to look out multiple years. And you're absolutely right. The value that you're providing them as part of the Snowflake network is higher than the amount that it would cost them to maintain that same piece of functionality internally, or the amount of revenue that they would lose if, say, all their customers had access to that functionality just straight away or it was automated in some way. So, I mean, if you look at that equation, then realistically, you want everyone in the network to be happy in a way. And so if what makes them happy is that there's little startups out there that are helping them reduce their bill a little bit, then you let that be the case.
Warren Parad [01:00:31]:
I mean, the economics obviously change at larger scale when all of your customers have this problem, or they're all unhappy because of how it's going. So I think we're about at the hour now, and I feel like this is a good point to, you know, maybe say, okay, is there maybe one last thing that you just wanna share? Something you feel like maybe we didn't touch on that could be an interesting topic. Something to close out the episode with.
Barzan [01:01:00]:
I think when it comes to software design, there's one thing I recently saw explained very well. Sometimes technical people like to have a lot of knobs because we usually think more flexibility means more options, which means better adoption and all of that stuff. I think one of the things we've learned the hard way is that actually the fewer choices you give people, the more likely you'll get adoption, right? This week I read this somewhere, and it was summarized pretty well. Apparently there was a very successful shoe salesman in LA back in the fifties, and they interviewed him and asked him, what's your secret? And he said, my secret is the law of two, not three. And they asked him, what do you mean by that? He said, whenever a customer asks me to bring down a pair of shoes that they can try, and then they ask for a second one, I give those shoes to the customer as well. But if they ask for a third pair, then I ask them, which of these two would you like me to put away? And the reason is they figured out that if they give customers three choices or more, they're likely to buy none. But when they give them two choices, they're likely to pick one.
Barzan [01:02:22]:
And I think that actually applies in some really profound ways to software design and AI adoption. If you basically overwhelm people with, like, 20 different knobs, then essentially you're sending this message that I don't know how to tune this for you. I'm throwing all of this over the wall. You have to figure it out. And by the way, if you get any of them wrong and things go sideways, you have to own that decision. But if you can simplify that, you get a lot better adoption. For example, at Keebo, we give customers a slider where they can choose between best performance, good performance, balanced, high savings, and highest savings. These are things that all, you know, turn into a vector, a bunch of numbers, and relational algebra, sorry, linear algebra, and all these operations. But we give these high-level things to customers, and 99% of the time, we actually get better adoption with it.
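The slider idea described above can be sketched in a few lines. This is purely illustrative: the five position names come from the conversation, but the `TuningVector` fields, the numeric values, and the `resolve` helper are all hypothetical and are not a description of how any real product maps its settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningVector:
    """Hypothetical low-level knobs that a single slider position expands into."""
    max_latency_slowdown: float     # tolerated query slowdown, as a fraction
    idle_suspend_seconds: int       # idle time before a warehouse is suspended
    downsize_aggressiveness: float  # 0.0 = never downsize, 1.0 = downsize eagerly


# Five user-facing choices; every number below is made up for illustration.
SLIDER_PRESETS = {
    "best performance": TuningVector(0.00, 600, 0.0),
    "good performance": TuningVector(0.05, 450, 0.25),
    "balanced":         TuningVector(0.10, 300, 0.5),
    "high savings":     TuningVector(0.20, 120, 0.75),
    "highest savings":  TuningVector(0.35, 60, 1.0),
}


def resolve(setting: str) -> TuningVector:
    """Translate a slider label into the vector of numbers the optimizer consumes."""
    key = setting.strip().lower()
    if key not in SLIDER_PRESETS:
        raise ValueError(f"unknown slider position: {setting!r}")
    return SLIDER_PRESETS[key]


if __name__ == "__main__":
    print(resolve("Balanced"))
```

The design point is that the user makes one ordered choice, while the system owns the mapping to the individual numbers and the responsibility for getting them right.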
Barzan [01:03:24]:
So I just thought this was an interesting quote from this shoe salesman from the fifties, something that, what is it now, seventy-five years later, a lot of us still overlook in software design. I thought it's an interesting observation to share.
Warren Parad [01:03:39]:
Yeah. I also forget that we're in the twenties now of the 2000s. So, you know, when doing math for last century, that still trips me up. No, I think it's a really great point. I think it's a really interesting perspective there, which goes in the direction of developer experience and user experience, for not just selling the product, but making sure people actually understand what they're doing. And there is an aspect of decision paralysis there that really drives what people are going to do or how they're gonna use the tool effectively. Okay.
Warren Parad [01:04:15]:
Well then, Jillian, should we move on to picks? Sure. What do you got for us today?
Jillian [01:04:22]:
I'm gonna pick Infinity Nikki. It's a video game, and it's just this open world game where you just run around and you just try on pretty dresses, and it's nice. And it has very, like, satisfying mechanics of jumping off buildings. That's it. That's the game. I think there is actually more that you could do in the game, but there's not more that I'm going to do with the game. So that's, like, the extent of my knowledge.
Warren Parad [01:04:45]:
I think what everyone needs an answer to is how much AI is in the game.
Jillian [01:04:51]:
I don't know. I don't know. Maybe I think it's all procedurally generated. I don't think it has, like, any AI anything.
Warren Parad [01:04:58]:
So what you're saying is there's some future DLC for the game studio that's coming?
Jillian [01:05:04]:
I have been wondering if video games are gonna start to make NPCs that way. Like, if they'll just have, like, agents, or not agents, but those will all be AI. So then you could, I don't know what, ask it for a cake recipe or something. But that does seem like some low-hanging fruit for the video game industry, to just do that. But I don't know if it would be, like, cost effective rather than just Yeah.
Warren Parad [01:05:26]:
That's script. Yeah. I could say, you know, the video game industry notoriously super high margins and lots of extra capital to spend. Yeah. They
Jillian [01:05:35]:
do. I don't know. So I don't see it happening there, but maybe it'll come up someplace else. Yeah. Because video games are interesting because, like, everything else can be procedurally generated. So I don't know where you put AI.
Warren Parad [01:05:45]:
I mean, I know some people that are using the foundational models that are out there to sort of make single-player games out of, as for a gate engagement and getting them to either basically DM for you or be the game master for you as you sort of play the game. So things that would have normally required multiple other people, turning it into a single-person experience. So I think it's possible. But, yeah, I think the cost is gonna be okay. I like that pick. Barzan, what do you got for us?
Barzan [01:06:19]:
An interesting book, I think, that I would probably recommend. It's not really related to engineering, but I read a lot of books, so, you know, I read a lot of books this month, but I think this is one of the good ones. It's called Never Split the Difference. It's,
Warren Parad [01:06:41]:
Chris Voss. Yes.
Barzan [01:06:42]:
I've read that book. It's amazing. It just talks about the different characters of people when it comes to negotiations. You've got the analyst, you've got the negotiator, and then you've got the accommodator, and, sorry, you've got the assertive type, right? So if you're an accommodator and you talk to an assertive person, you're just giving them an opportunity to socialize with you, and that's just offending them, and things like that. So I thought it was pretty interesting. A lot of those are things you kind of learn from muscle memory, and then if you're more intentional about it, it just makes you a lot more effective in day-to-day communications anyways, not just in negotiation. So I thought it was an interesting book that I have to learn to No.
Warren Parad [01:07:27]:
I actually really liked it. One of the things that I took away from it that's helped me a lot is this: I always thought the idea of a win-win scenario was made-up nonsense. But the way he puts it in the book is that you're optimizing for certain things, and the other person's optimizing for different things. And you can both optimize for the things that you want as long as you make that information public and you share it and you converse about that. As long as you keep it hidden and secret, then you can't ever really get the other person to move on that, potentially. So I think about salary negotiations in engineering. I don't have them at my company for engineers that we hire. We don't just say, like, hey.
Warren Parad [01:08:10]:
You know, this is how much you get. If someone wants more money, we have a conversation about what expectation comes with the change in salary. It makes sense to talk about that. If you want this, then there's this other part that's important for us. For instance, people that wanna be, say, a senior engineer, and we think they're more at just the, I mean, engineer level two, we would say, well, there's higher expectations. And that means that if you don't meet these expectations, there's a greater chance that we'll have to either reduce your level in the future or we'll have to let you go. So, you know, is that a risk that you wanna take? Increased risk for increased reward, potentially. No, I really like the book.
Warren Parad [01:08:50]:
So Yeah. Yeah. I And
Barzan [01:08:51]:
then, yeah, you know, one of the examples they give about this win-win situation is, if you have a hostage situation where they have, you know, four hostages, and they threaten to kill all four of them, and you say, how about we meet in the middle and you only kill two. It says this idea of meeting in the middle is ridiculous. You know, it doesn't work that way. You really need to know what the outcome is that you're driving towards. And I think there's a lot of interesting takeaways from that book, like the one that you mentioned. Yeah.
Barzan [01:09:25]:
I mean, I
Warren Parad [01:09:27]:
Yeah. No. I think it's a great pick. My pick today is gonna be the L8 conference, which this year was in Warsaw, and which I just got back from speaking at. I did a short talk about building highly reliable software and why having five nines is nearly impossible, more so than anyone thinks. So if the L8 conference is, you know, in your area and you're thinking about where to go, I also highly recommend this one along with what I said last week. So that's it for today's episode. I wanna thank Barzan for coming as our guest, and I wanna thank the audience and all our viewers for listening to this episode of the podcast. And that's it.
Warren Parad [01:10:12]:
And have a good rest of your week until next time.
