Functional Programming Shift and Scalable Architecture Insights - ML 158 - Adventures in Machine Learning

In today's episode, they dive deep into the evolving landscape of software development. Join us as Kirk, the CTO and founder at Graphlit, shares his journey from traditional software at Microsoft to pioneering perception ML for drone-based aerial intelligence. They explore the paradigm shift from object-oriented to functional programming, the crucial role of software architecture, and the challenges of maintaining consistent design and documentation in growing teams.

Hosted by:

Ben Wilson

Michael Berk

Special Guests:

Kirk Marple


Show Notes

They also get insights into Databricks' approach to user-friendly API design and the importance of learning management systems in knowledge distillation. Listen in as our speakers discuss the strategic decisions in scaling products, the nuances of open-source contributions, and the value of automation in modern development. Whether you're navigating a startup or a large enterprise, this episode is packed with expert advice on building robust, scalable systems and the dynamic decision-making needed to thrive in today's tech environment. Tune in and elevate your development game!

Socials

LinkedIn: Kirk Marple

Transcript

Michael Berk [00:00:05]:
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I do data engineering and machine learning at Databricks. And I'm joined by my cohost, Ben Wilson. I update landing page websites for MLflow at Databricks. Today, we are joined by Kirk. He started his career at Microsoft as a software development manager, and after Microsoft he took a variety of technical leadership roles.

Michael Berk [00:00:29]:
Currently he's CTO and founder at Graphlit, and their primary product is an API that automates the ingestion, extraction, and enrichment of unstructured data from any data source. So, Kirk, when you moved to Kespry, am I saying that correctly?

Kirk Marple [00:00:43]:
Mhmm. Yep.

Michael Berk [00:00:45]:
A company focused on drone-based aerial intelligence, you specialized in perception ML. How was that transition from traditional software into perception ML?

Kirk Marple [00:00:56]:
Well, what we really saw was that it's a lot about the data, as you guys obviously know. How do you get the data into the hands of the computer vision algorithms? That's what we were focused on a lot. So we ended up leaning heavily on the ingestion side and the preparation side, and then letting our ML engineers invent the computer vision algorithms. For us, a lot of it was just having really clean data and then innovating a bit on the computer vision side.

Michael Berk [00:01:25]:
Got it. Was latency a factor?

Kirk Marple [00:01:28]:
We were essentially offline. The insurance companies would fly the drones and upload to the portal, so we were able to do the processing in the background. But turnaround time was still a factor.

Michael Berk [00:01:41]:
Got it. When I think drones, I think of a YouTube video I recently watched by Mark Rober, if you guys are familiar. It's basically all the different ways that you can demolish a drone, and I think it was basically a big commercial for a drone company. The best drone-demolishing solution is actually another drone called the Anvil, a really fast, really powerful drone that just smashes into other drones. They can be deployed around bridges or nuclear power plants, whatever it might be, to protect them. Did you guys ever work on security for drones, or was it more about whether the drones are seeing the right thing and acting in the correct way?

Kirk Marple [00:02:21]:
Well, it's interesting. The company had started as a hardware company building their own drones before I started there, and then they were transitioning to a pure software platform: more of a data analytics and data cataloging tool, which is what I was working on in the last year before COVID. Then they ended up getting sold off and integrated into another product. So it was really more about the data: perceiving the real world, cataloging it, and search and retrieval.

Michael Berk [00:02:49]:
Got it. And there, how did you think about building a successful software team? You've built a bunch of them. Was anything different because it was more in the hardware space? When you started at Microsoft and then went over to GM and then Kespry, I'm sure there were different challenges based on the end product. Did you notice any of those challenges change how you create teams?

Kirk Marple [00:03:11]:
That's a really good question, because it's something we looked at. There's the Spotify approach, with guilds and that kind of model. And there are different team structures: is it a full-stack team? Do you separate out a design systems team? Because the company was moving into pure software, we really had to think about the UI side; it was a more UI-heavy product than their original stuff had been. The other side of it was code reuse: could we refactor out UI design systems shared between their two products? So that was definitely something we had to think about in terms of staffing. But the back end of that product was also brand-new infrastructure. Learning scalability and cloud, and honestly just the CI/CD and infrastructure deployments like Terraform: that skill set wasn't necessarily something we hired for, so we had to train for it in some ways. It was a multilayered product where you really had to think about all of that, especially once you get up to scale. This isn't just a demo anymore.

Kirk Marple [00:04:19]:
How do you actually make this thing scale under a lot of data load? We basically had a small team incubating that, and then it ended up growing out.

Michael Berk [00:04:32]:
Got it. Yeah, that's a topic that comes up a lot on the podcast: there are these demos and blog posts, let's say, and they're not super representative of what you see in large organizations.

Kirk Marple [00:04:44]:
Right.

Michael Berk [00:04:44]:
So getting into that, how did you make stuff scale, and what does that mean to you?

Kirk Marple [00:04:50]:
At that company, it was an interesting situation: we got to a point, then COVID hit and changed the business a bit, so we never got to scaling that product there. But at my current company, that's really what we've focused on from day 1: building something that automatically scales out under load and can handle millions of files or pieces of content. One thing I'm a fan of is an event-driven system. You have a queue that you put everything into, and it feeds out into multiple servers or a serverless architecture, which can burst out capacity. We get a lot of bursty consumption: people load data in, and then maybe it's just consumption for the next hour. So we want to be able to balance those two things. I think it's important to think about whether you're going to have to get there, and you might want to bite the bullet a bit and design for that kind of event-driven architecture. Serverless or not, at least a queue architecture.
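Kirk's queue-based, event-driven pattern can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Graphlit's implementation. An in-process `asyncio.Queue` stands in for a managed message queue, and coroutine workers stand in for servers or serverless functions. The names `ContentEvent`, `workers_for_backlog`, and `handle_burst` are assumptions for illustration, as is the sizing rule of one worker per 10 queued items, capped at 32.

```python
import asyncio
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ContentEvent:
    """Hypothetical 'new content arrived' event placed on the queue."""
    content_id: int


def workers_for_backlog(backlog: int, per_worker: int = 10, max_workers: int = 32) -> int:
    """Hypothetical scaling rule: one worker per `per_worker` queued events, capped."""
    return max(1, min(max_workers, math.ceil(backlog / per_worker)))


async def process_content(event: ContentEvent) -> str:
    # Stand-in for real ingestion, extraction, and enrichment work.
    await asyncio.sleep(0.001)
    return f"processed-{event.content_id}"


async def ingest_worker(queue: asyncio.Queue, results: List[str]) -> None:
    # Each worker pulls events until it is cancelled; a burst only lengthens the queue.
    while True:
        event = await queue.get()
        try:
            results.append(await process_content(event))
        finally:
            queue.task_done()


async def handle_burst(num_events: int) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: List[str] = []
    # Producers enqueue and return immediately; they never wait on processing capacity.
    for i in range(num_events):
        queue.put_nowait(ContentEvent(content_id=i))
    # "Burst out" capacity by sizing the worker pool to the current backlog.
    workers = [
        asyncio.create_task(ingest_worker(queue, results))
        for _ in range(workers_for_backlog(queue.qsize()))
    ]
    await queue.join()  # block until every queued event has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(handle_burst(100))` enqueues 100 events and sizes the pool to 10 workers. The sketch shows one design choice: producers never block on processing. A burst only makes the queue longer, and capacity is added by starting more consumers.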

Ben Wilson [00:05:56]:
I've got sort of a meta question for you.

Kirk Marple [00:05:58]:
Yeah.

Ben Wilson [00:05:59]:
This came up in a previous podcast as well, when we were talking to another innovative founder. Do you think you would have been able to build what you've built, hire the people you've been able to hire, and move as quickly as you have if you were trying to do this within the confines of an extremely large tech organization that's focused on

Kirk Marple [00:06:25]:
Yeah.

Ben Wilson [00:06:25]:
enterprise support?

Kirk Marple [00:06:28]:
That's a great question. I would say no. Probably 75% of the last 20 years have been startups or small companies, and 25% at bigger companies. And 100 percent, you can move faster at a small company. There's a dynamic nature of having to adjust on the fly. Maybe it's not pivots per se, but month to month you may make some pretty quick turns left and right as the market dynamics change, which is really hard to do when you're also thinking about team management, promotions, and all the other dynamics.

Kirk Marple [00:07:09]:
I loved my time at Microsoft, but it was very political. You really had to think about your career there, whether you wanted to move up and all that kind of stuff. At startups, there's a half-life that's a lot shorter, and for better or for worse, you're not thinking about that. You're just trying to get product out the door. I enjoy the 0 to 1 a hell of a lot more than the bigger company now, but it definitely changes the dynamic of what you can build. And you see a lot of companies slow down after acquisition.

Kirk Marple [00:07:43]:
I just talked to a friend whose company got bought by a big fintech company, and he's like, "Yeah, it's boring now." You hear that all the time.

Ben Wilson [00:07:53]:
Is there an inflection point? As somebody who's probably the final approval on hiring technical staff, is there an inflection point in the life of a startup that's focused on something cutting-edge like yours, where you say, "I need to start hiring a different sort of person at a certain point"? And what would that inflection point be, for listeners who are curious about this? What does the CTO or CEO think about when they need to shift the hiring dynamic?

Kirk Marple [00:08:25]:
Yeah. I think there's a level of fleshing out. It's easy to scale one front-end dev to three front-end devs, if they're all on the same stack. But you do start to hit a point, like Amazon's two-pizza teams, at around five to seven people per functional area, where it becomes more difficult because you have to think about having a lead. The CTO can't manage them all directly, so you have to start adding an extra layer. Or maybe it's an area the founding team doesn't have experience with. That's another area that's harder, because you have to find somebody who can be that kind of chief, somebody you can trust, because every startup founder is a micromanager.

Kirk Marple [00:09:16]:
You have to be, to do it right. But there's a point where you ask, "Okay, how do I hand that off?" I think that's when things become tricky as you grow.

Michael Berk [00:09:28]:
How do you personally decide the cutoff between "you should hire" and "you should learn it"?

Kirk Marple [00:09:39]:
It's a good question. There are some things where it's more like, "Okay, I'll hire somebody on a project basis." I know we have three months of work, so we'll hire somebody, plug them in, and accelerate. There are areas where that works really nicely. A good example on the other side is CI/CD. I built our CI/CD pipeline, because it's going to be the heart of what you do in the future.

Kirk Marple [00:10:09]:
So as a technical manager, CTO, whatever you want to call it, you have to know it. You have to be able to do that, because my reasoning is that you're going to be the fallback. If other people have issues, who are they going to ask? So I tend to always start by learning it, maybe prototype the first one, and then hand it off, because the team needs a backstop for support in the future. I did that with our Terraform at a previous company, where I built out the first Terraform scripts. I wouldn't call myself a DevOps engineer, but at least I understand it soup to nuts and how it all works. Then I could hire somebody and say, "Here, take this over," and still be able to answer any questions about it.

Michael Berk [00:11:02]:
So it sounds like project duration and project importance are the two axes you would use to decide when to hire and when to learn?

Kirk Marple [00:11:11]:
Yeah. Project duration for sure. What would be a good term for the other one? There are some things that are threaded through the life of the project, more foundational things like the DevOps side, and I think that's an area where learning is important. Then there are other areas where, well, I don't even consider myself a data scientist or an ML engineer. I'm more of a platform guy: I can build scalable things and plug in different APIs. I'm happy to hand the true ML innovation off to people who really know it better.

Kirk Marple [00:11:48]:
That's the kind of stuff I wouldn't necessarily try to learn. I know there are people smarter than me out there. So I think that's another axis: can other people just do a much better job out of the box? And is it easier to just find really smart people to do that?

Michael Berk [00:12:05]:
Got it. So, market supply and demand in there as well?

Kirk Marple [00:12:08]:
Yeah. And there can be times when cost is a factor. When we were at Kespry, we were looking to hire ML engineers, and the salaries were just insane: $200K starting for somebody with three years of experience, back at the time, and that was just the market. We couldn't afford it at some of those times. Sometimes that's when you say, "Okay, I have to take the next two weeks to figure this out myself," because the market is just so expensive for people right now.

Ben Wilson [00:12:44]:
Got it. That reminds me of something I was working on last week that's related to this. We in Databricks engineering have a habit of trying to just make something work and figure it out, time-boxed, of course. We were looking back through some old Confluence docs because we were cleaning stuff up, and I found version 1 of our CI/CD deployment for an internal project. I'm looking through it, like, yeah, this definitely works, and it's robust.

Ben Wilson [00:13:26]:
I tested it out in a staging environment. Yeah, it still works. This is cool. Except it took me an hour and a half to copy and paste all of the shell commands that trigger the different services to do everything associated with the deployment. Compare that to what we have now, which is entirely UI-based: you click, you validate, there are links. It automates tons of stuff for you, and when errors happen, they're bubbled up in a way that's understandable and clear and gives you a path to resolve them, so you don't have to do a log dump and figure out what's going on.

Ben Wilson [00:14:10]:
That system is now on version 4, and we hired for it. First we hired one person, and now it's a team of 14 or so, and they build all of this stuff. They're all experts in that. They couldn't come and do what we do, but we can't do what they do.

Kirk Marple [00:14:31]:
Right.

Ben Wilson [00:14:32]:
Because you look at the tech lead on that team, and they have 25 years of experience. And they hired a bunch of people with 10 to 15 years of experience doing this stuff and automating these systems. Then you look at the volume of what's running through these systems, like, hang on, there are 100,000 builds a day going out. This is insane, and it just works.

Kirk Marple [00:14:58]:
That's awesome. And that's the scaling thing: there's almost a pain threshold. You were asking what the axes are. At some point, when it becomes something you're having to think about every day, the pain is really growing, and part of it is just the importance to the company. There are some things that just have to work. That may be something where you're going to need one pair of eyes dedicated to it for life as you grow. You can always get by doing things in a lightweight way, but once they become that much of a focus, you've got to have some automation around them to be able to scale them out.

Kirk Marple [00:15:41]:
We try to leverage tooling. We use Azure DevOps for all of our builds and things like that. I'm a big fan of managed services, where I assume there's a team at Microsoft that's smarter than I am and can do these things a lot better. So I'm happy to let them manage that rather than trying to build our own pipeline. But there are other places where I know we can do a lot better, and balancing that is the important thing.

Michael Berk [00:16:08]:
Yeah. The buy-versus-build debate is really interesting to think about. It's not that complex of a decision, but there are so many moving parts and so much uncertainty around what the alternatives can provide. You're working with very nebulous, unreliable data. So it's interesting to talk about.

Kirk Marple [00:16:30]:
I think it's where you have to check yourself on a regular basis and be willing to change. And that goes back to your original point, Ben, about what's harder about being in a bigger company. I think it's that slow-moving ship. In a startup, you can say, "Okay, we made a mistake. We screwed this up. Let's go do X instead of Y," and just make a big change. We had something with the authentication stack in our UI that we were struggling with, and you have to make a decision.

Kirk Marple [00:17:01]:
Are we going to stick with this? Are we going to switch to a different vendor? At some point, you have to make those hard choices, evaluate these things, and maybe spend the weekend redoing something because it's the right thing for the long term.

Michael Berk [00:17:18]:
For contract-based tools, do you usually reevaluate when the contract is up, or do you try to break the contract midway if something goes wrong? How do you think about that?

Kirk Marple [00:17:27]:
That's interesting. Most of our longer-term contracts are for productivity tools, like Figma and Lucidchart. From a technical standpoint, it's mostly usage-based things, more API-type stuff. So I don't have a good answer for that in our engineering stack. We use Linear and tools like that, but we haven't actually switched off any of them yet.

Michael Berk [00:17:58]:
Got it. Okay, cool. Shifting gears a bit, one of the topics I wanted to focus on is that, as we alluded to earlier, you've spent around 75% of your time working in smaller organizations.

Kirk Marple [00:18:13]:
Mhmm.

Michael Berk [00:18:13]:
With that comes the concept of 0 to 1 development, where you're starting in a greenfield space, you need to innovate and iterate, and then hopefully you have something that solves a business problem. How is that fundamentally different from incrementally improving an existing piece of tech?

Kirk Marple [00:18:31]:
Yeah. I think you're developing the vision at the same time you're developing the code, and that can be the harder part. I have this analogy: I was just using one of the LLMs this week, and if you ask it a question, get a response, and then ask a follow-up asking it to refine, it gives a much better answer. That's like the development that comes after 0 to 1: once you've been going for a year or so, you're just refining. You have a baseline of code and a baseline of experience with the customer, and when iterating, you have a lot more data to make decisions on. In 0 to 1, it's a lot harder, because you've probably barely talked to customers. You're really just going on a gut feel and a vision based on whatever market dynamics you believe are right.

Kirk Marple [00:19:28]:
But you also have to show people something. Especially for us, as an API-first platform: how do you show somebody an API and ask, "Hey, do you want to use this?" So you have to build something, demos and things like that. It becomes not even an engineering task; it's a product task, a more holistic task, from 0 to 1, which I enjoy. It's not just coding. It's developing the market, figuring out the direction, and trying to get feedback quickly.

Kirk Marple [00:20:03]:
So it ends up mixing together all those dynamics, rather than just "Hey, go build a feature" and focusing on that side.

Michael Berk [00:20:12]:
Ben, how much 0 to 1 development do you do these days?

Ben Wilson [00:20:18]:
We're weird with what we do, so we can get away with quite a bit of that. In a given calendar year, we're probably 60% greenfield, which is not typical of a tech company like Databricks. That speaks to the culture at the company and the culture of the founders. They very consciously hired a big mix of people who have a particular attitude. It's not "move fast and break things." It's "move fast and build 80% good enough, and then refine that later on with 40% of your time." You really don't want to hire a bunch of people who want to do the 100%.

Ben Wilson [00:21:15]:
Like, "Hey, I'm building a refined product." There are companies and industries where you need to do that, and it's very important. Like you alluded to with financial services: take a startup into that ecosystem and those people get bored, because now you have to ship something that just works. It can't be "kind of good enough." People just won't use it if they see all those edge cases not working. With what we do, we get to that 80% on the greenfield, and we have a huge customer base, so we can get feedback quickly.

Ben Wilson [00:21:55]:
If it sucks, the question is: can we fix this and make it useful? Do people care? If so, go do that. Sometimes that means inventing new things to plug into it. But sometimes it's, "Hey, we got this done really quickly, or it's in preview, and nobody's using it. Nobody cares." On to the next thing. Don't delete the code yet.

Ben Wilson [00:22:22]:
Eventually delete the code, but it allows you to pivot very quickly. I think it's very challenging for engineering leadership to maintain that culture.

Michael Berk [00:22:34]:
Mhmm.

Ben Wilson [00:22:35]:
And this is the only place I've ever been that does that.

Kirk Marple [00:22:39]:
It's interesting, given the background there: a lot of times the academic culture is about perfection, continuing to research something. That's actually great, but you've got to ship someday, and getting feedback is the most important part. I learned that super early in my career when I started at Microsoft. My first project was something called Blackbird, which was essentially pre-web, because I'm that old. It was right before the Internet launched, and it was their Internet publishing platform for magazines, basically pre-HTML. They killed it after the second beta. I had worked on it for about the first year or two I was there, and they got feedback and realized, look, the market's changing. HTML publishing is now the thing, not this kind of private platform, which was more like what AOL was doing back in the day.

Kirk Marple [00:23:34]:
And to learn that at a big company like that, where you've invested in this whole team, the people, and the marketing, you can say, "Okay, go on to the next thing." It's a hard choice to make. But I learned, okay, wow, you can actually do this. And a lot of the technology we built got disseminated into other products.

Kirk Marple [00:23:55]:
So it was really an innovative product. It's a hard choice to make, but you've got to get that feedback.

Michael Berk [00:24:04]:
Ben, do you kill projects?

Ben Wilson [00:24:08]:
I think you would be surprised, even though you work at the same company I do, by how many things are ideated on, get a prototype built, and then we just move on. We get candid feedback from people; we just tell them, give us the brutally honest truth: does this suck? Sometimes people say, "Yeah, I'm not using that," or "I have no need for that," and then we pivot.

Michael Berk [00:24:35]:
Mhmm. Mhmm.

Michael Berk [00:24:35]:
Interesting. I can't recall a single thing that's been released and then killed. I do field engineering, so I work on customer implementations, and Ben is an actual software engineer who does Databricks engineering internally. But yeah, I can't think of something that got killed.

Ben Wilson [00:24:52]:
You wouldn't have heard about it. Okay, if something gets to general availability, a GA release,

Michael Berk [00:25:01]:
Mhmm.

Ben Wilson [00:25:01]:
then yeah, that's not going away. That's why you're very careful about what gets to that point. And I'm curious to hear your perspective, Kirk, on how you think about that in a fast-moving startup.

Kirk Marple [00:25:16]:
Yeah.

Ben Wilson [00:25:18]:
Do you guys have that concept? Or do you embrace the process an enterprise tech company would use, like you did at Microsoft, I'm sure, where there's a private preview, then a public preview, and then you ship GA? Do you guys do that? And what's your process for saying, "Alright, we don't want to maintain this. Nobody cares"?

Kirk Marple [00:25:37]:
Yeah. A lot of it ends up being around the API surface area for us, and there are some things we end up deprecating. There are places where I took a stab at architecting one piece, say, the way part of the RAG pipeline works, and we end up refactoring that away and saying, "Look, I don't really like that approach. Here's a new, cleaner version." But we have to do it while keeping compatibility: we can't just rip things out of the API and break people.

Kirk Marple [00:26:08]:
If people have code running, or even if they just reference it in the schema, it could break them. So it definitely comes up. For example, maybe I have one field for something, and then I expand that into an object in the API. You really have to maintain that and know that you can future-proof for it. So I sometimes end up being a little over-opinionated, or maybe over-specific: if I'm going to add something, I'm going to add a place where I can add more things to it. We think ahead: "I'm not just going to make that a Boolean."

Kirk Marple [00:26:45]:
I'll make it an enum, because I can expand it later. I definitely think about future-proofing a bit, to make things more easily expandable and less breakable from a surface area standpoint. But a lot of what we do is hidden behind the scenes: there are just workflows running. It's the API surface area that we have to focus on, because that's what the customers touch.
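Kirk's point about preferring an enum to a Boolean, and about expanding a single field into an object, can be illustrated with a small request schema. This is a hypothetical sketch that uses Python dataclasses to stand in for an API schema. The field names (`extractImages`, `imageMode`, `maxPages`) and the `parse_request` helper are invented for illustration and are not Graphlit's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ImageExtraction(str, Enum):
    """Replaces a hypothetical Boolean flag; an enum leaves room to grow."""
    NONE = "NONE"
    BASIC = "BASIC"
    DETAILED = "DETAILED"  # hypothetical later addition; old clients are unaffected


@dataclass
class ExtractionSettings:
    # A nested object rather than one scalar field, so options can be added here
    # later without changing the top-level request shape.
    image_mode: ImageExtraction = ImageExtraction.NONE
    max_pages: Optional[int] = None


@dataclass
class IngestRequest:
    uri: str
    extraction: ExtractionSettings = field(default_factory=ExtractionSettings)


def parse_request(payload: dict) -> IngestRequest:
    """Parse a JSON-like payload, still accepting the older flat Boolean field."""
    if "extraction" not in payload and "extractImages" in payload:
        # Legacy shape: {"uri": ..., "extractImages": true}
        mode = ImageExtraction.BASIC if payload["extractImages"] else ImageExtraction.NONE
        return IngestRequest(uri=payload["uri"], extraction=ExtractionSettings(image_mode=mode))
    raw = payload.get("extraction", {})
    return IngestRequest(
        uri=payload["uri"],
        extraction=ExtractionSettings(
            image_mode=ImageExtraction(raw.get("imageMode", "NONE")),
            max_pages=raw.get("maxPages"),
        ),
    )
```

Old clients that still send the flat `extractImages` flag keep working, while new clients use the nested `extraction` object. Adding another enum value later breaks neither group.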

Ben Wilson [00:27:13]:
Got it. And you've been seeing that, Michael, as you've been digging more and more into the MLflow source code. On that surface area, everything's very structured and well documented, or somewhat well documented, but everything's locked in. I'm sure you've heard it in meetings you've been a part of: somebody asks, "Oh, can I add this argument?" And we're like, no. No. No. No.

Ben Wilson [00:27:38]:
We're not adding any more arguments here. We don't need that; we can use this other methodology. But then you start going down to the turtles that lie beneath and realize, wait, the back end has all of this private implementation that a user of this tool never sees, and we can change that willy-nilly as much as we want, because those are dev APIs. That's the real challenge, I think, in designing API-first services: making sure you have a contract that is sacrosanct. It can evolve over time, but you have to think about all those breaking changes, and deprecations are scary. How do you notify users? "Hey, this is going away in 6 months."

Ben Wilson [00:28:26]:
Do they need service support to help migrate off of it? And what you said, Kirk, really speaks to me on a personal level: we have to think about how we can make this extensible in the future. I think that's one of the hardest parts of software engineering. A lot of people looking at it from the outside think, "Well, it's hard to write code," and then you ask a software engineer, and they say, "No, code's easy. It's hard to design code so that

Michael Berk [00:28:58]:
Yeah.

Kirk [00:28:59]:
That this thing will last 15 years.

Michael Berk [00:29:02]:
Well, I mean, I had my previous startup for 10 years, and I developed some patterns there that I'm actually reusing today, like, how are workflows structured for this? Are they kinda sequential? Are they like a DAG? A lot of the patterns I developed, just what worked over time, you end up reusing in your career and other things. And so a lot of it is just, hey. Okay. This is the way we did video for broadcast, and we had this kind of sequential pipeline that data flowed through, and you could configure knobs at each step. And our ingestion pipeline is very similar to that today. It's just indexing, data prep, and data enrichment and those kinds of things. And then we kinda do the fan-out behind the scenes, so we're not exposing the DAG kinda thing to the customer.

Michael Berk [00:29:53]:
They just say, hey, here are the different things I want as an instruction set. More like, I don't know, a configuration-as-code kind of model. And then we just kinda do the work they need on the back end.
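A sequential pipeline driven by configuration-as-code might look roughly like the following minimal Python sketch. The step names mirror the stages mentioned (data prep, enrichment, indexing), but their bodies are toy stand-ins. The config format is an assumption, not Graphlit's actual instruction-set schema, and fan-out is omitted.

```python
from typing import Callable, Dict, List

# Each step takes the current document text plus that step's "knobs".
Step = Callable[[str, dict], str]


def prepare(text: str, opts: dict) -> str:
    # Data prep stand-in: optionally normalize case.
    return text.lower() if opts.get("lowercase", False) else text


def enrich(text: str, opts: dict) -> str:
    # Enrichment stand-in: append a tag (in place of entity extraction, etc.).
    tag = opts.get("tag")
    return f"{text} [{tag}]" if tag else text


def index(text: str, opts: dict) -> str:
    # Indexing stand-in: keep at most `max_chars` characters.
    return text[: opts.get("max_chars", len(text))]


STEPS: Dict[str, Step] = {"prepare": prepare, "enrich": enrich, "index": index}


def run_pipeline(text: str, config: List[dict]) -> str:
    """Apply the configured steps in order; the config itself is plain data."""
    for stage in config:
        step = STEPS[stage["step"]]  # unknown step names raise KeyError
        text = step(text, stage.get("options", {}))
    return text


if __name__ == "__main__":
    config = [
        {"step": "prepare", "options": {"lowercase": True}},
        {"step": "enrich", "options": {"tag": "doc"}},
    ]
    print(run_pipeline("Hello World", config))  # hello world [doc]
```

Because the customer supplies only the `config` list, the service is free to change how each step runs behind the scenes.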

Michael Berk [00:30:03]:
So can you list a couple of the things you've learned over many years of software development that might be hard to learn, so our listeners can have the shortcut?

Michael Berk [00:30:17]:
Let me think. I mean, I know I'm kinda old school in that. I started as an object-oriented guy, like, C++, and so much of my career and how I go about development is colored by that. And so I think it's interesting today with more functional programming and just how different coding is in a lot of ways today. And obviously front-end development is different too. So let me just think for a sec. I think one of the missing things that I see is software architecture. And you're kinda seeing this a little, Ben.

Michael Berk [00:30:54]:
It's like, I mean, thinking about what you're gonna do, coming up with a consistent architecture, coming up with patterns, and having somebody that thinks about that before writing the code. And I think that role has kind of been diminished. I've even been at places where it's, like, a bad word. Nobody ever wants to think about architecture. And I always kinda think it's a lost art in a way. It could be somebody that's, like, a technical product owner or a technical PM in that role, but somebody to think at a high level. How does this stuff fit together? How's it gonna work a year from now? What's the rough data model? What are the entities? What are the objects we're dealing with here? If you look at, like, the Stripe API or something, there are typical entities they're dealing with, and they probably can plan out, like, okay.

Michael Berk [00:31:49]:
Here's how these things are gonna go together. And I think someone needs to own that, whatever you call that role. I think sometimes people get in the weeds a bit too much with the coding, and they're not seeing the big picture as much. And I think we used to look at that probably too much back in the old, old days, but I think it's the biggest learning. I think that could make a bit of a comeback, that people could almost start with, like, whiteboard-based development: let me try and draw it out first and then dig into the details.

Kirk [00:32:22]:
I mean, Michael could talk to how it's done at Databricks because he just went through his first process of that

Michael Berk [00:32:31]:
Oh, yeah.

Kirk [00:32:31]:
Which, like, we're a relatively young company when you look at the average age of people that are doing stuff, but if you look at people's output, it feels like everybody's in their fifties.

Michael Berk [00:32:46]:
Oh, wow.

Kirk [00:32:46]:
Because it's enforced from on high that you have to do a design doc, and that design doc has to cover stuff like this, and you have to think through it. And then you submit it for review to your tech lead, and they tear it apart. They ask all of these questions, like, this isn't consistent with this other API. How does this feel to a user? We talk about user experience all the time. What is the customer user journey with using this API? Is this too complex? Is this, you know, very simple, but not powerful enough? And what are those trade-offs? And, also, there are questions about just simply, like, hey, this kind of does this other thing in a similar way to another API, but the signature is different. And it... Right. Like, it behaves differently.

Kirk [00:33:36]:
Let's not do that. Let's conform these so that it just feels natural. And the goal, my underlying goal, if I was to say, here's my number one thing with the design doc and any API that's built, the thing that I strive for is that a user should never have to look anything up in the docs. Mhmm. Unless they're doing something really, like, they're trying to subclass one of our APIs, or do something where they're creating an interface to it for their own use case. But if they're doing applied use of our APIs, they shouldn't have to look at the docs. It should be intuitive.

Michael Berk [00:34:16]:
Yeah. There has to be, like, an elegance to it. I think that's the best. I mean, the things I aspire to are just really well-designed APIs, ones that seem intuitive, like you say. And I think that's just like being a musician or an artist. You just kinda do it over and over again and fail and, yep, put out crap and have it reviewed.

Michael Berk [00:34:37]:
And I think that's something that we see. For any big new feature area, I make sure that I can draw it out and visualize it first, and then I can start breaking it down into tasks. Because if I can't get the big picture out, then I can't build it. But then the API surface area is tough because, yeah, there's usually somebody that kinda has it all in their head. Once your team grows and your API grows, how do you keep that consistency without review? You have to add that layer of review. I mean, maybe LLMs and code review and stuff like that will help in the future to keep API surface areas cleaner and do reviews like that. It might be an interesting area. But

Kirk [00:35:18]:
Yeah. Based on my testing so far, they're not quite there yet. Yeah. Like, not even remotely close, but Yeah. They're good for discrete things. Like Yeah. Does this API make sense?

Michael Berk [00:35:29]:
Yeah. You also sorta hinted at answering the question but didn't answer it. As an API and a team grow, how do you preserve strong architecture? Because one person can't do all the reviews forever.

Michael Berk [00:35:46]:
I mean, documentation is important, but it can also be a burden. And so I think you've gotta be careful with it. I'm more of a visual documenter, in terms of I like Lucidchart. I like stuff like that that can show people how this stuff fits together, and that's easier to train people on. But the downside with written documentation is it can go out of date so quickly. And I've struggled with that balance. I mean, given we're more 0 to 1, we're more fast and loose for sure with documentation. But the knowledge base of Slack or, I mean, just Google Docs or Lucidchart is really where we focus. We're not very prescriptive of, like, okay, you have to have a design doc in this format, this much, and this structure.

Michael Berk [00:36:38]:
I mean, we don't do that at all. It's just like, hey, at least let's get a diagram for something. We'll come up with, like, a list of features. There's just no formality to it, but I'm like, we gotta write it down in some way. And I think that's good because there are always gonna be questions of, like, wait, didn't you say this 3 months ago? And I don't know. Usually, my memory is like, oh, wait.

Michael Berk [00:37:02]:
I don't remember what we actually said because I've moved on to, like, 5 other things. I think that knowledge capture becomes so important for the team. But there are so many ways to slice that. We end up using Slack or something like that a ton, but then having this kind of layering of, okay, there might be a doc for it, and there's probably a diagram for it as well.

Michael Berk [00:37:25]:
Got it. And LLMs could help with parsing and distilling the knowledge repo, essentially.

Michael Berk [00:37:31]:
For sure. For sure. And I think just summarization, just good search, is still a struggle for a lot of this kind of stuff.

Michael Berk [00:37:41]:
Yeah. Yeah. Semantic search would be really helpful, at least in my life. The number of times I look up stuff in Slack and just have to permute the different combinations of words, it's ridiculous.

Michael Berk [00:37:53]:
Well, and it's just the way they're doing it. They're almost trying to be too helpful in finding you things, like, oh, well, this is the same thing, but with an extra letter at the end. I'm sure you wanna see this too. It's almost not even useful at all in a lot of ways. So, yeah, it's not a solved problem, even though I think we have the tools now. I think, like, Glean and some of these kinds of companies are now really trying to focus on, like, okay, really good workplace search. But there are so many other ways to approach that, I think there's gonna be a lot more innovation there.

Michael Berk [00:38:29]:
Yeah. That makes sense. Another question about software architecture. You hinted at the concept of layers. How many layers should you have? And if there's not a single answer, how do you think about determining it?

Michael Berk [00:38:42]:
Oh, man. I mean, and this is maybe my leanings, I've always thought about separating the presentation layer from the sort of functional layer, and I know that's a bit distinct from some of the full-stack development and things like that. But I tend to always think of an API surface area and then building a team and a product to that API, rather than thinking about them as all one thing, because that lets you productize the API later. I know there's contrary opinion about that, but it's always worked for us where we've consumed our own API for our front end. And the main point of that is we had companies that were like, we don't really wanna use your UI. We just wanna automate it through our system. And that's actually worked really well: if you eat your own dog food and have a really good, solid API that you build first to feed your presentation layer, your front end, then you can do a lot of stuff with it. I mean, you can write automation tools.

Michael Berk [00:39:47]:
You can do all this other stuff. So I tend to think in that classic layering of presentation to an API, and then there might be back-end APIs that are all hidden behind the scenes. But really have that first API layer cleanly separated. And it also makes for better developer experience, because your first consumers of that are your own dev team. And so you're validating, to your point before about how you know if it's a consistent API, because your team is actually using it themselves. And I've done that time and time again where the internal team is your first customer of the API, and that helps shake out a ton of issues.
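The layering described here, where the front end and automation scripts both consume the same public API while a private back end sits behind it, can be sketched in Python. All class and function names are hypothetical, and an in-memory dict stands in for real storage.

```python
from typing import Dict, List


class _DocumentStore:
    """Private back-end layer: storage details can change freely."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def put(self, doc_id: str, body: str) -> None:
        self._docs[doc_id] = body

    def scan(self) -> List[str]:
        return sorted(self._docs)


class PublicApi:
    """Stable public layer: the only surface the UI and customers touch."""

    def __init__(self) -> None:
        self._store = _DocumentStore()

    def ingest(self, doc_id: str, body: str) -> dict:
        self._store.put(doc_id, body)
        return {"id": doc_id, "status": "ingested"}

    def list_documents(self) -> List[str]:
        return self._store.scan()


def render_document_list(api: PublicApi) -> str:
    """Presentation layer: builds its view only through the public API."""
    return "\n".join(f"- {doc_id}" for doc_id in api.list_documents())


def nightly_import(api: PublicApi, files: Dict[str, str]) -> int:
    """Automation script: uses exactly the same API as the front end."""
    for doc_id, body in files.items():
        api.ingest(doc_id, body)
    return len(files)
```

Because both the UI path and the automation path go through `PublicApi`, the internal team exercises the same surface that external customers would.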

Kirk [00:40:28]:
And sometimes, in my experience, it gives you the best feature requests. Oh, yeah. So at Databricks, we're building stuff that people are gonna be using. Like, I have a meeting on Monday with a team about a new feature that they'd like to use, or have built, to integrate with the product. And they gave me a quick 2- or 3-sentence explanation of what they're gonna be building. I was like, I didn't even know that was a thing yet. But that's super cool. Yeah.

Kirk [00:41:08]:
I'd love to chat about this. Like, when are you gonna be releasing this? And they're like, oh, December. We're shooting for, you know, the 1st private preview launch. I just did a quick search on the Internet, and there are 0 references for what you're talking about being built. They're like, yeah. Nobody's done this yet. And we're very tight-lipped about it. But MLflow doesn't support this thing that we need to do.

Kirk [00:41:36]:
So here's our list of requirements. Please review this before the meeting. So we're finding out 6 months, sometimes a year, before any customer would even be aware of this thing, and we'll build these features in slowly, you know, or sometimes quickly, in order to unblock that internal team. But we're not building stuff that's like, oh, this is in some private repo, and nobody's ever gonna see this. Right. We're releasing it. It's in open source. People can start playing around with it.

Kirk [00:42:06]:
And sometimes the fallout from that is we'll get further feature requests from the community. They're like, this is cool, but can it also do this? And then we talk to the internal team: do you guys need this? And they're like, yeah, we were gonna talk to you this week about that. I'm like, sweet. What are your opinions on... you know, what you guys are focusing on is something that is effectively a service layer. Right?

Michael Berk [00:42:38]:
Mhmm.

Kirk [00:42:39]:
And what are your thoughts about intelligent selective open sourcing of components versus keeping it all proprietary?

Michael Berk [00:42:49]:
Yeah. No. It's a good question. I mean, I've never actually released an open-source product, like, a product in a full sense, though we have done some small open-sourcing of different components. So I'm not the best one to ask about it. I mean, we do leverage open source for some things we do, and we support it and contribute back. But given it's a managed service, it's also hard for us, where we're really focused on scale and the infrastructure side of it.

Michael Berk [00:43:21]:
It's like being another AWS or Azure service to us. It's an API that people can consume. There are gonna be other companies that release maybe something like a RAG-as-a-service that's fully open source, and you pull all the pieces together yourself. And we're kinda taking the other approach of, like, you don't care what's running behind Stripe's infrastructure. It's just an API. So I don't have a ton of experience to pull from, but the way we're looking at open source is we open-source the SDKs. We're a GraphQL API, but we release native SDKs on top of that. So if we screw something up, somebody can, like, fork it, fix it, and build their own or whatever.

Michael Berk [00:44:03]:
The other is the UI layer. UI components that sit on our API, we're gonna open-source all of those. So that's what we're looking at now, like, kind of a chatbot in a box, or an uploader tool, or different things like that. Those reusable components that just need our API key are all gonna be open source. So we're definitely supportive, but there's just no way we could open-source our whole product, because you would still need, like, these 3 managed services that exist somewhere. But on the other hand, the plug-in model of, hey, can you build a plug-in to us that is just open source, is something we're looking at. We do support webhook kinda hooks in parts of the workflow, and we're looking at whether we can just have a little SDK that says, hey.

Michael Berk [00:44:52]:
I mean, maybe it simplifies the authentication or it's something you can just fill in some code inside of. That kind of stuff would all be open source too.
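Webhook-style hooks at stages of a workflow can be sketched with an in-process callback registry. This is a simplifying assumption: a hosted service would typically send each event as an HTTP request to a customer-supplied URL. The `Workflow` class and its stage names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Hook = Callable[[dict], None]


class Workflow:
    """Hypothetical workflow that fires registered hooks after each stage."""

    STAGES = ("ingested", "extracted", "enriched")

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Hook]] = defaultdict(list)

    def on(self, stage: str, hook: Hook) -> None:
        # Register a callback (the "plug-in") for one stage of the workflow.
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self._hooks[stage].append(hook)

    def _emit(self, stage: str, event: dict) -> None:
        # A real service would POST `event` to a webhook URL here instead.
        for hook in self._hooks[stage]:
            hook(event)

    def run(self, content_id: str) -> None:
        for stage in self.STAGES:
            # ... the actual stage work would happen here ...
            self._emit(stage, {"stage": stage, "content_id": content_id})
```

An open-source plug-in SDK of the kind described would then mostly be a thin, typed wrapper around registering and receiving these events.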

Michael Berk [00:45:02]:
Cool. I have one final question slash topic. What is the Graphlit value prop, and where do you expect it to go in the next 2 to 5 years? Like, what industry trends are you betting the company on?

Michael Berk [00:45:19]:
Yeah. I mean, I think the initial one is really kind of the conversion from DIY, as we say. The last year was pulling together LangChain, LlamaIndex, Pinecone, like, all that. You basically have to pick all the different ingredients to make the recipe and then figure out how to deploy it, how to tune it, and all that kind of stuff. So our first value prop is we're saving you developer time, and we're saving you ongoing DevOps time too, as, hey, it's just an API. And we're very cost-effective. Even if you think of a couple hundred dollars a month or a thousand a month, that's still a tenth of the cost of a dev to build it and maintain it. So that's our number one thing, and then we can have efficiency at scale.

Michael Berk [00:46:07]:
So as new models come out, as we can get cheaper resources, we're gonna pass that along to the customers. So it's really a developer productivity and efficiency play at first. And then I think the long term is really, okay, now what can we do with that API, with higher-level constructs like agent models or more integrated workflows with other products, and the automation side of things. That's really what we're looking at. We've kinda released ingestion, retrieval, RAG with conversations, as well as what we call content publishing, which kinda uses RAG to generate new content. And then automating that with some sort of agent framework on top is, I see, kind of the next step. And we'll see where it goes from there.

Michael Berk [00:47:02]:
Crystal clear. Yeah. It's interesting the emphasis on agents where we have these sort of discrete steps that if you chain them together, they create something more powerful. It's like a complexity theory based design. It's super cool.

Michael Berk [00:47:16]:
Well, and it's interesting because that part's not new. That's why I think people are looking at it as, like, hey. And it's not LLM-centric either. We've been doing actor-model stuff for years. And that's kind of why we're taking a slow roll on that, to see where it shakes out. Because I don't necessarily think you need an LLM-specific agent model. You just need something like an actor, like a durable actor model, and there are so many different things out there. And you could build a lot of what I see in these demos with, I don't know.

Michael Berk [00:47:53]:
We use, like, Azure Durable Functions, or, I don't know, Akka, or different things in different platforms. If you just have a good API for retrieval and RAG, you could probably build most of what everybody's doing. And so that's kind of the way we look at it. I don't know if we're gonna build anything of our own on that or if we'll just do integration examples. And so we're kinda taking a wait-and-see approach for the rest of the year to see what people are really looking for.
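To make the actor-model point concrete, here is a minimal, non-durable actor sketch in Python built on `threading` and `queue`. Each actor processes messages from its own mailbox one at a time, and actors can be chained, here a toy "retriever" feeding an "answerer". Production frameworks such as Akka add supervision, distribution, and persistence, none of which this sketch attempts; the `Actor` class is hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Optional, Tuple


class Actor:
    """Minimal actor: one thread handles messages from its mailbox in order.

    Not durable: messages live in memory and are lost if the process exits.
    """

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        self._mailbox: "queue.Queue[Optional[Tuple[Any, queue.Queue]]]" = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while True:
            item = self._mailbox.get()
            if item is None:  # sentinel: stop processing
                break
            message, reply = item
            try:
                reply.put((True, self._handler(message)))
            except Exception as exc:  # report failures back to the caller
                reply.put((False, exc))

    def ask(self, message: Any, timeout: float = 5.0) -> Any:
        """Send a message and block until the handler's reply arrives."""
        reply: "queue.Queue[Tuple[bool, Any]]" = queue.Queue(maxsize=1)
        self._mailbox.put((message, reply))
        ok, value = reply.get(timeout=timeout)
        if not ok:
            raise value
        return value

    def stop(self) -> None:
        self._mailbox.put(None)
        self._thread.join()


if __name__ == "__main__":
    retriever = Actor(lambda q: f"context for {q}")
    answerer = Actor(lambda q: f"answer using ({retriever.ask(q)})")
    print(answerer.ask("pricing"))  # answer using (context for pricing)
    answerer.stop()
    retriever.stop()
```

The "agent" here is just message passing between two actors, which is the speaker's point: nothing in the pattern is specific to LLMs.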

Kirk [00:48:21]:
Yes. That's exactly the feedback that we got many months ago when we got a couple of requests from customers. They're like, yeah, LangChain has agents. We wanna start using that. And of course, when we hear that, we go and do a mini hackathon and play around, and I built a bunch of them. I'm like, yeah. This is cool, I guess. But, I mean, I could just write a function that does this.

Kirk [00:48:50]:
Right.

Michael Berk [00:48:50]:
Yeah. And

Kirk [00:48:52]:
Then I don't have to do this in LangChain, where there are, you know, some stability questions about, like, the frequency of releases and breaking APIs. I was like, do people really want this? So we talked to some of our customers that are heavily in the, hey, we're going to production with stuff. We started asking, what's the stack that you're using? And we had one that was like, yeah, the whole back end's in Scala using the Akka framework.

Michael Berk [00:49:22]:
Mhmm. You

Kirk [00:49:22]:
know, actors are actors, you know, doing this exact thing. So we can interface directly with the REST API to ChatGPT, and Mhmm. That's our interface layer for that. And then that goes and farms out to all these other systems that we have. We don't need to use an open-source package for that. We already have all of that. It's a core language feature, basically.

Michael Berk [00:49:46]:
Yeah.

Kirk [00:49:47]:
But when people start interfacing with us, and I'm sure they're interfacing with your company as well, it's like, yeah, we did the first project and built this all out. We have all this code. And it was so successful at our company that we now have 3 people from marketing, 2 people from sales, and 5 people from product that all want projects built that require completely different code bases. We don't have the staff. We can't hire 300 developers to come in and build this. Databricks.

Kirk [00:50:23]:
Can you help us out? So, like, focusing on, okay, can we create a UI that, yeah, does this? And there are, you know, companies such as yours that are focusing entirely on this problem. I really believe that's the future of LLM usage.

Michael Berk [00:50:38]:
Yes.

Kirk [00:50:38]:
You know, chat's useful, you know. It's very useful, you know, as a productivity tool. But for for actual products that people are wanting to build, you need all of those interactions. You need a more complex space, and individual companies are not gonna have the resources available to build all these projects. You need a service.

Michael Berk [00:51:00]:
Yeah. And I always say that chat is just one consumption method. It's kind of the first one; ChatGPT kinda lit this bulb for everybody. And it's funny because we had been around for, I guess, almost 2 years building this unstructured data platform. It used NLP and all this stuff. And then the ChatGPT thing happened, and people were like, oh, unstructured data. Like, now we get what you've been saying for the last few years.

Kirk [00:51:23]:
Yeah.

Michael Berk [00:51:24]:
And probably for you guys too, it just took a UI and a consumption method for it to click. And then, literally this week, there was an industry report all about unstructured data pipelines that looked like our pitch deck from 3 years ago. All these pieces are fitting together. People are like, oh, now we can get access to this data. But not everybody doing this is gonna be writing it in LangChain, or even in Python for that matter. We had somebody be like, how can I do this in Bubble? And I'm like, cool. Okay.

Michael Berk [00:51:59]:
You just want to use our RAG API to build it in a Bubble app. Awesome. That's a great use case for us. And so that's really where we're heading: make it a really clean, nice, easy-to-use API with good developer experience. We're adding, like, observability into our developer portal. And, I mean, just copying what the people who have great developer experience do, taking those learnings, and being, like, okay, now we have an awesome logs page that we're building.

Michael Berk [00:52:26]:
So we don't need to reinvent the wheel, but just give people what we know works in some of these areas. And then we innovate on the back end. That's where we really focus. So

Michael Berk [00:52:38]:
Why did you guys invest in unstructured pipelines pre-ChatGPT?

Michael Berk [00:52:43]:
So we were doing it as data cataloging. We were leaning into geospatial data tied to unstructured data. So, like, construction companies, ports, railways that would say they have, like, 10 years of data collected. It's imagery. It's audio. It's documents. And so we were essentially building them a search interface that had a visual view, a map view, and a time histogram. And that was the product we started the company building.

Michael Berk [00:53:14]:
And then the platform was just something we were using for ourselves. And it's funny because that UI didn't get the fire that I expected it to, because people at a lot of these companies were still trying to figure out SharePoint. It would demo great, and they were like, oh, this is awesome. But then they were like, wait, how are we actually gonna put this into practice when we have to sell it to our IT group and do all that? And then we started having people ask, wait, can we get access to the API that runs that, so we can integrate its capabilities into our own products? We got a couple of those, and we started thinking, wow. Maybe that's actually the path.

Michael Berk [00:53:53]:
And then the ChatGPT thing happened, and I was like, oh, wow. We're long-term memory for these models. We've already built the funnel. And that's when we pivoted to be platform-first, about 18 months ago.

Michael Berk [00:54:07]:
Super cool.

Michael Berk [00:54:09]:
And I would love to come back around and take our original app that we built and release it, maybe as an open-source app on top of our API. I'd love to be able to do that at some point. We might do that. I don't wanna compete with the people that are building apps on us, but maybe that's just a great full-featured example of what you could do on us. So I think it'll all circle back around.

Michael Berk [00:54:33]:
I have a use case for that that I would implement right this second, like, as soon as the podcast ends. Cool. Alright. Well, we're coming up on time. I will quickly summarize some interesting points that I heard. First, queues and event-driven architectures are great for scaling. When to hire versus when to learn? Well, there are a few axes. One is project duration.

Michael Berk [00:54:59]:
Another is project importance, or sort of centrality to the infrastructure. And third is employee supply in the market. If people are really expensive, you might wanna just spend the weekend learning it. And then a few miscellaneous tips. Thinking about architecture and visually drawing out components is really helpful when designing systems that will stay scalable into the future. Create internal knowledge repos, like Slack, but don't document everything, because it can quickly become out of date. And when you're developing public APIs, it's important to have that stable public API and then at least one private layer underneath where things can change. You can have 1, 7, or a million private layers; it doesn't really matter, but having a stable public-facing API is super important. And then eat your own dog food.

Michael Berk [00:55:43]:
So, Kirk, if people wanna learn more about you or your work, where should they go?

Michael Berk [00:55:48]:
Yeah. Definitely Graphlit. Graphlit.com is the company, and we're on Twitter. I'm on Twitter, just at Kirk Marple. LinkedIn is good. And for anybody who wants to play around with it, we're free to sign up and free to use, up to a good, healthy amount of data that you wanna try out.

Michael Berk [00:56:04]:
So happy to have people give it a try.

Michael Berk [00:56:10]:
Alright. Well, until next time, it's been Michael Berk and my cohost.

Kirk [00:56:14]:
Ben Wilson.

Michael Berk [00:56:15]:
Have a good day, everyone.

Kirk [00:56:16]:
We'll catch you next time.
