The Role of Open Source in Modern Development Practices - ML 170

In this episode, Michael and Ben dive deep into the fascinating intersection of open-source development and machine learning. They are joined by distinguished guest Görkem Ercan, CTO and seasoned engineer at Jozu.

Hosted by:

Ben Wilson

Michael Berk

Special Guests:

Görkem Ercan


Show Notes

Görkem shares his illustrious career journey from Nokia to Red Hat, his contributions to the Eclipse Foundation, and his current focus on MLOps. They explore his passion for open-source projects, the cultural and communication impacts on software design, and the unique challenges posed by integrating open-source frameworks with proprietary systems. Ben provides critical insights on the complexities of managing scalable backend services and the hurdles in translating SaaS offerings to open-source platforms.

Tune in to learn about the innovative practices at Jozu, the role of open communication in team success, and the nuanced debate on maintaining separate proprietary and open-source codebases. This episode is packed with valuable lessons for developers, tech leaders, and anyone interested in the future of machine learning and open-source development.

Socials

LinkedIn: Görkem Ercan

Transcript

Michael Berk [00:00:05]:
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I do data engineering and machine learning at Databricks. I'm joined by my beautiful cohost,

Ben Wilson [00:00:14]:
Ben Wilson. I resolve merge conflicts before release at Databricks.

Michael Berk [00:00:21]:
Today, we are speaking with Görkem. He started his career at Nokia, where he quickly became a principal engineer. He then moved to Red Hat and became a distinguished engineer. He's basically held every prefix there is for "engineer," so just check out his LinkedIn. It's pretty cool. He's played a large role at the Eclipse Foundation, contributing to their web, mobile, and Java development tools. And just to add another open source example, he built the Java and YAML extensions for VS Code.

Michael Berk [00:00:49]:
Currently, he's working as CTO and distinguished engineer at Jozu, an MLOps framework focused on bridging the gap between notebook-based ML and production, which is often very hard to bridge. So, Görkem, you've contributed to many amazing OSS projects. Do you have a favorite?

Görkem Ercan [00:01:11]:
That's an interesting question. I don't think I have a favorite, in the sense that all the projects I have contributed to have been, to a degree, developer tools. But if I have to choose the one I had the most fun with, it was a now-archived open source project we did back in the Nokia days called eSWT. It was a mobile version of Eclipse's SWT UI toolkit. I think that was the most fun I had because it was cross-platform. If anyone remembers the Symbian operating system for Nokia phones, we were building on Symbian, and then we were building on mobile Linux as well. So that was really fun and very challenging. I learned a lot while building it, but it turned out to be nothing at the end of the day.

Görkem Ercan [00:02:30]:
So I guess the one I had the most fun with is the one that was the least successful, at the end of the day.

Michael Berk [00:02:42]:
Interesting. Do you think there's any causal relationship there?

Görkem Ercan [00:02:47]:
No, not really. Sometimes the technology can be solid, but the market moves away. Around the time we did eSWT, the iPhone came out, and then Android came out, so the market turned into a completely different place. I don't think that has anything to do with the project or the technology itself. It's just that if you don't have good financing supporting your project, open source or even closed source, it's not going to go very far.

Michael Berk [00:03:32]:
Right. And why have you spent so much time in open source? What has attracted you to that style of software development?

Görkem Ercan [00:03:39]:
Oh, that started when I was at university, essentially. One of the first things I noticed going through university was that the best way for me to get better at what I was doing was to learn from open source projects. So I started reading a lot of code at the time. You're two or three years into university; you know some things, but you don't know that much. And I was reading code for, I don't know, Apache Tomcat. It wasn't even called Tomcat at the time, and so on and so forth. I had grand ambitions of being able to contribute back to those projects, but at that time I wasn't able to.

Görkem Ercan [00:04:36]:
But I learned a lot just by reading the code. It introduced me to the culture of open source, and it improved my coding skills a lot at the time. Just trying to fix a bug in Tomcat was a lot of fun. You learn a lot. In the end, you couldn't always succeed at what you were trying to do, but the challenge of trying to fix a bug in an open source project was enlightening, essentially. So I kind of caught the bug at that point, because I wanted to learn and be better at coding. And it continued on from then.

Görkem Ercan [00:05:28]:
Not all my day jobs had an open source component to them. Even when they didn't, I continued doing open source at night, because once you get into that culture, it just pulls you in and you like it. So right now, even at Jozu, where we are a very small startup, neither I nor any of my cofounders can think of a way of doing things that is not the open source way. Our first reaction to anything we see is: can we do this the open source way?

Ben Wilson [00:06:14]:
Do you think it changes proprietary development practices when you have people who have made open source contributions? From your perspective, does it influence the designs you come up with? When you're building something in open source, you're like, we have to make this work for a lot of use cases, including ones we might not be thinking of right now. Whereas in proprietary software, you can just lock down functionality and hard-code stuff. Have you noticed that, and do you think it has made the software you work on in a proprietary environment better?

Görkem Ercan [00:06:51]:
Yes, it does. It's a night-and-day kind of change. In my team, I have people from Red Hat who came with that open source background, and we also have people who came from more of a proprietary software background. It's a night-and-day difference: as you said, when you're doing open source, you're forced to think about more general problems, more integration problems, and extensibility problems as well. With proprietary software, you can limit yourself to whatever the use case and requirement are and solve only that, without thinking about the day-2 problem or the day-10 problem of someone trying to do something different with the software. Right? So yes, there is definitely that. And the communication is also very different.

Görkem Ercan [00:08:04]:
When you are doing open source, the idea that you have to communicate what you are doing asynchronously to a bunch of people you may or may not ever see is built into your thinking. Even in a small company, that helps a lot, because not everyone is going to be able to listen to you as you explain your code. So that asynchronous documentation of what you are doing, what your vision is for the code, where you are going with it, how to build it, and how to test it: that habit, which comes with open source, is actually helpful with proprietary software as well.

Ben Wilson [00:08:58]:
Yeah, I couldn't agree more, actually. My team manages open source packages, and it's fundamentally changed the way I approach building things, for exactly the reason you mentioned. If you want to work with a bunch of people in the open source community, you need to be very clear about what the direction is and what the functionality needs to be. You also have to be careful not to go too far on the abstraction layer. You don't want to get into the "well, it'd be nice to have this functionality" territory. Nobody's ever going to use that, so let's not build it.

Ben Wilson [00:09:32]:
But we do need to make sure that what we're building is developed in such a way that release date plus 10, or release date plus 12 months, is not going to be a nightmare to integrate with later on.

Görkem Ercan [00:09:50]:
Yeah. Just to give an example: back in the day when we were doing the Eclipse IDE, one of the rules about APIs was that you could request a new API from Eclipse to integrate with, but there had to be two different clients for it. Meaning, if you are Databricks and you're requesting that API, then someone other than Databricks, or someone other than Michael, also had to request the same API and say, "Yes, we need that functionality so that we can integrate this and that in such and such ways." Then Eclipse would come up with an API for it. Before that, they would say, "No.

Görkem Ercan [00:10:40]:
There isn't enough demand for this functionality at the moment." Think of a project like Eclipse, where everyone is trying to integrate with it. If you grow your API set very quickly, without proving that each API is actually needed and designed to be used as intended, it quickly becomes a bloated garbage heap of APIs, essentially.

Michael Berk [00:11:11]:
Yep. How do you prioritize what not to build, specifically at Jozu?

Görkem Ercan [00:11:20]:
Our first assumption is that we don't need to build it. If something comes to us as a problem we need to solve with software, our first assumption is that we probably don't need to build it: there is probably a solution out there that we can contribute to, use, or integrate. That's our first approach. If we can find an open source alternative, a standard alternative, we try to solve the problem that way. If we can't, that's when we start to build our own solution. And at some point during that build, we decide whether it makes sense to open source it or not.

Michael Berk [00:12:26]:
Got it. How do you leverage customer demand? For instance, with Eclipse, people could actually submit a request for an API.

Görkem Ercan [00:12:34]:
Yeah.

Michael Berk [00:12:34]:
Do you guys think that's a good model? Do you do your own research? Do you have product managers who deep-dive with customers? How do you find out what is actually a useful feature?

Görkem Ercan [00:12:45]:
So yeah. At this stage, we have design partners at Jozu whom we talk with frequently and get feedback from on how they use the open source project, KitOps, or the products. Then we feed that direct feedback into the projects. We also have people who have been making requests on the open source project, because that's the other thing about open source projects: you have a community you can more or less directly communicate with. So we get feedback from there as well. I think direct feedback is very important, and the fact that we can get it directly into the software team has been valuable to us so far.

Görkem Ercan [00:13:59]:
But as things grow, I think it's going to be a bit less effective. You can do this with a limited number of design partners, but if you have a lot of users, you can't get feedback from a hundred different companies at the same time. Which is when you get a product manager, I suppose.

Ben Wilson [00:14:29]:
Or a whole team of them when you have thousands of users.

Görkem Ercan [00:14:32]:
Of them. Exactly.

Ben Wilson [00:14:33]:
Thousands of users. Yeah. Those are good problems to have though.

Görkem Ercan [00:14:39]:
I think so, yeah. On the open source side, the feedback is always direct. There was a term that used to be used, and to be honest I haven't heard it lately: there was always a benevolent dictator on an open source project who decides on the direction. I guess if the project is at that size and at that stage, you kind of need that as well. I'm not saying it needs to be one person, but the project needs to have a vision.

Michael Berk [00:15:22]:
Yeah. Yeah.

Ben Wilson [00:15:23]:
At some point... I'm curious what your thoughts are on that, because I was going to ask you about it: the difference between a benevolent dictator and a benevolent senate with quorum voting. Which one do you think is more effective for maintaining quality control and preventing the uncontrolled expansion of functionality and the sheer number of APIs?

Görkem Ercan [00:15:57]:
Unfortunately, the dictator, the single person, is more effective: someone who has the last say. I don't know if you've heard people talk about "founder mode." One of the things about founder mode is that, because founders created the company, they won't have any problems making changes, or rejecting changes for that matter. I think it's a similar situation with an open source project. If you have people who were the creators of a project, they will have fewer problems accepting or rejecting changes to it. Unfortunately, that also comes with a different set of problems. But to address the issues you're raising: if you want someone who can effectively filter the noise about new features and APIs, then I think the benevolent dictator works better, as long as that person is the right person for the job.

Görkem Ercan [00:17:12]:
I have seen very successful uses of the senate as well, as you call it: a committee, a group, a project lead committee. It takes a lot of discussion, but in the end, with the right group of people, it can reach very, very good results as well. So it kind of depends on how fast you are willing to move. With the senate approach, there's always a discussion that needs to happen, especially for large enough items. But with the benevolent dictator, it's one email and that's it.

Ben Wilson [00:18:08]:
But for any sufficiently complex project, and you've worked on some pretty big ones, there comes a point where a benevolent dictator is not going to be able to maintain context on everything. So what do you find is the most effective delegation tool, or rule set, for managing super complex, high-demand open source projects?

Görkem Ercan [00:18:39]:
Modularization. Right? If you have functionality that is large enough, you can divide it into modules and then establish the rules between modules. Then you are able to maintain each module as its own thing. Let's take the Eclipse IDE as an example. The Eclipse IDE has, let's say, an explorer view, which is one module, and an editor view, which is another module. The needs of the explorer view and the editor view are very different, and they can be managed by different people, different committees, and different rules. But the way they interact with each other can be a standard.

Görkem Ercan [00:19:41]:
You can add another module, and another module, and the rules of interaction are very well established. One of the things I like in such large systems is when you can treat everything as a plug-in. You have a very small kernel, a core, and everything else just plugs into it. You build the core, the file system is a plug-in, and you can unplug the file system and come up with another one if you want. You'll probably never do that, but you can think of it that way. Then your editor is a plug-in, and so on and so forth.

Görkem Ercan [00:20:31]:
And all plug-ins are equal, so the ways of extending and interacting with the system are the same for everyone, and nobody has privileges. I think if you can come up with a system that works that way, then you are able to delegate the management of these modules to wherever. They don't even need to be under the same open source project; they could live somewhere else and come in to be part of the system. At that point, the system kind of dictates how the management happens. It's about how you design a large system, right? Whether on the software side or the governance side of the system, there are always going to be problems you have to face and overcome.

Görkem Ercan [00:21:33]:
So a modular design that is equally modular for everyone has always had advantages, in my experience.
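
The "small kernel, everything is a plug-in, all plug-ins are equal" design described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the `Kernel` and `Plugin` names and both example plug-ins are invented for this sketch and are not taken from Eclipse or any real project. The point it shows is that the kernel exposes a single registration path, and every plug-in, built-in or third-party, goes through it identically.

```python
from typing import Protocol


class Plugin(Protocol):
    """Hypothetical contract that every plug-in must satisfy, built-in or third-party."""

    name: str

    def activate(self, kernel: "Kernel") -> None: ...


class Kernel:
    """A deliberately tiny core: it only knows how to register and look up plug-ins."""

    def __init__(self) -> None:
        self._plugins: dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        # One registration path for everyone: no plug-in gets special privileges.
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in {plugin.name!r} is already registered")
        self._plugins[plugin.name] = plugin
        plugin.activate(self)

    def unregister(self, name: str) -> Plugin:
        # "Unplugging" a component, e.g. to swap in a different file system.
        return self._plugins.pop(name)

    def get(self, name: str) -> Plugin:
        return self._plugins[name]


class LocalFileSystem:
    name = "filesystem"

    def activate(self, kernel: Kernel) -> None:
        pass  # a real plug-in would hook itself into the kernel here


class TextEditor:
    name = "editor"

    def activate(self, kernel: Kernel) -> None:
        # Plug-ins reach each other only through the kernel's lookup.
        kernel.get("filesystem")


kernel = Kernel()
kernel.register(LocalFileSystem())
kernel.register(TextEditor())
```

Because `TextEditor` registers through exactly the same `register` call as `LocalFileSystem`, either one could be shipped from a separate repository, which is the property that makes it possible to delegate ownership of individual modules.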

Michael Berk [00:21:46]:
Right. Alright, Ben, zinger of a question. Do you mind giving a concise overview of MLflow's architecture and design principles?

Ben Wilson [00:21:57]:
Concise? I don't know about that. But from its very early days, pre-1.0, the desire was to create a solution for more than just the immediate need of the initial design, expressed in concrete, non-abstract terms. That is: okay, we know we need to perform functions x, y, and z, and within the functionality of x, we need x prime as well. So we need the ability to store data somewhere in order to make the functionality work. Say you do a single implementation of that: we need to log runs and the data associated with runs, so where do we put it? Well, in the initial prototype, a file store. Right? Just local storage. Now, if you built that in at the beginning as one single implementation, what happens when you need to store this in an RDBMS?

Ben Wilson [00:23:12]:
Do I now have to go and refactor that, or create a whole new implementation that allows for it? Or, from the beginning, do I create an abstraction that says: this service interfaces with this abstraction; I don't care where the data goes; I just know I have a contract of functionality that needs to be fulfilled, and I can externalize the processing of that into a number of, effectively, plug-ins? How many tracking stores do we have now? There are a lot. Each of them just has to adhere to the contract established for the functionality it needs, and we handle that through registries. Those aren't really exposed to the user per se, but from a code base perspective, they let us externalize all of that. The same approach to storing data, which is the core tenet of MLflow (what it does is store data for you), has been the pattern for almost everything that's been built: every major service, at its core, is a factory.

Ben Wilson [00:24:19]:
It's like: I have this registry, and I define what type of configuration I have, or what I want to use when I start this service, and then you can plug in anything you want. You can write your own back-end service abstraction or service layer, install it with MLflow, and all of a sudden you can write to a database that we don't support. In fact, there are tons of plug-ins that people have written like that, and the same goes for model registries. With the artifact store, you can store artifacts on, you name it, a cloud or on-prem solution, and people have made plug-ins for some fairly archaic storage platforms. A couple of years ago, somebody asked, "I need to support SFTP. Can we get that?" And we said, "No, but you can write a plug-in for that, and here's how to do it."

Ben Wilson [00:25:13]:
And they went off and built it, and then they merged it in later on. So yeah, that architecture lets you expand the functionality, maintainability, and extensibility of your project, and it also encourages more people to contribute, because they can see: oh, I can get this working for my use case, and it's simple to externalize it in a way that makes it more testable and doesn't pollute the core functionality of the tool.
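
The pattern Ben describes, a service that talks only to an abstract store contract while the concrete store is chosen at startup through a registry, looks roughly like the sketch below. It is a hypothetical simplification for illustration only: `TrackingStore`, `InMemoryStore`, and `StoreRegistry` are invented names, not MLflow's actual classes, and MLflow's real plug-in mechanism differs in its details. Stores are selected by the scheme of a URI such as `memory://local`.

```python
from abc import ABC, abstractmethod
from typing import Callable
from urllib.parse import urlparse


class TrackingStore(ABC):
    """Hypothetical contract: every store must be able to log and read run metrics."""

    @abstractmethod
    def log_metric(self, run_id: str, key: str, value: float) -> None: ...

    @abstractmethod
    def get_metrics(self, run_id: str) -> dict[str, float]: ...


class InMemoryStore(TrackingStore):
    """Stand-in for a local file store in this sketch; keeps data in a dict."""

    def __init__(self, uri: str) -> None:
        self._runs: dict[str, dict[str, float]] = {}

    def log_metric(self, run_id: str, key: str, value: float) -> None:
        self._runs.setdefault(run_id, {})[key] = value

    def get_metrics(self, run_id: str) -> dict[str, float]:
        return dict(self._runs.get(run_id, {}))


class StoreRegistry:
    """Maps a URI scheme (e.g. 'memory', 'sqlite') to a factory that builds a store."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[str], TrackingStore]] = {}

    def register(self, scheme: str, factory: Callable[[str], TrackingStore]) -> None:
        self._factories[scheme] = factory

    def get_store(self, uri: str) -> TrackingStore:
        scheme = urlparse(uri).scheme
        try:
            factory = self._factories[scheme]
        except KeyError:
            raise ValueError(f"no tracking store registered for scheme {scheme!r}") from None
        return factory(uri)


registry = StoreRegistry()
registry.register("memory", InMemoryStore)

# The calling service only sees the TrackingStore contract, never the concrete class.
store = registry.get_store("memory://local")
store.log_metric("run-1", "accuracy", 0.93)
```

Under these assumptions, a third-party package adding support for, say, an unsupported database would only need to subclass `TrackingStore` and call `registry.register` with its own scheme; the service code that calls `get_store` would not change.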

Michael Berk [00:25:47]:
Got it. What are the downsides? I'm sorry, it's pretty good. But what are the downsides of this design, or is it just perfect?

Ben Wilson [00:25:58]:
Nothing's perfect. What's really annoying is when you add a new feature to the abstraction layer and you're supporting 37 different implementations of that abstraction: you now have to add it in 37 locations. That sucks. But I think it's a positive trade-off because of the expanded usability of the framework or tool you're building. If you're supporting that many open source users across those 37 implementations, taking the time to do all of those updates, and to write all the tests that validate the functionality for each of them, benefits the community. And sometimes the original contributors to a particular plug-in will volunteer: "Oh, I'll do that for the one I built."

Ben Wilson [00:26:54]:
And we're like, "Awesome. Great. Thanks. That saves us a couple of days." I think it fosters a better community.
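
The "37 locations" cost shows up directly in Python when the contract is an abstract base class: once a new abstract method is added, every implementation that has not been updated fails when it is instantiated. A hypothetical sketch (the class and method names are invented for illustration and are not MLflow's):

```python
from abc import ABC, abstractmethod


class ArtifactStore(ABC):
    """Hypothetical abstraction shared by many plug-in implementations."""

    @abstractmethod
    def upload(self, path: str) -> str: ...

    # Newly added to the contract: every existing implementation must now provide it.
    @abstractmethod
    def delete(self, path: str) -> None: ...


class UpdatedStore(ArtifactStore):
    """An implementation whose maintainer has already added the new method."""

    def upload(self, path: str) -> str:
        return f"uploaded:{path}"

    def delete(self, path: str) -> None:
        pass


class LegacyStore(ArtifactStore):
    """An implementation written before `delete` existed and not yet updated."""

    def upload(self, path: str) -> str:
        return f"uploaded:{path}"


def can_instantiate(cls: type) -> bool:
    """Return True if the class implements every abstract method of its contract."""
    try:
        cls()
    except TypeError:
        # ABCs refuse to instantiate classes with unimplemented abstract methods.
        return False
    return True
```

The alternative is to give the new method a default body in the base class (for example, one that raises `NotImplementedError`). That keeps old plug-ins loading but moves the failure to call time; either way, each implementation eventually needs real support and tests, which is the trade-off described above.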

Michael Berk [00:27:03]:
One more question on this. When you're adding a new core thing, like a tracking server (I know that doesn't happen often), or more generally a new abstraction layer, do you think a benevolent dictator or a benevolent senate is the better way to design it?

Ben Wilson [00:27:22]:
Görkem, do you want to take that one first?

Görkem Ercan [00:27:31]:
So we're talking about extending the core with a new abstraction that the core is aware of. Right? I would say a senate. And by senate, I don't mean voting on whether we accept it or not. I do believe that if you're coming up with a new abstraction, there should be more than one user for it. You cannot really design an extensible system and be successful if you are designing it for just one person or one tool. That's not really an abstraction; that's essentially custom-built.

Görkem Ercan [00:28:17]:
So I'd prefer that a committee designs the solution, just because it makes the design better.

Ben Wilson [00:28:29]:
Yeah. On our side of the house, Michael, we do that every once in a while. It's not frequent, because it's a big road to go down. But for the successful things that have persisted in MLflow over the years, the new levels of abstraction we've built for major core functionality, we bring in people for internal review. The maintainers will design something, but we need a lot of differing opinions from a lot of different people for that architectural review. Many of those people haven't committed a single line of code to MLflow, but they have 25 years of experience in software engineering, and they come from a different side of the house that deals with services and code bases we don't have experience with. So we get the benefit of their perspective on what we're trying to do, and they ask a lot of pointed questions.

Ben Wilson [00:29:39]:
Those questions come from a place of ignorance about the project, but wisdom about the craft. They'll look at it and say, "Are you sure you need this, or could you maybe think about it this way?" Sometimes that feedback makes us realize we don't need an abstraction here, because we're never going to extend this, or it doesn't make sense to extend it, so let's just do the implementation the fast way. And every so often, we get the opposite: we want to build some functionality, and somebody says, "What about this? My team could use this later. In fact, we have a project next quarter that could use this. Could you just abstract it?" And we're like, sure.

Görkem Ercan [00:30:22]:
Okay.

Michael Berk [00:30:24]:
Got it.

Ben Wilson [00:30:25]:
So yeah, with the committee, if you expand it to a lot of people, you're always going to get better results, in my opinion.

Michael Berk [00:30:34]:
Görkem, go ahead.

Görkem Ercan [00:30:36]:
There's an adverse effect, too. With MLflow, for example, I'm sure there are people who want their modules to be part of MLflow because they think it's going to give their solution more exposure.

Ben Wilson [00:30:54]:
Oh, yeah.

Görkem Ercan [00:30:56]:
So they're like, "Yeah, let me just put that into MLflow as well." Any open source project or platform that is successful faces that "let's put this in as well" problem. Well, this is not that sort of platform, and I don't know if putting it here makes any sense, but people think that exposure is good for the project or product they're trying to build.

Ben Wilson [00:31:32]:
Yes. We have denied many feature requests like that. It's not so frequent these days, but a couple of years ago it was pretty common. Sometimes somebody would just file a PR out of the void, and you look at it and think, "Alright, there are, like, 10,000 lines of code here. Let me look through what this is." And then you're like, no.

Ben Wilson [00:31:55]:
We're not going to merge this. This is your company's proprietary product. I get what you're trying to do here, but no. Here's how to write a plug-in: you can put this in your code base, have your users install it, and then they get the functionality. Optically, this is not good for the open source project. We don't even do that ourselves, and we have our own version of MLflow for Databricks. It's completely separate: a separate implementation and a separate repository. It uses the same UI and the same client APIs, but we have that modular separation between the clients users are using and the back-end implementation.

Michael Berk [00:32:38]:
Görkem, I'm curious about your perspective. Is it a good idea to maintain a version in your proprietary stack alongside an open source version, or should you just have two separate products? Or is there a time and a place?

Görkem Ercan [00:32:51]:
Yeah. I don't... okay, this is not going well. I don't believe in having two versions of it. I think the open source project should be... I mean, there is the open core model. At the end of the day, the open source project needs to have some sort of economics tied to it. Right? With the open core model, for instance, you can have the core coming from the open source project.

Görkem Ercan [00:33:30]:
And if the system is modular enough, you can build your own version that includes proprietary modules tied to your product, and that's essentially fine. Or you can go with the Red Hat model, where it's just the open source project, but it comes from Red Hat and is supported by Red Hat. Right? So there is that model as well. It's not exactly, but very close to, what you would get from upstream: you're getting almost the same open source bits. So there are options like that. But I don't think diverging the code base between the product and the open source project is good for the open source project. At the end of the day, if the major contributor to the open source project is also the major contributor to the product, one of them is going to be neglected in one way or another.

Görkem Ercan [00:34:45]:
Right? So that's my preference. I have never worked in an environment where there was this distinction and you could give the same amount of attention to both the open source code base and the product code base.

Ben Wilson [00:35:15]:
Is there a way to do it if there are two separate teams with different missions?

Görkem Ercan [00:35:26]:
To be honest, I have never worked in a company that is that rich. It's like, let's build this solution two times. But even then, it becomes a question of whether you can align them so that the abstractions are that solid. At that point, you're basically building a kit, a test kit, that...

Ben Wilson [00:35:59]:
Approves it?

Görkem Ercan [00:36:00]:
...or verifies that not only the API contract is the same, but also that the behavior is the same. There are always behavioral differences: even when the APIs are the same, the behavior may be different. Right? So you're building a test kit for that as well, just to make sure that what you're building as part of the product and what you're building as part of the open source project match 99%. So it's tough, to be honest. That's a very tough decision. There may be reasons for doing it that I'm not seeing, and I have been on a project like that before, where we had our reasons. Ours was that it was a hardware-bound project, and that hardware was not available to the open source project.

Görkem Ercan [00:37:06]:
So we had to implement it on a software abstraction and then, internally, implement it on top of the hardware abstraction. But even in that case, we were like, yeah, this is open source, but it's two code bases.
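
The test kit Görkem describes, one that checks not just that two code bases share an API contract but that they behave the same, is often called a conformance or contract test suite. Below is a minimal, hypothetical sketch: the same behavioral checks run against an "open source" and a "product" implementation of an invented key-value contract, and any check whose outcome differs between the two is reported as divergence. All names here are illustrative assumptions.

```python
from typing import Callable, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical API contract shared by both code bases."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class OpenSourceStore:
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class ProductStore:
    """Same signatures, but a subtle behavioral difference: keys are case-insensitive."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key.lower()] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key.lower())


def check_roundtrip(store: KeyValueStore) -> bool:
    store.put("a", "1")
    return store.get("a") == "1"


def check_missing_key(store: KeyValueStore) -> bool:
    return store.get("missing") is None


def check_case_sensitive(store: KeyValueStore) -> bool:
    store.put("Key", "upper")
    return store.get("key") is None


CHECKS: list[Callable[[KeyValueStore], bool]] = [
    check_roundtrip,
    check_missing_key,
    check_case_sensitive,
]


def run_conformance(factory: Callable[[], KeyValueStore]) -> dict[str, bool]:
    """Run every behavioral check against a fresh instance of one implementation."""
    return {check.__name__: check(factory()) for check in CHECKS}


oss_results = run_conformance(OpenSourceStore)
product_results = run_conformance(ProductStore)
# Checks whose outcomes differ are exactly the behavioral drift between code bases.
divergent = sorted(k for k in oss_results if oss_results[k] != product_results[k])
```

Both classes satisfy the same signatures, so a purely API-level comparison would pass; only the behavioral checks expose that the product store treats `"Key"` and `"key"` as the same key.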

Ben Wilson [00:37:30]:
Yeah, can confirm. It is challenging to do that on our side.

Görkem Ercan [00:37:33]:
Yeah.

Ben Wilson [00:37:34]:
And ours is more about data volume; that's why we have to do it. If you're a user who's just downloading this package and installing it for your own single company, you might have 10 users, or 100, or 1,000. For our deployments, in order to make it economically viable to even run the service, we have to merge a bunch of things together in, like, a cloud region. At that point, a standard open source setup, a single RDBMS instance, is not going to support the number of events that are happening. And building a robust REST interface for that into the open source version is overkill, because nobody's ever going to need it, and people aren't going to want to download it, install it, and go through the complexity of installing all of those services. You can't get started quickly with that.

Ben Wilson [00:38:41]:
It's like, oh, do you have three weeks to set this up and set up all your cloud infrastructure to do it? So that's why there's a difference between the two. But I think it's really important, and I 100% agree with you that you can't neglect the open source version of it. And I'm not a huge fan of that open core model at all, where it's like, yeah, here's something you can build off of. If you look at most of those projects, they're not super successful. They might have a bunch of stars because people were excited at some point. But when you look at the downloads, the actual usage, and who's talking about them, nobody does anything with them, because you're getting this interface. You're basically downloading an API contract that you now have to build an entire implementation for.

Ben Wilson [00:39:26]:
It's much better if you're downloading something that just works and solves the problem. And the key is not neglecting that functionality and making sure that the open source version had better work, and that it had better be a rich experience for the targeted demographic of users. Just like, hey, I just wanna run this at my company. I don't need to buy a service to do this. This is free, and it helps me out in the process of building AI solutions.

Ben Wilson [00:39:58]:
Yeah. That's what our team is dedicated to: let's help those people out and make it so that they have something that's useful.

Görkem Ercan [00:40:08]:
And for instance, we actually went through a process like that recently: we just released Jozu Hub and opened it up to the public. And one of the things we weighed when we were building Jozu Hub was whether to open source the SaaS or not. But it's a SaaS service. If we open source it, it means that we need to say, hey, you know what? In order to just give this a trial on the open source project, you need to build these containers and those services, have them running together, coordinate them, apply this Terraform, and so on. I have worked on projects like that as well, and they never get contributors. They don't get contributor attention, because nobody wants to go through the pain of putting something up just so that they can start testing it. Right? So we decided not to open source Jozu Hub at this point, because we didn't believe there would be any community benefit to it. If someone can convince me that there are community benefits to it, we'll consider that. But at the moment, I don't see it.

Görkem Ercan [00:41:35]:
But on the KitOps side, where we came up with the Kit CLI and so on and so forth, today trying the Kit CLI is just git checkout, go run, and you're testing the latest code. So when things are that easy, you actually get community benefits. People can get into the project very quickly and see the benefits very quickly as well. That is one of those things: you don't see too many SaaS open source projects that are getting a lot of outside contributors beyond the core team that actually started the project.

Ben Wilson [00:42:21]:
Yeah. Exactly. I mean, that gave me, not so much a flashback, but I was just imagining what it would be like to open source the actual implementation that we have for the back end of MLflow, and how many people would need to be involved in setting that up. Because there are considerations for a managed service, where you have customers paying you to do this thing. You have to think about stuff like EU regulations. So, GDPR: somebody generates data that contains an identifying set of characteristics tied to them. Well, you need to be able to purge that data by law. Do we do that in the open source version? Well, that's another service that needs to run as part of your open source deployment.

Ben Wilson [00:43:17]:
So there's a VM that needs to be running this series of scripts that are doing this thing. And then your database layout: okay, for full scalability, you need to deploy Kubernetes, run all of these things on pods, have autoscaling storage, and have other considerations about what happens when the indexes need to be rebuilt. So now you're stacking all of these services on top of this open source package. I can't even imagine what a getting-started guide would look like for that. And then people are like, oh, just do it all in containers. I'm like, well, you can't run all this stuff in containers. Some of this is manual to set up for the first time. And, yeah, when you have that much complexity to get started,

Görkem Ercan [00:44:08]:
I think

Ben Wilson [00:44:08]:
it would just be amusing for somebody to do that kind of as a joke.

Görkem Ercan [00:44:11]:
Yeah. I mean, usually the first thing I have seen people hit is authentication and authorization. Right? We do have solutions like OIDC and so on and so forth, but at that point, everyone's authentication and authorization is different. Right. And you're starting from that. Then everything else you said, like GDPR, that's a big service, and data governance rules as well. And all the isolation that you have to put into place. So it just makes the SaaS service so complicated.

Görkem Ercan [00:45:01]:
Right? Trying to give someone a getting-started experience becomes an issue, essentially.

Ben Wilson [00:45:11]:
Yeah. And extend that out to open sourcing a lot of those back-end abstractions, which for SaaS you're not necessarily gonna do. It could be, well, we need to support lineage tracking for all the data that's generated. As a SaaS service, you're gonna build integration into your own lineage tracker. But if you said, well, we need to build a plugin approach for this internally, nobody's gonna approve that, because they'd ask, why would we need to do that? That would introduce so much more complication and code complexity, so let's not do that. But when you open source it, you would have to build that interface.

Ben Wilson [00:45:48]:
Yeah. And then you would have to document how to integrate with it. From a sheer documentation perspective alone, I think a full export of a SaaS service to open source is just untenable.
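Ben's argument is that a SaaS back end can hard-wire its own lineage tracker, while an open source release has to expose an extension point and document it. The Python below is a minimal, hypothetical sketch of such a plugin hook; `LineageTracker`, `set_lineage_tracker`, and `log_model` are invented names for illustration and are not MLflow APIs.

```python
from abc import ABC, abstractmethod


class LineageTracker(ABC):
    """Hypothetical extension point; a SaaS build could hard-wire one tracker."""

    @abstractmethod
    def record(self, source: str, target: str) -> None:
        """Record that `target` was derived from `source`."""


class NoOpTracker(LineageTracker):
    """Default for open source users who have no lineage system."""

    def record(self, source: str, target: str) -> None:
        pass


class InMemoryTracker(LineageTracker):
    """Example plugin a user might write against their own lineage tool."""

    def __init__(self) -> None:
        self.edges: list[tuple[str, str]] = []

    def record(self, source: str, target: str) -> None:
        self.edges.append((source, target))


_tracker: LineageTracker = NoOpTracker()


def set_lineage_tracker(tracker: LineageTracker) -> None:
    """Swap in a user-supplied tracker (the part that must be documented)."""
    global _tracker
    _tracker = tracker


def log_model(model_id: str, training_data_id: str) -> str:
    """Core code path: logs a model and reports lineage via the plugin."""
    _tracker.record(training_data_id, model_id)
    return model_id


tracker = InMemoryTracker()
set_lineage_tracker(tracker)
log_model("churn-model-v1", "train.parquet")
print(tracker.edges)  # [('train.parquet', 'churn-model-v1')]
```

Even this toy version adds an abstract base class, a default no-op implementation, and a registration function, each of which an open source project would have to document and keep stable.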

Görkem Ercan [00:46:05]:
Yeah. And also, a SaaS gets very, very complicated very easily. Right? As I said, once you put data sovereignty, or some other concern like that, into the picture, it becomes a completely different beast. And that concern isn't there for an open source project. If you're shipping a package on an open source project that actually solves the data sovereignty problem and the GDPR problem and all of those problems, the getting-started guide for that is not followable. No one can follow that. Right? It becomes an overly complex thing that nobody other than the core team is able to follow.

Ben Wilson [00:47:02]:
Exactly. And even then, there's probably not even one person on the core team that understands all of it.

Görkem Ercan [00:47:09]:
No. I don't know.

Ben Wilson [00:47:09]:
Like, there's no way. I don't even know how many teams we have dealing with our back end. Seven for just the actual code, and then for services, 12 different teams, each with 8 to 15 engineers. It becomes such a beast of complexity when you're talking about that.

Michael Berk [00:47:39]:
Görkem, I have a question for you. As CTO, you get to sort of play conductor. You get to start with the design, figure out how things should be built, and figure out how things should be decided. Have you enjoyed that process? I know you've done a lot of projects, but here you get to start from scratch.

Görkem Ercan [00:48:01]:
Yeah. I think that when you're senior enough as an engineer, you get to do that in other companies as well. But, yeah, I have always enjoyed that process of, let's start something new, or something additional on top of this. And if it is large enough, let's pull in enough teams or enough people. If it is not large enough, well, there are projects that I just started over a weekend that grew into something else. At some point, code is just there to show the vision. And I always joke about it: when I introduce this to engineers, I say, I wrote some code.

Görkem Ercan [00:48:57]:
It shows how it works. Now make it good. Because it's just code that I have; it's a different way of explaining your vision at that point. It's like, oh, this is how I see this happening. And then you explain that, see if the engineering team is able to internalize it, help them internalize it, and then they can take the project to the next level. And at some point, you either work on that project as any other engineer, or you coach the team toward the vision and where it needs to go, and then coordinate with the rest of the company, because no project is successful without enough marketing, enough product management, program management, and enough customer interaction.

Görkem Ercan [00:50:01]:
So,

Michael Berk [00:50:02]:
So do you still file PRs?

Görkem Ercan [00:50:05]:
Oh, yeah. How? Oh, yeah. More than ever.

Ben Wilson [00:50:13]:
Yep. So what's the size of engineering organization at which you'd no longer file PRs against company repositories?

Görkem Ercan [00:50:26]:
I can't imagine the size. I will probably continue to do that. I can't really imagine a size where I'd stop checking out code and doing something with it. Sometimes it's easy to explain a vision or a feature in words, but sometimes it's just easier to express it in code. Right? When someone actually sees something working, however badly written or easily broken it is, it gives them a better idea of where this needs to go. So I can't imagine myself giving up on that. It becomes a different challenge when my schedule is filled with things that don't allow me to spend enough time on a code base, but that's a different matter. And also, sometimes you actually annoy your engineers by sending PRs that are completely off from where the code base is. It's like, yeah.

Görkem Ercan [00:51:53]:
I guess you need to rewrite that. And that's the other thing: I like to work with engineers who will tell me if I am being ridiculous with that code, and I do work with those sorts of people now. So "Yeah, you wrote this, but I don't know what you were thinking" is a very common pull request review that I get. Like, you started the code base, or you were actively part of the code base six months ago, but code bases don't stay the same.

Ben Wilson [00:52:30]:
So in your history of working at companies, I'm sure you've worked for people in your current position who come from one of two sides of the house. There's the actual builder, which would be you, who's still actively involved in what engineering is actually building, and who wants to collaborate with the team and come up with new ideas and new implementations. And I'm sure you've worked for CTOs who come more from the executive side. They have an engineering degree from, like, 30 years ago, and they stopped coding 28 years ago. Which one do you feel like, or I should say, which type of engineer do you look for in hiring? What's their opinion of a better CTO? Do you seek out people who want to collaborate with top C-suite technical people, or people who are comfortable with the business side of executive leadership?

Görkem Ercan [00:53:39]:
Yeah. You mean for the engineering team that we are hiring, or which one I would prefer? I would say the engineering team that I'm hiring should be comfortable working with anyone, regardless of their title in the company. And they should be able to say what they need to say to that person, respectfully, obviously, because I wouldn't mind if someone comes to me and says, well, your code is just not good.

Michael Berk [00:54:29]:
Right? Just not good.

Görkem Ercan [00:54:31]:
Yeah. I wouldn't mind hearing that. And I'm not claiming to be up to date with every code base that we have, either. So that's just normal. What wouldn't be normal is if someone is holding back because it's coming from a C-level or VP-level person. I don't think that's right. And I encourage people to actually have different or opposing opinions and be able to express them. That's how we grow. Otherwise, we wouldn't be able to grow.

Görkem Ercan [00:55:17]:
Right? And for the people I'd like to work with at the C level, the VP level, and other executive levels, I would prefer people who encourage an open discussion as well. And everyone should understand that at the end of the day, there will be a discussion, and if you don't agree, you don't agree, but someone who has the responsibility makes the decision. So you will have to commit to that decision, but at least you will have had your opinions heard and given thought. I think that's the right way to do it.

Ben Wilson [00:56:06]:
So I think that you really enjoyed your time working for people who are like you. You know? And, personally, I also love that. When you have a CTO who knows how to code and is still coding, like, every day at least, they can have that conversation. They might not have all the context of what your immediate project is working on. You don't need that either. You just need that wisdom: somebody who's been there, done that many, many times, and has seen all sorts of stuff, who can come in, understand everything that's being said in that meeting room or discussion, and contribute, sometimes even with examples. Like, hey, we're not speaking the same language.

Ben Wilson [00:56:52]:
Give me two hours. I'll send you some code, and then we'll be on the same page. And when you get that response, you're like, man, I love this place. This place is awesome. Because you can have that back and forth with somebody who's technical. And the code you get from them might not meet the standards of what your project is right now. It doesn't need to. It's just that you're now speaking the same language the team is speaking, so everybody can understand very quickly what the idea is.

Ben Wilson [00:57:26]:
And from my own experience working at companies where the CTO is so checked out of the tech world, it doesn't matter if you're explaining it to the CEO, the COO, or the CTO. It's the same conversation you have to have with all three of them, because they're all equally ignorant of what your team is doing or how it operates. I find that very frustrating and stifling for an organization, when the person in that role is just not effectively qualified for it.

Görkem Ercan [00:58:01]:
Yeah. Because, you know, I sometimes joke about it: they're the kind of executives who are running a tech company this month and are gonna run an automotive company next month. They are tough to work with.

Michael Berk [00:58:25]:
Well, I'm glad that I'm low enough that I don't have to deal with any of this politics. I can just type on my computer, put on headphones, call it a day.

Ben Wilson [00:58:35]:
At some point in the future, Michael, you'll be at a level where you'll be in the room with the CTO discussing something. And if they come from a tech background, you'll notice the result and the efficiency of that meeting: in half an hour, you'll get everything discussed that needs to be discussed, and you'll have a vision forward that's probably gonna work out really well, or it's gonna fail really fast, and then you'll have a bunch of alternatives to explore. And you're gonna have mutual respect with that person. It's super powerful, versus having that same discussion with somebody who doesn't have that background: a four-hour meeting whose only result is confusion or frustration on both sides.

Michael Berk [00:59:24]:
No, yeah, I'm aware. I'm in a project right now where, let me make sure I'm abstracting the details enough, we have basically two leaders. One is a tech leader, and one is a "tech" leader, in quotes. And two hours of the actual tech leader's time has been more valuable than six weeks of the non-tech leader's time, because it's just: these are the seven things. This is what you should do. I will follow up on three of them.

Michael Berk [00:59:53]:
Done. So from an external perspective, I can definitely appreciate that. Well, anyway, I think we're about at time. Let me quickly summarize. With open source software, you have to think a lot more into the future than with proprietary code. When choosing what to build, the default should be not to build it, because we all have important lives where we want to not be working all the time, and we need to prioritize. A way to do that is getting direct feedback from customers, specifically design partners you can have open dialogues with. Both the benevolent dictator and the benevolent senate models can work, but they both have their pros and cons.

Michael Berk [01:00:35]:
And finally, avoid maintaining separate open source and proprietary versions of the same project. Try to consolidate them into one project instead of creating two separate teams. But as we talked about today, the reason might be hardware, or it might just be scale; sometimes you do have to split these projects. So, Görkem, if people want to learn more about you or Jozu, where should they go?

Görkem Ercan [01:01:00]:
Jozu is at jozu.com. We have just released our hub, which is an experience on OCI registries for ModelKits. ModelKits are our OCI artifact format for storing AI/ML artifacts. And if you want to learn more about the open source project, where ModelKits come from and where the CLI is hosted, it's kitops.ml. And you can find me as Görkem Ercan on Twitter and on LinkedIn. I'm sure there aren't too many of us on LinkedIn.

Michael Berk [01:01:57]:
Awesome. Well, this has been a lot of fun. I definitely learned a lot. And until next time, it's been Michael Berk and my cohost, Ben Wilson. Have a good day, everyone.

Ben Wilson [01:02:07]:
We'll catch you next time.
