Harnessing Open Source Contributions in Machine Learning and Quantization - ML 148
Lukas Geiger is a Deep Learning Scientist, open-source developer, and an astroparticle physicist. He shares his experience using machine learning to analyze cosmic ray particles and detect secondary particles. We explore the challenges and opportunities of open source as a business model, the potential of models for edge computing, and the importance of understanding open-source code. Join us as we delve into the intersection of physics, machine learning, and the intricate world of software development.
Special Guests:
Lukas Geiger
Transcript
Michael Berk [00:00:10]:
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I do data engineering and machine learning at Databricks. Today we are joined by a guest named Lukas Geiger. He studied astrophysics in university, then entered the workforce at Mozilla, and now works at Plumerai, which is spelled p-l-u-m-e-r-a-i. And he also has a really impressive suite of open source contributions. For instance, Larq, which is a quantization-focused neural network training library, and then also the underlying compute engine for Larq, and then some other contributions to TensorFlow, which is some small repo that I think I've heard of. So, Lukas, can you explain what surface detector traces are and why you'd use adversarial networks to refine them?
Lukas Geiger [00:01:09]:
Yes. It's been some time. In my master's thesis, I worked on an experiment called the Pierre Auger Observatory, and it's basically an array of, like, huge water tanks somewhere in Argentina. And the idea is to basically study what kind of cosmic ray particles come in to Earth. So we have, like, galaxies and stuff that accelerate particles, and at any moment in time we get mostly protons and stuff that just come into our atmosphere, and they interact with the particles there, and a thing called an air shower emerges. We basically get, like, a cascade. And on Earth, we can sort of detect these secondary particles.
Lukas Geiger [00:02:15]:
It's done with these giant water tanks. So these particles do something called Cherenkov radiation, which we can then detect there. And we used machine learning to study the underlying particle that induced this air shower. And so you can measure a couple of different things there. There's the arrival direction, which you can basically get from the time information of the shower front arriving at the Earth's surface. The energy, which more or less comes from the amount of signal that you get there. And one thing that's newer: you want to try to know the mass of the particle, like whether this is, I don't know, carbon, or whether this is just hydrogen, or, like, the core of a hydrogen atom. And simulating this is all very, very hard and very difficult, and there we basically used, or tried to use, generative adversarial networks to adapt the simulated data to more accurately reflect the real-world measured data, train supervised models on top of that, and try to predict a value called Xmax.
Lukas Geiger [00:03:45]:
It's basically the depth at which the air shower reaches its maximum. Sorry if I've been a bit confusing; it's been some time since I worked there. But that experiment had direct measurements for, like, a small subset of events, which meant we could do lots of this machine learning craziness and then, in the end, have a really independent measurement which would allow us to evaluate and properly test that, and also check what the error is. Yeah. I only worked there for, like, a year, but, yeah, it was very exciting.
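For readers who want a concrete picture of the adversarial refinement Lukas sketches, here is a minimal, hypothetical TensorFlow sketch: a refiner network nudges simulated detector traces while a discriminator tries to tell them apart from measured ones, in the spirit of SimGAN-style refinement. The trace length, architectures, and loss weighting are all made up for illustration; this is not the Auger analysis.

```python
# Minimal sketch (not the Auger analysis): adversarially refining simulated
# detector traces so they look more like measured ones. Shapes, names, and
# architecture choices are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers

TRACE_LEN = 120  # hypothetical length of a station time trace

def build_refiner():
    # Takes a simulated trace and outputs a "refined" trace.
    inp = layers.Input(shape=(TRACE_LEN, 1))
    x = layers.Conv1D(16, 5, padding="same", activation="relu")(inp)
    x = layers.Conv1D(16, 5, padding="same", activation="relu")(x)
    out = layers.Conv1D(1, 1, padding="same")(x)
    return tf.keras.Model(inp, out, name="refiner")

def build_discriminator():
    # Tries to tell refined-simulated traces from real measured traces.
    inp = layers.Input(shape=(TRACE_LEN, 1))
    x = layers.Conv1D(16, 5, strides=2, activation="relu")(inp)
    x = layers.Conv1D(32, 5, strides=2, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    out = layers.Dense(1)(x)  # logit: real vs. refined
    return tf.keras.Model(inp, out, name="discriminator")

refiner, disc = build_refiner(), build_discriminator()
r_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(simulated, real):
    with tf.GradientTape() as rt, tf.GradientTape() as dt:
        refined = refiner(simulated, training=True)
        d_real = disc(real, training=True)
        d_fake = disc(refined, training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        # Refiner: fool the discriminator, but stay close to the simulation.
        r_loss = bce(tf.ones_like(d_fake), d_fake) + \
                 0.1 * tf.reduce_mean(tf.abs(refined - simulated))
    d_opt.apply_gradients(zip(dt.gradient(d_loss, disc.trainable_variables),
                              disc.trainable_variables))
    r_opt.apply_gradients(zip(rt.gradient(r_loss, refiner.trainable_variables),
                              refiner.trainable_variables))
    return r_loss, d_loss
```

A supervised model for an Xmax-like quantity would then be trained on the refined simulations, with the small set of directly measured events held out for an independent check, as Lukas describes.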
Michael Berk [00:04:34]:
Cool. So let me see if I understand. I'm picturing Earth. There's like a blue ball, and we have an atmosphere. And there's sunlight coming in. Those are particles. And we have carbon, whatever, hydrogen. They're interacting with each other.
Michael Berk [00:04:49]:
And when they hit the Earth's surface, we create a little swimming pool, and that swimming pool will then measure how they interacted. Is that about right?
Lukas Geiger [00:05:00]:
So, basically, what arrives here at Earth is mostly the end product of all the particle physics interactions, and that is mainly muons and electrons. And if you have, like, a large body of water, that's what these Cherenkov detectors are. These particles have a velocity inside this medium, water, that is higher than sort of the, the...
Ben Wilson [00:05:43]:
And the speed of light.
Lukas Geiger [00:05:45]:
Exactly. That's what I was looking for. And that's basically what gives you this little blue glow, if you've ever seen, like, a video of nuclear reactors, that sort of radiation. So you basically get photons that you can measure with, like, photomultiplier tubes. So you basically just get a signal and say, oh, this is like an event. But it's very hard to distinguish between muons and electrons, and that's why we tried to use deep learning, because otherwise you need other detectors, which I think they've upgraded now as well.
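For reference, the standard Cherenkov condition behind this exchange (textbook physics, not a quote from the episode): a charged particle radiates when its speed exceeds the phase velocity of light in the medium,

$$ v > \frac{c}{n} \quad\Longleftrightarrow\quad \beta > \frac{1}{n}, \qquad \cos\theta_c = \frac{1}{n\beta}, $$

where $n$ is the refractive index and $\theta_c$ the emission angle. For water, $n \approx 1.33$, so the threshold is roughly $\beta \gtrsim 0.75$.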
Ben Wilson [00:06:22]:
So that's almost similar to the detection protocol that they use for neutrino detection underground. Right?
Lukas Geiger [00:06:29]:
Yes. It's pretty similar.
Ben Wilson [00:06:32]:
Where you're basically trying to measure particles traveling through interstellar space at near relativistic speeds, and what happens when they encounter matter. That's fascinating, man. What a cool project.
Lukas Geiger [00:06:45]:
Yeah. That was really, really exciting. Just, like, using machine learning to sort of study physics.
Ben Wilson [00:06:55]:
It's like pure research.
Lukas Geiger [00:06:56]:
Yeah. And then, like, playing around with all of these sort of modern things at the time. Doing machine learning in physics and particle physics has been huge, I think.
Michael Berk [00:07:13]:
Yeah. So was that your introduction to machine learning?
Lukas Geiger [00:07:17]:
Pretty much, basically. So I was part of, like, a very good group and was supervised very well by a professor and a PhD student. And basically, the year before, I took, like, a deep learning in physics lecture and decided this is exciting, I wanna do a master's thesis in there. And in Germany, you're, like, embedded for a year in the research environment. It was great. We were allowed to basically do lots of cool stuff and learned a lot. Basically, all of the stuff I know about machine learning sort of started off from this year at Aachen.
Lukas Geiger [00:08:08]:
Nice.
Ben Wilson [00:08:08]:
And when you think about a use case like that, it's sort of the quintessential example of why you would need applied machine learning to solve a problem like that, because I imagine the data volumes that you're dealing with in detector systems like that are pretty large. And then there's your signal-to-noise ratio for measuring things like that, where you're talking about, okay, I have this matter-matter interaction, or high-energy photon interaction with matter, that then generates this cascade of particles that I'm trying to detect. Imagine scintillating out, like, what you're trying to detect. And I have some history with this, because with nuclear reactors those detectors are used as well for your primary coolant loop, and you're trying to detect, like, what is the ratio of fission byproducts that are happening within a core. For, like, research reactors, they do that. And those detectors are stupidly expensive, but they're single purpose. Like, you're detecting a very narrow band of radiation because of the limitations of the physical hardware, and physics in general, of how it's interacting with that detector.
Ben Wilson [00:09:29]:
I imagine just getting that raw dump of data using photon detection and saying, where did this come from? It's probably pretty much impossible to do by hand, certainly at scale.
Lukas Geiger [00:09:43]:
Oh, yeah. Definitely. I mean, there are lots of, like, classical methods used to even get the data into a form where you can even start thinking about doing machine learning on top of it. There are loads of details with, like, calibration and simulation of all of these electronics, and cross validation going on. But in a sense, it was a really good use case for machine learning, because you have sort of a detector grid and you detect time traces. Like, you have pixels, and basically you get, like, video data, more or less. So it actually works quite well with, like, convolutional networks, or, like, GRUs and stuff, or transformers now. And everything is noisy, everything is fuzzy. And there the challenge is, like, I guess as with any deployed machine learning system, how do you validate that the stuff that you're doing is correct? You can do lots of fun things and lots of cool papers, but in the end, how do you test and verify that the stuff that you did is still valid? And with this data, as you said, I can't just look at the image and see, oh, yeah.
Lukas Geiger [00:11:11]:
It's a cat and it detected it as a dog, stupid model. It's, like, a bunch of raw data. So you need to do lots more analysis on top of that and really understand your data. It's a very experimental process, and you need to get your hands dirty trying to do that. Luckily, I was just a master's student, so everything was already nicely prepared. But for the real stuff, there are lots of details that are, yeah, just a lot of work to make these experiments work.
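As a rough illustration of the "video-like" detector data Lukas describes, here is a hypothetical Keras sketch: a grid of stations, each with a time trace, fed into a small 3D convolutional network that regresses a single quantity. The grid size, trace length, and layer choices are invented for the example.

```python
# Minimal sketch of the kind of model hinted at above: treating the surface
# detector as a grid of "pixels", each with a time trace, i.e. video-like data.
# The grid size, trace length, and architecture here are purely illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

GRID = 9         # hypothetical 9x9 patch of detector stations
TRACE_LEN = 120  # hypothetical number of time bins per station

model = models.Sequential([
    tf.keras.Input(shape=(TRACE_LEN, GRID, GRID, 1)),  # (time, y, x, channel)
    layers.Conv3D(16, kernel_size=(7, 3, 3), padding="same", activation="relu"),
    layers.MaxPooling3D(pool_size=(4, 1, 1)),           # compress the time axis
    layers.Conv3D(32, kernel_size=(5, 3, 3), padding="same", activation="relu"),
    layers.GlobalAveragePooling3D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                                    # regress e.g. an Xmax-like target
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```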
Ben Wilson [00:11:53]:
I think that statement really resonates with a lot of stuff that we've talked about, Michael, on the podcast about building a team to do something like that, where you have hardware engineers that are building all that stuff, you have, you know, physicists that are trying to understand a phenomenon that exists within the universe, and the ML folks that are working on the analysis and classification of events. How successful do you think that project would have been? It's just a hypothetical for you. If you had a team of 10 data scientists that just got that data as a dump and then said, hey, figure this out. You couldn't talk to a physicist. You couldn't talk to anybody who built the hardware.
Ben Wilson [00:12:41]:
You couldn't talk to anybody who was working on the detector array. How do you think that would have gone?
Lukas Geiger [00:12:53]:
I mean, I can only speculate here. Sure. And I don't think it would have gone very far, in a way. I don't know. Some things might have been better; like, the code might have been nicer, let's say, if you get actual engineers, and, like, some things. But I think at the core, with all of these things, it's all about understanding your data and understanding your distribution, or the problems that you're actually trying to solve with these physics experiments. At some point, I thought, okay.
Lukas Geiger [00:13:34]:
I'm not doing physics anymore here. I'm just, like, playing with TensorFlow and training models. But then at the end of the day, it's all about asking the right questions, designing the experiment, and understanding what I'm trying to measure. And it's very similar in industry, where at the end of the day it's all about trying to solve a problem or trying to get an answer to a question. And I think it's quite crucial to have domain knowledge in that. And I think we've seen that trend in general, and also in academia and research. I mean, for example, during COVID we saw so many papers from machine learning folks doing some great machine learning on some medical data, and we've also seen medical folks taking some machine learning and applying it to their data. And both of these approaches, I think, don't really work very well if you don't have an understanding of both the tools that you're using and the underlying capabilities of your analysis chain... Yep.
Lukas Geiger [00:14:59]:
And also the domain area. Because, like, if I now, as a machine learning person, take some medical data and run my nice model on it, I don't know, I wouldn't measure something interesting or wouldn't be able to answer something interesting, in a way. And the other way around, I've seen it a lot, sadly: practitioners not understanding sometimes the limitations of some of these approaches with respect to uncertainty or just other things. Yeah.
Michael Berk [00:15:32]:
Yeah. You nicely reconfirmed something that Ben and I believe very strongly, which is that collaboration between subject matter experts and data scientists is essential to build good tools. And specifically in industry, it's often going and talking to your sales team, your supply chain team, your whatever team, and using your expertise to apply solutions to their problems. And in your case, it's physics. And, I mean, just thinking about Ben's question of 10 data scientists: based on my explanation of the blue ball and the swimming pool, I think I would do a great job, but all other data scientists might struggle a bit without some physics under their belt. Anyway, really cool intro to machine learning. And I wanted to ask you, how did you get into small models and quantization specifically? Was it a natural step, or was it via open source? How did you transfer into that field?
Lukas Geiger [00:16:34]:
It was pretty much at random, more or less. Like, I stumbled upon Plumerai, the company I work at now, through a conference and decided, well, to join. At the beginning, we did quite a lot of work on, like, extremely quantized neural networks, like binarized neural networks, and I was able to do open source. In sort of the first one and a half years or so of the company, we did quite a lot of work in open source. And that just excited me. And so at the end of the master's, I was like, what should I do? And I wanted to learn more about machine learning, deep learning, and thought, okay, the best way, at least for me, was to join a startup and see how that goes. And, I mean, like, five years later, I'm still at the same company. So clearly, something went well.
Lukas Geiger [00:17:43]:
But, yeah, so I didn't particularly go into, like, small models or efficiency with a concrete goal, other than that it seemed like a fun and interesting problem to solve. And yeah.
Michael Berk [00:18:04]:
Yeah. And before we get into the technical, and the technical is super, super interesting, I've noticed that open source is now a real business model for companies, for instance, Databricks. And, Ben, I was wondering your take. How has open source become sort of a linchpin for profit? It's free code. How can people make money off of it?
Ben Wilson [00:18:32]:
To echo some of the conversations I've had with Databricks founders: if your goal is to write some cool open source and then start a company, there's no harder business model for you to succeed with. It requires you to do two things that are almost impossible. One is to create a super popular, critically important open source project. If you look at the commits to Maven or PyPI or, heck, look at Apache Foundation projects, how many of those are still in use three years after they were first released? And how many of them, setting aside, you know, crowd hype stuff like GitHub stars. Don't pay attention to that. It's all about downloads. How many people are downloading your package? And if you're curious, go on to Google. They maintain download statistics for pretty much all open source repositories.
Ben Wilson [00:19:39]:
You can pull, you know, you get a BigQuery account, even the free tier, and you can query those datasets and see what your project is doing. And if you're thinking of starting a company off of something like that, you should be in the hundreds of thousands of downloads a week as a start. So it's like, hey, I've established street cred here. People know what this is, kind of, maybe in some niche environment. But then to make a successful company, whether or not it's open source or closed source, you still have to make a successful company. So going from open source to a successful company, there's no causation there.
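If you want to try what Ben describes, a rough Python sketch against the public PyPI downloads dataset on BigQuery might look like this. It assumes the google-cloud-bigquery client, GCP credentials, and the publicly documented bigquery-public-data.pypi.file_downloads table; the package name is just an example, and the query details should be checked against the dataset docs.

```python
# Rough sketch of checking weekly PyPI downloads for a package via the public
# BigQuery dataset. Assumes `pip install google-cloud-bigquery` and working
# GCP credentials; treat table and column names as illustrative.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP project/credentials

query = """
    SELECT
      DATE_TRUNC(DATE(timestamp), WEEK) AS week,
      COUNT(*) AS downloads
    FROM `bigquery-public-data.pypi.file_downloads`
    WHERE file.project = @package
      AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
    GROUP BY week
    ORDER BY week
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("package", "STRING", "larq")]
)
for row in client.query(query, job_config=job_config).result():
    print(row.week, row.downloads)
```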
Ben Wilson [00:20:23]:
It's two independent activities that the same group of people have to nail at the same time. So can you do it? Of course. Like, look at Databricks. Its valuation is insane. There are other companies that started like that, but it's 0.1 percent of people that have submitted an open source package and then tried to start a company and become successful enough that they can still pay salaries for 10 years. It's hard.
Ben Wilson [00:20:55]:
But if you have a great idea and you feel like, hey, part of our tech can be open sourced, do that while you're building your company. Like, that's totally legit. Give back to the rest of the world. I think it's important. You're gonna have to create some sort of edge feature so that people are, like, paying you, in the sense of "they manage this open source for us" or "they build features for us that also go into open source." You know, there's a lot of ways to do that, but people look at a unicorn, and Databricks is a unicorn.
Ben Wilson [00:21:29]:
It's a very, very special place. Right time, right people, right ideas, right execution. If you look at that and think, I can replicate that, it's not trivial to do. It's hard.
Lukas Geiger [00:21:47]:
So...
Michael Berk [00:21:48]:
So for Plumerai, were you guys starting from open source, or were you starting from closed source and then decided to open source Larq?
Lukas Geiger [00:21:57]:
The fact that we open sourced Larq, that was something from the start. That idea was already there before I joined the company. In a sense, we started out... so the goal of the company is still to make very efficient models, or basically make AI on the edge and on tiny devices possible. So in that sense, we started out from the most extreme form of quantization and did research in binarized neural networks, which is basically where you take weights and activations, and instead of having them in, like, float32 or int8, you just have a single bit, which comes with some very interesting challenges if you try to train these things. And in parallel, we were building out the inference side and also working on the hardware side, on how to potentially put that into chip form, into hardware. And now we've taken all of these learnings that we had from this, and the setup that's sort of rooted in research across hardware, software, and training algorithms, and our focus is on smart home, on very tiny devices and microcontrollers. Like, I don't know, on my desk there's, like, a tiny little board which runs a deep learning model for computer vision, which on microcontrollers is quite something unique. In the past, most of the deep learning models running on these devices have handled either sensor data or speech, which is a much more, I don't wanna say easier, modality to work with, because everything has its challenges, but the input data is already much, much smaller. Which means just getting, like, a camera pipeline working that runs efficiently can sometimes be challenging on these things. So, yeah.
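For a concrete feel for the single-bit idea, Larq exposes drop-in quantized Keras layers. Below is a minimal, illustrative block that follows the patterns in Larq's documentation; the surrounding model, input shape, and class count are invented, so treat it as a sketch rather than a recipe.

```python
# Minimal sketch of a binarized block in Larq (weights and activations
# constrained to a single bit via the straight-through sign estimator).
import tensorflow as tf
import larq as lq

kwargs = dict(
    input_quantizer="ste_sign",       # binarize activations on the way in
    kernel_quantizer="ste_sign",      # binarize weights in the forward pass
    kernel_constraint="weight_clip",  # keep latent float weights in [-1, 1]
)

model = tf.keras.Sequential([
    # First layer usually keeps full-precision inputs, so no input_quantizer.
    lq.layers.QuantConv2D(32, (3, 3), kernel_quantizer="ste_sign",
                          kernel_constraint="weight_clip", use_bias=False,
                          input_shape=(96, 96, 1)),
    tf.keras.layers.BatchNormalization(scale=False),
    lq.layers.QuantConv2D(64, (3, 3), use_bias=False, **kwargs),
    tf.keras.layers.BatchNormalization(scale=False),
    tf.keras.layers.GlobalAveragePooling2D(),
    lq.layers.QuantDense(10, **kwargs),
    tf.keras.layers.Activation("softmax"),
])
lq.models.summary(model)  # reports per-layer bit widths and memory
```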
Michael Berk [00:24:37]:
Got it. So question. We're making models smaller. Does that mean that training time is faster and cheaper?
Lukas Geiger [00:24:48]:
I mean, it depends. It depends. So sometimes it can take longer to train these models, but the good thing about these models is, like, we're not training billion-parameter large language models that don't even fit on a GPU. So in a sense, the core structure of the model is definitely smaller, and I think that definitely helps with the cloud budget, the general compute budget you need for training these networks. And it also helps in terms of patience. Like, I don't need to wait three weeks for this model to finish training before I can get an idea of whether the model does something sensible. There are certain things you need to do, in terms of, if you talk about quantization-aware training or pruning, that might lengthen training time, but that's not inherent. That's just, like, limitations of the frameworks that we're using, or sometimes hardware support on the GPU side.
Lukas Geiger [00:26:07]:
I would say training tiny models is, like, a lot of fun, because you can do a lot, and you can actually train quite a lot of models in parallel, which you probably can't if you're training the next GPT-5. You probably don't wanna run three runs in parallel just to see what happens.
Michael Berk [00:26:29]:
Got it. Yeah. So the basis of the question was, if we're using quantization to reduce the precision of your weights, theoretically there's less information in each weight, and you would need more nodes. Is that correct?
Lukas Geiger [00:26:48]:
Yes. However, I mean, yeah, there's just less information there, and fewer things that this one value can represent. However, if you look at many common architectures that are out in the field, I don't know, like ResNet-50 or so, these models are, by current standards, quite tiny, but for computer vision still fairly usable. These models are quite heavily over-parameterized, so you would be surprised by how much you can reduce the precision of these weights and activations and still keep high accuracy. Like, in the current models that I'm training, we see no difference between, let's say, int8 and float32 if you do quantization-aware training. And sometimes it can even help; you just remove, like, some class of problems, so you don't have to worry that much about overfitting. Like, constraining your precision is a pretty good regularizer.
Lukas Geiger [00:28:07]:
And to make things efficient, it's not only about the weights but also about the activations. Does that answer your question?
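As one concrete, hedged illustration of the quantization-aware training Lukas mentions, here is a sketch using the TensorFlow Model Optimization Toolkit, which inserts fake-quantization ops during training and then exports an int8 TFLite model. The toy model is invented, and this is not necessarily the stack Plumerai uses.

```python
# Sketch of quantization-aware training with the TensorFlow Model Optimization
# Toolkit: fake-quantization ops simulate int8 during training, then the model
# is converted to an actual int8 TFLite model. Illustrative only.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(96, 96, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Wrap the model so quantization is simulated in the forward pass.
qat_model = tfmot.quantization.keras.quantize_model(base)
qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
# qat_model.fit(train_ds, epochs=5)  # train as usual on your data

# Convert to a quantized TFLite model for the device.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```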
Ben Wilson [00:28:16]:
Yeah. I think so. I mean, I saw the same thing when playing around with images many years ago, with some actual, you know, deployed projects, where the first thing that you wanna do, if you're like, hey, I don't know what tooling I'm gonna use here, and I certainly don't wanna spend three weeks taking a whole bunch of image data and labeling it properly just so that I can build a model to get a prototype out of, which is gonna take three days to train, and it might not even do what I want it to do. So you go, oh, I'm gonna take one of these open source models, and I'll just, you know, freeze everything up until the last, like, three or four layers and then, you know, fine-tune it. And we did that, and we're like, wow, this is actually working really well.
Ben Wilson [00:29:08]:
This is so much better than, you know, the old machine vision that we were using before. TensorFlow is pretty awesome. And that was where our prototype stopped. Then we went to planning out the project to say, what are we gonna do here for production? And we were looking at the size of that deployed model. Like, yeah, we can't deploy this thing on a VM or on Kubernetes and build all that infrastructure. Like, it's gonna cost a fortune to run this. So we decided to approach it by understanding the architecture and the structure of the open source implementation that's generalizable. You know, it's a generalized image classifier.
Ben Wilson [00:29:53]:
Like, do we need this many layers and this many nodes? Like, is this even useful? So we did a bunch of tests, and we found out that the thing that met our needs was, like, 5% the size of ResNet. And with really good labeled data and some clever little feature engineering on the images, it was able to beat the performance of the fine-tuning by a large margin. And it was dirt cheap. I mean, we probably could have deployed it to people's cell phones. It was so small. So I'm a huge believer in it. And there are actually people doing that for LLMs now, where it's like, hey, you have this specific domain.
Ben Wilson [00:30:33]:
You know, take a pretrained model, but take the smallest one that they have, like, the smallest version of it that has the fewest weights. It knows how to understand human speech. Just train it on your specific domain, and it turns out it's faster, cheaper, and a lot of times more accurate.
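The "freeze the backbone, fine-tune a small head" recipe Ben describes looks roughly like this in Keras. The backbone choice, input size, and five-class head are placeholders, not details from Ben's project.

```python
# Sketch of transfer learning: freeze a pretrained feature extractor and
# train only a small classification head on top.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained backbone

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # 5 classes, say

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```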
Lukas Geiger [00:30:54]:
Yeah. I think you make a very good point here. I mean, for efficiency in general, I think we're in a very exciting time, because for a very long time efficiency wasn't that big of a deal: oh, just get one additional GPU, like, yeah, whatever. I mean, I'm talking about the server side; on, like, embedded stuff, efficiency always was huge. But what we're seeing now, with the complexity that you were talking about, of having a model that, I don't know, my data science team trained in a Jupyter notebook or something, and trying to get that into Kubernetes, building all this infrastructure, the monitoring. If it runs on a GPU, you probably want to batch things; there's a whole other set of things that you need to do on top. And moving it basically to the edge, if you are able to, like, onto the cameras themselves for us, for computer vision, or onto whatever device your user runs on, just removes this entire block of complexity and DevOps and, like, worry. You still need some of it, of course, but it's not so model specific. And then the other thing that you said, sort of designing for efficiency first, that's something that resonates super well with me, because if you look at the literature at big conferences, and what people are doing in practice, often we have these reference architectures, and then you get a paper saying, oh, I can quantize ResNet-101 down to 1 bit, or, like, 3 bits, and not lose any accuracy.
Lukas Geiger [00:32:58]:
It's like, yeah, obviously not. I mean, not obviously. It's very hard work, and I don't wanna make any enemies, but in a sense it's kind of a very obvious paper to write. Whether it actually meets these efficiency goals in practice, that's then another story. And oftentimes you're much better off starting from scratch: what's the problem we're trying to solve, and then designing an architecture and the whole inference stack with it. Yeah.
Ben Wilson [00:33:39]:
So I've got a crazy question for you about edge deployment and edge computing in particular. A job I had many, many years ago was working in a factory that made the chips that go into smartphones. When I started working there, we were at a 45 nanometer process. That's how long ago this was. And now we're talking about, you know, extreme UV photolithography that's being used to produce, you know, sub-3-nanometer transistors. These things are getting crazy. So computing power on a system on a chip that's deployed to a phone, and you could add two SoCs on a motherboard without a lot of expense these days. Do you see a point in the next 10 years where a phone manufacturer is gonna open up their SDK and their hardware specs to say, you can run arbitrary application code on an embedded GPU on this phone, almost as a sidecar?
Ben Wilson [00:34:39]:
If you had the capacity to deploy code instead of model weights to the edge, and had a framework where you could say, I wanna be able to train a model specific to this exact piece of hardware and have, you know, validation work for it. You know, some sort of framework that would allow you to do that. Do you think that's where edge computing is moving in the future, toward, sort of, code deployment where it'll generate model weights for you on the device?
Lukas Geiger [00:35:15]:
I'm not 100% sure I understand your question, what you mean by code deployment, because if you have all the engineering resources that you want, you can sort of already do that. Right? You can target the GPU that's on the device. The question is then often to which granularity. On, like, a CPU, we can easily use intrinsics and do some SIMD magic; on accelerators, especially when you have Android on top, it's oftentimes harder to specifically target these things. But what do you mean by generate the weights on the device, then?
Ben Wilson [00:36:06]:
It can actually do training.
Lukas Geiger [00:36:08]:
Oh, yeah. We do training. Yeah.
Ben Wilson [00:36:11]:
So it collects the data from that phone or whatever. You know, you wanna classify people's pictures better, or do something with, like, hey, I've seen the raw images that are captured and how somebody edits them on their phone. I wanna be able to detect, like, per unique image that's created, a customized edit that this person's going to like.
Lukas Geiger [00:36:34]:
You know,
Ben Wilson [00:36:34]:
that sort of thing. And you need that training data. You can't just, like, fetch raw data. I mean, a phone company could, or some providers could, but if you're somebody who's a service that's deploying edge models, you're not gonna be able to hit a Google API without paying a lot of money to get people's data like that for your training. And you wouldn't wanna train, you know, 1.2 billion models in the cloud so that you could deploy them all to people's phones individually. That's not really an economy-of-scale thing. But if you took, you know, code where you say, here's my training loop, run on the device in the background.
Lukas Geiger [00:37:13]:
Do you
Ben Wilson [00:37:13]:
think that's coming?
Lukas Geiger [00:37:15]:
Yeah. Absolutely. I think so. I mean, from a technical perspective, I don't see why not. I mean, federated learning and that sort of research has been a thing for a while. I think at first it will start off with existing models that just adapt to the user. I'm not very familiar with that side of the field.
Lukas Geiger [00:37:43]:
Like, all I've done is basically have a model, and it's fixed once it's on the customer's device. But, I mean, even what you can do with, as you said earlier, just fine-tuning the last layer, or training linear classifiers on top of that, that doesn't really require that much compute power. And I think the problems are mostly in the software stack to support that, less on the hardware side. Well, for many of these accelerators you probably would need to have, like, proper int8 or integer training if they don't support floating point operations. But for me, I guess, one open question that I always have about these things is, like, how do you deal with monitoring and validation? I mean, if I train a network now, I look at my TensorBoard or MLflow, look at my graphs, and see, oh, yeah, that was the loss spike, I need to use something else.
Lukas Geiger [00:38:57]:
How can that process be completely hands off, and ensure that this model doesn't get into, like, a weird state just because the customer's usage is not something that we anticipated before? I think that needs a lot of validation and checking to see whether we're still in the known area of where this works. Sorry, I've rambled on, but I think that's how it's going.
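A minimal sketch of the kind of lightweight on-device adaptation Lukas mentions (fine-tune only the last layer), plus one crude guardrail of the sort he asks about. Function names, thresholds, and the divergence check are all hypothetical.

```python
# Sketch: keep the backbone frozen and only update a small head on the
# user's data, bailing out if the loss misbehaves. Illustrative only.
import tensorflow as tf

def personalize(model: tf.keras.Model, user_ds: tf.data.Dataset, steps: int = 50):
    # Freeze everything except the final classification layer.
    for layer in model.layers[:-1]:
        layer.trainable = False
    model.layers[-1].trainable = True

    opt = tf.keras.optimizers.SGD(1e-3)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    for x, y in user_ds.take(steps):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        head_vars = model.layers[-1].trainable_variables
        opt.apply_gradients(zip(tape.gradient(loss, head_vars), head_vars))
        # A crude guardrail of the kind discussed above: bail out if the loss
        # explodes rather than shipping a model in a weird state.
        if not tf.math.is_finite(loss) or float(loss) > 10.0:
            raise RuntimeError("Personalization diverged; keep the old weights.")
    return model
```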
Ben Wilson [00:39:35]:
Yeah. So SDKs that would allow for reinforcement learning with human feedback, with the user being the human feedback, to say, I accept your suggestion, or change it this way. And then it actually gets that feedback and adapts its weights to create the next iteration.
Lukas Geiger [00:39:51]:
Not sure if you wanna do reinforcement learning directly on there, but,
Ben Wilson [00:39:57]:
you need some serious
Lukas Geiger [00:39:58]:
But, like, adapting predictions, or doing a couple of training steps for, like, image generation things or so, where you sort of need to have a retrainable component, I definitely see that in the realm of the technically possible.
Ben Wilson [00:40:20]:
Yeah.
Michael Berk [00:40:22]:
The monitoring system would be kinda crazy. Ben, how would you do it? So for reference, Ben has built a billion monitoring systems. And so...
Ben Wilson [00:40:34]:
The monitoring system for edge computing, where you're getting feedback in order to update the training code that you would deploy?
Michael Berk [00:40:46]:
Yeah. Would you basically send out the data to some server from the phone? Would you have some... yeah, Lukas?
Lukas Geiger [00:40:54]:
I don't think you wanna send your data off the phone. I think that's the whole point of, like, doing it on device. Like, I don't... Yeah.
Michael Berk [00:40:59]:
Exactly. But it becomes kind of ironic. Like, we don't wanna send data off the phone, so we'll send data from the phone instead.
Ben Wilson [00:41:07]:
But you'd still wanna have some sort of callback that's running, because the creators of the code would want a very stripped-down metadata version of what the status was. So if you're looking at, like, thumbs up, thumbs down from the recommendation that was presented, the user liked that. You would anonymize that, get some sort of ID from the commit hash of the code that was sent over, and you would know the device, the hardware, certain things about that device where this was run. And you would instrument that, in an anonymized way, to some centralized servers that you could look at and say, how are our models doing? Because otherwise, you're just operating in the dark, and you would have to offload that judgment to the device, which I personally would never do. Like, it is fire and forget. You have no idea if you just shipped garbage to a user. How would you do a rollback? When would you do a rollback? Like, that's something that I'm curious about. Lukas, how do you roll back for edge deployment?
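A hedged sketch of the stripped-down, anonymized callback Ben is describing. The endpoint, field names, and hashing scheme are all invented for illustration; the point is that only coarse, anonymized metadata and a feedback bit ever leave the device.

```python
# Sketch of an anonymized feedback callback: hash the device ID, attach the
# model/commit identifier and coarse hardware info, and send a thumbs-up/down
# signal. Endpoint and fields are hypothetical.
import hashlib
import json
import time
import urllib.request

TELEMETRY_URL = "https://telemetry.example.com/v1/feedback"  # hypothetical

def report_feedback(device_id: str, model_commit: str, hardware: str,
                    thumbs_up: bool) -> None:
    payload = {
        # One-way hash so the raw device ID never leaves the device.
        "device": hashlib.sha256(device_id.encode()).hexdigest()[:16],
        "model_commit": model_commit,  # ties feedback to the shipped binary
        "hardware": hardware,          # e.g. SoC family, not a serial number
        "thumbs_up": thumbs_up,
        "ts": int(time.time()),
    }
    req = urllib.request.Request(
        TELEMETRY_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Fire-and-forget with a short timeout; losing a feedback event is fine.
    try:
        urllib.request.urlopen(req, timeout=2)
    except OSError:
        pass

# report_feedback("device-1234", "a1b2c3d", "cortex-m7", thumbs_up=True)
```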
Lukas Geiger [00:42:21]:
So, in our current... I mean, for our customers, what we do is quite a lot of, like, testing in advance, and then once a model is deployed, since it doesn't really change anymore, a rollback can be done like any other over-the-air update for these devices. So it's not something that, like, adapts on an hourly basis. Right. And then gets into a weird state where we go, oh, we need to roll back. And...
Ben Wilson [00:43:02]:
So you could do either a roll forward or a rollback by just saying, here's the actual, you know, commit hash that we used for this deployment.
Lukas Geiger [00:43:11]:
No. Actually, it's the binary. It's the... Yeah. The binary.
Michael Berk [00:43:15]:
Do you ever deploy to devices that don't have a connection to the Internet or the outside world?
Lukas Geiger [00:43:20]:
Yep.
Michael Berk [00:43:21]:
How do you roll back on that? Go get the device and reinstall over USB?
Lukas Geiger [00:43:25]:
So, what we provide is mostly the software, sort of the whole solution. So from my side, it's like, oh, yeah, that's our customer's problem, in a way. In some form or another, these devices will always be connected to something, because otherwise why are you running this device? Unless it's, like, completely integrated, and then you would basically test it and just never roll back, I guess. But for devices that, I don't know, send notifications to your app, or, like, a smart home camera or something, there will always be an Internet connection at least at some point, because you wanna give feedback to the user or send a notification or stream a video to some place. So in that sense, you don't really have that problem. But if it's really a deploy-and-forget in an isolated environment, you would need to do lots of pretesting. Or, well...
Lukas Geiger [00:44:43]:
You need to do lots of testing anyway before anything goes in production.
Ben Wilson [00:44:50]:
So we've been talking a lot about low-level MLOps and DevOps stuff. And, you know, I intentionally guided this discussion to this point just to give people a little taste of the super unfun things that we deal with as software engineers. We have to think about all this stuff. Like, hey, how can I prevent myself from getting woken up at 3 AM, or my whole team from getting on-call paged because fires are everywhere? When you came out of your master's program and came to the startup and started working on this, two questions for you. First one, how long did it take you to learn all this stuff? And secondly, how did you learn all this stuff?
Lukas Geiger [00:45:42]:
That is a good question. When I joined the company, I was, like, the second engineer or so, at least on the software side. We were, like, three people in an office. And with respect to DevOps or MLOps, if you're not providing a platform as a service or a cloud service, you won't be woken up in the middle of the night, because it runs on your customers' devices, more or less. I mean, there are still things that can go wrong, but it's sort of not... So we don't have, like, site reliability engineers that need to do these kinds of things. So that's a great benefit, I think, of running these things on device. Any kind of code that runs anywhere needs monitoring and needs testing, but I feel like, at least for me, and maybe I just don't understand enough about the DevOps world, there are just a lot more things that could go wrong once you get a network involved.
Lukas Geiger [00:46:59]:
At least that's sort of how I think about it: just a lot more craziness once you get into load balancing and actually have a system of VMs at the end of the day, and, like, I don't know, containers that are deployed on top of that with Kubernetes and some sort of scheduling. There's this entire state space that can go wrong, which you kind of exclude, at least for the model part of the equation. So coming back to me joining the company, what we did set up quite fast, and that was actually, I think, in the first, I don't know, month or so, and I had experience with that before at uni, sort of managing the GPU cluster, was much more about getting an experimentation platform up and running, and getting us, as a team of researchers, basically, or researchers slash engineers, in a position to run experiments effectively and fast at a low cost. Yeah. And I mean, that was a messy process. I can't lie.
Lukas Geiger [00:48:23]:
We took on the complexity of deploying to Kubernetes very early, just because the tooling we used supported that as a runtime environment. So that was fun. We'd never worked with Kubernetes before, sort of deploying that, getting all of that running. But, basically, the only users are your internal team. So, it's always within working hours. If I'm on holiday and it breaks, well, then they'd have to read a paper or go to the pub. I don't know.
Lukas Geiger [00:49:01]:
So it's not like there's a customer that is dependent on this. But with the team growing, we've definitely had to make all of this infrastructure much more stable. For us, it was always crucial that we get to a point where we can run experiments super, super quickly, to get a fast iteration cycle. Like, basically, I have my code, I change a line and hit run, and it should get me a GPU, ideally, like, spot or preemptible instances, because I don't wanna pay loads. And then you just need some logging, monitoring, and a way to basically make all of that kind of reproducible. And I think the world has moved on from, like, I don't know, four or five years ago; the tooling is much more mature now, like with Kubernetes jobs and so on.
Lukas Geiger [00:50:07]:
You can now do much more right out of the box. And so, to answer the other part of your question, it was all learning by doing. But, yeah, so far that's also been, for me, always the easiest thing. Same with, like, open source or anything like it. I never really liked or got motivated by the "I'm gonna do this tutorial, and I'm gonna learn Rust" approach. I could never find the motivation to sit down and do, like, toy problems. So yeah.
Michael Berk [00:50:47]:
Did that ever lead to any painful experiences?
Lukas Geiger [00:50:53]:
I mean, I don't know. Like, I feel like if you're in research, or in engineering, or even in physics, at some point you build up quite a high frustration tolerance. So, yeah, okay, this thing runs out of memory, I don't know. Now that you ask, we've had, like, some very fun bugs, like software just running out of memory, or, like, weird things. And then it ends up being, oh, yeah.
Lukas Geiger [00:51:27]:
I need to use this different memory allocator. That's just something that you would have never thought of before and that you sort of acquire. And then you, like, call support, or end up messaging on, like, GitHub issues. So in that sense, I think these things have been the most painful for me. If my model doesn't train, I sort of know where the problem is: it's sitting in front of the computer. But if something isn't working as intended, it always feels very unsatisfying when the thing you're trying to do, that should be supported, just kind of doesn't work because of some code error that's not within the code base you control, but within the code base of some open source project, or some cloud provider that just doesn't do what they say in their documentation. I think for me, these have been the more painful things, rather than, like, I don't know, the cluster going down. Yeah. At some point, it was reasonably stable.
Lukas Geiger [00:52:47]:
Before that, actually, my colleagues did go to the pub once on a Friday afternoon when everything went downhill. But that was a long time ago; that doesn't happen anymore. Our infrastructure is much more stable now.
Michael Berk [00:53:04]:
Good. Yeah. And a follow-up question to that. Although the pub does sound really nice. Follow-up question: when I first joined Databricks, my ability to sort of iterate in a sequential manner was not amazing. And that was a focus when I first joined, which is basically, I want to be able to do trial-and-error type things and have every single trial be of high quality, so that when I get data returned from that trial, I can take the next step. And so an example is, if you're trying to find a number between 1 and 100, you always split in the middle until you get to your number.
Michael Berk [00:53:43]:
That's the most efficient way. And it's sort of the same process for problem solving. So both of you are successful professionals, very smart. I was wondering what percent of the time you guys do sort of this trial and error approach where you're like, this new memory allocator, let's freaking see if that helps, versus actually going and understanding the root cause and having a pointed solution? Or is it sort of somewhere in the middle?
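Michael's 1-to-100 analogy, written out as code for concreteness; the `is_too_low` oracle is just a stand-in for whatever signal each trial gives you.

```python
# Classic binary search: halve the candidate range on every trial.
def guess_number(is_too_low, lo: int = 1, hi: int = 100) -> int:
    """is_too_low(guess) returns True if the secret number is above `guess`."""
    while lo < hi:
        mid = (lo + hi) // 2
        if is_too_low(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

secret = 73
print(guess_number(lambda g: g < secret))  # -> 73, in at most 7 guesses
```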
Ben Wilson [00:54:08]:
So exactly as Lukas alluded to, we are a customer-facing organization. So, in the group that I work in, we have a severity measurement for anything that goes wrong. It can be, the docs are wrong. Okay, that's a Sev 2. We'll get to that in a week. You know? There are other things that are higher priority. And then there's, I can't save my model.
Ben Wilson [00:54:38]:
I just trained it. I can't save it. That's a Sev 1. Okay, something's broken here. I need to do something about this in the next couple hours. And then there's, I can't do anything with MLflow. That's a Sev 0.
Ben Wilson [00:54:56]:
That's bad. That's an incident, usually due to a regression that we accidentally shipped. So the reason I'm talking about severities and responses is that your first response, if you are running a service, is to mitigate. And that mitigation can be: what changed recently? Roll back now. Just revert to the previous known good state. I don't care about the root cause right now. Just get it working now.
Ben Wilson [00:55:28]:
Fix the problem. I don't care about fixing the actual cause of the problem. I just need to get the system into a state that works again. And you've seen me do that, actually, Michael, where, like, I released a patch release that just reverts, like, four PRs. Did I know which one was the root cause, or what the mechanism was for why it failed? No. But I know that I did that system of trial and error that you're talking about, that binary search approach, where I take the bad commit history, run the code that should work but doesn't work, verify. Like, yep, this is totally broken.
Ben Wilson [00:56:10]:
Revert 6 commits or 8 commits or 10 commits. Go back four days on the state of the master branch. Build a branch off that. Run it. Does it work? Yep. Okay. Go in half again. Broken or not broken? And you just kinda get a feel for where the state of this was good, and then release that.
Ben Wilson [00:56:34]:
Like, patch release, fix the problem. That gives you breathing room now to go and do a root cause analysis. And if you're investigating something as a professional software engineer, you have to do an RCA. You need to know what went wrong, not to say, hey. Jack broke this. Jack sucks. That's not helpful, like, at all. Makes Jack feel really bad and makes everybody else feel really bad.
Ben Wilson [00:57:01]:
The blame is being thrown. You should never do something like that. If anything, if you're gonna point a finger, it's at the tech lead, or, you know, whoever's in charge of running that team, because they let Jack do that. They let Jack down. They let the team down. They let the company down. And as a tech lead, you are responsible for that.
Ben Wilson [00:57:21]:
It's your problem. But the proactive approach for this is to figure out the root cause, because you wanna make sure that a system is now put in place, or tests are now put in place, or something that safeguards you from allowing this to happen again. That's the real key.
Lukas Geiger [00:57:42]:
I don't know if that
Ben Wilson [00:57:42]:
was exactly what you're asking or if you're asking about, like, how to figure out how to build a feature or something. But you can even use that sort of approach, that thought process to when designing stuff too.
Lukas Geiger [00:57:55]:
Yeah. It was more on
Michael Berk [00:57:56]:
the feature side, but, like you said, it applies to features as well. Yeah. Lukas, what's your take? How do you guys do it?
Lukas Geiger [00:58:03]:
Yeah. So since I'm more on the R&D side, I have the luxury that these super fast incident responses only rarely happen. And if there's something that's critical, obviously that's the first thing; like Ben said, we work and pray. While Ben was talking, I've been trying to think of the default process that goes through my head. And I think the first thing, obviously, is this frustration of, yeah, okay, I'm just trying random things. I think that, especially in ML, where it's stuff that you don't fully understand down to the lowest level, that's often the direct reaction.
Lukas Geiger [00:59:02]:
It's like, this thing doesn't train. I don't know, I'll double the learning rate, or halve the learning rate, or add some weight decay, clip some gradients, reduce the batch size because it runs out of memory. I mean, I do that, of course, but I often find it quite unsatisfying. So there are, like, two approaches. If it's a code problem, I usually, if it's open source, look at the code. That's a first step, and then I just try to understand really what's going on and what the underlying problem is, often down to, okay, this commit by this open source project, like, three years ago introduced that.
Lukas Geiger [00:59:49]:
That's the problem. That sort of gives me some kind of comfort. And working in a company, versus being a student or working in open source, sometimes there needs to be a limit to that. Like, is it actually important that I know, or is the workaround good enough? So I've definitely had to learn that when joining a company: what to investigate and what to just work around, in a way. And then the other thing is the root cause analysis that Ben was talking about. Yeah. I feel my background in physics sometimes shows. It's all about, okay, let's design an experiment, and then basically try to simplify and remove as many variables as possible.
Lukas Geiger [01:00:47]:
So, I mean, it is a bit of trial and error, but then it's, I think, sometimes much more targeted. And then over time, working with a code base that either you own or, like, a tool like PyTorch or TensorFlow, there will be little hunches, or little areas that you're uncomfortable with, and that might be a thing that leads you to prioritize one experiment over another, or some experience in the past where something similar went wrong. But that's not really about how to design the experiment, more like, how do I prioritize where to start. And it always helps to ask colleagues, like, go for a coffee, explain the problem to them, or ask for help. That's usually step one, or, like, once you're at the point where you're able to explain the problem to someone, which can sometimes be very difficult. Like, writing a very good issue on a GitHub repo is something that is surprisingly hard. And that's the reason why I often send a pull request first instead of actually writing an issue, because it's oftentimes easier to fix the code than to explain very clearly what went wrong.
Lukas Geiger [01:02:24]:
Anyway, I'm not sure if I answered the question, but I think that's, like, the thought process behind it. That makes sense.
Michael Berk [01:02:33]:
Yeah. I think that answered it. It's funny to hear that both of you have, like, sort of different approaches. My approach, and I think it's very subject specific... So it seems like, Lukas, your trial and error is exploration, almost. You're collecting data, and then you always go back and do a root cause analysis and determine why the data said what it said. For me, I work a lot in sort of a consulting capacity, and I'm often given subpar code repos that are just very difficult to understand. And I frankly don't wanna understand them.
Michael Berk [01:03:11]:
So if I can get a green check mark to make it run, oftentimes I'm very happy about that, and so is the customer. But for really important things that are mission critical or that require maintenance, yeah, you need that root cause analysis. You need to go deep and actually understand what the problem was and how to resolve it. But sometimes, for little prototypes, iteration is what I do, admittedly.
Ben Wilson [01:03:38]:
One thing to add on the feature definition aspect of it. I mean, I just gave you a task the other day, Michael, right? You're gonna be building a flavor in MLflow, and there's a bunch of design considerations that you have to come up with. And the way that you approach that, or the way that we do it at Databricks engineering, is through design docs where we do a bunch of prototypes, but it's informed prototypes, and it's exactly the process that Lukas explained. I don't know, Databricks isn't filled with physicists, a lot of computer scientists, but there are some physicists. But it's all using the scientific method and designing of experiments, where you're doing exactly that. You broaden the decision of how to build something to infinite possibilities, to say, we could do any of these different ways of building this collection or iterating over this collection.
Ben Wilson [01:04:39]:
In any given language, there are probably thousands of ways to do that. If you get creative enough, you could do some crazy stuff, even in high-level languages. We don't explore all of those. We just say, what are the simple ones, or what's something that I know for sure, based on my own wisdom and, you know, experience, is gonna work? It's probably not the best solution, but it's the first thing I'm gonna prototype and show. Like, does this actually work? Does the code execute? Yep. Okay. I've got that in my back pocket.
Ben Wilson [01:05:14]:
I now wanna explore something maybe a little bit more complex, but potentially way more performant, and I'll do a prototype of that. It's usually pseudocode, like a repro of something very simple; you're not implementing something for production. But then do that, show that, and start weighing the trade-offs. So then you're going into the analysis of experiments: I've done these 5 experiments and collected the data associated with them, like code complexity, maintainability, and readability, as well as performance. How long did it take to run? How big of a collection could I send through this, and how long did it take to run if we're talking about some network IO thing or memory allocation thing?
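A minimal sketch of the kind of prototype comparison Ben is describing, with hypothetical implementations and an arbitrary collection size: time a few candidate ways of processing a collection and record the numbers alongside the qualitative notes on complexity and readability.

```python
import timeit

# Two hypothetical candidate implementations of the same task:
# squaring every element of a collection.
def candidate_loop(values):
    out = []
    for v in values:
        out.append(v * v)
    return out

def candidate_comprehension(values):
    return [v * v for v in values]

data = list(range(100_000))  # stand-in collection size for the benchmark

for name, fn in [("loop", candidate_loop), ("comprehension", candidate_comprehension)]:
    seconds = timeit.timeit(lambda: fn(data), number=50)
    print(f"{name}: {seconds:.3f}s for 50 runs")
```

The timing numbers then sit next to the subjective criteria (readability, maintainability) in the design doc when the trade-offs get weighed.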
Ben Wilson [01:06:04]:
And once you have those 4 or 5 things that you've tested out, then you go for peer review, just like you would in the scientific process. You get other experts to weigh in, and people will vote. Some actively vote by saying, yeah, I agree with your conclusion that option 2 is really good. Other people vote with their feet: they just think the idea is really stupid, and that can tell you something as well. Or you're gonna have people that strongly do not agree with your conclusion.
Ben Wilson [01:06:40]:
And that's why it's important to have that data there to back you up, and you challenge them. You say, here's the data; please explain to me your hypothesis of why this is a bad decision. And that's how you learn. They might have context that you just aren't privy to. Like, oh, I had no idea that we needed to integrate with this other thing in another 2 months. Yes. Okay.
Ben Wilson [01:07:01]:
Option 3 is better here. And then move on. And then you go and just build option 3.
Lukas Geiger [01:07:06]:
Sounds about right. You'll see.
Ben Wilson [01:07:08]:
You know? You're gonna do this over the next 6 weeks, man.
Lukas Geiger [01:07:11]:
Woo hoo. I want to add one more thing about this case where you have a bug, or something goes wrong, or something unexpected happens in very complex code, or in some project that I plan to maintain well over a longer period of time, or I'm developing a machine learning model that's not just a quick prototype that needs to get out of the door but something that will become the basis of something to keep working on and improving over a long time. I've actually always had quite a positive feeling if something really goes wrong, because that just means there's something I don't understand, something we either conceptually don't understand or something to learn in that area. So I think that's always something I find exciting to really investigate. The only times where this comes with frustration, I feel, is if it's something where I think, no, this should really work: like some fundamental infrastructure layer is broken in weird ways, or some supposedly stable API doesn't do what it says on the tin. That can be very frustrating. But for the type of things that are a bit more exploratory, or still in the very early development phase,
Lukas Geiger [01:08:55]:
I found that it can be quite fun, and also very interesting, to debug these things.
Michael Berk [01:09:04]:
Were you always like that?
Lukas Geiger [01:09:09]:
I don't know. I mean, if I think back to uni, there have definitely been days where, like, 5 people sit over this one sheet of paper that thousands of students have solved before, and you have no clue. And I guess after doing that for a couple of years, it's sort of like, yeah, okay, it's fine that I don't know this right now. I'll hopefully know next week. Yeah.
Lukas Geiger [01:09:47]:
Yeah. No, I always find it quite positive because there's something fun happening, and I know my idea of fun might not be everyone's idea of fun. But
Ben Wilson [01:09:58]:
yeah. Yeah.
Michael Berk [01:10:01]:
Just one note before we wrap, and I know we're over time: you can train that. You can teach yourself to enjoy the failure and enjoy the pain. It's a really interesting process, and I suggest you explore rewarding yourself for breaking stuff. I did that literally this week: if I broke 5 things on this project, I would order myself takeout sushi. And that sushi was delicious, and I loved it to death.
Michael Berk [01:10:32]:
So whatever motivates you, adding rewards is definitely a useful tactic.
Lukas Geiger [01:10:38]:
I mean, the weirdest thing is if you want to break something and it doesn't break; then I'm really confused. So, like, you say, okay, now I'll
Michael Berk [01:10:47]:
That's a good point.
Lukas Geiger [01:10:47]:
use an extreme learning rate, and if it doesn't break, then my confusion starts.
Michael Berk [01:10:52]:
Yeah. The flip side case is really relevant as well. Like, stuff should break where it should, and if it doesn't, that's also a sign you don't understand it. So
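A minimal sketch of the kind of "this should break" sanity check being described, with arbitrary numbers: plain gradient descent on a simple quadratic should visibly diverge once the step size is absurdly large, and it's worth being suspicious if it doesn't.

```python
# Gradient descent on f(x) = x^2, whose gradient is 2x.
def run(learning_rate, steps=20):
    x = 1.0
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

print(run(0.1))     # sensible step size: converges toward 0
print(run(1000.0))  # absurd step size: should blow up; if it doesn't, something is off
```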
Ben Wilson [01:11:01]:
Unless your repository is just full of a bunch of dead code, and I'm sure you've seen those repos, Michael, in your line of work. I know I did when I was doing your job. Like, yeah, this is cool: this is 30,000 lines of code in a single file. Why would you do this, and how do you find anything? And then you prove out that an entire function is never called by just putting assert 1 == 0 within the function itself, running the whole thing from main, and nothing fails. And you're like, see? Please review your code base. Like, copy this whole thing, put it in an IDE, turn on the linter, and see what shows up. And it's just a Christmas tree of red everywhere.
Ben Wilson [01:11:45]:
Like, yep. Yeah. Yeah.
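A rough sketch of the trick Ben describes, with hypothetical function names: drop an always-failing assertion into a function you suspect is dead, run the real entry point, and if nothing fails, that function is never reached.

```python
# Hypothetical module, trimmed down to illustrate the dead-code assertion trick.

def legacy_normalize(rows):
    # If running the real entry point never trips this assertion,
    # nothing on the execution path ever calls legacy_normalize().
    assert 1 == 0, "legacy_normalize() was actually called"
    return [r.strip().lower() for r in rows]


def main():
    data = ["  Foo", "BAR  "]
    # The current path uses an inline cleanup, so the legacy helper above is dead code.
    cleaned = [r.strip().lower() for r in data]
    print(cleaned)


if __name__ == "__main__":
    main()  # Completes without error: the assertion never fires.
```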
Michael Berk [01:11:48]:
No, I actually have never seen that. In the, like, 18 months since you left the field, all code has been solved. Cool. Uh-huh.
Lukas Geiger [01:11:58]:
One thing that has always surprised me, or something that I didn't think about when I started out, is how long code actually lives. Like, it never dies, and it sort of never gets removed. Well, sometimes it gets removed, but some piece of code I might have written 15 years ago, well, not quite, 10 years ago, let's say, is still around. That's crazy, and it applies to so many of the tools we rely on. I find it quite fascinating.
Michael Berk [01:12:34]:
Yeah. It's a good point. It's sort of alarming how code you wrote as, like, a junior in high school is still running in a production system. So
Ben Wilson [01:12:44]:
We had that lady on the podcast, like, a year and a half ago, who, before we started recording, was talking about stuff she did back in, like, 1975 on these mainframe systems. That was the foundation for a lot of the solvers that are now used in TensorFlow and PyTorch. The low-level libraries used for mathematical computations, the ones written in Fortran, she was on the team that was writing that stuff. Talk about longevity, where your code lasts 50 years without modification. You can still go to that source code and look at it, and it'll have a date that this thing was committed, and you're like, that's before I was born. That's crazy, and it's still running.
Lukas Geiger [01:13:32]:
Yeah. Not many people realize that if you use SciPy, you're probably using lots of that Fortran code.
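As a small illustration of that point (a sketch, not from the episode): many routines in scipy.linalg are thin wrappers over Fortran LAPACK and BLAS, and you can even call the wrapped routines directly.

```python
import numpy as np
from scipy.linalg import lu, lapack  # scipy.linalg is largely backed by Fortran LAPACK/BLAS

a = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# High-level call: LU factorization, implemented on top of LAPACK's getrf.
p, l, u = lu(a)
print(np.allclose(p @ l @ u, a))  # True: the factorization reconstructs a

# Lower-level call into the wrapped Fortran routine itself (double-precision getrf).
lu_factors, piv, info = lapack.dgetrf(a)
print(info)  # 0 means the Fortran routine reported success
```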
Michael Berk [01:13:40]:
Yeah. Cool. So, in summary, lots of really interesting notes here. We actually didn't talk about tiny models as much as I had hoped, so maybe we'll just have to do another episode. But the things that stuck out to me were, first, collaboration between subject matter experts and data science experts is often necessary to correctly solve a problem. In terms of open source, if you're looking to start a super successful company, you probably shouldn't start with an open source library. But also, if you're interested in seeing how your open source library is doing, you can just go to BigQuery.
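For Python packages, for example, the download statistics Michael mentions live in a public BigQuery dataset. A minimal sketch, assuming the google-cloud-bigquery client, a Google Cloud project on the free tier, and a placeholder package name:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # uses your default Google Cloud credentials and project

# Weekly download count for a placeholder package from the public PyPI dataset.
query = """
SELECT COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = 'your-package-name'
  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
"""

for row in client.query(query).result():
    print(f"Downloads in the last 7 days: {row.downloads}")
```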
Michael Berk [01:14:14]:
There's a free tier, and it'll show you what's happening. On the tiny model front, int8 versus float32 can have identical accuracy, and Lukas has been proving that out with his work. The purpose of quantization is that models are faster and cheaper to train, and honestly they can generalize better because there are fewer weights and the model is simpler. Also, on-device training is coming. Then, finally, on solving problems: if you're accountable, the first thing you should do is fix the problem and get your system running again.
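To make the int8-versus-float32 point concrete, here is a minimal sketch using TensorFlow Lite post-training quantization; the tiny Keras model is hypothetical, and full int8 conversion would normally also need a representative dataset for calibration.

```python
import tensorflow as tf

# Hypothetical tiny vision model standing in for an edge-sized network.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# Post-training dynamic-range quantization: float32 weights are stored as int8.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)

print(f"Quantized model size: {len(tflite_model)} bytes")
```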
Michael Berk [01:14:48]:
But root cause analysis is an essential final step, whether it be via data collection and learning about your problem, or just building a maintainable, robust solution. You really wanna know why stuff broke and why your solution solves what it's solving. So, Lukas, if people wanna learn more about you, small models, your company, whatever it might be, where should they go?
Lukas Geiger [01:15:11]:
Yeah. About Plumerai, there's plumerai.com. You can find all of the stuff we're doing on there, some real nice demos as well, running in the web browser, not on tiny devices, but at least running locally there. About me, you'll find me on GitHub; lgeiger is my GitHub handle. I'm not so active on social media, but you can find me on Twitter with an underscore, _lgeiger. And other than that, I'm sure we'll put links to the profiles in the description. Yeah.
Lukas Geiger [01:15:50]:
It's been a pleasure to talk to you. First time on a podcast, it was very exciting. I had a lot of fun.
Ben Wilson [01:15:55]:
Yeah. Thanks for showing up.
Michael Berk [01:15:58]:
Yeah. So until next time, it's been Michael Berk and my co-host
Ben Wilson [01:16:01]:
Ben Wilson. And have a
Michael Berk [01:16:03]:
good day, everyone. We'll catch you next time.
Lukas Geiger [00:03:45]:
It's It's basically the maximum of the of the of the air shower. Sorry if I've been a bit confusing. It's been some time since since since since I I worked there, but, that experiment had direct measurements for, like, a small subset of events, which meant we could do, like, lots of this this machine learning craziness, and then in the end, have, like, a really independent measurement which would allow us to, evaluate and sort of proper properly test that, with with, yeah, to also check, like, what what sort of the the error is. Yeah. I only worked there for, like, a a a a year, but, yeah, it was very exciting.
Michael Berk [00:04:34]:
Cool. So let me see if I understand. I'm picturing Earth. There's like a blue ball, and we have an atmosphere. And there's sunlight coming in. Those are particles. And we have carbon, whatever, hydrogen. They're interacting with each other.
Michael Berk [00:04:49]:
And when they hit the Earth's surface, we create a little swimming pool, and that swimming pool will then measure how they interacted. Is that about right?
Lukas Geiger [00:05:00]:
So so, basically, what arrives here at at Earth is mostly, then sort of the the the end product of, the particle positions in of of all the particle physics interactions, and that is mainly neurons and electrons. And if you have, like, a large body of, of of water, that's what these Cherenkov detectors are. These, these particles have a a velocity inside this medium water that is higher than sort of the, the, rate. The
Ben Wilson [00:05:43]:
And the speed of light.
Lukas Geiger [00:05:45]:
Exactly. That's that's what I was looking for. And that's basically what gives you, like, this this this little blue glow if you ever seen, like, video of, like, nuclear reactors that sort of radiation. So you basically get, like, photons, that you can measure with, like, PMT, tubes. So basically, just get a signal and see. So, oh, this is, this is like an event. But it's very hard to distinguish between neurons and electrons, and that's why I try to use deep learning because otherwise, you need, other detectors, which which I think they've upgraded now as well.
Ben Wilson [00:06:22]:
So that's almost similar to the detection protocol that they use for neutrino detection underground. Right?
Lukas Geiger [00:06:29]:
Yes. It's it's it's it's pretty similar.
Ben Wilson [00:06:32]:
Where you're basically trying to measure particles traveling through interstellar space at near relativistic speeds. What does it hap like, what happens when it encounters matter? That's fascinating, man. What a cool project.
Lukas Geiger [00:06:45]:
Yeah. That was that that that was really, really exciting. Just like using machine learning to sort of study physics. What Mhmm. What I Yeah.
Ben Wilson [00:06:55]:
It's like pure research.
Lukas Geiger [00:06:56]:
Yeah. And then, like, playing around with all of these sort of modern things at the time, doing machine learning and, yeah, machine learning and physics and particle physics has been has been has been huge, I think.
Michael Berk [00:07:13]:
Yeah. So was that your introduction to machine learning?
Lukas Geiger [00:07:17]:
Pretty much, basically. So I, I I I I was part of, like, a very good group and was supervised very well, by professor and peach d student, that sort of supervised me. And, basically, the the year before, I took, like, a deep learning and physics lecture decided this is exciting. I wanna do, like, a master's, thesis in there. And in in Germany, you're, like, embedded, like, for a year, in, in in the research environment. It was great. We're allowed to basically do, like, lots of lots of cool stuff and learned a lot. Basically, all of all of the stuff I know about machine learning sort of started off from from from from this this this this year, at at at Aachen.
Lukas Geiger [00:08:08]:
Nice.
Ben Wilson [00:08:08]:
And when you think about a use case like that, it's sort of the the penultimate example of why you would need applied machine learning, to solve a problem like that because I imagine the data volumes that you're dealing with in detector systems like that are pretty pretty large. And then your signal to noise ratio for measuring things like that, where you're talking about, okay. I have this this matter matter interaction or photon high energy photon interaction with matter that then generates this cascade of particles that I'm trying to detect. Imagine scintillating out what the, like, what you're trying to detect, say. And I have some some history with this because with nuclear reactors, those detectors are used as well for, your primary coolant loop, and you're trying to detect, like, what is the the ratio of of vision by byproducts that are happening within a core for, like, research reactors. They do that. And it's those detectors are stupidly expensive, but they're they're single purpose. Like, you're detecting a very narrow band of radiation because of the limitations of the physical hardware and physics in general of how it's interacting with that detector.
Ben Wilson [00:09:29]:
I imagine just getting that raw dump of data using photon detection and saying, where did this come from? It's probably pretty much impossible to do by hand, certainly at scale.
Lukas Geiger [00:09:43]:
Oh, yeah. Definitely. I mean, there's there's there's lots of, like, classical methods, used, like, to to even get the data into, like, a form where you can, like, even even even start thinking about doing, doing, like, machine learning on top of it. There's, like, loads of details with, like, calibration and, sort of, like, simulation of all of these these these electronics and cross validation going on. But in a sense, there there there was, like, a really good use case to use machine learning because you have, like, sort of a detector grid and you detect sort of time traces, like, you have, like, pixels and basically you get, like, video data more or less. So it works actually quite well with, like, convolutional networks or like sort of like group and stuff or transformers now. And like everything is noisy, everything is fuzzy. And there the challenge is, like, I guess within any any, like, deployed, machine learning system, how do you validate that the stuff that you're doing is is is correct? Can do, like, lots of fun things and, like, lots of cool papers, but, in in in the end, how do how do you sort of, like, test and verify that the stuff that you sort of did is sort of still, still valid? And then with this data, as you said, like, I can't just look at the image or see, oh, yeah.
Lukas Geiger [00:11:11]:
It's a cat and it detected it as a dog. Stupid model. It's like, yeah, a bunch of, like, raw data. So you need to do, like, lots more analysis on top of that and sort of also really understand your data. It's like very sort of experimental, process and you need to get your hands dirty and sort of trying to do that. Luckily, I was just a master student and haven't so, like, everything was already nicely prepared, But, like, the the the the the the real the, real stuff, there's lots of details that are, yeah, just a lot of work to, to make these these these experiments work.
Ben Wilson [00:11:53]:
I think that statement really resonates with a lot of stuff that we've talked about, Michael, on the podcast about building a team to do something like that, where you have hardware engineers that are building all that stuff. You have, you know, physicists that are trying to understand a phenomenon that exists within the universe. And the ML folks that are working on the analysis and classification of of events. How successful do you think that project would have been? It's just a hypothetical for you. If you had a team of 10 data scientists that just got that data as a dump and then said, hey. Figure this out. You couldn't talk to a physicist. You couldn't talk to anybody who built the hardware.
Ben Wilson [00:12:41]:
You couldn't talk to anybody who was working on the detector array. How do you think that would have gone?
Lukas Geiger [00:12:53]:
I I mean, I can only speculate here. Sure. And, I I I don't think it would have gone very far, in a way. I I don't know. So so so some things might have been better, like, the code might have been nicer. Well, let's say you get, like, actual engineers and, and, like, some things. But I think at the core with with all of these things, it's all about understanding your data and understanding sort of, your distribution, or sort of the the the the problems that you're actually trying to solve, with sort of these physics experiments. At at some point, I've thought more, okay.
Lukas Geiger [00:13:34]:
I'm I'm I'm not doing physics anymore here. I'm just, like, playing with TensorFlow and training models. But but then at the end of the day, it's all about, like, asking the right questions, designing the experiment, and understanding what I'm trying to measure. And in in a very similar way to to, like, in in in industry, where where in the end of the day, it's all about, like, you you're trying to solve a problem or you're trying to sort of get an answer to a question. And and I I I I I I think that's quite crucial to have, like, domain knowledge in in in in that. And I I think we've seen that trend in general and also, like, in in in academia and, like, research where, I mean, example, like like, in doing COVID, we have we've seen so many papers, like, from machine learning folks doing like some great machine learning on some medical data. And we've we've we've we've seen also, like, medical folk folks, like, taking some machine learning and applying it to their data. And both of these pro approaches, like, I think don't really work very well if you don't really have, like, an understanding of, like, both the tools that you're using and sort of the the the underlying, like, mass capabilities of your analysis chain Yep.
Lukas Geiger [00:14:59]:
And also the domain area because, like, if I'm like, now as a machine learning person, take some medical data from my nice model on it, like, I don't know. I I I I wouldn't measure something interesting or wouldn't be able to answer something interesting in a way. And the other way around, it's it's it's sort of I've I've seen it a lot. Sadly, like, practitioners not understanding sometimes the limitations of some of these approaches with respect to uncertainty or, just just just other things. Yeah.
Michael Berk [00:15:32]:
Yeah. You you nicely reconfirm something that Ben and I think very strongly, which is collaboration between subject matter experts and data scientists is essential to build good tools. And specifically in industry, it's often going and talking to your sales team, your supply chain team, your whatever team, and using your expertise to apply, solutions to their problems. And in your case, it's it's physics. And, I mean, just just thinking about Ben's question of 10 data scientists, I I based on my explanation of the blue ball in the swimming pool, I think I would do a great job, but all other data scientists might struggle a bit without without some physics under their belt. Yeah. Anyway, really really cool intro to machine learning. And I wanted to ask you, how did you get into small models and quantization specifically? Was it a a natural step, or or was it via open source? How did you transfer into that field?
Lukas Geiger [00:16:34]:
It it it it was pretty at random, more or less. Like, I stumbled off on Climbra, the company I work now at, through a a conference, and, decided, well, to join. And, because at the beginning, we did quite a lot of work on, like, extremely quantized neural networks, and, like, binarized neural networks, and we were a I was able to do, open source, in sort of the first one and a half years or so of the companies, we did quite a lot of work and and and open source. And sort of that just excited me. And so at the end of the masters, I was like, what should I do? And I wanted to learn more about machine learning, deep learning, and thought, okay. The best way is, to, at least the best way for me was to join a start up and and and and see how that goes. And, I mean, like, 5 years later, I'm still the same company. So clearly, some some thing went well.
Lukas Geiger [00:17:43]:
But, yeah, that's so I I I didn't particularly go into, like, small models or efficiency with with with with with, like, a concrete goal. Other than that, it seemed like a fun and interesting problem, to solve. And yeah.
Michael Berk [00:18:04]:
Yeah. And before we get into the technical, and the technical is super, super interesting, I've noticed that open source is now a really business model for companies and, for instance, Databricks. And, Ben, I was wondering your take. How has open source become sort of a linchpin for profit? It's free code. How can people make money off of it?
Ben Wilson [00:18:32]:
To echo some of the conversations I've had with Databricks founders, If your goal is to write some cool open source and then start a company, there's no harder business model for you to succeed with. It requires you to do 2 things that are almost impossible. 1 is create a super popular, critically important open source project, which if you look at the commits to Maven or PyPI or, heck, look at Apache Foundation projects. How many of those are still in use, 3 years after they, like, were first released? And how many of them, setting aside, you know, crowd hype stuff with, like, GitHub stars. Don't pay attention to that. It's all about downloads. How many people are downloading your your package? And if you're curious, go on to Google. They they maintain download statistics for pretty much all open source repositories.
Ben Wilson [00:19:39]:
You can pull, you know, you get a a BigQuery account, even the free tier. You can query those datasets and see what your project is doing. And if you're thinking of starting a company off of something like that, you should be in the 100 of thousands of downloads a week, as a start. So it's like, hey. I've established street cred here. People know what this is kind of maybe in some niche environment. But then to make a successful company, whether or not it's open source or closed source, you still have to make a successful company. So coming from open source to a successful company, there there's no causation there.
Ben Wilson [00:20:23]:
It's 2 independent activities that have to the same group of people have to nail at the same time. So can you do it? Of course. Like, look at Databricks. It's valuation. It's insane. There's other companies that started like that, but it's point 1 percent of people that have submitted an open source package and then tried to start a company. They and they become successful enough that they can still pay salary for 10 years. It's hard.
Ben Wilson [00:20:55]:
But if you have a great idea and you feel like, hey. Part of our tech can be open sourced. Do that while you're building your company. Like, that's that's total legit. Give back to the rest of the world. I think it's it's important. You're gonna have to create some sort of edge feature so that people are, like, repaying them that you like, so that they manage this open source for us where they build features for us that also go into open source. You know, there's a lot of ways to do that, but people looking at a unicorn and Databricks is a unicorn.
Ben Wilson [00:21:29]:
It's very, very, like, a special place. Right time, right people, right ideas, right execution. If you look at that and think, I can replicate that, it it's not trivial to do that. It's hard.
Lukas Geiger [00:21:47]:
So So
Michael Berk [00:21:48]:
so for Ploomer, I were you guys starting from open source, or were you guys starting from closed source and then decided to open source Lark?
Lukas Geiger [00:21:57]:
The fact that we open source Lark, that was something, from from from from the start. It's sort of like that that idea was before I joined the company, or or already there. In in a sense, we started out, so so so the so the goal of the company is is is still make very efficient models or like basically make AI on the edge and on tiny devices possible. So in that sense, we sort of started out from the, from the most extreme form of quantization and then research in binarized neural networks, which is basically where I take weights and activations, and instead of making them having them in, like, float 32 or, int8, you just like have a single bit, which comes with some very interesting challenges, if you try sort of, to to train these, things. And in parallel, building out, the inference side and also, work on sort of the, the the the the the hardware side, in how to potentially put that into a chip form into, in in into hardware. And sort of now we've we've we've taken all of sort of these learnings, that that that we had from, from this and sort of the the the the setup that's sort of, like, rooted in a way in in in in in in in in research, across, like, hardware software, training algorithms, and our focus on sort of, like, smart home, in on very tiny devices and microcontrollers, like, I don't know, like, on my desktop, like, tiny little board which runs in a deep learning model on computer vision, which on microcontrollers is quite something unique. I in in in the past, most of the deep learning models running on these devices have been either sensor data speech, which is sort of a much more, I don't wanna say, easier modality to work with because it every everything has its challenges, but, just the the the input data is already much, much, much smaller, which means, just getting, like, camera pipeline working that runs efficiently can sometimes be challenging, on on on these things. So yeah.
Michael Berk [00:24:37]:
Got it. So question. We're making models smaller. Does that mean that training time is faster and cheaper?
Lukas Geiger [00:24:48]:
I mean, it depends. It depends. So sometimes it can take longer to train these models, but the the the good thing about these these these models is, like, we're not training 1,000,000,000 parameters, large language models that don't even fit on a GPU. So in in in in a sense, like, the core structure of the model, is definitely smaller, and I think, that definitely helps with sort of the the cloud budget and sort of the general the compute budget you need for training these networks. And it it it also helps in terms of patience. Like, I don't need to wait, like, 3 weeks for this model to finish training before I can sort of get an idea whether I, the the the model is sort of does something sensible. There are sort of certain certain things you need to do in in terms of if if you talk about conversational pruning that that, might lengthen training time, but that's that's that's not inherent. That's that's just like limitations of, like, frameworks that we're using or or sort of times sometimes hardware support, on on on on the GPU side.
Lukas Geiger [00:26:07]:
I I would say training tiny models is, like, a lot of fun because we can do you can do like a lot, and you can train actually quite a lot of models also in parallel, which you probably can't if you train the next GPT 5. Probably don't wanna run free runs in parallel just to see what what happens.
Michael Berk [00:26:29]:
Got it. Yeah. So the basis of the question was, if we're using quantization to reduce, the precision of your, weights. Theoretically, there's less information in each weight and you would need more nodes. Is that correct?
Lukas Geiger [00:26:48]:
Yes. However, I mean, yeah, there's just less information there and sort of less things that that that this this one value can represent. However, if you look at, like, many common architecture that are out in the field, I don't know, like, ResNet 50 or so, these these these models I mean, for now, current standards, quite tiny, but, like, for computer vision, still sort of, fairly fairly usable. These models are quite heavily over parameterized, so you would be surprised with, like, how how much, precision you can, like, reduce these rates and activations, and and sort of still keep high accuracy. Like in our current models that that I'm training, like, we see no difference between, like, let's say int8 and, float32 if you do conversation aware training. And sometimes it can even help like you you you just sort of remove, like, a different like, some class of of problems. So you don't have to worry that much about overfitting. Like, it's it's a pretty good regularizer constraining your, your your precision.
Lukas Geiger [00:28:07]:
And to to make things efficient, it's not only about the weights, but also, about the activations. Does that answer your question?
Ben Wilson [00:28:16]:
Yeah. I think so. I mean, I saw the same thing when doing playing around with the images many years ago with some actual, you know, deployed projects where your first thing that you wanna do, if you're like, hey. I don't know what tooling I'm gonna use here, and I certainly don't wanna spend 3 weeks taking a whole bunch of image data labeling it properly just so that I can build a model to get a prototype out of, which is gonna take 3 days to train. And I might it might not even do what I want it to do. So you go, oh, I'm gonna take one of these open source models, and I'll just, you know, freeze everything up until the last, like, 3 or 4 layers and then, you know, fine tune it. And we did that, and we're like, wow. This is actually working really well.
Ben Wilson [00:29:08]:
This is so much better than, you know, the the old machine vision that we were using before. TensorFlow is pretty awesome. And that was where our prototype stops. When we went to planning out the project to say, what are we gonna do here for production? And we were looking at the size of that deployed model. Like, yeah, we can't deploy this thing on a VM or on Kubernetes and have and build all that infrastructure. Like, it's gonna cost a fortune to run this. So we decided to, like, approach it with understand the architecture and and the structure of the open source implementation that's generalizable. You know, it's a generalized image classifier.
Ben Wilson [00:29:53]:
Like, do we need this many layers and this many nodes? Like, is this even useful? So we did a bunch of tests, and we found out that the things that met our needs was, like, 5% the size of ResNet. And with really good labeled data and some clever little feature engineering on the images, it was able to beat the performance of the fine tuning by a large margin. And it it was dirt cheap. I mean, we probably could have deployed it to people's cell phones. It was so small. So I I'm a huge believer in it. And there's actually people doing that for LLMs now, where it's like, hey. You have this specific domain.
Ben Wilson [00:30:33]:
You know, take a pretrained model, but take the smallest one that they have, like, the smallest version of it that has the the least number of weights. It knows how to understand human speech and just train it on your specific domain, and it turns out it's faster, cheaper, and a lot of times more accurate.
Lukas Geiger [00:30:54]:
Yeah. I I think I think you make a very good point here. I mean, like, for efficiency in general, I think we had a very exciting time, because for a very long time, efficiency wasn't that big of a deal because, oh, just get, like, one additional GPU, like, yeah, whatever. Now with these these these huge I I mean, I'm I'm talking about a server side on on, like, embedded stuff and, efficiency always was, like like, huge. But what what we're seeing now at, like, the the complexity that you were talking about of, like, having a model that, like, I don't know, my data science team trained on a Jupyter notebook or something and trying to get that into Kubernetes building all this infrastructure, the monitoring. If it runs on a GPU, then you sort of like get You probably want to batch things, there's like a whole other set of things that you need to do on top and moving it basically to the edge if you are able to, like, on the cameras itself for us, computer vision onto the or on whatever device your user runs on just, like, removes this entire block of complexity and DevOps and, like, worry. You still need need some of it, of course, but but it's it's it's it's it's it's not it's not so so, model specific. And then the other thing that you said sort of designing for, like, efficiency first, that's that's that's something I think that that resonates super well with me because also if you look at literature, at big conferences and what what people are doing in practice often, we have sort of these reference architectures, and then you get a paper saying, oh, I can quantize ResNet 100 down to 1 bit, or, like, 3 bits, and don't lose any accuracy.
Lukas Geiger [00:32:58]:
It's like, yeah, obviously not. I mean, not obviously. It's very hard work. I don't wanna, wanna wanna make any enemies in in in a way, but in in a sense, it's like a kind of, like a like a very obvious paper paper sort of to write, but, whether it sort of met these, efficiency goals and practice, that's sort of then like another, other story. And oftentimes, we're, much better at sort of starting from scratch. What's the problem we're trying to solve, and and and design an architecture and sort of, like, the whole inference stack, Yeah. With it.
Ben Wilson [00:33:39]:
So I've got a crazy question for you about edge deployment and edge computing in particular. A job I had many, many years ago was working in a factory that made the chips that go into smartphones. And we were at when I've started working there, we are 45 nanometer process. That's how long ago this was. And now we're talking about, you know, near UV photolithography that's being used to produce, you know, sub 3 nanometer transistors. These things are getting crazy. So computing power on a system on a chip that's deployed to a phone, and you could add 2 SoCs on a motherboard with with not a lot of expense these days. Do you see a a point in in the next 10 years where a phone manufacturer is gonna open up their their SDK and their hardware specs to say, you can run arbitrary application code on an embedded GPU on this phone as almost a sidecar.
Ben Wilson [00:34:39]:
If you had the capacity to deploy code instead of model weights to the edge and had a framework where you could say, I wanna be able to train a model specific for this exact piece of hardware and have, you know, validation work for it. You know, some sort of framework that would allow you to do that. Do you think that's where edge computing is moving in the future for sort of, like, code deployment and it it'll generate model weights for you on the device?
Lukas Geiger [00:35:15]:
I'm not 100% sure I understand, your question what you mean with code deployment because, if you have, like, all the engineering resources that you want, you can sort of already do that. Right? You can sort of, target as sort of the GPU that's sort of like on on on the device. The question is then just like often, like, to which granularity on, like, a CPU, we we can sort of easily use, like, intrinsics and sort of, like, do like some some d magic, on on accelerators, especially when you sort of have Android on top. Again, it's oftentimes sort of harder to sort of, like, specifically target these things. But what do you mean with, like, generate the weights on the device then?
Ben Wilson [00:36:06]:
I can actually do training.
Lukas Geiger [00:36:08]:
Oh, yeah. We do training. Yeah. So it
Ben Wilson [00:36:11]:
collects the data from that phone or whatever you don't wanna classify people's pictures better or do something with, like, hey. I've seen the raw images that are captured and how somebody edits them on their phone. I wanna be able to detect, like, per unique image that's created, a customized edit that this person's going to like.
Lukas Geiger [00:36:34]:
You know,
Ben Wilson [00:36:34]:
that sort of thing. And you need that training data. You can't just, like, fetch raw data. I mean, a phone company could or some providers could, but if you're somebody who's a service that's deploying edge models, you're not gonna have as like, you're not gonna be able to hit a Google API without paying a lot of money to get people's data like that for your training. And you wouldn't wanna train, you know, in the cloud 1,200,000,000 models so that you could deploy them all to people's phones individually. That's that it that's not really an economy of scale thing. But if you took a, you know, code that you say, here's my training loop, run on the device in the background.
Lukas Geiger [00:37:13]:
Do you
Ben Wilson [00:37:13]:
think that's coming?
Lukas Geiger [00:37:15]:
Yeah. Absolutely. I think so. I I, don't I mean, from a technical perspective, I don't see why not. I mean, federated learning and sort of research has been a thing for a while. I I think at first, it will start off sort of having existing models that sort of just adapt to Mhmm. To to user. I I'm I'm not very familiar in in sort of that side of field.
Lukas Geiger [00:37:43]:
Like, all I've I've I've done is basically have a model, and it's sort of fixed once it's sort of on the, customer's device. But, I mean, even what you can do with, like, just as you said earlier, like, with just fine tuning the last layer or, like, training linear classifiers on top of that. That that doesn't really require that that much compute power. And I think the problems are they are mostly in sort of the software stack to support that less sort of on the hardware side. Well, you need for for many of these accelerators, you probably would need to have, like, proper int aids or, like, integer training, if they don't support floating point operations. But for me, I guess, like, one one one open question that I always have about these things is, like, how do you deal with, like, monitoring and validation? And and, I mean, like, if I train a network now, I, like, look at my Tensor boards or ML flow or, like, look at my graphs and, like, see, oh, yeah. That was the lost bike. I should I need to, like, use something else.
Lukas Geiger [00:38:57]:
How that process can be sort of, like, completely hands off and sort of ensure that this model doesn't get into, like, a weird state just because the customer's usage is not something that we have anticipated before. I think that needs to have, like, a lot of sort of, like, validation and and sort of and sort of checking to see when when we're sort of in my own area of, like, where where does this works. Sorry. I've rambled along, but I think I think I think that's how it's how how it's going.
Ben Wilson [00:39:35]:
Yeah. So SDKs that would allow for reinforcement learning with human feedback as the user being the human feedback to say, I accept your suggestion or change it this way. And then it actually gets that feedback and adapts its weights to create the next iteration.
Lukas Geiger [00:39:51]:
Not sure if you wanna do reinforcement learning directly on there, but,
Ben Wilson [00:39:57]:
you need some serious
Lukas Geiger [00:39:58]:
like like like like adapting, predictions or doing, like, a couple of training steps to sort of, for, like, image generation things or so as you sort of need to have a retrainable component. I definitely see that in the in the realm of technically possible.
Ben Wilson [00:40:20]:
Yeah.
Michael Berk [00:40:22]:
The monitoring system would be kinda crazy. Ben, how would you do it? So for reference, Ben has built a 1000000000 monitoring systems. And so the
Ben Wilson [00:40:34]:
the monitoring system for edge computing where you're getting feedback in order to update your training code that you would deploy.
Michael Berk [00:40:46]:
Yeah. Would you basically send out the data to some server from the phone? Would you have some yeah, Lucas?
Lukas Geiger [00:40:54]:
I don't think you wanna send your data over the phone. I think that's the whole point of, like, doing it on device. Like, I don't Yeah.
Michael Berk [00:40:59]:
Exactly. But It becomes kind of ironic. Like, we don't wanna send data to the phone, so we'll send data from the phone instead.
Ben Wilson [00:41:07]:
But you'd still wanna have some sort of callback that's running so that the creators of the code would want a very stripped down metadata version of what the status was. So if you're looking at, like, thumbs up, thumbs down from, like, the recommendation that was presented, the user liked that. You would anonymize that, get some sort of ID from your commit hash of your code that was sent over, and you would know the device, the hardware, like, certain things about that that device where this was run. And you would instrument that in an anonymized way to some centralized servers that you could look and say, how are our models doing? Because otherwise, you're just operating in the dark, and you would have to offload that judgment to the device, which I personally would never do something like that. Like, it is fire and forget. You have no idea if you just shipped garbage to a to a user. How would you do a rollback? When would you do a rollback? Like, that's that's something that I'm curious about. Lucas, how do you roll back for edge deployment?
Lukas Geiger [00:42:21]:
So, in in in our our current, I mean, for for our customers, what's what's what's what what what we do often, it's quite a lot of, like, testing in advance, and there once a model is deployed, since it doesn't really change anymore, a rollback can be, like, done like any other over the air update, for for for for these devices. So it's not something that, like, adapts in an hourly way Right. And then sort of get gets in a weird state that we need to, oh, we need to roll back. And So
Ben Wilson [00:43:02]:
you could do either a roll forward or a rollback by just saying, here's the the actual, you know, commit hash that we used for this deployment.
Lukas Geiger [00:43:11]:
No. Actually, it it's the binary. It's it's the Yeah. The binary.
Michael Berk [00:43:15]:
Do you ever deploy the devices that don't have a connection to the Internet of the world?
Lukas Geiger [00:43:20]:
Yep. How do
Michael Berk [00:43:21]:
you roll back on that? Go get the device and reinstall USB
Lukas Geiger [00:43:25]:
So, that's so so so so what what we we provide mostly the the the sort of the, the software and sort of the the whole amount solution. So from my side, it's like, oh, yeah. That's our customer's problem in in a way. In some form or another, these devices will always be connected to something because otherwise why are you running this device? Unless it's, like, completely integrated, and then you would basically test it and just never roll back, I guess. But you will, at some for for devices that sort of have, like, I don't know, send notifications to your app or, like, a smart home camera or something. Like, there will always be, like, an Internet connection at least at some point because you wanna give feedback to the user or send a notification or, like, stream a video to some place. So in that sense, you don't really have have have have really have really that problem. But if if it's really like a deployment and sort of, like, forget in a isolated environment, you would need to do sort of, like, a lots of lots of, pretesting, or not.
Lukas Geiger [00:44:43]:
You need to do lots of testing anyway before anything goes in production.
Ben Wilson [00:44:50]:
So we've been talking a lot about low level MelOps and DevOps stuff. And we you know, I intentionally guided this this discussion to this point just to give people a little taste of the super unfun things that we deal with as software engineers. We have to think about all this stuff. Like, hey. What how can I prevent myself from getting woken up at 3 AM? You know, respectfully what this is. Or my my whole team from getting on call paged because fires are everywhere. When you came out of your master's program and came to the start up and started working on this, 2 questions for you. First one, how long did it take you to learn all this stuff? And secondly, how did you learn all this stuff?
Lukas Geiger [00:45:42]:
That that is a good question. When I joined the company, I was, like, the second engineer or so, like, at least from, like, the software side. We're like 3 people in an office. And with respect to DevOps or MLOps, if you're not providing a platform as a service or you're not providing a cloud service, you won't be woken up, at the middle of the night because it runs on your customer's devices, more or less. I mean, there's still things that sort of can can go wrong, but it's sort of not So so what I work I I do, like, I'm we don't have, like, a a sort of, like, site reliability engineers so so that need to do these kind of things. So that's a great benefit, I think, of, like, running these things on device because it will, like and any any kind of code that runs anywhere needs monitoring and needs needs testing. But I feel like at least for me, but maybe I just don't understand enough about sort of the DevOps world. There's, like, just like a lot of lot more things that could go wrong once you get a network involved.
Lukas Geiger [00:46:59]:
At least that's sort of how I think about that, and just like a lot more craziness once you get into load balancing and actually have, like, a system of, like, VMs at the end of the days than, like, I don't know, containers that are deployed on top of that with, like, Kubernetes and some sort of scheduling. This is like this entire state space that can go wrong, that you kind of exclude once once you're at least your model parts of the equation. So coming coming back to to me joining the company, what what we did set up, quite quite fast, and that was actually, I think, like, in the first, I don't know, month or so. I I have experienced with that before in, like, Uni sort of, like, managing the the GPU cluster. That was much more about getting a experimentation platform, up and running and and getting us as as like a team of like researchers basically or researchers slash engineers in a position to run experiments effectively and fast at a low cost, basically. Yeah. And I mean, that was a messy process. I mean, I I I I can't lie.
Lukas Geiger [00:48:23]:
We took very early, on the complexity of deploying to Kubernetes just because the tooling we use sort of supported that as a runtime environment. So that was fun. We've never worked with, like, Kubernetes before, sort of deploying that, getting all of that running. But, basically, you're the only users is your internal team. So Mhmm. It's always between working hours. If I'm on holiday and it breaks, well, then they would have to read a paper or have to go to the pub. I don't know.
Lukas Geiger [00:49:01]:
So it's it's it's it's not it's not like there's a customer, that that is that is sort of dependent, dependent on this. But with the team growing, we've definitely had to sort of make all of this infrastructure much more stable. But for us, it was always, like, crucial that we get to a point where we can run experiments super, super quickly, to, like, get a fast iteration cycle. Like, basically, I have my code. I change a line, and hit run, and it should, like, get me a GPU, ideally, like, a spot or preemptible instances because I don't wanna, like, pay loads of stuff. And then you just need some logging monitoring and a way to basically make all of that kind of reproducible. And I think the the the world has moved on from, like, I don't know, 4 years ago or 5 years ago, in, like, the the tuning tooling is much more sort of mature with that. Like, it's like Kubernetes jobs or so.
Lukas Geiger [00:50:07]:
You can do now much more sort of, like, right out of the box. And so I think that to to answer the third part of your question, it was all learning by doing. But, yeah, so so far, that's also been, for me, always the easiest thing. Same with, like, open source or anything others like. I never really liked or got motivated by sort of the I'm gonna do this tutorial, and I'm gonna learn Rust. So, like, that's that's that's I I could never, like, sort of find the motivation to sit down and do, like, toy problems. So yeah.
Michael Berk [00:50:47]:
Did that ever lead to any painful experiences?
Lukas Geiger [00:50:53]:
And and, I mean, I don't know. Like, I I I feel like if you're in research or, like, in engineering or even in physics, at some point, you build up a far quite a high tolerance of of frustration level. So, yeah. Okay. This thing runs out of memory. I don't know. Now that you asked, we have, like, some some very fun bugs, like, just software running out of memory or, like, weird things. And then it ends up being, oh, yeah.
Lukas Geiger [00:51:27]:
I need to use this different memory allocator. It's like, that's just something that you would have never thought of before and sort of you acquire. And then you, like, call support or, like, end up messaging on, like, GitHub issues. So in that sense, like, I think these things have been sort of the painfulest thing for me if it's like if, like, my model doesn't train, it's sort of like I know where the problem is. It's like sitting in front of the computer. But if if, like, something if, like, something isn't working as it's intended, always felt like it is something very unsatisfying if the thing that you're trying to do that should be supported just kind of doesn't work because of some code error that's sort of, like, not within the code base you control, but within the code base of, like, some open source project or some cloud provider that just doesn't do what they should, like, write in their documentation or some some I think for me, these have been the more painful things in terms of, like, I don't know, cluster going down. Yeah. At some at some point, it was reasonably stable.
Lukas Geiger [00:52:47]:
And then be be be be be be be, before that, actually, my colleagues did go to the pub once on a Friday afternoon when, when when everything went went went downhill. But then it was a long time ago that doesn't happen. So now our infrastructure is much more stable.
Michael Berk [00:53:04]:
Good. Yeah. And follow-up question to that. Although the pub does sound really nice. Follow-up question. When I first joined Databricks, my ability to sort of iterate in a sequential manner was not amazing. And that was a focus, when I first joined, which is basically, I want to be able to do trial and error type of things and have every single trial be of high quality so that when I get data returned from that trial, I can take the next step. And so an example is if you're trying to find a number between 1a100, you always split in the middle until you get to your number.
Michael Berk [00:53:43]:
That's the most efficient way. And it's sort of the same process for problem solving. So both of you are successful professionals, very smart. I was wondering what percent of the time you guys do sort of this trial and error approach where you're like, this new memory allocator, let's freaking see if that helps, versus actually going and understanding the root cause and having a pointed solution? Or is it sort of somewhere in the middle?
Ben Wilson [00:54:08]:
So exactly as Lucas alluded to, we are a customer facing organization. So, the group that I work in, we have that severity measurement for anything that goes wrong. It can be, the the docs are wrong. Okay. That's that's a sub 2. We'll get to that in a week. You know? There's other things that are higher priority. And then there's I can't save my model.
Ben Wilson [00:54:38]:
I just trained it. I can't save it. That's a sev 1. Okay, something's broken here. I need to do something about this in the next couple of hours. And then there's: I can't do anything with MLflow. That's a sev 0.
Ben Wilson [00:54:56]:
That's bad. That's an incident, usually due to a regression that we accidentally shipped. So the reason I'm talking about severities and responses is that your first response, if you are running a service, is to mitigate. And that mitigation can be: what changed recently? Roll back now. Just revert to the previous known-good state. I don't care about the root cause right now. Just get it working now.
Ben Wilson [00:55:28]:
Fix the problem. I don't care about fixing the actual cause of the problem. I just need to get the system into a state that works again. And you've seen me do that, actually, Michael, releasing a patch release that just reverts, like, 4 PRs. Did I know which one was the root cause or what the mechanism was for why it failed? No. But I did that system of trial and error that you're talking about, that binary search approach, where I take the bad commit history, run the code that should work but doesn't, and verify: yep, this is totally broken.
Ben Wilson [00:56:10]:
Revert 6 commits or 8 commits or 10 commits. Go back 4 days on the state of the master branch. Build a branch off that. Run it. Does it work? Yep. Okay, go in half again. Broken or not broken? And you just kinda get a feel for where the state of this was good, and then release that.
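[Editor's note: a minimal sketch of that "go in half again" idea in Python. The `build_and_run` helper and the commit list are hypothetical stand-ins; in practice this is roughly what `git bisect` automates for you.]

```python
# Binary-search a commit history for the first bad commit.
# `commits` is ordered oldest -> newest; `build_and_run` is a hypothetical
# helper that checks out a commit, builds it, and returns True if it works.
# Precondition: everything before the first bad commit is good, everything
# from it onward is bad, and the newest commit is known to be broken.

def find_first_bad(commits, build_and_run):
    lo, hi = 0, len(commits) - 1
    first_bad = hi                    # newest commit is assumed broken
    while lo <= hi:
        mid = (lo + hi) // 2
        if build_and_run(commits[mid]):
            lo = mid + 1              # still good here, search the newer half
        else:
            first_bad = mid           # bad here, remember it, search older
            hi = mid - 1
    return first_bad


# Toy usage: pretend commits 0-5 are good and commit 6 introduced the bug.
commits = [f"commit_{i}" for i in range(10)]
works = lambda c: int(c.split("_")[1]) < 6
print(commits[find_first_bad(commits, works)])  # -> commit_6
```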
Ben Wilson [00:56:34]:
Like, patch release, fix the problem. That gives you breathing room now to go and do a root cause analysis. And if you're investigating something as a professional software engineer, you have to do an RCA. You need to know what went wrong, not to say, hey. Jack broke this. Jack sucks. That's not helpful, like, at all. Makes Jack feel really bad and makes everybody else feel really bad.
Ben Wilson [00:57:01]:
The blame is being thrown. You should never do something like that. If anything, if you're gonna point a finger, it's at the tech lead or, you know, whoever's in charge of running that team, because they let Jack do that. They let Jack down. They let the team down. They let the company down. And as a tech lead, you are responsible for that.
Ben Wilson [00:57:21]:
It's your problem. But the proactive approach for this is to figure out the root cause, because you wanna make sure that a system is now put in place, or tests are now put in place, or something that safeguards you from allowing this to happen again. That's the real key.
Ben Wilson [00:57:42]:
I don't know if that was exactly what you're asking, or if you're asking about, like, how to figure out how to build a feature or something. But you can even use that sort of approach, that thought process, when designing stuff too.
Michael Berk [00:57:56]:
Yeah, it was more on the feature side, but like you said, it applies to features as well. Yeah. Lucas, what's your take? How do you guys do it?
Lukas Geiger [00:58:03]:
Yeah. So since I'm more on the R&D side, I have the luxury that these sort of super fast incident responses only rarely happen. And if there's something that's critical, obviously, that's the first thing. Like Ben said, we work and pray. While Ben was talking, I've been trying to think of the default process that goes through my head. And I think the first thing, obviously, is this frustration of, yeah, okay, I'm just trying random things. I think that, especially in ML, with stuff that you don't fully understand down to the lowest level, that's often the direct reaction.
Lukas Geiger [00:59:02]:
It's like: this thing doesn't train. I don't know. I'll double the learning rate or halve the learning rate, or add some weight decay, clip some gradients, reduce the batch size because it runs out of memory. I mean, I do that, of course, but I often find it quite unsatisfying. So there are sort of 2 approaches. If it's a code problem and it's open source, I usually just look at the code. That's the first step, and then I try to understand what's really going on and what the underlying problem is, often down to: okay, this commit by this open source project 3 years ago introduced that.
Lukas Geiger [00:59:49]:
That's the problem. That sort of gives me some comfort. And working in a company, versus being a student or working in open source, sometimes there needs to be a limit to that. Like, is it actually important that I know, or is the workaround good enough? So I've definitely had to learn, when joining a company, what to investigate and what to just work around. And then the other thing is the root cause analysis that Ben was talking about. Yeah. I feel my background in physics sometimes shows. It's all about: okay, let's design an experiment, and then basically try to simplify and remove as many variables as possible.
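[Editor's note: the knobs Lukas mentions trying first, learning rate, weight decay, gradient clipping, and batch size, all live in a few lines of an ordinary training loop. A minimal PyTorch sketch of where each one sits; the model, data, and values are placeholders, not anything from Plumerai's setup.]

```python
# Minimal PyTorch training loop showing the usual "it doesn't train" knobs:
# learning rate, weight decay, gradient clipping, and batch size.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
data = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))

batch_size = 64                        # halve this if you run out of memory
loader = DataLoader(data, batch_size=batch_size, shuffle=True)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,                           # the classic double-it / halve-it knob
    weight_decay=1e-2,                 # "add some weight decay"
)
loss_fn = nn.CrossEntropyLoss()

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # "clip some gradients" so one bad batch can't blow up training
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```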
Lukas Geiger [01:00:47]:
I mean, it is a bit of trial and error, but then it's, I think, sometimes much more targeted. And then over time, working in a code base that you own, or working with a tool like PyTorch or TensorFlow, there will be little hunches, little areas that you're uncomfortable with, and that might lead you to prioritize one experiment over another, or some experience in the past where something similar went wrong. But that's not really about how to design the experiment, more about how to prioritize where to start. And it always helps to ask colleagues: go for a coffee, explain the problem to them, or ask for help. That's usually step 1, once you're at the point where you're able to explain the problem to someone, which can sometimes be very difficult. Like, writing a very good issue on a GitHub repo is surprisingly hard. That's the reason why I've often sent a pull request first instead of actually writing an issue, because it's often easier to fix the code than to explain very clearly what went wrong.
Lukas Geiger [01:02:24]:
Anyway, I'm not sure if I answered the question, but I think that's sort of the thought process behind it, if that makes sense.
Michael Berk [01:02:33]:
Yeah, I think that answered it. It's funny to hear that both of you have sort of different approaches. My approach, I think, is very subject specific. It seems like, Lucas, your trial and error is almost exploration. You're collecting data, and then you always go back and do a root cause analysis and determine why the data said what it said. For me, I work a lot in sort of a consulting capacity, and I'm often given subpar code repos that are just very difficult to understand. And I frankly don't wanna understand them.
Michael Berk [01:03:11]:
So if I can get a green check mark and make it run, oftentimes I'm very happy about that, and so is the customer. But for really important things that are mission critical or that require maintenance, yeah, you need that root cause analysis. You need to go deep and actually understand what the problem was and how to resolve it. But sometimes, for little prototypes, iteration is what I do, admittedly.
Ben Wilson [01:03:38]:
One thing to add on the feature definition aspect of it. I mean, I just gave you a task the other day, Michael, right? You're gonna be building a flavor in MLflow, and there's a bunch of design considerations that you have to come up with. The way that you approach that, or the way that we do it at Databricks engineering, is through design docs where we do a bunch of prototypes, but they're informed prototypes, and it's exactly the process that Lucas explained. Databricks isn't filled with physicists, a lot of computer scientists, but there are some physicists. But it's all using the scientific method and design of experiments, where you're doing exactly that. You broaden the decision of how to build something to, in principle, infinite possibilities: we could do any of these different ways of building this collection or iterating over this collection.
Ben Wilson [01:04:39]:
In any given language, there are probably thousands of ways to do that. If you get creative enough, you could do some crazy stuff, even in high-level languages. We don't explore all of those. We just say: what are the simple ones, or what's something that, based on my own wisdom and, you know, experience, I know for sure is gonna work? It's probably not the best solution, but it's the first thing I'm gonna prototype and show. Does this actually work? Does the code execute? Yep. Okay, I've got that in my back pocket.
Ben Wilson [01:05:14]:
I now wanna explore something maybe a little bit more complex, but potentially way more performant. And I'll do a prototype of that. It's usually pseudo code, like a repro of something very simple. You're not implementing something for production. But then do that, show that, and then start weighing the trade-offs. So then you're going into the analysis of experiments: I've done these 5 experiments and collected the data associated with them, like code complexity, maintainability, readability, as well as performance. How long did it take to run? How big of a collection could I send through this, and how long did it take to run, if we're talking about some network IO thing or memory allocation thing?
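[Editor's note: to make the "how long did it take to run" part concrete, here's a hedged sketch of timing two prototypes of the same task with the standard library's timeit. The two variants are invented stand-ins, not Databricks' actual design-doc tooling.]

```python
# Toy comparison of two prototypes for the same task: building a collection.
# timeit gives rough wall-clock numbers to put alongside readability and
# maintainability notes in a design doc.
import timeit

def build_with_loop(n=10_000):
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def build_with_comprehension(n=10_000):
    return [i * i for i in range(n)]

for fn in (build_with_loop, build_with_comprehension):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds:.3f}s for 200 runs")
```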
Ben Wilson [01:06:04]:
And once you have those 4 or 5 things that you've tested out, then you go for peer review, just like you would in the scientific process. You get other experts to weigh in, and people will vote, you know? Some actively vote by saying, yeah, I agree with your conclusion that option 2 is really good. Other people vote with their feet: either they think that the idea is really stupid, and that can tell you something as well, or you're gonna have people that strongly do not agree with your conclusion.
Ben Wilson [01:06:40]:
And that's why it's important to have that data there to back you up, and you challenge them. You say: here's the data, please explain to me your hypothesis of why this is a bad decision. And that's how you learn. They might have context that you just aren't privy to. Like, oh, I had no idea that we needed to integrate with this other thing in another 2 months. Yes. Okay.
Ben Wilson [01:07:01]:
Option 3 is better here. And then move on. And then you go and just build option 3.
Lukas Geiger [01:07:06]:
Sounds about right. You'll see.
Ben Wilson [01:07:08]:
You know? You're gonna do this over the next 6 weeks, man.
Lukas Geiger [01:07:11]:
Woo hoo. I want to add one more thing about this: you have a bug, or something goes wrong, or something unexpected happens in a very complex code base, or in some project that I plan to maintain well over a longer period of time, or I'm developing a machine learning model that's not just a quick prototype that needs to get out of the door, but something that will become the basis of something to work on and keep improving over a long time. I've actually always had quite a positive feeling when something really goes wrong, because that just means there's something I don't understand, something we either conceptually don't understand, or something to learn in that area. So I think that's always something I find exciting to really investigate. The only time this becomes frustrating, I feel, is if it's something where I think, no, this should really work; it's like some fundamental infrastructure layer is broken and doing weird things, or some supposedly stable API doesn't do what it says on the tin. That can be very frustrating. But for the type of things that are a bit more exploratory, or still in the very early development phase,
Lukas Geiger [01:08:55]:
I found that it can be quite fun and very interesting to debug these things.
Michael Berk [01:09:04]:
Were you always like that?
Lukas Geiger [01:09:09]:
I don't know. I mean, if I think back to university, there have definitely been days where 5 people sit over this sheet of paper that thousands of students have solved before, and you have no clue. And I guess after doing that for a couple of years, it's sort of like, yeah, okay, it's fine, I don't know this right now. I'll hopefully know next week. Or yeah. Hopefully. Yeah.
Lukas Geiger [01:09:47]:
Yeah. No. I always find it quite positive, because there's something fun happening, and I know my idea of fun might not be everyone's idea of fun. But
Ben Wilson [01:09:58]:
yeah. Yeah.
Michael Berk [01:10:01]:
And just one note before we wrap. I know we're over time. You can train that. You can teach yourself to enjoy the failure and enjoy the pain. It's a really interesting process, and I suggest you explore rewarding yourself for breaking stuff. I've done that. I had a day literally this week where, if I broke 5 things on this project, I would order myself takeout sushi. And that sushi was delicious, and I loved it to death.
Michael Berk [01:10:32]:
And so whatever motivates you, adding rewards is definitely a useful tactic.
Lukas Geiger [01:10:38]:
I mean, the weirdest thing is if you want to break something and it doesn't break, then I'm really confused. So, like, you say, okay, now I, like,
Michael Berk [01:10:47]:
That's a good point.
Lukas Geiger [01:10:47]:
use an extreme learning rate and it doesn't break, and then my confusion starts.
Michael Berk [01:10:52]:
Yeah. Yeah. The flip-side case is really relevant as well. Like, stuff should break where it should. And if it doesn't, that also means you're not understanding something. So
Ben Wilson [01:11:01]:
Unless your repository is just full of a bunch of dead code, and I'm sure you've seen those repos, Michael, in your line of work. I know I did when I was doing your job. Like, yeah, this is cool. This is 30,000 lines of code in a single file. Why would you do this, and how do you find anything? And then you prove out that an entire function is never called by just putting assert 1 == 0 within the function itself, running the whole thing from main, and nothing fails. And you're like, see? Please review your code base. Copy this whole thing, put it in an IDE, turn on the linter, and see what shows up. And it's just a Christmas tree of red everywhere.
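[Editor's note: Ben's dead-code trick in miniature: drop an assertion that can never pass into the suspect function, run the program's entry point, and if nothing blows up, that code path is never reached. The function names below are invented for illustration.]

```python
# If main() completes without an AssertionError, suspected_dead_code()
# is never reached from this entry point.

def suspected_dead_code(x):
    assert 1 == 0, "if you ever see this, the function is not dead after all"
    return x * 2

def main():
    # ... the real program would do its usual work here ...
    print("ran to completion without hitting the assert")

if __name__ == "__main__":
    main()
```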
Ben Wilson [01:11:45]:
Like, yep. Yeah. Yeah.
Michael Berk [01:11:48]:
No, I actually have never seen that. In the, like, 18 months since you left the field, all code has been solved. Cool. Uh-huh.
Lukas Geiger [01:11:58]:
One thing that has always surprised me, or something that I didn't think about when I started out, is how long code actually lives. It never dies and it sort of never gets removed. Well, sometimes it gets removed, but some piece of code I might have written 15 years ago... well, not quite, 10 years ago, let's say, is still around. That's crazy, and it applies to so many of these tools we rely on. I find it quite fascinating.
Michael Berk [01:12:34]:
Yeah. It's a good point. It's sort of alarming how code you wrote as, like, a junior in high school is still running in a production system. So
Ben Wilson [01:12:44]:
We had that lady on the podcast, like, a year and a half ago, who, before we started recording, was talking about stuff that she did back in, like, 1975 on these mainframe systems. That was the foundation for a lot of the solvers that are now used in TensorFlow and PyTorch. The low-level libraries for mathematical computations that are written in Fortran, she was on the team that was writing that stuff. Talk about longevity, where your code lasts 50 years without modification. You can still go to that source code and look at it, and it'll have a date that this thing was committed, and you're like, that's before I was born. That's crazy, and it's still running.
Lukas Geiger [01:13:32]:
Yeah. Yeah. Not many people realize that, like, if you use SciPy, you're probably using lots of that Fortran code.
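[Editor's note: a small illustration of how close that Fortran is to everyday Python. SciPy's linear algebra routines dispatch to LAPACK under the hood, and scipy.linalg.lapack exposes thin wrappers around those routines directly.]

```python
# scipy.linalg.solve calls LAPACK's gesv family under the hood;
# scipy.linalg.lapack exposes the thin wrappers around the Fortran routines.
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([[9.0], [8.0]])

x = linalg.solve(A, b)                            # high-level call, LAPACK underneath
lu, piv, x_low, info = linalg.lapack.dgesv(A, b)  # the Fortran routine itself

print(x.ravel(), x_low.ravel(), info)             # same solution, info == 0 means success
```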
Michael Berk [01:13:40]:
Yeah. Cool. So in summary, lots of really interesting notes here. We actually didn't talk about tiny models as much as I had hoped, so maybe we'll just have to do another episode. But the things that stuck out to me were, first, collaboration between subject matter experts and data science experts is often necessary to correctly solve a problem. In terms of open source, if you're looking to start a super successful company, you probably shouldn't start with an open source library. But also, if you're interested in seeing what your open source library is doing, you can just go to BigQuery. There's a free tier, and it'll show you what's happening.
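[Editor's note: a hedged sketch of the BigQuery point, assuming the public PyPI download statistics dataset (bigquery-public-data.pypi.file_downloads) and the google-cloud-bigquery client. 'larq' is just an example package name, and you'd need your own GCP project, with small queries fitting the free tier.]

```python
# Rough sketch: count last-30-day downloads of a package from the public
# PyPI dataset on BigQuery. The package name here is just an example.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP credentials and project

query = """
    SELECT COUNT(*) AS downloads
    FROM `bigquery-public-data.pypi.file_downloads`
    WHERE file.project = 'larq'
      AND DATE(timestamp)
          BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
"""

for row in client.query(query).result():
    print(f"downloads in the last 30 days: {row.downloads}")
```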
Michael Berk [01:14:14]:
On the tiny model front, int8 versus float32 can have nearly identical accuracy, and Lucas has been proving that out with his work. The point of quantization is that models run faster and cheaper, and because each weight uses fewer bits, the models are smaller and simpler, which can sometimes even help them generalize. And on-device training is coming.
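[Editor's note: on the int8-versus-float32 point, here's a hedged sketch of post-training int8 quantization with TensorFlow Lite. This is the generic TFLite workflow, not necessarily Plumerai's pipeline, and the model and calibration data are placeholders.]

```python
# Post-training int8 quantization of a Keras model with TensorFlow Lite.
# The model and representative dataset are placeholders for illustration.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])

def representative_data():
    # A few hundred samples that look like real inputs, used to calibrate
    # the int8 scales and zero points.
    for _ in range(200):
        yield [np.random.rand(1, 32).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()  # int8 weights and activations
open("model_int8.tflite", "wb").write(tflite_model)
```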
Michael Berk [01:14:48]:
Then finally, on solving problems: if you're accountable, the first thing you should do is fix the problem and get your system running again. But root cause analysis is an essential final step, whether it be via data collection and learning about your problem, or just building a maintainable, robust solution. You really wanna know why stuff broke and why your solution solves what it solves. So, Lucas, if people wanna learn more about you, small models, your company, whatever it might be, where should they go?
Lukas Geiger [01:15:11]:
Yeah. About Plumerai, there's plumerai.com. You can find all of the stuff we're doing on there, some really nice demos as well, running in the web browser, not on tiny devices, but at least running locally there. About me, you'll find me on GitHub; lgeiger is my GitHub handle. I'm not so active on social media, but you can find me on Twitter with an underscore, like _lgeiger. And other than that, yeah, I'm sure we'll put links to the profiles in the description. Yeah.
Lukas Geiger [01:15:50]:
It's been a pleasure to talk to you. First time on a podcast, it was very exciting. I had a lot of fun.
Ben Wilson [01:15:55]:
Yeah. Thanks for showing up.
Michael Berk [01:15:58]:
Yeah. So until next time, it's been Michael Burke and my co host
Ben Wilson [01:16:01]:
Ben Wilson. And have a
Michael Berk [01:16:03]:
good day, everyone. We'll catch you next time.