Adaptive Industry ML: Challenges, Automation, and Model Applications - ML 149
Terry Rodriguez is the Co-Founder at Remyx AI. They discuss the challenges and opportunities in deploying and updating AI models for robotics, exploring the potential applications across various industries, and delving into the complexities of conducting experiments and controlling for interaction effects. You'll also hear from industry experts who have worked on recommender algorithms and enhancing content recommendations through experimental workflows and hypothesis testing. Get ready for an insightful and dynamic conversation about the latest developments in the ML landscape!
Transcript
Michael Berk [00:00:09]:
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I do data engineering and machine learning at Databricks. And I'm joined by my co-host,
Ben Wilson [00:00:19]:
Ben Wilson. I review design docs at Databricks.
Michael Berk [00:00:22]:
And today we have a guest named Terry Rodriguez. Terry started his career in ML for robotics at Formant, and then he joined the machine learning team at Tubi, which is my old company and Fox's streaming service. At Tubi, he focused on content ranking and recommendation engines. He also created Smells Like ML, an innovation-focused machine learning blog and consulting service. But most recently, he founded Remyx AI, which is an automated machine learning infrastructure solution. So before getting into the latest and greatest, what were you doing at Tubi?
Terry Rodriguez [00:01:02]:
Oh, at Tubi, I was working on their core recommender algorithms. A lot of the work I was focused on was testing and experimenting with their ranking logic, or the recallers that are used in the recommender system. A lot of that was also focused around content embeddings. So I was working with some of the other teams on developing text embeddings, image embeddings, and other embeddings that we were deriving from the proprietary data they have, so that we could find content similar to things people were really interested in watching and use that to recommend new content we thought they would be interested in. So there were a lot of experimental workflows where you might come up with a hypothesis based on a previous experiment or on some new technique you wanna try. We would test it offline, settle on the change we wanted to make, and use AB testing to measure the effect of that.
Michael Berk [00:02:13]:
Yeah. To put a little bit more clarity about what this actually meant: at Tubi, we had a data science team and then a machine learning team. This split typically is not seen in other companies. But at Tubi, it meant that data science focused on decision science and inferential modeling. So we owned the AB testing pipelines. We did a lot of causal analysis and guided decisions. And then the ML team, they were, at that time at least, the cool kids, and they just went deep into recommendation engines and figured out how to organize content on the home screen to maximize viewing. So we never really overlapped, but we definitely used each other's tools a lot. It was a really interesting setup.
Michael Berk [00:02:54]:
The experimentation platform was awesome. That was one of my core projects. Just curious, what were your pain points with that platform, if you had any?
Terry Rodriguez [00:03:05]:
Oh, I mean, I guess I never found it to be too much of a challenge to work with, that I can remember. I mean, making an experiment is hard, and you might think from an offline evaluation that your experiment's gonna do really well and, you know, oftentimes it doesn't. It's just a very difficult thing to do, but it was nice to have a framework that made it easy to see whether it's working or whether you should pull the experiment. There were a lot of ways to customize the experiment so you could experiment more conservatively. I'm a big fan of what y'all developed there.
Michael Berk [00:03:43]:
Awesome. That's good to hear. Because I have a massive list of complaints, so I'm glad it didn't exit the data science team.
Ben Wilson [00:03:51]:
I'm curious about the sorry to interject, Michael, but I've got nerd questions for Terry, specifically around embedded content to sort of hydrate poor quality data that's coming in. I lived and breathed that for a number of years at a previous company where you're getting third party data. Like, you're selling somebody's stuff, whether it's media content in your case or in my case, it was clothing and shoes and belts and stuff. They come from 15,000 different manufacturers around the world. Everybody has their own I don't know if you could call it standards, but they were generating data. Mhmm. And some of it was for names everywhere or some of it was internal, like, tool coding for patterns that they were using. The data wasn't consistent, and some manufacturers have human readable texts or machine readable text that you could extract from.
Ben Wilson [00:04:55]:
I'm sure you saw something similar to that at Tubi when, maybe, a new independent film comes out, and you look at the data that's associated with it. You're like, okay, that's one sentence. Not really a sentence, maybe it's just a couple of tags. Can you explain how powerful it is in a recommender to add additional features to that in order to figure out, like, hey, you might also like this?
Terry Rodriguez [00:05:25]:
Sure. Yeah. We would deal with proprietary datasets. Like, Gracenote provides a lot of data about movie content. And as you said, you have these large content libraries. Tubi in particular, that was a specialty of theirs: they have a very large library of video and series too. And so especially when people are trying to decide what content they should acquire, what's gonna perform for our audience, it's very valuable to enrich whatever information you might have with the kind of structured information that you would get from Gracenote and try to use that to address the cold start problem. Before you've seen the content's performance on the platform, you at best have a guess for how it's gonna perform for your audience. And so if you can relate that content to content that is performing well, then you have a good hypothesis for why you might wanna go acquire that title or how much you think it could be worth. And so we would bring in data like that.
Terry Rodriguez [00:06:31]:
They would bring in data from all kinds of sources to learn more about their customers too. And so the more information that you can get about a person, things like their demographics or lifestyle, can help you identify patterns in viewership and find which content is gonna resonate with your audience.
Ben Wilson [00:06:55]:
So one of the things that I've seen when highly ambitious young data scientists try to tackle this problem at a company where it's like people think that if I throw all the data in, like, everything that I could touch and just put it into this one model, it'll magically just start recommending awesome stuff, whether they're using some hybrid approach with, like, old school ALS with, you know, additional deep learning on top or it's just pure deep learning implementation. Personally, I've I was always amazed at how bad those performed when when you just start putting more and more information in.
Terry Rodriguez [00:07:36]:
Mhmm.
Ben Wilson [00:07:37]:
Can you talk about how you approach a problem like that or how you Yeah. You would design experiments around, like, should we include this exact raw data of, like, a continuous distribution of a a lot of information within this one feature, or should we start quantizing this to make it more generalized?
Terry Rodriguez [00:07:57]:
Mhmm. One thing that is interesting is that a simple idea like alternating least squares was really effective at representing the content over at Tubi. So it was one of the more powerful features that we could use to retrieve similar content. But as you said, there's just stuffing in everything besides ALS, you know, other content embeddings, other aggregate statistics that you might just concatenate onto a giant data frame and then feed into, say, a deep learning algorithm. You know, recommender systems are often made with simpler algorithms like XGBoost. At the time that I was there, we were interested in exploring more with deep learning for recommender systems. But everything that I had seen up until the time that I was there was focused on using XGBoost for the ranking. And we found that deep learning worked pretty well for content embeddings.
Terry Rodriguez [00:09:02]:
And you can imagine, this Gracenote dataset I was talking about could tell you the year a title was released, the title, who's acting in it, almost anything you would care about if you were scrolling IMDb's page to learn about the title. All of that could be brought into a form that a deep learning model could ingest, or you could subset parts of that rich data structure and try to generate an embedding that's just based on something like a plot summary. And so we experimented a lot with different sources to create embeddings and tested them offline to see, kind of intuitively: if I were to use a text embedding that is trained on the plot summaries for a title, does this make sense? Will I find other similar titles if I use the embedding to recall titles based on cosine similarity of the embeddings? See if that works intuitively, then take the titles that you were able to recall with that candidate generation step and use them in the next step for ranking, producing the final ranked results. And, you know, if it passes that smell test, and if you can measure improvement in the, what is it? I almost forget now. The NDCG, the metric that we were interested in optimizing for our offline tests, then you're finally at a place where you have a case worth showing to the rest of the team, seeing if the rest of the team thinks this is gonna be a valuable change to the pipeline. And there's engineering trade-offs to make there too.
Terry Rodriguez [00:10:59]:
It's not as simple as just throwing it all at one model, and it's definitely not as simple as just piling in all the data. Right? You know, they could get a lot of different sources that would be harder to work with or might have a weaker signal. And so it was important for humans to be in the loop to make those determinations before possibly leading to a bad user experience for millions of people.
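As a rough illustration of the offline check Terry describes (recall candidates by cosine similarity, then score a ranking with NDCG), here is a minimal Python sketch. The titles, the three-dimensional embeddings, and the relevance labels are made up for the example, and the function names are hypothetical; this is not Tubi's pipeline.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recall_similar(query_title, embeddings, k):
    """Candidate generation: the k titles closest to the query embedding."""
    query = embeddings[query_title]
    scored = [
        (cosine_similarity(query, vector), title)
        for title, vector in embeddings.items()
        if title != query_title
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]

def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_titles, relevance, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [relevance.get(title, 0.0) for title in ranked_titles[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy plot-summary embeddings (hypothetical values).
embeddings = {
    "space_horror": [0.9, 0.8, 0.1],
    "alien_thriller": [0.8, 0.9, 0.2],
    "haunted_ship": [0.85, 0.6, 0.15],
    "romcom": [0.1, 0.2, 0.9],
}
# Hypothetical offline labels: which titles viewers of the query also watched.
relevance = {"alien_thriller": 1.0, "haunted_ship": 1.0}

candidates = recall_similar("space_horror", embeddings, k=2)
good_score = ndcg_at_k(candidates, relevance, k=2)
bad_score = ndcg_at_k(["romcom", "alien_thriller"], relevance, k=2)
print(candidates, good_score, bad_score)
```

In a real system the recalled candidates would go on to a separate ranking model, and NDCG would be computed over held-out viewing data rather than hand-written labels.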
Ben Wilson [00:11:29]:
Yeah. I mean, I couldn't have said it better. That's exactly how all of the sophisticated in-production systems that I've either worked with or, you know, codeveloped with people or built myself work. They're all like that. It's highly curated feature data that you're controlling and observing at every step of that pathway. In the parlance of LLMs nowadays, people are talking about, oh, chain models, chain LLMs. It's not new. That's how production ML typically works, where, you know, you might have 4 embedding models that are all generating embeddings based on this massive text payload that's coming in.
Ben Wilson [00:12:13]:
You wanna standardize that and reduce the computational complexity of that that feeds into a later stage, you know, final model. Yeah. With systems that that we did with, like, fashion, we were doing the same thing with, like, text embedding after doing a lot of cleanup on, like, the the labels that are coming in and descriptions and stuff. But then we found that the actual text data that we got was only correct about 40% of the time. So the color would be wrong. Like, it comes in 4 colors and the labels that they have in the text. Somebody just, you know, command c, command v from something last season, and it's like they don't come in those pastel colors. Now this season, it's all, like, dominant primary colors.
Ben Wilson [00:13:05]:
And you look at all the images that come up, and the images are correct because those are from, like, the warehouse where it's being sold from. You know, like, this isn't right. So find out, like, oh, we can't use any of that data because it's almost always wrong. So let's do a you know, let's build a simple, you know, image classifier that detects colors and not even use deep learning. It's just old school computer vision to to get that stuff. But then when you wanna do style similarity, we found that image embeddings, like, just generate the embedding at at this like, basically, the shortest point after a pooling layer. So, like, this is what this thing kind of looks like to the human eye
Michael Berk [00:13:47]:
Mhmm.
Ben Wilson [00:13:47]:
As an estimate. And then doing cosine similarity distance measurements with that, it was such a powerful, strong signal for us. So, yeah, deep learning was involved, but it wasn't, like, what some people do. They're like, oh, I just have this one massive model that has 37,000 inputs. Let's build it and see what happens. Because you can never diagnose what goes wrong with that.
Terry Rodriguez [00:14:14]:
So it's much more of a designed pipeline to make that work. And as you said, getting closer to that raw, ground truth data with the image is, kinda where deep learning might shine.
Ben Wilson [00:14:27]:
Mhmm.
Michael Berk [00:14:28]:
Yeah. And to add one more argument for context awareness when you're building these models and going for simpler: with Tubi, things were displayed on a grid. And that poses, like, a trillion problems when you're validating a model and also when you recommend the content. Like, if you're going to have container-level recommendation where the first one is horror and the next one is comedy, that's a very different recommendation paradigm than, like, Google search, which is just sequential. There's only one container. And even worse, when you're AB testing this, if you're on a Roku or some connected TV, you have a remote where you start in the top left corner. So the propensity of users to get all the way to the right is pretty low.
Michael Berk [00:15:18]:
Like, are they really gonna press right 7 times and then down 2 times and then over and then down? So, like, basically, controlling for that effect and then 70 other things with how users interact with content. It's super important when you're thinking about ML in that space. And so throwing deep learning at it and calling it a Kaggle competition just will not work for these complex use cases.
Terry Rodriguez [00:15:41]:
Yeah. I saw a lot of people were, you know, making very valuable experiments to try to suss out that presentation bias. Right? Like, as you said, the propensity to just get the content that's near the top left. And so really, measuring these subtle changes in user behavior with those kinds of effects in the mix is very difficult. So that's why I'm a fan of the platform, and I think you must have really thought it out, or had people with a lot of experience building an AB testing platform to do it. But it's a very difficult problem, how the user experience, how the UI, affects that experiment.
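One common way to account for the presentation bias discussed here is inverse propensity weighting, where a click on a hard-to-reach slot counts for more than a click in the top left. The conversation does not say which method Tubi used, so treat the following Python sketch as an assumption: the grid positions, logs, and function names are all hypothetical, and the propensities are estimated from traffic where content was shuffled across slots.

```python
from collections import defaultdict

def position_propensities(shuffled_logs, floor=0.01):
    """Estimate how likely each slot is to be examined, relative to the best slot.

    shuffled_logs holds (position, was_clicked) pairs from traffic where titles
    were randomly assigned to slots, so click-rate differences reflect position.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for position, was_clicked in shuffled_logs:
        shown[position] += 1
        clicked[position] += int(was_clicked)
    ctr = {pos: clicked[pos] / shown[pos] for pos in shown}
    best = max(ctr.values())
    if best == 0:
        return {pos: 1.0 for pos in ctr}
    # The floor keeps rarely clicked slots from producing huge weights.
    return {pos: max(rate / best, floor) for pos, rate in ctr.items()}

def weighted_clicks(logs, propensities):
    """Sum clicks per title, weighting each click by 1 / propensity of its slot."""
    totals = defaultdict(float)
    for title, position, was_clicked in logs:
        if was_clicked:
            totals[title] += 1.0 / propensities[position]
    return dict(totals)

# Hypothetical shuffled traffic: (row, column) slot and whether it was clicked.
shuffled_logs = (
    [((0, 0), True)] * 20 + [((0, 0), False)] * 80
    + [((0, 6), True)] * 5 + [((0, 6), False)] * 95
)
propensities = position_propensities(shuffled_logs)

# Hypothetical production logs: one title pinned top left, one far to the right.
logs = (
    [("top_left_title", (0, 0), True)] * 4
    + [("far_right_title", (0, 6), True)] * 2
)
weighted = weighted_clicks(logs, propensities)
print(propensities, weighted)
```

With these toy numbers the far-right title ends up with the larger weighted total even though it has fewer raw clicks, which is the kind of correction the raw by-position data would hide.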
Ben Wilson [00:16:28]:
Exactly. Yeah. That was that was really eye opening to me the first time that I worked on a system that users are gonna be interacting with and then running AB test because we would before we understood what was happening, the the team, we weren't really talking that much to the front end developers and the UX designers. We we knew. Like, hey. Here's the data we're generating. Is this good for you to consume? Can you do stuff with this? They're like, yeah. Let's change that element, and we need this in a slightly different structure.
Ben Wilson [00:17:03]:
I'm like, sure thing. Let's make sure that the API is versioned and you're comfortable with this. And then we release it, and we're watching the trends of our holdout against our test group for the initial release. Everything's working really well. We're seeing sales going up and interaction going way up, and we're looking at by-position data of how the recommendations are displayed, because we had what you said, Michael. It was, like, genre, effectively. And not every customer would even see the same carousels, because we were actually doing recommendations based on carousel and content within carousel. So that was all dynamic.
Ben Wilson [00:17:52]:
When you go to that site, everybody gets a different home page, basically. And but we were, you know, watching the data, and then all of a sudden, we'd see this huge spike in activity. And if we don't know what's going on, we're like, man, we're really good at this. Our model's awesome. And then you get somebody saying like, hey. The science team, did our change did it did it work really well? Can you look at the data? I'm like, what changed? We didn't release another version. Like, no. No.
Ben Wilson [00:18:23]:
We changed the the layout. Like, oh, we should control for all of that. Like, let's put make that an AB test. And then we start getting feedback to realize, hang on. There's an interaction here between the display and the model itself. So they did some front end tweaks, and we're like, if we now do a test within 2 different versions of the algorithm to coincide with that, let's see if we can boost that even more. And we've started finding interactions like that. Okay.
Ben Wilson [00:18:54]:
This is way more complicated than we thought.
Terry Rodriguez [00:18:57]:
You know, that story actually reminds me, you'll run into all kinds of effects, things that aren't even happening on your platform, that can make it difficult to explain what's going on. I remember one time when I was there, the Slack channels were lit up. Like, what is going on? There's just a spike of traffic right now, and people were trying to figure out if something was broken or what. And it's funny because I had just seen something on social media, like, YouTube is down. So I'm like, YouTube's down. I think this is what it is. And sure enough, they were able to quantify that effect afterward. But, like, it can be things in the UI.
Terry Rodriguez [00:19:41]:
It can be something's broken. It can be something's broken on somebody else's platform. All of that makes it difficult to determine why this is performing the way it is right now. And, you know, I remember they were also ranking containers for individuals. So like you said, they would have a pinned container or something that they were trying to test, but then the containers themselves, the genres, right, would be personalized to individuals too. So it's so many different effects. Different teams are testing different things. They're testing for different platforms, like the Android team, the iOS team, people for Roku. So it's really impressive to be able to say anything with all of these moving parts.
Terry Rodriguez [00:20:24]:
Right?
Michael Berk [00:20:27]:
No comment. But, yeah, you're you're hinting at sort of controlling for interaction effects. And without getting too deep, it's it's really, really, really, really, really hard, but it can be done.
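To make "interaction effect" concrete, here is a minimal Python sketch of the layout-by-model situation Ben described, treated as a 2x2 test. The cell means are invented numbers (say, minutes watched per user), the labels are hypothetical, and a real analysis would also need per-cell sample sizes and standard errors before claiming anything.

```python
def model_lift(cell_means, layout):
    """Lift from the new model over the old model within one layout."""
    return cell_means[(layout, "new_model")] - cell_means[(layout, "old_model")]

def interaction_effect(cell_means):
    """Difference in differences: how much the model lift changes with the layout."""
    return model_lift(cell_means, "new_layout") - model_lift(cell_means, "old_layout")

# Hypothetical average metric per test cell.
cell_means = {
    ("old_layout", "old_model"): 30.0,
    ("old_layout", "new_model"): 32.0,
    ("new_layout", "old_model"): 31.0,
    ("new_layout", "new_model"): 36.0,
}

lift_old_layout = model_lift(cell_means, "old_layout")
lift_new_layout = model_lift(cell_means, "new_layout")
interaction = interaction_effect(cell_means)
print(lift_old_layout, lift_new_layout, interaction)
```

If the interaction were zero, the layout and model changes could be evaluated independently; a nonzero value means the model's measured lift depends on which layout the user saw.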
Ben Wilson [00:20:44]:
Yeah. I mean, the the thing that you have to worry about with that is your degrees of freedom. So Yeah. As you start slicing your data, at a certain point, you're basically just p hacking. Right? You're going down and you're saying, like, well, if I if I get this subset of customers and then do an AB test across them, it'll be more accurate. I can I can control for this? I'm gonna do region, and I'm gonna do age range, and I'm gonna do gender. And then I'm gonna do, concatenation of top three favorite genres or types of thing that people are into. Not specific content, but like, hey.
Ben Wilson [00:21:23]:
This person's sci fi horror drama or this person's documentary sci fi, you know, cartoon or something, and you create a cohort of them. I've gone down that path and then and wrote all the code to do all that and generate, you know, an AB test group for that. And then quickly backed away once I looked at the result of executing my code. Like, okay. 30% of my groups have less than 5 people in it. Like, there's no statistical significance to be gained from that. So it's all about that trade off of, like, how precise can I be based on my my population to do sampling of?
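The cohort explosion Ben describes is easy to reproduce. The Python sketch below uses synthetic users with made-up attribute values and hypothetical function names; it counts how many cohorts appear as more slicing dimensions are crossed, and what share of them fall under a minimum size.

```python
import random
from collections import Counter

def cohort_sizes(users, keys):
    """Count users in each cohort defined by the given attribute keys."""
    return Counter(tuple(user[key] for key in keys) for user in users)

def share_below(sizes, min_size):
    """Fraction of non-empty cohorts that have fewer than min_size users."""
    small = sum(1 for count in sizes.values() if count < min_size)
    return small / len(sizes)

# Synthetic population; every value here is invented for illustration.
random.seed(0)
REGIONS = ["north", "south", "east", "west", "central"]
AGE_RANGES = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
GENDERS = ["f", "m", "x"]
GENRES = ["sci-fi", "horror", "drama", "documentary",
          "cartoon", "comedy", "action", "romance"]

users = [
    {
        "region": random.choice(REGIONS),
        "age_range": random.choice(AGE_RANGES),
        "gender": random.choice(GENDERS),
        # Top three genres, order ignored, as a hashable cohort key.
        "top_genres": tuple(sorted(random.sample(GENRES, 3))),
    }
    for _ in range(5000)
]

slices = [
    ["region"],
    ["region", "age_range"],
    ["region", "age_range", "gender"],
    ["region", "age_range", "gender", "top_genres"],
]
for keys in slices:
    sizes = cohort_sizes(users, keys)
    print(len(keys), "dimensions:", len(sizes), "cohorts,",
          f"{share_below(sizes, 30):.0%} under 30 users")
```

Each added dimension multiplies the number of possible cohorts while the population stays fixed, so the per-cohort sample shrinks quickly and any AB comparison inside a cohort loses power.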
Terry Rodriguez [00:22:06]:
I really liked working on experiments on things like content embeddings. It's like a more fundamental component of the stack where some experiments were kind of moving down that path of, like, segmentation. And, it's like it just makes it for a much more complex pipeline too. I mean, what we just said a moment ago, right, there is, of course, there is not just one model that you can use. And so how can you find that trade off of, you know, techniques that are gonna give you a lot of leverage, and they aren't gonna be brittle, and they aren't gonna make the code way more complex?
Michael Berk [00:22:44]:
Right. So, Terry, what is Smells Like ML?
Terry Rodriguez [00:22:49]:
Smells Like ML is a GitHub handle first. Then, after my partner, Salma, and I had been working on a lot of projects to teach ourselves about applied machine learning and AI, we started a blog. I guess it must have been around 2017, 2018. This is a place where we were featuring some of our learning, some of our projects, and we kind of expanded out to viewing that as like an innovations lab or consultancy where we could work with people on projects. But I would say it's mostly a personal or professional hacker brand for Salma and me as we were participating in hackathon contests and putting out blog content.
Michael Berk [00:23:43]:
Nice. And do you have any memorable blogs or memorable topics that you you thought were very interesting?
Terry Rodriguez [00:23:49]:
You know, because it's so top of mind nowadays with generative AI, I think some of my favorite projects were in that area. We had been messing around with these, like, do-as-I-do motion transfer videos, where you could take an image of yourself and transfer that likeness over to a video of somebody dancing to make it look like you were doing that kind of stuff. And so we started with a fascination for that technique and learning more about generative AI, experimenting with GANs, and eventually getting into diffusion. I guess it must have been in 2021. It was before latent diffusion, but there were earlier diffusion models that weren't controllable by text. And we started experimenting with that to, of all things, generate movie posters. We had a large collection of movie posters that we had scraped. And so with a dataset that was more complex than, say, faces or ImageNet and things like that, could we train a model from scratch to generate posters? That was kind of a favorite project of mine because I had a chance to see how powerful diffusion really was, how much greater capacity those models had for capturing the distribution of an image dataset.
Terry Rodriguez [00:25:24]:
But other projects I really liked, you know, one from maybe around that year or two, now that I think about it, was putting a custom image classifier on an Arduino BLE Sense. So this is like a battery-powered microcontroller. And just the fact that you could put an image classifier on that that could run in real time. And in our case, we were creating a model that did something kinda interesting, I think something that people don't really think that you can do with image classification. For more detail, the prototype that we were working on was something like smart irrigation. We were exploring the ability for an image classifier to infer if a plant was dehydrated, if it was showing, like, droop. You know, it's a common thing for plants to do when they are out of water, to go limp. And so making a classifier that could see that nuance between a plant that was perky and not needing water versus one that was showing signs of drought stress, that was an interesting project to test what you could do with microcontrollers, what you could do with image classifiers. So that was probably a favorite project too.
Michael Berk [00:26:51]:
Nice. And it sounds like you're a big projects guy. Have you always been like this, or did you are you building this up to sort of improve your technical skill set? Why do you do this?
Terry Rodriguez [00:27:02]:
I would say it's more the latter there. For me, you know, I'll read a paper or I'll check out a GitHub repo, but, of course, papers can cherry-pick the results and show you the best sides of things. It's important to get your hands on the code base and determine how it works for yourself. Maybe a technique works really well for benchmark datasets that researchers are using, but when you go to apply it to something that you care about, not so much. And so it was about getting familiarity with different techniques that I was curious to learn more about, or trying to push the limits of what could be done by trying out these ideas. I think learning by doing is really important. And, you know, earlier in my career, I would do things like Kaggle, where you kinda have the dataset curated for you, but that's a different kind of workflow. Right? At the end of a Kaggle contest, people are really hacking to get that marginal improvement, and that tends to overfit to the performance metric.
Terry Rodriguez [00:28:17]:
And so I, I wanted to use projects to explore more of the engineering considerations that go behind, like, a model choice. It's like, yeah, there's a a a great new model out there, but maybe it involves 3 d convolution. And now you need, like, some expensive hardware and running it in real time is no longer a possibility. And so it might be of theoretical interest, but of practical interest, not so much. So trying to trying to ground our experience on, what works practically for problems, that we were interested in.
Ben Wilson [00:28:53]:
Did you ever build a physical prototype of a solution that involved ML where you're like, this could be useful, maybe I should try to get funding to have some manufacturing plant make this and sell it?
Terry Rodriguez [00:29:08]:
Well, on that smart irrigation sensor idea, we did get outreach from some people who were representing a fund that was backed by, like, a big agricultural tech group. And they were interested in developing that prototype, getting users, and things like that. We had a project with it, but by the time that we had heard from this group, we were, you know, months or maybe a year past actually working on it and trying to develop it out. But there were other projects that we worked on and kinda committed to standing up a website for, exploring people's interest in that, things that we had applied to YC with or something. So earlier in our career, we were really focusing a lot on embedded computer vision and IoT projects, things before there were smart cams, before there were AI chips. We would be just trying to reduce the model as much as we could and run it on, like, a Pi Zero or something with a very small form factor, something that was not battery powered. And, of course, since then, there's a whole bunch of hardware and SDKs that have emerged to really advance that space. But we were exploring some of those ideas too.
Terry Rodriguez [00:30:33]:
And I guess, like, you know, we must have come to the conclusion at some point that hardware actually is hard. And, so we were kind of, like, you know, exploring more of what we could do with, our our careers and organizations that were well resourced so that we could work on problems that we were interested in without the the need for all that upfront capital to get going. I guess, you know, software is, a lot easier now too. And and, with generative AI, we were really thinking, like, here's a new opportunity to do something on our own as a start up because a lot of barriers have come down. You know, you don't need as much data or even as much compute to do some of the things that were impossible, like, 5 years ago.
Michael Berk [00:31:21]:
Nice. I think that segues perfectly into Remyx. What's Remyx?
Terry Rodriguez [00:31:25]:
Well, at Remyx, we've been working on an agent to guide the development and deployment of AI applications. What that means for us right now is more of a focus on deep learning models, on data sources that traditionally require humans to label or review. So we're interested in pushing the limits of what's possible in automation, and code generation has come so far. Content generation has come so far. We started looking at things like image generators and text generators, and thinking about how we could address some of these machine learning challenges. So when I was working at a robotics group, you know, we were building a robot that we wanted to be able to deploy in different warehouses, not ones that we owned. So not ones where it was easier or cheaper for us to scan the space and understand the content inventory for that warehouse. They're trying to build a robot to help in micro fulfillment centers.
Terry Rodriguez [00:32:28]:
So you could imagine a lot of variability in the scene where a robot might be deployed. At that time, I was really thinking a lot about how we would be able to simplify that process of understanding the environment. I was thinking a lot about sim-to-real, and the image generators had gotten to a point where you could tell that, practically speaking, as far as an image classifier or an object detector is concerned, the image is as good as real. And I mean that to say, you have a lot of control over the design of an image, and when you bring an image like that down to the resolutions that are used, like, you know, 200-pixel-wide image frames, the fine-grained details that a person might pick up on are not really noticeable. So as people working in the content generation space were chasing the nines on improving the quality, the realism of those images, we were looking at it and saying, this is quite good. And really what we care about is improving the efficiency of generating images for image datasets. And, you know, when you're making an image classifier, there's, I would say, a fairly well understood set of best practices.
Terry Rodriguez [00:34:01]:
You're gonna use transfer learning using a tried and true backbone that's been pretrained on a large image dataset. You're going to use certain practices in the model selection, and that should depend on the hardware too. Like, as we were saying earlier, you can't just pick the most accurate model, but, like, there's constraints. Like, will this model fit onto the device that I need to run it on? Will it run fast enough? That can depend on the SDKs that are available for that hardware. Like, an op might not be supported in the newest, latest, and greatest model. Something like transformers came to some of these, like, smart camera devices probably like a year after transformers were being deployed in servers. And so with all of these considerations, it's like a lot of potential hurdles for people when they're trying to deploy a custom image classifier or detector, and these are important components to perception systems. I mean, now things are trending towards, like, multimodal models, but those models too are still quite big.
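The selection step described here (filter by what the device can actually run, then take the most accurate survivor) can be sketched in a few lines of Python. This is only an illustration, not Remyx's implementation: the `Candidate` fields, the `pick_model` helper, and every number and op name below are made up for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A trained model plus the facts that decide whether a device can run it."""
    name: str
    accuracy: float        # validation accuracy in [0, 1]
    size_mb: float         # size of the converted model file
    latency_ms: float      # per-frame latency measured on the target device
    ops: FrozenSet[str]    # operator types the exported graph uses


def pick_model(
    candidates: Iterable[Candidate],
    supported_ops: FrozenSet[str],
    max_size_mb: float,
    max_latency_ms: float,
) -> Optional[Candidate]:
    """Return the most accurate candidate that fits the device, or None."""
    feasible = [
        c for c in candidates
        if c.ops <= supported_ops            # every op is supported by the SDK
        and c.size_mb <= max_size_mb         # fits in device storage/memory
        and c.latency_ms <= max_latency_ms   # fast enough for the frame budget
    ]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Hypothetical candidates; the figures are placeholders, not benchmarks.
CANDIDATES = [
    Candidate("vit_small", 0.91, 88.0, 140.0, frozenset({"conv", "matmul", "attention"})),
    Candidate("resnet50", 0.88, 98.0, 60.0, frozenset({"conv", "matmul"})),
    Candidate("mobilenet_v2", 0.84, 5.0, 12.0, frozenset({"conv", "matmul"})),
]

best = pick_model(
    CANDIDATES,
    supported_ops=frozenset({"conv", "matmul"}),  # no attention op on this device
    max_size_mb=10.0,
    max_latency_ms=33.0,                           # roughly 30 frames per second
)
print(best.name if best else "no feasible model")
```

With these placeholder constraints the most accurate model is ruled out by an unsupported op and the second by size and latency, which is the situation Terry describes.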
Terry Rodriguez [00:35:14]:
They don't run in real time. You know, you can't find everything that you care about in, like, open vocabulary, open set detectors. So there's a lot of kind of best practices and recipes that are picked up, and they're easy to kind of build a pipeline or a system around, and we wanted to make it a lot easier for people who are trying to deploy, like, these models on devices to run in real time in perception and vision. Could be like robotics and IoT, AR, and VR. But as we were working on that problem, let's say, at the beginning of 2022, especially when we were kind of formally launching what we've done at Remyx, of course, ChatGPT is making remarkable progress, and we're seeing the possibility of bringing, like, high level chat interfaces to that workflow. So what we had started with was focused around computer vision, but we had seen that making LLMs is getting cheaper too. And that's, like, data efficient, parameter efficient techniques are getting better. And so we're thinking about how we can expand to include, like, text generation, text embeddings, and really, like, how we can help people bring that domain knowledge that it takes to pull a model off the shelf and make it do something useful.
Terry Rodriguez [00:36:48]:
As I said, like, most of these models, they're released having been fine tuned on some research benchmark dataset. And so you could imagine a situation where you have a robot and it needs to avoid the ladder, but you go look through one of these datasets like ImageNet, and there's no ladders. And so, you know, what are you gonna do? You need to use transfer learning. You need to fine tune the model so that it can recognize this concept that's really important to your setting, your warehouse. And so being able to make that workflow, like, kind of really smooth for people, whether it's the training, whether it's the conversions, you know, we wanna work to make it easier to deploy the models. Training the models is really a small part of the ML application life cycle. So we're thinking a lot about how we can take our experience building these kinds of systems and make, like, a very simplified user experience for people who want access to AI models, but maybe they don't have a bunch of experience with training deep learning models and linear algebra and things like that.
Ben Wilson [00:37:59]:
So is the intention eventually for mix, which is your
Terry Rodriguez [00:38:04]:
chatbot.
Ben Wilson [00:38:07]:
For that to be able to say, like, in this scenario you just presented, hey, mix. The model that you built yesterday with this ID or something, my robot hit 7 boxes that were sitting out on the floor. Can you build a model that will prevent that? And it will autonomously go and generate training image data that would show, like, here's a dangerous condition. Like, based on images you took of your warehouse, I'm gonna superimpose boxes in the middle of the floor and say, this is dangerous. And then the same image without a box saying, this is safe. Is that the intention eventually, to get to a point where it would use, you know, image generation techniques to create a training set, then train the model and help you deploy it?
Terry Rodriguez [00:39:04]:
To some level, yeah. Where it is right now, it would be more like: I need to differentiate these categories of objects. And so if you were to pull a model off the shelf, it's probably been trained to identify, you know, 20 or 1,000 objects that you maybe don't care about. And so just first closing that gap of off the shelf model versus the model that's gonna work for your deployment is where we are now. And what we would like to do is be able to serve those continuous updates. We'd like to be able to help solutions to monitor the performance of the models, and we would like to be able to use, like, real images at your warehouse and techniques to synthesize or superimpose, like, some of these obstacles. It might be less of a this is safe or not situation than just identifying the obstacle. And, you know, that could involve not just classifiers or detectors, but maybe we could help with your ability to improve a depth estimation model or, you know, any number of, like, current state of the art techniques. We wanna make it easier for people to use the right workflows, access the right data.
Terry Rodriguez [00:40:28]:
And so, for now, we're using image retrieval and image generation to design these datasets for the user's purpose. And to the extent that we could, it would be great to do more with automation. It'd be great to do more with procedurally generated data where we could create a 3D asset and put it in different poses and things like that. So we're really exploring, like, all of the techniques that are in the best practice and trying to codify that and trying to find ways to just make it a bit easier to deal with the agent at the level of, like, I have a request or, you know, to make, like, executive decisions about it and let the pipelines, the tools that are used in the background, do the rest. So the user experience is like chatting with the agent and then having their intent kind of parsed and mapped to a specific workflow that would make sense to use, that, you know, one of us would have designed for that application. And, yeah, just constantly be pushing on the state of the art there. If multimodal models end up smaller and faster and more real time, we'll be there to help people build those. And in general, you know, there's a lot of excitement around being able to go find that one model that does everything you care about.
Terry Rodriguez [00:42:06]:
But in practice and engineering, I've never been in a place where they said, like, the off the shelf model was good enough, we don't need it to be both faster and more accurate. So there's gonna be, like, all the tricks that you could possibly pull with data augmentation, with training the model, testing it, you know, optimizing hyperparameters to some extent. But deep learning is pretty robust too. Like, there's a lot of consistency out of the models if you're not looking for, like, state of the art performance on differentiating 1,000 different object categories. In practice, maybe there's, you know, a much smaller set of concepts that you need to identify and work with. And the other thing that we think a lot about too is, like, just generally trying to invest in techniques that are going to help with that cold start or that few shot problem. Like, how can we help people who are data limited get started and then start the data flywheel that will feed into model improvements?
Terry Rodriguez [00:43:15]:
So that should surely involve, like, a better UI for humans to help cheaply say, like, this is a good example, that's a bad example, so that we can take advantage of their judgment. But we're not quite there yet. I think the next step for us is getting to the point where we're helping to deploy the models. And I'm thinking of my experience with Formant, the robotics company, before I joined Tubi. You know, they had an interesting idea, but I guess, like, I wanted to go a different way with where they were going with the product. You know, they had an agent that could be deployed on your robot, and it would monitor ROS messages that are being sent on the devices. And a big part of what they were trying to do is just get that data off of these ROS topics and back them up to the cloud. But for me, I was also interested in deploying models on the agent, like helping roboticists run those types of models, helping them simplify, like, their perception problems.
Terry Rodriguez [00:44:26]:
And so I think a lot about that deployment model of being able to have, like, a lightweight binary that you can install. It's configured to point to a source. Maybe it's, like, you know, your device where the camera's at. Maybe it's a ROS topic or maybe it's a directory of images. You know, maybe it's your data lake. But how can you deploy an agent so that it's just configured with all the information it needs to ingest from the input and then produce that stream of detections or classifications, whatever it is, like the structured information from the image sources, making that really easy, so that someone has, like, a very simple high level CLI for: I need a model that does this. Now I need to deploy it. Here's the basic information I need about where the data is to consume and where to send it.
Terry Rodriguez [00:45:21]:
Maybe we're helping people back up data to the cloud so that we can use the cloud resources to update models. We can run more powerful models in the cloud so that we have some way of measuring what's happening down on the edge with these, you know, smaller, perhaps less accurate models.
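The configuration Terry outlines for such an agent (which model to run, where to read frames from, where to send results) could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `AgentConfig` fields, the `ros://` convention for naming a topic, and the `source_kind` helper are not taken from Remyx, Formant, or ROS tooling.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AgentConfig:
    """Everything the on-device agent needs to start producing detections."""
    model: str    # hypothetical model identifier
    source: str   # where frames come from
    sink: str     # where structured results are sent


def source_kind(source: str) -> str:
    """Classify a source string into one of the input types named in the talk."""
    scheme = urlparse(source).scheme
    if scheme == "ros":                 # assumed convention for a ROS topic
        return "ros_topic"
    if scheme in ("s3", "gs"):          # object storage standing in for a data lake
        return "data_lake"
    if scheme == "":
        # No scheme: treat it as a local path, either a camera device or a folder.
        return "camera_device" if source.startswith("/dev/video") else "image_directory"
    raise ValueError(f"unsupported source: {source!r}")


config = AgentConfig(
    model="ladder-detector",            # hypothetical name
    source="ros:///camera/image_raw",   # hypothetical topic
    sink="s3://example-bucket/detections/",
)
print(source_kind(config.source))
```

A real agent would dispatch on the returned kind to open the right reader; the point here is only that the whole deployment can be described by a few fields.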
Michael Berk [00:45:41]:
Got it. So I have, like, a trillion questions. Let me ask maybe 2, maybe 1, maybe 7. We'll see what happens. First question. This seems like a really, really interesting use case for drift detection. And for instance, with that, crash report, let's say, in a factory floor that says, hey. You hit a person.
Michael Berk [00:46:05]:
You hit a box. That's gonna be in plain text. And if you feed that into your system, it can then assess what new training data needs to be generated, retrain on the fly, and then redeploy. Have you guys thought about that angle?
Terry Rodriguez [00:46:19]:
Yeah. We wanna make the switching cost of updating or putting in a new model, like, really low. And I was thinking about this a lot when we were working on this robot because we were using an OAK-D camera, a smart camera by Luxonis, and it has this vectorized compute on it, a VPU. It's made by Intel. And so you're able to flash, like, MobileNet V2 kinda, you know, maybe 5 megabyte sized model files on there, like, 10 at a time. So you can build, like, some pretty sophisticated perception pipelines just with where that's at. But depending on what context the robot's in, you might not be as interested in detecting and tracking every person in the field of view, but maybe there's some other object of interest that you wanna be monitoring. And so being able to make the training of these models really fast, really cheap, and make it possible so this agent could, depending on the context, pull those models down.
Terry Rodriguez [00:47:28]:
And I think even as we're thinking more about, like, open set, open vocabulary detections that the latest generation of models are capable of, like, there's gonna be that speed accuracy trade off, and there's gonna be the need to distribute the work onto the edge, if possible. So, you know, in this robot that we were working on, you might have, like, a powerful Xavier or something like this with the GPU, and it's capable of, like, you know, a lot of computation. But, still, you have, like, 8 cameras on there, technically, like, 12 cameras. They're like stereo cameras with the central camera. So you would have all these camera feeds, and you can't really deal with that without pushing more of the models closer to the source of capture. And so there's gonna be a need for these small efficient lightweight models to be deployed there and probably a need for switching depending on the context. And you could imagine, like, the bigger LLM kind of controller living on that Xavier. And this would be like a way that probably people are trying to design these systems currently with what's available.
Terry Rodriguez [00:48:46]:
So we wanna really explore, like, data efficient, parameter efficient, and really fast kind of model training and updates because we think that that's a good strategy for dealing with some of the context switching that you would expect when you're putting a robot in the wild.
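One way to picture the context switching Terry describes, with a device that holds a fixed number of small models and pulls down others as the context changes, is a least-recently-used slot manager. The Python sketch below is a hypothetical illustration: `ModelSlots`, the `fetch` callback, and the model names are invented, and it does not use the Luxonis or any other device SDK.

```python
from collections import OrderedDict
from typing import Callable, Iterable, List


class ModelSlots:
    """Keeps at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._fetch = fetch  # downloads a model file by name (hypothetical)
        self._loaded: "OrderedDict[str, bytes]" = OrderedDict()

    def activate(self, names: Iterable[str]) -> List[str]:
        """Make sure every model in `names` is loaded; return what is loaded."""
        needed = list(dict.fromkeys(names))  # drop duplicates, keep order
        if len(needed) > self.capacity:
            raise ValueError("context needs more models than the device holds")
        # Mark models that are already loaded as recently used first,
        # so the evictions below can only hit models this context does not need.
        for name in needed:
            if name in self._loaded:
                self._loaded.move_to_end(name)
        for name in needed:
            if name not in self._loaded:
                if len(self._loaded) >= self.capacity:
                    self._loaded.popitem(last=False)  # evict least recently used
                self._loaded[name] = self._fetch(name)
        return list(self._loaded)


def fake_fetch(name: str) -> bytes:
    """Stand-in for pulling a model file down from the cloud."""
    return name.encode()


slots = ModelSlots(capacity=2, fetch=fake_fetch)
loading_dock = slots.activate(["person", "pallet"])
narrow_aisle = slots.activate(["ladder", "pallet"])
print(loading_dock, narrow_aisle)
```

The capacity of 2 is only to keep the example small; the transcript mentions roughly ten slots of about 5 MB each.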
Michael Berk [00:49:09]:
Cool. Second question. It seems like you guys are still in sort of this exploratory phase where you're figuring out best practices, trying to see where your market niche could be. Ben and Terry, a question for both of you. It seems like with this text input to inform data generation, to inform retraining, to inform new model deployment, if that's generally the pipeline you guys are building. I'd be curious what area or what industry you would target first. And I was thinking about it, the criteria you can sort of think of as, alright, it needs text feedback. Mhmm.
Michael Berk [00:49:49]:
It probably needs to be low stakes, and it probably needs to be high volume. So an example of not that would be military applications. Like, oops. We just blew up a family. That happens once every month. That's probably not the best industry to start out at. You want it to be a bit more robust. Likewise, with factories, maybe that's a little bit too high stakes.
Michael Berk [00:50:08]:
You probably have volume. But if you knock over an entire cart of semiconductors, that's probably not ideal, and that would cost a lot of money. So what is the ideal entry point for this tool in terms of an industry?
Terry Rodriguez [00:50:24]:
Well, before I answer that, I'll even say that there's another part of the pipeline too. Like, with multimodal models, a lot of them are getting specialized and adapted using pipelines with expert models. And so you could imagine it's text to data to model to then a pipeline for more data for, like, a multimodal model or something like this. So there's, like, so many ways that the data and the models are kind of, like, feeding into each other. But I totally agree. Like, some application areas are not where we wanna be. Like, you know, I would not recommend to somebody to use a generated dataset to try to make, like, the very best person detector or something like this. Like, somebody who's really working with very expensive, large, ground truth labeled datasets, you know, less likely to lead to a robot that ran into something that didn't look like a person.
Terry Rodriguez [00:51:25]:
Things that are lower stakes would probably be like these pipelines that feed into other models. So that's an area that I'm interested in as we're seeing multimodal models kind of mature and become more efficient. But I think that an area that I could imagine too is, like, wrangling data in a data lake. Like, maybe you have millions of images in S3 somewhere, and you really just need to find all the instances that have blank in it. And so the cost of specializing one of these models and the cost of running it on those images is probably a better trade off to make than throwing, like, a bigger generalist model at the task. I would rather make a custom ladder detector to throw at my corpus of images and just map that model to a column of image file paths than to feed each of those, rate limited, of course, to GPT-4V. So there's, like, different areas in the cloud or in model creation. There's probably some areas where, like, you know, you can make a really good detector.
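Mapping a small custom detector over a column of image file paths, as Terry puts it, amounts to something like the following Python sketch. The `images_containing` helper, the detection format, and the stand-in detector with its canned outputs are all assumptions for illustration; a real pipeline would call an actual model and read from storage.

```python
from typing import Any, Callable, Dict, Iterable, List

Detection = Dict[str, Any]  # assumed shape: {"label": str, "score": float}


def images_containing(
    paths: Iterable[str],
    detect: Callable[[str], List[Detection]],
    label: str,
    min_score: float = 0.5,
) -> List[str]:
    """Return the paths where `detect` finds `label` with enough confidence."""
    hits = []
    for path in paths:
        if any(d["label"] == label and d["score"] >= min_score for d in detect(path)):
            hits.append(path)
    return hits


# Canned outputs standing in for a real ladder detector (made-up paths and scores).
FAKE_OUTPUT: Dict[str, List[Detection]] = {
    "s3://example-bucket/a.jpg": [{"label": "ladder", "score": 0.93}],
    "s3://example-bucket/b.jpg": [{"label": "box", "score": 0.88}],
    "s3://example-bucket/c.jpg": [{"label": "ladder", "score": 0.31}],
}


def fake_detect(path: str) -> List[Detection]:
    """Stand-in detector: looks up canned results instead of running a model."""
    return FAKE_OUTPUT.get(path, [])


matches = images_containing(list(FAKE_OUTPUT), fake_detect, "ladder")
print(matches)
```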
Terry Rodriguez [00:52:41]:
And so if you have a situation where you just need to detect a handful of objects, say, like, aerial photography or something like this, these techniques can be used to help with that. But then there's other vision tasks where, I guess, the information that you need to identify, it's subtle. It's, like, maybe you're thinking of something like finding defects in parts at a factory or something. It might just be like a subtle scratch. We don't yet have a pipeline or a method for producing models that can help flag, like, those kinds of, like, fine grain defects. Maybe when we get to, like, generating a 3D asset or something like this, then we'll be closer to being able to, like, synthesize that kind of data too. But, yeah, I would say that this is probably most beneficial for people who are earlier in their process of creating models, model pipelines. And at some point, you could imagine an organization gets to the point where having a team and really getting into investing in that pipeline is valuable for them.
Terry Rodriguez [00:54:01]:
But for a lot of organizations, it may be something kinda simpler, like the kind of problems where you could pay a consultant for a quick project, but it's not like, you know, a core feature or something that you're highly invested in, like recommender systems. Right? Like, that's gonna be a difficult problem to get to the point where you could just fully automate it.
Ben Wilson [00:54:23]:
Yeah. Could not agree more. As far as industries, I can't think of a specific industry where it'd be like, oh, this would only succeed at first in this sort of industry. This seems so broad and applicable to so many different industries and jobs that I've interacted with. To give context on that, let's say your initial idea that you came up with, that you're mentioning about a physical project, you're like, we're gonna use an image classifier to detect when this plant needs water.
Ben Wilson [00:55:06]:
Something like that does not require human expertise per se. Like, in an ideal world at a farm, like, large scale agriculture, you shouldn't have a person going around looking at stuff like that. What the solution is right now is to deploy cameras on crops. And then some human still has to go and look at those and say, I gotta go out to plot 43 b and check on that corn crop. But when they get over there, it's not like they're doing a lot on a modern industrial scale farm. They're going up there, and they're visually classifying what the problem is and then activating some automation to do something for that. Like, oh, time to turn the sprinklers on or time to turn the sprinklers on less because it's too much water or, you know, some sort of soil sample that needs to be taken and analyzed for potassium content. Like, what's going on? Or are there bugs or something eating the crop? All of that stuff, I don't think humans enjoy doing at industrial scale. So an application where you can monitor something like that autonomously at the edge, but then adapt to changing conditions.
Ben Wilson [00:56:29]:
Like, oh, we have this massive storm system that's coming through. Do I need to swap out to determine what the effect of this is with a slightly different model that's detecting damage to crop rather than, you know, moisture to crop? Something like that resonates with me a lot. And then industries like advertising and marketing, I think stuff like this is incredibly useful where you can rapidly update to latent factors that are affecting a model that you're not aware of, or you just haven't been able to collect the data because the data didn't exist until yesterday, and then a rapid retraining, redeployment, and then reuse of generated assets. I think a simplified pipeline for that, that adapts to a fast changing environment, would be a very big game changer. I mean, I hope you guys solve it. Like, really solve this and make it so that you all of a sudden are in this position where you're like, man, we gotta hire 100 people. Like, we can't deal with this.
Terry Rodriguez [00:57:35]:
No. I hope so. I guess, like, to add to that too. Right? Like, a lot of problems, you know, don't need machine learning. And, you know, you have to understand if the relative cost is low enough to justify that type of solution. And so while we're, you know, over here geeking on the drought stress detection models and things like that, like, to what extent can you change the irrigation? Or does the plant need it? You know, nature does pretty well with, like, you know, the environmental conditions the plants have evolved in for all this time. So it's like, where is it valuable? And I think a lot about, like, the value would be especially for people who want to make a new product, and they want to be able to use AI cheaply.
Terry Rodriguez [00:58:37]:
But they don't have all the expertise that they need to, like, get the data wrangling, the pipelines, the monitoring, the, you know, model updates, that whole ML life cycle. So we would like to be able to just keep kind of expanding on what we can do with image and text data and building pipelines that help people get quicker to an application. There's, like, you know, finitely many reference applications, reference patterns that people practically care about now. So how much can we just accelerate their access to those kinds of pipelines? I used to think earlier on in my career, like, everybody's gonna wanna be a machine learning engineer. But, you know, after working in a few different orgs, I understand, like, a lot of people are never gonna, like, get into, like, you know, data mining and trying to find these patterns and things like that. So, like, what tools are possible to accelerate that for orgs so that they can test an idea, get to market, find out if it's worth investing in, like, the team and everything else? Because a lot of times, it's a little bit backwards now. An organization says, I wanna do AI.
Terry Rodriguez [00:59:59]:
I need to hire an AI expert. And then they've gone so far into this investment, and I've seen it, you know, in my own experience where it's like, does this company really need a machine learning specialist, or do they need, like, a consultant to work for a couple of months? Yeah.
Ben Wilson [01:00:16]:
Yeah. And something that I've seen that's really blown my mind with the ML field in general is that we're now interacting with an entire new generation of people that are interacting with AI. You know? And they don't come from the background where they've ever had to go through the process of generating a feature set, training a model, testing that model, going through 173 iterations of training until they got it kind of right, and then send it out for review by experts, and then go back to the drawing board, you know, train it up 30 more times. That whole life cycle, that whole process of what it is to deploy production ML and then write a full code base around that. Even if you're experienced and you're solving a sizable enough problem at a company, that's months of work to get something that's a major project out there. Like, do you need a REST API around this? Okay. We have to think about deployment. What's the scalability of this? How many requests are coming in? Oh, 6,000 a minute.
Ben Wilson [01:01:30]:
Okay. We need Kubernetes. Now we need to think about that aspect of this. But now people are interacting with somebody else's infrastructure. You know, you can interact with the OpenAI SDK and ask it whatever you want. You can prompt it with something complicated to control how it is, you know, doing it. Or if you wanna go with something more specialized and you're using the Hugging Face transformers SDK and you're taking, you know, a foundation model, you're fine tuning it, or using something like PEFT to do, you know, updated weights on top of that. You're not doing that life cycle of ML, like, getting that to the point where it's ready for production.
Ben Wilson [01:02:15]:
So what's amazing to me is that people are expecting behavior with AI interactions that are orders of magnitude faster. Like, nobody cares. They're like, yeah. I just wanna test out 400 different prompts and see which one performs the best. Those 400 prompts, they can get an answer to which one of these solves their problem in 3 days. And it just blows my mind, this revolution that's happening right now. And I think that kinda distills down what you guys are trying to work on: how do we get that same sort of behavior and speed and rigor, but applied to some of these more traditional ML things.
Terry Rodriguez [01:03:00]:
Yeah. I would say, you know, a lot of admiration for what OpenAI has been able to do with simplifying access to, like, very high quality models. You know, you can go on their store and, like, whip up something that's using, like, RAG in the background and, you know, test an idea, like, faster than ever. Right? Like, you used to, as you said, have all this overhead just to get your model into somebody else's hands to try it out. And now, you know, probably in 30 minutes, you could test a new idea through something like their store. And if that's good enough, justify, like, okay. Now I'm gonna write the code base that uses their API. I'm not yet at the scale where I'm thinking about owning the models and the infrastructure.
Terry Rodriguez [01:03:46]:
But, eventually, the more invested into the problem you get, the more you're gonna wanna control those costs. You're not gonna wanna cut a $1,000,000 check to OpenAI when you have enough users. And so, you know, that's where I think the pain points are now. Right? It's like I wanna have OpenAI level LLM inference, but I wanna self host or do something that's more cost effective. Like, how could we help people to adapt the models to their use case, to help them to self host or do something on prem or things that are more like real time and at the edge and these other areas where you won't find a solution from a product like OpenAI yet.
Michael Berk [01:04:31]:
Right. So I think this might be the best breaking point for the next, like, 6 hours. So I'm gonna take advantage. And I know we're at time and we have busy lives, so we should return to those. But would love to keep talking. So, in summary, we talked a little bit about Tubi at the beginning and recommendation engines. Specifically, they're often subject matter driven and simpler than you might think. Throwing everything into a deep learning model, unfortunately, once again, doesn't solve it.
Michael Berk [01:05:04]:
And evaluating these models with A/B tests is, again, really challenging because of the user interactions and the interactions between even concurrent experiments. So it's important to think about what exactly your north star metric is showing and what exactly you're testing. Image generation almost always leverages transfer learning, and there's often a big backbone created by one of the big tech companies, and you wanna sort of fine tune or leverage that to your use case. And that's exactly what Terry is doing at Remyx. They're sort of trying to remove a human in the loop and instead use smart LLMs and generative models to inform how smaller models, specifically on edge devices, should be retrained. And they're sort of still in the exploratory phase. But, yeah, I'm very excited to see how you guys grow over the next few years. And, again, if you solve this, as Ben mentioned, hundreds of people will be working for you in no time.
Michael Berk [01:06:03]:
So, Terry, if people wanna learn more about you, your work, or Remyx, where should they go?
Terry Rodriguez [01:06:08]:
Well, check out Remyx, with a y, dot ai, and check us out. I'm Terry Rodriguez. Find me on LinkedIn, or you could see our Twitter, Smells Like ML. So, yeah, we're around.
Michael Berk [01:06:23]:
Awesome. Cool. Well, this was a lot of fun. Until next time, it's been Michael Berk and my cohost Ben Wilson. And have a good day, everyone.
Ben Wilson [01:06:32]:
We'll see you next time.
Welcome back to another episode of adventures in machine learning. I'm one of your hosts, Michael Burke, and I do data engineering and machine learning at Databricks. And I'm joined by my co host,
Ben Wilson [00:00:19]:
Ben Wilson. I review design docs
Michael Berk [00:00:22]:
at Databricks. And today we have a guest named Terry Rodriguez. Guys. Terry started his career in ML for robotics at Formant, and then he joined the machine learning team at 2 b, which is my old company, and their Fox's streaming service. And at Tubi, he focused on sort of content ranking and recommended recommendation engines. He also created Smells Like ML, an innovation focused machine learning blog and consulting service. But most recently, he founded Remix AI, which is an automated machine learning infrastructure solution. So before getting into the latest and greatest, what were you doing at Tubi?
Terry Rodriguez [00:01:02]:
Oh, at Tubi, I was working on their core, recommender algorithms. And so a lot of the work that I was focused on was, testing, experimenting with their ranking logic or their recallers that are used in the, ranking in the recommender system. And a lot of that, was also focused around, like, content embeddings. So I was, working with some of the other teams in in a way of developing text embeddings and image embeddings and, other embeddings that that we were deriving from the proprietary data they have so that we can try to find content similar to things people were really interested in watching and use that to recommend new content that we think that they would be interested in watching. So there's a lot of experimental workflows where you might, come up with a hypothesis based on a previous experiment or based on some new technique you wanna try. And we would, we would come up with a hypothesis and test it offline and come up with a test that we wanted to change and use AB testing to measure the effect of that. Yeah. To to put a little
Michael Berk [00:02:13]:
bit more clarity about what this actually meant, So at Tubi, we had a data science team and then a machine learning team. And this split typically is not seen in other companies. But at 2 b, it meant that data science focused on decision science and inferential modeling. So we own the a b testing pipelines. We did a lot of causal analysis and sort of guided decisions. And then the ML team, they were, at that time, at least the cool kids, and they just, like, went deep into recommendation engines and figured out how on the home screen to organize content to maximize viewing. And so we never really sort of overlapped, but we definitely, used each other's tools a lot. And, it was a really interesting setup.
Michael Berk [00:02:54]:
The the experimentation platform was awesome. That was, one of my core projects. What just curious. What were your pain points with that platform if you had any?
Terry Rodriguez [00:03:05]:
Oh, I mean, I guess I never found it to be too much of a challenge to work with, that I can remember. I mean, making an experiment is hard, and you might think from an offline evaluation that your experiment's gonna do really well and, you know, oftentimes it doesn't. It's just a very difficult thing to do, but it was nice to have a framework that made it easy to see when it's working or if you should pull the experiment. There were a lot of ways to customize the experiment so you could experiment more conservatively. I'm a big fan of what y'all developed there.
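For readers who want to see what reading out an AB test can look like, here is a minimal sketch of a two-proportion z-test on conversion counts. This is not Tubi's platform or its methodology; the function name and the counts are hypothetical, and a real platform would also handle sequential peeking, multiple metrics, and guardrails.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in rates, B minus A.

    conv_a / conv_b are conversion counts, n_a / n_b are users per arm.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms share one rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control arm A versus treatment arm B.
z, p = two_proportion_z_test(conv_a=1000, n_a=10000, conv_b=1100, n_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here only says the difference is unlikely under the null; whether to ship or pull the experiment is still a judgment call on effect size and risk.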
Michael Berk [00:03:43]:
Awesome. That's good to hear. Because I have a massive list of complaints, so I'm glad it didn't exit the data science team.
Ben Wilson [00:03:51]:
I'm curious about the sorry to interject, Michael, but I've got nerd questions for Terry, specifically around embedded content to sort of hydrate poor quality data that's coming in. I lived and breathed that for a number of years at a previous company where you're getting third party data. Like, you're selling somebody's stuff, whether it's media content in your case or in my case, it was clothing and shoes and belts and stuff. They come from 15,000 different manufacturers around the world. Everybody has their own I don't know if you could call it standards, but they were generating data. Mhmm. And some of it was for names everywhere or some of it was internal, like, tool coding for patterns that they were using. The data wasn't consistent, and some manufacturers have human readable texts or machine readable text that you could extract from.
Ben Wilson [00:04:55]:
I'm sure you saw something similar to that at Tubi when a new, maybe it's an independent film or something that's coming out, and you look at the data that's associated with that. You're like, okay. That's one sentence, or not really a sentence, maybe it's just a couple of tags. Can you explain how powerful it is in a recommender to add additional features to that in order to figure out that, like, hey, you might also like this?
Terry Rodriguez [00:05:25]:
Sure. Yeah. We would deal with proprietary datasets, like, Gracenote provides a lot of data about movie content. And as you said, you have these large content libraries. Tubi in particular, that was a specialty of theirs: they have a very large library of video and series too. And so especially when people are trying to decide, like, what content should they acquire, what's gonna perform for our audience over there, it's very valuable to enrich whatever information you might have with the kind of structured information that you would get from Gracenote and try to use that to address the cold start problem. It's like, before you've seen the content's performance on the platform, you at best have a guess for how it's gonna perform for your audience. And so if you can relate that content to content that is performing well, then you have a good hypothesis for why you might wanna go acquire that title or how much you think it could be worth. And so we would bring in data like that.
Terry Rodriguez [00:06:31]:
They would bring in data from all kinds of sources to learn more about their customers too. And so the more information that you can get about a person, things like their demographics or lifestyle, can help you to identify patterns in viewership and find which content is gonna resonate with your audience.
Ben Wilson [00:06:55]:
So one of the things that I've seen when highly ambitious young data scientists try to tackle this problem at a company is that people think, if I throw all the data in, like, everything that I could touch, and just put it into this one model, it'll magically just start recommending awesome stuff, whether they're using some hybrid approach with, like, old school ALS with, you know, additional deep learning on top, or it's just a pure deep learning implementation. Personally, I was always amazed at how badly those performed when you just start putting more and more information in.
Terry Rodriguez [00:07:36]:
Mhmm.
Ben Wilson [00:07:37]:
Can you talk about how you approach a problem like that, or how you would design experiments around, like, should we include this exact raw data, like a continuous distribution with a lot of information within this one feature, or should we start quantizing this to make it more generalized?
Terry Rodriguez [00:07:57]:
Mhmm. I know one thing that is interesting is that a simple idea like alternating least squares was really effective at representing the content over at Tubi. So it was one of the more powerful features that we could use to retrieve similar content. But as you said, like, just stuffing everything in, besides ALS, you know, other content embeddings, other aggregate statistics that you might just kinda concatenate onto a giant data frame and then feed into, say, a deep learning algorithm. You know, recommender systems are often made with simpler algorithms like XGBoost. And so at the time that I was there, we were interested in exploring more with deep recommendation techniques, like deep learning for recommender systems. But everything that I had seen up until the time that I was there was focused on using XGBoost for the ranking. And so we found that deep learning worked pretty well for content embeddings.
Terry Rodriguez [00:09:02]:
And you can imagine, like, this Gracenote dataset I was talking about could tell you the year a title was released, the title, who's acting in it, just almost anything you would care about if you were scrolling IMDb's page to go learn about the title. And all of that could be brought into a form that a deep learning model could ingest, or you could subset parts of that rich data structure that I described and try to generate an embedding that's just based on something like a plot summary. And so we experimented a lot with different sources to create embeddings, and testing them offline to see, kind of intuitively, like, if I were to use a text embedding that is trained on the plot summaries for a title, does this make sense intuitively? Will I find other titles similar to that if I use an embedding to try to recall similar titles based on cosine similarity of the embeddings? See if that works intuitively, then try using those titles that you were able to recall with that candidate generation step in the next step for ranking, so producing the final ranked results for the title. And, you know, if it passes that smell test and if you can measure improvement in the, what is it? I almost forget now. The NDCG, the metric that we were interested in optimizing for our offline tests, then you're finally at a place where you have a case worth showing to the rest of the team, seeing if the rest of the team thinks that this is gonna be a valuable change to the pipeline. And there's engineering trade-offs to make there too.
Terry Rodriguez [00:10:59]:
It's not as simple as just throw it all at one model, and it's definitely not as simple as just pile in all the data. Right? You know, they could get a lot of different sources that would be harder to work with or might have a weaker signal. And so it was important for humans to be in the loop to try to make those determinations before possibly, you know, leading to a bad user experience for millions of people.
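The offline metric Terry mentions, NDCG, can be sketched in a few lines. This is a minimal version assuming linear gain and a log2 position discount, with the ideal ordering computed from the same list of retrieved items; the relevance labels below are made up for illustration and do not come from any real system.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 divides by log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all, so define the score as zero
    return dcg_at_k(relevances, k) / ideal


# Hypothetical relevance labels, in the order each pipeline ranked the titles.
baseline_ranking = [0, 1, 0, 1, 1]
candidate_ranking = [1, 1, 0, 1, 0]
print("baseline NDCG@5: ", round(ndcg_at_k(baseline_ranking, 5), 3))
print("candidate NDCG@5:", round(ndcg_at_k(candidate_ranking, 5), 3))
```

An offline lift like this is the "case worth showing to the rest of the team"; as the conversation notes, it does not guarantee the online AB test will agree.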
Ben Wilson [00:11:29]:
Yeah. I mean, I couldn't have said it better. That's exactly how all of the sophisticated in-production systems that I've either worked with or, you know, codeveloped with people or built myself work. They're all like that. It's highly curated feature data that you're controlling and observing at every step of that pathway. In the parlance of LLMs nowadays, you know, people are talking about, oh, chain models, chain LLMs. It's like, it's not new. Like, that's how production ML typically works, where, you know, you might have 4 embedding models that are all generating embeddings based on this massive text payload that's coming in.
Ben Wilson [00:12:13]:
You wanna standardize that and reduce the computational complexity of what feeds into a later stage, you know, final model. Yeah. With systems that we did with, like, fashion, we were doing the same thing with, like, text embedding after doing a lot of cleanup on, like, the labels that are coming in and descriptions and stuff. But then we found that the actual text data that we got was only correct about 40% of the time. So the color would be wrong. Like, it comes in 4 colors, and the labels that they have in the text, somebody just, you know, command C, command V from something last season, and it's like, they don't come in those pastel colors. Now this season, it's all, like, dominant primary colors.
Ben Wilson [00:13:05]:
And you look at all the images that come up, and the images are correct because those are from, like, the warehouse where it's being sold from. You know, like, this isn't right. So you find out, like, oh, we can't use any of that data because it's almost always wrong. So let's build a simple, you know, image classifier that detects colors and not even use deep learning. It's just old school computer vision to get that stuff. But then when you wanna do style similarity, we found that image embeddings, like, just generate the embedding at, basically, the shortest point after a pooling layer. So, like, this is what this thing kind of looks like to the human eye
Michael Berk [00:13:47]:
Mhmm.
Ben Wilson [00:13:47]:
As an estimate. And then doing cosine similarity distance measurements with that, it was such a powerful, strong signal for us. So, yeah, deep learning was involved, but it wasn't, like, what some people do. They're like, oh, I just have this one massive model that has 37,000 inputs. Let's build it and see what happens. Because you can never diagnose what goes wrong with that.
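A minimal sketch of the cosine-similarity lookup described here, assuming the embeddings have already been produced by some upstream model. The item names and three-dimensional vectors are invented purely for illustration; real embeddings would have hundreds of dimensions and would typically be searched with a vector index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=3):
    """Return the ids of the top_k items closest to query_id, best first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), item_id)
        for item_id, vec in embeddings.items()
        if item_id != query_id
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


# Hypothetical pooled image embeddings for four catalog items.
embeddings = {
    "red_sneaker": [0.9, 0.1, 0.0],
    "crimson_trainer": [0.8, 0.2, 0.1],
    "blue_sneaker": [0.7, 0.3, 0.0],
    "leather_belt": [0.0, 0.1, 0.9],
}
print(most_similar("red_sneaker", embeddings, top_k=2))
```

The same lookup works as a candidate generation step for titles: recall the nearest neighbors first, then hand them to a separate ranker.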
Terry Rodriguez [00:14:14]:
So it's much more of a designed pipeline to make that work. And as you said, getting closer to that raw ground truth data with the image is kinda where deep learning might shine.
Ben Wilson [00:14:27]:
Mhmm.
Michael Berk [00:14:28]:
Yeah. And to add one more argument for context awareness when you're building these models and going for simpler: with Tubi, things were displayed on a grid. Mhmm. And that poses, like, a trillion problems when you're validating a model and also when you recommend the content. Like, if you're going to have container level recommendation where, like, the first one is horror, the next one is comedy, that's a very different recommendation paradigm than, like, Google search, which is just sequential. There's only one container. And even worse, when you're AB testing this, if you're on a Roku or some connected TV, you have a remote where you start in the top left corner. So the propensity of users to get all the way to the right is pretty low.
Michael Berk [00:15:18]:
Like, are they really gonna press right 7 times and then down 2 times and then over and then down? So, like, basically, controlling for that effect and then 70 other things with how users interact with content is super important when you're thinking about ML in that space. And so throwing deep learning at it and calling it a Kaggle competition just will not work for these complex use cases.
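One common way to account for the top-left effect is inverse propensity weighting: each click is weighted by one over the probability that the grid slot was examined at all. The sketch below assumes those examination propensities are already known; the values, the grid coordinates, and the function name are hypothetical, and nothing here is claimed to be how Tubi did it. In practice the propensities themselves have to be estimated, for example from experiments that randomize placement.

```python
def debiased_ctr(impressions, propensity):
    """Inverse-propensity-weighted click rate for one title.

    impressions: list of (row, col, clicked) tuples for that title.
    propensity:  dict mapping (row, col) to the assumed probability that a
                 user actually examines that grid slot.
    """
    weighted_clicks = 0.0
    for row, col, clicked in impressions:
        if clicked:
            # A click in a rarely examined slot counts for more.
            weighted_clicks += 1.0 / propensity[(row, col)]
    return weighted_clicks / len(impressions)


# Hypothetical examination propensities: top-left is always seen,
# a slot far to the right and two rows down is seen a quarter of the time.
propensity = {(0, 0): 1.0, (2, 6): 0.25}

# Title A sits in the top-left slot: 4 impressions, 2 clicks.
title_a = [(0, 0, True), (0, 0, False), (0, 0, True), (0, 0, False)]
# Title B sits far from the start position: 8 impressions, 1 click.
title_b = [(2, 6, True)] + [(2, 6, False)] * 7

print("raw CTR A:", 2 / 4, "debiased:", debiased_ctr(title_a, propensity))
print("raw CTR B:", 1 / 8, "debiased:", debiased_ctr(title_b, propensity))
```

The raw rates make title B look four times worse, while the weighted rates suggest the gap is explained by placement. The trade-off is variance: small propensities blow up the weight of individual clicks.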
Terry Rodriguez [00:15:41]:
Yeah. I saw a lot of people were, you know, making very valuable experiments to try to suss out that presentation bias. Right? Like, as you said, the propensity to just kinda get the content that's near the top left. And so really measuring these, like, really subtle changes in user behavior with those kinds of effects in the mix is very difficult. Yes. So that's why I'm a fan of the platform, and I think you must have, you know, really thought it out or had people with a lot of experience building an AB testing platform to do it. But it's a very difficult problem with how the user experience, how the UI, affects that experiment.
Ben Wilson [00:16:28]:
Exactly. Yeah. That was really eye opening to me the first time that I worked on a system that users are gonna be interacting with and then running AB tests. Because before we understood what was happening, the team, we weren't really talking that much to the front end developers and the UX designers. We knew, like, hey, here's the data we're generating. Is this good for you to consume? Can you do stuff with this? They're like, yeah. Let's change that element, and we need this in a slightly different structure.
Ben Wilson [00:17:03]:
I'm like, sure thing. Let's make sure that the API is versioned and you're comfortable with this. And then we release it, and we're, you know, watching the test trends of our holdout against our test group for the first initial release. Everything's working really well. We're seeing, you know, sales going up and interaction going way up, and looking at by-position data of how, you know, the recommendations are displayed, because we had what you said, Michael. It was, like, genre effectively. And not every customer would even see the same carousels, because we were actually doing recommendations based on carousel and content within carousel. So that was all dynamic.
Ben Wilson [00:17:52]:
When you go to that site, everybody gets a different home page, basically. But we were, you know, watching the data, and then all of a sudden, we'd see this huge spike in activity. And if we don't know what's going on, we're like, man, we're really good at this. Our model's awesome. And then you get somebody saying, like, hey, science team, did our change work really well? Can you look at the data? I'm like, what changed? We didn't release another version. Like, no. No.
Ben Wilson [00:18:23]:
We changed the layout. Like, oh, we should control for all of that. Like, let's make that an AB test. And then we start getting feedback and realize, hang on. There's an interaction here between the display and the model itself. So they did some front end tweaks, and we're like, if we now do a test with 2 different versions of the algorithm to coincide with that, let's see if we can boost that even more. And we started finding interactions like that. Okay.
Ben Wilson [00:18:54]:
This is way more complicated than we thought.
Terry Rodriguez [00:18:57]:
You know, that story actually reminds me, like, you'll run into all kinds of effects, things that aren't even happening on your platform, that can make it difficult to explain, like, what's going on. And I remember one time when I was there, the Slack channels were lit up. Like, what is going on? Like, there's just a spike of traffic right now, and people were trying to figure out if something was broken or what. And it's funny because I had just seen something, you know, on social media, like, YouTube is down. So I'm like, YouTube's down. I think this is what it is. And sure enough, they were able to kind of, like, quantify that effect afterward. But, like, it can be things in the UI.
Terry Rodriguez [00:19:41]:
It can be something's broken. It can be something's broken on somebody else's platform that makes it difficult to determine, like, why is this performing the way it is right now. And, you know, I remember they were also ranking containers for individuals. So like you said, they would have, like, a pinned container or something that they were trying to test, but then the containers themselves, the genres, right, would be personalized to individuals too. So it's like so many different effects. Different teams are testing different things. They're testing for different platforms, like the Android team, the iOS team, people for Roku. So it's really impressive to be able to say anything with all of these moving parts.
Terry Rodriguez [00:20:24]:
Right?
Michael Berk [00:20:27]:
No comment. But, yeah, you're hinting at sort of controlling for interaction effects. And without getting too deep, it's really, really, really, really, really hard, but it can be done.
Ben Wilson [00:20:44]:
Yeah. I mean, the thing that you have to worry about with that is your degrees of freedom. So, yeah, as you start slicing your data, at a certain point, you're basically just p-hacking. Right? You're going down and you're saying, like, well, if I get this subset of customers and then do an AB test across them, it'll be more accurate. I can control for this. I'm gonna do region, and I'm gonna do age range, and I'm gonna do gender. And then I'm gonna do a concatenation of top three favorite genres or types of thing that people are into. Not specific content, but like, hey.
Ben Wilson [00:21:23]:
This person's sci-fi horror drama, or this person's documentary sci-fi, you know, cartoon or something, and you create a cohort of them. I've gone down that path and wrote all the code to do all that and generate, you know, an AB test group for that. And then quickly backed away once I looked at the result of executing my code. Like, okay. 30% of my groups have fewer than 5 people in them. Like, there's no statistical significance to be gained from that. So it's all about that trade-off of, like, how precise can I be based on the population I have to sample from?
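The sanity check Ben describes, counting how many cohorts end up too small once you slice on several attributes, can be sketched as follows. The user records, attribute names, and the minimum size threshold are all hypothetical.

```python
from collections import Counter


def cohort_sizes(users, keys):
    """Count users per cohort, where a cohort is the tuple of values for keys."""
    return Counter(tuple(user[k] for k in keys) for user in users)


def undersized_share(sizes, min_size):
    """Fraction of cohorts with fewer than min_size users."""
    small = sum(1 for n in sizes.values() if n < min_size)
    return small / len(sizes)


# Hypothetical users; top_genres is a concatenation of three favorite genres.
users = [
    {"region": "west", "age_range": "18-24", "top_genres": "scifi|horror|drama"},
    {"region": "west", "age_range": "18-24", "top_genres": "scifi|horror|drama"},
    {"region": "west", "age_range": "25-34", "top_genres": "documentary|scifi|cartoon"},
    {"region": "east", "age_range": "18-24", "top_genres": "scifi|horror|drama"},
    {"region": "east", "age_range": "25-34", "top_genres": "documentary|scifi|cartoon"},
    {"region": "east", "age_range": "35-44", "top_genres": "comedy|drama|horror"},
]

coarse = cohort_sizes(users, ["region"])
fine = cohort_sizes(users, ["region", "age_range", "top_genres"])
print("coarse cohorts under 2 users:", undersized_share(coarse, min_size=2))
print("fine cohorts under 2 users:  ", undersized_share(fine, min_size=2))
```

Running a check like this before launching the test shows how quickly each extra slicing dimension fragments the population.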
Terry Rodriguez [00:22:06]:
I really liked working on experiments on things like content embeddings. It's like a more fundamental component of the stack, where some experiments were kind of moving down that path of, like, segmentation. And it just makes for a much more complex pipeline too. I mean, what we just said a moment ago, right, there is, of course, not just one model that you can use. And so how can you find that trade-off of, you know, techniques that are gonna give you a lot of leverage, and they aren't gonna be brittle, and they aren't gonna make the code way more complex?
Michael Berk [00:22:44]:
Right. So, Terry, what is Smells Like ML?
Terry Rodriguez [00:22:49]:
Smells Like ML is a GitHub handle first, and then, after my partner, Salma, and I had been working on a lot of projects to teach ourselves about applied machine learning and AI, we started a blog. I guess it must have been around 2017, 2018. And this is, like, a place where we were featuring some of our learnings, some of our projects, and we kind of expanded out to viewing that as, like, an innovations lab or consultancy where we could work with people on projects. But I would say it's mostly a personal or professional hacker brand for Salma and me as we were, like, participating in hackathons, contests, and putting out blog content.
Michael Berk [00:23:43]:
Nice. And do you have any memorable blogs or memorable topics that you thought were very interesting?
Terry Rodriguez [00:23:49]:
You know, because it's so top of mind nowadays with generative AI, I think some of my favorite projects were in that area. And, you know, we had been messing around with these, like, do-as-I-do motion transfer videos where you could take an image of yourself and transfer that likeness over to, like, a video of somebody dancing to make yourself look like you were doing that kind of stuff. And so we started with a fascination for that technique and learning more about generative AI, experimenting with GANs, and eventually getting into diffusion. And so I guess it must have been in 2021. It was before latent diffusion, but there were earlier diffusion models that weren't controllable by text. And we started experimenting with that to, of all things, generate movie posters. We had a large collection of movie posters that we had scraped. And so with a dataset that was more complex than, say, faces or ImageNet and things like that, could we train a model from scratch to generate posters? And so that was kind of a favorite project of mine because I think I had a chance to see, like, how powerful diffusion really was, how much greater capacity those models had for kinda capturing the distribution of an image dataset.
Terry Rodriguez [00:25:24]:
But other projects I really liked, you know, one from maybe around that year or 2, now that I think about it, was putting a custom image classifier on an Arduino BLE Sense. So this is like a battery powered microcontroller. And just the fact that you could put an image classifier on that that could run in real time, and in our case, we were creating a model that did something kinda interesting, I think something that people don't really think that you can do with image classification. For more detail, the prototype that we were working on was something like smart irrigation. And so we were exploring the ability for an image classifier to infer if a plant was, like, dehydrated, if it was showing, like, droop. You know, it's a common thing for plants to do when they are out of water, to go limp. And so making a classifier that could see that nuance between a plant that was, like, perky and not needing water versus one that was, like, showing signs of drought stress, that was an interesting project to kinda test, like, what could you do with microcontrollers? What could you do with image classifiers? So that was probably a favorite project too.
Michael Berk [00:26:51]:
Nice. And it sounds like you're a big projects guy. Have you always been like this, or are you building this up to sort of improve your technical skill set? Why do you do this?
Terry Rodriguez [00:27:02]:
I would say it's more the latter there. Like, for me, you know, I'll read a paper or I'll check out a GitHub repo, but, of course, like, papers can cherry pick the results and kinda show you, like, the best sides of things. It's important to get your hands on the code base and, like, determine how it works for yourself. It's like, maybe a technique works really well for benchmark datasets that researchers are using, but when you go to apply it to something that you care about, not so much. And so getting familiarity with different techniques that I was curious to learn more about, or trying to push the limits of what could be done by trying out these ideas. I think learning by doing is really important. And, you know, earlier in my career, I would do things like Kaggle, where you kinda have the dataset curated for you, but that's like a different kind of workflow. Right? Like, at the end of the Kaggle contest, people are really hacking to get that marginal improvement, because at the end, it really is about something, you know, that tends to overfit to that performance metric.
Terry Rodriguez [00:28:17]:
And so I wanted to use projects to explore more of the engineering considerations that go behind, like, a model choice. It's like, yeah, there's a great new model out there, but maybe it involves 3D convolution. And now you need, like, some expensive hardware, and running it in real time is no longer a possibility. And so it might be of theoretical interest, but of practical interest, not so much. So trying to ground our experience on what works practically for problems that we were interested in.
Ben Wilson [00:28:53]:
Did you ever build a physical prototype of a solution that involved ML where you're like, this could be useful, maybe I should try to get funding to have some manufacturing plant make this and sell it?
Terry Rodriguez [00:29:08]:
Well, on that smart irrigation sensor idea, we did get outreach from some people who were representing a fund that was backed by, like, a big agricultural tech group. And they were interested in, like, developing that prototype, getting users, and things like that. We had a project with it, but by the time that we had heard from this group, we were, like, you know, months or maybe a year past actually working on it and trying to develop it out. But there were other projects that we worked on and kinda committed to standing up a website for and, like, exploring people's interest in, things that we had, like, applied to YC with or something. So earlier in our career, we were really focusing a lot on kind of embedded computer vision, IoT projects, things before there were, like, smart cams, before there were, like, AI chips. We would be just, like, trying to reduce the model as much as we could and run it on, like, a Pi Zero or something with a very small form factor, something that was not battery powered. And, of course, since then, there's a whole bunch of hardware and SDKs that have emerged to, like, really advance that space. But we were exploring some of those ideas too.
Terry Rodriguez [00:30:33]:
And I guess, like, you know, we must have come to the conclusion at some point that hardware actually is hard. And so we were kind of, like, you know, exploring more of what we could do with our careers at organizations that were well resourced, so that we could work on problems that we were interested in without the need for all that upfront capital to get going. I guess, you know, software is a lot easier now too. And with generative AI, we were really thinking, like, here's a new opportunity to do something on our own as a startup, because a lot of barriers have come down. You know, you don't need as much data or even as much compute to do some of the things that were impossible, like, 5 years ago.
Michael Berk [00:31:21]:
Nice. I think that segues perfectly into Remyx. What's Remyx?
Terry Rodriguez [00:31:25]:
Well, at Remyx, we've been working on an agent to guide the development and deployment of AI applications. And so what that means for us right now is more of a focus on deep learning models, on data sources that traditionally require, like, humans to label or review. So we're interested in kind of pushing the limits of what's possible in automation, and code generation has come so far. Content generation has come so far. We started looking at things like image generators, text generators, and thinking about how we could address some of these machine learning challenges. So when I was working at a robotics group, you know, we were building a robot that we wanted to be able to deploy in different warehouses, not ones that we owned. So not ones where it was easier or cheaper for us to scan the space and understand the content inventory for that warehouse. They were trying to build a robot to help in micro fulfillment centers.
Terry Rodriguez [00:32:28]:
So you could imagine a lot of variability in the scene where a robot might be deployed. And so at that time, I was really thinking a lot about how we would be able to simplify that process of understanding the environment. And I was thinking a lot about sim to real, and the image generators had gotten to a point where you could tell that, practically speaking, as far as an image classifier or an object detector is concerned, the image is as good as real. And I mean that to say, like, you have a lot of control over the design of an image, and when you bring an image like that to the resolutions that are used, like, you know, 200 pixel wide image frames, the fine grained details that a person might pick up on are not really noticeable. So as, you know, people working in the content generation space were kind of chasing the nines on improving the quality, the realism of those images, we were looking at it and saying, like, this is quite good. And really what we care about is improving the efficiency of generating images for image datasets. And, you know, when you're making, like, an image classifier, there's, I would say, a fairly well understood set of best practices.
Terry Rodriguez [00:34:01]:
You're gonna use transfer learning, using a tried and true backbone that's been pretrained on a large image dataset. You're going to use certain practices in the model selection, and that should depend on the hardware too. Like, as we were saying earlier, you can't just pick the most accurate model, because there's constraints. Like, will this model fit onto the device that I need to run it on? Will it run fast enough? That can depend on the SDKs that are available for that hardware. Like, an op might not be supported in the newest, latest, and greatest model. Something like transformers came to some of these, like, smart camera devices probably, like, a year after transformers were being deployed in servers. And so with all of these considerations, it's like a lot of potential hurdles for people when they're trying to deploy a custom image classifier or detector, and these are important components of perception systems. I mean, now things are trending towards, like, multimodal models, but those models too are still quite big.
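The selection logic described here (fit on the device, run fast enough, use only supported ops, then take the most accurate survivor) can be sketched as a simple filter. This is only an illustration and not Remyx's implementation; the candidate names and every number below are invented placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float    # validation accuracy on your own data (placeholder)
    size_mb: float     # size of the converted model on disk (placeholder)
    latency_ms: float  # measured per-frame latency on the target (placeholder)
    ops: frozenset     # operator types the converted model uses


def select_model(candidates, max_size_mb, max_latency_ms, supported_ops):
    """Most accurate candidate that satisfies every device constraint, or None."""
    feasible = [
        c for c in candidates
        if c.size_mb <= max_size_mb
        and c.latency_ms <= max_latency_ms
        and c.ops <= supported_ops  # every op must be supported by the device SDK
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


# Hypothetical candidates with made-up numbers.
candidates = [
    Candidate("small_cnn", 0.88, 4.0, 15.0, frozenset({"conv2d", "relu"})),
    Candidate("large_cnn", 0.93, 90.0, 120.0, frozenset({"conv2d", "relu"})),
    Candidate("vision_transformer", 0.95, 30.0, 25.0, frozenset({"attention", "layernorm"})),
]
device_ops = frozenset({"conv2d", "relu", "pool"})

chosen = select_model(candidates, max_size_mb=32.0, max_latency_ms=33.0, supported_ops=device_ops)
print(chosen.name if chosen else "no feasible model")
```

In this toy setup the most accurate candidate is rejected because the device SDK lacks its ops, which mirrors the point about transformers arriving late on smart camera hardware.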
Terry Rodriguez [00:35:14]:
They don't run in real time. You know, you can't find everything that you care about in, like, open vocabulary, open set detectors. So there's a lot of kind of best practices and recipes that are picked up, and they're easy to kind of build a pipeline or a system around, and we wanted to make it a lot easier for people who are trying to deploy these models on devices to run in real time in perception and vision. Could be, like, robotics and IoT, AR and VR. But as we were working on that problem, let's say, at the beginning of 2022, especially when we were kind of formally launching what we've done at Remyx, of course, ChatGPT is making remarkable progress, and we're seeing the possibility of bringing, like, high level chat interfaces to that workflow. So what we had started with was focused around computer vision, but we had seen that making LLMs is getting cheaper too. And, like, data efficient, parameter efficient techniques are getting better. And so we're thinking about how we can expand to include, like, text generation, text embeddings, and really, like, how we can help people bring that domain knowledge that it takes to pull a model off the shelf and make it do something useful.
Terry Rodriguez [00:36:48]:
As I said, like, most of these models are released having been fine-tuned on some research benchmark dataset. And so you could imagine a situation where you have a robot and it needs to avoid the ladder, but you go look through one of these datasets like ImageNet, and there's no ladders. And so, you know, what are you gonna do? You need to use transfer learning. You need to fine-tune the model so that it can recognize this concept that's really important to your setting, your warehouse. And so being able to make that workflow, like, really smooth for people, whether it's the training, whether it's the conversions, you know, we wanna work to make it easier to deploy the models. Training the models is really a small part of the ML application life cycle. So we're thinking a lot about how we can take our experience building these kinds of systems and make, like, a very simplified user experience for people who want access to AI models, but maybe they don't have a bunch of experience with training deep learning models and linear algebra and things like that.
Ben Wilson [00:37:59]:
So is the intention eventually for mix, which is your
Terry Rodriguez [00:38:04]:
chatbot.
Ben Wilson [00:38:07]:
For that to be able to say, like, in this scenario you just presented, hey, mix. The model that you built yesterday with this ID or something, My robot hit 7 boxes that were sitting out on the floor. Can you can you build a model that will prevent that? And it will autonomously go and generate training image data that would show, like, here's a dangerous condition. Like, based on images you took of your warehouse, I'm gonna superimpose boxes in the middle of the floor and say, this is dangerous. And then the same image without a box saying, this is safe. Is that the intention eventually to get to a point where it would would use, you know, image generation techniques to create a training set, then train the model and help you deploy it?
Terry Rodriguez [00:39:04]:
To, at some level, yeah, where it is right now, it would be more like I need to differentiate these categories of objects. And so if you were to pull a model off the shelf, it's probably been trained to identify, you know, 20 or a 1000 objects that you maybe don't care about. And so just first closing that gap of off the shelf model versus the model that's gonna work for your deployment is, is where we are, now. And what we would like to do is be able to, serve those continuous updates. We'd like to be able to, help help, solutions to monitor the performance of the models, and and we would like to be able to, use, like, real images at your warehouse and, techniques to synthesize or superimpose, like, some of these obstacles. It might be less of a this is safe or not, situation than than just identifying the the obstacle. And, you know, that could involve not just classifiers or detectors, but maybe we could help with, your ability to improve a depth estimation model or, you know, any any number of, like, current state of the art techniques. We wanna make it easier for people to use the right workflows, access the right data.
Terry Rodriguez [00:40:28]:
And so, for now, we're using image retrieval and image generation to design these datasets for the user's purpose. And to the extent that we could, it would be great to do more with automation. It'd be great to do more with procedurally generated data where we could create a 3D asset and put it in different poses and things like that. So we're really exploring, like, all of the techniques that are in the best practice and trying to codify that and trying to find ways to just make it a bit easier to deal with the agent at the level of, like, I have a request or, you know, to make, like, executive decisions about it and let the pipelines, the tools that are used in the background, do the rest. So the user experience is like chatting with the agent and then having their intent kind of parsed and mapped to a specific workflow that would make sense to use, that, you know, one of us would have designed for that application. And, yeah, just constantly be pushing on the state of the art there. If multimodal models end up smaller and faster and more real time, we'll be there to help people build those. And in general, you know, there's a lot of excitement around being able to go find that one model that does everything you care about.
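A minimal sketch of the "superimpose an obstacle on a real image" idea discussed above, assuming images are plain 2D grids of pixel values so that it runs with the standard library alone. The function names `paste_obstacle` and `synthesize_examples` are hypothetical and are not Remyx's pipeline; a real system would presumably use an image library and generated or retrieved imagery.

```python
import random


def paste_obstacle(background, patch, top, left):
    """Return a copy of `background` with `patch` pasted at (top, left),
    together with the bounding box of the pasted region."""
    height, width = len(patch), len(patch[0])
    fits_rows = 0 <= top and top + height <= len(background)
    fits_cols = 0 <= left and left + width <= len(background[0])
    if not (fits_rows and fits_cols):
        raise ValueError("patch does not fit inside the background")
    image = [row[:] for row in background]  # copy so the original is untouched
    for r in range(height):
        image[top + r][left:left + width] = patch[r]
    box = {"x_min": left, "y_min": top,
           "x_max": left + width, "y_max": top + height}
    return image, box


def synthesize_examples(background, patch, label, count, seed=0):
    """Create `count` labeled examples by pasting `patch` at random positions."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    max_top = len(background) - len(patch)
    max_left = len(background[0]) - len(patch[0])
    examples = []
    for _ in range(count):
        top = rng.randint(0, max_top)
        left = rng.randint(0, max_left)
        image, box = paste_obstacle(background, patch, top, left)
        examples.append({"image": image, "label": label, "box": box})
    return examples
```

Each example carries the obstacle's bounding box, which fits the point that the task may be less about "safe or not" and more about identifying the obstacle itself.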
Terry Rodriguez [00:42:06]:
But in practice and engineering, I've never been in a place where they said, like, off the shelf model was good enough. We don't need it to be both faster and more accurate. So there's gonna be, like, all the tricks that you could possibly pull with data augmentation, with, training the model, testing it, you know, optimizing hyperparameters to some extent. But, deep learning is pretty robust too. Like, there's a there's a lot of consistency out of out of the models if you're if you're not looking for, like, state of the art performance on differentiating a 1,000 different object categories. In practice, maybe there's, you know, a much smaller set of concepts that you need to identify and work with. And the other thing that we think a lot about too is, like, just generally trying to invest in techniques that are going to, help with that cold start or that few shot problem. Like, how can we how can we help people who are data limited get started and then start the data flywheel that will feed into model improvements.
Terry Rodriguez [00:43:15]:
So that should surely involve, like, a better UI for humans to help cheaply say, like, this is a good example, that's a bad example, so that we can take advantage of their judgment. But we're not quite there yet. I think the next step for us is getting to the point where we're helping to deploy the models. And I'm thinking of my experience with Formant, the robotics company, before I joined Tubi. You know, they had an interesting idea, but I guess, like, I wanted to go a different way with where they were going with the product. You know, they had an agent that could be deployed on your robot, and it would monitor ROS messages that are being sent on the devices. And a big part of what they were trying to do is just get that data off of these ROS topics and back them up to the cloud. But for me, I was also interested in deploying models on the agent, like helping roboticists run those types of models, helping them simplify, like, their perception problems.
Terry Rodriguez [00:44:26]:
And so I think a lot about that deployment model of being able to have like a lightweight binary that you can install. It's configured to point to a source. Maybe it's like, you know, your your device where the camera's at. Maybe it's a a ROS topic or maybe it's a directory of images. You know, maybe it's your data lake. But how can you deploy an agent so that it's just configured with, all the all the information it needs to ingest from the input and then produce that stream of detections or or classification, what whatever it is, like the structured information from from the, image sources, making that, really easy, so that someone has, like, a very simple high level CLI for I need a model that does this. Now I need to deploy it. Here's the basic information I need about where where the data is to consume and where to send it.
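A minimal sketch of the deployment model described above, assuming the simplest of the sources mentioned (a directory of images) and a JSON-lines file as the destination. `AgentConfig`, `iter_images`, `run_agent`, and the `detect` callable are hypothetical names for illustration, not an actual Remyx or Formant interface.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterator


@dataclass(frozen=True)
class AgentConfig:
    source_dir: Path   # where the images to consume live
    sink_path: Path    # where structured results are written, one JSON per line
    model_name: str    # which model the agent is running
    extensions: tuple = (".jpg", ".jpeg", ".png")


def iter_images(config: AgentConfig) -> Iterator[Path]:
    """Yield image files from the configured source in a stable order."""
    for path in sorted(config.source_dir.iterdir()):
        if path.suffix.lower() in config.extensions:
            yield path


def run_agent(config: AgentConfig, detect: Callable[[Path], list]) -> int:
    """Run `detect` on every image and write one structured record per image.

    `detect` stands in for the deployed model and must return
    JSON-serializable detections. Returns the number of images processed.
    """
    count = 0
    with config.sink_path.open("w", encoding="utf-8") as sink:
        for image_path in iter_images(config):
            record = {
                "image": image_path.name,
                "model": config.model_name,
                "detections": detect(image_path),
            }
            sink.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Swapping the source for a camera device or a ROS topic would mean replacing `iter_images` while leaving the configuration and the output stream unchanged.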
Terry Rodriguez [00:45:21]:
Maybe we're helping people back up data to the cloud so that we can use the cloud resources to update models. We can run more powerful models in the cloud so that we have, some way of measuring what's happening down on the edge with these, you know, smaller, perhaps less accurate models.
Michael Berk [00:45:41]:
Got it. So I have, like, a trillion questions. Let me ask maybe 2, maybe 1, maybe 7. We'll see what happens. First question. This seems like a really, really interesting use case for drift detection. And for instance, with that, crash report, let's say, in a factory floor that says, hey. You hit a person.
Michael Berk [00:46:05]:
You you hit a box. That's gonna be in plain text. And if you feed that into your system, it can then assess what new training data needs to be generated, retrain on the fly, and then redeploy. Have you guys thought about that angle?
Terry Rodriguez [00:46:19]:
Yeah. We wanna we wanna make, the the switching cost of updating or putting in a new model, like, really low. And I was thinking about this a lot when we are working on this robot because we were using, an OKD, camera smart camera by Luxonis, and it has this, vectorized compute, on it, a VPU. It's made by Intel. And so they they you're able to flash, like, MobileNet V2 kinda like, you know, maybe 5 megabyte sized, model files on there, like, 10 at a time. So you can build, like, some pretty, sophisticated perception pipelines just just with where that's at. But, depending on what context the robot's in, you might not be as interested in, detecting and tracking every person in the field of view, but maybe there's some other object of interest that you wanna be monitoring. And so being able to, make the training of these models, really fast, really cheap, and, make it possible so this agent could, depending on the context, pull those models down.
Terry Rodriguez [00:47:28]:
And I think even as we're thinking more about, like, open set, open vocabulary, detections that the latest generation of models are capable of, Like, there's gonna be that speed accuracy trade off, and there's gonna be the need to distribute the work, onto the edge, if possible. So, you know, in this robot that we were working on, you might have, like, a a powerful, Xavier or something like this with the with the GPU, and it can it's capable of, like, you know, a lot of a lot of, computation. But, still, you have, like, 8 cameras on there, technically, like, 12 cameras. They're like stereo cameras with the center central camera. So you would have, all these camera feeds, and you can't really deal with that without pushing more of the models closer to the source of capture. And so, there's gonna be a need for these small efficient lightweight models to be, deployed there and probably a need for switching depending on the context. And you could imagine, like, the bigger, LLM kind of controller living on that Xavier. And this would be like a way that probably people are trying to design these systems currently with with what's available.
Terry Rodriguez [00:48:46]:
So we wanna we wanna, really explore, like, data efficient, parameter efficient, and and really fast kind of model training and updates because we think that the, that's that's a good strategy for dealing with some of the context switching that you you would expect when you're putting a robot in the wild.
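A minimal sketch of context-dependent model switching on a device with a fixed number of model slots, assuming least-recently-used eviction when the slots are full. The `ModelSlots` class, the `fetch` callable, and the context names are hypothetical; the eviction policy is an illustrative choice, not something stated in the conversation.

```python
from collections import OrderedDict


class ModelSlots:
    """Keep at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, capacity, fetch):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.fetch = fetch            # callable: model name -> loaded model
        self._loaded = OrderedDict()  # ordered from least to most recently used

    def get(self, name):
        if name in self._loaded:
            self._loaded.move_to_end(name)        # mark as recently used
        else:
            if len(self._loaded) >= self.capacity:
                self._loaded.popitem(last=False)  # evict least recently used
            self._loaded[name] = self.fetch(name)
        return self._loaded[name]

    def loaded(self):
        return list(self._loaded)


def models_for_context(slots, context, context_models):
    """Load (or reuse) every model the given context calls for."""
    return [slots.get(name) for name in context_models[context]]
```

If a context names more models than there are slots, the earliest ones are evicted again while loading the rest, so the capacity needs to cover the largest context.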
Michael Berk [00:49:09]:
Cool. Second question. It seems like you guys are still in sort of this exploratory phase where you're figuring out best practices, trying to see where your market niche could be. Ben and Terry, a question for both of you. It seems like with this text input to inform data generation, to inform retraining, to inform new model deployment, if that's generally the pipeline you guys are building. I'd be curious what area or what industry you would target first. And I was thinking about it, the criteria you can sort of think of as, alright, it needs text feedback. Mhmm.
Michael Berk [00:49:49]:
It probably needs to be low stakes, and it probably needs to be high volume. So an example of not that would be military applications. Like, oops. We just blew up a family. That happens once every month. That's probably not the best industry to start out at. You want it to be a bit more robust. Likewise, with factories, maybe that's a little bit too high stakes.
Michael Berk [00:50:08]:
You probably have volume. But if you knock over an entire cart of semiconductors, that's probably not ideal, and that would cost a lot of money. So what is the ideal entry point for this tool in terms of an industry?
Terry Rodriguez [00:50:24]:
Well, before I answer that, I'll even say that there's another part of the pipeline too. Like, with multimodal models, it's, a lot of them are getting specialized and adapted using pipelines with expert models. And so you could imagine it's text to data to model to then a pipeline for more data for, like, multimodal model or something like this. So so you can there's there's, like, so many ways that the data and the models are kind of, like, feeding into each other. But, I totally agree. Like, some application areas are not where we wanna be. Like, you know, I would not recommend to somebody to use a generated dataset to try to make, like, the very best person detector or something like this. Like, somebody who's, really working with, very expensive, large, ground truth labeled datasets, you know, less likely to lead to a robot that ran into something that didn't look like a person.
Terry Rodriguez [00:51:25]:
Things that are lower stakes would probably be like these pipelines that feed into other models. So that's an area that I'm interested in as we're seeing multimodal models kind of mature and become more efficient. But I think that an area that I could imagine too is, like, wrangling data in a data lake. Like, maybe you have millions of images in S3 somewhere, and you really just need to find all the instances that have blank in it. And so the cost of specializing one of these models and the cost of running it on those images is probably a better trade off to make than throwing, like, a bigger generalist model at the task. I would rather make a custom ladder detector to throw at my corpus of images and just map that model to a column of image file paths than to feed each of those, rate limited, of course, to GPT-4V. So there's, like, different areas in the cloud or in model creation. There's probably some areas where, like, you know, you can make a really good detector.
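A minimal sketch of mapping a specialized detector over a column of image file paths, assuming the detector is a callable that returns `(label, score)` pairs for a path. `find_matching_images` and its `min_score` threshold are hypothetical names chosen for illustration.

```python
def find_matching_images(image_paths, detector, target_label, min_score=0.5):
    """Return the paths whose detections include `target_label`
    with a score of at least `min_score`."""
    matches = []
    for path in image_paths:
        for label, score in detector(path):
            if label == target_label and score >= min_score:
                matches.append(path)
                break  # one confident hit is enough for this image
    return matches
```

With a small custom model behind `detector`, this loop can run wherever the images live, rather than sending each one to a rate limited hosted model.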
Terry Rodriguez [00:52:41]:
And so if you have a situation where you just need to detect a handful of objects, say, like, aerial photography or something like this, these techniques can be used to help with that. But then there's other vision tasks where, I guess, the information that you need to identify, it's subtle. It's, like, maybe you're thinking of something like finding defects in parts at a factory or something. It might just be like a subtle scratch. We don't yet have a pipeline or a method for producing models that can help flag, like, those kinds of fine grained defects. Maybe when we get to, like, generating a 3D asset or something like this, then we'll be closer to being able to synthesize that kind of data too. But, yeah, I would say that this is probably most beneficial for people who are earlier in their process of creating models, model pipelines. And at some point, you could imagine an organization gets to the point where having a team and really getting into investing in that pipeline is valuable for them.
Terry Rodriguez [00:54:01]:
But for a lot of organizations, it may be something kinda simpler, like the kind of problems where you could pay a consultant for a quick project, but it's not like, you know, a core feature or something that you're highly invested in, like recommender systems. Right? Like, that's gonna be a difficult problem to, get to the point where you could just fully automate
Ben Wilson [00:54:23]:
it. Yeah. Could not agree more. Yeah. As far as industries, I can't think of a specific industry where it'd be like, oh, you could only like, this would only succeed at first in this sort of industry. This is this seems so broad and applicable to so many different industries and jobs that I've interacted with. To give context on that. Let's say your initial idea that you came up with that you're mentioning about a physical project, you're like, we're gonna use an image classifier to detect when this plant needs water.
Ben Wilson [00:55:06]:
Something like that does not require human expertise per se. Like, in an ideal world at a farm, like, large scale agriculture, you shouldn't have a person going around looking at stuff like that. What's the solution right now is to deploy cameras on crops. And then some human still has to go and look at those and say, I gotta go out to plot 43 b and check on that corn crop. But when they get over there, it's not like they're doing a lot on a modern industrial scale farm. They're going up there, and they're visually classifying what the problem is and then activating some automation to do something for that. Like, oh, time to turn the sprinklers on or time to turn the sprinklers on less because it's too much water or, you know, some sort of soil sample that needs to be taken and analyzed for potassium content. Like, what's going on? Or are there bugs or something eating the crop? All of that stuff, I don't think humans enjoy doing at industrial scale. So an application where you can monitor something like that autonomously at the edge, but then adapt to changing conditions.
Ben Wilson [00:56:29]:
Like, oh, we have this massive storm system that's coming through. Do I need to swap out to determine what the effect of this is with a slightly different model that's detecting damage to crop rather than, you know, moisture to crop. Something like that resonates with me a lot. And then industries like advertising and marketing, I think stuff like this is incredibly useful where you can rapidly update to latent factors that are affecting a model that you're not aware of, or you just you haven't been able to collect the data because the data didn't exist until yesterday, and then a rapid retraining, redeployment, and then reuse of generated assets, I think a simplified pipeline for that that adapts to a fast changing environment is would be a very big game changer. I mean, I hope you guys solve it. Like, it like, really solve this and make it so that you all of a sudden are in this position where you're like, man, we gotta hire a 100 people. Like, we can't we can't deal with this.
Terry Rodriguez [00:57:35]:
No. That that I I hope so. I guess, like, to to add to that too. Right? Like, a lot of problems, you know, don't need machine learning. And, you know, you have to understand if if, the relative cost is is, low enough to justify that type of solution. And so while we're, you know, over here, geeking on the the drought stress detection models and things like that. Like, to what extent can you can you change the irrigation? Can you or is does the plant need it? You know, nature does pretty well with, like, you know, the the environmental conditions the plants have evolved in for all this time. So it's like, where is it valuable? And and I think, I think a lot about, like, the value would be especially for people who want to make a new product, and they want to be able to, use AI cheaply.
Terry Rodriguez [00:58:37]:
But they don't have, all the expertise that they need to, like, get get, the the data wrangling, the the pipelines, the monitoring, the, you know, model updates, that whole kind of, you know, the the whole the whole ML life cycle. So we we would like to be able to, just keep kind of expanding on what we can do with, image and text data and and building pipelines that help people, get quicker to an application. There's, like, you know, finitely many reference applications, reference patterns that people practically care about now. So how much can we just, accelerate their access, to those kinds of pipelines? I used to think earlier on in my career, like, everybody's gonna wanna be a machine learning engineer. But, you know, after working in a few different orgs, I understand, like, a lot of people are are never gonna, like, get into, like, you know, data mining and and trying to find these patterns and things like that. So, like, what tools are possible to, accelerate that for orgs so that they can test an idea, get to market, find out if it's worth investing in, like, the, the team and and everything else. Because a lot of times, it's a little bit backwards now. An organization says, I wanna do AI.
Terry Rodriguez [00:59:59]:
I need to hire an AI expert. And then they've gone so far into this investment, and I've seen it, you know, in my own experience where it's like, does this company really need a machine learning specialist, or do they need, like, a consultant to work for a couple of months? Yeah.
Ben Wilson [01:00:16]:
Yeah. And something that I've seen that's really blown my mind with the ML field in general is that we're now interacting with an entire new generation of people that are interacting with AI. You know? And they don't come from the background where they've ever had to go through the process of generating a feature set, training a model, testing that model, going through 173 iterations of training until they got it kind of right, and then send it out for review by experts, and then go back to the drawing board, do it, you know, train it up 30 more times. That whole life cycle, that whole process of what it is to deploy production ML and then write a full code base around that. Even if you're experienced and you're solving a sizable enough problem at a company, that's months of work to get something that's a major project out there. Like, do you need a REST API around this? Okay. We have to think about deployment. What's the scalability of this? How many requests are coming in? Oh, 6000 a minute.
Ben Wilson [01:01:30]:
Okay. We need Kubernetes. Now we need to think about that aspect of this. But now people are interacting with somebody else's infrastructure. You know, you can interact with the OpenAI SDK and ask it whatever you want. You can prompt it with something complicated to control how it is, you know, doing it. Or if you wanna go with something more specialized and you're using the Hugging Face Transformers SDK and you're taking, you know, a foundation model, you're fine tuning it, or using something like PEFT to do, you know, updated weights on top of that. You're not doing that life cycle of ML, like, getting that to the point where it's ready for production.
Ben Wilson [01:02:15]:
So what's amazing to me is that people are expecting behavior with AI interactions that are orders of magnitude faster. Like, nobody cares. They're like, yeah. I just wanna test out 400 different prompts and see which one performs the best. Those 400 prompts, they can get an answer to which one of these solves their problem in 3 days. And it just blows my mind, this revolution that's happening right now. And I think that's kinda that distills down, like, what you guys are trying to work on is how do we get that same sort of behavior and speed and rigor, but applied to some of these more traditional ML things.
Terry Rodriguez [01:03:00]:
Yeah. I would say, you know, a lot of lot of, admiration for what OpenAI has been able to do with simplifying access to, like, very high quality models. You know, you can go on their store and, like, whip up something that's using, like, rag in the background and, you know, test an idea, like, faster than ever. Right? Like, you used to used to, as you said, have all this overhead just to get your model into somebody else's hands to try it out. And now, you know, probably in 30 minutes, you could test a new idea through through something like their store. And if that's good enough, justify, like, okay. Now I'm gonna write the code base that uses their API. I'm not yet at the scale where I'm thinking about owning the models and the infrastructure.
Terry Rodriguez [01:03:46]:
But, eventually, the more invested into the problem you get, the more you're gonna wanna, control those costs. You're not gonna wanna cut us a $1,000,000 check to OpenAI when you have enough users. And so, you know, that's where I think the the, pain points are now. Right? It's like I wanna have OpenAI level LLM inference, but I I wanna self host or do something that's more cost effective. Like, how how could we help people to, to adapt the models to their use case to help them to self host or or do something on prem or things that, are more like real time and at the edge and these other areas where you won't find a solution from a product like OpenAI yet.
Michael Berk [01:04:31]:
Right. So I think this might be the best breaking point for the next, like, 6 hours. So I'm gonna take advantage. And I know we're at time and we have busy lives, so we should return to those. But would love to keep talking. So, in summary, we talked a little bit about Tubi at the beginning and recommendation engines. Specifically, they're often subject matter driven and simpler than you might think. Throwing everything into a deep learning model, unfortunately, once again, doesn't solve it.
Michael Berk [01:05:04]:
And evaluating these models with A/B tests is, again, really challenging because of the user interactions and the interactions between even concurrent experiments. So it's important to think about what exactly your north star metric is showing and what exactly you're testing. Image generation almost always leverages transfer learning, and there's often a big backbone created by one of the big tech companies, and you wanna sort of fine tune or leverage that to your use case. And that's exactly what Terry is doing at Remyx. They're sort of trying to remove a human in the loop and instead use smart LLMs and generative models to inform how smaller models, specifically on edge devices, should be retrained. And they're sort of still in the exploratory phase. But, yeah, I'm very excited to see how you guys grow over the next few years. And, again, if you solve this, as Ben mentioned, hundreds of people will be working for you in no time.
Michael Berk [01:06:03]:
So, Terry, if people wanna learn more about you, your work, or Remyx, where should they go?
Terry Rodriguez [01:06:08]:
Well, check out Remyx, with a y, dot ai, and check us out. I'm Terry Rodriguez. Find me on LinkedIn, or you could see our Twitter, Smells Like ML. So, yeah, we're around.
Michael Berk [01:06:23]:
Awesome. Cool. Well, this was a lot of fun. Until next time, it's been Michael Berk and my cohost Ben Wilson. And have a good day, everyone.
Ben Wilson [01:06:32]:
We'll see you next time.