Evaluating and Building AI Systems - ML 166
Hosted by:
Michael Berk
Special Guests:
Richmond Alake
Show Notes
Michael Berk dives deep into the adventures of AI and machine learning with our special guest, Richmond Alake, a staff developer advocate at MongoDB. Richmond's journey from web development to AI was driven by a quest for excitement and new challenges. In this episode, he shares how he transitioned into the AI field, his passion for using writing as a learning tool, and the importance of continuous learning in evolving tech landscapes.
They explore the intricacies of building and evaluating Retrieval-Augmented Generation (RAG) systems, the benefits of MongoDB's versatile database functionalities, and the pressing challenges in machine learning data collection and evaluation. Richmond also gives us a peek into MongoDB's advanced solutions for AI application development and how strategic data chunking can impact efficiency.
Whether you're a budding AI enthusiast or an experienced developer looking to expand your horizons, this episode is packed with practical advice, career insights, and the latest trends in AI and machine learning. Stay tuned as we uncover how to navigate the complexity of RAG pipelines and the evolving landscape of generative AI. Let's get started!
Socials
Transcript
Michael Berk [00:00:05]:
Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I'm not joined by my co-host today. Unfortunately, he is out of office taking a much needed vacation. But today, we are joined by Richmond. He started his career as a software developer, specifically focusing on systems and full stack JavaScript. He then worked as a contractor, led a computer vision course at O'Reilly, and then contributed to NVIDIA as a course instructor. Currently, he's a staff developer advocate for AI and machine learning at MongoDB. So, Richmond, we're gonna start off with a little spicy question.
Michael Berk [00:00:41]:
Do you need clickbait titles to get views on media?
Richmond Alake [00:00:46]:
Man, I have not written on Medium, like, religiously in at least a year and a half. Back in the day, no. Yet today, yes.
Michael Berk [00:00:58]:
Oh, man.
Richmond Alake [00:00:59]:
Okay. I'm sorry.
Michael Berk [00:01:01]:
That's too bad. So I have a blog that I used to publish to weekly. It was breaking down academic articles for the layman so that they can actually try to implement and understand the concepts. And I remember I had some really grotesque titles of, like, 7 technical words in a row. And then I switched to, like, top ten tips for being good at Python, and got so many clicks. It's really an interesting phenomenon. How do you think about naming pieces of content?
Richmond Alake [00:01:30]:
Firstly, the top ten stuff. Listicles do great, right, on those platforms like, medium. So but I've never I don't do that much. So sorry. Can you just repeat the question again? I sort of went off track.
Michael Berk [00:01:47]:
Yeah. So for name and content, what's a good name? What's a bad name? Do you ever get guilty for using listicles?
Richmond Alake [00:01:56]:
I don't feel guilty for using listicles because I don't write a lot of listicles. But what's a good name and a bad name for naming content? I normally do bad names. I'm in the bad name category, as in I'm doing more content on YouTube, and I don't understand how to name my videos. Sometimes I'm thinking you just put in as many technical words as possible or make it as descriptive as possible. But then I see videos that do, like, several hundreds of thousands of views, and the title is just a sentence. It's like, cleaning my computer or something.
Michael Berk [00:02:28]:
Mhmm.
Richmond Alake [00:02:29]:
And they got millions of views. I'm like, I'm doing this YouTube, this Medium game wrong. So, I mean, it changes from platform to platform. So on Medium, a good title for your content is really something that will make people click. And what I do is, any article that I find myself clicking on, I save that title in a notebook, because I'm clicking on it. There must be a reason why that title is very good. So I have, like, a list of different titles, and I try to make my content titles match that back when I used to write more frequently on Medium.
Michael Berk [00:03:10]:
Got it. Okay. Cool. So, do you mind if we take a step back? Can you clarify and sort of elaborate on your content creation journey? And I'm curious as someone who, like, wrote a medium blog, who then entered this podcast, who then became a host of the podcast, who then used that to join Databricks, content creation was very essential for my career path. I was wondering if you felt the same way or whether it hurt, helped. What are your thoughts?
Richmond Alake [00:03:39]:
For me, content creation has definitely helped. We probably have a similar background, in that I started creating content just so I could reinforce my learning. So I started creating content back when I was in university, doing my master's in what we call AI, and I wasn't really understanding the very hard stuff. So I was like, hey, let me actually write this. It was gonna force me to go deeper into topics. And it really helped build, one, my knowledge, and two, my reputation within the space that I'm in, which is the AI, the machine learning space. So back in the beginning, I did a lot of computer vision content and deep learning content.
Richmond Alake [00:04:24]:
And that led me to at least two engagements: one was a live training course with O'Reilly, and another was a computer vision course with O'Reilly. And then that led me to writing for NVIDIA. And a bunch of gigs have just come from me just being present online. In fact, me being at MongoDB is actually off my Medium articles. Most of my previous technical career gigs definitely played a role, but I was explicitly told, hey, we saw what you did on Medium. It's great. Come and do the same thing here at MongoDB.
Michael Berk [00:05:06]:
Interesting. Okay. So to that end, I've seen similar roles at Databricks where we sort of leverage personal social media accounts and the ability to write content and then sort of repackage it as Databricks advertising, almost. On the most technical side, it can be in the form of tutorials. On the least technical side, it can be in the form of, this is a fancy new concept called Gen AI, consider using it. What does your day to day look like?
Richmond Alake [00:05:34]:
Wow. That's a very interesting question. So my day to day, my primary role as a developer advocate is to educate developers and our customers on how they can use MongoDB for their AI applications. Now, the way I educate takes different forms. It could be tutorials like you mentioned, written tutorials like we do on Medium. And I also do tutorials on YouTube, which could be, like, live streaming like we are doing now, where I go through a bunch of notebooks or a bunch of code. And I also have different forms of engagement in my day to day role.
Richmond Alake [00:06:13]:
Sometimes we create notebooks that live on GitHub repositories, tutorials on GitHub repositories. And there are times I'm talking to customers. As you know, Gen AI is everywhere, so I've had extensive conversations with customers at MongoDB who are building AI applications, and we're really exploring some edge use cases. And at MongoDB, we're a database company, but we're supporting customers on every level, including AI knowledge and thought leadership.
Michael Berk [00:06:43]:
Yeah. Okay. So lots of different types of content creation. And, yeah, to to the comment on MongoDB supporting a lot of use cases, yeah, at Databricks, we only have good things to say from what I've heard. There's actually a customer that I'm working with that uses Databricks for ETL, then MongoDB as the serving layer. So, yeah, if we're still in sort of the microservices era where lots of things can plug and play, you can switch in and out components. Got it. So in your personal opinion, what is the best type of content creation for personal learning?
Richmond Alake [00:07:24]:
I was reading this book called Hidden Potential by Adam Grant. And he was arguing in the book that most people don't generally learn by writing. Right? It's just an assumption, or maybe I'm paraphrasing very, very wrong. But I find the best form of content for learning is just typing. It's the closest to writing. Right? I know people can learn by just doing a podcast like this, and they pick up good stuff, and they could go deep dive later on in their own time. But for me, just writing because when I was writing Medium articles, when I'm writing articles and tutorials about MongoDB, I cannot say a technique or a word and not explain it to the person I'm writing for, because then I'm just making things a bit ambiguous.
Richmond Alake [00:08:24]:
Right? So I have to explain every technique, and that allows me to double click into things I don't know and fill my knowledge gap. So that is the best form: just writing tutorials, typing them up. I find it better than doing videos on YouTube. If you watch some of my livestreams, you might just see me sweating profusely. I remember on one of my last livestreams, I had a coughing fit. That doesn't happen with tutorials, does it? So yeah.
Michael Berk [00:08:56]:
Usually. Yeah. So I think you're referring to the the concept of the Feynman technique where you go and, learn through teaching. And if you can't explain something simply, you don't know it well enough. And so by writing it out, you sort of list out all of the core components. And then as you're trying to describe each component, if you can't do that, you go and learn it. And that's sort of like a test to see if you truly know something. Do you find that writing implementations via code does the same thing, or should it be via words?
Richmond Alake [00:09:31]:
What do you mean by writing an implementation via code? Do you mean pseudocode?
Michael Berk [00:09:35]:
No. I mean, let's say we are hooking up a vector database in MongoDB, and we want to build some sort of RAG application based on that. Is that a better method of learning, or would writing a blog be sufficient?
Richmond Alake [00:09:50]:
I think, first, if you're gonna build a RAG pipeline using MongoDB, we have loads of tutorials that will help you just get something started up in, like, 5 minutes. Right? Literally, we have a Colab notebook. Just run all the cells, put in all your MongoDB configuration, and you have a RAG pipeline. Right? So that will help you get started. Now, if you want to learn because a RAG pipeline has different components. There is the chunking. There is the embedding. There are different aspects of a RAG pipeline that you can dive into, and we have the resources that can help you to learn that.
Richmond Alake [00:10:28]:
So if you wanna get started up and just do RAG, we have notebooks. We have a GitHub repository, which we'll link in the show notes. But if you really want to learn, then I would say every component of the RAG pipeline, you probably wanna double click into and explore that topic. You can spend several months in different areas of the RAG pipeline. And now we're moving into agentic systems, which have their own complexity in terms of system architecture and other areas where you probably need to fill some gaps. Because with agentic systems, the old is becoming new again. Right? Unit testing is now a thing in AI. But yeah.
Richmond Alake [00:11:12]:
So essentially, if you just wanna build something and get started, we have the tools for that. Or if you would like to have that thought leadership, have very in-depth technical knowledge, then I would say deep diving into what you're actually implementing helps. Free content.
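A minimal sketch of the kind of quickstart Richmond describes, assuming a MongoDB Atlas cluster with a vector search index named vector_index already created on the "embedding" field and an OpenAI key; the file name, model names, and collection are illustrative, not MongoDB's official notebook:

```python
import os
from openai import OpenAI
from pymongo import MongoClient

ai = OpenAI()  # reads OPENAI_API_KEY from the environment
coll = MongoClient(os.environ["MONGODB_URI"])["demo"]["docs"]

def embed(text: str) -> list[float]:
    """Embed a piece of text with an OpenAI embedding model (model name is an assumption)."""
    return ai.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

# 1) Ingest: naive fixed-size chunking, then store each chunk with its embedding.
corpus = open("my_docs.txt").read()
chunks = [corpus[i:i + 800] for i in range(0, len(corpus), 800)]
coll.insert_many([{"text": c, "embedding": embed(c)} for c in chunks])

# 2) Retrieve: Atlas $vectorSearch over the stored embeddings
#    (assumes a vector search index named "vector_index" on the "embedding" field).
question = "What does the report say about churn?"
hits = coll.aggregate([
    {"$vectorSearch": {"index": "vector_index", "path": "embedding",
                       "queryVector": embed(question), "numCandidates": 100, "limit": 4}},
    {"$project": {"text": 1, "_id": 0}},
])
context = "\n\n".join(h["text"] for h in hits)

# 3) Generate: stuff the retrieved context into the prompt.
answer = ai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{context}\n\nQ: {question}"}],
)
print(answer.choices[0].message.content)
```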
Michael Berk [00:11:35]:
Yeah. Yeah. I think I agree. The my approach is typically if you have a quick start notebook, that can get you sort of 50% of the way there. You know how to implement. You know how to generally make something run and not fail. But if you want to become sorta a next level developer, it's really helpful to go and actually understand what's going on. So not to get into the vector database stuff, but what is the embedding model? What does it do? How is search performed? Is there hybrid? Is there filter? All these types of things are actually really, really relevant in your use case.
Michael Berk [00:12:06]:
That said, people don't have enough time to become experts in everything. So another question or sorry. Go ahead.
Richmond Alake [00:12:13]:
I was gonna say, you don't need to be an expert in everything. I was talking to a few folks, and I was like, there's no such thing as an AI expert. The only thing that anyone could be is an expert learner, and that's it. Oh, I like that. That's what I'm trying to be. So I'm not an expert in AI, but if there's something I don't know, I know how to learn it.
Michael Berk [00:12:37]:
Right. Yeah. How do you define that cutoff, though? When should you be like, alright. This is good enough. I quote, unquote know it.
Richmond Alake [00:12:49]:
For me, like, doing this thing is fun. Doing AI is fun for me. So I was doing a full time job and writing content on Medium after my 9 to 5. Now I get to write content and do a variety of content as my full time job. So when I dive into a topic and I explore it, I don't typically have a cutoff. Just to give you an example, I went into uni back in 2018 focused on computer vision and deep learning. I was writing about that stuff on Medium back in 2018. In 2024, I'm going to release a computer vision course. So many years later.
Richmond Alake [00:13:31]:
For me, there is no cutoff. It's just, hey, the stuff I learned in 2018, I've forgotten. So it feels good to just relearn stuff and go back to the roots of deep learning. Look at all the different architectures of neural networks and how they've progressed over time and discover things that are new. Right? That's fun for me. So if we're talking about a cutoff, for me, there isn't. For an article, there is, because you have an aim for an article.
Richmond Alake [00:14:03]:
You have a goal on what you wanna teach. But in terms of a field, it's exciting just being involved in this.
Michael Berk [00:14:13]:
Yeah. I definitely feel that. In most Databricks engagements that I'm put on, I have no idea what I'm doing, and I'm rarely, like, that upfront about it. But when you say, hey, I have this problem, solve it. I don't know the solution, but I know that I can get there. And so that typically drives what I learn.
Michael Berk [00:14:31]:
And if I can build it to meet the success criteria, then I say I know it generally, and I generally stop learning about it. But as you said, there's so much cool stuff. Like, you could spend many lifetimes deep diving into all this.
Richmond Alake [00:14:45]:
Yeah. Maybe until I get to the level of Andrej Karpathy, that's where the cutoff is. Right?
Michael Berk [00:14:52]:
Yeah. And even he has a bunch of blind spots, I'm sure. Like, it's just too much for one person to know. Okay. So I'm revisiting your LinkedIn and going through the insane path of graduate software developer, full stack JavaScript, contract web, contract web, founder, course instructor, computer vision, program lead, and it goes on. So why so many disparate jobs? And I understand that's sort of an artifact of being a contractor, but did all these different experiences help you? Did you enjoy that, and did you intentionally set out to have this path?
Richmond Alake [00:15:35]:
Okay. Let me break it down. Did I intentionally set out to have this path? Not a lot of people know this, but me coming into the whole computing world was a mistake. In the UK, when you go into college, which is this in-between limbo state between secondary school and university, you do it for 2 years. When you go into college, you have to choose 4 subjects. I chose 3: chemistry, physics, and maths. But you have to choose 4. So I came into the induction day with just 3 subjects, and they weren't gonna let me enroll because you have to choose 4.
Richmond Alake [00:16:15]:
And I just said, hey, choose for me. And the person putting me into the college chose computing. And that's how I came to this path. And the only reason why he chose computing was because he was the computing teacher. And here I am today talking about AI, all because of that choice of his. But really, my career trajectory ever since then has been about what makes me excited. So I used to be a web developer, a full stack web developer.
Richmond Alake [00:16:50]:
It's one of the things that made me fall in love with the MongoDB technology. I used to focus on the MEAN stack. And I did that for a number of years, then I got bored. You would notice that I moved from place to place because, I guess, when you're young, you can get bored very easily and you wanna be excited, you wanna keep learning. But I was a web developer, and I felt like I reached a ceiling. And there wasn't much I could learn in terms of breadth, but I could definitely have gone deep. I felt like I was deep enough. Because where I was as a contractor, for you to become a contractor, back then anyway, you had to be what you call, like, a specialist, someone that could come in and solve a very particular problem.
Richmond Alake [00:17:41]:
So you will notice those contract gigs are for about 3 months or 6 months, because what we're doing in those contract gigs is the team are experts that come in to build a certain system or a certain front end application and deploy it, and you're done. That's it. So you have to know what you're doing, be very good at solving problems and finding solutions. So that was, like, the stage I was at when I was a web developer. And I got bored, and I decided to go to the next challenge, which was AI. And I bought this massive textbook, which scared me into going to university. I was like, wow, AI is so difficult.
Richmond Alake [00:18:20]:
There is no way I'm gonna learn this by myself, because I tried to do the self learning thing. And I went to university to get a master's, but today people are becoming AI experts off YouTube videos, apparently. So I got a bit of the wrong end of the deal. But that explains, like, the different paths. Right? And the instructor stuff, or me writing for NVIDIA or writing in other places, really, that's just stuff I do outside my 9 to 5, because, again, this is fun for me. I like learning, and I like sharing what I'm learning about. So I wrote on Medium. I write in a lot of places.
Richmond Alake [00:18:59]:
I wrote for Neptune AI, which is more MLOps focused. And I wrote for NVIDIA, which was more data science focused. And I taught on O'Reilly, which was computer vision focused. So just dabbling in the odd bits and just learning.
Michael Berk [00:19:15]:
Got it. Why do you write? Is it solely for your personal gain for learning, or do you like giving back?
Richmond Alake [00:19:26]:
It is selfish.
Michael Berk [00:19:28]:
So That was the question. Yes.
Richmond Alake [00:19:31]:
Yeah. Yeah. It's just so I can learn. Because, again, I'm not the 8 year old kid that started programming. I'm not the kid that started programming when I was 8 or 13. I came into this field by accident, so I had a lot of catching up to do. And everything just didn't click initially for me, so I had to spend double the time and double the effort that I think it took other people. So it's just a practice for me, and I was just happy.
Michael Berk [00:20:00]:
Yeah. No. I agree. I'm the same way. I'd I'd if people get something, great, but it's more for my my personal benefit. Do you think that sort of roundabout path helped you?
Richmond Alake [00:20:14]:
Helped me how?
Michael Berk [00:20:16]:
That's a good question. So you could have studied computer science plus technical writing. I'm sure there's a degree somewhere out there that combines the two. You could have also gone directly into this AI ML route, and you could have theoretically not taken all these disparate roles. But you got a lot of exposure to many different types of work, many different problem sets. Do you think that exposure helped you in your career and what you're doing right now, or would you have rather been sort of a super focused specialist?
Richmond Alake [00:20:50]:
Well, if I wanted to become a specialist, then I guess I would have stayed a web developer and had that depth. Right? So the reason why I'm here now is because I am not focused. I'm just moving around. Right? That's why I'm where I am today. I always tell people, I would like to say that everywhere I am now, I planned it all, it's all part of my grand plan, because my close friends are like, wow, Richmond, you work at MongoDB now. And I used to be a big fan of the technology. Such a big fan that I bought the stock when they IPO'd.
Richmond Alake [00:21:26]:
And it was just a few stocks. I was a university student, so I didn't have much. It was a hundred pounds' worth. So to my friends, they're like, yeah, you definitely planned this trajectory. I'm like, no, I'm just going with the flow.
Richmond Alake [00:21:40]:
Right? And one thing at MongoDB is you will find that most people at MongoDB are very passionate about what they're doing, and they're very passionate about the technology. So there's a lot of content creators in MongoDB. There's a lot of hackers in MongoDB. People that are just working their way through their own products or working their way through side projects. In my interview at MongoDB, we were showing each other our side projects. That was one of the code interviews, just talking through the code and the lines of someone's hobby project, and he would just gauge my expertise. And I don't think it's unique to MongoDB. I think it might be a byproduct of being in AI. It's just fun to just build stuff.
Richmond Alake [00:22:30]:
Right?
Michael Berk [00:22:31]:
Very fun. Yeah. Okay. Interesting. I definitely wanna talk about that. One more question on career trajectory, though. You said sort of boredom and curiosity were the driving factors. How do you know you're bored?
Richmond Alake [00:22:48]:
Good question. How do I know I'm bored?
Michael Berk [00:22:56]:
And I can elaborate a little bit. Like, yesterday, I was bored. I was in a call. They were talking, in sort of an inefficient way, about things that weren't interesting, that I had done a million times, and so I checked my phone. That is boredom. Should I leave my current career or my current position just because of that? Maybe, maybe not. So, like, how many days in a row of being like, damn, this kinda sucks, or, like, what is sort of the inflection point when you're like, I should probably switch? What does that feel like to you?
Richmond Alake [00:23:31]:
So if you look at my career path, I've switched industry. So from web to AI, I'm not afraid to go back to uni. Then I've also switched different companies and and and job types. So when when do I know that I need to switch? I guess it's just instincts. Right? When you're no longer excited, maybe it's like a relationship. You know what I mean? You just stay in the bad relationship. You're like, this is going nowhere for me. As in people stay in bad relationships for several years before they leave.
Richmond Alake [00:24:03]:
Don't they? Right? So, for me, I kinda detect it very early on. Like, hey, I am not excited about this. I'm not excited about being in this space. And it's not just one meeting. That would be crazy, because then I'd have been in a lot of different companies. But it has to be for a period of time where I find myself not being excited. What that looks like, I don't know.
Richmond Alake [00:24:33]:
Maybe I stop writing about this particular topic or I stop talking about it or I stop feeling passionate about it. So it just signals to me like, hey. You're no longer excited about this. This thing that you're doing is becoming it's becoming a job. Go look for something you're excited about.
Michael Berk [00:24:53]:
Yeah. I I think what you're saying resonates. I the way that I look at it is I'm a very hope driven person. If I'm not hopeful, if I'm not excited, a lot of life becomes less fun. There's less appeal. And so there's sort of it's a soft line, obviously. There's no, like, I have x points of hope. Now I need to switch.
Michael Berk [00:25:16]:
It's a soft line that's intuitive, but hope and excitement are definitely the way that I gauge if I'm in the right place. It's it's just also so productive to be excited and hopeful because you're you work well. You're you have better ideas. You're more efficient. You wanna get up in the morning. But if you're not having that excitement and hope, maybe it's
Richmond Alake [00:25:35]:
time to switch. One thing I mentioned earlier on is, I was brought into MongoDB to focus on tutorials and written type content. But, really, when you come into MongoDB, it's whatever you make of it. Right? Now I'm finding myself talking to customers, giving hour and thirty minute sessions on agentic systems, and that's exciting. That is very exciting because, one, there is a lot I don't know. But, also, the stuff I do know, I get to share it with people. And if you're in AI, trust me, you're gonna be excited for a very long time, because the tunnel goes deep, doesn't it?
Michael Berk [00:26:18]:
Oh my god. Yeah. Yeah. And and you can go deep in a variety of different directions. It can go deep on the high level use case. It can go deep down to the computing layer over to the side to the algorithms to distributing those algorithms. So there's a lot to learn
Richmond Alake [00:26:32]:
for sure. Yeah. And even in terms of MongoDB as a platform, there is so much. Right? So MongoDB, we act as a vector database within RAG systems and agentic systems, but it's also an operational data layer. Right? You can store any type of data on it. But we also have a streaming product feature. We have the ability for you to run MongoDB instances on edge devices. So there's so many areas.
Richmond Alake [00:27:00]:
We have MongoDB Charts, right, for time series data. So even within MongoDB, I'm like, there is so much excitement and so many areas that I have not yet gone into, because vector search and RAG is such a deep tunnel that you can go into specifically. But, yeah, it's an exciting place to be.
Michael Berk [00:27:23]:
Yeah. So let's let's dig dig a little deeper. You hinted at a couple of things. One is these sort of data cloud providers, Databricks, Snowflake, MongoDB, all the others, they have a subset of the microservices stack that was that's required to build a good data platform. What do you see MongoDB investing in from that perspective?
Richmond Alake [00:27:47]:
So definitely, in terms of product features of MongoDB, we're investing across the board. Right? So the first thing is we're solving key problems for our customers, and we have different product features to do that. But I see a lot of investment towards the future, especially AI application development. And the thing about MongoDB is we're not just doing things at a technical level. We're doing things at a thought leadership level and other areas that you can think of. So I'll give examples. In terms of the technical side, we're investing heavily in the vector search capabilities of MongoDB. So helping people to do efficient retrieval within the RAG system or agentic system. Efficient retrieval and relevant retrieval is one of the key things that is pretty much the bread and butter of vector search.
Richmond Alake [00:28:45]:
Having the functionality itself retrieve the appropriate documents based on a query, and chunking, holding different embedding sizes. But one thing we also do at MongoDB is we recognize that MongoDB is not a model provider. We're not LangChain. We're not LlamaIndex. We're not OpenAI. But we can bring the expertise of these folks together. And we have a new program called the MAAP program that I call the Avengers of AI, because what you get if you come onto the MAAP program is expertise from all these AI companies.
Richmond Alake [00:29:23]:
I'm talking the LlamaIndex, the LangChain, a lot of companies that focus on evaluation. Some companies are focused on providing these models, and we bring that expertise to our customers. So we're really meeting our customers and developers at varying levels. And that's where I see MongoDB investing a lot in as well.
Michael Berk [00:29:44]:
Got it. So what I heard was vector databases are the bread and butter. Models are not the bread and butter. And then from there, you guys look to be scrappy and solve problems at multiple levels. Is that about right?
Richmond Alake [00:29:59]:
Well, not scrappy, because, again, MongoDB has existed for over a decade now. So we have the expertise to not be scrappy. Right? We have best practices. True. We know what we're doing in terms of building scalable, performant applications. And we bring that expertise to the table, to our customers. And these customers are developing enterprise applications. So that's what we're building.
Richmond Alake [00:30:28]:
And when I say we're not model providers, I mean, we know what we do best, and that has been the data layer. So we can't go and start fine tuning or building foundation models. Right? That wouldn't be a good investment of our time. But what we can do is identify our key partners within the space and bring their expertise to our customers.
Michael Berk [00:30:53]:
Exactly. Yeah. Just as we're alluding to earlier, you can't have an AI expert. And, likewise, an organization can't be like a, like, fill in the blank expert. They need to delegate where they're not, capable or where they just don't have staff. Like, for instance, I do professional services. We don't do staff augmentation. There are millions of consultants in the world, though, that can do that.
Michael Berk [00:31:17]:
So we typically leverage partners to do a lot of the customer work when applicable. So it's cool to see the the same sort of focus and specialization. Another question. So we've been talking about RAG and, vector databases. What do you think are sort of the unsolved problems in this space? What's the cutting edge? What's the frontier?
Richmond Alake [00:31:40]:
If we're specifically focusing on the RAG architecture, unsolved problems are chunking strategies. Right? How am I chunking my data? And what I mean by that is, how am I partitioning parts of my data, my text data or audio data, into little bits that can then be embedded by an embedding model and stored in a vector database? How am I deciding where the cutoff point is? Right? And I don't think that's a massively solved problem, because there are very different approaches to chunking. Another
Michael Berk [00:32:18]:
Wait. Sorry. Question on that. Can you clarify the pros and cons of big versus small cutoff sizes?
Richmond Alake [00:32:27]:
Well, if we're talking about big chunking approaches, chunking a big corpus of text, one thing you can get is, your embedding model typically has a context window. So the model provider will probably truncate anything that overlaps the context window. So you're gonna lose some information. There's information loss. That's why you'd probably do smaller it's not embedding sizes, sorry. Smaller chunks.
Richmond Alake [00:32:56]:
But with smaller chunks, you might not have enough information to conduct efficient retrieval. So the way the retrieval works is, you get a query, then you embed the query, then you do a vector search on your query embedding and the embeddings you have stored. Now, if the embeddings you have stored in your vector database don't contain enough information to have rich semantic information within them, then the retrieval of the chunk itself and the associated metadata is not gonna be great. So that's why it's not a solved problem. There are different approaches. And personally, for my learning, because, one, I understand the RAG pipeline across all the components, but if I had to choose one that I had to go into, that I am going into, it's gonna be chunking and evaluation, LLM evaluation. So my knowledge in those needs to improve to the point where, even though they're unsolved problems, I can give people best practices.
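A rough sketch of the chunk-size trade-off Richmond describes, using simple fixed-size chunking with overlap; the word-based splitting and the sizes are illustrative, not recommended settings:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks of roughly chunk_size words, sharing `overlap` words."""
    words = text.split()
    step = chunk_size - overlap  # assumes chunk_size > overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

doc = open("my_docs.txt").read()  # your source document (path is illustrative)
small = chunk_text(doc, chunk_size=100, overlap=10)    # precise hits, but each chunk carries little context
large = chunk_text(doc, chunk_size=1500, overlap=150)  # richer context, but risks truncation at the embedding model's limit
```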
Michael Berk [00:34:02]:
Got it. Okay. And then one more question before you continue. What are your thoughts on LlamaIndex versus LangChain versus building your own versus something else?
Richmond Alake [00:34:11]:
That's a loaded question.
Michael Berk [00:34:13]:
Oh, I know. I'm well aware.
Richmond Alake [00:34:16]:
So firstly, I don't have any favorites, because the position I'm in is a developer advocate. So it's my job to know how to use all of these tools and communicate them. So if you see the articles or videos I've done, some of them cover LlamaIndex. Some of them cover LangChain. Some of them actually don't use any of these abstraction frameworks to implement the RAG pipeline or whatnot. So, which is the best? Right? For me, I find, if you want controllability of every component of a RAG pipeline, sometimes I might go into not using any of the abstraction frameworks. And what I mean by that is, there are times where I want to conduct some specific queries, right, that these abstraction frameworks might not have implemented yet. I probably wanna do that in native MQL and not use any abstraction framework to conduct any sort of filter on my data.
Richmond Alake [00:35:20]:
I'll do that directly using the dollar operators and the aggregation pipeline that we get with the MongoDB query language, which is very easy to use. The whole point of these abstraction frameworks, or data frameworks like LlamaIndex and LangChain, is they're a very easy lift and they get you to where you need to be much quicker. And they're growing. They're maturing to the point where you're able to do the things that you couldn't do previously in terms of in-depth queries. You're able to do them now. And MongoDB works very closely with LlamaIndex and LangChain as well, the engineers over there. We have integrations with both of these frameworks. So if I was to say what are the pros and cons, it's really on a use case by use case basis.
Richmond Alake [00:36:06]:
Yeah. As in, it will it will be hard for me to choose.
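As an illustration of what querying MongoDB directly looks like when you skip the abstraction frameworks, here is a hedged sketch of an aggregation pipeline that combines $vectorSearch with a metadata pre-filter; the index name, field names, and filter values are assumptions, and embed() and coll refer to the earlier quickstart sketch:

```python
# embed() and coll are the helpers from the quickstart sketch above.
query_embedding = embed("What did the 2024 annual report say about churn?")

pipeline = [
    {"$vectorSearch": {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": query_embedding,
        "numCandidates": 200,
        "limit": 10,
        # Pre-filter on a metadata field (must be declared as a filter field in the index).
        "filter": {"source": {"$eq": "2024-annual-report"}},
    }},
    {"$project": {"_id": 0, "text": 1, "source": 1,
                  "score": {"$meta": "vectorSearchScore"}}},  # expose the similarity score
    {"$match": {"score": {"$gte": 0.75}}},                    # drop weak matches
]
results = list(coll.aggregate(pipeline))
```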
Michael Berk [00:36:11]:
Okay. Cool. Thanks. So you're you're going through the unsolved problems when I rudely interrupted you?
Richmond Alake [00:36:17]:
Yeah. I think the biggest ones for me, anyway, are well, I mentioned the chunking approach. Also, LLM evaluation. I wouldn't say it's an unsolved problem, but it's a step that a lot of people tend to ignore: validating your system. Because in machine learning, right, when you're actually training a machine learning model, you actually do a validation procedure in parallel as you're training the model. So you watch your validation loss, which indicates to you how well your model is actually doing on unseen data. But within this sort of cycle of AI, a lot of people are going into production without any form of validation system in place. They're just, like, hoping that it works, on vibes.
Richmond Alake [00:37:17]:
And I've been guilty of that as well, because it's so exciting. You just wanna get something out there and solve a problem, and start using this tool. But it actually hurts in the long term. You build a lot of technical debt. So LLM evals, or LLM evaluation, is a topic that a lot of people need to start thinking of very early. I'm gonna put out a bunch of content on this. We already have a lot on MongoDB written by some of my amazing team members.
Richmond Alake [00:37:48]:
But, again, you can always go deeper. Right? So I'm looking to go deeper into LLM evals, into chunking. And another unsolved thing is agentic systems. Right? Last year, agentic systems were unpredictable. You couldn't control the execution flow of the system itself. But now you have tools like AutoGen and LangGraph that give you a deeper level of controllability and scalability of the execution graph. So agentic systems are still an unsolved problem and an area for people to explore creative approaches to enforce real reliability in these systems. And we find that the old is becoming new, so there's a lot of unit testing.
Richmond Alake [00:38:38]:
There's a lot of old practices that are coming into generative AI application development. That's what I'm seeing.
Michael Berk [00:38:46]:
Cool. How do you communicate the stochastic nature and and explain that it can be reliable to customers?
Richmond Alake [00:38:55]:
That it can't that it cannot. It can't.
Michael Berk [00:38:58]:
Can't? Is it possible to build a production system with a fundamentally stochastic tool?
Richmond Alake [00:39:04]:
Well, LLM evaluation and the procedure within that is the answer to that. Right? You can't just say, hey, trust me, bro, it's gonna work.
Michael Berk [00:39:14]:
You could try.
Richmond Alake [00:39:16]:
You could try. Right? Just like, trust me. But when you start showing, hey, we have this annotated dataset, and we've run it through the model, and it's generating content that has a low amount of hallucination, it's got high relevance, and you're able to show that in a very objective manner, then you can start to build confidence. And that's why you see a lot of tools emerging in this space. So we have LangSmith.
Richmond Alake [00:39:44]:
We have Arize AI. We have Patronus AI. And you have all of these observability and monitoring tools that are coming out to give people that confidence, because it can. And not only do they give you confidence, they show you where things can be improved as well. That's why I'm making it, at least for the next few months, a deep dive into some LLM eval tools, the same way I've done with LangChain, LlamaIndex, Haystack, and the whole data framework space. I wanna explore the ecosystem around monitoring, observability, and evaluation. But, yeah, the answer to your question is very straightforward. Right? It's just, use objective metrics to show that your system can be reliable.
Michael Berk [00:40:33]:
Yeah. I think I agree that that makes sense. It's just like with any other model. If you can show that it works in a training environment, then you roll it out slowly to a production environment and then just monitor it for drift. It's the exact same thing as any other ML model.
Richmond Alake [00:40:50]:
And that's not new. Sorry? Those concepts are not new. Right? Those are MLOps things. Right? They've been there for years.
Michael Berk [00:40:59]:
Yeah. Exactly. And I feel like the the days where Ford recommends a Chevy truck or whatever that that meme was, those are sort of starting to, like, decrease. And especially if you have a good training, evaluation step, you can really guard against things that are real hardcore hallucinations. And maybe it says not the right thing or maybe it doesn't say the right piece of information, but it's not gonna tell you how to build a bomb or, like, I don't know, set your kitchen on fire, whatever whatever the concern is. Got it. So how are if you're gonna go, like, and build this evaluation framework, I know you're gonna be looking into this in the next few months, but can you give sort of a 60 second overview of how one would approach that?
Richmond Alake [00:41:44]:
The way I'm approaching the topic of evaluation is to look at the stuff that originated from NLP. Right? NLP is not my strong point. Deep learning, like computer vision, was, as you could tell from my articles. But NLP, natural language processing, has a bunch of metrics that are used to evaluate outputs from all these natural language models. So you have stuff like the BLEU score, the ROUGE score, and you have stuff like METEOR. Those evaluation metrics are based on statistics, which is really difficult when you're trying to evaluate reasoning models like the GPTs and all the Claudes out there. It can be really difficult to use those objective metrics. So new metrics are emerging not emerging, but are here. There's a lot of metrics that are able to quantify hallucination.
Richmond Alake [00:42:48]:
There are people that just solely use the LLM as it's called LLM as a judge. So you use the LLM as the evaluator of your system, where you give it the output, you give it reference data and a bunch of prompts to make the LLM play the role of an evaluator and give you, like, a single label. For example, does this output have any information that is not in the reference text? And you can use that to detect hallucination. So you can just use LLMs for that. And you don't have to use a massive one. You can use a small LLM for that as well.
Richmond Alake [00:43:31]:
So those are the new stuff. Correctness is another metric that's coming out, and that is how I would look at it. So if I were to advise someone, I would say start with the basic stuff like BLEU or ROUGE. And when those are not perfect, then you start to move into implementing LLM as a judge. And then looking at tools. Right? Looking at LangSmith, looking at Arize AI, that make all of this such an easy lift. You don't have to build your own framework from scratch if you don't have the resources to.
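A minimal sketch of the LLM-as-a-judge pattern Richmond describes, asking a small model for a single grounded/hallucinated label against a reference text; the judge model, prompt wording, and eval examples are illustrative assumptions:

```python
from openai import OpenAI

ai = OpenAI()

JUDGE_PROMPT = """You are an evaluator.
Reference text:
{reference}
Candidate answer:
{answer}
Does the candidate answer contain any claim NOT supported by the reference text?
Reply with exactly one word: GROUNDED or HALLUCINATED."""

def judge_hallucination(reference: str, answer: str) -> str:
    """Return the judge model's single-word label for one example."""
    resp = ai.chat.completions.create(
        model="gpt-4o-mini",  # a small judge model, as Richmond notes, can be enough
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(reference=reference, answer=answer)}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# Run over an annotated eval set and report a hallucination rate.
eval_set = [{"reference": "MongoDB Atlas supports vector search.",
             "answer": "Atlas supports vector search and was founded in 1995."}]  # toy example
labels = [judge_hallucination(x["reference"], x["answer"]) for x in eval_set]
print("hallucination rate:", labels.count("HALLUCINATED") / len(labels))
```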
Michael Berk [00:44:09]:
Right. Okay. Going one level deeper, a lot of these metrics that you're referring to, like ROUGE, are supervised. And, basically, ROUGE you can think of as recall. So with that, you need a truth set. Do you have any advice on defining that truth set, the true prediction that is actually correct? I've found that to be really challenging.
Richmond Alake [00:44:31]:
Yeah. Data what's the right classification of this problem? Data collection, data sourcing has always been a problem within machine learning. You have platforms what was it called, Mechanical Turk? Yeah, that was a platform used for, like, data sourcing. Right? So I don't know if that still exists.
Richmond Alake [00:44:59]:
You can correct me if I'm wrong. I'm not sure if it still exists. But anyway, data sourcing and data collection and data annotation is an ongoing problem. And in the generative AI application space, it's still, like I said, the old is new. Right? We still have the same problem. But the way people are solving it now is synthetic data. Right? I did a live stream a couple of days ago, and I generated a bunch of synthetic data using an LLM. I gave a few examples, and I was able to create a dataset with some few-shot prompting. Now, I've seen some papers saying that there are issues with models that are being trained on synthetic data.
Richmond Alake [00:45:43]:
I haven't double clicked into those, but that is the way I would say one goes about solving these problems of data scarcity when you try to build this dataset: try synthetic data, and you can just try good old manual labeling. Yeah. Good old manual labor there is no race to the finish. So what I would say is maybe setting some time aside with your team and dedicating it to actually creating some of these annotations yourself, like people used to do back in the day. There was a time when people didn't have ChatGPT, and that's how you did it. I had to draw, like, annotation boxes on several images, and that took hours.
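A small sketch of the few-shot synthetic data approach Richmond mentions from his livestream, assuming an OpenAI model; the seed examples and output format are made up, and anything generated this way still deserves the manual review he recommends:

```python
import json
from openai import OpenAI

ai = OpenAI()

seed_examples = [  # a couple of hand-written examples (content is illustrative)
    {"question": "What does the chunking step in a RAG pipeline do?",
     "answer": "It splits source documents into smaller pieces that can be embedded and retrieved."},
    {"question": "Why can very large chunks be a problem?",
     "answer": "They may exceed the embedding model's context window and get truncated."},
]

prompt = ("Here are example question/answer pairs about RAG systems:\n"
          + json.dumps(seed_examples, indent=2)
          + "\n\nGenerate 10 more pairs in the same JSON list format. Return only the JSON.")

resp = ai.chat.completions.create(model="gpt-4o-mini",
                                  messages=[{"role": "user", "content": prompt}])
synthetic_pairs = json.loads(resp.choices[0].message.content)  # may need cleanup if the model adds prose
print(len(synthetic_pairs), "synthetic examples to review by hand")
```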
Michael Berk [00:46:30]:
Yeah. Bounding boxes, man.
Richmond Alake [00:46:32]:
Yes. And now you have these tools that do it automatically. It's crazy. Back then, you just had to, like, spend several hours. Yeah.
Michael Berk [00:46:43]:
No, I think you hit the nail on the head. It's sort of another unsolved problem. Typically, it's subject specific, and you wanna become a subject matter expert for your specific problem. And then either manually input them or guide some sort of automated system to do that. But because it's so use case specific, if I ask one question in one context, there could be 50 good answers depending upon another context. And so it's really important that you think critically about your business use case or your personal use case and ask the right questions and then define the right truth set.
Richmond Alake [00:47:20]:
Yeah. Is this what you tell your customers?
Michael Berk [00:47:24]:
Yeah. Because I don't wanna do that for them. Like, it's it's really hard to go in and become an expert in their business. And, typically, I can't. Like, it takes, I I don't know, several months to get up to speed on what are their true problems. So, for instance, I was recently working on a financial project. I just didn't have the expertise to know what was good and what was bad. So we would say, hey.
Michael Berk [00:47:49]:
We'll put out some responses. Give it a thumbs up, thumbs down. That'll be our binary label on whether it's good, and then we can start tuning accordingly.
Richmond Alake [00:47:57]:
One question I would ask is, when you're building these RAG pipelines for your customers, right, do you do an evaluation step first, in the evaluation and validation process, or do you just build out the RAG pipeline and go back? And the reason why I'm asking that is because this space is moving so fast, and a lot of people wanna see POCs and stuff in production. And when I say a lot of people, I mean the investors, everyone who thinks we're in the AI bubble. So most people building these demos in large enterprise companies might be feeling the heat, and they might say, hey, I don't have any time for evaluation. Just run on vibes. So tell me, what are you seeing, or what are you telling people?
Michael Berk [00:48:44]:
Alright. So on whether there's time for evaluation if you're not making time for evaluation, you should be fired. Just not evaluating a model and putting it into production is not how any technical thing works. That said, as you mentioned, there's a lot of heat. And right now, there's a lot of pressure, specifically from some executive that heard a podcast about ChatGPT, that we should be using Gen AI. All of their competitors are using it, so let's implement it as well. So I've been on a lot of those projects where they're just like, find a problem, solve it with ChatGPT or RAG or open source models. And that is a very different business use case.
Michael Berk [00:49:25]:
But what my job is is to ensure that we're building a reliable and stable solution that also meets the business need. So, it depends upon the customer, depends upon their angle, but evaluation is absolutely essential, unless you're doing very basic things.
Richmond Alake [00:49:41]:
Would you say everyone is doing basic things? As in, I know there are different components in the RAG pipeline that you could go deep on, but where does the complexity come in, right, that you're seeing? Because everyone is still building these RAG enabled chatbots. Right? Very few people are moving into agentic systems, which is maybe one level up. But where does the complexity come in when you say basic things?
Michael Berk [00:50:11]:
Yeah. So when I think basic, I think applying a custom LLM over a dataset that extracts a piece of information, summarizes it, that type of thing. It's it's one shot using a pretrained model, not even using RAG. The more complex things are typically RAG based with a variety of different sources, like disparate sources and also in different formats. So you need to start using SQL tools. You need to start using maybe custom API calls, and it's this complex agent chain that synthesizes a bunch of data and then returns it to the user. So that's when I when I say complex, that's typically what I'm referring to.
Richmond Alake [00:50:50]:
Okay. Okay.
Michael Berk [00:50:51]:
Yeah. What about you? What's the most complex thing you've seen?
Richmond Alake [00:50:54]:
Definitely around well, there's some cool stuff going on in the agentic space, man. The stuff I've seen makes me feel like I'm doing nothing. I'm like, dude, I need to go back and hack away. There are some agentic systems that I've seen I won't go too much into details, but it's not necessarily what the agentic system is doing, it's the components and how they're maybe doing tool selection. Right? You know, agents have a bunch of different tools. How they're doing tool selection, and tool storage as well. Some folks are programmatically storing the tools.
Richmond Alake [00:51:30]:
And how they're profiling the different entities that the system interacts with is also cool as well.
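Richmond keeps the specifics vague, so purely as an illustration, one assumed way tool storage and selection can work is to store each tool's description with an embedding and vector-search for the few tools relevant to the current request; the collection, index name, and tools below are invented:

```python
# tool_coll is a MongoDB collection (assumed); embed() is the helper from the quickstart sketch.
tools = [
    {"name": "get_invoice", "description": "Fetch an invoice by customer id and month."},
    {"name": "refund_order", "description": "Issue a refund for a given order id."},
]
for t in tools:
    t["embedding"] = embed(t["description"])
tool_coll.insert_many(tools)

def select_tools(user_request: str, k: int = 3) -> list[dict]:
    """Return the k stored tools whose descriptions best match the request."""
    return list(tool_coll.aggregate([
        {"$vectorSearch": {"index": "tool_index", "path": "embedding",
                           "queryVector": embed(user_request), "numCandidates": 50, "limit": k}},
        {"$project": {"_id": 0, "name": 1, "description": 1}},
    ]))

relevant = select_tools("The customer wants their money back for order 1812")
# Only `relevant` is exposed to the LLM as callable tools for this turn.
```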
Michael Berk [00:51:38]:
Come on. Give a little more detail. Just a little. We won't tell anyone.
Richmond Alake [00:51:43]:
That is some detail that I've just given. The tool storage and the tool selection process is very interesting, because if you think about it, right, if you're building complex systems that are meant to interact with a bunch of systems, what you're gonna have is a very convoluted Python file, right, of just different functions and whatnot. So what's a good way to store that and actually retrieve that? There's some cool stuff I've seen some folks do. And that's the complexity not the system itself or the infrastructure and the way you're building out the system. For me, that's some cool stuff. But also in terms of the variety of data types as well, man there was an interesting use case where an application was doing pinpoint frame selection based on a prompt. So imagine a movie I have this with my wife all the time, where I'm trying to get her to remember a movie, but I can't remember the name of the movie. But I can remember a scene in the movie. Right?
Richmond Alake [00:52:56]:
So the idea was, you could describe the scene to it, and then it brings up the particular scene. If you think about it, for a system to be able to do that, you have to embed pretty much every frame of every scene of the movie, or at least maybe every, like, 10 frames or so. And you have to do different chunking and retrieval steps. And yeah, that was a bit complex and cool, because I'm like, wow, I need this.
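A sketch of how that scene-search idea could be wired up, under the assumption of sampling every Nth frame and embedding frames and the text query into the same space with a CLIP model; the file path, sample rate, index name, and collection are illustrative:

```python
import cv2
from PIL import Image
from sentence_transformers import SentenceTransformer

clip = SentenceTransformer("clip-ViT-B-32")  # embeds images and text into the same space

video = cv2.VideoCapture("movie.mp4")
fps = video.get(cv2.CAP_PROP_FPS) or 24.0
frame_docs, i = [], 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if i % 240 == 0:  # sample roughly one frame every 10 seconds at 24 fps
        img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        frame_docs.append({"t_seconds": i / fps, "embedding": clip.encode(img).tolist()})
    i += 1
frames_coll.insert_many(frame_docs)  # frames_coll: a MongoDB collection (assumed)

# "The scene where they dance in the diner" -> timestamps of the nearest stored frames.
query_vec = clip.encode("the scene where they dance in the diner").tolist()
matches = frames_coll.aggregate([
    {"$vectorSearch": {"index": "frame_index", "path": "embedding",
                       "queryVector": query_vec, "numCandidates": 200, "limit": 3}},
    {"$project": {"_id": 0, "t_seconds": 1}},
])
print([m["t_seconds"] for m in matches])
```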
Michael Berk [00:53:29]:
Yeah. For real. It's like Shazam for movies, but describing the movie.
Richmond Alake [00:53:33]:
Describe exactly. It's like
Michael Berk [00:53:36]:
Does that exist? I guess, as you said, it's a lot to index all that. But
Richmond Alake [00:53:40]:
It would be such a useful tool. It would save me a lot of looking, doing a bunch of charades and whatnot. Yeah.
Michael Berk [00:53:49]:
I would pay 99¢ for that on the App Store for sure. Okay. Cool. So I know we're sort of coming up on time. I wanted to wrap with one final question. Back to the career stuff. Do you have any advice for people that maybe started in any portion of your career and that would want to enter into a developer advocate role or specifically enter the AI ML space. Do you have any advice for them?
Richmond Alake [00:54:15]:
To enter into the AI ML space, to enter into the AI ML space is easier than it was back when I did it because I went to go buy a huge textbook. I went to university to get a master's. Now you can watch a YouTube video. You're good. Or you can watch a bunch of several YouTube videos. And I don't think I might be wrong with this. I don't think people put too much emphasis on certifications anymore. You could correct me if I'm wrong.
Richmond Alake [00:54:44]:
Back in the day, people were treating certifications like collecting Pokémon. That's fine. And everyone was asking, hey, have you got a certification in this? I was like, no, I don't. And I don't know if it's still the same vibe, if people are as much into certifications as they used to be back in the day. And I keep saying back in the day. Good lord.
Richmond Alake [00:55:11]:
I'm not that old. This was just a few years ago. I don't know. I think it's changed, but there used to be such blurred lines between what a machine learning engineer does and what a data scientist does. So if you're looking to go into AI at this point in time, there's loads of learning resources, so you gotta be able to filter out the signal from the noise. But knowing what role you wanna go into I feel like it might still be as ambiguous as it was back then, but the only difference is you're expected to do some stuff with LLMs in whatever role you're in now. You could be a data scientist or data analyst or machine learning engineer.
Richmond Alake [00:55:53]:
You better know how to call some APIs, boy. But yeah. So the general advice is what I said right at the beginning: if you're looking to go into AI and ML, just try to be an expert learner. Don't try to map it out. That's what's worked for me. I didn't try to become a developer advocate. I just did advocacy stuff while I was trying to become an expert learner. And it turns out this is a full time job.
Michael Berk [00:56:27]:
Yeah. No kidding. I completely agree. The way that you get promoted is by doing the role above you for long enough, well enough. Likewise, if you wanna do a lateral transfer, the way that you do that is by doing that role until someone is like, damn. We kinda want you to do this role full time. We can't live without you. So a 100% agree.
Richmond Alake [00:56:49]:
We can't live without you.
Michael Berk [00:56:51]:
I mean, that might be a bit extreme. But yeah. Okay. Cool. Lots of cool stuff. Let me summarize some of the key points that I found
Richmond Alake [00:57:02]:
really interesting. I would I would like to get your opinion. How would you advise folks to, like, go into AI and machine learning?
Michael Berk [00:57:13]:
I don't have a concise answer. I think it depends upon where you are. If you are in the data space, try to use AI and ML in your existing application. So if you're, like, an analyst, or if you're even, like, a finance person, maybe try to apply some machine learning to that. If you're really far away from the industry, it's tough. You just gotta go hit the books and start hacking on side projects. I think Kaggle is kind of overrated, but doing something you're truly passionate about makes it sustainable. So for instance, in college, I did, basically, a random forest model on a bunch of NBA and MLB data to bet on DraftKings.
Michael Berk [00:57:55]:
And I learned so much about ETL, how to manage models, what the hell a random forest is, and that kept me going. Like, I would actually wanna work on it at night instead of being like, damn. Let's go get me that ML job. It was just a passion project. So I guess those are the core pieces of information. If you're in data, try to apply existing models to your use case. If you're far away, find a passion project that can keep you going for however long it'll take to get the skills.
Richmond Alake [00:58:22]:
Your answer is very good. It makes mine look kinda rubbish. As in, my answer was like, go with the flow. As I said, your answer was very
Michael Berk [00:58:31]:
There is something to that, though. I completely agree. Like, don't overplan. Like, maybe keep your eyes open and try to see what opportunities arise and then pounce.
Richmond Alake [00:58:40]:
Yeah. Yeah. Exactly. Seize the moment, seize opportunities. Yeah. I like this. And the thing about, I'm just gonna bring it back to MongoDB, is there's so much you can do here in terms of the different types of roles we have. I'm, I guess, surprised every time I'm here. We have a venture capital arm within MongoDB.
Richmond Alake [00:59:03]:
I'm like, you wanna be a VC? You can be that within MongoDB. I'm like, damn. Like, that is very interesting. For someone coming from a very startupy background, it's very, very interesting to see the variety of roles you can play here. And you can build your career horizontally even within the same company. For me, I was a web developer and I transitioned into AI, and I had to go to university to do it. But even in a company like MongoDB, you can just do that by talking to the right people, showing them the skill sets you have, and making that transition horizontally to a different team. So one thing I'll add to the show notes is the careers page for MongoDB, in case people wanna make that transition into AI or machine learning or wanna just work with some passionate people.
Michael Berk [00:59:54]:
Yeah. Is there an area that you guys are hiring for specifically, or is it across the board?
Richmond Alake [01:00:00]:
I think if you look at the careers page, we're definitely hiring across the board. There's gonna be something there. Definitely a page that people should keep their eye on. MongoDB has grown exponentially over the years, especially with this AI stuff, and we definitely need some talented people, and not just in coding, people that are just passionate about this. It's very exciting. Like, I really mean it when I say I'm in a company where I come off my calls with customers, or I come off my calls with my team members, and we're having conversations like this, where we're passionate about the stuff we're doing. We're debating on what method is better, or how we can decide which one is the best, or what to even talk about to our audience. Yeah.
Richmond Alake [01:00:50]:
It's exciting.
Michael Berk [01:00:51]:
Yeah. Yeah. And to elaborate a little bit, I have a buddy that was one of the early software engineers, back end engineers, at MongoDB back when it was this startup called Mongo. And, yeah, he raved about the culture. He's since moved on, but it's a really cool organization. And the AI space is growing really fast, so I definitely recommend it if you're interested. Cool. So some key points.
Michael Berk [01:01:20]:
One thing is if you're looking to truly learn something, write about it, use the Feynman technique. That'll show the gaps in your knowledge. Another thing is let curiosity and boredom drive your career trajectory. Excitement and hope, once that goes away, typically, you know, it's time to move on. More tangibly, when choosing a wrapper framework around your RAG application, LangChain and LlamaIndex get you to production faster, but sometimes you have to roll your own solution for edge cases. And then some cutting edge things that are currently unsolved, but hopefully will be solved pretty soon. 1 is evaluation of these models. 2 is how to chunk.
Michael Berk [01:01:54]:
And 3, agentic systems. They're a bit more stable than they were a year or 2 ago, but there's still a lot of work that needs to be done in each of these three areas. And then finally, there are no AI experts, just expert AI learners. So good. Richmond, if people wanna learn more about you, MongoDB, or your work, where should they go?
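To make that wrapper-framework-versus-roll-your-own tradeoff from the recap concrete, here is a minimal sketch, assuming a MongoDB Atlas collection "docs" with a vector index named "vector_index" and OpenAI embeddings; every name here (collection, index, filter field, connection string) is a placeholder rather than something discussed in the episode. The framework path gets a retriever running in a couple of lines, while the raw aggregation path exposes pre-filters and score projection that a wrapper may not surface yet.

from pymongo import MongoClient
from langchain_openai import OpenAIEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch

coll = MongoClient("mongodb+srv://<user>:<password>@cluster.mongodb.net")["demo"]["docs"]
emb = OpenAIEmbeddings(model="text-embedding-3-small")

# Fast path: the framework embeds the query and runs the search for you.
store = MongoDBAtlasVectorSearch(collection=coll, embedding=emb, index_name="vector_index")
hits = store.similarity_search("How do I rotate my API keys?", k=3)

# Roll-your-own path: embed the query yourself and call $vectorSearch directly,
# which lets you add a pre-filter (the "source" field is a hypothetical filter
# field that would need to be declared in the index) and project the score.
qvec = emb.embed_query("How do I rotate my API keys?")
hits_raw = list(coll.aggregate([
    {"$vectorSearch": {"index": "vector_index", "path": "embedding",
                       "queryVector": qvec, "numCandidates": 100, "limit": 3,
                       "filter": {"source": "runbooks"}}},
    {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
]))

In practice, teams often start on the framework path for speed and drop down to the aggregation pipeline only for the queries the abstraction can't express yet, which matches the use-case-by-use-case framing from the conversation.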
Richmond Alake [01:02:14]:
Definitely check out our developer center on MongoDB, where you can see a bunch of interesting folks like myself that create content. Also, we have the mongodb.com page, which can take you in different directions within MongoDB. You can find me on LinkedIn. Reach out to me. I love the LinkedIn messages. The messages people send to me, I reply to them. I try to reply to them. I treat LinkedIn like it's Twitter, but people tell me I'm not funny.
Richmond Alake [01:02:46]:
But
Michael Berk [01:02:47]:
I did see a post about your skincare routine.
Richmond Alake [01:02:52]:
Oh, no. No one asks me about that as well. You really finished reading the post? That was clickbait.
Michael Berk [01:03:00]:
Well done. I did click on it.
Richmond Alake [01:03:02]:
Excellent. So your next article, just write about your skincare routine in the headline. Then bring them into some content about random forests. You've just got to be honest anyway.
Michael Berk [01:03:15]:
Yeah. It's a well known fact that machine learning engineers really care about their skin. So perfect.
Richmond Alake [01:03:23]:
This has been a fun conversation. I'll put in the show notes some resources: we spoke about the RAG notebooks to help people get started. I'll put in the show notes the GenAI repository, one of the best repositories that you can basically spin up to build different types of RAG pipelines with different levels of the stack. I'll provide the careers page as well, and the developer center. Amazing.
Michael Berk [01:03:49]:
Alright. Well, this has been a lot of fun. Until next time, it's Michael Burke, and have a good day, everyone.
Richmond Alake [00:01:56]:
I don't get guilty for using listicles because I don't write a lot of listicles. But what's a good name and a bad name for naming content? I normally do bad names. I'm a I'm in a bad name category as in I'm doing more content in on YouTube, and I don't understand how to name my videos. Sometimes I'm thinking like you just put as much technical words or make it as descriptive as possible. But then I see videos that do, like, several 100 of 1000 of views, and it's it's like a sentence. It's like cleaning my computer or something.
Michael Berk [00:02:28]:
Mhmm.
Richmond Alake [00:02:29]:
And they got millions of views. I'm like, I'm doing this this YouTube this medium game wrong. So, I mean, changes from platform to platform. So on medium, good good content, good title for your content, it's really something that will make people click. And what I do is any article that I find myself clicking on, I save that title in a notebook in a in a notebook and because I'm clicking on it. There must be a reason why that title is very good. So I try to so I have is, like, a list of different titles, and I try to, like, make my content title match that back when I used to write more frequently on Medium.
Michael Berk [00:03:10]:
Got it. Okay. Cool. So, do you mind if we take a step back? Can you clarify and sort of elaborate on your content creation journey? And I'm curious as someone who, like, wrote a medium blog, who then entered this podcast, who then became a host of the podcast, who then used that to join Databricks, content creation was very essential for my career path. I was wondering if you felt the same way or whether it hurt, helped. What are your thoughts?
Richmond Alake [00:03:39]:
For me, content creation has definitely helped. So, we probably have the similar similar background in where I started creating content just so I can reinforce my learning. So I started creating content back when I was in university, doing my master's in in what we call AI, and I wasn't really understanding the very hard stuff. So I was like, hey. Let me actually write this. It was gonna force me to go deeper into topics. And it it really helped build my, one, my knowledge, and 2, my reputation within the space that I'm in, which is the AI, the machine learning space. So back in the beginning, I did a lot of computer vision content, and deep learning content.
Richmond Alake [00:04:24]:
And that led me to doing, at least 2 engagement, which was one was a live training course with O'Reilly, and another was a computer vision course on with O'Reilly. And then that led me to writing for NVIDIA. And a bunch of gigs have just come from me just being present online. And in fact, me being in MongoDB is actually off my medium articles. It's not off any of my previous well, most of my previous career, technical career, gigs definitely played a role, but I was explicitly told, hey. We saw what you did on Medium. It's great. Come and do the same thing here at MongoDB.
Michael Berk [00:05:06]:
Interesting. Okay. So to that end, I've seen similar roles at Databricks where we sort of leverage personal social media accounts and the ability to write content and then sort of repackage it as Databricks advertising almost. And on the most technical side, it can be in the form of tutorials. On the least technical side, it can be in the form of this is a fancy new concept called Gen AI. Consider using it. What is your day to day look like?
Richmond Alake [00:05:34]:
Wow. That's a very that's a very interesting question. So my day to day, my primary role as a developer advocate is to educate developers and our customers on how they can use MongoDB for their AI applications. Now the way I educate is in different forms. So it could be tutorials like you mentioned. This could be tutorials on written tutorials like we do on Medium. And I also do, tutorials on YouTube, which could be, like, live streaming like we are doing now, where I go for a bunch of notebooks or a bunch of code. And I also have different forms of engagement in my day to day role.
Richmond Alake [00:06:13]:
Sometimes, we create notebooks that live on GitHub repositories, tutorials on GitHub repositories. And there are times I'm talking to customers. So I've I've as you know, Gen AI is everywhere, so I've had extensive, conversations with customers over MongoDB building AI applications. And we're really exploring some edge use cases. And and at MongoDB, we're a database company, but we're supporting customers on every level, including AI knowledge and full leadership.
Michael Berk [00:06:43]:
Yeah. Okay. So lots of different types of content creation. And, yeah, to to the comment on MongoDB supporting a lot of use cases, yeah, at Databricks, we only have good things to say from what I've heard. There's actually a customer that I'm working with that uses Databricks for ETL, then MongoDB as the serving layer. So, yeah, if we're still in sort of the microservices era where lots of things can plug and play, you can switch in and out components. Got it. So in your personal opinion, what is the best type of content creation for personal learning?
Richmond Alake [00:07:24]:
The I was reading this book called, Hidden Potential by Adam by Adam Grant. And he was arguing in the book where he was saying that most people don't generally generally learn by writing. Right? It's just, an assumed, it's just an assumption, or maybe I'm paraphrasing very, very wrong. But I find the best way or the best form of content for learning is just typing. It's it's the closest to writing. Right? It's just type. I know people can learn by just doing a podcast like this, and they pick up good stuff, and they could go deep dive later on in their own time. But for me, just writing because when you when I was writing medium articles, when I'm writing articles and tutorials about MongoDB, I I cannot say a technique or a word and not explain it to yourself I'm writing for, because then I'm just making things a bit ambiguous.
Richmond Alake [00:08:24]:
Right? So I have to explain every technique, and that allows me to double click into things I don't know and fills my gap, my knowledge gap. So, that that is the best form. Just writing the it writes in tutorials, typing up. I find it more better than doing videos on, YouTube. If you watch some of my livestream, you might just see me just sweating profusely and just just, I remember one of my last livestream, I had a I had a coughing fit. So, that doesn't happen with tutorials, does it? So yeah.
Michael Berk [00:08:56]:
Usually. Yeah. So I think you're referring to the the concept of the Feynman technique where you go and, learn through teaching. And if you can't explain something simply, you don't know it well enough. And so by writing it out, you sort of list out all of the core components. And then as you're trying to describe each component, if you can't do that, you go and learn it. And that's sort of like a test to see if you truly know something. Do you find that writing implementations via code does the same thing, or should it be via words?
Richmond Alake [00:09:31]:
What do you mean by write and implementation by code? Do you mean pseudo code?
Michael Berk [00:09:35]:
No. I mean, let's say we are hooking up a vector database in MongoDB, and we want to build some sort of RAG application based on that. Is that a better method of learning, or would writing a blog be sufficient?
Richmond Alake [00:09:50]:
I think writing I think first, if you're gonna build a RAG pipeline, using MongoDB, we have loads of tutorials that will help you just get something started up in in, like, 5 minutes. Right? Literally, we have a Colab notebook. Just run all the cells, put in all your MongoDB configuration, and you have a RAC pipeline. Right? So that that will help you get started. Now if you want to learn because a RAC pipeline has different components. There is a there is a chunking. There is the embedding. There are different aspects of a RAG pipeline that you can dive into, and we have the resources that can help you to learn that.
Richmond Alake [00:10:28]:
So if you wanna get started up and just do RAG, we have notebooks. We have a GEO repository, which we'll link in the show notes. But if you really want to learn, then I would say every components of the RAG pipeline, you wanna probably double click into and, explore that topic. You can spend several months in in areas of different rack pipeline. And now we're moving into a AgenTek systems, which have their own, complexity in terms of system architecture and other areas that you probably need to fill some gaps. Because Ingentec systems are making the old the old are becoming new with Ingentec systems. Right? Unit testing are now a thing in AI. But yeah.
Richmond Alake [00:11:12]:
So, it's something that that, essentially, if you just wanna build something and get started, we have we have to toys for that. Or if you would like to just have that full leadership, have very in-depth knowledge, technical knowledge, then I would say deep diving into what you're actually implementing helps. Free content.
Michael Berk [00:11:35]:
Yeah. Yeah. I think I agree. The my approach is typically if you have a quick start notebook, that can get you sort of 50% of the way there. You know how to implement. You know how to generally make something run and not fail. But if you want to become sorta a next level developer, it's really helpful to go and actually understand what's going on. So not to get into the vector database stuff, but what is the embedding model? What does it do? How is search performed? Is there hybrid? Is there filter? All these types of things are actually really, really relevant in your use case.
Michael Berk [00:12:06]:
That said, people don't have enough time to become experts in everything. So another question or sorry. Go ahead.
Richmond Alake [00:12:13]:
I was gonna say and you don't need to be an expert in everything, is there any Exactly. I I was talking to a few folks, and I was like, there's no such thing as an AI expert. The only thing that anyone could be is be an expert learner, and that's it. Oh, I like that. That's what I'm trying to be. So I'm not a expert in AI, but if there's something I don't know, I know how to learn it.
Michael Berk [00:12:37]:
Right. Yeah. How do you define that cutoff, though? When should you be like, alright. This is good enough. I quote, unquote know it.
Richmond Alake [00:12:49]:
For me, like, doing this thing is fun. Doing AI is fun for me. So I I I was doing a full time job and writing content on medium after my 9 to 5. Now I get to do it as or I get to write content and do different variety of content of my full time job. So when I find when I dive into topic and I explore it, I don't typically have a carve because just to give you an example, I went into uni back in 2018 focused on computer vision and deep learning. I was writing those stuff on medium back in 2018. In 2024, I'm going to release a computer vision course. So many years later.
Richmond Alake [00:13:31]:
For me, there is no carve. It's just, hey. The the stuff I learned in 2018, I've forgotten. So it just it feels good to just relearn stuff and just go back to the roots of deep learning. Look at all the different architectures and of of of of newer networks and how they've progressed over the times and discover things that are new. Right? That's fun for me. So if we're talking about a cutoff, for me, there isn't. It's when I I for an article, there is because you have a aim of an article.
Richmond Alake [00:14:03]:
You have a a goal on what you wanna teach. But in terms of a field, as in it's it's exciting just being being, involved in this.
Michael Berk [00:14:13]:
Yeah. I I definitely feel that. In in most Databricks engagements that I'm put on, I have no idea what I'm doing, and I'm gonna I'm rarely, like, that upfront about it. But when you say, hey. I have this problem. Solve it. I don't know the solution, but I know that I can get there. And so that typically drives what I learn.
Michael Berk [00:14:31]:
And if I can build it to meet the success criteria, then I say I know it generally, and I generally stop learning about it. But as you said, there's so much cool stuff. Like, you could spend many lifetimes deep diving into all this.
Richmond Alake [00:14:45]:
Yeah. Is it until I get to the to the level of Andre Capaldi, that's where the call is. Right?
Michael Berk [00:14:52]:
Yeah. And I've even he has a bunch of blind spots, I'm sure. Like, it's just too much for one person to know. Okay. So I'm revisiting your LinkedIn and going through the the the insane path of graduate software developer, full stack JavaScript, contract web, contract web, founder, course instructor, computer vision, program lead, it goes on. So why so many disparate jobs? And I understand that that's sort of an artifact of being a contractor, but, did they did all these different experiences help you? Did you enjoy that, and did you intentionally set out to have this path?
Richmond Alake [00:15:35]:
Okay. Let me break. Did I intentionally set out to have this path? Not a lot of people know this, but me coming into the whole computing world was a mistake. In the UK, when you go into college, which is this in between limbo state between secondary school and university, you do it for 2 years. When you go into college, you have to you have to choose 4 subjects. I chose 3, and I chose, chemistry, physics, and maths. But you have to choose 4. So I came into the induction day with just 3 topics, and they weren't gonna let me enroll because you have to choose 4.
Richmond Alake [00:16:15]:
And I just said, hey. Choose 4 me. And, the person putting me into the college chose, computer. And that's how I came to this path. And the only reason why he chose computer was because he was the computing teacher. And, here I am today over talking about AI all because of, of his women charts. But really, my career trajectory ever ever since then has been very it's been what makes me excited. So I used to be a web developer, full stack web developer.
Richmond Alake [00:16:50]:
It's it's something that, is one thing that made me fall in love with, the MongoDB technology. I see focus on the main stack. And, I did that for a number of years, then I got bored. You would notice that I moved from places to places because I guess when you're young, you can get very bored easily and you wanna be excited, you wanna keep learning. But I was a web developer, and I feel like I reached a ceiling. And there wasn't much I could learn in terms of, in terms of breath, but I could definitely have gone deep. I felt like I was deep enough. Because where I was as a contractor, for you to become a contractor, back then anyway, you had to be you had to be a what you call, like, a specialist, someone that could come in and solve a very particular problem.
Richmond Alake [00:17:41]:
So you will notice those contract gigs are for about 3 months or 6 months because what we're doing in those contract gigs is the the team are experts that come in to build a certain system or certain front end application and deploy, and you're done. That's it. So you have to know what you're doing, be very good at solving problems, and finding solutions. So that was, like, the stage I was at when I was, a web developer. And I got bored, and I decided to go to the next challenge, which was AI. And I bought this massive textbook, which which, scared me to university to go through a mile. I was like, wow. AI is so difficult.
Richmond Alake [00:18:20]:
There is no way I'm gonna learn this by myself because I tried to do the self the self learning thing. And I went to university to get a master's, but, today, people are becoming AI experts of YouTube videos, apparently. So I got a bit the bit the wrong end of the deal. But, that that explains, like, the different paths. Right? And, the instructor or me writing for NVIDIA or writing in other places, Really, that's just stuff I do outside my 9 to 5 because, again, this is fun for me. I like learning, and I like sharing what I'm learning about. So, I wrote on Medium. I wrote I write in a lot of places.
Richmond Alake [00:18:59]:
I write for net I wrote for Neptune AI, which is more MO ops focused. And I wrote for NVIDIA, which was more data science focused. And I wrote, I taught on O'Reilly, which was computer vision focused. So just dabbing into the old bits and just learning.
Michael Berk [00:19:15]:
Got it. Why do you write? Is it solely for your personal gain for learning, or do you like giving back?
Richmond Alake [00:19:26]:
It is selfish.
Michael Berk [00:19:28]:
So That was the question. Yes.
Richmond Alake [00:19:31]:
Yeah. Yeah. It's it's just so I can learn. Because, again, I'm not I'm not the 8 year old kid or that started programming. I'm not the kid that started programming when I was 8 years of 13. I came into this field by accident, so I had a lot of catching up to do. And, everything just didn't click initially for me, so I just had to I had to spend double the time and double the effort that I think it took over people to. So it's just a practice for me, and I was just happy.
Michael Berk [00:20:00]:
Yeah. No. I agree. I'm the same way. I'd I'd if people get something, great, but it's more for my my personal benefit. Do you think that sort of roundabout path helped you?
Richmond Alake [00:20:14]:
Helped me in law.
Michael Berk [00:20:16]:
That's a good question. So you could have studied computer science at plus technical writing. I'm sure there's degrees somewhere out there that combine the 2. You could have also gone directly into this AI ML route, and you could have theoretically not taken all these disparate roles. But you got a lot of exposure to many different types of work, many different problem sets. Do you think that exposure helped you in your career and what you're doing right now, or would you have rather been sort of a a super focused specialist?
Richmond Alake [00:20:50]:
Well, if I if I wanted to become a specialist, then I guess I would have stayed a dev a web developer and had that depth. Right? So the reason why I'm here now is because I am not focused. I'm just moving around. Right? That's why I'm I'm I'm where I am today. I always tell people, like, I would like to say that everywhere I am now, I planned it all. It is all it's all part of my grand plan because, my close friends are like, wow, Richmond, you work at MongoDB now. And I I used to be a big fan of the technology. Such a big fan that I bought the stocks when the IPO'd.
Richmond Alake [00:21:26]:
And I used to, like it it was just free. It was just free stocks. I was a university student, so I didn't have much. It was a 100 pound worth. So to some to my friends, they're like, yeah. You definitely plan this trajectory. I'm like, no. I'm just going with the flow.
Richmond Alake [00:21:40]:
Right? And one thing at MongoDB is you will find that most people in in MongoDB are very passionate about what they're doing, and they're very passionate about the technology. So there's a lot of content creators in MongoDB. There's a lot of hackers in MongoDB. People that are just working their way through their own products or working through their way through side project. In my in my interview at MongoDB, we were showing each of our side projects, and that was my well, that was one of the code interviews just talking through talking through the codes and the lines and and someone's, hobby projects, and he would just gauge in my my expertise. And it's what you find I don't think it's unique to MongoDB. I think it might be a a byproduct of being in AI. It's just fun to just build stuff.
Richmond Alake [00:22:30]:
Right?
Michael Berk [00:22:31]:
Very fun. Yeah. Okay. Interesting. I definitely wanna talk about that. One more question on career trajectory, though. You said sort of boredom and curiosity were the driving factors. How do you know you're bored?
Richmond Alake [00:22:48]:
Good question. How do I know I'm bored?
Michael Berk [00:22:56]:
And I can elaborate a little bit. Like Yeah. Yesterday, I was bored. I was in a call. They were talking about sort of in an inefficient way, they were talking about not the interesting things that I had done a 1000000 times, and so I checked my phone. That is boredom. Should I leave my current career or my current position just because of that? Maybe, maybe not. So, like, how many days in a row of being like, damn, this kinda sucks, or, like, what is sort of the the inflection point when you're like, I should probably switch? What does that feel like to you?
Richmond Alake [00:23:31]:
So if you look at my career path, I've switched industry. So from web to AI, I'm not afraid to go back to uni. Then I've also switched different companies and and and job types. So when when do I know that I need to switch? I guess it's just instincts. Right? When you're no longer excited, maybe it's like a relationship. You know what I mean? You just stay in the bad relationship. You're like, this is going nowhere for me. As in people stay in bad relationships for several years before they leave.
Richmond Alake [00:24:03]:
Don't they? Right? So, for me, I'm like I I kinda like detect it very early on. Like, hey. I am not excited about this. I'm not excited about being in this space. And it's not just one meeting now. That would be crazy because I'll be in very in a lot of different companies. But, it it it's it has to be for a period of time where I find myself not being excited. What that looks like, I don't know.
Richmond Alake [00:24:33]:
Maybe I stop writing about this particular topic or I stop talking about it or I stop feeling passionate about it. So it just signals to me like, hey. You're no longer excited about this. This thing that you're doing is becoming it's becoming a job. Go look for something you're excited about.
Michael Berk [00:24:53]:
Yeah. I I think what you're saying resonates. I the way that I look at it is I'm a very hope driven person. If I'm not hopeful, if I'm not excited, a lot of life becomes less fun. There's less appeal. And so there's sort of it's a soft line, obviously. There's no, like, I have x points of hope. Now I need to switch.
Michael Berk [00:25:16]:
It's a soft line that's intuitive, but hope and excitement are definitely the way that I gauge if I'm in the right place. It's it's just also so productive to be excited and hopeful because you're you work well. You're you have better ideas. You're more efficient. You wanna get up in the morning. But if you're not having that excitement and hope, maybe it's
Richmond Alake [00:25:35]:
time to switch. One thing I I I mentioned earlier on is, I was brought into MongoDB to focus on tutorials and written type content. But, really, when you come into MongoDBs, wherever you make of it, right, now I'm finding myself talking to customers, giving an hour 30 minute sessions on agentic systems, and that's exciting. That is very exciting because, 1, there is a lot I don't know. But, also, the stuff I do know, I get to share it with people. So it's and if you're an AI, trust me, you're gonna be excited for a very long time because, the tunnel goes deep, doesn't it?
Michael Berk [00:26:18]:
Oh my god. Yeah. Yeah. And and you can go deep in a variety of different directions. It can go deep on the high level use case. It can go deep down to the computing layer over to the side to the algorithms to distributing those algorithms. So there's a lot to learn
Richmond Alake [00:26:32]:
for sure. Yeah. And even even in in terms of MongoDB as a as a platform, there is so much. Right? So MongoDB, we we act as a vector database within RAG systems and agentic system, but it's also an operational data data layer. Right? You could store any type of data on it. But we also have a a streaming product feature. We have, ability for you to run MongoDB instances on edge devices. So there's so many areas.
Richmond Alake [00:27:00]:
We have MongoDB charts, right, for time series data. So even within MongoDB, I'm like, there is so much excitement and so many areas that I have not yet gone into because vector such a rag is such a deep tunnel that you can go into specifically. But, but, yeah, it's, it's it's, it's an exciting place to be.
Michael Berk [00:27:23]:
Yeah. So let's let's dig dig a little deeper. You hinted at a couple of things. One is these sort of data cloud providers, Databricks, Snowflake, MongoDB, all the others, they have a subset of the microservices stack that was that's required to build a good data platform. What do you see MongoDB investing in from that perspective?
Richmond Alake [00:27:47]:
So definitely definitely in terms of product features of MongoDB, we're investing very, very, we're investing across the board. Right? So first thing is we're solving key problems for our customers, and we have different product features to do that. But I see a lot of investment towards the future, especially AI application development. So not just and the thing about MongoDB is we're not just doing things at a technical level. We're doing things at a full leadership level and and other areas that you can, you can think of. So I'll give examples. So in terms of technical side, we're investing heavily on the vector slash capabilities of, of MongoDB. So helping people to do official retrieval within the RAG system or a GENTEK system, the the re efficient retrieval and, relevant retrieval is one of the key things that is is pretty much the the the bread and butter of vector search.
Richmond Alake [00:28:45]:
Having the functionality itself retrieve your appropriate document based on a query and chunk chunking in different, hold in different embedding sizes. But one thing we also deal about MongoDB is we recognize that MongoDB is not a model provider. We're not we're not, we're not line chain. We're not Lama index. We're not OpenAI. We're not we're not. But we can bring the expertise of these folks together. And we have a new program called the map program that I call it the Avengers of the AI because what you get if you come onto the MAP program is expertise from all these AI companies.
Richmond Alake [00:29:23]:
I'm talking the Lima Index, the Lamchain, a lot of companies that focus on evaluation. Some companies are focused on, providing these models, and we bring that expertise to our customers. So we're really meeting our customers and developers at varying level. And that's that's where I see MongoDB investing a lot in as well.
Michael Berk [00:29:44]:
Got it. So what I heard was vector databases are the bread and butter. Models are not the bread and butter. And then from there, you guys look to be scrappy and solve problems at multiple levels. Is that about right?
Richmond Alake [00:29:59]:
Well, not scrappy because this the the use case we're dealing with from again, MongoDB has existed for for for over a decade now. So we have the expertise to not be scrappy. Right? We have True. Best practice. We we they we know what we're doing in terms of building scalable performance application. And we bring that expertise to the table to our customers. So, and this is customers are developing enterprise applications. So that's what we're building.
Richmond Alake [00:30:28]:
And when I say we're not model providers, I mean, we know what we do best, and that has been a data layer. So we can't go and start fine tuning or building foundation models. Right? That would not be a good investment of our time. You're not gonna that wouldn't be a good investment of our time. But what we can do is identify our key partners within the space and bring their expertise to our customers.
Michael Berk [00:30:53]:
Exactly. Yeah. Just as we're alluding to earlier, you can't have an AI expert. And, likewise, an organization can't be like a, like, fill in the blank expert. They need to delegate where they're not, capable or where they just don't have staff. Like, for instance, I do professional services. We don't do staff augmentation. There are millions of consultants in the world, though, that can do that.
Michael Berk [00:31:17]:
So we typically leverage partners to do a lot of the customer work when applicable. So it's cool to see the the same sort of focus and specialization. Another question. So we've been talking about RAG and, vector databases. What do you think are sort of the unsolved problems in this space? What's the cutting edge? What's the frontier?
Richmond Alake [00:31:40]:
It was it was specifically focusing on the RAG architecture. Unsolved problems are chunking strategies. Right? How am I chunking my data? And what I mean by that is how am I, partitioning parts of my data or my text data or audio data into little bits that can be then embedded by an embedded model and stored in a vector database? How am I deciding where the cutoff point is? Right? And I don't think that's a massively solved problem, because they have they have very different, approaches to chunking. Another
Michael Berk [00:32:18]:
Wait. Sorry. Question on that. Can you clarify the pros and cons of big versus small cutoff sizes?
Richmond Alake [00:32:27]:
Well, the if we're talking about big big, big chunking approaches, chunking big corpus of text, One thing you can get is, your embedded model typically has a context window. So the model provider will probably truncate anything that overlaps the cont contact window. So you're gonna lose some information. So there's information loss. That's why you probably do smaller embedding sizes. But with small smaller, smaller it's not embedded sizes. Sorry. Smaller chunks.
Richmond Alake [00:32:56]:
But with smaller chunks, you might not have enough information, to enable to conduct efficient retrieval. So the way the retrieval works is you get a query, then you embed a query, then you do a vector search on the vector on your query embedding and the embeddings you have stored. Now if the embeddings you have stored in your vector database doesn't con contain enough information to have a rich semantic, rich semantic information within it, then the retrieval of the trunk itself and the allocated metadata is not gonna be great. So that's why it's not a solved problem. There were different approaches. There are different approaches. And personally, for my learning, because one, I understand the rack pipeline on a across all the components, but if I had to choose one that I had to go into, that I am going into, is gonna be chunking and, evaluation, LLM evaluation. So my knowledge in those need to improve to the point where I can because they're unsolved problems, but I can give people, best practices.
Michael Berk [00:34:02]:
Got it. Okay. And then one more question before you continue. What are your thoughts on LAMA index versus lang chain versus building your own versus something else?
Richmond Alake [00:34:11]:
That's a loaded question.
Michael Berk [00:34:13]:
Oh, I know. I'm well aware.
Richmond Alake [00:34:16]:
Where's my? So firstly, I don't have any favorites because the position I'm in is a developer advocate. So it's my job to to know the pro it's my job to know how to use all of these tools and communicate them. So if you see the articles or videos I've done, some of them cover landline index. Some of them cover langchain. Some of them actually don't use any of this abstraction frameworks to implement direct pipeline or whatnot. So, what are the best? Right? So for me, I find if you want control controllability, of every components of it, a RAG pipeline, I sometimes I might go into not using any of the any of the abstraction framework. And what I mean by that is there are times where I want to conduct some specific queries, right, that this abstraction frameworks might not have implemented yet. I I I probably wanna do that in native, MQL and not use any abstraction framework to to conduct any sort of filter on my data.
Richmond Alake [00:35:20]:
I'll do that directly using the done operator and aggregation pipeline that we get with the MongoDB query language, which is very easy to use. The the whole point of this abstraction frameworks or data frameworks like lambda index and line chain is they're they're very easy lift and they get you to where you need to be much quicker. And they're growing. They're maturing to the point where you're able to do the things that you might not be able to that you can't do previously in terms of in-depth query. You're able to do them now. And MongoDB works very closely with Llama Index and Longchain as well, the engineers over there. We have integrations with both of these frameworks. So if I was to say what are pros and cons is really on a use case by use case basis.
Richmond Alake [00:36:06]:
Yeah. As in, it will it will be hard for me to choose.
Michael Berk [00:36:11]:
Okay. Cool. Thanks. So you're you're going through the unsolved problems when I rudely interrupted you?
Richmond Alake [00:36:17]:
Yeah. I think the the the biggest ones for me anyway are, well, the chunking I've mentioned I mentioned about the chunking approach. Also, LLM evaluation. I wouldn't say it's it's an unsolved problem, but it's, it it's a problem that, it's it's a step that a lot of people tend to ignore, validating your system. Because I in machine learning, right, when you're actually training a machine learning model, you actually do a validation procedure, in parallel as you're training the model. So you actually watch your validation loss, which indicates to you how well your model is actually doing to any unseen data. But within this this sort of cycle of AI, a lot of people are going into into production without any sort of form of validation system in in place. They're just, like, hoping that if it works on vibes.
Richmond Alake [00:37:17]:
But and I'm I've been guilty of that as well because it's so exciting. Just wanna get something out there and solve a problem. Alright? And get get, start using this tool. But it actually hurts in a long term. You build a lot of technical depth. So l l m l m e vals or l m evaluation is a topic that a lot of people need to start thinking of very early. As in, I'm I'm gonna put out a bunch of content on this. We already have a lot on MongoDB written by some of my amazing team members.
Richmond Alake [00:37:48]:
But, in terms of again, you could always go deeper. Right? So I'm looking to go deeper into l and m evals, into, into chunking. And another unsolved stuff is, agentic system. Right? Last year, agentic system will be unpredictable. You couldn't control the execution flow of the of the of the system itself. But now you have tools like AutoGen and Landgraf that gives you more deeper level of, controllability and scalability of the execution graph. So, AgenTek system are still an unsolved problem and an area for people to explore creative approaches to enforce real reliabilities in this system. And we find that the old is becoming new, so there's a lot of unit testing.
Richmond Alake [00:38:38]:
There's a lot of old practice that are coming into, generative AI application development. That's what I'm seeing.
Michael Berk [00:38:46]:
Cool. How do you communicate the stochastic nature and and explain that it can be reliable to customers?
Richmond Alake [00:38:55]:
That it can't that it cannot. It can't.
Michael Berk [00:38:58]:
Can't. It is possible to build a production system with a fundamentally stochastic tool.
Richmond Alake [00:39:04]:
Well, the l l m LLM evaluation and the the procedure within that is the answer to that. Right? That you can't just say, hey. Trust me, bro. It's gonna work.
Michael Berk [00:39:14]:
You could try.
Richmond Alake [00:39:16]:
You could try. Right? Just like, trust me. But, when you start showing, hey. We have, we have this annotated data set, and we've run it through the model, and it's generate ten. It's it's generating content that has a a low amount of hallucination. It's got high relevance, and you're able to show that in in in a very objective manner, then you can start to build confidence. And that's why you see a lot of tools, emerging in this space. So we have Lang Smith.
Richmond Alake [00:39:44]:
We have Verizon AI. We have Patronus AI. And you have all of this observability and monitoring tools that are coming out to give people that confidence because it can. And not only do they show give you confidence, they show you where things can be improved as well. That's why I'm I'm making it like a, at least for the next, few months, deep dive in into some LLM eval tools and just the same way I've done with langchain, lama index, haystacks, and the whole data framework. I wanna explore the ecosystem around, monitoring, observability, and evaluation. But, yeah, the the answer to your question is very straightforward. Right? It's just use objective based metrics to show that that to show that your your system can be reliable.
Michael Berk [00:40:33]:
Yeah. I I think I agree that that makes sense. If you can show it's just like with any other model. If you can show that it works in a training environment, then you roll it out slowly to a production environment and then just monitor it for drift. It's the exact same thing as any other ML model. I think that And
Richmond Alake [00:40:50]:
that's not new. Sorry? That's that those concepts are not new. Right? Those those are ML those are m o up stuff. Right? They've been they've been there for years.
Michael Berk [00:40:59]:
Yeah. Exactly. And I feel like the the days where Ford recommends a Chevy truck or whatever that that meme was, those are sort of starting to, like, decrease. And especially if you have a good training, evaluation step, you can really guard against things that are real hardcore hallucinations. And maybe it says not the right thing or maybe it doesn't say the right piece of information, but it's not gonna tell you how to build a bomb or, like, I don't know, set your kitchen on fire, whatever whatever the concern is. Got it. So how are if you're gonna go, like, and build this evaluation framework, I know you're gonna be looking into this in the next few months, but can you give sort of a 60 second overview of how one would approach that?
Richmond Alake [00:41:44]:
So what what I'm the way I'm approaching, the topic of evaluation is to look at the stuff that originated from NLP. Right? NLP is not my strong point. Deepgram, like, computer vision was, as you could tell from my article, but they NLP because natural language processing has a bunch of metrics that they used to evaluate outputs from all this natural language model. So you have stuff like the blue school, the school, and you have stuff like, Meteor. So those eval evaluative those evaluation metrics are based on statistics, which is really which is really difficult when you're trying to evaluate reasoning models like, GBTs and and, and and all the all the clothes out there. It can be really difficult to use those objectives, objective metrics. So new metrics are emerging not emerging, but our hair. So there's a lot of, metrics that are able to quantify hallucination.
Richmond Alake [00:42:48]:
There are people that just solely use the LLM as a it's called LLM as a judge. So use the LLM as the evaluator of, of your system where you're able to give, the the output. You give a reference data and a bunch of prompt to make the element play the role of the of an evaluator and give you, like, a very single label. Hey. Does for example, does this output have the have the does this output have any information that is not in the in the reference text? And you can use that to detect a destination. So and you could just use LLMs for that. So you can use you don't have to use a massive one. You can use a small LLM for that as well.
Richmond Alake [00:43:31]:
So those are the new stuff like correctness is another metric size coming out, and, that that is how I would look. So if I also advise someone, I I would say start with the stuff with the basic stuff like the school or school. And those are not perfect, then you start to move into implementing LLM as a judge. And then looking at tools. Right? Looking at Lan Smith, looking at Arise AI that makes all of this such a easy lift. You'd have to build you'd have to build your your your own framework from scratch if you don't have the resources to.
Michael Berk [00:44:09]:
Right. Okay. Going one level deeper, a lot of these metrics that you're referring to, like Rouge, are supervised. And, basically, Rouge, you can think of as, recall. So with that, you need a truth set. Do you have any advice on defining that truth set, the true prediction that is actually correct? I've found that to be really challenging.
Richmond Alake [00:44:31]:
Yeah. Data data, what's the what's the classification of this problem like? Data collection, data sourcing is always been a problem within machine learning as in you have platforms. What was it called the mechanical turk? It was a platform where I think that was some of the yeah. Yeah. And that was used for, like, data sourcing. Right? Yeah. Yeah. So it's I don't know if that still exists.
Richmond Alake [00:44:59]:
You can correct me if I'm wrong. I'm not sure if it still exists, but it's it's an info. Data sourcing and data collection and data annotation is an ongoing problem. And in generative AI, application space is still like I said, the old is new. Right? We still have the same problem. But the way people are solving it now is synthetic data. Right? I did a I did a live stream, a couple days ago, and I just got some a bunch of synthetic data, using an NLM just to create a a a a an example of I gave a few examples, and I was able to create me a a a dataset with some few prompting. Now there are some I've seen some papers saying that there are issues with models that are being trained on synthetic data.
Richmond Alake [00:45:43]:
I'm not double clicked into those, but, but that is the way I would say one solve solving these problems of data scarcity. When you try to build this data set is try synthetic data, and you can just try good old manual label. Yeah. Good old manual labor is in there there is no race. There is no race to finish. So what I would say is maybe putting your team aside and maybe dedicating some time to actually creating some of these annotations yourself like people used to do back in the day. There was a there was a time when people didn't have chat gbt, and that's how you take it. Or I had to draw, like, the annotation boxes on, on several images, and that took hours.
Michael Berk [00:46:30]:
Yeah. Bounding boxes, man.
Richmond Alake [00:46:32]:
Yes. Are you just now you have these tools that do it automatically. It's crazy. Well, you just have to, like, spend several hours. Yeah.
Michael Berk [00:46:43]:
No. I I think you hit the nail on the head. It's sort of another unsolved problem. Typically, it's subject specific, and you wanna become a subject matter expert for your specific problem. And then either manual input manual input them or guide some sort of automated system to do that. But because it's so use case specific, if I ask one question in one context, there could be 50 good answers depending upon another context. And so, it's really important that you think critically about your business use case or your personal use case and ask the right questions and then define the right truth set.
Richmond Alake [00:47:20]:
Yeah. Is this what you tell your customers?
Michael Berk [00:47:24]:
Yeah. Because I don't wanna do that for them. Like, it's it's really hard to go in and become an expert in their business. And, typically, I can't. Like, it takes, I I don't know, several months to get up to speed on what are their true problems. So, for instance, I was recently working on a financial project. I just didn't have the expertise to know what was good and what was bad. So we would say, hey.
Michael Berk [00:47:49]:
We'll put out some responses. Give it a thumbs up, thumbs down. That'll be our binary label on whether it's good, and then we can start tuning accordingly.
Richmond Alake [00:47:57]:
One question I would ask is, do you do when you're when you're, building this RAG pipelines for your customers, right, do you do an evaluation step first in the evaluation and validation process, or do you just build out the RAG pipeline and go back? And the reason why I'm asking that is because one thing we this space is moving so fast and a lot of people wanna see POCs and stuff in production. And when I say a lot of people, I mean, the investors, everyone thinks we're in the AI bubble. So most people building this demo is in large enterprise company might be feeling the heat, and they might say, hey. I don't have any time for for evaluation. Just one on Vybe. So tell me, what what do you what are you seeing or what are you telling people?
Michael Berk [00:48:44]:
Alright. So on whether there's time for evaluation, if you're not making time for evaluation, you should be fired. It's like just not evaluating a model and putting it into production is not how any technical thing works. That said, as you mentioned, there's a lot of heat. And right now, there's a lot of pressure specifically from some executive that heard a podcast about chat gpt that we should be using Gen AI. All of their competitors are using it, so let's implement it as well. So I've been on a lot of those projects where they're just like, find a problem, solve it with ChatGPT or RAG or post source models. And that is a very different business use case.
Michael Berk [00:49:25]:
But what my job is is to ensure that we're building a reliable and stable solution that also meets the business need. So, it depends upon the customer, depends upon their angle, but evaluation is absolutely essential, unless you're doing very basic things.
Richmond Alake [00:49:41]:
Is would you say everyone is doing basic things as in, I know we we can defo there are different components in the right pipeline that you could go deep on, but where does the complexity come in, right, that you're seeing? Because everyone is still building this RAG enabled chat box. Right? Very few people are moving into a Genesys system, which is a bit, a little bit, maybe one level up. But where does the complexity come in when you say basic things?
Michael Berk [00:50:11]:
Yeah. So when I think basic, I think applying a custom LLM over a dataset that extracts a piece of information, summarizes it, that type of thing. It's it's one shot using a pretrained model, not even using RAG. The more complex things are typically RAG based with a variety of different sources, like disparate sources and also in different formats. So you need to start using SQL tools. You need to start using maybe custom API calls, and it's this complex agent chain that synthesizes a bunch of data and then returns it to the user. So that's when I when I say complex, that's typically what I'm referring to.
Richmond Alake [00:50:50]:
Okay. Okay.
Michael Berk [00:50:51]:
Yeah. What about you? What's the most complex thing you've seen?
Richmond Alake [00:50:54]:
Definitely around well, there's some cool stuff going on in the agentic space, man. The stuff I've seen makes me feel like I'm doing nothing. I'm like, dude, I need to go back and hack away. There there are some agentic systems that I've seen. I won't go too much into details, but it's not necessarily what the agentic system is doing, but it's the the components and how they're they're maybe doing tool selection. Right? You know, the agents have a bunch of different tools. How they're doing tool selection, tool storage as well. There are some some folks are not programmatically storing the tools.
Richmond Alake [00:51:30]:
And there's some cool stuff in how they're profiling different entities that the system interacts with is also cool as well.
Michael Berk [00:51:38]:
Come on. Give a little more detail. Just a little. Nobody's going to go after them.
Richmond Alake [00:51:43]:
That is some detail that I've just given. The tool storage and the tool selection process is very interesting, because if you think about it, if you're building complex systems that are meant to interact with a bunch of other systems, what you're gonna have is a very convoluted Python file of just different functions and whatnot. So what's a good way to store that and actually retrieve that? There's some cool stuff I've seen some folks do. And that's the complexity, not the system itself or the infrastructure and the way you're building out the system. For me, that's some cool stuff. But also the variety of data types as well, man. There was an interesting use case where an application was doing pinpoint frame selection based on a prompt. So imagine a movie. I have this with my wife all the time, where I'm trying to get her to remember a movie, but I can't remember the name of the movie. But I can remember a scene in the movie. Right?
Richmond Alake [00:52:56]:
So it was, in a way, you could tell it the scene, and then it brings up the particular scene. Oh, okay. If you think about it, for a system to be able to do that, you have to embed pretty much every frame of every scene of the movie, or at least maybe every 10 frames or so. And you have to do different chunking and retrieval steps. And yeah, that was a bit complex and cool, because I'm like, wow, I need this.
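A rough sketch of that movie-scene use case, assuming a CLIP-style model that embeds frames and text prompts into the same vector space: sample every Nth frame, embed the samples, then rank them against the prompt. The embedding functions below are deliberately fake placeholders standing in for such a model, and the frame filenames are invented.

```python
# Sketch of prompt-to-scene retrieval under the assumption of a shared image/text
# embedding space. embed_frame / embed_text are placeholders for a real model
# (e.g. a CLIP-style encoder); frames are sampled every Nth index to keep the
# index small, along the lines Richmond describes.

import numpy as np

def embed_frame(frame):   # placeholder: deterministic fake unit vector for an image
    rng = np.random.default_rng(abs(hash(frame)) % (2**32))
    v = rng.normal(size=8)
    return v / np.linalg.norm(v)

def embed_text(text):     # placeholder: deterministic fake unit vector for a prompt
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=8)
    return v / np.linalg.norm(v)

def build_index(frames, every_n=10):
    sampled = frames[::every_n]                      # embed every Nth frame
    return sampled, np.stack([embed_frame(f) for f in sampled])

def find_scene(prompt, sampled, index, top_k=3):
    sims = index @ embed_text(prompt)                # cosine similarity on unit vectors
    best = np.argsort(sims)[::-1][:top_k]
    return [sampled[i] for i in best]

frames = [f"frame_{i:05d}.jpg" for i in range(5000)]
sampled, index = build_index(frames)
print(find_scene("two characters arguing on a rooftop at night", sampled, index))
```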
Michael Berk [00:53:29]:
Yeah. For real. It's like Shazam for movies, but describing the movie.
Richmond Alake [00:53:33]:
Describing it, exactly. It's like...
Michael Berk [00:53:36]:
Surprised that doesn't exist. I guess, as you said, it's a lot to index all that. But...
Richmond Alake [00:53:40]:
It's such a useful tool. It would save me a lot of looking, doing a bunch of charades and whatnot. Yeah.
Michael Berk [00:53:49]:
I would pay 99¢ for that on the App Store for sure. Okay. Cool. So I know we're sort of coming up on time. I wanted to wrap with one final question, back to the career stuff. Do you have any advice for people who maybe started in any portion of your career path and want to move into a developer advocate role, or specifically enter the AI/ML space?
Richmond Alake [00:54:15]:
To enter into the AI/ML space is easier than it was back when I did it, because I went and bought a huge textbook, I went to university to get a master's. Now you can watch a YouTube video and you're good. Or you can watch several YouTube videos. And I don't think, I might be wrong with this, I don't think people put too much emphasis on certifications anymore. You could correct me if I'm wrong.
Richmond Alake [00:54:44]:
Back in the day, people were treating certifications like collecting Pokemon. And everyone was asking, hey, have you got a certification in this? Have you got... I was like, no, I don't. And I don't know if it's still the same vibe, if people are as into certifications as they used to be back in the day. And, also, I keep saying back in the day. Good lord.
Richmond Alake [00:55:11]:
I'm not that old. This was just a few years ago. I don't know, I think it's changed, but there used to be such blurred lines between what a machine learning engineer does and what a data scientist does. So if you're looking to go into AI at this point in time, apart from there being loads of learning resources, so you've got to be able to filter the signal from the noise, knowing what role you wanna go into... I feel like it might still be as ambiguous as it was back then, but the only difference is you're expected to do some stuff with LLMs in whatever role you're in now. You could be a data scientist or data analyst or machine learning engineer.
Richmond Alake [00:55:53]:
You better know how to call some APIs, boy. But yeah. So the general advice is what I said at the beginning: if you're looking to go into AI and ML, just try to be an expert learner. Don't try to map it out. That's what's worked for me. I didn't try to become a developer advocate. I just did advocacy stuff while I was trying to become an expert learner. And it turns out this is a full-time job.
Michael Berk [00:56:27]:
Yeah. No kidding. I completely agree. The way that you get promoted is by doing the role above you for long enough, well enough. Likewise, if you wanna do a lateral transfer, the way that you do that is by doing that role until someone is like, damn. We kinda want you to do this role full time. We can't live without you. So a 100% agree.
Richmond Alake [00:56:49]:
We can't live without you.
Michael Berk [00:56:51]:
I mean, may that might be a bit extreme. But yeah. Okay. Cool. Lots of cool stuff. Let me summarize some of the key points that I found
Richmond Alake [00:57:02]:
really interesting. I would like to get your opinion, though. How would you advise folks to go into AI and machine learning?
Michael Berk [00:57:13]:
I don't have a concise answer. I think it depends on where you are. If you are in the data space, try to use AI and ML in your existing applications. So if you're, like, an analyst, or even, like, a finance person, maybe try to apply some machine learning to that. If you're really far away from the industry, it's tough. You just gotta go hit the books and start hacking on side projects. I think Kaggle is kind of overrated, but doing something you're truly passionate about makes it sustainable. So for instance, in college, I did, basically, a random forest model on a bunch of NBA and MLB data to bet on DraftKings.
Michael Berk [00:57:55]:
And I learned so much about ETL, how to manage models, what the hell a random forest is, and that kept me going. Like, I would actually wanna work on it at night instead of being like, damn, let's go get me that ML job. It was just a passion project. So I guess those are the core pieces of advice: if you're in data, try to apply existing models to your use case; if you're far away, find a passion project that can keep you going for however long it'll take to get the skills.
Richmond Alake [00:58:22]:
Your answer is very good. It makes mine look kinda rubbish. As in, my answer was basically, go with the flow. I said, your answer was very...
Michael Berk [00:58:31]:
There is something to that, though. I completely agree. Like, don't overplan. Keep your eyes open, try to see what opportunities arise, and then pounce.
Richmond Alake [00:58:40]:
Yeah. Yeah. Exactly. Seize the moment, seize opportunities. Yeah, I like this. And, I'm just gonna bring it back to MongoDB, there are so many roles you can play here, in terms of the different types of roles we have. I'm, I guess, surprised every time I'm here. We have a venture capital arm within MongoDB.
Richmond Alake [00:59:03]:
I'm like, you wanna be a VC? You can be that within MongoDB. I'm like, damn, that is very interesting. For someone coming from a very startupy background, it's very, very interesting to see the variety of roles you can play here. And you can build your career horizontally even within the same company. For me, I was a web developer, and to transition into AI, I had to go to university. But in a company like MongoDB, you can do that just by talking to the right people, showing them the skill sets you have, and making that transfer horizontally to a different team. So one thing I'll add to the show notes is the careers page for MongoDB, in case people wanna make that transition into AI or machine learning, or just wanna work with some passionate people.
Michael Berk [00:59:54]:
Yeah. Is there an area that you guys are hiring in specifically, or is it across the board?
Richmond Alake [01:00:00]:
I think if you look at the careers page, we're definitely hiring across the board. There's gonna be something there. Definitely a page that people should keep their eye on. MongoDB has grown exponentially over the years, and with this AI stuff, we definitely need some talented people, and not just in coding, people that are just passionate about this. It's very exciting. Like, I really mean it when I say I'm in a company where I come off my calls with customers, or off my calls with my team members, and we're having conversations like this, where we're passionate about the stuff we're doing. We're debating which method is better, how we decide which one is best, or what to even talk about to our audience. Yeah.
Richmond Alake [01:00:50]:
It's exciting.
Michael Berk [01:00:51]:
Yeah. Yeah. And to elaborate a little bit, I have a buddy who was one of the early back-end engineers at MongoDB, back when it was this startup called Mongo. And, yeah, he raved about the culture. He's since moved on, but it's a really cool organization. And the AI space is growing really fast, so I definitely recommend it if you're interested. Cool. So, some key points.
Michael Berk [01:01:20]:
One thing is, if you're looking to truly learn something, write about it, using the Feynman technique. That'll show the gaps in your knowledge. Another thing is, let curiosity and boredom drive your career trajectory. Excitement and hope, once that goes away, typically, you know, it's time to move on. More tangibly, when choosing a wrapper framework for your RAG application, LangChain and LlamaIndex get you to production faster, but sometimes you have to roll your own solution for edge cases. And then some cutting-edge things that are currently unsolved, but hopefully will be solved pretty soon. One is evaluation of these models. Two is how to chunk.
Michael Berk [01:01:54]:
And three, agentic systems. They're a bit more stable than they were a year or two ago, but there's still a lot of work that needs to be done in each of these three areas. And then finally, there are no AI experts, just expert AI learners. So, good stuff. Richmond, if people wanna learn more about you, MongoDB, or your work, where should they go?
Richmond Alake [01:02:14]:
Definitely check out our Developer Center on MongoDB, where you can see a bunch of interesting folks like myself who create content. We also have the mongodb.com page, which can take you in different directions within MongoDB. You can find me on LinkedIn. Reach out to me. I love the LinkedIn messages. The messages people send to me, I reply to them, or I try to reply to them. I treat LinkedIn like it's Twitter, but people tell me I'm not funny.
Richmond Alake [01:02:46]:
But
Michael Berk [01:02:47]:
I did see a post about your skincare routine.
Richmond Alake [01:02:52]:
Oh, no. No one asked me about that. Did you really finish reading the post? That was clickbait.
Michael Berk [01:03:00]:
Well done. I did click on it.
Richmond Alake [01:03:02]:
Excellent. So for your next article, just write about your skincare routine in the headline, then bring them into some content about random forests. You've just got to be honest anyway.
Michael Berk [01:03:15]:
Yeah. It's a well-known fact that machine learning engineers really care about their skin. So, perfect.
Richmond Alake [01:03:23]:
This has been a fun conversation, but I'll put some resources in the show notes. We spoke about the RAG notebooks to help people get started; I'll put in the show notes the GenAI repository, one of the best repositories you can use to spin up different types of RAG pipelines with different levels of the stack. I'll provide the careers page as well, and the Developer Center. Amazing.
Michael Berk [01:03:49]:
Alright. Well, this has been a lot of fun. Until next time, it's Michael Berk, and have a good day, everyone.