Michael_Berk:
Hello everyone, welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I'm joined by my other host.
Ben_Wilson:
Ben Wilson.
Michael_Berk:
And today we're going to have a panelist discussion about spikes. So not volleyball, not like a spike in the ground. We're gonna be talking about research spikes. And so what is a research spike? Well, you are an ML engineer and you're a very busy ML engineer because most people are just busy people. But as the good ML engineer that you are, you read blogs on the weekends. And one of these blogs has a really cool method that promises 100% accuracy without overfitting your data, takes one second to run, and you actually don't have to write code for it. So you bring this to your manager and say, Hey, maybe we should try out this method. Your manager says, let's do a spike. That's the scenario. And I think Ben, we can go from there. So what is said term spike?
Ben_Wilson:
I don't know the genesis of the term. I've just used them in Jira before. When you're doing sprint planning, you have some implementation that you either want to consider or that is planned in the future, maybe one or two sprints later, or it could be next quarter. I have worked on teams where nobody does research beforehand. I don't think that's adhering to the Agile method; it's more like, hey, we know we need to build this thing, let's just guess at story points and put it on the board two sprints from now. But on the more successful teams I've been a part of, people set aside some amount of time. Either the individual who's going to be doing the implementation does it on their own, or, the fun one, the whole team sits down and responds to some pressing need from customers, whether they're internal or external to that development team, who are saying, hey, we'd really like this functionality, end of story, period, end of sentence. And the team goes, all right, we need to figure this out. Is this worthwhile? What are our shortcomings? Let's take some limited, time-boxed amount of time and see what we can figure out.
Michael_Berk:
Yeah. And I've seen a bunch of different methods for this. At a prior role, we had something called a rotating hermit, where one person would just not sign into Slack or attend any meetings, and for that week they were tasked with delivering an insight, implementing a solution, or something like that. That was really fun, but it was definitely low on the collaboration scale. Another method we used was hackathons, roughly every two quarters, sometimes every quarter. They would be looking to solve, especially creatively solve, large business issues. So why is this happening with a given team? Can we attribute success or growth in the company to this specific initiative? That's a really hard one that executives love to try to solve with data. But there are many different ways to go about it. Ben, what are the formats that you've seen work well?
Ben_Wilson:
I think it really depends on the team and the processes and tooling that the team has available to them. On a very junior team, I always recommend pair spikes. Say we have a team of 10 people and I'm the TL of that team, the technical lead. I would say I want a senior person and a junior person to pair up and try to do a mini hackathon to investigate this thing. That way both people benefit. The less experienced person sometimes asks questions that the more senior person might not consider because of the inherent bias that their experience sometimes creates. And the more junior person gets to see the more senior person do things and absorb all of that, like, wow, I learned so much from doing this. From a team process perspective, doing stuff like that also helps to break down the false and assumed barriers between more experienced people and less experienced people. It's not gonna guarantee that these people are gonna be friends, but it can help them be more friendly in their work environment. They get to know each other: hey, we've worked on this thing, or we've worked on six things together, in these really fun exercises. You get to know the other person and see how they work, and that collaboration is really good for team morale. For extremely senior teams that I've been on, where the person with the least amount of experience on the team has been doing it for 12 years, pairing is usually pointless with respect to executing something that you're going to deliver to the team, even on extremely short timelines. In that scenario, because of the speed at which everybody works and how quickly they can gain insights from that brief research period, just doing a live coding session is what I've seen work really well. Hey, everybody takes a different thing to do, we're gonna set two hours, and we're all sitting in a room together doing it. So we can make comments to each other and ask people questions, like, hey, have you checked this API? What's with the signature here? It's not documented. And the other person might be like, oh, I just messed with that three minutes ago, here's how I got through it. That can be really beneficial. And then for anything of sufficient scope that you can't time-box it in a short period, or where, regardless of the experience of the team, what we're trying to investigate requires reading a lot as well as writing code, going through examples and demos, and checking advanced functionality within a library, you're not going to do that with your whole development team sitting in a room, because that might take 12 hours, might take 20 hours. You're not going to have everybody do that all sprint long. So usually whoever's going to be leading most of the implementation, the senior person who's going to be drafting up design documents for what we're going to be working on over the next couple of weeks or months, just have that one person do it, and then everybody else peer reviews that work. And, you know,
show your work. Everything that you did during that research, all your notes, gets submitted to the whole team to review, so people can be familiar with it. Then when they're checking your implementation later on, they know what to look for.
Michael_Berk:
Yeah. So there are tons of different ways to go about it. I think it might be helpful to give a couple of tangible examples of things that have worked well in our experience. I'll start off really quick. One example: we had a rotating-hermit implementation of said spike, and it was just one person working on engagement classifications for different types of users. It was really, really effective and is actually one of the most widely used data-science-built models in the organization. Basically, the researcher identified the problem and said, hey, we don't know if different types of users behave differently, let's explore that. So they went in, did a bunch of EDA, then said, let's try to throw some unsupervised learning at it, eventually settled on a hidden Markov model, and was able to classify users into three different groups, with a couple of others added on later as a logical continuation of that implementation. And this amounted to clustering on just three variables, so it was really easy to implement with SQL. That meant it could be built into all of the tables, which helped it be used widely throughout the organization. So that's one example. Another example is we were working on, as I sort of mentioned before, attributing growth in the overall bottom line to different teams and different initiatives. Are we doing better or worse because of COVID, or because this PM shipped a new feature that's so good that everybody's flocking to the feature, or to the website for it? That second one was just a really tough problem. It was not well scoped prior to the hackathon, and we had lots of really cool ideas, but we didn't end up finalizing a production solution. Lots of things were thrown around, but nothing actually came to fruition. So those are just a couple of examples, and that was at a media company. But Ben, do you have some examples on your end?
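For a rough sense of what that kind of spike output can look like, here is a minimal sketch using the hmmlearn library with three hypothetical engagement features and synthetic data; the actual features and model from that project aren't public, so treat this as purely illustrative:

```python
import numpy as np
from hmmlearn import hmm

# Hypothetical engagement features per user-day: sessions, pages viewed, minutes active.
rng = np.random.default_rng(42)
X = rng.random((1_000, 3))

# Fit a 3-state Gaussian HMM and assign each observation to a latent "engagement" state.
model = hmm.GaussianHMM(n_components=3, covariance_type="full", n_iter=100, random_state=42)
model.fit(X)
states = model.predict(X)  # 0, 1, or 2 per row; the states get interpreted and labeled afterward

print(np.bincount(states))  # how many observations fall into each state
```

Because the model only looked at three variables, the resulting classification was simple enough to re-express in SQL and join into existing tables.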
Ben_Wilson:
So I can talk from two separate perspectives here, because of the ludicrous journey that I've made in my career. First, from an analytics and pure data science perspective: when I was working on teams like that, we would get a business problem, and it might be highly detailed what the problem is. And of course, when somebody gives you a problem like that, they're also going to give you the solution that they think is going to work. It's almost guaranteed to happen. Very few people can just present a problem objectively; they're always going to say, we could solve it by doing X. If you're going to start a spike, listen to that, but put it to the side and don't think about a solution yet. It's all about being creative when you're thinking about what the actual problem is. How are they seeing that this is a problem? What data are they using that informs them to tell me that my team needs to work on this? That's so we can start exploring data, looking at relationships, and asking, hey, do we have the right data for this? How often do we collect it? How clean is it? That comes before even looking at, oh wow, we could build this model with this sort of algorithm. I don't care about that in those situations; that's implementation detail. It's all about: can we even solve this problem that they're saying is a problem? Can I validate with data that this actually is a problem? And that's what the spike would be focused on for me, at least as a first phase. After that validation stage, I would usually take on that sort of work myself as a TL, to save the team's time. And then if we get to the conclusion of, yeah, this really is a problem and we need to do something about it, I might during that process think of ways to solve it. Based on my experience, I would always be searching for the simplest way to solve it, because I don't like maintaining code I don't need to maintain. The easier it is, the better, in my book. But I'm not going to say any of that to the rest of the team. I'm going to present it as if I'm completely ignorant of what's going on and let them get creative. Just present the problem: hey, here's the data that we should be looking at, here's the problem that the business is struggling with. Everybody take four hours and see what you can do. What can you find? Then we'll meet back up this afternoon after lunch and talk through it for an hour. Everybody present some data, present your notebooks. And at no point is the expectation that the code is clean or well-written. For this sort of work, be as dirty and nasty as you have to be. It's all about speed and exploring as much as you can. Now, that's the data science side. From a software engineering side, that process of evaluating a problem is very different. At that point you usually have direct evidence or requests that people have, saying we want a feature that does X. And that's as far as it usually goes, and as far as it usually should go, because the people asking for it aren't gonna know how to implement it. They don't need to know how to implement it; it's not their job. It's the job of the development team to get creative, figure out a way to solve this problem, and then implement it. But in determining ways to solve it, there are different stages.
Stage one is: what's the grand scope of what we need to do, what is in scope and what is out of scope, and what is the minimum viable product that we need to build? Determining that is what you figure out by doing a rough spike. Hey, we need to integrate with this API, this open source tooling. How does this thing even work? Can I go through their docs? Can I read through their examples? And do they even run? If I copy the code from their website, paste it into an execution environment, and run it, does it throw an exception? If so, what exception does it throw? Is it because I don't have the right libraries installed on my system? Do I need to do stuff to my system to get this to run? I'm writing all those notes down. Hey, this doesn't work on Python 3.10.2, which I currently have installed; it only works on 3.9. Okay. Oh, it needs this version of NumPy in order to run, or it needs this version of Pandas; there's some dependency conflict. Okay, I'm writing all that down. And then if I get the environment to the point where I can just import their library, great, awesome, I make a note of that. Then I take their example, their hello world for that module, and try to run it. Does it run? If it runs without exceptions, awesome, cool. That means this is potentially a solution, potentially something we can use. If it blows up on the hello world, that's a red flag. Something's not right here. Maybe they made a recent change, or maybe they haven't updated their docs in a while. So the next thing I'll do is go into the test suites for that open source package and pull out four or five tests that don't require a massive test harness to be executed. Or if they do, I'll just fork the repo and try to run the test suite locally. Do the tests run? Do they pass? If they do and everything works fine, that's low risk. If I can't get the tests to run, that either means that their test environment and the requirements they have to build that package are flaky and challenging to set up, or their CI is broken, which is a huge red flag if it's on their main branch. It's that sort of thought process: is this functional? Can I use this to solve this problem? I might be testing three different packages to solve the problem. If we're talking about an ML engineering integration, we go through that same process with all three of them and just make notes. After that, we might go beyond the hello-world example and take an actual tutorial on real data. Maybe the data is provided by the package for you. We'll check it out, look at it. Is it legit? Have they done something funny to it? Are they only using integers, with no floating point values in there? Are they pre-casting categoricals to integers or index values? Take a look at all that stuff, and then say, all right, now I'm going to take some real data that we have internally and try to run it through this package. Does it work out of the box? Does it not throw exceptions? That sort of thing. And then later on, after everybody in the team, or the limited number of people doing that investigation, has finished, that's when you start talking about: okay, here are three ways we can do this, we evaluated these three approaches, here are the pros and cons of each, and this is the direction we're going to move in.
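As a concrete illustration of that first pass, a throwaway smoke-check script might look something like the sketch below; `somepkg` is a placeholder for whatever library is actually under evaluation, and the notes format is just one way to capture findings for the write-up:

```python
import sys
import importlib
import traceback

notes = [f"Python: {sys.version.split()[0]}"]

# Can we even import the candidate package and its heavyweight dependencies?
for name in ("somepkg", "numpy", "pandas"):  # 'somepkg' is a hypothetical package name
    try:
        mod = importlib.import_module(name)
        notes.append(f"{name} {getattr(mod, '__version__', '?')} imports cleanly")
    except Exception:
        notes.append(f"{name} failed to import:\n{traceback.format_exc()}")

# Does the library's hello world run? Paste the docs example into this block verbatim.
try:
    pass  # e.g. somepkg.hello_world() -- placeholder for the copied docs example
    notes.append("hello-world example ran without exceptions")
except Exception:
    notes.append(f"hello-world example raised:\n{traceback.format_exc()}")

print("\n".join(notes))
```

Everything that script prints goes straight into the spike notes that the rest of the team reviews.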
Only after that is done and the team agrees on, hey, this is roughly what we're going to be doing, do you start worrying about implementation details and planning out what we actually need to build. Thinking through: what does the architecture of the code need to look like? Are we going to be using these principles in constructing it? How does this interface with the rest of the software that we're adding it to? So it becomes very different. But after that initial spike phase, in both scenarios, whether it's pure software engineering or data science and ML work, I've always been a fan of not worrying about implementation details until after peer review of the idea. You get to that peer review phase and people say, okay, for data science type stuff, you tested out these four different models, four different approaches. It doesn't have to be perfect, doesn't have to be tuned. It has to be trained, of course, but it doesn't have to be run through hyperparameter tuning so you can say, hey, we got the best RMSE with this particular solution. Who cares? It doesn't matter. You'll tune that later on. That's an implementation detail. Or, whoa, how do we do monitoring of this? How do we schedule retraining? That's an implementation detail. Worry about it later. The spike is to figure out: can we solve this, yes or no? Basically, the time that we would be committing for the next n number of days, weeks, or months, is it worthwhile? Is it more important than other things that we could be doing?
Michael_Berk:
Yeah, there were like 50 really interesting points in there. Let me see if I can circle back to a couple. One thing that is really interesting, that I would love to get your opinion on, Ben, is how you approach developing the simplest solution. To put a face to the name for that question: there often is this human drive to get something to work, and to get something fancy to work. Especially if you're more junior, it's really enticing to throw deep learning at everything. What part of your career and what lessons have you learned that make you try to simplify problems? Is it getting burned in the past on maintaining code? Why do you do that?
Ben_Wilson:
It's a huge collection of maintaining crap code that I've written in the past and struggling to keep it running. The more complex the thing you write, and the fewer peers that look at your implementation before it becomes production, the higher the probability that you're the one who's going to be maintaining it alone, which is a danger. Building something complex makes you feel good about your skills and massages the ego: yeah, I'm good, look at how complex of an implementation I built here, this is awesome. And you might have some buddies at work who are like, hey, that's pretty cool, that's fascinating, that's amazing. Anybody saying that to you about something that's complex is never going to maintain it. They're never even going to help you maintain it. So don't ever ask people to praise your work on something that's clever, because any praise you're going to get there is from somebody who doesn't care. They might care that they're seeing something cool, and maybe they learned a few things, but they're not going to help out when you're on call for this thing and you've got to fix it at two o'clock in the morning. They don't care. The peers that you should be looking for to review something are the people who are on call. They're the ones who realize that they have to maintain what you built, and they're going to look at it through that critical eye. So I've learned that process. I don't know if it's something you learn in its entirety; it's something you're always learning. You're continually learning whenever you're writing code, and working on different teams you're gonna learn different aspects of it. But you'll really feel the pain with your own code, and that's a humbling experience, when you realize that you've written something that's hot garbage that breaks often. It's truly humbling when you look at something that's failed, and it's the first time it's failed in six months, and you look back at the code base, maybe not remembering who wrote it, and you're like, I can't read this. I don't know how this works. What idiot wrote this thing this complex? And then you look at the commit history and you're like, oh, I'm the idiot. Man, what the hell was I thinking? Because if you have to reverse engineer your own code, that's bad, right? I've been in that situation before, where I've looked at something I built a year earlier and I'm baffled at the hubris I had when I wrote it, thinking, hey, this is a good idea, look at how cool this is. And then looking at it a year later when it breaks, I'm like, what the hell? This is so bad. It would take less time to rewrite this from scratch properly than to fix this convoluted mess that I've created. So I've learned that lesson the hard way. But it's frustrating in a different way when you're on call supporting somebody else's implementation and you look at it like, wow, am I stupid, or do I just not know enough about how this language can be used? You're reading through it, frustrated, because you have to fix this thing since it's a production-down process. Then you ask a peer, or maybe somebody more senior, hey, can you help me take a look at this? And they get that same look on their face, and you're like, okay, it's not just me. This sucks, this is not written right. And then you realize the person that wrote it left the company 18 months ago.
And one of the reasons why they left is because people didn't like the crap that they were producing. So they were forced out, or frustrated, or something, or they're just an a-hole. And you realize that in order to fix this, you have to rewrite the job or rewrite the project or something, and that feels bad. So yeah, I've learned it in a lot of different ways. Most of them have been humbling experiences, but that's the reason why we do peer reviews. If you're working on data science projects or ML, you should be doing a peer review. If you don't have an ML or data science peer who is available, or knowledgeable enough to look at what you're doing, get a software engineer to look at it. Make a friend: hey, could you just review a PR on my implementation? And then let them know, hey, it's cool if you don't know how the model works. That's not really what I want you to look at, because the model's only four lines of code. Could you look at the other 1,700 lines of code here and let me know if I'm being stupid? And if it's a senior software engineer, they're going to leave comments. Don't take it personally. Realize that that's just how it works in software engineering.
Michael_Berk:
Yeah, getting lots of eyes on a problem is really, really helpful, especially from experienced people like Ben who have learned that maintaining code sucks. It really can help with a lot of things. And sometimes, to take a very simple modeling example, accepting a 2% drop in accuracy is worth the number of person-hours it would take to maintain the much more complex production version. So there's a lot that goes into deciding what should be put into production and what shouldn't.
Ben_Wilson:
Yeah, and you mentioned that junior data scientist who wanted to use deep learning. I do have a lot of opinions on this that probably aren't that popular.
Michael_Berk:
Hit me.
Ben_Wilson:
I'm a huge fan of deep learning. I think it's amazing. It is this awesome tool set and paradigm that you can use to do things that other algorithms are just completely incapable of doing. With that being said, if all you have is a hammer, everything looks like a nail. And I've received this argument from junior people before: well, I can use TensorFlow for every problem. Yeah, you can. But who's paying the bill? It depends on what resources you have available. Take a problem that could be solved with, say, logistic regression, where you could get pretty good, generalizable results for a supervised learning problem in which you're just predicting a probability of class membership. Could you use deep learning for that? Of course, it's completely suited for that. But the logistic regression model will train on a four-core CPU in 30 seconds, and to get that deep learning model to retrain with a 30-second response time, you might need a pretty expensive tensor processing unit or GPU available. And then think about how many iterations you're going to do while developing that solution. People like to think about how long it takes to train one model: well, it's only five minutes to retrain it. Yeah, well, how many times are you going to do that five-minute retraining while you're building this project? 2,000, 5,000, 10,000 times? I don't know about you, but I try out a lot of stupid ideas whenever I've built ML implementations. Sometimes I just read through the APIs and think, huh, I wonder what happens if I try this? All of those what-happens-if-I-try-this iterations start to add up, and computing resources are not free if you're talking about cloud-based things. If you're talking about on-prem servers, where your IT department is maintaining some machines for you to do testing on, which not many companies control their own server farms anymore, but if you do, you're taking resources from somebody else when you submit that job. And if you're doing it on your laptop, shame on you. You should only do that for research and learning or local unit testing. If you're doing something that's eventually going to become a project running in production services at your company, you should be doing it in a proper development environment, and that costs money.
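As a rough illustration of the asymmetry Ben is describing, a baseline like this trains in seconds on a laptop CPU; the dataset here is synthetic and the timing will vary by machine, so treat it as a sketch rather than a benchmark:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a binary class-membership problem.
X, y = make_classification(n_samples=200_000, n_features=50, random_state=0)

start = time.time()
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(f"Trained in {time.time() - start:.1f}s, train accuracy {clf.score(X, y):.3f}")
```

Every one of those thousands of trial-and-error iterations pays that cost, which is why the per-fit price matters more than the single-run number.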
Michael_Berk:
Yeah, and one more pitch for the anti-deep-learning wave, especially since we're talking about spikes. Spikes are short research projects that have a finite amount of time to deliver a result, or to deliver something that should not be a result, and that's sometimes equally valuable, like knowing that this solution is not viable for this problem. When you are in that research phase, iteration is so freaking valuable, and if you're using complex models, your iteration time is often a lot slower because the model takes a lot longer to train. If I can write a SQL case statement, I can try a thousand case statements in a day. Okay, maybe that's a lot, say a couple hundred in a day, and see if they work.
Ben_Wilson:
Heh heh.
Michael_Berk:
Then if I throw it through a loop in Python, now I can try a hundred thousand case statements, and all of that will run super, super fast. But if you immediately jump to complex solutions, you don't have a starting place, and therefore you don't have a place to continue from. So if you are trying to get results, iteration, I've found, is super valuable and super effective, and big complex models should usually be a last resort unless you know that the implementation or the subject matter warrants deep learning.
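A toy version of that kind of iteration loop might look like the sketch below; the column names, thresholds, and data are all made up, standing in for whatever SQL case-statement logic you would actually be sweeping over:

```python
import numpy as np
import pandas as pd

# Hypothetical engagement data and churn label.
rng = np.random.default_rng(0)
df = pd.DataFrame({"sessions": rng.poisson(3, 10_000)})
df["churned"] = (df["sessions"] + rng.normal(0, 2, len(df)) < 2).astype(int)

# Sweep a few hundred candidate "CASE WHEN sessions < t THEN churned" rules in one pass.
results = []
for t in np.linspace(0, 10, 200):
    pred = (df["sessions"] < t).astype(int)
    results.append((t, (pred == df["churned"]).mean()))

best_t, best_acc = max(results, key=lambda r: r[1])
print(f"Best threshold {best_t:.2f} with accuracy {best_acc:.3f}")
```

The point isn't the rule itself; it's that each candidate costs milliseconds, so you can try hundreds before a complex model would finish a single training run.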
Ben_Wilson:
Yeah, definitely. Like a spike we did recently. I can't disclose all the details of what we're working on, but we were trying to interface with some packages that provide pre-trained deep learning models, for MLflow, and we wanted to see, hey, how does MLflow support all these things? The first thing that we tried was, hey, let's just take the pre-trained models and run junk data through them. We wrote some synthetic data generators: generating a bunch of text, generating a bunch of basically representative image arrays, like, hey, this is a color image of just random colors for each pixel, because we just needed data to pass through it. So we're generating all this random data and running it through, and we're like, wow, this is so cool. It made me remember the fun parts of data science. While I was going through and messing with the sentiment analysis stuff for BERT models, I was writing some funny things and sending them through it. I'm like, yes, super fast. And then one of the validation steps we had to do was to emulate what retraining would look like. I was running it on a CPU system, and I found a corpus of text that had classification labels associated with it that were different from what the original model was trained on, and I set up retraining on it. That retraining didn't finish for our spike. It was still running 17 hours later, and I eventually terminated it. I'm like, wow, this sucks. Right after I kicked that off, my next step was to validate that it performs on GPUs. Yeah, that finished in three minutes, retraining on a couple million records. But that instance costs 65 times what the CPU instance cost, and I had it up for two hours. And I'm thinking, if I had to do this for five straight sprints, working on a speech-to-text sentiment analysis implementation, would I really be okay with taking the pre-built model, building an ensemble of four different pre-built models, and retraining all of them? Where is this going to run? I need to have a talk with my director to see if we even have budget for this, because it's probably going to cost $50,000 a quarter just to retrain this thing every time we need to retrain it. So you can think about considerations like that while you're doing these evaluations, on either side, a software implementation or a data science perspective. It really comes down to time and money.
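The throwaway first pass Ben describes can be as small as the sketch below, which assumes the Hugging Face transformers library and its default sentiment-analysis pipeline; the word list, batch size, and timing are illustrative, not what the team actually ran:

```python
import time
import random
from transformers import pipeline

# Generate junk text just to have something to push through a pre-trained model.
words = ["great", "terrible", "fine", "slow", "fast", "broken", "love", "hate"]
texts = [" ".join(random.choices(words, k=12)) for _ in range(64)]

clf = pipeline("sentiment-analysis")  # pulls down a default pre-trained model

start = time.time()
preds = clf(texts, batch_size=8)
print(f"{len(preds)} predictions in {time.time() - start:.1f}s on this machine")
print(preds[0])  # e.g. {'label': ..., 'score': ...}
```

Inference like this is cheap; the expensive part, as the retraining story shows, only appears once you time a full fine-tuning run on the hardware you'd actually be paying for.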
Michael_Berk:
Yeah, and that should be made clear by the TL or whoever's managing the spike. Having consistent acceptance criteria and evaluation criteria is super useful, so everybody can come in with the same stats: on this cluster it took this long to run, or this is the accuracy based on this loss function. Then you can compare apples to apples. That's super important. But Ben, I have a question for you. You've been part of a few spikes, and you've developed your own strategies to be good at spikes. What are a couple of traits that make people...
Ben_Wilson:
Better than I was. I don't know about good.
Michael_Berk:
Valid. Okay, what are a couple of traits that you see in ML engineers that make them very good at spikes, and a couple of traits that make them very bad at spikes? Whether it be personality, skill set, you name it.
Ben_Wilson:
Good traits, I would say: can the person see the forest for the trees? While they're going through research, are they just hitting the main points? Again, say we're talking about retraining pre-trained deep learning models, and to evaluate what retraining is like, there's an open source data set with maybe 10 billion records in it that you could download and get all ready. The person who can see the bigger picture is going to take 10 rows of that data and run a single-epoch retraining on it just to see what it does. Does it update weights, yes or no? Do I see a result on TensorBoard? Can I open that up? How does it look? What happens if I just change this parameter on 10 rows of data? What does that change? What happens if I change my optimizer? You go through and do a bunch of these validations on just a random selection of data. You're validating what the feel of everything is. What is my code gonna look like? How is it gonna be to interact with this API? Is this something I want to take on? The person who can't see the forest for the trees is gonna take that open source retraining data set and hit retrain on all of it. They're going to check what 50 epochs of retraining looks like, validate all their results on TensorBoard, put a custom loss function in there, and try to see what their metrics are across all of these different parameters. They're doing implementation details while doing research. That's what that is. If you're worrying about the entire API and effectively writing an end-to-end integration test, that's not what a spike is for. The spike is: does this API suck? Is this a pain to use? Or is this something that's pretty good, well-designed, kind of easy to use, well-documented, and I think it's going to work for us? And then also a bit of humility. People who don't take themselves too seriously when they're going through and evaluating stuff, who don't take it personally if they can't figure something out. It's not productive to get frustrated and annoyed and blame the API. I've seen that happen in research spikes before, where somebody just trashes the API. I mean, there are times when that's justified; there are some bad implementations out there in open source where you look at it and go, yeah, this is broken. But the person who focuses on that and wants to waste time providing evidence of why something sucks isn't gonna be able to think about other alternatives. They're gonna focus their negative emotions on how bad this thing is or how much they don't like it. It's like, whatever, just move on to the next thing, test something else out. You've got four hours to do this. People who are successful at that research can just iterate fast, get the data that they need to make an informed decision, and then work on presenting those findings to the team.
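That 10-row, single-epoch smoke test can be as small as this sketch; the tiny Keras model and random arrays are placeholders, since the point is only to confirm that the training loop runs and the weights actually move:

```python
import numpy as np
import tensorflow as tf

# 10 random rows standing in for a slice of the real retraining data.
x = np.random.rand(10, 20).astype("float32")
y = np.random.randint(0, 2, size=(10,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

before = [w.numpy().copy() for w in model.trainable_weights]
model.fit(x, y, epochs=1, verbose=0)  # single epoch on 10 rows
after = [w.numpy() for w in model.trainable_weights]

# Did the weights actually update?
print(any(not np.allclose(b, a) for b, a in zip(before, after)))
```

Swapping the optimizer or a parameter and re-running takes seconds, which is exactly the feel-it-out loop Ben is describing.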
Michael_Berk:
Yeah, I think you hit the nail on the head when you said four hours. It's time-boxed, so you need to be able to perform under pressure. Getting frustrated almost never helps. And if you have perfectionist traits, that's also not ideal. Quick aside: sometime in college, I built a website, because, you know, you need a website for your resume to look cool, and also websites are fun, yada yada. I started building the website, and at some point I look up and realize I have spent the past 90 minutes aligning footer text on an otherwise empty website. And I was just like, what the hell is wrong with me? I need to get my life together. Knowing that about yourself is actually really, really valuable, because I never make that mistake now; I know I need to consciously counteract it. If I had all the time in the world, I would go super deep and make things perfect. But having some self-awareness about your style of working can be really beneficial in these time-constrained activities. So that's one thing: if you're a perfectionist, know that and work on it. And a second thing that we sort of hinted at before, which I think is really valuable, is being able to iterate and being okay with a rough outline of the solution. Ben definitely hit the nail on the head that if you can see the big picture, it allows you to move more dynamically in the solution space. You might say, oh, maybe we can reframe the problem as classification instead of regression, or whatever it may be. So to that end, Ben: what percent of the time has the most effective solution reframed the problem versus found a clever implementation to the existing problem?
Ben_Wilson:
In my experience... That's an excellent question. I don't think I've ever even really thought about that.
Michael_Berk:
And I think for software, problems are more clearly defined. It's like, build this. But let's stick with data science, whether it be decision science, machine learning, that type of thing.
Ben_Wilson:
So I thought that about software before I started doing it full time. There's a lot more creativity in implementations in software engineering, particularly SaaS tooling, than you assume there is as a user of it. There's a lot of really clever things, particularly...
Michael_Berk:
Scratch that, then.
Ben_Wilson:
in companies like the one that we work for. Engineering teams are absolutely brilliant, but also humble. It's amazing seeing them work. But finding something during research that exacts a paradigm shift and changes the entire approach: I'd say when I was more junior, that was more frequent. The more experience you get, the less that ends up happening, but that isn't to say it doesn't happen, and you should be open to it happening. Even if you are super experienced and you're like, hey, I've solved this problem before, there could be something that turns up in your structured list. Which also goes to say: before you start a spike, have a plan, a game plan of, hey, here are the things that I need to check. I don't need to check them beforehand, and I shouldn't, but here's a list of things that I need to check out. Basic research and preparation. But if you've come across something that
is so much simpler and still solves the problem, compared to what you've done in the past, then yeah, definitely be open to that. And I'd say that happens about 10% of the time, at least for the stuff that I worked on in the last two years that I was doing data science work; that was roughly the rate. The flip side of that coin, where I don't find a solution that's readily available out in the open source world, or in some service that I can sign up for and buy, is probably also around that 10% mark, with the same ratio. But not as a beginner. As a beginner that was 0%, because if you're new to data science and ML or software, you're not going to have the skills or the capabilities to build something from scratch. You might think you do, but newsflash, spoiler alert: you don't. It's probably not going to be any good.
Michael_Berk:
We don't have to hurt everybody's feelings just because it's true. I mean, I don't know.
Ben_Wilson:
I mean, from my own experience, I did attempt to do stuff like that before I was ready to do it, and built stuff where I look back and go, wow, that sucks. Or I'm annoyed that I spent a month on this and it went nowhere, because it didn't cover all of the cases, or it didn't run in this environment, or the implementation was not performant in a way that we could run it in production. It was too expensive
or something. It just takes time to learn all that stuff. You learn it by seeing open source tooling, seeing how it's implemented, and learning from that. Everybody learns that way. For me it went from 0% to ramping up relatively high when I was at a medium level of experience within data science: oh, the open source tools can't do this, I need to build something myself. I'm sure some of those things are still running today at previous companies I worked at. They probably shouldn't be. There was a solution in the open source pantheon out there that could have been used and should have been used, but I might not have liked its APIs, or I might have thought, not that I can do it better, but I want to learn how to do this sort of thing, so I'm going to implement it myself so that I don't have a dependency on this package. Which is just stupid. It was dumb of me to do that, and it was probably for pride as well, to show, hey, I can do this, I can create my own implementation of this algorithm. If I could go back and tell myself something now: if you wanna do that, cool, man, do it on the weekends or at night. Don't push that to a company's code repository. Do the simpler thing. If you want to learn how to build this algorithm, then learn how to build the algorithm at home on your own computer. Don't waste all that time trying to get it working in your production environment when it's three lines of import statements and then 30 lines of a function interfacing with an open source package that does the exact same thing and is maintained by 400 people in the open source community. It's easier, it's faster, it's probably gonna work for a lot longer, and you don't have to maintain it, which is the big bonus. And then fast forward to now. With respect to implementations where I don't find something that could potentially solve the problem and have to say, okay, I need to build this: for data science work, that pretty much never happens, or it's so infrequent. For solving an actual modeling problem, one time in the last two years did I actually have to build a custom algorithm. And that was simply because it didn't exist, and I knew it didn't exist because a customer showed me a PDF of a white paper that had been published six months before, and they said, we need this on Apache Spark. And I'm like, yeah, that does not exist, because nobody's built it yet. So yeah, we can work on this together and see what you need out of it. But first you need to prove to me that you do need this algorithm and nothing else is going to work. And they did that, and then we built it together. But that's exceptionally rare, and it's usually for highly esoteric problems where the package or the solution might exist on single-node machines, but it's not a concurrent or parallelizable process, and their usage and the data set volume they need to train a model mean it has to be distributed, because you can't spin up a VM that can hold that much data to actually run the algorithm. If you're doing stuff like that, always ask around and do a bunch of research before you tackle it, because it's a lot of work. It's super fun work, but you now own that. You have to maintain that.
Michael_Berk:
Yeah, 100%. As someone who doesn't have 30 years of experience in the data science field, I'm starting to learn that building stuff from scratch doesn't always work. When I was a kid, like seventh grade, I told my dad I was gonna build AI. So I went upstairs and made a list of words, and it would randomly put out a sentence, and I was like, look, it talks. Knowing what you don't know is a really, really valuable thing, and as you work with smart people you start really appreciating how difficult the difficult problems are. You're not that much better than anyone else, or better at all. So, yeah.
Ben_Wilson:
but
Michael_Berk:
100
Ben_Wilson:
we're all better
Michael_Berk:
on that
Ben_Wilson:
together. And that's
Michael_Berk:
oh beautiful
Ben_Wilson:
the point that I like to tell to as many people as possible. I've never in my career actually met someone who on their own with no input from an external source build something amazing. Anything that, if you're using an open source package that's in existence somewhere. and there's a lot of people that use it, take a look at the PRs that are filed from the internal maintainers. Just look at them. There's going to be constructed feedback from other maintainers or people from the community on that PR. If it's a complex PR and there's a lot of things that have been done to build that, think about what the internal process at that company or within that team was before it even got to the point where you're building a PR. They probably did a design review. They went through and that got torn apart and collaborative work of all of these amazing minds came together to build a plan to build something that'll get implemented, that'll work really well. It's always... A lot of people assume that great software... great implementations are a solo thing. They're not. It's a team effort. you're never gonna find something that's truly amazing that's just built by one person in isolation. So you shouldn't
Michael_Berk:
Yeah.
Ben_Wilson:
build ML implementations like that either.
Michael_Berk:
Yeah, basement neckbeard coders exist, but they're usually not the amazing coders that we think they are. Or at least if they are, they have help.
Ben_Wilson:
Yeah, they're talking to somebody. I mean, you have to.
Michael_Berk:
Somebody, I hope. Cool, so I'll do a quick recap. Today we talked about spikes. Spikes are short research projects, and they can take many formats. One format that is really effective, which both Ben and I have seen, is pairing up. A junior and a senior person can be effective, or two senior people, sometimes less so, but if there's a lot of open collaboration, that can work really well. You can also have entire-team hackathons, or give a rotating hermit, let's say, a week to come up with a research project and then deliver the results. A couple of benefits of spikes; we've talked about a few, but I wanted to highlight three. First is team morale: it's really fun to work with each other, get to know each other, et cetera. Second, you can save the team time by efficiently exploring what should and should not be built. And third, you can build skills. Sometimes you aren't in a time box when you're developing solutions, and it's really fun to work under different constraints. And if you do implement spikes, here are a couple of tips. Spending time getting existing code to work, rather than building your own solution, is usually a high-ROI activity; it's better to work with what's out there, which has been peer reviewed by many, many people, than to build your own thing from scratch. Don't worry about productionizing until you get feedback. Also, keeping it simple is really effective, because you're going to have to maintain it, and if it's simple, it's easier to maintain, easier to understand, easier to iterate upon. And then, from a human perspective, if you're looking to develop some traits: being able to see the big picture is really important, being humble is really important, and, again, focusing on iteration is really important. Anything else you want to add, Ben?
Ben_Wilson:
It's good.
Michael_Berk:
Well, until next time, it's been Michael Berk and
Ben_Wilson:
Ben Wilson.
Michael_Berk:
Have a good day, everyone.
Ben_Wilson:
Take it easy.