Where DevOps and ML Meet - DevOps 156

Hosts of the Adventures in DevOps podcast, Jillian Rowe and Jonathan Hall, join Ben and Michael on this week's episode crossover. They talk about the intersection of ML and DevOps. They dive into the concepts and differences between ML and DevOps. Additionally, they talk about how ML ideas may be applied to DevOps principles and vice versa.

Transcript


Jonathan_Hall:
Hello, everybody, and welcome to an exciting, interesting, and unique episode of this show. I'm not telling you the name because this is actually two shows. We are joined in the studio today by the hosts of the Adventures in Machine Learning podcast, Michael and Ben, and from Adventures in DevOps today it's me, Jonathan, and Jillian. So we're going to do a crossover episode and talk about DevOps and ML and whatever else happens to come to mind today. Before we dive into the conversation, why don't we do a brief round of the room and introduce ourselves? Michael, do you want to start?
 
Michael_Berk:
Sure, I would love to. Thank you for the intro. My name is Michael Berk. I'm a resident solutions architect at Databricks, so that means I just do a bunch of random crap. Sometimes it's building machine learning models, sometimes it's implementing data infrastructure or data architecture. And then my background is in sort of A/B testing and machine learning as well. So I'll kick it over to my co-host, Ben.
 
Ben_Wilson:
Hi, everybody, I'm Ben. As Michael said, co-host of the Adventures in Machine Learning podcast. By education and early work I was a nuclear engineer. Didn't enjoy that, got out of that and the Navy, and worked in process engineering at a bunch of factories, which led me into data science, which led me into wanting to write better code, which led to becoming a machine learning engineer. Then I got hired by Databricks and worked in the field for a while. Now I work on building MLOps tooling such as MLflow within engineering at Databricks. I'm also the author of Machine Learning Engineering in Action from Manning.
 
Jonathan_Hall:
Cool. Jillian?
 
Jillian_Rowe:
Yeah, that is very cool. I'm gonna have to ask you about MLflow. So I'm Jillian. I used to work as a bioinformatician and then slowly but surely moved more onto the computational side of things. I was an HPC sysadmin for a while, and then moved more into this field that we call DevOps, where mostly I work as an independent consultant. I work with biotech companies, and I help the data scientists get their pipelines and their code into production, wherever that happens to be. Usually it's on AWS, but it could be on their own systems. Sometimes it's your standard statistical pipelines. Sometimes it's more machine learning pipelines, and that's always kind of fun when that happens. And sometimes it's other kinds of tools, like very high-performance data visualization applications where you have data that appears in a browser, but behind that you have maybe terabytes of data hanging out in the back end that all has to be presented for a scientist to be able to view, and view quickly, because people don't want to have to wait for their buttons to spin around and finally show them data.
 
Jonathan_Hall:
Very cool. And to round out the intros, I'm Jonathan Hall. I kind of do a little bit of everything: back-end dev work, DevOps work. Lately I'm helping a couple of different companies, one with some Go development and another with some PHP. I'm diving into PHP again; I haven't done that in about ten years. So yeah, a little bit of everything. And I have my own... how should I say? Maybe I'll edit that out. I don't know what I was going to say anyway, so edit that out, whoever is editing. So what are we talking about today?
 
Ben_Wilson:
The intersection of ML and DevOps, and what these terms actually mean with respect to
 
Jonathan_Hall:
Nice.
 
Ben_Wilson:
how you would apply DevOps principles, and not necessarily technology, to ML. Before we started recording, we were chatting a bit about, you know, what are the fun things that we could all talk about together. And
 
Jonathan_Hall:
Yeah,
 
Ben_Wilson:
it turns out there's a lot.
 
Jonathan_Hall:
I'm sure there is, yes. So let's start, because I think when people hear DevOps and ML, the first thing that's going to pop into everybody's mind is MLOps. One of you guys want to tackle that, Michael or Ben? What is MLOps, and is it a useful term in the first place?
 
Ben_Wilson:
Uh, useful? I think it's good to have a label for things.
 
Jonathan_Hall:
Hm,
 
Ben_Wilson:
It's bad when that label is so ambiguous and nebulous that nobody knows what it actually means. If we were to go back in time fifteen, maybe twenty years, the dot-com bubble was growing and getting really big. People were writing a lot of not-so-great code, just trying to push stuff out there, make some money, and get acquired by a big tech company. That's the thing that created San Francisco as it is today, and built Palo Alto. People realized at that time, like, hey, we all need to subscribe to some sort of process around how we build software, how we deploy it safely, how not to get woken up at 3 a.m. on a Saturday because your code blew up in production. What are all those controls that we need? What are the processes that need to be in place? In any solid engineering practice you start with those processes of what you need to do, and then you build software to automate that stuff. And I think that's what people who aren't actively involved in traditional software engineering associate with DevOps today. They're like, oh, it's those tools. It's Terraform, that's DevOps. Our CI/CD platform where we're using GitHub Actions, that's DevOps. Like, no, it's the process. It doesn't matter what tool you're using. It's that whole concept of: we write code, we get it reviewed, we make the changes, we push the changes, we test, and we do this full life cycle where people who have distinct roles are interacting with one another in order to get to the end goal, which is software running well in production. And there's a whole bunch of other stuff in there. But if we were to go back twenty years and look at what that was, that's where MLOps is now. People are like, yeah, we know there are things that need to happen. We don't know what they are yet exactly, and we don't know the most optimal way to do it. So people are just kind of lumping a bunch of stuff in, like,
oh, I need explainability, that's really important, so that's part of MLOps. And I need tracking, that's really important. And I need to be able to do hyperparameter tuning and keep track of that, and that's all going to be MLOps. Like, kind of, I guess. But it's just an overly loaded term that refers to too many things right now because
 
Jonathan_Hall:
Yeah,
 
Ben_Wilson:
people haven't all agreed, and the tooling hasn't been completely built out that does all of the things you need to take an idea from its initial concept to something that's shipped to production and running continuously. That's my take on it.
 
Jonathan_Hall:
Any disagreement from you, Michael?
 
Michael_Berk:
Yes,
 
Jonathan_Hall:
I want to
 
Michael_Berk:
tons
 
Jonathan_Hall:
see a fight.
 
Michael_Berk:
of disagreement.
 
Jonathan_Hall:
Great,
 
Michael_Berk:
No, I think that's aligned with my understanding as well. One point I would like to highlight that Ben stated was that MLOps development, in a historical sense, is pretty nascent and pretty underdeveloped, where software was about twenty years ago, give or take whatever number of years. And so there's a lot of work that needs to be done. And I wanted to kick it over to you guys, because I was wondering this myself: what is the reason that ML is different from software? Like, why is it more, quote unquote, complex, or just different?
 
Jonathan_Hall:
I think the short... I'll give my short answer so that I can say something at all, because as soon as I hand it over to Jillian, she's going to... no, she knows this stuff better than I do, and she's going to make anything I say sound like child's play. I think the big difference is that software is mostly building logic. It's not entirely building logic; there are also assets, sometimes graphical assets or data assets, whatever. But it's mostly building logic, whereas ML includes logic but has a huge component that is data and a model, and those things don't fit well into the brains of software developers all the time. Like, how do we handle these other types of things? I don't know. So I think that's probably the biggest difference. I mean, there certainly is a software component to ML, but there are those data models and then the data that feeds them or that they act upon, which I think makes it different. So I think that's the biggest difference, but I'm really curious now if Jillian can either correct me or expand on that.
 
Jillian_Rowe:
So no, I agree with you on a lot of points. I wouldn't say software is a solved problem, but I would say it's a lot closer to being solved, and we understand the parameters, right? Software at its core hasn't changed that much since I took Intro to Computer Science, considerably longer ago than I'm willing to admit. The data structures are all very similar. We have lists, we have dictionaries, we have linked lists, we have functions, we have classes. These are all fairly standard things that you see across the board within software. My favorite is a for loop, all the time. So we have all these things, and for the most part, if you can read software in one language, unless it's got super ridiculous layers of abstraction, you can kind of look at it and see what it's doing. And I would say that's not true for data. Data is a lot more complicated, in ways where we don't know what we don't know. So, for example, if you look across maybe the last ten or fifteen years, how many different kinds of databases have there been? We had relational databases, and then everybody was like, no, that's too structured, let's have MongoDB and document databases, and then let's have Redis and key-value stores, and maybe we should just stick everything in Elasticsearch because, you know, why not. We're constantly evolving these ways to have data and to reason about data. And not only that, when you have a data set, quite often you have to track the history, what's called the data lineage, of that data, and it happens in multiple places. So maybe for software we could say it basically happens in my IDE. That's not true of data.
So I work with bioinformatics and genomic data. Let's say that we're collecting samples from a person in a clinical trial. There are so many different places where something could go wrong with that data. The person drank a bunch of crazy energy drinks before they went in for their diabetes blood draw, or the nurse who was drawing the blood could have mixed up some tubes, or somebody might have been eating lunch in the lab that day when they were running the blood test, or the machine could have been kind of wonky. Once you have the raw data, there are so many different ways that you can process it. So you can see, all across the line, there are so many different places where something could have happened that affected that data. So I think data is much more of an open system, where software is much more of a closed system. I don't know that I quite like that terminology, but that's what I'm going to go with for the moment.
 
Ben_Wilson:
Yeah, I'd agree with that quite a bit. Even the myth that people have of... let's say we all work at a startup together. And
 
Jonathan_Hall:
All right, on three. One, two, three: we all work at a startup.
 
Michael_Berk:
Hm
 
Jillian_Rowe:
Um,
 
Jonathan_Hall:
Guys, come on.
 
Ben_Wilson:
So if, Jonathan, you're in charge of creating the app
 
Jonathan_Hall:
Yeah,
 
Ben_Wilson:
and you've created a front end that users interface with, and that's sending data to Jillian, who's running the entire back-end system, who's processing all of that, parsing that JSON, getting it into a structured format, which then goes to Michael, who's our data engineer, who's taking that and putting it into a format that can be saved into either a relational table or a data warehouse somehow. Then if I'm working as the data scientist, I might think, hey, we know where the data came from, we know what the structure was, we have, you know, strong typing and controls on every phase of this process. What happens when we need a new product feature, and Jonathan now adds four features to the data? Jillian has to update all of that on the back end to process it, and Michael's got to change the data engineering code. Fast forward four months: oops, we didn't need those three features. In fact, one of them is just wrong. Our logic
 
Jonathan_Hall:
Sorry,
 
Ben_Wilson:
was bad.
 
Jonathan_Hall:
sorry
 
Ben_Wilson:
so
 
Jonathan_Hall:
about
 
Ben_Wilson:
even
 
Jonathan_Hall:
that.
 
Ben_Wilson:
with those controlled systems that people assume are going to be sacrosanct, they never are over time. They are at a particular point in time. We can have our tests validating that everything works correctly, but over a long duration things go wrong, and in the data world, I don't know if there are really good tools right now to correct for that. You can detect it, but how do you coerce data? You know, with the
 
Jillian_Rowe:
That
 
Ben_Wilson:
genome
 
Jillian_Rowe:
is why
 
Ben_Wilson:
example?
 
Jillian_Rowe:
we have clinical trials
 
Ben_Wilson:
Yeah, exactly
 
Jillian_Rowe:
Right, like, yeah. No, I agree with you. There are no really good ways to track that, so you have to have some kind of end test in mind where you say, okay, is this working the way that we say it's working? And you have to have some way to verify it. In genomics, you have to be able to verify things, usually in a clinical trial.
 
Jonathan_Hall:
One of the things we do in DevOps, using the word loosely, is this concept of everything as code. You know, infrastructure as code, deployment as code, whatever. GitOps is a buzzword these days. The reason for that is, I think it solves the same problem you're talking about, but for a different problem domain. It solves the problem of: what was the state of our configuration and our infrastructure on January third when the system crashed? We
 
Ben_Wilson:
Hm.
 
Jonathan_Hall:
can figure that out, and we can revert. In theory, at least, we can revert or rewind back to that last known good state. I think you're saying, basically, we need something to solve that problem for data. At least that's part of what you're saying. And data as code doesn't really make sense, because who wants terabytes of... I don't even know what that would look like, honestly. I mean, code is data in the first place, in a sense, as far as the machine is concerned. Am I understanding the problem correctly? The traceability and the revertability and all that stuff, all these problems are unsolved for data, broadly speaking?
 
Ben_Wilson:
They're each solved in a piecemeal fashion. So if you wanted to build this entire story right now of, like, hey, can I detect those changes? Can I use statistical validation to determine what the impact of those changes might be? Can I filter effectively without having to manually go in and delete rows in a database? That's terrible. I can't even imagine how that would be at genomic scale, where you're like, hey, I've got eighteen petabytes of data that came in, let's go through that manually. Not going to happen. So there are tools that can do that automatically. These concepts aren't new. The theory and the techniques go back many hundreds of years, so you just implement that, and a lot of people have already implemented these techniques in packages. The trick is that those detection algorithms and their implementations aren't typically built into a data engineering ingest layer. You don't buy a database from somebody and have it come with data validation checks. You need a third party to bolt onto that, that says, okay, everything that's coming in here has to meet these rules. And also on the output side, when you're going into the model, you might be joining a bunch of tables together and
 
Jillian_Rowe:
Oh,
 
Ben_Wilson:
getting a data set that you're going to feed into whatever implementation you're using. You can go and shop for that and get a package that will do that validation, or you can roll your own, if you're a masochist, that will do effective full screening of all that data. And then on the output of the model, you can do the same thing again. And you need monitoring and, as Jillian said, lineage. That whole end-to-end concept is data lineage, to say what changed, and then bolting the validation checks onto each phase of that. Nobody has that full end-to-end built out yet. Some companies are really far along on that path and will have that full story done in the next two years, but it's still too nascent.
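
As a rough illustration of the bolt-on validation described in this part of the conversation, here is a minimal Python sketch of rule checks applied to incoming records before they reach a table or a model. The field names (`sample_id`, `glucose_mg_dl`) and the thresholds are hypothetical, chosen only for illustration; they do not come from the episode or from any particular validation package.

```python
from typing import Any, Callable

# Hypothetical rules: each field name maps to a predicate its value must satisfy.
RULES: dict[str, Callable[[Any], bool]] = {
    "sample_id": lambda v: isinstance(v, str) and len(v) > 0,
    "glucose_mg_dl": lambda v: isinstance(v, (int, float)) and 0 < v < 1000,
}


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable rule violations for one record."""
    problems = []
    for field, rule in RULES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not rule(record[field]):
            problems.append(f"bad value for {field}: {record[field]!r}")
    return problems


def split_batch(batch):
    """Separate records that pass every rule from those that need review."""
    accepted, rejected = [], []
    for record in batch:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

The same `split_batch` call could in principle sit at each phase mentioned above (ingest, the joined training set, and the model output), with a different rule set per phase.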
 
Jillian_Rowe:
I would also say there's a big issue with the size and resolution of the data constantly increasing faster than the scientists, or the people who are deploying infrastructure and pipelines and things like that in the background, can really compete with. So, for example, I'm working with a company now that within, let's say, four months, I think it's about four months, 10x'd the size of their data. And it's like, well, we can still do all the same things, right? And it's like, well, we can, but it's 10x the data size, so on a good day it's going to take ten times longer. But often these things don't scale. There's a lot to be said for: okay, how do you deal with an array that's billions of data points long? Can you read that into memory? How much of a machine do you need to read that into memory? Are the statistical tools and tool sets that we're using even applicable to that much data? Like, back in the day, if you were doing statistical methods, you would apply something called a Bonferroni correction, which is basically: I'm running so many tests that I need to make sure that I haven't just found something by virtue of running so many tests. How do you do that when you have billions of data points, and you have a matrix of billions of data points, billions one way and billions the other way and everything in between? So yeah, it's all a complicated issue. And the scientists are cool, but, like, could you all stop increasing your data for just a week or two? That would be very nice.
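
For readers unfamiliar with the Bonferroni correction mentioned here, the following is a minimal Python sketch: with `m` tests and an overall significance level `alpha`, each individual test is compared against `alpha / m`. The p-values in the example are made up for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which tests stay significant after a Bonferroni correction."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # each test must clear the stricter per-test level
    return [p <= threshold for p in p_values]


# Four hypothetical p-values; the per-test threshold is 0.05 / 4 = 0.0125.
flags = bonferroni_significant([0.001, 0.02, 0.04, 0.3])
# flags == [True, False, False, False]
```

With a billion tests the per-test threshold becomes 0.05 / 1e9 = 5e-11, which gives a sense of why the approach gets hard to apply at that scale.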
 
Ben_Wilson:
That was something... I had a conversation with one of our genomics people at Databricks many years ago, and I asked him, what's the biggest problem that you see in industry? And he said the exact same thing that you just said. He's like, it's good for us as a company, but it's bad for all of our customers who have to do the analysis, because you can now sequence the human genome, like full sequencing, in millionths of the amount of time that it used to take. These machines can just power through samples, and every year you're doubling the size of the data, or it's an order of magnitude bigger, I can't remember what he said. If you're listening, Will Brander, always great talks. But he was talking about how this proliferation of DNA and all of the data is actually making it so that people can't get their analysis done in the amount of time they have allocated without blowing their budget way up. And he's like, yeah, no CTO is going to be like, oh, you have 100x the data, here's 100x the budget. It's going
 
Jonathan_Hall:
Right.
 
Ben_Wilson:
to be like, go figure out how to do this more efficiently. That's a real challenge.
 
Jillian_Rowe:
Yeah, and then we solve one problem. So I would say, specifically, DNA sequencing, that's a fairly solved problem, right? Most bioinformaticians can be like, yeah, sure, I can run an NGS pipeline, and it's all going to be fine. Then they start developing new ways of looking at the data and new ways of tagging sequences, and let's look at individual cells because, you know, why not, and all this kind of stuff. So as soon as you figure out one problem or one type of data, one data structure, say, like, omics data is one specific type of data. Well, then there's single cell, and then there are images that come from a microscope, and then there's, let's figure out how these proteins are actually positioned in a three-dimensional space. It just goes on and on and on. In my opinion, the size, the complexity, and the sheer variety of the data move so much faster than the software and the computational capabilities.
 
Ben_Wilson:
Hm.
 
Michael_Berk:
And
 
Jillian_Rowe:
That's
 
Michael_Berk:
Jillian,
 
Jillian_Rowe:
an opinion.
 
Michael_Berk:
How are you
 
Jillian_Rowe:
though,
 
Michael_Berk:
seeing people handle this increasing scale?
 
Jillian_Rowe:
Um, like crying. I don't know. I
 
Jonathan_Hall:
Oh,
 
Jillian_Rowe:
mean.
 
Michael_Berk:
Like just spending more money on compute, or do they change their techniques, or how do they handle it, other than crying?
 
Jillian_Rowe:
So it's a little bit of both. That's kind of been my whole career strategy and how I make my money: I find the people with the interesting problems who are right on the cusp of that. Like, we have a data set and either we already have increased it or we're about to increase it by 10x, what can we do? I do a lot of parallel computing. Specifically I work with data scientists, which has always been really interesting to me, because it's more like a people problem than a computational problem, which I personally find more interesting. And in the spirit of DevOps, I always see it as a collaboration. It's not like I'm working for them or they're working for me. We're working together on an equal scale to get this research out the door. But quite often, because they're data scientists, their expertise is in the science, and biologists are wild, right? They understand more about some subfamily of proteins that nobody's even heard of before than I know about my own children. They are all in on their research. They're always very cool people to talk to. So with that said, they're not computer scientists for the most part, and in bioinformatics they don't have any kind of classical training. They're like, I'm going to hack together some stuff with some Python. Or, you know, it was all Perl back in the day, and BioPerl, because Perl saved the Human Genome Project, for anybody who wants to make fun of me about my Perl days. So a lot of times there's a lot that can be done to just go through their code and apply methods that we know are going to work.
So one thing that I do a lot is a first pass where I just see what can be vectorized. What can we vectorize out of this? Specifically in Python, or like the R tidyverse... there is this package called NumPy, which is a way of creating arrays and matrices, and they have a lot of very optimized ways of computing on arrays, so a lot of linear algebra stuff, for those of you that are finally remembering your math days. That's always the first pass. And then the second pass is to see what sort of data they have, and can we give it some structure. And that's just kind of all over the place. It really depends on what it is. Usually it's something that can fit into some kind of matrix. For example, imaging data is a matrix: you have a matrix where each point in the matrix corresponds to a pixel in your image. And then other types of data you can throw into something like a data frame. And I think that's becoming the most popular method that I'm seeing so far. It seems like these Parquet files are getting really big, and so, for example, AWS Athena is really pushing the idea of, okay, transfer all of your data into Parquet format, throw it on AWS Athena, and then let us scale out everything for you.
 
Michael_Berk:
That's interesting. And just for anyone who is not aware of the definition, do you mind defining vectorization?
 
Jillian_Rowe:
I think I do. I'm not sure what the definition of vectorization is. I mean, it's black magic underneath the hood. So you could have a for loop. Let's say I have an array and I want to add one to it. One way of doing that would be to have a for loop and say, for element in array, element is equal to element plus one. That would be one way of doing it. Or you could pass it to a vectorization library, and the notation would be like NumPy array plus one, and then it's smart enough to know underneath the hood to apply that to each element in the array. I know at some point I learned what is actually happening behind the scenes, and now I really don't remember. It's been a while, you guys. I've got kids. My memory is
 
Jonathan_Hall:
I
 
Jillian_Rowe:
shot.
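
The two approaches described here (an explicit loop versus "NumPy array plus one") can be sketched as follows. This is a small illustration with a made-up five-element array, not code from the episode.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Explicit loop: the Python interpreter visits every element one at a time.
looped = np.empty_like(values)
for i, element in enumerate(values):
    looped[i] = element + 1

# Vectorized: one expression; NumPy applies the addition across the
# whole array inside compiled code instead of the Python interpreter.
vectorized = values + 1
```

Both produce the same array. The difference is where the loop runs: in the vectorized form it happens inside NumPy's compiled routines, which is the main reason it tends to be much faster on large arrays.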
 
Jonathan_Hall:
was always good at... I was always good at math, but matrix math and vector math always confused me, so
 
Jillian_Rowe:
I like
 
Jonathan_Hall:
when I hear that, it just sounds like Star Trek to me. You know, realign the phase inducers and blah blah blah, and okay, cool. I'm glad that's working for you guys.
 
Ben_Wilson:
It's the underlying
 
Jonathan_Hall:
M.
 
Ben_Wilson:
tech of a lot of machine learning libraries, even the ones that people don't really talk about that much these days. One of my favorite libraries out there, and this makes me super nerdy, at least for Python libraries, is statsmodels. I don't know why, I just love it. It's not that the APIs are exceptionally designed or anything; they are very good. But when you dig into the source code, you start seeing that everything is referring to compiled libraries. There's another library that's just like that, the one you just mentioned, NumPy. NumPy is this shiny Python veneer over an absolute metric crap ton of C code that has been optimized and compiled against your runtime, the operating system you're installing it on. And those vector optimizations that happen, whether it's truncating that to a sparse matrix, where you're saying, hey, I've got a crap ton of zeros in here, I don't need to store those. I'm just going to default to saying if this data is missing at this vector position, I don't need to hold that in memory, so it becomes much smaller. Like, hey, I have ninety-nine percent sparsity, I'm going to store one percent of the data in this vector representation that links to a hash map in memory. With that registered, I can say, do this operation, this multiplication or division. If you put a bunch of vectors together, that's a matrix. It can do stuff like invert that matrix and then take the dot product of it, and those operations are really efficient, and from a thread execution point of view that's all done in parallel, so that's why you get this blinding fast performance.
And it's interesting to hear you say, Jillian, that the first thing you look for is stuff like, what are the optimizations that I can do with respect to data structure storage within that code? Because everybody that I've ever met in industry or ever talked to who is really good at getting things out the door and being able to do an effective code review in the ML space, data structures are the first thing that they look at. Like, hey, how are you storing this in memory? Yeah, it's cool that you have this fancy algorithm, whatever, and I'm sure it works great. But how is that algorithm actually doing that math within the computer? And if you're storing it as a list of Python lists and then trying to multiply another list against that, that's why it takes forty minutes to run. Let's convert these into vectors, and now it executes in seventeen seconds. That's
 
Michael_Berk:
Yeah,
 
Ben_Wilson:
good to know.
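
To make the sparse-storage idea concrete (keep only the non-zero entries and look them up by position), here is a plain-Python sketch that uses a dictionary as the hash map. It illustrates the concept only; it is not how NumPy or any compiled sparse library actually implements it, and the vectors are made up.

```python
def to_sparse(dense):
    """Keep only non-zero entries, keyed by their position."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def sparse_dot(a, b):
    """Dot product that only touches positions present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller map
    return sum(v * b[i] for i, v in a.items() if i in b)


dense_a = [0, 0, 3, 0, 0, 0, 2, 0, 0, 0]
dense_b = [1, 0, 4, 0, 0, 0, 0, 0, 0, 5]
sparse_a = to_sparse(dense_a)  # {2: 3, 6: 2}
sparse_b = to_sparse(dense_b)  # {0: 1, 2: 4, 9: 5}
result = sparse_dot(sparse_a, sparse_b)  # only position 2 overlaps: 3 * 4 = 12
```

Ten stored values shrink to two and three entries respectively, and the dot product does work proportional to the non-zero entries rather than the full length.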
 
Michael_Berk:
and for
 
Ben_Wilson:
Meet
 
Michael_Berk:
an
 
Ben_Wilson:
another random person who sees that exactly the same way as other seasoned professionals that I
 
Jonathan_Hall:
Hm,
 
Ben_Wilson:
have worked with at Databricks. That's cool.
 
Jillian_Rowe:
That's good, then. I like to know that I'm in good company.
 
Ben_Wilson:
Uh,
 
Jonathan_Hall:
Uh,
 
Ben_Wilson:
uh,
 
Jonathan_Hall:
huh,
 
Michael_Berk:
Yeah, that's interesting. Typically when I look to optimize a workflow, vectorization is very important. And if you've ever looked to figure out the best way to loop over a pandas DataFrame or a NumPy array, there are those Stack Overflow posts that just have a chart of essentially seven different methods, where the x-axis is scale and the y-axis is time. Those are awesome, so go check them out. But yeah, data structures is a great approach. One thing that I think people often miss is removing processes that aren't essential to the end goal. Oftentimes people bring stuff from point A to point B that isn't needed, so trying to minimize things in production is often really, really helpful. And Jonathan, I know you've been quiet for a bunch of this.
 
Jonathan_Hall:
Yeah,
 
Michael_Berk:
Do you have any thoughts on how to improve run times, maybe in the DevOps space, or whether it be
 
Jonathan_Hall:
Uh,
 
Michael_Berk:
data
 
Jonathan_Hall:
huh.
 
Michael_Berk:
engineering or anything?
 
Ben_Wilson:
P.
 
Jonathan_Hall:
Not
 
Ben_Wilson:
HP.
 
Jonathan_Hall:
really. I mean, PHP. Stop using PHP; that's a good way to improve the run time, though.
 
Ben_Wilson:
Try JavaScript.
 
Jonathan_Hall:
Um, so, I mean, that's not an area where I have a lot of expertise. I have worked on doing profiling, so maybe that's my general advice. The first thing to do is profile what's wrong. Don't just make blind guesses about, I think this looks... Sometimes you can get it right. A well-enough trained eye can see patterns in the code that are just going to be bad. But once you're past that surface level, do some profiling before you try to optimize stuff. See where the bottleneck actually is happening. That's something that so many people forget to do. They're like, oh, I have a memory leak, or my CPU is going at a hundred percent, whatever, and they just start doing things. And maybe you're lucky and you get it right and it's better now. Actually, very recently on Stack Overflow I saw a post. Somebody had something they were writing in Go that was eating a hundred percent CPU, and they asked, how do I optimize this? And the answer was one of those let's-just-try-something things that actually ended up breaking the entire code but
 
Ben_Wilson:
Sorry,

Jonathan_Hall:
got accepted because it stopped eating CPU, somehow.

Ben_Wilson:
It went to zero.

Jonathan_Hall:
Yeah, the CPU literally went to zero. Basically, it was a tight for loop waiting for an event to happen, and the solution was just to return early, effectively. It wasn't quite that simple, but that's effectively what happened: just do an early return if you don't get an answer, rather than actually waiting for the loop to complete properly. So yeah, benchmarks. Find out what the actual problem is first, before you try anything. That would be my general advice, and I think it probably applies even to data modeling and ML.
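
A minimal Python sketch of the profile-first advice, using only the standard library's `cProfile` and `pstats`. The functions `slow_lookup`, `fast_lookup`, and `pipeline` are hypothetical stand-ins invented for illustration; they are not from the Stack Overflow post described above. The idea is simply to let the profiler say which function dominates before changing anything.

```python
import cProfile
import io
import pstats


def slow_lookup(items, targets):
    # Hypothetical hot spot: each "in" test scans the whole list.
    return sum(1 for t in targets if t in items)


def fast_lookup(items, targets):
    # Same answer, but set membership is a hash lookup.
    item_set = set(items)
    return sum(1 for t in targets if t in item_set)


def pipeline():
    items = list(range(2_000))
    targets = list(range(0, 4_000, 2))
    return slow_lookup(items, targets), fast_lookup(items, targets)


# Measure first: record where the time actually goes.
profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Print the functions with the largest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

In the printed table, the cumulative-time column should point at `slow_lookup` rather than at whatever looked suspicious on a first read, which is the whole argument for measuring before rewriting.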
 
Ben_Wilson:
I could not agree more. I see that more in the people with just enough CS knowledge who are getting into data science these days. They want to focus on, "Ah, it's really cool that we're solving this problem with ML," but they have enough computer science understanding, and not enough experience, that they want to go through... You know, as the old saying goes, it's the root of all evil when you're writing code: that whole premature optimization
 
Jonathan_Hall:
Yeah,
 
Ben_Wilson:
where you're like, "Hey, this is going to run slow," even though it's just a hunch, or "I think this is going to be bad." And then you introduce all of this crazy complexity and spend all this time making something that took fifteen nanoseconds in execution run in four nanoseconds. Meanwhile, when you have to move on from your prototype to creating your first, you know, release version, it takes you six months to write that code because it's so insanely complex, because you optimized it ahead of time. And then when you need to change it a year from now, everybody's like, "We need to rewrite this, because nobody knows how this works."
 
Jonathan_Hall:
I was helping a startup last year. When I was introduced to them, they had some terrible database performance. This wasn't ML; it was a simple database, but it was performing terribly, and their customers were experiencing timeouts and other errors as a result of timeouts. Their solution, which honestly wasn't a terrible solution in the long term, was overkill for this problem. The solution was to switch databases and reimplement the entire data layer. Now, they needed to do that, but for other reasons,
 
Ben_Wilson:
Okay,
 
Jonathan_Hall:
and so I joined and I'm like, "Maybe we could do some performance profiling and see where the slow queries are." Within two days we had it working effectively for customers. They had spent weeks on this problem,
 
Ben_Wilson:
"We need to move to Mongo." It's like, no, you need an index.

Jonathan_Hall:
And so effectively, it was almost that simple. We ended up postponing that data layer rewrite for months. I mean, it still needed to happen, but it became something that was no longer urgent, and it just took a couple of days. The main reason it took two days to do the profiling was because they weren't doing very good DevOps: the release cycle was slow and everything.
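
A small sketch of the "you need an index" point, assuming an in-memory SQLite database from Python's standard library. The `orders` table, its columns, and the index name are all hypothetical; the startup's actual schema and database engine are not described in the conversation. It asks the database for its query plan before and after adding an index, which is the slow-query equivalent of profiling first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(50_000)],
)

query = "SELECT COUNT(*) FROM orders WHERE customer_id = ?"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))
    return " | ".join(row[-1] for row in rows)


before = plan(query)  # typically reports a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # typically reports a search using the new index

count = conn.execute(query, (42,)).fetchone()[0]
print("before:", before)
print("after: ", after)
print("matching rows:", count)
```

The query text and its result do not change; only the plan does, which is why a fix like this can land in days while a database migration takes months.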
 
Jillian_Rowe:
Yeah, I want to piggyback on that and say context is so, so important. This is something I kind of struggle with sometimes when I'm talking to the software people, because I feel like they get a little bit hung up on, "Well, I need to have these elegant abstractions of this thing," and it's like, do you, though? Do you really? So, for example, last year or something, I was working on a project, and there was this one step in the pipeline that was really, really not optimized, and it took a couple of hours to run. But the thing is, they only needed to run that step two or three times a year, because it was linked to a physical process that could only happen a few times a year, where they had to add new test kits, and it was this whole process. So it's like, well, it's the least efficient part of the code, but do we care? And the answer was, no, not really. It happens a couple of times a year. I mean, if you wanted to work out how many seconds a day it's running for, you probably could. And so having that kind of understanding of where the priority is and what's happening where is also very, very important.
 
Ben_Wilson:
Yeah, a hundred percent. I couldn't even begin to communicate how many teams I've worked with, while I was in the field at Databricks, where you talk to this seasoned, sort of salty data engineer or back-end software engineer, and they're like, "Well, we need to create these layers," as you said, Jillian, "these layers of abstraction. We need to have a builder interface, or we need to do a factory pattern here." And my response at first was like, "Do we really need..." and it's, you know, "Go home and crack open a couple of software development books." Like, yeah, I know the Gang of Four said this was super important. We're not really using an OOP language here, and it doesn't matter if we're following a pattern anyway. This is not law. This was generally accepted at the time that book was written as being way better than what came before, which was script garbage and just spaghetti code everywhere. So it was setting constraints on a language that was available at the time. You know, it was predominantly around C++, and then later people applied it to Java, which is an OOP language. But modern languages, these high-level, fifth-gen languages, like the things that we've talked about on this podcast so far, let me see if I can enumerate them. Go: fifth-generation, high-level language. R: fifth, maybe sixth generation or sixth level, incredibly powerful statistical language. Python: fifth-generation, you know, extremely high-abstraction language. In all of these interpreted languages you can do functional programming, you can do OOP, you can do declarative. You can write the crappiest script that you could ever imagine in these things and it will run, and the computer is pretty good at rewriting your code for you. The interpreter is, um... design patterns and execution.
If we're not talking about a compiled language where we have to handle memory management, with most of the languages that people are touching these days, we're no longer worried about design patterns as much, unless you get to a level of complexity in your project that is so extreme that you need levels of abstraction, or like, "Hey, let's eliminate
 
Jillian_Rowe:
M.
 
Ben_Wilson:
eighty percent of the boilerplate by creating a factory pattern here." That's what that's for. But when we talk about the important things in modern software development, in ML and, you know, back-end software engineering, the way I see it is: readability is better than composability, composability is better than inheritance, and inheritance is better than chaos. Those four levels, that's kind of how I see code now. The only time we need to move away from pure readability is where, okay, we need some structure here because the code is just too big. That's my take on it.
 
Jillian_Rowe:
I agree. I like readable code.
 
Jonathan_Hall:
I think readability... So first, I'll just preface this: readability is subjective, and it depends on who's reading. It's not a static trait of the code. But I agree that readability is almost the highest concern for most code. It is almost the highest concern, maybe the highest concern. For rare code, performance trumps readability. You know, something in
 
Ben_Wilson:
Hm.
 
Jonathan_Hall:
a tight loop, maybe. But in general, and this is something I pound the hammer on all the time when I'm coaching people, mentoring developers: readable code is the most important thing. As Martin Fowler says, and I'm paraphrasing, any old fool can write code that a computer can understand. It takes a good programmer to write code that another human can understand.
 
Ben_Wilson:
Hm.
 
Jillian_Rowe:
I like that. I'm going to steal that. That's right up there with, you know, documentation is most important for me in five minutes, when I've forgotten what this does.
 
Ben_Wilson:
You know the litmus test that I use now, which actually sort of makes me chuckle a little bit in the back of my head, is when I read an implementation and strip all the comments out of it. The only time you should be reading inline comments in code, in my opinion, is when you're reminding your future self that this was decided for a particular reason because of things outside of the code base, like, "Don't forget, we're doing this because there's tech debt over here that hasn't been fixed yet. When we fix that, remove this crap." But if I remove all of the comments and even the docstrings from methods and functions, and then I send it to somebody like one of my fellow employees in the field at Databricks, who has no context on what I'm working on, and they're not impressed by the implementation, if they're like, "Yeah, cool, I guess. It seems pretty simple," then I'm like, I nailed it. That's how I know: if it's not impressive to somebody who is enthusiastic about software development. But if I send it to somebody and they're like, "Whoa, this is super awesome. I'm going to need a couple of hours to figure out what's going on here," I know I have to refactor the code because it's just too complicated.
 
Jonathan_Hall:
Yeah, I agree with your sentiment on comments. Comments should only exist when they're necessary to explain why. They should never explain what or how; that should be obvious from reading the code. The rare exceptions are when performance is of utmost criticality. You know, if you have to do something with a reverse for loop for some reason because of memory optimizations for your CPU architecture, whatever, maybe you want to explain that. But those are so rare that they are the exception that proves the rule.
 
Ben_Wilson:
And they should be buried, in my opinion. That should be in its own module.

Jonathan_Hall:
Definitely buried, and behind a well-named function. "Highly performant for loop implementation" is the name of that function.

Ben_Wilson:
Yep.
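
A short Python illustration of "comments explain why, and the odd-looking part lives behind a well-named function." The function name, the session dictionaries, and the scenario are hypothetical, and the reason given for the reverse loop here is correctness rather than the CPU-level memory optimization mentioned above; the point is only the shape of the comment.

```python
def drop_expired_sessions_in_place(sessions, now):
    """Remove expired sessions from the list the caller passed in."""
    # WHY in place: other parts of this hypothetical service hold a reference
    # to this exact list, so building a new list would leave them stale.
    # WHY reversed: deleting while walking forward shifts later items down
    # and skips the element right after each deletion.
    for index in range(len(sessions) - 1, -1, -1):
        if sessions[index]["expires_at"] <= now:
            del sessions[index]


sessions = [
    {"user": "a", "expires_at": 5},
    {"user": "b", "expires_at": 6},
    {"user": "c", "expires_at": 50},
]
drop_expired_sessions_in_place(sessions, now=10)
print([s["user"] for s in sessions])
```

Neither comment restates what the loop does; both record a decision that a later reader could not recover from the code alone.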
 
Jillian_Rowe:
I had that happen to me this week. I was timing two different, what I thought, from the documentation, looked like two different syntaxes for accomplishing the same thing. It was not. One was significantly faster than the other on my, you know, billions of data sets that I was profiling on. So I actually put it in a comment so that I don't forget about it and revert it back to the previous one, because I had already done that like three times, and then I'd ask, "Why is this so much slower? I don't understand," and I was like, "Well, let me just try this." And then I realized. Okay, so yes, there's a comment that says, "I don't know why, but this is way faster than the other syntax that's supposed to be exactly the same." pandas, I'm looking at you.
 
Ben_Wilson:
There are some nefarious things that happen with pandas, by the way, where you write, like, a basic lambda to apply some sort of function to everything row-wise.
 
Jillian_Rowe:
Yeah, but once you've used apply, that's the devil. You know, you're in no man's land there.
 
Ben_Wilson:
Yeah, but you can write that and do a unit test on it, like, "Hey, I'm going to mock up, you know, a thousand rows of data with all of these different distributions of data," and you run it. You're like, "Yeah, that's pretty fast. The whole unit test executes in, you know, 0.3 seconds." And then you run it in an integration test and you're like, "Whoa, this sucks. I'm calling this function that I wrote. Why is this taking thirty-eight minutes to run on, you know, 1.5 trillion rows of data?" And then you rewrite it against that test data set that you've generated, 1.5 trillion rows, and you're like, "Hey, I got the run time down to like two minutes. This is awesome." And then you run that against your unit test and it inverts. So it's like an algorithm appears to be terrible on a small amount of data, but it works better at extreme scale. pandas is full of stuff like that. A lot of times, as you said earlier in the podcast, Jillian, you take that pandas DataFrame. It's all just NumPy, so extract out the NumPy elements and then use NumPy operations, and you're like, "Hang on, now that's way faster. Is something wrong with pandas?" It's like, well, sometimes it's not intended to be used the way that people use it. But if you are down for some linear algebra and array-based notation with NumPy, that's going to be fast.
 
Jillian_Rowe:
Yeah, but with pandas, people are using a DataFrame as, like, a cheap way to organize data. And because it's in a DataFrame, you're supposed to get the vectorization anyway, you know, just built in, right? Like if you're
 
Ben_Wilson:
Hm.
 
Jillian_Rowe:
doing something against a column of data, it's supposed to be vectorized, until you get into, you know, your fancy functions with apply, in which case it's anybody's guess what happens then. You know, we don't know. It's all black magic then.
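
A small sketch of the contrast discussed above, assuming pandas and NumPy are installed. The column names and the price-times-quantity calculation are made up for illustration. It computes the same result three ways: a row-wise `apply` with a lambda, a column-wise pandas expression, and raw NumPy arrays pulled out with `to_numpy()`. Timings are printed but will vary by machine and data size, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"price": rng.random(10_000), "qty": rng.integers(1, 10, 10_000)}
)

start = time.perf_counter()
# Row-wise apply: calls the Python lambda once per row.
row_wise = df.apply(lambda row: row["price"] * row["qty"], axis=1)
apply_seconds = time.perf_counter() - start

start = time.perf_counter()
# Column-wise: one vectorized multiply over whole columns.
vectorized = df["price"] * df["qty"]
vectorized_seconds = time.perf_counter() - start

start = time.perf_counter()
# Raw NumPy: extract the underlying arrays and multiply them.
raw = df["price"].to_numpy() * df["qty"].to_numpy()
numpy_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.4f}s")
print(f"pandas vectorized: {vectorized_seconds:.4f}s")
print(f"numpy: {numpy_seconds:.4f}s")
```

As the conversation notes, which variant wins can change with scale, so it is worth timing on data that resembles production rather than on a thousand mocked rows.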
 
Michael_Berk:
Yeah, I remember a couple of years ago I implemented a simulation framework for switchback tests, and the simulation framework was essentially a bunch of Monte Carlo simulations looped with a permutation test. So it's just millions and millions, maybe billions, of for loops. What I initially built was a pandas implementation, and that was slow; it really did not run. Then I moved it over to NumPy, and it helped a bit. But there are also some out-of-the-box solutions like Modin. If you just change the import to modin.pandas, or whatever the import statement is, instead of raw pandas, I think that had a 3x improvement without changing any code. So that's a really high-ROI solution. And then I'm a huge, huge fan of Numba, which actually compiles for loops down to machine-level code, and those run like lightning as well. But yeah, you can sort of convert the code itself, but you can also try to parallelize each of the streams of, let's say, a Monte Carlo simulation. So there are lots of different angles you can take, but I'm a big fan of Modin as just a one-line change, and it usually leads to improvements.
 
Jillian_Rowe:
Yeah, I've been getting really into Modin recently. It seems like it's really matured in the last six months to a year. So I was benchmarking a bunch of code against that, and one of the things they do say is, "Oh yeah, it should speed up your apply." And it has a very readable dashboard. I really like the Modin, or I guess it's actually the Ray, dashboard. So you use Modin, and you set Ray as the execution back end, and then, you know, I look at the dashboard and you can actually see that it is executing everything in parallel. And, I mean, you can run something fancy; I just run htop on the node that I'm on and take a look at, you know, is this actually multithreaded? And yes, it is. I guess the biggest problem that I've had with that, though, and maybe you guys have some advice for me, now that we're moving on to the "me and my problems" segment of the
 
Ben_Wilson:
Yeah,
 
Jillian_Rowe:
show, is that at some point, when you have these really large data sets and you want to run parallel operations, your task graph gets so large that managing the task graph ends up being the thing that takes the most time, as opposed to actually doing the computations. I keep running into that wall lately. How do you guys deal with that? Does Databricks deal with that? Like, should I just buy Databricks and that's the solution?
 
Ben_Wilson:
It most certainly can in
 
Jillian_Rowe:
Okay,
 
Ben_Wilson:
a number of different ways. So it's funny that you mention Ray. The team that I'm on just did the integration of Ray on Spark, and we got to play around with that library. It's awesome, and the team that built it and maintains it are also awesome human beings, just fascinating to work with. But for the question that you asked, the way I would first approach it is, if it is inherently parallelizable and you don't have any cross-dependencies across rows, you can kind of get away with something really simple. Just create a synthetic grouping key on your data. It's just an arbitrary generated window function: assign a group membership to a throwaway column, group by that, and then use vectorized serialization and deserialization with PyArrow, and you throw it into a pandas UDF. So on every Spark executor, you're going to have a certain subset of your data set that's going to be in pandas, and then once it's in the executor, the worker on Spark, you can do whatever you want. You can say, "Hey, I'm going to strip that out, throw it into NumPy. I'm going to use Numba if I want." And you can, if you need it, scale. This is what a lot of hardcore genomics and health and life sciences customers do when they're like, "Hey, we've got a couple of petabytes of data we need to process, and we're doing this on this beefy machine that we have on prem, and it takes, you know, thirty-six hours to run." After a couple of days of us, you know, hammering away at a prototype, it's like, "Hey, as long as budget's not a concern for you, and you don't mind spinning up five hundred VMs in the cloud," and they're like, "Whatever, we don't care,"
 
Jillian_Rowe:
We have VC money. It's not ours.

Ben_Wilson:
Yeah, yeah, we spun up a lot of instances. Like, "All right, we have twelve thousand CPU cores available to us on this cluster. That job now finishes in eight minutes." So that's what you can do with Databricks. It's pretty crazy.
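
A rough, standard-library-only sketch of the synthetic grouping key idea described above. It is not Spark or Databricks code: the thread pool stands in for Spark executors, and `assign_groups` and `process_group` are hypothetical helpers. The point it illustrates is that rows with no cross-dependencies can be bucketed by an arbitrary key, so the scheduler sees a handful of coarse tasks instead of one task per row, which also keeps the task graph small.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def assign_groups(rows, n_groups):
    # Synthetic key: row position modulo the number of groups.
    buckets = defaultdict(list)
    for position, row in enumerate(rows):
        buckets[position % n_groups].append(row)
    return buckets


def process_group(rows):
    # Stand-in for the per-group work (pandas, NumPy, Numba, and so on).
    return sum(value * value for value in rows)


rows = list(range(1_000))
buckets = assign_groups(rows, n_groups=8)

# Eight coarse tasks instead of a thousand tiny ones.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_group, buckets.values()))

total = sum(partials)
print(len(buckets), "groups, total =", total)
```

The number of groups is the tuning knob: too few leaves workers idle, too many brings back the scheduling overhead the grouping was meant to avoid.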
 
Jillian_Rowe:
Do you guys have... So, one thing that I've been getting asked about a lot, and I'm still kind of reasoning about how I can talk about this in a way that's understandable to the people and to the scientists, because I have to, you know, live in both worlds, is this idea of adaptive scaling. So let's say, with the example that I used earlier, where in four months we moved from a data set that was two hundred million data points (although it's like a matrix, so you multiply that) to two billion, so ten times the resolution of that data. I mean, the most clever way to deal with that is to be like, I'm going to have my Modin DataFrame that has my points, and then I'm going to have a Ray cluster in the back end, and then I want to set my Ray cluster to be adaptively scalable, where at a minimum it uses, I don't know, ten nodes on like an ECS or Databricks cluster, I guess. But if it needs to, it can scale up to a hundred, because, you know, what the hell? It's not my money. It's the VC money.
 
Ben_Wilson:
Oh,
 
Jillian_Rowe:
So why not? Yeah, I see people more and more interested in this kind of solution as time goes by. I'm not quite sure why. I've been trying to figure out why it's entered the public consciousness of the people that I'm working with, and I feel like there's always one source of truth that I can go find and then heavily question. So I'm going to do that. But is that kind of computation possible? And is that something that you also see an increased interest in?
 
Ben_Wilson:
For years now.

Michael_Berk:
I can take that one for a little bit, if you don't mind, sir.
 
Ben_Wilson:
Go ahead.
 
Michael_Berk:
So one meta point is that I think TCO, or total cost of operations, is more at the forefront of a lot of businesses, and they just want their compute to be as cheap as possible while meeting a given requirement. So that is a potential explanation for why. And then at Databricks, there are a bunch of autoscaling concepts, whether it be in compute or in, sort of, a SQL endpoint. Essentially what happens on the back end is we have a gateway that queues into different compute resources, and if that gateway has a very large queue, then it scales up the compute resources. So that's one way to handle this: you have sort of a meta driver, almost, that will determine the load on each of the compute resources, and if the queue is too large or too small, it scales up or scales down accordingly. One other consideration is that workloads typically aren't even, so it's helpful to have a fast track and a slow track for small or big operations. But yeah, just managing metadata about the queue, or about how much backup there is in the system, is a relatively simple way to go about it.
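
A toy sketch of queue-depth autoscaling in the spirit of the description above. This is not Databricks' algorithm, which the conversation does not detail; `desired_nodes`, `target_per_node`, and the halve-at-most scale-down rule are assumptions made for illustration. The 10-node minimum and 100-node maximum echo the example numbers mentioned earlier in the conversation.

```python
import math


def desired_nodes(queue_depth, current_nodes, *, target_per_node=4,
                  min_nodes=10, max_nodes=100):
    """Pick a node count so each node has about target_per_node queued tasks."""
    wanted = math.ceil(queue_depth / target_per_node)
    # Assumed policy: scale up immediately, but shrink by at most half per
    # decision so a brief lull does not tear the cluster down.
    floor = max(min_nodes, current_nodes // 2)
    return max(floor, min(max_nodes, wanted))


print(desired_nodes(0, 10))      # idle: stay at the minimum
print(desired_nodes(200, 10))    # backlog: grow toward the demand
print(desired_nodes(1000, 10))   # huge backlog: capped at the maximum
print(desired_nodes(80, 100))    # backlog drained: shrink gradually
```

A fast track and a slow track, as suggested above, could be modeled as two such queues with different `target_per_node` values, though that is again an assumption rather than anything stated in the episode.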
 
Ben_Wilson:
Yeah, and we try to tackle all of that stuff as just opaque. You know, if you're an end user who's trying to reason about, "How do I autoscale this for cost?" or, if you don't care about your budget, you're just like, "How do I get more compute resources and not have to constantly be adjusting?" You're going in and starting up a cluster or something every day, and it's like, "Well, it was fourteen nodes yesterday and it was kind of slow. Do I need eighteen nodes? Well, how do I configure eighteen nodes? Do I need to change my code?" So one of the many, many things that our platform does for users is that you don't need to worry about any of that stuff. We have written algorithms that figure all that stuff out for you and autonomously handle provisioning of machines in whatever cloud environment you're in.
 
Jillian_Rowe:
Those are my favorite libraries, the ones that abstract... I know I was kind of harping on abstraction earlier, and now I'm right back on it, but it's abstracting the execution layer away from me, because I'm like, there's a computer there somewhere. There's some storage.
 
Ben_Wilson:
M.
 
Jillian_Rowe:
It's probably on a similar network. I don't know, and I don't really want to have to know. I just want my stuff to run. And I always kind of feel, too, when I'm talking to scientists, that if they are worrying about those sorts of details, I haven't done my job here. Like, you guys shouldn't need to be worrying about that. We need to get back to basics and figure out what's gone sideways somewhere. Is Databricks just a big HPC cluster? Am I allowed to ask that?
 
Ben_Wilson:
It is not. We don't support HPC.
 
Jillian_Rowe:
Why not?
 
Ben_Wilson:
I don't think there's a big enough market for it, because I think most people who are using HPC are doing it on prem, not just because they want extreme performance. It's usually because they're dealing with something that they really don't want to get out onto the internet. The data is so insanely sensitive, and they have so many security protocols around the ingress, and certainly the egress, of the data from there. And it's also really challenging to get that amount of hardware in the cloud dedicated in such a way that it could compete with an on-prem HPC system. You're like, hey, the way that those racks share memory, the way that they share GPU resources and CPU resources, where you're like, "Hey, I can put fifty thousand CPUs all in the same machine." It's not physically doing that. Server blades aren't built like that. It's just network architecture, where your constraint is the laws of physics, or, like, what material do the bus bars need to be made out of in order to connect this server tower to this server tower? What is the interconnect there? Is it bare metal? Is it some sort of unique alloy that we're using? Is it just massive pipes of fiber optics? You know, there are a lot of considerations that you have control over when you're building an HPC system on prem. In the cloud, it's like, it's probably in the same data center, but they're all virtual, and you're not going to get the performance out of it. I'm sure that every cloud is offering those. I'm pretty sure Amazon offers something for HPC, and Azure does as well. But that's like an ephemeral thing. It's not going to be an apples-to-apples comparison to what you'd get in your own data center.
 
Jillian_Rowe:
That was a good answer. I was mostly asking for inflammatory reasons, because a lot of times the response to that is like, "HPC is going to die," and then I could be like, "No, it's not,"
 
Ben_Wilson:
Oh heck no, it's not.

Jillian_Rowe:
and, you know, then we can riff off that for a little while. But that was good.
 
Ben_Wilson:
I don't actually see a killer for HPC, because it does things that no other system can really effectively do. There's a reason why people build those systems, and they excel at what they do, so I wouldn't listen to people who are just trying to instigate.

Jillian_Rowe:
No, I'm not going to. I think, well, sometimes that person is trying to instigate me, but I'm planning on, you know, running the HPC way for the rest of my career before I go to Acadia and pitch a tent someplace with no internet. That's the retirement plan.
 
Ben_Wilson:
So do you think that quantum computing will be adapted to potentially augment HPC in the next ten to fifteen years?
 
Jillian_Rowe:
I don't, like, I don't understand quantum computing. I don't know if, you know, maybe I'm just not quite smart enough to understand it, or if I haven't seen the right sort of diagram that makes it fit into my head, or what it is, but I don't get it. I feel like it's one of these things that's thrown around. It's a little bit like AI to me, where it's like, you know, there's stuff happening behind the scenes, and I don't know what it is. So I don't really have an opinion on that either way. I don't really understand the topic well enough. What do you think?
 
Ben_Wilson:
I mean, I think it lends itself to some of the activities that we would take on with applied traditional statistical methodologies, where you need to just brute-force your way through to a potential solution, and the qubit states can do that exceptionally well and much faster than silicon-based architecture. I personally think it's a little bit far off for getting commercial versions of these. And then, you know, a lot of people are talking about the hardware and the tech, and they're like, "Yeah, it's going to come. It's going to crack all the passwords." And it's not, you know; they'll adapt. But the focus that people have been having is on the hardware, and of course the bespoke implementations of how you can apply that hardware to certain problems, but nobody is really focusing on, hey, what does the compiler look like? What do the IDE plugins look like? What does the dev process look like for creating this stuff? How do I write tests? What is the test framework for this? How am I going to, you know, write an algorithm in some language that this thing understands? Because it's not running Python. It's not running Go. You know, it's going to have its own assembly language instruction set that's going to go and talk to that hardware. So I'm sort of waiting for that sort of stuff, and then I'll be able to make a decision like, "Okay, I can see the utility of this." Right now, it's just a cool, novel concept, I think.
 
Jonathan_Hall:
I'll just throw in my two cents on that, even though I don't really understand HPC very well. I have a sense that quantum computing, and I think even analog computing, are kind of... I don't think they're going to fundamentally
 
Jillian_Rowe:
M?
 
Jonathan_Hall:
change the way computing works. They're just going to fracture it. And I say that in the same way that we now have CPUs versus GPUs doing different types of work. I think we're going to start discovering that quantum is great for certain types of work and analog computing is great for certain types of work, and it's just going to make things more complicated, in the sense that it's no longer "just throw it on AWS." It's like, make sure you get it on the right AWS cluster that has those analog compute resources or these quantum compute resources or whatever. It's just gonna...
 
Ben_Wilson:
Hm.
 
Jonathan_Hall:
We're going to have more moving parts, I think. At a high level, that's what it's going to look like.
 
Ben_Wilson:
Yeah, totally. I mean, that's how, as you said, it fractured between GPU and CPU. There are tons of use cases out there where, could you execute it on a GPU? Sure.
 
Jonathan_Hall:
Yeah,
 
Ben_Wilson:
How is it going to run? It's going to run like crap,
 
Jonathan_Hall:
Right.
 
Ben_Wilson:
because GPUs are not designed to do this thing. Just the same way that, you know, you're doing the training of a deep learning model, where you're adjusting weights in this massive matrix of connected points in this huge graph: GPUs do that orders of magnitude faster, because they're really good at doing exceptionally parallel math computation. They're really good at algebra. Like, really good. And when you look at the architecture of a GPU, you're like, how many parallel pipes are in this thing? How many concurrent calculations can it do at once? That's what it's designed to do. They're designed to do, basically, 3D modeling of video games on somebody's, you know, monitor. That's what that is. Like, process these three hundred and forty-seven thousand objects in, you know, synthetic three-dimensional space.
 
Jillian_Rowe:
Ah,
 
Ben_Wilson:
What do they need to do at, you know, a hundred and ten frames per second? It's got to be really good at just processing that map of where to actually draw this stuff. But CPUs suck at that. Like, run a modern video game on a CPU. It just won't run. Welcome to the slideshow.
 
Jonathan_Hall:
Well, we're coming up on an hour. Anything we want to touch on before we close out?
 
Michael_Berk:
Nothing from my end.

Jonathan_Hall:
Everybody says no. All right.
 
Jillian_Rowe:
Is AI going to take our jobs? Were we all going to do our resounding answer to that?

Ben_Wilson:
Oh yeah.

Jonathan_Hall:
I think we've... Yeah, I don't know. Is it worth bringing that up at the end?

Ben_Wilson:
A really quick round table on that. My hot take: no, it's going to augment our jobs, like every other technology that has come along since we discovered how to farm as a species.
 
Jillian_Rowe:
I also vote no. This is like the fourth round of, like, AI is going to, you know, take over your job. Specifically in health care and genomics, there have been several AI tools developed over the years, and with each one, you know, everybody was kind of worried about it. And every time it was like, it's fine, I still have a job. Now everybody knows about ChatGPT, and so now everybody is asking that. But it's the same stuff to me. It's another tool that I've been told will take my job. They haven't yet. HPC for life, so it'll be fun.
 
Michael_Berk:
For fun, I'll disagree. I think it will take our jobs, just not yet. Uh, self-driving cars will replace truck drivers, and so eventually AI will be advanced enough to do some of this work. But we will then just need to pivot into different roles and leverage that technology. So if you're agile, you'll be fine. If you're not agile, tough luck.
 
Jonathan_Hall:
Yeah, that's close to what I was going to say. I think it will take some of the jobs the same way the automobile replaced horse breeders,
 
Ben_Wilson:
Hm.
 
Jonathan_Hall:
you know, but from an industry standpoint it's not going to take our jobs. Somebody has to program the AI.
 
Ben_Wilson:
Yeah, it's going to take our job titles, but it's not going to put us on the bread line.
 
Jonathan_Hall:
Exactly
 
Ben_Wilson:
Yes,
 
Jonathan_Hall:
exactly. And as I said before we recorded, my whole job is automating my job away. That's the definition of software development, so it's not something to be afraid of. It's actually what I do.
 
Jillian_Rowe:
That's true. I've coded myself out of jobs all the time. That's, like, the nature of the beast. Yeah, I like that way of putting it, actually. So I do think it will get rid of some jobs, but I think the net will remain the same. You know, if there were X number of jobs before, there will still be X number of jobs. It's just the distribution of job titles that I think will shuffle around a little bit.
 
Jonathan_Hall:
If there's something that could, like, in one fell swoop kill a whole bunch of software development jobs, it would be companies learning how to embrace proper software development practices.
 
Ben_Wilson:
The entire consultant industry
 
Jonathan_Hall:
Yeah,
 
Ben_Wilson:
gone overnight. Yeah,
 
Jonathan_Hall:
If
 
Jillian_Rowe:
No,
 
Jonathan_Hall:
we could
 
Jillian_Rowe:
it would be every time, you know, somebody's like, we have to write this whole tech stack. If every manager on the planet just knew to be like, no, don't do that.
 
Ben_Wilson:
Buy
 
Jillian_Rowe:
And
 
Ben_Wilson:
versus
 
Jillian_Rowe:
what would we
 
Ben_Wilson:
build,
 
Jillian_Rowe:
do?
 
Jonathan_Hall:
All right, well, this has been a great conversation.
 
Jillian_Rowe:
M. Hm,
 
Jonathan_Hall:
It's been fun meeting you guys, Ben, Michael.
 
Ben_Wilson:
Likewise.
 
Jonathan_Hall:
Thanks for doing the crossover episode. It's been fun.
 
Jillian_Rowe:
Yes, this has been great. Thanks
 
Ben_Wilson:
Yeah,
 
Jillian_Rowe:
for coming,
 
Michael_Berk:
Yeah, thanks,
 
Jonathan_Hall:
We voted before recording to not do picks today, but I'm going to pick the Adventures in Machine Learning podcast. For those of you who haven't listened to it, you should check it out. It should be very educational.
 
Michael_Berk:
M.
 
Ben_Wilson:
And for our audience, definitely check out the Adventures in DevOps podcast to get some perspective on how to think about your projects from a production deployment and DevOps perspective. I think you'll enjoy it and you'll get something out of it.
 
Jonathan_Hall:
Right, thanks, guys. Until next time.
 
Ben_Wilson:
Later.
 
Jillian_Rowe:
Bye.
 
Michael_Berk:
Thanks.