Deep Learning for Tabular and Time Series Data - ML 104
Today we speak with a staff data scientist at Walmart who specializes in forecasting. He has built an open-source tool that allows you to leverage tabular data in PyTorch. He also has written a book on time series forecasting with deep learning.
Special Guests:
Manu Joseph
Show Notes
On YouTube
Sponsors
- Chuck's Resume Template
- Developer Book Club starting
- Become a Top 1% Dev with a Top End Devs Membership
Links
- [2207.08548] GATE: Gated Additive Tree Ensemble for Tabular Classification and Regression
- Modern Time Series Forecasting with Python: Explore industry-ready time series forecasting using modern machine learning and deep learning
- LinkedIn: Manu Joseph
- Twitter: @manujosephv
- GitHub: manujosephv
Transcript
Michael_Berk:
Hello everyone. Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I'm joined by my cohost.
Ben_Wilson:
Ben Wilson,
Michael_Berk:
And today we have a guest that I'm very excited to chat with. His name is Manu and he's currently a staff data scientist at Walmart. Walmart, excuse me. Um, but in prior roles, he's worked as a supply chain consultant and headed an applied research team that focused on both causality and prediction. So Manu, I noticed that you left school with an MBA, but since then have moved onto sort of some technical projects. So how have you found that transition?
Manu_Joseph:
Right. That's quite an interesting journey, if you ask me, because how I reached data science as a field is a very long and winding road. My undergraduate studies were in engineering, industrial engineering, which is basically mechanical engineering with a little bit of management science, things like that. After that I was interested in coding, so I took a job with an IT firm, Cognizant Technology Solutions, working as a software programmer. I was there for about a year and a half. Although I enjoyed coding, the kind of problems were not really interesting, so I thought I'd just go for some higher studies and see what's out there, and an MBA sounded like a very good idea. I did that with supply chain and operations as a specialization, and then I was working as a supply chain consultant at one of the firms. As it happened, in that stint I ended up working with a lot of data, because whenever we suggest something as a consultant, we need to back it up with something solid. So we always used to go back to the data to say this is what we want to do, this is why we're seeing this, et cetera. And then I realized that I was starting to enjoy the data part of it a lot more than the consulting part of it, and that made me switch to a more completely analytics kind of role. Thankfully, I got a very good transition role where my experience in supply chain was valued, and my analytics aptitude as well. It was more like a program management kind of role, but during that role I started to upskill online, with MOOCs and self-study, things like that.
And that's when I got interested in machine learning. It was all self-study, and I was just doing it out of my own interest, and I was very, very passionate about it, and that kind of snowballed into me moving to a full-time position as a data scientist. Then, because of my interest, and because I'm somebody who always tries to do things, kind of like a maker, with that kind of natural attitude, that led me to do all these open source contributions, et cetera, et cetera. And that's how I transitioned and got very heavily into data science. So yeah, that's the long and winding answer.
Michael_Berk:
Nice.
Michael_Berk:
Quick, quick interlude. So I'm getting a lot of sort of static through your mic. Is it possible there's some wind? It looks like you're inside.
Manu_Joseph:
Is it? Yeah, hold on. I think there might be something. Let me just turn that off and come back.
Michael_Berk:
Got it. Amazing. Thank you.
Manu_Joseph:
Better? Is it much better?
Michael_Berk:
Yeah, yeah, it's completely fixed so far.
Ben_Wilson:
Yeah, it's gone now.
Manu_Joseph:
Cool.
Michael_Berk:
Cool. All right. So I'll mark that clip and then we can continue. One of your most recent projects is PyTorch Tabular, and that sort of helps you run PyTorch on tabular data, which is a big surprise. So can you explain what tabular data is and why we need your library?
Manu_Joseph:
Right. So tabular data is the most popular kind of data that you'll find in business. Excel files, SQL tables, any relational database: it's all tabular in nature, in that you have columns and rows. That's tabular data. And whenever we think about tabular data, at least from the ML point of view, the most obvious choice is gradient boosting models, XGBoost, LightGBM, and things like that. What led me to this niche is that in my previous organization I was working with a lot of different businesses, and most of the use cases were tabular, so we always went and hit the LightGBMs and XGBoosts of the world. I was always thinking: is there something other than this that we can do? That's how I started looking into this area, and I realized that deep learning did a lot of wonderful things in the image domain, in the text domain, even in the graph domain, but in tabular, not really that much. As I went deeper, I did realize that there is some research happening in the area, and people are trying to push the boundary there. But usually what happens is that, although the researchers release their codebases, the code is not really readily usable. It takes a lot of engineering on your side to make it work. So I started this as a personal project. I was just trying to run a few of these different models on the data that we had, and once I started doing this, I realized that there are people who want this kind of thing, something other than gradient boosting models to work with tabular data.
And that's why I decided to open source it, because it was in good form already and it was easy to open source. One of the main motivations behind that was also to help accelerate research and usage in this space. For gradient boosting models and the like there are brilliant libraries. Scikit-learn is there, and it's about the easiest, most user-friendly library that you can use, but there is nothing of that sort when you move to deep learning for tabular data. So I wanted to make something which is easy to use and accessible to folks, so that even without a lot of software engineering background they can start using something a little more advanced. That's why I started this. And during that time, a lot of other specialized models for tabular data came out, which, in their research papers, are shown to be better than the XGBoosts and LightGBMs of the world on some datasets. So I thought, why not have a unified API with which you can use all of these models? You just need to switch out a class, just like you do with scikit-learn, and the rest of the pipeline works fine. That was the design that was taken, and right now the library is in that shape. Just a couple of weeks ago I launched another major version where I also included self-supervised learning as another pipeline. Standard supervised learning is very easy to do with your classical machine learning methods, but these kinds of niche use cases are not really straightforward, so that's where deep learning and the flexibility that it brings to the table make a lot more sense. So I thought, let's start that kind of a push as well.
There are already a few models out there which have dabbled in self-supervised learning for tabular data. I started with something very basic, a denoising autoencoder, but there are other things: TabNet also has a self-supervised variant, and there's SubTab. The work in this area is actually very interesting. So yeah.
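For readers following along, the "switch out a class" idea can be sketched in plain Python. This is only an illustration of the design pattern under stated assumptions, not PyTorch Tabular's actual API: `MeanRegressor`, `NearestNeighborRegressor`, and `run_pipeline` are hypothetical names invented here, and the toy models stand in for the real deep learning models discussed in the episode.

```python
from statistics import mean


class MeanRegressor:
    """Baseline: always predicts the mean of the training targets."""

    def fit(self, rows, targets):
        self.mean_ = mean(targets)
        return self

    def predict(self, rows):
        return [self.mean_ for _ in rows]


class NearestNeighborRegressor:
    """Predicts the target of the closest training row (squared Euclidean distance)."""

    def fit(self, rows, targets):
        self.rows_ = list(rows)
        self.targets_ = list(targets)
        return self

    def predict(self, rows):
        def closest_target(row):
            distances = [
                sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in self.rows_
            ]
            return self.targets_[distances.index(min(distances))]

        return [closest_target(row) for row in rows]


def run_pipeline(model_class, train_rows, train_targets, test_rows):
    """The pipeline relies only on fit/predict, so any conforming class plugs in."""
    model = model_class().fit(train_rows, train_targets)
    return model.predict(test_rows)


train_rows = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
train_targets = [0.0, 2.0, 10.0]
test_rows = [[0.9, 1.2], [3.5, 4.1]]

# Swapping the model is a one-token change; the rest of the pipeline is untouched.
for model_class in (MeanRegressor, NearestNeighborRegressor):
    print(model_class.__name__, run_pipeline(model_class, train_rows, train_targets, test_rows))
```

The design choice being illustrated is that the pipeline depends on a small shared interface rather than on any one model, which is what makes comparing several models cheap.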
Ben_Wilson:
So to help our listeners out a little bit, to break it down from the nerd perspective: when we're talking about these traditional ML algorithms you mentioned, like LightGBM, XGBoost, scikit-learn, anything that is a supervised training model, you're saying, I have this data in this training data set. The way data scientists typically think about it is: I have a train df, a test df, and a validate df, as data frames. With the ingestion that's happening under the covers of that pandas DataFrame, which is nothing more than a collection of Series, basically NumPy-backed series objects, those objects get vectorized in order to communicate more easily with the underlying algorithm, which is usually written in C, so that the vector of strongly typed data goes down into that. Why can't deep learning, I mean, I know the answer to this, but from you to the listeners, why can't deep learning process that vector? How does it need to think about data? And why does it mean that if you use vanilla PyTorch and try to push supervised learning into that, it's so complicated?
Manu_Joseph:
Right. So if you ask me whether it is possible to do it with a bare-bones PyTorch approach, absolutely yes, but it requires a bit of software engineering, because it's not just a model. You need the data to be in the form of a dataset, which in turn goes into a data loader that batches it, and each tabular dataset has its own way of doing that. And because it's tabular data, there's going to be a lot of preprocessing involved: you need to normalize some columns, and if there are categorical or string variables, you need to encode them in a particular way. There's a whole lot of software engineering which happens before you can start training a model. And even after you start training a model, you still have a lot of hurdles. Training a model on a single GPU with a single training loop is very straightforward, but it's not scalable. When you move to multi-GPU systems or distributed cluster systems, things become quite complicated. So what PyTorch Tabular does is stand on the shoulders of giants, like pandas, PyTorch, obviously, and PyTorch Lightning, which encapsulates everything related to the training aspect. What PyTorch Tabular really allows you to do is get away with just defining a basic set of configurations. I'm just telling the library: okay, this is my data frame, I have these columns, these columns are continuous, these columns are categorical, this is my target, and whether I want to do some target transformations or normalization. Just set the config and start running the whole thing.
Another thing that the library tries to do is set intelligent defaults, so that even somebody who is not really deep into deep learning can start doing this right away. The most basic thing the library asks you to do is tell it whether it's a classification or regression task, what the target is, what columns you want to use, and which model you want to use. That's it. With the rest left at defaults, you'll get something out of it. I'm not saying you'll get the best out of it, but you'll get something out of it. And then, as always, it's up to the user to make it the best version of the model it can be.
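To make the preprocessing steps described above concrete, here is a dependency-free sketch of what has to happen before a tabular row can reach a neural network: standardizing continuous columns, mapping categories to integer indices, and batching. It is written in plain Python so it runs anywhere; in practice this work is done with pandas and PyTorch data loaders. The `DataConfig` class below is a hypothetical stand-in invented for this example and is not the library's own configuration object, and the column names and values are made up.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class DataConfig:  # hypothetical name, for illustration only
    target: str
    continuous_cols: list
    categorical_cols: list


def fit_preprocessor(rows, config):
    """Learn normalization statistics and category-to-index mappings from training rows."""
    stats = {}
    for col in config.continuous_cols:
        values = [row[col] for row in rows]
        stats[col] = (mean(values), pstdev(values) or 1.0)  # guard against zero variance
    vocab = {}
    for col in config.categorical_cols:
        categories = sorted({row[col] for row in rows})
        # Index 0 is reserved for categories not seen during fitting.
        vocab[col] = {cat: i + 1 for i, cat in enumerate(categories)}
    return stats, vocab


def transform(rows, config, stats, vocab):
    """Turn raw rows into dicts of standardized floats and integer category codes."""
    return [
        {
            "continuous": [(row[c] - stats[c][0]) / stats[c][1] for c in config.continuous_cols],
            "categorical": [vocab[c].get(row[c], 0) for c in config.categorical_cols],
            "target": row[config.target],
        }
        for row in rows
    ]


def batches(examples, batch_size):
    """Yield consecutive mini-batches, the job a data loader normally does."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


rows = [
    {"price": 10.0, "store": "A", "sales": 3.0},
    {"price": 20.0, "store": "B", "sales": 5.0},
    {"price": 30.0, "store": "A", "sales": 4.0},
]
config = DataConfig(target="sales", continuous_cols=["price"], categorical_cols=["store"])
stats, vocab = fit_preprocessor(rows, config)
examples = transform(rows, config, stats, vocab)
for batch in batches(examples, batch_size=2):
    print(batch)
```

Note that the statistics and vocabularies are learned once from training rows and then reused, so validation and test rows are transformed consistently and unseen categories fall back to index 0.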
Ben_Wilson:
Yeah, if any listeners want to follow along with what Manu is talking about in answering my question, check out the GitHub repository. In the examples section, you can follow along with a PyTorch Tabular example that breaks down what he's talking about. You have a data configuration object that you instantiate, and a trainer configuration, which has a lot of options that you can override. So it exposes a lot of the internals of how PyTorch functions: what is your early stopping condition, what are you trying to optimize for, how often are you going to be checkpointing, how many epochs are you going to run, and what your batch size is. You can configure all that stuff with PyTorch, but you also get that embedding configuration in a simplified manner instead of having to do it by hand, which, personally, in my history as a data scientist and ML engineer, I used to hate doing. Not that I don't love the creative part of feature engineering work, where you look at the raw data and ask what insights can I glean here, and how can I generate something that's going to make a lot of sense. But I hated the process of doing manual encodings, because it's boilerplate work. You're just doing the same thing over and over and over. You can get fancy with writing a bunch of functions and classes and doing stuff like that, but you're just re-implementing something that should be a library that you can just call. So that's basically what this is, and it removes all of that annoying work so you can just focus on getting that model to train. So get on there. It's a cool project.
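As a rough illustration of one of the trainer options mentioned here, the early stopping condition, the following is a minimal sketch in plain Python. It assumes the per-epoch validation losses are already available as a list, and the function name and its `patience` and `min_delta` parameters are choices made for this example rather than the library's own.

```python
def train_with_early_stopping(validation_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, best_loss, stopped_epoch) for a sequence of per-epoch losses.

    `validation_losses` stands in for the values a real training loop would compute.
    """
    best_loss = float("inf")
    best_epoch = -1
    stopped_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        stopped_epoch = epoch
        if loss < best_loss - min_delta:
            # A real loop would also save a checkpoint of the model here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch, best_loss, stopped_epoch


losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]
print(train_with_early_stopping(losses, patience=3))
```

With a patience of 3 the loop stops before ever seeing the final, lower loss, which is the usual trade-off: a larger patience wastes fewer good runs but spends more compute on bad ones.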
Manu_Joseph:
Yeah, one more thing to add here is that in the latest version, which I released a couple of weeks ago, I've modularized it further, so that if you don't want to use the training part of it, or the model part of it, and just want the data part, the data module kind of thing, that's also possible. You just use that part, and the rest is your own. You can even bring your own bare-bones PyTorch model, with one very simple condition: the input should be a dictionary with some specific keys. That would start working with this whole setup pretty quickly. So I tried to make it less shackled by a predetermined template and a little more free to use, which is something I always love. Whenever I start to use a library, one of the things that always irks me is that there are certain rules I have to follow to make it work. The fewer rules there are, the easier my experience in using the library. That's what I'm trying to do with PyTorch Tabular.
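The "input is a dictionary with specific keys" contract can be sketched as follows. This is a hedged illustration only: the key names `"continuous"` and `"categorical"` are assumptions made for this example (the episode does not name the keys), and `TinyLinearModel` is a hypothetical, untrained stand-in written in plain Python rather than a real PyTorch module.

```python
class TinyLinearModel:
    """Stand-in for a custom model: one weight per continuous feature plus
    an offset per category index. The weights are fixed here, not trained."""

    def __init__(self, continuous_weights, category_offsets, bias=0.0):
        self.continuous_weights = continuous_weights
        self.category_offsets = category_offsets
        self.bias = bias

    def __call__(self, batch):
        # The only contract: `batch` is a dict holding a list of continuous
        # feature vectors and a parallel list of categorical index vectors.
        predictions = []
        for cont, cat in zip(batch["continuous"], batch["categorical"]):
            value = self.bias
            value += sum(w * x for w, x in zip(self.continuous_weights, cont))
            value += sum(self.category_offsets[i] for i in cat)
            predictions.append(value)
        return predictions


batch = {
    "continuous": [[1.0, 2.0], [0.5, 0.0]],
    "categorical": [[0], [1]],
}
model = TinyLinearModel(continuous_weights=[2.0, 1.0], category_offsets=[0.0, 10.0], bias=1.0)
print(model(batch))
```

Because the model only depends on the shape of that dictionary, anything that produces such a batch can feed it, which is the kind of loose coupling described above.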
Ben_Wilson:
And that's the real balance when you're creating open source packages. I think when people get into the process of coming up with an idea and want to release something out there, if they're experts in the domain they're building for, and hopefully they are, but they've never worked as a software engineer before, they tend to want to just open up all the knobs and say, hey, I made this configurable, so it's an argument in the main method. And then you look at the argument list and you're like, hang on, there's four hundred arguments in here. I'm pretty sure that violates most of the PEPs that are out there, and it becomes overwhelming to a user. Even if you're putting defaults in, what does your docstring look like when you call help on the method? Hang on a second, it's printing out thirty-seven pages of text. Nobody's going to read that. And if your docs get generated on readthedocs.io, nobody's going to scroll through that generated RST or HTML file. So it becomes daunting to a user who's looking at that: this is scary, there's too many things for me to tune.
The other extreme is: I'm handling everything for you. You'll be appealing to the novices, who are like, hey, this is so easy, there's only two arguments, I just pass my data in and the names of my columns or something. And that becomes a black box that no advanced user wants to use, because they're like, hang on, I can't even use this, because I can't override the things that are critical for the task I'm trying to do. So I like the approach that you took in this library. It's very similar to what we do at Databricks for open source software. We default a bunch of stuff, and we also expose it in a configurable manner so that an advanced user can override it if they choose. But we try to limit the scope of complexity even for the advanced users: yeah, there's four hundred things that you could change here, but in any given project you might only need to change twelve of them, even as an advanced user. For the rest, the defaults are fine. But in your next project it might be twelve different ones you need to change. So it's an intelligent, good design that you came up with, in my opinion. So definitely, people, check it out. It's cool.
Manu_Joseph:
Thank you.
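The "sensible defaults, overridable by advanced users" design discussed in this exchange can be sketched with a standard-library dataclass. The `TrainerConfig` below and all of its field names and default values are hypothetical, chosen for this example; they are not taken from PyTorch Tabular or any Databricks project.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainerConfig:  # hypothetical, not a real library class
    batch_size: int = 64
    max_epochs: int = 10
    early_stopping_patience: int = 3
    checkpoint_every_n_epochs: int = 1
    learning_rate: float = 1e-3


# A novice can take every default as-is.
default_config = TrainerConfig()

# An advanced user overrides only the handful of knobs this project needs;
# everything else keeps its default.
custom_config = replace(default_config, batch_size=256, max_epochs=50)

print(default_config)
print(custom_config)
```

Grouping options into a small typed config object, rather than hundreds of function arguments, keeps the call signature short while still leaving each knob discoverable and overridable.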
Michael_Berk:
And then I had one question for both of you, actually. PyTorch Tabular currently gets about 2,000 downloads a month. It has 820 GitHub stars and 14 contributors overall. So what are your thoughts on the stages of adoption for an open source library? Obviously stage one is build version zero, but from there, how do you get contributors and how do you get users?
Ben_Wilson:
Um, you go first. I'll have a different, nuanced view on this one.
Manu_Joseph:
It's a tough question, Michael. I don't know, I haven't really thought about it that much, because this is the one open source library that I've been with from the beginning till now. In that journey, basically I launched it, then gave it a little bit of publicity on my LinkedIn profile, and there was a little bit of popularity that came from my colleagues who talked about it, other folks, et cetera. It grew kind of organically. And at some point, I don't know, there was some inflection point after which it kind of took off. I wouldn't say it took off in a big way, but it took off a little bit after a while. I don't exactly know what happened at that time. I think one of the things that made it popular is that, since I'm the one putting it up, I also have to make things easy to use for people: enhancing the documentation, making a good README, and putting some sample notebooks or scripts out there so that people can get started with it. That always helps. And after a while, the contributors and everything came organically. I didn't really run behind anyone to get contributions. It came about when people were using it and said, okay, there's some issue, and raised an issue or something. Then I'd say, you might want to raise a PR and fix that, and that snowballs into a little bit of a thing. Of the fourteen contributors, there are a few who made a lot of contributions, some structural contributions, and a lot of people who raised issues or changed or added documentation. But I consider everybody a contributor.
Even if you add a comma in the documentation, I consider that a contribution. That's the spirit of open source. So yeah, there is no formal answer that I have on the stages, because I don't know.
Ben_Wilson:
I think there's an optimal formal answer, and if we were PMs we would be able to tell exactly what that path is. I think it's still alchemy, though. There's so many factors that add into what your stages of development might be. You could have a thousand projects that hit version zero point one, which should effectively be your first release. It's like, hey, this is basically a prototype, we did our best effort at creating something that is hopefully usable. But consider that version, and probably the next five releases, as fine-tuning, and hopefully not rewriting everything. You're listening to feedback, and that's what a successful project is going to do: it's going to be super friendly to the community, to anybody that's going to use it, and get the word out there. And the size of the splash from zero point one to zero point five is going to be directly correlated to who you are. How many people know that you live on this planet? How many people know what you do on this planet, and what company is attached to your name or this project? The larger that is, and the more respected that company is, the bigger the splash is going to be. That doesn't guarantee that you're even going to get from zero point one to zero point two, though, even for the biggest companies in the world. We have very large tech companies out there that release prototypes just to get that feedback and understand: hey, is this something that people actually want to use? People say they want to use it; are we thinking of this in the right way? And sometimes you just scrap it, and you haven't invested that much by that zero point one version. If the feedback is that nobody cares, or people hate it, or they have so many things to say about how it should be different, cut your losses, go work on the next iteration from scratch, and release zero point one of that. That's how you learn and iterate.
But assume that you are a big company and people respect you. They see your name attached to this and think, this is probably going to be good. Does it solve a problem that people have? Do people really need this, and does it not exist elsewhere? Are you building something that is fundamentally new and that makes people's lives easier at work? If yes to all of those, then you're sitting on a gold mine, and provided that you handle the open source community with respect and friendliness, with open arms as a collaboration, the project is going to grow exponentially. Before you know it, you're on some advanced version of this thing and you're like, how many downloads do we have this week? Really? Look at the project that I'm working on now with a fantastic team at Databricks, MLflow. We're long past that ten million downloads a month mark. You just think about how many people are using this tool, and you look at how many people interact with the issues, and how many contributors we have. I think we just passed five hundred and fifty or something. So a project can do that. It can grow because it's hitting all those marks that it needs to hit. Not everybody loves it. You're not out there to make a project that everybody is going to love. But you actually want to get something out there where people are giving you negative feedback, because that means they care. So when people are like, this sucks, this needs to change, that's great. We love getting that feedback, because it means somebody's passionate about the project. They really want us to fix this, or they hate this lack of a feature so much that they're going to build it, and we're going to help them build it. That passion is what makes for a successful project, in my opinion.
Michael_Berk:
Got it. Yeah. Something that you said last episode, Ben, resonated with me, which is that any feedback from a customer that is honest is valuable. So if it's good, great. If it's bad, great. The only way that you can hurt my feelings is if you lie.
Ben_Wilson:
Yeah, I would actually add to that. If you were to award points to feedback, and this is universal in life, in my opinion: good feedback is worth one point. Bad feedback is worth ten points. Invalid bad feedback is worth negative ten points, and invalid good feedback is worth negative one hundred points. So only listen to honest feedback. If it's good, don't really put that much importance on it. The honest feedback that's bad, really listen to that. That is super important stuff that you should focus on. But people lying to you in a positive way, really ignore that. It doesn't matter. It's just an ego boost. You shouldn't care about that. And lying negative feedback, just ignore it. That's my take. Why? If you invert the importance and only focus on positive feedback that's honest, you're never going to innovate. You're never going to grow as a person, or as an organization, or as an open source project, or whatever you're talking about in life. If that's all you focus on, you're just going to feel great about yourself: I'm awesome, or my company is awesome, we can do no wrong. But if you focus on the negative stuff and don't allow it to be a personal attack, but instead see it as, hey, the only reason somebody's telling me this is because they actually care, and that matters more than anything else, and I need to listen to this person who cares about this and do something to correct it, that is what breeds innovation and positive change in the growth of something, whether it be a person, a company, or a product.
Michael_Berk:
Yeah, that makes a lot of sense. One thing that I've been really interested in throughout my career is this concept of value. What is valuable? What is not valuable? How do you find what's valuable, and then how do you deliver valuable solutions? I'm still working on my definitions, but at least for insights and decision science, I have a two-pronged approach that works really well. The first thing is that your insight moves metrics. If we find some discovery that has no impact on any of our bottom-line metrics or a North Star, it's not really valuable. It could be cool. It could be actionable. It could be interesting. But if it doesn't move a metric, it's not valuable. And then the second component is that it needs to be actionable. So let's say you're selling hot dogs, to go back to the classic example, and you know that every winter solstice there will be a... oh, this example's not going well.
Ben_Wilson:
Hot dog festival.
Michael_Berk:
Yeah, a hot dog festival. That's an actionable thing; you can plan accordingly. But the point is, if it's not actionable, it's cool, you can make a presentation about it, maybe make a commercial or something, but it can't influence your decisions. So I think that really resonates with what Ben was saying, which is that you need things that are actionable, and positive feedback tends to continue inertia, whereas negative feedback can redirect it or even stop it. So that makes a lot of sense. And I think it differs greatly between fields, for, let's say, decision science or product development, or you name it. But that definitely resonates with my experience as well.
Ben_Wilson:
And I think both things tie into what Manu built, which is something that's not supported by the native package, the native library. It's something that is actively being researched. People care about it, and you just went out and said, "I need to use this for my work. And why not be altruistic and give this to the open source community?" And there's a bit of selfishness associated with any open source, because your name is on it and people are like, "Hey, this dude built this cool thing that we're using," so people know who you are. But it's also just giving something useful to people and seeing that organic growth that happens because people are like, "Actually, I want to try PyTorch with structured data, because I don't want to go through the process of having to coerce that data that's sitting in, you know, a data warehouse table into something that PyTorch can consume."
You know, a lot of people are using these libraries now. Four, five, six years ago, for people using these popular deep learning projects, you needed some CS training to be able to do that. You look at, you know, TF 1.x implementations that were running in production. Look at that code. If anybody's out there that has some of it running at their company, or people that have written it: you can't just be an applied engineer or applied data scientist to do that. You need to hit the books and learn some stuff and take some courses and understand, how do I manipulate tensors? And how do I get this data structure from... you know, maybe it's not in a data table, maybe it's not in a CSV. Maybe it's a flat file that's just sitting there in some, you know, super old encoding standard. It's like zip, you know, not even gzip, just zipped files sitting there on object store. How do I get that into a format that TensorFlow or PyTorch can consume? You need some CS background. I mean, you can force it and write some crappy code and it will maybe work. But to get it to be performant and production ready, you've got to do your homework or phone a friend, to be like, "Hey man, you're doing software engineering stuff. Can you help me out with this problem?"
But nowadays people don't need that skill set in order to use some of these libraries. You know, Manu, you mentioned at the top of the podcast that a lot of supervised problems are using XGBoost and LightGBM. I would agree that at a lot of companies that is the case. But some of the people that have been around for a while, before those libraries existed, before scikit-learn was even really a mainstream thing, were using lower-level libraries. You're using stuff that doesn't have a lot of traction these days. You know, when's the last time you saw statsmodels in production, where somebody's building a regressor from scratch? Like, yeah, there's APIs to help you out there, but look at the number of options available for tuning that thing. You have to know that library. You have to know the math behind it in order to build something with that. A lot of stuff in production even now is built with that stuff, but it's not with the new generation that's coming in, and I'm a hundred percent all for it. It's the ease of use that's happening. I think that's the direction that the industry needs to go in, because you shouldn't need to have a PhD in mathematics or physics in order to build a model. It's ridiculous. So these libraries that are making it simpler and more intuitive and easier to understand are what's advancing, not the state of the art for our industry, but the opportunities to solve real problems using these tools, because you're making it easier.
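As a rough illustration of the "coercing" step Ben describes, here is a minimal sketch of turning a mixed-type table into purely numeric arrays, which is the shape a tensor library needs. The function name `encode_table` and the sample rows are hypothetical, it uses only the Python standard library, and it is not the API of PyTorch or of Manu's library.

```python
from statistics import mean, pstdev


def encode_table(rows, categorical, numeric):
    """Turn a list of dict rows into integer category ids and scaled numbers."""
    # One integer vocabulary per categorical column. Ids start at 1 so that
    # 0 could be reserved for padding or unseen values.
    vocab = {
        col: {value: i + 1 for i, value in enumerate(sorted({r[col] for r in rows}))}
        for col in categorical
    }
    # Mean and standard deviation per numeric column, used for standardizing.
    stats = {}
    for col in numeric:
        values = [r[col] for r in rows]
        stats[col] = (mean(values), pstdev(values) or 1.0)

    cat_ids = [[vocab[c][r[c]] for c in categorical] for r in rows]
    num_values = [[(r[c] - stats[c][0]) / stats[c][1] for c in numeric] for r in rows]
    return cat_ids, num_values, vocab


# Hypothetical rows, as they might come out of a warehouse table.
rows = [
    {"store": "A", "promo": "yes", "units": 10.0},
    {"store": "B", "promo": "no", "units": 20.0},
    {"store": "A", "promo": "no", "units": 30.0},
]
cat_ids, num_values, vocab = encode_table(rows, ["store", "promo"], ["units"])
print(cat_ids)
print(num_values)
```

The integer ids would typically feed embedding layers and the scaled numbers would feed dense layers. Libraries for tabular deep learning exist largely to hide this bookkeeping.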
Michael_Berk:
Yes, absolutely. But I would also kind of argue that it is a double-edged sword. On one side you're making it very easy, and a lot of people can start using it, like you said, without CS training, without math training and stuff. But in some places that becomes a problem as well, because I've seen a lot of people use machine learning models, scikit-learn models, without really understanding them, kind of doing it the wrong way, and it is not very apparent. But if you know what they're doing and what they're supposed to do, you instantly realize that they're not doing it the right way. Those kinds of small gotchas are probably the price you pay for making it popular with the community.
Ben_Wilson:
I would double down on what you just said with another anecdote, which is, I heard the same arguments fifteen years ago when people were talking about the ease of use and explosion of applied business intelligence, where all of a sudden companies like Tableau come on the market and it's like, well, it's so easy to do an analysis on fairly large data now. And the statisticians who were, you know, historically doing this analysis, a lot of them were writing code in Python 2.x back in the day, doing sort of manual manipulations with a version of pandas that we wouldn't recognize today. The APIs were slightly different, a lot harder to use. But they're doing stuff with that, or they're using Excel with custom formulas in it, or they have some sort of BI tool. Yeah, MATLAB or SAS, and doing analysis on these proprietary platforms. They saw this proliferation of BI becoming easier and more open to the layperson, and I heard the same arguments, like, "The analyses are going to suffer now. Businesses are going to come to the wrong conclusions." I think it's a self-correcting thing. If a company allows people that are unskilled and untrained to do things that they're making decisions on, they're not going to allow that to happen for very long, or they're not going to be a company for very long, because they're going to be relying on bad information. So nobody really talks about that in the BI world anymore. Everybody can use Power BI and Tableau. They're made to be as simple as possible, and anybody can make a fantastically terrible analysis with one of those tools. Whether it's statistically and mathematically invalid and the conclusion is completely bonkers, or it's intentional, you know, a chart is intentionally done with a hidden axis that's not linear, it's logarithmic, and they're making it seem like it's linear, so people are like, "Oh yeah, we need to do this to make sales." It's like, no, it's a misleading report. That does happen, and it always has, since the start of easier APIs in the community.
I've seen that happen at customers and stuff, where people are like, "We've got this model. It's perfect. It's a classifier detecting fraud." I'm like, "Cool. What's your accuracy on that? What's the area under the ROC?" And they're like, "Well, it's a hundred percent." "That's awesome. That's phenomenal. Your model is broken, though." "No, no, it's perfect." Like, yeah, the labels are in the training data. It's completely overfit to that, because it's predicting what it already knows. Running on some holdout validation: "Oh, geez, it's one percent accurate on the data that it's never seen before." Like, yeah, don't put that in production. That's not going to go well for you. So yeah, I mean, it happens all the time, but it is self-correcting. People will see that once an expert comes in and says, "That doesn't look right. Can you try this and just make sure before we release this?" And then that's an education opportunity.
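The "perfect on training data, useless on holdout" situation Ben describes can be reproduced with a toy sketch. Everything here is hypothetical: the data is random, and the "model" is just a lookup table that memorizes the rows it was trained on, standing in for any model that has effectively seen its labels.

```python
import random

rng = random.Random(0)
# Hypothetical data: unique transaction ids whose labels carry no learnable signal.
data = [(tx_id, rng.randint(0, 1)) for tx_id in range(2000)]
train, holdout = data[:1000], data[1000:]

# A "model" that simply memorizes every training row it has seen.
memory = dict(train)


def predict(tx_id):
    # Ids that were never seen fall back to a constant guess.
    return memory.get(tx_id, 0)


def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)


train_acc = accuracy(train)
holdout_acc = accuracy(holdout)
print(f"train accuracy:   {train_acc:.2f}")
print(f"holdout accuracy: {holdout_acc:.2f}")
```

Scoring only on the rows the model was fit on reports a perfect result. The holdout score, on rows it has never seen, is the number that says whether anything was learned.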
Michael_Berk:
Exactly. The first reaction that I have when somebody tells me that they got a ninety-nine percent accurate model is: check your code, check the data
Ben_Wilson:
Check your data.
Michael_Berk:
a hundred times. Yes,
Ben_Wilson:
Yep,
Michael_Berk:
that's like a big red sign.
Michael_Berk:
I don't know, I produce 99% accuracy models regularly. So maybe I'm just really talented, but.
Ben_Wilson:
I mean, there are some models... I did a training series at a previous job where somebody had asked me, like, "Hey, could you provide training examples of models that you could get a hundred percent accuracy on? Or, you know, triple nine, 99.999 percent." So I generated some data sets that I would have the class do. We would build the data set in real time, and then it would run through the code that I was writing in real time at the head of the class. And one of my favorite ones was, "Let's predict whether it's going to rain in the next sixty seconds." And the input data was: everybody looks out the window and tells me if the sky is blue or gray. And we would do that and have it run every couple of minutes. Like, "All right, everybody, give me all the data that you collected over the last five minutes. Let's see, are we still accurate?" Of course, it was a cloudless, beautiful sunny day the first time that I did that, and it was like, "See, we hit a hundred percent accuracy. It's predicting five minutes into the future and we're always right." So that's a lesson for people in the data science community as well. Think about your problem. What is your training data? And when are you making the prediction? And what are you going to do about that? And should this even be a model? Should this even be created? Because it's something where you can take the data itself and say, "What color is the sky right now? Okay, how quickly can it start raining in my geographic region?" If I'm predicting five minutes into the future, I can just say, "Yep, blue sky. No clouds. Don't think it's going to rain. Don't need my umbrella."
Michael_Berk:
Yeah. I mean, maybe you should have done it on a gray day, though. That could have been a better sample, but...
Ben_Wilson:
I did do one, but I just adjusted it to say that the time scale that we were predicting into the future was really, really short. It was like thirty seconds later.
Michael_Berk:
Got it. Like zero seconds. Got it, yeah.
Ben_Wilson:
And so it's like, "Hey, it's a hundred percent."
Michael_Berk:
Yeah. I mean, that tracks. But I also wanted to get into one more topic with Manu. He has recently written a book called Modern Time Series Forecasting with Python: Explore industry-ready time series forecasting using modern machine learning and deep learning. So it has that classic, what is it, colon, a bunch more text. And so, Manu, I was wondering if you could elaborate a bit on the latest and greatest deep learning models for time series. Because Ben and I chat about time series quite a bit on the podcast, and we're big proponents of simple models, like ARIMA- or Prophet-based types of models. So where do you see deep learning excelling, and what are those models?
Michael_Berk:
Right, right. So there is some new work coming out of, again, the research community on deep learning for time series data, both on the time series forecasting side and the time series classification side. The individual papers that propose all of these techniques do show that they do better than some of the competing methods. But again, similar to tabular data, with time series data as well there's no one model which does well on all datasets. It is purely up to the person who is working on it to find out what works well. I've seen deep learning work better than the ARIMAs and the Prophets of the world in many projects. But one strong contender for the best model class is also standard machine learning models, which are basically the LightGBMs and XGBoosts of the world. They are very strong, especially when you're talking about global machine learning models. But I'll come to that, since the question was about deep learning.
So, a few models really interested me and showed some innovation in the way that they handle the time series data, or the temporal pattern. One is the Temporal Fusion Transformer. I think it came out of Google Research. Although the architecture is quite complicated and has many different parts in it, there are some intelligent ways in which they made the whole thing work. And that model also performs pretty decently on many datasets. And there are two other models, which are modifications of the Transformer model: one is called the Informer model and the other is the Autoformer model. Again, both of them have very innovative ways of injecting the temporal aspect of time series into the model.
Because all those Transformers were designed for sequence-to-sequence modeling, but predominantly for NLP-type sequences, right? There you don't need to look at a long history to find out the seasonality which happens every year or something. The patterns are more local. I wouldn't say simpler, but the kind of focus is different. But the guys who proposed the Autoformer and Informer have very intelligent ways of making sure that you can capture long-term patterns. Because with Transformers, if you just increase the context size, or the window of memory that you give the model, in theory it should be able to find those patterns, given enough data. But in the real world we have a lot of constraints on compute and things like that, right? The more history that you put in, the attention calculation is quadratic, and it kind of explodes. So these guys have very intelligent ways of including longer history with less computation. And those models are also very innovative, and they do perform better.
But I'm not very firmly in the deep learning for time series forecasting camp yet. I'm kind of on the fence, because I've seen unreasonably effective machine learning models. Global machine learning models work very well. As part of my previous organization, I've worked with a lot of different large companies in designing their forecasting systems. And almost exclusively these days, the number of time series that you need to forecast is in the tens of thousands, hundreds of thousands, et cetera. In those scenarios, using global machine learning models has always been better: better performance, less computationally intensive, and less headache from the engineering side, because you don't have millions of models to manage. You just manage a couple of models.
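To make the "attention calculation is quadratic" remark above concrete, here is a small sketch that counts the pairwise scores a single full self-attention head computes as the history window grows. The function name and the window lengths are illustrative assumptions, and the count ignores batch size, head count, and the sparse-attention tricks that Informer and Autoformer use to avoid this cost.

```python
def attention_scores(context_length: int) -> int:
    """Pairwise query-key scores in one full self-attention head."""
    return context_length * context_length


# Doubling the history window quadruples the number of scores.
for length in (96, 192, 384, 768):
    print(f"history={length:4d}  scores={attention_scores(length)}")
```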
So in all of those aspects, I've seen the global machine learning models work really well, and I did not really have a reason to go for deep learning models, because these models were doing their job perfectly fine. I'm sure if I put some effort into it and put some resources into getting a good deep learning model done, I might be able to do it. But, you know, in an implementation you have priorities. You want to have something working. The point is not to get the best accuracy, it is to get the accuracy which needs to be there for the customer. So once you hit that threshold, you say that this is the model which works well, which serves your purpose and is reasonably easy to manage, and you don't have a lot of technical debt, et cetera. So for all of those reasons, I've not put a deep learning model into production yet.
But then, yeah, I missed another deep learning model which I like a lot, which is N-BEATS. There are these M competitions, right? M4, M5, and now M6 has just wound up. These are basically international forecasting competitions, which, I think, have the strong backing of the International Institute of Forecasters, the Journal of Forecasting, something like that. They do this yearly or biyearly competition. Up until M4, all the competitions were consistently won by, you know, your classical methods like exponential smoothing. Obviously there are other modifications, I'm just classing them all together into a single class. And even M4 was won by a model from Slawek Smyl, which is heavily inspired by the exponential smoothing model. He basically put it into a deep learning setup, so it's kind of like a hybrid model. But ever since that M4 competition, all the other competitions have been consistently won by either a deep learning model or a global forecasting machine learning model.
And this N-BEATS model was proposed... The history behind that is actually cool, because after the M4 competition, the conductors of the whole competition kind of concluded that all of these new-age fancy models, like machine learning and deep learning, are not the best thing for time series forecasting, and that classical models, and probably hybrid models, are the future. That's the kind of conclusion that was put forward. And the people who proposed N-BEATS kind of said, "We want to have a pure deep learning model, and we want to prove that this can work better than all the other models which were part of the M4 competition," which they did. They ran the data through this N-BEATS model, and the model was able to perform better. If N-BEATS had been part of the M4 competition, it would have won that competition. So that was a strong statement that was made. And then I think folks got interested and started working on this area of deep learning for time series forecasting. That's interesting.
But yeah, probably another thing, and this is the right place to put this forward, is global machine learning models. That basically means that instead of training a machine learning model for each time series separately, you train a single model for a whole bunch of time series together. To put it simply and plainly, that is a global machine learning model. And the M5 competition, and a lot of other competitions which happen on Kaggle and elsewhere, all the forecasting competitions have been won consistently by one of these global models. And that's, if you ask me, the future. Right now I'm betting on global machine learning models, and global deep learning models to some extent, to be the next step in time series forecasting.
One of the main narratives in the book that I was putting forward is also that you start looking beyond ARIMAs and Prophets and exponential smoothing, and see that there is a whole different class of models which works extremely well and which is most suited to current situations. Because ARIMA and exponential smoothing models all came out decades ago, probably fifty years ago, and at that point in time the quantum of data that you had was pretty small. If, fifty years back, you told somebody that you need to forecast one million time series, they would laugh in your face. But now it's very, very real. And when you're thinking from that context, training, managing, and serving one million models is also a nightmare. So, to quote that popular meme: modern problems, modern solutions. We need to go ahead and adopt some of these new techniques, which make it easier, in one sense, to work with this large quantum of data.
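The global-model idea Manu describes, one model trained on many series pooled together, can be sketched in a few lines. This is a toy under stated assumptions: the series names and values are made up, the only feature is the previous value (lag 1), and a plain least-squares line stands in for the gradient-boosted models that are usually used in practice.

```python
def make_pooled_rows(series_by_id):
    """Stack (previous value, next value) pairs from every series into one training set."""
    rows = []
    for values in series_by_id.values():
        rows.extend(zip(values[:-1], values[1:]))
    return rows


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Hypothetical series at very different scales that share the same dynamics.
series_by_id = {
    "store_1_milk": [10.0, 12.0, 14.0, 16.0],
    "store_2_milk": [100.0, 102.0, 104.0, 106.0],
    "store_3_milk": [50.0, 52.0, 54.0, 56.0],
}

# One model is fit on all series at once, then reused to forecast each of them.
slope, intercept = fit_line(make_pooled_rows(series_by_id))
forecasts = {sid: slope * values[-1] + intercept for sid, values in series_by_id.items()}
print(forecasts)
```

The operational point is the one made in the conversation: there is a single fitted model to train, store, and serve, however many series it covers.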
Michael_Berk:
Yeah, that makes sense. Yeah, I know that Ben has thoughts on this, specifically distributing time series. You wanna share?
Ben_Wilson:
Yeah, I mean, I've seen LSTMs attempted to be adapted to time series problems. And exactly as you said, if the data set fits something that works well for that architecture, and you adjust that short-term memory, you know, attenuation, you can get predictions for forecasts, against backtesting and cross-validation, for that specific type of data that perform way better than anything else. My gold standard, and it's purely through bias, and it's one of the most challenging models to tune, in my opinion, is the Holt-Winters exponential smoothing implementation. I think it was nineteen seventy-two or something that that algorithm came out. But that thing is fantastic for just great time series, where if you understand and can decompose that time series and you tune it very tightly, it is fantastic.
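For readers who have not seen it, here is a minimal sketch of additive Holt-Winters (triple exponential smoothing), the method Ben calls his gold standard. The initialization is a simple textbook-style assumption (first-season mean for the level, difference of the first two season means for the trend), the smoothing parameters are picked by hand rather than tuned, and a production implementation such as the one in statsmodels does considerably more.

```python
def holt_winters_additive(y, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for the next `horizon` periods."""
    m = season_length
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization from the first two seasons (an assumption of this sketch).
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    seasonals = [y[i] - level for i in range(m)]

    # Update level, trend, and the matching seasonal term for each later observation.
    for t in range(m, len(y)):
        s = seasonals[t % m]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y[t] - level) + (1 - gamma) * s

    n = len(y)
    return [level + h * trend + seasonals[(n + h - 1) % m] for h in range(1, horizon + 1)]


# Hypothetical quarterly pattern that repeats with no trend.
history = [10.0, 20.0, 30.0, 40.0] * 3
forecast = holt_winters_additive(history, season_length=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
print(forecast)
```

The three parameters `alpha`, `beta`, and `gamma` are what has to be tuned per series, which is why this approach gets expensive across thousands of series.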
Ben_Wilson:
But how do you do that for five thousand forecasts? You're not fine-tuning five thousand models. I don't care who you are, it's not happening. So stuff like Prophet works really well. Even untuned Prophet works pretty well for a different set of use cases, where, you know, an LSTM might kind of fall apart a bit, because you still have to tune that attenuation band. And if you don't do it properly, with certain real-world data sets that have some sort of repeating pattern that goes on for a little bit too long, all of a sudden it's not going to respond at all in the forecast. It's going to repeat this pattern over and over and over again. Prophet does that as well if you have certain types of data that go into it.
But from the global model perspective, I've seen the exact same thing that you've said as well. What do you do if you have ten million items that you need to forecast sales inventory for, globally? What if you are this massive company? What if you're 7-Eleven and you're like, "Hey, we need predictions on every product in every store in the world"? Like, all right, you've got 4.7 billion models you need to build every week. Where are you going to run that?
Ben_Wilson:
So for global models, you train it on all of this data and you give it, you know, basically your exogenous regressor elements as part of that training set, where you can give that additional data and context to it. They work really well for these extreme-scale problems. But from what I've found in the testing that I've done, if you extract a single random set of forecasts of these discrete series from that global model that you're going to be applying, and then build a Holt-Winters or a Prophet model on those, and just do basic, like, Optuna-based hyperparameter tuning on them, just say, "Hey, you get fifty iterations. That's it. I'm calling it quits." You can't sit there and optimize for ten thousand cycles like you would in a competition or as part of a white paper. You just stay within realistic boundaries of production reality at a company. If you compare those to that global model, the discrete ones are going to beat it every single time on accuracy. There's no getting around that. You could be an order of magnitude more accurate.
But it's that economy-of-scale problem. How do you solve this problem? And people know how to do it. We have customers of Databricks that are doing it today, that are running 4.5 million Prophet models in production. But you need to spin up, you know, eight hundred VMs and expose sixteen hundred cores to this problem, so that the computation is done by the end of the work day and you have your predictions out there for the next work day. And it's expensive.
So the real trick is, how do you solve that problem where you need accuracy and you also need to store the model? Because there are industries out there, and the logistics industry is not one of them, but if you're in the financial sector, or health and life sciences, or you're doing government accounting, or you work for a government somewhere, any model that you put out there for consumption has to be stored. You could either be facing a lawsuit where that's going to be inspected, or you just need to be audited because of legal requirements by a government. So you need to store it somewhere.
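A back-of-the-envelope check shows why that many per-series models need that much hardware. The model and core counts are the ones mentioned in the conversation. The eight-hour window is an assumption standing in for "by the end of the work day", and the arithmetic assumes one model fit per core at a time with perfect parallelism.

```python
models = 4_500_000   # Prophet models mentioned in the conversation
cores = 1_600        # cores mentioned in the conversation
window_hours = 8     # assumption: "a work day" taken as eight hours

# How many models each core has to fit, and the time budget that leaves per model.
models_per_core = models / cores
seconds_per_model = window_hours * 3600 / models_per_core

print(f"models per core:  {models_per_core:.1f}")
print(f"budget per model: {seconds_per_model:.2f} s")
```

Under these assumptions each fit, including any tuning and the write to storage, has roughly ten seconds, which leaves little room for per-series hyperparameter search.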
Ben_Wilson:
So we actually solved this. We have an open source package at Databricks that we created for this exact use case, called Diviner, for handling that extreme scale for doing, like, Prophet models where you need the accuracy, but you also need to store it, and you don't want four software engineers to work on this problem for six months to build all the infrastructure of, like, "How do I save a million models every day?" That's what that package solves for people. And that gets back to your initiative that you're talking about, Manu, where it's like, "Hey, I need to take tabular data and run it through PyTorch, and I don't have a team of software engineers that can do this for me, because building this infrastructure sucks, and I don't want to just build a bespoke implementation that only works for this one project and is built into my project code." Thank you to the open source community, you know. Thanks to people like you that think of these things, like, "Hey, this is a problem. People need to do this. It's painful to do it manually." It's better to buy than to build, but it's way better to get it for free than to buy. So having it out there in open source is great, and we feel the same way. That's why we did the same thing for this use case.
Ben_Wilson:
But I do agree with you on the vast majority of extreme-scale forecasting that's out there, where you don't need ludicrous accuracy. For any particular decision that's out there, you don't have a million dollars on the line, like, if you get it wrong, you lose a million dollars. If you don't have that problem, or the problem where if you get it wrong, you kill somebody, and you're just like, "Hey, I need to know within a twenty percent margin of error how much milk we need to ship to this store two weeks from now, because two weeks ago we shipped too much and we had to throw away a thousand gallons." So if you just need that sort of thing, and you have, you know, a thousand stores globally that you're shipping milk to, and you need to know to tell the farmer, "Hey, we actually need about ten percent less next month, so that excess that you have, why don't you go make cheese? We'll buy the cheese too, but we just don't need the milk." For being able to do that, those global models are great, and I completely agree that that's the way that people are going to start solving this problem, because it's way cheaper, it's way faster, and for most use cases it's accurate enough.
Michael_Berk:
Yeah, and another thing is, I've seen this in retail a lot. Retail demand is really influenced by promotional activities and things like that. And that's not periodic. You don't have the same things everywhere. And that's another area where, you know, a classical model does not really do well, from what I've seen. ARIMAX can, to some extent, but...
Ben_Wilson:
You can do it. It's not easy.
Michael_Berk:
Yeah, because it's very, I don't know, it's very unstable in one way. Whenever you get these kinds of things, it simply doesn't learn that well, or I might not have put that much effort into making it work well. But in such scenarios, when an external signal really plays a big part in moving the needle, these machine learning models are going to pick that up very, very well. That's another area I've seen. But I agree with your point: if it's a very stable kind of time series, with no frequent peaks or anything, the classical models do really well. Even if you put in an auto-tuned exponential smoothing and just run it for a few iterations, it'll do well. I totally agree with that point.
Ben_Wilson:
Yeah, and you nailed it right on the head with the complexity involved when humans are manipulating your time series trend in an unpredictable fashion. You know, it's not something like, "Hey, it's holiday sale time. We know we're going to get sales to spike, and it happens every year around this time," with different magnitudes and maybe a slightly different shape of the actual regression curve. But when you start saying, "Hey, we had a fire sale in August for some reason, and that's why we have this huge spike in sales," you have to either manually go in there and clean your data to say, "Hey, I actually need to remove all of these sales from that trend because it's synthetic," or, "I need to mark it." And marking it means that's now an exogenous regressor term, so every other row, or every other time period, is a zero and this one is a one, so that I can tell SARIMAX or ARIMAX, or any of the others with exogenous regressor terms, to do that. It's just easier to do that in supervised learning, where you're providing that vector into the training of the model. That's just another feature, and you can apply a weight to that feature and say, "Hey, really pay attention to this, because this is what this actually means," and the model will adjust to that. It's a little bit trickier to do that with certain types of models, I think.
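The "marking" Ben describes, a column that is one in the promotion period and zero everywhere else, looks like this when the data is laid out for supervised learning. The sales numbers, the promotion period, and the column names are all hypothetical, and the lag-1 feature is just one simple choice of history feature.

```python
# Hypothetical weekly sales with an unplanned "fire sale" spike in period 2.
sales = [100, 102, 250, 101, 103, 99]
promo_periods = {2}

# Exogenous indicator: 1 in promotion periods, 0 everywhere else.
promo_flag = [1 if t in promo_periods else 0 for t in range(len(sales))]

# Supervised rows: the previous period's sales and the promo flag for the
# period being predicted are the features; that period's sales are the target.
rows = [
    {"lag_1": sales[t - 1], "promo": promo_flag[t], "target": sales[t]}
    for t in range(1, len(sales))
]

for row in rows:
    print(row)
```

Any tabular learner can consume `promo` as one more feature, which is the convenience being contrasted with wiring the same indicator into SARIMAX-style models.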
Ben_Wilson:
as I've seen, people do global models with linear regression like general Linear regression and they're like. Well, that's what Arima
Michael_Berk:
But
Ben_Wilson:
is. Well, one part of ARIMA is a regressor, but there are other things in there. Um, so when you're talking about these gradient boosted models, where they can,
Michael_Berk:
My
Ben_Wilson:
they can shorten the training time required in order to build those discrete regression terms
Michael_Berk:
To,
Ben_Wilson:
on each of the tree ends. It becomes
Michael_Berk:
Yeah,
Ben_Wilson:
a little bit easier to swallow that. And deep learning, I think, adapts to that better. But to follow
Michael_Berk:
Yes,
Ben_Wilson:
on from your point before: with deep learning where it is right now as a global model solution, I'd say it's still in the research phase. Like, the papers are still kind of being written
Michael_Berk:
Yes,
Ben_Wilson:
right now. Nobody's got anything that's really... Nobody's got that Facebook, or sorry, Meta, Prophet mic drop right now with
Michael_Berk:
Yeah,
Ben_Wilson:
a, like, hey, here's the architecture that's going to make this simple, and it'll work for eighty percent of the use cases
Michael_Berk:
Oh
Ben_Wilson:
that people want to use simplistic
Michael_Berk:
yeah,
Ben_Wilson:
time series forecasting for. It won't work for the other twenty percent and never will, but I'm waiting for that day when that happens and it drops on PyTorch. Uh, I'll be doing an integration with that model, definitely.
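Editor's sketch: the idea of "marking" an unplanned fire sale as an exogenous regressor, where the sale period is a one and every other period is a zero, can be shown with a plain indicator column. The dates, sales figures, and function names below are hypothetical. With a single 0/1 regressor plus an intercept, ordinary least squares reduces to group means, which is what `fit_intercept_and_flag` computes. In practice the same vector would be passed as the `exog` argument of a model such as statsmodels' `SARIMAX`, or added as one more feature column for a supervised learner.

```python
from datetime import date, timedelta
from statistics import mean


def event_flag(days, start, end):
    """1 for days inside the event window, 0 for every other day."""
    return [1 if start <= d <= end else 0 for d in days]


def fit_intercept_and_flag(y, flag):
    """OLS for y = base + lift * flag; with one 0/1 regressor this is group means."""
    base = mean(v for v, f in zip(y, flag) if f == 0)
    lift = mean(v for v, f in zip(y, flag) if f == 1) - base
    return base, lift


# Hypothetical daily series: July through September, with a fire sale in mid-August.
first_day = date(2022, 7, 1)
days = [first_day + timedelta(days=i) for i in range(92)]
flag = event_flag(days, date(2022, 8, 10), date(2022, 8, 14))

# Baseline around 100 units with a small weekly wiggle, plus 300 extra on sale days.
sales = [100 + (i % 7) + 300 * f for i, f in enumerate(flag)]

base, lift = fit_intercept_and_flag(sales, flag)
print(f"baseline ~ {base:.1f} units/day, fire-sale lift ~ {lift:.1f} units/day")
```

Because the spike is absorbed by the flag's coefficient, the baseline estimate is not inflated by the sale days, which is the alternative to deleting those rows by hand.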
Michael_Berk:
I agree, that's exactly right. That's how I see it as well: time series forecasting with deep learning is still very much in the research phase. You
Ben_Wilson:
For
Michael_Berk:
can.
Ben_Wilson:
now.
Michael_Berk:
you can still use it for your curiosities. But yeah,
Ben_Wilson:
That's a good bit
Michael_Berk:
Uh.
Ben_Wilson:
of advice as well. Uh, never be afraid to try something. Um,
Michael_Berk:
Yes,
Ben_Wilson:
but always try it before you're using it.
Michael_Berk:
Yes,
Michael_Berk:
Cool.
Michael_Berk:
and yeah, I think
Michael_Berk:
Set. I'm sorry. Go
Michael_Berk:
yes.
Michael_Berk:
ahead, Manu.
Michael_Berk:
I was just saying that time series forecasting is where it's like, we say, okay, there is time series forecasting, and then we probably go to the internet and look at a few tutorials. We'll see this airline passengers dataset, which goes beautifully, and then you learn all of this. You go to a real-world time series, you look at it, and you'll be like, what is this? I've
Ben_Wilson:
Yep,
Michael_Berk:
not seen any of this in any of the tutorials that I've seen. And that's basically one of the biggest gripes that I had when I started writing this book: whatever tutorial you look into, or whatever book you read about time series, all of them kind of start with ARIMA and the classical models, and they stop there. They go, okay, there is a model, ARIMA; there is a model, exponential smoothing; Prophet. This is how you use it, and that's it. But in my experience, when I was working in the industry with real data, this set of methods works on a very small subset of the whole universe that you have to forecast. Your high-selling items in retail, the high-selling, very smooth ones: they work perfectly. But those are typically ten percent, twenty percent of the whole universe that you have to forecast. So I was kind of wondering, where are the tools, or where is the literature that talks about tools, which can handle this large part of your time series, of your universe? It's completely missing. And that's why I think this is a space that somebody needs to fill in. And that's the initial thought process behind the book.
Ben_Wilson:
Yeah, there's no greater feeling of dejection than after buying a book on this stuff or taking a course, and then you're working at, like, a retailer or something, and somebody's like, "We need to forecast the sales of Rolex watches." Like, okay, yeah, that sounds like a great idea. And then you look through the database and you're like, we sold one in the last three weeks and they want daily predictions. What do I do with this? I heard Prophet works with missing data. You fit the Prophet model and it's going to predict that you're going to sell the equivalent of Rolex watches because there were three sales last week, all by one person who's a reseller. And then you look: oh, the forecast for next week is enough Rolex watches to equal the mass of the moon. So you're like, okay, maybe I need a different solution here. So yeah, been there. That was early on in my career with time series models, just seeing stuff like that and being like, I can't
Michael_Berk:
Yes,
Ben_Wilson:
use ARIMA here, so I need to do something else, like predict,
Michael_Berk:
A
Ben_Wilson:
you know, per day, whether a sale is going to happen or not using a logistic regression model, and just have this massive feature set based on all of the properties of, you know, how many people came to a store, or how many people logged in, what did they look at this day. And I can predict if, you know, tomorrow somebody is going to buy one of these things. So you have to get, like, really creative, and it's not simple. But yeah, you
Michael_Berk:
Yes,
Ben_Wilson:
can't apply ARIMA to that because it's a null dataset most of the way.
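Editor's sketch: reframing an intermittent-demand forecast as a daily "will a sale happen or not" classification, as described above, might look like the following. It is a tiny logistic regression trained by batch gradient descent using only the standard library. The two features (scaled store visits and scaled product page views) and all data values are hypothetical stand-ins for the much larger feature set the speaker mentions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that at least one sale happens on a day with features x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi   # gradient of log loss wrt logit
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical daily rows: [store visits, product page views], both scaled to 0..1.
X = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.0], [0.2, 0.2], [0.9, 0.8],
     [0.1, 0.1], [0.8, 0.9], [0.3, 0.2], [0.2, 0.0], [1.0, 0.7]]
y = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]   # 1 = a watch was sold that day

weights, bias = train_logistic(X, y)
busy_day = predict_proba(weights, bias, [0.9, 0.9])
quiet_day = predict_proba(weights, bias, [0.1, 0.1])
print(f"P(sale | busy day)={busy_day:.2f}, P(sale | quiet day)={quiet_day:.2f}")
```

The output is a probability per day rather than a unit count, which avoids extrapolating a trend from a handful of nonzero observations.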
Michael_Berk:
Yes, there's hardly any information right, like one spike in two weeks and that's it.
Ben_Wilson:
Yep, let's predict the sales of Bugattis in Ohio.
Michael_Berk:
Yeah,
Ben_Wilson:
Yeah, it's not going to work out too well. Not
Michael_Berk:
I mean,
Ben_Wilson:
not to pick on Ohio.
Michael_Berk:
have you
Ben_Wilson:
Sure
Michael_Berk:
checked
Ben_Wilson:
there's
Michael_Berk:
that Ohio doesn't purchase millions of Bugattis daily?
Michael_Berk:
Uh,
Ben_Wilson:
I have not checked. I don't have the data on that. I'm assuming, and this could be a biased assumption, that there are not a lot of Bugatti sales in Columbus, Ohio.
Michael_Berk:
I think that's a valid assumption. Um, cool. So we're, we're at time. So I will quickly wrap. So we talked a lot about time series data and sort of PyTorch libraries and a bunch of other cool things. But some things that stuck out to me are that tabular
Michael_Berk:
Yeah,
Michael_Berk:
data are data that have rows and columns, and PyTorch doesn't natively support tabular data. So Manu created a library called PyTorch Tabular that allows you to specify simple parameters like column names, types, and desired transformations
Michael_Berk:
Oh
Michael_Berk:
so you can manipulate this tabular data. And then regarding open source tooling, if you want your project to grow, it's really useful to provide examples. It's also relevant
Michael_Berk:
yeah,
Michael_Berk:
that the majority of the contributions come from a small set of contributors, a classic long-tail distribution, and then also leveraging your network or other networks is sometimes a necessary component of growing an open source library.
Michael_Berk:
Oh,
Michael_Berk:
But the key under all of this is build something good. If it's not good, it won't get
Michael_Berk:
ye.
Michael_Berk:
used. If it's good, it doesn't guarantee that it will get used, but if it's not good, you're done. And then finally, negative
Michael_Berk:
Yeah.
Michael_Berk:
feedback is worth plus 1 million points. So
Michael_Berk:
Oh
Michael_Berk:
Manu, do you have, so do you have any, uh, or sorry. So Manu, if people want
Michael_Berk:
yeah.
Michael_Berk:
to reach out, how can they get in contact?
Michael_Berk:
All right, so I'm on LinkedIn. That's the platform I'm most active on: LinkedIn, slash manujosephv. And I'm also on Twitter, although I'm more of a reader than a poster. So again, same idea: manujosephv. So these are the two places where you can reach out to me. I'm always open to collaboration and talking to new folks,
Michael_Berk:
Great,
Ben_Wilson:
Please
Michael_Berk:
well
Ben_Wilson:
don't put
Michael_Berk:
thank.
Ben_Wilson:
issues on his repo just to say hi. That's very distracting. But definitely check out
Michael_Berk:
Uh?
Ben_Wilson:
his repo. It's cool.
Michael_Berk:
Cool. Well, until next time, it's been Michael Berk and my co-host.
Ben_Wilson:
Ben Wilson.
Michael_Berk:
And have a good day, everyone.
Ben_Wilson:
Take it easy.