The Disruptive Power of Artificial Intelligence - ML 100
Have you ever wondered about the most promising industries in Machine Learning? Today we will learn from Avi Goldfarb, the chair of AI at the University of Toronto, about...
- The most promising AI industries
- Potential problems with powerful AI
- The economics behind innovation
Special Guests:
Avi Goldfarb
Show Notes
On YouTube
Sponsors
- Chuck's Resume Template
- Developer Book Club starting with Clean Architecture by Robert C. Martin
- Become a Top 1% Dev with a Top End Devs Membership
Links
Transcript
Michael_Berk:
Hello everyone. Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Berk, and I'm joined by my co-host.
Ben_Wilson:
Wilson.
Michael_Berk:
And today we have a really, really exciting guest. I've been prepping for months now, not actually months, but a couple of days, going through everything he has on the internet, and it's really, really insightful and fascinating. So today we're gonna be speaking with Dr. Avi Goldfarb. He's the Rotman Chair in Artificial Intelligence and Healthcare at the Rotman School of Management at the University of Toronto. He's also chief data scientist at the Creative Destruction Lab, which has arguably the best name on the planet, and they essentially teach tech-related startups how to operate and scale. He's testified before the U.S. Senate and published many papers. And then when he's bored, he also does a little teaching on the side. So Avi, what is something that our listeners should know about you that is not on your LinkedIn?
Avi_Goldfarb:
What you don't know about me? Well, so I grew up in Toronto, Canada, and then in grad school I was in Chicago. It was the late 90s, and I was an economist, and I was trying to figure out what industry I wanted to focus on. My second-year paper was about the beer industry, about the decline of Schlitz beer in the 1970s, and I was looking for my third-year paper, and my first idea was about advertising in the cigarette industry. And I had this moment saying, you know what? I don't want to be the beer-and-cigarettes guy. I want to study the industries of the future. And it was the late 90s and there was this industry evolving that no one knew anything about: the internet. And so I decided to try to get my head around that, figuring no matter what I found, it would have to be new, because no one knew anything. And that worked. And it landed me a job in the business school at the University of Toronto. I spent the next 10 or so years studying the history of the internet, near history, but trying to understand how that all evolved. And then in our lab in 2012, it was the very first year of our lab, it's a program for science-based startups, we saw this company called Atomwise saying they were using artificial intelligence for drug discovery. And if you jump back about 10 years ago, that just seemed insane. We didn't really think about artificial intelligence the way we do today, and to use it to discover new molecules in order to cure disease was totally out there. The next year we had a couple more AI companies coming through our lab. This is all because, being based in Toronto, our computer science department has many of the world's leaders in artificial intelligence and machine learning, most notably Geoff Hinton. And then the next year we had this flood of companies, something like 20 or 25 of them, coming through our lab calling themselves artificial intelligence companies. At that point, my co-authors and I, Ajay Agrawal and Joshua Gans, we'd all made our careers studying the internet. They also were looking in the 1990s, just like me, thinking about which technology to study. We all decided, hey, you know what? This is the next thing. It's really exciting, and let's try to get our heads around it. And so we took our economist lens to think through what this technology could do.
Michael_Berk:
Interesting. Okay. So you recently published a paper on general purpose technologies or GPTs. Um, do you think that ML will be a general purpose technology?
Avi_Goldfarb:
Yes, as part of a suite of technologies around data science. The paper is called "Is Machine Learning a General Purpose Technology?" And honestly, when we started the paper, we really hoped the answer would be yes, given my investments in machine learning, but it's academic research and sometimes things don't go exactly as you plan. And what we learned over the course of writing that paper is: yes, machine learning is a general purpose technology, but it in and of itself isn't everything. It's only useful if you combine it with a whole bunch of data-related tools. And so maybe you want to aggregate it and call it data science, maybe you want to call it machine learning, maybe you want to call it data processing, but all those things together represent what's called a general purpose technology, which is to say one of the handful of technologies that have had an outsized impact on the economy for a particular set of reasons, like the steam engine and electricity and semiconductors.
Ben_Wilson:
So one thing that you bring up in the intro of your book resonated with me in a way that I wasn't expecting, because there's a lot of books out there that talk about, oh, this is what AI is, or how it can be used, and what the future of it is. And a lot of books I've read don't really resonate with somebody who's been doing it for a while, but yours just blew me away with this presentation about a company called Verafin and how it blew you away when you looked at what they were doing. You're like, this is the leader in AI in Canada? This is the unicorn that came out of our country? But the way that you presented that, talking through, yes, it's this suite of general purpose tools that can be put together, but there's that other part that made it actually useful for them, which is a real-world use case that benefits from all of those tools. So do you see the sort of democratization of disruptive applications of technology in a general sense? Is that where you think the future is going to be going? More people are going to figure that out.
Avi_Goldfarb:
Um, I think there's this handful of companies or applications where there's this beautiful combination of fortuitous circumstances: we had the data in place, we already had the software-related processes to use the prediction that came out of the AI, and there was just this one missing link on figuring out some prediction score. So Verafin, they predict financial fraud. They were in that sweet spot. Really everything else was already in place, and there was this one missing link, and you just had to find the missing link, and it became an extraordinarily valuable company and a nice Canadian success story. But I think that's not where most of the action's going to be 10, 15 years from now, and I caveat it with 10, 15 years from now, because in the short term, right, it's always easier to do things where everything else is set up. And so, you know, for companies looking through, where is the short-term opportunity for me, it's going to be, okay, well, do you already have software in place, or are you already doing some kind of prediction? Maybe it's even a machine-related prediction process, but not AI, and can you bring in AI and make it work? But the thing about low-hanging fruit is it gets picked pretty quickly. And the transformative opportunities for the economy, going back to Michael's point about general purpose technologies, are when you can take the prediction and actually serve an entirely new market and do things in an entirely new way. And so general purpose technology, the reason electricity and the steam engine and these other technologies have had an outsized impact on the economy, it's not that the innovation itself was so great. It's that the innovation led to other innovations, which in turn created this positive feedback loop between the producing and using industries. So in electricity, you had innovations in end uses like electric motors and light bulbs that...
in turn led to innovations in the way we, say, design buildings. Once you have electric light, you can make much bigger buildings with more interior space. But then once you did that, you needed new innovation upstream, in terms of power sources, in terms of ways to get the electricity from the power source into the factories and into the buildings. And so this positive feedback loop led to the outsized impact. And what is so exciting about prediction machines is the potential for this positive feedback loop going way beyond where Verafin was, going way beyond,
Ben_Wilson:
Hmm.
Avi_Goldfarb:
okay, well, we already have our workflow, let's not mess with it, and let's just make it a little bit better. Let's sort of totally blow up our old workflow in order to deliver an entirely new kind of value to our customers or entirely new experience to our employees and our suppliers.
Ben_Wilson:
So how does that conversation go nowadays when your lab brings in a new, perhaps not a startup, because startups nowadays, I think, sort of grok this. They know: we need to start from the ground up with supporting this technology, because it's gonna be part of our business regardless of what we're doing. But for, say, a Canadian company that might come in, or a US-based company, that has been around for 150 years and is maybe 25 years behind the times with respect to adopting technology. How do you have that conversation with them, aside from just saying, here, read this book, this will make it all make sense?
Avi_Goldfarb:
No, that makes a lot of sense. So we don't start with data, we don't start with AI or prediction or anything; we start with the mission. Okay, so what are you actually trying to do as a company? What does it mean for you to serve your customers well? So we start there, and then we say, well, think about your standard operating procedures. Think about all the things you do as a company. How many of those things are about serving the mission? And how many of those things are about compensating for the fact that you don't really deliver what you're supposed to? And once companies get a sense of the various ways they're not delivering on their mission, then we can start the bigger-picture strategic conversation about, okay, well, how do we use prediction machines to deliver real value? So for example, think about the best airports in the world, those airports that are rated the best in the world, Seoul Incheon, Singapore, whatever it might be. They're multibillion-dollar airports. They're spectacular. They have great shopping, great restaurants, amazing hotels; some of them have theaters and golf courses and greenhouses and all sorts of crazy stuff going on in there. And then you think, well, that's got to be the ultimate in the airport experience. But if you think about how the super rich fly, they don't fly like that at all.
Ben_Wilson:
Mm-hmm.
Avi_Goldfarb:
A private terminal looks like a shed. A private terminal, they just walk right through. Why? Because no one wants to spend time at an airport. Right? The ultimate in service for an airport is you arrive at the airport and you walk right onto the plane and you take off. There's no time spent in the airport at all. Now, why do you spend time in the airport? Okay, so that's you not delivering on your mission. Even at Seoul Incheon, the mission is to ensure smooth air transportation. And almost all of the billions and billions of dollars in architecture at the airport have nothing to do with ensuring smooth air transportation. It's about compensating people for the fact that they didn't get smooth air transportation. So get that conversation going. Then you say, well, how would prediction help? Let's say you had a good prediction about how long it would take to get to the airport and through security, and you could get people to walk right onto the plane. How does your industry look different? Are there opportunities to deliver that in parts of your company and not others? So you think through mission first, standard operating procedures second: to what extent do those support, or really just compensate for, the lack of delivery on the mission? And then we talk about prediction. Then once we have a sense of what those predictions are, we dig into, okay, is this feasible? And when we talk about whether this is feasible, it's all about data. So once we get a sense of what predictions you want to make, then we think through what data you have available, what data is available in the market, and whether there's enough to build the predictions you want to build. And then the last piece, I guess, is some subtleties on the technical aspects of the prediction machines they're building. But that's at the end. In the beginning, you've got to think big picture in order to say, well, can you deliver value in a new way?
Ben_Wilson:
You just perfectly succinctly summarized a massive part of what this podcast is all about with Michael and I is
Michael_Berk:
I
Ben_Wilson:
talking
Michael_Berk:
would say
Ben_Wilson:
about.
Michael_Berk:
it's like 95% of this podcast is that
Ben_Wilson:
is
Michael_Berk:
concept.
Ben_Wilson:
not focusing on the technical aspects. Anybody can do that. I mean, when you get to extreme scales, where you're like, hey, I want to be able to predict, for each individual passenger who's coming in, what time they should book an Uber so that they get to the airport at just the right time, that gets kind of crazy. You need specialized humans to build that infrastructure. But for the other 99% of use cases that are out there, anybody that has somewhat of a technical background can learn it. The challenge of any sort of applied prediction usage is finding that right use case. What should we build? Why should we build it? How do we monitor it? And then how do we approach that through the development process of Agile, to say this is a living, breathing entity when we create this tool set, one that we need to change constantly. It's not, hey, we have fraud prediction, it's out there, we're done, it'll run forever and we'll just make money off of it. That's not how it works. It has to evolve. And
Avi_Goldfarb:
Absolutely.
Ben_Wilson:
we should just record what you said for the last couple of minutes and make that the summary for the podcast, cause it was perfect. Yeah.
Avi_Goldfarb:
Sounds great, it's recorded I imagine.
Ben_Wilson:
But the one thing that made me think, when you were talking about the disruptive nature of electricity, to come at it from a purely theoretical, sort of negative perspective, I would like to hear your thoughts on the disruptive nature of something like electricity. When that came out, people had to, through necessity, figure out ways of generating more power. So by extension, we start figuring out, okay, what are the best ways to create a lot of heat so that we can boil water and spin turbines? What's the cheapest way to make something really hot? Let's take the top off of that mountain and mine it for coal. That ruins the environment there; we ruin, you know, the air; CO2 levels increase. When we apply that sort of thought process to disruptive technologies with prediction machines, what do you see as the downsides of this paradigm shift that you and we both agree is happening in real time right now, and that 15 years from now will be commoditized? Do you see any downsides to it?
Avi_Goldfarb:
There's lots of risks. What could go wrong? Okay, well, that's a fun question to ask. So there's lots that could go wrong. I should caveat this with: in general, I'm quite optimistic about the potential of the technology. Not necessarily because the technology is so great, but because I think a lot of our current processes are pretty bad. So there's lots of hope that we can do better. But what could go wrong? So if we build prediction machines that affect individual people, like predicting whether you should get credit or predicting whether you should get hired or things like that, and we build those machines off our current human decision-making processes, whatever biases there are in the current system are going to get embedded, and potentially exaggerated, but at least embedded, in this new system. So that's thing number one that could go wrong. Now, to be clear, the issue is not that the machines are going to discriminate because humans discriminate. So yes, the machines are going to discriminate. That's not worse than our current situation. What's worse than the current situation is that they can discriminate at scale.
Ben_Wilson:
Yes.
Avi_Goldfarb:
And so even if there are humans who don't discriminate, and maybe that will help the people who are discriminated against, once this gets embedded in software at scale, then there's potential for discrimination at scale. So that's the set of worries around discrimination. There's a second set of worries around, let's call it, the future of work. Okay, so if machines are doing lots of the things that humans do, and effectively the aspects of our work that give us pleasure, there's an important caveat there, not just that they're doing our work, but the things that provide meaning in our lives and give us pleasure, then we should worry if the machines are substituting for humans. Is there work left that will give us meaning in life? And the third set of worries is around inequality, which is: if a handful of companies control the machines, or a handful of people own them, or if using prediction machines, taking advantage of that technology, is something that requires a lot of skill, then we might see an increase in inequality. So there are three categories. We get discrimination. We get increased inequality and increased concentration of power. And we get a sort of lack of meaning and lack of work. Now, those are the worries. But I'm actually optimistic on all three. So I'm optimistic on discrimination. This is the last chapter of our book, Power and Prediction, which is: if all we're doing with prediction machines is taking our current processes, extracting the human prediction, dropping in the machine prediction, but keeping everything else the same, then we have massive reasons to worry about discrimination,
Ben_Wilson:
Yes.
Avi_Goldfarb:
machine discrimination. But the thing is, machines are auditable. And so we can identify the discrimination, and in principle, we can improve it. This is Sendhil Mullainathan, who's a professor at the University of Chicago, and won the MacArthur Genius Award for being a genius. Early in his career, he was studying offline discrimination. So you might remember there was this study where they sent resumes, some with the names Emily and Greg and some with the names Jamal and Lakisha. That was his study. And he said, we figured out that employers were discriminating, and it took thousands of dollars and thousands of hours to run that resume study. And even after the fact, we didn't know which employers were discriminating. We could just say that on average people were discriminating. Fast forward 15 years in his research career, and he's studying whether there's discrimination in medicine, in particular in machine learning in medicine. And they ran a simulation, which he describes as taking them about two hours. And they found that there was discrimination in the machine learning algorithm, and they shut it down and improved it. So instead of thousands of hours and thousands of dollars, it was a few hours' work, and not really such a big deal. And so there's reason to hope that we can identify discrimination. And then we can say, well, maybe our current processes aren't so good, and we can think through how to build better processes in order to reduce discrimination going forward. So as long as, and this is an important caveat, as long as the people who control these machines care about reducing discrimination, or recognize it is something to pay some attention to, there's reason to be optimistic, partly because getting rid of discrimination in us humans is pretty hard, even for those who want to. So there's reason to be optimistic that something could be better.
Michael_Berk:
Yeah. And the incentive to care can come from different areas. I think it's unlikely that Mark Zuckerberg is going to develop the biggest heart in the world and suddenly make ethical AI a gold standard at Meta, but public policy, public pressure, there are many ways to get a company or a person to care about being ethical. So I think it's not just up to our tech overlords.
Avi_Goldfarb:
I suspect most individuals who you think wouldn't care actually really do. It's just a hard problem. But, you know, the general takeaway is yes, the forces can come from customers, they can come from suppliers, they can come from employees, they can come from the management and the owners of the company, all over the place, and from government, for that matter.
Ben_Wilson:
Yeah, that's actually my prediction for two of your three worries: that eventually they'll be addressed through legislation. Because it's going to be so commoditized across all industries, across humanity in general, I think it's inevitable that this is going to be widely used as a meta concept throughout our existence. I wouldn't say eliminating, but reducing that bias through intelligent design and review, whether it comes from automated tooling, and there's some pretty impressive work being done these days in evaluating that. But when I was working with companies in the field at Databricks, when I used to do what Michael does now, reviewing a lot of ML use cases that are in production, you kind of look at the feature set and you're like, hang on, you do have gender in here, I see that, of your users. Should you have that in here? And they're like, well, we can try removing it. Yeah, let's do that, please. Let's see what happens. And the model falls apart. The accuracy is garbage. And, you know, you explain it to them: well, the accuracy is bad because that was your decision factor, with all of the validation data that you're using. But let's release this on a small subset of your actual user base and see what happens. It doesn't improve. It's better than the validation error that we were getting, but it's not this marked improvement that we were expecting. Then all of a sudden you realize that so much of the data that is collected is, through the nature of its collection, itself biased, in the fact that, okay, it was deciding this way because you collect different data based on gender. There's different activity that's measured, and the volume of that data from your users is different between men and women.
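[Editor's note: the failure mode Ben describes, where a removed protected attribute leaks back in through how the data was collected, can be screened for with a simple proxy check. This is a minimal sketch on entirely synthetic data; the feature names and the 0.3 flag threshold are illustrative assumptions, not anything from the episode.]

```python
import random

random.seed(0)

# Synthetic data only: "activity_volume" is collected differently by
# gender (a proxy), while "tenure" is independent of it.
n = 10_000
rows = []
for _ in range(n):
    gender = random.random() < 0.5                         # protected attribute
    activity = random.gauss(5.0 if gender else 3.0, 1.0)   # biased collection
    tenure = random.gauss(4.0, 2.0)                        # unrelated feature
    rows.append((gender, activity, tenure))

def corr(xs, ys):
    """Pearson correlation using only the standard library."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

protected = [float(r[0]) for r in rows]
for name, idx in [("activity_volume", 1), ("tenure", 2)]:
    r = corr(protected, [row[idx] for row in rows])
    flag = "possible proxy" if abs(r) > 0.3 else "ok"
    print(f"{name}: corr with protected attribute = {r:+.2f} ({flag})")
```

Dropping the protected column while keeping a feature like `activity_volume` leaves the bias in place, which is why removing gender alone did not produce the improvement Ben's customers expected.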
Avi_Goldfarb:
And if you can redesign your processes to try to improve what you're doing, you can make a big difference. Let me give you an example. This is a paper by Danielle Li, Lindsey Raymond, and Peter Bergman called "Hiring as Exploration," or something along those lines. Here's the idea: they worked with a big company that hired lots of people, but these were very desirable jobs. They had lots and lots of applicants for every job. The company had historically hired men and had very few people from underprivileged groups. And so what did their current processes do when they said, okay, well, let's have an AI predict who we would hire? They did a pretty good job of predicting which white men were going to succeed in the company.
Ben_Wilson:
Mm-hmm.
Avi_Goldfarb:
But the predictions for, say, women were not bad, exactly; they were just high variance. The prediction came back saying: we don't know if this person's going to do well in the company. And because it was hard to get a job in the company, because they said no to most people, if a prediction for somebody came in and said, well, it's highly uncertain, then the normal process would be to say, well, forget it, we're not going to hire that person. And so if you embed the AI in their current hiring process, their current hiring system, it would just exacerbate any discrimination they already had. But they realized, well, the reason they weren't hiring those people isn't that they thought they were bad or good; they just didn't know. There's an easy solution to that: hire some people and learn. And so the company effectively deliberately hired people who had high upside potential, but who they didn't know would succeed or not, and invested in learning. And through that process, over time, discrimination went way down. They figured out which people from underrepresented groups were the right people for the company. And in the long run, the experiment was a success. They had to rethink their hiring system in order to use machine learning tools well and reduce discrimination.
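[Editor's note: the "hire some people and learn" idea in that study is essentially uncertainty-aware exploration. A toy sketch, with made-up scores, uncertainties, and group labels, shows how adding an exploration bonus to uncertain predictions changes who gets selected.]

```python
import random

random.seed(1)

# Made-up candidate pool: (group, predicted_score, uncertainty). The model
# is confident about the well-represented group and very uncertain about
# the underrepresented one, mirroring the setting in the hiring study.
candidates = (
    [("majority", random.gauss(0.6, 0.05), 0.05) for _ in range(80)]
    + [("minority", random.gauss(0.6, 0.05), 0.30) for _ in range(20)]
)

def hire(pool, k, bonus):
    """Rank by score + bonus * uncertainty; bonus=0 is pure exploitation."""
    ranked = sorted(pool, key=lambda c: c[1] + bonus * c[2], reverse=True)
    return ranked[:k]

exploit = hire(candidates, 10, bonus=0.0)   # trust the point prediction
explore = hire(candidates, 10, bonus=1.0)   # UCB-style: value what you could learn

print("minority hires, exploit:", sum(c[0] == "minority" for c in exploit))
print("minority hires, explore:", sum(c[0] == "minority" for c in explore))
```

With exploitation alone, uncertain candidates rarely make the cut; with the bonus, the firm hires some of them, observes real outcomes, and can shrink the uncertainty over time, which is the learning loop the paper credits for reducing discrimination.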
Ben_Wilson:
That's an amazing story, and it resonates so well with some of the tactics that I've had to employ with certain use cases in industry, where you do something like remove all of the potential bias from feature training data, or even your feature evaluation data. Somebody uses standard practices, like, well, I need to validate it on existing data, and they just have blinders on, myopically focusing entirely on a metric score and error loss. No, no, no. We need to do scorched earth here. Let's run an experiment. Let's be scientists for a moment and collect some new data. And that process in and of itself, I've seen it dozens of times in multiple different industries, where people's eyes sort of open for the first time, where they start thinking about their business differently. Like, maybe we shouldn't be making decisions based on this data. And maybe our decision to even use this data, or even pursue this use case, or think about our business and our entire industry in this way, is flawed. I've seen it a couple of times where very big companies start thinking differently, like, maybe we shouldn't even be in this space. We're trying to optimize this problem and make money in this way, and in the process of discovering and evaluating how flawed that system was, they uncover some new thing where they're like, hey, we're going to try out this other thing. They realize, whoa, there's a lot of money to be made here. Let's shift gears. I saw it very famously with a startup company in New York City. I can't disclose who they are, but they used to have an app that almost nobody used. But one of the benefits they had was that other apps paired with them for data collection. And instead of them charging for this app service, whose user base just wasn't really growing because nobody really cared...
But that data pipeline, and them being able to use what they knew how to do really well, which is distributed computing, geolocation data, and figuring out what motivates people to go to physical locations on this planet: compiling that, anonymizing it, and selling it to other companies. All of a sudden, their revenue went up well over 1 million percent in less than one year. And it all came from realizing that their ML-driven optimization for finding new users was not good, then looking at their data and saying, hang on, this is actually valuable.
Avi_Goldfarb:
That's fascinating. That's another story about thinking through where the opportunities lie. It's about rethinking what the business model is and being consistent with: well, what can we do better than anybody else? And sometimes it's going to be, how can we use a prediction, and sometimes it's going to be, how can we help others build predictions in order to do things better.
Michael_Berk:
Yeah. And it sounds like they were sort of sitting on a gold mine. And when they developed the correct technical implementation, they could actually leverage that gold. So Avi, what are some other gold mines that you think will be prevalent in the next five, 10, 15 years in AI?
Avi_Goldfarb:
Cool. So, I think there's, let's say, two categories. I'm getting an echo. Is that?
Michael_Berk:
Yeah, good call. Same here. I think Ben's computer dropped for a sec, so let's just...
Avi_Goldfarb:
better.
Michael_Berk:
Good
Avi_Goldfarb:
There
Michael_Berk:
on my
Avi_Goldfarb:
we
Michael_Berk:
end, yeah.
Avi_Goldfarb:
go. Okay.
Michael_Berk:
Here, I'll restart the question. Um, let's see, how did
Avi_Goldfarb:
Okay.
Michael_Berk:
I phrase it? Um, cool. So it seemed like that, that organization was sitting on a gold mine and with the correct technical implementation that they could actually leverage that gold mine. So what are some gold mines that you think will be prevalent in the next five, 10, 15 years for AI companies?
Avi_Goldfarb:
Okay, I think there's three categories of big opportunities. So category one: a lot of the cutting-edge tools require massive amounts of compute. And so the handful of companies that have built incredible compute facilities, whether it's Amazon and Google and Microsoft and others, they're going to do well. They're going to be the underlying infrastructure for much of the future. And that's the old story that in the gold rush, the people who made the most money were the people making the tools. So that's category one. Category two are those who have data, related to the story you just told. But not just any data. It has to be unique data, distinctive data. Just having lots of data in and of itself isn't valuable if the predictions that you can make from your data are not that much better than the predictions that somebody else could make with other data that's publicly or easily available. And the industry that resonates most closely with, for me, is health data and hospitals. I've had many conversations with hospitals who say, we have this amazing data on all our patients, okay? And we know that health data is valuable, and so it must be worth a fortune on the open market. And you kind of have to have this conversation with them to point out, well, every other hospital in the country has roughly the same data. And so for most hospitals, there's nothing particularly distinct about their patient data or their medical imaging data or whatever it might be. Even if you get past the privacy and the HIPAA concerns and all that, because you can think about anonymization and other things like that, for any one hospital, the marginal value of their data is often quite small. Even though, when aggregated, if we had data from all the hospitals, it could be extraordinary. And there's people working on trying to solve that problem, most notably, I guess, Ziad Obermeyer, who's a professor at Berkeley,
along with Sendhil Mullainathan, who we already mentioned earlier. Okay. And then the third category of opportunities, which I think is where most of the opportunities are going to lie, are around complements to predictions. So: what are the things that you can't do because you don't have a prediction? I told the airport story, but thinking more about healthcare, there's lots of pharma companies that have patents, that have treatments for stuff. And a constraint for many of them is identifying patients, especially for treatments that only some people need, not a handful, but millions instead of tens of millions or hundreds of millions of people. Now, if you had an excellent prediction technology, a prediction machine that could diagnose well, and in particular diagnose the thing you have a patent for, then your patent becomes much, much more valuable. You're providing the treatment. So as the pharma company, you may not even be using machine learning at all. But it turns out the machine learning is creating a feed of customers for the product that you have, that you have control of, and that you can make money off of. And I anticipate over the next five to 10 years, we're going to find more and more of these things where, oh, you know what? If only I had this prediction, I could extract value from this other part of the value chain. But because I don't know, because I lack information, I can't extract that value. So there's these three categories. There's building the infrastructure. There's having distinctive data. But I think for most organizations, they're going to benefit through that third category, finding the right complement.
Michael_Berk:
That's fascinating.

Ben_Wilson:
To piggyback on the healthcare analogy or use case that you mentioned: do you ever think that trust will be at such a level where legislators in countries will actually allow companies or researchers that aren't just doing pure academic research to take data sets that right now we're not legally allowed to join? Clinical results with full-sequence genome data across the entire population, to say, let's actually figure out what drugs to build that will change DNA in such a way, or provide some sort of RNA-vaccine-like thing that you can take, that says, hey, we're no longer going to have Alzheimer's problems, because for the 30 different causes of that, we have a targeted drug for each of those 30 different things. We know you're the one who needs to take this one, so here you go.
Avi_Goldfarb:
Wow, so there's like 50 different opportunities in what you described and maybe 300 different challenges. So, on trust and privacy: there are good reasons to worry about healthcare data being shared, in the sense that that information can easily be exploited to hurt those who have bad health or are predicted to. You can legislate all you want and say it's illegal, but I think that's gonna be very, very hard to manage well at scale. So I think there are real reasons to worry there. But at the same time, there is a trade-off between privacy and innovation. This is something I've been doing a fair bit of work on, and we started to think through the challenges in privacy and online advertising about 10 years ago: that data is incredibly useful. And privacy regulation is about the restriction of data flows, typically. And the restriction of data flows means less innovation, at least in practice, in almost all cases. So how do we reconcile these two challenges? Piece number one is there are types of legislation that could be win-win. Arguably the most successful privacy regulation in the history of the world, or at least in the United States, is the Fair Credit Reporting Act in the 70s. You may not think of that as privacy regulation, but that's what it was. It said there's a central depository where firms put the data they have about your credit, and you can go look and see if it's accurate.
Ben_Wilson:
Mm-hmm.
Avi_Goldfarb:
And that's a win-win for everybody, because you can look and correct mistakes, which helps the companies, and you can see what people are saying about you, and that helps you. Now, I said it's a win-win for everybody; I guess it's not. It's a bigger win for the people who are low credit risk than high, but people were being excluded in the old days anyway and they didn't know why, and now they do. So thinking through, on the legislative side, what kinds of things we can build and enforce that let us somewhat improve trust and still have innovative use of data. That's category number one: regulatory innovation. And then there's technological innovation, which is an increasingly effective set of tools that allow you to use data while maintaining privacy, differential privacy being the biggest headline there. Now, to be clear, with all of those tools the data is going to be less useful and accurate than it would be if you didn't have to use them. It adds layers of bureaucracy, it adds layers of complication, and it, by definition, mixes stuff up. But once you have millions or billions of observations, maybe it doesn't matter that much; you can still innovate and create. So my take is: some of those are legislative fixes, some are technological fixes. I don't think we're ever going to get to a point where all of our health data is going to be in some central place where companies can go look and figure stuff out. And I think that is good. I can't think of a utopian, rather than dystopian, world where all of our data is in the same place and easily accessible to anybody. Health data in particular.
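For listeners curious what the "mixes stuff up" idea of differential privacy looks like concretely, here is a minimal Python sketch of the classic Laplace mechanism. The patient records and the epsilon value are invented for illustration, not anything from the discussion:

```python
import numpy as np

def private_count(records, predicate, epsilon=1.0):
    """Release a count with Laplace noise (the basic epsilon-DP mechanism).

    A count changes by at most 1 when any single record is added or
    removed (sensitivity 1), so Laplace noise with scale 1/epsilon
    satisfies epsilon-differential privacy for this query.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical hospital records: the released count is useful in
# aggregate, but no single patient's presence can be inferred from it.
patients = [{"age": a} for a in (70, 45, 82, 66, 30, 71)]
noisy = private_count(patients, lambda p: p["age"] > 65, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; as Avi notes, once you have millions of observations, the added noise matters little for aggregate conclusions.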
Ben_Wilson:
Yeah, with health data, I think there's only a couple of organizations around the world that have an opportunity. One that our company works with is the NHS in the UK. They can join that data because they have all of it and they can anonymize it. And basically they say, hey, if you have access to this data, you need to go through extremely rigorous training, but you also basically need a clearance, because it is secret data about your citizens. And they're making interesting inroads into that. Once they collect enough of that information, it's pretty fascinating what you can do with it. They sell the models, effectively, and an anonymized, scrubbed data set to pharmaceutical companies, saying, hey, you should look into this, because we have 800,000 data points where there's a correlation here that we think is causation. And there have already been drugs developed by US pharmaceutical companies, which are also customers of ours, based on that data. It's pretty fascinating stuff. Like, Regeneron has made some amazing drugs based on that.
Avi_Goldfarb:
That's amazing, because I was wondering about that last commercial piece. So, you know, here in Ontario, where I'm sitting, we have amazing data. It's a public health care system, just like in the UK. And so in principle, we have a data set with every data point on every citizen, every resident of the province, some 15 million people, every interaction with the health care system. They've made amazing progress on structuring that data for the purposes of billing; you know, that's well-incentivized. They've made pretty good progress structuring that data for the purposes of some research. And they've built a very, very big wall against any commercial applications of the data. And so, you know, there are reasons to do that, but that public investment isn't gonna lead to a commercial industry if it's against the law to use that data. If the UK is doing that, that's amazing. There we go.
Ben_Wilson:
It's very restrictive, and there's no possible way that you can tie back any of the data to any individual human. It's flat-out impossible because of how they handle that security. But it's using highly sensitive data for the benefit of at least some parts of humanity. The other issue with something like that, when we're talking about taking clinical data and merging it to genetic data: any geographical region in the world is going to be highly biased. It's like, hey, if you're of Northern European descent, you're set for that data; if you come from anywhere else, you're an outlier, just due to the genetic background of the people currently under the NHS. Which is unfortunate. Hopefully humanity can eventually get to the point where we do altruistic things with this, making sure that healthcare is available and accessible and, you know, fully thought out and implemented around the world. But one privacy thing that your discussion made me think about was, even with... I wouldn't say it's laws, just sort of restrictions around data and companies trying to do the right thing. A company that I used to work for, we had information about all of our users. If you had the right level of access, you could join this metadata set that gives browsing history, purchase history, shipping and receiving. You couldn't get billing information; you don't want to join that. But you could find out a snapshot of any user. And the consumption side inside the data warehouse did stuff like scrub out people's first and last names, scrubbed out their address. We didn't want any of that data going anywhere, because it's unethical to use any of that information. So there were certain joins that you couldn't do across different data warehouses. We found that with just a little bit of digging into the IP addresses we were getting, associated with every request that came in anytime users interacted with the website or the app, you could triangulate very easily and find out somebody's home address. It's exceptionally easy. And then it's a simple Google search to figure out who that person is that lives there, and you'd be like, hey, I know this person's name. It's obviously this person. They're buying, you know, clothes in this size for this gender. There's only one woman who lives at that house of that age range, so that's her. What are your thoughts, or how do you discuss things like that, like potential nefarious uses of the explosion of data collection, with the companies that come talk to you?
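The join-and-narrow attack Ben is describing, where quasi-identifiers pin a "scrubbed" record to one person, can be shown with a toy uniqueness check. All field names and records here are invented for illustration:

```python
from collections import Counter

# Toy "anonymized" purchase records: names removed, but
# quasi-identifiers (location, gender, age band) remain.
records = [
    {"zip": "94110", "gender": "F", "age_band": "30-39", "item": "coat"},
    {"zip": "94110", "gender": "M", "age_band": "30-39", "item": "shoes"},
    {"zip": "94110", "gender": "M", "age_band": "30-39", "item": "belt"},
    {"zip": "02139", "gender": "F", "age_band": "20-29", "item": "hat"},
    {"zip": "02139", "gender": "F", "age_band": "20-29", "item": "scarf"},
]

def reidentifiable(records, quasi_ids):
    """Return records whose quasi-identifier combination is unique,
    i.e. a join against outside data (voter rolls, a Google search)
    could pin them to exactly one person."""
    key = lambda r: tuple(r[q] for q in quasi_ids)
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]

exposed = reidentifiable(records, ["zip", "gender", "age_band"])
# Only the (94110, F, 30-39) coat buyer has a unique combination,
# so she is the one a public-records lookup could identify.
```

This is the intuition behind k-anonymity: a record is only safe when at least k-1 other records share its quasi-identifier combination.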
Avi_Goldfarb:
So, this comes up a lot, and here's where we get to my MBA class. First thing is: know the law. The law around privacy and the use of data has been changing. You've got to figure out whether what you're doing is covered by some global rules like GDPR, or whether you can focus on your North American perspective, or whatever it might be. Know the law. And that actually requires investment; it's not an easy thing. Other laws you can kind of know with five minutes of work at most. On privacy, the laws are pretty subtle, and so it's worth the investment. Okay. Second, think through: if our customers knew about our use of data in this way, how would they react? Simple question. Would they be like, oh yeah, of course? Or would they be upset by it? This requires some subjective analysis; it's not like, okay, if it's a four they'll be upset and if it's a three they won't be. If they'd be upset by it, ask why, and think through whether you can mitigate. Solution number one is: don't do it. Solution number two is: do it in a way they wouldn't be upset by, if you can figure out such a way. Solution number three is: tell them about it, ask for permission. And there's a whole bunch of categories of, once you know the law, making sure you don't end up upsetting your customers in a way that in the long run hurts your business. I say customers, but sometimes it's supplier data or employee data or others' data too. Same kind of thing.
Ben_Wilson:
It's like the tried and true question. In product review, when I was working in industry prior to being at a vendor, I'd always ask: what happens if this makes page three of the New York Times?
Avi_Goldfarb:
Yeah.
Ben_Wilson:
And if people sat there and looked at each other and the room got really quiet, I knew instantly, like, yeah, we should probably rethink this and brainstorm a little bit more. What we're talking about is a little bit shady.
Avi_Goldfarb:
Exactly, exactly.
Ben_Wilson:
yeah, there's countless companies that have maybe not made the first or third page of the New York Times, but they've definitely been mentioned to a point where customers are like, yeah, I'm going to uninstall that app, or I'm never going to purchase from this company again, because I don't agree with this.
Michael_Berk:
And I have one more topic I would like to chat about. We're coming up on time, but I think both you, Avi, and then Ben, maybe on the more infrastructure side, are uniquely positioned to give insight on this. It came from your talk at Stanford, where you talked about the difference between electricity and steam engines, and how, when electricity was first implemented in a factory setting, it took about 40 years to reach over 50% usage in factories nationwide. And this is because you had to reconfigure the factory, you had to change tooling, all sorts of things like that. So zooming out a bit, let's consider ML as electricity. Where are we in that typical timeline of the back and forth between new use-case generation, relevant technology for those use cases, and the initial innovation of ML? Where are we in this sort of historical arc of a disruptive technology?
Avi_Goldfarb:
Great. So, just on the history: Edison's light bulb patent was 1880, so you've got to think of that as day one of electricity in terms of clear commercial potential. And it's the 1920s, a little over 40 years later, that we saw most households and most factories adopting. And so for AI and ML, it feels like we're in the 1890s in some sense. There's lots of people who see the potential of the technology, but we haven't figured out what the factory of the future looks like, what the organization of the future looks like. There have been a couple of exceptions, but for the most part, we just don't know. Now, that doesn't mean it's another 30 years until we get there, because there's a lot more people thinking about it, and we can benefit from our understanding of history to recognize that we need to innovate and build a new kind of organization. So in terms of where we are, it feels like we're really pretty close to the beginning. But I do think change can come fast, at least in particular industries, once we recognize what that organization of the future looks like. Once we recognize things like: machine learning is prediction technology, so think about it as a prediction machine. Predictions are useful because they help us make better decisions; that's piece two. But predictions aren't the whole decision, so they end up allowing us to reframe how decisions work. And then to say, well, in almost every company, decisions need to be coordinated; decisions don't occur in isolation. And so the innovation is going to be thinking through how we can change the way we make decisions, and how we can build new groups of decisions that take advantage of the technology. In our book Power and Prediction, we talk about a whole bunch of examples of decoupling the prediction from the rest of the decision.
So once you have a prediction in place, it actually can change who makes decisions. I remember there was a story, it wasn't entirely true, or it was almost entirely not true, but there was a story making the rounds about four years ago saying Amazon was firing their warehouse workers automatically; they had an AI in their warehouses that was firing their workers. And the story in the press was things like: over the course of the day, there was a camera watching you and you'd get a score, and at the end of the day, if your score fell below some threshold, you'd get an email saying you were fired. Not happening, didn't happen, never happened. But let's even take that story as true. It's still not an AI doing the hiring or the firing in this case. It's somebody at headquarters saying: we don't trust our warehouse managers to do their HR decisions. We're going to use AI to take those HR decisions away from the managers in the warehouses and centralize them, so that we at headquarters can decide what a good performer looks like, how to think about the scoring, and what threshold actually gets you a warning or an email or a negative consequence. So point number one is you want to be thinking through new kinds of decisions, that you can move decision making across time and place because of AI. And then, around coordinated decisions, there are lots of different examples. But the essence of it is: just because you have a great prediction in one part of your company, if no one else knows what you're doing, you might be better off just ignoring it. And so you need to think through
how
do you bring everything together. And again, I'll use another Amazon example, top of mind today. Every time you go to Amazon, a recommendation engine makes some recommendations of what you might buy. Twenty years ago, that recommendation engine, for almost everybody, said you should buy The Da Vinci Code. No matter who you were, they said, well, it's the most popular book; they were largely selling books at the time. Everybody got the same recommendation, and they didn't have to worry about whether the other decision that was coordinated with the recommendation, which is that they have it in inventory, was reliable. They just made sure they had tons of copies of The Da Vinci Code. So no matter who came in, if you bought it, it was in inventory and they could send it right to you. Today, they have a recommendation engine that recommends all sorts of things. But in order to make that work, they also have to have a good prediction tool on the inventory side, to figure out what demand is gonna be. And the recommendation engine has to be coordinated with the decisions on what to hold in inventory and what to ship. That's actually a much harder task, and they still struggle with it in various places. So sometimes there's a recommendation and they don't have it in inventory: why do you suggest something to me that's not going to ship for a month or two? So the challenges in using a prediction, even if you have an excellent prediction in one part of the organization, depend on your ability to coordinate the other decisions going on in the rest of the organization. And that's the challenge in rethinking what things look like.
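The coordination problem Avi describes, a recommendation model and an inventory model that have to agree, can be sketched as a toy ranking function. The item names, scores, and lead times here are invented for illustration:

```python
def recommend(scores, inventory, lead_time_days, max_wait=7):
    """Rank items by predicted appeal, but only surface those the
    inventory side can actually deliver soon. The recommendation
    prediction is wasted unless coordinated with the stocking decision."""
    deliverable = [
        item for item in scores
        if inventory.get(item, 0) > 0 or lead_time_days.get(item, 999) <= max_wait
    ]
    return sorted(deliverable, key=scores.get, reverse=True)

# A niche title predicts best, but with zero stock and a 45-day
# restock it gets suppressed rather than frustrating the customer.
scores = {"davinci_code": 0.90, "niche_title": 0.95, "cookbook": 0.60}
inventory = {"davinci_code": 1200, "niche_title": 0, "cookbook": 40}
lead_time = {"niche_title": 45}
ranked = recommend(scores, inventory, lead_time)  # ['davinci_code', 'cookbook']
```

In a real system both inputs would themselves be predictions, which is exactly why the two models have to be built and evaluated together.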
Ben_Wilson:
That's a really good analogy, which leads into my answer on this topic. When you're talking about Amazon and the complexity of the systems involved in solving that problem: a lot of people don't realize what that system is that Amazon built, even though their infrastructure runs on it now. It's AWS. AWS is the solution to that exact problem; that's what they had to eventually build. All of the services that exist within Amazon Web Services are there to solve that problem, at that scale. I've always just found that fascinating. A lot of the tech that was built to do that is monumental, and it's a massive disruptor to industry as well. But to follow on with the infrastructure question that you asked, Michael, about where we are right now, I'll piggyback on your answer, Avi, with the timelines associated with electricity. I would a hundred percent agree that we're kind of in that 1890s period, prior to the start of the 20th century. And I would say that companies adopting AI right now that don't have to retool a factory, but are building a new factory right now or are starting something in their garage, it's far easier for them to adopt this.
Avi_Goldfarb:
Absolutely.
Ben_Wilson:
But if we were to take that analogy and move it forward 40 years, from the 1880s to the 1920s: what would the automotive industry look like? What would Henry Ford's factory look like without electricity? If that was all steam-powered, everything in there, how long would it take to retool that factory and that entire industry to use electricity? It would take a while. And we have companies right now that have petabytes of data, and their industries have been around since prior to the 1880s, some of these companies. And their adopting new technology is just slower, because they're huge. They're an ocean liner trying to stop on a dime in the middle of the ocean; you can't do that. They're not a speedboat like a small startup is. They'll eventually get there. They have to, or they'll go the way of the dinosaur; it's just inevitable. But I wanted to add one more point that I was thinking of: the adoption of AI and its utilization as a general tool that is ubiquitous throughout industry. I think eventually we'll get to a point where it's December 7th, 1941 in industry. And what did that do to the adoption of electricity and its use in factories? All of a sudden we could retool entire factories and assembly lines throughout the United States and Canada, which were very close allies during that conflict, and produce 600 aircraft a month from something that, prior to that need and that necessity, was doing one aircraft every two months. Necessity is the mother of invention. And I think pressure from industry in general, economic pressure across all industries based on whether you're adopting this or not, whether you're able to leverage the power of solving problems at scale, that's the economic World War II of industry. And I think it is coming sometime.
Michael_Berk:
That's crazy. All right, well, I'll stay tuned for economic world war seven, I guess three. So I'll quickly wrap, and then we can have a bit of a call to action. We talked a lot about high-level AI and how it relates to economics. And there are a few problems that we see with AI for the future. One is AI replicating humanity at scale; this perpetuates systematic biases that are part of humanity today, so we either need to correct them or get creative in how to eliminate those biases. AI also might replace the fun work. This is something that people don't always think about, but if ChatGPT can write books, authors are out of business. And then there could also be a potential monopoly on AI; historically, a few people hold a lot of power, and AI might be no different. But that said, there's hope. There are some opportunities, if you're maybe a smaller organization or just an individual person. Some areas to look at are tooling, data, and then complements to predictions. So you can think about what you could do with a prediction, and then try to develop based on that. And then one actionable point: when thinking about sensitive data, it's good to first think about what the law says, and then next, what would happen if this issue was on the front page of the New York Times? Would customers be upset? And then you can act accordingly. So those were a couple of nuggets. Avi, if people want to reach out to learn more about your books and research, where should they go?
Avi_Goldfarb:
Twitter or LinkedIn. And there aren't a lot of Avi Goldfarbs in the world, so you can look me up and find my website pretty easily.
Michael_Berk:
All right, well, this has been absolutely fascinating. Thank you, Avi, for joining. It's been Michael Burke and my co-host.
Ben_Wilson:
Wilson.
Michael_Berk:
Have a good day everyone.

Ben_Wilson:
And
I would just like to add: if you haven't already, please check out these two books. Not only are they excellently written, some of the thoughts and ideas and analogies in there will resonate, I think, not only with laypeople but also with serious, in-the-weeds ML practitioners, making you think about your entire discipline and domain in a completely different way. I'm planning on reading each of them several times, because they're just so well written. So definitely check them out. They're on Amazon. Actually,
not that expensive. 28 bucks for the most recent one, if you want to get it on Amazon today in hardcover.

Avi_Goldfarb:
So I'll just jump in here. And those are Power and Prediction and Prediction Machines. Let's make sure we get the... Thanks, Ben.

Michael_Berk:
Good call. All right, well, until next time, it's been Michael Burke and
Ben_Wilson:
I'm Ben Wilson.
Michael_Berk:
have a good day everyone.
Ben_Wilson:
Take it easy.