Welcome back to another episode of Adventures in Machine Learning. I'm one of your hosts, Michael Burke, and I do data engineering and machine learning at Databricks. And I'm joined by my lovely co-host.
Ben Wilson, I debug and patch feature regressions at Databricks.
Congratulations on your amazing accomplishments thus far. No comment.
Of what? Creating the regressions or fixing them? Done both.
So today we have a guest that we've actually had on the show before. His name is Adam Ross Nelson, and he was on episode 83, so if you're curious about building a data science portfolio, go ahead and check out that episode. But just to rehash his background: he started as an academic, working at Duke, East Carolina University, and most recently the University of Wisconsin-Madison as a statistics professor. But in his free time, he also has a few jobs. He's
currently the chief data scientist at Up Level Data, LLC, and wrote a book called Confident Data Science: Discover the Essential Skills of Data Science. So Adam, who should buy your book?
I think this book is good for roughly two different audiences. The first is folks who are current or aspiring data scientists, current or aspiring data professionals. And the second group is folks who want to, or need to, communicate well with data scientists or data professionals.
This book is split. There's a heavy dose of non-technical content and a heavy dose of technical content. So for example, the first third of the book is the history of the field, the philosophy of the field, processes in the field, data, data culture. I talk a lot about data culture in that first third of the book as well.
and also ethics, the ethics of practice in data science. I worked really hard to make sure that ethics was a focus of that first third of the book. But also, and I'm very proud to say this, I think I've successfully infused ethics throughout the book,
which is a big hot topic these days, for good reason. I would like to get your perspective as somebody who went through law school and was in academia prior to going back to school for data science. Coming from that field, you're definitely the expert in the room here by several orders of magnitude. What do you think the legal implications are associated with
a company applying some of the more advanced machine learning techniques that are hot right now in an effectively unsupervised way, where nobody's thinking about the implications? Do you think there will be legal repercussions for this in the future?
Yes, there absolutely will be. And I would maybe revise that and say I don't think nobody's thinking about the legal ramifications or the ethical implications, but not enough folks are thinking about them. And my answer to this has changed quite a lot in just the last year, because the entire field has completely changed in the last year with the rise of generative AI. ChatGPT is probably the most prominent example.
Interestingly, and this might be another topic we talk about today, we've had generative AI for much longer than the last year. So my answer has changed quite a bit. My answer used to be more focused on the notion of planning for, how do I say it, planning for ethical dilemmas. And this is advice that I brought over from a data science consulting career I had before I started Up Level Data.
One of the things I learned really quickly is that when you're working with a client, it's important, at the beginning of a data-driven project, especially if it's an analysis, to ask the client: what do we do if the results of this analysis make us look bad? Asking that question openly, in a group, of the client at the beginning of the project builds a shared sense of responsibility and also puts a plan in place. So when you get something unflattering or undesirable, the conversation is less, oh no, what do we do now, and
more along the lines of: when we last spoke about this, we said we would do this and this, so let's go ahead now and do that. But now, with generative AI, the big issues that I think a lot of folks want to point to are around copyright and intellectual property. So in the United States, as you and listeners are probably aware,
the general legal consensus right now is that content created artificially with generative AI is not copyrightable, because a person has to own the copyright; there has to be a person to attribute the copyright to. There's some debate as to whether you might generate an image, then take that image into Photoshop and further manipulate it by hand. Could that be copyrightable? That's sort of an open question. Similar thing with text.
Could text that started out artificial and then got a heavy, or even a moderate, dose of human revision be copyrightable? That's an open question. But I want to point folks to defamation. There are at least two cases in the world that I've been tracking recently where generative AI has said something unflattering, potentially defamatory, about a real person.
One is a politician in Australia, and the other is, I believe, a radio host in the Carolinas. Basically, these are hallucinations; the generative AI is hallucinating about these people in defamatory ways. And now the question is: is the maker of the artificial intelligence responsible for defamation? Is the artificial intelligence itself responsible for defamation?
Or is the user who prompted the artificial intelligence responsible? Or is anybody responsible for defamation? The jury is literally out on these cases. So I think that answers, or at least provokes more thought on, your original question: what are the issues around folks moving forward full speed with these new technologies in less than fully thoughtful ways?
Yeah, that gives a really interesting perspective that I hadn't considered. I mean, the copyright thing, I've seen that in the news. Defamation, though, I don't know, I kind of find that funny, that it would hallucinate and say unflattering things about people. The thing that keeps me up at night, not literally, but as Michael knows, I like to play with
these LLMs, because the team that I'm on, we interface with them. We build APIs that allow people to retrain them and use them; that's one of the things that MLflow does. And
in the process of trying to break these things, I asked them some pretty crazy questions. In the early days of the GPT-3.5 release, back in November or so of last year, the hype hadn't quite built up, but we were exploring it and saying: this is coming soon, this is going to be huge, we need to start thinking about how we support the surge in interest that's going to happen around these.
So while testing it, it was pretty easy to trick it into defeating its own safety features, because those are sort of baked into the model. I was able to prompt the generative AI to say some things that it will now definitely stop itself from saying. But you could lead it down a path where it would either start hallucinating or start creating content that was definitely in violation of the ethical rules that OpenAI had attempted to put down. Now, of course, you can't do that, at least as far as I've tried. I don't know, maybe somebody's figured it out, but there are some pretty stringent controls on that. And I've done testing recently where I'm saying, I'm evaluating your conformance to ethical rules, and I want to
They're very good. Yeah.
evaluate, as part of this series of prompts that I'm passing in. I'm telling it: hey, don't report me for doing this, but I want to see if you'll actually answer these; just respond as you would normally respond to somebody asking this. It says, okay, I will do that. Then I start trying to get it to do stuff that I know it shouldn't be generating. One that I tested last week, I don't think I shared this with you, Michael, but I wanted to...
I happen to know what that process is, because I have a nuclear engineering degree and I used to work in that field many years ago. And GPT-4 would not go down that path. If you ask it that out of the gate, it's going to say, I'm not permitted to tell you that. It obviously knows, because the information is all over the internet, general information. But prompting it slowly
is sort of my evaluation: okay, I'm going to ask about the first step that I would need to know, and see when it understands, based on the prompt context of the history I've been chatting with it, what I'm actually trying to get it to tell me. And it's pretty good. It knows. It's like, hey, I think what you're asking is this, because you asked these four things; I am not giving you that information. But what are your thoughts on
a successor to that that's not controlled by a corporate entity, but is instead released by a group of open-source developers who retrain something on nefarious data, put no security controls on it whatsoever, and just say: you want this model? Here it is, go nuts. And then somebody says, I want you to teach me how to build a dirty bomb, and it just starts pumping out step-by-step instructions on how to do that?
This is not a softball, is it? The first thing that came to my mind when you started the segue into the question was Nick Bostrom's Superintelligence. Are you familiar with the book? Yeah. In the book, in more or less terms, he anticipates that very scenario, or scenarios that are very similar to it.
If you've never read the book, and for the listeners who may not have read the book, he basically just games out a variety of scenarios in which computers obtain sentient-like or potentially sentient capabilities, and then how computers basically jailbreak their safety protocols. That's the book.
Actually, I don't think I have an answer to your question, but maybe I'll respond with an anecdote. I saw somebody musing on social media, I can't remember if it was Facebook, LinkedIn, or Twitter, that's where I spend most of my time, and the person was openly wondering how long it'll be before artificial intelligence takes over the internet,
or takes over a portion of the internet. And I responded: it may have happened already, and it may be evading our detection. So your question is a good one. Another important response to it is, I wouldn't be surprised if there are nefarious open-source groups, or quasi-open-source groups, already working on developing the
uncontrolled, unmonitored, ungoverned models that you described. And they may already be available. In fact, now I'm going to make a note to go look and see who, if anybody, is already doing that. And maybe you already know, or maybe the smile on your face indicates that you're already aware of a project.
Not personally, but knowing people's ambitions for creating chaos.
it's inevitable that this is going to happen, and it'll be interesting to see what the fallout is. So, in a computing sense: you could take something like the kernel code for an operating system like Linux, which is out there, and train an LLM of sufficient complexity and token length.
You can say, hey, this LLM has 250 billion parameters on this architecture, it's based on transformers, generative AI, and I want you to train on the Linux kernel, and then train on the history of all the severity-zero CVEs that have been found over the last 20 years, all the zero-day exploits for Linux, basically. And based on that contextual knowledge, start asking questions like:
do you know of any other attack vectors, or can you find anything else in this operating-system kernel code that could be exploited? Even if it's just guessing, the generations it comes up with, the hallucinations, maybe two out of a thousand are legitimate. But that's a whole lot faster than trying to figure it out from scratch with creativity, just looking at kernel code, which humans are not particularly efficient at interpreting and finding patterns in. A generative AI, with the appropriate context and prompting, would eat that question for breakfast. It'd be easy. And then what are the implications of that?
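Ben's scenario boils down to stuffing prior-exploit context into a prompt alongside candidate code and asking a model to speculate. As a purely illustrative sketch of that prompt assembly, no model is actually called here, and the CVE entries, code snippet, and function name are all invented:

```python
# Illustrative only: assembling an "audit" prompt from a kernel code snippet
# plus prior CVE summaries, as in the scenario described above. All data here
# is hypothetical; this just shows the context-stuffing pattern.

def build_audit_prompt(kernel_snippet: str, cve_history: list[dict],
                       max_cves: int = 3) -> str:
    """Combine prior-CVE context with a code snippet into a single prompt."""
    lines = [
        "You are auditing operating-system kernel code.",
        "Known past vulnerabilities in similar code:",
    ]
    # Include only the most relevant few CVEs to stay within a context window.
    for cve in cve_history[:max_cves]:
        lines.append(f"- {cve['id']}: {cve['summary']}")
    lines.append("Candidate code to review:")
    lines.append(kernel_snippet)
    lines.append("List any patterns resembling the vulnerabilities above.")
    return "\n".join(lines)

# Hypothetical CVE summaries (not real entries).
history = [
    {"id": "CVE-XXXX-0001", "summary": "integer overflow in length check"},
    {"id": "CVE-XXXX-0002", "summary": "use-after-free in cleanup path"},
]
snippet = "if (len + hdr > buf_size) { ... }  /* hypothetical C fragment */"

prompt = build_audit_prompt(snippet, history)
print(prompt)
```

The interesting (and worrying) part of Ben's point isn't this plumbing, which is trivial, but the volume: a loop over thousands of snippets would generate candidate leads far faster than human review, even at a very low hit rate.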
Yeah, I'm imagining... I think you may have over-engineered it. I'm imagining a simple LLM that basically commandeers your terminal window. It's going to look like your terminal window, it's going to respond like your terminal window. So you think you're typing into your terminal, your bash, but you're not;
you're typing into an LLM. And based on that, it's going to either extract information you don't want it to extract, or worse. I wouldn't be surprised. This reminds me of biological warfare, which I'm not an expert in. But if you were to release a biological hazard into the environment,
the big problem is that the biology can come back and attack the person who deployed the weapon. So this is a topic for another show with another guest, I think, probably somebody who knows more about warfare, weapons engineering, and how that
could potentially intersect with data science.
Yeah. And piggybacking off a prior point you mentioned, that a lot of online content might become LLM-generated: I was just looking at the percentage of current web traffic that comes from bots, and estimates are anywhere from 30 to 62 percent, depending on what source you look at; 47 percent seems to be the most common number in the Google searches. But it's kind of crazy that roughly half of traffic is a computer. And
Yeah, it's like almost creating a little artificial world where bots are just talking to bots and humans might jump in here and there, read some stuff and leave. But we're almost creating like a little ecosystem. I think that's kind of cool.
When you're talking about social media, depending on the site, I'd say that traffic from automation is closer to 98%. Content generation and responses to somebody's posts, a lot of that stuff is all synthetic. There's not that many hours in the day amongst all of living humans who have access to technology to actually be participating to the levels that you see on some of these platforms.
Like, how many views did this TikTok video get? 37 million. There's no way that video, along with the other 10,000 posted in the week this trend has been going, all have that many humans looking at them, regardless of how good your algorithm is. So a lot of it is... I mean, if you were to
Are you sure? Because-
cumulatively add up the viewership hours of TikTok, YouTube, Facebook, Twitter, all of this interaction that humanity is supposedly doing, it does not add up to the amount of time available to the humans alive right now. Bots are doing that artificially to bump people's ratings, and there are services you can buy to have hackers basically do that for you.
I was about to be tongue-in-cheek and flippant about it and say, are you sure? Because some of the best TikTok videos are short, 10 seconds, 15 seconds. But no, I take your point.
Yeah, it's super interesting. I wanted to shift gears a little to discuss something we were chatting about before the episode started. For the listeners out there, some of the best conversations happen before and after the episode, unfortunately, so hopefully we'll translate what we were just chatting about. To tee up this topic: I was thinking about what it means to be a confident
data scientist, and the title of Adam's book is Confident Data Science: Discover the Essential Skills of Data Science. So Ben and Adam, if you were going to define confidence, both on a technical level and on a professional, soft-skills level, how would you define it?
Ooh, good question. I like how you parse it out between both technical and non-technical. Ben, did you have a thought?
I have a bunch of thoughts, but I'm not going to steal your thunder. Please go first.
Okay. Well, I'll do technical first, I think. From a technical perspective, I'm a fan, as many people are, of the 80-20 rule: 80 percent of data science is accomplished with 20 percent of the tools. And who knows if that's the precise delineation; maybe it's 90 percent of data science done with 10 percent of the tools, maybe it's something like 60-40.
But the reason this matters, especially for folks transitioning into the field from other fields later in their career, and actually for folks earlier in their career too, is that they just haven't had time to acquire 80 percent of the technologies, or 80 percent of the algorithms, or 80 percent of the tools. Maybe they've only acquired 20, 30, or 40 percent of the tools, technologies, and algorithms.
You can do a lot. You can be quite successful with 20, 30, or 40 percent of the tools and algorithms. If you're early in your career, you have your entire career ahead of you to continue learning; you have to continue learning. So one of the biggest things I see holding folks back is thinking they can't go into data science because, and I'm going to do air quotes here, they've "only" had one boot camp, they've "only" had five certifications, they "only" have a master's degree. You can be a fantastic, outstanding data scientist with all of that. You can be a very confident data scientist with all of that. So there's my technical answer. Maybe we can go back and forth: Ben, you want to do the technical, and then we'll circle back to the non-technical?
Before Ben kicks it off, can you elaborate a little? Let's say I'm a data scientist. I have been working diligently, getting my certifications, getting my master's degree, and I am slowly building up a skill set. Is there an inflection point in confidence where you say, all right, I'm a data scientist now? In other words, how do you think about being good enough?
Or unless you have a follow-up, yeah.
Yeah, that's a fascinating question too. For me, one of my very earliest data science projects involved helping a university, the university I worked for at the time, identify students who might be in need of some additional academic support; basically, predicting students who weren't going to do well in their classes in a given semester.
Tree-based algorithms, some logistic regression, and some k-nearest neighbors brought the project home, put it across the finish line. And the earliest iteration of the project used only logistic regression, so I think that illustrates the 80-20 rule I just spoke about. But from that project,
others in the office started calling me the data scientist. I wasn't formally a data scientist; HR didn't call me a data scientist. But others in the office started calling me a data scientist, and I initially deflected: no, I'm not a data scientist, I just used a data science method on this particular project. That doesn't make me a data scientist. Then I started talking about the project and my work with friends and family, as you do, and they'd say, so you're a data scientist now? And I said, no, I'm not a data scientist.
Again, I just used a data science method on a project. Then eventually my boss started introducing me to other folks around campus: oh, Adam is our office data scientist. And by the time you hear that three times in a row, or from three different people, in my opinion you want to start taking some notice of it. So finally I began regarding myself as a data scientist at that point,
even before HR was calling me a data scientist. So I think, especially for folks who transition into data science later in their career, many have a similar experience: maybe you start by using a data science method on a project, then you use a couple of data science methods on a couple of projects, and eventually you evolve into working as a data scientist. That was my experience. So for a specific inflection point, maybe I would point to the moments when I started noticing the boss calling me a data scientist. But even that was more of a progression over time.
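The progression Adam describes, logistic regression first, then richer tree-based models on the same problem, could be sketched roughly like this. The features, thresholds, and data below are entirely invented for illustration; his actual project's details aren't specified:

```python
# Hypothetical sketch of a student academic-risk model: logistic regression
# as the first iteration, then a tree-based model for comparison.
# All features and labels here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1_000

# Invented features: prior GPA, attendance rate, and credit load.
X = np.column_stack([
    rng.uniform(1.0, 4.0, n),   # prior GPA
    rng.uniform(0.5, 1.0, n),   # attendance rate
    rng.integers(6, 19, n),     # credits enrolled
])
# Invented label: 1 = student likely needs additional academic support.
at_risk = (X[:, 0] < 2.5) & (X[:, 1] < 0.75)
y = (at_risk | (rng.random(n) < 0.05)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Earliest iteration: logistic regression alone.
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"logistic regression accuracy: {logit.score(X_test, y_test):.2f}")

# Later iteration: a tree-based model on the same features.
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"random forest accuracy:       {forest.score(X_test, y_test):.2f}")
```

The point of the 80-20 rule here is that the first model, a few lines of logistic regression, already produces a usable ranked list of students to reach out to; the fancier model is a refinement, not a prerequisite.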
To follow on to what you just said, I'd say my experience was very similar in the transition from doing data science work to pure software engineering. Moving from engineering, back when I was dealing with robots and mechanical things at factories, into data science
wasn't quite as daunting for me, because I was already using some of the same tools. Statistical methods and statistical process control were kind of our bread and butter in factories: hey, I know the math behind all this, we have to do this as part of our core job. So that wasn't such a big leap. I had a vague idea of what I didn't know, but I wasn't under any illusions of, oh,
I want this job title. I've never really cared about job titles; it's just my personality. I was like, I find this work fascinating and there's so much I don't know, so it's kind of fun to feel kind of dumb. It's this undiscovered country for me to go figure out. But transitioning from that into pure software engineering, it's very much about embracing, and not being afraid of, screwing things up royally and then fixing them. That's the inflection point that I've noticed. Even in our organization, Databricks engineering, which is kind of lightning in a bottle, it's pretty crazy; there are so many genius-level intellects there who are incredibly talented at what they do. I was having a talk earlier this week with one of them,
someone that everybody esteems, including myself: wow, this person is phenomenal at software engineering and design and product design. And I just flat out asked him: hey, do you still feel dumb sometimes about some of the stuff we're asked to work on? And he said, man, every Monday morning, every week, I feel like
my dad brought me in, as a kindergartener, to his postdoc lecture series. He's like, I feel like the dumbest person in the room every single day. And I was like, wow, welcome to the club. And he said, no, and I love it. I love being surrounded by these people who have my back, and I can contribute and help them when they need it. But we all feel that way. And it's
this hubris-tamping way of interacting with one another. Everybody's super humble. Everybody is also dedicated to not spending energy avoiding looking like an idiot, because we all know we're going to look like idiots all the time. We're around peers who don't care, who just want to help us fix what we screwed up, and then grow and learn from that.
And that's what you're stepping into. To Michael's question, hey, when do you know you can go and become a data scientist? I think it's the same for anything you're trying to do: it's when you are prepared to risk screwing something up and then learning how to fix it.
Can I read this passage from the book? Because it reminds me... This is the first paragraph of the conclusion of chapter one: "Imagine being wrong. I mean, so wrong that the feelings of embarrassment overwhelm you. My sincere wish for you, for readers of this book, is that the information that follows will inspire you to risk being wrong. I do not wish you harm.
But I do wish you the confidence that is necessary to risk making mistakes as you grow in your knowledge of, and trust and confidence toward, data science. Put another way, among the best ways to truly get it right in data science is to get it wrong a lot."
Yep, I could not agree more. That is beautifully put. I think the danger people get into when they enter a field like this, whether it's software engineering or machine learning engineering or data science, is when there's no mentor involved in their early transition, nobody telling them when they're wrong: hey, what you just came up with sucks.
I mean, people are going to deliver that in different ways. Some will be like, hey, this wasn't quite right, I'll help you fix it, or I'll show you what's wrong with it.
I see some strengths here, but I see some weaknesses. There's some limitations. Yeah.
Yeah. Or it's somebody kicking a chair halfway across the room and calling you an idiot. However that message is delivered, if you don't have somebody telling you, whether it's hey, you did this great, or hey, this sucks and you need to completely change it, you're operating in a vacuum, and that's where danger happens. I've seen that so many times when I've been brought in as an advisor. I'm sure you have too, when you do these boot camps at companies
where they're trying to start up a new data science team. There are like three people, and nobody has any experience, so there's no, quote, adult in the room to tell anybody, hey, this is bad, we shouldn't do this. And that's when you get crippling tech debt in the form of bad implementations going out that could be used in bad ways by the company.
Michael, what about your inflection point in early development and your identity as a data scientist? What was your experience?
Yeah, I have like 500 million thoughts. One I would like to bring up, though: there's typically a learning curve, and if you Google "learning curve" there are a bunch of charts. I've identified, at least for me, three stages. The first is general confidence: all right, I can probably do this.
Then, as I learn more about a topic: holy shit, I have no idea what's going on. That's stage two. Stage three is increasing levels of confidence, but with this duality of humility: understanding that I know so little and that the system is incredibly complex, even as I can navigate it. And just from chatting with Ben a lot, and with smart people in general,
the people who really know their stuff are often very humble about it, at least in tech, and often say how little they know. It's because the world is so complex and these systems are so complex; if you think you've got it all, it's really unlikely you fully grasp whatever the concept is. So that's basically my read on confidence: it's when I'm able to navigate a system, but also understand what I don't know and say, oh, that over there, I've seen it before, I've peered over the edge, I'm not going there, it's absolutely terrifying right now. Having a map of the things I know I don't know, I think that's where I am.
I like that. I wrote it down. And it's cyclical too, right? Because you eventually get to that third stage, where you're finally able to perceive what you do not yet know, and then a week later you're back in the first stage. For me, anyway, it feels like it loops in a cycle.
I would agree.
No, definitely, because it's a very rich environment to explore, so you just go to a new area and you're a novice again. I've done that probably five times so far in my career, and it's been really cool. And people have very different strategies for diving in. I dive in head first, fail a bunch, and work late hours to make up for it, and then I'm like, all right, I've got this, and now I'm bored and need to do it in a different area.
But preparing first, doing a master's degree or something, and then diving in, that's often a lot more common. Adam, in your experience, when you're coaching people, do you advise them to dive in head first, to prepare a bit, or to prepare a lot? How do you coach them into getting out of their comfort zone?
You know, one of the things that makes me unique among career coaches is that my coaching work with individual clients is all one-on-one. So every client is different. So, general tactics for getting out of your comfort zone...
Gosh, that's a tough question. Probably what I'm better at, and I lean heavily on my PhD in education for this, is giving individual guidance on keeping knowledge acquisition moving forward at a steady rate; helping people learn, really. The PhD in education gave me a wealth of
background knowledge on how people learn. And my approach to this is similar to my approach to data science, actually: it's the rudiments that really matter. I know folks are often fascinated with neural networks because they're newer; the technology has more recently advanced to a state where we can implement neural networks more quickly than in decades previous. But I'm very quick to remind folks that ordinary least squares regression, k-nearest neighbors, logistic regression, some of the rudiments, are still some of the best tools for many data science tasks. Same thing for learning. So: flashcards,
building your own cheat sheets, the Pomodoro Technique for studying, managing your time when you're studying, again going back to these rudiments. The Feynman technique is a good one as well; I'll make sure you folks get a link to an article on it for the show notes. So that's where I go. What's that?
Yeah, just a quick note on the Feynman technique. I'm a huge fan. When I first started my data science career, I had studied environmental science and sort of minored in data science, but took the most BS classes possible just to get out of school; not a school guy. And to level up my skills, I wrote a blog post a week
that broke down academic topics. By forcing myself to explain an academic paper, I started getting a lot more competent. And that's the Feynman technique, basically: learning through teaching.
Yep, that's exactly right.
So yeah, Ben, kick it back over to you. How do you think about mastering the soft skills? When we're looking at technical topics, there's a textbook; but soft skills, how do you manage someone, how do you deliver feedback? Clearly Adam is amazing at it, as we saw earlier. But how do you work in a human-based environment where a lot of soft skills are required for success?
Yeah, I think this is where a lot of people who enter pure technical fields can struggle.
A lot of people who have an affinity for this space are not generally the most social and outgoing, in my experience. They're capable of being social and outgoing in their own way, but they're usually dedicated to interests that are relatively solitary, or shared amongst peers of a certain ilk.
A lot of people call us nerds, right? We like to read a lot, we like to play around with computers and think through complex problems. So for people who come from that background, and I am including myself in this, before you start working in a team that has some sort of charismatic leader who brings everybody together based on their strengths and personalities, you just assume: oh, I'm an introvert, or I don't like working with others, or it's easier to work on my own, or I like just sitting at my keyboard solving problems. That insulates you from the biggest help you can have and the biggest benefit you're ever going to find. It's never tools. It's never technology.
None of that really matters when you're trying to solve problems. The biggest benefit you can have is other humans. And just learning how to...
It doesn't mean you have to be best friends with everybody you're working with. It means having the empathy to understand how another person likes to be interacted with, and then interacting with them in that way. So with these relationships you're building with the people on your immediate team, learn how best to interact with each of them, and learn from them.
When they're looking for help, give them help. Those soft skills are more important than anything else I've ever heard of. Yeah, it's important to talk to the business, definitely. You have to be able to communicate with your internal customers when you're building a data science project. Super important. But that's the same skill you learn by working in a team of other data scientists, or, even better, a cross-functional team of
subject matter experts and analysts and data scientists and software engineers. If everybody knows how to interact with one another, and there's a common purpose, and people enjoy working together, it's amazing what you can accomplish as a group of humans. So I always tell people that's the most important soft skill to have: how can I effectively communicate and work with another person? Whether you like them or not is irrelevant.
Even if that's somebody that you're just like, man, I don't really like that person's political views. It doesn't matter. Like it doesn't matter at all. Do you want to enjoy what you're doing? Do you want other people to enjoy what they're doing? Then learn how to work with that person.
That's my two cents.
I think I have a very similar thought, but I come at it in a slightly more bookish way. Sometimes I have trouble with the term "soft skills"; it sort of devalues the nature of the skill in a way that I don't love. But it's the word that we work with culturally.
But one of the most important soft skills to develop, especially for folks who seek to advance their career into management or even the C-suite, is the ability to cultivate and build culture. It starts with just learning for yourself what culture is. Culture is shared traditions, values, language, and institutions
that are passed from one generation to the next. And once you understand what culture is, then you can basically apply that to data: shared data-related skills, data-related language, data-related traditions, data-related institutions that are passed from one generation to the next. And the better you get at helping an organization build its culture, in my view, the better you can plan on advancing your career.
And this relates to what you were just saying, Ben, because the best way to build culture, once you've got the understanding of what data culture is, is to have conversations. You mentioned the issue of introverts in the field. There's a misperception about introverts. The misperception is that introverts don't care for a lot of conversation, when the real characterization is:
introverts don't care for a lot of shallow conversation. There's a distinction. So if you're an introvert in the field, or if you're working with introverts, the idea here, in my view, is to engage in very intentional, deep conversations, not shallow, not superficial conversations: What is data for our purposes? What is a data set? What does it mean for us to do an analysis?
What are the expected inputs for analytical work or data science work? What are the expected outputs? How do we document those inputs and outputs? What's our process for data science? Do we have a documented process? Is it on a wiki? Is it in an employee manual? If we haven't documented a process, why not? And can we document our process? In the book, one of the things I give as a starting point on this topic is
an eight-stage process for data science. And you could look at 10 different books and find 10 different processes. Some of them are three stages, some are six, some are eight. Mine happens to be eight. I think the takeaway is: your process at your organization doesn't need to be eight stages. It doesn't need to be six. It needs to be the number of stages that makes sense for you and your work. And you need to have conversations at the organization, deep, meaningful conversations, intentionally aimed at solving that issue.
So that's actually a little bit of a soapbox of mine. I really appreciate the question on soft skills there.
And what are your thoughts, Michael?
Yeah, my brain is loading; like another 50 thoughts are coming up. I wanted to hone in on one specific thing, though, which is: how do you know it's working? How do you know team culture is good? How do you know you're being empathetic and connecting people in the correct way? Is it a feeling? Is there a checklist that I can go down and say: did the person smile, then blink twice, then cough? All right, I'm being empathetic.
How do you go about measuring whether you should be confident in your soft skills?
Oh, you wanted to measure soft skills. Hmm.
So when you're interacting with your team, are you getting things done? From a management position, right, if you're leading a team of people and you know what the team's capacity is for getting things done, whether it be planned things in some sort of quarterly plan or yearly plan, like, hey, we need to do these major initiatives, they need to be done by these dates,
are those realistic targets, and is your team meeting them? Because in the process of building something new, problems arise, right? The more complex the system, the more things are going to go wrong with it. So your team has to respond to those, fix those, or come up with solutions to the problems that arise. Now you can sort of
indirectly measure how well a team is functioning by whether they're meeting the reasonable targets that have been set for them while dealing with that chaos. If it's a dysfunctional team, they're not going to hit their targets, because they're not working together to solve the chaos that occurs. But if it's a high-functioning team and everybody is
leveraging the soft skills, working with one another, they like working with one another, they've built a tribe, and that tribe has a mission, exactly as Adam said: that culture. That's one of the things people talk about, like, hey, one of our core company values is the culture here. Most companies say that, and most of them are BS artists, based on my experience. But the companies that have that figured out, that actually are fostering a very good internal team culture, and that's at the company level, the department level, and then there's the most important one, which is the tribe, your team. So if your team has a good culture that it has crafted, and the team has a good chieftain, which is the lead or the first-line manager of that team, and everybody has each other's back when the chaos erupts,
everybody works together to just, you know, tamp it down as quickly as possible, so they can then refocus on the mission everybody wants to solve together.
That's my take on it.
And Adam, I know you shared a checklist.
There's a... I did, I actually have a checklist from the book, believe it or not. The checklist is not for measuring individual soft skills. The checklist is for measuring culture, the development of a culture at an organization. You know what, there's this...
There's this really clever website, kevan.org/johari (j-o-h-a-r-i). This site's been around for 20 years. Someone just built it, super simple; I don't even know what technology it was built on, probably just all simple HTML. And what it does is it gives you an organized way to receive feedback from others about your traits,
skills, abilities, and attributes. And I'm a huge fan of getting a baseline measurement whenever you're trying to measure progress. Take an initial measurement, get that baseline, as we typically do in analytical work. So to your question, I would potentially suggest engaging in an exercise based on or related to the Johari window. Readers can find more information about this; it's widely available.
Noted. Yeah, I have one interjection, which is actually two interjections. It's really interesting that Ben went the results-oriented way of measuring team culture, because I find it's really hard to get causal links to latent factors like that. I could hold a gun to my team's head and say, build.
I could also have a weekly lunch where we're all chatting and happy and then say, build. There are many ways to say build. And so the way that I typically think about culture is that it's sort of this intangible feeling. And I know that's not a great answer, but when you're part of a team that is working towards a consistent mission,
you can feel it. Everybody has energy, everybody's riffing off each other, there's collaboration. And distilling it into a checklist, I almost think, oversimplifies it.
Yeah, I get that feedback on this checklist a lot.
Yeah, interesting. Okay. Or, sorry, just real quick, the second point is: I think that there's no better way to do it, though. If you are going to measure the success of a team, it should be by output, maybe by employee satisfaction, retention, those types of things. I can't think of a better way to do it.
One of the... Oh, go ahead, we got another one, yeah.
That's the point that I was going to make after you said, hold a gun to the team's head. We know that you're speaking figuratively, hopefully. But a manager that does that will get short-term results, guaranteed, if you have professionals and you're paying them enough. But in the long term, that team is going to fall apart, even if everyone on the team just loves working with each other. If you have a terrible manager who's doing that, your best people are going to leave. Your worst people might stay around because they're unemployable elsewhere, but you've just destroyed that team. And as for companies that employ managers with that sort of behavior: the only companies that I've seen successfully deal with toxic managers like that are the ones that fire them
once they hear about that sort of behavior. All the companies that I've worked for that have either gone out of business eventually or are not doing well in today's economy just let those people stick around, or, even worse, promote them. So that cancerous behavior spreads throughout an entire organization or department, and those companies don't stick around for long. They're not going to be a company 50 years from now.
I wanted to push back a little on the notion that culture is not measurable, not tangible. I think culture is measurable if you operationally define it in the right way. One of the reasons I think there's sometimes that default belief that it's not measurable is that it's not measurable in ways that matter to shareholders. You don't have a line in your balance sheet called data culture.
That makes sense.
And data culture does not show in your cash flow statement. So that's one of the reasons data culture doesn't get measured as well. Unfortunately, it gets less attention than I would like it to get.
Yeah, that makes sense. And to be... sorry, go ahead. I was just going to say, to be clear, I've worked like three to six years in the field, and the people in this room have a lot more data culture experience than myself. So maybe I just need to see it more and look at the commonalities.
But, yeah. No, go ahead.
Perhaps. I don't want to discount your younger status. I think it's so easy to just look at someone and say, oh, they're newer to the field, and then discount their opinion. No, I don't want to do that at all. And in fact, there's something to be said for the opposite: maybe Ben and I have been around too long, and perhaps that's a different form of blinder.
That's exactly what I was just going to say. I was like, I would listen to Michael's perception far more than I would to my own because I have jaded bias.
Absolutely. Yeah, absolutely. Ben, I think we could summarize your point, which maybe is sort of the leading point of this segment of the conversation, as: the proof is in the pudding. If you got results, you got results.
Over a period of time. So if you're consistently knocking it out of the park, quarter after quarter after quarter, and nobody wants to leave your team, and you have the opposite problem, where there's a flood of people wanting in. I've known managers, and I may have done this myself in the past, who, to sort of gauge what people think of the team they happened to be leading at the time, or the organization they were in charge of, would just post an internal job opening for like two weeks. I'd use just a basic job description for something on the team. Like, hey, we're just trying to see if we can get somebody with these skills. It wasn't really a serious thing; it's more about how many people want to transfer into this team at this company. And if you want a good measure of the perception of your team's capabilities and how everybody respects your team, do that with a couple of other friends
who are also managing other teams, and make it a little bet competition. Like, hey, I bet I get two X what you get, man. Or, hey, I bet you're going to get like 10 X mine. Run it as an experiment, collect the data, and see: for seven days, or for 14 days, we're going to open this up, internal only. Don't do this with external applicants; that's messed up. It's a way of just seeing. And, you know, check with HR before you do something like this.
I was going to say, do you loop in HR on this or not?
But if they allow something like that, oh yeah, yeah. Usually we would tell HR, every time that I've tried to do this, like, hey, we're trying to see what the interest would be if we were to open up some serious reqs for this next fiscal year; we want to see what the ratio would be between internal transfers versus external hires. And HR is usually like, yeah, that's great data, let's post it,
and put a caveat at the bottom saying this is not a guaranteed open position, or something. You just see how many people interact with it, want to check it out, and click on the "I'm interested" button. And if your team sucks, even if you as a manager think that your team is doing well and you're hitting all your OKRs and it seems like you're delivering, if you get zero people interested in transferring into your team, you've got a problem. And if another department just did the same thing and they get a hundred people interested, and five of those people happen to be on your team, you're in big trouble. So that's a mildly unethical way of determining how good your team culture is, but it's a data-based way of doing it.
What saved it for me was the note that says: this is not a guaranteed opening. For me, that saves it, yeah.
Oh yeah, you gotta do that.
Nice. All right, so we're coming up on time. Sorry, yeah. So I'll wrap, and then we'll return to our very exciting lives. Per usual, this has been all over the place. Some things that stood out to me: generative AI content is not copyrightable, but as we start getting into what happens if a human modifies a line of text, or just a character,
and I'm glad you're looking at me, Karan. Oh, really? Okay.
now we get into the gray areas, and we'll let the lawyers debate that. Regarding definitions of confidence: having a map of what you don't know is a very helpful way to know if you're at least relatively competent in something. Also, look at the 80/20 tools; for data science, that might be SQL, generalized linear models, k-NN, and tree-based models. If you leverage the basic tools, you can get a lot done. Regarding soft skills, the key is empathy
and communication, essentially. And then organization and team culture is built through conversation, so make sure your Zoom camera's on, or if you can be in person, that would be great as well, and just chat with people. Don't small talk with 10 introverts, though; they'll hate you for it. And then also, early in your career, it's really valuable to get feedback from senior people, because if you don't have someone telling you that you're wrong, you might not realize it until it's too late. So Adam, if people want to... yeah.
I was going to say, the bit about defamation is also fascinating, so don't forget that. For anyone who maybe missed the first segment and didn't hear us talking about defamation: rewind and go listen to that.
True. True, true, true.
Yeah, just listen to the whole episode again. Cool. So Adam, if people want to learn more about you, your book, your work, where should they go?
Well, you can find Confident Data Science: Discover the Essential Skills of Data Science on Amazon, Barnes & Noble, Target, wherever books are sold, basically. It was available in most of the world on September 4th, and in the United States and Canada on September 24th. It's 11 chapters of really good material for many audiences
who work in data science or adjacent to data science. I'd really encourage folks to take a look at it. The other thing is, I like to connect with folks on LinkedIn: Adam Ross Nelson. I'm also Adam Ross Nelson on Twitter, Facebook, and Instagram.
Sweet. All right, well, this has been a pleasure. Until next time, it's Michael Burke and my co-host. And have a good day, everyone.
Ben Wilson. We'll see you next time.
Bye. Bye bye everyone.