Bioinformatics and Programming with Ken Youens-Clark - ML 082

Michael Berk interviews Ken Youens-Clark today to discuss various topics including bioinformatics and programming, plus his career progressions including jazz drumming, technical writing, programming, academia, writing books, and solutions engineering.

Hosted by: Michael Berk
Special Guests: Ken Youens-Clark

Show Notes

In this episode…


  • Writing tests and type annotations
  • Project development lifecycle
  • Grading programmers with pass / fail
  • Soft skills within the industry
  • Bioinformatics and computer science
  • Prototyping and improving efficiencies
Connect with Ken Youens-Clark via email and LinkedIn

Transcript


Michael_Berk:
Hello everyone, welcome back to another episode of Adventures in Machine Learning. I am one of your co-hosts, Michael Berk. Ben is actually out today. He had some exciting news and is taking a few weeks off. And so it will be just me, and today I am joined by Ken Youens-Clark. He has spent a lot of his career in academia but has recently switched over to bioinformatics. And I was wondering, Ken, if you could sort of introduce yourself and explain why you're famous.
 
 Ken Youens_Clark:
Well, I'm not sure if I'm famous, but my quick story is I graduated with an English degree a billion years ago and needed to figure out a way to make a living. I got into computers and programming, thought that was really cool. That led me into a bioinformatics lab in 2001. So I discovered this whole world of academia and research and this nexus of biology and computer science and thought that was super fascinating. And that's basically where I've spent most of my career, most of it in research, but at the beginning of this year I joined DNAnexus, which is a commercial, cloud-based platform for enabling bioinformatics. And so along the way, especially during my time in academia, I got really interested in teaching. I really loved mentoring and working with junior developers. And I was really excited to get some classroom experience while I was at the University of Arizona, where I worked for several years and also earned my master's degree. And it was in teaching those classes that I started writing books. Actually, I was on this podcast a couple years ago to talk about Tiny Python Projects, which was my first book. I used that in the classroom several times to just teach the basics of how to write a well-structured command-line program in Python and also how to test it. Testing really has become a crucial theme to all three of my books. My second one is called Mastering Python for Bioinformatics. I published that with O'Reilly in 2021, continuing the theme of, like, here's how you use tests to write a good program and make sure it's reproducible, which is crucial in science and research computing and bioinformatics. And I've really taken that forward. I use all the skills that I talk about in those books daily in my job working for some rather large customers that we have at DNAnexus. Just before I joined at the beginning of the year, I was able to finish up my third book, called Command-Line Rust.
So all my books talk about how to write programs at the command line, the Unix command line, and how to test them. And so I just took the same formula that I had used for two books in Python and used it on Rust. And honestly, I used it to teach myself Rust. I was enamored of Rust, had played around with it for a couple of years, and I was like, you know what? I want to get serious. I want to really figure this language out and understand all the different ideas, borrowing and pointers and things like that, that I'd never encountered in dynamically typed languages like Perl and Python that I'd used most of my career. So that was kind of my deep dive into Rust, and also continuing forward with using test-driven development to write command-line programs, which I think is a great way to learn any language: just write simple programs that you already understand the basics of. Like, for instance, in this latest book, you write an implementation of cat or head, which probably everyone has used a dozen times a day. So that's how I've gotten to where I am. Not currently working on any new books, mostly just trying to keep my head above water with this new job. It's quite a challenge.
 
Michael_Berk:
And what are you doing in this new job?
 
 Ken Youens_Clark:
My title is Senior Solutions Engineer. It's a pre-sales role, but it's extremely technical. So my job is to, for instance, take a customer's existing analysis pipelines and implement them in our platform. And a lot of times that's pretty simple because they're already probably using Unix command-line tools. And so a lot of times we're just putting them into maybe a Docker container, writing some Python or Bash to launch that, and wrapping that up in our platform. But sometimes it's writing workflows from scratch, like the customer has a basic idea. They want to take some sequencing data. They want to go through these QC steps. They want to do these cleanup steps. They want to do these analysis steps. And then at the end, they want this custom output, which means I have to sit down and write a lot of linking scripts to, you know, translate one format to another or run some tools. And we use a lot of a language called WDL, workflow definition language. So it's a high-level language for describing, you know, the first step in a pipeline is this tool, it takes these inputs, it creates these outputs, and the outputs from that tool can then be linked to the next step, and on down it creates a chain. And along the way you can implement basically kind of map-reduce ideas. So if you have like a hundred files coming in, you can say scatter these over a bunch of nodes, and take the outputs of those and reduce them in this step where you bring them all back together for some sort of analysis. So it's just a lot of data analysis and a lot of writing code and, honestly, writing a lot of tests. One of the longest engagements I've had so far was actually just writing bespoke Python code for a customer who didn't have in-house bioinformatics expertise.
And so they came to us with a rough idea of some tool that they wanted written, and I just sat down and wrote it from scratch and put in like a hundred tests there to make sure that it works correctly. So yeah, that's pretty much what I'm doing. I'm a developer.
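The scatter-and-gather idea Ken describes can be sketched in plain Python. This is only an illustration of the pattern, not WDL and not the DNAnexus platform: `count_gc`, `scatter_gather`, and the sample sequences are hypothetical names made up for the example, and a thread pool stands in for the separate nodes a real workflow engine would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_gc(seq: str) -> int:
    """The per-input step: count G and C bases in one sequence."""
    return sum(1 for base in seq.upper() if base in "GC")


def scatter_gather(seqs: List[str], workers: int = 4) -> int:
    """Scatter count_gc over the inputs, then reduce the results to one total."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_input = list(pool.map(count_gc, seqs))  # scatter (map)
    return sum(per_input)  # gather (reduce)


if __name__ == "__main__":
    print(scatter_gather(["ACGT", "GGCC", "atat"]))
```

Because each input is handled independently, the per-input step can be tested on its own before anything is wired into a pipeline.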
 
Michael_Berk:
Awesome. You touched on several really interesting points. And please stop me if you've covered these in the last podcast. I wasn't a host a couple of years ago. I joined like six months ago.
 
 Ken Youens_Clark:
Yeah.
 
Michael_Berk:
But what is your overall thought about, how do you approach testing in application? Because there's this dynamic where you can test unlimited things and use all of your time to make sure that the sum function of a given language is working properly. But you can also just assume that, for example, something like numpy.sum actually sums two numbers.
 
 Ken Youens_Clark:
Mm-hmm.
 
Michael_Berk:
So where do you draw the line between over-testing and yet being confident in your code?
 
 Ken Youens_Clark:
Yeah, there's definitely a lot of spot testing. Anything that I write from scratch, I try to be fairly exhaustive. My functions, there's an interesting book from Manning called Five Lines of Code, where I think the general idea is that a function should be about five lines of code, which is maybe a little terse for me. I want a function to fit in a page, like probably 50 lines or less, but I'm fine with a one-line function. The simpler and smaller a function is, the easier it is to test, and, you know, exhaustively test. So if I'm writing something, I might take, you know, a large function, try to break it down into as many very small functions as I can that have what I feel are exhaustive tests for those very, very limited ideas, and then piece them back together into something larger that's more complex. And you know, each piece should be orthogonal, right? So fixing any one function has no effect on any of the other functions. It's interesting you brought up the NumPy sum, because I actually have run into some problems with that. You know, underneath it's implemented in C, and so it's susceptible to buffer overflow if you get into extremely large integer values and you're using NumPy. Actually, in my second book, the Mastering Python book, I specifically point that out. In this instance, I think it was the exercise where you're trying to find all of the possible DNA sequences that could lead to a particular protein sequence. And as the protein sequence gets longer, the number of input DNA sequences gets exponentially larger, such that it's very, very easy to overflow an integer value in NumPy. And so what's interesting is that in Python, the native integers are unbounded, except by however much memory you have.
So anyway, we talked specifically about that, but, you know, sum is a great function to talk about because obviously you can't exhaustively test that. You can't add all the numbers in the universe. And this was interesting with the Rust book. One of my reviewers was just so amazing, and he really was able to say, well, we can use what's built into the language, you know, for instance, the upper and lower bounds of this data type. So, for instance, with a u32 or u64, you can actually say, well, for this function, going up to the upper and lower bounds for this data type, does it work at the extremes? Does it work with stuff in the middle? It's quite a bit different with Python because obviously, you know, we don't have that. So, you know, I remember a CS professor telling me, for instance, if you're writing a function, try to handle zero, one, and then some number of things. And that idea has served me really well. So when I'm trying to write a function, I say, well, what happens if I pass the empty string? Or what happens if I pass zero? What happens if I pass a rather large number, and then maybe something in the middle? And then probably that's good enough. But the crucial thing that I find, especially when I'm dealing with input data from a customer, is that so often the input data is gonna be like a CSV file. And you have an idea of the kinds of data that you're gonna get for a particular column, you write a function to handle that column, and then it's working well enough, and then in production it breaks because they supply a value that they didn't tell you they were gonna supply. It was something you weren't expecting.
So what I will often do is take that value, put it into a test, prove that my function fails with that input, go fix the function, run the test again, prove that it now works with the new value that I wasn't expecting when I was developing this, and now going forward I have a regression test. So I know I shouldn't fail when I see this value again. And so a lot of times my tests become this place where I accrue things that I wasn't expecting, and I use them just moving forward to kind of refine the code.
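A minimal sketch of the two testing habits from this answer, under stated assumptions. It is not code from any of Ken's books: `count_dna_sequences`, `parse_depth`, the "read depth" column, and the surprise value "NA" are all hypothetical, chosen only to illustrate zero / one / many / rather-large inputs and a regression test added after a production failure. The large case also shows Python's integers growing past the signed 64-bit limit where a fixed-width integer would overflow.

```python
import math
from typing import Optional

# Codons per amino acid in the standard genetic code ("*" is stop).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4, "*": 3,
}

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def count_dna_sequences(protein: str) -> int:
    """How many coding sequences could translate to this protein."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein)


def parse_depth(value: str) -> Optional[int]:
    """Parse one CSV cell that is supposed to hold a read depth."""
    value = value.strip()
    if value in ("", "NA"):  # "NA" handling added after the production failure
        return None
    return int(value)


def test_count_dna_sequences() -> None:
    assert count_dna_sequences("") == 1      # zero
    assert count_dna_sequences("M") == 1     # one
    assert count_dna_sequences("MA") == 4    # some
    big = count_dna_sequences("L" * 30)      # rather large
    assert big == 6**30
    assert big > INT64_MAX  # a signed 64-bit integer could not hold this


def test_parse_depth() -> None:
    assert parse_depth("0") == 0
    assert parse_depth(" 42 ") == 42
    assert parse_depth("") is None
    assert parse_depth("NA") is None  # regression test for the unexpected value
```

The last assertion is the one that would be written first, while the function still fails on it, and then kept so the same surprise cannot break things again.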
 
Michael_Berk:
That's really interesting. So it makes the project development life cycle a living breathing thing that slowly grows over time. Have you also seen your knowledge base around testing increase throughout your career so that you write better tests?
 
 Ken Youens_Clark:
Oh, yeah.
 
Michael_Berk:
Or did you have it down by like day two?
 
 Ken Youens_Clark:
Oh no. So just to go back to the beginning, I'm totally a self-taught programmer from the earliest days. I got a job as a technical writer. I had an English degree, I can write a term paper, right? I was interested in computers and I was playing around with Microsoft Access databases and some really simple kind of macro programming. A guy had written some software, and I wrote a manual for it. Then he saw some potential there, and he taught me Visual Basic, which was the language that that program was written in. And so for 20 years I really didn't do any testing at all. No one sat me down and said, here's the proper way to write a piece of software. I was just Wild West. Whatever worked. I was writing code for customers, and if it worked and they didn't complain, we moved on to the next job. I was mostly working on extremely small teams or basically by myself, you know, just me with a customer writing some code. If it worked, we moved on. It was really when I started teaching that I started really getting involved in testing, and I credit one of the worst students I ever had with making me a much better teacher. At the time when we first started off, my boss, Bonnie Hurwitz, who hired me at the University of Arizona, and I had been colleagues together at the Cold Spring Harbor Laboratory research institute, where we were working on, like, plant genomics databases. And we're good friends and stuff, and so she let me loose in the classroom. I was really excited about this. And at the time, we were both Perl hackers. I'm showing our age here. I really got into Perl in the late 90s because of web development, and then it took over in bioinformatics because it's so good at text processing. So this is like 2015, 2016, and we thought, well, we'll just teach our students Perl. And I had no real method for teaching.
I was just like, hey, I've got a dozen students here, turn in a program, I'm just going to run them manually on the command line and see if they work, and I'll just give you a thumbs up. I was just basically looking kind of for pass/fail here for these students. I was teaching them from the beginning. I'm just like, give me something that kinda is in the ballpark. Well, this one student turned in code that didn't even compile. It wasn't even syntactically correct. And he was like, well, I get partial credit, right? Because, I mean, I turned something in. I'm like, no, that's not how this works. I mean, it doesn't even compile. And so I had to have an objective way to say whether this meets some minimum requirements. And so I was just like, oh, I have written tests before. I had written some CPAN modules and was trying to learn about proper software development, so I had gotten into testing a little bit. And so I started giving my students a test suite, a basic test suite. I said, you know, here's how you run the test suite, and now you will know before you even turn it in if it works. Whatever percentage of tests pass, that's your grade. If you passed 80% of the tests, great, you got an 80. It's just black and white here. And so I started using that, and then I started realizing, oh, I can actually show them the tests one by one and say, here is what this test is expecting. And I would order the tests in a particular way, like, first, your program has to exist and it has to be called this thing. Next, it has to actually print a usage statement. Next, it has to take an input value and do this thing with it. And so I was trying to use the tests in a particular way, and they were ordered so that the students would have kind of a roadmap for how do I start the program and add features and get it to where it does all the things it's supposed to do. So I got much, much better at writing tests for that purpose.
And then I started, like, oh, no, I'm actually gonna teach them how to write tests and show them that this is a good way to learn how to break a large problem into many smaller problems. And then, you know, I'm a parent, I have three children, and I came across this quote years ago as a young parent. It said that setting an example for little children takes all the fun out of middle age. And it's true. Like, if you're gonna ask your kids to eat vegetables, you've gotta model that behavior. You've gotta eat vegetables. You gotta stop having a bowl of Cap'n Crunch, you know, right before bed, which I used to do, right? You have to model this good behavior. And so with everything I did, I was like, oh, I'm gonna have to write a test for this. And, I don't know, it just became a part of almost my identity. Like, if I'm gonna write a program, I'm gonna write a bunch of tests. And I feel like that's what has made me become a much, much better developer and test writer and author and teacher. So yeah, I guess that's how I got to here.
 
Michael_Berk:
Yeah, and I could not agree more that tests make you better in all areas of life. They help you modularize your thinking. One of the most impressive programmers that I've ever worked with, he doesn't write complex code, he doesn't write fancy code, but everything is so legible and so clean and makes so much sense, and therefore is super, super testable. So not only can you make sure that it will run reliably, but you also can understand it super quickly. And you can also add to it and edit it and change it accordingly. So testing has been invaluable so far in my career. But I wanted to hone in. Sorry, go ahead.
 
 Ken Youens_Clark:
I would just, I want to just make one analogy, if I can, to literature.
 
Michael_Berk:
Yeah, please.
 
 Ken Youens_Clark:
So two of my favorite authors, and please, I have a wide range of authors, but I love David Foster Wallace and Ernest Hemingway, and those could be like red flags for some people because they were kind of problematic in many ways. Obviously Hemingway is super misogynistic, but from a writing style, they're so divergent. David Foster Wallace will have paragraph-long sentences with footnotes, and the footnotes have footnotes, and it's just insane how convoluted. But you really don't get lost in his sentences because they're so well constructed. But they're just so complex, and you're just in awe of how he can put all that there. And then you go to Hemingway, where it's short declarative sentences, almost not even compound sentences, you know, just so to the point and very clear. And I would say that my goal when writing code is exactly the way that you describe your colleague. It's short, extremely obvious functions. And one of the things I would say about tests is, when I can, I'll put the very short function here and then right below it, the test. And hopefully both of them together fit on the screen. And the test serves as documentation. So you can say, oh, this function, I can see, takes this string as input, and when it's given this string, this is the data structure that it will return. That's in addition to using type annotations, which I do in Python, and then obviously you have to in Rust. I think type annotations, documentation, tests, they can all help lead to simple code that is easy to understand and easy to refactor.
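As an illustration of a short, annotated function with its test directly below it, here is a small hypothetical example. `parse_header` and the FASTA-style header format are assumptions chosen for the sketch, not something taken from the conversation.

```python
from typing import Tuple


def parse_header(header: str) -> Tuple[str, str]:
    """Split a FASTA-style header into (sequence ID, description)."""
    seq_id, _, description = header.lstrip(">").partition(" ")
    return seq_id, description.strip()


def test_parse_header() -> None:
    """Doubles as documentation: given this string, expect this tuple."""
    assert parse_header(">seq1 Homo sapiens") == ("seq1", "Homo sapiens")
    assert parse_header(">seq1") == ("seq1", "")
    assert parse_header("") == ("", "")
```

The signature says what goes in and what comes out, and the test shows concrete values for both, so the pair can be read without looking anywhere else.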
 
Michael_Berk:
Yeah, that's a phenomenal analogy. Hemingway versus, yeah, that as like sort of a declarative analogy. I think he would be the declarative programmer in English, I feel like.
 
 Ken Youens_Clark:
Mm-hmm.
 
Michael_Berk:
Um, but I was really curious. One thing that you brought up about grading was really interesting, which was you were trying to grade programmers based on an objective yes/no, and your default was essentially a pass/fail. And when someone didn't have something that compiled, well, then it fails. How did you go from such a creative field as English, at least on the more fiction, artistic side, where there aren't super hard and fast rules, how did you go from that creative world over to the very declarative, Boolean world of programming? How was that transition?
 
 Ken Youens_Clark:
I found it easy. Not easy, I mean, programming is not easy, it's not easy to learn, but it fits the way that I think. I think I really missed an opportunity to study mathematics too in college. You know, I got through calculus in high school and took the AP exam and was able to just not take math in undergrad because I was a bachelor of arts. And so, you know, at most I would have had to, like, test past Algebra II, right, which I'd finished in like 10th grade. So there's something about the way that, I think that coding comes more easily to someone who knows how to use language, maybe, than necessarily mathematics. Having a strong language background helps, and then I have over the years been trying to learn more mathematics, more of the rigorous computer science underneath what's going on. But I still feel like I approach coding from, like, how expressive can I be? How obvious can I make this? Like, obviously, choosing good variable names, choosing good function names. I actually really, really rely on functional programming style, so that a lot of times it'll be like, map this list through this function. And if you understand map, then you just kind of understand you're applying the function to each element and you're transforming it like that. Or filter. Filter through wanted, the function wanted, this list of values. And so I feel like when I read my code, it's like, oh, this reads like a little English sentence. And that was actually, I think, something that maybe attracted me to Perl for so long. Larry Wall, the creator, is a linguist by training, and he had a lot of ideas about bringing linguistic ideas to the expressiveness of Perl, which I think also leads to a lot of gotchas. Over the years I started understanding how easy it was to misunderstand Perl, just as it is easy to misunderstand English, especially in the written form.
So I see a lot of corollaries between language usage and writing code, even very formal code. And maybe I should also clarify that a lot of the programs I write are very short, you know, only on the order of hundreds of lines of code in a particular script. So I wouldn't classify myself up there in the upper echelons of, you know, super complex coders who probably have a far better understanding of, like, memory management and caching and all the stuff about computer engineering. I really have no clue how those things happen. I've strictly avoided languages like C, C++, Java my whole career. I've relied more on dynamically typed languages like Perl, Python, JavaScript, Bash. And so maybe there's something there where I'm okay with the kind of ambiguities that are present in those languages. Maybe they're closer to natural languages in that way. I don't know, maybe at this point I'm just rambling, but I do think that a humanities and a classics background is a strength in a STEM field, because writing code is one thing, but writing code that other people can understand, writing code that is documented, getting your ideas across, those maybe are soft skills, but they're extremely important to succeeding in this career, as important or more than just being able to write good code.
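The "filter through wanted" and "map this list through this function" phrasing from Ken's answer translates almost word for word into Python's built-in `filter` and `map`. The sketch below is illustrative only; `wanted`, `clean`, and the rule for what counts as a wanted sequence are assumptions made up for the example.

```python
from typing import List


def wanted(seq: str) -> bool:
    """Keep only non-empty sequences made of A, C, G, and T."""
    return bool(seq) and set(seq.upper()) <= set("ACGT")


def clean(seqs: List[str]) -> List[str]:
    """Filter the list through `wanted`, then map each survivor through str.upper."""
    return list(map(str.upper, filter(wanted, seqs)))


if __name__ == "__main__":
    print(clean(["acgt", "", "ACGN", "ttaa"]))
```

Read aloud, the body of `clean` is close to the English sentence it implements, which is the readability argument being made here.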
 
Michael_Berk:
Yeah, I could not agree more about bringing different fields into the technical realm. Oftentimes you need to work with humans and you need to build stuff for humans. And having not just a math background or not just a computer science background allows you to come at things and problems from a different angle, and I've at least found that having a diverse team really leads to cool solutions and cool discussions. Um, I haven't ever worked with someone that came from an English background. So maybe one day.
 
 Ken Youens_Clark:
I would point out actually, you know, I graduated with an English background. I started as a music major. So
 
Michael_Berk:
Oh wow.
 
 Ken Youens_Clark:
and I was a drummer. I started as a jazz studies major at the University of North Texas. I bounced around. I changed my major several times. I really had no idea what I was doing in life. No direction. Never took a CS class in undergrad. Just took a bunch of humanities stuff, philosophy, religion. And never thought that I would become a coder. So, and actually it's funny, I've had a lot of music friends who either just kind of gave it over for technical kind of stuff like web development or graphic artist or systems. My roommate from college was also a music major. He actually finished with a jazz studies degree, phenomenal drummer, and he's mostly like a Unix sysadmin nowadays. So I think that there's a lot of, I think the formality, especially of music, we think, and especially in jazz, like if you're classically trained and you understand music theory, we talk about the form of a piece of music a lot. Very common in jazz to have like AABA kind of stuff. Talk about like, oh, this is a rhythm changes song, and that immediately tells you like a billion things you need to know about a particular piece of music. In classical, we talk about sonata form. or concerto form and things like that. And so structured thinking like that, I think, really lends itself well to thinking in code where you say, you want to modularize things. Like, you know, just like you said, you want to write a function for this. You don't want to repeat this idea over and over again. We do the same thing in music. If you've ever written out music and you're going to repeat four bars, there's no way you're going to want to sit there and write out the same four bars again. Instead, you bracket it in repeat marks and say repeat this twice, repeat this four times. I mean, my God, we're in a hurry here, right? It's the same thing when you're writing code. Like, you're not going to copy and paste this code all over the place. You're like, oh, no, that goes in a function. 
Oh, and while I'm at it, I'm going to write a test here, and now I'm done. It simplifies your code, really. That's what I'm trying to get at. Yeah, that was like a real inflection point in my life. So up to that point, I had been just working in industry, bounced around all different kinds of just basic stuff. I was doing a lot of Windows desktop programming, like Visual Basic, and then I got into Delphi for a while, kind of client-server stuff, learning lots about databases. That's been an integral part of everything I've done from the beginning, using databases. But it was all just basic stuff: order systems, you know, customers, bank accounts, just your normal fare. Then I got a job at Boston.com, which was really cool because it was kind of a bigger platform, but it was still kind of the same basic stuff. It was news and entertainment instead. And then my boss at Boston.com was a technical reviewer on a book for a guy named Lincoln Stein, who was a big Perl author, like both modules and books on Perl. And I saw that Lincoln was hiring. I didn't know anything about him apart from what he had done in the Perl community, which was rather big at the time. He was kind of a big deal. And so between the connections there, with my boss knowing him, I was able to get my foot in the door at Cold Spring Harbor. Cold Spring Harbor is a small, small town. It's basically Gatsby territory on Long Island. I mean, it's like super wealthy. It was a whaling village in the 1800s. It's just like a little tiny historic village. And in that little village is this world-renowned research institute, mostly doing like cancer genomics and stuff like that. And Lincoln was, oh. I'm gonna wait here because I think my connection dropped... to my host. Ah! Sorry, I think.
 
Michael_Berk:
Yeah, I dropped out of nowhere. Sorry. So we'll just edit that together, but could you start back at the Cold Spring Harbor intro? Like what it is?
 
 Ken Youens_Clark:
Yeah, I think my microphone is having a problem.
 
Michael_Berk:
Well, I dropped, I think, due to wifi or something. So I think it's on my end, unless you're having issues as well. No, I can't hear anything you're saying. Nothing. Still nothing, unfortunately. You can hear me okay though, right? I can't hear you, but oh, the chat, good call. Yeah, worth a try. The same link should work. Well, back. Wonderful.
 
 Ken Youens_Clark:
Did that. I'm afraid the microphone... I'm getting a... Okay, I'm trying again. I'm still hearing an echo. I don't think it's using my headphones. Can you hear me?
 
Michael_Berk:
Strange. Yeah, I can hear you great. Alright, can you test the mic now? Nothing.
 
 Ken Youens_Clark:
Okay, just trying again. Okay, can you say something?
 
Michael_Berk:
Crystal clear. Yeah, can you hear me?
 
 Ken Youens_Clark:
Yeah, okay.
 
Michael_Berk:
Beautiful.
 
 Ken Youens_Clark:
I hope the first part of it got recorded?
 
Michael_Berk:
Yeah, we're currently recording right now. Someone will edit it together.
 
 Ken Youens_Clark:
Okay, cool. All right, so it's funny, I can hear the traffic at your, sounds of the city.
 
Michael_Berk:
Oh yeah, sorry about that.
 
 Ken Youens_Clark:
No, no, it's good to hear. Anyway, back to Cold Spring Harbor. So Cold Spring Harbor is this little whaling village from the 1800s out on Long Island. Really cool little spot, and it's got this world-renowned campus for doing biomedical research. Anyway, I got a job working for Dr. Lincoln Stein there. Amazing Perl guy. But I had no idea of his work in biology. He's an MD, PhD from Harvard, MIT, super brilliant guy. He now works for the Ontario Institute for Cancer Research. He got his start in human genetics. He started working in mouse models. Then he started working in plants. He was extremely renowned for working in these tiny, tiny worms called nematodes. He created this thing called WormBase.org, one of the first websites devoted to bringing together all these resources for a community of researchers, like literature, and he helped write what's called a genome browser. So, you know, there were a lot of studies in all kinds of organisms. You know, tons of data, a lot of it would just fit into an Excel spreadsheet, and, you know, these little kind of maps that they would create. Like, they were able to figure out, you know, the chromosome structure and generally where on a chromosome they thought maybe these particular traits were. But the world exploded in the nineties when we started getting into sequencing, so we could actually look, base by base, at the composition of chromosomes. And in the early days it was incredibly tedious and expensive. It cost like ten billion dollars to do the Human Genome Project. Now we can do the whole thing in like probably a couple of weeks and a few thousand dollars. It's just absurd how far we've come. But in the early days, there were, you know, these maps as they were creating them, and you needed a way to visualize them. So he created a software called Generic Genome Browser that would just load up these data.
And in the early days, they were still very small data sets, but you could visualize them in a web browser. You could look at these images to see where all these features were on the chromosomes. And so he brought all this together to create these websites for what are called model organisms. So things that people study, you know, have these communities around like say drosophila, which is a fruit fly, worms, nematodes. There in plants, there was a very, very small plant called Arabidopsis, a phthalocress. and it's very very easy to grow and grow growth chamber grows very quickly it's it's very understood as small genome you it it was a it would you know there these these mod these organisms that people study because they're it's easy to mutate them they have short life cycles you can look at how the how a mutation changes uh... yeast obviously people study that it's very you know it's a extremely single-celled organism uh... and it's it's a it's fairly easy to grow in a lab and study. And so I learned all this. I had no idea. The last biology class I had was in high school. Actually, I was just visiting with a high school friend a couple weeks ago when I went home. And we both talked about this teacher, this high school teacher, biology teacher. We hated her. She was a terrible teacher. And it was like that. I dropped her class because I hated it so much. And that left a bad taste in my mouth for biology. I'm like, I don't ever want to do biology. uh... which is really a game you need when i think about teaching i think at least i don't ever want to be that kind of teacher right that like turned someone off from from programming forever so it was it was it was really wild that i ended up in this this this bio informatics so uh... you know for the longest time biology had no need for computer science because the data sets were very small and and they were just doing what's called wet lab stuff so you know uh... 
you know actually in a laboratory aggored, you know, in pet petri dishes and all that kind of stuff. But then once sequencing came along, we started having data. And it started off as a drip, a little bit of data that would take months to get. And then we started getting sequencers that would automate that stuff and start creating a lot more data. And I remember I started my early genetic maps that I was working with had like 30 features on a chromosome. And I... kind of blew my mind when this thing called a fingerprint contig came along and now all of a sudden I have 300 contig, you know, features on a contig. And I'm like, oh wow, that's like 10 times greater. And then we started getting bigger maps that where it was like 3000. I mean, it just kept growing by orders of magnitude, right? And all of a sudden I'm just swimming. I'm trying to cram this data into a MySQL database and I've got 50 million rows and like. I'm just starting to scratch the surface. And I'm like, I was just like, it was blowing on my mind how much more data I was getting every year, year after year. But so with the advent of sequencing technologies, biology has become this huge data science. And has a huge need for people who are trained to computer science. And so you get this thing called bioinformatics, which sounds much more, which sounds cooler in French bioinformatique. because informatique in French means computer science. So bioinformatics is the combination, and usually what you'll find is someone like Lincoln who is trained as a biologist who learned programming, or someone like me who, it's pretty loose to say I was trained in computer science, but I came from computer science and then learned the biology. Nowadays you have young whippersnappers going to school and studying both, right? Like, their but like when i was in high school this was an even a field right now i graduated nineteen ninety so there was no such thing as bioinformatics that they grew in the nineties uh... 
and we can basically help create the field so now you have people who like all by from access to cool feel they go and they study like a double major in biology and computer science and they kind of bridge the gap themselves you know through their learning sometimes you actually have bioinformatics programs at some place where they they're they're formally saying here's the biology and here's how the computer science and here's how they influence each other but a lot of times you just you're learning each each field individually and you kind of synthesize it yourself so that's i had no idea what bioinformatics was i fell into it i got the job in the lab because i was a web developer because i was a pro guy and uh... and i do databases and so uh... lincoln asked me to create this thing kind of like a genome browser. Genome browsers generally show you horizontally a small section of a chromosome. He wanted to see vertically whole chromosomes laid out next to each other with large scale rearrangements. So you'll see this in chromosomes where, like you probably know this, when the mom and dad get together, there's a bunch of recombination of, you know, to create. So there's a mixing. Like you get your mom's nose and your dad's eyes. That's how that happens. And so you get this genetic rearrangements all the time in plants. And so he wanted to be able to see at a high level, this section of this chromosome got split between these two other chromosomes and got returned upside down. And all these things happen. And I was like, it was the hardest thing I'd ever done in my life. I couldn't believe what he was asking me to do. And the problem was, is that he could have done everything he was asking me to do 10 times better and five times faster. but he was a busy guy and he just needed some, he needed me to do it. And so I was just like, you know, it can be difficult working for like certified geniuses. That's what I'm gonna say. So it was extremely hard. 
I worked my tail off for many years just to get my head above water. And then after like, I got comfortable with the domain and the comfort is still. I'm not I barely comfortable. I mean, it's such a deep field that I always, always have imposter syndrome. And I think that's a And I talked about this with my current boss, like having been a self taught programmer, having been a self taught biologist kind of person. I just feel like there. I have always felt there are huge gaps in my understanding and I always even to this day after 25 years of programming and publishing three books. I still have imposter syndrome. So But that's how I got into this and you get to a point where you're like, oh, I'm actually contributing to this field now. I don't want to go back to writing customer order systems and working in just basic industry. So I decided to stick with this. I got my master's degree while I was working, master of science while I was working at the University of Arizona. That really helped to fill in, to backfill those gaps. I had never had a statistics class. And so, you know, in science, we talk about p-values all the time. And I was like, I have no idea what a p-value is. And so now I do. It's really cool. So that's how I've gotten to where I am now. But even in my role now, like, I don't do research. I help researchers who have a question. And I implement that for them. But I'm constantly going back to a researcher and saying, now, what did you mean by this? When you, you know, how do you want this transformed? I still have a very, very basic understanding of the data that I work with.
 
Michael_Berk:
What would you say are the three most valuable skills for someone in your role? Someone who works with a researcher then builds technical solutions for that researcher.
 
 Ken Youens_Clark:
You know, there was a guy who I worked with who was a far better programmer than me. But one of the things that he always insisted on was a spec document before he would start writing. And I get it. I understand. And actually, I tried to get a job teaching at the University of Arizona in their software engineering program. And one of the things they dinged me on was the fact that I never brought up a spec document. Like, where does that figure into how you would teach? I'm like, you know what? In my career, I've basically never had a spec document. Not really. I think one of the most valuable things that I have relied on in my career is prototyping quickly. So somebody says, here's something I want. And it's generally hand-waving, vague, maybe an email, a lot of times a phone call. Here's kind of what I think I want. Here's some input files. Okay, I will implement something that does, you know, the MVP, the minimum viable product. What is the simplest, quickest thing that I can deliver that does the first step? And then iterate from there. Actually, there's some interesting people I follow on Twitter, and somebody was just like, stop trying to come up with software estimates. It's a losing proposition. I didn't get deep into what they were suggesting, but I think it was basically iterating. Like, try to create something that does something. Now go from there. And so I would say, number one, that's probably what has made me successful, is that I'm willing to just go at it. And if I write something that's not even in the ballpark, okay, throw it away. I spent two days on that. It's not a big deal. I'm not going to go six months down a rabbit hole and then come back with something that doesn't work, right? I'm just going to go back to the person who has the domain knowledge and I'm going to say, here's what I've got. Let me show it to you.
A lot of times with documentation, with pictures, with graphs, and then we can figure out, am I on the right track? And also just being able to communicate with people, however they're comfortable. That could be email. In my current situation, my boss is in Dublin, Ireland, and we were working with this company that was in Australia. So we were literally spanning the globe, trying to figure out communication. I would talk to my boss, he would talk to the customer and come back to me. I never spoke to the customer directly, so I'm getting all this through an intermediary. But whether that's chat or email or a phone call or a video call, it's just being able to talk to somebody and hear what they're trying to do. I think it comes down to storytelling, really. Going back to my roots in literature, there's a story you're trying to tell. And we talk about this a lot with data in science. Like, okay, we have this data set. One of the best classes that I took in my master's essentially taught me how to read a scientific paper. And in learning how to read a paper, we started with the graphs. And then I realized that's where you start writing a scientific paper a lot of times: you're trying to create a picture that is the story that you're telling, and then you fill it in with words. So, what is the purpose of this thing that we're trying to create? And it's kind of almost like therapy, right? You think this is what your problem is, but not really. Actually, a personal story, and I think this is important as a general point. I was really struggling with mental health for several years.
It was difficult. I had become convinced that I was depressed, like I had depression. So I finally sought out a psychiatrist and said, I think I'm depressed. And he said, describe what you're going through, and I did. And he came back and said, because of this, your personality type, I think this pill, it was an SSRI, a selective serotonin reuptake inhibitor, I think this will help. And it did help. And so, just as a side note, I think a lot of us in this field are probably somewhere, maybe, on the autism spectrum, maybe struggle with, you know, intimacy, working with people. And so if anyone listening to this is having problems, please reach out and talk to somebody. But, you know, sometimes you think the problem is one thing, and somebody can help you see that it's actually a different problem and that there's a different solution than you had seen before. So sometimes somebody may come to you and they're like, I want this program written, and you're like, what is the problem you're trying to solve? And then, when you understand that problem, when you understand the story, maybe you realize, oh, there's a much simpler thing. Like, oh, we can just put this into a database, and that's an SQL query, right? You know, I work with somebody who's not an SQL person, and so they do a lot of stuff with pandas and data frames and stuff like that. And I look at this code and I'm like, that's really, really complex. I think it's much, much simpler if we just load it into a database and use SQL. So the first one was prototyping quickly and using that to refine what people are looking for. The other is communication, like, try to figure out what is the thing that's trying to be solved here. Third, I don't know, I just talk a lot. Is that useful? I mean, I have no problem expressing myself. And maybe that's rare in this field.
You know, I was not very comfortable when I first got up in front of a classroom. And this is an odd thing about me. You certainly wouldn't know, but I am actually an ordained minister. And the reason is because my sister-in-law was getting married several years ago and she asked me to perform her ceremony. So I did an internet ordination. You know, it's nothing serious, but I performed her wedding. And I was very nervous. I had never done that. I had never done any acting, but I'd always kind of been enamored of it. And so I did the performance of the ceremony, and the people at the bed and breakfast where it happened said, hey, you were terrific, would you like to be, you know, a person who does this at our place? And I was like, I don't know, I guess, sure. I approached it like it was a role, like it was something I was acting. And so I kind of forced myself, when I got into the classroom, to just go, okay, I'm pretending like I'm a person who knows what the hell they're doing, who knows how to teach, which I did not at all. There was so much to learn with teaching. And so maybe it's the willingness to put myself in very uncomfortable situations. Like, honestly, going to work for Cold Spring Harbor and learning biology and doing that first piece of software for Lincoln, that was huge. I mean, I can't tell you how stupid I felt. And I have been willing to make myself feel extremely stupid over and over again, because I'm just willing to essentially get in over my head and just trust that I can swim.
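 
The point in this answer about loading data into a database and using SQL can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the table name `expr`, the column names, and the sample values are hypothetical and do not come from the episode. It only uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical rows: (sample, gene, expression value). Invented for illustration.
ROWS = [
    ("s1", "BRCA1", 12.5),
    ("s1", "TP53", 3.0),
    ("s2", "BRCA1", 7.5),
    ("s2", "TP53", 9.0),
]


def mean_by_gene(rows: List[Tuple[str, str, float]]) -> List[Tuple[str, float]]:
    """Load rows into an in-memory SQLite table and average values per gene."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE expr (sample TEXT, gene TEXT, value REAL)")
        conn.executemany("INSERT INTO expr VALUES (?, ?, ?)", rows)
        # The whole "analysis" is one declarative query.
        query = "SELECT gene, AVG(value) FROM expr GROUP BY gene ORDER BY gene"
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Calling `mean_by_gene(ROWS)` returns one `(gene, mean)` pair per gene. Whether SQL or a data frame is simpler depends on the problem; this sketch only shows how little code the database route can take once the question is clear.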
 
Michael_Berk:
Yeah, I could not agree more on the getting-out-of-your-comfort-zone comments. You mentioned imposter syndrome, and I think that if you don't have imposter syndrome, you're not going hard enough. You should be surrounded by people that make you realize that you have so much to learn. Of course, if you're the type of person that likes a steady job where you can coast, all, like, more power to you. But I often find that if I am not feeling like the dumbest person in the room at least the majority of the time, I get bored and it's really hard to maintain interest. And also, the fastest way to grow is to surround yourself with talent and with really skilled people that know what they're doing. I'll never forget, on my first internship I had a call with my boss, like an intro call, and he was like, oh, what have you worked on in the past? And I was like, oh, I've coded in Python and R, I use numpy a lot, mispronouncing numpy. He just slightly corrected me, and I will never forget the feeling of embarrassment. And I just didn't know, because I had always read docs. I'd never said the word numpy before. So being able to get out of your comfort zone is an incredibly valuable skill, especially, as you said, for people who tend to be more introverted and technically minded and maybe less social. If you're willing to step out of that stereotype, you really do bring a lot more to the game than a lot of your colleagues.
 
 Ken Youens_Clark:
Yeah. I, uh...
 
Michael_Berk:
And... sorry, go ahead.
 
 Ken Youens_Clark:
Oh, no, I'm sorry. No, go ahead.
 
Michael_Berk:
I was just gonna recap the other two. It sounded like prototyping is a very valuable skill that I am trying to work on as well. It sounds like the main value add from prototyping that you mentioned is that you don't waste time. Is that correct?
 
 Ken Youens_Clark:
Waste as little time as possible. So, you know, I work probably in Python like 90% of the time and have for several years. There was a point at which I was like, okay, Perl had its day, but scientific computing is pretty much dominated by Python. So I moved all my tooling over to Python. It took me a while to figure out how, because I had a style in Perl and I had to figure out what my style was in Python. I actually have a PyPI module called new.py. You can install it with new-py, and that's how I start every single command line program, which is 100% of what I do in Python: command line programming. And so I use that to start, and it always has the same structure. You know, I use argparse and then I do stuff with the arguments. So the argparse is to me kind of like the shield, right? It's the interface to the command line, and it makes sure I'm getting some decent inputs to my program. So I have that. And then I say, okay, the first thing I need is this input file and maybe this variable, and I write that and I do something with it. Maybe I write some tests, or maybe I just kind of eyeball it, like, okay, this seems to work. And then I add the next thing. I don't write hundreds of lines of code and then run it. I mean, I write something, I add two or three lines of code, and I run it and I look at it. And this is how I try to teach my students too. You know, it's amazing to me how you will get a new programmer who will sit there and write 50 lines of code and at no point syntax check or just see if it runs or use a code formatter to clean it up. I mean, they just write a bunch of stuff and then they have 16 errors to fix. So try to get to where you're just constantly iterating. You're running it, you're checking it. Oh, Makefiles. Oh my God.
If anyone hasn't explored just using Makefiles to automate, like, okay, I've got this complicated command line structure with all these arguments, just put that into a Makefile, call it run, put it at the top, and then you just type make, and it runs your program with all those things. And you look at it. And that's how I automate my testing too: make test, and it runs all these tests for me. And so, yeah, you just come up with something that uses just a portion of the inputs and does something, then add the next feature, then add the next feature, and constantly go back and see, are you on the right track? But I would never spend more than a couple of days working on something before I would go back and ask my boss if I'm on the right track.
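 
The structure described in this answer, argparse as a "shield" in front of a small program that grows a few lines at a time, might look something like the sketch below. It is not the output of new.py and not code from the episode; the program (counting lines in a file), the `--min-len` option, and the names `Args`, `get_args`, and `count_lines` are all hypothetical choices made for illustration.

```python
#!/usr/bin/env python3
"""Illustrative skeleton (hypothetical): count the lines in a file."""

import argparse
from typing import Iterable, List, NamedTuple, Optional


class Args(NamedTuple):
    """Validated command-line arguments."""
    file: str
    min_len: int


def get_args(argv: Optional[List[str]] = None) -> Args:
    """Parse and validate arguments so the rest of the program gets sane inputs."""
    parser = argparse.ArgumentParser(description="Count lines in a file")
    parser.add_argument("file", metavar="FILE", help="Input file")
    parser.add_argument("-m", "--min-len", type=int, default=0, metavar="INT",
                        help="Only count lines at least this long")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    if args.min_len < 0:
        parser.error(f'--min-len "{args.min_len}" must be >= 0')
    return Args(args.file, args.min_len)


def count_lines(lines: Iterable[str], min_len: int = 0) -> int:
    """First small feature: count lines whose length is at least min_len."""
    return sum(1 for line in lines if len(line.rstrip("\n")) >= min_len)


def main(argv: Optional[List[str]] = None) -> None:
    """Wire the arguments to the one feature written so far."""
    args = get_args(argv)
    with open(args.file, encoding="utf-8") as fh:
        print(count_lines(fh, args.min_len))

# In a real script, finish with an `if __name__ == "__main__":` guard calling main().
```

Because `get_args` accepts an explicit argument list, each new feature can be checked right away with a tiny test or a one-line call, which matches the write-a-little, run-a-little loop described above. A `run` or `test` target in a Makefile would then only need to wrap the full command line.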
 
Michael_Berk:
Yeah, and for a lot of people, rabbit holes are fun, so having the discipline is very admirable. I've gotten stuck down week-long rabbit holes and then looked up and realized I'd done absolutely nothing and just wasted a bunch of time. So kudos to you.
 
 Ken Youens_Clark:
Yeah, it wasn't wasteful. You probably learned something. Maybe, if nothing else, you learned not to spend a week before you raise your head.
 
Michael_Berk:
Oh yeah, fair. Yeah, we'll go with that. Cool, so that was point number one. And then finally, we talked about the third point, but point number two was communicating, and it seemed like the angle for communication from your perspective is to find the most efficient path to value, and to determine value, because often a lot of the stakeholders that you're working with don't know exactly what they want or even what is out there in terms of the solution space. So if you can communicate, read between the lines, find their problem, and then use your knowledge to try to solve that, that's a really valuable skill. Did I interpret that
 
 Ken Youens_Clark:
Yeah,
 
Michael_Berk:
correctly?
 
 Ken Youens_Clark:
yeah, I would say that's it. Yeah, it is. It is amazing, actually, even in the bioinformatics space. So when I was in academia, I was really working with researchers, principal investigators, people who write grants, and their grad students or even their undergrads sometimes. And so the scale of problems that I was working on was kind of small. And now I'm working globally with companies all across the world, of all different sizes, from huge mega pharmaceutical companies to small startups with maybe a dozen scientists. And they have a lab technique, but they have no idea computationally what's available. They're like, okay, we've got this data, and this is what we want to discover from it, but we haven't read any papers, and we don't know what any of the tools are. And you're just like, oh my God, I'm not here to do your science for you. But still, I mean, I am here to hold your hand, to give you an idea. So you're like, okay, so what is it you think you want? And it's my job, especially in a pre-sales role, to throw something together, show them how it works on the platform, and say, okay, here's the input file you gave me, here's the output that I can generate for you, are you ready to sign a contract? And we can go from there. We have professional services, people with PhDs, who can put the stuff together better than I can. But yeah, I deal with that a lot. People who really just have no idea what is out there in the solution space, who certainly don't know a thing about cloud computing or databases or any of that. They have something and they think they want something, and it's like, can you help me get there? And you're like, for the right price, yeah, I can. You know?
 
Michael_Berk:
Yeah, it sounds like a really fascinating role and a mix of a lot of different skills. Cool, so we are running a little bit low on time, so I figured we could start wrapping up. Any concluding thoughts, any books you wanted to mention or re-mention, any ways that people can get in contact with you if they so choose?
 
 Ken Youens_Clark:
You know, it's funny, I put my email address in all my books, like just in the source code. It's everywhere. kyclark@gmail.com. I don't care, and not that many people write me, so it's not a big deal. No one's ever abused it. I'm on Twitter, kycl4rk. I was just a little bit late getting to Twitter, so I had to substitute the 4 for the a. But I'm kyclark basically everywhere else that I could get it. You know, I've written three books, and they're really all about testing. And I think that no matter what you do, no matter what language you're working in, you should consider writing tests. I think it helps you break big problems into small problems and have confidence in your code. And test your tests. I was actually having a discussion on Twitter the other day, and they were complaining because they were trying to add something to an existing code base that had a bunch of tests, but a bunch of the tests were wrong. They kind of sucked. And, like, code has to be refactored, and so do tests. I mean, it's constant. They co-evolve with each other. If you're not testing, honestly, what are you doing? How do you know that your program works? It's unconscionable, especially in the scientific realm, to have no tests at all. So test. Oh my God, please just write a test. I was working with this one guy at the university, and it took like six months. I was like, dude, write a test. And I kept showing him with his code. I'm like, here, I wrote a test. This function doesn't do what you think it does. And he was like, oh. I'm like, doesn't this bother you? Oh my God. So, I don't know, I guess that would be the thing I would say.
And if you don't know how to write tests, I've written three books, two in Python and one in Rust, that will show you how to get started with that. But you can take the ideas to any other language. And honestly, if you were to read one of the Python books and the Rust book, you would see I'm doing the exact same thing. I just cargo-culted those ideas over. I'm like, okay, how do I get started parsing command line arguments using a standard module, and how do I test? And that's essentially like 80% of what I do.
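 
As a concrete instance of "just write a test", here is a minimal sketch in Python. The function `reverse_complement` and its test are hypothetical examples chosen because they fit the bioinformatics setting; they are not taken from any of the books mentioned. The test is written in the plain-assert style that a runner such as pytest can collect, though it also works when called directly.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return dna.translate(complement)[::-1]


def test_reverse_complement() -> None:
    """Small, fast checks that pin down what the function is supposed to do."""
    assert reverse_complement("") == ""
    assert reverse_complement("A") == "T"
    assert reverse_complement("AAGC") == "GCTT"
    # Applying the function twice should give back the original sequence.
    assert reverse_complement(reverse_complement("GATTACA")) == "GATTACA"
```

A test this small already answers the question raised above, "how do you know that your program works?", for one function, and it keeps answering it each time the code is refactored.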
 
Michael_Berk:
You hear that people? Test your code. Please.
 
 Ken Youens_Clark:
For the love of God, test your code.
 
Michael_Berk:
For the love of God. Alright, well, this has been really fun. It's been Michael Berk and Ken Youens-Clark. And we will see you guys next time. Thanks.
 
 Ken Youens_Clark:
Thank you.