133 RR Threading with Emily Stolfo
The Rogues discuss threading with Emily Stolfo.
Special Guests:
Emily Stolfo
Transcript
CHUCK:
Hello.
EMILY:
Hello.
[Electronic buzzing]
EMILY:
Is my audio looking okay? [Electronic buzzing]
JAMES:
Yes, I believe it’s fine. [Laughter]
CHUCK:
That’s got to be on purpose. [Electronic buzzing]
CHUCK:
[Laughs]
DAVID:
Okay, so that was three cheeseburgers. One with no pickle, no onion, one with extra cheese, and…Okay, please move forward to the next window.
[Hosting and bandwidth provided by the Blue Box Group. Check them out at BlueBox.net.]
[This podcast is sponsored by New Relic. To track and optimize your application performance, go to RubyRogues.com/NewRelic.]
[This episode is sponsored by Code Climate. Raise the visibility of quality within your team with Code Climate and start shipping better code faster. Try it free at RubyRogues.com/CodeClimate.]
[This episode is sponsored by SendGrid, the leader in transactional email and email deliverability. SendGrid helps eliminate the cost and complexity of owning and maintaining your own email infrastructure by handling ISP monitoring, DKIM, SPF, feedback loops, whitelabeling, link customization and more. If you’d rather focus on your business than on scaling your email infrastructure, then visit www.SendGrid.com.]
CHUCK:
Hey everybody and welcome to episode 133 of the Ruby Rogues podcast. This week on our panel, we have Avdi Grimm.
AVDI:
Hello from Pennsylvania.
CHUCK:
James Edward Gray.
JAMES:
I can’t tell if I’m mostly dead or all dead.
CHUCK:
David Brady.
DAVID:
Some programmers, when faced with a problem, say, “I know, I’ll use threads.” Now problems, [hey, they too have]. [Laughter]
CHUCK:
I’m Charles Max Wood from DevChat.TV. And I [makes electronic buzzing sounds]. We also have a special guest this week and that’s Emily Stolfo.
EMILY:
Hello.
CHUCK:
Emily, do you want to introduce yourself really quickly?
EMILY:
Sure. I work at MongoDB in New York. I co-maintain the Ruby driver to the database. And I do a lot of community development in the Ruby community. And I also teach at Columbia University as an interim professor. I teach Ruby on Rails, but I’m actually not teaching this semester. So, I gave myself a little bit of a break, but I’m teaching again next semester.
CHUCK:
Cool.
JAMES:
So Emily, that actually came up in Josh’s introduction to your talk at GoGaRuCo and it never got an answer. You teach Rails at a college? That’s great.
EMILY:
Yeah, exactly. So, this class at Columbia is the only class that they teach that touches on the topic of web development in a practical, hands-on way. And so, I went to Columbia and I kept in contact with a bunch of professors. And one emailed me about a year and a half ago and asked me if I’d be open to teaching this class. And I saw it as an opportunity for me to deepen my own knowledge in Rails and also to sort of like give back to Columbia in a non-financial way.
So, I ended up taking the opportunity and it was, at first, really challenging because I never really taught anything of that kind of depth or length before. But it turned out to be super fun and I’ve done it three semesters now. And I saw it as a really cool way to bring in my professional experience to the undergraduate curriculum because I know that when I was an undergraduate at Columbia, I would have loved to have had the same sort of opportunity.
Yeah, so there is this class, Ruby on Rails, at Columbia. But it is a half a semester long. It’s only six weeks. It meets once a week for two hours. I try to hold office hours as often as possible, mostly on Sunday afternoons and try to make myself as available as possible to the students. So, there is some work to be done in terms of incorporating more content and just really devoting more course time to some of the things that people are actually doing when they graduate from Columbia with a Computer Science degree.
JAMES:
I think that’s super cool.
CHUCK:
Yeah, teaching college students real-world skills? I’m not sure what to do with that.
EMILY:
Yeah, it’s really interesting.
DAVID:
Yeah, I have no response. [Laughter]
EMILY:
It’s really interesting. The focus at Columbia is very much on theory and on Computer Science with a capital C and S, which I think is really good. You can’t expect academia to change with every trend in the industry or really just revamp its curriculum every year, just because of resources. And you can’t bring in new professors every year. And the foundations also are really important, I think. At the same time, I noticed a considerable lack of the sort of resourcefulness that we have as hackers or whatever you want to call us, like learning things on our own, forging ahead, and not being afraid to make mistakes. And so, I tried to focus more on that in my class because those skills are sort of a way of thinking that will last beyond just the six weeks that I can talk about MVC.
JAMES:
Yeah, that’s a great point. The mindset in general, right?
EMILY:
Yeah, exactly. I call them hacker habits. I try to teach them. I gave a talk at RailsConf last year on hacker habits.
DAVID:
I think there’s a good overlap there though, right? Because in Rails, we talk to hackers and we basically say your Active Record queries are going really, really slow because you’re loading up all the users, or you’re loading up a user and then you go load up all of their friends or whatever. And we call this the n+1 error. And if you just replace it with an include statement and load up all of their associations, it’s now just a single query. And when I talk to people with kind of an autodidactic hacker background, I explain to them that it’s n+1 and it’s this query and all these queries and you can reduce it to a single query. And if somebody’s got a CS background, I can just point to the two things and say, this is O(n) and this is O(k). And they go, “Oh, got it.”
CHUCK:
Yeah.
EMILY:
Yeah, exactly. They’re complementary, like having a foundation and the resourcefulness or hacker mindset I think. They’re not mutually exclusive. And I think both academia and the hacker culture, all of these sort of hacker school structures that we have, whether it’s online learning or in person, like Flatiron School or General Assembly, they could learn from each other because they really do complement each other, these two knowledges.
JAMES:
I’m very sure you’re right. We asked to have you on today. You gave this cool talk at GoGaRuCo about threading, which I thought was kind of brave, to tackle one of the hardest of the hard topics in like 25 minutes.
EMILY:
Yeah. I had an experience debugging a concurrency issue earlier this year and I thought that it was really hard. And I wanted to tell everybody what I learned to try to prevent that happening to anybody else. And also, I saw a talk by José Valim at RubyKaigi and he was talking about thread semantics in the different Ruby implementations. And he basically said at the end, one of the things that we have to do is educate ourselves more on this topic. And I’m not at all an expert but I did have that experience and I thought, “Hey, I might as well just talk about it and see if I could somehow prevent some of these horrible experiences for other people.”
JAMES:
Yeah, I think we’ve kind of all run into threading at some point and stumbled through the problem and then spent a lot of time after that trying to forget that it ever happened, right?
EMILY:
Exactly. [Chuckles]
AVDI:
So, threads work just great in Ruby, right?
JAMES:
[Chuckles]
CHUCK:
Well and the answer to all of your threading problems is to add a global VM lock, right?
EMILY:
Right, exactly. [Chuckles] So, when we were debugging this issue in the driver like over a year ago, we realized that we didn’t have the problem if we just put a mutex around the entire write and read off of a socket. And we’re like, “Oh that’s great. Let’s just keep the mutex there.” [Chuckles] And then, we’re like, “That’s so ridiculous. We can’t do that.” Yeah, it’s not fun. But I learned a lot in the process.
Basically the issue was stemming from the fact that the threading model in JRuby changed over the last two years. And when we first started supporting JRuby in our driver, we had made some assumptions based on the semantics, but then the semantics changed and we needed to update our code to realize or sort of protect ourselves from the fact that JRuby does use native threads as opposed to Ruby 1.8 which uses one single thread and Ruby 1.9 which uses multiple native threads but you can’t actually have parallelism because of the global interpreter lock.
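Emily's stopgap fix can be sketched roughly like this. The class and method names below are illustrative, not the actual Mongo Ruby driver's API; the point is just that one Mutex makes the write-then-read exchange atomic, so two threads can't interleave their requests and read each other's replies.

```ruby
# Illustrative sketch, not the real driver: a shared connection where each
# request must write a message and then read its reply off the same socket.
class FakeSocket
  attr_reader :log

  def initialize
    @log = []
  end

  def write(msg)
    @log << [:write, msg]
  end

  def read
    reply = [:read, @log.last[1]]   # echo back whatever was last written
    @log << reply
    reply
  end
end

class Connection
  def initialize(socket)
    @socket = socket
    @lock = Mutex.new
  end

  # Without the mutex, thread A could write, thread B could write, and then
  # each could read the other's reply. Locking the whole exchange fixes it.
  def request(msg)
    @lock.synchronize do
      @socket.write(msg)
      @socket.read
    end
  end
end
```

The proper fix Emily alludes to is finer-grained locking; this coarse version is correct but serializes all traffic on the connection.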
JAMES:
Alright, so let’s talk about those three models. What’s the difference in how they work?
EMILY:
Sure. So, I used this metaphor in my talk at GoGaRuCo that’s available online, if you want to watch it. So, I came up with this idea that threads are like music. And I do go into detail about this metaphor. If you equate the conductor with the GIL, instruments with threads, and notes with things that actually need to be executed, you could say that Ruby 1.8 has one conductor, one instrument that can be used and multiple notes that need to be played. So, the conductor will say to this one instrument, “You play this note, this note, this note, this note.” So, those notes are played serially, one after another. You can’t actually have a chord and you can only make use of one single instrument.
Then Ruby 1.9, so we still refer to it as MRI but it’s YARV. It’s different from 1.8. You can use multiple instruments but you still can only play one after another. So, you still have a conductor that says, “This instrument plays this note, then this note, then this note, then this note.” You still have one note being played one after another and no sense of a chord being played. However, you are able to use these different native threads or different instruments.
And then JRuby, you can actually have chords because there is no GIL. There is no conductor. So, multiple instruments, multiple notes being played at the same time by these different instruments. So, if we were to draw that back to threading, JRuby uses native threads that can actually be executing things at the same time on different cores on your computer, where 1.9 can use different cores but two threads are not actually executing code in parallel.
JAMES:
Yeah. I think this is one of the most confusing things in all of Ruby and you just did a great job of cutting through it. But I think it was a long time before I even understood the ‘why’ to all of this. Like, why would you have threads if you can never execute two things at once? And the answer to that is that in MRI, there are lots of cases where it knows that it’s safe to let go of the lock for a little while and put a thread away and go to something else. And so, the most common of those cases being I/O.
So, if you have to fetch 10 web pages or something, and you could make a request, read the page, make a request, read the page, et cetera. Or you could thread them all, in which case the requests will be made in parallel. But that biggest portion of making a request over the internet is typically sitting there and waiting on some data to come back from the network. And in those cases, MRI is very smart and realizes it’s about to block on an I/O call, so it suspends that thread and goes to another one. And so, it gets all those requests fired off and then as they start to come back in, it regrabs the global interpreter lock and begins executing that code again.
So, you can do some things in parallel because MRI is intelligent and lets go of it. But if you’re just doing straight up computationally intensive running the CPU-type stuff, then you cannot because the GIL, like you said, won’t allow more than one instrument to play at a time. Very confusing, I think.
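James's ten-web-pages example can be shown without touching the network: `sleep` blocks in C and releases the GIL just like a socket read does, so even on MRI the "requests" overlap when threaded. The `fetch` method here is a stand-in, not a real HTTP call.

```ruby
require 'benchmark'

# Stand-in for an HTTP request: MRI releases the GIL while a thread sleeps,
# just as it does while a thread waits on a socket.
def fetch(page)
  sleep 0.2
  "body of page #{page}"
end

serial = Benchmark.realtime do
  5.times { |i| fetch(i) }                          # ~1.0s: each wait happens in turn
end

threaded = Benchmark.realtime do
  5.times.map { |i| Thread.new { fetch(i) } }.each(&:join)
end                                                 # ~0.2s: the waits overlap
```

Swap `sleep` for a CPU-bound loop and the threaded version stops winning on MRI, which is exactly the GIL distinction being described.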
EMILY:
Right, exactly, yeah. Any kind of I/O is pretty obvious to the interpreter so it will release the lock, for example when two files are being opened at the same time. Yeah. But as you said, it wouldn’t make sense to only use one thread for that, because if you’re blocking on I/O, then everything else would be waiting for you to wait on an external resource.
JAMES:
And the reasoning that the MRI core team gives for keeping the global interpreter lock, especially now that the [inaudible] implementation’s kind of moving away from it, is that it simplifies the writing of C extensions. And there are actually a lot of C extensions in the Ruby ecosystem that rely on that simplified model, where they don’t have to worry about thread safety because Ruby won’t allow them to get into situations where that’s a problem. And so, that’s why the GIL has been kept. There’s definitely debate about that. In something like JRuby, they had to rewrite the C extensions anyway because it didn’t have the C API, so it wasn’t a super big price to pay to make them threadsafe as they were doing that. I think that’s why it was easier for them to move away from it.
DAVID:
Didn’t Matz actually come out and say, “Threading is hard, so I’m going to leave it to other implementers to handle it.” I want to say somebody on the show mentioned that.
JAMES:
Matz has said things like he’s more of a UNIX process guy or something like that, I think.
DAVID:
Yeah.
JAMES:
And yeah, I think you’re right that basically he said that for people who need those kinds of optimizations, JRuby’s a good fit. But it’s interesting. It is a hard problem and it’s complicated. And so, now the situation is that when you write ‘Thread.new do … end’, if it’s running on MRI or running on JRuby then you could be playing with very different semantics.
DAVID:
Yeah.
EMILY:
Right. On the subject of Matz, I also think that, in keeping with his philosophy or approach or, I guess, motivation for creating the Ruby language, he wanted to focus on developer productivity and making things as simple as possible for the developer. And so, it sort of makes sense that in light of that, he wanted to reduce the complexity of writing thread-safe code for developers. And so, of course, the GIL has a side effect that code runs slower, because you don’t actually have parallel execution in MRI. But it seems like Matz sort of favors developer simplicity and productivity over having -- he thinks that hardware will get faster and that’s sort of not really an issue.
DAVID:
Yeah.
JAMES:
Yeah, that’s a good point. So in your talk, you talked about these different semantics. JRuby behaves differently from MRI and all that. And so, you kind of came to this cool conclusion which was ‘there’s no such thing as thread-safe code’. Do you want to talk about that?
EMILY:
‘There is no such thing as thread-safe Ruby code’. And that’s what I wanted to emphasize: people need to understand, or recognize and know a little bit about, the fact that Ruby has different implementations. So, you can look at some Ruby code and run it and have it sort of run in a different way depending on what implementation you’ve chosen to run it with. So, if you write a particular line of code and you’re running it on JRuby, you could potentially run into a concurrency issue that you wouldn’t run into with that same exact code, unchanged, running on MRI.
And that’s the fundamental problem with saying there is such a thing as thread-safe Ruby code because what are you talking about? Which implementation of the Ruby language? Because they do have different semantics. And once you understand the difference in semantics, you can write thread-safe JRuby code or thread-safe MRI code. I wanted to emphasize the fact that you need to be specific with what implementation you’re talking about.
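A tiny illustration of Emily's point. The unsynchronized increment below is a read-modify-write: on JRuby it routinely loses updates under load, while on MRI the GIL makes losses rarer but does not rule them out, since `+=` is several VM operations. Only the mutexed version is thread-safe on every implementation.

```ruby
class Counter
  attr_reader :value

  def initialize
    @value = 0
    @lock = Mutex.new
  end

  # Racy everywhere: read @value, add one, write it back. Two threads can
  # read the same old value, and one increment silently vanishes.
  def unsafe_increment
    @value += 1
  end

  # Safe on MRI, JRuby, and Rubinius alike: the mutex serializes the
  # read-modify-write.
  def safe_increment
    @lock.synchronize { @value += 1 }
  end
end

counter = Counter.new
10.times.map { Thread.new { 1_000.times { counter.safe_increment } } }
        .each(&:join)
# With safe_increment the total is exact on any implementation; with
# unsafe_increment it might not be, depending on which Ruby you run.
```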
JAMES:
Yeah, Avdi has been doing this series of screencasts in Ruby Tapas on threading. And it’s just amazing how deep the rabbit hole goes. How many episodes have you done on that, Avdi? It’s a ton.
AVDI:
It’s a lot. It’s a lot. I’m nowhere near done.
JAMES:
Yeah, it’s crazy.
EMILY:
What has your approach been? Have you broken it down by concurrency primitives?
AVDI:
It’s been mostly by primitives. I started out, I think, with mutexes and condition variables. I sort of structured a lot of it around, “Let’s rewrite Ruby’s Queue class.” So the Queue class is -- you could almost say it’s one of Ruby’s threading primitives. It’s one of the few thread-safe data structures that Ruby has.
EMILY:
I think it’s actually the only one, the only thread-safe data structure.
AVDI:
Yeah. Yeah, pretty much, just Queue and SizedQueue. And it’s really there specifically for the purpose of threading. So yeah, I was basically structuring it as, “Let’s rewrite the Queue class with some extra features.” And in order to do that, you have to tackle mutexes. You have to tackle condition variables in order to notify waiting threads when there’s a new object available on the queue or if you’re doing a SizedQueue, to notify other threads when there’s space available and so on and so forth. And it’s been interesting.
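Condensed to its skeleton, the rebuild Avdi describes looks something like this. Ruby's real Queue handles more edge cases; this is just the shape of the mutex-plus-condition-variable pattern.

```ruby
class TinyQueue
  def initialize
    @items = []
    @lock  = Mutex.new
    @ready = ConditionVariable.new
  end

  def push(obj)
    @lock.synchronize do
      @items << obj
      @ready.signal              # wake one thread blocked in pop
    end
  end

  def pop
    @lock.synchronize do
      # A loop, not an `if`: a waiter can wake up and find another consumer
      # already took the item, so it must re-check and wait again.
      @ready.wait(@lock) while @items.empty?
      @items.shift
    end
  end
end
```

`ConditionVariable#wait` atomically releases the mutex while sleeping and re-acquires it on wakeup, which is exactly the kind of subtlety that makes these primitives easy to get wrong by hand.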
I did an episode called ‘Threads are hard’ where I basically just went over the code I’d written in the previous couple of episodes and showed two or three ways that it was badly broken. And it was badly broken not because I wanted to demonstrate how you can screw things up in threads, but because I just totally missed those cases and had to do some stress testing, some code inspection, before I realized what was going on, especially since there’s some really weird stuff in the way Ruby’s condition variables behave. And I don’t know. It’s a mess.
DAVID:
Avdi, isn’t it true that you started recording all of the threading Tapas at the same time and you’re just waiting for them to finish? [Laughter]
JAMES:
It’s crazy because when you do get into threads and you’re messing with these subtle interactions, it can be really hard to, first of all, get it into a failure state to show the problem and then, second of all, debug it. Emily showed that really well in her talk. She opened with a demo and she just [inaudible] some code with ten threads and it ran fine in MRI and JRuby. But then as soon as she cranked it up to 200 threads, got enough interaction going on, then it fell apart in JRuby or whatever. And it’s just knowing how to get back to that case and how to get some visibility internally on these things that are happening separately so that you can understand how it fails. I find that very difficult.
AVDI:
Yeah, and one of the terrible things about debugging threads is it’s really prone to Heisenbugs.
CHUCK:
Heisenbugs?
AVDI:
So, a Heisenbug -- actually, I’ll just explain what happens and that will be a perfectly good definition of it. So, you’re debugging a multi-threaded application and you don’t really have good tools for running a proper debugger on it and switching between the threads in the debugger. So you think, “Okay, I’m just going to do some puts here. And in each of the threads, I’m going to do lots of puts. And I’m going to output the state of the thread and I’ll just look at the output and that will tell me what’s going on.” And then, you run it and the bug doesn’t happen anymore.
CHUCK:
[Laughs]
AVDI:
And the bug doesn’t happen anymore because puts is I/O and puts is implemented in C. And so, as soon as it hits that puts, Ruby has an opportunity to schedule a thread. And so, you’ve actually changed the scheduling of your threads by inserting those puts lines. So, that’s a Heisenbug. That is a bug whose behavior changes when you try to observe it.
DAVID:
A wonderful example that I got shown very, very early on was a C program where I had this tight loop of ten things. And all the loop did was seed the random number generator with the current time and then pick a random number and put it out to the console. And the class was teaching us step through things in a debugger first. So, I’m stepping through the debugger step, step, step, step. And I get a different random number every single time. And then the instructor says, “Okay, cool. Now, run it.” And I ran it and I got the same ten numbers. And then he just kind of smiled and said, “Now, figure out why.”
And it took me forever to figure out that srand, or not srand, time was returning seconds. And the act of stepping through the debugger, the act of the computer waiting for me to step through the debugger, was causing the inexorable flow of time to increment the time call. So, we’d get a different srand value each time. And in the debugger, you would get ten different numbers because it took you ten different seconds to get there. But when you ran it straight through, it ran in under a second. And you were home free.
And you see Heisenbugs like this when you’re playing with threading a lot because you start probing them and then you find out that they’re time-dependent. And if you stop one thread but not the other, the time-dependent stuff now works because it’s had enough time to operate.
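David's C story translates directly to Ruby. The clock is injected as a lambda here so both behaviors can be reproduced deterministically; in the real bug it was `time()` itself.

```ruby
# Reseeding the RNG with the current second inside a tight loop. Run at
# full speed, every pass sees the same second, so every pass reseeds with
# the same value and produces the same "random" number. Stepped through in
# a debugger, seconds tick by between passes and the numbers look random.
def seeded_samples(count, clock = -> { Time.now.to_i })
  count.times.map do
    srand(clock.call)
    rand(100)
  end
end

# Simulating the full-speed run: the clock never moves, so every sample
# comes out identical -- the "same ten numbers" David saw.
frozen_second = -> { 1_234 }
samples = seeded_samples(10, frozen_second)
```

The fix, then as now, is to seed once at startup (or not at all, since modern Rubies seed themselves) rather than inside the loop.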
EMILY:
And the other major thing is that often when you have a concurrency issue, it’s manifested later on.
JAMES:
Right.
EMILY:
Like [crosstalk] for example. And that makes it so difficult to even pinpoint the part in your code where you’re having concurrency issues.
JAMES:
It’s not like that part generally fails, right?
EMILY:
Right, exactly. And so, there’s no stack trace. You can’t do anything besides try in your head to recreate situations.
JAMES:
One thing that can help with that a little: if a thread is actually dying, Ruby has this concept called abort on exception and it’s off by default. So by default, if you have an exception somewhere in your not-main thread, then that thread just silently dies. And then later, when you join or try to get the value of that thread or whatever, you get the exception at that point, which is, as Emily’s saying, typically far away from the problem that actually happened. You can either set it globally, doing ‘Thread.abort_on_exception = true’, or I believe you can also set it on individual threads. You can do ‘Thread.current.abort_on_exception = true’ I think [inaudible]. Anyway, once this flag has been flipped, then if a thread dies with an exception, it will bring the entire Ruby interpreter down right then with a stack trace. Sometimes, that can help a little bit, but it’s a complicated problem.
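In code, the default behavior James describes looks like this; the flag-setting lines are shown as comments, and uncommenting them makes the raise fatal immediately instead.

```ruby
# Quiet the warning newer Rubies print for dead threads; the underlying
# behavior is the same either way.
Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)

t = Thread.new { raise ArgumentError, "boom" }
sleep 0.1          # the thread is already dead, but the program sailed on

message =
  begin
    t.value        # the stored exception is re-raised here, at a known point
    nil
  rescue ArgumentError => e
    e.message
  end

# To fail fast instead, flip the flag James mentions:
#   Thread.abort_on_exception = true   # globally, or
#   t.abort_on_exception = true        # on one thread object
# and a thread dying with an exception then brings down the whole
# interpreter with a stack trace at the moment it raises.
```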
CHUCK:
I’ve used that in conjunction with conditionally raising an exception somewhere in the thread, to try and guess at what might be causing the problem, if it’s not something that makes the thread blow up and die. Because sometimes, you just get funky behavior instead of an actual problem or exception in the thread.
JAMES:
Right.
AVDI:
Yeah, one of the things that I have been hammering on in this series that I’ve been doing is, if at all possible, find ways to make your thread crash and die visibly when things are going wrong, at least when you’re running in development mode rather than allowing them to proceed and silently do bad things.
CHUCK:
Is there a reason why you wouldn’t want that to be default, end the program on exception? I remember somebody wrote a book and in the book he said that if you get an exception, it means you want the program to die.
JAMES:
Yeah, I don’t think…
AVDI:
Yes and no.
JAMES:
Yeah. If you take something like the Erlang actor model, the idea there is if something’s going to fail, fine just let it fail and then replace it with something healthy or things like that. And as far as catching exceptions, if you want to actually catch it and handle it, then it has to be at a known point, right? Whereas threads are going through in parallel, you don’t know where it’s going to be at any given time. Whereas you know if Ruby silently swallows that exception but then gives it back to you when you call join or call for the value or whatever, you can do your error handling at that time. At a known point, you’ll be able to do it. So, I think I understand why it is the way it is by default.
AVDI:
Yeah, in Elixir or Erlang, if you can monitor other processes that you spawn or even other processes that you don’t spawn, they’re not going to slit the throat of the monitoring process at any point. The monitoring process has to actually do a receive and note that it received the exit message and do something about that, which is kind of the way you want to handle things.
EMILY:
Yeah, and if you’re using a lot of threads to do I/O, by definition you’re dependent on another thing. And if there’s something wrong with that other thing, you don’t necessarily always want to kill your own process because of that or you want to give maybe some other thread a chance to do the same thing or try again. So for example, if you’re connecting to a database and one thread gets some kind of network error, it doesn’t necessarily mean that your database is down. Maybe that one thread’s one of 200 and encountered the one little network blip. But that doesn’t mean the whole thing should stop.
AVDI:
Right.
CHUCK:
Right. So, are there any other tips that you guys have for debugging multi-threaded processes? Because my next question is how do you make your program thread-safe but if you have more stuff on debugging, that would be something we should probably keep going on?
JAMES:
Emily, you had a good part in your talk where you talked about the different kinds of shared data and which ones are bad. You want to go into that a little?
EMILY:
Yeah. As a general rule, I think I took this off of the JRuby readme or something. But in general, avoid concurrency if you can, because as soon as you engage with it, it becomes really complex. And if you really have to, then if you really need to use shared global data, try to make that shared global data immutable. But if you can’t avoid that, then your shared global data is going to be mutable and you’ll need to familiarize yourself with patterns or approaches or concurrency primitives, which we’ve gone over a couple of, like condition variables and mutexes and queues; those are mostly the ones that you would use in Ruby.
So, other languages like Java for example have a ton more that you can use and a lot of other options. I guess if you have the checklist and you get to the point where you have mutable global things and not just variables -- things like the AST, for example, if you’re dynamically defining methods, and then constants and class variables and methods -- you’re going to have to familiarize yourself again with these patterns or concurrency primitives in order to work around some of those potential issues in your code.
JAMES:
So, I guess one good answer for debugging is if you at all can, use immutable shared state, because it makes it easier to debug. If the state can’t be changing out from underneath you, then that simplifies everything drastically.
EMILY:
If you have shared immutable state, then you’re not going to have any concurrency issues because you’re never going to have threads fighting for the same resource or trying to -- you’re never going to have a race condition because this thing is not changing over time. So as a rule, that’s probably the golden state. You don’t even have to think about any of this stuff.
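Emily's "golden state" in miniature: freezing shared data turns would-be race conditions into immediate, loud errors. Note that `freeze` is shallow, so fully immutable data would need its nested values frozen too.

```ruby
# Shared across every thread, but frozen: reads never race, and any
# attempted write raises instead of corrupting state. The config values
# here are just made-up placeholders.
SHARED_CONFIG = { host: "db.example.com", port: 27_017 }.freeze

outcomes = 8.times.map do
  Thread.new do
    SHARED_CONFIG[:host]          # concurrent reads are always safe
    begin
      SHARED_CONFIG[:port] = 1    # any write attempt fails loudly
      :mutated
    rescue RuntimeError           # FrozenError, a RuntimeError, on 2.5+
      :rejected
    end
  end
end.map(&:value)
```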
AVDI:
Yeah. And definitely, if you’re communicating information between threads, try to do it as messages over queues rather than as ‘I update this global thing’ and then somehow, tell another thread to read that global thing.
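A minimal version of that advice, using the Queue class discussed earlier: workers take jobs off one queue and push answers onto another, and no mutable state is shared at all. Squaring numbers is obviously just a placeholder job.

```ruby
jobs    = Queue.new
results = Queue.new

workers = 3.times.map do
  Thread.new do
    # Each message is owned by exactly one thread at a time, so there is
    # nothing to lock.
    while (job = jobs.pop) != :done
      results << job * job
    end
  end
end

10.times { |i| jobs << i }
3.times  { jobs << :done }   # one stop message per worker
workers.each(&:join)
```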
EMILY:
I think Avdi, you did this actually as an example. You can build your own queue if you want using condition variables or your own actor model. But Queue is there available to you in Ruby. And as we’ve said before, it’s the only data structure that’s thread-safe in Ruby. So in MRI you might as well use it instead of trying to futz around with condition variables and mutexes yourself. I didn’t actually talk about it so much in my presentation because I wanted -- as an educational thing, it’s more informative to really look at mutexes and condition variables but the Queue is there available for you if you need to use it.
AVDI:
Right. Yeah, definitely. If you’re writing an application, there probably should not be any use of condition variable or even mutex in your code. That’s really the stuff that belongs in middleware.
EMILY:
Yeah, exactly. I read this, I’m sure you guys have read it also. It’s by Jesse Storimer. He has this great little book on threading in Ruby. He makes the same point that -- actually, he referred to some other book in there. I don’t remember it off the top of my head. But he said that you have to think about keeping parts of your code, not having them go between high level and low level too much, that you should stick to one level. And I wish I could remember the reference. But yeah, so this is a case…
JAMES:
I think we’re talking about the same level of abstraction principle?
AVDI:
Abstraction, yeah.
EMILY:
Yeah, exactly. So yeah, if you apply the same principle to using Mutex and ConditionVariable, usually those kinds of things should be in the gems that you’re using or should be in a particular area of your code. You should be writing, I don’t know. I don’t see how you would be using this so much at the application level.
DAVID:
Yeah. Can I challenge that a little bit?
EMILY:
Sure.
DAVID:
I don’t want to challenge so much as I want to ask because I recognize that I’m the weirdo on the show. But I came from a C, C++ background, did a lot of threading, a lot of UI work. And threading, how do I say this without sounding asinine? Threading isn’t hard for me. And I apologize in advance for sounding like an arrogant jerk. The reason threading isn’t hard for me is because I spent ten years hating it and living with it. You start to learn the patterns, like “You can’t do it that way. That’s going to bite you on the butt.” You just start to learn what the patterns for the thread stuff are. And as a result, I got very fluent with threading and it started to become one of my go-to things that I would reach for in my toolkit.
So, when I came to Ruby and I started writing interactive UI stuff, threading was the very first thing that I reached for. So for example, if you’re writing an IRC client, you’ve got to have one thread that’s going to take I/O from the user and another thread that’s taking I/O from the IRC server and a third thread, because those two threads are both going to block. You need a third thread to update the user interface.
AVDI:
Well, that’s one way to do it. [Electronic buzzing]
JAMES:
Uh oh. David’s audio just died again.
CHUCK:
Yeah.
JAMES:
You’re back to robot.
CHUCK:
[Laughs]
EMILY:
Really, you couldn’t have timed that better if you tried, Avdi [laughs].
CHUCK:
[Laughs]
EMILY:
You’re like, “I’m going to make an argument and now you can’t anymore.” [Laughs]
AVDI:
[Laughs]
JAMES:
That was amazing. Let this be a lesson to you. If you disagree with Avdi, he can shut you down.
EMILY: [Laughs]
AVDI:
And turn you into a Cyberman.
JAMES:
[Chuckles]
CHUCK:
Yeah.
DAVID:
Am I back?
CHUCK:
Yup.
JAMES:
Yes, you are.
DAVID:
Okay. So yeah, as I was saying, I’m so good at threading that my computer can’t even keep up. Oh wait, now I see in the back channel that Avdi did this to me. [Laughter]
DAVID:
So Avdi, you said that’s one way of doing it. And this is where I will switch from challenging to asking because like I said, threads are one of my go-to’s that I would just automatically reach for. How would you deal with an application that has two things that need to happen simultaneously that the Ruby libraries are going to block on and a third thing that needs to be happening at the same time? Or am I just wrong in assuming that those things still block?
AVDI:
Well, I might well use threads. But the point that I was making is that I might use threads for that or I might use a reactor model for that and just a single thread.
DAVID:
Oh, okay.
JAMES:
So, the reactor model being basically an event loop, right?
AVDI:
More or less. Usually, it’s built around select or something like UNIX select.
JAMES:
Right.
AVDI:
So, you have some sort of operating system primitive which will allow you to say, “Go to sleep but wait for any of these events to occur. When they occur, wake me up.” And the reactor model is a wrapper around that that says “Okay.” And then when I get woken up, I’m going to trigger various callbacks based on what event or events occurred while I was sleeping. You can basically structure. You can have all those events, the network I/O, the user I/O, and user interface stuff. You can have that all handled out of that single reactor.
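What Avdi describes can be roughly sketched in Ruby with IO.select as the wake-me-up primitive. The class and handler names below are invented for illustration, not taken from any particular library.

```ruby
# A tiny reactor: register a callback per IO, then loop on IO.select.
class TinyReactor
  def initialize
    @handlers = {}   # io => callback
    @running  = false
  end

  def register(io, &callback)
    @handlers[io] = callback
  end

  def stop
    @running = false
  end

  def run
    @running = true
    while @running && !@handlers.empty?
      # Go to sleep until one of the registered IOs has an event.
      ready, = IO.select(@handlers.keys, nil, nil, 0.1)
      next unless ready
      ready.each { |io| @handlers[io].call(io) }
    end
  end
end

# Demo: two pipes stand in for "network I/O" and "user I/O",
# both handled by the same single-threaded loop.
r1, w1 = IO.pipe
r2, w2 = IO.pipe
events = []
reactor = TinyReactor.new
reactor.register(r1) { |io| events << [:net,  io.gets.chomp]; reactor.stop }
reactor.register(r2) { |io| events << [:user, io.gets.chomp] }

w2.puts "typed command"
w1.puts "server line"
reactor.run
```

Both event sources feed the one loop; no second thread is ever created.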
JAMES:
It might be worth talking about this just a tiny bit. I used to do a lot with event-based programming because I used to work on MUDs for fun in my spare time. And in a MUD, you typically can’t do processes or threads because -- well, threads are an option on modern hardware, but if you’re on low [mem] servers, sometimes they weren’t a very big option because they were so heavyweight. And with so many people connected to the server at once, it would take a large number of resources to launch two threads per user, one for their input and one for their output.
AVDI:
And just to clarify here, the kind of resources we’re talking about is a stack that’s allocated for that thread.
JAMES:
Right. So to get around that, you would do what Avdi is saying and basically build a loop where you would run through every single player that’s currently connected to the MUD in the loop. And it would be like ‘any data waiting on your socket?’ and you’d get a ‘yes/no’. And then if it was yes, you could pull whatever data was available. Otherwise, you ran through the loop again and you got everybody that you could and process what you could.
But one of the big downsides to something like that is that you have things that run in the main event loop, or potentially outside of the main event loop sometimes, which can be okay. But then if they need to interact with the main event loop, there obviously has to be some kind of synchronization to get that event back into the main event loop. And alternately, if everything is running in the main event loop, then nothing can take a long period of time, because if you do something lengthy, you’ve just stopped the world. And now, nobody’s paying attention to those sockets anymore or anything like that, right?
AVDI:
Right. So, what you usually see in reactor models, you actually see a hybrid model where the reactor is paired with a thread pool. And so, you have the concept of being able to defer an operation, which then goes off and does its things in its own little thread. But hopefully nothing that interferes with anything the reactor’s doing. And then when that finishes, that pushes an event back into the reactor loop.
DAVID:
Yeah. There was a class I was taking 10, 15 years ago and the professor was explaining this concept. Modern CPUs will actually support this at the opcode level. You can register what’s called a deferred interrupt handler. And when you’re in a really tight loop, like a device driver loop, your interrupt handler is getting radio frequency off of a wire or something that’s coming in real-time, which means it cannot fail to be ready for the next waveform coming off. It has to deal with the thing and then move on to the next thing and get ready. This loop has to be really tight.
And so, all you really have time for -- it’s like we would be given a budget of 24 clock cycles or something like this. It’s like Getting Things Done: if you can do it in two minutes, just do it; otherwise, schedule it. If you can do it in 24 clock cycles, you can just do it. But if you can’t, then what you have to use your 24 clock cycles for is to build up a thing saying, “This was the thing I needed to work on,” and you push it on a thread pool somewhere as basically an interrupt handler that gets deferred.
And my favorite part about that class was that I had kind of an ADD moment. And he defined deferred interrupt handlers when I was spaced out. And so, I came back in and he was talking about deferred interrupt handlers and I had missed the definition. So, I’m taking notes furiously trying to infer it from background context. Finally, halfway through, he’d gone on about another five, ten minutes, I raised my hand and I said, “Well, what does deferred interrupt handler mean again?” And he looked at me and he said, “I’ll tell you after class,” and he went right on with the lecture. And everybody laughed. It was beautiful. [Laughter]
AVDI:
I just want to say one other thing about reactors before we move on because somebody might be wondering, “Okay, well why not use that?” And one of the other reasons not to use that, apart from the complexity of dealing with deferring long-running operations, is just that it can be a much harder model to hold in your head. The nice thing about a thread is that every thread is basically its own program and it behaves exactly like any program you’ve ever written. It has instructions that run in sequence. Maybe it has some loops in there. It runs from beginning to end. It has its stack. And it’s the way you’ve always programmed.
Whereas with reactor models, event models, you have these callbacks that are just occurring. And so, rather than being able to say, “Okay, here’s my main, and then the program proceeds from there to do this and it proceeds from there to do this,” you instead have, “Okay, well the processing may pop out of the core and go into this callback and then jump back down into the rabbit hole.” And then the processing will pop out of another rabbit hole and run around a bit and then jump back down into the rabbit hole. It’s a little bit harder to keep track of in your head.
JAMES:
And you have to deal with things like buffering. “Do I have any data from this socket?” “Yeah, I do.” “Okay, let me take the whole thing.” “Do I have a full line of data so I can interpret this command?” “Oh no, not yet. No newlines.” So I need to save this chunk somewhere and then next time I get data from that socket, I’ll combine that chunk I saved with this new chunk and see if I have a full line yet, right?
AVDI:
Right.
JAMES:
It’s crazy.
AVDI:
Whereas in the threaded model, that would just be a loop.
JAMES:
Right.
DAVID:
Yeah. IRC servers are notorious for giving you three and a half messages in a single read cycle.
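The chunk-buffering James describes might look like this as a standalone sketch (not tied to any IRC library; the class name is invented):

```ruby
# Accumulates raw socket reads and yields only complete lines.
class LineBuffer
  def initialize
    @buffer = +""
  end

  # Feed a raw chunk; returns whatever complete lines are now available.
  # Partial trailing data stays in the buffer for the next read.
  def feed(chunk)
    @buffer << chunk
    lines = []
    while (newline = @buffer.index("\n"))
      lines << @buffer.slice!(0..newline).chomp
    end
    lines
  end
end

buf    = LineBuffer.new
first  = buf.feed("PING :ser")             # no newline yet; chunk is saved
second = buf.feed("ver\r\nNICK emily\r\n") # two complete lines arrive at once
```

In the threaded model this bookkeeping collapses into a plain `socket.gets` loop, which is exactly Avdi’s point below.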
AVDI:
[Chuckles] I want to hear about testing threads more.
JAMES:
Yeah, let’s talk about that.
DAVID:
Yeah. Emily? Thoughts?
EMILY:
Yeah, sure. So, we talked about debugging threading issues. But you can avoid some of the debugging if you test correctly. First of all, you should test with more than one thread, obviously. But you may need to test with more than ten threads as well. For example, in my talk, I did the demo and I showed that you could run some code with ten threads and not see an issue. But if you increase that, there’s a threshold at which you do actually have threads fighting for the same things or trying to update the same data at the same time. And you’ll see a runtime error. So, you have to make sure that first of all, you’re testing with enough threads. And what enough means, you have to sort of figure out. And you need to make sure that you’re testing the right scenarios in your code.
So, I talked a little bit about the Ruby driver in my presentation, that the Ruby driver has a shared global state or a shared global view of what the replica set is. So in MongoDB, you can have a replica set, which is one primary and multiple secondaries, and you can only do certain operations on the primary, like writes for example. You can’t do them on a secondary. So the driver itself needs to know which nodes are of which type, so which nodes to send certain operations to. And because you have that, you can’t have each thread have a different view of the replica set, so there’s no choice. You have to have this shared global state. And it is mutable because you can have a situation in which one of your nodes goes down and then another one’s elected as the primary. And if that node comes back online, it’s then a secondary.
So given that you have this global shared state, the driver needs to test at some point. Or in our code, we need to test at some point that we do have [inaudible] and we do have threads at some point getting socket exceptions and it can’t talk to this particular node at a certain point and needs to refresh the connection. And that’s one example of a particular situation which doesn’t happen very often in practice, in reality. But most of our concurrency issues are going to happen during this time period because that’s where the shared global state is being updated.
And so, I tried to emphasize in my talk that you need to identify these areas in your code and really pound them with a lot of threads and really recreate these scenarios, because for the most part, you’re not going to have concurrency issues. It’s these particular edge cases, this particular aligning of the stars, that will make your code fail. So, there’s that. And then that’s also hard. Coordinating a bunch of threads, however many you realize that you need to use in order to expose some of these concurrency issues, is really difficult.
So, there’s one thing that you can do. It’s called the rendezvous pattern which is where you have threads execute something and then you pause them in a particular point right before you would potentially have one of these scenarios or one of these situations. So, you pause all of them and that’s like the rendezvous point. And then you release them and you sort of watch them interact in the scenario. So, that’s exactly what we do in some of our tests. We spin up a lot of threads then we pause them and kill a node and then release them and let them continue on. Or have one thread do a find on the collection and then watch the connection get refreshed and all of the threads get new sockets. And make sure that that whole process happens correctly and there are no concurrency bugs that happen.
And so, as a general rule really, use enough threads. Make sure you identify these scenarios that could be problematic. Test them. Use patterns if you need more precision. And as a final thing, make sure that if people are using your code with JRuby that you test on JRuby because as we discussed before, Ruby implementations have different semantics. So just because something works in MRI doesn’t necessarily mean it’ll work in Rubinius or JRuby.
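A rough sketch of that rendezvous pattern: spin the threads up, park them all at a common latch, then release them at once so they hit the shared state simultaneously. The shared array here is just a stand-in for whatever state you are actually stressing; with the mutex removed, this is the shape of test that flushes out races, especially on JRuby or Rubinius.

```ruby
THREAD_COUNT = 50

latch  = Queue.new   # the rendezvous point: threads block on pop
mutex  = Mutex.new
shared = []          # stand-in for the driver's shared global state

workers = THREAD_COUNT.times.map do |i|
  Thread.new do
    latch.pop        # every thread parks here until released
    mutex.synchronize { shared << i }
  end
end

# Wait until every worker is parked at the rendezvous point...
sleep 0.01 until workers.all? { |t| t.status == "sleep" }

# ...then release them all at once and let them pile in together.
THREAD_COUNT.times { latch << :go }
workers.each(&:join)
```

The same latch trick works for Emily’s scenario of pausing threads, killing a node, and then letting them all hit the refreshed connection together.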
AVDI:
That’s very true. MRI lets you cheat a lot more. It’s the way I look at it. It insulates you a bit because the GIL is wrapped around a lot of core operations, particularly like the built-in Hash and Array collections and stuff like that. You can do a lot of stuff that seems thread-safe and then switch over to JRuby or Rubinius and discover that, “Whoops, that’s not thread-safe.”
EMILY:
Right. Yeah, and it’s really important to realize that early on.
AVDI:
Yeah.
EMILY:
And make sure you test on different implementations.
AVDI:
And it’s important to point out here that I think as far as all the implementers are concerned, it’s not that JRuby and Rubinius are wrong there, the fact that MRI currently wraps the GIL around all those operations is an implementation detail. And if you do depend on that, then you’re probably going to get disappointed in the future versions of MRI.
CHUCK:
Well, I have a question related to this and that is that if I’m not using Thread.new or anything like that, am I still in any danger of running into issues with threads? In other words, if I’m not explicitly calling out threads, are there situations that I can get into where this is still an issue?
JAMES:
Not…
AVDI:
Maybe [chuckles].
EMILY:
Yes. [Inaudible] or using your code. Like if you develop a gem. So the Ruby driver, we don’t spin up threads to do anything really. We provide this gem for people to use. People use the client, the Ruby driver. They instantiate a client and then they spin up threads and make requests using their threads. And then we have connection pooling, so we have one socket per thread. So, our code needs to be able to handle different threads having different sockets and needs to use thread locals and stuff like that. So, it depends on what you’re coding. Just because you’re not creating threads yourself doesn’t mean that whatever code you’re writing won’t be running in a context in which there are multiple threads using some global variable.
AVDI:
Right. Here’s an example from an episode I’m actually writing right now. Let’s say you are responsible for an object, some kind of object relational mapper or something like that. And we’ll call it SchmactiveRecord. And you’ve got…
CHUCK:
[Laughs]
JAMES:
I want that. [Laughter]
CHUCK:
Me too.
AVDI:
You’ll enjoy this episode. You’ve got a connection. You’ve got the concept. So, you’ve got all your little record classes like Person or Account or Product or whatever. And then all of them, they all refer to a database connection collaborator in order to put things into the database and pull things out of the database. And so, all of them somewhere deep in their CRUD operations, they say something like self.class.connection. And then .query or .transaction or something like that. They’re all referring to this class-wide, system-wide effectively, database connection. And that’s just stored as a class instance variable on the SchmactiveRecord base.
So, SchmactiveRecord::Base.connection = new connection. I build a new connection to my sample database instead of the original database and I switch the global connection over to that. And then, I’ll just tell all those records to write themselves again and now it will be written to the new database, which is all fine as long as that’s in a rake task that you run as its own process or something.
But then somebody gets the bright idea, “Let’s make it possible to put a button on the admin site that will dump them into the sample database.” And so, that becomes part of the main cycle of your app. And let’s say you’re running in a multi-threaded server. Every time somebody hits that button in the admin section, there’s this brief period where the whole site switches over to writing to the samples database instead of the main database. So sorry that was such a long example, but that’s the kind of thing that you’ve got to think about if you’re writing a gem and your gem uses global state.
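A toy version of Avdi’s (made-up) SchmactiveRecord shows the shape of the hazard: the connection lives in one class-level slot, so reassigning it redirects every write from every thread, not just the task that did the reassigning. Everything below is invented for illustration.

```ruby
# Fake connection that just records what was written to it.
FakeConnection = Struct.new(:name, :writes) do
  def query(sql)
    writes << sql
  end
end

class SchmactiveRecord
  class << self
    attr_accessor :connection   # one class-wide slot: global mutable state
  end

  def save
    # Every record, on every thread, reaches for the same global here.
    SchmactiveRecord.connection.query("INSERT #{self.class}")
  end
end

class Person < SchmactiveRecord; end

main_db   = FakeConnection.new("main",   [])
sample_db = FakeConnection.new("sample", [])

SchmactiveRecord.connection = main_db
Person.new.save                          # goes to the main database

SchmactiveRecord.connection = sample_db  # the "admin button" fires...
Person.new.save                          # ...and this unrelated save now
                                         # lands in the sample database
```

In a single-process rake task the switch is harmless; in a multi-threaded server, every in-flight request sees it.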
EMILY:
Yeah, or like a shared resource. It’s a perfect example where you have this shared resource that isn’t necessarily exposed to you at the top level but it’s something you need to think about.
AVDI:
Yeah.
JAMES:
Also, the code that you write today may not have any thread issues in it currently because of the way you’re running it. But maybe you switch to a threaded web server, or maybe you switch to using Sidekiq to manage your background job queue or something, which is very thread-friendly. And then all of a sudden, these issues can start to pop up out of nowhere.
AVDI:
Yeah.
EMILY:
Yeah, exactly. Like Puma for example, you could potentially have issues if you’re using the Puma server.
AVDI:
So, as somebody who’s not writing specifically threaded code but you’re worried about people using it in a threaded context, I think the biggest thing to keep track of is you don’t necessarily have to worry about throwing Mutexes around everything, just don’t use global mutable state. If you have configuration that your classes use, enable it to be passed down into them rather than having them fetch their configuration from some global accessor.
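One way to sketch that advice (all names invented): the first client reads a global accessor on every call, so a mutation anywhere affects it; the second is handed its configuration at construction and is immune.

```ruby
# Global-accessor style: every instance sees whatever the global says *now*.
module Config
  class << self
    attr_accessor :api_host
  end
end

class GlobalClient
  def endpoint
    "https://#{Config.api_host}/v1"
  end
end

# Injected style: configuration is passed down and pinned per instance.
class InjectedClient
  def initialize(api_host:)
    @api_host = api_host
  end

  def endpoint
    "https://#{@api_host}/v1"
  end
end

Config.api_host = "prod.example.com"
global_client   = GlobalClient.new
injected_client = InjectedClient.new(api_host: "prod.example.com")

Config.api_host = "staging.example.com"  # some other thread mutates the global
```

After the mutation, only the global-accessor client silently changes behavior; the injected one keeps the configuration it was given.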
JAMES:
Wait, global variables are bad?
AVDI:
I know. I know it seems crazy.
CHUCK:
Only if you’re threading, right?
AVDI:
[Laughs] Yeah, only if you’re threading. Otherwise, you should just put all of your data in global variables.
JAMES:
Awesome.
CHUCK:
Welcome to JavaScript.
DAVID:
Or if you’re testing.
AVDI:
[Laughs]
JAMES:
Avdi said I could. That’s all I need. Alright. So, what else on threading? Any other [inaudible] burning points we need to go over?
EMILY:
I don’t know. I think people make fun of threading in Ruby a lot. But I think if you think about that GIL as the feature, you won’t feel so bad about it. [Chuckles] The GIL is a feature. It’s there for a reason. It’s not to constrain us or restrict us. It’s really just to help us out. And I do think that it will become less and less of a, if you want to call it a limit, it’ll become less and less of a limitation in the future. Even though it’s not really the focus of where Ruby is going, I would be surprised if we didn’t continue to make progress on not making the GIL slow down our code so much, in MRI, of course.
Yeah. And like I was talking about before, some of the things it does for you are downright amazing. It still blows my mind that I can just load open-uri and, in an each loop, make a thread for each URL, make requests to different places, and put them all together and then just join on all those threads. And it’ll all work. And those requests will be made in parallel because of the really smart code in Ruby that detects that it’s safe to work around that lock. It’s pretty cool.
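That pattern can be sketched like this. A sleep stands in for the network wait so the example runs offline; with open-uri you would replace the fake `fetch` with `URI.open(url).read`. The URLs are placeholders.

```ruby
URLS = %w[
  https://example.com/a
  https://example.com/b
  https://example.com/c
]

# Stand-in for a network request. Like blocking I/O, sleep releases
# the GIL, so the waits from different threads can overlap.
def fetch(url)
  sleep 0.2
  "body of #{url}"
end

start   = Time.now
threads = URLS.map { |url| Thread.new { fetch(url) } }
bodies  = threads.map(&:value)   # join each thread and collect its result
elapsed = Time.now - start

# Three 0.2s waits overlap, so the total is roughly 0.2s, not 0.6s.
```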
EMILY:
Yeah, and I guess most people are using Ruby for web development and the real bottlenecks in your code are usually I/O bound. And MRI does it smartly as we’ve said. So it’s not like -- we’re not doing these crazy computations all the time in Ruby. If you’re doing that, you’re probably using another language. So, it’s not ridiculous.
JAMES:
The other thing I would say as far as keeping your sanity with threads is really sit down, learn and play with Ruby’s Queue and SizedQueue class in the standard library. If there is anything that will save you, more often than not, from really complicated threading, it’s Queue, in my opinion. Usually you can just, when you run into some case where you have to share the state and push it across threads. Usually you can just make a Queue, a SizedQueue, go ahead and do your threading. And on one end, you’re pushing stuff in and on the other end, you’re pulling stuff out. And if you can make your problem look like that, then that’s definitely the easiest way out.
AVDI:
Yeah. If you just by default use the model of a thread, structure your code in threads that each have their own single input queue and they loop on that queue, it’s hard to go wrong that way.
Because what you have there is basically a very simple actor model.
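That shape, a thread that owns one input queue and loops on it, might look like this minimal sketch (a poor man’s actor; the `:shutdown` sentinel is just one convention for stopping the loop):

```ruby
# A worker that owns one input queue and processes messages in a loop.
class QueueWorker
  def initialize(&handler)
    @inbox  = Queue.new
    @thread = Thread.new do
      while (message = @inbox.pop) != :shutdown
        handler.call(message)
      end
    end
  end

  def send_message(message)
    @inbox << message
  end

  def stop
    @inbox << :shutdown
    @thread.join   # waits for the queue to drain
  end
end

results = Queue.new
worker  = QueueWorker.new { |n| results << n * n }

1.upto(5) { |n| worker.send_message(n) }
worker.stop

squares = []
squares << results.pop until results.empty?
```

All communication goes through queues, so there is no shared mutable state to guard.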
DAVID:
Oh, sure. When it was my idea, it was a bad idea. [Laughter]
CHUCK:
You’re not used to that yet, Dave?
DAVID:
Oh, good point.
AVDI:
I’ll turn you into a Cyberman again. [Laughter]
CHUCK:
He will be upgraded. [Laughter]
AVDI:
And SizedQueue is your friend, too. At first, I was like, “Well, why would I use a SizedQueue over a queue?” But then I realized, “Well, a queue, if something’s broken and a thread is not processing its queue, a queue will just continually receive objects and grow and grow and grow.”
Gigantic memory leak, yeah.
AVDI:
Whereas a SizedQueue will start rippling that problem out to the rest of your code as soon as it can’t accept any more objects.
JAMES:
Agreed.
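The difference in miniature: a Queue absorbs everything a stuck consumer never reads, while a SizedQueue pushes the problem back out to the producer. The non-blocking push is used here only so the demo fails fast instead of hanging.

```ruby
unbounded = Queue.new
bounded   = SizedQueue.new(3)

# A broken consumer: nobody is popping either queue.
10.times { |i| unbounded << i }   # happily grows and grows

overflowed = false
begin
  10.times { |i| bounded.push(i, true) }   # non-blocking push
rescue ThreadError
  overflowed = true   # backpressure surfaces here instead of in memory
end
```

With the default blocking push, the producer would simply stop and wait at item four, which ripples the stall out to the rest of the system instead of leaking memory.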
AVDI:
I just want to say one more thing about testing with respect to threads. And Emily, feel free to jump in on this. It’s my view that you should try -- okay, so you were talking about I guess stress testing primarily. But when it comes to unit testing the logic in your application or in your library, I think it’s a good idea to try very, very hard not to have to test your logic in the context of threads.
JAMES:
Oh, absolutely.
AVDI:
So, do anything you can to pull the logic away, to de-interlace the logic from the threading. Pull that logic out. Have that somewhere separate that knows nothing about threads and have your actual threaded code be the tiniest, tiniest piece that you can stress test separately.
CHUCK:
Yeah, that makes a lot of sense because then you know at least the logic is sound. And then it’s a threading, timing, and global state issue.
AVDI:
For instance, if you’re structuring your code in this cheap actor model style where you’ve got a bunch of threads and each has its own input queue and each has a loop that’s processing stuff that comes off the queue, well, have a library or a function or something that sets that up. But then don’t put your code for processing the events that come out of that queue directly inside that code. Instead, have a separate class which is responsible for handling the messages that come off that queue and you can write a unit test where you pass that class a message and check that it does the right thing. But that class should know nothing about the fact that it’s going to be running in the context of a thread.
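A sketch of that separation (all names invented): the handler is plain Ruby you can unit test with a direct method call, and the thread-and-queue shell around it is the only concurrent part.

```ruby
# Pure logic: testable with a plain method call, no threads anywhere.
class PriceHandler
  def initialize(store)
    @store = store
  end

  def handle(message)
    @store[message[:sku]] = message[:price]
  end
end

# Thin threaded shell: its only jobs are the queue and the loop.
def run_handler_in_thread(handler, inbox)
  Thread.new do
    while (message = inbox.pop) != :shutdown
      handler.handle(message)
    end
  end
end

# "Unit test" of the logic: no thread involved at all.
store = {}
PriceHandler.new(store).handle(sku: "A1", price: 300)

# The same class dropped, unchanged, into its threaded context.
threaded_store = {}
inbox = Queue.new
t = run_handler_in_thread(PriceHandler.new(threaded_store), inbox)
inbox << { sku: "B2", price: 500 }
inbox << :shutdown
t.join
```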
EMILY:
Yeah, that makes a lot of sense. And also, architecturally, it makes it much easier to understand also, and cleaner.
AVDI:
Yeah.
JAMES:
Right.
EMILY:
Even just from a design standpoint.
Right. It’s a lot like a rake task. In a rake task, you don’t write 50 lines of code. You make some objects or whatever and then in the rake task, you just instantiate something and call a method or whatever. Kind of like that. Your thread should be doing the same thing. It should be worrying about the concurrency stuff and then it just instantiates something and calls a method.
DAVID:
That’s actually…
AVDI:
I’ve got to go rewrite some rake tasks.
DAVID:
Yeah. [Laughter]
DAVID:
No, that’s actually a pattern that we’re seeing everywhere, right? Rake tasks should just be this minimal thing that hands stuff off. Your bin directory of your application, all it should do is instantiate the application object and hand off the parameters. And yeah, threading should be its own single responsibility. So have the worker, manager, whatever, that’s all it does is manage workers and threads and the workers are their own things. That’s actually a really good pattern.
CHUCK:
Awesome.
DAVID:
Actually, we’ve been talking about pitfalls. Did we just transition seamlessly into ways to do it and I missed it or did we have specific tips for how to do threading and not get bitten? Did I just miss it?
JAMES:
Yeah. Emily gave several good standpoints and then I talked about the Queue and Avdi talked about separating your tests. I think we did cover that.
DAVID:
Okay.
EMILY:
Well, there are generally, just to summarize two camps, some people think that you should really use concurrency primitives and get to know them like ConditionVariable, Queue, SizedQueue, et cetera. And then, some people think that you should code and use patterns or models and try to get around using concurrency, or avoid using concurrency primitives. And I don’t really fall into either camp. I think it really depends on what you’re trying to build and which is more appropriate.
But you might see two different approaches in general.
DAVID:
Yeah.
JAMES:
That’s a good point, Emily. And we’ve talked about in the past, I think when Charles Nutter was on, about his atomic library, which I think re-implements a lot of standard Ruby data structures with atomic actions.
EMILY:
Yeah.
JAMES:
And so sometimes, that can get you around the needing of these primitives, the concurrency primitives. And that’s another way to solve these problems.
AVDI:
I think it’s a true statement that…
EMILY:
Yeah, there’s a thread_safe gem also that does something similar. It makes Array and Hash thread-safe.
DAVID:
Nice.
AVDI:
I think it’s a truism that anything that you learn about the systems that your code runs on is going to be worthwhile, but it might not be the most important thing for you to learn right now, depending on what you’re doing.
DAVID:
Yeah.
CHUCK:
One question that I have related to all of this. I understand threads. I see some of the benefits of using them. But when is it kind of the obvious time to use them? So for example, I have other situations where I’m feeling some kind of pain so I just reach for a specific solution. What are the pain points that you would hit where you would go, “Oh well, if I threaded this, it would make my life better.”
JAMES:
You’re doing a lot of steps in serial that don’t rely on what came before. So to give kind of a classic example, you fetch some webpage looking for links in it to a certain kind of data and then you do one request to get each of those data points. Once you have that first page, you have all the links you need to fetch. And those individual fetches are not related to each other. You don’t need them to happen in serial order. But if the first page comes up and then you have 10 links in there, that means you have to make 11 requests. So you can make those 11 one right after the other, or you can make the first one and then make those 10 in parallel. It’s going to be a lot faster.
DAVID:
Yeah. There’s a pattern that I see where I reach for threads often, which is when I’ve got a program that needs to do two or three things kind of simultaneously and long-term. And those two or three things are radically different things. So, if I’ve got a thing that needs to process physics for animation and another thing that needs to render, there’s a pipeline there. But there’s also, they’re doing completely different things from each other.
And that’s kind of a pretty old school approach to threading, but I mention it because when I came into high concurrency programming, it was like this huge epiphany that everyone else already had figured out, that the whole point of concurrency is when you have hundreds of little objects doing the same thing. And I’m like, “That had never occurred to me.” Your separate threads should be doing something completely different. And so, that’s something that I look for. Like in the case of my IRC client, one to handle input, one to handle network, and one to handle UI.
JAMES:
Yeah. So, kind of going on what David said there, when events can be triggered in the system that you don’t control, input coming in on the socket or things like that, those are something you’re not really in control of.
CHUCK:
Yeah, that makes sense.
EMILY:
Yeah, anything that’s dependent on another resource. I/O is the biggest one, I think.
DAVID:
Yeah.
JAMES:
If a node in MongoDB could fail.
EMILY:
Exactly. But our nodes never fail. [Chuckles]
JAMES:
That’s right. There you go. [Laughter]
JAMES:
That’s another episode. [Laughter]
DAVID:
We should do a five-episode series on writing fault-tolerant code. That would just be fun.
JAMES:
[Chuckles] Yeah.
DAVID:
So mentally, I had divided the show into things that can bite you on the butt and then things that you can do that are good, just --- what’s the word, proactively, in your code. So, I do have a couple of tips to throw out for writing code that’s fairly thread-safe right out of the box, if you guys want to hear them.
JAMES:
Go for it.
DAVID:
Okay. So, these were my three golden rules that I got through a lot of C programming with. And it absolutely ports to Ruby. And what I would say, before I share these three points, I will say that I’m in both of those camps that Emily talked about. I believe you should know the primitives and really understand how they work, but I also believe that you should try to have as little code as possible inside them. Because inside these critical sections -- sorry, in C, we call them critical sections, but this Mutexed code where the processor is not allowed to thread-switch away, or it’s not allowed to write to your data while you’re in the semaphore or whatever. In those sections, the more stuff that’s going on, the more you have to scrutinize.
So, that’s my first rule is scrutinize any code inside an atomic section. Just really make sure you’re examining. If you flagged a Mutex, really pay attention to every single line of code and ask yourself, “Does it need to be in here and what side effects does it have?”
The most common pattern that I see used, this is the second one, I use this pattern all the time. And we all do this, by the way, just maybe not with threading. If you want to increment a number in the database, we all know that you have to get a write lock on that object in the database and then read it. Because somebody else could modify it after you’ve read it before you get the write lock. So you lock it, you read it, then you can increment it and write it back. And then you release the write lock. This is a common pattern for incrementing something in a thread-safe way. So, just understand that pattern. You’ll use it half the time out of all the patterns that are out there.
And then this last one: while you’re scrutinizing your code -- and it surprises me how many times this bit me in threading code, and actually a couple of times in multi-process code -- if you are in an atomic section and you make a synchronous call to another function, make sure that code cannot ever possibly call back into you, back into your thread, or you’ve basically generated a race condition on yourself. So, those are the three biggest smoking guns that I’ve run into with threading. So hopefully, that helps people.
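David’s second rule, lock-read-modify-write-unlock, is exactly what Ruby’s Mutex#synchronize packages up. A sketch, keeping the critical section as small as his first rule demands:

```ruby
counter = 0
lock    = Mutex.new

threads = 50.times.map do
  Thread.new do
    1_000.times do
      # Lock, read, increment, write, release -- and nothing else
      # happens inside the critical section.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)
```

Without the lock, two threads can both read the same value, both increment it, and both write it back, losing one of the updates, which is the same hazard as the unlocked database increment David describes.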
JAMES:
Good tips.
CHUCK:
Yup.
JAMES:
Alright. Is that it? Can we do some picks?
CHUCK:
I always get in trouble when I ask that. So, I waited ‘til you did. [Laughter]
DAVID:
No Chuck, you’re going to get in trouble after you do your pick.
CHUCK:
[Laughs] Oh, I’ve got some fun ones today. Alright, let’s do the picks then. James, you want to start us off?
JAMES:
Sure. So at the risk of breaking the law, I’m going to pick something I’ve picked before on this show and that is Avdi’s Ruby Tapas. We’ve just sat here and talked about threads for an hour and I think all this is fresh in my head. And it’s kind of amazing because I haven’t been doing a lot of threaded programming lately. And the reason is, I’m watching Ruby Tapas all the time and he just had this huge series on threading. So, there are over 150 episodes now. They cover everything. Threads, I mentioned. Rake, we mentioned on this show. There’s an excellent series on Rake inside of it. It is ridiculous how much you can learn from Ruby Tapas. So, if you are not watching this, you are totally missing out. That’s one pick.
Another pick is a while back I read ‘The Heroku Hacker’s Guide’ and it was interesting. I went into it knowing basically zip about Heroku. And if you’re that person, this is a great book. It just shows you how to get started and what the different parts are and what the concepts behind Heroku are and stuff like that. If you have experience with Heroku, you probably would get nothing out of this. But I didn’t know any of those things and this book is one way to learn them. So, I just thought I’d mention that.
And then finally, for a fun pick, I have to thank Martin Fowler for this one. He put up a list of games that he really enjoys a while back. And I read through that list and I was really surprised at how much we overlapped a lot of the games he thought were great or some of my personal favorites. And he had one on there called ‘The Castles of Burgundy’ which I’ve never played. So, I went and bought a copy of it and my wife and I played it for the first time this weekend. And it is an awesome game. If you like strategy games and resource management, that kind of thing, it’s just really great. It’s easy to get into. It’s really fun. All of the different elements interact with all the other different elements. So, you just sit there thinking about all these different ways you can go through it. And I find it super, super enjoyable. So, thanks to Martin for making me aware of that. Those are my picks.
CHUCK:
Awesome. Avdi, what are your picks?
AVDI:
Today, I have two picks. First of all, I recently set out to find a solution for dealing with support requests for both my books and particularly for Ruby Tapas. I don’t get a ton of support requests but I get a few. And since I manage them along with Mandy, my assistant and also the wonderful editor of this show, we needed a way to keep track of who had responded to what. And so, we weren’t stepping on each other’s toes with support. So, I looked at a bunch of the different support ticketing systems out there and tried a bunch out. And most of them were way, way, way overkill for what I needed. Most of them had an interface that looked kind of like Microsoft Outlook. And a lot of them didn’t have decent mobile interfaces. And really, just tried a bunch out that didn’t seem like what I needed at all.
And then my friend, Larry Marburger, pointed me to something that he’s been using called Apoio. I don’t know if that’s actually how it’s pronounced, but it’s Apo.io. And it’s like if somebody just took the 37Signals design philosophy to support ticketing. It’s an incredibly simple UI. It doesn’t try to look like a mail client. It tries to look more like a series of conversations. And the amazing thing about it is that usually when I find one of these things that’s super stripped down, I find stuff that’s essential that’s missing from it. But we’ve been using it for a while now and it’s got everything that we actually need and nothing else. It’s just a beautifully stripped down yet fully functional support ticketing system. So, for very small teams that don’t have a huge amount of support volume, I think Apoio, or however it’s pronounced, is really great.
DAVID:
That’s probably right. Apoyo with a Y is Spanish for help.
AVDI:
Oh, okay. That makes sense, then. And also, I started to watch the new Dracula series. I don’t know how new it is anymore. But it struck me as something that had all of the ingredients to be absolutely awful. It’s network television. It’s dealing with the Victorian era, which is not historically treated that well by television. And it’s Dracula, and mostly when people deal with an old story character like that these days, they turn it into a complete cartoon. And so, I was very, very surprised to find that it’s actually pretty darn well done. It is not the Dracula you know. It’s a totally re-imagined story with different motivations and different allegiances than you might expect. But the acting’s good, and the dialog is good, and the storyline hasn’t made me hate it yet. But I’ve only watched two episodes, so we’ll see.
CHUCK:
Awesome. David, what are your picks?
DAVID:
So, I’ve been doing a lot of data processing lately and I’ve needed to read and write lots of data that can be exported out to a spreadsheet. And so, there’s actually a library in the standard library of Ruby called CSV.
JAMES:
Never heard of it.
DAVID:
I can’t remember who wrote it. Guy’s a jerk. No actually, [chuckles] for the three people out there who don’t know, our very own and beloved James Edward Gray II wrote the CSV library. And it’s freaking awesome. You can point it at a CSV file and just get back rows. And you can treat them like hashes or like arrays depending on how you access it and whether or not you have headers. It can read them, it can write them. And I reached a point where I needed to read and write from org-mode tables in Emacs, which are pipe separated with padding whitespace, and each line begins and ends with a pipe, so there’s a little bit of text processing magic that I had to do. I found it was far, far easier to just load up the file, strip off the pipes at the beginning and the end, strip out the inter-word padding, then put it in a StringIO buffer and hand it to the CSV library for parsing.
And the fact that James wrote it to be reusable and to take any type of IO object, instead of demanding a filename that it can read from disk, saved me from having to write a temp file out into the temp directory to do that. So, if you’re processing any kind of spreadsheet data or stuff that people need to bring in and out of Excel or Numbers or whatever, check out the CSV library. Every month or so, I see somebody trying to manually hack apart CSV because it seems like it should be so simple and it’s not. There are a lot of edge cases and James solved them so that you wouldn’t have to. So, that’s my pick.
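[A minimal sketch of the org-mode round-trip David describes, for readers following along at home. The table contents here are invented for illustration: strip the outer pipes and the padding from each line, then hand the cleaned text to CSV through a StringIO instead of writing a temp file.]

```ruby
require "csv"
require "stringio"

# An org-mode table: pipe separated, padded with whitespace,
# and each line begins and ends with a pipe.
org_table = <<~ORG
  | name  | qty |
  | bolts | 40  |
  | nuts  | 12  |
ORG

# Strip the leading/trailing pipes and the inter-word padding,
# leaving plain pipe-separated values.
cleaned = org_table.lines.map { |line|
  line.strip
      .sub(/\A\|/, "")
      .sub(/\|\z/, "")
      .split("|")
      .map(&:strip)
      .join("|")
}.join("\n")

# CSV accepts any IO-like object, so a StringIO works here --
# no temp file needed.
rows = CSV.new(StringIO.new(cleaned), col_sep: "|", headers: true).read
rows.each { |row| puts "#{row["name"]}: #{row["qty"]}" }
# prints:
#   bolts: 40
#   nuts: 12
```

[With `headers: true` the rows come back hash-like, keyed by the header row, which is the hash-or-array access David mentions.]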
CHUCK:
Thanks, James.
JAMES:
You’re welcome.
CHUCK:
Alright. I’ve got a couple of picks. My first pick is something I’m really, really excited for. This weekend is the 50th Anniversary of Doctor Who.
DAVID:
Woo!
CHUCK:
And they’re coming out with Day of the Doctor. And since this episode is released a week from now, what I mean by this is, “Oh, my gosh! It was so awesome.” [Laughter]
CHUCK:
And they’re actually showing it in the movie theaters as well as on TV. So, it’ll be on TV on Saturday and then you can go see it in the theater on Monday. Unless you’re in certain cities, in which case you can actually go see it in the theater on Saturday. But anyway, so I’m really excited about it. There’s actually a prequel to Day of the Doctor, the 50th Anniversary episode, and it’s on YouTube. It’s about seven or eight minutes long. Really, really good stuff. I really enjoyed it. So, you can go watch that now. Yeah. So, I’m stoked. It’s going to be really awesome.
Anyway, one other pick I have. So, before the show, when Josh was letting us know that he was sick and wasn’t going to be able to make it, he pointed us to these sugar-free gummy bears on Amazon, which is not my pick. But the comments on the gummy bears are my pick. And I’m stealing Dave’s thunder a little bit here because this is like the epic poop joke.
DAVID:
[Chuckles] You said thunder. [Laughter]
CHUCK:
So yeah. If you’re not into lowbrow potty humor or euphemisms for intestinal distress then don’t go check it out. But I am not a sophisticated guy. And lowbrow humor just makes me laugh my head off.
JAMES:
And these are hilarious.
DAVID:
Yes.
CHUCK:
So, I was reading it and I was trying to stay in my chair.
DAVID:
These poor people were so distressed that they were moved to poetry. It was just amazing.
JAMES:
[Chuckles]
CHUCK:
So anyway, just awesome. Go ahead and check that out. You’ll have to click the link and then scroll down to the comments. But, oh my gosh, it’s totally worth clicking on. There are 215 other reviews. Go read those too.
Finally, I have two books to pick. One is ‘Remote’ by Jason Fried and David Heinemeier Hansson.
It gave me a lot of terrific ideas about different things that I can do since I work remotely. It’s also been useful as I’ve been working on building out some remote teams that can take on some work. Because I’ve had a fair bit of work come my way and I’m trying to figure out how to get it all done and serve the people that want my help. Anyway, that’s been really, really helpful and I can’t recommend it highly enough.
And then the last pick kind of has to do with my week this week so far. And I’m not going to take too long. I’ll probably actually talk about this somewhere else, on my blog or something. This week, I went to the doctor. And you have to realize, I went to RubyConf a couple of weeks ago. And I was exhausted the entire time. Well, it turns out that all of my numbers were totally out of whack. For those of you who don’t know, I’m a type 2 diabetic. And so, that’s why I was so tired. That’s why I haven’t been feeling well lately. It’s contributed to a low level of burnout that I’ve had. And so, when I went in and they explained that to me, it kind of all made sense. And so, I’ve been making my health more of a priority.
And so, the book I’m going to recommend is ‘The Healthy Programmer’ by Joe Kutner. And we actually did an episode on The Freelancers’ Show talking about it. This doesn’t solve the issue of diabetes specifically. But it has a lot of great tips for being healthy and setting up your workspace and your lifestyle so that you can maintain your health. And I just want to point out that programming is fun and I spend a lot of time doing it and giving back to the community in a lot of ways that I can. But it hit home that my health is more important. And so, I’ve had to make it a priority. So, I’m going to pick that and encourage you to go listen to our discussion with Joe on The Freelancers’ Show where we talked about his book.
Emily, what are your picks?
EMILY:
Well first of all, I just want to ask, does ‘The Healthy Programmer’ suggest eating sugar-free gummy bears? Because I’m not sure I want to read it then. [Chuckles]
EMILY:
So, my picks. First one, I looked through the list of past picks to see if it was on there and I was actually surprised it wasn’t. It’s ‘Programming Pearls’ by Jon Bentley. It’s a book that my father, who was a Computer Science professor, recommended to me when I was frantically trying to figure out how I could do tech interviews. And I was like, “What are you talking about? This was written in 1986. How is this going to be relevant to the interviews I’m doing now?” But it’s totally relevant. It’s a book about how to use insight and creativity and how to approach problems, choose the right algorithm, and solve problems effectively. So, it’s not focused on implementation details. It’s more about the thinking process as a programmer. And you could apply it to your programming. You could apply it to putting together IKEA furniture or cooking or anything really. I really love this book. It’s a gem. I think that was a double pun. [Chuckles]
EMILY:
The second one is, I guess, that I just really love Gists. I use them for everything. I use them for taking notes, for proposals. I use them for writing proposals in markdown so I can send links to people and have them review them. I use them for code. I use them for shopping lists. I think Gists are so awesome: free text that you can put online and pass around either privately or publicly, and that people can make comments on. And you can use them in many different contexts. So, I use Gists a lot. It’s a great way to communicate with your colleagues too, sending them little snippets and stuff.
And then the last thing is not programming related but it is a complement to programming, or really what you do on a daily basis. And it’s archery. So, in the last couple of years, I decided that I wanted to try archery. And it’s really rewarding as an activity that you do at the end of the day. I don’t get the opportunity to do it that often because you have to go to a specific place, obviously, to practice. But it’s really cool because it’s the same sort of concentration that you would use when you think about a problem, or dissect a problem into different pieces and focus on these very detailed things.
There’s not a lot of movement with archery. But you really have to focus on different parts of your body and how to make these forms and improve your form over time. So, I go to the archery lessons at Columbia and the coach always says that it’s about finesse. You don’t have to know how to run or be in any kind of good shape really to do archery, which is sort of why it’s so nice. But yeah, it’s super fun and I think it’s a really cool exercise in concentration and focus and is a nice complement to programming and problem solving.
JAMES:
Awesome.
CHUCK:
Very nice. Yeah, archery’s fun. I remember doing it when I was in scouts. Alright. Well, I guess that’s it. We’ve done the picks. We’ve talked about threading. Thanks for coming on the show, Emily.
EMILY:
Thanks so much for having me. This is really nice. Thanks to you guys. I appreciate it so much. [Electronic buzzing]
EMILY:
Oh, no. [Laughter]
JAMES:
David, your audio just died again.
CHUCK:
The musical stylings of…
AVDI:
David Brady just dropped in.
JAMES:
It’s classic. Oh, thank you, Emily. I think you’re actually the first unofficial Rogue we’ve ever had on the show. So, thanks for supporting the podcast. We appreciate it.
EMILY:
Well, thank you so much for having me. I really appreciate it. I look forward to seeing…
DAVID:
Is my audio okay?
CHUCK:
Yeah.
EMILY:
Oh, yeah.
DAVID:
Sorry.
EMILY:
No worries.
DAVID:
Sorry, I just totally talked over you. I’m sorry about that.
EMILY:
Oh no, it’s fine. I was just thanking you. I had a lot of fun.
CHUCK:
Alright. Well, we’ll catch you all next week.