JAMES:
Alright. So, we’re talking about some DRb today?
DAVY:
That’s the plan.
CHUCK:
Derb…
[Hosting and bandwidth was provided by The Blue Box Group, check them out at BlueBox.net.]
[This podcast is sponsored by New Relic. To track and optimize your application performance, go to RubyRogues.com/NewRelic.]
CHUCK:
Hey everybody and welcome to Episode 98 of the Ruby Rogues podcast. This week on our panel, we have James Edward Gray.
JAMES:
Good morning, everybody.
CHUCK:
Josh Susser.
JOSH:
Hey, good morning from San Francisco.
CHUCK:
Katrina Owen.
KATRINA:
Hello from Denver.
CHUCK:
Avdi Grimm.
AVDI:
Hello from Pennsylvania.
CHUCK:
I’m Charles Max Wood from DevChat.tv. I'm looking forward to hearing about Distributed Regexes. We also have a special guest and that’s Davy Stevenson. I can't see your last name on here.
DAVY:
That’s Davy Stevenson. I'm from Portland, Oregon.
CHUCK:
Alright. You want to introduce yourself really quickly since you haven't been on the show before?
DAVY:
Sure. So, I worked for a company called Elemental Technologies up in Portland, Oregon. We do video transcoding software. I have been coding Ruby on Rails for, let’s see, 2008 to 2013. So that’s, if I can do math correctly, five years.
JAMES:
Awesome.
JOSH:
That’s really cool. And you did a talk at RubyConf last year, right?
DAVY:
Yes, I did. That was my first big talk at a conference. And it was on DRb and RabbitMQ.
JOSH:
Yeah. That was a good talk. I watched the video for it. Unfortunately, I missed it when I was in Denver. But it was a good talk. I liked it.
JAMES:
It was a good talk.
DAVY:
Well, thank you.
CHUCK:
So, which one won?
DAVY:
The moral of the story is that both win. [Laughter]
CHUCK:
Oh, you're one of those people. [Laughter]
JAMES:
Everybody gets a ribbon.
CHUCK:
That’s right. It’s like kid’s soccer, right? Well, you’ve got six goals and you got two goals but you both won.
DAVY:
Well, it’s really more to use the right tool for the right job. And that DRb can solve a lot of your problems. And eventually, you may have to scale up and replace those components with other things like RabbitMQ, for example.
JOSH:
I think that’s a great way to solve that problem, just in general, that you don’t necessarily, what they say, you don’t have to throw out the baby with the bathwater. If there's part of something that’s not working, you fix that part. You don’t have to give that -- it’s like you wouldn’t throw out Rails just because the XML support isn't working right.
CHUCK:
So, when there are trade-offs, you don’t trade everything?
JOSH:
Oh, it depends. [Laughs]
JAMES:
We kind of struggle with that. But don’t we? Like we have that tendency to over reach, I think, to look deep into the future and try to change everything that needs changing instead of changing just the one minimal part.
CHUCK:
Yeah, we actually went through that with one of my clients. They decided that they were going to switch over to JRuby and TorqueBox. And so, it was like this big huge massive change instead of maybe a measured -- well, we’re just going to have it be the web server for now and then we’ll move our queuing and job system over in a little bit and use their queue. But no, they just went whole hog and then we had all these problems trying to get everything to switch.
JOSH:
So, it seems we’re wandering down a tangent. Davy, is there something you want to say about the situation described in the talk to just sort of tie it all up?
DAVY:
Well, I guess, one of the topics that we were going to talk about was how useful DRb can be as a hacker’s tool. And that’s kind of how we initially used it. It’s really easy to prototype things using DRb. You can get things up and running really quickly, mainly because DRuby allows you to do distributed programming while hiding pretty much all of the implementation and networking issues from you, which is one of its really great benefits.
KATRINA:
So, one of the things we forgot to do today is define a few terms. Can we have a definition for DRuby?
DAVY:
Sure, sure. That’s a great idea. So, DRuby is a tool that is bundled within Ruby core. It’s limited to Ruby. So, it’s a distributed programming language. So, it allows two different Ruby processes to talk to each other and share state and objects between those two different processes. And those can be processes on the same computer or on different computers. And so, they can talk over the network to other processes. So, it’s 100% written in Ruby and no IDL is required if you’ve done other distributed programming. And it acts very much like Ruby in a lot of ways. It looks at methods at execution time, and tries to pass by reference as much as possible and things like that.
KATRINA:
So, what are the typical problems that it solves?
DAVY:
Well, it can really help with multiprogramming if you have different processes or you're trying to spin up multiple different processes to run on different cores, for example. And in Ruby, we may be using this a lot more than maybe other languages because our threading support doesn’t really allow us to take advantage of all the CPU cores as other languages might. So, in order to get around this, you might spin up multiple different processes. And in order to have those processes talk together, you need some mechanism.
JAMES:
Right. So, if you’ve forked a process, traditionally you might, before you do the fork, create some pipe and then do the fork. And then now, you're basically at ground zero where you have to decide, “Okay. Now, how do these two things communicate with each other?” Or alternatively, you can fire up a DRb object on both sides and just start calling methods.
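[Editor's sketch, not from the episode: a minimal version of the traditional pipe-then-fork approach James describes, assuming a Unix-like system where fork is available. The message text is made up for illustration.]

```ruby
# Create the pipe BEFORE forking so both processes inherit both ends.
reader, writer = IO.pipe

child_pid = fork do
  reader.close                          # the child only writes
  writer.puts "hello from #{Process.pid}"
  writer.close
end

writer.close                            # the parent only reads
message = reader.gets.chomp
reader.close
Process.wait(child_pid)

puts message
```

Everything past this point (framing, message formats, replies) is left to the programmer, which is the gap DRb fills.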
JOSH:
So, this is kind of an amazing magic trick. And the first time I saw this done in a language, I was like, “Oh, my God! Wow! That’s amazing.” To contrast this style of distribution, one of the things that people typically do is if you're doing Unix programming, you'll just open a socket to another process and start shoving bits down a pipe at it. And yeah, you can do whatever you want over that socket. So, it’s just a wire protocol. So, how does the DRb style of distributed programming differ from that very standard sort of, let’s just open a socket to another process and start shoving bits back and forth?
DAVY:
Right. So, I think DRb is a lot friendlier especially for someone new to this style of programming. So, it’s really easy to fire up a DRb server within a process. And then, once you attach some data to that server, another process can create a simple DRb object, pass it in the correct URI of the server on the other process. And pretty much, magically, you have a reference to what looks to you in your Ruby code as that other object in that other process. And from there on out, you can write code using that reference, just as if that object was local in your new process. So, it’s really straightforward and really easy to read the code in order to do that. You don’t have to do any fancy networking things. It does all of that kind of magically. You don’t have to worry about the format of the data transfer between the sockets. That’s all kind of magically taken care of for you, as well. So, you really just able to say, “These are the two objects I want to share between these two different processes.” And you can change them dynamically. And the other kind of cool thing that you can do with DRuby that often is lacking in other sort of similar processes is that it actually blurs a line a lot between the server and a client. The server can actually send stuff back to the client if you set things up correctly which is kind of magical. When you see that happen for the first time, you're like, “How did that happen?”
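[Editor's sketch, not from the episode: one way the server-plus-reference setup Davy describes could look, assuming a Unix-like system with fork. The Counter class, the loopback address, and the pipe used to hand the URI to the client are illustrative assumptions.]

```ruby
require 'drb/drb'

# Hypothetical object to share; any plain Ruby object can be the front object.
class Counter
  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

uri_reader, uri_writer = IO.pipe

server_pid = fork do
  uri_reader.close
  Signal.trap('TERM') { exit!(0) }
  # Port 0 asks the OS for any free port; DRb.uri reports the real one.
  DRb.start_service('druby://127.0.0.1:0', Counter.new)
  uri_writer.puts DRb.uri
  uri_writer.close
  DRb.thread.join                       # keep serving until killed
end

uri_writer.close
uri = uri_reader.gets.chomp
uri_reader.close

# In the client this looks like an ordinary object, but each call
# travels to the server process, where the state actually lives.
counter = DRb::DRbObject.new_with_uri(uri)
first  = counter.increment
second = counter.increment
puts "#{uri} answered #{first}, then #{second}"

Process.kill('TERM', server_pid)
Process.wait(server_pid)
```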
JAMES:
And DRb uses that to great effect. So for example, if you call some method that takes a block, you've got a problem there because the block’s a closure. So obviously, it needs the client side environment. So actually, DRb will fake that by sending a proxy to the other side and running the iteration on the other side but calling back to the client to execute the block each time so that it executes with its closure environment. So, it actually all just kind of magically works. It’s very clever.
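[Editor's sketch, not from the episode: a small demonstration of the block callback James describes, assuming a Unix-like system with fork. RemoteList and each_item are hypothetical names. The iteration runs in the server process while the block body runs in the client.]

```ruby
require 'drb/drb'

# Hypothetical server-side object that yields to whatever block it is given.
class RemoteList
  def initialize(items)
    @items = items
  end

  # Runs in the server process; returns the server's pid as proof.
  def each_item
    @items.each { |item| yield item }
    Process.pid
  end
end

uri_reader, uri_writer = IO.pipe
server_pid = fork do
  uri_reader.close
  Signal.trap('TERM') { exit!(0) }
  DRb.start_service('druby://127.0.0.1:0', RemoteList.new(%w[a b c]))
  uri_writer.puts DRb.uri
  uri_writer.close
  DRb.thread.join
end
uri_writer.close
uri = uri_reader.gets.chomp
uri_reader.close

# The client starts its own service so the server can call back into the block.
DRb.start_service('druby://127.0.0.1:0')
list = DRb::DRbObject.new_with_uri(uri)

seen = []
block_pids = []
iterating_pid = list.each_item do |item|
  seen << item                          # closure over client-side variables
  block_pids << Process.pid             # records where the block body ran
end

DRb.stop_service
Process.kill('TERM', server_pid)
Process.wait(server_pid)
```

Each pass through the block is a network round trip back to the client, so a remote iteration over many items can be far slower than it looks.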
DAVY:
It is clever. And when you first start using it, it all seems like everything works great. You can do whatever you want. It works fine and it’s only until you actually kind of try and see how it’s doing it that you're like, “This should never have worked. How is this possibly working?” [Laughter]
DAVY:
And so, that’s kind of a point where you're like, “This is just a magic trick.”
JOSH:
[Chuckles]
CHUCK:
So, we talked about sharing it between two processes on the same machine. My understanding is that you can also use DRb across machines or across multiple machines.
DAVY:
Yes, you can.
CHUCK:
Is that hard? Is there any more to it than that than doing it across different processes?
DAVY:
No, it’s very, very similar. So, when you start up a DRb server, it gives you a URI. And that’s the host name or IP address of the machine that it’s on, and a port. And you can link to that server from anywhere. And for the most part, the networking issues are handled automatically for you. The only minor trick, of course, is if you don’t have DNS set up correctly, then you have to use the IP address as opposed to the host name. But that is probably really a problem that DRb can't solve for you.
CHUCK:
I was going to say that’s more just common network, mere network referencing issues. You know, DNS and is your firewall up, and is that port open across the network, and things like that.
DAVY:
Exactly.
JAMES:
Davy, you won't have heard it yet because it just came out a few minutes ago. But last week, we talked to Martin Fowler about Patterns of Enterprise Application Architecture. And as part of that conversation, we actually got to talking about distributed objects which he’s actually pretty down on the idea as far as like how he covered it in the book. And he gave his reasons.
KATRINA:
Actually, his first rule of distributed programming is, don’t do it. [Laughter]
JAMES:
Don’t do it.
CHUCK:
Well then, what's the second rule?
KATRINA:
He didn’t say the second rule. [Laughter]
JAMES:
I don’t think you need a second rule.
JOSH:
Don’t tell anyone you did it. [Laughter]
JAMES:
First rule of distributed objects, we don’t talk about distributed objects. Okay. So, he actually covered with us pretty in-depth what his problems with it are. And it’s basically two things. One is that with all of that happening kind of automagically like it does in a DRb scenario, you're not really aware of this deep price you're paying. Any time you make a call over the wire or something like that, that’s a big deal. It’s an important thing that we have to be aware of. And of course, that means lots more error handling and stuff like that. And when it all just kind of happens automagically, then we’re not paying attention to that important barrier. That was one thing he said. The other thing he said is that the way we do local objects versus the way we do remote objects are very different. Like locally, we want lots of little methods, tiny pieces that we can work with, test easily, et cetera. But when we do remote calls, we want very coarse grained things. So one of his examples is locally, if I have an address object, I want a street address, city, state, ZIP. But remotely, I just want an address method that does all of them because I don’t want to pay that price of fetching four different things. How does this play into DRb, do you think?
DAVY:
I think that’s a really good topic to talk about. And I both agree and disagree with some of the statements that Martin Fowler made. And I’ll explain that a little bit more. And the first thing is that as Ruby developers, and probably most of us are Rails developers, the fact of the matter is that we already do distributed objects, and that is using the database as where we’re serving our objects from. And so, that’s just kind of one thing. We are already paying that price a lot of the time. And you can see that in how we structure the interface between the database and Rails in the way Active Record does that. And it matches this whole coarse-grained concept that we’re talking about. Obviously, you can make queries, so they're more selective. But the default is coarse grained. As far as how that applies to DRb, I think that given the experience that I've had using DRb in an actual [inaudible] environment, there is a lot of truth to that. And part of it is what are you actually using DRuby for? And for us, we aren’t using it necessarily to send objects back and forth. That’s not really what we’re doing. We’re using DRb as a signaling mechanism. And so, that’s where we’re trying to send some small bit of data over to another process to tell it to do something at a certain time rather than sending some sort of behemoth object.
AVDI:
So, is that almost like sending a message?
DAVY:
It is like sending a message. So, it’s like kind of signal message kind of the same sort of concepts. So, you're sending a much smaller subset of data. And from the experiences that we have had with DRb, at this point, we try and stick to sending only Ruby primitives. So, we’re talking about sending hashes and arrays and integers and symbols and strings, not bigger objects because there can be problems with that.
JOSH:
So, this is very much like you're keeping it to value objects in the GOOS terminology.
JAMES:
Right.
JOSH:
GOOS is Growing Object-Oriented Software, Guided by Tests, the book. And people keep talking about value objects as -- value objects can be primitives or they can be fairly complicated, but basically what matters is the contents of the object, not so much the identity.
JAMES:
Well, not the behavior, right? Is it more not the behavior?
JOSH:
Well, it’s like strings that can be different objects but you compare them based on the equivalence of their contents.
JAMES:
Right. So, those are the dumb classes which have lots of attr_readers where you're just reading data off of it.
CHUCK:
So, this kind of leads me to a question about how it works then. If I have an object of a type or class that’s on one end of the wire and I send it over to the other end of the wire, does it have to know about that class? Or does it just get all of the contexts around that object?
DAVY:
So, the way DRuby works by default is that it will pass the objects across a network and it doesn’t care if the other side has that context at the time that it’s sending it. However, just like Ruby, it looks up basically methods at execution time. So, once you try and do something with that object, that’s when the other side needs to have that object correctly defined. And DRb does not do any of that for you. So, you have to have loaded up that class with that object or in that context already yourself.
CHUCK:
Okay. So, if you don’t have that, then passing the literals or like hashes and arrays and strings and bignums and all that stuff, that’s what really makes sense because then, it doesn’t matter that the other side doesn’t know what a widget class is.
DAVY:
Exactly, considering you already know what an array is.
CHUCK:
I sure hope so.
DAVY:
Yes, exactly.
AVDI:
Unless you’ve monkey-patched array.
JAMES:
Ouch!
[Laughter]
JAMES:
And this is probably a good point to talk back into the Martin Fowler thing. He talked about preferring things like JSON and stuff because we take it down to the primitives and stuff like that which is basically what Davy is saying. When we send messages, we just send the simple primitives that we can count on being there.
CHUCK:
Yeah. But now you're wandering down the path of what I think a lot of people do with like Resque and RabbitMQ and things like that where they serialize the object into YAML or JSON or something and then they re-inflate it on the other side with a comparable class or the same class. Is there a big difference between the two approaches? I mean, what are the trade-offs there?
DAVY:
Right. I think there is a difference. And so, like one of the -- we’re talking about the data that are actually passing through DRb. But we’re now only talking about how I'm using the data. So, the one really great thing that DRb allows you to do is that you have these contexts over to this other process. And so, you're passing data but what it looks like to you as a programmer is that you're calling methods. And you're calling methods that are taking attributes. And so, the values that you're passing to the methods, that’s going to be an attribute hash. But you're calling a method. And that method over in the other process is doing something possibly much more complex. And so, in our situation, those methods on the other process are kicking off actions, spinning up new threads and doing a lot more things. But it’s triggered by the other process.
JAMES:
So, let me see if I understood what you're saying correctly. I think what you're saying is you don’t just build up some kind of normal object graph of what you need the process to do. And then connect DRb directly into that and start working. What you're saying is you build the normal object graph, then you paint a kind of generic interface on top of it and you hook DRb into that generic interface. Am I understanding it correctly?
DAVY:
Yeah, I think that explains it pretty well. And we have -- so, it might help to explain our situation in a little more detail. So, the standard communication method we have is actually between the Rails server and a long running Daemon process on the box or on a different box. So, obviously in Rails, a single request lasts for just the scope of the request. It’s not going to stick around any longer than that. But if we’re trying to do things that kick off longer running processes, that’s where the Daemon on the box is handling all of that part of things. So, there is basically kind of a class or object within the Daemon that has an interface. So, a whole bunch of methods that are exposed through DRb that if you connect to that DRb server, you now have access to all of those methods. And those methods could be as simple as telling you if it’s up. There's actually one method that I always define on these objects which is just def up; true; end. And so, from another process you can connect to that DRb server, create your DRb object, and then you can call up on it. And it will tell you if it’s up or if it’s not up.
CHUCK:
If it’s not up, does it raise an exception or does it just return like false or nil?
DAVY:
You'll get a DRb connection error (DRb::DRbConnError). And so, in our code, we’ll wrap that with a rescue that returns false.
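[Editor's sketch, not from the episode: one possible shape for the health check Davy describes, assuming a Unix-like system with fork. EventDaemon and daemon_up? are hypothetical names. The "dead" URI is built from a port that was just released, so it is only very likely, not guaranteed, to have nothing listening.]

```ruby
require 'drb/drb'
require 'socket'

# Hypothetical daemon front object exposing the trivial health-check method.
class EventDaemon
  def up
    true
  end
end

# True if the remote object answers, false if the connection fails.
def daemon_up?(uri)
  DRb::DRbObject.new_with_uri(uri).up
rescue DRb::DRbConnError
  false
end

uri_reader, uri_writer = IO.pipe
server_pid = fork do
  uri_reader.close
  Signal.trap('TERM') { exit!(0) }
  DRb.start_service('druby://127.0.0.1:0', EventDaemon.new)
  uri_writer.puts DRb.uri
  uri_writer.close
  DRb.thread.join
end
uri_writer.close
live_uri = uri_reader.gets.chomp
uri_reader.close

# Grab a free port and release it again to get an address nobody serves.
probe = TCPServer.new('127.0.0.1', 0)
dead_uri = "druby://127.0.0.1:#{probe.addr[1]}"
probe.close

up_result   = daemon_up?(live_uri)
down_result = daemon_up?(dead_uri)
puts "live daemon up? #{up_result}, dead daemon up? #{down_result}"

Process.kill('TERM', server_pid)
Process.wait(server_pid)
```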
CHUCK:
Right.
JAMES:
That’s awesome.
JOSH:
So Davy, you were talking about how DRb is really well-suited for hacking. It’s like a hacker’s tool. And the way that you just described it, it sounds like it’s not just this big soup of Ruby objects, that you pretty quickly go from, “Okay, we just got a bunch of Ruby objects,” to, “We have an interface that we’ve defined using these Ruby objects.” And in your RubyConf talk, you talked about switching from the DRb style of connectivity and communication to using RabbitMQ. Is this like an evolution of that process where you start with just random Ruby objects and can do whatever you want and it’s very flexible and it’s great for prototyping? Then you firm up this interface that you can use for more efficient communication. Is it an extension of that process to move into some other non-Ruby-centric communication technology?
DAVY:
I think that was a really long question. So, I kind of forgot the main thing once you got to the end.
JOSH:
Let me trim that rambling discussion and ask an actual question. There's this process where you went from sort of prototyping to firming up an interface and it’s still in Ruby, it’s in DRb. And is the transition from DRb and Ruby only to some other technology, is that just sort of the same or is it like turning a corner and going a totally different direction?
DAVY:
I think that depends on the particular instance. For us, in order to pull in RabbitMQ, it did involve a decent amount of refactoring of how we were thinking about communication. And that is because DRb makes it very easy and it’s very, very flexible and you can change things really easily. When you're pulling in some of these other tools, you have to plan out a lot more. And that, of course, is always the transition between the really flexible kind of, “Oh, you can do whatever you want,” to something that’s a lot more structured. And that’s where you're getting the benefit of the more sophisticated tools out of that structure. So, I think there's always -- there's never a complete drop-in replacement.
JAMES:
Right. You go from like just waiting on someone to call methods on you to, “Oh, now there's this queue out there that I need to pull data from.”
DAVY:
Right. And you have to actually kind of specify the data that you're going to be passing, like what's the structure of that data. DRb doesn’t care.
JOSH:
One of the questions I had when I was watching your talk was that you were talking about the things that RabbitMQ provided in terms of features and the value that they had for solving the problems that you were dealing with. And it totally made sense. But if I'd been in the room, the question I would have asked would be how hard would it have been to implement equivalent functionality on top of DRb as opposed to going with a really different technology? I mean, you could have built all that stuff in Ruby on top of DRb.
DAVY:
Yes. And actually, the kind of funny thing about that talk was that the talk directly after mine was the talk by the creator of DRb, Masatoshi Seki. And so, we actually got to meet because he really liked my talk and we chatted. And so, we talked a lot about some of the other things. And his talk actually showed me more things you can do with DRb that I hadn’t even really been aware of. But he pointed me to a lot of the work that’s done by Rinda and Rinda::Ring that provides some of the same tools or some of the same abilities as RabbitMQ but within DRb. So in this case, Rinda provides you with this tuplespace where you can store data and grab data out of it. But at the end of the day, one of the really important things that RabbitMQ provided us that DRb just never could for our use case is the fact that we also have processes running that were written using C and C++. And DRuby is Ruby only. And so, no matter how much I love it, I can't use it to communicate with those processes.
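[Editor's sketch, not from the episode: the basic store-and-grab behavior of a Rinda tuplespace, shown inside a single process for brevity. The :launch_event tuple is a made-up example.]

```ruby
require 'rinda/tuplespace'

space = Rinda::TupleSpace.new

# Producer side: drop a tuple into the space.
space.write([:launch_event, 42])

# Consumer side: nil in a template is a wildcard, and take removes
# the matching tuple from the space as it returns it.
tuple = space.take([:launch_event, nil])
remaining = space.read_all([:launch_event, nil])

puts "took #{tuple.inspect}, #{remaining.size} left"
```

To share the space between processes it would be published as a DRb front object, for example by passing it as the second argument to DRb.start_service.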
JOSH:
That’s great. You can create equivalent features but it’s still Ruby only solution, so you can't integrate with other languages.
JAMES:
So, this is kind of along the same topic. One of the reasons we decided to do this episode is I actually made a tweet awhile back about DRb being a hacker’s tool and there was kind of a conversation. And several people came back on me right away about its poor reliability. And I was talking to that and Davy kind of got pulled into that conversation. So, let’s talk about reliability. What do you think about that, Davy?
DAVY:
I personally have never had any issues with DRb’s reliability. But I will throw up a giant caveat over that whole thing, in that the way that our applications are used does not have to deal with any sort of high levels of throughput. So, we know very exactly how much data and how many calls are going to be kind of going through the system and it’s way, way below any sort of threshold of high throughput. And as far as that goes, I've actually never seen DRb fail to connect and return the correct response as long as both the client and the server are up.
JAMES:
Yeah. I've actually been thinking a lot about this. Because I too perceive that it’s fairly reliable. Obviously, there's the performance issue which you definitely had on there. In order to make these calls, it ends up marshalling all of the arguments and details of the method call, passing that over the wire, reconstructing it on the other side and stuff. So, there is definitely a performance penalty to pay. And if you're in a super tight environment where you've got to pass a lot of messages fast, it’s probably just going to fall down and die. And then, we talked earlier about how if a method is not there, if you can't contact that object at that time, you're going to get some kind of exception or something, which is similar to what would happen if you were using something like Net::HTTP. You’d get some kind of exception. It’s going to be a different kind of exception. But I was actually thinking about this. What makes people perceive it as kind of that way? And my theory is that it’s how it hides the details from you. Does it do a retry when it fails to reach something, like does it try again? We don’t really know that, at least not intuitively; whereas if we use some other kind of library, we probably have to set like a retry count or something like that. But because of DRb’s kind of behind-the-scenes-ness, we don’t really know what all it’s going through. And so, we feel kind of disconnected from that process. Do you think that made sense?
DAVY:
I think that definitely makes sense. And it is definitely true. DRb provides you a very specific set of tools. And if you want more things around that, you're going to have to write that code yourself. Like you said with the error catching and the retries, that’s code that you would have to write yourself for sure. I actually will point out though, that in my opinion, or at least from the feedback that we’ve gotten from our customers, DRuby’s latency is actually really good. We’re talking about people who are trying to start up live streaming events and are very -- I have people out there looking at the latency of the whole system from when they press a button to when they get video up the other end. And the DRuby portion of that has never even been a blip on our radar.
JAMES:
One thing I noticed today was Eric Hodel this morning tweeted about how he’s considering writing a drbdump which would be the tcpdump equivalent for DRb. And I think that might help with some of these visibility issues that we’re talking about.
DAVY:
Yeah, I saw that too. And I got super excited because I definitely agree having a tool that kind of allowed you to see kind of what's going on behind the scenes would be really amazing as far as development goes. So, I told Eric he should totally do that. Eric’s actually been doing quite -- well, I don’t know if quite a bit is really the right word. But he’s definitely been developing in this area. He actually has been submitting some patches to Rinda as well. So, I'm kind of excited about that.
JAMES:
What about durability? You do need something like durability with DRb, Rinda, et cetera. Are you totally on your own there?
CHUCK:
What do you mean by durability, James?
JAMES:
So, the need to persist data, right?
CHUCK:
Okay.
DAVY:
Right. And that’s kind of the other part of things that DRuby does not provide for you. And that was one of the other things I talked about in my talk about why we eventually decided to switch from DRuby to RabbitMQ, which was for the durability of the messages being sent. And I have only looked at Rinda kind of more cursorily but it does seem like that can provide some of the durability requirements. It might be a good halfway point where it’s definitely more durable than just straight up DRb but it may not be the same level of durability as RabbitMQ, for example.
AVDI:
A message will survive having like the server not be there right now but it’s not going to survive having the Rinda server down.
DAVY:
Exactly.
KATRINA:
And persisting messages is more of a question of, like, being able to replay the message if the server went down. Is that what you're talking about?
DAVY:
Right. It’s this -- this is a topic that I kind of talked with James on Twitter about, which is what are we talking about here? Are we talking about data or are we talking about messages or signals? Are you trying to send data over the system and you don’t want to lose that data, so that if the server isn't up when you're trying to send it, that’s a problem? Or are you trying to send a message and trying to perform an action at that point in time? And in that case, if the server is down, then there was no way that you could perform that action anyway. And so, in that case, it doesn’t actually matter if you drop the signal at that point.
AVDI:
Can you make things a little bit more concrete for the listeners and just give an example of the name of a class that you call methods on remotely and the name of one of those methods?
DAVY:
Sure. So, the classic example, we do video transcoding. And so, we’re managing a system where we’re spinning up live transcoding events to stream out to CDNs, for example. And so, we may have a start button on the UI and when you hit that start button, it hits an endpoint in our Rails Stack and at that point, the Rails Stack is going to connect to the Daemon’s Ruby server, the DRb server. So when you create a call to that server, you're creating a DRb object, and within your controller you're basically holding on to a reference to this object in the Daemon that you're trying to communicate with. And so, at that point, what we may send is the DRbObject.launchevent and give it the ID of the event that we’re trying to start.
JOSH:
Okay. So, slightly different direction here. Distributed programming is basically concurrent programming over a bunch of machines. And I'm curious like how much of your time you have to spend dealing with sort of the fundamental concurrency issues when you're doing programming in DRb. Do you have to worry about dining philosophers? Do you have to worry about, I don’t know.
DAVY:
Yes, you do. And so, the DRuby book actually has a whole chapter on multithreading and what you have to do in order to prevent deadlocks and synchronization issues. And so, DRb isn't itself going to manage that sort of synchronization. However, I do believe that Rinda provides a little bit of that. Again, I've only read about it. I haven't tried it out myself. But I think Rinda adds a little bit more of the synchronization on top of things. But if you're using DRuby directly, you're going to have to be implementing the mutexes yourself. And so, in our case, within our methods on the Daemon object, we do use mutexes in order to protect the concurrency issues and make sure that if two different people are hitting that start button at the exact same time, only one of them is going to be able to actually start the event.
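[Editor's sketch, not from the episode: one way a daemon method could use a mutex so that only one of two simultaneous "start" requests wins. EventDaemon, launch_event, and the event ID are hypothetical. Plain threads stand in for the concurrent calls, on the understanding that a DRb server handles incoming calls on separate threads.]

```ruby
# Hypothetical daemon object whose methods may be called concurrently.
class EventDaemon
  def initialize
    @lock = Mutex.new
    @running = {}
  end

  # Returns true only for the caller that actually starts the event.
  def launch_event(event_id)
    @lock.synchronize do
      return false if @running[event_id]   # someone else got here first
      @running[event_id] = true
    end
    # ... kick off the long-running work outside the lock ...
    true
  end
end

daemon = EventDaemon.new

# Ten "users" press the start button for the same event at once.
results = 10.times.map { Thread.new { daemon.launch_event(7) } }.map(&:value)

puts "started: #{results.count(true)}, rejected: #{results.count(false)}"
```

The lock guards only the check-and-mark step, so slow work started afterward does not block other callers.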
JOSH:
Okay. That’s important. Do you find that the granularity in DRb complicates that? That you can -- you know, we’re talking about these fine grained messages versus coarse grained messages. And so if I have some sort of server object that’s talking with the client object or being talked to by a client object on another machine, and they start having a conversation back and forth, do they need to have some sort of session key so that they know, “Oh, okay. Yeah, this client’s already in my mutex. It can still talk back to me.” Or is that just beyond the level that you try and deal with?
DAVY:
I think that’s kind of beyond the traditional level. If you're going to be making multiple calls back and forth between the two objects, then that’s the point where you're going to need to be structuring your mutexes in a way that prevents a deadlock, for example. So that’s going to be 100% on you. Though, as far as my experience goes, we haven't ever really run into that issue. I think you’d probably be able to get around having to do some of that.
JOSH:
Okay.
DAVY:
And I think that depends a lot on the structure of your distributed system as well. And so, after hearing about Martin Fowler being on the last episode, I spent a whole half hour last night reading his stuff. And one of the things he does talk about is how you're actually structuring your distributed system. Are you structuring it in a way -- well basically, what he suggests is that your distributed system is actually multiple copies of the same process running on multiple nodes. And then you might have like one controlling node that’s kind of managing those multiple processes. And that’s actually exactly how we structure our distributed system as well. And so, in that case, you can get around a lot of these issues because you know that the control messages are only coming from one location. And those messages are fanning out into individual nodes that don’t actually care about each other. They are fully contained within themselves. So, that can help avoid a lot of these issues. And I think Martin Fowler probably describes it a lot better than I can, in his book.
JAMES:
Yeah. It’s basically the threading problem. If you have a ton of objects that are sharing a bunch of data back and forth, that’s always where threading falls down and that is right. So, same thing, if you have two processes, DRb is under the hood using threads to send these messages and stuff. You have two processes that are sitting there, chatting back and forth to each other, you're going to have to worry about all that threading stuff. Whereas if you do the typical data flow one way through kind of stuff, then it’s not really a big deal.
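As a rough illustration of the threading concern James describes: DRb serves incoming calls on threads, so a front object with shared state usually guards it with a Mutex. The sketch below is only an assumption about how that might look. The `Counter` class is hypothetical, and plain local threads stand in for remote callers.

```ruby
# Hypothetical front object: a counter that several callers increment at once.
class Counter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # The read-modify-write runs inside the lock, so concurrent callers
  # cannot interleave in the middle of it.
  def increment
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end

counter = Counter.new

# Local threads stand in for remote clients in this sketch.
threads = 8.times.map do
  Thread.new { 1_000.times { counter.increment } }
end
threads.each(&:join)

puts counter.value
```

Keeping each lock scoped to a single short method, with no calls back out to the other side while it is held, is one way to stay clear of the deadlocks Davy mentions.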
DAVY:
We’ve also structured our systems so that the decision on which node is going to be running a certain task is made much earlier on in the process, to try and avoid a lot of these pitfalls of two different nodes fighting over the same object, for example.
JOSH:
I want to get back to fundamentals just a little bit. We’ve talked about some of the magic that goes on with DRb and we’ve defined it. But I don’t think we’ve really dug into how the magic works. I think we’ve used the terms proxying and marshalling. But I think it’d be great to break it down just to the next level so people can get an idea of just how magical this stuff is. Do you want to talk about that, Davy?
DAVY:
Yeah, I can talk about it a little bit. Hopefully, I get the majority of the details right. There are definitely some subtle interactions that can go on. So, we all probably know that Ruby itself passes by reference all of the time. When you pass an object into a method, it’s passing that actual object, and if you modify that object then the original object is modified as well. And you have to explicitly call .dup or something like that on the object in order to get pass by value. So, DRuby attempts to pass by reference as much as possible and does this using Marshal.dump and Marshal.load. So, the client will dump the object using Marshal.dump and you get a string out. Basically, the string is a representation of that object. And that’s actually what DRb passes over the line to the server. And then, the server attempts to load that using Marshal.load. And so, that’s the point where you need to have that same object and the same class definition on the server in order for Marshal.load to work properly.
JAMES:
Davy, is that pass by reference or pass by value? My instinct says that’s pass by value because if you modify the object on the other side, the change would not be propagated back. Am I wrong?
DAVY:
I think this is true, and this is the point where I get a little bit confused when we’re talking about the subtle differences between how DRuby passes different objects.
JAMES:
I view that case that you just described as pass by value because it’s kind of like you said where if I'm calling something that would normally be pass by reference and I want pass by value, then I would add a .dup on it, right? While in this case, the marshal process is basically a .dup because it pulls the internal guts out, shoves them through the wire, and then reconstructs them on the other side. So, it kind of makes a duplicate. So to me, I think, that’s pass by value. It is extremely confusing because the way -- the other one is pass by reference, which DRb kind of pulls off with DRbUndumped and proxy objects and stuff, but it’s kind of weird because in a way it’s actually passing a totally different thing that makes it look like pass by reference. It’s all very complicated.
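The "marshal is basically a .dup" point can be checked without any networking at all. This is a minimal sketch using only Marshal; the `Point` struct is a made-up example class, and in a real DRb setup both processes would need its definition loaded.

```ruby
# Hypothetical value object, used only for this illustration.
Point = Struct.new(:x, :y)

original = Point.new(1, 2)

bytes = Marshal.dump(original)  # a binary String describing the object
copy  = Marshal.load(bytes)     # a brand-new object rebuilt from those bytes

copy.x = 99

puts original.x             # the original is untouched by the change
puts copy.equal?(original)  # identity check: these are different objects
```

Because the receiver only ever sees the rebuilt copy, changes made on the far side never travel back, which is why this case reads as pass by value.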
DAVY:
Yeah, exactly. And you know, it’s attempting -- when you talk about pass by reference, they're two different processes. It’s impossible for it to actually be passing by reference in the conventional sense. It’s really just what it attempts to have it look like to you, the user. And that of course, makes it very complicated to talk about.
AVDI:
You might also call it pass by proxy.
JAMES:
Right, yeah. That’s more correct.
AVDI:
Where you can take a proxy on the server side and a proxy on the client side.
DAVY:
Right. And I think maybe that’s kind of what the marshal allows you to do because it is kind of like pass by -- it’s trying to mimic that object as much as possible and trying to kind of hide the details from you.
AVDI:
I mean, if you are marshalling up the whole object, then I think it would be safe to say it’s pass by value although I'm not exactly sure how it works. I don’t know if it then substitutes associations with proxies or not or if it ever tries to serialize that whole tree.
JAMES:
If it serializes a whole tree all at once, then the objects will be correctly associated on the other side, I believe, because marshal basically handles that itself. It uses the reference instance. And so, when you reconstruct it, it puts it back. But if you're talking about a case where you marshaled some tree that’s also a part of some other tree that you marshaled before, then those would not be linked up on the other side. That’s my understanding.
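James's distinction between one dump and several separate dumps can be sketched with plain Marshal. The hashes and the shared array below are invented for illustration only.

```ruby
shared = ['leaf']
tree_a = { name: 'a', child: shared }
tree_b = { name: 'b', child: shared }

# One dump containing both trees: Marshal records the second occurrence of
# `shared` as a link to the first, so the sharing survives the round trip.
a1, b1 = Marshal.load(Marshal.dump([tree_a, tree_b]))
linked_together = a1[:child].equal?(b1[:child])

# Two separate dumps: each load rebuilds its own copy of `shared`.
a2 = Marshal.load(Marshal.dump(tree_a))
b2 = Marshal.load(Marshal.dump(tree_b))
linked_apart = a2[:child].equal?(b2[:child])

puts linked_together
puts linked_apart
```

Each remote call is marshaled on its own, so two calls that each carry part of the same object graph would behave like the second case.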
JOSH:
That’s sort of a policy decision when you're building your API is, where do you cut through the object graph if it’s not self-contained?
AVDI:
And that’s where that discussion with Martin Fowler of Remote Façade comes in, I think.
JOSH:
Yeah. That’s exactly the Remote Façade Pattern.
AVDI:
It’s definitely a good idea to kind of build something intended to be public facing, to be API facing rather than trying to pretend that those objects are really going to be just like local references. I will say, while we’re talking about marshalling, that’s where things can get really hairy if you are passing anything other than like basic data types because that’s particularly where the versioning issues come in. If you have a domain object on both sides of the connection, but you’ve pushed new code to one side of the connection more recently than the other side, that’s where things can really get interesting.
DAVY:
Yes. And that’s definitely where landmines live. Kind of back to some of the internals and some of the features that DRuby does provide for you. So, one thing that it does when you're actually passing the object over, if that object’s class isn’t defined at that point in time, it will actually kind of hide that error for you and store the marshal string and not raise an error at that point in time. It’s only when you're actually trying to call a method on that object that you have to have the object defined. And so, at that point, that’s when it will actually retry the Marshal.load to try and reload it. So, you have a kind of a window, basically, where you could send over a class definition and load it up between when you're actually sending the object and when you first use it, which I think is kind of cool.
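In the standard library this deferred loading appears to be handled by `DRb::DRbUnknown`, which keeps the marshaled bytes and offers a `reload` method to retry later. The sketch below imitates the receiving side in a single process. The `Widget` class is hypothetical, and removing the constant is just a stand-in for "the server has not loaded this class yet".

```ruby
require 'drb'

# Hypothetical domain class that only the sender knows about at first.
class Widget
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

bytes = Marshal.dump(Widget.new('sprocket'))

# Simulate a receiver that does not have the class definition yet.
Object.send(:remove_const, :Widget)

# DRbUnknown._load tries Marshal.load and, if the class is missing,
# wraps the raw bytes instead of raising.
unknown = DRb::DRbUnknown._load(bytes)
puts unknown.class
puts unknown.name  # the name of the class that could not be found

# Later, the class definition arrives on the receiving side...
class Widget
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# ...and the stored bytes can be loaded for real.
widget = unknown.reload
puts widget.name
```

Note that in this sketch the retry is an explicit `reload` call on the wrapper.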
JOSH:
Hey Davy, since it seems like we’re getting close to the end of the conversation, how do you recommend people get started if they want to try something in DRb? What's a good way to ease into it or should they just jump in? What's the way to get going?
DAVY:
Well, it’s already in the standard lib so you don’t need to download anything. It is pretty easy to get a standard client and server, two dummy test scripts, up and running. It takes you five minutes to start playing around. The DRuby book actually provides a lot of fun examples and shows some of the other weird things you can do with DRuby. So, that may help get some of the juices going in order to see what's possible with DRuby, including some really interesting concepts where it’s actually passing standard-in and standard-out between the two objects as well, which miraculously enough does work even though I feel like it shouldn’t.
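A five-minute starting point might look like the sketch below. It is an assumption-laden toy, not a recommended layout: the `Greeter` class is made up, the loopback address and OS-chosen port are arbitrary, and `fork` (Unix only) is used just so the server and client fit in one script. Normally they would be two separate scripts.

```ruby
require 'drb'

# Hypothetical front object; any plain Ruby object can be served this way.
class Greeter
  def greet(name)
    "Hello, #{name}!"
  end

  def pid
    Process.pid
  end
end

reader, writer = IO.pipe

# Server process: bind to a free local port and report the URI to the parent.
server_pid = fork do
  Signal.trap('TERM') { exit!(0) }
  reader.close
  DRb.start_service('druby://127.0.0.1:0', Greeter.new)
  writer.puts DRb.uri
  writer.close
  DRb.thread.join
end

# Client process: build a proxy from the URI and call it like a local object.
writer.close
uri = reader.gets.chomp
reader.close

greeter    = DRbObject.new_with_uri(uri)
greeting   = greeter.greet('Rogues')
remote_pid = greeter.pid

puts greeting
puts "served by pid #{remote_pid}, called from pid #{Process.pid}"

Process.kill('TERM', server_pid)
Process.wait(server_pid)
```

The two process ids differ, which shows the method body really ran in the other process.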
[Laughter]
CHUCK:
That’s crazy.
JOSH:
You could write to standard-out on a different machine.
JAMES:
Yes, you can. That’s one of the examples in the book. DRb recognizes it and sends a proxy across the wire, and then they end up writing to the proxy, which then shovels the calls back to the client and does the writing on the client.
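The book's own example is not reproduced here, but the mechanism James describes can be sketched under a few assumptions: the `Reporter` class is hypothetical, `fork` is used so both sides fit in one script, and a StringIO stands in next to `$stdout` so the result can be inspected. Objects that cannot be marshaled, such as IO objects, are sent as proxies, and calls on the proxy travel back to the process that owns the real object.

```ruby
require 'drb'
require 'stringio'

# Hypothetical front object that writes to whatever IO-like object it is given.
class Reporter
  def report(io)
    io.write("written by pid #{Process.pid}\n")
    Process.pid
  end
end

reader, writer = IO.pipe

server_pid = fork do
  Signal.trap('TERM') { exit!(0) }
  reader.close
  DRb.start_service('druby://127.0.0.1:0', Reporter.new)
  writer.puts DRb.uri
  writer.close
  DRb.thread.join
end

writer.close
uri = reader.gets.chomp
reader.close

# The client runs its own service so the server can call back into it.
DRb.start_service('druby://127.0.0.1:0')

reporter = DRbObject.new_with_uri(uri)

# A StringIO cannot be marshaled, so the server receives a proxy for it.
buffer = StringIO.new
remote_pid = reporter.report(buffer)
puts buffer.string

# The same call with $stdout makes the server's text appear in this process.
reporter.report($stdout)

Process.kill('TERM', server_pid)
Process.wait(server_pid)
DRb.stop_service
```

The text is composed in the server process, but the write itself happens on the client, which is the "mischief" Josh is pointing at.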
JOSH:
That sounds like a lot of opportunity for mischief. [Laughter]
JAMES:
It’s awesome. One other point on that DRuby book, we talked a lot about durability earlier. And the author of DRuby has another system called Drip. And Drip can almost be thought of as a durability layer on top of Rinda. So, it’s pretty cool.
JOSH:
Hey, did we actually reference the DRuby book on the call? I know we talked about it on the precall. Did we give people an actual reference to the book?
CHUCK:
We just did. We’ll put a link in the show notes.
JOSH:
Okay. So, who wrote the DRuby book?
DAVY:
Masatoshi Seki wrote the original DRuby book but that was in Japanese. The translation came out more recently and it was translated by Makoto Inoue.
JOSH:
Okay, cool. And that’s a Pragmatic Programmers book?
DAVY:
Yup.
JOSH:
As long as we’re talking about books, there's another book called Distributed Programming with Ruby by Mark Bates. And it’s one of the Addison-Wesley red cover Ruby series books. It’s a pretty lightweight book, just over 200 pages. I don’t have the DRuby book even though James says it’s awesome. But the Distributed Programming with Ruby is more of an overview of a lot of different distributed technologies. So, it covers DRb and Rinda and a bunch of different things. I mean, it’s more of a get the lay of the land kind of thing.
JAMES:
I've got one more question for Davy. We’ve talked a little bit about how to use DRb and I think people are probably familiar with some uses we see like in RSpec and stuff like that. But my opinion is that it doesn’t get used as much as you think it would. Why do you think that is?
DAVY:
That’s a really good question. And this could kind of spin off into a whole other discussion if we wanted to, which I don’t think we do. But I think one of the main things is that if you're a Ruby programmer, there’s probably a good 90% chance or 95% chance that you're a Rails developer. And if you're just doing standard Rails web development, I think there's often no need, no use for this sort of tool. Whereas if you're doing more pure Ruby work, that’s kind of where a lot of these built-in tools would really help out a lot. And so, I would love to see the Ruby community start using Ruby in a much broader sense rather than sticking with the Rails silo. And I think if we did that, if we found other cool things to do with Ruby, we would see the uses of things like DRuby and Rinda and Drip and all of these things. We’d really be able to take advantage of those in a much broader sense.
JAMES:
Great answer.
CHUCK:
Well, this seems like a good breaking point. Let’s go ahead and get into the picks. Avdi, what are your picks?
AVDI:
I think I just have one which I don’t think I've picked before. There's a service called CloudFlare which I've been using for a while now. And it’s basically just a CDN. But their basic service is free. And the cool thing about it is that if you have some site that’s mostly all static content, you can throw it on like cheap hosting on DreamHost or a single Heroku instance and then get a free CloudFlare account. And basically make it traffic proof because they bear the brunt of all the traffic. And they do a lot of other cool stuff because since they actually process quite a lot of the Internet’s traffic at this point, they're able to spot people that are attacking other sites. And then, those people will automatically be turned away from your site as well. So, useful service when you're trying to host static content.
JOSH:
That’s cool. I remember when they first appeared, they had some issues with SSL and HTTPS. Have they resolved that? Can you now use it with an HTTPS interface?
AVDI:
Yes. As a matter of fact, that’s actually one of the other things that’s nice about them is that they're a particularly convenient way of slapping SSL on top of a non-SSL site.
JOSH:
Cool.
AVDI:
Yeah. If you don’t even want to think about SSL with Heroku, you can just put up a regular site and then put CloudFlare SSL on top of it.
CHUCK:
Cool. James, what are your picks?
JAMES:
Keeping with our theme this episode of cool things hidden in Ruby’s Standard Library, I found this awesome article on TSort the other day, which is a topological sort that is in Ruby’s Standard Library. And I've known it was there for like ever and ever but I just never knew how to use it actually or what it was. And there’s a cool article from Adam Sanderson on how to use it. And it’s actually pretty darn cool. Then, just some more fun stuff. I've run into some crazy cool videos lately of kids talking about their schooling and stuff like that, quasi-hacking related. And so, one is Hackschooling, which is about this kid who kind of has his own method of learning that he basically pulled from hacker culture. And it’s really cool. The other video that you shouldn’t miss is this 13-year-old girl who learned how to program Python, builds the Game of Life, and she goes through these natural steps from outputting ASCII in the terminal up to a GUI output. And she doesn’t stop there. She gets a Raspberry Pi and hooks it up to a bunch of LEDs to do an actual LED display of the Game of Life. It’s just out of control. Thirteen-year-old girl, hacking software and hardware, awesome stuff. Those are my picks.
CHUCK:
Nice. Josh, what are your picks?
JOSH:
My first pick -- I got a couple. I’ll go through them quickly here. First pick is Errplane. It’s Errplane.com. And former guest Rogue, Paul Dix, this is his new startup. And it looks like he’s gone off to compete with the likes of New Relic and Librato and a bunch of other people all at the same time. So, Errplane is basically app monitoring for your Rails application. And it’s not a difficult thing: it just hooks into your server processes and collects data and ships it off to their servers and then you can go to their website and take a look at all of your data displayed in very pretty pictures. It’s still very early days, they're in beta. But so far, I'm liking it. It was really easy to set up and I like the UI, it’s very simple. And they're trying to give you much more detail about what you're seeing in the data and to be more useful for low level stuff than, say, New Relic is. But so far, it’s good. I've been liking it. The next thing that I have is something somebody sent in response to a tweet of mine. These are version badges for projects on GitHub. And Olivier Lacan pointed me at this. And so, basically, you drop this little snippet of code into your README on GitHub and it shows, with a uniform appearance, what Ruby gem your repo is for and what version it is and the link to the RubyGems.org page and all that. So, this is something that I was wanting GitHub to do. But apparently, somebody did it open source and it was pretty awesome.
JAMES:
Now, GitHub can just buy them.
JOSH:
Yeah. [Laughter]
JOSH:
Yeah. Isn't that what open source is for? [Laughter]
CHUCK:
Yup.
JOSH:
Right. And then, I have kind of a fun, geeky one that I discovered on the call earlier that people don’t know about, Scott Kim and his Inversions. So, there's this style of textual artwork called Ambigrams. I think Douglas Hofstadter coined that term. And it’s sort of like a visual palindrome where you’d draw a word reflected or rotated around to get some symmetry. And it reads the same either upside down or backwards or rotated about. And there's some really -- Scott just did some amazing work, trying some really cool things. And it’s kind of one of my goals in life to have a company logo that’s an Ambigram. Not a great goal but it’s there. Anyway, Scott just did a lot of really cool stuff with these and it’s worth going and checking out the pages if you want some cool visual geekery.
JAMES:
The Ambigrams are well-known from Dan Brown’s Angels and Demons book. They were like the brands that the Illuminati would brand people with, these ambigrams.
CHUCK:
Yeah.
JOSH:
I don’t read Dan Brown but that sounds pretty cool.
CHUCK:
It was pretty cool.
JOSH:
If I ever got a tattoo, it would probably be one of Scott’s designs. They're very cool. Okay. So, that’s it for me this week.
CHUCK:
Alright. Katrina, what are your picks?
KATRINA:
So, we announced today that the episode was about Distributed Regex. And so, the whole episode has been about distributed and I'm going to add the Regex part.
CHUCK:
[Laughs]
KATRINA:
The 2013 MIT Mystery Hunt was on awhile back and it’s over. But one of the puzzles that came out of there was a Regex Puzzle. And the link is no longer up on CoinHeist.com. So, I'm hosting it, the PDF of this puzzle. It is just fun. So, look it up.
JAMES:
It’s awesome.
CHUCK:
Nice. So, I've got a couple of picks. My first pick is an app called Hazel. It’s basically, you can think of it as kind of like your Inbox Rules for sorting your Email except it works on file system folders. So, I had a big honking mess of a downloads folder on my computer. And so, I got Hazel and I pointed it at my downloads folder and then I just kind of worked through things and started setting up rules for the different types of things that were in there. And before too awful long, I had cleared out about 50 Gigabytes of crud that had been sitting in there. And it put all the stuff in the right place. So, just to give you an example, if I download a zip file, then Hazel will find the zip file and it will automatically unzip it for me. If that generates anything like a .DMG or a .APP, then it will put those in the right place so .APP will automatically be put into my applications folder. The .DMG will automatically be put in with all of the other installers. The same with like if I get a PKG file. If it’s an image, then it will put it over with the other images in a folder that I just go sort through periodically. But then, there's a junk drawerful of unsorted images as opposed to a junk drawerful of unsorted everything. And so, I've been really, really happy with it. Just super excited about the options. I need to point it at my documents folder now and get that cleared out because I started moving stuff over there and the documents folder became another junk drawer for me. So, I'm working on that. Another thing that I've been playing with lately is Amazon Web Services. And I've been really happy with Amazon S3. I've been putting all the materials for ‘Rails Ramp Up’ up there so that my students can get at them and it seems to work really, really well. And I've had no complaints about people being able to get the stuff. And I don’t have to upload it to a server where I have more limited space. So, AWS-S3 is my other pick. Davy, what are your picks?
DAVY:
So, we’ve already talked about it pretty much throughout the whole podcast. But my first kind of official pick is the DRuby book. It’s such a cool book. I really like the tone. It’s really nice and accessible. And one thing I do really like about it is it walks you through a lot of the error cases and it walks you through examples where things fail in addition to how to fix those issues. So, that’s one thing that I really like. So, if anyone is interested in trying some of these out, check out that book. Second one is I wanted to point out the software I've been using to generate all of my slides for my presentations. And this was what I used for my RubyConf talk, which is Reveal.js. And I really like it a lot because PowerPoint is terrible. People still fight with Keynote and at least with Reveal.js, you're fighting with HTML, which we all know and love to fight with. And so, I really like the power and flexibility that it gives and it also gives cool transitions that I like that aren’t too over the top. And the last pick that I wanted to send out is non-technical in general. It’s a sweet music video by a band called Darlingside, which is a group of guys that I went to college with and they're doing awesome things now. And it’s called ‘The Ancestor’. And you can find it on Vimeo. It’s a beautiful song, beautiful video. I think more people should see it.
CHUCK:
Nice. Alright. Well, we’re going to go ahead and wrap up the show. Thanks for coming, Davy, really appreciate it. It’s been an interesting foray into a topic that I hadn’t really explored.
DAVY:
Yeah. Thanks for having me on. It was really fun to chat about all this with you guys.
CHUCK:
Alright. Well, next week, we are talking about something that’s just going to be us. There's not going to be a guest. We’re still discussing what we’re going to talk about. But it looks like we might talk about Ruby 2.0, it just depends on whether or not we all get a chance to play with it. But anyway, look forward to that. Go sign up for Ruby Rogues Parley. And we’ll catch you all next week.