Show Notes
02:10 - Brian Underwood Introduction
02:55 - Neo4j
04:31 - Graph Databases vs Traditional Databases
06:02 - Relations Have Directions
06:58 - Modeling a Domain as a Graph; How it Works
13:25 - Built-in Query Processor
15:04 - Neo4j.rb => ORM; OGM
- Mongoid Influence
18:06 - Declarative Schema
21:09 - The Ruby Client vs The Java Client
25:48 - Use Cases
35:53 - Who is using Neo4j?
38:42 - Challenges as an Open Source Maintainer
39:44 - Funding Neo4j
41:00 - Working Abroad
42:16 - Getting Started with Neo4j
Picks
Elle Luna: The Crossroads of Should and Must (Jessica)
Lynda Tutorials (Avdi)
How to Win Friends & Influence People by Dale Carnegie (Avdi)
Marked 2 (Coraline)
Fund Club (Coraline)
RubyTapas #334: Rspec Compound Matchers (Brian)
Pyrosomes (Brian)
Americapox: The Missing Plague (Brian)
Special Guest: Brian Underwood.
Transcript
You’ve reached Charles Max Wood at DevChat.TV.
CORALINE:
Oh, my God!
[Laughter]
If you want to get hold of me, the best way is email chuck@devchat.tv.
CORALINE:
Alright, I hung up on Chuck.
Oh dang it! I wanted to leave him a message.
[Laughter]
[This episode is sponsored by Hired.com. Every week on Hired, they run an auction where over a thousand tech companies in San Francisco, New York, and L.A. bid on Ruby developers, providing them with salary and equity upfront. The average Ruby developer gets an average of 5 to 15 introductory offers and an average salary offer of $130,000 a year. Users can either accept an offer and go right into interviewing with the company or deny them without any continuing obligations. It’s totally free for users. And when you’re hired, they give you a $2,000 signing bonus as a thank you for using them. But if you use the Ruby Rogues link, you’ll get a $4,000 bonus instead. Finally, if you’re not looking for a job but know someone who is, you can refer them to Hired and get a $1,337 bonus if they accept the job. Go sign up at Hired.com/RubyRogues.]
[Snap is a hosted CI and continuous delivery that is simple and intuitive. Snap's deployment pipelines deliver fast feedback and can push healthy builds to multiple environments automatically or on demand. Snap integrates deeply with GitHub and has great support for different languages, data stores, and testing frameworks. Snap deploys your application to cloud services like Heroku, DigitalOcean, AWS, and many more. Try Snap for free. Sign up at SnapCI.com/RubyRogues.]
[This episode is sponsored by DigitalOcean. DigitalOcean is the provider I use to host all of my creations. All the shows are hosted there along with any other projects I come up with. Their user interface is simple and easy to use. Their support is excellent and their VPS’s are backed on Solid State Drives and are fast and responsive. Check them out at DigitalOcean.com. If you use the code Ruby Rogues, you’ll get a $10 credit.]
CORALINE:
Hello and welcome to Ruby Rogues episode 236. Today on the panel we have Jessica Kerr.
JESSICA:
Good morning.
CORALINE:
We have Avdi Grimm who's currently putting a baby to bed and will not say hello at this time. And I'm Coraline Ada Ehmke. Our guest today is Brian Underwood. Brian, do you want to introduce yourself?
BRIAN:
Yeah. Hi. So, my name's Brian and I'm a Developer Advocate at Neo Technology among other things. I am also one of the maintainers of the Neo4j.rb gem. And yeah, I love programming with Ruby and I love Neo4j. And that's part of what I'm here to talk about.
CORALINE:
Awesome. How long have you been doing Ruby?
BRIAN:
So, let's see, since 2006-ish. That's probably how long. So, almost about 10 years, jeez.
CORALINE:
Did you get started with Rails or with pure Ruby?
BRIAN:
I did pure Ruby for a year or so, and then I got a job on Rails. And so, that was kind of interesting, learning the magic of Rails at the same time I was learning the magic of the application I was working on.
CORALINE:
Cool. So, you worked for Neo4j and we're talking today about Neo4j. Can you give a brief overview what Neo4j is all about?
BRIAN:
Yeah, definitely. So, Neo4j is a database. And unlike other databases where you've got tables with rows or columns, or you would have a collection with documents or some other paradigm, Neo4j is a graph database, which means that it stores data in nodes and relationships. Mathematically from graph theory, it's like vertices and edges. But Neo4j calls them nodes and relationships. And both of those things are key-value stores, much like a document.
And in fact, it has some advantages in the traversal of relationships between objects because if you can find a particular entity, then you can find the things that are related to it very, very quickly, as opposed to doing join on a table for example. And so, that's one of its big advantages that a lot of people talk about, is the speed at which you can do a lot of computations of things that have complex relationships. But one of the things I actually like it for a lot is just because it's a really nice, natural way to represent data and it tends to just feel really nice, which as a Ruby programmer is kind of a big advantage.
CORALINE:
Right. I remember reading in the O'Reilly book on graph databases that relational databases do relationships poorly, which seems kind of ironic given their name.
BRIAN:
Right. [Inaudible], I actually didn't understand that for a while. Someone explained to me recently that that's the mathematical or computer science term of relation, which means something different than relationships.
CORALINE:
Cool. So, what are some typical use cases for a graph database over a traditional relational database?
BRIAN:
Right. So, graph databases are really good at things where you're browsing out a number of relationships. One of the big examples that people use is like social networks. If you want to find friends of friends, you could do that very quickly as opposed to a relational database where you might have to query a table, then query a join table, then query that same table again, and back and forth, and back and forth.
And it's also good at finding things like where there are patterns in data. So, another big use case people talk about is fraud rings where you might have five different people connected where two or three of them have the same address and some of them have the same phone number. And so, they're all connected in different ways and you can look for, essentially what Neo4j does is it allows you to specify a pattern that you want to look for and then it helps you find that pattern.
CORALINE:
And the query language is pretty neat in that you draw the connections with ASCII.
BRIAN:
That's right, yeah. It's a bit hard to explain in audio. But basically, you're drawing arrows. You're drawing parentheses to surround the nodes that you're defining, that you want to look for. And then you draw arrows with hyphens and greater thans [and less thans]. Just specify the relationships that you want to look for going between them. And then once you've defined a pattern then you can say what you want to do with that. And that turns them to more of a table structure that you can manipulate and work with.
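[Sketch, not part of the recording: a minimal example of the ASCII-art pattern syntax described above, written as Cypher text inside a Ruby string. The Person label, the FRIEND relationship type, and the name property are assumptions picked to match the friends-of-friends example; nothing here connects to a database.]

```ruby
# Cypher text held in a Ruby heredoc. Parentheses surround nodes,
# square brackets hold the relationship type, and the hyphens plus ">"
# draw the arrow between them.
# Person, FRIEND, and name are assumed names, not from the episode.
FRIENDS_OF_FRIENDS = <<~CYPHER
  MATCH (me:Person {name: 'Brian'})-[:FRIEND]->(friend:Person)-[:FRIEND]->(fof:Person)
  WHERE fof <> me
  RETURN DISTINCT fof.name AS name
CYPHER

puts FRIENDS_OF_FRIENDS
```

The MATCH line is the drawn pattern; the RETURN line is where the matched variables are turned into the table-like result Brian mentions.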
CORALINE:
And the 'Relations have Directions', can you explain what that's about?
BRIAN:
In Neo4j, relationships, like you say, are directional. And so, they always point from one node to another node. And sometimes you don't necessarily care about the direction. Like if you just want to find, you might have something where like 'Brian is a friend of Coraline's and Coraline is a friend of Brian's'. And those independent relationships you can create. But if you wanted to, if you didn't care about that when you query, you can just not specify the greater than or less than, and just say you want a [bidirectional] match. The way that Neo4j works is that it can browse any direction without any performance difference.
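[Sketch, not part of the recording: a small in-memory stand-in, in plain Ruby, for the idea that relationships are stored with a direction but can be matched with or without it. The Rel struct, the neighbors helper, and the FRIEND_OF type are hypothetical names for illustration. In Cypher the two forms would look roughly like (a)-[:FRIEND_OF]->(b) for a directed match and (a)-[:FRIEND_OF]-(b) for a match that ignores direction.]

```ruby
# Each relationship is stored pointing from one node to another.
Rel = Struct.new(:from, :type, :to)

RELS = [
  Rel.new("Brian",   :FRIEND_OF, "Coraline"),
  Rel.new("Jessica", :FRIEND_OF, "Brian")
].freeze

# direction: :out follows the arrow, :in goes against it,
# and :both ignores the arrow entirely.
def neighbors(name, type, direction: :both)
  found = []
  RELS.each do |rel|
    next unless rel.type == type
    found << rel.to   if rel.from == name && direction != :in
    found << rel.from if rel.to == name && direction != :out
  end
  found.uniq
end

p neighbors("Brian", :FRIEND_OF, direction: :out)  # ["Coraline"]
p neighbors("Brian", :FRIEND_OF, direction: :in)   # ["Jessica"]
p neighbors("Brian", :FRIEND_OF)                   # ["Coraline", "Jessica"]
```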
CORALINE:
So, it's not a matter of foreign keys, for example, and having to go one way through the SQL.
BRIAN:
Yeah. That's exactly right. In fact, some [inaudible] that I see people using a foreign key in a graph database and that's kind of, it's usually a code smell.
CORALINE:
So, a graph database is like a very different way of thinking about our data models. What's the hardest thing for people to grasp when they're first setting out to model their domain as a graph?
BRIAN:
That is a good question. So, one thing that takes a little getting used to is that nodes, at least in Neo4j, have the concept of labels, which are analogous to tables in that this is what you sort of dig into to search for something. But a node can have multiple labels. And so, it's… a cross [inaudible] thing that you could do. For example, a node can be both a person and a teacher. So, you can search for all the teachers or you can search for all the people. And so, that takes some getting used to, to work with.
Other than that, the thing that most people I think have trouble with is when [querying the data], that they need to, they're looking for matches and figuring out how to look for those matches because you're used to just pulling a set of rows out of a table. Whereas what you want to do, you just want to, you look for this, it might be multi-armed pattern that you want to pull variables out of and then turn that into a table. And [visualizing it] in your head I think can be tricky to understand.
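[Sketch, not part of the recording: a plain-Ruby stand-in for the point about labels, where one node carries both a Person and a Teacher label and turns up in either search. The Node struct, the with_label helper, and the sample names are hypothetical.]

```ruby
# A node has a set of labels plus key-value properties.
Node = Struct.new(:labels, :properties)

NODES = [
  Node.new([:Person, :Teacher], { name: "Ada" }),
  Node.new([:Person],           { name: "Grace" })
].freeze

# Finds every node that carries the given label, whatever else it carries.
def with_label(label)
  NODES.select { |node| node.labels.include?(label) }
end

p with_label(:Person).map { |n| n.properties[:name] }   # ["Ada", "Grace"]
p with_label(:Teacher).map { |n| n.properties[:name] }  # ["Ada"]
```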
JESSICA:
Oh, that's really, that's a different way of thinking about it. But can you give us a more concrete example? Like maybe, you know the canonical Rails implement a blog in five minutes, what would that data, the people have blogs which have entries which have comments, what would that look like in a graph database?
BRIAN:
[Right]. So, you might have a post label, which represents your posts. And there'd be nodes. Each node represents a post. And that might have a title and a body and a 'created at', 'updated at'. And then you might have a comment label which would have comment nodes. And you might then have a relationship between comments and posts.
And so, that could actually, this [is another] thing that some people have a little trouble wrapping their minds around, is you can go either way you want to just depending on what you're comfortable with. So, you can say a 'comment belongs to a post' or you can say a 'comment comments on post', this is the type of the relationship is the term. The label for the relationship is called the type. But you can also have a 'post has comment comment'. And then the relationship in that case would be 'has comment'. So, you can go either way. And it really is just up to you to put your own semantics on that.
JESSICA:
When you create that, would you create a post node, create a comment node, and then create a relationship between them?
BRIAN:
Yes, exactly. That's what you would do. Usually, if you were creating a comment, you would create a comment and then usually you would immediately create the relationship to the post because it doesn't really make sense to have a comment without the thing that it's commenting on. But oftentimes you probably have experienced in different Rails applications like one common pattern is we want to be able to have comments on everything. So, you have a comment, comment nodes, and a 'comment on' relationship that points to a post. But maybe it also points to a person saying that someone commented on this person. Or even the could comment [inaudible] on that. It's a lot easier to do in a graph database.
JESSICA:
So, you could comment on comments.
BRIAN:
Yeah, you can comment on comments, definitely. You could make that the same way you can say 'comment comments on comment', and have hierarchical comments. Or you could say 'replies to' and you could make that a different kind of relationship. [It depends] on how you want to approach it. But you definitely could.
JESSICA:
You could comment on comments on two different blog posts and relate them to each other.
BRIAN:
Yeah, indeed. Yeah, you can, it's crazy. You could do anything that you want.
CORALINE:
One of the cool things about that is that, like in a traditional Rails app with AR you'd end up with polymorphic associations which are just a nightmare, frankly. So, I like the fact that with Neo4j you don't have to, the relationship itself is a first-class entity, the edge. And you can just create what those relations are and name them, which I think is very important, and just have everything work.
BRIAN:
Exactly. And actually, in the Neo4j gem, it's [inaudible] that there's not really a concept of polymorphic associations. There's just 'has one', 'has many'. And 'has one' just imposes an artificial limitation that you're just going to have one object coming out of it. But the graph database can let you have as many as you want.
CORALINE:
I've done some work with Neo4j and one of the things I liked so much about it is that the relationships are named. So, it's not just a 'has many blah, blah, blah' but it's like, 'describes' is the name of the relationship. So, it makes it semantically much more rich. And I think it makes those relationships so much more clear than Active Record.
BRIAN:
Right. One of my favorite things about a graph database is that, say in our case with posts and comments, you also have a person label. And the person nodes point to both posts and comments with a 'created' relationship. So, 'person created a post', 'person created a comment'. So, now you have a relationship, a transitive relationship between a person who created a post, and the post has a comment, and a comment has a person who created the comment. And so, if you wanted to, you can take a specific person and find all of the posts that that person has created, and find all of the comments on all of those posts, and find all of the people that created all of those comments on all of those posts for that person who made the post. And with one query, you could go from one end to the other and you'd say, return all of the people that have commented on all of my posts and give me a count of the number of times they've [commented] on my posts.
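[Sketch, not part of the recording: the person-to-post-to-comment-to-person walk described above, done over an in-memory list in plain Ruby so it can be run without a database. The Rel struct, the helper names, the CREATED and HAS_COMMENT types, and the sample data are all assumptions. A roughly equivalent Cypher query is shown in the comment.]

```ruby
# Roughly equivalent Cypher (labels and types are assumed names):
#   MATCH (me:Person {name: 'brian'})-[:CREATED]->(:Post)
#         -[:HAS_COMMENT]->(c:Comment)<-[:CREATED]-(who:Person)
#   RETURN who.name AS commenter, count(c) AS comments
Rel = Struct.new(:from, :type, :to)

RELS = [
  Rel.new("brian",   :CREATED,     "post1"),
  Rel.new("brian",   :CREATED,     "post2"),
  Rel.new("post1",   :HAS_COMMENT, "c1"),
  Rel.new("post1",   :HAS_COMMENT, "c2"),
  Rel.new("post2",   :HAS_COMMENT, "c3"),
  Rel.new("jessica", :CREATED,     "c1"),
  Rel.new("avdi",    :CREATED,     "c2"),
  Rel.new("jessica", :CREATED,     "c3")
].freeze

# Follows relationships of the given type in the direction of the arrow.
def outgoing(from, type)
  RELS.select { |r| r.from == from && r.type == type }.map(&:to)
end

# Follows relationships of the given type against the arrow.
def incoming(to, type)
  RELS.select { |r| r.to == to && r.type == type }.map(&:from)
end

# person -> created things -> their comments -> comment creators, with counts.
def commenters_on_posts_by(person)
  outgoing(person, :CREATED)
    .flat_map { |post| outgoing(post, :HAS_COMMENT) }
    .flat_map { |comment| incoming(comment, :CREATED) }
    .tally
end

p commenters_on_posts_by("brian")  # {"jessica"=>2, "avdi"=>1}
```

The Ruby version needs three hops chained by hand (and Ruby 2.7 or later for tally); the Cypher version states the whole path as one pattern, which is the contrast Brian is drawing.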
CORALINE:
That's a really powerful query language. It's called Cypher, right?
BRIAN:
That's right. If you're using Java or [inaudible], JRuby, you interface directly with the Java APIs which give you some [inaudible] performance and control in some cases. [Cypher is] a really powerful way to query [inaudible] in an SQL-like way.
CORALINE:
One of the things that impressed me about Neo4j when I first started experimenting with it is the built-in query processor that you get running on a local host port. Can you talk about that a little bit?
BRIAN:
Yeah, so you mean the web UI that you type things in and see.
CORALINE:
Yeah, that amazing web UI.
BRIAN:
Totally. Yeah, that's definitely the thing that I think everybody falls for. The thing that really struck me after a little while of using it is it's like looking at your data under a microscope, kind of literally, right? You type in your queries and that if you're returning whole nodes and relationships as part of your query, it will display them in a graphical, visual, graph interface. And I want to clarify here that I'm not talking about graphs in the sense of pie charts or bar charts, but like nodes and relationships connected [together] and showing all that stuff. And you can [drag them around], click on individual nodes and relationships and see the properties of them, even changing style things and change the colors of the nodes. So, that's a really fun way to experiment and [inaudible]. I [inaudible] it a lot of times to the Ruby IRB or Rails console session, because you can just iterate and iterate.
CORALINE:
Yeah. It's really neat being able to see the relationships between your data, especially when you're starting out and just doing your model. I think seeing the relationships graphically goes a long way toward, at least for me, increasing my understanding of what I put together.
BRIAN:
Definitely.
CORALINE:
And the drag and drop is bonus. It's cool to be able to drag a node round and have everything react to it and it's all springy and cool. That's a really neat thing for me to experience. So, you wrote the Neo4j.rb library which is essentially an ORM. Do you want to talk about that process a little bit?
BRIAN:
Yeah, definitely. Full disclosure, I took over the project from a guy named Andreas Ronge, who is Swedish. And I worked with another guy named Chris Grigg, and both are great guys. And so, the process of creating the gem, it actually started off, because Neo4j originally was all very [Java] focused and working through [Java] APIs. And so, that gem has been around for a really long time and was the JRuby Java APIs gem. But about a year or two ago, Andreas and Chris and I all got together in Malmö, Sweden. And we hacked on version 3.0 of the gem, which turned it into this sort of ORM thing that could be used both in JRuby directly in embedded mode with Neo4j…
CORALINE:
Can you define ORM real quick?
BRIAN:
Yeah, totally. [Inaudible] Object Relational Model is the typical thing. We actually call our project an OGM, or an Object Graph Model. But it's a way of putting classes and objects in front of your database to represent the data. And so, in Active Record that's rows are objects. And in Neo4j, nodes and relationships are objects. Yeah, so we made this OGM as I said. And it can work embedded in server modes and it allows you to query for the data in the database much like you would with Active Record or Mongoid or other ORM tools. But it gives you some of the power of Neo4j. And also with the ease of use that you might expect from [Ruby] or something like Active Record.
Like for example, that query I talked to you about before where you went from a person to a post to a comment to a person, you could if you started off with a User object in Ruby, you could say 'user dot posts dot comments dot creators' or something like that. And then at the end, that would [inaudible] out all the creators without having to [inaudible]. If it's very compact, that's even more compact, which is really nice.
CORALINE:
It seems to me that with declarative… like in Neo4j.rb when you're creating your classes, when you're creating your maps, the schema's declarative. You're actually defining what the attributes are right in the model file. Was that really heavily influenced by Mongoid?
BRIAN:
Yeah. Yeah, it was. I think just because Neo4j is a schema-less database and so, we had to… I don't know if we necessarily had to do that. But I think it was just sort of like maybe [assumed]. It's like, “Oh, that's what Mongoid does. And nodes are similar to documents. So, we're just going to do that,” yeah.
CORALINE:
I think that's… so, Mongoid is an ORM for MongoDB. And that schema-less nature is something that Neo4j shares with that particular MongoDB implementation.
BRIAN:
Yeah.
JESSICA:
So, are you saying that there is a schema defined in your application somewhere in a model file?
BRIAN:
That's right. Yeah so, when you define a model, if you were defining the Post model, then you might call the property class method in that, kind of like you were defining an association in Active Record. You say property and then you give it a title symbol. And then you might say property and give it a body symbol. And you can even say what type you want it to be. So, you want to make sure that this is always a string or an integer or whatever. And it will take care of it for you.
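[Sketch, not part of the recording: a plain-Ruby stand-in for declaring properties and their types in the model file. The DeclaredProperties module below is hypothetical and is not the Neo4j.rb implementation; it only shows the shape of a property class method that records a name and coerces values to a declared type.]

```ruby
# Hypothetical stand-in for a model-level property declaration.
# This is NOT the gem's code; it only illustrates the idea.
module DeclaredProperties
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # The "schema" lives on the class, in the model file itself.
    def declared_properties
      @declared_properties ||= {}
    end

    # Records the property and defines a reader plus a type-coercing writer.
    def property(name, type: nil)
      declared_properties[name] = type
      attr_reader name
      define_method("#{name}=") do |value|
        coerced =
          if type == String then String(value)
          elsif type == Integer then Integer(value)
          else value
          end
        instance_variable_set("@#{name}", coerced)
      end
    end
  end
end

class Post
  include DeclaredProperties
  property :title, type: String
  property :body,  type: String
  property :views, type: Integer
end

post = Post.new
post.title = :hello
post.views = "42"
p post.title                     # "hello"
p post.views                     # 42
p Post.declared_properties.keys  # [:title, :body, :views]
```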
JESSICA:
Does Neo4j have that kind of typing?
BRIAN:
It does. Properties don't necessarily care what type they have. But there are types in the system. So, if you save something as an integer, it's going to come back out as an integer. But it doesn't… you can then save it as a string later if you want to, because it doesn't really care.
CORALINE:
What do you think the advantages of having a declarative schema like that are?
BRIAN:
I think one of the advantages I thought of is if you're using a Neo4j database with some other application and your Ruby or Rails application is just accessing certain nodes and certain properties, then you're just working with certain parts of the database. And you don't necessarily have to concern yourself with the rest of it.
CORALINE:
Just the subset, you mean.
BRIAN:
Right, exactly. Although at the moment, one thing that we actually would like to fix is that right now it returns all of the data for all of the nodes, which can sometimes be inefficient. So, it would be nice to be able to change it so that it returns just the properties that you define on your model.
CORALINE:
Yeah, that would be pretty cool. I find the declarative schema… actually, Mongoid gets a lot of hate. MongoDB gets a lot of hate. I find it really useful when I'm prototyping because I don't have to have the overhead of migrations when I change my mind about the data model. I can just make a change in the actual model file itself. And I think that's an important thing with Neo4j too. Especially when you're starting out and you have to think about modeling differently than you do with Postgres or what have you, with Active Record. So, having that forgiveness, that flexibility to define things as you go, I find really helps a lot.
BRIAN:
Yeah, definitely. And I think where Neo4j goes beyond is that it's schema-less in the data sense but it's also schema-less in the structure sense so that you can have these relationships and you can change them just as easily as you can properties. So, I think that's one of the big advantages that maybe a lot of people don't necessarily think about.
JESSICA:
Right, because in SQL we did that kind of thing too, relating comments to other comments and it would be a link table that hooked the two together, which I suppose corresponds to a relationship or an edge as a first-class object. But it was a pain in the butt and it was slow. And so, you hardly ever did it unless you absolutely had to.
BRIAN:
Totally.
JESSICA:
Whereas in Neo4j, that's your core model.
CORALINE:
Yeah, with Active Record you end up with a module called 'commentable' and a polymorphic join and it would be really nasty.
BRIAN:
Exactly.
JESSICA:
How does the Ruby client differ from the Java one? What makes it Ruby-ish?
BRIAN:
Do you mean the Java client in the sense of the Java APIs in Neo4j or the lower-level Ruby thing in the gems?
JESSICA:
The Java APIs.
BRIAN:
Right. So, the Java APIs are there if you want to say, “Okay, I'm going to find a particular object by an index and now I want to query for its relationships by this type.” And now I get those objects back and I'm going to take a couple of those and traverse them further or look at their properties or whatever. It's a very declarative and step by step in the Java APIs.
JESSICA:
I would call that imperative, the way you described it, not declarative at all.
BRIAN:
You're absolutely right. It is imperative, not declarative. Cypher is more the declarative side. I…
JESSICA:
Cool.
BRIAN:
Was thinking of the wrong thing. So yeah, and it's very imperative and it's doing step by step. And that has the advantage that it's much faster. Because you're also working in embedded mode so you're having direct access to the files from within your Ruby process, which can be very, very fast to [sort] lookup nodes and then traverse lookup nodes, and traverse. And it also has somewhat higher level tools. Like there's a traversal API where you can specify how you want to go places and things like that. And you can also make Cypher queries through the Java APIs but you have these different levels of APIs where you can work with the lowest of the low levels, which is the fastest, or the highest of the high levels, which is more convenient to work with but slower. And then on top of all that is Ruby where you're essentially, you're generally making Cypher queries and then it's turning into Ruby objects and that's going back and forth between the database.
Although you can do the Java APIs and turn those into Ruby objects if you want.
JESSICA:
So, you have a lot of options. You can make your Ruby look like Ruby or you can get more imperative.
BRIAN:
Yeah, definitely. Although there is the trade-off that if you want to work with the Java APIs you have to be in JRuby obviously. And the other trade-off is that your Ruby process then becomes your database server in that it's managing your database files for you. And so, if you wanted to for example start up your database in that web UI that we talked about before or just make command line queries to it, you have to shut down your server and then you can connect to it with this other process, because only one process can connect at a time.
JESSICA:
Wait, wait, what? Only one process can connect at a time?
BRIAN:
The files for the database have a lock on them so that only one process can work with those files at a time. Because if two processes were working on them, you would get data [inaudible] corruption. And so, if you're working in the JRuby embedded mode, you need to be accessing those files directly.
JESSICA:
So, that mode is like the 'I am the database server' mode?
BRIAN:
Basically, yeah.
JESSICA:
Because normally, you would have one process accessing your database files and it would be the database server process.
BRIAN:
Right. And so, if you download Neo4j then that has binaries that will run your database for you in its own process. And you can connect to that in server mode. But if you want this extreme performance and you are okay with working in JRuby mode then you can take that avenue.
AVDI:
So, you're saying you can embed it in-process if you really want to, sort of.
BRIAN:
Exactly.
AVDI:
I guess similar to like the way you embed SQLite in-process.
BRIAN:
Exactly, yup.
AVDI:
That makes sense. It's nice to have that option.
JESSICA:
Yeah, yeah. If you need to do a really fast migration or something.
AVDI:
Well, and especially if you're using it as like a local data store for an application. I don't know how many people do this. But if you're using it for like a desktop app or something like that where it's just the local data store.
JESSICA:
Right, because if I loaded say my company's employees in as nodes and started creating all the Slack messages between them as edges and then tried to measure which ones were more connected to each other and to which GitHub repos which are also nodes and every commit is an edge, yeah I could do that locally for my personal exploration and have it be both ridiculous and possibly even fast.
BRIAN:
Right. I think one of the other things is that a lot of times people use Neo4j for analytics of entities and relationships. And so, a lot of times you just need one process to chew on your data.
AVDI:
I'd actually kind of like to talk a little bit more about use cases, if you don't mind.
BRIAN:
Mmhmm, sure.
AVDI:
So, like a lot of the data models that we've been talking about just as examples so far have been the kind of things that in the Rails world we typically think of as, “Okay, this is the stuff that we, this is our main data that we store in some SQL store like Postgres.” Do you see Neo4j as being an alternative, like your primary data store, for a typical application?
BRIAN:
So, I often…
AVDI:
Like, system of record is I guess… I'm sorry, system of record is the term that I was looking for and failing to find.
BRIAN:
Yeah. So, it can definitely be that. And in my opinion and experience, the thing that it brings in that case is it's much easier to use and sort of how we've been talking about, easier to change things and iterate on things. But definitely a lot of people use it as something where they might already have a traditional database but they want to mirror some of their data for graph database processing. Or they might have certain parts of their data that they want to store in Neo4j and some parts that they might want to store in something else. And they would have links between them. And that way, they can run these graph queries whenever they want to. So, you can run the gamut there.
AVDI:
If someone didn't have any kind of legacy to deal with and they decided to just go ahead and make Neo4j their primary database, is there anything that they would run into where they'd be like, “Oh, this is the thing that would have been easy on our SQL store but now it's hard,” or now it's slow, or something like that?
BRIAN:
Yeah. I think there are definitely some cases. Like the big case that I often think of is if you have logging of data where you're just writing similar things over and over and over again. Or you're just writing lots of numbers that couldn't necessarily be considered entities to themselves, they're just data that you're storing, that's not so much the ideal use case. But a lot of the applications I think that I see people making, like web applications where you're saying you have entities of people and events and posts and all these things, it I think works pretty well for those use cases, yeah.
AVDI:
So, what you're saying is don't use Neo4j as Redis?
BRIAN:
Yeah, definitely, yeah.
AVDI:
Cool, thanks for that clarification. But you do feel like you can model a lot of the typical business objects using Neo4j in place of a traditional SQL store?
BRIAN:
Yeah, definitely. And the other thing that I think people find a lot is when they store their data in a graph database, they often find relationships that they might not have otherwise exposed, because it's so easy to just create them. You can just throw things in there. And so, you can have relationships [between] things and browse them whereas before it was like, “Oh, I have to create another table or a foreign key, or whatever. That's not a big deal. It's not something we'll deal with.”
JESSICA:
Right. I guess that…
AVDI:
That's actually kind of cool.
JESSICA:
Yeah, that is cool. It probably in business software, it's a lot less common to create a new object, a new kind of thing, than it is to recognize more relationships between the concepts you already have in your system.
BRIAN:
Right.
AVDI:
Yeah, I mean that excites me a little bit because well to me, the term 'schema-less' is kind of meaningless because there's always a schema even if it's implicit. But what's interesting to me is the concept of schema pluralism where you don't view it as there being a primary schema and then maybe we also have some reports that view it differently. But the idea of a more real-world perspective on things where there really are very different views of the data and no single one of them is like the sole correct schema, if you're looking at it from a different department in the organization there might be a different appropriate schema to view things from.
BRIAN:
Totally. And I love the idea of being able to, you might have different applications working on the same database but you might have an Employee object that has, one application cares about certain properties, the other application cares about certain properties. And they still, they can browse out to all the other entities that they care about, or if they don't want to have the same entity they can still have those two things and have a relationship between them that says, “These are actually the same thing, by the way.”
AVDI:
Do you then need some kind of procedures or something just to make sure you have a stable set of terms for properties? Like, what do we call this relationship? What do we call this property? So you don't find yourself with different groups calling the same property by a slightly different name, or the same relationship?
BRIAN:
That's a good question. I think I'm more in the excitement phase of the idea. But I haven't actually done it myself.
AVDI:
Because this is kind of the problem that a lot of organizations have tackled with having controlled vocabularies or organization standardized vocabularies, because of that kind of proliferation of terminology for similar things.
BRIAN:
Yeah, indeed.
CORALINE:
Is there any disadvantage to having a single relationship with different labels? Like naming the edge, having essentially what logically is one edge but it has two different names associated with it, two different labels.
AVDI:
Sometimes it's referred to as 'friend' and 'friend of' and sometimes it's referred to as 'buddy of'.
BRIAN:
Right. So, that actually isn't possible, at least in Neo4j. The relationships can have only one type. Nodes can have different labels, but relationships just have one type. And you could have multiple relationships that go between two nodes, even in the same direction, with different types.
CORALINE:
Yeah, that would allow that use case to happen where it's 'buddy of' or 'friend of'. You just define them as two different kinds of edges, two different kinds of types of relations.
AVDI:
I'm thinking about the scenario where somebody comes in later and wants to do some analytics and they don't realize that the 'buddy of' exists as well because a different group created that relationship. Or it's kind of similar to issues you run into in other so-called schema-less databases where not all of the properties, you know you change the property name but not all the objects in the system had that property name updated and stuff like that. You could miss, when you're doing analytics, you can miss relationships because of these ambiguities.
BRIAN:
Right. I think that feels to me like an organizational and communication problem. And I think maybe the best that you can do is make it so that it's just really easy to fix those things, or query past those things when those come up. So, at least in the case of Neo4j you could query and say, “Okay, I want this relationship to be either this relationship or this relationship. I don't really care.” And so, that you can do. Or you can just run a Cypher query that says, “Everywhere that you see this relationship, change it to this relationship,” and now we're going to standardize.
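[A minimal sketch of the two fixes Brian describes, written in plain Python over an in-memory list rather than against a real Neo4j server. The names Rel, match_any, and rename_type, and the sample data, are assumptions made up for illustration. In Cypher the "either type" match would look roughly like MATCH (a)-[r:FRIEND_OF|BUDDY_OF]->(b) RETURN a, b.]

```python
from dataclasses import dataclass


@dataclass
class Rel:
    """A directed relationship with exactly one type (hypothetical model)."""
    start: str
    type: str
    end: str


def match_any(rels, types):
    """Return relationships whose type is any one of the given types."""
    wanted = set(types)
    return [r for r in rels if r.type in wanted]


def rename_type(rels, old, new):
    """Standardize in place: every `old` relationship becomes `new`.

    Returns how many relationships were changed.
    """
    changed = 0
    for r in rels:
        if r.type == old:
            r.type = new
            changed += 1
    return changed


# Made-up sample data: two groups named the same idea differently.
rels = [
    Rel("alice", "FRIEND_OF", "bob"),
    Rel("carol", "BUDDY_OF", "dave"),
    Rel("alice", "WORKS_WITH", "dave"),
]

# "I want this relationship to be either this or this. I don't really care."
either = match_any(rels, ["FRIEND_OF", "BUDDY_OF"])

# "Everywhere that you see this relationship, change it to this relationship."
renamed = rename_type(rels, "BUDDY_OF", "FRIEND_OF")
```

[Querying past the ambiguity leaves the data alone, while renaming removes it for every later reader. Which one fits is the organizational question Brian mentions.]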
AVDI:
Can I easily say, given this set of nodes show me all of the types of relationships that exist between those nodes?
BRIAN:
Yeah. You could definitely, you could take a particular set of nodes and browse out to all other nodes, or you could take two different kinds of nodes and say, okay any time you find this node and this other node, find the relationships that are between them. And yeah, you could, in Cypher there's a type function where if given a relationship object, it will tell you what the type is.
AVDI:
Okay. So yeah, I guess the question, what I was really asking is like, I assume you can have wild cards in your queries for the nodes. But can you have wild cards for the relationships as well?
BRIAN:
Yeah, definitely. Actually, if you don't specify a relationship, it just assumes that you want to match some relationship.
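[A small sketch of the last two answers, again in plain Python with made-up data rather than real Cypher. The function names types_between and match are hypothetical. In Cypher the first idea would be roughly MATCH (a)-[r]-(b) RETURN DISTINCT type(r), where leaving the type out of [r] is the wild card.]

```python
def types_between(rels, group_a, group_b):
    """Distinct relationship types linking any node in group_a to any node
    in group_b, ignoring direction."""
    a, b = set(group_a), set(group_b)
    found = set()
    for start, rel_type, end in rels:
        if (start in a and end in b) or (start in b and end in a):
            found.add(rel_type)
    return sorted(found)


def match(rels, rel_type=None):
    """Match relationships; rel_type=None acts as a wild card."""
    return [r for r in rels if rel_type is None or r[1] == rel_type]


# Made-up sample data: (start node, relationship type, end node).
rels = [
    ("alice", "FRIEND_OF", "bob"),
    ("bob", "BUDDY_OF", "alice"),
    ("alice", "WORKS_WITH", "carol"),
    ("carol", "MANAGES", "dave"),
]

alice_bob_types = types_between(rels, ["alice"], ["bob"])
everything = match(rels)            # no type given: matches any relationship
only_manages = match(rels, "MANAGES")
```

[Listing the distinct types between two groups of nodes is one way an analyst could discover that 'buddy of' exists alongside 'friend of' before running any numbers.]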
AVDI:
Okay, cool.
CORALINE:
That's pretty cool, because [inaudible] the relationship could be literally anything, right?
BRIAN:
Yeah, definitely.
CORALINE:
And another thing I think that's pretty powerful is that, like I alluded to earlier, edges are first-class citizens. So, edges actually have properties.
BRIAN:
Mmhmm, yeah. That's actually another one of my favorite features. I think I keep listing favorite features, but that doesn't give it so much meaning anymore. But yeah, I think sometimes a property doesn't really belong on one entity or another. It belongs on a relationship. So, an example I've used in the past is if someone is making a blog post and you want to store what, like if you had geolocation tracking and you wanted to know what location they were at when they made that blog post, you could store that on a relationship because it doesn't really belong on the person. It doesn't really belong on the post.
CORALINE:
Or in the case of a friend relationship, you might store when that friendship was created.
BRIAN:
Right, definitely.
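[A minimal sketch of properties living on the relationship instead of on either node, using the blog post and friendship examples from the conversation. This is plain Python, not the Neo4j.rb API. The Node and Rel classes, the property names, and all values are assumptions invented for illustration.]

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node may carry several labels (hypothetical model)."""
    labels: frozenset
    props: dict = field(default_factory=dict)


@dataclass
class Rel:
    """A relationship has one type and its own property map."""
    start: Node
    type: str
    end: Node
    props: dict = field(default_factory=dict)


person = Node(frozenset({"Person", "Author"}), {"name": "Sam"})
friend = Node(frozenset({"Person"}), {"name": "Lee"})
post = Node(frozenset({"Post"}), {"title": "Hello, graphs"})

# Where the author was when posting describes the act of writing,
# so it is stored on the WROTE relationship, not on person or post.
wrote = Rel(person, "WROTE", post, {"location": "Auckland"})

# Likewise, when a friendship began belongs to the friendship itself.
friendship = Rel(person, "FRIEND_OF", friend, {"since": "2015-06-01"})
```

[If the same person writes a second post from somewhere else, that post gets its own WROTE relationship with its own location, and neither node has to change.]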
CORALINE:
I can't [under] emphasize how powerful that metadata on an edge has been in my work. So, I'm using Neo4j for an artificial intelligence project. I'm using it for semantic and existential meaning mapping. So, like finding parts of speech in all their forms but also relating them back to context. So, the really fluid nature of relations between entities is really, really great for this sort of application where there's a whole lot of data. I need to be able to query things like, show me all nouns related to love, or something like that. The querying is really, really powerful and the flexibility in the relationships makes it possible to model data in a way that I don't think I could do in a relational database.
BRIAN:
That sounds awesome. I love any time that someone's doing AI or NLP, Natural Language Processing stuff with Neo4j. Because I've done some of that stuff and it's just so nice.
CORALINE:
It's a natural fit, because the graph metaphor is used so much in natural language processing anyway, for dismantling the structure or understanding relations between words and phrases. So, it seems like a really, really natural mapping onto a graph database.
BRIAN:
Yeah. And I think the other thing is that so often when I'm working on projects like that, partially because I'm learning and partially because it's just such a complex thing to chew off, I end up having to change my schema and move things around. And so, that's nice to be able to easily change things.
CORALINE:
Definitely. So, we talked about some use cases and we've talked about some specifics about Neo4j and Neo4j.rb. Who is using it and what are some of the applications that people in the real world are finding advantageous with Neo4j?
BRIAN:
One of the big use cases I like talking about, just because I was able to visit them at one point and work with them, is a company called Shutl which got bought by eBay. And their use case is they wanted to be able to bypass the shipping of UPS and FedEx because these all, these companies have to go through main hubs. And that takes extra time to ship things. And they want to be able to do same-day shipping. And so, they essentially aggregate local courier services in a certain city and then use, they use Neo4j to be able to very quickly analyze a lot of this data. I don't know the exact details. Basically to very, very quickly find a quote from one of these couriers and be able to offer the person who wants to find the product options for same-day shipping. And they got acquired by eBay. So now, I think if you shop at eBay and you do the eBay Now I think it is, that's using Neo4j under the covers.
CORALINE:
I saw a use case again in the O'Reilly book that you get when you download Neo4j, for modeling data centers. Do you know of anyone who's doing that in real life?
BRIAN:
Mm, yes, let me think. I just saw a presentation. So, Neo4j or Neo Technology that makes Neo4j holds a GraphConnect conference regularly. And they had a presentation. They built a tool called MacGyver which they use to track their applications in their data center and be able to say, they have this active and dead pool thing going on where when they deploy, they deploy to the dead pool and then they switch them when they want to deploy a new application. And they also have 130-something, I think, different microservices. And so basically, whenever they deploy one of these microservices it reports into MacGyver automatically.
And so, Neo4j can track all of the applications that are currently live and what servers they're on, and be able to say, have dependency trees to say, “If this thing goes down, what applications are going to be affected?” And they're also on virtual hosting, I think. So, sometimes that virtual hosting gets moved around. And so, they can say, okay, well if this virtual block goes down, would anything be affected? And they can actually run that query on a regular basis so that they can just have it report back to them and ping them to say, page them to say, “Hey, our servers are in a weird state where they're vulnerable, so you should know about that.” So, that was actually a really cool video. I definitely recommend that.
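[A rough sketch of the "if this thing goes down, what is affected?" question as a graph traversal. It is plain Python over a dictionary, not the MacGyver tool or a Neo4j query. The function name affected_by and the service and host names are hypothetical.]

```python
from collections import deque


def affected_by(depends_on, failed):
    """Return everything that directly or transitively depends on `failed`.

    `depends_on` maps each item to the set of things it needs.
    """
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for item, needs in depends_on.items():
        for need in needs:
            dependents.setdefault(need, set()).add(item)

    # Breadth-first walk outward from the failed item.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Made-up data center: services depend on other services and on hosts.
depends_on = {
    "checkout": {"payments", "host-1"},
    "payments": {"host-2"},
    "search": {"host-1"},
    "reports": {"search"},
}

if_host_1_dies = affected_by(depends_on, "host-1")
if_host_2_dies = affected_by(depends_on, "host-2")
```

[Running a check like this on a schedule, and paging someone when the affected set is unexpectedly large, is the kind of regular report Brian describes.]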
CORALINE:
Neo4j.rb, in terms of maintaining the project itself, what are some of the challenges you find as an open source maintainer?
BRIAN:
Having a child is one thing, and just balancing family and open source software. That's probably pretty typical. But I think communicating remotely with my colleague, Chris, on the project. And we use Gitter a lot, so that helps. But just making sure that we're on the same page and checking in occasionally and making sure that we have the same ideas for road maps of what we want to do. But it ends up being like, oh, we put down all the issues, the things that we want to do. And then we grab things as we want to do them. And then as a major version's coming up, we scramble to be like, “Oh, this will be cool to get it. This will be cool to get in,” and we throw things in there. But it's all just very ad hoc. And that makes it fun but at the same time also makes it a little stressful I guess for making sure that there's some predictable road map for people to be able to follow along with.
JESSICA:
Brian, you've mentioned you are a Developer Advocate for Neo4j. Do they pay you for your work on the Ruby client?
BRIAN:
They don't. So, that's volunteer effort. There have been a couple of times where I've had the opportunity to work on for example a Rails project where I've happened to have to fix something or to make that go along. And so, on the side I've done things. But the Neo4j.rb project is all volunteer.
JESSICA:
What are you doing for money these days? And tell us a bit about your [inaudible].
BRIAN:
So, I mainly work for Neo Tech as a Developer Advocate. And so, that means just being there to help people. I go on Stack Overflow and help answer questions there. On Slack, helping people there, and just helping where I can with people that need help, particularly with Ruby, because that's my area of expertise. But I also, one of the really neat parts of my job is that I also get to play around with Neo4j and find different use cases and interesting things to do with Neo4j and then blog about them and share that out with the world. And so, it's a lot of just experimenting and sharing and loving. [Chuckles]
CORALINE:
Brian, you said before the show started that you and your family are on a two-year travel plan and you're going all over the world. Can you talk a little bit about what you're doing there and also how that impacts your work life?
BRIAN:
Yeah, for sure. So, we're let's see, about 16 months into a two-year trip around the world. And both my wife and I do contracting work for different companies. And we have a three-year-old son who, anybody who has a three-year-old knows that that can be a handful sometimes. But it could be definitely a lot of last-minute planning, probably like any real household. It's like, “Oh, I have a call that I need to be on in the morning. So, can you switch me times?” because we'll, one person will watch him while the other person works, and then switch over. And so, figuring those things out. We've actually been lucky the last couple of places to be able to find daycare in some of these places. So, he's gone to school in New Zealand and Indonesia. And now he's going to daycare in India. So, that's been helpful for getting some more work done.
CORALINE:
That's pretty awesome. So, if I'm a developer who has heard the program and is intrigued by what Neo4j can offer, what's the best way for me to get started?
BRIAN:
So, there's a couple of good ways. There's an article on the Neo4j.com website which is like setting up Neo4j with Ruby, if you're particularly interested in Ruby. And that has an overview of Neo4j and an introduction to the gems and a walk-through on how to create an application. There's also, I made a video screencast series. That was actually inspired by Avdi's RubyTapas series. And I think I was able to get the tight episode style down. But I don't quite have his conversational style yet, I think. But…
CORALINE:
Nobody does.
BRIAN:
So, that's I think a pretty good way, to watch those. Each of those I try to make it not more than four or five minutes. I think one or two of them might have snuck over. And if you want to dive in, there are definitely a lot of general Neo4j resources on Neo4j.com in the developer section. But if you want to also dig more into the gem and also to get help, because we love actually helping people, you can go to the Neo4j gems GitHub repository. And on there on the readme we have links to our Gitter channel and a link to ask questions on Stack Overflow with the right tags. And also we have a Twitter account and I think one or two other ways to contact us. But Gitter's probably the best way to get a hold of us. And we love answering questions. And since Chris is in New York and I'm in India right now, we've got 24 hour support.
CORALINE:
Awesome. And I noticed too that when you download the Neo4j server itself, that you get a free eBook from O'Reilly.
BRIAN:
That's right. That's actually a really good book for learning the internals of Neo4j and how the different files work and their [inaudible] stores and all that stuff. I found that really interesting.
CORALINE:
Awesome. Well Brian, thanks a lot for talking to us today. I think we're going to move on to picks now.
BRIAN:
Okay.
CORALINE:
Okay Jessica, do you want to start us off with some picks?
JESSICA:
I'm going to refer back to a blog post that I read a while ago. And it has made a pretty good impact on my life. I don't know if anyone's picked this before. But we'll pick it today, anyway. It's called 'The Crossroads of Should and Must'. And it's about doing things you think other people expect of you versus doing the things that you expect of yourself and being conscious about that. I'll put a link in the notes. I recommend that anyone who hasn't read it, read it. Maybe. Maybe I'll just read it again today. It's that good.
CORALINE:
Okay. Avdi, do you have some picks?
AVDI:
I do, I do. So lately, I've been diving into the Adobe Creative Cloud ecosystem. I'm transitioning some of the tools that I use over to the Adobe stuff. And I have been using the Lynda.com tutorials to figure that stuff out. They're a subscription courseware site and I've found their tutorials to be very high quality. I've really been benefiting from them. To the point, they seem to do a good job of choosing people that know what they're talking about and they've got some actually fairly impressive course delivery systems. You can actually accelerate the speed of videos if they're talking too slow for you. And you have the script of the video below it and the script actually follows along with the video. They've clearly put a ton of work into it. So, yeah for that kind of creative software, a lot of people recommended it to me and I think they were right. It's really good stuff. So, that's my first pick, is Lynda.com.
My other pick is a book, the last book that I finished listening to via Audible. And it is 'How to Win Friends and Influence People' by Dale Carnegie. It's probably the book that I've seen more recommendations for than just about any other book ever. And smarmy title aside, it actually lives up to all the recommendations. It really is a classic, just a terrific set of really concrete and straightforward tips and encouragements on how to be a nicer person. So, it's a good book. And that's it for me.
CORALINE:
Cool. I have a couple of picks. I'm currently working on a book, which I'm really excited about. And we are writing it in markdown. And I just upgraded my markdown reader from Marked 1 to Marked 2. So, Marked is a preview app for markdown files. It works in conjunction with a text editor and gives you real-time markdown previews. And Marked 2 has some new features. It highlights word repetition and overused words. It gives you word count, reading time estimation. You can do manuscript format preview and output. It even supports GitHub flavored markdown support with automatic syntax highlighting. And if it doesn't do what you want it to do, you can create custom pre and post-processors as well, which is pretty cool. So, Marked 2 is $12 in the Apple Store. And you can get a free trial from their website, which I'll link to in the show notes.
My second pick, I have picked it before but I want to re-emphasize how important and how great this is. It's called Fund Club. And what Fund Club does is you pledge, when you join you pledge that you're going to give $100 a month to fund pet projects that are by and for marginalized people. Every month, Fund Club sends you a new pick, a project, an initiative, an event, an organization, something focused on diverse communities in technology. You give $100 to that month's selection. They don't manage your money. They don't ask for it. You submit directly to the recipient. And then you simply click a confirmation button in the email to say, “Yes, I paid.”
Last month's project was The Anti-Eviction Mapping Project, a data visualization/digital story-telling collective documenting the dispossession of San Francisco Bay Area residents. They've done a really interesting set of projects this year so far, including one for diverse representation in stock photographs, which was pretty cool, done by women in tech chat, women of color in tech chat, sorry. They've raised close to $40,000 so far this year. And they're a great cause, supporting a lot of other great causes. So, join FundClub.com.
Brian, do you have some picks for us?
BRIAN:
Yeah, I've got a few picks here. So, my first pick is, I watched episode 334 of RubyTapas recently. And I thought it was awesome because I… it's RSpec Compound Matchers which is basically a way to specify multiple conditions for a test, for an expectation, but in a single example. And that was, I think, I feel like it's something I've been looking for, for a long time. And just looking for an excuse to try it out. So, that was awesome.
I also read this story recently about these things called pyrosomes which are, I'll just quote from the article here because I don't think I could do it any better. “These hollow, worm-like entities can grow to a mammoth size. In fact, they have been recorded at lengths comparable to a sperm whale. Even more eerie, they are bioluminescent and will glow when touched. Pyrosomes may look like giant sea worms, but they're actually hollow on the inside. And while they appear to be a single organism, they are colonies of individual creatures that have banded together for a common purpose. Exactly how these massive colonies coordinate their behavior is still being studied, but researchers suspect they communicate through light signaling.” So, there were just so many things in there that I thought were awesome that I had to share that.
And then my last pick, just I think earlier today or yesterday, if anybody follows the CGP Grey channel, I definitely recommend it in general. But he posted a video called 'Americapox' which is about the question, why did the Europeans bring a plague over to the indigenous people of the Americas and why did the people of the Americas not give any plagues back to the Europeans? And apparently it's an argument, a thing that was explained in the book 'Guns, Germs, and Steel'. But he does a really good job of explaining all of it in video form.
AVDI:
Terrific book, by the way.
CORALINE:
Very cool. Well, I think that wraps up for this week. Brian, it's been a pleasure having you on as our guest. I'm really excited about Neo4j and I hope that this episode will inspire other people to check it out as well. So, thank you so very much for your time.
BRIAN:
Yeah, me too. And thank you so much for having me.
AVDI:
Yeah, thanks a lot, Brian.
CORALINE:
And we'll see everyone next week. Bye.
[Hosting and bandwidth provided by the Blue Box Group. Check them out at BlueBox.net.]
[Bandwidth for this segment is provided by CacheFly, the world’s fastest CDN. Deliver your content fast with CacheFly. Visit CacheFly.com to learn more.]
[Would you like to join a conversation with the Rogues and their guests? Want to support the show? We have a forum that allows you to join the conversation and support the show at the same time. You can sign up at RubyRogues.com/Parley.]