Occams Record with Jordan Hollinger - RUBY 599

Jordan Hollinger has been a Ruby developer for over 12 years. He joins the show to talk about his gem, "occams-record", which bills itself as the missing high-efficiency query API for ActiveRecord. He begins by explaining the gem and why he created it.

Hosted by:

Valentino Stoll

Special Guests:

Jordan Hollinger

Show Notes

Picks

Jordan - Julia Evans's Mess with DNS
Jordan - Implement DNS in a Weekend
Valentino - Best Practices for Prompt Engineering

Transcript

Valentino Stoll:

Hey everybody, welcome back to another episode of the Ruby Rogues podcast. I'm your host today, Valentino Stoll, and we're joined today by a very special guest, Jordan Hollinger. Jordan, do you want to introduce yourself and tell everyone why you're famous?

Jordan Hollinger:

Yeah, thank you. It's news to me. Yeah, I'm Jordan Hollinger. I've been doing Ruby for maybe 12 years now. I think I owe my career to Ruby and Rails. No framework is perfect, but yeah, it's great and I love working with it.

Valentino Stoll:

Yeah, I'm the same way. Almost exclusively Ruby, with the exception of the JavaScript sprinkles. Right.

Jordan Hollinger:

You do what you have to. Yeah.

Valentino Stoll:

Unfortunately. It's not so bad, though. We've had the JavaScript and Java folks on the show before, and there's a definite cohesion that's required in the web world.

Jordan Hollinger:

Yeah, I'm a little envious of TypeScript, I have to say.

Valentino Stoll:

Yeah, what do you [envy] about it?

Jordan Hollinger:

I wish Ruby had the types.

Valentino Stoll:

The types.

Jordan Hollinger:

Yeah, I know there's the whole Sorbet thing and the new features in Ruby 3, but I wish we had a little more in the way of opt-in types. There are benefits to that which I do miss.

Valentino Stoll:

Yeah, for sure. So we invited you on today to talk about this awesome gem you made called Occam's Record. Do you want to talk a little bit about what that is first before we dive into it?

Jordan Hollinger:

Sure. Let's see. I think the tagline I gave it was that it's the missing high performance API for Active Record.

Valentino Stoll:

That's great.

Jordan Hollinger:

So I don't know if you want to dive into the whole history of it, but that history might help explain where and why it came about.

Valentino Stoll:

Yeah, I'm curious. I mean, anytime I think about an Active Record wrapper or addition, I think about, basically, the Sequel gem.

Jordan Hollinger:

Mm-hmm, right.

Valentino Stoll:

So maybe start with what's missing from Active Record that got you here.

Jordan Hollinger:

Sure. So yeah, like I said earlier, Rails and Active Record are good enough. Usually they're fast enough and flexible enough, until they aren't. I've worked on a number of large and/or complex Rails and Active Record codebases, and in those projects you hit points where maybe you need to write unions, or you need to do WITH queries. There are escape hatches for those kinds of things, like find_by_sql, so you can hand-write it all. Sometimes that works great, but sometimes it's not enough. Maybe you need to eager load associations off of that. Can't do that. Maybe you need to use find_each or find_in_batches so you don't run out of memory. Nope, that's not there.

And even when you have those things, let's say you're looping through all your users, right? And you eager load their orders. On Amazon, gosh, I'm embarrassed to even guess how many orders I would have. You wouldn't want to load all of them. You'd want to eager load the last 30 days, or maybe sort them a certain way. And there's no great way to do that in Active Record. There are some hacks, but they're verbose, and they're not public; they use private APIs, so it's risky. So those frustrations have always kind of been there in the back of my mind. You find ways around it, but they're bad.

Then I had this one project come up. It's too complicated to get into, but Active Record was way too slow. It was taking hours to run, lots of hours, like overnight. So we talked about, okay, maybe an external tool, or maybe even a gem like Sequel. Then we realized all the functionality we'd have to duplicate and keep up to date and in sync, and forget about the automated unit tests. But I remembered that Active Record has some really amazing metaprogramming abilities. On a model, you can call .columns, and you can get all the names of the columns and all their data types. Also on a model, you can call .reflections, and you can see, oh, here are all my belongs_to's and has_many's and all the models and columns they refer to. All that good stuff is in there. That's really powerful.

So we used those reflections and we generated all the SQL we needed. And then we ran it, I guess you'd say, raw: not through a model, just with the Active Record connection's exec_query. It's a very low-level API. It just returns tuples, but it's really fast. And that took things down from like 15 hours to 15 minutes, all because we weren't loading and saving hundreds of thousands, maybe millions, of Rails model records. I'll admit that's an extreme use case. But it got me, yeah.

Valentino Stoll:

It's not that extreme, right? As your application grows, it becomes more and more popular, right? Like the size of your tables...

Jordan Hollinger:

Ideally, yeah.

Valentino Stoll:

Yeah. I mean, if you ever end up on a team with people, chances are good you're going to have tables large enough that you have to start optimizing things, right? And your associations have grown to where you need to know where to put indexes and, like you're saying, where to eager load things. The savings there can be significant when you're batching so many records. And actually, a problem I've had repeatedly, and recently: there's an issue with MySQL in find_in_batches, with limits and ordering.

Jordan Hollinger:

Mm-hmm.

Valentino Stoll:

And notoriously, find_in_batches will use ORDER BY and LIMIT to do that batching. And it's a problem with MySQL because it'll order first. So if you have a super large table, it will do a full table scan to order and then start limiting. And so you really have to roll your own cursor, which I...

Jordan Hollinger:

Yes.

Valentino Stoll:

Apparently there's a new feature coming in Rails 7 where you can have it basically drop the order and just use ranges for the IDs.

Jordan Hollinger:

Okay.

Valentino Stoll:

So it seems promising. But like you're saying, it hasn't catered toward these advanced use cases, right?

Jordan Hollinger:

Right, yeah, it's usually been a lowest-common-denominator kind of ORM.

Valentino Stoll:

Right.

Jordan Hollinger:

It's gotten better. If you're using Postgres, you can now do things that you couldn't do before. But still, it does have its limits, yeah.

Valentino Stoll:

So as you found these limits, what prompted you to say, all right, it's time for a gem, time to consolidate all this stuff that I've figured out and the problems we've solved, and focus it into a gem? What sparked that?

Jordan Hollinger:

I think it was my brief sojourn into Elixir and Phoenix. I really liked its ORM, which is called Ecto.

Valentino Stoll:

Huge fan.

Jordan Hollinger:

Are you?

Valentino Stoll:

Oh yeah.

Jordan Hollinger:

Yeah. I'm sad I don't get to use it much anymore, but I really did enjoy Ecto, especially how fast it was. Sometimes people say that constraints can open up new ways of thinking, and I think that's what had to happen, because Elixir is a functional-ish language and you can't have an Active Record-like interface on that. All the objects are immutable. You can't change an email address and hit save. You've got to have different ways of doing it, and those ways turn out to be a lot faster.

So that, combined with the experience of using these reflections APIs, got me thinking: what if something like that were in Active Record? You could use a different ORM, but switching ORMs in an existing project is fraught. So I started toying with what I conceived as kind of a middle-layer API. It gives up a few of the bells and whistles. As with Ecto, you can't save objects the same way. But you get most of the speed of the low-level APIs that just return structs and tuples. Plus, as I was working on that, I thought, oh, maybe I could do eager loading a different way and make it more flexible, so you can add where conditions or change the order or any other kind of thing. So that's kind of where the idea came from. I wanted to pull these ideas out of an application and just make it a general-purpose gem. And that's where Occams Record was born.

Valentino Stoll:

Yeah, that's awesome. I mean, there are so many features in here that I wish existed in Active Record, for sure. And I'd be happy to just walk through them with you.

Jordan Hollinger:

Sure. Yeah.

Valentino Stoll:

Like we were saying earlier, with batch loading as an example, it seems you already have a cursor-based approach for that, which is nice. But another thing I saw in here, which I find myself doing unfortunately too frequently, is custom SQL, and wanting to use Active Record with that. I tend to just say, hey, fetch this SQL, and deal with the array that it returns, which is the popular choice because there's not much of an alternative. So how does Occams Record handle that case? What features of Active Record does it add on top of the raw SQL?

Jordan Hollinger:

Right. Well, the first thing it keeps, I'll say, is the query builder. So Model.where, whatever, keep all that. But you just pass that into Occams Record; you don't run it with Active Record. Occams Record will run that SQL for you. You also use Occams Record for all the eager loading, so it's completely custom eager loading code. There wasn't a great way to use what's built into Rails. And if you're eager loading 20 or 50 associations, that nested hash and array syntax can be really gnarly to read anyway. So we use a block-based syntax, which I think is easier to read and also lets you pass in all the options: I want to customize this one to add a where, an order by, anything like that.
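As a rough illustration of the block-based style Jordan describes, here is a minimal sketch in plain Ruby. `EagerLoadNode`, its `eager_load` method, and `to_includes` are hypothetical names invented for this example; they are not the gem's actual classes, and the sketch only builds the declaration tree without running any query.

```ruby
# Hypothetical sketch: declaring nested eager loads with blocks instead of
# a nested hash/array. None of these names come from Occams Record itself.
class EagerLoadNode
  attr_reader :name, :filter, :children

  def initialize(name, filter = nil)
    @name = name
    @filter = filter # optional callable that would narrow this association
    @children = []
  end

  # Declare a child association; a block nests further declarations under it.
  def eager_load(name, filter = nil, &block)
    child = EagerLoadNode.new(name, filter)
    child.instance_eval(&block) if block
    @children << child
    self
  end

  # Render the tree in the nested hash/array form, for comparison.
  def to_includes
    @children.map do |child|
      child.children.empty? ? child.name : { child.name => child.to_includes }
    end
  end
end

root = EagerLoadNode.new(:users)
root.eager_load(:profile)
# The lambda stands in for a per-association customization (a where or order).
root.eager_load(:orders, ->(rows) { rows.select { |r| r[:recent] } }) do
  eager_load(:line_items) do
    eager_load(:product)
  end
end

p root.to_includes
```

Each association gets its own line and its own optional filter, which is the readability point made above; the equivalent nested hash has nowhere obvious to hang a per-association condition.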

Valentino Stoll:

Yeah, this is really cool. One thing I definitely always found confusing in Rails is what syntax to use: when to use the hash, and what to include in the hash for the eager load, right? It's not exactly straightforward. I have the same problem with params, with, what is it, permitted attributes or something like that. It has a very similar syntax. It's much easier just to read through your examples here of using eager loading with a block syntax, and I really like that. So how do you go about building that eager loading context? How is that managed internally?

Jordan Hollinger:

I think there's a class called Context. It's a nested structure, and it makes heavy use of the reflections APIs from the Active Record models. So you say .eager_load(:customer). It looks at the customer reflection, figures out what the heck that is referring to, and figures out the right keys and stuff to join it with.
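To make the idea of reflection-driven eager loading concrete, here is a small in-memory sketch. Everything in it is an assumption for illustration: `TABLES`, `REFLECTIONS`, `select_where_in`, and this `eager_load` function are invented stand-ins, not Active Record's or Occams Record's real API. The point is only that association metadata (which table, which keys) is enough to load all the children in one query and stitch them to their parents in memory.

```ruby
# Hypothetical in-memory stand-ins for tables and association metadata.
TABLES = {
  orders: [
    { id: 1, customer_id: 10 },
    { id: 2, customer_id: 11 },
    { id: 3, customer_id: 10 }
  ],
  customers: [{ id: 10, name: "Ada" }, { id: 11, name: "Grace" }]
}

# Roughly the kind of facts a belongs_to reflection exposes.
REFLECTIONS = {
  orders: { customer: { table: :customers, foreign_key: :customer_id, primary_key: :id } }
}

QUERY_LOG = []

# Pretend query: fetch rows whose `column` is in `values`, logging each call.
def select_where_in(table, column, values)
  QUERY_LOG << "SELECT * FROM #{table} WHERE #{column} IN (#{values.join(', ')})"
  TABLES.fetch(table).select { |row| values.include?(row[column]) }
end

# Load one association for all parents with a single query, then join in memory.
def eager_load(parents, parent_table, assoc_name)
  ref = REFLECTIONS.fetch(parent_table).fetch(assoc_name)
  keys = parents.map { |parent| parent[ref[:foreign_key]] }.uniq
  by_key = select_where_in(ref[:table], ref[:primary_key], keys)
           .to_h { |row| [row[ref[:primary_key]], row] }
  parents.map { |parent| parent.merge(assoc_name => by_key[parent[ref[:foreign_key]]]) }
end

orders = eager_load(TABLES[:orders], :orders, :customer)
orders.each { |order| puts "order #{order[:id]} -> #{order[:customer][:name]}" }
puts QUERY_LOG
```

Three orders are resolved with one logged query rather than one query per order, which is the N+1 saving discussed throughout the episode.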

Valentino Stoll:

Gotcha. So how much of Active Record's eager loading are you taking advantage of? Or are you just completely throwing that out, and this is all completely custom?

Jordan Hollinger:

I had to throw it all out. Someone who knows the Rails codebase well may have known a way to take advantage of some of it. But there's got to be some gnarly code in there, because there's some gnarly code in here in Occams Record too.

Valentino Stoll:

I imagine!

Jordan Hollinger:

There are a couple of exotic use cases where I had to say, nope, not going to touch that.

Valentino Stoll:

I mean, that's the value of having your own gem, right? You get to decide which edge cases to consider or not. Which I think is part of the disadvantage of Active Record's current state: it has all these old edge cases that it needs to keep around.

Jordan Hollinger:

Mm-hmm, yeah, bugs become features over time, for sure.

Valentino Stoll:

So you mentioned like being able to kind of perform conditions on the eager load to filter them down to only eager load certain kinds of batches. How does that work practically? Can you give us an example of something you use it for?

Jordan Hollinger:

Oh. Well, I guess the users and orders thing is probably the easiest example to pull out. I haven't used that exactly, but it's analogous. So if you're looping through your users and you want to eager load their recent orders, you just append a little .where, created_at greater than 30 days ago, kind of thing.

Valentino Stoll:

Oh, that's cool.

Jordan Hollinger:

And you can use the scopes from your Active Record model, the custom scopes you define. So you can just define a scope on your model and refer to it in Occams Record. It will figure out what you're talking about and use that.

Valentino Stoll:

I see you have scopes on the inside of the block. Yeah, the block syntax definitely gives you a lot of flexibility there, being able to modify it in the scope that it's in. That's pretty cool.

Jordan Hollinger:

I should say they're... I think... added. I'm sorry.

Valentino Stoll:

So, looking at the next example here in the README, it's cursors, which, as mentioned, is obviously something I have been looking for for a while.

Jordan Hollinger:

Yeah.

Valentino Stoll:

So, what is it? How do you use it? I know what it is because I desire it. But how is this useful for a lot of people, and how do you use it?

Jordan Hollinger:

Yeah, so Active Record's find_each, like we talked about earlier, is all offset and limit based. And so was Occams Record initially. I just did it the same way, because that's what I'd been using for many years, and it didn't even occur to me that I could implement cursor support. Offset and limit work great for small to medium-sized tables. But when you get into hundreds of thousands or millions of records, each loop you go through to load your next 1,000 records, the database has to do a bunch of work to find, oh, this offset is now 500,000. The database has to walk through that table and find record 500,001. And that gets slower and slower as the table grows.

A cursor keeps that state open in the database. It knows it's the same connection, and it knows exactly where it was in the table. So you just say, give me the next thousand, give me the next thousand, give me the next thousand. It doesn't matter how large the table is; there's a constant speed there. And I should call out that right now, Occams only supports cursors with Postgres, because that's the only major database that supports cursors in the way that your average web app would want to use them. I think MySQL has some kind of cursor support, but it's only in functions or something. It's not easy to use.
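The cost difference Jordan describes can be sketched with a toy model in plain Ruby. `ToyTable` is a hypothetical class that simply counts how many rows a pretend database walks past; it is a simulation of the cost model described above, not a measurement of any real database, and it assumes the database must step over every skipped row on each offset query.

```ruby
# A toy "table" that counts how many rows the pretend database walks past.
class ToyTable
  attr_reader :rows_walked

  def initialize(size)
    @rows = (1..size).to_a
    @rows_walked = 0
  end

  # OFFSET/LIMIT style: every call starts from the top and skips `offset` rows.
  def page(offset, limit)
    @rows_walked += [offset + limit, @rows.size].min
    @rows[offset, limit] || []
  end

  # Cursor style: the position is remembered between fetches.
  def open_cursor
    pos = 0
    lambda do |limit|
      batch = @rows[pos, limit] || []
      @rows_walked += batch.size
      pos += batch.size
      batch
    end
  end
end

# Page through the whole table with a growing offset.
def walk_with_offset(table, batch_size)
  offset = 0
  loop do
    batch = table.page(offset, batch_size)
    break if batch.empty?

    offset += batch.size
  end
  table.rows_walked
end

# Page through the whole table by repeatedly fetching from one open cursor.
def walk_with_cursor(table, batch_size)
  fetch = table.open_cursor
  loop { break if fetch.call(batch_size).empty? }
  table.rows_walked
end

offset_cost = walk_with_offset(ToyTable.new(10_000), 1_000)
cursor_cost = walk_with_cursor(ToyTable.new(10_000), 1_000)
puts "offset paging walked #{offset_cost} rows"
puts "cursor paging walked #{cursor_cost} rows"
```

In this model the offset approach does work proportional to the square of the table size, while the cursor approach touches each row once, which matches the "constant speed" per batch described above.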

Valentino Stoll:

Yeah, MySQL has been unfortunate. Yeah, use Postgres. That's a full disclaimer there.

Jordan Hollinger:

Yeah, yeah.

Valentino Stoll:

Yeah, I mean, definitely for paging results, cursors are just so much faster, especially when you need to start dealing with ordering and ORDER BY clauses and things like that, where it gets a little tricky: what is it trying to rank things on, are they in the index, are you doing a table scan? You end up having to run EXPLAINs on everything to make sense of it. With cursors, you kind of just let it run, right? If you have a big giant chunk of data, it'll just move through it. And you have some interesting functions, behaviors around cursors, like moving around and fetching the next pieces. Can you elaborate a little more on what each of those does?

Jordan Hollinger:

Yeah, there are sort of two API levels there. There's the high-level API, which is analogous to Rails' find_each. It's called find_each_with_cursor. It works the exact same way as the find_each we all know and love, but it just happens to use a cursor. Active Record also has, I think it's called find_in_batches or something, and there's an analogous find_in_batches_with_cursor. The names are a little long, but they're easy to remember: just add with_cursor to everything. And then there's a slightly lower-level API where you can actually use the cursor commands, like move forward, move backward, fetch X records, and a number of others. I don't think I've ever actually used that. It was just kind of an internal API, and I thought, oh, this could be useful. I'll clean it up and make it a public API in case someone needs to do a little more than just loop forward constantly.

Valentino Stoll:

Yeah, this is cool. I'm wondering, have you considered extracting pieces of this as pull requests to Rails? Or has it not received very great feedback?

Jordan Hollinger:

Uh.

Valentino Stoll:

Why is it still its own thing?

Jordan Hollinger:

Yeah, the number of times I've thought about that. Here's the thing, and I could be wrong, but here's my thinking. Writing code, that's easy. Dealing with a major open source project and saying, here's a very different but what I think is a way better way to do things, that's hard. That takes a lot of time and commitment. And I have not felt up to that challenge.

Valentino Stoll:

Yeah, you know, that's a really good point. It would be nice if open source projects in general were more approachable.

Jordan Hollinger:

Right. Yeah, I'm not criticizing the Rails maintainers or anything.

Valentino Stoll:

No.

Jordan Hollinger:

I just think it would be an order of magnitude more work than creating this gem. And maybe someday I'll pursue that, but I've not had the energy to so far.

Valentino Stoll:

Yeah, it does bring up a good point. Unless you're on a larger team or in an organization where you have resources to contribute back, you're kind of just burning through your own time, right? You have to weigh whether or not that's worth it for you.

Jordan Hollinger:

Right. Yeah, I've done a couple of bug fix PRs to Rails, but that's a totally different beast than a "here's a brand new way to do half of everything."

Valentino Stoll:

Right, a feature. Yeah, it'll be interesting to see. You mentioned the WITH query, what is that called again? Table expressions. I know that Rails 7 kind of just implemented its own version of that. So it is interesting to see how those things come about. I'm curious if anybody sees your gem and says, we can probably rework this, you know? Do you get anybody reaching out to you like that? Or is it more that you just notice something's new?

Jordan Hollinger:

Yeah, I haven't had anyone from the Rails team reach out to me. But I have noticed things. I started this back in 2017, and I think the first big thing was that this will fix your N+1 query problem, because it doesn't support lazy loading or dynamic loading of associations. You have to be explicit up front and say, I'm going to load these ones and no more. And a couple of years after that, Rails added .strict_loading, which does that in Rails. It's opt-in, but it's there now, and it wasn't there before. I expect everyone's hitting these problems that I had. I made a standalone thing, and over time people are trying to fix those same things in Rails. I suspect it's a case of everybody wanting the same things and just getting there in different ways.

Valentino Stoll:

Yeah, how to get it in.

Jordan Hollinger:

Right, and it's necessarily slower in a project as big as Rails than starting something up fresh. Yeah.

Valentino Stoll:

Yeah, I mean, in a way, there's a reason why the Sequel gem is separate, right?

Jordan Hollinger:

Mm-hmm.

Valentino Stoll:

For the same reasons. I definitely use it in a lot of side projects because of its support for many different databases at once, which I know Active Record has as well. But it's much easier to get set up with Sequel, I think, outside of the Rails context. I mean, this is really cool. There are so many features in here. Was it a very incremental process?

Jordan Hollinger:

Oh, yeah. It was at first all about speed and killing off that N+1 problem. That was my only goal. Then I realized, oh, I can do some fancy stuff with eager loading, too. And I can do even fancier stuff with eager loading. That went on for a long time. And then, oh, I can add cursors. So yeah, it's been very iterative: I hit things in what I'm working on where, oh, I need a way to do this, Rails doesn't support it, I know what to do.

Valentino Stoll:

So I'm curious, and I don't know if it's in here or not, but have you considered elaborating on the EXPLAINs and the introspection queries that you get, to make a better EXPLAIN?

Jordan Hollinger:

I haven't, no. That's an interesting idea, but no, I haven't thought of that.

Valentino Stoll:

That's one thing: I'm always looking at EXPLAINs, trying to make sense of them. I feel like there's a website that explains EXPLAINs, and I always copy and paste in there. I forget the name of it now.

Jordan Hollinger:

Oh yeah, I'm happy to accept pull requests if you want to contribute that.

Valentino Stoll:

But yeah, I mean, you seem to have a pretty good foundation here for, you know, expanding on things. Do you have any like next steps you're planning to tackle? Or is this kind of like a feature complete for what your use cases are?

Jordan Hollinger:

Yeah, the last big thing added was cursors and I had a few people ask for streaming support in Postgres. I played with that for a while. It's not really on the roadmap right now, but that could be the next thing. But yeah, after that, I don't have anything, don't have any big plans for it at the moment. It's nominally done. But yeah, that can always change.

Valentino Stoll:

Streaming in Ruby is still hard.

Jordan Hollinger:

It is. I have another gem for that, actually, but it's for streaming JSON specifically.

Valentino Stoll:

What's that called?

Jordan Hollinger:

It's called JSON Emitter.

Valentino Stoll:

JSON Emitter.

Jordan Hollinger:

Yeah. When you don't want to generate 20 megabytes of JSON in memory and then spit it out at the client, you can generate it as you stream it.

Valentino Stoll:

Oh, you can chunk it up. That's very cool. I'm going to check this out. For those at home, we'll put it in the show notes: the json-emitter gem. Very cool. I'm curious what you use for that internally, as far as process structure.

Jordan Hollinger:

Got to say, I think I wrote that in a weekend, like three years ago. So yeah, I kind of forget. But yeah, it uses the, what's that gem called, that it'll detect the fastest JSON parser you have on your system and use that.

Valentino Stoll:

Oh, okay. Yeah, I forget what that's called too.

Jordan Hollinger:

Yeah, it's got a little internal buffer. It'll generate so many kilobytes at once and then emit it out to whatever's listening, be that Rack or something else.
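The buffer-and-emit idea can be sketched with Ruby's standard library alone. This is not the json-emitter gem's code or API; `stream_json_array` and its `buffer_bytes` parameter are hypothetical names for illustration, and the sketch assumes a flat JSON array whose elements can each be serialized independently.

```ruby
require "json"
require "stringio"

# Hypothetical sketch of buffered JSON array streaming.
# Each element is serialized as it is produced, and the buffer is flushed to
# `io` once it reaches `buffer_bytes`, so the full document is never built
# in memory at once. Returns the number of writes made to `io`.
def stream_json_array(enum, io, buffer_bytes: 16 * 1024)
  buffer = +"["
  writes = 0
  first = true
  enum.each do |item|
    buffer << "," unless first
    first = false
    buffer << JSON.generate(item)
    next if buffer.bytesize < buffer_bytes

    io.write(buffer)
    buffer.clear
    writes += 1
  end
  buffer << "]"
  io.write(buffer)
  writes + 1
end

# A lazy enumerator stands in for rows arriving from a database in batches.
rows = (1..1_000).lazy.map { |i| { id: i, name: "user#{i}" } }
out = StringIO.new
writes = stream_json_array(rows, out, buffer_bytes: 1024)
puts "wrote #{out.string.bytesize} bytes in #{writes} writes"
```

Any object that responds to `write` would work in place of the `StringIO`, such as a file or a socket.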

Valentino Stoll:

Oh, that's very cool. Directly to the file or IO object. Nice. Yeah, AI has like made everybody start looking at streaming again, right?

Jordan Hollinger:

Yeah.

Valentino Stoll:

Which I know the Rails core team is probably not too happy about, because I'm sure it's not a focus that they were planning for.

Jordan Hollinger:

Probably not, probably not.

Valentino Stoll:

I mean, this JSON emitter is really neat. That's going to be super handy. I'm always looking for ways to make the streaming process easier.

Jordan Hollinger:

Well, while we're talking about gems, I'll just throw out another gem I maintain. It's called otr-activerecord. It stands for "off the rails."

Valentino Stoll:

Off the rails.

Jordan Hollinger:

It's a helper for when you want to use Active Record, but not Rails. It's possible to do on your own, but it's not documented and it's awkward.

Valentino Stoll:

I always wondered if this was possible. This is neat. How is it possible?

Jordan Hollinger:

It is, yeah. There's surprisingly little code in there, but you just have to initialize certain things and configure the database connections. You have to add a couple of your own rake tasks if you want to create new migration files. But a lot of the stuff, like running migrations, is built in, and it can just reuse all that. It's really just a wrapper to initialize the Active Record gem, and then you use it however you want. Once I just needed it in a rake script, so I used that.

Valentino Stoll:

Really cool. Have you tested it with multiple database connections?

Jordan Hollinger:

I had some pull requests to do that, and I've tested it. It works. I have not used it personally in production; I've just tested on my machine that it does seem to work.

Valentino Stoll:

That's very cool. It's possible, folks.

Jordan Hollinger:

It's possible.

Valentino Stoll:

Off the rails. Are there other off-the-rails projects out there? I mean, that's such a catchy name.

Jordan Hollinger:

Not when I wrote it. I haven't looked for a while, though.

Valentino Stoll:

Oh, that's super fun. All right, so back to Occams Record. I couldn't help but notice how close the name Occams Record is to Occam's razor.

Jordan Hollinger:

Mhm.

Valentino Stoll:

I imagine that is related directly.

Jordan Hollinger:

It is, yeah. People quote Occam's razor, especially in movies and TV shows. They say it's something like, the simplest answer is usually the best one.

Valentino Stoll:

Right.

Jordan Hollinger:

But I actually went and found the quote from William of Ockham, from back in the 1500s or something. What he actually said was, "Do not multiply entities beyond necessity." And what I came to learn is that the Rails internals, because they're optimized for reading and writing records, have a lot of entities in there: a lot of objects, a lot of initialization. And most of the workloads in our web apps are read-heavy. They're 80 or 90% reads. When we load thousands of Active Record objects, there's a lot of CPU and memory work happening there that's just wasted. It's never used, because we're not writing anything back. We're not making changes. We're not running the Active Record callback chain. So a lot of the time, I thought, oh, those are entities that have been multiplied beyond necessity. Occams Record was a way to try to fix that.

Valentino Stoll:

Yeah, I often find myself adding selects everywhere, just to slim down the memory footprint of the record, mostly because you don't need half of it, right?

Jordan Hollinger:

Right, yeah. DBAs love when you can do that. But you can't do it on eager loads, for instance.

Valentino Stoll:

Yeah, I mean, I'm a little surprised, because Rails gives you that full end-to-end footprint of what's being used, right? I'm a little surprised that there isn't more data or statistics that you can gather to identify these areas once you get to a certain point. I know there's the Bullet gem and things like that, which help with N+1s. But I mean more for finding the areas where you can reduce the record size, the object size that you get back. I feel like that's something I'm always looking to do retrospectively, after you've gotten it out, right? Because whether you're in a view or however you're using it, it's always in the same context of the framework, I always thought there's something missing there, where it could give you a little signal, or just automatically select things based on the usage in subsequent requests. But I digress. So how are you handling the N+1 aspects? I know you mentioned briefly that you've solved some common N+1 problems. What are those that you've solved with Occams Record?

Jordan Hollinger:

So let's go back to our users and orders example. Let's say, yeah, users, orders, and then the items, the products. In ActiveRecord, unless you use the recently added strict loading option, if you don't eager load that final products leaf node, every time you loop through and say user.orders.products: new query, new query, new query. And that just magically happens because it's an ActiveRecord record, and it has all that knowledge. It's like, "I'm going to try to be helpful and select this stuff for you. I don't know I'm in a loop." So with Occams, the results are not ActiveRecord class objects. They're not model instances. They're basically structs. And if you don't, say, eager load the products on the orders, there's just going to be a NoMethodError when you call order.products.

Valentino Stoll:

Yeah, that makes sense. I like that better than how Rails handles it.

Jordan Hollinger:

Right. And I try to throw friendly error messages when I can. It intercepts that and tells you, oh, here's the table that was pulled from, so you can find it more easily in your massive nested list of eager loads. And same kind of thing for columns, too. Say you select just the three columns you need, and you call a fourth column you didn't load. Instead of just saying no method, it'll tell you, oh, that looks like a column on the model, but you didn't select it in your statement. So maybe go do that.
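
[Editor's sketch] The behavior Jordan describes (a read-only row that raises a helpful error when you touch a column you never selected) can be illustrated in plain Ruby. This is not the gem's implementation; `ResultRow` and `MissingColumnError` are hypothetical names made up for this example, and the data is invented.

```ruby
# Illustrative only: a plain-Ruby stand-in for a read-only result row.
# None of these names come from occams-record itself.
class MissingColumnError < NoMethodError; end

class ResultRow
  def initialize(table, known_columns, values)
    @table = table
    @known_columns = known_columns # every column that exists on the table
    @values = values               # only the columns that were selected
  end

  def method_missing(name, *args)
    # Selected columns are readable as plain methods.
    return @values[name] if @values.key?(name)

    # A real column that was not selected gets a friendlier error.
    if @known_columns.include?(name)
      raise MissingColumnError,
            "'#{name}' is a column on '#{@table}' but was not in your SELECT"
    end

    # Anything else falls through to the ordinary NoMethodError.
    super
  end

  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

row = ResultRow.new("orders", %i[id total status notes], { id: 1, total: 42 })
puts row.total

begin
  row.notes
rescue MissingColumnError => e
  puts e.message
end
```

The point of the design is that a missing eager load or column fails loudly at the call site instead of silently issuing another query.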

Valentino Stoll:

Oh cool. Yeah, I'm guilty of doing that.

Jordan Hollinger:

Yeah, yeah.

Valentino Stoll:

I mean, you know, so many of these things are just common. It addresses so many common pitfalls of Active Record, right? As you start to get more complicated queries.

Jordan Hollinger:

Yeah, and almost necessarily so. I don't think someone could have designed something like this, or something better than this, just from scratch, though I think there could be better things. We all had to use ActiveRecord for a decade and realize, oh, here are all the pitfalls. And now it's hard to go back and fix all that because it would be breaking.

Valentino Stoll:

Yeah, and I mean, there are even some things you can do with the Arel table, right?

Jordan Hollinger:

Mm-hmm.

Valentino Stoll:

There are ways to do some of these things. It just doesn't look that great.

Jordan Hollinger:

Right.

Valentino Stoll:

And it definitely doesn't cover all cases. I feel like, because it's hard to look at, it also causes problems that you aren't thinking about. I know I've done that by accident and then been like, oh, I was missing a, you know, I don't know. Could be anything. But yeah, I've always found that just doing a custom SQL statement is so much clearer and more obvious than having to travel through all of the Active Record-specific syntax, right? It gets to be cumbersome.

Jordan Hollinger:

Yeah, yeah, it's great up to a point, and then it's too much. You gotta remember, what is this doing?

Valentino Stoll:

Right.

Jordan Hollinger:

Yeah, it's easier just to read the SQL sometimes.

Valentino Stoll:

So I was seeing in here that you have a way to include custom modules that get used in the eager loading process, which is really cool. Can you walk through what that is and how you're using it?

Jordan Hollinger:

Yeah, so with our users example, let's say you have a column for title, the prefix, first name, middle name, last name. You probably have a method in your model called just name, where you concatenate all that together. And it would be nice to have that in an Occam's record result, too. So you can abstract that name method from your model, from a user model, out into a module. Include that in your user model. And then when you're running an Occam's record query, say, oh, use this module on the user results so that I have that name method available.
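
[Editor's sketch] The idea of pulling a `name` method out into a module and sharing it between the model and a lightweight result can be shown in plain Ruby. Here `User` is a hypothetical stand-in for an ActiveRecord model and `UserRow` a stand-in for a struct-like query result; neither is the gem's own class, and the gem's README is the place to check how a module is actually attached to a query.

```ruby
# Shared behavior that only depends on four readers being available.
module FullName
  def name
    [title, first_name, middle_name, last_name].compact.join(" ")
  end
end

# Hypothetical stand-in for the full ActiveRecord model.
class User
  include FullName
  attr_reader :title, :first_name, :middle_name, :last_name

  def initialize(title:, first_name:, last_name:, middle_name: nil)
    @title = title
    @first_name = first_name
    @middle_name = middle_name
    @last_name = last_name
  end
end

# Hypothetical stand-in for a lightweight, struct-like query result.
UserRow = Struct.new(:title, :first_name, :middle_name, :last_name) do
  include FullName
end

model = User.new(title: "Dr.", first_name: "Ada", last_name: "Lovelace")
row = UserRow.new("Dr.", "Ada", nil, "Lovelace")

puts model.name
puts row.name
```

Because the module only relies on the column readers, the same method works on both objects without duplicating the logic.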

Valentino Stoll:

Oh, that's really cool. So you don't pollute the model with that customization. It stays on the record aspect of it.

Jordan Hollinger:

Right, right, or in some cases, I've seen people just add a module inline in the Active Record model and then include it right below or something. So you still have that context.

Valentino Stoll:

Yeah, I've done that a lot with concerns, right? To do the concerning blocks, and then you have all of your specific chunks of stuff, record related.

Jordan Hollinger:

Right.

Valentino Stoll:

It definitely makes it easier to manage those specific things, right? If you have a huge order history and you only want all the financial stuff in one place, it's easy to manage tax stuff, or have all that stuff together.

Jordan Hollinger:

Yeah, that was kind of a nod to the real world. Like, my idea was only structs, no methods. But if you're replacing a giant query with this to make it more efficient, you really kind of need that capability in there. And more recently, I added, I think it's down the page, something called ActiveRecord fallback. So let's say you're looping through some kind of attachments model, attached to a message or something, and you're using CarrierWave or Paperclip to handle all your upload and file fetching needs. That's pretty hard to extract into a module. And so you can say, oh, on this eager load, if I call a method that doesn't exist, please initialize a Rails model for me and call that method on it. It takes away a little bit of the speed because you are loading an Active Record model, but sometimes that's the only practical way to load your file attachments if you're using Occams Record.
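
[Editor's sketch] The fallback idea (build the heavyweight model only when a method is missing from the lightweight row) can be illustrated in plain Ruby with `method_missing`. `Attachment` and `FallbackRow` are hypothetical names for this example, and the instance counter exists only to make the lazy instantiation visible; this is not how the gem itself is written.

```ruby
# Hypothetical stand-in for a model whose uploader logic is hard to extract.
class Attachment
  @instantiated = 0
  class << self
    attr_accessor :instantiated
  end

  def initialize(attrs)
    self.class.instantiated += 1
    @attrs = attrs
  end

  def url
    "/uploads/#{@attrs[:id]}/#{@attrs[:filename]}"
  end
end

# Lightweight row that only builds the model when it has to.
class FallbackRow
  def initialize(model_class, values)
    @model_class = model_class
    @values = values
    @model = nil
  end

  def method_missing(name, *args, &block)
    # Plain column reads never touch the model.
    return @values[name] if @values.key?(name)
    # Unknown everywhere: behave like a normal missing method.
    return super unless @model_class.method_defined?(name)

    # Build the model once, on first use, then delegate to it.
    @model ||= @model_class.new(@values)
    @model.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || @model_class.method_defined?(name) || super
  end
end

row = FallbackRow.new(Attachment, { id: 7, filename: "a.png" })
puts row.filename            # served from the row itself
puts row.url                 # triggers one model instantiation
puts Attachment.instantiated
```

The trade-off matches what Jordan describes: rows that never call a model-only method stay cheap, and the ones that do pay for one model instance.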

Valentino Stoll:

Yeah, that makes sense.

Jordan Hollinger:

Another nod to the needs of reality.

Valentino Stoll:

So one thing I was wondering about while reading through this: do you use Occams Record for handling custom selects that calculate things, or aggregate functions, or things like that? Or is this meant more for the eager loading aspect and optimal batching?

Jordan Hollinger:

You definitely can use it for that. Like you're saying, just write your own custom SQL full stop. Yeah, there's a method on there where you can pass in a SQL string. You can pass in bind parameters to it so they're safely escaped. Yeah, and you can use it to run raw SQL and get back tuples. And you can also do eager loading against that raw SQL if you want to as well.

Valentino Stoll:

Cool. Do you provide any mechanism for like referencing those custom columns or pieces of the custom SQL? Or is it more of just keeping it in the tuples form?

Jordan Hollinger:

Yeah, it's really just in tuple form. You know, you do your SUM ... AS and give it a name, and just that field will be on the result. That's all there is to it.

Valentino Stoll:

Yeah. Cool.

Jordan Hollinger:

Speaking of.

Valentino Stoll:

I see your big disclaimer here. Results are read only. I love that. Ha ha ha ha ha.

Jordan Hollinger:

Yeah, I thought that was important to throw that up somewhere near the top so nobody's surprised.

Valentino Stoll:

Sorry, what were you gonna say?

Jordan Hollinger:

Oh, since we were talking about raw SQL, there is an option where you can eager load a SQL string statement as if it were an Active Record association, even though it's not. So like I've had to in the past: let's say it's a has-many-through, but you don't have that set up. So you need to write that SQL yourself. And it can actually be more efficient than saying, oh, I need to load these six nested associations just to get to the leaf one. If you're good at SQL, you can write that yourself and just get a single select. And that can be a whole lot faster than loading a bunch of intermediate records that are just joining and slowing things down.
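
[Editor's sketch] The "single select instead of six nested loads" idea comes down to fetching the leaf rows once, tagged with the parent key, and attaching them in memory. The plain-Ruby sketch below uses invented arrays in place of query results; the hash keys and data are assumptions, and it does not show the gem's own API for this.

```ruby
# Parent rows, as a first query might return them (made-up data).
users = [
  { id: 1, name: "Ann" },
  { id: 2, name: "Bob" },
  { id: 3, name: "Cy" }
]

# Rows from ONE hand-written SELECT that joins straight through to the leaf
# table and tags each product with its owning user_id (made-up data).
product_rows = [
  { user_id: 1, product_id: 10, title: "Lamp" },
  { user_id: 1, product_id: 11, title: "Desk" },
  { user_id: 2, product_id: 10, title: "Lamp" }
]

# Index the leaf rows by parent key once, then attach them in memory.
products_by_user = product_rows.group_by { |row| row[:user_id] }

results = users.map do |user|
  user.merge(products: products_by_user.fetch(user[:id], []))
end

results.each do |user|
  titles = user[:products].map { |product| product[:title] }.join(", ")
  puts "#{user[:name]}: #{titles}"
end
```

No intermediate order or line-item objects are ever built; only the parents and the leaf rows exist in memory.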

Valentino Stoll:

Yeah, that's really cool.

Jordan Hollinger:

Though I'm not sure anyone but me has ever used that. I've had people tell me, I don't get this. I'm like, I know it's complicated to look at. I couldn't find a simpler way to express it.

Valentino Stoll:

It's hard. I mean, when you want these advanced edge cases, or I guess they're not edge cases, but these advanced concepts, it's going to be difficult to get anything good out of it.

Jordan Hollinger:

Yeah.

Valentino Stoll:

I think you did a good job here, at least consolidating a lot of the common issues, specifically around eager loading, right, and getting rid of a lot of those unnecessary nested selects. So I see you have an entire section here on benchmarking.

Jordan Hollinger:

Yeah!

Valentino Stoll:

Do you wanna explain maybe what you've focused on benchmarking and kind of like the significant improvements here that I'm seeing number-wise?

Jordan Hollinger:

Yeah, I'll say I've not run these benchmarks on Rails 7 yet. And maybe not even on 6.1. I know Rails and Active Record have tried to make a lot of speed improvements. So most of these are against, I think, Rails 5 or 6.0. But yeah, on average, I was able to see a 3x speedup. And the benchmarking code is in the repo, if anybody wants it. I need to run it again, because it's probably 2.5x now instead of 3x. But yeah, I wanted to measure: is this actually better? And after any significant change, I would run it to see, did I just kill the performance by doing this, or is it still OK?

Valentino Stoll:

Yeah, I want to call out that the memory test has some pretty significant improvements, which I think says a lot. It's probably what gives the speed the extra boost as well.

Jordan Hollinger:

I'd say, yeah, that's gotta be.

Valentino Stoll:

Especially with these has-many aspects, those are the most impressive, you know, a thousand percent improvement. Yeah, I'd be curious to see what that looks like against Rails 7, Active Record 7 or something like that.

Jordan Hollinger:

Yeah, I'll be honest, I've been a little afraid. I'm like, oh no, what if it's now just like a 1.2x improvement?

Valentino Stoll:

I mean, with the cursors, you're definitely going to be up there still, right? So I wouldn't worry too much about it. So when you make adjustments, where do you look for a lot of the performance testing? What's your process for that?

Jordan Hollinger:

Um, yeah, mostly it doesn't touch on all the eager loading stuff. It's just: run the query over these large tables. What does the memory look like before and after? What is the speed of this versus ActiveRecord? It's fairly simple and straightforward, potentially even a bit naive.

Valentino Stoll:

Do you have any plans to add any other adapters to it? Or are you just sticking with Postgres?

Jordan Hollinger:

Well, it technically supports anything that ActiveRecord supports. So Postgres, MySQL, SQLite, and yeah, a couple other databases. The only thing that is database specific, I think, is the cursor support.

Valentino Stoll:

The cursor support.

Jordan Hollinger:

Yeah.

Valentino Stoll:

Well, that's cool. So how do you stay up to date with Rails? That's my life. You know, as you see things come out in Rails, do you go look at them right away and see if there are things you can start dropping off? Do you drop things off if you find them? What is your approach to that?

Jordan Hollinger:

Yeah, that hasn't happened yet. I know Rails added the strict loading option, which is their solution to N+1, which is great. But I don't feel like I would want to drop that from this. That wouldn't make sense. So far, the things Rails has added haven't really caused me to want to drop anything from this. There are still gaps between them. I don't foresee Rails ever adding everything that's in this and making it unnecessary.

If all you're concerned about is N+1, you don't need this anymore as of, I think, Rails 6.1 or something. But if you want some of the other goodies, the eager loading features, a number of other things, cursors, then for the moment you would still need this. And yeah, it sounds like cursors might be next on that. If you only want cursor support, maybe you can use ActiveRecord soon. But I don't think it's there yet.

Valentino Stoll:

No, I don't think it is either. But I know there's a lot of people working on it. So hopefully we'll see. Thank you to all those that are working on it. We appreciate you.

Jordan Hollinger:

Yes.

Valentino Stoll:

So this is great. I'm curious if you're using this in collaboration with other gems and finding even more optimization.

Jordan Hollinger:

Not really. Yeah, not yet, at least I'll say.

Valentino Stoll:

Do you find yourself using something like Bullet, as an example, with Occams Record, just as a way to help you identify things within Occams Record?

Jordan Hollinger:

Yeah, I've definitely used Bullet. I don't think Bullet would be aware of what's going on in Occams. Trying to remember how Bullet works. I think it's only active in tests. Is that right? Does that sound right?

Valentino Stoll:

I think they have a development toggle now where you can have it on in development too.

Jordan Hollinger:

OK. So yeah, I definitely use Bullet in apps. That's a great way. I think every app should probably use Bullet, honestly, because you can still fix those problems with just ActiveRecord. Yeah, you don't need to use this just to fix a few N+1 problems in your app, for sure. Bullet is great for finding those so you can fix them manually.

Valentino Stoll:

Well, this is really awesome. Is there anything else you wanted to call out on Active Record or any of your lovely gems here?

Jordan Hollinger:

Uh, no, I kind of hate self-promotion.

Valentino Stoll:

Hey, promote away.

Jordan Hollinger:

But yeah, no, nothing else. I'm really excited that the Rails core team is addressing some of these issues, even though, yeah, it takes a long time. And I don't know. I think if there's one thing that they could pull over that they probably don't intend to, it's the eager loading stuff. I think Rails having an alternate eager loading API would be a great thing. And maybe someday I'll try to start a discussion or open an issue and say, hey, what would syntax like this, with these capabilities, look like? That might be worth doing. But yeah.

Valentino Stoll:

Yeah, it reminds me of Aaron Patterson's AdequateRecord.

Jordan Hollinger:

Yeah.

Valentino Stoll:

Oh my gosh, that was great. For those that aren't familiar with AdequateRecord, it was like an alternate Active Record implementation. Aaron Patterson had significant improvements in various parts that he just called adequate. And eventually pieces of it did end up in Rails, if not all of it. I don't remember how that panned out, but I thought it was pretty amazing.

Jordan Hollinger:

Yeah, I'm pretty sure some of that did get folded in.

Valentino Stoll:

Yeah.

Jordan Hollinger:

Yeah.

Valentino Stoll:

But I appreciate you taking the time to talk today, and I'm definitely looking forward to playing around with Occams Record. Lots of cool stuff in here. Yeah, and to be honest, your JSON streamer, JSON emitter, looks really cool. I'm gonna be playing with that. OK, if there's nothing else you wanna talk about, let's move into picks. What do you got for us today?

Jordan Hollinger:

You have to cut this part out because man, I've got... I forgot to think of anything.

Valentino Stoll:

No worries, I can go if you want to think about it.

Jordan Hollinger:

Yeah.

Valentino Stoll:

I've been working on an AI project recently. I'm doing more and more of it. I hate to even use the term AI, but it's so popular at this point that I have no choice. Otherwise people just aren't gonna know what I'm talking about.

Jordan Hollinger:

Mm-hmm.

Valentino Stoll:

But so I've been working on trying to guide these bots to give you the responses that you want. And I found this really great write-up that is pretty much a how-to on using embeddings: what they are, and different ways you could format them in order to get the best search results and to help guide completions for a lot of these large language models. It's been super helpful in helping me understand it, but also in finding the best ways to do things without having to trial-and-error your way to success. So I'll recommend that. Highly recommend checking that out.

Jordan Hollinger:

I'm going to go with just a little side project that I've been following. Julia Evans has a lot of great guides. And I think, did you say zines or zines? I've never heard that word pronounced.

Valentino Stoll:

I think it's Zines. I don't, honestly, I don't know. Let's go with Zines.

Jordan Hollinger:

Zines, OK. Yeah, let's go with zines. I've been following how to implement your own DNS resolver. This guide is in Python. Julia Evans does a lot of these basic infrastructure guides: how does this work, and here's how to do it on your own. And I've been going through a lot of those lately. Right now it's a DNS resolver. And yeah, I think understanding those lower levels, you don't need it for your day job, but I think it will make you a better programmer and more aware of what's happening underneath all the layers that we work at.

Valentino Stoll:

Yeah, that's super cool. I love playing with that DNS playground that she has. Julia Evans has some awesome stuff. I'm always on the lookout. I have her Linux toolbox behind me and I'm always referencing it. So many great gems in there. So huge plus one there. All right, well, thanks again for coming on, Jordan. It was awesome to talk to you and learn all about how to make your, you know, Rails application record, Active Record, whatever we're gonna call it these days. It keeps changing names, I feel like.

Jordan Hollinger:

Yeah, well, yeah, thanks for having me. It's been really fun.

Valentino Stoll:

Yeah, awesome. Well, until next time, folks. Valentino out.
