Rails at Super Scale with Kyle d'Oliveira - RUBY 667

Kyle d'Oliveira (Clio) shares his survival tips for dealing with tens of thousands of commits, massive migrations, and the very limits of databases. We discuss the lessons learned from Rails megaprojects, how to use these tips in your own projects to reduce technical debt, and tools to keep your monolith majestic when the code won't stop coming.

Special Guest: Kyle d'Oliveira

Show Notes



Transcript


Hi, everyone. Welcome to another episode of Ruby Rogues. I'm David Kimura. And today on our panel, we have Matt Smith. Hello.

Luke Stutters. Hi. And we have a special guest, Kyle d'Oliveira. Did I say that right? That's d'Oliveira.

D'Oliveira. Yeah. Gotcha. So, Kyle, would you mind telling us a bit about who you are, who you work for, and some of the things that you're doing? Sure.

My name is Kyle. I've been working for a company named Clio. That's legal practice management SaaS software. It's based out of Vancouver, Canada. It makes practice management software aimed at lawyers.

We're looking at transforming the legal space. Our mission is to transform the practice of law for good. There's a nice little double entendre there. And it's been really interesting seeing some of those changes in legal that we've kind of made an impact with over the last few years. I've been working on Ruby on Rails for the better part of the last decade.

But when I started working on Rails, it was Rails version 0, and I've been upgrading Rails ever since. And so now, finally, up to Rails 6. And so touching all of the major versions. My major focus at Clio, which I've been at now for 8 years, has been on the back end infrastructure side of things. So the main focus is scalability for the code base, but also in terms of the organization.

Like, what happens when we have 200 developers working on it? What happens when the dataset sizes grow to the point where we can exhaust regular integers and we need to actually go to, like, bigints? We look at approachability. How easy can we just take a new developer and dump them into the code base and have them up and running? Because as things go to scale, there are obviously new patterns that need to be adhered to that, you know, we don't necessarily need to focus on with small projects, but we do need to focus on for large projects.
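As a rough illustration of that integer ceiling: a signed 32-bit id column runs out just past two billion rows, which is why large tables get flipped to bigints. The migration shown in the comment is hypothetical; the table name and options are made up for illustration.

```ruby
# A signed 32-bit integer column (the historical default for primary
# keys) tops out just above two billion rows.
INT32_MAX = 2**31 - 1 # 2,147,483,647
INT64_MAX = 2**63 - 1 # a bigint column raises the ceiling dramatically

puts "int: #{INT32_MAX}, bigint: #{INT64_MAX}"

# A hypothetical Rails migration to widen the ids ahead of exhaustion
# (illustrative only; on a large table this must run as an online
# schema change, not a blocking ALTER TABLE):
#
#   class WidenActivityIds < ActiveRecord::Migration[6.0]
#     def up
#       change_column :activities, :id, :bigint
#     end
#   end
```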

And my team has focused a lot on making the effort and experience for all of the developers easy and fast. Yeah. Absolutely. One thing that kinda rings true is you always have to think about scalability when you're developing, but don't actually write for scalability when you're developing. So keep it in the back of your head, saying, is this gonna come back and bite me later?

Or is it, you know, really a nonissue? I remember one time, I had a situation where I was storing just 3 kilobytes of data in a database. And I thought, okay. This is gonna get used a little bit. They were images.

So you can kinda see where this is going. I'm like, you know, that's not a big deal. It's only 3 kilobytes. But, unexpectedly, the consumers loved the feature that it was supporting. And now that single table is over 30 gigabytes, and it has millions upon millions of records.

I'm like, oh, that was unexpected, but I guess that's kinda where I did not think of scale at the time in the proper way. So introducing that kinda technical debt kinda painted us into a corner, because now transitioning away from that model is gonna be a pain when you're dealing with that much data. Yeah. Absolutely. It's hard to know what you don't know.

And so if you don't think about the scale at that point in time, it's hard to know what problems you're even going to run into. So you gave a talk last year about Death by a Thousand Commits. Could you give us a high level overview of that talk and kinda some of the things it entails? Yeah. So working at Clio, the code base is quite large.

We have tens of thousands of commits that we go through, and it's really easy to see patterns of developers working on features. The features go live. And at some point in the next 6 months, a year, those features come back to bite us. So, like, the first commit is great, by the 10th commit you're starting to notice some things, and by the 100th, there are maybe some problems. And by the 1,000th commit on it, right, you've stopped, because now you have to completely refactor and rebuild a lot of this technical debt that you introduced.

So my talk was talking about some of the lessons that we've learned. And although the lessons are very specific to specific problems, there's kind of a generalized idea of what approaches you can take to dealing with technical debt in your own projects. If you're able to, for instance, automate technical debt away entirely, well, then there's a whole classification of problems you no longer need to think about, and you can feel confident that those are just automatically protected. And if you are cleaning up after yourself as you go and making it easier when there are curveballs being thrown at you, you know, fixing technical debt and dealing with it when you hit scale doesn't have to stop you entirely. It just becomes a constant, small tax that you pay.

But if you invest in the tools, you can actually start moving faster even as you scale. Right. And so would you mind also explaining what technical debt is? What would you consider technical debt, and what are some things that you would maybe not consider technical debt? Kinda like busting some myths about technical debt.

I would say technical debt is, like, the accumulation of decisions that are made while coding that you eventually need to correct in the future. And as developers, I think we're always making these decisions. Can we cut a corner here to deliver a feature a little bit early? And technical debt isn't bad. I think when you are willing to get something in front of the users and deliver value earlier by incurring a little bit of this technical debt that you then have to clean up, I think that's totally okay.

But I think technical debt often comes in the situation of developers making a decision that a framework needs to be super generic, and it gets a little bit speculative. And then they come to implement something in the future that's just really difficult to deal with, because it's so generic and hard to understand that new developers have to unpack it and wind it back just to implement something new in it. Some things that I think are not necessarily technical debt can kinda come from decisions that actually made sense at the time and aren't necessarily cutting a corner. So, I mean, it might make sense to build a system that is very generic. And maybe that is the correct choice, and you build it through, and then things change. And when things change, that's when you might have, like, the technical debt come back.

But until things change, it actually might not be. I think that's a bit of a generic answer, but it's hard to pin down a concept like technical debt because almost everything we write is debt of some form. Mhmm. Yeah. I definitely have to agree with that.

So what are some of the real world examples that you guys have experienced over your years where, at the time, you made a decision and you or the team thought, like, this was a great choice, this is the right way to do it, but then later, you found that it became more troublesome or more of a headache than it was worth? One of the things that popped up is actually something that, you know, we decided on because the Rails community pushes for it, and this is what comes out of the box. So if you think about Rails migrations, if you think about how they're often applied, if you think about some examples that you've worked on, there are often times where you use a tool like Capistrano, which deploys some code, and as part of the deploy, database migrations get run.

And for most projects, that's fine. For most small things, like, the migration that runs is fast, and it's not a problem. So this is an example of a decision where we were kind of like, let's just inherit what the community uses. But as we started scaling out, we started encountering problems with it. So, for instance, a table that, if you ran a migration on it, took 30 minutes.

This means that our deployment took 30 minutes. It also timed out, so we lost all of the context of it. But also during this period of time, the table locked. So any developer or any queries that started going to that table stopped being answered. So all of our servers shut down.

And we couldn't kill the alter table because it was already mid progress. And after it finished, we now had a table in, like, a new state, but the code hadn't actually finished deploying. So now we're running into different problems. So this is a bit of a decision that makes a lot of sense when you're small. Like, go really quick because you can, and it makes sense.

But when you hit a certain piece of scale, you can no longer run with those assumptions, and you need to change them. So a new process needs to be built. And for database migrations, we need to build them in a way that is, like, entirely asynchronous to the deployment process. Thirty minutes? That's quite a migration. Yeah.

I think this table that we use stores a little bit of all of the activity that users do. And it was, like, the first table we ran into that exhausted, like, 32 bit integers, and we needed to flip the IDs to be bigints. We didn't think that would be a problem either. And it's leaps and bounds bigger than any of the other tables we have in our system. I'm gonna ask the obvious question now, which is, how do you make your system capable of asynchronous table migrations?

That's a good question, and there's actually a lot of tools that exist that we don't necessarily need to build ourselves. GitHub has a tool called gh-ost. There's another tool by Percona. It's in the Percona Toolkit. I can't remember.

It's, like, maybe online schema replacement. Can't remember the exact name. But the general strategy is, instead of changing a table with, like, an alter table, you actually create a brand new table and populate that table with various mechanisms. Some of them use triggers. Some of them use the binary logs.

Get the new table in sync, and then do quick renames. And so you rename the old table to be old, you change the new table to be the live one, and then new queries start flowing into this new table. And you can do this as long as you want. It's entirely non blocking, but it has to be in a process that exists entirely outside of, like, the deployment stack.
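The copy-and-rename strategy described here can be sketched as the sequence of statements such a tool emits. This is a simplified illustration, not a real online schema change tool; the table name and ALTER clause are hypothetical, and real tools also install triggers or tail the binlog to keep the shadow table in sync during the backfill.

```ruby
# A minimal sketch of the statement sequence that tools like gh-ost or
# Percona's online schema change build around. Names are hypothetical.
def online_alter_steps(table, alter_clause)
  shadow = "_#{table}_new"
  old    = "_#{table}_old"
  [
    # 1. Create an empty shadow table with the new schema.
    "CREATE TABLE #{shadow} LIKE #{table}",
    "ALTER TABLE #{shadow} #{alter_clause}",
    # 2. Backfill in small batches while triggers (or the binlog)
    #    keep the shadow table in sync with ongoing writes.
    "INSERT INTO #{shadow} SELECT * FROM #{table} /* batched */",
    # 3. Swap the tables in one atomic rename; new queries now hit
    #    the altered table, and no long lock is ever taken.
    "RENAME TABLE #{table} TO #{old}, #{shadow} TO #{table}"
  ]
end

puts online_alter_steps("activities", "MODIFY id BIGINT AUTO_INCREMENT")
```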

Yeah. And that could have its own issues if you have, you know, thousands of requests per second coming in. So, yeah, definitely not a fun problem to solve. And it's also, I guess, good to know what kind of migration or, really, what kind of SQL operations will cause a table lock. So adding an index or adding a column and stuff can lock your table.

So being aware of what actually is going to lock the table is really good information to know. Some of them seem obvious. Like, I think if you're dropping a column or adding a column, that could potentially lock. But some of them are not. Like, if you changed a VARCHAR from, like, a VARCHAR(100) to a VARCHAR(200), you're just increasing it, does that lock?

Maybe. I actually don't know off the top of my head. What if you change the character set? What if you change the collation? I don't know.

Is this on MySQL or Postgres? We use Percona, which is just an offshoot of MySQL. So it'll also be different between databases. So Percona might have different decisions. Shout out to the Percona guys.

I've done some work in a place where we had some Percona consultancy. They were really good, really delivered. So that kinda covers the database and schema side of things. To step away from the code, you had mentioned onboarding people. With a larger code base, what does that process look like for you guys, and how do you really bring a junior or mid developer into the company and have them productive quickly?

Yeah. So a lot of this comes from tooling and education. Right? As, like, senior developers, or people who have just different experience from different places, we've accumulated huge amounts of knowledge, and it's kind of all tribal. And if you join a company that doesn't have a great strategy, a lot of the strategies for sharing that knowledge are, like, just work together.

Go submit pull requests and have them code review it and learn from the code review. And I think that's okay. You can learn that way, but there are better ways to push information to people. And this is the concept of, like, just in time education. An interesting example of this can be through linters.

So I did a talk about this as well for the 2020 couch edition of RailsConf, called Communicating with Cops, that focused on using RuboCop as a mechanism to provide education. Did a little bit of a deep dive into how RuboCop works and how to build your own custom cop. But one of the things that we approach at Clio is, as people make mistakes and learn about bad patterns, we try to codify those patterns so that it doesn't happen again, but people get education about it right as it happens. A good example of this that is super trivial and doesn't often bite people until, like, there's just an unexpected case would be maybe the Rails convention of naming files. We've seen cases where people maybe make, like, a user model, but then make a typo in, like, the spec.

So rather than calling it, like, user_spec, they call it users_spec, plural, or something along those lines. And, you know, the spec will still run, but there might be some tooling that we expect to adhere to the Rails convention, and it doesn't quite line up. So you can have a linter that basically checks the name of the files and the name of the classes and makes sure that they're in line. And if not, alert people, and do that as part of their editor, or do that as part of them committing code. And they get warnings, and they get education as they're writing code.
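A stripped-down sketch of that file-naming check, outside of RuboCop. A real custom cop would hook into RuboCop's AST API; the class and file names here are hypothetical examples.

```ruby
# Derive the conventional Rails file name from a class name
# (CamelCase -> snake_case, :: -> directory separator).
def expected_filename(class_name)
  class_name
    .gsub(/::/, "/")
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')
    .downcase + ".rb"
end

# Return nil when the file matches convention, or an offense message
# (what a linter would surface in the editor or at commit time).
def naming_offense(class_name, actual_filename)
  expected = expected_filename(class_name)
  return nil if actual_filename == expected
  "#{actual_filename}: expected #{expected} for class #{class_name}"
end

puts naming_offense("UserSpec", "users_spec.rb")
# flags the plural typo; "user_spec.rb" would pass silently
```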

So if they just wrote something and they save the file, they get a little warning popped up being, like, hey, you may have made a typo here. And this goes even as far as behavior. If we know that there exist bad patterns, so, for instance, making an HTTP call inside of a transaction, which we know is gonna be potentially bad, we can actually automatically prevent that. And as soon as that starts happening and as soon as we're able to detect it, it might be in a test, might be as part of a linter.

We provide that education right back to the developers so that they understand what they did wrong and the avenues of what they need to do to fix it. So now when a junior developer enters the company, they can actually just feel free to start writing code, even code in a way that maybe breaks some patterns. And a lot of times, they're gonna start getting education right away. And then we can do all of the usual things as well: as pull requests come in, we can review them and provide more education that way. And if we find constant patterns of every junior developer who comes in making the same mistake, let's codify that so that they get the feedback immediately.
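The HTTP-inside-a-transaction guard might be sketched like this. It's a simplified stand-in: a real version would consult ActiveRecord's open-transaction count and wrap the app's HTTP client, while here a plain counter keeps the idea runnable on its own.

```ruby
# A toy guard against HTTP calls inside a database transaction.
# In Rails this would check the connection's open transaction count;
# here a module-level counter stands in for it.
module TransactionGuard
  @depth = 0

  def self.transaction
    @depth += 1
    yield
  ensure
    @depth -= 1
  end

  def self.check_http!(url)
    if @depth > 0
      # A slow external request here would hold DB locks for its
      # whole duration, so fail fast and educate the developer.
      raise "HTTP call to #{url} inside a transaction"
    end
  end
end

TransactionGuard.transaction do
  begin
    TransactionGuard.check_http!("https://example.com/billing")
  rescue RuntimeError => e
    puts "blocked: #{e.message}"
  end
end
```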

Yeah. That's kinda one of my pet peeves, I guess you could say, with linting: if a particular project has a set of practices it likes to follow, maybe it is no more than 100 characters on a line, that kind of feedback should never happen in a code review. If you have those kinds of expectations, then they need to be known expectations via a linter, whether it's RuboCop or Standard RB, and it should never be an unknown expectation to the developer. So I'm definitely on board with that, and that's something that I've had to fight and struggle with, going through code reviews and having everything kinda nitpicked. Because, one, it decreases the morale of the developer if every pull request they're making is just getting bombarded with styling quirks or requests to change.

So I could definitely agree with that point. And I think that every project should adopt some kind of linter if there are expectations of what they're doing. Even if you bring in RuboCop, you disable everything by default, and then you just start adding in or allowing the exceptions your team follows on that particular project. Yeah. Absolutely.

And I think there's even one step farther: a lot of the linters can do autocorrecting. So if you, you know, if you care about having one line of space between methods, don't even have RuboCop or the linter warn about this. Just autofix it. Like, that's something that a developer just doesn't need to worry about. And, you know, it also removes a lot of this argument over, like, should I use double quotes?

Should I use single quotes? If it's just automatically fixed and the developer can write whatever they want, that's fine. But I've also run into issues of having pull requests being bombarded by style, and it really distracts from the code review about the behavior. Yeah. Absolutely.

Although, yeah, you do have to be careful about the autocorrection. I remember one time in my earlier days of development, when RubyMine came out, I tried out RubyMine's code refactoring thing. I forget what they call it. But I had some really poorly written classes, and it just absolutely broke everything.

Like, I have no idea how that happened, but things just were not working the way they were before. I had to pull that merge back out because, you know, of course, as an early developer, I didn't have any tests on the application. So I didn't really notice that things were broken until they got deployed. Yeah. Yeah.

You definitely need to be careful there. So you also previously mentioned, not necessarily onboarding developers, but having a lot of developers work on the project. So at what point do you go from a small shop to a large shop where you have to start putting different kinds of practices in place? And what are those practices when you're dealing with a lot of developers on a single code base? So, actually, it's not clear where that point exists.

I think it's probably gonna be different for every organization and probably different for exactly the work that you're running into. I think the thing is to listen to the pain points of the developers. So if you notice that there are, you know, pieces of friction that occur between developers, that's the point where maybe there are actually some tools that need to be built to make this easier. So one thing that I think comes up really quickly in organizations is often the concept of, like, a testing server.

So you've got your developer's environment. You've got your, you know, maybe your CI, but maybe you want, like, a production-like environment for things, and so you have a staging server. You know, when there's 5 developers, it's really easy to just coordinate and be like, oh, staging is mine now. I'm gonna test something. When it's done, I will hand it off and maybe reset it back to, whatever, the master branch and let people work that way.

But that really falls apart when you have a 100 developers. How do you coordinate 1 server where everyone is trying to test something if you have a 100 developers fighting for that resource? And you can kind of fudge it a little bit by maybe having a fixed number and, you know, you round robin them out. But, again, at some point, that's going to break down. So if you think about it, what's the problem here? It's that every developer wants to potentially test something on an asynchronous schedule.

Maybe it actually makes sense to build some tooling so that you can spin up, like, staging servers on, like, Amazon EC2 or on Google on demand and just route them there. And so that's something that we ended up having to do really early: building our own tooling for what we call beta environments, where we can have an arbitrary number of them. Someone can basically say, like, this branch on GitHub, I want a clone of the site on Amazon. And within, like, 10 minutes, you've got a domain that points to it. You've got the full stack.

You have full control. You can do whatever you want. You can break it, and it gives developers a lot of autonomy to test things that they want and, you know, removes a lot of this, oh, let's deploy it and see what happens. You have a full environment that you have full control over. Go test it.

Go see it with as much data as you want and then see what happens. Another example kind of along those lines is, like, deployments. Do you have a handful of senior developers who can deploy, or do you deploy on, like, every Monday you do a big deployment? Like, that's gonna start really breaking down when you have a lot of developers. You know, at Clio, everyone has the ability to deploy.

Everyone has the ability to merge code. So we give the power to the developers, and now, you know, a junior developer can come in, write a fix to a readme, merge the code, and deploy it without, you know, having to really bother people outside of getting a code review. And, you know, now we're deploying code probably upwards of 30-ish times a day, and that number is only going to go up. And so as we're running into these issues, we are just looking at what we can do to build tooling so that it's no longer frustrating for developers. And the important part of this is developers need to voice things, and, you know, managers and companies need to listen.

If we're wasting 5 hours a week per developer on this one thing that's frustrating, like, build tooling around it. Yeah. That's one of the things that I did just for my own hobby project and just continual learning: I have a self hosted GitLab instance, and I set up a Kubernetes server, which will automatically create the infrastructure for the application that got pushed. So it always happens on any kinda development or master branch push, and then also on each commit up to the repository. It'll spin up an entire infrastructure within Kubernetes with an FQDN so that feature can then be tested.

So it works on smaller applications. I don't know how it would work on applications that consume 30 gigs of RAM of resources. But I think on smaller applications, that kind of thing can really save you from having to have dedicated test servers shared by several people. When are you gonna do an episode on that, Dave? I do have a Drifting Ruby episode on Kubernetes, which is where I got the inspiration from on that episode.

I just didn't tie it into the CI/CD portion. I got a question for you, Kyle. It sounds like you got a lot of data if you're running 30 minute migrations, and you've got a lot of developers, and you've got good testing, good infrastructure. What I found is a lot of the kind of real memorable problems I've had is where you get something running and it feels like it's gonna be fine, but then it gets deployed to the master database. And that's the point at which there's some bad data in there.

There's something in there from ages ago, from a previous version, and it absolutely sinks you. And these days, whenever I possibly can, I just pull the entire production database out and test against that. Do you do that? Or is your database just so huge you can't kind of throw it around? You can't do that, especially with a lot of developers.

It used to be something that we did. We used to have what we called the snapshot, and you could point environments at the snapshot and run test queries on it. But we did hit a size where the time it took to set up the snapshot every day was taking longer than it would take to actually back it up. So it was just starting to become unfeasible for us. And we're also dealing with sensitive data, and we don't necessarily want to give free access to all of that data for our clients.

So we instead tried to invest in a little bit of tooling. We definitely still have issues where everything looks good in development, everything looks good in, like, beta or test, and we deploy it to production, and something is wrong. So we think about what we can do to make that better. And so, you know, if it's about a lack of an index on, like, a database query or something like that, we can try to check that ahead of time and build some tooling and alert people when something goes wrong. But, also, like, in production, we can say, like, hey.

This query took 30 minutes. That's unacceptable. This query took 5 minutes. And return that information as, like, an exception to the developers that they need to fix, but without interrupting the actual request behavior. And if things go really south, just roll it back.
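That "report it without failing the request" idea can be sketched as a timing wrapper. In a Rails app this would more likely subscribe to ActiveSupport::Notifications events for SQL; the threshold and labels here are made up for illustration.

```ruby
# Record slow operations for async reporting (exception tracker,
# Slack, etc.) instead of raising inside the live request path.
SLOW_REPORTS = []

def with_slow_report(label, threshold: 1.0)
  start   = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result  = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  # The request still succeeds; the offender is merely recorded.
  SLOW_REPORTS << "#{label} took #{elapsed.round(2)}s" if elapsed > threshold
  result
end

with_slow_report("accounts index query") { (1..40_000).sum } # fast: no report
puts SLOW_REPORTS.inspect
```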

Like, it's not a blame if someone deploys something and it goes south and they quickly roll it back. We just try to take that as a learning opportunity. And how can we take that learning opportunity and share it with everybody so that everyone learns from it? Did that answer your question? Yeah.

I mean, you must be dealing with a lot of data. And, I mean, I've worked with, you call it, HIPAA data in the States, where it's kind of confidential data, and that hugely complicates testing data transfers, because you have to either heavily anonymize or write your own tools to kind of replicate a few hundred thousand medical records. Yeah. What we can also do is, I mentioned earlier, we have these beta environments that we can spin up.

You just use, like, a SQL dump to store data in there. And although this isn't necessarily production data, developers have full control over what that data looks like. And so, you know, if we wanted to see what happens if there are tens of thousands of something in a table or more, we could just build, like, little scripts that can seed that database and then test it outside of production. It's not perfect because it doesn't always match the same shape as production, but it's an iterative process, and that information gets codified. So you can keep adding to the seeds in that manner so that it becomes a better and better representation as we go forward.

Yeah. So kinda back to the technical debt, I have an unfortunate story of something that I inherited one time. I think metaprogramming is awesome and can do a lot of really cool things and can really get you out of a bind in certain situations, but then it can also be overly abused. And I was searching for a function that was not working properly within Ruby, and I couldn't find it in the code base at all. So I thought, okay. Well, surely this is in a gem or something.

So I started looking at all the gems included in this Rails application, started tearing apart the gems, opening them to search for this function. Still couldn't find it. Turns out, they were doing a class_eval on something that's pulled from the database. So they actually stored Ruby functions as data within a column in the database, and that's what was getting executed. That's where the function was defined.

So... what's wrong with that? To me, that's a... What's that? What's wrong with that? Yeah. So, you know, other than that you could not possibly even test that bit of code with any kind of reason, it was a nightmare.

So just a warning: when you think that you're doing something really cool and elegant that's avoiding code duplication or whatever, I would much rather have code duplication all across my application than have that level of obfuscation where you're never gonna be able to even remotely troubleshoot it. Yeah. Metaprogramming is, like, actually one of the best strengths of Ruby. You can do so much with it, but once you have it, it's the hammer and everything is a nail, and you want to use it. And that's often a trap that new developers, when they learn about metaprogramming, really want to fall into.

I think a good lesson to come out of that story is that if you think about code, it's written once but read countless times. And so if you can take simple steps to optimize the code for the reader, that is much better than sacrificing readability to optimize for the writer. So if it takes you an extra 30 minutes to write a whole bunch of cookie cutter methods, but now those methods are in place and they're static and easy to read and reason about and test, that is well worth that 30 minutes, because you're gonna lose more than that reading that piece of code in the future. Yeah. Absolutely.

And it could even be taken to something like private methods, where if you have a class which has a bunch of methods, start sorting out which ones are private methods so they do not need to be accessible to the consumer. Because I've had situations where I've worked on a class that grew over a 1,000 lines, and there were hundreds of methods in there. And I had no idea which ones were publicly accessible that were truly supposed to be publicly accessible, and which ones were really meant to be private. So without that level of abstraction, so to speak, you lose a lot of visibility into how important this class is to the consumer. Yeah.

Absolutely. Anything that you can do to make those kinds of classes easier to understand and read for a new person is great. And, also, just backing up a little bit to your example, this is an instance where metaprogramming bit you, but metaprogramming is also interesting in that it could save you, because you can also ask Ruby about Ruby. If anyone didn't know, this is just a tactic that I use all the time for debugging pieces of code that I've never been familiar with. If you have access to a console, you can ask Ruby what methods are available with, like, a .methods call.

You can also get access to the method itself and then ask it, like, what is its source? Where does it live? That can make life easier to track down methods that may be dynamic or created by gems. I recently learned how to use the ls command in Pry, and now I just live out of the Pry ls command. My Ruby API traffic's dropped off considerably.

I find the .methods to be quite noisy. It's very verbose if you're kind of trying to pick out which method it is. And I really like the Pry ls command. Yeah. One thing you can do to make that less noisy is take, like, Object.new and subtract the methods out of that and sort it and all that sort of stuff.
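The console tactic being described, asking Ruby about Ruby, looks something like this. The Invoice class is a made-up example; the introspection calls themselves are standard Ruby.

```ruby
# Subtracting Object's methods filters the noise down to what a
# class itself adds on top of the defaults.
class Invoice
  def total = 100
  def overdue? = false
end

own = (Invoice.new.methods - Object.new.methods).sort
p own # just the Invoice-specific methods

# source_location works on any method object, which is handy for
# tracking down where a gem (or some metaprogramming) defined it.
p Invoice.instance_method(:total).source_location
# => [file, line] where the method was defined
```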

And you can do it all in a one liner because we're in Ruby. But, yeah, ls is another great option. My documentation suffered for it, I must admit. Now my attitude is just, oh, they can just ls the class and see what's going on, man. I think that's another example of someone making some tooling that, you know, makes something that, yeah.

If you knew to call .methods and subtract Object.new.methods or Object.methods, it's great. But now it's 2 characters, and it's nice and easy, and it's much more approachable. And then you can have access to things that you may not have known existed. Can I ask you about... can we turn back the clock and ask you about Rails 0? Oh, it's been a long, long time since I've worked on Rails 0.

I can try to answer questions, but So it sounds like you've been on a bit of a journey with scaling things up. What did you do before Rails 0? Oh, actually most of my career has been working with Rails. So before Rails 0, I was working at, like, an enterprise Java shop that I don't remember a lot of the details of anymore. It's kind of too far in the past.

But I think I've been working with Rails now for 11 years. So it's been just a long time of just Rails. I don't remember a lot of the pre Rails world, to be honest. That is the correct answer. There is no other system.

I ask because we were talking about the n plus one queries. And my complaint is that Rails makes it too easy to do n plus one queries, because if you just kind of follow all the guides, that's what you get. If you do a dot all then dot each, then you're gonna be there for a while. And you start noticing that when you get into a few 1,000 objects. So you can be sitting there prototyping something and think this is great.

And then when people start using it, you'll drop it in. That's when you start hitting these gotchas. But I think people forget what the bad old days were before you had the rails tooling. The amount of time it took when you had to write your own queries was really quite significant. And you mentioned, Enterprise Java.

There was not a whole lot of object relational mapping going on in that. So it is a double edged sword. When you're operating at the scale you do, what are the parts of Rails that start to bite? We've definitely been bitten by how easy it's been to make n plus one queries in the past. I think pretty much any Rails shop is going to be doing it.

Rails offers tooling to help with that, but the tooling still requires a lot of effort. You have to kind of know what n plus one query you're introducing and fix it. So that's where you can build some more tooling. There exists a gem that we built called jit_preloader. There's also another community gem called Goldiloader that removes stuff like n plus one queries.
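The n plus one shape is easy to see if you count queries. Here is a pure-Ruby sketch (no database; the `Post` model and query log are invented for illustration) of why a naive loop issues 1 + N queries:

```ruby
QUERY_LOG = []

Post = Struct.new(:id) do
  # Lazy association: each call hits the "database" once.
  def comments
    QUERY_LOG << "SELECT * FROM comments WHERE post_id = #{id}"
    []
  end
end

def all_posts
  QUERY_LOG << "SELECT * FROM posts"
  (1..3).map { |i| Post.new(i) }
end

# Naive loop, like Post.all.each { |p| p.comments }:
all_posts.each(&:comments)
puts QUERY_LOG.size  # => 4 queries (1 + N)

# Preloading, like Post.includes(:comments), batches the second step
# into a single IN query: 2 queries total, no matter how many posts.
```

Gems like Goldiloader (or jit_preloader) take this further by detecting sibling records loaded together and batching the association load automatically, without code changes at each call site.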

And those are ways to, like, basically eliminate those kinds of problems. Some other things that kind of come up in Rails as we're building are, like, discoverability of templates. So I think one of the previous episodes of Ruby Rogues was talking about this. But as it scales up, like, Rails ERB makes it really easy to render partials all over the place. But it's really hard to understand, like, if you're looking at a page, where are those partials actually coming from, and how can you dig back into them?

So, like, that's a challenging thing with Rails as well. There's also some things in the community for things like paging that can be problematic at scale. If you look at what some of the basic gems offer, it often comes down to a limit offset, which is really fine on small datasets. But as you get to datasets that are really, really large and you're going to page really deep into them, it actually starts falling apart and breaking down in ways that you might not know about until you actually hit that scale. I think some of the Rails conventions also start becoming a little bit problematic, and you see a little bit of discussion about this.
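The limit/offset breakdown mentioned here comes from the database having to walk past and discard every skipped row. A common fix is keyset (cursor) pagination, which seeks by an indexed column instead. A sketch with hypothetical SQL builders (the table and helper names are illustrative, not from the episode):

```ruby
# Offset paging: page 10,000 forces the database to scan past 500,000
# rows before returning anything.
def offset_page_sql(page, per_page = 50)
  "SELECT * FROM items ORDER BY id LIMIT #{per_page} OFFSET #{page * per_page}"
end

# Keyset paging: remember the last id you saw and seek straight to it
# via the primary-key index. Deep pages cost the same as page one.
def keyset_page_sql(last_seen_id, per_page = 50)
  "SELECT * FROM items WHERE id > #{last_seen_id} ORDER BY id LIMIT #{per_page}"
end

puts offset_page_sql(10_000)
puts keyset_page_sql(500_000)
```

The trade-off is that keyset paging only supports "next page from here", not arbitrary page jumps, which is usually fine for infinite scroll and batch processing.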

You know, Rails, at one point, said throw all the logic into the controller. And then, eventually, the controllers became skinny, and all the models became really fat. And I'm sure everyone has that god object that exists in their project, the user object or the account object that is 5,000 lines and really difficult to reason about. And people are offering opinions of having, like, service classes or various different patterns to try to combat that, but we're still trying to unpack some of the things that you started the Rails project with. One question on that, as far as what you've seen in the progression of the companies you've worked at: what about documentation?

Right? Like, on the one hand, we've just talked about how you can use cops, you can use linters, and say, go out and try things, break things, autocorrect things, experiment, basically. And then there's self documentation, making sure you're writing good method names, good class names that are intuitive. And then there's inline documentation. And then there's high level documentation of, hey, we're using this.

Set some conventions and everything else. This is a big question, but what do you think is the right thing to put in each of those buckets in order to make an intuitive project that scales across, you know, more than 20 developers up to a 100 developers? Yeah. You know, here's a little bit of my kind of thoughts on it, but I'm not gonna say my thoughts here are perfect. I think everyone's mileage will vary because documentation is a tricky thing.

So when you get to, like, gotchas, like, if you ever tell someone, oh, if you see this pattern, don't do it, or if you have code reviews that are like, oh, I've been bitten by this before, that should be something that falls into, like, the linting or the just in time education, where you try to codify that. If you see people that have inline comments in code that say, you know, like, these next few lines are going to iterate over something and do these operations, that's probably an indication that their code is not written well enough to describe itself, and that comment is not super valuable. So that might actually be my thing of, like, that comment shouldn't exist. And instead, we should maybe extract a method that describes it better and kind of move in the direction of code describing itself.

When you are implementing something that's specifically tied to code, it should probably exist at the code level. So if you have, like, a module that you want things to include, and developers need to implement certain methods in there, maybe the module should define those methods and raise, like, a NotImplementedError that has a very clear: this is what this method should do, this is what it should return. Here are some examples, and just link to them in your own codebase. And so now when a developer looks at that specific piece of code, it's still tied to the codebase.
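A minimal sketch of that "the module defines the contract" idea; `Billable` and `amount_due_cents` are made-up names, not from the episode:

```ruby
module Billable
  # Contract: implementers must return the outstanding amount in cents
  # as an Integer. Link to an existing includer in your codebase here
  # as a worked example.
  def amount_due_cents
    raise NotImplementedError,
          "#{self.class} must implement #amount_due_cents and return an Integer amount in cents"
  end
end

class BareModel
  include Billable
end

begin
  BareModel.new.amount_due_cents
rescue NotImplementedError => e
  puts e.message  # the error itself documents the contract
end
```

The payoff is that the documentation lives exactly where a developer hits the gap: the failure message tells them what to implement and what to return, instead of sending them hunting through a wiki.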

But beyond the codebase level, there still needs to be something at, like, a higher level. That's like a readme in the documentation or in something else entirely. So we have stuff that exists in a readme that's kind of more about, like, process, but process that's specifically related to our code base. So a good example of this would be, how do you do these asynchronous migrations? Like, this isn't really super tied to code, because you might write a migration, but then what's the process of getting that live?

So we have, like, a step by step guide for that at Clio. If you wanna do a migration, here are the steps that you need to take. And as much as we can, we just link back to code rather than reimplement the code, but we'll also just describe things in English and offer templates there. And then we go one level higher to things that exist more at, like, a process level for the organization. For that, we use a tool called Confluence.

There's lots of tools that exist that do similar things. But those are things that exist outside of the code base. So if an incident happened, how do you do a postmortem or a root cause analysis on that? And there'll be documents for that. Or, you know, if you wanted to propose a new feature that you wanted to get some buy in on, using some new architecture, and just wanted to make sure that the approach is correct, you can do, like, a design doc in Confluence and get people kind of bought in well before you've actually written the code.

But once the code is written, that document is less relevant. Absolutely. Right? I was kind of going from the standpoint of, like, we were talking about bringing a new developer in and getting them used to the whole environment, and you've definitely tackled some of that in terms of, you know, the process migration example there. What about just getting them used to the entire structure of your application, where certain logic lives, certain design paradigms that you've talked about?

Some of those can be encapsulated in linters, but some of them are larger than linters. And so is that when you're doing the specific guide for walking them through that process? Yeah. So there's definitely things that linters aren't gonna be able to do. Like, the linter won't be able to tell whether this thing should be a model or a service class or something.

Right? It doesn't understand the business logic of it. So for things like that, we kind of have to rely on, like, little handbooks, being like, here, we've codified our style guide, and we try to make sure that we keep that up to date. There are some things that we still teach through kind of tribal knowledge and code reviews.

Like, if someone submits a pull request and we notice it, we'll still correct it there, and we'll do a lot of pairing. So we'll get developers up to speed by working with people, as opposed to just going off on their own. But I think this is just a learning process. Like, I don't think we are perfect at getting developers onboarded, and I don't think anyone is. And I think that's the important distinction: it's an iterative process.

If you bring in 3 developers and they all have the same issue, that's probably when you might need to introduce some new documentation and be like, hey, here's our new developer handbook, you might wanna read it. And then absolutely. And then on top of that, you have personalities too, and, you know, certain people gravitate towards certain things.

What's your method for when you have what I would consider external documentation, whether that's living in a readme, not in a Ruby file or an HTML file or something like that? Do you have any triggers in order to say, hey, if something happens over here and we decide on a new paradigm, make sure you go update that guide documentation? Or is it, oh, we brought a new person in, and we've got this new convention that's not documented, oh, jeez.

We gotta go update that documentation, and it's kind of an only when you discover it type of issue. So I think the answer is both. I definitely think we still have places where our documentation drifts, and then somebody notices and we're like, oh, shit, we gotta fix that. But we also leverage tools like Danger JS on GitHub,

where it can, like, look at code. And it's not necessarily like a linter, basically saying, hey, this is bad. But it can make a comment, being like, hey, you're doing something, maybe this is related to this link over here, and direct developers or whoever's reviewing to go take a look at the documentation.

Maybe there's no changes required of them. We definitely need to be careful about how much noise we generate. But, you know, in the case of, like, a migration, if a developer writes a migration and then submits it, we could basically say, hey, did you add a new file to, like, the db migrate folder? If so, make sure you're following the steps in here, and make sure that it aligns.

And kind of point them back at the documentation, both for the writer of the pull request and the reviewer. And that kind of helps make sure that things stay in sync. Not a perfect process. I think we're just slowly getting better at making sure the documentation stays up to date. Yeah.
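The migration nudge described here is only a few lines in a Danger check. Below it is reduced to a plain Ruby function so the logic is testable in isolation; in a real Ruby Dangerfile you would feed it `git.added_files` and pass the result to `warn`. The guide wording is made up:

```ruby
# Returns a warning message if the pull request adds migration files,
# nil otherwise.
def migration_warning(added_files)
  migrations = added_files.grep(%r{\Adb/migrate/})
  return nil if migrations.empty?

  "This PR adds #{migrations.size} migration file(s). " \
    "Please follow the async-migration guide before merging."
end

puts migration_warning(["db/migrate/20240101000000_add_index.rb"])
puts migration_warning(["app/models/user.rb"]).inspect  # => nil
```

Because the check only fires when a `db/migrate/` file appears, it stays quiet on ordinary pull requests, which addresses the noise concern raised above.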

But that's always the painful part, and those are great insights. What do you think about that DHH guy? He's a bit of a weirdo, isn't he? Don't answer that. Trust me.

No, I love DHH. He did a book quite a few years ago called Rework, which was prophetic, really, in the current situation about working from home. He did a RailsConf keynote, I think it was a couple of years ago, where he said that at Basecamp, they have never had a DBA. So they've never employed a person whose job it was to administer the database.

This is something which Rails has just magically scaled up, and the database has scaled up. Are you in the same situation? Have you never employed a DBA for your very large Rails database? Yeah. Actually, we are in the same situation.

I think we were going to hire a DBA this year prior to the pandemic, and then I think there were some complications. But prior to that, the company has been operating for over 11 years, and up to now, no DBA. We definitely have some DevOps folks that are a little bit focused on making sure that the database is running and making sure that, you know, we've got replication set up and proper statistics. But we kind of put the onus on everyone. Like, you don't have one person who is the guru of SQL.

You have everyone. And so everyone tries to teach everyone these things, and we try to do our best to share that knowledge where we can to make everyone as expert as we can. So we've managed to go, you know, 11 years with no DBA. And I think we're only getting to wanting one now because we're trying to do, like, really customized processes of, you know, this online schema migration stuff: how do we make that completely automated? Which is actually gonna be a completely distinct system from the Rails system, because we're gonna want to apply it to any of our projects.

Or maybe some gotchas between, like, upgrading MySQL. Right? There's probably some things that they might actually have really good insight into. But I think our general approach is, even in that situation, we're gonna have 1 DBA and hundreds of developers. And we wanna make sure that, you know, they may have knowledge.

It might be useful for talking through things and sharing things, but the work is still gonna fall on the developers. And, you know, we need to make sure that everyone is learning as much as they can and not just blindly hoping that the DBA is gonna handle it. Yeah. I mean, the way DHH presented it, it was kind of: having a database specialist was a necessary evil in his mind. Rails instead enables developers to handle this themselves and not just blame the database man or woman when the thing goes wrong.

Surely, as a company gets bigger, you have more specialized roles and not less specialized roles. Yep. I would agree. And I think there are more specialized roles, but I think there are skills that apply to everyone. So, you know, as the company grows, you may have more specialized roles that have more specific knowledge.

But I think, probably, with that specific knowledge comes the responsibility that they are not gatekeepers of that knowledge. Right? They may be experts, and they may be building content. But I would say part of their job is to make sure that that content is consumable by everyone. And, you know, if they are answering the same questions over and over and over, they're not doing their job to educate people on how to self serve and do it themselves.

And that's how we learn and grow as a community and get better: just by sharing this knowledge. It's really quite an interesting situation. I don't know what it means for the DBAs, but I think there's definitely more database work out there. But I think that because Rails just makes it so easy to work with databases at scale, you kind of tend to hit that stage much, much later on. Yeah.

I agree. You don't necessarily have to have everyone custom building SQL. Like, ActiveRecord does a pretty good job of being an ORM that lets developers just do the things that they need to do. And, you know, there's notifications available to easily add tooling that you don't need the DBA for. But, you know, as things grow, there's things that Rails doesn't yet have tooling for.

And maybe that's something where, if you have a DBA who is well versed in Rails, like, maybe they can contribute back to the framework or add their own gems that can help everybody get better at working with databases. And, you know, it doesn't necessarily invalidate their job, but their job becomes more of a knowledge producer, and they try to share that knowledge and make the community better. Yeah. We're in the same boat. We like to push that knowledge down as far as possible, but there certainly are opportunities when you're deep in the materialized views and windowing in Postgres or something like that where you're just like, I really wanna phone a DBA friend, and that's the con side, I would suppose.

Yeah. And I I think that's that's like, the roles of the specialists, the the people who have the specialized knowledge, they're probably more consultants. And, you know, you have somebody who's like, I've got a really gnarly problem. I don't know what to do. Yeah.

Like, get them to sit down and help you, and that's a big asset, that they can help people. And, you know, if that's a one off, it's a one off. But if they do this 10 times in a week, maybe there's education there, or maybe there's tooling. And I think it goes for pretty much any role where you ever feel like you're just throwing something over the fence. If you push that responsibility also to the developers, you can end up with a much higher quality project.

Teach them how to use includes and avoid some of those massive queries or n plus one problems. Or use some of the gems available and have the n plus one queries just automatically avoided for you. You bet. Yeah. I've had some include statements which span 50 lines on some projects I inherited.

It's insane, the kind of data that they're trying to return. But, yeah, it's crazy. Good advice. Is there anything else that we wanna talk about? I know we're getting to about that time.

I was just gonna mention one thing about includes, because I think this is another gotcha of Rails: they don't really teach you what happens with includes. And includes actually does 2 things in the background. It either uses a preload or an eager load. A preload splits it off into a different query entirely, where you do something like select star from table where ID is in this big list. But then there's eager load, which tries to smush it into one big query.

This is something where Rails always suggests using includes because it'll handle that distinction for you. But that distinction actually makes a difference at scale. And when you're dealing with large tables, eager load is almost always significantly worse. And so almost all the time you actually want to use preload. Same interface, but it's just this interesting little gotcha that you don't really realize until it starts biting you.
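To summarize the distinction as a quick reference (the `users`/`posts` association is an assumed example; the SQL shapes are the standard ones ActiveRecord generates):

```ruby
# What each strategy issues for User.<strategy>(:posts):
LOADING_STRATEGIES = {
  preload:    "two queries: SELECT * FROM users; then SELECT * FROM posts " \
              "WHERE user_id IN (...) -- stays cheap on large tables",
  eager_load: "one query: SELECT users.*, posts.* FROM users LEFT OUTER " \
              "JOIN posts ON posts.user_id = users.id -- can balloon at scale",
  includes:   "picks for you: preload by default, but falls back to " \
              "eager_load when the association appears in where/order",
}

LOADING_STRATEGIES.each { |name, behavior| puts "#{name}: #{behavior}" }
```

The practical rule the guest is describing: reach for `preload` explicitly on large tables, and reserve `eager_load` for the cases where you genuinely need to filter or sort by the association's columns.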

And you gotta remember everything is just a tool, and you can either smash your finger with that hammer or you can build what you wanna build with it. Exactly. Alright, Kyle. Well, if people want to follow you and some of the stuff that you're doing online, where should they go? I don't really have a huge online presence.

I do have, like, a GitHub account, but that's mostly working on, like, public gems for the company. But what I'm trying to do is be a little bit more present in the community. So I do have some talks available at RailsConf, and my goal is to be pushing out a little bit more written content, which is available at, like, the blogs that Clio provides. So I can provide a link for that in the future, as well as a link to any of the talks that I have. But, unfortunately, I'm not a super user on Twitter, but I can also provide my LinkedIn, where I sometimes post new information as well.

Awesome. Well, I'm gonna move us over to some picks. Luke, do you wanna start us off? Yeah. Listen listen to this.

Listen to this. Can you hear that? I can't hear anything. That is the sound of me signing up for driftingruby.com, which is a quite excellent series of Rails screencasts, including the excellent From jQuery to ES6 episode. I am a notorious jQuery user.

Almost an unrepentant one, but Drifting Ruby has let me see the light. And I'm a newly reformed character. So my pick is driftingruby.com. I must say that's a great pick. So alright.

Hey, Matt. You wanna chime in with some picks? Well, my pick comes out of this. I'd say that Danger JS is something that I really wanna look into. We're significantly investing in CICD infrastructure and deploying those branches like you were talking about, Dave.

And so that looks like a really great way to tie back the documentation and check the best practices that conform with the rest of the company, and that's my pick for today. I'll let you know what I discover. Awesome. I'll jump in with a couple of picks. One is from Google.

It is the Titan Security Key. Other companies have similar products, like the YubiKey. It's a USB or NFC key that will do your authentication for you. So I actually have a couple of these arriving in the mail today in preparation for another Drifting Ruby episode that I want to do on these things. So that should be a pretty interesting one.

I don't think it's gonna have too much depth because I'd never had one of these keys before today. And the other is I have now in front of me a little rack of Raspberry Pis, the 8 gigabyte RAM versions, that I'm building into a tiny Kubernetes cluster for well, just because I can, really. So I love Raspberry Pis, and they just released their 8 gigabyte versions, which actually makes it nicer to run some heftier things on them now. Still slow, but still a lot of fun. Alright, Kyle.

Do you wanna join in with some picks? I didn't prepare anything. So I actually don't have anything off the top of my mind here for things to just call out. Alright. Fair enough.

Well, it was great talking to you, Kyle. And I always like talking about technical debt because I am notorious for introducing it. I'm always happy to be, like, building tools to fix these things so that we can make the game better. Alright. Well, that's a wrap for this episode.

We appreciate you coming and talking with us. It was a lot of fun. Yeah. It was wonderful. Thank you.

Alright. Bye. Take care.