JSJ 448: MongoDB Schema Fundamentals with Joe Karlsson - JavaScript Jabber

MongoDB is a popular database option that stores documents which look and act like JavaScript objects. We brought in an expert, Joe Karlsson, to clear up some of the confusion about how to arrange your data in MongoDB. Joe provides a rundown on how to think about your data with a small dataset, a medium-sized dataset, and a large dataset. The panel also dives into how the database works and how things are managed and arranged by the MongoDB database engine.

Transcript

CHARLES MAX_WOOD: Hey everybody and welcome to another episode of JavaScript Jabber. This week on our panel, we have Aimee Knight.

AIMEE_KNIGHT: Hey, hey from Nashville.

CHARLES MAX_WOOD: Dan Shappir.

DAN_SHAPPIR: Coming at you from the very hot Tel Aviv.

CHARLES MAX_WOOD: Steve Edwards.

STEVE_EDWARDS: Hello from sunny Portland.

CHARLES MAX_WOOD: I'm Charles Max Wood from DevChat.TV. And I just have to say, Dan, you said hot Tel Aviv and you were talking about how it might get up to 80. It's already 80 here and it's 10:30 in the morning.

DAN_SHAPPIR: So I said it might, I said it got up to 90.

STEVE_EDWARDS: 90, oh wow. Smoking hot.

DAN_SHAPPIR: 80 is like springtime or fall, you know. We're in summer here.

STEVE_EDWARDS: and just to clarify that's 80 Fahrenheit not Celsius, right?

DAN_SHAPPIR: Yeah, sometimes

CHARLES MAX_WOOD: 80 Celsius is 200. Anyway, we have a special guest this week, and that's Joe Karlsson. Joe, welcome back.

JOE_KARLSSON: Yeah. Hey, what's up? I'm coming from Minneapolis. It's nice and cool. I was just gonna say, it's not the heat that gets you, it's the humidity.

STEVE_EDWARDS: And mosquitoes, right? And the mosquitoes, absolutely.

DAN_SHAPPIR: And Tel Aviv can get pretty humid as well.

CHARLES MAX_WOOD: for two years and I remember one August, the whole month was like 105 degrees and 100% humidity. And you would just walk outside, just melt on the sidewalk. So I hear you.

DAN_SHAPPIR: Well, what you do is you walk outside and then you walk straight back in again.

CHARLES MAX_WOOD: That's right.

JOE_KARLSSON: It's easy in quarantine too, we're just staying inside anyways.

CHARLES MAX_WOOD: Yeah, fair enough.

One of my favorite communities in programming these days is the Angular community. Every time I go to an Angular conference or meet up with some of my friends who are in the Angular community, I have a great time and a lot of them have wound up on Adventures in Angular. So if you're doing front-end development, you're looking for a way to keep current on the Angular ecosystem, and you want to have a good time listening to fun people talk about great topics related to Angular, then go check out Adventures in Angular at AdventuresInAngular.com.

CHARLES MAX_WOOD: So we have you on to talk about MongoDB Schema Design, and I have to preface this a little bit by saying that I have never actually done this right. Or at least maybe I did it right and I just wasn't happy with it, but I've tried using MongoDB a few times and you know, it's like, okay, well, how many of these things do I put in the different collections and how do I set this up to work? And maybe it's just because I have SQL brain damage or something else. You know, it's like trying to write Ruby, JavaScript.

JOE_KARLSSON: Yeah, no, I think that's super common. And that's what I mean, that's what I want to discuss here too. It's like one of the most misunderstood and like misused things. And I think designing like a schema that's going to be scalable and durable and last a long time, I think is hard to do.

STEVE_EDWARDS: Yeah. But Joe, it's schemaless, right? It's what's the flexible schema.

JOE_KARLSSON: Exactly. No, totally. Yeah.

STEVE_EDWARDS: I thought we were going to talk about modeling and I had all my jokes about modeling and how Joe looks like Fabio and everything, but I guess we'll talk schema instead.

JOE_KARLSSON: That's a great podcast joke because everyone could see my long luscious hair right now. No, I love it.

DAN_SHAPPIR: In this podcast, if you have hair, you're already ahead.

STEVE_EDWARDS: Ahead, no pun intended, right?

CHARLES MAX_WOOD: I have hair. I haven't shaved in almost a week, so my chin is...

JOE_KARLSSON: It looks great on you.

CHARLES MAX_WOOD: Yeah, thanks. Makes me more sophisticated.

JOE_KARLSSON: Yeah. I should mention too, I work for MongoDB. I'm a developer advocate and software engineer. And I think I come from a lot of these perspectives too. Honestly, before I started working for MongoDB, I didn't really understand this stuff either. Now that I'm here, obviously, I'm working with it every single day, which helps me understand it a little bit better. But I think these are all super common criticisms and places to be when you're coming to MongoDB. Hopefully someone will learn something from this podcast today. That's our hope, that's the goal.

CHARLES MAX_WOOD: Given my knowledge level, that's almost a given.

JOE_KARLSSON: I prefer everyone's knowledge levels to be super low because it makes my job way easier. So that's great. Thank you for that. No one's gonna ask any hard questions today. That's great. But if you're new to it, or you've been using MongoDB forever, or if you've never even used it and you're thinking about it, I'm hoping that we can unlearn some of your SQL brain damage or get you set off in the right spot from the beginning. That's the goal here today.

CHARLES MAX_WOOD: Yeah.

AJ_O’NEAL: So this feels like deja vu. Was this the episode that we didn't record?

CHARLES MAX_WOOD: No.

AJ_O’NEAL: Oh.

JOE_KARLSSON: Last time we did like just an intro to MongoDB.

AJ_O’NEAL: Oh, okay. Okay.

JOE_KARLSSON: And this time it's a little bit more. Yeah. So I was on last time. If you missed that episode, it's a great place to start if you're brand new to it too. And this is like 102, this is like lesson 102, right? We're one level up here.

AJ_O’NEAL: Excellent. I love being one level up. Amy has the question to start.

AIMEE_KNIGHT: I do. So I was actually talking to somebody about this the other day, so I feel like maybe it's a good place to start. I feel like most people kind of know the difference between NoSQL and SQL, but it's my understanding that there are different types of NoSQL options. So can you explain the difference between MongoDB and Cassandra or any of these other options? I think, isn't Cassandra document-based? But there's others too. There's graphs and stuff like that. So explain to us the different NoSQL options.

JOE_KARLSSON: Yeah, that's a great place to start. Like, what is NoSQL and what is MongoDB? I love that. So let's just start, I guess, with NoSQL. This is controversial. I think it stands for "not only SQL." There's some fights about what that actually is, but you're right, there's a bunch of different kinds, and it basically is any database that's not a traditional RDBMS, or relational database management system, which is what we traditionally think of as SQL. But yeah, that includes things like document-based databases, which MongoDB is. And a document basically means it's like a JSON-like document. We actually call it BSON, binary JSON. But yeah, you just save things like you would an object in memory; you can save that in a database. Or there's things like key-value stores. That'd be something like Redis. So instead of having more deeply nested data structures, like you can have in an object, you can usually retrieve things really quickly just from key-value pairs. It's very flat. And then graph databases like Neo4j, those are good for, yeah, social graphs or connecting things with more complex connections. Think of a social network, like your friend graph. I'm missing a couple here. What else am I missing here, team? What else is a NoSQL database?

CHARLES MAX_WOOD: Well, Cassandra is columnar.

AIMEE_KNIGHT: Ah, you're right. I think.

JOE_KARLSSON: Yes. Yeah. One's an example.

CHARLES MAX_WOOD: The way they think about data is different than it's, it's not documents. It's anyway, I haven't used it in a long time.

AJ_O’NEAL: Both Mongo and Postgres are something that you use typically for applications where you have users. Cassandra is odd in that it's used for an application where there's users because that sort of database, the column, the what's it database is typically used for data warehousing which is for statistical analysis of large swaths of things. Then you've got things like Neo4j that are object databases, which are typically for things like desktop applications where you want persistence of the memory state. Actually, I shouldn't say that. I don't know if that's actually what it's for. That's just what I thought of it as, but, but you can hydrate, you can rehydrate an application. I think it has that server side aspects, but it's when you want to be able to rehydrate something in memory, uh, almost like a sleep mode. Something goes to sleep, something wakes up and everything's still there. Then you've got document databases, which are not used by, but perfectly great for something like Wikipedia. There, what, what other, are there other mainstream databases, types of databases that I didn't mention there? Cause I think we do tend to lump everything in a NoSQL, but it's like everything from enterprise-class data warehousing to Mongo is NoSQL and that's quite a wide range.

DAN_SHAPPIR: I do know that sometimes when people talk about NoSQL, what they actually mean is that they're using a SQL database, but they just have, like, a table with one value, and they just store JSON in it, like it's a blob.

AJ_O’NEAL: Yeah. So that's a great feature of Postgres and I think that MariaDB, the original MySQL, has integrated that if I recall correctly.

JOE_KARLSSON: And I believe they have too. Yep, I think they have too.

AJ_O’NEAL: And you can easily do it in SQLite, but SQLite doesn't specifically support querying by JSON types, whereas Postgres, and I believe MariaDB do. I know that Postgres does, but Postgres has two different JSON types.

JOE_KARLSSON: And there's different use cases for all of them. All of them have different special features or things you may want to use them for, but typically I think with NoSQL you'd think of higher performance or different forms of availability. Like with MongoDB, we have sharding. You can split the data up differently: horizontally scaling as opposed to vertically scaling a database. And sometimes there's flexibility with your schemas or whatever. Right. I think with a traditional SQL database, there's a lot of rules and it's kind of locked in place, but if you don't need all that, then there's other options.

AJ_O’NEAL: Well, and I think I said this the last time we were discussing it, but I think the primary problem with all technologies is people kind of pick a, what do you call it? Like they pick a post and they stand by it and they don't move. I think that SQL databases are perfectly capable of most of the things that a no SQL database can do. Obviously, it's SQL is just one type of, like I just mentioned, like seven different types of databases that all have very specific use cases, but as a general purpose database for a user facing application that has 10, a hundred thousand users, maybe, maybe not, you know, when you get to 10 million users or maybe, maybe when you do, I mean, I'm, I'm sure that people scale these things out every which way, but I think there's a lot of great arguments for other types of databases once you're approaching 10 million users and that sort of thing, but I you know, even at the small scale, before you start getting into really custom architectures, you can choose to not follow the rules of SQL. I think SQL kind of attracts, I'd take this the right way, but a certain type of neuroticism. We all have different neuroticisms that are all focused around different, you know, different faces, right? It's like, which neuroticism are you? And SQL is very math-based and very exact. And so when you talk to people that are SQL heads and you suggest relaxing constraints and denormalizing the database. Denormalizing basically just means don't follow the rules. You know, they get in a huff. And so if every time you approach a SQL person with a problem, what you get is you're an idiot, go to then, then you're going to think, Oh, SQL databases can't do this and you go to the no SQL community and they're like, yeah, come on over here. Throw away the constraints, denormalize. It's all good, bro.

JOE_KARLSSON: Yeah, yeah, yeah. Totally.

AJ_O’NEAL: But I think in large part, it is a matter of the avenue of access. If you come to an angry SQL nerd with a problem that they don't like, you'll believe that SQL databases aren't capable of solving that problem. And I think we get the opposite thing too, where NoSQL databases now have indexing and lots of nice features to make them fast and performant that traditionally were associated only with SQL.

JOE_KARLSSON: Yep. I think you bring up a great point though. And I think this is great for users of any database, but we're seeing a huge hybridization: traditional SQL databases are becoming more like NoSQL databases, and NoSQL databases are becoming more like SQL databases. Like with MongoDB, we now support ACID transactions and joins. And right, there's this middle ground that's starting to happen, which I think is great. Obviously the data demands are just getting bigger and bigger and bigger, and we need easier, better, more scalable ways to deal with that. I think that everyone's kind of building on each other to make something even better for all of us.

AJ_O’NEAL: Heaven forbid we find the truth in the middle.

JOE_KARLSSON: I know, no, I'm gonna stick my flag in the ground. I'm sticking here. I'm not listening to anything. Yeah, no, it's not great, right? Like I think the best tool to solve a problem is like it's whatever, whatever problem you have, but all of us have different problems and different ways to fix it. And that's OK, right? Like we shouldn't be bashing other people's tools or languages just because it doesn't fit our use case. It may fit their space.

AIMEE_KNIGHT: Amen.

JOE_KARLSSON: Yeah.

DAN_SHAPPIR: But I do think that there is an alternative way to look at it, which is you know, people these days, it's become almost a dirty word to think or to talk about or to think about architecture. It means that you're like kind of this astronaut floating way up in space where we need to fix real world problems, and it's all agile and we just write the failing test and then we get it passed and life's good. But there are certain architectural decisions that you do need to make. And I think one of the architectural decisions that people need to make relatively early in the cycle is what database they're going to be using and how they're going to be structuring this database and maybe what their schema might be like, unless you tell me different. So like saying like everything's good, you know, pick whatever you want. Life's good. Oh, it's all good. It's not necessarily the best thing in all circumstances in all situations.

JOE_KARLSSON: Yes, absolutely. I totally agree.

CHARLES MAX_WOOD: Yep. And going back to what I was saying at the beginning, I can attest to this, right? It's like, oh, well, I think it'll work great this way. And then I start throwing data in it. And then I'm like, okay, this is hard. It's hard to figure out what I've got going in. It's hard to figure out what I've got coming out. It's hard to figure out how this associates to that, which associates to the other thing. And, you know, I think if we're talking about database use cases, we could keep going, but I really do want to get into this, like how to set up, how to design your schema in MongoDB. Especially since it makes me laugh every time I say that, because then I'm like, but it's schemaless, right? So.

JOE_KARLSSON: Yeah. Well, maybe we should start with, because I'm assuming most people coming to this, either you're a SQL person coming here, or you've heard of MongoDB or document-based databases and you're kind of curious about it. It might be helpful to start with just a high-level overview of what traditional SQL database schema design looks like. Like normalization. And then we can start going into how that compares to a document-based database.

CHARLES MAX_WOOD: Yeah, that's fair. Yeah. Start with what I know. I like that.

JOE_KARLSSON: Exactly. Let's start on a common point. Let's meet people where they're at. You know what I mean? You're probably coming from a SQL background. I guess let's talk about normalization. Typically when you're designing a database schema for a SQL database, we talk about normalizing the data. Basically what that means is we're splitting our data into separate tables and we're joining them by foreign keys. There's four different levels of normalization. I always forget. Typically we do the third form.

AJ_O’NEAL: There's six. The sixth level is every piece of data just goes into one column that always references itself.

JOE_KARLSSON: I've never seen that in the wild.

AJ_O’NEAL: That's not real.

JOE_KARLSSON: And I get it, it's all math-based, right? And the question you're asking yourself when you're normalizing with a SQL database, it's not, how am I going to be using this data? You're asking yourself, what data do I have? So for example, I always use the example of a school schedule. I'm designing a school database. I have students, students have classes, students have professors, right? So I probably have a student table and maybe a professor table. Maybe we can link them by an advisor, right? Like, this professor is a student's advisor, and we link that with a foreign key.
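As a rough illustration of the school example Joe describes here, the following is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are hypothetical and were not given in the episode; the point is only to show two normalized tables linked by a foreign key and the join needed to read across them.

```python
import sqlite3

# Hypothetical schema for the school example; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE professors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE students (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        advisor_id INTEGER REFERENCES professors(id)  -- the foreign key
    );
""")
conn.execute("INSERT INTO professors (id, name) VALUES (1, 'Prof. Rivera')")
conn.execute("INSERT INTO students (id, name, advisor_id) VALUES (10, 'Sam', 1)")


def advisor_of(student_name):
    """Return the advisor's name for a student, or None if there is no match."""
    row = conn.execute(
        """
        SELECT p.name
        FROM students AS s
        JOIN professors AS p ON p.id = s.advisor_id
        WHERE s.name = ?
        """,
        (student_name,),
    ).fetchone()
    return row[0] if row else None


print(advisor_of("Sam"))
```

Every read that needs the advisor's name pays for the join, which is the cost the panel goes on to discuss.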

AJ_O’NEAL: So I, I'm going to, I agree that this is how most people think about this.

JOE_KARLSSON: Yeah.

AJ_O’NEAL: I don't think this is a good way to think about the problem.

JOE_KARLSSON: Ooh. All right. Let's hear it.

AJ_O’NEAL: Well, so what you just described is what I call categorical thinking. And this is what we tend to do when we're asked to engage the logic part of our brain. We tend to make this illogical jump to start putting things into buckets or classes or categories.

JOE_KARLSSON: Yes.

AJ_O’NEAL: And we forget the use case.

JOE_KARLSSON: Yes.

AJ_O’NEAL: So I think that when you put things into categories, you often will arrive at an optimal solution for general use cases. So if you have no other metric by which to guide you, sorting things categorically will yield good results. But I would say, no matter what type of database, the thing that you should start with is: what are the questions that I'm going to need to ask the database?

JOE_KARLSSON: Yes.

AJ_O’NEAL: Because it's more important to know what the data will represent to the person that needs to consume it than it is to ask the question categorically, how do these things go together?

JOE_KARLSSON: I totally agree. Yes. One of the things too is like, cause you have to consider performance, right? Like, cause if you're splitting things up unnecessarily and you need to like, do joins every single time, right? Like that's expensive and time-consuming and sort of blocking your operations. So like, you're right. But certainly, can we group things that we need to access often together in a SQL database? That makes so much sense. I wish people would think about that more too. And I think that that's not something, a thought that really comes up unless you start seeing performance issues in your application, which is honestly fine, right? Like you may want to like restructure, like adjust the schema based on performance hits.

AJ_O’NEAL: There are tricks you can do that will... I mean, with all things, I'd really love to learn how Mongo works in this regard. But there's tricks you can do where either you're going to bloat the size of the database,

JOE_KARLSSON: Yeah,

AJ_O’NEAL: you're going to increase read time, or you're going to increase write time. Like, you're always sacrificing. People talk about databases typically in terms of, like, the ACID metrics. Yeah, but that's important for consistency, like perfect consistency or eventual consistency. When you think about performance, you're either sacrificing read, write, or storage. You're never going to reach a trifecta unless you only ever write to the database once, in which case you

JOE_KARLSSON: probably don't need a database, but yeah.

AJ_O’NEAL: You're sacrificing in one of those areas, and you can do tricks. Like if you want your data to remain consistent, you can add foreign keys that you don't check on reads, because you don't do a join. So you could duplicate data across multiple tables, say, put a person's phone number in many different places. Have a phone numbers table that you only ever reference when you put a phone number in, and mark it as a foreign key so that you have a relationship, so that if you ever need to restructure the data, you have an easy way to get to it, but then you don't actually use it or join on it. So the ID of the phone number is the phone number. I don't know if that's a little too low level for a podcast where we don't have a white.

JOE_KARLSSON: Jumping right in here. No, no, no, that's totally right though. Right. Like, yeah. And if you have a use case where you don't need to pull that phone number up every single time, great, that makes so much sense. We just need a reference to it in case we need it someday.

AJ_O’NEAL: Well, I'm saying that if the ID of the thing, if it's a true enum, where its ID is represented by its value. Like, Matt is not a good enum because there are many instances of people who are named Matt, but they are not the same. However, in the entire world, there is only one phone number that is my phone number. So its ID is its value. That's what I was saying: in those cases, you can get optimizations, which are quite often things that you want to do, like states, phone numbers, things where typically, if you were taught in a classroom, they would tell you, oh, you need to join the state to the country and you need to join the country. You know, they tell you to do all these joins, but in reality, for these things where the ID and the value are synonymous, you don't need to do that. That's what I was saying.

JOE_KARLSSON: I see.

AJ_O’NEAL: We got a little too much into the weeds.

JOE_KARLSSON: No, that's awesome.

AJ_O’NEAL: That's the point I was trying to get at.

JOE_KARLSSON: Yeah, but you're right. There's always trade-offs, right? Like for indexing, if you add an index, your reads are really fast, but you're sacrificing the other way: your writes get slowed down, right? Because you have to re-index that table every single time you make a change to the database at all. Or another thing you can do with a SQL database is denormalize it, which basically means you remove those tables and keep all that data in a row. So you don't have to do those joins anymore, or you have duplicate data in that database. I'd say that's not an anti-pattern, right? So you're sacrificing, you're increasing the number. Duplicating data.
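The trade-off Joe describes can be sketched with a denormalized table, again using Python's built-in `sqlite3`. The `students_denorm` table, its columns, the index name, and the sample rows are all hypothetical. The sketch shows that reads no longer need a join, while a single logical change (renaming one advisor) has to touch every duplicated copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: the advisor's name is copied onto every student row
# instead of living in its own table. All names here are hypothetical.
conn.execute(
    """
    CREATE TABLE students_denorm (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        advisor_name TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO students_denorm (name, advisor_name) VALUES (?, ?)",
    [("Sam", "Prof. Rivera"), ("Alex", "Prof. Rivera"), ("Kim", "Prof. Chen")],
)

# An index speeds up lookups by advisor, but it is one more structure
# the database must keep up to date on every write to this column.
conn.execute("CREATE INDEX idx_students_advisor ON students_denorm (advisor_name)")

# Read path: a single-table query, no join required.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM students_denorm WHERE advisor_name = ? ORDER BY name",
        ("Prof. Rivera",),
    )
]

# Write path: renaming one professor rewrites every row holding a copy.
cur = conn.execute(
    "UPDATE students_denorm SET advisor_name = ? WHERE advisor_name = ?",
    ("Prof. Rivera-Lopez", "Prof. Rivera"),
)
rows_touched = cur.rowcount

print(names, rows_touched)
```

In the normalized layout the rename would be a one-row update; here it scales with the number of copies, which is the consistency risk discussed next.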

AJ_O’NEAL: You're angering the SQL gods. The thunder is rumbling. The lightning is ready to strike you.

JOE_KARLSSON: Yeah.

AJ_O’NEAL: No, I totally agree with you. I agree with you. I'm just saying, the things you're saying, SQL heads, man. Oh, yeah. What's this episode?

JOE_KARLSSON: I'm gonna get it for this one. I think it's an option, right? If your joins are becoming a bottleneck, that's a common thing to do, denormalizing our data. Yeah, I guess, anyone else want to bring anything up about SQL-based normalization or schema design before I move on to MongoDB?

AJ_O’NEAL: You were on a good path. I'd like you to finish your thought before I cut you off.

JOE_KARLSSON: Oh, no, I don't have much else to say about it. I think it's, I just want to bring up the point, right? Like we're, you're used to using tables and columns and joining by foreign keys. And we split up based on the data we have, not always right. But it's, we're, we're used to bucketing things. And I think duplicating data is scary for most people coming from an SQL background, which makes sense. It's like, you start thinking about like increasing the chances of read, write errors or whatever, right? But like.

AJ_O’NEAL: I think the reason that we don't like to duplicate data is that data quickly becomes inconsistent. Every place you have a duplicate piece of data means that if you need to change it, you need to know two things. One, that you're changing it in every place that it needs to be changed. And two, that the value is identically representative, meaning that if I'm going to change every single place, let's say that we have my phone number, 1-801-555-5555, right? If that's in the database, if we were to just do a full table scan and replace every instance of that number, would we in fact only be replacing the instances of my phone number? And intuitively you'd say yes, because obviously only I can have that phone number, except that in some places in the database that will be my wife's phone number, and that phone number doesn't change if it's the house phone, for example, right? Or in some places in that database, it will be the phone number of the administrator for that group. I guess the better example would be if we're going to replace the administrator's phone number, which is my phone number, throughout the database. That's the example where we'd see very clearly: okay, I'm going to go change every place this phone number exists. Well, I just accidentally replaced AJ's phone number for his house and for his wife and for all these other things, when what I intended to change was a specific instance of the phone number that was scoped to be just an administrator phone number. Totally.

JOE_KARLSSON: Yeah, I think you bring up a good point too. There's a lot of risks with data. And I think databases are one of the scariest things to deal with, especially a production database. I think it's the least sexy part of a web application. I've never heard anyone come and brag to me about their amazing schema design. You know what I mean? That's not something you can talk about.

AJ_O’NEAL: You're not hanging out with the right people, Joe.

JOE_KARLSSON: I'm not. I work for a company.

AJ_O’NEAL: I can introduce you to some of those people. I can introduce you to some of those people.

JOE_KARLSSON: Even my coworkers are like, I've never heard, I've never heard any of them brag about their schema design. But it's so important. Your business logic is important and if it gets knocked out or gets changed, right, like deleted, that could be catastrophic for business. Or you.

AJ_O’NEAL: There are beautiful patterns that you can create, I'm sure, no matter which database you're using, and there are things to be proud of. It is largely boring because largely you're just categorically saying fruits are connected to grocery store sales, which are connected to customers. But there are times when there are beautiful and elegant solutions.

JOE_KARLSSON: I totally agree. I totally agree with that. All right, cool. Anything else about SQL design? Otherwise we can jump right into MongoDB. Let's do it. So MongoDB design, this is going to be the most frustrating part. I'm going to come back to this over and over and over again. But I think with SQL database schema design, there's a lot of rules. We have normalization. There's a lot of best practices. It's been around forever. With MongoDB schema design, there are no rules. And I think that freedom can be really great or can lead to really bad decisions. And that's why I think it's important to learn about it. And AJ, you touched on this too. I think the most important thing you could ever consider when you're designing your MongoDB schema is how you're going to be using it. And every application is going to have a different schema design. Honestly, for me too, the most important thing I'm considering is how I'm going to be consuming that data. In order to keep performance high, I want to keep the data that I'm going to be accessing often grouped together in a single object or document. And stuff that I don't necessarily need every single time, I'm probably going to be putting in a separate collection or a separate document, linking those with foreign keys. I also want to touch on a couple of misconceptions I think people have about MongoDB. And this is honestly our fault, but one is that we're a schemaless database. We call ourselves a flexible-schema database now. You can actually enforce a schema at the database level. So you can be as flexible or inflexible as you want. If you are coming from a SQL background and you want to lock that whole thing down, or you work for a huge bank, awesome. Let's lock that schema down. Or if you only want to lock down a couple of key-value pairs and keep a couple more flexible, great. Or if you want to constrain things, like allow a couple of different data types in a single key, you can totally do that too. Oh, and then ACID transactions too. So when you're splitting things up, you can also ensure that you're updating multiple documents within a single transaction, and it'll roll it back if any of those fail. So to recap, there's no rules. The only rule is that it's based on whatever you're trying to build, it can be as flexible as you want, and it also supports ACID transactions. Yeah. Does that make sense so far? Is everyone on the same page so far?
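To make the embed-versus-reference idea and the schema enforcement Joe mentions a bit more concrete, here is a minimal sketch in plain Python. No MongoDB driver is used: ordinary dicts stand in for documents and collections, and every field name and value is a hypothetical example. The validator at the end uses MongoDB's `$jsonSchema` operator with `bsonType`, `required`, and `properties`; how it gets attached to a collection is left out.

```python
# Data that is read together on almost every request: embed it in one document.
user_embedded = {
    "_id": 1,
    "name": "Sam",
    "addresses": [{"city": "Minneapolis", "zip": "55401"}],
}

# Data that is rarely needed: keep it in a separate collection and store
# only a reference to it. These dicts are stand-ins for two collections.
users = {1: {"_id": 1, "name": "Sam", "history_id": 100}}
order_history = {100: {"_id": 100, "orders": ["A-1", "A-2"]}}


def load_history(user_id):
    """Follow the reference; this second lookup is only paid when needed."""
    user = users[user_id]
    return order_history[user["history_id"]]


# A validator document in MongoDB's $jsonSchema form. "name" is locked down,
# "age" may be one of two numeric types, and any other field stays flexible.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name"],
        "properties": {
            "name": {"bsonType": "string"},
            "age": {"bsonType": ["int", "long"]},
        },
    }
}

print(user_embedded["addresses"][0]["city"], load_history(1)["orders"])
```

Which fields to embed and which to reference depends on the access pattern, which is the point Joe keeps returning to.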

AJ_O’NEAL: Except the part where you said it supports ACID transactions, because how can you both be scalable and support that? First of all, what is ACID?

AIMEE_KNIGHT: I was gonna say, I feel like we talked about this in the last episode.

STEVE_EDWARDS: No, let's recap it.

DAN_SHAPPIR: Yeah, but it's one of those things that I think you need to define each and every time. I don't think anybody will go back to listen to that entire episode.

JOE_KARLSSON: Let's go back to 23:16 in the last episode. No, totally. Acid is a psychedelic drug. ACID is something we talk a lot about in databases, and ACID stands for atomic, consistent, isolated, and durable. It basically means that every transaction you have in the database is separate and contained. If you're doing, like, an update, a create, a delete, all those would be a single transaction. And if any of those fails, it rolls everything back, you're guaranteed the previous state from before the transaction, and it'll throw an error. Durable means that if you turn your database server off and turn it back on again, your data is not going to disappear. So with something like Memcached, where all the data is in memory, right, it's all in RAM, if your computer turns off, your RAM is cleared. Right. So that's not durable,
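The all-or-nothing behavior Joe describes can be demonstrated with any transactional database. The sketch below uses Python's built-in `sqlite3` rather than MongoDB, purely so that it runs without a server; the `accounts` table, the account names, and the `transfer` helper are hypothetical. Two updates are grouped into one transaction, and when the second violates a constraint, the first is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
    """
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(amount):
    """Move money from alice to bob as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
                (amount,),
            )
            # Fails the CHECK constraint if alice would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(500), balances())  # the failed transfer leaves both rows untouched
print(transfer(30), balances())
```

Note that bob's credit in the failed transfer is undone even though that statement itself succeeded, which is the atomic part of ACID.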

AJ_O’NEAL: But also databases like Redis use caching, so they're only durable to a certain degree. If the database gets turned off at a particular time: bye-bye, data.

JOE_KARLSSON: Yes, absolutely. Yeah. And MongoDB. So like

AJ_O’NEAL: a commit does not represent saving in Redis.

JOE_KARLSSON: And by default with MongoDB, updating a single document is ACID compliant. But what's new now is you can actually do ACID transactions across sharded clusters with MongoDB. I think that came in like 4.3, which means if you have data spread out all over the world, you can actually do multiple things in your database, a couple of updates, deletes, creates, whatever, across the sharded cluster. And if any of those fail, we'll roll back.

AJ_O’NEAL: So I'm going to give an example of something that is kind of not real, but it is illustrative. I'm on Amazon. There's a quantity of four for a foobledegger, and five people come in to buy a foobledegger. This is not how Amazon would do it, because it would never work, but it goes and updates the quantity in the foobledegger table and decrements it every time. One of us is in Asia. The other one is in America. Our local replica is telling each of us that the quantity is now one. We both purchase, and we both get an order confirmation. So this is the problem of atomic writes.

JOE_KARLSSON: Yes.

AJ_O’NEAL: And there are two ways to solve it that I'm aware of. One is that a specific shard is specific to a piece of data, meaning that for the person in Asia and me over here, if we want to buy a foobledegger, we actually have to make our HTTP call or our database writes specifically to the server that manages foobledeggers out of all the Amazon products. And that could work. Another strategy, which is the one that Amazon actually employs, and the airlines employ, and almost everything employs, is to break ACID compliance and say, well, we're just going to let everybody buy foobledeggers, and when the database resolves, first write wins in this particular case, and last write gets an email notification saying, we're sorry, there was a problem with your order.
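
AJ's first strategy, a single authority that owns the quantity, comes down to making the check and the decrement one indivisible step. Below is a rough sketch with a lock standing in for the database's per-document atomicity; the names are hypothetical. In MongoDB the same effect is usually achieved with one filtered update (a `$gt` filter on the quantity combined with `$inc`).

```python
import threading

inventory = {"foobledegger": 4}
_lock = threading.Lock()  # stands in for the database's per-document atomicity


def try_purchase(item):
    """Decrement stock only if some is left; check and write are one step."""
    with _lock:
        if inventory.get(item, 0) > 0:
            inventory[item] -= 1
            return True
        return False


results = []


def buyer():
    results.append(try_purchase("foobledegger"))


# Five buyers race for four items.
threads = [threading.Thread(target=buyer) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly four purchases succeed and one is refused, regardless of the order in which the buyers arrive.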

JOE_KARLSSON: Yes. I love this. Yes, absolutely.

AJ_O’NEAL: So how does Mongo solve this problem? Does it do both eventual consistency and the sharded write?

JOE_KARLSSON: Yes, absolutely. You can set the level of consistency you want on both reads and writes. So let's say the data is sharded on a cluster. I can say I only want to read once the data has been written to all the different replicas, or only to one. You know what I mean? Or it'll send an acknowledgment back once it's written to one, or once it's been duplicated to all of the backup databases as well. And you bring up an excellent point. The freedom to do that is great, because you can really fine-tune the application based on what you need. And the trade-off you're always making is performance. I think it's called linearizable reads, where I do a read and a write and a read and a read, and it all comes into the database in that order. You're guaranteed to get that data, and it's going to be correct a hundred percent of the time, but it's super slow. And Amazon has decided that they're willing to sacrifice some of that. They want higher performance.
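
The "acknowledge after one copy or after the copies have caught up" dial Joe mentions corresponds to MongoDB's write concern (`w: 1`, `w: "majority"`, or a number). The function below is a toy model of that rule only; its name and arguments are assumptions, and it does not talk to a database.

```python
def write_acknowledged(acks, members, w):
    """Return True once enough replica-set members have confirmed a write.

    acks    -- members that have persisted the write so far
    members -- total data-bearing members in the replica set
    w       -- 1, another integer, or "majority"
    """
    needed = members // 2 + 1 if w == "majority" else int(w)
    return acks >= needed
```

With three members, `w=1` acknowledges as soon as the first copy lands, while `w="majority"` waits for two: lower latency on one side, a stronger guarantee on the other.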

AJ_O’NEAL: Well, if you do the sharding, then your write locks scale with your number of shards. So for certain applications, like the example I mentioned, you find the specific shard that is responsible for that specific data set. Right. But you're not going to get as much performance as if you say, we're going to let things flow in, then reorganize them, and then resolve errors.

JOE_KARLSSON: Totally, yep. I was thinking like.

CHARLES MAX_WOOD: I wanna chime in here for a second. You guys have totally convinced me just to hire a DBA.

JOE_KARLSSON: Ha ha ha ha ha. It gets so, especially when you start talking about scaling.

CHARLES MAX_WOOD: It feels like it. Woo. Right? I mean, it feels like there are, you know, a zillion things that I have to understand, you know, you and AJ going back and forth, you know, half this stuff is going over my head. The other half of the stuff I've played with to some degree. Yeah. But yeah, I mean, do I have to understand all this stuff with sharding and everything else?

JOE_KARLSSON: Yes. Well, with MongoDB, honestly, if you are deploying your own databases, yes, you do. That sucks. However, we have a new cloud option, Atlas, where we basically manage all this for you. We also have a bunch of machine learning stuff happening on the back end to help give you advice on what's going to work best. We call it Performance Advisor. Like, hey, we see you accessing this thing a lot, or people in China are accessing this data a lot, so maybe you want to geo-shard that data so it's closer to them. Or we'll auto-scale it up and automatically back it up. And in the GUI, you're just setting the level of read and write consistency you want.

DAN_SHAPPIR: Before you continue, I have to interrupt. There's a thing that kind of annoys me about this sort of discussion. Look, I understand that we are all under pressure and we all need to get things done. But the assumption that if understanding and knowledge are required, then that's somehow bad, is not acceptable to me. I mean, some decisions have consequences. And if you're going to make these decisions, you're going to require some basic knowledge. So if the database that you're building has to have really great performance, or you have some SLA requirements for consistency and whatnot, then yes, you may need to get a DBA. Either that, or maybe you need to become a DBA yourself in order to come to the appropriate decisions. Now, it's really great if Mongo is able, in certain situations where it's acceptable to the customer to be more lenient or not so precise, to kind of walk this person through the process. But some complexity is essential complexity. That's what I'm trying to say.

JOE_KARLSSON: Yeah. These massive distributed problems are fairly new for us, or at least for a lot of us dealing with them. And I feel like a lot of the tools we have in the market today are still hard to learn. Kubernetes, to me, is a super difficult concept to grok, you know what I mean? Distributed problems are super hard. Setting up a single MongoDB database, awesome. I think it's super easy to learn. You're just throwing some data in, in the way we already think about it. But once you're talking about scaling massively, I haven't seen a database be able to massively scale without you having to master it.

CHARLES MAX_WOOD: Right. I guess, I guess that's what I'm driving at too though, is that, yeah, I mean, if I'm doing the massive scale, if I'm going to be throwing, you know, millions and millions of records into it, right. Then it's a different conversation. If I'm just setting this up for my, you know,

AJ_O’NEAL: reasonably small database.

JOE_KARLSSON: Whatever it is. Yeah. Big one. Yeah.

AJ_O’NEAL: Millions of records is a few thousand people on a chat app. Yes.

CHARLES MAX_WOOD: Fair enough. But all I'm saying is, if we're talking about hundreds of gigabytes of data that I have to munch through, it's a different story than if it's, hey, I just want to set up this small app that keeps track of a handful of things, right? And I've struggled with the schema just at those levels. So I guess what I'm driving at is, are there gradations to this? Where if I'm just building a smaller app, it's like, look, make sure you're doing this handful of things and then you're good. And then when you scale, you're set up well to be able to do this next handful of things. And then when you get really big, then you're, right?

AJ_O’NEAL: I'm going to argue that when you scale, you need to have a DBA. I think you need to not worry about scale until you scale. And then when you scale, you need to worry about scale because if you, if you're designing your app for scale, you're going to waste too much time. And if

DAN_SHAPPIR: You've just destroyed the dream of so many startups all over the world. I'm planning my startup to scale to 10 trillion users from day one.

AJ_O’NEAL: If only there were 20 trillion users. Joe, are you raising your hand?

JOE_KARLSSON: I am

AJ_O’NEAL: Like you had like one of those tingles and you were just raising your arm to get blood flow

JOE_KARLSSON: I know, I do that too. But I was also just trying to jump in. AJ, I think that's totally valid. Maybe we should focus on a beginner use case too for people, because most people don't care about scalability. Honestly, when I'm making a web application, all I want to do is put data in there and be able to get it back. I think that's what most people care about. You know what I mean? Like I don't want to

AJ_O’NEAL: and not to expose it to the public internet. That's a bonus.

JOE_KARLSSON: Yeah, that's a bonus.

AJ_O’NEAL: People don't care about that at first. But it usually comes to bite you once you get beyond a thousand users.

JOE_KARLSSON: So I totally agree. I think that's why Firebase is so successful too. You put data in, you get it out, right? And I think with Atlas we're trying to get there too. Also, can I make a wild speculation about the future of databases that's going to be really embarrassing in five years? I think a lot of what is manual today is going to be automatically handled for us in the future. Right now it's still complicated and we're not good at it, but I think machine learning models are going to get better at auto-scaling and auto-managing our databases. You know what I mean? I mean, that's the dream. I'm seeing it go that way, because no one should have to worry about this stuff. Just have a computer manage your data for you, make sure it stays fast, and bring it back up again if it fails.

AJ_O’NEAL: I agree. And it sounds like you're already doing this with Atlas based on what you just said, but Postgres calls their version of this the query planner. It basically looks at data coming in and out, recognizes patterns and then says, oh, you're doing this type of thing, I'm going to start treating the data this way.

JOE_KARLSSON: Yes.

AJ_O’NEAL: Based on how, based on how large a data set is, it will decide to rewrite your joins to fit what it knows about your data from being the database.

JOE_KARLSSON: That's it.

AJ_O’NEAL: Sounds like Atlas is doing the same thing.

JOE_KARLSSON: Yeah. That's what I mean. That's really what we want. No one takes joy in having to redesign the schema for a production database. That's a scary, dangerous thing to do. But if it can be automatically handled, that's-

AJ_O’NEAL: And it's a perfect use case for machine learning because it's all data that the system that manages the data knows both what the data is and how it's being used heuristically.

JOE_KARLSSON: Exactly. And I'm sure you've seen this too, but I find that the tools that succeed in tech are the ones that are easiest for developers to use. And I get it, right? Asking developers to understand massive distributed computing problems to make a database is a lot of work. No one wants to do that. I don't want to do that. You know,

AJ_O’NEAL: plenty of people want to do that. Joe can they do it? Well, anyway, you're talking about a, like a, the basic use case.

JOE_KARLSSON: Yes. Absolutely.

AJ_O’NEAL: Simple, the simple app.

JOE_KARLSSON: Yeah. Let's see here. Okay, let's get back to schema design. It's actually pretty easy. There are two things you can do with a MongoDB database, and everything we do is built on these. You can either embed data within a document or you can reference it. That's it. And it's a similar thing with a SQL database.

AJ_O’NEAL: You can either put it in the row or you can join it.

JOE_KARLSSON: Exactly, it's the same deal. And I think you have a little bit more flexibility in your structure because you're using a document, like an object, so you can use an array or whatever. There are lots of different things you can do with it. But the question you should always be asking yourself is, should I be embedding this data or do I need to reference it somewhere else? And again, that comes down to what your data looks like. Should we start with some simple examples and then get more complicated and do more complex schema examples? Is that cool?

AJ_O’NEAL: It's simple.

JOE_KARLSSON: All right, let's say, okay, one-to-one, right? We'll just go through these: one-to-one, one-to-few, and so on. We typically think about those with SQL too. A one-to-one example would be, I have a user document and my user has a name. So we just have a key-value pair that saves that data. Pretty simple, right? We're on the same page there. You're just using key-value pairs. You could do, I don't know, address or city or whatever, just a simple key-value pair. One-to-few: we have arrays or nested objects within the document. So let's say I have several addresses. Maybe I'm saving my work address, my home address, and my lake home or whatever. You know what I mean? You'd save that in an array. The key to what I'm thinking is that you want to embed the data within the document. My default is to embed, unless I have a compelling reason to reference it from that object instead.
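
As a rough illustration of one-to-one and one-to-few embedding, here is a user document written as a plain Python dictionary. The field names and values are assumptions chosen for the example.

```python
# One document holds everything that is usually read together.
user = {
    "_id": "user-1",
    "name": "Joe",  # one-to-one: a plain key-value pair
    "addresses": [  # one-to-few: a small, bounded embedded array
        {"label": "home", "city": "Minneapolis"},
        {"label": "work", "city": "Minneapolis"},
    ],
}


def cities_for(user_doc):
    """Everything needed arrives with the one document; no second lookup."""
    return [address["city"] for address in user_doc["addresses"]]
```

Reading the user brings the addresses along with it, which is the point of embedding by default.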

AJ_O’NEAL: From the perspective of, let's say I'm scaling my app and I've hired my Mongo DBA. I would imagine it's, it's probably simpler in Mongo than it is in a typical SQL database to create a new schema that will transition things from embedded to referenced.

JOE_KARLSSON: Absolutely.

AJ_O’NEAL: Assumption, correct?

JOE_KARLSSON: No, totally. Yeah. You could just do those joins and add that to the document. You just update all those. Yeah, absolutely. So for example, with the addresses, maybe I have an array of addresses. If I wanted to access that data every single time I was making a query to the user collection, I would probably embed it. But I might pull it out if I didn't need it every single time, or if I had another collection that was going to be using it more often. Then I would probably move it into a separate collection.

AJ_O’NEAL: So under the hood, is it JSON strings, which is document style, or is it optimized in such a way that I could pull particular columns out of a row, so that I could get both the advantage of embedding lots of things and also avoid the performance hit when I only need a few things out of the full embed?

JOE_KARLSSON: Yeah. Oh, absolutely. You can for sure do that. MongoDB documents are BSON, and MongoDB actually maintains that standard, so yes, you have different data types you'd be saving for that too. You can get kind of tricky with how you're saving the data. Let's see here. Let's do...
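
What AJ is asking for, reading only a few fields out of a large embedded document, is what MongoDB calls a projection. The helper below is a hypothetical in-memory stand-in for the idea, handling top-level fields only; it is not the driver's API.

```python
def project(doc, fields):
    """Return only the requested top-level fields.

    "_id" is kept as well, mirroring MongoDB's default of including it.
    """
    wanted = {"_id", *fields}
    return {key: value for key, value in doc.items() if key in wanted}


user = {
    "_id": "user-1",
    "name": "Joe",
    "addresses": [{"label": "home"}, {"label": "work"}],
}
summary = project(user, ["name"])  # the embedded addresses are left behind
```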

AJ_O’NEAL: And one other thing, you made a new vocabulary term that didn't exist before, or probably does exist in the Mongo community. But one to few.

JOE_KARLSSON: Yes.

AJ_O’NEAL: One to one, one to many, and one to few. So bounded sets.

JOE_KARLSSON: Yes. Yep, absolutely. Yeah. So one to few would be like... So with MongoDB, there's a 16 megabyte limit per document. And one to few, you have like addresses. A person probably doesn't have 16 megabytes of addresses. They're going to be added to the survey. So we're probably safe.

AJ_O’NEAL: And they're probably not shared by anyone else or when they do, it doesn't matter.

JOE_KARLSSON: Exactly.

AJ_O’NEAL: So referential being able to do analytics on a particular address is not useful. Yes. Meaningful.

JOE_KARLSSON: Yep. But for example, with a one-to-many, you have lots of things in there. So I use the example of, let's say you're Amazon and you have a product and you want to keep track of all of the sub-parts. Like I have an N64 screw and a foobar chunk. You know what I mean? You may have thousands of parts per thing. Maybe you want that for, like, warranties or assembly instructions or whatever. But in that case,

AJ_O’NEAL: people that have purchased a product or number of products that people purchase.

JOE_KARLSSON: Absolutely.

AJ_O’NEAL: Unbounded sets.

JOE_KARLSSON: Yes. Yeah. And that's where you start getting into dangerous territory. Plus, with this product thing, if I'm making an e-commerce store like Amazon, when someone hits my product page to view it, they probably don't need to see that gigantic list of parts every single time. So it's going to be wasted bandwidth sending that data back and forth. I need a reference to the parts for that product, but it's not important that I actually have all that information every single time that page gets loaded. So if I'm designing that schema, that's a flag to me that I'd probably want to reference that data. If I'm not using that data all the time, let's split it up and not access it every time. What a waste, you know?
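
A sketch of the one-to-many case under the same plain-dictionary assumption: the product keeps only an array of part ids, and the parts live in their own collection. All names and values here are hypothetical.

```python
# Separate "parts" collection, keyed by _id.
parts = {
    "p1": {"_id": "p1", "name": "screw"},
    "p2": {"_id": "p2", "name": "foobar chunk"},
}

# The product references its parts instead of embedding them.
product = {
    "_id": "prod-1",
    "name": "widget",
    "price": 9.99,
    "part_ids": ["p1", "p2"],
}


def product_page(prod):
    """The product page needs only the product document itself."""
    return {"name": prod["name"], "price": prod["price"]}


def parts_for(prod, parts_collection):
    """A second lookup, run only when the parts list is actually wanted."""
    return [parts_collection[part_id] for part_id in prod["part_ids"]]
```

The common read stays small, and the cost of fetching thousands of parts is paid only on the rare page that shows them.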

AJ_O’NEAL: So one-to-one: the object itself. One-to-few: bounded sets of related data about the object that probably don't need to be queried or analyzed in a meaningful way. One-to-many: unbounded sets of relationships between two objects.

JOE_KARLSSON: Yes, absolutely. And actually there's another one too: one-to-squillions, we call it. That's even bigger. So with one-to-many, like the parts, there could be thousands of parts. For one-to-squillions, I use the example of a server log. Say our servers are writing all of their logs to a MongoDB database. We have an unbounded, massive data set that each server could be sending to that database for that log. We call that one-to-squillions.

AJ_O’NEAL: So the use case for this would be things that I'm probably not going to search on frequently.

JOE_KARLSSON: Yeah, absolutely. And we want to be mindful of the size of it. If I was designing a log system like this, I would do something a little bit different.

AJ_O’NEAL: How do you spell squillions?

JOE_KARLSSON: S-Q-U-I-L-L-I-O-N-S. I think it's a made up word. I think I just made that up.

CHARLES MAX_WOOD: It has an S-K-Q and an L in it.

JOE_KARLSSON: Oh, it's true. Ooh, I like that. But if I was designing this log system, what I would probably do with my schema is have a single document that kept track of some metadata for my server. So this is like server, I don't know, S3. I don't know, it's Chewbacca 1. Say we're naming servers after Star Wars characters. But then, instead of keeping a reference to each log within that document as an array, because that's going to get huge, right? What I would do is, every single time I make a new log, create a new document and put the foreign key of the server metadata into that log document. So we don't have to worry about size at all. It's super easy to make queries on that. And there's no embedding. So if we're comparing that to the one-to-many, where we have all the product parts, there I'm still keeping an array of references to those product parts in the product document. This time I'm not keeping those references in the parent at all. My logs are responsible for keeping track of which server they're actually connected to. Does that make sense?
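
A minimal sketch of the one-to-squillions layout, again with plain Python structures standing in for collections. The host document never grows; each log entry is its own document and points back at its host. The `region` field and the function names are assumptions made for the example.

```python
from datetime import datetime, timezone

# Small, fixed-size metadata document per server.
hosts = {"chewbacca-1": {"_id": "chewbacca-1", "region": "us-east"}}

# Stand-in for a separate logs collection: one document per log line.
logs = []


def write_log(host_id, message):
    """Each log document carries the foreign key of its host."""
    logs.append(
        {
            "host_id": host_id,
            "message": message,
            "time": datetime.now(timezone.utc),
        }
    )


def logs_for(host_id):
    """Query the child collection by the parent's id."""
    return [entry for entry in logs if entry["host_id"] == host_id]
```

No array on the host has to track the logs, so the 16 megabyte document limit never comes into play however many entries are written.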

AJ_O’NEAL: That does make sense.

JOE_KARLSSON: Okay. Yeah. Cause it's, I mean, right. Like even a reference can start to get huge. That I'm not like, I

AJ_O’NEAL: actually, we should ask someone else if that makes sense. It makes sense to me. I might, I might be the wrong person to be answering that question.

JOE_KARLSSON: So I said there are no rules, and I see the irony, but I do have some general rules to follow. If you have an unbounded array, that's a cue that you want to reference and maybe split that data up. So let's review the rules again. First, we talked about one-to-one, just embedding that in there. And my first rule is prefer embedding. We talked about one-to-few, where you have the list of addresses. We just want to keep that inside of there. Then we talked about one-to-many, with the product and all of the product parts. Embedded, that's a nested array of all those product parts, and we probably don't need all that data all the time. So we can reference the actual data about those parts via separate documents we're linking to. My next rule is to avoid joins, because they're expensive and slow, unless you have a compelling reason to actually do them, right? And we talked about splitting the product parts out. And then we talked about one-to-squillions, where it's even bigger. We have a totally unbounded set, like server logs, that's going to keep growing and get crazy. Right. And there we're doing a reverse reference back to the host. So what are we doing?

AJ_O’NEAL: One question. I asked this earlier, but I wasn't very precise and I wanted to get a little more clarity. If I was going to adjust relationships in a SQL database, with very few exceptions, the easiest way to do that is going to be for me to use a programming language like Node or Go and iterate over all of the affected rows and manually make the copies or the data exchange. I might do it in a transaction, so I might bulk them, but I'm going to write some sort of code that is fetching and making writes, because there's nothing that I know of where you can say, here's the schema, here's a migration for the schema, and have the database be able to interpret that migration.

JOE_KARLSSON: Yes.

AJ_O’NEAL: So with Mongo, do I have to follow the same approach, where I have to handle the migration as code? Or does Mongo have a way where I can publish, say, this is version one of the schema, here's version 1.1, and have annotations in that so that Mongo would know how to update the schema without me having to go and update everything?

JOE_KARLSSON: We currently don't have a migration tool, but you've touched on a pattern, actually. You can version your documents, and I would typically just do that as a key-value pair, like v1.1 or whatever, to keep track of where things are or what needs to be updated. But you still have to write the code.

AJ_O’NEAL: So for performance, what I could do in that case, if I didn't care about everything getting done at the same time and didn't have, well, I guess even if I had analytics that relied on it, they could still follow this pattern. In front of my read method, or rather behind my read method, I could add something that checks the document. And if it's an old document, rewrites it on the fly as it's being pulled out.
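
The "version key plus rewrite on read" idea from this exchange might look roughly like the sketch below. The `schema_version` key, the split of `name` into two fields, and the function names are all hypothetical, and a dictionary stands in for the collection.

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Hypothetical v1 -> v2 change: split "name" into first and last name."""
    if doc.get("schema_version", 1) < 2:
        first, _, last = doc.pop("name").partition(" ")
        doc.update(first_name=first, last_name=last, schema_version=2)
    return doc


def read_user(collection, user_id):
    """Sits behind the read path and migrates old documents as they are read."""
    doc = collection[user_id]
    if doc.get("schema_version", 1) < CURRENT_VERSION:
        collection[user_id] = upgrade(doc)  # rewrite the stored document
    return collection[user_id]
```

Documents that are never read stay on the old version, so an occasional batch pass is still needed if analytics depend on every document having the new shape.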

JOE_KARLSSON: You totally could do that. Absolutely. And it depends on what you want to do with it too. If you just want to run a migration overnight that updates your schema or whatever, that's totally fine too. You have the control to do that, if that's something that makes sense for your application. I do want to go over one other example, which I think is good to go over with MongoDB schema. I feel like if I talk about schema and get too far into the weeds, people's eyes start to glaze over, but I do think it's helpful to touch on something more concrete to help understand it.

AJ_O’NEAL: Well, with what Dan said, I think that we kind of need to be a little less resistant to that type of mentality. Like if you're being paid to be an engineer, you're being paid to use your brain.

JOE_KARLSSON: Yeah, no, totally.

AJ_O’NEAL: And using your brain sometimes requires understanding things that were not intuitive to you before.

JOE_KARLSSON: Yeah, that's true.

CHARLES MAX_WOOD: I agree that the only point that I was trying to make at that time was that there are some things that I need to worry about right now and some things that I don't. Yeah. And so what I wanted to know was what things do I need to worry about right now? And what things can I put off until I realized that I'm heading into a place where I need to care about the other things.

JOE_KARLSSON: Yeah. I think that's our job, right? Like there's a lot of things we can do. There's always phase two.

CHARLES MAX_WOOD: I have a giant problem to solve.

JOE_KARLSSON: Exactly. Yeah.

AJ_O’NEAL: And if you all could use the word thing a little bit more often in that conversation, I think it's even less understandable.

JOE_KARLSSON: Yeah. Is this too abstract? I don't think so.

AJ_O’NEAL: So there's a thing and when I need to know the things, then if the other things matter, that time the thing I need the thing will thing it.

JOE_KARLSSON: You guys do a great job with this, actually. I think it's so hard to describe technical concepts in a podcast. I can't point to a board, right? It's hard because everyone has to visualize this in their heads. But I think you guys do a great job.

AJ_O’NEAL: I'm just I'm just teasing you just so you know.

CHARLES MAX_WOOD: And really what I'm looking at is yeah, I mean, I have I have a big pile of stuff that my boss. So what's the minimum viable amount of knowledge that I have to have in order to do my job? And then is there like a framework or some system that I can use so that okay now you're starting to see this, go solve and learn this other thing before it becomes a big problem for you.

JOE_KARLSSON: Can I give my personal opinion on this that is not substantiated by any facts?

AJ_O’NEAL: You did not raise your hand long enough. Please raise your hand for a district.

JOE_KARLSSON: If John's at home, I am raising my hand every time. I learned a lot in school, but my opinion on that is like, you wanna get it working, and if it starts breaking, then figure it out. And even with databases, like, you know what? Like-

CHARLES MAX_WOOD: That's programming 101.

JOE_KARLSSON: Exactly!

CHARLES MAX_WOOD: It is.

JOE_KARLSSON: Just throw it in a database, honestly. This is my personal opinion, but just throw it in a database, and if you start seeing performance hits on it, it's time to come back and rethink your schema, rethink your scalability. If it ain't broke, don't fix it, you know what I mean? Also, right now with databases, if it starts slowing down, there are easy quick fixes. You can throw an index on it, you can shard it, you can spin up more servers. You can pay money to kind of delay that. But, you know, just get it working. Even for people who are here today, if you don't grok everything I'm talking about, don't worry about it. I still recommend: just make a crummy database, make some mistakes, iterate and grow. That's okay, that's totally okay.

AJ_O’NEAL: Amen.

JOE_KARLSSON: Yeah. I just want people to make more stupid stuff that sucks. Because that's the way that we learn.

AJ_O’NEAL: I feel so mixed about that.

JOE_KARLSSON: Let's talk about it.

AJ_O’NEAL: It's so true. No, no, it's just, it's so true, but it's frustrating when you walk into that. Like, when you're the DBA that gets hired on for the app that is now approaching 10,000, 100,000 users, it hurts. I mean, I haven't been in that situation personally as a DBA, but I'm just saying, it's a double-edged sword. If you don't do it the stupid way, you'll never learn. And if you do do it the stupid way, at some point you've got to go back and, well, if you're lucky, you have to go back and fix it.

JOE_KARLSSON: And that's been a criticism of MongoDB too. It's like, it's so easy to learn. Beginners can pick it up and mess it up. Totally reasonable too. But yeah, it's also easy to fix too. And I don't know, I feel like it's really gatekeepy to have a, to insist you have a database that's really hard to learn. Because newbies won't be able to touch it. I think that sucks.

AJ_O’NEAL: I think one of the most difficult things about databases is the differences that don't matter. You know, the things that are frustrating are not SQL itself. Being able to do a couple of joins, that's not frustrating. People understand JSON. That's not frustrating. What's frustrating is that in Postgres, if you want to describe a table to figure out what the heck you did, it's backslash d, but if you want to do it in MySQL, it's DESCRIBE, and if you want to do it in MS SQL, it's show table description. I don't know if that's actually it, I'm getting some of these wrong. And if you want to do it in Mongo, there's some way to do it in Mongo, and if you want to do it in Couch, there's another. I think the frustrating part is all the thousand tiny paper cuts. They're actually not important to the data. Actually structuring the data, I don't find that to be the hard part. I mean, it's where you rack your brain, because you're like, let me think about this and make a good solution. But it's not where you're stuck in pages and pages of documentation, trying to figure out an example that actually does the thing that should be obvious that you want to do.

JOE_KARLSSON: Yep. Can I talk about another concrete use case? So the first time I was on this podcast, I was talking about my IoT kitty litter box, and I used a MongoDB database for that. I like this example because it's kind of goofy, but I also think it's easy to understand. When I designed the database schema for my IoT kitty litter box, the main thing I was thinking about was how I was using that data. And what I was using it for was a dashboard that displayed the number of times my cat went to the bathroom per day. So I started thinking about my schema in terms of, how do I design the schema so that data is ready to go? What it uses is a pattern we invented called the bucket pattern. Every single day it creates a brand new document in the database. And it's a one-to-few or one-to-many. My cat's probably not going to go to the bathroom the equivalent of 16 megabytes' worth of times. It's probably, you know, three to five, maybe. So I just have an array, and I'm adding a new entry with a timestamp every single time the cat uses the box. Then I just make a query and it's super easy to throw on a graph, because it's in the structure I already need. Does that make sense? Every example is different, and how you're using the data and what kind of performance you need affects how you access and design the schema for that application.
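
The bucket pattern Joe describes, one document per day with an array of timestamped visits, can be sketched as follows. A dictionary stands in for the collection, and the names are assumptions for illustration rather than the actual litter box code.

```python
from datetime import datetime

# One bucket document per day, keyed by ISO date.
buckets = {}


def record_visit(when):
    """Append a timestamp to that day's bucket, creating the bucket if needed."""
    day = when.date().isoformat()
    bucket = buckets.setdefault(day, {"_id": day, "visits": []})
    bucket["visits"].append(when)


def visits_per_day():
    """Already in the shape the dashboard graph wants: day -> count."""
    return {day: len(bucket["visits"]) for day, bucket in sorted(buckets.items())}
```

Because the data is grouped by day at write time, the dashboard query is a straight read with no grouping step.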

AJ_O’NEAL: So Aimee, Aimee had a question on analytics. This is totally, like, throwing this thing off the rails, but I see you had a good point. I was throwing rails. And time is running. So I want to make sure Aimee hasn't been raising her hand.

AIMEE_KNIGHT: Cause I have my camera off. Are you sure now's a good time though?

JOE_KARLSSON: Let's do it.

CHARLES MAX_WOOD: I was going to say her photograph has not moved this entire time.

AIMEE_KNIGHT: I always turn it off because, I don't know, I feel funny with people watching me. Anyways, yeah. So my question was, I feel like more and more companies are doing different stuff with just, you know, the amount of data they have and stuff like that. And one of the popular things I've seen with MongoDB is people will take the data in MongoDB, and if they want to put it somewhere for analytics rather than using MongoDB or something like that, they'll do an ETL, extract, transform, load, into BigQuery or something like that. So I guess my question is, in your opinion, is that still the best bet, or is that something that MongoDB is looking to keep, you know, in-house so that you can use a MongoDB solution? Or is the recommendation still to just move it into a data warehousing tool?

JOE_KARLSSON: It totally depends. Like classic, classic answer. It depends. I think, I mean, we obviously support the tools to do data analytics. So if you want to like once an hour or whatever, you have a separate database that just goes and runs some aggregate queries and kind of repopulates your database or whatever, that's totally possible and totally fine. But yeah, if you have like another tool that you want to throw that into, cool. You can totally export that out or some people would use MongoDB as their data analytics, so they'll save their data on an SQL database and then just have that run analytics queries in the background. Yeah. I don't know. It depends.

AJ_O’NEAL: What type of analytics, Aimee? What type of analytics?

AIMEE_KNIGHT: Yeah. I mean, I guess to me, I guess from like the stuff that I'm thinking of, it's a lot of data, so I feel like it would probably be better in something like BigQuery, which is Google's.

AJ_O’NEAL: What's a lot?

AIMEE_KNIGHT: In this case, I'm talking...

JOE_KARLSSON: Petabytes, terabytes?

AIMEE_KNIGHT: Oh yeah, yeah, yeah, yeah. Petabytes. Like, you know, so... And it might be... I might be.

AJ_O’NEAL: I can't imagine Mongo being a good tool for petabytes of data.

AIMEE_KNIGHT: Yeah, so that's why I'm saying...

AJ_O’NEAL: That's not to knock on Mongo, but that's a very different problem.

AIMEE_KNIGHT: Yes, well that's why I'm wondering, is that something MongoDB is looking to solve, or are they just not interested in solving that kind of problem?

JOE_KARLSSON: Absolutely. I mean, I'm sure anywhere there's a chance for us to get more people saving data on our database would be a win for us too.

AJ_O’NEAL: But petabytes, man.

JOE_KARLSSON: That's a lot. Yeah. It's, it's hard to tell. So I can,

AJ_O’NEAL: Do you have petabyte customers right now? Can you, like...

JOE_KARLSSON: We do. I can't, I can't say who they are.

AJ_O’NEAL: Well, that's fine. But you do have petabyte customers.

JOE_KARLSSON: Yes, we do.

AJ_O’NEAL: Okay. Yeah.

JOE_KARLSSON: It depends. It's hard. That sort of thing comes down to performance, and I feel like performance is a tricky thing. The only way to tell what's going to be more performant is to test it. But yeah, maybe you're indexing on the right things, or you're aggregating at the right... I don't know. It depends. Right now, what we're doing is trying to move the MongoDB query language into more areas. So we just unveiled this thing: you can now query Amazon S3 data. You can use S3 buckets and we can query those with MongoDB. So you can archive mass amounts of data, and you use what we call MQL, the MongoDB query language, to actually query it. We're seeing that the industry now is starting to adopt the query language, because it's so intuitive for devs, right? Using dot notation to kind of build queries.
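[Editor's sketch: to show what "dot notation" means here, the following is a tiny in-memory matcher in Python, not MongoDB's implementation. It only supports equality on dotted paths through nested documents. Real MQL also handles arrays, operators such as `$gt`, and much more. The helper names `get_path` and `find` and the sample documents are hypothetical.]

```python
def get_path(doc, path):
    """Follow a dotted path such as 'owner.city' through nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path does not exist in this document
        current = current[key]
    return current

def find(docs, query):
    """Return the docs whose value at every dotted path equals the query value."""
    return [d for d in docs
            if all(get_path(d, path) == value for path, value in query.items())]

cats = [
    {"name": "Mochi", "owner": {"name": "Joe", "city": "Minneapolis"}},
    {"name": "Biscuit", "owner": {"name": "Sam", "city": "Nashville"}},
]

# The query is itself a document, keyed by the dotted path into nested fields.
matches = find(cats, {"owner.city": "Minneapolis"})
print([c["name"] for c in matches])
```

[The point of the sketch is the shape of the query: it looks like the data it is matching, which is much of what makes it feel familiar to JavaScript developers.]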

AIMEE_KNIGHT: It really is. Yeah, that's one of the things I like about it. It's so easy to learn.

JOE_KARLSSON: Because it's like you're writing it and saving it the way we think. Cool.

AIMEE_KNIGHT: As a JavaScript developer, it feels very JavaScripty.

JOE_KARLSSON: Exactly. Yeah, so we're trying to move that into more areas, which I think is kind of cool. We talk about this too: learning new stuff sucks. I have a limited amount of time. I can't learn that much stuff. I already know MongoDB. I want to use that to make as many queries as possible in as many different areas as I can.

CHARLES MAX_WOOD: This is an area of interest for me too because I've been wanting to build my own podcast statistics analytics system, right? Because the ones that are out there all suck. I'm sorry. I have friends that work at those companies. They suck.

JOE_KARLSSON: Totally. Yeah. Or just a custom solution. That makes total sense. Yeah, absolutely. No, I totally agree. Especially if you're building something, I think it's, I mean, MongoDB, obviously, it's an easy way to get it set up. You can start solidifying that schema, growing it, scaling it, whatever you need to do. But I'm obviously biased, but I think it could totally work, but it depends, it depends. It might not always be the best solution. Well, thank you. Yeah, I guess any other questions about schema design? Do we get too deep? Do we need, should we do a quick recap here? Do we wanna?

CHARLES MAX_WOOD: Yeah, we gotta do it fast. I have a hard stop in 15 minutes, but yeah.

JOE_KARLSSON: Yeah, okay, cool. Let's do a TL;DR. I'll probably forget something here, but for MongoDB schema design, there are no rules, and you basically have two ways to handle it: either embedding the data within the document or referencing it. And some considerations you want to make when you're designing your schema are how and where you want to save your data. You want to provide good query performance, and you probably don't want to use an unreasonable amount of hardware to build your application out. We want to try to keep our data costs as low as possible, right? But yeah, we talked about one-to-one. So that's just like one user, or like an employee who works in one department. That'd just be an embedded document. We talked about one-to-few, where there'd only be a handful of things on there. We talked about embedding that in the document as an array. Then we talked about one-to-many, and that's where we started talking about actually splitting that data up into separate documents or separate collections, and joining those with a reference. We used the example of a product and keeping track of all its parts as references, since you probably don't need that data every single time. And then lastly, we talked about one-to-squillions, where we're talking about even bigger, more massive data that you're saving, right? Like a log system for a server.
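[Editor's sketch: the four relationship shapes in Joe's recap might look roughly like this as plain Python dictionaries standing in for documents. All collection names, field names, and sample values are hypothetical, and the two lookup helpers only imitate in memory what a second query or a join would do in a real database.]

```python
# One-to-one: embed the related data directly in the document.
employee = {"_id": 1, "name": "Ada", "department": {"name": "Engineering"}}

# One-to-few: embed a small, bounded array in the document.
user = {"_id": 1, "name": "Ada",
        "addresses": [{"city": "Minneapolis"}, {"city": "Nashville"}]}

# One-to-many: the parent keeps references (ids) to documents that live
# in a separate collection, loaded only when they are needed.
parts = {
    101: {"_id": 101, "name": "frame"},
    102: {"_id": 102, "name": "wheel"},
}
product = {"_id": 1, "name": "bike", "part_ids": [101, 102]}

def load_parts(product_doc, parts_collection):
    """Resolve the parent's references into the referenced part documents."""
    return [parts_collection[part_id] for part_id in product_doc["part_ids"]]

# One-to-squillions: the array of ids would grow without bound, so flip the
# reference around and let each child point at its parent instead.
host = {"_id": "web-1", "name": "web server"}
logs = [
    {"host_id": "web-1", "msg": "started"},
    {"host_id": "web-1", "msg": "request served"},
    {"host_id": "db-1", "msg": "backup finished"},
]

def logs_for(host_id, log_collection):
    """Find every log entry that references the given host."""
    return [entry for entry in log_collection if entry["host_id"] == host_id]

print([p["name"] for p in load_parts(product, parts)])
print(len(logs_for(host["_id"], logs)))
```

[The design choice in each case follows how the data is read: embedded data comes back with the parent in one read, while referenced data costs an extra lookup but keeps the parent document small.]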

AJ_O’NEAL: Just to be clear, did you make up that term, or is that a term that MongoDB uses?

JOE_KARLSSON: Yeah. MongoDB uses it. Yeah, I stole it. I stole it. Thank you. Good credit. Good credit to my corporate overlords. But we use embedding and references.

AJ_O’NEAL: Did you say overlords? You mean protectors.

JOE_KARLSSON: Protectors. That's right. Yeah, my dearest protectors. There's a bunch of other ways to do it too, which we didn't talk about today, like tree structures or graph structures, or many-to-many relationships, like a to-do list with lots of users that have lots of different shared to-do lists. We didn't even discuss those today, but those are the basics of MongoDB schema design. Any questions? Anything I left out? We talked about how to start easy and build it till it breaks. Distributed systems are really hard. Don't worry about it too much until you need to worry about it, or hire someone that needs to worry about it. Again, personal opinion, but just build it, and then fix it once it breaks.

Are you freelancing or moonlighting? Or maybe you've thought about going out on your own. Every week, we have a group of developers at various stages of a freelancing journey on The Freelancers' Show to talk about becoming better at freelancing. We also bring in experts to talk about marketing, SEO, and delivering high quality to clients. So if you're interested in going freelance or you are freelance, check it out at freelancershow.com.

CHARLES MAX_WOOD: Well, I'm going to push us to picks because like I said, I've got to jump off in about 12 minutes.

JOE_KARLSSON: Love it.

CHARLES MAX_WOOD: Steve, do you want to start us to picks?

STEVE_EDWARDS: Yeah, I've been talking too much, so I haven't had a chance to think about anything. Welcome back, Steve. I've just been overwhelmed with the depth of knowledge of AJ and Joe and everybody, you know. But seriously.

JOE_KARLSSON: You flatter me.

STEVE_EDWARDS: No, today, so I've been trying to find a pick here, and I'm going to go with the good joke that I found today, because I never tell jokes anywhere else on the podcast. So one of the favorite tweets that I've seen referenced over the past year or so is the Flat Earth Society talking about how they had members all around the globe. And the new one I saw today was that COVID-19 has really been stressing out the Flat Earth Society, because they're worried that all the social distancing is going to push their people over the edge.

JOE_KARLSSON: So you got one left.

STEVE_EDWARDS: That's great. I will not claim that I wrote that myself. I just, I, uh, I borrowed it. That's just how I borrowed it.

AJ_O’NEAL: I have to say I did chuckle when I saw it on Twitter, even though I had, you didn't get the full effect here. Cause I'd already. You'd already shared that.

STEVE_EDWARDS: Well, I got a laugh from Joe. So I got something.

JOE_KARLSSON: It counts. I didn't even get a single sharding joke in today. I'm disappointed in myself. At least you're bringing the jokes. Yeah. Dang it.

AJ_O’NEAL: I feel like sharding jokes are inevitably going to be potty jokes.

JOE_KARLSSON: It's the only one.

STEVE_EDWARDS: Yeah, that was my thought too.

AJ_O’NEAL: That's where my brain went.

JOE_KARLSSON: Yeah. That's the only, that's the only place. Steve, thanks for bringing the jokes. Bring the humor to the podcast.

STEVE_EDWARDS: Anytime. Absolutely.

CHARLES MAX_WOOD: Aimee, what are your picks?

AIMEE_KNIGHT: I'm going to go with, let's see. It's been a while since I did a non-technical pick, so I'm going to go with that. I bought something on Amazon which I've been wanting to buy for a long time. I don't know if it's just quarantine and sitting too much and not working out like I used to, but my back has just been so cramped up, to the point where I feel like I can't even focus most days. And I'm just, like, every hour on my foam roller trying to roll it out so I can keep focusing. But I bought an acupuncture mat, and it is very painful to lay on at first, but eventually you get used to it. It's only 20 bucks. And I feel like it's helping. You can literally see these punctures in my back when you get up off of it, but it's helping. And so that's going to be my pick. I see Chuck Googling.

CHARLES MAX_WOOD: I know. I'm going to have to check it out for my wife. We've been trying to find ways to reduce her pain.

AIMEE_KNIGHT: Yeah. I mean, it's like in like my, like my traps and there's like a little pillow. So I've been like leaning on the pillow thing and it's not real. They call it an acupuncture mat. It's not really acupuncture, but it's supposed to like increase blood flow just because it's kind of like has these little baby spikes that go into your back. But I do feel like it's helping a little bit.

CHARLES MAX_WOOD: Why, they've got all kinds of stuff. Put in a link to the one you bought.

AIMEE_KNIGHT: Will do.

CHARLES MAX_WOOD: All right, AJ, what are your picks?

AJ_O’NEAL: Let me unmute here. Okay. So I found an article on SQL versus NoSQL when I was searching during the podcast here. I was looking for something about big data, and then this popped up, and it seemed to have a pretty good breakdown of different use cases and whatnot. So I'm putting the link in there to that, because I think that it is actually a pretty decent article. I'm also going to pick Jonathan Coulton, of so much fame. Oh my goodness. He's such a great musician, bringing you just all sorts of nerdy nonsense, and particularly Chiron Beta Prime, which is what I was making reference to earlier. Did I say overlords? I meant protectors. So if you're stuck on Chiron Beta Prime and you need to send your relatives a Christmas card, just send them that link instead. I am also going to shout out a tool that I created called ssh-pubkey. Very, very simple thing. I've got the cheat sheet up on webinstall.dev. And this works on Windows too, so that's important. You have those times when you just want to grab your public key pretty quick, and you might be on a new machine, and, you know, it's not that it's hard, it's just that there's like three or four steps. I just wanted a single command that creates the key pair if it doesn't exist, prints the public key to the screen, and puts the public key in the downloads folder. Because, for example, if you're using Amazon, you can't copy and paste it. You have to upload it, and then you need to have a name that makes sense. So it copies it into the downloads folder as the username.ssh.pub or whatever, which is important, because otherwise on Amazon you end up with like six different keys that are called id_rsa and then you don't know which one it is. And it's also hard to navigate to the hidden .ssh folder from Finder and similar. So anyway, just a little utility.
For those of you that understand the problem, you understand the problem. For those of you that don't, you don't, no biggie. There's some other stuff on there you might like too, if you look at webinstall.dev. If you've got any suggestions, let me know, speaking both specifically to Joe as well as anybody listening in the audience. And then I am going to pick the same thing that I picked last time that Joe was on here, which is the Ars Technica War Stories. I have to go find the link again, but I'll post it again. It's the one with the guy who coined the term sharding, which had to do with magical shards, I think is what it was. Like, the kingdom was governed by shards. Some game, it was the first massively multiplayer online game that required databases. Well, maybe not the first that required them to be sharded, but the first in which the term became popularized. And the Ars Technica War Stories are just amazing. I highly recommend them. If you listen to this podcast, you would absolutely love watching the War Stories video interviews.

JOE_KARLSSON: Can confirm. They're great. They're really fun to watch and very accessible.

AJ_O’NEAL: Yeah. They're, they're high production value. So they, you know, tease out the questions and the answers probably a little better than we do. The 10-minute versions are super digestible, and the full, like, hour versions are so amazing.

CHARLES MAX_WOOD: Cool. I'm going to jump in here with a few picks. One pick that I have, and this is a book I'm reading right now, is called Leadership: In Turbulent Times. I can't remember the author's name, but I'll put a link to it in the show notes. It's been an interesting read. She goes through and talks about four different U.S. presidents and some of the turbulent stuff that they went through. One of them is Abraham Lincoln. I'm trying to remember the other ones.

JOE_KARLSSON: And Doris Goodwin is the author.

CHARLES MAX_WOOD: Yeah. Yep. And anyway, so it's interesting. I don't necessarily come down politically, you know, agreeing with some of the presidents that she talks about, but they all did go through stuff and exhibit some leadership qualities that are worth emulating. So, you know, sometimes you have to overlook the parts that you don't agree with to get the gold. And so I'm going to pick that book. And then I kind of want to pick some other stuff, but I'm having trouble remembering what it was I was going to pick.

JOE_KARLSSON: Too many good stuff.

CHARLES MAX_WOOD: Yeah. So I'll hold on to it. I'll pick it next time. But yeah, Joe, do you have some picks for us?

JOE_KARLSSON: I do. I'll go fast here. We're doing a book club at work, and we're reading Stamped from the Beginning by Ibram X. Kendi. It's been great. It's been an awesome discussion. It's been a great way to talk about race at work, which has been kind of uncomfortable, but it's been helpful for all of us.

AIMEE_KNIGHT: We've been doing that here too. It's kind of cool. They gave us recommendations.

JOE_KARLSSON: Right, and I feel like race is something that we've been told not to bring to work, but obviously it's a big deal right now. So I think it's been great. I don't know. And the other thing I want to talk about too is, if you're inspired or interested in learning more about data modeling, we actually have a free course on university.mongodb.com. It's M320: Data Modeling. I'll have a link to that in there as well. That's it for me.

CHARLES MAX_WOOD: All right. Well, thanks for coming, Joe. If people want to follow you online, where do they find you?

JOE_KARLSSON: Oh, I'm Joe Karlsson. That's J-O-E-K-A-R-L-double-S-O-N, the number one, on Twitter and also on TikTok. So I'm on there too, but that one's just my first and last name. Hit me up on Twitter.

AIMEE_KNIGHT: And Instagram for good stuff.

JOE_KARLSSON: And Instagram, yeah, same. It's just my first and last name. Aimee and I are Instagram friends.

AIMEE_KNIGHT: Good videos.

JOE_KARLSSON: Like. Yeah.

CHARLES MAX_WOOD: All right, well, we'll wrap this one up. Until next time, folks, Max out.

AIMEE_KNIGHT: Bye.

Bandwidth for this segment is provided by CacheFly, the world's fastest CDN. Deliver your content fast with CacheFly. Visit c-a-c-h-e-f-l-y.com to learn more.
