Full-Text Search And Name Search With Postgres - EMx 199

Peter Ullrich is Senior Elixir Engineer at Remote. He is also an experienced Elixir Developer, certified Blockchain Engineer, and Entrepreneur. He joins Allen and Adi to talk about his blog articles, THE COMPLETE GUIDE TO FULL-TEXT SEARCH WITH POSTGRES AND ECTO, and EFFICIENT NAME SEARCH WITH POSTGRES AND ECTO. He starts off the show by explaining what inspired him to write these articles.

Hosted by:

Allen Wyma •

Sascha Wolf

Special Guests:

Peter Ullrich

RSS Spotify Apple Podcasts YouTube Amazon Music

Show Notes

About this Episode

Replacing "ElasticSearch" with "Full-Text search"
Benefits of using Full-Text Search in Postgres
Understanding Indexes in Postgres

Picks

Allen - Actual E2E Testing with Cypress, Vue and Elixir/Phoenix, using Ecto sandbox
Peter - Watch Inside Men | Netflix
Peter - Smashing Security
Peter - Building Table Views with Phoenix LiveView
Sascha - Inscryption on Steam

Transcript

Sascha_Wolf:

Hey everybody and welcome to another episode of Elixir Mix. This week on the panel we have Alan Weimar.

Allen_Wyma:

Hello?

Sascha_Wolf:

and me, Sasha Wolff, and we have a special guest this week. That is Peter Ulrich. So Peter, why don't you tell everybody who you are, why you're here, and what we are going to talk about today.

Peter_Ullrich:

All right, thanks, folks, for having me. Well, I'm Peter Ulrich. You might have seen me on Twitter and Alixirfirm and stuff. I'm a senior software engineer at Remote. I live in the Netherlands, and I have a blog on peterulrich.com that I like to fill every week or every two weeks with a new blog post. Recently, a lot about Postgres, and that will also be one of the topics today. Well, I guess in particular, it will be full-text search and index, sorry, name search. with Postgres and why you don't need Elasticsearch maybe. All right. Thank you very much for having me today.

Sascha_Wolf:

We're happy to have you and I just couldn't fail to notice that I pronounced your last name German and you pronounced your own last name English.

Peter_Ullrich:

That's true.

Sascha_Wolf:

That was weird.

Peter_Ullrich:

Now, I like to make it easy for English speaking hosts. So I say Ulrich, but actually not German. It's Ulrich. That's true. Yeah. But nobody can pronounce that. So I don't even try anymore.

Sascha_Wolf:

Ellen, do you want to try?

Allen_Wyma:

I think I went to school with somebody and we called him Ulrich. I forgot.

Sascha_Wolf:

Okay,

Peter_Ullrich:

Exactly.

Sascha_Wolf:

sorry for derailing.

Peter_Ullrich:

Moving on.

Allen_Wyma:

Yeah.

Sascha_Wolf:

But yeah, moving on. It's

Allen_Wyma:

Or Euric Euric. Warwick

Sascha_Wolf:

just gonna be the rest of the episode. Okay, so yeah, I saw your blog posts and I feel they're out there in the dev community, at least from my experience, people reach for Elasticsearch very quickly. It's like always this, hey, you want to do full text search, why not do Elasticsearch? It's already added to the stack. It's already a Docker Compose file. How did that happen? All right. So how, maybe give us a bit of a story. Like how did you end up trying to do this in Postgres? Was it more like a personal interest thing? Was it a work thing? What's the story behind this?

Peter_Ullrich:

Well, it started with the last blog post that I wrote before this, which was about name search, and especially the I like and the similarity operators on Postgres. And I tend to use my blog post to also learn something new. And I just want to dive into how to do text search in Postgres, basically. And when I wrote the first blog post about name search, I realized that. the methods that I'm describing there, the I like and the similarity operators, they don't quite work if you have very long texts. And I knew about full text search with Postgres, but I'm actually I used it once in production. We actually replaced Elasticsearch with it because it was too expensive. And we thought, you know, we already have a Postgres database. Why should we use an external service for a very small, short abstract

Sascha_Wolf:

Makes

Peter_Ullrich:

text

Sascha_Wolf:

sense.

Peter_Ullrich:

search, basically? Yeah. So I took the second which is about full-text search. And I just started to dive into it. And pretty quickly, I realized that there isn't really one resource, like one blog post or book or something, that explains all the facets, all the use cases of full-text search with Postgres. And there were lots of different articles which always highlighted single things, but not a single complete guide, actually, to Postgres full-text search. So I used that. challenge, and it took me three weeks, to be honest, to write that 4,500-word long article in which I dive into many different use cases. So if you only have a single language, for example, there's a very easy approach to that. If you have a fixed set of languages, it's also easy. But then what happens if you want to add languages dynamically? So you have a forum, and then people decide, I want to post in Spanish now or in Italian or

Sascha_Wolf:

Mm-hmm.

Peter_Ullrich:

something. Then you need to. to be dynamic in that way. So that took me a long time to figure out. And in the blog post, I explained how, and maybe I can explain it to you later too.

Sascha_Wolf:

Yeah, I briefly dabbled with something like that. And at that point, I already realized there's like different algorithms for tokenization you can use, which depending on the language do different things. So it's actually quite a rabbit hole to go into. You might think, hey, full text search, oh, how can it be? Well,

Peter_Ullrich:

Exactly.

Sascha_Wolf:

as usual, there's complexity under the hood, which is not apparent from the first glimpse at it. I'm actually interested to hear a bit more about what you just said. You said at this one time at work, you actually replaced like an Elasticsearch, Riverfall, etc. Maybe you could tell us a little bit about that and also like what you learned from that and how that then contributed like maybe to in details to your blog post.

Peter_Ullrich:

Of course, yes. So my employer before Remote was a company in Germany in Cologne. And we had jobs for students, basically. So we had job descriptions. And then students could apply for them and work in these jobs for a couple of hours, a week or so. And so we had these descriptions of the jobs. And there were always like between 100 and 300 words, approximately. and we wanted to make that searchable. And I joined a team that used a very old Ruby monolith, and they had problems updating the Ruby monolith, and they wanted

Sascha_Wolf:

Classic.

Peter_Ullrich:

to, exactly, they were still on like Ruby 2.2 or 2.3, 2.3, I think, yeah, when already the latest was 2.6, but I'm not a Ruby developer, I'm sorry if I get these versions wrong, but it was old and we couldn't update it, and everything was breaking and nobody wanted to touch it. So what we decided was to replace that old monolith using the strangler pattern, where you extract smaller parts of the giant monolith. You identify like small components, functionality components, like for example, this tech search, and you try to extract it into a new system. It could be a new monolith or it could be a microservice. In our case, it was a microservice. So we identified this particular tech search, this job search, basically. And we saw that we used Elasticsearch for the tech search. And also that Elasticsearch version was like two years behind. We couldn't update because of the Ruby library and so on. So it was like a big, big problem. And we decided, OK, instead of first updating to the latest Ruby version and then updating the library so that we can talk to the latest Elasticsearch version, we decided to just get rid of Elasticsearch because we already had a Postgres database with all the data lying in it. just quote unquote, needed to add a full text search on top of it for a single language, which was German at the time. And we also kind of knew the what like, sorry, what I meant is we added the text ourselves. So we didn't have external user input, like people writing

Sascha_Wolf:

Okay.

Peter_Ullrich:

blog posts or forum posts. So we knew that the content that was in that text was safe, so to say, so we could easily implement it ourselves. And didn't really have to worry that much about the security or the safety implications. But also, that's actually not a big problem if you implement full-text search. So yeah, it was a very controlled experiment. So we decided to just slowly replace the Elastic Search with our own full-text search there. And it worked pretty well. I mean, we had a switch over, so to say. At one point, we just decided, OK, from now on, we're going to do all the job search requests, we're going to handle them in our new microservice. So there was the strangler pattern again, where at one point you just turn off the old service. And it worked flawlessly. So that was the first time I dabbled with full-text search. But it was one of these things, as you said, Sasha. You just know that you need it, or you hear about it. And then you just do some basic research on it. You understand how it works. You set up the basic logic, maybe write one or two tests, and then you just deploy to production. And the. you don't really go into the details of everything and why it works and how it works and how you can improve it because also for us, we had a couple of thousand maybe texts to search through. So it wasn't millions of documents

Sascha_Wolf:

Hmm.

Peter_Ullrich:

and even performance efficiency, it wasn't that big of a concern to us. So we could easily deploy it. And now once I wrote the blog post, I thought, okay, I wanna dive a little bit deeper into this topic. So I did and I wrote a very long blog post about it.

Sascha_Wolf:

Yeah, nice. I'm always kind of amazed by how people find the motivation and the capacity to write these long ass blog posts because I can't do that. I don't want to try to start my own blog. And I was like, Oh man, this is like actually finishing a blog post is. I'm not built for that.

Peter_Ullrich:

Yeah,

Sascha_Wolf:

Peter_Ullrich:

no, I, I,

Sascha_Wolf:

congratulations

Peter_Ullrich:

Sascha_Wolf:

for seriously.

Peter_Ullrich:

Thank you. But you're right. It takes quite a long time, especially if you do these deep dives, as I tend to do. And this particular, the full text search blog post, it was maybe three weeks, like three weekends. So in total, maybe 20 hours of my own time. And then my regular blog posts, which tend to be a little bit shorter, even those are like eight hours, so like one Saturday or Sunday.

Sascha_Wolf:

Mm-hmm.

Peter_Ullrich:

So yeah,

Sascha_Wolf:

Yeah.

Peter_Ullrich:

I spent a lot of time on these, but I learned a lot of this as well. So that's why it's worth it for me.

Sascha_Wolf:

Yeah, makes sense. I just blabble about things in podcasts that works better for me.

Peter_Ullrich:

Sure. You let other people speak.

Sascha_Wolf:

Exactly. Exactly. That's how I learn on your things. Shh. Don't tell it to nobody. That. Um, but to come back to before maybe we go into like the nitty gritty details of what you uncovered. Is there anything you would say, like, especially welcome with full text search and maybe also with that, but like one use case you just talked about you. where you now know, hey, I wish I would have known that before, you know, like some common wisdom or some pitfalls or anything like that. We're like, hey, when I started learning about this, this would have been so helpful, you know.

Peter_Ullrich:

Yes, not really related to full-text search, but while I was doing the research on this topic I also had to work a lot with indices in Postgres, so with gin index and in general indices and I really finally understood how they work and how you can make them work and how you can force them to work sometimes and it was quite interesting. So my biggest learning there was basically if you have any query. Like if you, for example, look for a short string in the database, like a name, for example, if you have a WHERE statement, where my user's name equals that, what's on the right side? Like the thing you compare it to, that thing needs to be in your index, almost one to one. So when you have a full-text search, for example, you can create TS vectors of your text. So it's like a very simplified. version of your text. It's actually called lexemes, which is the basic foundation element of the language that we're using. It doesn't really make sense when you see it, but it makes sense in, I guess, linguistic terms. And so whatever you want to find, that thing needs to be in your index. So I had a lot of problems where I thought my index should work, but then Postgres decided not to use it, because, for example, there was a variable somewhere. I created a ts vector. as T as vector and the language that I used, I used from a different field. And that field was dynamic. So my vector itself was dynamic. And if it's dynamic, Postgres doesn't create an index for it because it thinks it can change. And I had to do a lot of tweaking to get it to work. But eventually I did. And that was one of these things where I'm just saying. If you learn more about this topic, about full text search, you will also learn a lot about more indexes and these kinds of things. So it's quite helpful there.

Sascha_Wolf:

I actually have an anecdote to share here, which is not related to full text search, but like the importance of what you just said with like all the fields on the right, because we had this one performance issue at a place I worked at where like some queries were super slow and it was also like depending on user input. It was so weird. And like if a specific collection selection of fields, basically, if you filter by that, then that's super slow. And at the end of the day, like with like query explain as it turns out, hey, we actually missing this one field and this one index over here. So it's actually doing like a sequential scan of the whole table, which has millions of entries, but only for like this one particular thing. Otherwise it fell back to some parts of the query and then it only took a smaller scan and that was then fine, but yeah. uh, indexes is like sometimes feels like a dark art. And I feel that it's probably one of the areas that I think that's what you're trying to say where people would probably benefit from at least having a basic understanding of how like indexes work and how Postgres chooses to use an index or rather any relational database. And when maybe some types of indexes make sense and when others, I mean, at the end of the day, we are all not. database administrators, but still I feel nowadays, with like with a cloud being as readily available as it is, people just spin up a postgres in AWS or in Google Cloud or whatever. And then there is no database administrator which screams at you and says like, hey, we need an index here, right? So some of those things I feel is a topic where the average backend developer really could benefit a lot from formatting at least like a more. No one's understanding can just tell you let's slap an index on it.

Allen_Wyma:

That's interesting, too. That's actually where I tried to do is speed up some queries on a project. I just started adding indexes and basically running queries and then seeing the explain and then adding indexes. And actually, I found the biggest boost. I don't know if this is weird, but the biggest boost was when I went from a T2 micro to a T2 small. I don't know why, but a micro to a small was a huge spin up. And I don't know, it doubled or tripled the output. So sometimes. Indexes are not enough, so maybe try bumping up the database also, because I think, I don't know, do you guys understand the t2 small and t2 micro? It seems big enough that there was a noticeable thing. It was just hanging sometimes for some weird reason.

Peter_Ullrich:

So you went from the big instance to the small instance and that sped

Allen_Wyma:

No, no,

Peter_Ullrich:

up the

Allen_Wyma:

the

Peter_Ullrich:

thing

Allen_Wyma:

micro

Peter_Ullrich:

by three times

Allen_Wyma:

is the smallest one. So a T2 is just basically like a development one. It's good. Like, my client is very small. It's not like billions of users. I mean, not even hundreds and thousands of users. It's just like a couple hundred users a month. And it's enough for a T2 micro to handle, except for some reason, some of the queries were running very slow. Even adding indexes to it didn't make much difference. But I just bumped up from a T2 micro small in AWS. And it was like, it was like when you if you guys remember the first time when you when you take a hard drive, a spinning hard drive and then put it on SSD, it was like that.

Sascha_Wolf:

Yeah, yeah,

Allen_Wyma:

That

Sascha_Wolf:

yeah.

Allen_Wyma:

was

Sascha_Wolf:

Okay,

Allen_Wyma:

like the

Sascha_Wolf:

okay.

Allen_Wyma:

feeling I was like, Whoa, what what the heck happened? This thing is zooming nowadays. It's amazing.

Sascha_Wolf:

It sounds like something that maybe you just reached your RAM limits, right? Like where I had to do

Allen_Wyma:

It could have

Sascha_Wolf:

swapping

Allen_Wyma:

been, yeah. In

Sascha_Wolf:

because

Allen_Wyma:

general,

Sascha_Wolf:

I'm...

Allen_Wyma:

you probably shouldn't use any T-instance in a production system. And also a T-MICRO is probably not a good idea. But yeah, anyways, just saying. Maybe indexes

Sascha_Wolf:

Shoulda,

Allen_Wyma:

are not the...

Sascha_Wolf:

coulda, right? If

Allen_Wyma:

You

Sascha_Wolf:

Allen_Wyma:

find

Sascha_Wolf:

works,

Allen_Wyma:

out.

Sascha_Wolf:

it works

Allen_Wyma:

So if

Sascha_Wolf:

Allen_Wyma:

indexes

Sascha_Wolf:

a certain

Allen_Wyma:

aren't working

Sascha_Wolf:

degree

Allen_Wyma:

out for you,

Sascha_Wolf:

for

Allen_Wyma:

try

Sascha_Wolf:

small

Allen_Wyma:

to bump

Sascha_Wolf:

projects.

Allen_Wyma:

the database instance is kind of my antidote to that one.

Sascha_Wolf:

Yeah, okay. Makes sense.

Peter_Ullrich:

But it's so tricky because Postgres is such a good piece of software, and it makes it so easy for you to just set up your project and have the first functionality, and then even scale with it to, as you said, maybe a couple of thousand users. And you don't really have to worry about it. If you have a problem like that, OK, you throw some money at it. But the difference there is also maybe $100 a month or so, maybe not that much. So it's very easy to just use what Postgres gives you and scale with it. And I think that's maybe also a downside, because then you never really get into that situation where you actually have to debug why your index isn't working or why you use that much RAM, unless you get a user base of a couple of million people or so. And then it's inevitable to do that kind of check. But yeah, it's a little bit like a, it's really cool to use, really easy to use. But then it also makes you lazy, I guess.

Sascha_Wolf:

Yeah, I know exactly what you mean. And we just, I feel like why at least some basic level of understanding how the different parts work together can be helpful to at least avoid some of these pitfalls, which is also why I asked you, what are some things you might've wanted to know at the beginning? Like what are some common pitfalls to avoid, right? But which

Peter_Ullrich:

Exactly.

Sascha_Wolf:

is why we ended up at indexes.

Allen_Wyma:

Have you actually heard of the website called use use the index? Have you heard of them before? Yeah, yeah. Use the index.

Sascha_Wolf:

You always

Allen_Wyma:

Luke.

Sascha_Wolf:

know the weirdest things, Allen.

Allen_Wyma:

I came up because I was looking for something in Postgres and there's a website called use the index luke.com. Obviously it's supposed to like use the force and they, they cover basically the whole things about like why you should be using indexes and how each index works and they break it down. DB2, which I'm not quite sure exactly what that one is. I think that's some Oracle one. MySQL, Oracle, Postgres, and SQL Server, and they talk about stuff. And it seems pretty updated, because it says they can, they run tests from Postgres 9 through 14, which is pretty recent. I think 15 is not quite out yet.

Sascha_Wolf:

I'm, I'm, I'm, we can, I can include a link to it in the show notes and then people can check it out. So yeah, use index indexes are great and understand how indexes work. And so maybe getting back to the, to the topic at hand. So you. I'm actually curious, maybe before you, we go into the gritty bits is like, have you yet. Were you able to apply any of this Peter, like at work in production, like where you said, Hey, now I know about full text search now. So. This is a problem and actually this is a hammer and this is a nail that's hammered in, right?

Peter_Ullrich:

I'm sad to say that no, I have not used it, but I would feel much more comfortable using it. However, I have not used full-text search, but at the company I work at, at Remote, we have, of course, name search. So we have like

Sascha_Wolf:

Mm-hmm.

Peter_Ullrich:

10,000s of users, and we need to search for the names. And the blog post I wrote first is exactly about that. And it uses I like, the I like was a similarity operator. And especially, it also looks into how you can speed that up There, the fun fact is if you put the similarity thing on the right side, you don't use the index. If you put it on the left side, you use the index. So it's a very small, litty-gritty thing. I

Sascha_Wolf:

Wasted.

Peter_Ullrich:

don't know exactly why. I guess Postgres understands, again, the right side is a dynamic field, and the left side is a static field, but it's actually not. It's a little bit more complicated than that. I can recommend to look either at my blog post or other articles that wrote about this. But it's one of these very small details that you need to look out for. And after I wrote that blog post, I went to our code base and checked how we do it. And we did it exactly wrong. But again, also, as Alison said, we didn't even notice. Because if it takes

Sascha_Wolf:

Mm-hmm.

Peter_Ullrich:

5 milliseconds or 100 milliseconds still, it's fast enough for our users to use it. So we never optimized for it. But it was one of the things. Like when you learned about it, and then it's exactly as you said, Sasha, where you're like, If you have a hammer, you're looking for a nail. And well, if you know about this, you look into your code base and you find the nails which are, which are wrong.

Allen_Wyma:

But no, the other thing that I kind of ran into is I went once you had a couple of queries speed up dramatically from indexes or whatever, you're like, oh, I'm going to find some more. And you start going crazy for adding indexes to everything. Now, how do you know what things you should be kind of optimizing, which ones you shouldn't? Because I think once you get started, you're like, oh, I see something over here. I see something over there. That was kind of something I ran into once I started seeing the power of indexes and et cetera.

Peter_Ullrich:

Yeah, it's definitely a trade-off. And I would say you need to have observability of your system anyway. So maybe set up DataDoc or something where you can trace which queries you run and look at which ones you actually run. Because you can set up an index for every single field in your database, but it doesn't make sense if you never use the index. And this is just going to blow up your database storage capacities. or the need for that. And it's also going to slow down any insertions you make into the database, because the index, like if you have a schema with 10 fields and you have 10 indexes, if you insert a single schema, you need to add 10 fields to 10 different indexes. So it actually really slows down your insertion right there. So it's one of these things where you need to think about, do I insert more? Do I read more? All these trade-offs. So yeah, don't just create an index because you might need it in the future. I would rather say go without index until you see a problem and then add it. I wanted to, because you also mentioned other people that you learn about, like your sources where you learn things. And I must say, I learn a lot from a person on Twitter called Tobias Petri. He's another German. And I think it's tobiaspetri.sql. He is the database guy. He calls himself. And he posts these little snippets of ideas or of tips about using Postgres and MySQL, especially. which is really interesting. And he also has a monthly newsletter, which I follow. So if you, you know, not really spend a lot of time on learning Postgres and reading books and stuff, but if you just want to learn something on the side every now and then, when you browse your Twitter feed, maybe follow Tobias Petri. Highly recommended.

Sascha_Wolf:

His website also, if you want to check it out, we can put it in the show notes. It's sql4devs.com, all written together. So it's potentially also something you might want to check out.

Allen_Wyma:

What I also want to say, too, is for that same project, another thing I do is I ended up using what is the name of that tool? I forgot how you can gather statistics about everything. Sorry, I forgot the name. You can attach, and you can look for the repo query, and you can get the query time. And what I did was anything that was longer than 300 milliseconds, I would log out the query. Then it would kind of help me to figure out where some things were going a little bit slow. So I didn't need like Datadog or something. What is the name of that thing called again?

Peter_Ullrich:

I will be

Sascha_Wolf:

Telemetry?

Peter_Ullrich:

interested too.

Allen_Wyma:

Yeah, telemetry, that's right. Sorry, it's been a while. Yeah, telemetry, that's super useful. I think to at least, so you don't need like the, so in case you have something like locally or whatever, you could just use that to start getting an idea about where some things could be going.

Peter_Ullrich:

Can you explain how did you do that? I never really worked with telemetry.

Allen_Wyma:

Oh, finally, I know something somebody else doesn't know that feels good.

Peter_Ullrich:

Well, we already

Sascha_Wolf:

Yeah.

Peter_Ullrich:

established that

Allen_Wyma:

OK,

Peter_Ullrich:

this is how we learn. Just talk

Allen_Wyma:

OK,

Peter_Ullrich:

about it in the podcast.

Allen_Wyma:

just remember, I know what I'm talking about. Anyways, so actually, I saw this on a blog post somewhere. So I'm not going to say it was all my idea. But so in, I don't know, actually, I don't even know if this is the right thing to do. But so in my application.ex file, I added in telemetry.attach. I just called it ecto-logger. And then it's the name of your. application that's running. And then repo query. So these are all, obviously, it's a list of what do you call it, of atoms. And you attach the handle event function, which is arity of 4. And in your handle event, I don't like Visual Studio code. It's not easy to command click into where you want to go. There we go. Okay, actually work this time. Anyways, the second argument will give you query time, which is in some kind of strange time format. And then you have to use system that convert time unit and converted from native milliseconds over or whatever this time is to native milliseconds. And then you can also get the query and the parameters for that query in the third argument also. So what I do is I do a logger dot warn. So that way I just look for warnings. And I And then I use just format, you know, query time, how many milliseconds with query, throughout the query with params and give the params and just inspect the params. So it kind of gives me more idea because otherwise you just get the query with all these question marks or dollar signs. I forgot what, what it's using in this format. And you want to know what it is and what the data is because it could be a specific user or something, right? So I want to get as much information as possible.

Peter_Ullrich:

It's quite interesting. Actually, now that you said it, I also looked at a recent project that I just generated, like a Phoenix project, and it had the live dashboard included. And if you include that, actually, it sets up for you some telemetry metrics as well. So if you generate

Allen_Wyma:

Yeah.

Peter_Ullrich:

a Phoenix live view project with live dashboard, then you go to telemetry X and then you actually see some database metrics and VM metrics as well, so maybe you can use that as an inspiration too, but this

Allen_Wyma:

I think

Peter_Ullrich:

is a great tip.

Allen_Wyma:

you'll get an idea at least about what that something's happening. But I don't think you'll have an idea about which part of your query is giving the problem, which is what I was trying to solve.

Peter_Ullrich:

Right. Yeah. Because you needed to have the inputs as well, like the user

Allen_Wyma:

Yeah, I need

Peter_Ullrich:

Allen_Wyma:

to know which

Peter_Ullrich:

and these

Allen_Wyma:

query,

Peter_Ullrich:

kind of things.

Allen_Wyma:

right? That I can

Peter_Ullrich:

Yeah,

Allen_Wyma:

run it

Peter_Ullrich:

definitely.

Allen_Wyma:

locally and figure out what's going on. I think Sasha is actually probably has more experience with telemetry than maybe any of us.

Sascha_Wolf:

Nah. I mean...

Allen_Wyma:

OK, so again, I'm the master of telemetry in this room. So.

Sascha_Wolf:

I've used it, that's about it. But never

Allen_Wyma:

Yeah.

Sascha_Wolf:

in anger, you know? That is I feel where most of the valuable learnings come from when you use something in anger. There's also this very great book, All Long in Anger, which is basically about that.

Allen_Wyma:

Yeah, I never understood what that term actually means. It just means that when you use it like furiously and

Sascha_Wolf:

It means that when you use it in to such a degree where you encounter the ugly words and the, the, the, um,

Allen_Wyma:

Oh.

Sascha_Wolf:

where, where it breaks down, right. And then you really learn about, okay, the nitty gritty details of, of technology where, okay, if I push it to its limits, then it acts like that. And I can overcome it by doing this. And then. Basically, where a piece of technology you're using transitions from, hey, I know about this and I learned about it, to I've actually used this thing in production and it made me angry. So now I know its limits and I can know better when to use it and when not. Because I think all of us remember, not all of us, I guess, but a lot of people, I guess, remember the time when like MongoDB came into the market. Everybody's like, it's web scale. It solves all your problems. Then well, Reality is more complex than that. Is a document database suitable for every problem? No. Is it suitable for some? Yes. But as usual, it depends. And that is what I mean when I say using it in anger, and I have not used telemetry in anger yet.

Allen_Wyma:

I feel like every Java developer has done Java in anger. That makes sense. I

Peter_Ullrich:

Allen_Wyma:

always

Peter_Ullrich:

Allen_Wyma:

hear

Peter_Ullrich:

ha.

Allen_Wyma:

a lot of stories from these veteran Java developers, like, you know that JVM is like this, and the memory is mapped out like this. It's like, I have no idea how the memory is mapped out in Erlang and stuff, but it just kind of works. I never had to know about all this weirdness.

Peter_Ullrich:

Yeah, I feel the same. Actually, I recently, I tried to write another blog post essay about software design, especially design patterns. And when you get into design patterns, you very quickly find books about Java and C sharp. And they explain

Sascha_Wolf:

Hmm

Peter_Ullrich:

very nicely how you can have a factory or like a builder pattern and do all these nasty, very beautiful things

Sascha_Wolf:

Hahaha

Peter_Ullrich:

actually. And the fun thing is I thought, OK, this sounds cool. Let's take a factory pattern, I think. And let's do it in Ehrlichia. It's like, oh, it's just a pipe. It's like, you have a user, and then you have a next function with name, and the next function with age, and next function with location. And now it's a factory, apparently. But if you actually do it in Java, because you have the types, like the object, the classes, You need to define all these classes. And if you don't, well, there isn't like an any class, like you have in TypeScript where you can just say, ah, I don't care about this, should just be any. Well, then you actually need to be really imagining, like you need to imagine a good solution. And the funny thing is like in Java and C Sharp, you need these kinds of things. So it's like, literally you need to understand these patterns otherwise you can't do what you want. Then in Elixir, you don't even need to know about these kinds of things. You just need to build it as you want to. So... I always find it funny when people explain me these beautiful things about Java and I'm like you, Ellen. I think like, why should I know about this? Why do I need to know about this? Just the language should take care of this for me.

Sascha_Wolf:

I'm not sure who said it, so I'm gonna have to quote somebody without knowing who I'm quoting. I've read that basically design patterns are more often than not, or not always, but more often than not a workaround for a limitation of a language. Which

Peter_Ullrich:

Yes,

Sascha_Wolf:

does ring true to a certain degree.

Peter_Ullrich:

I would say so. Yeah, I mean, at least I tried to apply some of the patterns to Elixir and it didn't work. It was really weird to implement it that way. But you also have some bigger ones, like a facade or an interface pattern, or adapter pattern, these kind of things. It's not something, if it's big enough, then it's language independent, I would say, language agnostic. But those, it's like a handful, and you most likely use them already. But you just. never heard the name adapter, for example,

Sascha_Wolf:

Mm.

Peter_Ullrich:

or interface, but you use it anyway. So yeah, it's one of these things where, again, you should be forced to use them. Like, for example, if you generate a Phoenix project, what do you get? Context. What are context? I would say interfaces, to interface with the database logic, depending on how you use it. But immediately when you generate a library, you immediately have that pattern in your face, and you don't even know it's called that.

Allen_Wyma:

Yeah, and also, I mean, we do have some kinds of patterns, right? I think there's the book by some Prague prog writer. I forgot the Bruce. And I forgot the other guy who wrote this book about how to structure your apps, right? With the beehive. I think it said busy bees or angry bees or something. I forgot what the acronym was. But they talk about how to structure your app. So we do kind of have some kind of design pairs, because you have to have some kind of pattern or design or else your code will just be spaghetti. A very simple thing, I think, was, if I remember correctly, so I'm going to the Live View book recently. And I believe they talk about that in your context function, that's where things can go wrong. But in your module function, things should never go wrong. So you would see the change set stuff be inside of your module. for like that specific struct, you know, like a person or whatever. But the context like zoo. That would, you know, the zoo could either, it could create an animal or it could not. And that's where things can happen. Right. That's where you, and so that's, that's a way of design pattern. Right.

Sascha_Wolf:

Ah, I think that that's like where the line gets blurry. Some would say that's an architectural pattern, like keeping uncertainty at the edges of your system and then having inside in the core of your system being having this beautiful world where everything works, which makes it easier to like work in that context. And then when you kind of have to cross the threshold from the messy outside world into this beautiful inner world, that is where uncertainty should be lying. Otherwise you get into a state where Some parts of your system maybe assume the thing works, some parts don't, and then it becomes harder to work with. I'm not sure if you... I personally wouldn't necessarily call that a design pattern, more of an architectural pattern, to make writing maintainable software more easy, but like I said, the line gets blurry here.

Allen_Wyma:

How would you describe it then? What's the difference between the two in your

Sascha_Wolf:

A design

Allen_Wyma:

world?

Sascha_Wolf:

pattern is like something where like a specific way how you arrange some code to achieve a specific goal. And like what he just said is more like more of an architectural principle. Like it's not, you don't say I can use it. I can write a bunch of code and like it's always self encapsulating a certain type of behavior because it's kind of what design pattern does, right? Like it gives you a certain type of behavior for a piece of code. and then we can kind of have it encapsulated and do things with it. While an architecture principle is more of a, hey, if you follow this, you got these benefits, but it's more of like a bigger picture kind of thing. But that would be my, my take on it. Uh, so take it with a grain of salt. I haven't, I haven't eaten wisdom of a spoon. I haven't been fed wisdom of a spoon. That's, that's the saying.

Peter_Ullrich:

I find it interesting because I have a master's in software engineering and distributed systems. And our professor, he said basically, if you have architectural patterns, he couldn't even describe what is the difference between an architectural pattern and a software design pattern and so on. And I ended up with the definition that if you have an architectural pattern, it is on a hardware level, so to say. So an architectural pattern could be something with microservices where you have different patterns and so on. If you have a single server, then it's very hard to apply that architectural pattern. So then you'd rather have a design pattern, so to say. So the MVP model view, MVC, sorry, MVC, model view controller pattern, that is something where you can have all the classes, all the modules in a single server. And I would call that a design pattern. But yeah, it's not that easy. But in your case, Ellen, I would also say it's more like a coding principle where you say every service, every like module that I have, it should be, um, yeah, like independent, so to say, or like encapsulated. It should only do its own thing and not worry about the world outside, so to say. But I also heard different things where like the, if you have a higher level, like a handler or like an application service that uses different modules. So for example, your controller, right? A Phoenix controller where you get a request in. For example, that controller could also just orchestrate different services. And then these services might give back a true, like, OK, or an error. And based on that, the orchestrator, the controller, does something differently. We know that when you, for example, try to update something, like you have an update endpoint in the controller and you want to update a user, for example, Well, the first thing you do is check whether the user exists. And if it doesn't exist, then you don't do the rest of the request. That's another thing I wrote already. I think it's more a coding principle, where you decide for yourself how you want to code it. And then you just communicate with everybody else so that everybody tries to code the same way, so that if you go through another person's module, you know that it will look a certain way. And it's easier for you to understand how it should behave. And if you need to debug, then you can easily see why it doesn't behave as it should.

Sascha_Wolf:

Yeah, I always tend to say that most design principles, architecture patterns, so on and so forth, they boil down to us making us humans able to understand these complex code bases because we are not very good at a lot of complexity in a small amount of space. It's just our brains are not wired for that. In theory, if like we could understand arbitrarily complex stuff, we could just create one big file and put all of our code in there and that would be fine. But well, we're not good at that. So we have these design principles, we have these architectural patterns to make it easier for people at the end of the day to understand code, because the machine at the end of the day, this was one binary, it doesn't care, right?

Peter_Ullrich:

And some people even do that. If you look at remoteok.com, I think the person who wrote it, Levils, I think he's called Peter Levils, it's a single PHP file. It's one of the

Sascha_Wolf:

Yeah.

Peter_Ullrich:

biggest remote job boards in the world, and it's a single PHP file, just index PHP. And he coded it in a weekend or something. It even works sometimes. Yeah.

Sascha_Wolf:

Maybe to give like to round it up with another quote, which I also find relevant here. Architecture is about the important stuff, whatever that is. That was said by Ralph Johnson in Design Patterns, elements of usable object-oriented software, which I think closes the circle nicely.

Peter_Ullrich:

Exactly, I agree.

Sascha_Wolf:

Okay. We kind of went on a tangent here about design patterns and software architecture, which I think is interesting also for our listeners, but is there anything you would like to to say about Tech Search, Peter, because that was the original topic, right? And we kind of scratched at the surface. But since you're the expert here, quite literally compared to me and Alan, is there anything you would like to share and let listeners know, like when they maybe dive into the topic? Of course they can check out your blog post, but is there anything else, like I said earlier, where you feel some valuable learnings have been gleaned and some... be some resources you would be willing to share or some details about how to use this in Elixir, that kind of thing which is maybe not as, not readily visible from a blog post.

Peter_Ullrich:

Sure. Well, the first thing is I can't really reference any other resource that has all these pieces together. And that's why I wrote the blog post that I wrote. And that's why it's a complete guide to full text search with Postgres and Acto. Because, yeah, you can do their own research, but then you still have to read like 10, 15 different articles to understand it. I really like the Postgres documentation. However, it is a little bit difficult to understand it. But usually what I did when I learned about these things is I read the Postgres documentation first. Whenever I didn't understand something, I basically just copy pasted that line into Google. And mostly there was a Stack Overflow question about exactly that part. So it's very easy to take that apart, the documentation, and learn about it. So in that case, yeah, I would say, so what I would give as an advice as well, as you said earlier as well, Sascha. is before you reach for Elasticsearch, give Postgres full-text search a spin, or a try at least, and see whether it works for you. Whoever said that, I forgot. But I think it was Dave Luca in one of the last episodes. He said that the cooler thing about Postgres, the greater thing is about Postgres, is that it's probably already in your stack anyway. So if you have a Phoenix library project, then immediately you will have Postgres. So just, you know, it's something we can quickly and easily add to the project, and if it really doesn't work out, you can reach for Elasticsearch, but yeah, you will have a much harder time to implement it quickly and to maintain it too, and you have a higher server invoice as well. So yeah, try it out first before reaching for something you might not need.

Sascha_Wolf:

Postgres is really ending up as this one Swiss knife of tools, like where it kind of can do everything. It's crazy, seriously. I love it to death, but it's getting ridiculous to be honest. I'm not sure how you feel about that, but it's really at this point where, well, Erlang Postgres or Elixir Postgres is all you need.

Peter_Ullrich:

Well, I realize that every time when I think, oh, I need this special thing, I need to run the special query or something, and then I Google it, and the first result is always how to do it in Postgres. And so it's one of these things where, like 99% of use cases, it has already covered. Like, they thought about that already a couple of years ago, and there's plenty of documentation for it. So yeah, it's probably, it's literally, it's one of these things where... You just like, it's like a very nice car where everybody knows how to drive the car and how to use the steering wheel and gas

Sascha_Wolf:

Mm-hmm.

Peter_Ullrich:

pedal and everything. But not like very few of us have like true mechanics. And whenever something goes really wrong with the car, only few of us can understand what's wrong. So the same with Postgres. I think it's very easy to use Postgres. You know, everybody knows the, the, how to do a query there, but if it goes wrong, you need an expert and I think you can, you know, be the expert and open the handbook and see. in your car handbook, you know, how to fix something yourself first. So I would encourage people to do that.

Sascha_Wolf:

I would plus one of that, but I also say don't be afraid of it because Postgres actually comes with a fair bit of very useful tools. So like the query explainer, for example, is something where you can say explain it in this query and there, yeah, there's some terminology and some details you will have to read up on. But the first time I did that, it was surprisingly easy to follow along. And yeah, I had to look up some details in the documentation of Postgres, but it all made sense. And at that point you actually realized, yet again, there's no magic here. I mean, Postgres is a really solid piece of technology. A lot of people have worked on it and it has tested, stood the test of time. But it's still not just that. It's a piece of technology which you can understand and there's no magic going on. Maybe except for the whole time handling. That kind of is black magic.

Peter_Ullrich:

I would like to make two notes about this. So I think Postgres is great, but there are two things. Well, there are two tools missing. Postgres itself is great, but if you work with Postgres, I think two things are missing. And if anybody out there wants to start a little side project, start up with that thing, I'm more than happy to pay the money for the tool. One of them is actually, it's more like a book, really. So Ellen, you said earlier the use of the index look. website

Allen_Wyma:

Yep.

Peter_Ullrich:

that already shows you how to use indices in Postgres and that's great, but I really miss like a proper book that explains you like a Postgres for the rest of us kind of book, you know, that explains to you, okay, if you have any problems, like dive in here, or like this is how you do full text search, for example, so we'd be part of that book. I miss that because it's very fragmented, the knowledge base of that. And the second thing is a proper tool that gives you very like easy, understandable instructions about how to read the explain, analyze output of your query. Because until this day, I don't understand it fully. I

Sascha_Wolf:

Hahaha

Peter_Ullrich:

just see it and then I just copy paste parts of it into Google and try to understand what it does. And even if you see index somewhere and you're happy and you think, it's great, it's using my index, but why is it still slow? And then, well, if you have a filter underneath that index, it still means that it does a sequence scan, for example. And I did not know that. And it's one of these things where you just see text on your screen and there's no explanation to it whatsoever. So if somebody out there wants to create a, or maybe there's a cool tool already that really goes step-through-step through that output and explains to you, oh, your problem is probably here, because that's a sequence scan. You don't want that. So one of these things that would be great to have.

Allen_Wyma:

The only thing I could think of is, I mean, I use pgAdmin4, and you can run the explain. And there's parts that get highlighted in yellow and then red, for sure red, I believe. And you're like, that's where I'd look at it. Oh, what is that? And you can basically start drilling down, because I think everything starts at a high level. And then you have to open up a tree and keep following down to see which part's the problem. But then you could just say, oh, that's weird. And then you have to think about the query. OK, this is because of this part. I should probably add an index here. And that usually solves most of the issues. But if you

Peter_Ullrich:

Allen_Wyma:

start

Peter_Ullrich:

cool,

Allen_Wyma:

skipping

Peter_Ullrich:

then I should...

Allen_Wyma:

indexes, then that's something I'm not really sure of. That, that I think I'd probably start scratching my head and getting confused unless you could just say, Oh, I mean, it's hard. I mean, cause it sounds like you just said, Oh yeah, I remember this and this would black in, but how do you, how do you do that without, you know, if there's a tool that could tell you, Oh, we didn't do this route because of this, that would be nice to have.

Sascha_Wolf:

Yeah, I get what you're saying. This one part of like, explain, analyze weight. Basically the reason why it does it that way, that is sometimes not easily gleanable.

Allen_Wyma:

Yeah. Cause sometimes it's like you add like, you start like ripping apart the query and you're like, okay, let me just cut this out and then see how it runs. And you add one part and you're like, why the hell is this one thing not work? It doesn't really make sense.

Sascha_Wolf:

Okay, any last words before we wrap it up and kind of slowly transition to pigs? Okay, then Peter, if people want to reach you, how do they do that?

Peter_Ullrich:

I am very active on Twitter, which is PJ Ulrich, U-L-L-R-I-C-H, or my personal website, peterulrich.com.

Sascha_Wolf:

Okay, then let us go to PIX and... Ellen... Is there rustbook this week?

Allen_Wyma:

I haven't had time to play around with Rust. But no, what I do have for you, which is pretty cool, actually I've been working on another project. We have React front end and Elixir back end. And somebody suggested that we use Cypress to do end-to-end testing. Have you guys ever heard of Cypress before? Looks like Peter has. OK, both of you. Yeah, I just heard of Cypress now. There is a really, really awesome. video that came from ElixirCon for Africa. I don't know if you guys have seen this one, where the guy talks about how you can use Cypress with Elixir. And it really goes quite in depth. And I think it's a really good video if you're interested in doing some kind of end-to-end testing. And you have SPA. It's really cool. And basically, you can also have it set up so that you can actually run your tests in parallel. So that's even more exciting because Yeah, who doesn't want to run their stuff in parallel when you can, right? So I'll link the video.

Sascha_Wolf:

Nice. Peter, do you want to do any picks? I only have one

Peter_Ullrich:

Еще!

Sascha_Wolf:

weird nerd pick, so I'm gonna pick that last.

Peter_Ullrich:

Okay, not from I have three picks which are actually not related. Well, one of them is related to software, but the other two are not really. Um, so the first pick I recently watched on Netflix inside man. It's a short, uh, series with four episodes and it's really amazing. Um, it's one of these situations where you don't like, nobody knows how they ended up in that situation, but now they somehow have to get out of it. And it's kind of hilarious, but also tragic. So I would. recommend you to watch Inside Man. It's very interesting. Shall I do all the three picks at once?

Sascha_Wolf:

Yes, sure, go ahead.

Peter_Ullrich:

Sure. The second pick I have is my most favorite podcast, Next to Elixir Mix, of course, in the whole world. It's called Smashing Security. It's about cybersecurity and other topics. And it's been running, well, I don't know, six years already. And fun fact about that, I. I'm one of the first listeners of Smashing Security and back in the day in 2018, I really wanted to be on that podcast. So I created my own podcast, which was called Explain Blockchain about blockchain stuff, and I sent it to the people at Smashing Security and they found it interesting, so they invited me for a guest talk there as well. And until this day, I've listened to these people in Africa on my motorcycle and all different kinds of parts of the world, and this always makes you laugh. So. Highly recommended. And well, the last pick that I have is a little bit to toot my own horn because I wrote a book, and I would like to pitch it to you or to recommend you to check it out. It's called Building Table Views with Phoenix Live View. It's on Prag Park, currently in beta, but it's very close to the final version. So if you buy it, that's pretty much what you get. And yeah, if you read it and you have any recommendations, or if you don't like something about it, so yeah, I'm always open to discuss it. especially on Twitter. That's it.

Sascha_Wolf:

Nice. Good balance of picks. I like it. I'm Jakasar Dagnuska. I have one weird nerd pick this week. And that is a video game I've been playing. And it's not niche, but it's still an indie game. So there's a good chance a lot of people don't know about it. It's called Inscription. And I'm just going to say this, if you are a fan of like trading card Then you should give really give this thing a spin. It's not strictly about that. I mean that is the core of a gameplay but it also Does a lot of interesting things with its narrative where As soon as you finish act one the game does Change Substantially and not necessarily not not not only link limited to the mechanics but also like the look of a game changes so like it It kind of plays with a player and at least scratches on that fourth wall without saying too much. It does unexpected things and it's a wide ride. It's a really wide ride. But at the core it always stays this trading kind of card game-esque gameplay. So if you like that then definitely give it a try. If that is not your jam then you're not going to have a lot of fun with this. But nowadays I've played so many games in my life that I'm always... happy when I find a game which does something unexpected and something I haven't quite seen like that yet and this definitely scratched that it for me. So give inscription a spin if that sounds interesting.

Peter_Ullrich:

Sounds interesting. I'm still busy with God of War, the old version, not the newest one. And that's also enough surprises

Sascha_Wolf:

Excellent

Peter_Ullrich:

for me.

Sascha_Wolf:

game. I

Peter_Ullrich:

Yeah.

Sascha_Wolf:

really really enjoyed the go-go of all the reboot kinda.

Peter_Ullrich:

Definitely.

Sascha_Wolf:

So yeah Okay, then. Thank you for being on the show Peter. It was a pleasure

Peter_Ullrich:

It was a pleasure. Thank you for having me.

Sascha_Wolf:

and you all tune in next time with another episode of Eleximix. Bye bye!

Peter_Ullrich:

Bye, thank you.

Full-Text Search And Name Search With Postgres - EMx 199

0:00

51:11

Playback Speed:

Show Notes

About this Episode

Sponsors

Links

Picks

Transcript