What's Haystack with Philip Brown? - EMx 215
Philip Brown is an Elixir software engineer and entrepreneur with over 15 years of experience building and scaling internet software applications and services. He joins the show to talk about Haystack, a simple, extendable full-text search engine written in Elixir. He begins by sharing his motivation for creating the project and his purpose for building it.
Show Notes
Links
- Adding a Table of Contents to Nimble Publisher | Culttt
- GitHub - elixir-haystack/haystack: Simple, extendable full-text search engine written in Elixir
- Building a full-text search engine in Elixir | Culttt
- Prise
Transcript
Adi_Iyengar:
Hey everyone, and welcome to another episode of Elixir Mix. Today on the panel we have Allen Wyma
Allen_Wyma:
Hello.
Adi_Iyengar:
and myself, Adi Iyengar. We don't have Sascha today, and Allen has coerced me into hosting today, and he's very happy. We also have a really cool guest today, Philip Brown. Hey, Philip.
Philip_Brown:
Hey, how's it going?
Adi_Iyengar:
Philip, yeah, today you're here to talk about a few things, but before we start talking more about that, why don't you tell our listeners what you're all about, give a quick intro.
Philip_Brown:
Yes, my name is Philip Brown. I'm based in the UK. I'm a full-time Elixir contract engineer. I've been working with Elixir since about 2015, I think, or 2016, and I've been a full-time Elixir contractor since 2019. Last year, I built an application called Prise, which was a productivity [inaudible].
Adi_Iyengar:
Yeah, that's really cool. I mean, I did not realize you were doing Elixir since 2015. So I'm going to kind of change the script a little bit. I would love to learn how you got into Elixir. Because 2015 was like, actually, end of 2015 was when Elixir 1.0 came out, right? So yeah, what got you into Elixir?
Philip_Brown:
Yeah, so before Elixir, I was doing PHP full time, and PHP always felt to me like a tool. It was a means to an end. I was using it just as a tool, but I didn't particularly like using it; I liked the thing that I was building. I dabbled with Ruby. I've never actually been employed as a Ruby developer, but I did have a foot in that community, and I followed a lot of Ruby developers on Twitter. I just started picking up the vibe from the people I was following that they were interested in Elixir and in building things with Elixir. I think the person I was following at the time who really tipped me over the edge to learn Elixir was Stevie Graham, who started teller.io. It's like an API for your bank. He was hacking on projects, and I thought that was really interesting. The more I looked into it, the more it felt similar to Ruby, which was something I wanted to do more of at the time but had never worked in full time. But then, the more I looked into it, the more things started to resonate. One of the early things was that I stumbled upon Joe Armstrong's description of Erlang, where you have this world of processes, and messages are passed between processes. That just seemed like a million miles away from what I was doing, and I just kept pulling on that string, really. Where PHP felt like just a tool that I was using, like I'd pick up this hammer and use it, Elixir felt more like, oh, this feels right. This feels like something that I want to use, that I'm not just using to build something. I want to explore more. I want to use it more.
And yeah, so I was doing that for a few years, just hacking on things with Elixir for quite a long time, up until 2019, when I thought, screw this, I'm going to take the jump and do Elixir full-time, because that's really what I wanted to do.
Adi_Iyengar:
That's really cool. Yeah, it's always interesting to hear everyone's story of how they got into Elixir. But it's particularly interesting to hear how people got into Elixir pre-2016, because that's before Phoenix 1.0. Without web development, at least for the general population, the incentive to get into Elixir isn't as strong. But yeah, really interesting to hear that story with PHP. I guess that gets us to Haystack, right? That's going to be our main topic for today. It's a full-text search engine that Philip wrote completely in Elixir. Do you want to start talking about that? Like, maybe, what was the motivation to do that?
Philip_Brown:
Yeah, so my blog, Culttt.com, c-u-l-t-t-t.com, is pretty much a vanilla Phoenix application. I use Nimble Publisher to manage the content. But other than that, there's no database and there are no external services, and I want to keep it that way. If I don't have to have a database, I don't want the headache or the cost of a database, and I certainly don't want to start adding services. Culttt has been running for a long time. It hasn't always been in Elixir, but the first post is really old. So I kind of just got by without having full-text search. It's kind of annoying, because if you wanted to search, you had to go to Google and put in a query that's specific to Culttt along with the term you're looking for. It was manageable, but it's not a great user experience. Then earlier this year, I started thinking that I want to build more projects like that, where it's essentially almost a vanilla Phoenix application that maybe does one or two things. The goal of the project is to try to get traction, to try to get scale, to do something. But I don't necessarily want to start managing databases for these things. I don't really want to start managing more infrastructure or more third-party dependencies, and I certainly don't want the cost of doing these things. So I thought, well, this is a good opportunity to build something. Originally I wanted to use Elasticlunr, which is an existing Elixir full-text search library, basically a slimmed-down version of Elasticsearch but built entirely in Elixir. I stumbled upon that a while ago, actually; it must have been six or seven months ago. And immediately I was just like, oh, this is perfect.
This provides full-text search, but it doesn't require any additional external dependencies. If I was already using Postgres or SQLite, then I could have full-text search from there, but I'm not. So Elasticlunr was perfect. Then a couple of weeks ago, I installed it for the first time. I pulled it down into a project and started using it, and there were just a few things that hit me immediately that meant it just wasn't going to work. One of the things was that it took a long time to start up the application. Just the way it was serializing and deserializing
Adi_Iyengar:
Mm.
Philip_Brown:
the data meant that, for the data that I wanted to index, which in the grand scheme of things is not a huge amount of data, it was taking two minutes to start the application. And I was just like, oh, come on, this is not really workable.
Adi_Iyengar:
Right.
Philip_Brown:
And so I looked into what I would need to change, to contribute to this project, to make it viable for my use case. The more I looked into it, the more it felt like it was going to be a fundamental rewrite rather than me making some smallish PRs. Most of the time, I want to be the person who contributes to an existing project. I don't particularly want to [inaudible]. But in this case, because it felt like more of a fundamental rewrite of how the core of the library worked, it felt like I'd be coming in like a hostile takeover to rewrite a lot of the stuff that was already there. And I think in that situation, if you've got a fundamentally different idea of how something should be built, it is better to just build a separate project than to try to take over or force a rewrite on the author who already has this library out there. So yes, I decided to build it. I wanted it for my blog, but then I also wanted it in future projects that fit the same kind of mold. I could have just built a crappy version of it in my blog, but if I wanted to use it in multiple projects, it made sense for it to be an open source thing, because it's just easier to pull into multiple projects. And the more I thought about it, I was like, actually, I should put the time and effort into doing this properly so other people can use it as well. And I knew I had this interview coming up, so that kind of lit a fire under my ass: okay, I have to get it done for this interview so we can talk about it. So that was another good motivation.
Adi_Iyengar:
That totally makes sense. That's really, I can really relate to what you said about, you know, wanting to use something that's already built, but if there's like a fundamental disagreement in like how to approach a problem, it's better to, you know, build it yourself than like force, you know, your version of implementation on an open source library. Yeah, totally makes sense. Yeah, I guess. One sec, let
Allen_Wyma:
Well, I
Adi_Iyengar:
me...
Allen_Wyma:
just wanted to say, because I didn't hear the name very clearly: Elasticlunr, I think it's called, right? L-U-N-R. So
Philip_Brown:
Yeah, that's right, yeah.
Allen_Wyma:
I thought you said Luna, so I kept looking for Luna. I'm like, I cannot find this thing. Am I spelling something wrong? I finally found it. And then I was thinking to myself, we actually had him on the show, that creator, I believe, in the past.
Philip_Brown:
Oh
Allen_Wyma:
Philip
Philip_Brown:
really?
Allen_Wyma:
Jacobs said, yes, Adi looks confused. So I remember he's based out of Africa, I believe. He was on the show, I don't know how long ago. And it looks like the last update he had for that project was a little bit more than a year ago, March of
Adi_Iyengar:
Hmm.
Allen_Wyma:
last year. And
Philip_Brown:
Yeah.
Allen_Wyma:
somebody else also asked about this project, so it's too bad, because it sounds super promising and it seemed like the guy really was working hard on it.
Philip_Brown:
Yeah, I was really excited when I found it. I thought, this is absolutely perfect, and even more perfect if I don't have to build and maintain it. If someone's already solved these problems, that's amazing. I tried hard to use it. But like I said, there was the startup problem: I don't want to wait two minutes every time I start the application. And then there was a PR that, to me, looked 90%, 99% done, but it hadn't been merged. And I was kind of like, you know what? If I'm going to be using this as a fundamental, core part of my projects... If it was José Valim who built this thing, I know how good he is at maintaining these things, how thoughtful he is about building them, how good they are, how well maintained they are. I've got no doubt that I would just use it straight away. But if a project seems kind of dead or not maintained anymore, that's no problem, but I don't really want to tie myself to that.
Allen_Wyma:
Yeah, like I said, it's too bad, because it seemed really promising. I do remember him being on the show, and I'm pretty sure Adi was, like, amazed. It sounds
Adi_Iyengar:
No,
Philip_Brown:
Ha
Allen_Wyma:
like
Philip_Brown:
ha.
Allen_Wyma:
something, somebody
Adi_Iyengar:
I would
Allen_Wyma:
was
Adi_Iyengar:
have remembered
Allen_Wyma:
amazed.
Adi_Iyengar:
it. I would have remembered it. I think I might not have been in the episode. Sorry, I lost my train of thought.
Philip_Brown:
Yeah, so basically, Haystack is a full-text search engine built entirely in Elixir. One of the things I wanted to do with it was this: if it's in Elixir, it should be easy to extend, or to do whatever you want with it. If you are using full-text search in Postgres or SQLite, or some other method of doing full-text search, you don't have much control over it, really. If you want to extend it, or apply some other metric or some other thing to it, it's either going to be difficult or impossible to do that. But with other things in the Elixir community where there is an Elixir version of something, like Nx, it is easier for me to do something with Nx than to do the same thing in TensorFlow or NumPy or whatever. That was a strong motivation for wanting to build it as well. I wanted to be able to meet my own requirements, but have it built in a way that would allow other people to extend it in whatever way they wanted.
Adi_Iyengar:
Makes sense. Actually, I'm looking at the code here. It looks like you have two ways of storing your data: map
Philip_Brown:
Yeah.
Adi_Iyengar:
and ETS.
Adi_Iyengar:
How does the ETS thing work? Do you use ETS with your blog posts? I imagine you probably just use a map, right?
Philip_Brown:
So you could use either. Specifically with Culttt.com, I could have used either. The map is, you know, just a map, but you have to store it somewhere. I mean, you can shove it in an Agent or whatever.
Philip_Brown:
It would work. But I built out the first version of Haystack using the map storage backend, and I use it in tests and stuff, basically. And then I actually use ETS as the storage mechanism for Culttt, and it's probably what I'll use for other projects as well. It's beautiful that you have this storage mechanism as part of Elixir. I don't have to tell you to install anything, and you don't have to pull in anything. It just works transparently. You wouldn't know it was
Adi_Iyengar:
All right.
Philip_Brown:
using it in the background. But the storage mechanism is one of the things that I wanted to make extendable. You might not want to use the state of the server that is running the application. So it'd be very easy to build a Redis backend, where the state is stored elsewhere, so you don't lose the state and things like that. Or you could build a Postgres implementation of the backend, or whatever you like; it's essentially just a key-value interface, so it doesn't really matter. But the two out of the box are map, which is what I use for testing primarily, and ETS, which is the default one that you would probably want to use in production. You could use a map, but then you're limited by the messages to that process to get the state, whereas with ETS it's public read, so you're not limited in that way. You probably could get away with using a map in a GenServer or an Agent, but it's just as easy to use ETS instead. Yeah.
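To illustrate the "it's essentially just a key-value interface" point, here is a minimal sketch of a pluggable storage backend. It is written in Python rather than Elixir and is not Haystack's actual API: the names Storage, MapStorage, and add_posting are hypothetical, chosen only to show how index code can depend on three key-value calls and nothing else.

```python
from typing import Any, Protocol


class Storage(Protocol):
    """Hypothetical key-value contract; any backend with these methods fits."""

    def insert(self, key: str, value: Any) -> None: ...

    def fetch(self, key: str, default: Any = None) -> Any: ...

    def delete(self, key: str) -> None: ...


class MapStorage:
    """In-memory backend backed by a plain dict (handy for tests)."""

    def __init__(self) -> None:
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value

    def fetch(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


def add_posting(storage: Storage, term: str, doc_id: int) -> None:
    """Record that `term` occurs in `doc_id`, using only the key-value calls."""
    ids = set(storage.fetch(term, set()))
    ids.add(doc_id)
    storage.insert(term, ids)
```

Because add_posting only touches insert and fetch, a backend that talks to Redis or Postgres could be swapped in without changing the indexing code, which is the trade-off Philip describes.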
Adi_Iyengar:
Yeah, totally. And a great thing about ETS in this case is that if you know you're running low on memory, it's very easy to add a DETS backend, which is
Philip_Brown:
Yeah.
Adi_Iyengar:
Yeah, again, like without even changing the implementation much.
Philip_Brown:
Yeah.
Adi_Iyengar:
Yeah.
Philip_Brown:
And I had some people reach out, actually, who were already using ETS for some things in their application, but they needed a way to do full-text search on the data in it. You can obviously write match specs for ETS, but you don't get full-text search.
Adi_Iyengar:
Right.
Philip_Brown:
And this use case for Haystack, I didn't imagine that people would use it internally to the application. I thought it was going to be the way I was using it, as a public-facing thing where the user actually types something in to search. But if you wanted to provide full-text search on a data structure internal to your application, and it doesn't matter where that data is, then this is a light-touch way to provide that: simple to get going with, simple to use, simple to extend.
Adi_Iyengar:
That makes sense. I have a few questions on the indexing process. But before we get to that, Allen, do you have any questions on storage, or ETS versus anything else?
Allen_Wyma:
No, I just wanted to know, like, how do you feel about the match spec for ETS? Do you enjoy it?
Philip_Brown:
So in Haystack, it's basically just key-value. I'm not using match specs at all.
Allen_Wyma:
Then you've got to just throw away whatever you have and get the match specs going, because that's where the power is.
Philip_Brown:
Ha ha.
Adi_Iyengar:
No. Ha ha
Philip_Brown:
Ha ha.
Adi_Iyengar:
ha ha ha ha.
Allen_Wyma:
Yeah,
Adi_Iyengar:
I built
Allen_Wyma:
but you need
Adi_Iyengar:
a-
Allen_Wyma:
to you need to learn it though, because even when you're tracing, you have to use match specs. So it's kind of something that nobody wants to learn, but you kind of have to learn it, I think.
Adi_Iyengar:
Well, I think you can do very little match specs and get by tracing, but it's very deep. I built an OLAP engine on Mnesia, and that's all match specs. And none of it is like, you can't find it. There isn't like a book you can buy that teaches
Philip_Brown:
Yeah.
Adi_Iyengar:
you how that works. It's hell. It's hell. And it's unintuitive.
Philip_Brown:
Yeah, exactly, yeah.
Adi_Iyengar:
Yeah, I'm so glad that you just chose to use it as key-value. And also the indexing and tokenizing, you did that in Elixir, so it's easier to follow as someone who's reading through
Philip_Brown:
Yeah.
Adi_Iyengar:
it. But yeah, talk about the indexing process, how does that look?
Philip_Brown:
Yeah, so I must admit, I only had a vague understanding of how to build a search engine before I started doing this, at a very, very high level, where I could probably describe how it works but wouldn't know the individual steps. Building a project like this is similar to building a compiler; it's one of those things where people are fascinated by how it would work, the technical details and stuff. I actually basically just read a blog post that was like, this is how you build a search engine in 150 lines of Python, and went from there. Like I said, I've never built a project like this before; I only had a high-level understanding. But that was another thing that I wanted to be extendable. I've provided the implementation out of the box to do full-text search: tokenizing and transforming, and then actually storing that data in the storage. But I tried to build it in an abstracted way, so you can provide your own tokenizer, and you can provide your own steps to transform the data however you want. And the way it's stored as well: you don't even have to use my ways of storing it, you can provide your own. It's a full-text search engine at the minute, but I also foresee that you could do semantic search, or locality-sensitive hashing search, as well. The way I've built this, I think it would be possible to do multiple types of search, search in quotes, using this approach, which I think is quite interesting as well.
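The tokenize, transform, and store pipeline described here can be sketched in a few lines of Python, in the spirit of the tutorial Philip mentions. This is an illustration under stated assumptions, not Haystack's implementation: the Index class, the tokenize and drop_stop_words functions, and the tiny stop-word list are all hypothetical, and documents match only when they contain every query term.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Default tokenizer: lowercase, then split into alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def drop_stop_words(tokens):
    """Default transformer: remove very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


class Index:
    """Inverted index with a swappable tokenizer and transformer steps."""

    def __init__(self, tokenizer=tokenize, transformers=(drop_stop_words,)):
        self.tokenizer = tokenizer
        self.transformers = transformers
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def analyze(self, text):
        tokens = self.tokenizer(text)
        for transform in self.transformers:
            tokens = transform(tokens)
        return tokens

    def add(self, doc_id, text):
        for term in self.analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = self.analyze(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Passing a different tokenizer or a different tuple of transformers to Index changes how text is analyzed without touching the indexing or search code, which is the kind of extension point discussed above.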
Adi_Iyengar:
Very cool.
Philip_Brown:
And so one of the things that I thought would be interesting to build as open source (I haven't started this at all, it's just an idea) is an open source, local-first, privacy-focused way to store documents. Now that I'm an adult, I've got a box full of papers from when I've bought and sold houses, when I've bought insurance, you know, an endless amount of documents just in a big pile. I want a way to be able to index that, store it somewhere, and make it searchable. And I don't really want to upload those documents to any cloud service, because it's personal, very private information that's in these documents. So one of the interesting things that I think I could build as an open source thing is just a Phoenix application that you would run locally: you scan your documents, it indexes them, and then you can search over them. It's a totally encapsulated application that you could run, because it's not relying on other services or other dependencies to install. You could do full-text search, which is what Haystack offers at the minute. So say you wanted to find a letter from a specific person, you'd be able to search for the name. Or you could do semantic search, where you don't know the specific search term that you would need, but you provide something that's related. Or you could do a locality-sensitive hashing index to
Adi_Iyengar:
Hmm
Philip_Brown:
say, okay, just show me all the things that are related to insurance, or show me all the things that are related to my car or my house, things like that. So yeah, that's another avenue that I think is interesting as well, because if you were going to have this as an open source thing for you to run locally, I think it's easier. I mean, you could do it with SQLite or whatever, but then it's just the amount of dependencies or the overhead that you
Philip_Brown:
need to run it. And then if you want to extend it or do anything, like provide a different type of search, or your own implementation of indexing, or your own storage backend, or whatever you want to do, you have all those things available to you.
Adi_Iyengar:
Yeah. Makes sense. Oh, by the way, all these pauses, we'll edit those out, just FYI. I had a few questions, and then I started looking at a different part of the code, sorry.
Philip_Brown:
Yeah.
Adi_Iyengar:
All right, I was curious about how the querying thing also works. I see quite interesting things over here. I see the IDF function, and I don't see a Levenshtein or Jaro distance. I was not expecting that. But what are your thoughts on all of that? Are there any plans on implementing Levenshtein distance or any of that?
Philip_Brown:
Yeah, so that's another thing where what you see now is the most basic implementation that I needed to be able to deploy to my application and have it work. It certainly doesn't cover every metric or every way to search or index content. But hopefully it's built in a way that allows it to be extended with things like distance metrics, or different ways to compare documents, or different tokenizers or transformers or backends. It's fairly basic with the query specifically. Ideally I'd want to get to the point where it has feature parity with Elasticsearch query syntax. At the minute it's just AND and OR queries, and then just a match. Ideally, I would want to have more of a fuzzy match, or a not-match. The way the querying works is that you can build an infinitely nestable set of clauses, where the clauses are any or all, which map to OR and AND, and then expressions, where the only expression so far is match. But again, you can provide your own implementation of clauses and your own implementation of expressions. So adding additional expressions or additional clauses is just a case of adding a tiny module and then
Adi_Iyengar:
Great.
Philip_Brown:
passing it as a config thing. It wouldn't be a huge amount of work for me to do it, or for somebody else to contribute it to the library. Sorry, I keep coughing.
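The nestable any/all clauses with a single match expression can be sketched as small composable functions. This is a Python illustration of the idea only, not Haystack's query API: match, any_of, all_of, not_, and run are hypothetical names, and not_ stands in for the kind of extra clause Philip says could be added later.

```python
def match(term):
    """Expression: true when the document's token set contains `term`."""
    return lambda tokens: term in tokens


def any_of(*clauses):
    """Clause: OR over nested clauses or expressions."""
    return lambda tokens: any(clause(tokens) for clause in clauses)


def all_of(*clauses):
    """Clause: AND over nested clauses or expressions."""
    return lambda tokens: all(clause(tokens) for clause in clauses)


def not_(clause):
    """Hypothetical extra clause, added without changing the others."""
    return lambda tokens: not clause(tokens)


def run(query, docs):
    """Return the sorted ids of documents whose tokens satisfy `query`."""
    return sorted(doc_id for doc_id, tokens in docs.items() if query(tokens))
```

Because every clause and expression has the same shape (a function from a token set to a boolean), they nest to any depth, for example all_of(match("search"), any_of(match("elixir"), match("phoenix"))).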
Adi_Iyengar:
That's totally fine. But that's really cool. I guess that's like the power of making it so extendable. And
Philip_Brown:
No.
Adi_Iyengar:
everything is a behavior. Everything
Adi_Iyengar:
has
Adi_Iyengar:
properly defined specs. I was actually looking at your transformers and the stop words, and that was the first one that came to mind, making that configurable. Even though the file name is not configurable, the transformer as a whole
Philip_Brown:
Yeah.
Adi_Iyengar:
is, which you can provide when you evaluate a query, or rather index
Philip_Brown:
Yeah.
Adi_Iyengar:
a set, yeah, that's really cool.
Philip_Brown:
Yeah, so like quite selfishly, I basically want people to be able to use it without ever speaking to me.
Adi_Iyengar:
Ha ha ha!
Philip_Brown:
You know, I don't want to solve other people's problems, and I don't want to block them from using it. It's like when you want to use something, but it just doesn't have the thing that you need. If there's a way to very easily add it, you don't necessarily have to fork the project, or get somebody who's probably lost interest in the project, because it was built years ago, to review the PR, merge it in, and ultimately take responsibility for the code that you've written. You can just use it as it is, but provide your own implementations. So it ended up taking longer than I would have liked. Obviously, it offers a fairly basic way to index and search content, so I almost certainly could have built it in a fraction of the time. But I tried to build it in a way that would be very open to people saying, oh, you know what, I don't like the way stop words is implemented; I want a different list of stop words, so I'm just going to provide my own module, and there you go. You never have to speak to me. You never have to open a PR. Your implementation never has to be merged into the repository. I don't even have to know that you've ever used Haystack, and that's kind of the way I would want it to be.
Adi_Iyengar:
That's very cool. Yeah, I guess I was gonna move to using Haystack with the Phoenix application, but Allen, do you have any questions on Haystack before we do that? Okay. Okay.
Allen_Wyma:
No, it's something I'm thinking about these days, adding some text searching in there and stuff for products. I mean, it's nice not to have another service, right? Just everything within memory.
Philip_Brown:
Yeah.
Adi_Iyengar:
So I guess that's a good segue into how you'd approach adding Haystack to Phoenix, and what are a few things to keep in mind. I guess you specifically mentioned Nimble Publisher as well.
Philip_Brown:
Yeah, so on my blog, I've got a blog post that basically walks through the entire implementation of Haystack and explains why I've made certain decisions or why I've built things in certain ways. Firstly, I learned how to build Haystack from that 150 lines of Python, so I wanted to provide my own version of that for the next person to come along and build their version. And then I followed that up with another blog post, which is basically almost a copy and paste of my implementation of using Haystack in Culttt. It ends up being super simple. You basically provide a module with a few configuration things, and that's it, really: choose which fields you want to index, choose the storage implementation. There are a few different ways of doing things; it's not particularly opinionated in how you do things. One of the things is that, because the state of Haystack lives inside the application, when you deploy, you're obviously going to lose that state. So you can either deserialize the state from a file system somewhere (it doesn't matter where it is; when Haystack starts, you can slurp up that index from somewhere), or you can rebuild the index asynchronously. In that case, for the first 30 seconds after the application starts, Haystack wouldn't have all the data, but it would very quickly. It really depends on your preference there. You could have the index serialized somewhere, like in an S3 bucket or something, that is just automatically pulled in, but you might not want to do that. For me, I don't do that, because I don't want to have to rely on external storage somewhere or have that serialized somewhere.
I just build it every time the application starts up again. That is fine for my blog; it's not like it's getting pounded with searches every single second, so that's a fine compromise to make. But hopefully Haystack offers you flexible ways to make those decisions or make those tradeoffs based on what you want to do. One of the things that wasn't great with Elasticlunr was the way it serialized and deserialized the data. With Haystack, I'm just doing term_to_binary and then binary_to_term, which is obviously very quick, whereas Elasticlunr was writing it to a text file, but it had to go through every single
Adi_Iyengar:
right
Philip_Brown:
thing to write it and then to read it again and hydrate it back into the index.
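The two startup options discussed here (rebuild the index from the source documents, or load a previously serialized copy) can be sketched with a toy inverted index. This is a minimal illustration in Python, not Haystack's API: every function name below is hypothetical, and `pickle` merely stands in for the role that Erlang's term_to_binary and binary_to_term play in the conversation.

```python
import pickle
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real engine would also stem and drop stop words."""
    return text.lower().split()


def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents.
docs = {
    1: "Building a full-text search engine in Elixir",
    2: "Adding a table of contents to a blog",
}

# Option 1: rebuild from the source documents on every start.
index = build_index(docs)

# Option 2: dump the whole structure to bytes once, then load it on start.
blob = pickle.dumps(index)
restored = pickle.loads(blob)
```

Either path ends with the same in-memory structure; the choice is between paying the rebuild cost at every start and depending on wherever the serialized bytes are stored.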
Adi_Iyengar:
Makes sense. The more I'm looking at this post, you have a very cool LiveView example: there's an event for search and, say, articles, right? That's what you have here as an example, and how you can search for articles in the LiveView itself. I think it's a good example. If you worry about indexing on start, or there are a lot of things to index, LiveView is great: in mount you can dispatch a separate process that does the indexing for you, right? Yeah, sure, the search won't be ready until the indexing is done, but again, LiveView can show the search box when the indexing is
Philip_Brown:
Thank you.
Adi_Iyengar:
done.
Philip_Brown:
Thank you.
Adi_Iyengar:
That's the beauty of live view too. So
Philip_Brown:
Yeah.
Adi_Iyengar:
yeah, that is really cool.
Philip_Brown:
And it doesn't index as part of a transaction, so it's not nothing or everything. You know, say the first article that's indexed could be the first thing that you, you know, you just
Adi_Iyengar:
That's
Philip_Brown:
have
Adi_Iyengar:
a good
Philip_Brown:
a
Adi_Iyengar:
point.
Philip_Brown:
sub, you just have
Adi_Iyengar:
Yeah.
Philip_Brown:
a subset of the results until all of the, all of the data is indexed. So
Adi_Iyengar:
All
Philip_Brown:
it's
Adi_Iyengar:
right.
Philip_Brown:
not like, unless you had like a hard requirement where you needed to, to, you know, to provide
Adi_Iyengar:
Yeah.
Philip_Brown:
like the full results, then you could... I mean, there are even ways of dealing with that. You know, you could just warm up the service before it actually switches over. There's an endless number of ways to kind
Adi_Iyengar:
Alright.
Philip_Brown:
of deal
Adi_Iyengar:
Yeah.
Philip_Brown:
with this problem. It like, yeah,
Adi_Iyengar:
Yeah.
Philip_Brown:
like, the thing I wanted to focus on was making it quick.
Adi_Iyengar:
Yeah.
Philip_Brown:
And then everything else is, well, it depends on your specific requirements, or your choice of deployment, or, yeah, it's entirely up to you. But it's quick.
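The point made in this exchange, that indexing is not all-or-nothing and searches simply return a subset until every document is indexed, can be sketched as follows. This is a hedged Python illustration with hypothetical names, not Haystack or LiveView code; a background thread plays the part of the separate indexing process mentioned above.

```python
import threading


class IncrementalIndex:
    """Toy index where each document becomes searchable as soon as it is added."""

    def __init__(self):
        self._terms = {}
        self._lock = threading.Lock()

    def add(self, doc_id, text):
        # Each add is independent: there is no all-or-nothing transaction.
        with self._lock:
            for term in text.lower().split():
                self._terms.setdefault(term, set()).add(doc_id)

    def search(self, term):
        # Return a copy so callers never see the set change underneath them.
        with self._lock:
            return set(self._terms.get(term.lower(), set()))


def index_all(index, docs):
    """Add every (doc_id, text) pair to the index, one document at a time."""
    for doc_id, text in docs:
        index.add(doc_id, text)


# Hypothetical sample documents.
docs = [(1, "elixir search"), (2, "elixir liveview"), (3, "search engine")]

index = IncrementalIndex()
index.add(*docs[0])
partial = index.search("elixir")  # only the first document is indexed so far

# Index the rest in the background, as an application might do at startup.
worker = threading.Thread(target=index_all, args=(index, docs[1:]))
worker.start()
worker.join()
complete = index.search("elixir")
```

A search issued before the worker finishes sees whatever has been added so far, which is the "subset of the results" behaviour described in the conversation.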
Allen_Wyma:
What's next, the next milestone for this? Or is it kind of feature-complete and that's it?
Philip_Brown:
Yeah, so it's mostly just extending, providing more implementations. You know, I probably won't implement any more storage backends, because I don't need them. But if you wanted to have a Redis backend, you know, if somebody wanted to build one and then contribute it to the GitHub organization, great, but I'm probably not going to build that. But then, yeah, more types of expressions, more implementations of the things that I've kind of left open to be extended but haven't necessarily provided a comprehensive list of implementations for yet. A couple of months ago, I was playing with locality-sensitive hashing, so I think that'd be really cool. I built a MinHash implementation; it's private at the minute. I kind of just built it as, oh yeah, this is really interesting, and I haven't done anything with it. And so I think that could be something that I'd like to do. And, yeah, with Nx and Axon, to do semantic search as well. And I think that's incredible, really, when you compare Elixir to other communities, that you could have a search engine that offers semantic search built entirely in Elixir. Who else has that? Nobody, I don't think.
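For readers unfamiliar with MinHash, the signature step of locality-sensitive hashing mentioned above can be sketched briefly. This is a generic textbook-style illustration in Python, not the private implementation described in the conversation; the function names, the shingle size, and the use of seeded SHA-1 as the hash family are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def seeded_hash(seed, item):
    """Derive one of many hash functions by prefixing the item with a seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over all items."""
    return [min(seeded_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """The fraction of matching positions approximates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical sample texts.
a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox jumps over the lazy dog")
c = shingles("elixir is a dynamic functional programming language")

sig_a = minhash_signature(a)
sig_b = minhash_signature(b)
sig_c = minhash_signature(c)
```

Comparing fixed-length signatures is much cheaper than comparing full shingle sets; a complete LSH scheme would additionally split each signature into bands and bucket documents by band, which is not shown here.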
Adi_Iyengar:
Right? And you can also pretty easily do it without
Philip_Brown:
Yeah.
Adi_Iyengar:
like having to write a lot of code. Yeah, that's a good point.
Philip_Brown:
Yeah. And it's completely encapsulated in Elixir. No external
Adi_Iyengar:
Right.
Philip_Brown:
third-party dependencies, no extra infrastructure. You could run it on your computer, you could run it on a server somewhere else; it's exactly the same. I think for me, that's one of the most compelling things about Elixir: it's so well integrated that you don't need the headache of, you know, a huge number of other services that you need to deploy or manage or interface with to offer this kind of functionality. To me, it's insane. And yeah, that was one of the strong motivations for me wanting to build Haystack, so I could build more projects that were basically, you know, almost a vanilla Phoenix application but that could do these incredible things when you combine, like, Membrane or Broadway, or machine learning with Nx and Axon and Bumblebee, with Haystack to do search. Yeah, it's amazing. I love it.
Adi_Iyengar:
Yeah, I think with a lot of machine learning, though, you might have to start supporting other storage, other than the binary term storage, as well. Like
Philip_Brown:
Yeah.
Adi_Iyengar:
doing that in memory would definitely kill
Philip_Brown:
Thank you. Thank you.
Adi_Iyengar:
the Elixir container. But yeah, I think you have the kind of structure to support that, so that's really cool.
Philip_Brown:
Yeah, I think it's one of those things where, I felt with those decisions, you try and second-guess everything that should be extendable, and you're never really going to guess everything right.
Adi_Iyengar:
Great.
Philip_Brown:
You really need people to use it and then for it to settle down. So I imagine there will be a period of time where the API is a bit volatile, and then it'll settle down in the future. It's one of those things where actually, you know, I provided this way to extend, but nobody uses it, so that doesn't need to be there. Or I haven't provided a way to extend this core part of it, which needs to evolve to allow that. So yeah, I imagine there are at least, you know, multiple things that I've probably overlooked or thought, well, this will never need to be extended, so I'll just provide this naive implementation, but then
Adi_Iyengar:
Yeah,
Philip_Brown:
yeah thanks a lot
Adi_Iyengar:
totally.
Philip_Brown:
for the change.
Adi_Iyengar:
I mean, the primary goal of this library was for your use case. And
Philip_Brown:
Yeah.
Adi_Iyengar:
that's what's always good to default to. Like, let me make sure it works for what I'm building it for instead of what I think it will become. So that
Philip_Brown:
Yeah, exactly.
Adi_Iyengar:
totally makes sense. Yeah. That's very cool. I guess you mentioned Nx and other stuff. I think that's a good segue to talk a little bit about Prise. You mentioned it at the beginning, but do you want to elaborate on what it's about? You mentioned that you used some machine learning and built it entirely on Phoenix LiveView. Yeah, it sounds very interesting.
Philip_Brown:
Yes, so I'm an Elixir contractor, and so in between contracts, you know, you either have time to do other things or you choose to just not do any more contracts for a period of time. It kind of gives you that flexibility to work on your own projects as well. And so in, like, December '21, I think, I finished the contract I was working on. In January '22 I started working on Prise. So Prise was a productivity application that uses machine learning to basically understand the things you need to do. And so, you know, most people who I speak to have multiple sources of things that they need to do. Most people have at least a calendar and a to-do list, so that's already two things. But quite a lot of people have, you know, multiple to-do lists or multiple things. So if you're a contractor, you probably get invited to every client's, like, to-do or project management application, plus like
Adi_Iyengar:
Yeah.
Philip_Brown:
calendar, plus email, plus, you know, you have all these sources. And so Prise is a way to basically aggregate all that information. Prise then runs a machine learning model to understand the intricacies of the things that you have to do. So, you know, you could have a task which is "record Elixir Mix interview", which is one type of task; you could have a task that is "build Haystack", which is very different to recording an interview; or you could have another task, which is "send invoice to client", and that's very different again. And so how you structure your work, what you choose to work on in any given moment, is very important to your productivity. You know, you want to have big blocks of time for deep work when you've got a lot of energy. Or, you know, you've got some time at the end of the day, and so you don't really want to get into a big task, but you could maybe knock off sending invoices or paying expenses or things like that. And so Prise understands everything that you need to do and then can offer recommendations and advice on what you need to do, or how you need to do things, or that you need to speak to this person, you know, things like that. And so, yeah, I started in January '22. It went live in March 2022, so it was three months. It was built with Elixir, Phoenix, LiveView, and for the machine learning stuff, it was Nx and Axon. I was originally toying with the idea of building it with TensorFlow, but that meant I would have had to deploy Python. TensorFlow offers TF Serving, which is an image that you can use to serve models. But then, you know, that's a layer of complication, because now I have to have that running somewhere and have to make sure it can speak to my application.
I need to get data back and forth, and, you know, all of these things aren't completely insurmountable. Obviously it's fairly routine to do these things, but it's another thing to maintain. It's another thing to set up and deploy. It's another thing to go wrong. And so when Nx and Axon were announced, I was just like, oh my god, this is perfect. This is exactly what I need. And so, yeah, as soon as it was available, I started playing with it to build Prise. And I was fairly early, or one of the earliest people, I imagine, to have something running in production with Nx and Axon. But yeah, that kind of links back to Haystack in a way, I guess. The reason I was able to do Prise, to build and deploy the whole thing in three months, and to build this application that is essentially competing with companies that have raised 10-plus million dollars... you know, if you look at who it's competing against, the companies are like 20, 30, 40, 50 people. And obviously, Prise pales in comparison to those applications. But I was able to deploy something that was able to do a set of functionality related to that. And I do attribute a lot of that to the fact that I chose Elixir. So if I was to build a feature that touched machine learning, back end, front end, I'm in one codebase. I can just take a slice across the entire application. I'm not having to build and deploy a service in Python somewhere, modify the back end somewhere, modify a front end somewhere else. And yeah, the fact that I was able to do Prise, a huge amount of it has to be attributed to the fact that I built it in Elixir.
Adi_Iyengar:
That's really cool. I mean, I'm looking at your website right now, and I want to give a specific mention to the auto-categorization
Philip_Brown:
Yeah.
Adi_Iyengar:
filters and smart recommendations. It's really cool, especially, I think, integrating with all these ticket management systems. And nowadays I know a lot of people track, well, my team doesn't do it, I wish they did, but I track the estimate against the actual time, and then being able to run analytics on what part of the system is less deterministic. And stuff like
Philip_Brown:
Yeah.
Adi_Iyengar:
that, this really opens up a lot in terms of machine learning, and it can be super useful for businesses. But even with this, it's really cool. All the features you managed to build in three months, it's crazy. The website looks awesome. Yeah. I mean, no one can even tell it's, like, PETAL stack. It looks like an actual, like, super professional
Philip_Brown:
Thank you. Thank you.
Adi_Iyengar:
site. I don't know if this site is also PETAL stack. It looks like it is, but yeah, it's really cool.
Philip_Brown:
Yeah, so it was just a part of the single people. But
Adi_Iyengar:
Nice.
Philip_Brown:
yeah, pretty much every person that I speak to about it is kind of like, well, how much money have you raised or like how many people are on your team? And I was just like, well, I haven't raised any money and it's just me.
Adi_Iyengar:
That's really awesome. And again, that's, friends, the power of Elixir. Every startup I advise, I tell them to use Elixir, Phoenix, LiveView for this specific reason. It's just so easy to spin up a prototype, and a prototype that will last you a while, and, like Ruby
Philip_Brown:
Yeah.
Adi_Iyengar:
on Rails, something that's scalable and easy to extend upon as well. Really cool. And then again, I'm gonna go back to the previous one. I'm gonna go back to the
Philip_Brown:
Yeah,
Adi_Iyengar:
previous
Philip_Brown:
exactly.
Adi_Iyengar:
one.
Philip_Brown:
Yeah.
Adi_Iyengar:
Awesome. Allen, do you have any other thoughts?
Allen_Wyma:
You did the design yourself?
Philip_Brown:
Yeah, I did everything like all the design,
Allen_Wyma:
Mm-hmm.
Philip_Brown:
all the code, every single word that is written on the website and on the blog is me.
Adi_Iyengar:
That's very
Allen_Wyma:
Oh,
Adi_Iyengar:
cool.
Allen_Wyma:
you got some skills, huh? You're a one-man band.
Philip_Brown:
Yeah.
Allen_Wyma:
Especially the copy is always difficult, especially if you want to use it for SEO.
Philip_Brown:
Yeah, so that was many iterations, and it probably, you know, pales in comparison to if someone was doing that as their full-time expertise. But yeah, so I stopped doing Prise in January, and I'm just letting it run in the background, but I will shut it down. I kind of hit the crossroads where, for me to compete in that market, I basically had to raise money, because it wasn't manageable for me to do everything. You know, it's one thing for me to build the application and then to build the website. But then it's a full-time job to get traffic to the website, to speak to customers, to do all the things that are outside of the product as well. And so I just didn't have time to do everything.
Adi_Iyengar:
Yeah, this is really cool. It's really commendable design. This is really awesome.
Philip_Brown:
Thank you.
Adi_Iyengar:
I guess if we don't have anything else, we can transition to picks. Allen?
Allen_Wyma:
So my pick is I got a new headset just for gaming, because now I've got a couple of Steam friends, finally, because Adi doesn't want to be my Steam friend. But yeah, because sometimes I get phone calls during a match, it's always difficult to pause the game or stop or whatever, so I got this one. It's called the Nova Pro Wireless. It has a dongle that comes with it, the 2.4 GHz signal range, typical USB thing, but it can also connect to, whatever. So it could be stereo, could be, in this case, a phone for me. And it's super helpful because sometimes, like I said, you get a call during a match and you can easily answer the call and talk and don't have to stop and run and look for your phone, etc. So super, super nice. Sounds really great. Bad part: wicked expensive. I think it's like 250 bucks or something. It's crazy. But it sounds great, though. It's got noise cancellation, everything else. So that's kind of my pick.
Adi_Iyengar:
That's awesome. Yeah, I always love hearing about gaming headphones. I end up getting non-gaming ones because I just can't justify spending that much money for gaming for myself. But yeah, I'll add this to the list of headphones I'm not going to buy. Philip, what are your picks for this week?
Philip_Brown:
Yeah, so the first one is Richard Taylor wrote a blog post on MRSK. I'm sure you've probably seen DHH's deployment tool. So Richard wrote a blog post on deploying Elixir applications using MRSK, with, like, the ability to cluster, deploy to multiple clouds, have a service running across multiple clouds, multiple locations. And so you can kind of think of it as "build your own fly.io". It's not quite the same; obviously, having an integrated service like Fly manages the entire thing for you and provides a layer on top to make it easier to use. But it works if you want to run a similar setup, say, or you want to have more control or things like that. So an interesting thing that I don't do, but I'm kind of interested in doing, is, you know, when someone signs up for a service and, say, they're in Europe, giving them the option to have all of their data and all of their traffic routed through a European server, and then, you know, the same with a US company. And so this is a really interesting way to do it, and, you know, an achievable way to do it. And Richard had a really good write-up on that. So yeah, I haven't played with that yet, but it's something that is on my list of things to do once I've written some more here.
Adi_Iyengar:
Something tells me I'm gonna need it soon. It's very cool.
Philip_Brown:
Yeah.
Adi_Iyengar:
I hadn't checked
Allen_Wyma:
Yeah.
Adi_Iyengar:
this blog post but that's very
Allen_Wyma:
You
Adi_Iyengar:
interesting
Allen_Wyma:
never seen MRSK? So basically, 37signals is moving away from the cloud.
Philip_Brown:
Yeah.
Allen_Wyma:
They basically found that they already have a ton of servers as it is. And so this tool is kind of like Capistrano, except it's very specific to deploying containers to static hosts that are running Docker on them.
Adi_Iyengar:
Cool.
Allen_Wyma:
Yeah, so it's definitely interesting. You know, the cloud is wicked expensive. Like, I'm running on DigitalOcean, and I gave my stats to an AWS person. They're like, yeah, yeah, we can beat any pricing, and then they're like, oh, here's our pricing, and it was like $20 higher. Not huge, but my bill's not huge either. So I was like, well, you told me you can beat it, but you're not, and then they didn't reply back. I think they just lost face. But in any case, you know, all these clouds... I mean, basically, they have the infrastructure, because they started off doing internal hosting anyway. So they had everything already, and their servers are like 10 years old or something, but they still run great. I mean, they're actual servers, bare metal, right? So it's interesting. I like that people are getting off the cloud, not because, you know, I hate the cloud or something, but I don't like that people over-glamorize the cloud, is what I'm kind of getting at.
Philip_Brown:
Yeah, I think like
Allen_Wyma:
I think they,
Philip_Brown:
the,
Allen_Wyma:
yeah.
Philip_Brown:
if you use AWS, I think people have almost lost sight of the real benefit of AWS, which is that you can immediately scale, and scale up and scale down very, very quickly. But if you have a workload that is basically always consistently the same, then by using AWS you are overpaying for something, because, you know, you've picked a service that can offer that scalability. And so yeah, I'm really excited about this, and particularly this write-up, because, you know, Richard has gone through all the steps to figure this out specifically for an Elixir application. So he shows how to do clustering with Tailscale, everything that you would need to offer an almost fly.io experience, but not quite.
Adi_Iyengar:
Wow. And it's hard to beat the 100K startup credit from AWS though, right? Ha ha ha
Philip_Brown:
Yeah.
Allen_Wyma:
That's why they, that's why they do it, right?
Philip_Brown:
Yeah.
Allen_Wyma:
But in any case, like I said, if you're just starting off and you don't know, it's fine. But the point is, they have predictable traffic. It's actually going up, but it's very predictable. And
Adi_Iyengar:
Right.
Allen_Wyma:
so they've already figured everything out. Anyway, interesting. Speaking of that, sorry, let me actually do one more. Now that we're on this topic, I'm going to put the video of DHH talking about this into the show notes, because I think it's super interesting, and I just want people to check it out and evaluate their own IT bill and just see, you know, if you can come up with something similar. I'm betting that you probably may be able to. It depends on whether you have enough hardware, but in any case, check it out.
Adi_Iyengar:
I guess my pick for the day is a video game. It's a new one. It's called Tchia, T-C-H-I-A. It's a game inspired by New Caledonia, which is, like, northeast of Australia. It's a set of islands. It's an open-world sandbox: you can swim, climb, glide, slide, sail a boat, whatever. It's like Breath of the Wild, but better. That's gone to a shit. But yeah, it was a very surprising game. I have only played it for like three hours and I was very immersed. The music was really beautiful. The story seems like it will kind of add up and make sense. But yeah, I can only get disappointed now. So I would highly recommend anyone play it for at least three hours. So I guess if Allen and Philip have nothing else, we can call this a...
Philip_Brown:
Cool? Yeah.
Adi_Iyengar:
Awesome. Well, thanks everyone for joining us today, and we will see you next week for a new episode of Elixir Mix. Bye.
Philip_Brown:
Hey, thank you.