Apache Arrow with Matt Topol - DevOps 175

Matt Topol is a Staff Software Engineer at Voltron Data and the author of "In-Memory Analytics with Apache Arrow". He joins the show to talk about his book. He begins by explaining what Apache Arrow is, its benefits and advantages, how it works, and much more!

Transcript


Jillian (00:01.506)
Hello everybody, this is another week of Adventures in DevOps, and with me in the virtual studio, I guess, I have Will Button, Jonathan Hall, and our guest Matt or Matthew Topple. Which one do you prefer?
 
Will (00:09.794)
Hey, everyone.
 
Jonathan (00:12.07)
Hey, hey, hey.
 
Matthew Topol (00:18.261)
Matt Topol.
 
Jillian (00:19.722)
Matt, Matt Topol, okay, all right, I was wrong on both counts. That's good.
 
Matthew Topol (00:23.483)
Everybody, it always goes back and forth, topple, topple. It's topple, it never goes, you know?
 
Jillian (00:29.51)
Okay, all right, well that's good to know. Do you wanna introduce yourself to us and tell us why you're here, why you're on the show?
 
Matthew Topol (00:33.853)
Absolutely, sure. Absolutely. So yeah, obviously, Matt Topol. I'm a software engineer at Voltron Data currently. And I'm on the PMC for the Apache Arrow project. That's the project management committee. And I also wrote the book on Apache Arrow. Because there's just...
 
Will (00:57.198)
Yeah.
 
Matthew Topol (00:58.485)
There's just the one, at least currently there's just the one, In-Memory Analytics with Apache Arrow. You can get it on Amazon, you can get it from Packt. But yeah, I'm here to talk about Apache Arrow and all things related to it.
 
Jillian (01:16.01)
That's great because Apache Arrow is like my favorite data frame analytics library and I've been using it all the time and it's become like practically, I don't know, somewhere between a religion and a cult for me. I'm not sure how emotionally healthy that is, but that's, you know, that's where we're at.
 
Matthew Topol (01:30.793)
I'll take it. I'll take it.
 
Jonathan (01:30.977)
aren't cults religions too? Like can't you be both?
 
Jillian (01:33.586)
I mean, should we really be doing this on the show? Like, I don't know.
 
Will (01:37.534)
All right, hold on. Before we...
 
Matthew Topol (01:37.889)
Ha ha ha.
 
Jonathan (01:40.221)
I'm just trying to figure out how you get between two things that are the same thing.
 
Jillian (01:44.013)
Uh, that's...
 
Will (01:44.39)
Hold up, before we go down that path, we have another guest to introduce. We can't ignore our guest. Matt, who's your co-host there?
 
Jillian (01:51.498)
the cat.
 
Jonathan (01:51.547)
Who's with you there, Matt?
 
Matthew Topol (01:54.037)
This would be Penny. She's my little old lady. She's about 15 years old. And very, very pretty. She's a diva.
 
Jonathan (01:55.933)
Penny?
 
Will (01:56.194)
Penny.
 
Jillian (02:01.11)
She's very cute.
 
Jonathan (02:01.373)
Alright.
 
Jillian (02:06.483)
as she should be.
 
Matthew Topol (02:08.821)
I love my little kitty. I've had her for almost that entire time. We took her in off the street right before I moved up to Connecticut for work right out of college.
 
Will (02:21.718)
right on.
 
Matthew Topol (02:23.373)
And the vet estimated she was roughly a year and a half old at that time. And I've had her ever since, which is actually really nice because when you get your first job right out of college or the first time you're living completely on your own without any roommates or anything, it's nice to have a living creature that relies on you when you come home to keep you company.
 
Jonathan (02:28.626)
Well done.
 
Jonathan (02:43.293)
Mmm.
 
Will (02:46.198)
Right, for sure. I mean, it's totally that anchor point that you need.
 
Matthew Topol (02:51.141)
Oh yeah, especially because they cuddle.
 
Will (02:53.57)
Yeah, right. Which, they can't say the same for humans.
 
Jonathan (02:53.777)
Hehehehe
 
Matthew Topol (02:58.833)
Some humans will cuddle. True. That's true.
 
Jillian (02:58.982)
No, might not want that from your human roommates.
 
Will (03:01.59)
Right? There's rules that get involved in that.
 
Jillian (03:03.862)
depends.
 
Matthew Topol (03:07.445)
Hahaha
 
Jillian (03:07.53)
I know, just like restraining orders, you know, once you get up in there.
 
Matthew Topol (03:12.545)
Hahaha
 
Jonathan (03:13.445)
Is this from experience, Jillian?
 
Jillian (03:15.782)
Oh, you never know. You never know. I got some pretty crazy life stories. Although I don't think I'd want to live alone again, which just makes you like... I don't know. Must be a glutton for punishment over here.
 
Jonathan (03:16.998)
Ruh-huh.
 
Will (03:21.023)
Pfft!
 
Matthew Topol (03:27.169)
Hehehe.
 
Jonathan (03:30.501)
I see that your co-guest has abandoned us and left, so I think it's time to get back to Arrow. I would like to hear, I mean, I've already talked to Matt and I know that he's a real straight shooter. Cue drum, whatever, you know, Arrow jokes here. There we go, waiting for it.
 
Jillian (03:30.839)
Alright, but back to the show.
 
Will (03:32.907)
Hahaha
 
Matthew Topol (03:37.249)
Yep, yep.
 
Jillian (03:37.255)
I know.
 
Will (03:49.358)
There we go.
 
Jonathan (03:55.463)
Well, you're letting me down.
 
Jillian (03:56.066)
I don't have the sound effects, I think only Will has the sound effects and he's slack, look at him over here. He's like, just wait, I'll get it. We'll clean it up in post. There we go, that was a good one Matt, thank you.
 
Will (03:57.678)
I'm trying, I'm trying, I'm trying.
 
Jonathan (03:59.583)
Hahahaha
 
Jonathan (04:07.169)
Oh there, Matt did it for me. Great. Here we go. Man.
 
Will (04:09.294)
I got it.
 
Will (04:15.154)
Oh my God, that was awkward.
 
Matthew Topol (04:15.47)
Maybe? There! Well, that was... Anyways...
 
Jonathan (04:16.741)
Finally. So I was going to say, I've interviewed Matt on a previous podcast, shameless self-promotion, Cup o' Go is a great podcast to listen to. So I know a little bit about Apache Arrow already, but I'm going to assume that people other than Jillian in our audience don't. Would you start by maybe introducing what Apache Arrow is, what problem it solves, and that sort of thing?
 
Matthew Topol (04:43.817)
Yeah, yeah, sure. So Apache Arrow is an in-memory, column-oriented data format. The intent there is that because the representation of the data in memory is identical to the way it's represented on the wire or anything like that, you can pass data back and forth through systems without the cost of serialization and deserialization. You can pass it between runtimes in the same process with zero copy. And the idea is that you can reduce the number of copies that are necessary to process data,

while the data, since it's column-oriented, is also very, very efficient for lots of analytical and computational workflows. And there are implementations of the Arrow format in... insert your language here. There's probably an Arrow implementation for it. Off the top of my head, there's Go, C++, C, Ruby, Rust, Python, Java, JavaScript, TypeScript, Julia, probably others I'm missing. But, you know, there's an R implementation. And because there's so many implementations in all these languages, and because the representation is identical in memory no matter which language you're using,
 
Jillian (06:10.05)
R.
 
Matthew Topol (06:26.237)
it's extremely useful for data interchange, computation, and interactive-speed performance. And so you end up with a lot of systems that are using Arrow both as their internal memory format for computation but also as just the data interchange.
 
And the project, the Arrow project itself, has expanded beyond just that representation. You've got an RPC protocol, Arrow Flight. You've got an extension on that RPC protocol, Apache Arrow Flight SQL, which is additional definitions for that protocol that are centered around using databases in SQL situations.

There's ADBC, Arrow Database Connectivity. Basically, like ODBC, only it's Arrow-native. And you get enormous performance gains over using JDBC or ODBC by using ADBC.

And we've got different drivers implemented for ADBC, such as Snowflake, which is the most recent one. There's obviously an Arrow Flight driver. There are Postgres and SQLite drivers. And we're also working on other drivers for different databases for ADBC there, which is itself just kind of a C interface with a lot of drivers implemented, so that you can use it from anywhere.
 
Matthew Topol (08:01.073)
Yeah, that's kind of it in a nutshell. Okay, it's a nutshell.
 
Will (08:03.818)
So whenever you talk about it being in memory and these different libraries.
 
Will (08:13.515)
Do you share just like a pointer to the same memory location, or do you actually share the data, so each one that wants it gets a different copy of that data?
 
Matthew Topol (08:18.526)
Yeah.
 
Matthew Topol (08:22.206)
Cool.
 
Matthew Topol (08:25.585)
There's two different primary ways that the data is shared. There is an IPC protocol for going across the wire or between processes, which is essentially a very small Flatbuffers message with some metadata, followed by the raw buffers, just the bytes as they are in memory.
 
Will (08:47.893)
Okay.
 
Matthew Topol (08:52.169)
And the benefit there being that when something receives the IPC protocol, you don't have to copy or deserialize those body buffers. Whatever you receive that IPC protocol on, the bytes as they are, are what you want. And there's that tiny Flatbuffers message with the metadata to tell you how to interpret it.
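
Editor's sketch: the layout described above, a small metadata message followed by the raw column bytes, can be imitated in plain Python. This is not the Arrow IPC format, which uses a Flatbuffers header and its own padding rules. The JSON header and the helper names `encode_message` and `decode_message` are assumptions made up for illustration. The only point is that the receiver reinterprets the bytes it was handed instead of rebuilding the values.

```python
import json
import struct
from array import array


def encode_message(columns):
    """Pack columns as: 4-byte header length, JSON header, raw column bytes."""
    meta, bodies, offset = [], [], 0
    for name, values in columns.items():
        raw = values.tobytes()  # the bytes exactly as they sit in memory
        meta.append({"name": name, "typecode": values.typecode,
                     "offset": offset, "length": len(raw)})
        bodies.append(raw)
        offset += len(raw)
    header = json.dumps(meta).encode("utf-8")
    return struct.pack("<I", len(header)) + header + b"".join(bodies)


def decode_message(message):
    """Return {name: memoryview}; each view is a window onto `message`."""
    view = memoryview(message)
    (header_len,) = struct.unpack_from("<I", view, 0)
    meta = json.loads(bytes(view[4:4 + header_len]))  # only the header is parsed
    body = view[4 + header_len:]
    decoded = {}
    for col in meta:
        raw = body[col["offset"]:col["offset"] + col["length"]]
        decoded[col["name"]] = raw.cast(col["typecode"])  # reinterpret, no copy
    return decoded


message = encode_message({
    "price": array("d", [1.5, 2.5, 4.0]),
    "qty": array("q", [10, 20, 30]),
})
decoded = decode_message(message)
```

Calling `decoded["price"].tolist()` gives back the three prices. The small header is the only part that gets parsed; the column bodies are used as received.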
 
The other way of passing data back and forth is what's called the C data interface, which is a small C struct that effectively contains that same metadata, you know, the format, you know, is this a string array, is this an int64 or whatever. It contains information such as, you know, the number of buffers that are there, the number of nulls in the column,
 
total length of the column, and then just the raw pointers to the memory. You know, if there are two buffers, then you have two pointers that point to the raw body of the body buffers there, and that's the raw memory. It works the same way for complex columns like nested struct columns or list columns because Arrow has a very thorough type system.
 
Will (09:53.378)
right on.
 
Matthew Topol (10:16.301)
And so in that case, you also have pointers to the children arrays in that same struct, so that when you pass data within the same process between different runtimes, you're just passing pointers. And no copies.
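
Editor's sketch: a rough picture of that in-process hand-off, using only Python's standard library. `ColumnHandle` is a hypothetical, heavily simplified stand-in for the C struct described above; the real Arrow C data interface carries more, including a list of buffers, child arrays, and a release callback. The consumer reads through the pointer, so both sides look at the same memory.

```python
import ctypes
from array import array


class ColumnHandle(ctypes.Structure):
    """Hypothetical, simplified handle: a little metadata plus a raw pointer."""
    _fields_ = [
        ("length", ctypes.c_int64),      # number of values in the column
        ("null_count", ctypes.c_int64),  # number of nulls (none in this sketch)
        ("data", ctypes.c_void_p),       # address of the value buffer
    ]


# Producer: owns the memory for a column of 64-bit integers.
column = array("q", [1, 2, 3, 4])
address, length = column.buffer_info()
handle = ColumnHandle(length=length, null_count=0, data=address)

# Consumer: wraps the same address as a C array. Nothing is copied.
# The producer must keep `column` alive for as long as the consumer uses it.
values = (ctypes.c_int64 * handle.length).from_address(handle.data)
before = list(values)

# Mutating the producer's buffer here is only to show the memory is shared.
column[0] = 99
after = list(values)
```

The change made through `column` is visible through `values`, because only an address and some metadata crossed the boundary.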
 
Will (10:32.565)
That's cool.
 
Jillian (10:35.398)
It is very cool. Yeah, I really appreciate that you guys did not mess around with the everybody-is-getting-the-same-data-all-the-time thing and just picked that as, like, I don't know, your line in the sand or something, because I work on a lot of interdisciplinary teams where everybody's using different languages and different things. And Arrow is the only library that I've really found where, no matter what, you know, we're using Python, C++, R, usually it's some mix of those in bioinformatics, everybody really does get the exact same data. And it's very nice because that does not,
 
Matthew Topol (10:59.977)
Mm-hmm.
 
Jillian (11:05.632)
That does not typically happen, right? You'll get these weird little idiosyncrasies between data sets depending on whether people are accessing them in, you know, Python or R, and then people start to write these libraries on top to make sure, you know, okay, when you read the data in with R you need to make sure that you do this checking, and when you read it in with Python you need to do this. But with Arrow, all that has gone away.
 
Matthew Topol (11:13.803)
Yeah.
 
Matthew Topol (11:23.873)
Yep.
 
Yeah, I mean, the project itself, most of the implementations are in a single monorepo, which definitely helps a lot in a lot of the discussions and keeping things as close between the implementations as possible. Like there's also an effort made in a lot of development to try to keep the interfaces, the actual APIs in the libraries.
 
at least similar. Obviously they're going to be more idiosyncratic for the individual languages, but we try to keep the APIs similar between the languages just so that if you're jumping from, say, PyArrow to the C++ library or the Go library, you can still kind of orient yourself in what the functions do and stuff because...
 
things are named similarly, the APIs are similar in the way they interact with data and stuff like that, to kind of keep that ease of use between the different languages.
 
Jillian (12:28.478)
Yeah, I think you guys definitely accomplished that.
 
Matthew Topol (12:31.047)
The Arrow project started back in 2016-ish. Actually one of the co-creators of Arrow was Wes McKinney who created Pandas.
 
Will (12:41.999)
Oh wow, right on.
 
Will (12:48.706)
So what are the typical, what are like the prime use cases for Arrow?
 
Matthew Topol (12:55.625)
of anything to do with data. I mean, I mean.
 
Will (13:00.839)
Is there like a barrier to entry? If you've got X amount of data, it may or may not be worth it, but once you pass this threshold it's kind of a no-brainer?
 
Matthew Topol (13:08.897)
So, yeah.
 
Matthew Topol (13:13.333)
So because Arrow is column oriented, if your primary workflow is very, very row oriented,
 
then Arrow may not be as beneficial as you get to larger and larger data sizes, if you need to access, you know, lots of columns in a very row-oriented way. Most analytical workflows are accessing smaller numbers of columns, subsets of columns, and are going to benefit highly from
 
column-oriented access for vectorized processes. And that's where Arrow really shines, especially for any kind of ETL or data transfer or computations, because the data's in that column-oriented format, even in memory. It means that you can use SIMD and other vectorization very, very efficiently with Arrow data.
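
Editor's sketch: the difference between the two layouts for a query that touches one column out of several. This is plain Python with made-up data, not the Arrow API, and Python itself will not use SIMD here. The sketch only shows which memory each approach has to walk; the names `mean_price_from_rows` and `mean_price_from_columns` are hypothetical.

```python
from array import array

# Row-oriented: each record keeps all of its fields together.
rows = [
    ("AAA", 10.0, 100),
    ("BBB", 20.0, 200),
    ("CCC", 30.0, 300),
]

# Column-oriented: each field is one contiguous, same-typed buffer.
columns = {
    "ticker": ["AAA", "BBB", "CCC"],
    "price": array("d", [10.0, 20.0, 30.0]),
    "volume": array("q", [100, 200, 300]),
}


def mean_price_from_rows(rows):
    # Visits every record and steps over fields the query never uses.
    return sum(price for _ticker, price, _volume in rows) / len(rows)


def mean_price_from_columns(columns):
    # Touches exactly one buffer of packed 8-byte floats.
    prices = columns["price"]
    return sum(prices) / len(prices)


row_result = mean_price_from_rows(rows)
column_result = mean_price_from_columns(columns)
```

Both functions return the same mean. The columnar version reads a single packed buffer, which is the kind of layout vectorized code can stream through.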
 
Matthew Topol (14:27.161)
And so, you know, Meta released their open source computation engine Velox, which we're doing a lot of work with, and the internal memory format for Velox is effectively Arrow. DuckDB, the internal memory representation for DuckDB is...
 
effectively Arrow. It's not quite identical. There are some things that they do slightly differently because they found that certain layouts give them optimal performance and things like that. But for the most part, it's mostly identical to Arrow, and we're actually working with them in a lot of ways to try to find those situations where they were like, this representation is more efficient for this, and trying to add those to the Arrow spec

to reduce that kind of fracturing and keep Arrow relevant in all these spaces. And then you get the fact that it's extremely performant, not just for the internal computational aspects, but for getting the data in and out. DuckDB does provide an interface using the Arrow C data interface. You can get Arrow data

directly out of DuckDB and avoid the copy because it will give you the pointers to the data that it allocated as the result data.
 
Will (16:00.782)
I gotcha. Right on.
 
Jillian (16:03.23)
Yeah, even on the row data though, I don't know, go ahead. I was gonna say even on the row data, I had like a data set that was like three billion rows by I don't know, like 12 or 15 columns and the whole data pipeline ran in like a couple of hours on serverless data infrastructure too. So it was like under the 16 gigs of memory. So my clients are cheap.
 
Matthew Topol (16:03.297)
And so...
 
Jonathan (16:03.536)
I went.
 
Matthew Topol (16:19.177)
Oh yeah!
 
Matthew Topol (16:25.533)
Yeah, absolutely. Absolutely. I mean, when I say a very row-oriented access pattern, I mean like explicitly like you are looking at the value in every column row by row as the way you're accessing the data, which is actually fairly rare. Most workflows will work exactly the way you want them to and be better.
 
in a column-oriented fashion in most cases. There are cases where that row-oriented accessing is important, but they're actually fewer than you think.
 
Jonathan (17:08.677)
So you've talked about databases here, and that's what I wanted to ask about. Like when Will asked, where would you use Apache Arrow? And you said anywhere you use data. "Base" is the word that springs to mind. Databases: everybody uses a database. I'm assuming you need specific databases that are designed to work with Apache Arrow. You can't just like install a plugin and suddenly use Apache Arrow to query Postgres or something like that.
 
Is that accurate?
 
Matthew Topol (17:40.401)
It's actually hilarious that you use that as the example. There is work being done to create a Postgres plugin to provide an Arrow Flight SQL interface to Postgres.
 
Jonathan (17:44.046)
Okay.
 
Jonathan (17:54.425)
Oh, that's awesome. Nice.
 
Jillian (17:56.694)
That would be very cool. Then I'd have vector databases and I'd have Arrow and I'd just have all the things that I wanted in Postgres.
 
Matthew Topol (18:03.101)
But in general, you're right. In general, you're right. A database needs to have the ability to ingest the Arrow data or export the Arrow data to actually leverage it because it is itself, it's a format, it's a memory format. The database has to know what you're talking about to give it the data. But above and beyond just databases, services,
 
Jonathan (18:05.764)
Okay.
 
Matthew Topol (18:30.773)
processes, you know, Arrow Flight is an RPC framework. So if you're passing data around or creating a data service, Arrow is an exceptional way to do that as your memory format because of the efficiency, because of the fact that you're reducing the number of copies as you pass that data around. You don't need to serialize and deserialize the data as it comes in and out.
 
Now, if bandwidth is your problem there, the IPC protocol does have a compression option. And that's where you kind of have to play with settings and the idea of going, you know, you get the benefits when you pass data around by not having to serialize or deserialize, but...

if you need the data to be smaller to pass it because of bandwidth reasons, you can compress it to get a faster network transfer at the cost of the compress and decompress at the ends. Now, one of the benefits is that most of the Arrow implementations will actually look at the size of the compressed buffer, and if it's not actually smaller, it'll throw it out and just use the
 
Will (19:53.014)
Well, that's clever.
 
Matthew Topol (19:54.637)
because the protocol allows the compression at the buffer level. And you can have compressed buffers and uncompressed buffers alongside one another.
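
Editor's sketch: the per-buffer decision described above, in plain Python. zlib is used only because it ships with Python; Arrow's IPC compression uses other codecs such as LZ4 and Zstandard, and the tagging scheme and the names `maybe_compress` and `restore` are assumptions for illustration. Each buffer is compressed on its own, and the compressed form is kept only when it is actually smaller.

```python
import zlib


def maybe_compress(buffer):
    """Compress one buffer, but keep the original if compression does not help."""
    compressed = zlib.compress(buffer)
    if len(compressed) < len(buffer):
        return ("zlib", compressed)
    return ("raw", buffer)  # throw the compressed attempt away


def restore(tag, payload):
    """Undo maybe_compress for a single buffer."""
    return zlib.decompress(payload) if tag == "zlib" else payload


repetitive = bytes(1000)   # 1000 zero bytes: compresses very well
tiny = b"\x01\x02\x03"     # too small to benefit from compression

# One message can carry compressed and uncompressed buffers side by side.
message = [maybe_compress(repetitive), maybe_compress(tiny)]
tags = [tag for tag, _payload in message]
round_trip = [restore(tag, payload) for tag, payload in message]
```

Here the zero-filled buffer is sent compressed and the three-byte buffer is sent as-is, and both come back unchanged on the receiving side.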
 
Matthew Topol (20:08.229)
And so you can kind of get the best of both worlds where possible. You know, obviously it's pretty dependent on the implementation to take advantage of it. You know, and that's kind of the name of the game in a lot of the Arrow spec things, that, you know, the more active implementations are more complete. And if you look at the Arrow documentation, the Arrow site, there's actually a full table
 
Jonathan (20:17.333)
Mm-hmm.
 
Matthew Topol (20:37.961)
kind of going over the main implementations and what features of the Arrow spec are or are not implemented in all of them.
 
Matthew Topol (20:48.589)
The most complete ones are the C++, the Java, the Go, and the Rust. They are the most complete. PyArrow is super complete because it's just a thin veneer on the... The Python is just a thin veneer on the C++ library. There's two different types of implementations there. You have the native implementations.
 
Matthew Topol (21:17.737)
And then you have the implementations that are just kind of wrappers around the C++.
 
Jillian (21:26.166)
Yeah, which I am here for, because with the Python implementation, it works really well with a library called Cython, which lets you call like kind of lower level C or C++. And with the data set that I was talking about earlier, where it was 3 billion rows, I was getting to this point where it was running for like two or three hours, and then there would just be a memory leak like somewhere, and I couldn't figure out where it was. And so instead of figuring out where it was, I was like, well, screw this, I'm just gonna write everything in Cython, because that's...
 
Matthew Topol (21:36.009)
Yep.
 
Jillian (21:52.666)
how I solve these kinds of problems. And then the memory use of like the entire, you know, of the entire program as a whole like just, you know, just like dropped. And I was very impressed with myself over that one. So that was great. And Arrow too, of course.
 
Matthew Topol (21:58.177)
Yeah.
 
Will (22:04.067)
Yeah
 
Matthew Topol (22:04.805)
Very nice, yeah, I played a little bit with the Cython stuff. PyArrow actually uses Cython itself to do the communication between the Python and the C++ library. So it's using Cython directly there anyways. I played around with Cython a little bit. Yep. I played a little bit around with it and it can get confusing sometimes.
 
Jillian (22:20.466)
Yeah, I took a look at the PyArrow implementation. It is almost all Cython, which is really cool.
 
Matthew Topol (22:30.485)
Cython is not the easiest thing to play with, so all the power to you there.
 
Jillian (22:31.54)
does.
 
It's better than it used to be, but yeah, it is kind of, it takes some getting used to, I guess. But it's great if you're interfacing with other libraries as well. I work with a lot of other file formats, which are maybe like 10, 20 years old, and a lot of them are like C, C++, these kind of, they're not legacy because they still exist, but they're before all kind of the nice libraries that we have now. So being able to integrate things with Cython is nice for me.
 
Matthew Topol (22:53.171)
Mm-hmm.
 
Matthew Topol (23:03.946)
Yeah.
 
Jillian (23:04.924)
I don't know about anybody else, just me.
 
Jillian (23:13.086)
I'm really curious about how you got onto an Apache project. Were you working on it before it was an Apache project? Like, I've always kind of wondered, because it sounds like this kind of very glamorous thing, like, ooh, it's an Apache project. But I have no idea how they get started, how one gets to be working on an Apache project. I'd really like to hear more about that process. It is, isn't it? Oh, no. Is it just unpaid labor, or are all my hopes and dreams about to be dashed here?
 
Jonathan (23:29.321)
Oh, it is super glamorous, I can say, as a fellow PMC on an Apache project.
 
Matthew Topol (23:41.685)
Oh, it is absolutely unpaid labor. Absolutely unpaid labor. Actually, I'm lucky and I'm extremely lucky in that my current position at Voltron Data is I work on the Arrow library. That's my day job now. I get paid to work on the Arrow library. That's what I do. So I'm very, very lucky in that respect. As far as how I got started here, I wasn't part of the Arrow inception. I learned about the project separately, entirely.
 
Will (23:41.983)
No.
 
Jillian (23:43.904)
Okay.
 
Matthew Topol (24:12.653)
The general process for something becoming an Apache project is you go through what's called an incubator and you talk with the Apache Foundation and they'll usually give you someone who is very familiar with the Apache way of doing things, and then you eventually can become a top-level Apache project in and of itself. Mostly it's about governance, it's about the way you operate the governance of the project
 
Jillian (24:17.046)
Cough cough
 
Matthew Topol (24:37.221)
is the big part of what makes something an Apache thing. But as far as how I got started in it there, before I worked at Voltron, I joined Voltron last year. Before that, I worked at a financial software company, FactSet Research Systems. And I worked on a product that was effectively just, basically just a giant vector calculator.
 
That's what I worked on. The project is basically just a giant vector calculator. I started working on the engine itself, eventually moved up to working on the application side of things, and serverization and stuff. And I had a project that I was trying to do to create visualizations. Users can put in parameters, like I want the standard deviation of the price for the last
 
200 days of the entire S&P 500 and they get a column of that. You know, and they can have all their columns and their parameters and we want to create the ability for them to create charts from that report. You know, chart this column against that column and so on and so forth. And the problem you run into is, well, at any time a user can add or remove a column. So I do not have a static schema.
 
Matthew Topol (26:00.669)
And I can have anywhere from 10 to 2,000 columns. I can have anywhere from 100 to 3.5 million rows. And at any point, user can add or change columns that will completely change all of that. And somehow, I need to be able to create interactive speed charts.
 
Matthew Topol (26:24.565)
So I went to a Hadoop conference and was just asking around, this was back in 2018 I think. I was just kind of asking around for people for ideas of what could possibly enable me to do this so that I could keep updating the data and do all these things and whatnot. And everybody told me I had too many requirements.
 
Will (26:47.43)
I would tend to agree with him.
 
Matthew Topol (26:51.111)
And of course I'm like... And my husband... And the problem is like, okay, but those are my requirements!
 
Jillian (26:51.786)
Yeah, you want too many things. Take it down a notch.
 
Will (26:58.563)
Right?
 
Matthew Topol (27:00.485)
And so I learned about Arrow at that Hadoop conference. And I also came across Apache Drill, which is...
 
Jillian (27:12.01)
Not Superset, because you keep saying charts and I keep thinking Superset.
 
Matthew Topol (27:18.089)
Superset either didn't exist yet or wasn't very popular yet at the time that this happened.
 
Jillian (27:24.83)
It might not have. I think it might have been incubating around then. I don't know.
 
Matthew Topol (27:29.503)
Drill is a distributed compute engine. If you've heard of Dremio, Dremio was built on Drill originally. And actually, the other fun story is that Arrow itself actually came out of Drill's internal vector representations.
 
Apache Drill's value vectors are what became arrow.
 
So anyways, I ended up hacking together this system that basically, I made a Python service using PyArrow that accepted an Arrow IPC table and wrote the data as a Parquet file to an HDFS cluster.
 
Jillian (28:21.066)
Very nice.
 
Matthew Topol (28:22.089)
And then I had Apache Drill running on the same nodes as the data nodes of the HDFS cluster to get the short-circuit reads.
 
So that basically what happens is you do your report, you get the compute. It writes the result set as a Parquet file. And then when you want a chart, the front end just sends a request to that service, which creates SQL from it and sends it to the Drill cluster, which queries that Parquet file and returns the results. And if they add columns or change the data at all, I just overwrite it with a new Parquet file.

The next query goes in, Drill goes, oh, the metadata says that this has been rewritten. Refresh the metadata and then run the query. And because the Drill nodes are running on the same boxes as the HDFS data nodes, it's all super fast and interactive speed. Sub-second queries, the whole nine yards.
 
And that was kind of my introduction to Arrow in the first place. From there, I was building other services using Go. And I was like, I want to use Arrow for this because it makes sense. And I was like, oh, there's a Go library for Arrow, perfect! And I started going through it and finding, oh, it's missing this feature, it's missing that feature. Oh, I'll contribute this thing, I'll contribute that thing. And I just kept doing that because, well...
 
I wanted to do it in Go and not have to rewrite the entire thing in C++. So I just kept adding to the Go library instead.
 
Jonathan (30:05.485)
Makes sense.
 
Matthew Topol (30:10.929)
And eventually they made me a committer and I became more active on the mailing lists. And I was giving talks at conferences about how we were using Arrow at FactSet. And then Packt, the book publishing company, reached out to me and was like, would you like to write a book? And I was like, wait, what?
 
Will (30:39.726)
Hahaha
 
Matthew Topol (30:44.592)
And I wrote a book.
 
Jillian (30:48.33)
That is very cool though.
 
Matthew Topol (30:52.656)
I did not expect to write a book.
 
Jonathan (30:55.401)
So tell us about that. How did the book come about? Tell us about that then you said he didn't expect to write a book. What's the story behind the book?
 
Jillian (30:56.01)
Does anybody ever really, like?
 
Matthew Topol (30:58.761)
What was that?
 
Matthew Topol (31:05.341)
I mean, I just, that was the story. I was giving talks at conferences.
 
Jonathan (31:07.785)
Okay. Well, I mean, but how did that lead to a book specifically? Did they ask you to write a book? Okay. All right.
 
Will (31:08.346)
Yeah.
 
Matthew Topol (31:13.001)
Packt, literally yes. Packt is the publisher for the book. And I got an email one day from a product manager at Packt. And they were like, we want to have a book on Arrow. Would you be willing to write a book? And at first, I thought that they were asking me to consult on a book. And I was like, yeah, sure. I'd love to consult on a book.
 
Jonathan (31:24.986)
Yeah.
 
Jonathan (31:30.412)
Awesome. Yeah.
 
Jonathan (31:36.473)
Uh-huh.
 
Matthew Topol (31:39.461)
And then I met with them and they were like, no, we want you to write a book. I'm like, oh.
 
Jonathan (31:44.125)
Ha ha ha!
 
Matthew Topol (31:46.998)
As far as I can tell, they reached out to several people in the community. I just happened to be the guy that responded.
 
Jonathan (31:50.17)
Mm-hmm.
 
Jillian (31:52.422)
Thank you.
 
Will (31:53.102)
Hahaha!
 
Jonathan (31:54.767)
Yeah. Cool. And now you're world famous among Arrow users.
 
Matthew Topol (31:55.485)
Hahaha
 
Matthew Topol (31:59.133)
I mean, the book's selling well. It's a tech book, so it's selling well for a tech book. The primary thing I noticed, shockingly, is that my general way of handling interactions and keeping people engaged is humor. I know, weird, right? And so I found that people really enjoy my presentation style,
 
Jillian (31:59.178)
Yeah, now you have the book!
 
Jonathan (32:04.653)
Yeah. Awesome.
 
Jonathan (32:21.702)
Mm-hmm.
 
Jonathan (32:26.151)
Hmm.
 
Matthew Topol (32:26.497)
which tends to involve lots of memes and jokes. And so when I was writing the book, my primary goal there was, I don't want this to be a dry book. I want this to be engaging, which means I need to write it in my voice and find some way to translate the way I talk and give presentations.
 
Jonathan (32:29.657)
Oh yeah.
 
Will (32:31.29)
Nice.
 
Jonathan (32:42.852)
Mm-hmm.
 
Matthew Topol (32:56.229)
And so there's just puns. There are puns everywhere. One of the ones I'm most proud of is a chapter called "ODBC Takes an Arrow to the Knee."
 
Jonathan (32:58.685)
Okay.
 
Nice.
 
Will (33:09.173)
Hahaha
 
Jonathan (33:09.485)
Okay.
 
Jillian (33:11.286)
Very nice.
 
Matthew Topol (33:16.41)
There's another spot in there, because I talk about using mmap and how you can leverage the Arrow IPC format and mmap to keep memory usage down in certain ways, and then I explain how that works. So I basically explain how virtual paging works, but the heading for the section before that is just "TL;DR: Computers are magic."
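The paging idea mentioned here can be sketched with Python's standard `mmap` module. This is only an illustration under stated assumptions: the file below is a raw run of fixed-width int64 values, not the Arrow IPC format, and the helper names (`write_column`, `read_value`, `demo`) are hypothetical. The point is that a mapped file can be indexed without first reading the whole thing into memory, because the operating system pages data in on demand.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<q")  # one little-endian int64 per value (assumed layout)


def write_column(path, values):
    """Write values as one contiguous fixed-width int64 buffer."""
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(v))


def read_value(path, index):
    """Read a single value through a memory map instead of loading the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages backing this offset need to be resident;
            # the OS brings them in when they are touched.
            (value,) = ITEM.unpack_from(mm, index * ITEM.size)
            return value


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "column.bin")
        write_column(path, range(1_000_000))
        return read_value(path, 765_432)
```

Because every value has the same width, the byte offset of element `i` is simply `i * 8`, which is the same property that makes fixed-width columnar buffers friendly to memory mapping.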
 
Jonathan (33:35.29)
Mm-hmm.
 
Will (33:43.17)
All right.
 
Jonathan (33:43.629)
Okay.
 
Matthew Topol (33:47.341)
And you can kind of get the tone from there. I tried as best I could to keep that fun tone throughout the book. And the other selling point is that it has examples and usage in Python, C++, and Go throughout the book, which is kind of different for tech books, which usually focus on one language. But kind of...
 
Jonathan (33:50.321)
Yeah, awesome.
 
Will (34:07.252)
Oh wow.
 
Jonathan (34:13.13)
Mm-hmm. Yeah.
 
Matthew Topol (34:16.421)
one of the important aspects of Arrow is that interoperability aspect of things. And so I wanted to have the examples in at least two, if not three, languages to kind of showcase that.
 
Jonathan (34:31.805)
That's really cool. Yeah, I like that.
 
Matthew Topol (34:35.929)
And then I went around to conferences talking about the book and giving out free copies of the book, and now I go around to conferences as part of Voltron Data, talking about the work we're doing with Arrow in general.
 
Jillian (34:55.338)
That is very cool. It's really nice when jobs kind of embrace like open source software. I think that's so much more common now than it used to be. It seemed like jobs were like, so like afraid of this, like, oh no, we can't have like our software just, you know, out on the web. How will we have any intellectual property or make money? And now it just seems to be so much more of a given of like, ah, you know, their software is out there. Let's go use it.
 
Matthew Topol (35:10.613)
Yeah.
 
Jonathan (35:13.085)
Our competitors will use our software.
 
Matthew Topol (35:19.101)
I mean, so Voltron Data itself, I mentioned that Wes McKinney was one of the co-creators of Arrow. He's also one of the co-founders of Voltron Data. And Arrow itself is a primary aspect of the startup and the work we're doing. And so a large number of the...
 
Jonathan (35:30.212)
Hmm.
 
Matthew Topol (35:45.641)
current maintainers actually do work at Voltron.

And so it's really cool, because there's that huge support of the Arrow community, and the better the Arrow community does, the more that helps out Voltron Data itself also. And so I like being on the open source side of things, where, like I said, my day job is I just get to work on the open source libraries, which makes me really, really happy.
 
Jillian (36:21.398)
That's great. That's why I tell everybody, like whoever asked me like, Oh, how do I get a job? And like, go find some open, go find like an open source software you can contribute to or start writing, start just getting your work out there.
 
Will (36:21.604)
for sure.
 
Matthew Topol (36:30.077)
Oh yeah, I mean, absolutely. I mean, if you can get your, like I said, I just started contributing stuff, new features and things to Arrow, and that's how I ended up down this whole path. You know, and exactly, that's exactly it. Find a project you like, contribute to it. You know, there's always gonna be features or things people want from it. You can find stuff for it, and...
 
Just have fun, be part of the community. And you know, like I've been, I've been working for the last, one of my other side things I've been doing for a long while now is I've been trying to grow the Arrow Go community. Because, you know, Python is your big go-to for data science nowadays. You know, when you think about data science, you don't think Go currently. I've been trying to change that a bit.
 
Jonathan (37:30.257)
Good job. I hope that you have great success there. I don't see why Go isn't used for stuff like that.
 
Matthew Topol (37:37.681)
The problem is really the lack of the widely used libraries. Really what it is. There is no real data frame thing for Go, like you have pandas or polars for Python. And I've been trying to promote Arrow as that for Go. I created a
 
Jonathan (37:41.902)
Yeah, right.
 
Jonathan (37:52.89)
Mm-hmm.
 
Jonathan (37:59.982)
Right.
 
Matthew Topol (38:05.641)
Parquet implementation in Go that interoperates with Arrow. I'm contributing to the Iceberg repo now. I'm making a Go Iceberg implementation.
 
Jillian (38:17.29)
What is Iceberg? So like I was looking, all right, we're gonna get real off on a tangent here for a minute. I was looking at their website and I was just like, what, I don't get it. Like, what is this? What is it? I don't know.
 
Matthew Topol (38:22.11)
Nah, go for it.
 
Matthew Topol (38:29.965)
So Iceberg is a table format. Basically, it is a bunch of metadata on top of other data files.
 
Jillian (38:41.59)
But I already have Parquet files. Why do I need Iceberg?
 
Matthew Topol (38:45.181)
Parquet files are immutable. If you want to update a Parquet file, you have to rewrite it, and then you lose whatever was there before. Iceberg is metadata that gives you time travel: you can see snapshots of previous versions of the table. You can do partitioning by grouping multiple Parquet files together as one table, and get highly efficient reads and lookups. You can also have Parquet files that define which rows were deleted from the table, without having to rewrite the entire table.

It's a way of getting much more efficient scans, joins, and lookups across a table that is made up of many data files, while giving you that time-travel snapshotting ability to see what the table looked like at different points in time, which for a lot of workflows is very, very important.

It also means you can much more easily get ACID compliance across the table for data updates and deletes, and you can get transactional interactions with the table, and so on and so forth. You've got Iceberg, or you've got Delta, or you've got Apache Hudi. That's H-U-D-I, Hudi.
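The snapshot idea described above can be sketched as a toy model in plain Python. This is not the Iceberg specification or any real Iceberg API: the class names (`Snapshot`, `ToyTable`), the in-memory "files", and the row-id based deletes are all assumptions made for illustration. It only shows how a layer of metadata over immutable data files can give appends, deletes without rewrites, and time travel.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple        # names of immutable data files in this version
    deleted_rows: frozenset  # row ids marked deleted; no data file is rewritten


@dataclass
class ToyTable:
    files: dict = field(default_factory=dict)      # file name -> {row_id: row}
    snapshots: list = field(default_factory=list)  # full history, oldest first

    def _current(self):
        if self.snapshots:
            return self.snapshots[-1]
        return Snapshot(0, (), frozenset())

    def _commit(self, data_files, deleted_rows):
        snap = Snapshot(len(self.snapshots) + 1,
                        tuple(data_files), frozenset(deleted_rows))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def append(self, name, rows):
        """Add a new immutable data file; earlier files are untouched."""
        self.files[name] = dict(rows)
        cur = self._current()
        return self._commit(cur.data_files + (name,), cur.deleted_rows)

    def delete(self, row_ids):
        """Record deletions as metadata instead of rewriting data files."""
        cur = self._current()
        return self._commit(cur.data_files, cur.deleted_rows | set(row_ids))

    def scan(self, snapshot_id=None):
        """Read the table as of a snapshot (latest when none is given)."""
        snap = (self._current() if snapshot_id is None
                else self.snapshots[snapshot_id - 1])
        return {rid: row
                for name in snap.data_files
                for rid, row in self.files[name].items()
                if rid not in snap.deleted_rows}
```

Calling `scan()` with an older snapshot id returns the table as it looked at that commit, even after later appends and deletes, which is the "time travel" behavior in miniature.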
 
Jillian (40:29.834)
Apache what? Hudi? Hudi, okay, all right. I just wanna make sure I heard that right. That's great.
 
Matthew Topol (40:36.001)
That's the other, those are the competing table format things. Personally, I think Iceberg's gonna be the one that wins. That's my personal opinion.
 
Jillian (40:48.554)
That is very cool, except for the history. I don't know that I want that much adult supervision on what I'm doing, but I can, I understand why it's useful. I'm just a little bit like, oh no, this is gonna tell on me, you guys.
 
Will (40:55.511)
Hahaha
 
Matthew Topol (40:56.225)
Hahaha!
 
Jonathan (40:56.438)
Hehehehehehe
 
Will (41:07.059)
It's like looking at your git commit history and you're like, oh, we're going to squash that.
 
Jillian (41:10.314)
So you know, I actually contributed once to Apache Superset, and they were like, are you sure you don't wanna squash these commits? And I was like, no, it's fine. They never asked me to commit again. Ha ha ha.
 
Matthew Topol (41:20.753)
Oh, ha ha!
 
Will (41:22.454)
You know, I think that's a really good point because you were talking earlier about using open source as a way to land a job. But I think one of the other benefits of contributing to open source is that you learn how to work in the constraints of a larger community. You know, when you get ready to make that first contribution, you'll get feedback, hey, squash your commits or, hey, you need a test to cover this. And I think it actually helps you level up your skills.
 
really, really quickly so that when you do land that job, you're able to start contributing to your new role a lot faster than you would be able to just out of school.
 
Matthew Topol (41:59.197)
Absolutely, absolutely. I mean, there's lots of things that I have learned from contributing to different projects. And also, some projects can get really, really interesting in the way they solve certain problems. And just seeing the way that certain problems are solved by different projects, in and of itself, is really, really cool.
 
Jillian (42:01.654)
Definitely.
 
Will (42:25.254)
Oh for sure, yeah.
 
Jillian (42:27.422)
Yeah, those people skills are really important. Like, you know, back in the day when I occasionally hired people, I'd be like, oh, you know, have you worked on any projects, and then go see how they deal with all that. And, you know, that was kind of a big deciding factor on whether or not somebody could play nice with the other children on the playground while working, you know, on a collaborative project. So.
 
Matthew Topol (42:38.187)
Mm-hmm.
 
Matthew Topol (42:49.473)
Hehehehe. At least it's fun. At least it's fun.
 
Matthew Topol (42:58.953)
Project collaboration is always an interesting adventure.
 
Will (43:04.15)
All right.
 
Will (43:08.239)
So what's on the future for Arrow?
 
Matthew Topol (43:13.377)
So there's a couple things. Like I said, we're adding the new data types to keep the relevance and also interact better with things like DuckDB and Velox and whatnot. One of the things I'm directly working on is improving the non-CPU APIs for the Arrow C++ library.
 
For example, I mentioned there's the C Data Interface for passing data back and forth between runtimes in the same process. We recently added an extension to that, a device array, so that you can take data on, let's say, a GPU and pass it between runtimes, like from Python to C++, to share it across different libraries without having to copy the data back to the CPU and then back to the device. You can leave the data on the device the whole time and pass the device pointers through. And so we're pushing that API up through the library to make it easier to manipulate and leverage non-CPU memory with Arrow in the existing libraries.
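As a loose, CPU-only analogy for sharing a buffer without copying it, Python's built-in buffer protocol can stand in for the idea. This is not the Arrow C Data Interface or its device extension, both of which are defined in terms of C structs; the functions below (`producer`, `consumer_sum`, `demo`) are hypothetical names for illustration. The sketch only shows the difference between handing over a view of existing memory and handing over a copy.

```python
from array import array


def producer():
    """Own a contiguous int64 buffer, as one library might."""
    return array("q", [10, 20, 30, 40])


def consumer_sum(view):
    """Read through a shared view; no bytes were copied to get here."""
    return sum(view)


def demo():
    data = producer()
    view = memoryview(data)   # shares the producer's memory
    copied = data.tolist()    # an actual copy, for contrast
    data[0] = 99              # mutate the original buffer in place
    # The view observes the change because it points at the same memory;
    # the copy does not.
    return view[0], copied[0], consumer_sum(view)
```

The same reasoning is why passing pointers matters more as buffers get larger or live on another device: the cost of a view stays constant, while the cost of a copy grows with the data.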
 
Will (44:19.374)
Wow.
 
Matthew Topol (44:33.737)
Like currently there's a couple spots where it kind of blindly tries to validate things, and then if your data's not on the CPU, if you have a GPU array, it just blows up because the pointers are device pointers, because it's like, oh, I'm just going to validate this thing. It's like, oh wait, no, that's cool. So I'm working through finding and fixing those spots in the library to make sure that it looks more cohesive and consistent throughout it. And there's other...
 
Other people are looking at better interoperability with other ML spaces, like PyTorch and...
 
and those other ML libraries, and seeing how we can improve the interoperability of Arrow with those libraries. There's a large effort, GeoArrow and GeoParquet, for geolocation stuff and geographical data sets and representing them in good ways. There's a whole effort going on there.

And then, like I mentioned, ADBC with database interactions: we have the drivers we have, and we're trying to improve them and add more drivers. I'm in that effort too. For example, I have an issue filed with BigQuery to expose some Arrow stuff so that I can create a BigQuery driver for ADBC, and things like that.
 
Matthew Topol (46:05.695)
So as far as what's in the future, we're just kind of branching in lots of different places because it's a large community.
 
Will (46:14.154)
Great. One thing I'm curious about, since we're talking about in-memory data: memory is ephemeral by nature and not guaranteed. So whenever you're accessing the Arrow data, do the libraries validate that the memory location is still valid? Or is that up to the implementer? Or how do you handle that? Or is that not a concern at all?
 
Matthew Topol (46:40.205)
In most cases, it's generally up to the implementer of the library for the language. For example, the C++ library uses a lot of shared pointers and the like to ensure the memory stays alive. There are certain APIs that are explicitly marked as being up to the caller to ensure that the pointer stays valid.
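The shared-ownership idea can be sketched in Python, where an object stays alive as long as something holds a strong reference to it. This is an analogy to shared pointers, not the Arrow C++ API: `Buffer` and `ArrayView` are hypothetical classes, and the prompt release shown here relies on the interpreter's reference counting plus an explicit `gc.collect()`.

```python
import gc
import weakref


class Buffer:
    """Stand-in for a block of memory owned by a library."""

    def __init__(self, data):
        self.data = bytearray(data)


class ArrayView:
    """A slice that holds a strong reference to its backing buffer."""

    def __init__(self, buffer, offset, length):
        self._buffer = buffer  # keeps the buffer alive while the view exists
        self.offset = offset
        self.length = length

    def to_bytes(self):
        return bytes(self._buffer.data[self.offset:self.offset + self.length])


def demo():
    buf = Buffer(b"abcdefgh")
    alive = weakref.ref(buf)    # observes the buffer without owning it
    view = ArrayView(buf, 2, 3)

    del buf                     # the original owner goes away...
    gc.collect()
    still_alive = alive() is not None  # ...but the view still owns the buffer
    contents = view.to_bytes()

    del view                    # last owner released
    gc.collect()
    freed = alive() is None
    return still_alive, contents, freed
```

An API that hands back a raw pointer instead behaves like the weak reference here: it is only safe while some other owner keeps the buffer alive, which is the caller's responsibility.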
 
Will (46:47.999)
Okay.
 
Will (47:08.31)
Gotcha.
 
Matthew Topol (47:09.213)
And so the exact way of managing the memory is going to differ slightly from library to library, because different languages have different ways of doing it. But Arrow overall becomes a very, very efficient way to handle larger-than-memory data sets.
 
Will (47:18.838)
Yeah.
 
Matthew Topol (47:32.106)
The C++ and Python libraries have the dataset library for Arrow, which is explicitly for handling larger-than-memory datasets and efficiently handling filtering and querying and doing joins and things like that to kind of manage streaming the data through.
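The streaming pattern behind larger-than-memory processing can be sketched with the Python standard library alone. This is not the Arrow dataset API; the CSV input, the `amount` column, and the function names (`read_batches`, `filtered_total`, `demo`) are assumptions for illustration. The idea is that only one batch of rows is held in memory at a time while a filter and an aggregate are applied.

```python
import csv
import io
from itertools import islice


def read_batches(lines, batch_size):
    """Yield lists of at most batch_size rows from a CSV line source."""
    reader = csv.DictReader(lines)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def filtered_total(lines, batch_size, min_amount):
    """Sum the 'amount' column over rows passing the filter, batch by batch."""
    total = 0
    for batch in read_batches(lines, batch_size):
        # Each batch is dropped before the next one is read, so memory
        # use is bounded by batch_size rather than by the input size.
        total += sum(int(row["amount"]) for row in batch
                     if int(row["amount"]) >= min_amount)
    return total


def demo():
    text = "id,amount\n" + "".join(f"{i},{i % 10}\n" for i in range(1000))
    return filtered_total(io.StringIO(text), batch_size=64, min_amount=8)
```

In practice the line source would be an open file or a set of files rather than an in-memory string, and the batch size would be tuned to the memory available.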
 
Will (47:50.483)
Okay, gotcha.
 
Will (47:55.298)
Yeah, that seems like a whole rabbit hole that could go quite deep. I'm just gonna sit over here in the corner with my select star and be happy that someone else is working on that.
 
Matthew Topol (47:59.747)
It does and it is.
 
Jonathan (48:05.811)
Hehehehehe
 
Jillian (48:06.718)
That's right. That's where I'm at with all this too. Like, oh, look at all these people writing these nice libraries for me. I would also really like it if ChatGPT could do a better job with the C++ Arrow implementation. If anybody is out there, you know, listening to my wishlist, I would very much appreciate that, because I don't write C++ all the time.
 
Matthew Topol (48:09.964)
Hehehehehehe
 
Jonathan (48:23.421)
Hehehehe
 
Matthew Topol (48:27.599)
What did they do? What did they do? What did they do?
 
Jillian (48:30.83)
It just gave me really weird responses. I was kind of, I don't know. It wasn't completely useless, but it was not, let's say, correct either. And I was just thinking, oh, it would be so nice. Because you know C++, you have to remember so many things, especially if you go and code in Perl or Python or something for a while, you don't have to remember types. So then you go back to C++, you're like, oh, what are all these words? What is this?
 
Matthew Topol (48:42.145)
There, there.
 
Matthew Topol (48:48.192)
Yeah.
 
Jillian (48:59.198)
You know, it would just be nice if ChatGPT could take care of some of that for me. I don't think I've ever coded in Go, so I don't have any opinion on that. I know, Jonathan.
 
Matthew Topol (49:00.309)
Why don't you use Go?
 
Will (49:09.258)
Oh, you should check out the Cup of Go podcast. It has some really good pointers and tips on using that. And yeah.
 
Matthew Topol (49:12.875)
Yeah!
 
Jillian (49:13.978)
I'm sure it does.
 
Matthew Topol (49:15.785)
Very very nice, I like it.
 
Jonathan (49:16.957)
There's also a really cool YouTube channel called Boldly Go you should check out. Shameless self-promotion. Ha, ha.
 
Matthew Topol (49:20.865)
Hehehe.
 
Will (49:22.762)
Yeah, but I do like boldly go.
 
Jillian (49:23.082)
That's what the show is for. That's why I make sure to mention bioinformatics at least three times.
 
Will (49:29.302)
Boldly Go, your YouTube channel, is actually really, really well done.
 
Jonathan (49:34.089)
Oh, well thanks. I hired a really great editor I found through a mutual acquaintance. Ha ha ha.
 
Will (49:41.114)
Oh, nice. That's probably good for him, because that mutual acquaintance has probably been leaving that editor hanging recently.
 
Jonathan (49:47.173)
Hahaha!
 
Matthew Topol (49:48.385)
Ha ha ha!
 
Jillian (49:49.966)
I know, going off and running marathons instead of just making YouTube videos. What's wrong with you?
 
Will (49:53.577)
Yeah.
 
Jillian (50:04.09)
All right, well, I think we've got through all my questions, at least. Did anybody else have any questions or not? Was there anything else you wanted to bring up?
 
Matthew Topol (50:12.541)
No, I mean, I think I've mentioned all the things I can think of. Let's see here. I'll be at a conference in November in Sweden, actually, talking about Arrow and stuff, which will be a lot of fun.
 
Jonathan (50:28.765)
Sweden. Nice.
 
Jillian (50:30.082)
Cool.
 
Jonathan (50:35.997)
cool. Well, if you happen to fly through Amsterdam, stop by, we can have a coffee together.
 
Will (50:36.522)
Nice.
 
Matthew Topol (50:41.357)
So the flight actually drops me in Copenhagen. And I take a train from there.
 
Jonathan (50:44.218)
Okay, very cool.
 
Jillian (50:48.994)
That'll be a fun trip.
 
Matthew Topol (50:50.793)
Yeah, I'm looking forward to it. I've never been before, so it'll be great.
 
Will (50:55.398)
I can only imagine it, it's like the Tom Hanks movie, Polar Express.
 
Jillian (51:02.358)
Great.
 
What's the movie with like the train that um, I don't know there was like some kind of apocalypse and the only thing that's running is that train and it's the post apocalyptic world.
 
Will (51:14.451)
Oh yeah, that was the...
 
Matthew Topol (51:14.782)
Oh, I know, thanks.
 
Will (51:18.158)
That was a Brad Pitt movie, wasn't it?
 
Jillian (51:21.334)
I don't remember now. I saw it a while ago. But hopefully your trip doesn't end up like that, is what I was going for. Yeah, I think we can all hope that. I didn't mean to, like, put those kind of thoughts out into the universe.
 
Matthew Topol (51:25.857)
I-I-I-I hope so too!
 
Will (51:28.942)
I hate it.
 
Matthew Topol (51:31.253)
The apocalypse happens while I'm on the train.
 
Matthew Topol (51:37.962)
It's going to be, you know, step off and it's just fire everywhere.
 
Jillian (51:42.79)
Yeah, that... all right, let's not go putting those kind of thoughts into the universe. Where can people get a hold of you if they, you know, want to be like, "Matt, come talk about all the cool Arrow stuff," or "write us a book and make millions of dollars," that kind of thing?
 
Matthew Topol (51:59.825)
I mean, I'm easily found on GitHub, I'm zeroshade. You can find me on Twitter with the same name, zeroshade. You can find me on Mastodon. On LinkedIn, you can search for me by name, and you'll find I have "author of In-Memory Analytics with Apache Arrow" in my little name thing, so you can find me easily on LinkedIn. You know, all of the above.

And the book is available on Amazon or from Packt's website directly.
 
Jillian (52:29.89)
Pretty cool.
 
Jillian (52:37.906)
Nice! Well, I guess let's do some picks. Who wants to go first?
 
Jonathan (52:44.965)
I can go first. No, sorry. Will, you go. Oh.
 
Will (52:45.262)
I can do it.
 
Jillian (52:47.586)
Well, now you two have to like battle it out. I mean, that's clearly the only solution. And Jonathan's got his green screen here, so there should be some really great special effects happening any moment now. I know.
 
Will (52:49.506)
Bye.
 
Matthew Topol (52:51.657)
Fight, fight, fight.
 
Will (52:52.798)
Right? Yeah.
 
Will (52:58.122)
Yeah, I can't compete with that.
 
Jonathan (53:02.853)
So I'll go then, since we're just sitting here. So, on topic: somehow recently YouTube, which I have been watching, has started showing me videos from Blumineck. I think that's how you pronounce it. It's kind of an interesting channel about arrows, so I thought it'd be perfect to talk about today. He's an archer who makes short little videos about archery.

I've never been an archer, but it's kind of cool because he talks about fantasy movies and video games and whether the things they do in these settings are realistic, and he kind of demonstrates. A recent video I watched took a scene from Robin Hood: Men in Tights, I think, where the character shoots like six arrows at once and they pin a guy to the wall.

He's like, is this possible? So he starts with one arrow, then two, then three, and works his way up and talks about, is there any plausibility to this? I was honestly surprised at how not completely bogus some of these movies and video games are. So that's my pick for the week: Blumineck, B-L-U-M-I-N-E-C-K, on YouTube.
 
Matthew Topol (54:25.982)
Cool.
 
Will (54:29.838)
Cool. I will stay on topic then. My pick for today is gonna be the Diamond Archery bow from Bowtech. I have one, and I go out in the backyard and fire off some arrows. And it's actually a surprisingly fun hobby because you don't need a lot of space to do it. You do need a reliable backstop in case your arrow doesn't go where you intended it to go. So that's one tip
 
worth noting, but it's actually fun because there's a lot of subtleties to archery that are fun to just work on, from working on holding your breath and controlling the...
 
controlling the release, and then setting the tension on the bow. You can actually turn it into a strength workout by adjusting the tension. And you can get started with it relatively inexpensively. You can find a bow, new or used, for any budget, and it's actually a lot of fun. So my personal bow is a Diamond Archery bow from Bowtech. And yeah, there you go.
 
Jillian (55:40.354)
Cool. What about you Matt, do you have any picks?
 
Matthew Topol (55:43.762)
I guess I'll go with Baldur's Gate 3. I've been playing that constantly.
 
Will (55:48.97)
Nice.
 
Jonathan (55:51.929)
I'm literally downloading that right now on my brand new Steam Deck. It just arrived today, that's why I can't pick it yet. I haven't played anything on it, so. It looks cool, I don't know if it's fun yet. I haven't tried it, but I've played Baldur's Gate III. I played the pre-release about two and a half years ago. It's been that long. I actually looked at it. It was a week before my son was born. It was the last time I played it.
 
Will (55:53.794)
laughs
 
Matthew Topol (55:55.997)
Ooh.
 
Jillian (55:57.194)
You have the Steam Deck? Why wasn't that the pick? I want to know about it.
 
Matthew Topol (56:04.064)
Hahahaha
 
Jillian (56:05.642)
It could have been like a preemptive pick.
 
Will (56:20.621)
That's fitting.
 
Jillian (56:20.97)
That checks out.
 
Jonathan (56:21.077)
Imagine that. Maybe I'll have a chance now with the Steam Deck to play it again, now that it's finally been released. I do have another baby.
 
Jillian (56:26.074)
Oh no, Jonathan, that means you need to have another baby. That's how that works. I know I'm talking about another baby, like maybe twins.
 
Matthew Topol (56:30.613)
Ha ha!
 
Will (56:33.555)
Hahaha!
 
Matthew Topol (56:35.317)
Clearly he does not need sleep.
 
Jillian (56:37.726)
No! I mean, I will just, it's the youngest one now. You could have another one. It'll be fine.
 
Jonathan (56:42.865)
So stay tuned, I'll probably be picking a Steam Deck in a week or two, once I've had a chance to actually use it.
 
Will (56:48.766)
and the new kid, you gotta keep the velocity up.
 
Jonathan (56:50.585)
Yeah, right.
 
Matthew Topol (56:51.957)
Yep, yep, keep that velocity going.
 
Jillian (56:52.058)
That's right.
 
Gotta supply me with ever constant baby pictures. That's what I need.
 
Alright, well I'll just keep on, uh, I guess the video game and archery theme and pick Horizon Forbidden West. That is a really fun game and it has, uh, like very smooth mechanics in it, so you like run around and jump and climb and shoot bows and it's a very cool world to run around and explore. The first one, the first one had really good mechanics too, but the second one is like, I don't know, they went like all in on the mechanics and the very pretty scenery,
 
Matthew Topol (57:19.409)
I played the first one.
 
Jillian (57:30.792)
like the best. And it's nice when developers, you know, like when a game studio actually listens, they're like, oh, these are the things that people like. And so let's keep them, unlike Sonic the Hedgehog, you know, like Sega, I'm looking at you over here.
 
Will (57:40.75)
I'm out.
 
Matthew Topol (57:44.203)
Hey, hey, Sonic Frontiers was actually pretty good.
 
Jillian (57:47.878)
I haven't played that one yet, I gave up after... I forget, there was one I played when my daughter was little, I don't remember what it's called now, but then I was like, no more! I'm just gonna stick with the Sega Genesis emulator, Sonic, and play Sonic 3 over and over and over again. Alright, but anyways, yeah, exactly, as one should.
 
Matthew Topol (58:04.821)
That's fair.
 
Will (58:08.717)
Hahaha!
 
Jillian (58:09.822)
And then my other pick is a book. It's called Million Dollar Outlines, which was recommended to me by one of my sisters, and I've been kind of on this kick lately of understanding the psychology of story, mostly because it's really interesting, and I think it really fits whether you want to write fiction or nonfiction. In fact, you know, like Matt here was saying, the best nonfiction is the kind that's not really dry and has, you know, at least some kind of human element.

So understanding these same sorts of flows of stories, and how you have these emotional hooks, and how you create a point and build up to that point, all that kind of thing is really relevant whether you want to write fiction or nonfiction, because it's all just, you know, human psychology and neuroscience. Which is kind of nice for me, because every once in a while I like to get back to my neuroscience roots. So that's it, Million Dollar Outlines. Go read it, it's fun.
 
Matthew Topol (59:01.769)
He's really cool.
 
Will (59:05.538)
Have you read Joseph Campbell, I think, Man with a Thousand Faces? I think that's the right title.
 
Jillian (59:13.098)
No, I've read... what is it? Like Power... I've read a bunch of the other ones, like Power of Myth or... Neil Gaiman has a really good book about...
 
about Norse mythology, and he covers a lot of this kind of like, this kind of like, the psychology behind it, you know, like, well, why, like, why do we have these stories in the first place? And it seems like there's some kind of, like, very common human psychological underpinning that, yeah, that goes like, but it goes like, it's fascinating, because it goes like across cultures and these places where people, you know, likely didn't talk to each other, right, like Norse mythology and Greek mythology, and you see all these same kind of similar themes. So I was on a deep dive.
 
Will (59:24.802)
Oh yeah!
 
Matthew Topol (59:37.737)
Oh, absolutely.
 
Matthew Topol (59:51.349)
Yeah, what's really, I think one of the more interesting things I've seen is when people draw these similarities between the different mythical creatures in like the mythologies of different places that are effectively similar creatures in completely unconnected cultures. And then try, and then do the, kind of like reverse engineer, what is it that was in that environment.
 
Jillian (59:52.556)
with that for a while.
 
Matthew Topol (01:00:21.525)
that led to those myths and that thought of that creature that showed up in these two completely unconnected cultures. And it's really interesting when people do that. Almost every culture has a vampire type myth. Almost every culture has a dragon or other large lizard.
 
Jonathan (01:00:44.293)
They must have met my mother-in-law.
 
Matthew Topol (01:00:49.776)
What the hell?
 
Jonathan (01:00:51.449)
Said every culture must have met my mother-in-law, then. I'm totally kidding. I love my mother-in-law. She's a sweet lady. There's nothing wrong with her. I just couldn't think of who else to say. I didn't want to say my wife.
 
Jillian (01:00:54.766)
Oh no. That's... You might want to ask for that to be edited out.
 
Will (01:00:54.787)
Wow!
 
Will (01:01:07.303)
Sacrifice it all for the punchline.
 
Jillian (01:01:08.118)
know that... Uhhh... Definitely would have been the wrong thing to say.
 
Matthew Topol (01:01:13.365)
Well played, well played.
 
Jillian (01:01:16.13)
You should have just stuck with the cute baby pictures ideas. That never gets you in trouble.
 
Will (01:01:16.398)
Team player.
 
Jonathan (01:01:20.669)
My mother-in-law doesn't even speak English, so there's no way she'll ever hear this, so...
 
Will (01:01:24.299)
Hahahaha
 
Jillian (01:01:24.85)
Mm-hmm. Now listen, mother-in-law, like, man, me and my in-laws don't speak the same language either, and yet they always know things, all right? Like, always. They always know things. And your kids are getting older, they'll start telling on you soon, like, it's all downhill from here. Which is why everybody should just have more babies, so that I can look at the baby pictures.
 
Matthew Topol (01:01:25.169)
And that's how you do it safely.
 
Jonathan (01:01:34.471)
Ha ha!
 
Will (01:01:35.899)
Yeah
 
Matthew Topol (01:01:44.457)
Just to see you in the center.
 
Matthew Topol (01:01:50.465)
Can those babies be like fuzzy with their big kittens instead?
 
Jillian (01:01:55.077)
Mm-hmm. Yeah, that's fine. I'll accept that. I'm going to go get some coffee.
 
Jillian (01:02:02.758)
Alright, well I think, uh, you know, the time of the show where we've gotten real off topic has finally arrived. Or, might have arrived from the beginning of the show. I'm not, I'm not sure. So we're gonna sign off for this week, and we will see you all next week. Bye!
 
Matthew Topol (01:02:10.026)
I'm okay with it.
 
Jonathan (01:02:12.049)
Ha ha ha.
 
Will (01:02:17.672)
Right on.
 
Jonathan (01:02:17.981)
Cheers.
 
Matthew Topol (01:02:19.125)
Bye.
 