Building Zeet with Johnny Dallas - DevOps 160

Johnny Dallas is the CEO and co-founder of Zeet. He joins Jonathan and Will to talk about his company and his journey as a developer. He begins by sharing how he became a developer and what building Zeet has been like, then covers some of the services Zeet provides and how customers benefit from them.

Transcript


Will_Button:
What's going on, everybody? Welcome to another exciting, wow, why did I say that as a question? No, it is gonna be exciting. Another exciting episode of Adventures in DevOps. Joining me in the studio today, as usual, my co-host, Jonathan Hall.
 
Jonathan_Hall:
Hey everyone, I am super excited.
 
Will_Button:
Hahaha. And the reason for the excitement today is we have the guy with the coolest name in DevOps. We have Johnny Dallas joining us from Zeet. Welcome, Johnny.

Johnny_Dallas:
Thanks, Will. It's great to be here, and I'm definitely excited.

Will_Button:
Well, so the first question I have before we give you a chance to introduce yourself: is Johnny Dallas your birth name, or did you pick that thinking, "I'm just going to do rockstar DevOps, so I need a rockstar-sounding name"?
 
Johnny_Dallas:
No, it's my real name. It's actually my dad's name as well, so you can give him the blame. I come from a long line of John Dallases. I'm the only one who's gone by Johnny, for whatever reason. It feels like such a gimme. It's right there.

Will_Button:
Right on. It's like it's just meant to be echoed out on a stage, right?

Jonathan_Hall:
Ha! Awesome.

Will_Button:
Cool, so for anyone who's not familiar with your work, tell us a little bit about who you are and what you do.
 
Johnny_Dallas:
Yeah, happy to. I'm Johnny Dallas. I'm the CEO and co-founder of Zeet. We help engineering teams build internal developer platforms and do DevOps more easily. I've been doing DevOps for a long time. I got started in tech when I was in high school and ran DevOps for a previous startup, spent some time at AWS, then left there in 2020 to start this company. So I've been in small companies, been in the cloud providers themselves, and now I'm building dev tools to make it a little bit easier.
 
Jonathan_Hall:
Cool.
 
Will_Button:
Right on. So started doing DevOps when you were in high school. How did that work with your class work?
 
Johnny_Dallas:
Yeah, it's a weird place to start. With DevOps, you've got to be on call. It's important that you're around and available. In high school, you don't really get to be available. You show up in the morning, you sit in class, you raise your hand when you have to use the bathroom, and it's very prescribed. I was in charge of all of our operations at an early startup called Bebo, and we did not work the way typical SRE or DevOps organizations did because of my limited schedule and the limited resources of our team. So we had to use software quite a bit. I used to think of myself as a software engineer in a DevOps hat more than an SRE or DevOps person from the ground up. So lots of automation, lots of systems making it so that the rest of our developers could do what they had to do without having to talk to me. And I got paged my fair share of times in the middle of finals or uncomfortable situations like that.
 
Will_Button:
I can just imagine where in high school some kids were like sneaking out of class to go out behind the gym and smoke a cigarette or whatever and you're sneaking out to go log into the system and handle an on-call event.
 
Johnny_Dallas:
The number of times I've had to log into a console and deal with some sync issue there. Yeah, not quite as spicy of a reason to sneak out of class, but I was happy with it.

Will_Button:
Hahaha.
 
Jonathan_Hall:
If we can talk about this without putting you too much on the spot, how long ago was this? I don't want to age you. I don't want to put a date on you too much, but...

Johnny_Dallas:
No, not at all. I'm 21 now. I started doing this when I was 15, 16, so five years ago, something like that. I think 2017, 2018 is when I got started. I will say, there were quite a few times getting paged in not-fun situations. The business we were building at the time was an esports company. We were doing an esports platform, helping run Fortnite and Call of Duty tournaments around the world, and we would host massive tournaments of 20,000-plus people where everybody would be live streaming video to our servers from around the world. I remember very distinctly: we had our biggest tournament of the year one time. Soulja Boy and Ninja and all these celebrities were competing in it, and the tournament starts and crashes.

Will_Button:
Hahaha.

Johnny_Dallas:
It's completely down, and everybody's freaking out: what's going on, is this going to come back, is this a temporary thing? And I get paged, of course, in the middle of a math final, and I have to tell my teacher, "Hey, Soulja Boy's on the phone. Can I take this? He sounds mad. I think I've got to deal with this. I'll be right back." And I've never forgotten the look of bewilderment on her face, like, "You what? What are you doing? You're stepping out to SSH onto an EC2 instance in eu-west-1 because of Soulja...? I don't understand any of the words you're saying to me right now."

Will_Button:
Yeah, so not only are you ditching class to deal with work, but you're dealing with work that is for an e-gamer, which is something our education industry doesn't really recognize as a real profession anyway. So it's just adding layers to this. That's great.
 
Johnny_Dallas:
Layer on layer. I gave up quickly on trying to win those arguments.

Will_Button:
Look, I'm just going to go smoke a joint, okay? Is that easier for you to handle? Right?

Jonathan_Hall:
Ha ha ha ha!

Johnny_Dallas:
Yes. I'll be back, don't worry.

Jonathan_Hall:
Awesome. I was going to say...

Johnny_Dallas:
Yeah, but it was a... Oh, go ahead.

Jonathan_Hall:
I did similar stuff when I was in high school, which was much longer ago. EC2 instances did not exist yet. But I never stepped out of class to fix stuff. And part of that probably is because it was so early in the technology cycle that nobody expected the internet to be fixed right away. If the internet broke, they're like, "Oh, maybe it'll be ready tomorrow." And that was just fine.
 
Will_Button:
Alright.
 
Johnny_Dallas:
Hehehehehehe
 
Jonathan_Hall:
Basically, I ran a very small dial-up ISP out of my parents' house when I was in high school and served half the town dial-up internet. But if it didn't work, they'd just try tomorrow.
 
Will_Button:
Right? I'll just put it in the mail. That's fine too.
 
Jonathan_Hall:
Yeah, yeah, that'll work. No.

Will_Button:
Ha ha ha.
 
Johnny_Dallas:
There's no SREs, there's no PagerDuty, there's no SLA measured in minutes.
 
Will_Button:
So that obviously puts some unique constraints on how you view building and deploying systems. Tell me a little bit about how that led to where you're at now with Zeet. What was the chain of events or thought process that made you say, "You know what we need? We need another company in the digital space, so I'm going to launch one"?
 
Johnny_Dallas:
Yeah, so as you said, there were some definite unique constraints on the infrastructure setup that we had to build at that previous startup. One of the most obvious ones was that we couldn't really have a human in the loop for most DevOps interactions. We had a team of 10 to 12 application developers who were great at building product and making it happen. But they weren't really comfortable with infrastructure, and you need to deploy somehow. So one of the first things we asked was: how can this team of developers who don't want to touch AWS, who don't want to touch infrastructure, who don't want to learn about it, scale? So we engineered systems where all they had to do was put a JSON file in the root directory of a GitHub repo, and we made a nice UI so our CTO could click one button and it would spin up a whole region with all the Terraform and Ansible automation that that entailed. We approached it with this idea that it needs to be self-serve first. When I was creating the system, we had a button to spin up a region. I probably clicked that button a hundred times just to spin it up, spin it down, spin it up, spin it down until it was perfectly automated, actually self-serve, and actually usable 100% of the time. And that became one of the guiding principles of platform engineering now: how do you scale that self-serve nature? How do you put a product in the place of the DevOps interface so that application developers can do their job without needing to talk to somebody? So it was a real forcing function back then. We didn't have a DevOps team. We only had a software engineer who was around some of the time. Going from there, you build a product that way.
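
To make the "JSON file in the root of the repo" idea concrete, here is a minimal Python sketch of how a platform might read and validate such a file. It is an illustration under stated assumptions, not the format Bebo used: the key names (`build`, `run`, `healthCheck.port`) are hypothetical. The only grounding is Johnny's later description of the file as holding a build command, a run command, and a little health check configuration, including the port to hit on the container.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    build_command: str
    run_command: str
    health_check_port: int


def parse_manifest(text: str) -> Manifest:
    """Parse and validate a repo-root manifest. All key names are hypothetical."""
    data = json.loads(text)
    missing = [key for key in ("build", "run", "healthCheck") if key not in data]
    if missing:
        raise ValueError(f"manifest is missing keys: {', '.join(missing)}")
    port = data["healthCheck"].get("port")
    # type(...) is int rejects booleans, which isinstance would accept.
    if type(port) is not int or not 1 <= port <= 65535:
        raise ValueError("healthCheck.port must be an integer between 1 and 65535")
    return Manifest(data["build"], data["run"], port)


# Hypothetical example of what a developer would commit to the repo root.
EXAMPLE = '{"build": "npm run build", "run": "npm start", "healthCheck": {"port": 8080}}'

if __name__ == "__main__":
    print(parse_manifest(EXAMPLE))
```

Validating the file up front is what lets the rest of the pipeline run without a human in the loop: a bad manifest fails fast with a readable message instead of a broken deploy.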
 
Will_Button:
It sounds a lot like when I was with Apptiv. Let's see, I think it might've actually been around the same timeframe. We did a very similar thing where the developers could self-serve. We went with a YAML file, so they could drop a YAML file in their repo and that would configure everything they needed to get up and running. It worked out really well, because one of the things I learned was that you're right: developers don't want to learn any more about infrastructure than they absolutely have to in order to do their job, and in a perfect world they would learn zero about it. I'm not trying to say anything negative about them, but they've already got enough things that they're responsible for, so we don't want to add to that and they don't want to add to that. But one of... And then they would say, "Oh, wait, well, can it do this instead?" or whatever idea they had. And initially, we would build that in for them if it was something we thought would be widely used. But then we started saying, "Well, we do accept pull requests." And then the development team started actually adding all of these features themselves. And so the DevOps team, which consisted of me, one other person, and our QA person, who worked with us part time, just kind of sat back and watched this thing take on a life of its own.
 
Johnny_Dallas:
Yeah, I think that's the beauty of dev tools. One of the things I always think about with dev tools is that people should use them in ways that surprise you. People should have use cases that you can't design for on day one, or else you haven't really unlocked a system that can grow beyond yourself. So for us, we had a JSON file and it was very simple. It was: what's the command you want to run to build, what's the command you want to run at runtime, and a little bit of health check configuration. But very explicitly in how we set up health checks, it was: tell us the port to hit on your container, and you decide entirely how you want to handle health checks. We had a custom format, and this was very non-standard, for responding with your capacity and your availability in that health check, so that teams could control their auto scaling with their application logic. They didn't have to learn about some infrastructure system. They were just returning a JSON blob on a certain endpoint. So when normally they would ask me, "Hey, how can I set up this type of auto scaling or that type of auto scaling?", the answer was: well, you already know. Make your application return the right number when you want it to go up, and a different number when you want it to go down. And that just really freed them up to go and build.

Will_Button:
Right on. That's cool. I haven't heard that approach before of just letting the health check dictate. Because a lot of times in my experience it's been memory utilization, CPU, latency, but I kind of like that idea of just letting the application itself decide if it needs more help or not.
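
The health-check-driven scaling described above can be sketched roughly as follows. This is a hedged Python illustration, not the custom format Johnny's team used: the field names `available` and `capacity`, the meaning of `capacity` as a fraction of remaining headroom, and the `low` and `high` thresholds are all assumptions made for the example.

```python
import json


def health_payload(active_sessions: int, max_sessions: int) -> str:
    """Application side: report availability and remaining capacity as a JSON blob."""
    remaining = max(0.0, 1.0 - active_sessions / max_sessions)
    return json.dumps({"available": remaining > 0.0, "capacity": round(remaining, 3)})


def desired_replicas(current: int, payloads: list[str],
                     low: float = 0.2, high: float = 0.7) -> int:
    """Platform side: scale on the average capacity that instances report about themselves."""
    reports = [json.loads(payload) for payload in payloads]
    headroom = [report["capacity"] for report in reports if report["available"]]
    if not headroom:
        return current + 1  # no instance has any headroom left, so add one
    average = sum(headroom) / len(headroom)
    if average < low:
        return current + 1  # instances are nearly full: scale up
    if average > high and current > 1:
        return current - 1  # instances are mostly idle: scale down
    return current
```

The point of the design is that the application owns the number. A team that wants different scaling behavior changes what `health_payload` reports, and nothing on the platform side has to change.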
 
Johnny_Dallas:
It was an interesting approach. I would say the issue was that it didn't account for things like memory and CPU and other resource utilization. So we ended up adding that as a separate system-level check and then combining the application-level check and the system-level check to determine full health. So there are pros and cons. I think it was nice to give them the flexibility. With my work now, we're taking this idea of a simple developer platform that sits on top of your cloud and makes your cloud a lot more accessible to developers who aren't super infrastructure-advanced, let's say. And one of the first features that we built into it is a similar thing. You can export custom metrics on a certain path and use that to autoscale using Prometheus metrics and KEDA, which gives teams far more control over how the application works without them having to dig into infrastructure tooling. I think that is really interesting, and health checks are one interface that makes that possible.
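
As a small illustration of combining the application-level check with a system-level check, here is a Python sketch. The conversation does not say how the two signals were combined, so the logical AND, the `SystemStats` type, and the 0.9 limits are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemStats:
    cpu_utilization: float     # fraction of CPU in use, 0.0 to 1.0
    memory_utilization: float  # fraction of memory in use, 0.0 to 1.0


def system_check(stats: SystemStats, cpu_limit: float = 0.9,
                 memory_limit: float = 0.9) -> bool:
    """System-level check: pass only while CPU and memory are both under their limits."""
    return stats.cpu_utilization < cpu_limit and stats.memory_utilization < memory_limit


def full_health(app_available: bool, stats: SystemStats) -> bool:
    """Full health requires both the application-level and the system-level check to pass."""
    return app_available and system_check(stats)
```

Keeping the two checks separate preserves the original benefit: application teams still control their own signal, while the platform guards against resource exhaustion the application does not report.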
 
Will_Button:
Right on. So let's talk a little bit about what you're doing over at Zeet. Zeet is a platform that allows customers to build their own internal development platform. Is that an accurate statement?

Johnny_Dallas:
Yeah, I'd say that's accurate.

Will_Button:
Cool. Who is jumping on board with that? Do you find that it's existing DevOps teams, or are development teams coming to you looking to take this on themselves?
 
Johnny_Dallas:
We see a bit of both. When you look at the market, there are the platform-as-a-service companies like Heroku, Fly, Render, and Railway more recently, that are really centered on the developer experience. And they're really great tools. I think they're way more accessible versions of infrastructure than the AWS console or something like Terraform for somebody just learning development. Like you said, there's a ton of jobs to do in development. It's a lot of responsibility. So those have been great platforms. However, they tap out at some point on capabilities. There are certain use cases that just aren't going to be supported by them, and you have to go down to AWS in those cases. You can't really go and deploy an Ethereum node on Heroku, for example, or an advanced networking stack on Render. So what we really look to do is: how do we give you some of the developer experience from these platform-as-a-service companies with the power of your cloud? So we deploy onto your cloud. We give you a simple interface on top of it that infrastructure engineers can customize to determine what the underlying services look like. But application developers have a simple place where they can see all the pieces deployed, and they can click or use our API or use a CLI to spin up resources. They don't have to worry about the underlying infrastructure technologies.
 
Will_Button:
Right on. And you're doing that across multiple cloud providers too, right?
 
Johnny_Dallas:
Yeah, we have support for six cloud providers now, and teams deployed across various combinations of them, which has been a really interesting use case. Like we talked about: let people surprise you with use cases. When we started, we were just going to be an easier backend to AWS, and the first time somebody needed two AWS accounts, we were like, "Oh wait, the cloud provider is actually a backend. What if we support other cloud providers?" And now we see multi-cloud use cases that surprise us every day.

Will_Button:
Right on.

Johnny_Dallas:
It's pretty exciting to see. Multi-cloud's really interesting. Having a platform like this that just makes infrastructure accessible, you see people do really interesting things with it. We recently saw a team come on that's doing chatbots. With all the AI hype right now, there's a ton of new chatbot companies, but they really needed to scale, and they had a ton of credits from a bunch of different cloud providers. They gave us an Azure and a Google Cloud account and spun up 10,000 chatbots on day one, across like 3,000 per cloud. It was just insane to see that this team of two did that on day one by virtue of this platform. Like, that's awesome. I love that that's possible.

Will_Button:
So they're just riding free credits as long as they can.
 
Johnny_Dallas:
Yeah, it's great. I mean, every time they run out on one cloud provider, they make an account somewhere else, get new credits, swap over. Power of the platform.
 
Jonathan_Hall:
Interesting. Do you see that happening a lot? Or is that just a one-off example?
 
Johnny_Dallas:
I think not as much as I would expect. I think most people, when they get cloud credits, get really excited and claim them, and they don't realize that they expire after a year. And then they get to the 11-month mark and realize that they used one of their three clouds and it doesn't matter, they're all gone. So this team specifically is, I think, being strategic about it. Honestly, I think more people should do that: make the cloud providers work for you a little bit more. But I don't see it that often.

Jonathan_Hall:
Yeah, all right.
 
Will_Button:
That'll be the new training segment to come out of Y Combinator: how to leverage all the cloud providers at once. Ha ha ha.

Johnny_Dallas:
Yeah, there you go. We're gonna have to put that video out. Ha ha ha.
 
Will_Button:
So speaking of multi-cloud, with something like Terraform, you can deploy to any cloud, but you write different code depending on the cloud provider that you're targeting. How did you approach or solve that problem with Zeet?
 
Johnny_Dallas:
Yeah, so I think there are two pieces to it that we think about. One is that you kind of have to acknowledge that you're going to give up something. If you're building to the greatest common denominator between two clouds, there's going to be something that AWS supports that Google Cloud doesn't, and vice versa. So with that in mind, the thing we started with was: what's the biggest common denominator we can build off of? What's the most common unit of abstraction across clouds? And managed Kubernetes was, I think, an obvious choice for us. So we built a lot of our product off of Kubernetes. We have some other pieces for serverless and various databases as well, but Kubernetes is where the meat is. And then we built abstractions that could exist across clouds. So our core unit is an application. We have simple controls for which ports to expose, without any nuances around how this security group implementation might work versus that security group implementation. But then I think the real magic was that we make these abstractions transparent. So teams can use a Zeet app to deploy across any of the clouds that we support. But if you want to dig in and use something special in AWS, something special in GCP, or something specific to a certain cloud, you can customize that template for your own use case, because it's your cloud. You should have full control. We are not in the business of getting in your way. We're in the business of accelerating you. So it's been a two-faceted piece: build to the greatest common denominator, and do it in a way that you can peel back the onion and it can be transparent, so people can customize that if they want to in the future.

Will_Button:
Right on.
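
The "common denominator plus transparent customization" idea can be sketched as a cloud-neutral spec with optional per-cloud overrides. This Python example is illustrative only and does not reflect Zeet's actual template format: `BASE_APP`, its field names, the `node_type` value, and the `render` helper are all hypothetical.

```python
from copy import deepcopy

# Cloud-neutral application spec. Every field name here is hypothetical.
BASE_APP = {
    "name": "api",
    "image": "example/api:1.0",
    "ports": [8080],
    "runtime": {"kind": "kubernetes", "replicas": 2},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def render(app: dict, cloud: str, overrides: dict) -> dict:
    """Build the spec for one cloud: shared defaults plus that cloud's customizations."""
    return deep_merge(app, overrides.get(cloud, {})) | {"cloud": cloud}
```

For example, `render(BASE_APP, "aws", {"aws": {"runtime": {"replicas": 5}}})` changes only the replica count for AWS, while a cloud with no override gets the shared defaults untouched. That is the "peel back the onion" property: the abstraction covers the common case and the override reaches into one cloud without forking the whole spec.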
 
Will_Button:
So for companies that adopt this, how does the role of DevOps change for their existing DevOps team? Because, in my experience, a lot of the work that you do in DevOps is building out that infrastructure platform and maintaining it. So what does that look like in this environment?
 
Johnny_Dallas:
Yeah, I mean, I'll caveat this with: it depends on the team. Every team is different, every DevOps team is different, every requirement is different, et cetera.

Jonathan_Hall:
What? Is that a one-size-fits-all solution?

Will_Button:
Hahaha.

Johnny_Dallas:
Sorry to say, Jonathan, we haven't solved it all yet. But that said, there are some patterns, there are some trends. I'd say generally with existing DevOps teams, you fall into one of two groups. You either become the platform engineer who is building this whole product, this whole platform, or you become a service team, where you're dedicated to a certain type of resource or a certain area of the infrastructure and you integrate that into the platform. An example of this might be: maybe you go and become part of the compute service team, and you're worried about how this platform provisions compute, how upgrades happen, what versions are on the instances. Whereas the platform team might be thinking about: should this be accessible through a CLI or a UI or an API? What's the right way to communicate from this team to that team? Do we have the right support for the right types of databases and the right types of compute available to me? It's more about the operations, whereas service teams are thinking more about, how do I expose this piece of infrastructure to the rest of the company? What I like about this is that it's much more similar to how... operate. There's a platform, and there are consumers and producers on it. And it feels like a much more scalable architecture than "go talk to that guy and he'll figure it out."

Jonathan_Hall:
Yeah.

Will_Button:
Yeah, for sure. And I think one of the big takeaways from that, from my perspective, is: look at your backlog. If you're on a DevOps team and you're thinking, "Wait, what's this gonna do to my job?", I always tell people to just look at your backlog. On every team I've ever worked in, there's a backlog of stuff that you look at and you just laugh, like, "Yeah, I'll never get to that." Tools like this aren't actually taking your job away. They're actually freeing you up to go take care of those things that you know you're currently not gonna get to until like 2048 or something.
 
Jonathan_Hall:
Yeah.

Johnny_Dallas:
Exactly. It's to free you up to get through your backlog, or to take that thing that's two years down your backlog and just give it to you on day one. Right? Many early platform teams I've talked to are trying to figure out what the right way to interface with their platform is. And everyone starts with a developer portal like Backstage, or tools like that. And everybody dreams of having a CLI that's custom-built for their use case, and they will never build it. It's years down the line in the ideal state that they want.

Will_Button:
Hahaha. Yeah.

Johnny_Dallas:
It's like, yeah, you're just not gonna get there. Let me help you. Ha ha ha.
 
Will_Button:
So Jonathan, what do you think?
 
Jonathan_Hall:
I've been looking through the website and I have to say I like the green.
 
Will_Button:
Yeah
 
Johnny_Dallas:
I'm glad you like the green. We're a fan of the green. I still use a green and black terminal, you know, the old hacker-man style. So we want to bring some of that in.

Will_Button:
Right. I mean, that's instant credibility.

Jonathan_Hall:
Yeah, I think it's cool. I don't know if everybody does this, it's probably one of my silly habits, but I use the mouse cursor to click and drag on text I'm reading to sort of keep my place, and as you do that, it highlights in bright green. So that's kind of cool.

Johnny_Dallas:
You got it.

Will_Button:
Oh wow, check that out. Ha ha ha.

Johnny_Dallas:
Hehehehe.

Jonathan_Hall:
Sorry, that's not really relevant to the topic, but it's fun. What's that, Will?
 
Will_Button:
No, I was just doing that. I was like, holy shit. I mean, oops. Like, holy shit. I'm done.

Jonathan_Hall:
Ha ha ha.

Johnny_Dallas:
Ha ha ha.

Will_Button:
This used to be a PG podcast.

Jonathan_Hall:
Yeah.

Johnny_Dallas:
Heheheheh.

Will_Button:
But that's how cool the feature is. I'm just going to be honest.
 
Jonathan_Hall:
Let's talk about your CSS style now. Forget about DevOps. We're talking about CSS.

Will_Button:
Right. Hahaha.

Johnny_Dallas:
The truest developer operations is writing CSS, or highlighting to read text. What are devs if not people who spend all day reading docs and keeping track of where they're at anyways? Well, I'll pass the compliment along to our designers. I'm sure they'll be happy to see somebody paid attention to the details. I know there was a lot of decision making around the green.

Will_Button:
For sure.

Jonathan_Hall:
So I'm kind of curious: if somebody wants to onboard with Zeet, and this is one of the things I've been trying to figure out as I'm reading the website, help me understand what specific problem the decision maker is going to be having that would cause him to click on the Try Now button?
 
Johnny_Dallas:
Yeah, so for early teams, teams who are more like the startup I was describing earlier, with lots of application developers and few DevOps engineers or people thinking about infrastructure, the Try Now is: I need to deploy something and I don't know how to, or there's something more complex that I need to manage. How am I going to do that? I'm looking for some easy platform. And I know I'd prefer to be in AWS because maybe I have credits or I expect... in a more complex way. And it's just a better place to stay than some company that I don't necessarily know is going to be around for 10 years. That's the one big reason people come to us: I need to deploy. I want to do it scalably. I want to do it effectively. Give me a nice clean interface. The other one, which is more interesting, is larger teams who have a DevOps team and an application or product engineering team, and are really struggling with self-service. Maybe they just hired a bunch of new application developers who are new grads entering the industry and learning about all of the tools that they don't know about. Maybe they just grew quite a bit and they have new infrastructure capabilities that they need to add. Maybe they're doing some big digital transformation. Whatever the reason, they feel like infrastructure isn't accessible to enough of the engineering team. And what we do there is work with the infrastructure engineering team to turn any templates or existing services they have into accessible services on Zeet. And then their application teams are able to come in and click, or use YAML, or use the API or CLI to provision the infrastructure in a more intuitive way. So those are the two big use cases we really see today: early teams that need a platform to deploy, and later-stage teams looking to really enable self-service.

Jonathan_Hall:
And in that first category, who's approaching you? Is it a founding engineer or CTO-type person? Or is it a non-technical founder? Who's making these calls to you?
 
Johnny_Dallas:
We like to say the DevOps-by-default person. I think there's always one. The person who didn't really realize that this was gonna be their job when they started, but now is on the hook for all the DevOps stuff. Often this is a founding engineer or technical founder. Sometimes it's a senior engineer who fell into this trap. But DevOps by default is, I think, the relatable phrase there.

Jonathan_Hall:
Nice, nice, yeah.

Will_Button:
Hahaha. Which probably describes most of us who have been doing DevOps for several years. Like, damn, how did I end up in this spot?
 
Johnny_Dallas:
Yeah, it's funny once you start doing it, it kind of just doesn't stop. But it's a fun job.
 
Will_Button:
Oh, for sure, for sure.
 
Johnny_Dallas:
Always new challenges, right?
 
Will_Button:
I think one of the other interesting aspects of this is on the side of working with larger teams. One of the problems I've always faced is when you get to those larger teams, you have a bunch of different teams that are building products, and you have a smaller DevOps team. And ideally, you want your DevOps team to be working with that application team early in the development life cycle, so that as they're making architectural decisions, you're there to either steer them one direction or get a heads up of what's coming your way. And as the number of application teams grows, it gets harder to do that. You find yourself in this position where teams are hitting you up saying, hey, we built this thing and marketing has the post to announce it scheduled for three hours from now. Can you deploy it? You're like, wait, what? And I think that's where something like this can help, because for a lot of the applications that get built, the application code is different, but how it runs and scales is the same, whether it's an API or a single page app or whatever. They're kind of all built the same way. And so this allows you to give those application teams a way, and you know that whenever they follow this path, all of the things that you want to happen are gonna happen, whether they know that those need to happen or not. You know, things like you gotta run multiple instances across different availability zones or regions, and you've gotta be behind a load balancer and use SSL, and all of that stuff that they may not know needs to happen, or if they do know that it needs to happen, they may not know how to implement it. Click this button and all of that stuff is going to be done for you. And so that, number one, gives your application teams the ability to move at their speed. It also takes workload off of your team. And when you take that workload off, it may actually free up enough time that you can work your way back with those teams to meet them earlier in their development lifecycle, to handle the edge cases that are going to be coming down the road. ... regions in about an hour and a half. Is that cool?
 
Johnny_Dallas:
Yeah, I think that's exactly right. If you give them the building blocks that are already blessed, the golden paths that are already guaranteed to be the right architecture, and let them play with them, then as long as you know that all the blocks are solid and anything they make out of them is going to be solid, you can take your hands off a lot and allow iteration to happen that much faster. I think fundamentally that's the job of DevOps, right? Enable developers to move faster and make sure we don't get in the way. We do need to make sure that it's still done in a compliant way, and we're still responsible for uptime and we're still responsible for compliance and security. So how do we do that without slowing down iteration? Because you don't want to be the guy who stands in the way and says, nope, can't release this, sorry, you guys got to push, push, push, push. Or at least you want to be there as little as possible. So give them the building blocks that they can use. I think, you know, we've kind of bridged this with Zeet. We've kind of bridged this gap between these early teams that might use a platform as a service and these later teams who are actually gonna control their infrastructure themselves. And we've seen some interesting examples of this, where we have a default app template which deploys on Kubernetes and has a bunch of our opinions around how we think web apps or web APIs should be deployed. And then we've seen teams that are a little bit later stage come in, customize that template and say, this is how we think about security. This is how we think about load balancing. These are our compliance requirements. And the app developers don't even know. They are just deploying apps. They have no idea that the underlying infrastructure changed and now suddenly everything's compliant.
 
Will_Button:
Yeah.
 
Johnny_Dallas:
The ops team is ecstatic and app devs just keep pushing code.
 
Jonathan_Hall:
Nice. That's kind of the dream for a developer, isn't it? They can just focus on their code. They don't care if it's being deployed to Kubernetes or to Heroku or to a bare EC2 instance or bare metal or a Raspberry Pi or whatever. They just want to write their code and see the results in production and get the logs when something goes wrong.
 
Johnny_Dallas:
Exactly. One of my favorite questions to ask developers, when doing any sort of architecture review or starting some engagement of here's how we're going to help you with infrastructure, is: hey, just describe to me what the application looks like on a whiteboard, and you can draw whatever boxes you want. We're going to stick to those, the whiteboard level of abstraction. 'Cause nine times out of 10, people do not go up and draw a box that says EC2 or Kubernetes or ECS or Fargate or subnets or AZs. They draw a box that says API and they draw a box that says website and they draw a line between them. And that's about as complex as it gets, but that's how application developers should think about it. Because if they had to imagine all the complexity, it'd be impossible to do their job. So we as DevOps or platform engineers need to help application developers stay at the whiteboard level of abstraction and think like that. Give them those building blocks, 'cause that's what's intuitive to them and that's what makes sense to them. We have to make them solid.
 
Jonathan_Hall:
Right. Yeah, nice.
 
Will_Button:
What's your term of choice? Do you prefer DevOps engineer or platform engineer?
 
Johnny_Dallas:
Gosh, that's the hardest question of this industry. Man, I feel like the titles in the DevOps world are so dynamic and so often misconstrued and used in different ways. I like DevOps engineer because I think it speaks to the point of this all, which is operations for developers. That's why we're here. We're here to support devs. We're here to make devs happy. Platform engineer is, I think, a specific type of DevOps engineer: platform engineers are really building that platform and not really being the service owners. So as we think about this new world we're entering, where there's platform engineers and service owners, I think platform engineer is a more specific version of DevOps engineer. But I also talk to people who use SRE instead of DevOps engineer, or infrastructure engineer or automation engineer. It doesn't really matter.
 
Will_Button:
I think over my career I've had the title of, let's see, sysadmin, IT engineer, DevOps engineer, SRE, and probably others. And throughout all of them, I feel like I've kind of always had the same job and same role despite having all of those different titles.
 
Jonathan_Hall:
Was that job the office jester, or more substantial than that?
 
Will_Button:
Right? I'm comic relief. Comic relief and fall guy. Yeah, absolutely.
 
Jonathan_Hall:
Ha ha ha. That's right, devs need humor too.
 
Johnny_Dallas:
Whatever helps the devs.
 
Johnny_Dallas:
Someone's gotta do it. Yeah, sysadmin I haven't heard in a little while. I feel like that's dropped off quite a bit, though definitely worth the mention as well.
 
Will_Button:
Yeah, for sure.
 
Will_Button:
I look at the salary surveys that all the different marketing firms put out every year. And it's funny because sysadmin is on there, but the pay scale for that is quite a bit lower than a DevOps engineer. And it's one of the nuggets of knowledge I try to share with people who are sysadmins trying to figure out what to do with their career. I'm like, hey, you're kind of already doing the DevOps thing. Polish up your resume and go make some extra cash.
 
Jonathan_Hall:
Change your job title. Yeah.
 
Johnny_Dallas:
Yeah, the different compensation bands for the different titles are pretty insane. I think I saw some platform engineering survey where DevOps engineers who just switch to the title platform engineer get a 30% salary increase. And it's like, oh, okay. Yeah, just do that. It's just a question of which technologies you're using, but it's the same job at the end of the day.
 
Will_Button:
Yeah.
 
Will_Button:
And I think that's important to highlight. For me, when you strip away all the marketing terms, all the technical terms, the software development teams are our customer, and our job is to help them build their product successfully.
 
Johnny_Dallas:
However that looks, make that happen.
 
Will_Button:
Yeah, exactly, exactly. There's no rules beyond that. Whatever it takes.
 
Johnny_Dallas:
Use the best tools at your disposal, and use that magic one-size-fits-all solution that we all know about that definitely exists, and you're set.
 
Will_Button:
The one solution we're not allowed to talk about.
 
Jonathan_Hall:
Right.
 
Johnny_Dallas:
Yeah, yeah, you know that one, that tool, brr, that we all... Yeah, I wish they'd stop editing it out on this podcast. It's so weird, but yeah.
 
Jonathan_Hall:
Yeah, it's so annoying.
 
Will_Button:
So what's the future look like for Zeet? Tell me about something new and cool that you're going to do.
 
Johnny_Dallas:
You know, I think one of the just like fun things about the job we get to do is we get to focus on actually improving DevOps problems. What I mean by that is, I think most DevOps teams end up getting put into a spot of playing catch-up or playing support, where, oh, hey, the CI pipeline is broken. Come and fix it. Or, oh, hey, we have to get this launch out in the next two hours. You can help me with that, right? Or, oh, hey, there's a massive security vulnerability. You're on top of that, right? And DevOps engineers don't get a ton of time to actually improve their systems or do systems engineering or think about what a better version of this world looks like and then get there. But we do. We are a team of DevOps engineers. We are a team of infrastructure engineers who are dedicated to just doing that. And so, some of the fun, exciting new things: we have a launch coming out in the next couple of weeks that's going to be very exciting, which adds a lot of capabilities for DevOps engineers or infrastructure engineers to customize our platform. So we're going to be allowing you to bring in external resources, put them into Zeet, and get a bunch of the platform features that Zeet provides already for whatever resource you have. So you'll get an audit log, you'll get various interfaces, the ability to roll back changes, whether it's a GitHub project or Kubernetes manifest that you want to reference, or a Terraform module, or anything else that you represent in code. That's one thing we're going to be coming out with soon. And then more and more of our own first-party modules that have really powerful logic built into them. Specifically around multi-cloud, I think there's going to be some really cool things we come out with here. We interviewed a team recently that has a custom compute module that decides whether it should deploy on AWS or Google Cloud at schedule time, based off of whatever is cheaper at the moment. And all of their compute-based workloads that aren't really data-related and aren't sensitive, they throw on this compute module, and they had like 75% savings.
 
Will_Button:
Oh wow.
 
Johnny_Dallas:
So I'd love to come out with a module like that, that just gives all of our thousands of customers instant access to this multi-cloud cost savings, because we have everything available for us. It's something we can build. I want to give it to everybody. Everybody should be leveraging these kinds of innovations. So I think a really fun part about building a dev tool and building an infrastructure tool for DevOps people is you get to go do the 20% time that you always want to do when you're a DevOps engineer, but you never really get time for. That's our job. That's what we get paid to work on full time. It's great.
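 
As an illustrative aside (not something shown on the episode), the schedule-time choice Johnny describes could be sketched roughly as below. It is a minimal Python sketch under stated assumptions: the `Quote` type, the `choose_provider` function, the `movable` flag, and the prices are all hypothetical, and no real cloud pricing API is called. How the team he mentions actually built their module is not described in the conversation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Quote:
    provider: str        # a label such as "aws" or "gcp"; no real API is called
    hourly_price: float  # hypothetical current price for the workload, in USD


def choose_provider(quotes: Sequence[Quote], home_provider: str, movable: bool) -> str:
    """Pick where to run a workload at schedule time.

    Workloads tied to data or handling sensitive material (movable=False)
    stay on the home provider. Everything else goes to the cheapest quote.
    """
    if not movable:
        return home_provider
    if not quotes:
        raise ValueError("need at least one price quote to schedule a movable workload")
    # min() returns the first of any equally cheap quotes.
    return min(quotes, key=lambda q: q.hourly_price).provider


quotes = [Quote("aws", 0.10), Quote("gcp", 0.07)]
print(choose_provider(quotes, home_provider="aws", movable=True))   # gcp
print(choose_provider(quotes, home_provider="aws", movable=False))  # aws
```

The `movable` flag mirrors the distinction drawn in the conversation: only compute workloads that are not data-related or sensitive are candidates for chasing the lower price.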
 
Will_Button:
Yeah, and then to be able to build that and release it and let a company take on the benefits of that, specifically in cost savings, because that's such a hard problem with cloud providers. Currently, we can scope everything out and plan the right size, and then you can do reserved instances in AWS or, I forget what the GCP term is for it. Basically, you make a one or a three year commitment to it.
 
Jonathan_Hall:
Preemptible, I think. Oh, reserved... no, sorry, I think that's the opposite. Yeah, yeah.
 
Will_Button:
Yeah, preemptible is their version of a spot instance. But yeah, the reservations. And yeah, so if you can just bypass all that, because that's kind of the world we live in, right? Instant gratification, no commitments. And asking somebody for a one year commitment in anything these days, like, it's hard enough to get somebody to commit to a 20-second TikTok video, much less when you're committing to an EC2 instance.
 
Johnny_Dallas:
Ha ha ha
 
Jonathan_Hall:
Indeed.
 
Johnny_Dallas:
Yeah, totally. I think with cost savings there's tons of opportunity. Maybe even the cost strategy around switching clouds is something that we should come out with as a module, to help people do that more easily. But there's so many opportunities to make these systems more efficient, and so many of them are just kind of gated by work. Like doing customization, reserved instances, anyone can do that. You just have to care, and there's so many teams out there that would benefit massively from that and just don't have that person. So how can we scale this with software, instead of just, you know, tribal knowledge and hope that you have the guy who's been burned before on your team?
 
Will_Button:
Cool. What else should we talk about?
 
Johnny_Dallas:
One segment that we've started doing, or one of my favorite questions to ask other DevOps and infrastructure people, is, you know, everyone has their favorite horror story of something crazy going down in a DevOps capacity, whether it's downtime or, you know, crazy on call or something like that. Will, I think I already asked you about yours, but Jonathan, I'd be curious about your favorite DevOps horror story.
 
Jonathan_Hall:
Well, I'll usually tell the story from early, early in my career. I think it was 2006. I had just gotten hired as a network engineer, another one of those titles that does the exact same thing as everyone else. And so the company hired me to basically set up a private cloud, if you will. We didn't call it that back then, but that's what it was, for a spam filtering service that they had written. So they needed a developer to come in and a network engineer to come in, and they were gonna try to make this cloud thing. And it was just a few weeks after starting, and the new developer made a change that broke things. Pushed it. So we installed this service on premise at our clients at the time. We weren't in the cloud yet. I was hired to put it in the cloud. ... or 200-something customers, and every one of them broke. Stopped processing mail, which meant they couldn't email us for support, and so of course they started calling us. And the funny thing is, this is on Black Friday. For some god-awful reason we decided to push this untested change on Thanksgiving weekend.
 
Johnny_Dallas:
Oh no.
 
Jonathan_Hall:
And yeah, so it was crazy. It turns out the problem was a single bit had not been set on a script, so it was not executable. So it was a single bit that kind of changed my career, because I was pretty green. I didn't really have a lot of experience with this stuff. I knew how to code a bit. I didn't really consider myself even a coder yet. I kind of cut my teeth on coding at that job. I could hack Perl scripts and bash scripts together, but I wasn't really a coder. So yeah, that's kind of the story that opened my eyes, I guess, in a way, to platform engineering, resilience engineering, even just best practices about coding, like, oh, we should write tests to detect this for us. You can never be smart enough to catch everything. You have to rely on systems and tools to help you out. So that's kind of the lesson I took from that.
 
Johnny_Dallas:
That's a good one. Your customers being unable to send support mail because their mailbox is down is a nice little chicken and egg there too. Ha ha ha.
 
Jonathan_Hall:
Exactly.
 
Will_Button:
What about you, Johnny? Give us a good story.
 
Johnny_Dallas:
I think my best story was probably a couple of months after we'd built the initial version of this platform that handled all of our DevOps and all of our infrastructure, back at that previous startup we were talking about earlier. One of the big pieces of infrastructure that we had to manage was an Elasticsearch cluster. And this was a very manually managed resource. There was no Kubernetes, no Helm chart, no managed Elasticsearch. It was EC2 instances and an SSH connection. So I was in charge of managing this, but we had just hired a new engineer who was a fresh grad from Cornell, and the two of us were working on optimizing it together. Mind you, I'm still in high school, so I'm only showing up after three o'clock, and I'm like a 15-year-old kid, and this is a fresh Cornell grad who's really trying to prove himself. And I was in charge of teaching him the ropes. He really thought I knew what I was doing. But we worked together on this Elasticsearch project for a week or two, and it comes time to go deploy it. And we go and deploy it. All was great, massive efficiency improvements, everyone's happy. We all sleep well that night. The next day is a Saturday, and I'm out in the middle of the day, getting lunch with some friends or something, and I get a text from this new intern. And he's like, hey, real quick, go to this URL, and he sends me a URL that's returning a 502. I'm like, uh-oh, why is this 502-ing, my guy? This is a little concerning, what's going on? And he just sends me a screenshot of the AWS console, and he's logged into the Ohio us-east-2 region and it's the EC2 tab, and there's no instances there. We had a full region running in Ohio. What happened? Where did it go? We had a hundred instances here yesterday. And he had accidentally clicked select all and terminate in the AWS console, and deleted our entire Ohio region.
 
Jonathan_Hall:
Whoa, how do you do that? Did he think he was on a different instance or something?
 
Johnny_Dallas:
Yeah, I asked the same question. He thought he'd selected, I think, just one of them. He was trying to push out an update to the cluster we'd just gotten live yesterday, and he'd accidentally clicked select all. And we didn't have proper IAM permissions in place to stop that from being a bad action. So I had to run home, miss my lunch, fix this, spin up the new region. Thankfully, because we had this platform, spinning up the new region took only one click, and then a lot of stressful watching and waiting. But it did come up successfully and we were able to recover in about an hour. But that was a great learning, I think, for me: you know, trust your teammates, but also make sure there's controls in place to prevent a mistake from going bad, and make sure you have systems in place to recover. Being able to spin up that whole new region was a godsend. I would have had to spend the rest of my weekend, if not week, spinning that up if we didn't have that in place. So that was probably my worst outage, and it was completely not caused by anything real. If it was customers scaling, that'd be one thing, but it being just an internal mistake made it that much more painful.
 
Jonathan_Hall:
The majority of industry mistakes are like that.
 
Johnny_Dallas:
Very fair.
 
Jonathan_Hall:
You know, you hear when Facebook or Fastly or whoever, all the big ones that make the news, or even Twitter recently, it was somebody pushed a change and didn't know what was happening, or they deleted the wrong thing, or GitHub published an SSH key by accident. You know, it's people. And that's not to blame the people, but we are the weakest link.
 
Johnny_Dallas:
Yeah, we can make automation. That's how we need to do all of this. It's the only thing that scales, and it's the only thing that's reliable.
 
Jonathan_Hall:
Yep.
 
Will_Button:
Right, for sure.
 
Will_Button:
I had a very similar incident to the one you just told, Johnny. I was working for this company. We had a MongoDB cluster that we were running on EC2 instances. And it was pretty active. It consumed retail product feeds and either updated or added about 18 million products per hour from this feed data. And I highlighted the EC2 instances and clicked terminate, thinking that I was on the dev cluster that I'd just built. Turns out I wasn't. I was on the prod cluster and I nuked the entire prod cluster.
 
Johnny_Dallas:
I'm sorry.
 
Jonathan_Hall:
I've done that sort of thing so many times. Dropped a database table on the wrong server. Yeah.
 
Will_Button:
Right? Yeah. So the big takeaway for me, some of the things I learned: use good naming for your EC2 instances, because it popped up the message there that says, are you sure you want to terminate, and gave the names of the instances. And I'm like, yeah, yeah, yeah, I know those are the ones I want to delete. I just clicked the check box, OK? And so I clicked yes. So, good naming so that it's clear what you're working on. Termination protection so that you can't terminate your production instances. Like you mentioned, IAM permissions to add an extra layer of protection for you there. Another good one that I like to use now is separate AWS accounts, so that your production account is a different AWS account than your development account, or if you're over in GCP, that it's a different project. Yeah, so all that. The only saving grace for me there was all of the data was on EBS volumes that did not delete when I terminated the instances. So I was able to launch a new MongoDB instance and just reattach those volumes, and all the data was there. So it was only down for, I don't know, maybe an hour or so, but it was still a painful hour. Yeah.
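 
As an illustrative aside (not part of the conversation), the guardrails Will lists can be pictured as a check that runs before anything is terminated. This is a minimal Python sketch under stated assumptions: the `Instance` fields, the `plan_termination` function, and the example names are hypothetical, and the `protected` flag only stands in for a cloud provider's termination protection. In a real setup the enforcement would come from the provider's own protection settings, IAM policies, and account separation rather than from application code like this.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    name: str         # a descriptive name, e.g. "prod-mongo-1" (naming scheme is an assumption)
    environment: str  # "prod" or "dev", e.g. taken from a tag
    protected: bool   # stands in for termination protection


class TerminationBlocked(Exception):
    """Raised when the selection contains something that must not be terminated."""


def plan_termination(selected: Iterable[Instance], intended_env: str) -> List[str]:
    """Return the instance IDs that are safe to terminate, or refuse the whole batch."""
    batch = list(selected)
    blocked = [i for i in batch if i.environment != intended_env or i.protected]
    if blocked:
        names = ", ".join(i.name for i in blocked)
        raise TerminationBlocked(f"refusing to terminate: {names}")
    return [i.instance_id for i in batch]


dev = Instance("i-dev1", "dev-mongo-1", "dev", protected=False)
prod = Instance("i-prod1", "prod-mongo-1", "prod", protected=True)

print(plan_termination([dev], intended_env="dev"))  # ['i-dev1']
try:
    plan_termination([dev, prod], intended_env="dev")
except TerminationBlocked as err:
    print(err)  # refusing to terminate: prod-mongo-1
```

Refusing the whole batch, rather than silently skipping the protected instances, is a deliberate choice in this sketch: an accidental "select all" then fails loudly instead of partially succeeding.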
 
Johnny_Dallas:
That is very lucky. That's awesome. Either lucky or well engineered, well thought out.
 
Will_Button:
Oh no, it was pure luck. It was pure luck. Yeah, yeah.
 
Johnny_Dallas:
Trying to give you a little bit of credit, you know. I love that. I think, how do you quickly recover when something goes bad? 'Cause something will go bad. But just, like, the first time you're in that position and you look around and you're like, oh shoot, it's all broken and I have to fix this, it's a moment. You know, it's a funny moment in your DevOps career. If you make it past that, yeah, you can bring prod back. Good job.
 
Will_Button:
Yeah, it's a growth opportunity.
 
Johnny_Dallas:
Exactly.
 
Jonathan_Hall:
I think that's really one of the most important lessons anybody can learn in this industry: things will go wrong, and stop trying to prevent that. I mean, I don't mean that literally. There are definitely things we should do to prevent things from going wrong. But if that's what you're banking on, you will fail. You must assume that things will go wrong. The production database will be deleted. You will write bugs, et cetera, et cetera. Optimize for recovery, not for prevention.
 
Will_Button:
Yeah, absolutely.
 
Johnny_Dallas:
One of the early, bug isn't necessarily the right word, but architectural missteps that we made with this initial platform.
 
Jonathan_Hall:
Sounds very politically correct.
 
Will_Button:
Yes. That's it. Ha ha ha.
 
Johnny_Dallas:
Yeah, thank you. Well, you'll hear why. We had the system that, you know, stored all the definitions of how our applications were supposed to run and build and all this, stored in some central database and some central API. And then every region of our service would pull from this database and determine if it had all the right services up and running, and if not, spin them up. Well, the service that actually defined what should be available was also deployed through the system. And so there was one time that we had the system go down and it couldn't come back up, because the system that was a necessary dependency of it also wasn't up, and that couldn't come up because the system was down. And we just got into this chicken and egg where our entire API and our entire service was down, because suddenly the database was saying, oh, there's supposed to be no services running. What do you mean? This is empty. And so every region was actively spinning itself down. And that one was a catastrophe, honestly, of just cascading. We were deployed to 15 regions at the time and we had tens of thousands of EC2 instances. And I was looking at every single region dwindling rapidly, going just minus one instance, minus one instance, minus one instance. And then at the same time, every region disappearing off the book. So it was just everything collapsing. And the fix to all of this was I ended up SSHing onto the admin box that had this database that was the central source of truth, and just running a Git clone and running the service myself. And I just had to run it for one minute to get everything back up and running. You know what? You just had to do what you had to do. Don't run Git on prod, don't hand-change state on prod. These are bad practices, but I got the system back up, re-architected it, fixed that problem. But I wasn't going to say, oh no, I have to re-architect this whole system and design a new system, because everything is down right now. No, I have to do what I have to do. Just get it back up and then fix the real problems, because the problem is going to happen. If I try to design a perfect system right now, I'm just going ... to your point there, Jonathan.
 
Jonathan_Hall:
Yep.
 
Will_Button:
Task failed successfully.
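 
As an illustrative aside (not part of the conversation), the failure Johnny describes, where an empty source of truth caused every region to scale itself down, can be pictured with a small reconcile step. This is a minimal Python sketch under stated assumptions: the `reconcile` function and its dictionary shapes are hypothetical, and treating an empty desired state as "hold position" is one possible safeguard, not the fix his team actually shipped.

```python
from typing import Dict, Optional


def reconcile(desired: Optional[Dict[str, int]], running: Dict[str, int]) -> Dict[str, int]:
    """Return per-service instance-count changes (positive = start, negative = stop).

    `desired` is what the central source of truth says should run. If it is
    unreachable (None) or empty, hold the current state instead of treating
    "nothing defined" as "stop everything".
    """
    if not desired:
        return {}
    changes: Dict[str, int] = {}
    for service in sorted(set(desired) | set(running)):
        delta = desired.get(service, 0) - running.get(service, 0)
        if delta != 0:
            changes[service] = delta
    return changes


print(reconcile({"api": 5}, {"api": 3, "old-worker": 1}))  # {'api': 2, 'old-worker': -1}
print(reconcile({}, {"api": 3}))                           # {}
print(reconcile(None, {"api": 3}))                         # {}
```

The trade-off in this sketch is that intentionally removing every service needs some other, explicit path, which seems a reasonable price for not letting an outage of the definition service cascade into every region.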
 
Will_Button:
Yeah. Right on. So we are approaching the one hour mark on here. Should we do some picks?
 
Jonathan_Hall:
I think it sounds like a good idea.
 
Will_Button:
Alright, have you got one for us Jonathan?
 
Jonathan_Hall:
I do. I'm going to do a little bit of an unusual one this time. I was shaving a couple weeks ago when I dropped my razor on the ground and it snapped in half. I was so annoyed because I have a, what do they call it, a safety razor I guess, old style. I don't have one of those big fancy Gillette ones or whatever like 16,000 blades and vibrating sensors and 3D whatever. It's just a blades in it and it works great. And it fell and broke. So I thought, well, I'm gonna, I'm gonna splurge this time. I'm gonna research and find the best possible razor I can get. And I'm gonna buy that. And so I did some research and turns out it was a model I already had. Apparently
 
Will_Button:
Ha ha ha
 
Jonathan_Hall:
I did the same research 10 years ago when I bought this one.
 
Johnny_Dallas:
Hehehe
 
Jonathan_Hall:
Let me find the model
 
Johnny_Dallas:
I'm sorry.
 
Jonathan_Hall:
here so I can give the proper pick. I have the wrong window open. So my pick for the week is the Merkur, M-E-R-K-U-R. It's a German company apparently. Merkur Dovo Safety Razor 34C. They have two models. They have one with the short handle, one with the long handle. I got the short handle because it takes up less space in my luggage when I'm traveling. But I use it at home too. So Merkur Dovo 34C. It's about
 
Will_Button:
you
 
Jonathan_Hall:
40, 50 bucks, but then you fill it for about 10 years till you drop it on the floor with 10 cent razors. So it actually saves a ton of money over one of those big fancy 3D app based ones, whatever they have these days.
 
Will_Button:
Yeah, which are really expensive and don't seem to last that long.
 
Jonathan_Hall:
Yeah.
 
Will_Button:
All right,
 
Jonathan_Hall:
That's me.
 
Will_Button:
Johnny, have you got a pick for us?
 
Johnny_Dallas:
Yeah, let's see, I like this bit you guys do. I'm probably gonna go with, I just started reading this book yesterday, Tomorrow, and Tomorrow, and Tomorrow, enjoying it so far. So that's my pick for the week. Don't have a full review yet, cause I just
 
Will_Button:
All
 
Johnny_Dallas:
started
 
Will_Button:
right,
 
Johnny_Dallas:
yesterday, but
 
Will_Button:
right on.
 
Johnny_Dallas:
ask me in a week.
 
Will_Button:
So for me.
 
Jonathan_Hall:
Is it fiction or what is it? What's the genre at least? Okay.
 
Johnny_Dallas:
Yeah, it's a novel about kind of a game designer future, or a couple who meets in a game design school in the future and ends up changing kind of how the gaming world works.
 
Jonathan_Hall:
Interesting.
 
Johnny_Dallas:
Very Ready
 
Will_Button:
Oh
 
Johnny_Dallas:
Player
 
Jonathan_Hall:
Cool.
 
Johnny_Dallas:
One and platform
 
Will_Button:
nice!
 
Johnny_Dallas:
engineering.
 
Will_Button:
Well yeah well played, well
 
Johnny_Dallas:
Tie
 
Will_Button:
played!
 
Johnny_Dallas:
it in, you know, gotta, gotta pull it in.
 
Will_Button:
Cool, my pick for the week is actually going to be Polygon's new zkEVM blockchain. We launched it yesterday, but by the time this podcast comes out, that will be a couple of weeks ago. But the zkEVM chain is super cool because it's a blockchain that uses zero-knowledge proofs, which allows us to scale Ethereum and then write those transactions back to Ethereum, with safety provided by Ethereum, using zero-knowledge proofs. And the ZK team at Polygon has put in a massive amount of effort for this and is actually launching this at least a year, if not a couple of years, ahead of when anyone thought that we would have this possible. So pretty exciting stuff. If you're interested in blockchain or want to see how this looks, zkEVM is a cool way to get introduced to that. So that's my pick for the week. And then, Johnny, if people want to learn more about Zeet or engage with you, how can they do that?
 
Johnny_Dallas:
Yeah, best place is probably either Twitter or LinkedIn. Zeet is Z-E-E-T. Our website is zeet.co. You can find me online at Johnny Dallas. I'm the only Johnny and it's my typical username on most platforms. But yeah,
 
Will_Button:
Right on.
 
Johnny_Dallas:
love
 
Will_Button:
And then you have
 
Johnny_Dallas:
to
 
Will_Button:
a
 
Johnny_Dallas:
talk to
 
Will_Button:
podcast.
 
Johnny_Dallas:
anyone who's listening. If
 
Will_Button:
Yeah. You've got a podcast
 
Johnny_Dallas:
you're interested
 
Will_Button:
also,
 
Johnny_Dallas:
in platform
 
Will_Button:
right?
 
Johnny_Dallas:
engineering. Yeah. And we run a podcast talking about platform engineering and all things DevOps called The Platform, over on the YouTube channel, or you can find it on my socials as well. So if you're ever interested in learning more about platform engineering, multi-cloud, and kind of where we're headed as an industry, and want to hear me talk about it more,
 
Will_Button:
All right,
 
Johnny_Dallas:
Come,
 
Jonathan_Hall:
Awesome.
 
Will_Button:
well
 
Johnny_Dallas:
uh, come
 
Will_Button:
thank you
 
Johnny_Dallas:
hang
 
Will_Button:
so
 
Johnny_Dallas:
out,
 
Will_Button:
much for being
 
Johnny_Dallas:
come listen.
 
Will_Button:
on the show today.
 
Johnny_Dallas:
Thanks for having me. It was
 
Will_Button:
you too
 
Johnny_Dallas:
great
 
Will_Button:
and
 
Johnny_Dallas:
to chat
 
Will_Button:
thanks
 
Johnny_Dallas:
with you
 
Will_Button:
for
 
Johnny_Dallas:
guys.
 
Will_Button:
listening everyone we'll see y'all in the next episode.
 
Jonathan_Hall:
See you later.