Serverless in Production With Erez Berkner - DevOps 151
Show Notes
Erez Berkner is the CEO and Co-Founder at Lumigo. He joins the show to talk about, "Lessons Learned From Running Serverless In Production For 5 Years". He begins the program by elaborating on his personal interpretation of Serverless. Moreover, he dives into the five important lessons he learned from running serverless workloads.
About this Episode
- Observability strategy
- Use multiple AWS accounts
- Securely load secrets at runtime
- Follow the principle of least privilege
- Optimize cold starts
On YouTube
Sponsors
- Chuck's Resume Template
- Developer Book Club starting
- Become a Top 1% Dev with a Top End Devs Membership
Links
- Lessons Learned From Running Serverless In Production For 5 Years
- LinkedIn: Erez Berkner
- Twitter: @erezberkner
Picks
- Erez - The Godfather (1972) - IMDb
- Jonathan - Good Strategy Bad Strategy
- Jonathan - Cup o' Go
- Will - Therabody | World-Leading Wellness Solutions
Transcript
Will_Button:
What's going on everybody? Welcome to another episode of Adventures in DevOps. I'm today's host, Will Button, joining me in the studio, my cohost, Jonathan Hall.
Jonathan_Hall:
Hey everyone.
Will_Button:
And we have a special guest today, Erez Berkner. Welcome, Erez.
Erez_Berkner:
Hey guys, it's great to be here.
Will_Button:
We're excited to have you and today we're gonna be talking about lessons learned from serverless. So before we jump into that, do you wanna introduce yourself and tell us a little bit about your background?
Erez_Berkner:
Yeah, of course. So my name is Erez Berkner. Today I'm CEO and co-founder of Lumigo. I'm a developer, or maybe today a developer by heart, but I spent the better years of my career as a developer in different companies, working from, you know, the deep, deep... deep, dark kernel drivers of Linux all the
Will_Button:
Yeah.
Erez_Berkner:
way to higher application, many times on the security space. And yeah, just a big fan of technology.
Will_Button:
right on. I think that goes hand in hand with people who have been doing this for decades. You know, like you kind of have to, this has to be your job and your hobby if you're going to persist at this for decades, I think.
Erez_Berkner:
Yeah, otherwise you might... you know, I think we live in an era where, at least in our industry, the high tech industry, many, many of us fortunately have the ability to combine your hobby and your work. And honestly, if that's not the case, I think we are fortunate enough today to have the ability to choose and to say, okay, I might just go and do something else. And I think that's why you're seeing a lot of people that really love what they're doing, really love coding, really love architecture, building, engineering, because these are the people that remain on this path.
Will_Button:
Yeah, and I think that adds a lot to how quickly we're seeing technology change, just because people are excited and passionate and actually enjoying what they're doing. I can't imagine that I would be this excited about work if I were, you know, say a roofer here in the desert and carrying shingles up onto a roof every afternoon.
Erez_Berkner:
No disrespect to roofers, right?
Will_Button:
Right?
Will_Button:
For sure. Yeah, for sure.
Erez_Berkner:
Yeah, yeah, for sure. But I must say, because I'm 100% a software guy, I always admired people that are actually doing what I call actual work
Will_Button:
Yeah.
Erez_Berkner:
and actually building something physical, doing something physical. I always say like, you know, that's so much more than what I can do.
Will_Button:
Yeah, for sure. Like, if you, those people who can build something and they can point to it and you can walk up and you can touch it or kick it or whatever.
Erez_Berkner:
Absolutely. And then you can tell your grandmother about that. And she understands.
Will_Button:
Yeah! So y'all have been doing serverless quite a bit over at Lumigo, right? And by quite a bit, 100%, is that accurate?
Erez_Berkner:
That is accurate. I think today we degraded to 99%, just to be honest.
Will_Button:
Yeah.
Jonathan_Hall:
Oh no, you're not purist anymore.
Erez_Berkner:
No one will know, we're pure. But we're very much focused on eating our own dog food, using serverless, and pushing the boundaries of serverless for already five years now.
Will_Button:
Right on. That's enough time to develop some pretty strong opinions about it. So I think one of the things that stands out to me about serverless is if you're going to build and run a serverless application, you kind of have to start with that as your goal in the beginning. It's pretty challenging to... take an existing application that's built to run as a service or as a container and move that to a serverless type application. Would you agree?
Erez_Berkner:
It's a very interesting point. I think that it depends. I think maybe the basic thing is that serverless is not just a service, it's not just a specific technology, it's a mindset. And you know, Werner Vogels, the CTO of Amazon, in his latest keynote at re:Invent talked about event-driven architectures, EDA, which many people say, okay, it's a rebranding of serverless or something like that. But the point is that you need to think in events, in decoupling different services, in a very, very... architectural manner in order to go serverless. And so I think migrating from a container to a managed service shouldn't be, like, dramatic. It's more about how you actually build this in the right constructs, the right architecture, the right mindset. And this, I think, is not trivial because it's like learning a new language. It's not just doing the same. It's not just like, okay, let's do the same over there. It's like, let's think about it from a different angle, and it's not trivial and it requires time, it requires expertise. You need to get some consultation many times. And I think this is why it's perceived as a significant change or migration when doing so. And the last thing is that many times it's done in baby steps. So if you architected your, you know, container environment in the right way, you can take a specific microservice and just migrate this, and then you're done for this month, and then maybe another step like this.
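To make the event-driven mindset concrete, here's a minimal sketch (assuming boto3 and a hypothetical custom bus called "orders") of a service publishing an event to EventBridge instead of calling the next service directly:

```python
import json
import boto3

# Hypothetical example: instead of calling the shipping service directly,
# the order service emits an event and lets EventBridge fan it out to
# whatever consumers are subscribed.
events = boto3.client("events")

def publish_order_created(order_id: str, total: float) -> None:
    events.put_events(
        Entries=[
            {
                "Source": "acme.orders",          # assumed source name
                "DetailType": "OrderCreated",     # assumed detail type
                "Detail": json.dumps({"orderId": order_id, "total": total}),
                "EventBusName": "orders",         # assumed custom event bus
            }
        ]
    )
```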
Will_Button:
Right on. Jonathan, what do you think about serverless?
Jonathan_Hall:
It's something that's intrigued me, and I've never really played with it very much. So most of my knowledge is theoretical. So it's probably hard for me to ask very intelligent questions.
Will_Button:
Yeah.
Jonathan_Hall:
But I'll tell you, there's one thing in this article that really popped out at me and I think is a relevant discussion, probably for other topics, too. And that is the idea that logs are overrated. Do you want to talk about that a little bit maybe? Why are they overrated? Yeah, what do you replace them with?
Erez_Berkner:
Yeah, it's a bold statement, right?
Will_Button:
It is.
Erez_Berkner:
Yeah. So I think maybe just, just before diving into that, maybe it would help to actually define what serverless is just because I think many people have different definitions.
Jonathan_Hall:
Mm-hmm.
Erez_Berkner:
And there's a big fight over there, you know, ideologically, like even within, you know, the cloud providers, on what's defined as serverless or not. So by no means is this a complete definition, this is just my single-person definition, but for me, the serverless architecture concept, mindset, is about consuming more and more managed services with APIs that can scale, as opposed to just, you know, renting a server and building your software on top of that. So instead of just, you know, getting a few servers and building your software over there, you use a lot of the managed, the software-as-a-service offerings. So be it database as a service like Snowflake or DynamoDB that you can just consume via API, you don't need to set it up, you don't need to build it. Managed queues like AWS SNS, SQS, EventBridge, managed file systems. And the point is that with these managed services, like Lego pieces, you can today build almost every application just by connecting those Lego pieces together in a very fast way, not to build, but to, let's call it, glue together those different services. So when I talk about serverless, it's functions as a service like AWS Lambda, but it's also all the variety of managed services within the cloud provider, or external to the cloud provider, and that's the as-a-service concept that I refer to. So maybe that would be just like the first definition before we go into the logging.
Jonathan_Hall:
So you would include like database as a service as part of your definition of serverless?
Erez_Berkner:
I'm just interested to hear... yeah, okay. Sorry, can you repeat that?
Jonathan_Hall:
So are you saying that you include things like database as a service as part of the definition of serverless? Okay, very good.
Erez_Berkner:
Absolutely, absolutely. Exactly, you got that just right. I even define containers as a service in the broad serverless spectrum. For me, again, it's something that you consume, you don't manage, and it can scale. This is, without getting philosophical, what helped me as a developer to just connect, connect, connect, connect, glue this, run this, and I'm done. And now, when you think about that, your question about logs: when you have this microservice environment with a lot of managed services that you don't control, you don't own, you cannot code over there, you cannot change the API, it's a closed garden. You cannot do anything on there. You don't issue logs, right? Because it's not your system. But all of a sudden, sometimes there are dozens or more microservices in those managed-services environments. Getting a log all of a sudden feels very limited. Because a log, on the one hand, is limited to that specific service and that specific time and specific event. So it's not enough to tell me the entire story, what happened across dozens of services in that request. I have to correlate a lot of logs in order to figure this out. It's called distributed tracing. On the other hand, everything is logged today. People talk about log fatigue, because so many things are logged and it's very, very hard to find the right log, the one you need in order to understand what happened. So that's kind of like why logs are overrated nowadays. Maybe it made sense like 10, 15 years ago. And the alternatives are what distributed tracing companies, observability companies offer today with tracing, with the ability to focus you on a single request, an end-to-end request, so a developer can debug without looking at logs, just looking at the actual data that goes from one service to another, and getting to a root cause.
Jonathan_Hall:
Very good.
Will_Button:
Gotcha, so the logs themselves are typically independent events, and then when you're jumping from serverless component to serverless component, whether that's a Lambda function or a container or a database service, it's really challenging to correlate that string of events together as one sequence of events, and that's where observability steps in to provide you that higher level picture of, here's your data and here's what happened to it at each step of the way. Is that like a good summary?
Erez_Berkner:
That's actually a really good summary. I would say, yeah, and think about a system at scale. Even two components. This component, a Lambda, a container, let's say, has one million requests going in every minute. And the other component also has one million events going in as the next step. And now, all of a sudden, you have two million events, and in logs, maybe 10x of those. But the real problem is, like you said, how do I connect just the two logs that I want to dive into? And this is very challenging. So yeah, absolutely, it's about connecting, correlating the entire system with logs.
Will_Button:
Just grep it, right?
Jonathan_Hall:
Hehehehe, grep! Heck yes!
Erez_Berkner:
This is actually what people are doing,
Will_Button:
Yeah.
Erez_Berkner:
right? Because so they have some sort of a unique identifier and they search for that unique identifier across the millions of logs. So that's the other layer of this.
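A minimal sketch of that approach, assuming a Python Lambda handler and illustrative field names (not Lumigo's actual format): generate or propagate a correlation ID and attach it to every log line, so one identifier ties the whole request chain together.

```python
import json
import logging
import uuid

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # Reuse the caller's correlation ID if one was passed along,
    # otherwise start a new one for this request chain.
    correlation_id = event.get("correlationId") or str(uuid.uuid4())

    def log(message, **fields):
        # Structured log line; every entry carries the same correlation ID,
        # so one search term finds the whole request across services.
        logger.info(json.dumps({"correlationId": correlation_id,
                                "message": message, **fields}))

    log("processing request", awsRequestId=context.aws_request_id)
    # ... business logic; include correlation_id in any downstream event ...
    return {"correlationId": correlation_id, "status": "ok"}
```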
Jonathan_Hall:
And you know how I know that identifier is unique is because I just SSH to the server and put the line in there that says 1, 2, 3, 4, 5.
Will_Button:
Hahaha
Erez_Berkner:
That's the fun thing, you cannot SSH to a serverless environment, right? Because you have no target to SSH to, to DynamoDB or others. That's another interesting point.
Will_Button:
Yeah, for sure. And when you first start down this path, I think for me anyway, that was very eye opening, because I didn't realize how dependent I was on SSH or having direct access to the machine, not only for like configuration stuff, but just troubleshooting. And so when I started using serverless, I hit all these obstacles because my tool set that I had been using for decades no longer worked.
Erez_Berkner:
Exactly, exactly, and this is a great example of the mindset shift that I'm talking about. I used to SSH to servers as well when I was developing. This is, you know, no longer an option today in an environment where services are spinning up and down all the time and changing, and you don't have a server. So where would you SSH to? And this is why you need to have different ways to look at those environments, to adapt the tools to those environments.
Will_Button:
So for whenever you're trying to approach the observability aspect of it, what are some of the key questions you should be asking to build out a good observability platform?
Erez_Berkner:
Well, I think, you know, we need to remember the basics don't change, right? So what do we want? We wanna build a solution to a problem that we have, you know, a business problem. And this solution needs to meet some criteria of functionality and SLAs, and whatever the boundaries are. So that's one. And then the second, and this is where observability comes in, we need to have the ability to make sure that we're meeting the SLAs. So if something goes wrong, we need to know about it in X minutes after the problem. We need to have backups, of course, like high availability. And we need to be able to get to the root cause in Y minutes or Y hours, or whatever it is. This is not new, right? The idea that we need to run production, business-critical applications doesn't change. What does change, when we talk about modern cloud observability, is how we can get there. And this is where Jonathan talked about the logs. It's not about the logs. The logs can be great, they can help, but just having a bunch of logs doesn't help in those environments. It's no longer the metrics that we used to have. So when I was working with servers, I always kept an eye on CPU, right? If it's more than 50% CPU, we need another server.
Will_Button:
All right.
Erez_Berkner:
Same goes for memory, I/O. But it's still like... who cares about the memory of DynamoDB or S3 or Stripe? There are different things that we need to be looking at in those environments. So things like, now I'm using Twilio, I want to make sure I'm monitoring that service in terms of latency hiccups. That's becoming much more important. Now, I don't care about their CPU. If I'm using Lambdas, I want to make sure that I understand cold starts. And is that a big thing for me? And if I'm using DynamoDB, I want to get alerts on, you know, capacity issues in terms of reads, writes, my definition, I can adapt those, but I need to know. That's a bit of the visibility that really changes. Again, the same target, we want to make sure the system is up and running, but we need to watch other things. If we watch the old things, we will just get hit from a different side. So that's one thing, about looking at different things, and the other is actually debugging. And this goes to our discussion about tracing, distributed tracing, getting the one, two, three, four, five of Jonathan into the server. But having this in a way that I, as a developer, can understand post-mortem what happened from beginning to end, across services.
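As one concrete example of watching these new signals rather than CPU or memory, here's a minimal boto3 sketch (table name and SNS topic are placeholders) that alarms on DynamoDB read throttling:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on read throttling for a hypothetical "orders" table, notifying a
# placeholder SNS topic. This is the kind of signal that matters for a
# managed service, where CPU and memory aren't yours to watch.
cloudwatch.put_metric_alarm(
    AlarmName="orders-table-read-throttling",
    Namespace="AWS/DynamoDB",
    MetricName="ReadThrottleEvents",
    Dimensions=[{"Name": "TableName", "Value": "orders"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:on-call-alerts"],
)
```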
Will_Button:
Right on. So let's jump onto a different topic here about multiple AWS accounts. This is something that you reference in the article that we've got here as well. And I like this topic because it's something I'm currently dealing with. So just to set the foundation here, the idea in AWS of using multiple accounts is that you have a separate AWS account for development and for production, for each application or each team that you're supporting. And that gives you the ability to give a little more permissive access to your development team and your development environment, and then no one gets access to production. And then whenever the day comes that you actually do get hacked, you kind of minimize the blast radius because that AWS account has only limited resources in it. What's your experience with how granular to go with that? Is there such a thing as having too many AWS accounts?
Jonathan_Hall:
Surely not.
Erez_Berkner:
It's a very good question. Yeah, go ahead, John.
Jonathan_Hall:
I was thinking, surely not. Surely you can never have too many AWS accounts. Who doesn't
Will_Button:
Alright.
Jonathan_Hall:
want more?
Erez_Berkner:
Yeah, so I think on the article, by the way, I didn't mention, this article is written by Yan Cui, also known as The Burning Monk, and he's one of the biggest serverless experts in the world today, who happens to be our developer advocate, but...
Will_Button:
Yeah, right.
Erez_Berkner:
But he's been this expert since much before he started working with us. I just want to say that I want to give him the credit, because this is not my work, it's his work and his analysis. And second is that he has a lot of great stuff. If you are going to get into serverless and understand it better, I would say number one is to read his blogs. About your question, I think in a serverless environment, having the right separation, having multiple AWS accounts, serves multiple things. One is, as you mentioned, security, like zoning and security. This is the bread and butter. I spent 14 years at a company called Check Point before Lumigo. So creating the right zones is critical for segmentation and separation. And this really extends now also, you know, to East-West security and zoning within the data center. So this is micro-segmentation and all the other things that are really ramping up in the last, you know, five, six years. This is kind of like best practice for security, to minimize exposure, right? So absolutely, you know, this is one angle of this. But I think the same concept could apply not just for security but also for the different implications of one environment on another. So let's suppose, let's look at having one AWS account, for example, for your entire company. Let's suppose I have, you know, 40 developers, one staging environment and one production. If I run everything in the same AWS account, beyond the security issues, I run the risk of actually consuming and reaching some of the limits that exist in the AWS environment. Because you need to remember, this is not your physical server room where you know everything that's going on and you're just limited by CPU and memory. You're limited by a bunch of things. Most of them you don't know about because they're not documented, but you will feel them once you reach that limit. Just to give an example, this is documented, but the number of Lambda invocations you can have in a single second is limited. If a developer now runs an experiment, a stress test on the environment, all of a sudden your production will suffer, and you will start having issues with throttling of your Lambdas in production.
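If workloads do have to share an account, one partial mitigation is capping how much of the shared concurrency pool a non-critical function can take. A minimal boto3 sketch, with an assumed function name:

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap a hypothetical load-test function so it cannot consume the whole
# account-wide concurrent-execution pool and throttle production functions
# that happen to live in the same account.
lambda_client.put_function_concurrency(
    FunctionName="dev-stress-test-worker",   # assumed function name
    ReservedConcurrentExecutions=50,
)
```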
Will_Button:
Oh right, yeah, I hadn't thought about that.
Erez_Berkner:
Yeah, at the end of the day, although these are separate services, there are shared resources in an account. And you need to be mindful of that. This is one perspective, the resource utilization. And the other one is risk. If a developer deletes his account, that's okay, we can build a new account. If he deletes the production account, that's a bit more of a problem.
Will_Button:
Maybe.
Erez_Berkner:
So this is why at Lumigo, and this is, I think, today almost the best practice if you're really practicing serverless, at least serverless in AWS, we have a separation of accounts: best would be an account per developer, an account per testing, an account per staging, an account per production, at least one. And then have a very clear separation and role segregation. And the only downside of that, that I at least know of, is managing all of those accounts. And that's a challenge.
Jonathan_Hall:
But Will likes that part, so there's no problem.
Will_Button:
Mm-hmm.
Will_Button:
Yeah, right.
Erez_Berkner:
Yeah, absolutely.
Jonathan_Hall:
That's what everybody complains about with AWS accounts, isn't it?
Erez_Berkner:
Yeah, for sure. I have so many accounts. This is where you really need to have good people that are managing your AWS environment, know what they're doing, using the right tools because AWS has some tools to manage organizations and other accounts. But you need to be mindful of that because that becomes a challenge as you grow over, let's say, I think 80, 90 accounts in your environment.
Will_Button:
For sure, yeah. And for anyone who's not familiar with doing multiple AWS accounts, there's an AWS tool called Control Tower that lets you have one main AWS account and then create other AWS accounts that roll up into it. So you can have a single account where all your billing rolls up, but then also your user management. So you can have users and groups and then assign different permissions to those individual AWS accounts based on the needs of those users. So it's not literally you going to the AWS console and creating, you know, 75 individual AWS accounts. Although I'm sure it probably started that way.
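For reference, member accounts under an organization can also be created from code rather than the console; a minimal boto3 sketch with placeholder values:

```python
import boto3

org = boto3.client("organizations")

# Kick off creation of a new member account under the management account.
# Creation is asynchronous, so poll the returned request ID for status.
response = org.create_account(
    Email="dev-team-sandbox@example.com",   # placeholder email
    AccountName="dev-team-sandbox",         # placeholder account name
)
status_id = response["CreateAccountStatus"]["Id"]
print(org.describe_create_account_status(CreateAccountRequestId=status_id))
```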
Erez_Berkner:
After that, that was it.
Jonathan_Hall:
I'm curious to talk about a tangential topic, and it's just the name serverless. I mean, I think we all know it's a marketing buzzword. It doesn't actually mean there aren't servers involved. What name would you choose if you were going to rename this today? What would you call this thing?
Erez_Berkner:
That's a very interesting question.
Jonathan_Hall:
Or maybe you like the name and you wouldn't change it. I don't know.
Erez_Berkner:
No, you know, I don't really like the name because you're absolutely right. One, there are servers behind the scenes, always. Number two, it's kind of like, you know, it's very hard to define something by saying this is not X.
Jonathan_Hall:
No sequel
Erez_Berkner:
No.
Jonathan_Hall:
all over again, right?
Will_Button:
Hehehehe
Erez_Berkner:
It's like introducing me as, I'm not Will, right? I'm Erez, right? So... I think, you know...
Jonathan_Hall:
Although you have to say, you have to admit it worked for wireless. You know, when radios were invented and wireless internet came out, those seemed to work pretty well.
Will_Button:
Yeah
Erez_Berkner:
Right, right, right, which is nice.
Jonathan_Hall:
Anyway, I'm sorry, I'm just, I'm taking this off track.
Erez_Berkner:
No, no, no, that's actually interesting. But I think on the naming of serverless, and I'm not a marketing genius, but this is why I'm sometimes calling this, you know, managed services, because I think it's not the fact that you are not running your servers, it's the fact that you are not managing or maintaining your servers, it's somebody else's work. So I don't know if that's the name I would choose as a marketer at AWS, but the concept of having a managed service fits more than no server.
Jonathan_Hall:
Yeah. I don't know what name I would choose either. You use a broader definition than I think a lot of people do. Because I usually think of serverless as functions as a service. But if you want to include database as a service, then function as a service is far too narrow. I do like the function as a service better than serverless. But even that isn't quite perfectly right. It's like I'm not paying for functions. I'm paying for execution threads or something. So maybe it's threads as a service or something like that. I don't know.
Erez_Berkner:
Yeah, or execution of the service
Jonathan_Hall:
Yeah.
Erez_Berkner:
or, you know, yeah.
Jonathan_Hall:
I don't think we're going to change the name of this industry anytime soon, so this is mostly just a beer time sort of conversation. It's not really going to be useful.
Erez_Berkner:
So, you know, I think that actually, one is, AWS is really trying to change just what you said, about the fact that many people identify serverless with functions as a service.
Jonathan_Hall:
Mm-hmm.
Erez_Berkner:
Serverless is Lambda.
Jonathan_Hall:
Right.
Erez_Berkner:
And AWS is really trying to fight that, and spending a lot of, you know, a lot of content and thought on trying to educate and saying, hey, this is more, and, you know, we can define more things: DynamoDB is serverless and Aurora is serverless and others, and EventBridge and SNS, and that's good. So this is one thing that AWS is trying to do. I think most people still think, and still know, that serverless is Lambdas, and I think part of that is why they started talking about event-driven architectures. Because instead of trying to change how people think about a specific term, let's define a new term, EDA. And now EDA is event-driven, and yeah, DynamoDB Streams is event-driven and Kinesis streams are event-driven and SNS and SQS are event-driven. Serverless or not, that's a different discussion. So event-driven, let's talk about event-driven, and now I will do troubleshooting and debugging and distributed tracing for event-driven architectures. So I think maybe that's an interesting direction that AWS is taking.
Jonathan_Hall:
Cool.
Will_Button:
So you mentioned in here one of the things that I've seen very, very few people implement, although I'm a huge fan of this approach when it comes to loading secrets. For decades, we've loaded secrets as environment variables, but with things like Parameter Store in AWS or a secrets manager, they all have APIs. And so this is one of the few approaches I've seen. I've used it myself a couple of times, and I really, really like the approach. So whenever your application cold starts, it actually goes out to the secret vault or store or whatever you have and pulls in the secrets that it's authorized to read and loads those within the application runtime itself. So those are never exposed as environment variables. What shifted you all to using this approach?
Erez_Berkner:
Sorry, what was the question?
Will_Button:
What shifted y'all to using that approach versus the age old approach?
Erez_Berkner:
Yeah, so, you know, I think this comes with the fact that this is also a mindset thing, because you need to remember that, you know, an environment variable in AWS is open, and anyone that can log in to the console and the Lambda, or from the command line as well, can see the environment variable. So once you understand that, it's very clear that you cannot use that for secrets. And then the right thing is to find another mechanism. And this is where having a parameter store or a shared place for secrets, or a vault, really makes sense, even regardless of AWS; it is just like a best practice. And I think the only change is that in a serverless environment, especially Lambda and event-driven, it's a must. It's no longer just a good design. It's something you have to do in order to keep your secrets.
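A minimal sketch of that pattern, assuming boto3 and a hypothetical Parameter Store name: the secret is fetched once per container at cold start and held in memory, never exposed as an environment variable.

```python
import boto3

ssm = boto3.client("ssm")

# Fetched once per container, at cold start (module import time), so warm
# invocations reuse the in-memory value and nothing lands in an env var.
_DB_PASSWORD = ssm.get_parameter(
    Name="/myapp/prod/db-password",   # assumed parameter name
    WithDecryption=True,
)["Parameter"]["Value"]

def handler(event, context):
    # Use _DB_PASSWORD here to open the database connection.
    return {"status": "ok"}
```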
Will_Button:
For sure. One other thing that I noticed that you put in there that I hadn't seen done before is when you load those secrets in, you also have an expiry timer on there. So periodically you'll go refresh those credentials from within the app, which makes it easier to rotate those credentials, I'm assuming.
Erez_Berkner:
Exactly. So if you have enough of a grace period, assuming you have one, then without any need to manually invalidate anything, you just plug in a new secret and you know that everything will be updated with the new one in X minutes. That's opposed to, okay, now we need to take the entire system down in order to rotate those.
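And the expiry idea could look roughly like this in the same sketch, with the refresh interval as an assumption you'd tune to your rotation grace period:

```python
import time
import boto3

ssm = boto3.client("ssm")

_CACHE = {"value": None, "fetched_at": 0.0}
_TTL_SECONDS = 300  # assumed refresh interval; tune to your rotation grace period

def get_db_password():
    # Re-fetch the secret once the cached copy is older than the TTL, so a
    # rotated credential is picked up within _TTL_SECONDS without a redeploy.
    now = time.time()
    if _CACHE["value"] is None or now - _CACHE["fetched_at"] > _TTL_SECONDS:
        _CACHE["value"] = ssm.get_parameter(
            Name="/myapp/prod/db-password",   # assumed parameter name
            WithDecryption=True,
        )["Parameter"]["Value"]
        _CACHE["fetched_at"] = now
    return _CACHE["value"]
```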
Will_Button:
Yeah, for sure. Do you have a default lifetime for your credentials, like maximum of 30 days or 30 hours or something like that that you gravitate towards?
Erez_Berkner:
I must say that this is no longer something that I know. I can tell you what I think is reasonable, but I don't wanna just say something where my CTO would come up to me and say, what was that rubbish you talked about with
Will_Button:
Hahaha!
Erez_Berkner:
Will? It's not five minutes, it's 30 days.
Will_Button:
Fair enough. Fair enough. So when it comes to applying least privilege to different resources, that's something I've always struggled with. I think that's one of the big deficiencies of AWS: okay, I wanna apply the least amount of privilege to read records from a DynamoDB table, for example. I think there's a definite need for more documentation from AWS to say, hey, here are the IAM permissions that you need, and here's what that permission will give you. Because so many of them, I've looked at them, and it's like, permission X, and you look at the definition for it, and the definition is that it grants you the ability to do X. I'm like, well, I kind of figured that part out on my own, I'm kind of wondering what the heck X is.
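For the DynamoDB example, a scoped-down read-only policy might look roughly like this; the actions and table ARN are illustrative, so check the AWS action reference for your exact needs:

```python
import json

# Illustrative least-privilege policy: read-only access to a single table.
# The actions and ARN are examples, not an exhaustive or authoritative list.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:BatchGetItem",
                "dynamodb:Query",
            ],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        }
    ],
}
print(json.dumps(read_only_policy, indent=2))
```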
Jonathan_Hall:
And then,
Erez_Berkner:
Yeah.
Jonathan_Hall:
I mean, if you're working with somebody who's managing these permissions and trying to be conscientious about it, you end up with a problem like I had at my last contract, where every day or week you're asking for more permissions because some weird thing in the UI says, you don't have permissions for this thing. And you may not even know if that's the thing you want to be doing.
Will_Button:
Right? Yeah.
Jonathan_Hall:
Like I'm trying to debug why this service won't restart. And I'm looking through this and oh, here's an error. I need this. So would you give me that? And... Oh yeah, okay, so then 20 minutes later, I have the permission, oh, that just gives me a link to the Docker image. I don't need that. I need whatever. So it's just, as both a user and as an administrator, it can be very frustrating.
Erez_Berkner:
Yeah, yeah, yeah, for sure. In those cases you also don't really know how to explain why you need that, right? Because, yeah, it's a "the computer told me so" kind of
Will_Button:
Right?
Erez_Berkner:
thing.
Will_Button:
Here's the permission denied message. Clearly, I need this permission.
Erez_Berkner:
Yeah, so I think that this is always a challenge, and it works in principle, right? You know, the least privilege principle really makes sense when you plan your system on the whiteboard, but I think you probably know better than I do that this really gets messy in real life. And getting all of the permissions, and there are such granular permissions today. So if you go too granular, you become like the annoying guy that keeps opening tickets
Will_Button:
Yeah.
Erez_Berkner:
for everything. And if you go too wide, then you're not secure, because everything is wide open. So I think it really is a challenge. And, you know, this is where theory meets practice and it's hard. There are some very interesting startups today that are trying to address, you know, permissions, on-demand automated permissions, without the need to open a request and approve it one by one by a human being. But this is still, I think, an unresolved territory, an unresolved
Jonathan_Hall:
Yeah,
Erez_Berkner:
problem.
Jonathan_Hall:
I agree.
Will_Button:
Yeah, so meanwhile, we're back to trial and error of adding permissions one at a time, waiting for it to blow up. Go back.
Erez_Berkner:
Yeah.
Jonathan_Hall:
Or you could just hire an AWS
Erez_Berkner:
Yeah.
Jonathan_Hall:
consultant who knows all about this because they've already done the trial and error.
Will_Button:
Yeah.
Erez_Berkner:
If we just knew someone.
Will_Button:
Yeah, right. So one of the big challenges, specifically with Lambda functions, is the concept of cold starts. What advice have you learned over the past few years for making that approachable?
Erez_Berkner:
Yeah, so cold start, just to share for those who don't know what a cold start is, this is what happens when a Lambda wasn't invoked for some time. So when a Lambda is not invoked for some time, or even when it is invoked, every once in a while AWS decides to refresh the container underneath, and basically kills the container and spawns a new one. And that means that it will take time for the next invocation to load everything to memory and get through all the constructors and all the basic initialization. So this will manifest as some single invocation hiccuping. If you're usually doing like 100 milliseconds, that's your average, all of a sudden for one invocation you will have a hiccup, let's say every hour, of one second. And it can be three seconds, it can be half a second, it can be more common or not, and this is especially problematic for systems that have spikes. Because if you are just on a steady rate and then all of a sudden I need, in this specific second, 1,000 more executions in parallel, AWS needs to spawn 1,000 more containers to serve those Lambdas that are now happening, and they weren't, like, hot containers. There was nothing over there. And then if you're spiking like this, and then you go to sleep, and after an hour you spike again, this can be a real, real problem because it creates big latencies every hour. So that's kind of like the problem of cold starts, and AWS is constantly working to improve and improve and improve that, but it's still a thing. And I think what I learned is that, first and foremost, it goes back to visibility: understand, do you have a latency problem? And if you have a latency problem, is that because of cold starts? And these are questions that are not trivial to answer in a distributed environment. But there are ways, and we have some articles, Yan wrote some articles about how you can build metrics to identify latencies and cold starts. And we bake those in, as an example, into the Lumigo platform, because this is a recurring challenge for the customers. So instead of you building all of this and analyzing this, it's kind of one of the things that you get out of the box. And this has been very, very useful. The reason I'm sharing this is because we're seeing people that are actually taking a lot of action based on identifying cold starts, identifying latencies. So my experience would be, number one, make sure that you have this in place, either with a tool or in CloudWatch; you can build that metric in CloudWatch through the logs. And the second is, once you have it in place, if this is a real problem for you, there are many ways that you can improve the situation of cold starts. Just to give two examples, but again, there are a lot of blogs and articles around that. One would be to reduce the amount of initialization that happens when the container spawns up. So save it to a later stage and then do it incrementally, not all at once. And the second would be keeping the container warm. So if you know that you're gonna have a spike, if you can, you know, one minute before that, start warming up the Lambdas, that would be great, because then you won't hit that at the critical time.
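A minimal sketch of the first mitigation (deferring initialization), assuming a Python Lambda and a hypothetical DynamoDB table: heavy clients are created on first use instead of at import time, so a cold start only pays for what the request actually needs. The second mitigation, keeping containers warm, is commonly done with a scheduled ping or provisioned concurrency.

```python
import boto3

_table = None  # created lazily, not at import/cold-start time

def _get_table():
    global _table
    if _table is None:
        # Heavy initialization happens on first use and is then reused by
        # every warm invocation of this container.
        _table = boto3.resource("dynamodb").Table("orders")  # assumed table name
    return _table

def handler(event, context):
    item = _get_table().get_item(Key={"id": event["id"]}).get("Item")
    return {"found": item is not None}
```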
Will_Button:
Right on. One thing I wanted to ask that... that I've noticed here. So whenever we're building these things out and using a lot of managed services, like AWS has gone out of their way to make it really easy to use the console to point and click your way to success. But just on the whole, I really try to avoid that and use infrastructure tools, whether that's Terraform or CloudFormation or something like that, just so that there's like formal written documentation of how the infrastructure is built and the changes that happen to it over time. And I've been wondering lately, am I just like fighting an uphill battle on this or should I just give in and embrace the console? What are your thoughts on that?
Erez_Berkner:
Winners don't use drugs, developers don't use console.
Will_Button:
Yeah
Erez_Berkner:
I'm kidding, I'm kidding. But honestly, I love the console. Well, maybe that's exaggerating. Nobody loves the console, but I use the console. But when I see the developer team, I don't think many of them are using the console today. Because, you know, you want to get things via code, you want to get things via API, you want to get scripts running. You don't want to do things manually. I'm not developing today, so it's easier for me. I just want to check something, so I go in and just view.
Will_Button:
Mm-hmm.
Erez_Berkner:
I think for viewing, that's great, but if you're really developing, you'll find, number one, the console is much more limited. As always, support for an API is there, but it always takes time until it gets to the UI, if it gets to the UI at all. And then you really don't want to do changes that you cannot repeat, that you cannot run via script. And so this is another reason why I think the command line or APIs, not the console, are more popular when you talk about building significant systems or serious systems. And that would also be my recommendation. Sometimes it's easier if you just wanna do a quick something, like a playground or a hobby, or you just want to view something, that's easy. But I would recommend on the day-to-day to use, you know, an API, a CLI, the command line, or scripts. It saves a lot of time, and it also saves a lot of mistakes that happen just because somebody went into the console and clicked on something and it was unsupervised.
Will_Button:
Yeah, for sure. Because I'm sure somewhere in AWS there's a log that would tell you that happened, but I'm not sure where you would go to find it and how long it would take you to get to that point.
Erez_Berkner:
Yeah, you can go to CloudTrail to see some of the audits. It's not everything, but I think it's decent. But at the same time, this is good mostly for auditing and trying to understand. But on your day-to-day, when you go to an environment, you wanna be able to see in your logs and in your system what happened, not just through, you know, an auditing tool of AWS. I think it's much harder.
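For what it's worth, that CloudTrail trail can also be queried from code; a minimal boto3 sketch (the event name is just an example) that looks up recent occurrences of a specific API call:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Look up recent management events for one API call; "DeleteTable" is just
# an illustrative event name.
events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "DeleteTable"}
    ],
    MaxResults=10,
)
for e in events["Events"]:
    print(e["EventTime"], e.get("Username"), e["EventName"])
```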
Will_Button:
Yeah, yeah versus a pull request.
Erez_Berkner:
for example.
Will_Button:
Cool. What's your tool of choice for doing the infrastructure management?
Erez_Berkner:
So we actually have like a homegrown tool over here. It best fits our needs today, and it really developed over the last four or five years. I'm not sure, and I'm not really monitoring that space, but it might just be that today the right thing is to take one of the cool startups that manage the infrastructure instead of building your own thing. But for us it's already working pretty well, so we're good with that.
Will_Button:
Yeah, for sure. It's hard to... Like once you have all the bugs ironed out, it's hard to justify switching over to something else.
Erez_Berkner:
right.
Will_Button:
Jonathan, what about you, what's your favorite tool?
Jonathan_Hall:
You know, I don't really have one. I've used a few different ones, but none of them extensively enough to say that I have a favorite.
Will_Button:
Gotcha.
Erez_Berkner:
What about you, Will? You have anything that is your favorite?
Will_Button:
I'm a big fan of Terraform because they, well...
Jonathan_Hall:
What about Pulumi? Weren't you talking about that for a while, or does that not help here too?
Will_Button:
Yeah, I was going to follow up with Pulumi.
Jonathan_Hall:
Okay. Alright.
Will_Button:
Terraform does a great job of keeping up with the pace of AWS and GCP. And so anything that you want to do, Terraform generally has support for it. And then Pulumi is right up there with them. I think the difference between the two is Terraform is very declarative, which I think lends itself to people who have a sysadmin background, whereas Pulumi is very programmatic. So if you have developers that you're trying to get interested in managing their infrastructure as code, with Pulumi you pull in SDKs and libraries and it's a familiar interface for them, because they're actually writing code to build their infrastructure. And so I think you get that buy-in and adoption from the developer team a lot quicker with Pulumi instead of the declarative style approach of Terraform, but both of those I'm a pretty big fan of.
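To illustrate the programmatic style, a tiny Pulumi program in Python might look like this (resource names are arbitrary, and this is a sketch rather than a full project):

```python
import pulumi
import pulumi_aws as aws

# Infrastructure expressed as ordinary Python: loops, functions and libraries
# all work, which is part of what appeals to application developers.
bucket = aws.s3.Bucket("artifact-bucket")   # arbitrary resource name

for env in ["dev", "staging"]:
    aws.ssm.Parameter(
        f"{env}-feature-flag",
        name=f"/{env}/feature-flag",
        type="String",
        value="off",
    )

pulumi.export("bucket_name", bucket.id)
```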
Erez_Berkner:
Yeah, and in serverless environments, by the way, there is also like a framework called serverless framework, which is doing pretty decent work when it comes to managing assets in a serverless environment.
Will_Button:
Yeah, they keep popping up and it looks like they're making tons of progress as well to make it easy to build, deploy, and manage serverless applications.
Erez_Berkner:
Yeah.
Will_Button:
And their documentation, they've got a lot of great documentation too. So I think that's a really strong approach there is if you build great documentation and tutorials to educate your potential users, then they're just going to be, they're going to be big fans for however long they can be. Right on. Anything else we should talk about?
Erez_Berkner:
No, I think we covered the entire article.
Will_Button:
Jonathan, got anything else?
Jonathan_Hall:
No? I don't have enough wisdom about serverless to share anything, sorry.
Will_Button:
How's the, uh... How's the sleep schedule with the new baby? Shhh.
Jonathan_Hall:
Not great. I spent four hours in bed this afternoon. My son and my two-year-old and I have been sick. I don't
Will_Button:
Oh no.
Jonathan_Hall:
know if you've been noticing but my eye is twitching a little bit. Still can't quite breathe very well.
Will_Button:
Hahahaha
Jonathan_Hall:
It's not the drugs, I swear.
Erez_Berkner:
Hehehehe
Will_Button:
Right? It was a podcast so I wasn't going to say anything to draw attention
Jonathan_Hall:
Ha ha ha ha!
Will_Button:
to it.
Erez_Berkner:
Hahaha!
Jonathan_Hall:
You didn't say anything about developers can't do drugs, right?
Erez_Berkner:
I...
Will_Button:
Right? Yeah. They said winners don't do drugs, didn't they? Ha ha ha.
Jonathan_Hall:
Right.
Erez_Berkner:
Only winners, only winners.
Will_Button:
Cool, well let's move on to some picks. Jonathan, got any picks for us this week?
Jonathan_Hall:
I have a couple.
Will_Button:
Alright.
Jonathan_Hall:
The first is a book. I can't remember if I picked it or not, because I finished it a few weeks ago, but it's good enough to pick twice if I did already. It's not really a new book, but I read it for the first time. And it came out, so I wrote at the end of the year, I wrote my top five reads for the year, as I did last year also, and this was on that list. So it's Good Strategy, Bad Strategy by Richard Rumelt. I don't know if I pronounced the last name correctly. Sorry, Richard, if I didn't. Yeah, it's, I mean, I don't know, I felt it was gonna be kind of dry for some reason. I don't know why I thought that. I mean, people recommended it. This is a great book. I thought, yeah, Strategy. It sounds academic and blah, blah, blah. But actually it's a really, really good book, entertaining book. And it really kind of helps you understand why you hear so much corporate BS claiming to be Strategy that isn't Strategy at all. So if you're tired of, if the reason you don't want to read a book about strategy is because you're tired of hearing your boss talk about strategy and you know that he's full of crap, you will love this book.
Will_Button:
Hahaha!
Jonathan_Hall:
It will help you, it will first explain why that's bad strategy and actually why it's not even strategy at all. And then help you learn to identify, so you can actually call BS when that sort of stuff comes up. And then at the end, he helps you come up with some ways to identify good strategies. So if you are in a position, or want to be in your career, where you're helping to set strategy, whether it's technical strategy or business strategy or whatever, this book can really help you with that. So anybody who wants to go into management, CTO work, anything like that at all, this is, I would call it, a must read for anyone trying to go into that sort of career line. And it's a good read for anybody at all, regardless.
Will_Button:
What was the title of that again?
Jonathan_Hall:
So, good strategy, bad strategy... Yeah, sorry?
Will_Button:
I was just asking what the title was again.
Jonathan_Hall:
Yeah, Good Strategy/Bad Strategy: The Difference and Why It Matters, by Richard Rumelt.
Will_Button:
Right on.
Jonathan_Hall:
So that's my first pick. My second pick is going to be categorized under shameless self-plugs, if I could actually speak today. And that is my brand new podcast that's coming out.
Will_Button:
Ooh.
Jonathan_Hall:
As we record this, it's coming out next week. It was going to come out today, but since I was napping all day, it didn't happen. So
Will_Button:
Yeah
Jonathan_Hall:
the first episode should come out next week. By the time you hear this episode published, it will have already been out for a week or two. But the podcast is called Cup o' Go, as in like a cup of Joe, but it's Go instead of Joe. So
Will_Button:
Oh, clever.
Jonathan_Hall:
yeah, clever, right?
Erez_Berkner:
Mm-hmm.
Jonathan_Hall:
Yeah.
Erez_Berkner:
All right.
Jonathan_Hall:
And now you have to watch it or listen to it just because it's such a clever name. The premise, my promise to the listener is to help you keep in touch with the Go community in 15 minutes a week. So it's gonna be Go news mostly, in the first 15 minutes of each episode roughly, will be news related topics for the last week in the Go community. What versions were released, what libraries had security patches, et cetera. That sort of stuff, you know, things that are interesting. What decisions are they gonna make this week that you might care about if you're interested in that. So if you don't wanna spend all the time it takes to read all the Go announcements and blah, blah, blah, blah, and follow all the Go Slack channels. You just want the highlights in 15 minutes a week. This is the podcast for you. And then after that, we'll do some longer form, more casual discussion in chat. So if you want the quick sound bites, listen to the first 15 minutes and then you can go off and do your own thing. If you want to hear the whole thing, probably be a 30, 45 minute episode. So
Will_Button:
Right on.
Jonathan_Hall:
Check it out. Cupogo.dev if you want to go subscribe. It's already on Apple Podcasts and everywhere else. And by the time you're listening to this, you should have two or three episodes there you could go listen to.
Will_Button:
Nice.
Erez_Berkner:
Good luck with that.
Jonathan_Hall:
Thank you.
Will_Button:
All right, what do you got for picks?
Erez_Berkner:
Ehhmmm... Yeah, so I would recommend... I think that the... that comes to mind and it can be in any category, right?
Will_Button:
any category at all. Anything goes.
Erez_Berkner:
Yeah. Yeah. So I happened to watch a very old movie, relatively old movie, recently, that I consider one of the best movies ever created. You guys want to guess?
Jonathan_Hall:
Is it the Christmas movie with William Shatner?
Erez_Berkner:
How did you know?
Will_Button:
Ha ha ha!
Erez_Berkner:
Will, what is your guess?
Will_Button:
Um. Oh, I'm gonna go with the other top Christmas movie, Die Hard.
Erez_Berkner:
Oh, nice, nice.
Will_Button:
Hahaha
Erez_Berkner:
It's not bad, but the one I like is older. So I actually saw The Godfather.
Will_Button:
Oh yeah.
Erez_Berkner:
which I really like, you know, it's so, everything is so slower than what people do today, but much more profound and such. Oh yeah, amazing. Yeah, Jonathan just showed us like the book, The
Jonathan_Hall:
on my shelf.
Erez_Berkner:
Godfather on the shelf, yeah. So yeah, if anyone didn't watch it yet, I highly recommend it. You know, Marlon Brando, you know, doing amazing acting over there. And I really like the subtleties, like you really feel that they spent the time on every minute in that long movie, and they really thought about the nuances to build the right story, the right feeling. So I really, you know, I really love that movie.
Will_Button:
Excellent.
Jonathan_Hall:
It is a good movie.
Will_Button:
Yeah, good call. All right, for my pick this week: for Christmas, I got a Theragun massager that's just been super cool. And this one's actually pretty small. I'm holding it up to the camera here, but since this is a podcast, you don't know that. But it's just a little bit larger than my hand, and the battery life on it is really good. And it's great for just, like, digging into those sore muscles. And especially if you've been sitting at the keyboard all day and your back's all tight and sore, it just works everything out. And yeah, I've seen them around for years. My wife bought one for Christmas and now I'm like, wow, so that's why everyone likes those.
Jonathan_Hall:
Cool.
Will_Button:
So that's my pick. And Erez, if people want to interact with you, hang out with you on social media, anything like that, where's the best place for them to head?
Erez_Berkner:
Yeah, I think Twitter, direct message on Twitter, or LinkedIn, or even through Lumigo, you know, it's free and easy to sign up. We also have a chat through the platform that I many times answer on my own. So really, any channel that we have, from Twitter to LinkedIn to the Intercom platform.
Will_Button:
All right, cool. Well, thank you so much for joining us today. This has been a great conversation and look forward to seeing you on here again.
Erez_Berkner:
Thank you very much for having me. It was great fun.
Will_Button:
Right on. Cool. See you, everyone.
Jonathan_Hall:
Cheers.
Erez_Berkner:
Thank you. Bye bye.