Where Do Bugs Come From? - DevOps 136

In this episode, Jonathan and Will don their exterminator suits and talk about zombies, podcast automation, and where bugs come from.

Hosted by:

Jonathan Hall

Will Button

Show Notes

- How design and architecture choices can lead to bugs
- Siloed teams solving similar problems with different implementations
- Context of code in peer-review
- Types of testing that can reduce and catch bugs
- Deadlines that force shipping code despite readiness
- Leaving breadcrumbs for future-you

Picks

Jonathan- Transistor.fm
Will- DevOps For Developers
Will- Account Factory for Terraform

Transcript

Will_Button:

Welcome everyone to another exciting episode of Adventures in DevOps. And I don't know why I clapped when I said that. Maybe I'm just that excited today. But hey, joining me in the studio, my co-host, Jonathan Hall.

Jonathan_Hall:

Hey everyone, how's it going?

Will_Button:

It's going well for me.

Jonathan_Hall:

Awesome.

Will_Button:

I think I feel like I'm in a brain fog. By the way,

Jonathan_Hall:

Oh.

Will_Button:

I'm Will Button. I'm the other co-host for the day. I had to Google my name to see what I was supposed to say for that part of the episode. That's how my day is going.

Jonathan_Hall:

Did you find yourself or did you find an imposter?

Will_Button:

You know, it's really hard for me to find, like when you put my name into Google, because my first name is a verb, and my second, my last name is a noun. I don't really turn up that often.

Jonathan_Hall:

Oh, okay.

Will_Button:

Which I don't think is a bad thing. If you're Googling me, yeah, I don't want to know why. If you're trying to get in touch with me and we don't already have a relationship, I probably don't want to talk to you. Oh! Ha ha ha!

Jonathan_Hall:

Well, if I Google myself in incognito mode, I see a whole bunch of people I don't recognize. And then the first text hit is me, my LinkedIn profile. The second one is some guy called the British Chef. There's someone on IMDB. There's a whole bunch of Jonathan Halls out there. So anyway,

Will_Button:

Wow.

Jonathan_Hall:

at least I'm in the top three. Ha ha ha.

Will_Button:

Yeah, and it sounds like they all have pretty cool careers too. I mean, if they're in IMDB, if they're world-famous chefs.

Jonathan_Hall:

Yeah, there's me, there's this chef, then there's me on my personal website, and then there's someone in IMDB who's known for The Walking Dead, so that's cool.

Will_Button:

Oh wow.

Jonathan_Hall:

Then the next one is a full professor of pharmaceutical chemistry in Zurich. So yeah, sounds like smart people and cool people. I guess I have a good namesake.

Will_Button:

Yeah, you guys should all hook up and then trade jobs for a day.

Jonathan_Hall:

Hmm. I don't think I want to do the professor of pharmaceutical chemistry, but I might be able to act in The Walking Dead. I think I could act like a zombie easily enough.

Will_Button:

Yeah, right. Just find out what his role is in there.

So,

Jonathan_Hall:

I want to tell you a story.

Will_Button:

okay, tell me a story.

Jonathan_Hall:

I'm working on some podcast automation.

So I use transistor.fm for my podcast. And I want to automate some of the episode publishing and stuff like that. And they have a cool API that lets you do most of that stuff. So the first thing I did a couple days ago, when I decided to start this project, was look for an SDK for this API. And there's a couple for Go, but they're really poorly written or unsupported. You know, they're unofficial, of course. So I thought, well, it's not a big API. There's about a dozen, or maybe two dozen at most, endpoints. So it's not a big thing to do. So I started writing my own sort of SDK around their API. And today I discovered that when you query for your episode statistics, they report to you a list of daily statistics, by default for two weeks, I think, and how many downloads you had per day. The funny thing is the date format is, I have to remember here, month, month, day, day, year, year, year, right?

Will_Button:

Okay.

Jonathan_Hall:

Not a great date formatting. I much prefer

Will_Button:

Yeah.

Jonathan_Hall:

some sort of standard, but whatever. As long as it's consistent, I don't care.

Will_Button:

Right.

Jonathan_Hall:

But it's not consistent because when you get to the end of the request and it has the start date and end date, then it's day, day, month, month, year, year, year. So.

Will_Button:

Hahaha!

Jonathan_Hall:

In the middle of the body, the month comes first, and then at the end of the request, the day comes first. Now that's kind of annoying, but whatever, I hacked around it. And then I moved on, and I got that endpoint working. Then I moved on to the next endpoint, which is to query not just your overall show statistics, but a breakdown for every episode of your show. So for each episode, you get an array of dates and download stats, right?

Will_Button:

Yeah.

Jonathan_Hall:

Well, on this one, everything comes in day, day, month, month, year, year, year format. So I would call this a bug.
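
A minimal sketch of why the mixed formats described here are more than an annoyance. It assumes hyphen-separated, four-digit-year dates; the episode does not give the exact separators, so the format strings and the helper names `parse_both` and `is_ambiguous` are illustrative only and are not part of any real SDK.

```python
from datetime import datetime

# Assumed format strings: the episode only says "month first" and "day first".
MONTH_FIRST = "%m-%d-%Y"
DAY_FIRST = "%d-%m-%Y"


def parse_both(raw):
    """Parse one string under both conventions.

    Returns a (month_first, day_first) tuple of dates; an entry is None
    when the string is not a valid date under that convention.
    """
    results = []
    for fmt in (MONTH_FIRST, DAY_FIRST):
        try:
            results.append(datetime.strptime(raw, fmt).date())
        except ValueError:
            results.append(None)
    return tuple(results)


def is_ambiguous(raw):
    """True when both conventions accept the string but disagree on the date."""
    month_first, day_first = parse_both(raw)
    return (
        month_first is not None
        and day_first is not None
        and month_first != day_first
    )
```

Under these assumptions, `"03-04-2023"` reads as March 4 in one convention and April 3 in the other, so a client cannot tell which was meant from the value alone, while `"25-12-2023"` is only valid day-first.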

Will_Button:

That seems like a reasonable term to apply to it, yeah.

Jonathan_Hall:

So this makes me wonder, where does a bug like this come from? How do people write code? I mean, I understand not choosing a great date format for whatever reason, maybe somebody just didn't think about it, but how do you get two different date formats in the same response? And then how do you get two responses that return ostensibly the same format, but at a different level of detail, and then different formats of dates there? This is weird. So... Let's make this a broader question. I don't think we'll ever figure out where this bug came from. But where do bugs come from? How do things like this happen? Where do bugs come from? What do you think?

Will_Button:

Well, generally for me, it gets assigned to me as a ticket. You know, things have been going well. DevOps team's not getting a lot of love or respect. So let's go plug some bugs into the system so that we can look like heroes, because otherwise, the code that I write is generally bug free.

Jonathan_Hall:

So bugs come from Jira in your case.

Will_Button:

Right? There's a little delay on my joke there.

Jonathan_Hall:

Got a live studio audience here, but they're kind of sleeping, so...

Will_Button:

Right? Yeah.

No, so that's a good question, just using the scenario you brought there. Because really, that seems like it was built in a very task-oriented fashion. And assuming that someone peer reviewed that, they looked at that without context of how things are being handled in the other parts of the application. You've mentioned in the past, specific to logging, that one of your pet peeves is implementing logging at every level instead of having a main logging function for your application and just letting it handle the logging for you automatically.

Jonathan_Hall:

Yeah, yeah, for sure.

Will_Button:

But even with this, assuming that these dates have got to be coming from a database, and databases are kind of picky about their date format. So it almost seems like someone had to look at that and say, no, no, no, no, no. That's ridiculous. We're not using this ISO 1986 format or whatever it is.

Jonathan_Hall:

Right? Yeah, so clearly somewhere along the line, someone made a choice to use an arbitrary date format. And this API ostensibly uses JSON API, but it doesn't follow the spec very closely. I don't know if there's a date format specified in JSON API. But these are dates. Of course, you can do timestamps. Those are defined in JSON API. And they're standard to JSON, I guess, too. But this is just the date. So there's no time value, right? So that

Will_Button:

Yeah.

Jonathan_Hall:

maybe makes sense to use their own. But somebody made the choice twice, or at least twice, probably more than twice, to use a particular format and they didn't make the same choice every time. I say they, I don't mean a specific person, but the people or the team involved in doing this.

Will_Button:

Yeah, so what's your theory on that? Like, how do we get to that path?

Jonathan_Hall:

In this case, I don't know. I mean, I think it's a safe guess that the parts of this response were built by different people. Someone built a container that included the start and end date, or maybe they added pagination at a later time and added those fields at that point or something like that. As for the exact same data format: the per-date object is the same across both requests, but the date format is different. You get an array of objects that has a date and a value. It says date and then September 12 or whatever. And then you have downloads and a number, an integer. So that particular object is duplicated across both requests. But it's clearly not the same implementation on the back end. Even though the API documentation says it's the same, the back end is completely different. So that tells me that at least three times this decision was made.

Will_Button:

Yeah.

Jonathan_Hall:

At least once for each implementation of this object. And then at least another time for the start and end date fields on one or more of the requests or responses. So yeah, whoever's doing this, there's a communication breakdown for sure.

Will_Button:

Yeah, I think that's one of the things I've noticed in my own experience doing reviews for pull requests. You know, whenever you look at it, if you're using GitHub, when you look at it in GitHub, it shows you the snippet that's changed. But there's so many times where there's context above or below that code that wasn't changed that's relevant to answering the question, is this a good chunk of code to approve or not? So I think that's probably one answer to your question, where do bugs come from: lack of context in peer reviewing changes.

Jonathan_Hall:

And that's of course assuming that peer review happens at all. I don't

Will_Button:

Yeah.

Jonathan_Hall:

know what kind of development workflow they use at transistor.fm. Yeah? Oh, there we

Will_Button:

I think I know the guy who built that. I might

Jonathan_Hall:

go.

Will_Button:

be able to put you in touch with him. Yeah.

Jonathan_Hall:

I did report this bug to their support. So hopefully it'll be fixed.

Will_Button:

Yeah.

Jonathan_Hall:

But if not, then yeah, I might hit you up on that.

Will_Button:

Hahaha!

Jonathan_Hall:

So maybe we can zoom out a little bit. What other kinds of bugs do you see frequently? And maybe to the extent that you know or can guess, where do they come from? Because I think this is a, I don't mean to turn it into like a checklist, if you know, here's where these bugs came from. But just generally speaking, where do bugs come from? And if we understand this, or maybe the better we understand it, the better we can address some of these causes of bugs.

Will_Button:

Yeah, I think it might be helpful to break it down into buckets, because I don't think there will be a definitive list that applies to everything. But I can think of regression bugs, like something that used to work and gets broken, and then there are bugs from implementing a new feature that has some bug in the feature. I think one part of that answer is good testing, although "good testing" is such an overloaded term. OK, I have unit tests, and then I have user acceptance tests, and I have synthetic tests. And at what point do you say, OK, enough testing, we just need to roll with it? Like personally, whenever I write code, I find myself writing at least three tests for everything that I'm going to test. There's the test case to make sure that it does what it's supposed to do when supplied with the right data. There's a test to make sure that it throws the right error if it's supplied with incorrect data. And then there's a test to make sure that it does something, whether that's throw an error or log or whatever, when it has incomplete or missing data or an exception.
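
The three-tests-per-unit habit described here can be sketched as follows. The function `parse_downloads` and the record shape are hypothetical, invented only to have something to test; the point is the three cases: correct data, incorrect data, and missing data.

```python
def parse_downloads(record):
    """Hypothetical unit under test: return the download count from a stats record."""
    if "downloads" not in record:
        raise KeyError("record is missing 'downloads'")
    value = record["downloads"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(f"invalid download count: {value!r}")
    return value


def test_valid_data():
    # Case 1: does what it should when supplied with the right data.
    assert parse_downloads({"date": "2023-09-12", "downloads": 42}) == 42


def test_incorrect_data():
    # Case 2: raises the right error when supplied with incorrect data.
    try:
        parse_downloads({"date": "2023-09-12", "downloads": "forty-two"})
    except ValueError:
        return
    raise AssertionError("expected ValueError for a non-integer count")


def test_missing_data():
    # Case 3: does something well-defined when data is missing.
    try:
        parse_downloads({"date": "2023-09-12"})
    except KeyError:
        return
    raise AssertionError("expected KeyError for a missing count")
```

The tests are written as plain functions so they run with or without a test runner. None of them would notice two endpoints formatting the same date differently, which is the limitation raised in the conversation that follows.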

Jonathan_Hall:

Mm-hmm. Mm-hmm.

Will_Button:

But I still don't think that would catch this specific bug that you're talking about because we could write tests for all of those.

Jonathan_Hall:

Yeah.

Will_Button:

and it would still work.

Jonathan_Hall:

Someone kind of has to notice this one, right? It's a mismatch. It's not that it's functionally broken per se. It's that it doesn't make sense, right? I think it kind of requires some human understanding to catch this one.

Will_Button:

Yeah, maybe that's another part of the answer: you need an agreed-upon set of rules for how we do business. You see this a lot in, I'm drawing a complete blank on the name, but there's like for JavaScript, there's the style guides. That's the word I'm looking for. You know, so you can have style guides that might catch part of this, but only if the data is correctly typed so that it knew what type of data it was looking at.

Jonathan_Hall:

Right.

Will_Button:

But then there's like a higher level style guide of how we architect things, like this is our date format. These are how we build our, these are the naming conventions we use for API endpoints. These are how we name our services, different things like that. So it's almost like a, I don't know that style guide would be the right term for that, but like an architecture guide maybe.
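
One way a rule like "this is our date format" could be made checkable rather than left to reviewers is sketched below. It assumes the team has agreed on ISO 8601 calendar dates (`YYYY-MM-DD`) and that date fields can be recognized by having "date" in the key name; both are assumptions for illustration, and `find_nonconforming_dates` is a hypothetical helper, not an existing tool.

```python
import re

# Assumed house rule: every date field is an ISO 8601 calendar date.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_nonconforming_dates(payload, path=""):
    """Walk a decoded JSON payload and list date fields that break the rule.

    Returns a list of (path, value) pairs. A field counts as a date field
    when its key contains "date" and its value is a string.
    """
    problems = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else key
            if isinstance(value, str) and "date" in key.lower():
                if not ISO_DATE.fullmatch(value):
                    problems.append((child, value))
            else:
                problems.extend(find_nonconforming_dates(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            problems.extend(find_nonconforming_dates(item, f"{path}[{index}]"))
    return problems
```

Run against a sample response from each endpoint in a test suite, a check like this would flag the mismatch discussed earlier, but only because someone first wrote the convention down.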

Jonathan_Hall:

So I think, yeah, I think architecture is along the right path. And I think that actually probably gets to kind of the core of the problem in this case, which is that the architecture was not defined well enough to make it easy for this to be done correctly. The simple example would be the fact that this statistics object, which contains the date and the downloads number, looks the same on both endpoints but has two different implementations. That would be an architectural smell, I think.

Will_Button:

Yeah.

Jonathan_Hall:

So there's some code duplication happening there and that introduced a bug. That's one of the main reasons to DRY your code, right? So that didn't happen here.
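
A minimal sketch of the DRY alternative, assuming a hypothetical back end with a show-level and an episode-level statistics response. The function names and response shapes are invented for illustration and say nothing about how the real service is built. The idea is that the date format is decided in exactly one place and the shared statistics object is built by exactly one function.

```python
from datetime import date


def format_api_date(d):
    """The single place that decides how dates appear in responses."""
    return d.isoformat()


def stat_entry(day, downloads):
    """Build the shared {date, downloads} object used by every endpoint."""
    return {"date": format_api_date(day), "downloads": downloads}


def show_stats(rows):
    """Hypothetical show-level response; rows is a non-empty list of (date, downloads)."""
    return {
        "downloads": [stat_entry(day, count) for day, count in rows],
        "start_date": format_api_date(rows[0][0]),
        "end_date": format_api_date(rows[-1][0]),
    }


def episode_stats(episodes):
    """Hypothetical per-episode response; episodes maps an episode id to rows."""
    return {
        "episodes": [
            {"id": episode_id, "downloads": [stat_entry(d, n) for d, n in rows]}
            for episode_id, rows in episodes.items()
        ]
    }
```

With this shape, changing the date format means editing `format_api_date` once, and the two responses cannot drift apart.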

Will_Button:

Yeah, and I think that's another, when you talk about microservices, that's another pitfall there that doesn't get discussed very often. This could very well be two separate microservices handling each of these endpoints. And so

Jonathan_Hall:

Could be.

Will_Button:

not only is the code duplicated, but it has different functionality.

Jonathan_Hall:

Yeah. Yeah, for all we know, they're written in different languages, too, right?

Will_Button:

Right? Yeah.

Jonathan_Hall:

One could be in Ruby and the other in Node.js, or who knows what. So yeah. So where does that put us? We've talked about regression bugs, which tests guard against, and architectural smells. There's just so many kinds of bugs.

Will_Button:

right?

Jonathan_Hall:

Trying to create a taxonomy for all bugs probably isn't a very useful use of time, but...

Will_Button:

It's going to look like the animal kingdom taxonomy.

Jonathan_Hall:

the

Will_Button:

Here we have the early prehistoric bugs.

Jonathan_Hall:

Well bugs do tend to evolve, right? Some of them.

Will_Button:

Yeah, for sure.

Jonathan_Hall:

So I did make a list of some things that can cause bugs. And maybe we could start with this. Or maybe we've already started. But maybe we can continue with this and add to it or delete some things off the list if I'm wrong. So I think an obvious place where bugs come from is an inexperienced developer. And that happens at so many different levels, from just not realizing that they're using the wrong syntax or the wrong thing, or an off-by-one error or whatever. There's so many mistakes that anybody can make, and you make more of them when you're inexperienced. And then at the higher level, inexperienced developers maybe don't know how to write DRY code or good architecture, and so they end up with other layers of bugs as well. One I mentioned that I think is appropriate for this channel, for a DevOps channel, is the idea of developers not writing their own tests. I think the Accelerate book makes a pretty strong case that developers need to be writing their own tests. If they aren't writing their own tests, business outcomes suffer. If I recall, I don't remember the quote exactly, but I think the claim was that in companies where developers are not primarily writing their own tests, there's no improvement in business outcomes versus no tests at all.

Will_Button:

Yeah, I can see that being the case. I've not encountered that in a long, long time, though. Have you? Where developers aren't writing their tests?

Jonathan_Hall:

You lucky guy.

Will_Button:

Really? Just sitting over here in my naive little world, huh?

Jonathan_Hall:

Yeah, I see a lot where QA type people, testers, are writing tests and developers are not. And I see two reasons that this is a problem and why that doesn't improve business outcomes. The first is that when a tester writes a test, they're writing at a different level than a developer usually. They're not writing unit tests, usually. They might be, but generally they're writing more integration or end-to-end tests, which are slower and more brittle and less precise, so they're less likely to catch a lot of bugs. But even if you could have a test engineer writing exactly the perfect tests that would be functionally equivalent to what a developer would write, I think this is the big reason why separate people writing tests doesn't improve business outcomes: you don't teach the developers to write better code. You just teach them to keep throwing their crap at you.

Will_Button:

Right?

Jonathan_Hall:

And I use the analogy of making dinner in my kitchen. When I make dinner and I leave a mess and my wife complains about it, she has two choices. She can either clean up the mess, or she can yell at me until I clean up the mess. And if she makes me clean up the mess, the next time I'm less likely to make a mess. But if she cleans it up, I mean, for one thing, it teaches me to be lazy in the sense of, yeah, I know my wife will clean it up. But even if I try not to be lazy and I make the mental effort to clean up my mess next time, I may not notice the mess I made, because she cleaned up something that I didn't realize I had done.

Will_Button:

Right?

Jonathan_Hall:

So I think this sort of psychological phenomenon is really the underlying reason that developers need to be writing their own tests and when they don't, from a systemic level, it causes bugs.

Will_Button:

Yeah, I would add to that and say that writing your own test forces you to think better about how you write code as well. It changes the way that you architect it and you write it in a way that's designed to be tested and I think you produce better, more sustainable code as a result of that and that lesson like you mentioned, that lesson is lost if you're not the one writing your tests.

Jonathan_Hall:

Exactly, yeah.

Will_Button:

But I'd be curious if someone listening advocates the other way and you're a big fan of having someone else write the test, hit us up on Twitter and

Jonathan_Hall:

Yeah,

Will_Button:

let us

Jonathan_Hall:

please.

Will_Button:

know. Because I'd be really curious to hear what the logic and wins and outcomes of the alternative is.

Jonathan_Hall:

Yep. The next one on my list was developers not being responsible for their own software deployment, which is another DevOps-related thing, right?

Will_Button:

Mm-hmm.

Jonathan_Hall:

If you write your code and hand it off to the ops team and they're the ones who deploy it and maybe handle on-call situations, you know, when the server crashes at 3 a.m. or you start serving 500 errors all the time, someone else fixes it, same exact problem, you know, the developer isn't incentivized to learn the proper ways to do things.

Will_Button:

Yeah,

Jonathan_Hall:

So.

Will_Button:

agreed 100% on that. Just had that conversation last night about developers being on call and the lessons you learn from it. And the common argument is, well, it pulls them away from doing their tasks. Like, well, they're not doing their tasks very well if they're spending that much time on call.

Jonathan_Hall:

Yeah, that's, I mean like, I think my first question would be, what do you think their task is? Ha ha ha.

Will_Button:

Right? Do your Office Space reference here. What exactly would you say your job is here?

Jonathan_Hall:

What would you say you do here? Yeah, I mean, if a developer's job isn't to write code that solves customers' problems, or really to the point, solve customers' problems, and code is the means.

Will_Button:

Yeah.

Jonathan_Hall:

If they're delivering code that doesn't work, they're not doing that. So that is their task, for goodness sake. If you have some other definition of a programmer... Go become a gardener, please. Leave the industry. You don't know what you're doing.

Will_Button:

Alright.

Jonathan_Hall:

And then a very closely related one to that is the development team not being responsible for fixing their own bugs. And this comes up when you have a feature team and a maintenance team. This doesn't teach the feature team to write good code. It teaches the feature team to just write as many features as they can, and who cares if they work? The maintenance guys, the junior guys probably, will be fixing our bugs. So that's another place where bugs come from, in my opinion.

Will_Button:

Yeah, and then that's just another, you know, we talk in DevOps a lot about throwing the code over the wall between

Jonathan_Hall:

Yep.

Will_Button:

software engineering and IT, and that would be another wall to throw your code over.

Jonathan_Hall:

No, throw it down to the junior guys who haven't earned their chops yet or whatever.

Will_Button:

Right?

Jonathan_Hall:

So I have two more on my list, and then we can open it up and talk about anything else that comes to mind. Most of these are really DevOps things, which is I think why I wanted to talk about it today. A long delay between bug creation and fixing, I think, causes bugs. And this all goes back to that training your developers to learn how to write bug-free code, right? If I write a bug and I don't hear about it for six months, there's two problems. One is I've forgotten what I did in the first place.

Will_Button:

Right.

Jonathan_Hall:

And the second is I don't learn that there's an incentive to write bug-free code, because delayed reaction or delayed incentives, delayed punishments, we all know from psychology, is less impactful. So yeah, putting this in a more DevOps-y way. Long cycle times cause bugs.

Will_Button:

Yeah, for sure. For sure, because the longer that delay is, I know personally, I have a tendency to discount its importance. It's like, oh, I wrote that six months ago and I'm just now hearing about it. It's probably not that big of a deal.

Jonathan_Hall:

Definitely have that conversation sometimes, yes. Well, if it's been there that long, it's probably not priority. Let's put it on the bottom of the backlog.

Will_Button:

Right?

Jonathan_Hall:

And then it's another six months rotting. The last one on my list, and this one might be contentious, so maybe you can argue with me about this one. I hope you will.

Will_Button:

Bring it on.

Jonathan_Hall:

Deadlines.

Will_Button:

So are you advocating that tight deadlines create more bugs?

Jonathan_Hall:

I think I'm advocating that deadlines in general create bugs.

Will_Button:

Oh man,

Jonathan_Hall:

I'm gonna let

Will_Button:

alright.

Jonathan_Hall:

that linger for a while before I explain what I'm thinking.

Will_Button:

Yeah, I'm trying to think how I'm going to play the antagonist to this. So OK, I want to argue against it, saying that deadlines aren't the source of bugs, because deadlines are a way for you to define what work you're going to deliver and when it's going to be done. And if that's causing you a problem, it's not a problem with the deadline. It's a problem with your ability to scale, or to forecast and plan your work.

Jonathan_Hall:

So let me be clear that I'm not saying that deadlines are evil. I think they're often useful. They are sometimes necessary. But I think that when a person, a developer, a creative person, let's broaden that, when you're trying to implement something creative and you know there's a deadline, you cut corners, whether the deadline is there for good reasons or not. So that's kind of my theory here. Now, there are definitely some bad deadlines. There's a lot of arbitrary deadlines out there like, oh, have it done on Thursday just because, and especially unrealistic deadlines are very harmful.

Will_Button:

Right?

Jonathan_Hall:

But I think I wanna make a broader point. I mean, maybe we can narrow it down to the unrealistic deadlines in a minute, but I think I wanna make a broader point that I think deadlines in general cause people to cut corners. Maybe there's somebody out there who that's not true for, but I think on average, that's probably true. I don't have data to back this up. It's just kind of a thought I had.

Will_Button:

I think... I think I could agree with you in the case of unrealistic deadlines, but I'm not sure I'd buy that in the case of generally agreed-upon, sane deadlines. You know, because I think if you have a deadline and it's realistic, then it's merely a tracking point at that point. And it's just something to work towards so that you have a fixed goal in mind. Having said that, everything that we forecast in our world and put a deadline on it, we blow past that deadline like Jeff Gordon at the Daytona 500

Jonathan_Hall:

Yeah.

Will_Button:

and set another deadline. And I'm totally OK with that because everything that we do in our world, you're like, oh, OK, I'll do this. Like, oh, man, that's got problem X, and problem X leads to problem Y, and then all of a sudden you're rethinking the whole thing, and you have to adjust your deadline at that point. So if you don't have that flexibility to adjust your deadline as you learn more information about the problem you're trying to solve, I can see your point that you're going to have to cut corners to hit the deadline.

Jonathan_Hall:

So let me tie this back to my opening story with this transistor SDK I'm writing. I don't have a deadline for this, but I could easily have created one for myself. I could have said, you know, I want to have a working version in two days or something. I've been working on it for about a week, a little less than a week, right? But I've been adding features, like actually where I found this bug is a feature I don't care about.

Will_Button:

Yeah

Jonathan_Hall:

I don't need to use it to query statistics, but my thought was... If I'm going to write this SDK, I'm going to open source it. And I might as well, you know, there's not that much sort of a surface area here. I might as well finish the thing. And that way somebody's more likely to want to use it. And I'm more likely to get a contribution or at least get bug reports if something's wrong.

Will_Button:

Mm-hmm.

Jonathan_Hall:

Whereas if I just did the bare minimum for what I need, I'm not, nobody else is going to want to use it. Because I only need three endpoints. Nobody's going to want an SDK that only uses those three endpoints. Right. So

Will_Button:

Right.

Jonathan_Hall:

I thought, you know, I could spend 50% of the effort to build something only I will ever use, or I could double the effort and have something that potentially others will use and I might get some valuable feedback. But there's no deadline. If I had a deadline, I would cut off that 50% scope that I'm not gonna use. And I probably wouldn't get feedback, and I probably wouldn't do the same sort of testing I'm doing right now. In other words, I would cut corners. Now, is that appropriate? If there was a business case behind this, if there was a business waiting on delivery, then it would totally be appropriate to cut those corners to

Will_Button:

Right.

Jonathan_Hall:

get that software into their hands faster. Since this is a hobby open source project, I don't know. But the point is, at least on this example, it feels to me like the fact that there's no deadline means I'm making better code. And sometimes that's the wrong trade-off, to be clear. Again, I'm not saying deadlines are evil. Sometimes deadlines are necessary and useful. And if this was a business case, it would be very useful to focus on the three endpoints that matter, even if they're quote, lower quality, than to do a full SDK. So anyway, that's my

Will_Button:

Yeah.

Jonathan_Hall:

point. I think it's quite reasonable to disagree with it because it's not even like a firmly held opinion. It's just kind of a random thought that I had.

Will_Button:

So you say there's no deadline. So if two years from now, you're still plunking away at this and haven't put it to use, you OK with that?

Jonathan_Hall:

I wouldn't say so. I would get bored by that point and just move on to something else if it took that long.

Will_Button:

Yeah, so there is a deadline. It's just not defined.

Jonathan_Hall:

I wouldn't say.

Will_Button:

It's somewhere between this week and two years from now.

Jonathan_Hall:

Is that a deadline?

Will_Button:

Yeah, because there's a point in there where you'll mentally say, okay, I'm shipping it or I'm abandoning it.

Jonathan_Hall:

Okay, if we want to broaden the definition, or use broad enough a definition of deadline to include that concept, then I suppose there is a deadline. There's an expiry date on this project.

Will_Button:

Yeah, maybe expiry date is a better term than deadline. But I don't know. I mean,

Jonathan_Hall:

I mean,

Will_Button:

I think there is a deadline.

Jonathan_Hall:

I could also put an arbitrary deadline for six months from now. And in that case, honestly, I don't think this is more than another week's, I mean, maximum one week of work before I'm ready to, like next week, my pick is my new SDK for Transistor or whatever, right?

Will_Button:

Right?

Jonathan_Hall:

So, yeah, it's not a big project. It's a pretty simple project, but I could put a deadline so far out there that it's ridiculous, like six months, or two years even, or whatever. And at that point, it wouldn't feel like a deadline because there would be no pressure. So in that sense, no, there's no deadline. And if my next product manager says the deadline is the year 2030 to fix this bug, then no, that's not gonna cause pressure. So I guess there has to be...

Will_Button:

Not until December of 2029, when I actually look at it.

Jonathan_Hall:

Right. So I suppose in the strictest sense, the deadline has to inflict some amount of pressure. Just having a deadline that's before the sun explodes isn't meaningful.

Will_Button:

Yeah, and I would advocate putting a deadline on it just to get closure on it. Because until you get closure on it, it's going to be mentally tying up cycles in your head. It's an unresolved thing. And if you have too many open file handles, you're going to reach a point where you've exhausted the number of file handles you can have open.

Jonathan_Hall:

Oh, that's easy to fix. Just recompile the kernel. Ha ha ha ha.

Will_Button:

Ha ha ha.

Jonathan_Hall:

So it sounds like we agree, though, that tight deadlines, especially unrealistically tight deadlines, cause bugs. And would that surprise anybody? I don't think so.

Will_Button:

I know some people in upper management that would be absolutely shocked to hear that.

Jonathan_Hall:

Ha ha ha. They watch too much Star Trek.

Will_Button:

But yeah, I do agree with you there. Unrealistically tight deadlines will definitely be a source of bugs. And I'm OK with setting tight deadlines with the caveat that once we learn more, that deadline may become null and void. And we're going to set a new deadline based on more accurate information.

Jonathan_Hall:

Yeah. And sometimes, honestly, the thing we need is a hack to get through the day. And

Will_Button:

Right.

Jonathan_Hall:

we know that it's not a perfect solution. We just need something to make. And if that's the trade-off, fine. So I think it's OK. Sometimes it's OK to accept buggy code if the trade-offs are appropriate, right?

Will_Button:

Yeah, and I think the right way to handle that is: okay, we're implementing this hack to fix this. So that's done, we close that bug ticket, and we open up a new ticket saying, hey, we created this hack, we need to go back and fix this properly. And cross-reference those tickets so there's a clear historical path. But writing a hack, closing the ticket, and pretending that the hack doesn't exist just leads to more problems.

Jonathan_Hall:

So in my list, I didn't talk about breakdowns of communication or overly complex architecture. But I think those are the things we talked about earlier. Those are definitely things that I think can contribute to bugs. Are there others?

Will_Button:

How do you deal with that?

Jonathan_Hall:

Yeah.

Will_Button:

Communication, especially in a remote world. I've been talking with several people about this recently with a lot of us working remote these days. I think a lot of people got excited and said, thank god, I don't have to talk with people anymore.

Jonathan_Hall:

Hehehehe.

Will_Button:

But I've been remote for like six or seven years at this point, and I've found it's the exact opposite. I have to talk more with people than I would if I were working side by side with them in the office, because you have to do proactive outreach to communicate. And when you do communicate, it's most often written, whether that be in a direct message, an email, or a comment on a ticket. I think you have to spend a lot more time thinking about and choosing your words carefully to convey the meaning that you're trying to convey. So from that perspective, I think being remote actually requires more social skills than being in the office, where you can just blurt out whatever pops in your head and then base your second statement on the body language and the facial expressions of whoever you're talking to.

Jonathan_Hall:

Yeah, you touched on a topic. We could probably talk a whole episode about this. Or I could, at least. I don't know if you have enough interest in the topic. But I've been a strong advocate for remote work for a long time. I started doing it in 2011, I think. And I've done it most of the time since then. Off and on, I've had office jobs since then. But I think the key to successful remote work is asynchronous work. And there's a tension here, because a lot of the community, especially the XP community, strongly emphasizes synchronous work: pair programming and mob programming and stuff like that, and getting rid of pull requests and asynchronous code review. So I'm a little bit torn between these two concepts, but I'll just play the advocate for asynchronous work for a moment. I don't know if it requires better social skills, but it certainly requires different social skills than in person. It requires good writing and reading skills that aren't required when you're in person. So what I think is important to do is document everything. Document your thoughts. Don't just create a pull request with a title like "add field to struct" (or to object, or whatever). Explain why you're adding that and the thought process that went into it, which customer you're doing it for, and so on and so forth. Leave a huge breadcrumb trail to how you got to this point, so that somebody, and that somebody likely will be you in six months who's forgotten the thing

Will_Button:

Right?

Jonathan_Hall:

you're doing can figure out what's going on. And so I've said for a long time (it's only recently I started to question whether this is universally true) that working remotely successfully improves your co-located working skills as well. Because if you write everything down in Jira or in GitHub or on a wiki or in an email, whatever, I don't care, as long as it's documented somewhere, you learn how to communicate effectively in writing, in ways that are discoverable. That's also an important aspect. It's not enough just to write a 15-page document on Slack that's gonna be lost 10 days later.

Will_Button:

Great.

Jonathan_Hall:

It has to be someplace discoverable. Those skills are hugely valuable. They're essential if you're remote, but they're still incredibly valuable even if you're co-located. So I've been a longtime advocate of learning to do this better. And every time I hear somebody tell me, "Oh, remote is just not as effective," what they're saying is that people don't have these skills.

Will_Button:

Yeah, absolutely.

Jonathan_Hall:

If you have these skills, working remote is as effective, if not more effective, than working in person, because when you're in person, you have interruptions, and you have that person who doesn't know how to write something down coming over: "Hey Will, where was that document you sent me last week? Can you send it again?" You know, crap requests like that.

Will_Button:

Absolutely, which goes back to the bus factor, because if your knowledge management system involves asking someone personally, what do you do when that person is on vacation or no longer with the company or deceased? So I agree with you 100% there. One of the tools I like to use to facilitate that, or to guide people to that, comes in the form of GitHub templates: a new issue template, a pull request template, and a README template for a new repo, where you can lay out those bullet points and kind of say, here are the things that you should think about adding to this. And they're free to delete whatever section may not be relevant, but by putting it in front of them, it sparks the creative juices of like, oh, yeah, that piece of information might be relevant.
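The template idea can be taken one small step further with a check that a pull request description still contains the sections the template asked for. What follows is only a minimal sketch in Python, not anything described in the episode: the section names in `REQUIRED_SECTIONS` and the function `missing_sections` are hypothetical, and it assumes the template uses Markdown headings.

```python
# Hypothetical section names; a real template would choose its own.
REQUIRED_SECTIONS = ["Why", "What changed", "Who asked for it"]


def missing_sections(description, required=REQUIRED_SECTIONS):
    """Return the required headings that do not appear in the description.

    A heading is any line starting with '#'. Matching ignores case and
    the number of '#' characters.
    """
    headings = {
        line.lstrip("#").strip().lower()
        for line in description.splitlines()
        if line.startswith("#")
    }
    return [name for name in required if name.lower() not in headings]


if __name__ == "__main__":
    body = "## Why\nCustomer X needs it.\n## What changed\nAdded a field.\n"
    print(missing_sections(body))
```

Because people are free to delete sections that are not relevant, a check like this is probably better used as a reminder in review than as a hard gate.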

Jonathan_Hall:

Yeah, so anyway, we could talk about that longer. I'm really interested in diving into that topic, the dichotomy between the power of two brains working synchronously on the same piece of code versus the efficiency of working asynchronously, if you can get your team to review code effectively and stuff like that. But it's probably for a more programming-related podcast.

Will_Button:

Yeah, whoever does that with me is going to have to be really, really comfortable with curse words.

Jonathan_Hall:

Ha ha ha.

Will_Button:

That's like 90% of my coding.

Jonathan_Hall:

Ha ha ha ha. Anything else to talk about on this topic?

Will_Button:

I think we pretty much narrowed down where bugs come from, so we can look forward to those just disappearing in the next few weeks after this episode releases.

Jonathan_Hall:

Solved it. Yeah, great. Looking forward to the bug-free world that we are about to usher in.

Will_Button:

Yeah, all right. Let's do some bug-free picks.

Jonathan_Hall:

Awesome.

Will_Button:

You got anything ready?

Jonathan_Hall:

So I'm going to pick, after I piss on them a little bit for having a buggy API, I'm going to pick Transistor.fm.

Will_Button:

Hahaha!

Jonathan_Hall:

I'm actually quite happy with their service. And their customer service was quite responsive when I reported this bug and when I asked about a developer account and so on. And I've been using them for a year and a half for my own podcast. So if you're thinking about starting a podcast, and you totally should, Transistor.fm's a great place to go. I think it's 19 US dollars a month for their starter plan, and I've never had a need to go to something more expensive. So there are other options, of course, check them all out, but yeah, Transistor.fm is a good one. I give them a pick.

Will_Button:

Right on.

Will_Button:

Excellent. So I've got two picks. The first one, I'm gonna pick my own YouTube channel, hashtag shameless self-promotion: DevOps for Developers. I'm just over 7,000 subscribers on the channel, and I'm hoping to hit the 10,000 subscriber mark by the end of the year.

Jonathan_Hall:

Awesome.

Will_Button:

Yeah, I think I can make that happen. And there's no real reason for that other than I just think it would be cool. So

Jonathan_Hall:

want to. Yeah.

Will_Button:

if you haven't checked out my YouTube channel, DevOps for Developers, go subscribe. And my second pick is more of a "what the" pick, because I've been on this quest lately to do better infrastructure as code. And it dawned on me: when you look at any system, what's the biggest weakness of every system? It's always the humans. We're the ones that screw everything up. Ultimately, even going back to the bug situation, there's always a human at the root of every bug. And in infrastructure as code, I've looked at a lot of the work we've done to document how our services are built, how they're run, how they scale. But I see very few people addressing the humans that are involved in your infrastructure, talking specifically about who has access and what level of access they have to your AWS account. And so I've taken on the task of documenting that, putting that in a GitHub repo so that no one gets access to AWS without opening a pull request and documenting the requested actions that they would like in there. That pull request gets merged and automatically deployed. And if it ever changes, we have an audit trail in the git history to show why it was changed and who changed it. But taking that one step further, we use Control Tower and have multiple AWS accounts under an umbrella AWS account. And there's this tool from AWS called Account Factory for Terraform that lets you deploy AWS accounts in your account factory using Terraform. But I'm still trying to wrap my head around it and implement it, because I'm like, what on earth is this? It's Terraform, but it launches DynamoDBs and Step Functions and Lambda functions and S3 buckets, all to seemingly create just another AWS account. So I'll be back next week with more info on how that process worked out, because I'm just trying to get it scoped out and working to see if it's a go/no-go decision on whether this is how we really want to create accounts.
But I bring it up because if you've used Account Factory for Terraform, I really want to hear what your experience was with it, because in their documentation they throw out GitOps and infrastructure as code. But my initial opinion on this tool is they totally missed the mark on that. So I'm curious to get someone else's opinion.
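The access-as-code workflow described above (who has which level of access, changed only through a reviewed pull request) can be pictured as plain data plus a diff. The sketch below is a hypothetical Python illustration only; it is not Will's repository and not part of any AWS tool, and the names `AccessGrant` and `diff_grants`, along with the example users and roles, are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class AccessGrant:
    """One line of the access record: who, which account, what level."""
    user: str
    account: str
    role: str


def diff_grants(before, after):
    """Summarize what a proposed change adds and removes.

    Both arguments are iterables of AccessGrant. The result is what a
    reviewer would want to see on the pull request.
    """
    before_set, after_set = set(before), set(after)
    return {
        "added": sorted(after_set - before_set),
        "removed": sorted(before_set - after_set),
    }


if __name__ == "__main__":
    current = [AccessGrant("alice", "prod", "ReadOnly")]
    proposed = [AccessGrant("alice", "prod", "Admin")]
    print(diff_grants(current, proposed))
```

Keeping the record as data in version control is what gives the audit trail: the history of the file shows each change, and the pull request carries the reason for it.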

Jonathan_Hall:

All right, awesome.

Will_Button:

Cool, alright, well, sounds good man, I'll see you next week.

Jonathan_Hall:

See you next week.

Will_Button:

Alright, so long everyone.

Jonathan_Hall:

Cheers.
