Profound

S3 E3 - Donna Knapp - Dr. Deming as a Sustainability Specialist?

February 01, 2023 John Willis Season 3 Episode 3

I interview a longtime Dr. Deming fan and early DevOps pioneer in this episode. Donna has a similar background to mine, where we both worked before the age of clouds and modern infrastructure. You are going to enjoy this podcast. Here are some links for follow-up knowledge.

Donna Knapp LinkedIn

ReCommoning: Transitioning Organisations with Jabe Bloom

The Visible Ops Handbook

TRADE-OFFS UNDER PRESSURE: HEURISTICS AND OBSERVATIONS OF TEAMS RESOLVING INTERNET SERVICE OUTAGES

What Would Deming Do

John Willis: [00:00:00] Hey, this is John Willis again with another Profound podcast. I've got a great old friend. We don't see each other, but, like, we go way back with DevOpsDays. And in fact, I'm pretty sure I met you at the first DevOpsDays. Maybe the second.

Donna Knapp: I can't remember. Yeah, 

John Willis: yeah, that's right. Austin. And yeah, so Donna, why don't you go ahead and introduce yourself?

Donna Knapp: So, hi, I'm Donna Knapp. I am the curriculum development manager for ITSM Academy, which means that I develop and maintain our course materials. We focus on IT service management, Agile, Lean, DevOps. The coolest thing about my job is that my job is to learn a bunch of stuff about a lot of different things and then figure out how best to share that with other people.

So for somebody like me, who's very much a lifelong learner, it's a great gig, and I've been with the academy for, I was just told, 16 years. [00:01:00] The people at the academy always have to remind me how long it's been. And so clearly, you know, I love what I do and I love who I work with.

That's awesome. 

John Willis: Yeah, that's great. I remember you and Lisa, you all sort of showed up at DevOpsDays, and you had this strong service management background, and it was like, wait a minute. And I actually had, in another life, you know, very early ITIL, and it was more about back in the days when you had to sort of certify a product, and I told less about the early, early on.

And again, we don't want to go there, but there was this sort of, like, why are service management people showing up at a DevOps conference? And, you know, we had no clue where the origins came from. I guess that leads to my sort of first question for you. I always like to ask guests who, you know, have this sort of learning relationship,

I think you said earlier, with Deming. So, how did you become a Deming sort of lifelong learner, or so?

Donna Knapp: [00:02:00] Really early in my career. I actually pulled out another resume just to figure out what year it was. And it was 1982, which is 10 years ago in my mind, but we all know is more than that. I was the IT

liaison to my company's ISO 9000 initiative, and years later I kind of understood how really well run this program was, because what they did was they brought in people from all different parts of the company, and we all went through a little training program. We got a little bit of Lean education.

We learned about total quality management. We learned about statistical quality control and process design and improvement, and that's where I was introduced to Deming. I'm a little bit of a pack rat, so I'm always shocked when I don't have things, but we were actually given a little workbook as part of this program, and the workbook had a whole bunch of Deming quotes laced throughout it. [00:03:00]

And that started it. Like, you know, who was this guy, Deming? What's he talking about? And, you know, the more time you spend with Deming, the more, along the way, you're going to be introduced to things that are counterintuitive, you know what I mean? You're going to be like, what's he talking about?

Yeah, yeah. But I always feel grateful that I'm very much a lifelong learner. And so I go on these little journeys where I go off to try to figure out what he was talking about, you know, when he talked about whatever principle. But it was really early on, and I'm always super grateful for that opportunity, because really, today, I continue to always want to learn about, you know, things he viewed as important: leadership, employee experience, sustainability. And, you know, correct me if I'm wrong, but he never [00:04:00] used the word sustainability, like, in its pure form, right?

But he talks about sustainability all the time. He never used the phrase employee experience, which we use all the time today, but he talked about employee experience all the time. So, I guess I continue to learn from him, and I am completely convinced that he was ahead of his time. We're only now just starting to wrap our heads around what he was

trying to help us understand, and, you know, in a lot of ways, we still aren't doing the things he suggests we do. But that's where it began, right? Really early on in my career. And I've continued to just kind of, along the way, try to learn from him.

John Willis: I think that's what's fascinating about Deming.

You know, most people's experiences are similar. For me, it was a gentleman named Ben Rockwood. It was an early DevOpsDays, and people who have heard this podcast have heard this story a million times, but it was an [00:05:00] open space on the Theory of Constraints, and I had basically really delved into Goldratt.

I mean, because, you know, I met Gene, and Gene said, hey, before you read an early draft of my book, you should read this book, which was The Goal. And I just started consuming everything Goldratt wrote. At that point I come into this open space, like, all right. And Ben Rockwood sort of, not really, because Ben would never do this, but sort of pats me on the head and says, oh, John, John, it all goes back to Deming. I'm like, no, no. But, you know, his sort of proof case was: go read the 14 Points. Just Google "14 Points."

And like you with the quotes, right? It was, wow. To me, it was everything we were trying to say in DevOps. Like, he was saying this, I don't know, 20 years, 40 years before. But the other thing I think is fascinating is, if you're an insatiable learner, and you love to learn, right?

Deming is the greatest gift, because there are things [00:06:00] that, like, 10 years ago I thought I understood, and then 5 years later I'm like, oh, I didn't really understand it. It always made sense. And then even recently, and we can get into this a little later, I had this, oh, my goodness, like, the difference between what he called analytic statistics versus enumerative statistics.

Right? And it just fascinates me that every time I think I understand something, I'll read something else by him. He'll just have a couple of words and I'm like, whoa, there's a reason why he used those words. And then I'm like, you know, maybe I don't understand this as deeply as I thought. And goodness, it's the ultimate puzzle.

You know, anyway. Yeah. Yeah. 

Donna Knapp: Well, and I think therein lies a true lifelong learner: you're always willing to challenge. You're always willing to recognize, like, hey, I really thought I understood this. And then all of a sudden you'll read something or somebody will say something and you'll realize, like, [00:07:00] all right. I've kind of learned that if I have a, like, visceral reaction to something somebody says, like, a pushback on it, to me that tells me you need to challenge your beliefs on this, because somehow,

you know, intuitively or in the back of your mind, there's some kernel of truth there and you don't yet understand why it's a truth. There's some limiting belief that's standing in the way of you really, truly understanding. And now you need to go on a journey to try to better understand.

John Willis: Oh, that's brilliant. I love that, because if you can't immediately defend it... Like, you're right, you have this visceral effect, like, no, I don't believe that, or, like, it sort of hurts, you know, everything I know. No, you're wrong. Yeah. But if you can't say, oh, no, here's why you're not right,

then, you're right, that should be an immediate clue. And that is sort of the theory of knowledge. Right? So, [00:08:00] yeah,

Donna Knapp: Well, and, you know, kind of gaining the insights. We just did, last year... Michael Cardinal, who is, I think his current title is, like, director of business process management at 3rd Era.

But he and I did a presentation at a conference called "What Would Deming Do?" And one of the things we talked about is the idea that it's not just about knowing this stuff and reading what his points are, but really understanding why he was making that point and what insights are to be gained.

I think sometimes I find in the IT community in particular, we're very enamored with the new and shiny, right? And we always just think we're different and what we're going through is different and, you know, these old ways of thinking aren't relevant to us. But just to your point about IT service management, like, why are these IT [00:09:00] service management people showing up at DevOpsDays?

Well, here's the reality. If you're doing DevOps, you're doing IT service management. You're handling incidents, you're making changes, you're making releases, you're managing portfolios, you're doing service management. You're just not calling it that. Right. Or you're just not doing it in a way that, you know, you mentioned the ITIL framework.

Historically, you're not doing it in a way that the ITIL framework says you should be doing it. But you're doing it, right? So let's just accept that, and let's try to gain the insights that we need to understand why this is important and, in the context of Deming, what he continues to try to teach us, because I don't think we've learned all of his lessons yet.

But why do we need to try to challenge our beliefs and kind of wrap our heads around what he was trying to say? Because if we don't, I just think we're in big trouble as a global community.

John Willis: [00:10:00] Yeah, no, I agree. And I think you're right. Part of the problem is, you know, both of us have been doing this long enough that even my initial reaction to some of the people, even before we coined it DevOps, was, you know, like meeting Luke Kanies with Puppet, right?

And him using the phrase change management, and I'm like, oh, what does this young, young, young man know about that? Because we come from an age where you were in, like, you know, some of the largest telcos and switching fabrics in large banks. Right. You want to know what change management looks like there?

It's very mature, probably rigid, definitely rigid in modern times. But then, you know, I think that was one of the good things about, like, you, Lisa, myself, and a fair amount of people that had been doing all this stuff prior. You know, for me, I'd been doing Tivoli stuff, right?

IBM Tivoli. [00:11:00] And so I was able to sort of look at both worlds. I mean, I guess that's the positive thing about learners, right? You either come into it and say, we've already done that before, there's no sense doing that. Or you come in like you guys did, and I did, where, like, this is fascinating.

Let me help you, you know, let's drive this car together. But let me try to help you understand what sort of IP is already out there. Because I was going to ask you about that; I'd love to hear your version of it. You know, I think for me, I didn't really come in... I knew ITIL, but I'd been doing, like, large, you know, pre-DevOps automation, infrastructure, Tivoli-type stuff.

And it wasn't until, like, a little bit deeper into my DevOps journey where I realized, oh, my goodness, it was sort of coordinated with the Deming experience, but like, like, oh, my goodness, like, everything we've been saying has really been talked about in lean, you know, and then where does lean come from?

And, you know, and then, like, the overlay of service [00:12:00] management. So, like, what was your sort of view as you saw all these young people, not me, talking about these new shiny objects when you were coming in? Obviously, at that point, you were like, we've done a lot of this before. But how did you, I don't know, this is a crazy question, but how'd you connect the dots?

Donna Knapp: Well, it kind of took me a while, so I'll give you a couple of specific examples. I was a change manager at one point in my career. And, you know, this was back in the mainframe days, and so change management was, in fact, a discipline. And I worked in a manufacturing company.

And I'm always super grateful because, tangential story, I was a service desk manager at the same job at one point, and the system went down, and the shop floor manager called me on the phone and said, get down here. And I was like, well, we're having a system problem right now.

I'll come down when I can. He's like, [00:13:00] get down here, and he slams down the phone. And I remember walking onto the shop floor and thinking to myself, man, it's really quiet in here today. I had not connected the dots, right? Shop floor down equals bad, right? If the computer system is down, the shop floor is down, and if the shop floor is down in a manufacturing company,

we're going out of business tomorrow. I was always grateful to him, because he called me down and proceeded to chew me out for 15 minutes about the fact that he had 30 union workers sitting on the dock smoking cigarettes. Right. So I kind of got that connection, first of all. And I also was a change manager.

So, you know, you have to understand sometimes how you get to places. What did that teach me? Oh, okay. I was a change manager; now I need to be a little more risk averse. I need to be careful, because if we have a change that goes bad, the shop floor goes down, and shop floor down equals bad.

Right? So I became one [00:14:00] of those change managers who is maybe overly risk averse. Right? And so now I come into the DevOps community with this history, right? This view of change management. And it took me a really long time to kind of wrap my head around it. And at the time, that community was like, we don't need no stinkin' change management.

Like, they thought they didn't need change management at all. They thought they should be able to do anything they want to do anytime they want to do it. And years later, I did a presentation, I did an Ignite at a DevOpsDays. And, you know, I kind of said to the audience, can we all agree that, if nothing else, we need to know what changed, who changed it, and why it got changed?

And everybody went, like, yeah, we can agree. It's like, okay, so you just all agreed we need change management. So now let's talk about how we do change management in the modern world. So it took me a while to wrap my head around all the things that the DevOps community was doing to mitigate risk, right?

And that's what I had to get to in [00:15:00] my head. Let's talk a little Lean: they were making smaller changes. They were making frequent changes. They were, you know, applying automation where they could for testing purposes. Fast forward even, because when I first came into the DevOps community, we weren't necessarily, or at least I wasn't, being exposed to things like blue-green and, you know, dark launches and some of the stuff that's really commonplace today, which, again, helps you to minimize the risk.

So it kind of took me a while to wrap my head around all the things that were being done in that community to mitigate the risk. That then led me to understanding, like, oh, okay, this is just a standard change, right? What we would call, in ITIL terms, a standard change. Like, you're making a change through a trusted and secure DevOps pipeline.

We can call that a standard change. You don't need to go in front of the CAB. All we need is a record of the fact that you made that change. But it took me probably a couple of years. Quite [00:16:00] honestly, it took me a couple of years to get to that place. So I think it's where, you know, we went through a lot of years of framework wars, right?

We're not doing ITIL because we're doing DevOps, or we're not doing Agile because we're doing DevOps, and we're not doing, you know, whatever, because we're doing whatever. And really, if you look at the highest-performing organizations, they're doing it all, right? And they're not doing any of it in its purest form.

That's right. They're pulling bits and pieces, based on their circumstances, needs, and goals, from all these different ways of working. They're pulling these bits and pieces together and coming up with what works for them. And what Deming has taught us, if nothing else, is that what you need to understand is that what's working

for you today isn't going to work tomorrow. So you better be starting to think about the future, right? You better be looking ahead a little bit as well, because [00:17:00] entropy is going to set in, your systems are going to deteriorate, and you're going to be right back where you were. It always kills me.

Like, I've had situations where we've gone in and done ITIL-related training or even DevOps-related training. You know, we trained up hundreds of people within an organization, and we come back years later and they're a mess. And it's like, yeah, what happened there? You know, like, somewhere along the line,

they let entropy set in, they didn't practice continual improvement, and it all starts to fall apart. Yeah.

John Willis: Yeah. It's funny. You know, there's, you just had a couple of things I wanted to drill into based on what you said. But there's this famous, it's not even really a quote.

It's a video of sort of funny clips of Deming talking, and one of his students says, Dr. Deming, you know, I was here in your seminar a [00:18:00] couple of years ago, and you said XYZ. Now you're saying ABC. And he's like, you know, terrible impersonation, but: I will not apologize for learning. You know, it's changing.

It's awesome. I 

Donna Knapp: never, 

John Willis: It's a great one. But what I also want to point out to people who are younger: I think somebody might question, like, how did you not know that computer down is bad? Well, back in our day, that wasn't obvious to everybody, even in IT infrastructure. It was so early that we didn't really understand the true, you know, business impact. That's what we learned in the '80s.

Right? But the other thing that I really want to drill in on: it's kind of funny, you're talking about change and trying to explain to the DevOps community that, like, let's agree on change. You know, it's not that I'm coining this, but I'm sort of trying to create, like, this:

there is [00:19:00] a DevOps heuristic. And, you know, if you go back to Visible Ops... were you involved in Visible Ops? Yeah. Yeah. Let's hold that just for a second, then, because I thought you were, and we'll explain as well. But Visible Ops is sort of Gene and a group. You did a bunch of research, and we'll drill in, but there's a heuristic in there.

It's not called a heuristic. It is basically that, you know, I think it says 80 percent of all outages are related to changes. And then, you know, SRE doubled down on it. It was a tenet of SRE that Google wrote. They said 70%, but who's counting? And then, you know, I don't know if you ever had a chance to read John Allspaw's master's thesis.

It's a brilliant paper. I mean, it's very academic. I haven't. No. Yeah, I'll send you a link. I'll put a link in the show notes. What he did is he basically tried to come up with sort of heuristics for incident management while he was at Etsy. So he literally did it, and he got his master's degree in it, where he took, like, some [00:20:00] major outages and looked at how people behave and how they solve things. Right in the thesis, he said there are 3 heuristics

for incident management in modern, you know, large-scale cloud computing infrastructure, and the first heuristic was that an incident is most likely related to a change, right? Like, that is as pure as gold. If there is a phrase, a DevOps heuristic, that's it. Right?

Donna Knapp: And so it's always fascinating to me that this community so very much pushes back on change management, and, like, I can understand it, because again,

I was guilty of establishing overly bureaucratic processes and overly controlling processes. Like, I was guilty of it. But I learned better, right? And so now I very much encourage ITSM professionals to think very differently about change management, and, you know, encourage them always to [00:21:00] understand that if you're living in a waterfall world, that's one thing.

And let's be clear: waterfall still exists in some organizations. And if you're living in an Agile, DevOps world, it's very different, and you have to adapt accordingly. But, you know, until the day comes that we've gotten so good at system design and development that we don't produce incidents,

we need, you know, processes like change management and processes like incident management and processes like problem management. And if we have time, we'll have a little rant about root cause analysis, because that's another area where I'm constantly challenging myself to understand why the DevOps community thinks the way it does.

But we'll get there if you want to. 

John Willis: Yeah, we can go there. That was a struggle. It's a struggle for anybody who comes out of the background that we came from. You know, the thing that I really liked about some of the stuff that I've seen, your [00:22:00] presentations at DOES, and Jane's and Lisa's, is that one of the things you guys have done really well is these translations.

Like even what you said earlier: okay, I'm not going to tell you, you people who have built this incredible, fascinating automation and infrastructure and are delivering value fast and resilient, that you need to stop doing that, but I am going to force you to say there is a change management process.

Right? Right. And I think you made that point really clear. Yeah. And then, you know, when I talk about the heuristic, and I think we agree, the way we meet in the middle is like you said earlier, when you finally came to the realization: oh, hey, y'all, that's a standard change. You don't have to go to the CAB for that.

Right? Right. You know, and I think a lot of that is... even when I say that that's a heuristic, I'm not saying... I think [00:23:00] anybody who's gotten this far in my podcast understands me well enough to know that when I say that sort of change thing, it isn't meaning that you can't do change or have a CAB or go more restrictive.

It means that you need to basically have a way to understand and record the changes. And the more automated you can make it, beautiful. Because, you know, by sort of law, like governance and risk, you will get fined or shut out, you know, lose a banking license, if you don't have evidence of a change. But then go back to the incident heuristic. This is an incident heuristic. Some of the giants, Gene, yourself, your team, the people who invented SRE at Google, will tell you this: change is most likely the first place to go look for incident resolution, and therefore you better have some form of a change management [00:24:00]

process, you know, and that doesn't go away with automation. I think I made my point, but yeah,
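To make the two ideas above concrete (record what changed, who changed it, and why; then look at recent changes first when an incident starts), here is a minimal Python sketch. It is only an illustration under stated assumptions: the `ChangeRecord` fields, the `recent_changes` helper, the 24-hour window, and the sample data are all hypothetical and do not come from the episode or from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    """Minimal evidence of a change: what, who, why, and when (hypothetical schema)."""
    what: str
    who: str
    why: str
    when: datetime


def recent_changes(changes, incident_start, window=timedelta(hours=24)):
    """Return changes made within `window` before the incident, newest first."""
    earliest = incident_start - window
    candidates = [c for c in changes if earliest <= c.when <= incident_start]
    return sorted(candidates, key=lambda c: c.when, reverse=True)


# Illustrative data only.
log = [
    ChangeRecord("resize db pool", "alice", "reduce latency", datetime(2023, 1, 30, 9, 0)),
    ChangeRecord("deploy checkout v2", "bob", "new feature", datetime(2023, 1, 31, 16, 30)),
    ChangeRecord("rotate TLS cert", "carol", "cert expiry", datetime(2023, 1, 31, 21, 0)),
]
incident = datetime(2023, 1, 31, 22, 0)

for change in recent_changes(log, incident):
    print(change.when.isoformat(), change.what, change.who, change.why)
```

A record like this could be written automatically by a deployment pipeline, which fits the point that automation does not remove the need for evidence of a change.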

Donna Knapp: Right. Well, and I think, you know, I kind of preach this philosophy all the time: make it easy for people to do the right thing. And, you know, I'm seeing some organizations doing some really inventive things with incident management.

One of our clients has something they call the bullet train. Publix, which is a supermarket chain down here in Florida (I live in Florida), also, years ago, instituted a similar approach where, if you've proven that you produce high-quality changes, right? If you've proven that you're embracing good design and development practices, you're doing the appropriate level of testing, right?

If you can prove that you're doing all the things that you need to be doing to produce a high-quality change and mitigate the risks [00:25:00] and potential impacts associated with that change, you get to go really, really fast. If you, on the other hand, are wreaking havoc on the production environment every time you make a change,

you're going to have to endure a little bureaucracy, right? And, like, when I was a change manager, I can remember going to project managers and setting the reports on their desks and saying to them, look at what you're doing. You're wreaking havoc on the production environment. Every time you make a change, the phones ring off the hook on Monday morning.

Like, you need to understand what you're doing, and you don't feel the pain of that. And this is one of the things the DevOps community has done that I think is really good, right? The "you build it, you run it" philosophy of life. Like, they should be feeling the pain. Right. They should, in fact, be the ones that get woken up in the middle of the night.

But I think one of the things that's challenging right now, John, is companies are all over the place, aren't they? Like, you have [00:26:00] some companies that have these, you know, product centric teams where they build it, they run it, they own it, they're taking care of everything. You have organizations that still have dev and ops, and the 2 don't talk to each other.

You have companies that have DevOps teams that are standalone. It's kind of all over the place. And so what's challenging for IT service management professionals today is they have to kind of understand where their company is. What's going on in their organization? How all these different disciplines, Agile, Lean, DevOps have been introduced and what state of maturity they're in, and then adapt IT service management accordingly.

Because what change management, as an example, looks like in all of those different environments is in fact very different. What incident management looks like in those different environments is very different. So it's really both an interesting and challenging time from that perspective.

John Willis: The companies are all over the place, but, you know, it made me think of it.

I'm loving this conversation, by the way, but [00:27:00] you know, when you talked about the bullet train or what Publix is trying to do, I mean, that is the methodical Google SRE structure. They figured that out, and so that's how to do it. Right? Exactly. And then they sort of codified it as a process.

Now, it still runs through these, like, weird translations. And that could be another podcast, how service management teams sort of say, oh, we're DevOps. Why? Because we're now called SRE, right? That's terrible. But the thing I always... when I really understood it was Damon Edwards, who really originally deep dived on SRE, and I got to just sort of sit in on his wave, if you will, to ride his wave.

And the thing that he made really clear to me was, and we wrote about this a little bit in The DevOps Handbook too. I don't know if we called it SRE, but we got it from Tom Limoncelli, who was SRE, which was, it was a transaction. You know, a developer [00:28:00] would basically have some service or something.

They'd go to an SRE team at Google, and the SRE team would say, well, you know, you've got to have this. You need to instrument that. They needed not only proof that they didn't wreak havoc; they needed instrumentation in the code that they were going to manage. So there would be this back and forth, and they'd finally come and say, okay, we can manage this, but here's what's going to happen now.

We're going to give you, you know, an SLO of, like, 99.9, and you're going to pay, you know, in funny money or Google money, this amount. And that's the transaction. And at that point, what that's saying is: you've given us enough information that we've got an expert group on managing reliability that will manage it for you.

But, you know, what Tom Limoncelli said in some of his books, the cloud one, I'll put the link in; I don't have perfect memory anymore, but [00:29:00] I never did. But, you know, he said that you could get kicked out of Google's SRE. Like, if you didn't own up to your contract, you let, you know, non-functionals sort of drift, and all of a sudden they would literally say, okay, you're not SRE managed, like, you're out, right?

So that whole thing about Google's was just their way of learning what you have learned in the industry: there's a way to do change if you don't recalibrate in a more prescriptive way. You do proper instrumentation, such that the people who manage reliability, the operations or whatever you want to call it, can manage it in a way that they can isolate whether it's your problem or a system problem.

So that was it to me. I hate how the industry has sort of morphed it into a self-serving thing, but in general, the purest idea of it is what you've been saying, and what are sort of [00:30:00] the core tenets of how we should, and how successful high-performing organizations do.

Donna Knapp: Well, and I think one of the coolest ideas that comes out of

SRE is the concept of the error budget, because there's this constant struggle between speed and stability, right? We call it the wall of confusion in the DevOps community, and, you know, that constant battle between speed and stability. But here's reality. The business wants it all, so let's figure out how we can achieve it all.

Right? Because that's what we have to do. And so I think error budgets, because, you know, whatever budgets kind of says, like, look, there's this expectation that your system is going to be X percent available, which means you've got a little room for error, so to speak. And that's what allows you to experiment.

That is what allows you to kind of figure out, like, okay, how fast can we go before we break the line, right, before we [00:31:00] start to impact availability and before we, you know, start to impact our customers? And I think, to your point, where I get frustrated sometimes is this.

Organizations adopt it: "We're doing SRE." It's like, okay, so if an error budget is exhausted, do you stop development? "Well, no, we would never stop development." Then you're not doing SRE. There has to be a consequence if you exhaust that error budget.
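
To make the arithmetic behind an error budget concrete, here is a minimal Python sketch. It is an illustration only, not something described in the conversation: the `ErrorBudget` class, its field names, and the request counts are hypothetical, and it assumes availability is measured as the fraction of successful requests in a window.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one measurement window."""

    slo_target: float      # assumed availability target, e.g. 0.999 for 99.9%
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that violated the SLO in the window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Positive: room left to experiment. Zero or negative: budget spent.
        return self.allowed_failures - self.failed_requests

    @property
    def exhausted(self) -> bool:
        return self.remaining <= 0


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"allowed failures: {budget.allowed_failures:.0f}")
    print(f"remaining budget: {budget.remaining:.0f}")
    print(f"exhausted: {budget.exhausted}")
```

With a 99.9 percent target over one million requests, roughly 1,000 failures are allowed. What happens once `exhausted` is true (slowing deployments, an improvement sprint, or something else) is a policy decision the sketch deliberately leaves out.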

John Willis: Damon had a really good... I think that's another sort of misconception that people have about error budgeting, and they go, "I can't use it."

And I have actually tried to express what I call pragmatic SRE. Like, you're not Google, you're an investment bank or whatever. So there's a whole story there, but one of the things I loved about what Damon really exposed was, he went and talked to some of the original people that were showing up in pre-SREcon [00:32:00] presentations.

In general, it's just a point of convergence for improvement. So one response might be, you can't deploy anymore, but that's not realistic in enterprises at all. But the other is, maybe it's a technical debt sprint. Maybe it's a resource, right? It's like, we've hit this thing, so now let's figure out where we're at on improvement.

We've broken a level where we need to improve, like, we proved it to ourselves. And so, you know, it could be slowing down deployments, but it could also be, why don't we do the next two weeks as just an improvement sprint? Why don't we bring in a consultant to help us understand improvement?

Right? Or why don't we read a book? But, yeah, no, I think that a lot of people think that error budgeting is just "stop deploying code." And then all the enterprises are like, "Yeah, those young kids don't understand how a bank works." But, like, the pragmatic... yeah. [00:33:00] 

Donna Knapp: And that's fair. Yeah. And it's based on data, and it's an ability to have a reasoned conversation, right?

It's a point at which you have a reasoned conversation. That's right. I think it was Jez that used that phrase one time, "reasoned conversation." And I always loved that phrase, because it's just like, let's talk about it and let's figure out how to approach this. And so I think, you know, if we really live the spirit of DevOps and the spirit of SRE, it always comes down to communication and collaboration.

It's so much of what Deming talked about, right? It's like, people have just got to figure out how to get along and work together and, you know, think about the system and don't just think about your own individual little thing. And so I think some of those concepts, like error budgets, that promote that conversation are super interesting.

Those are really good tools for collaboration. 

John Willis: [00:34:00] I sort of played around with, I don't know if it'd be a blog article. It's certainly not going to be another book at this point. Not that I won't write another book. Actually, I'm writing another book. But, you know, all the things that Deming would be, like, totally frustrated by.

I was telling a story: there was an Isaac Asimov short story I read years and years ago about Shakespeare coming into the future and taking a community college course on Shakespeare, right? And it was like, oh, my God. And then he got, like, a B minus or something, you know. But I think, like, some type of thing where Deming came back and he looked at it, like, "No, no, no, that was good.

That was good. But why are you still doing this? Why?" Why are the largest, you know, 30 percent of the top 20, if you strip out the cloud-based companies... Almost everybody's cloud these days, but, like, if you looked at them, there's probably 60 percent still in what he would call the prevailing system of management.

Right? [00:35:00] Still today. And, you know, I did this sort of crazy... So, yeah, he'd be furious. He'd be, like, just like he was furious with America after World War II, right? Even the red bead: I've got a sort of tongue-in-cheek blog that said that if he saw that we didn't take the red bead experiment any further than it was when he died, I think he would be disgusted with everybody, you know.

So, yeah... Well, I think sometimes he... Like, I was just having this conversation with somebody about this whole Great Resignation thing. So Deming would be like, "I told you so." He would be like, "I predicted that in 1930."

Donna Knapp: Like, why are you people surprised that this is happening? You know, I do think there are some things where he just would shake his head. And I would like to think there are some things that would be encouraging to him. Like, I just did... [00:36:00] Every January, I do a little State of IT Service Management presentation, and I talk about Agile and Lean and DevOps and how they all work together.

But one of the things I talked about is the fact that every year for the last few years, we've seen sustainability go up on the list of priorities for CIOs. The most current stat was, like, 57 percent of CIOs consider sustainability one of their top three priorities.

And, you know, a ridiculously high percentage, like 74 percent of CEOs, think their company is going to be out of business in 10 years if they don't change the way they're doing things, which is, you know, that's a big number. And he talked so much about sustainability. So I think he would be very encouraged to see us finally talking about the need to think about future generations, to start thinking about our community, what's going on in [00:37:00] our communities, to start thinking about the impact we're having on the environment. You know, it's all 

John Willis: right. He would be so deep into that. Yeah, I mean, especially, you know, if he lived to be 130. But, you know, the other thing, I mean, it is frustrating.

And, you know, I'll put it in as a show note, but Andrew Clay Shafer, who I worked with, and Jabe Bloom... Jabe Bloom has this recommoning idea. And I always struggle to explain it, because they're so much smarter than I am and better at explaining it.

But it's the idea that we keep going, like, why do we have these framework wars? Or now we're having, like, platform engineering versus SRE versus DevOps. Well, just, like, stop the nonsense. And I think Andrew, even though it was Jabe's sort of theorem, if you will, says, like, we have to sort of get past working on selfish interest.

Let's have selfless interest, and that's recommoning. And again, I'm doing a terrible job, and I'll let anybody who listens go look at Jabe's writing [00:38:00] about this, because it's fascinating. Because I keep thinking, every time I have these conversations with people like you, like, we just keep sort of losing at scale right now, you know? We get better in some ways, definitely.

And even in my book, I do an interesting comparison with Jeff Wilke at Amazon, who was the number two guy. When he retired, Jeff Bezos said he was the most important guy. He built the distribution centers. He was a Lean Six Sigma guy, right? He came in with that background, and we know where Lean comes from, right?

And then Tim Cook, at the same time, was put in by Steve Jobs to try to solve the distribution problem for a new way to ship, you know, hardware and computers. They both came in around the same time. They both came from the Lean Six Sigma background, or all those tenets, and we can trace that all back to Deming.

So you think, okay, indirectly Deming has had a positive influence on Apple [00:39:00] and Amazon. And you'd argue maybe those aren't positive, but, like, oh, my goodness, the world has changed in technology and human terms. So there is a good list of things. But there are still the frustrating things, I think, that you and I constantly sort of battle.

Donna Knapp: Well, and I think what those organizations have hopefully taught us is, you know, we often hear "innovation or operational excellence," and you kind of have to understand that innovation is an aspect of operational excellence. So you have to build this foundation of operational excellence.

Like, some stuff has to be standard, right? And that's a lot of what Deming talked about, right? Some stuff has to be standard ways of working, because otherwise you can't go fast. You just can't. If everybody's constantly doing their own thing, and you're constantly reinventing the wheel, you need to understand that you're going slow. [00:40:00] 

And so, if you want to go fast, you have to build this foundation of operational excellence. But what we have to recognize is that part of that is continually improving and constantly innovating. So we can't ever think we're done. We just can't. It's a fundamental principle of quality management: processes are never done.

And we just kind of have to recognize that. And I think also, you know, today I find that people even think that processes are an outdated notion. And, like, everything you do is a process. You can change the names and you can change the faces all you want, but everything you do is a process.

And all this automation we're talking about today: how do you automate something? You define a process, you define a repeatable way of doing something, and then you apply technology to it. And understand that automation gets you [00:41:00] stuck in a moment in time, right? It's kind of what automation does.

On the good side, it keeps your process stable, but on the bad side, it keeps your process stuck in a moment in time. So, for all the organizations that have spent a fortune implementing, you know, service management systems and haven't updated their workflows through the years, they need to understand that they're stuck in whatever moment in time it was that they automated that workflow.

So we even have to get good at going back in and looking at the automation. And I'm, you know, John, five years from retirement. So on one hand, I'm thankful, because I'd hate to be 30 today, wondering how I'm going to maintain a career. You know, what am I going 

John Willis: to have a blast?

Donna Knapp: I know. Right. But in the same respect, I always look at this constant cycle of jobs getting obsoleted, and when jobs get obsoleted, they create new jobs. So what are the new jobs going to [00:42:00] be? And I think some of the people that are going to be successful in the future are going to be the people that understand that it starts with the process.

Technology is used to improve the efficiency and the effectiveness of a process, but you do, in fact, need to start with the fact that we're trying to get something done, or we're trying to solve some problem, and then we can apply technology to that. I think sometimes people in the industry think technology leads the way.

But, in fact, the need to get something done is what leads the way. And then you apply the current technology to help you do that. That's my way of viewing it.

John Willis: No, and you're so right. It's like what you said earlier about trying to say, hey, can we all agree that there is change management? And I think you're right.

So, the next battle... You said earlier that we still don't [00:43:00] really implement Deming's ways or understand how to implement them. So, Deming had a quote, like, in the '80s, and I'll find the right one, but it said that it would be, you know, 50 years before people understood Shewhart's work.

Right? And we're still there. And here's the thing, right? It might have been in the '90s, because I think it was going to be 2030 if we did the math properly. But to your point about process: if you look at a large piece of Deming's work, we all know, it was about process.

And so, just like the people in DevOps, myself included, had to sort of say, "Can we at least commit that there is change management?", I think what we're missing, and I love that you expressed this, is process improvement in all this fancy wangle [00:44:00] dangle technology.

Because, you know, it goes back to something I said earlier: my latest learning about what Deming would call analytical statistics is all around understanding a process, you know, using the data to let the process tell you about itself, without really any specifics. It's the data, you know, instead of your human instinct or bias. You get an unbiased view of control and common cause. And, like, again, we're at the 54-minute mark right now.
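
One common way to let the data describe a process is a Shewhart individuals (XmR) chart. The Python sketch below is a hedged illustration rather than anything specified in the conversation: the function names and the sample deployment lead times are hypothetical, and it assumes the baseline measurements come from a period when the process was reasonably stable. It uses the standard XmR constant 2.66 applied to the average moving range.

```python
from statistics import mean

# Standard individuals-chart constant: 3 / d2, where d2 = 1.128 for ranges of two points.
XMR_CONSTANT = 2.66


def xmr_limits(baseline):
    """Return (lower, center, upper) natural process limits for a baseline series."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points to compute a moving range")
    # Moving range: absolute difference between each pair of consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    spread = XMR_CONSTANT * mean(moving_ranges)
    return center - spread, center, center + spread


def special_cause_points(baseline, new_values):
    """Return (index, value) pairs from new_values that fall outside the limits."""
    lower, _, upper = xmr_limits(baseline)
    return [(i, v) for i, v in enumerate(new_values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical deployment lead times in minutes.
    baseline = [10, 12, 11, 13, 12, 11, 10, 12]
    lower, center, upper = xmr_limits(baseline)
    print(f"limits: {lower:.3f} .. {upper:.3f} (center {center:.3f})")
    print(special_cause_points(baseline, [11, 12, 19, 10]))
```

Points inside the limits are treated as common-cause variation; points outside are signals worth investigating as possible special causes.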

But, yeah, I mean, I think that is sort of one of the deepest missing pieces of Deming's message to the modern world: what he learned, you know, almost exclusively through Shewhart, which is how to understand a process. And that is almost [00:45:00]... I mean, I can't think of any infrastructure DevOps shop, and if there is one out there, please ping me, that actually runs process control in a modern, you know, sort of cloud-based, cloud-native infrastructure.

I would love to hear of one, and the question I would ask is, why not?

Donna Knapp: Yeah, well, and understanding variation and understanding, you know, kind of... We don't have time for the root cause analysis rant.

We'll have to do that.

John Willis: We're going to have to do a part two. I got a call coming in.

Donna Knapp: Let me just say, I have tried, and people have helped me get closer, but I have tried to understand why the DevOps community pushes back so much.

John Willis: I know the answer to this. I do have the answers, but...

Donna Knapp: Oh, okay. What's the answer?

John Willis: I only have five minutes, so we could just hop on a calendar and do a part two. Okay. I've studied, you know, Sidney Dekker, David Woods, all of those, really, on root cause. The other one that we could, like, graft onto [00:46:00] this is situational awareness. The resilience people have an aversion to that.

And I really had to figure that out, because John Allspaw asked us to take any references out of the DevOps Handbook, and we did. And then Gene snuck one in, and in the final edit, I had to go back and tell Gene, "I think we should take this out." But I had to do a ton of research to be able to give the reason why, and that just opened things up, you know, and then I've gotten to know Sidney Dekker a little bit.

And long story short, let's have another call, and I think what you're going to find is it's stuff that you already agree with. It's just worded a different way, just like change management, just like process. 

Donna Knapp: Yeah, and I've had some insights, but listen to me... You know, getting back to process, it's [00:47:00] understanding, kind of, at what point are you out of control, and what's causing you to be out of control, and how...

You know, right? How do we get the variation down to a level that is acceptable? Because it's that ongoing balance between enabling innovation and enabling us all to go as fast as we possibly can within control, right, while still having kind of that foundation of operational excellence that lets us not all constantly fight with each other and conflict with each other.

We get in each other's way. That sort of thing.

John Willis: Topo Pal is an amazing human in every way, technically, you know, personally. I mean, I love what he says: just because you can go fast doesn't mean you necessarily should. So I do need to wrap up. Tell people where they can get ahold of you if, you know, they wanted to sort of reach out to you and, you know, sort of have discussions 

Donna Knapp: or whatever.

[00:48:00] So I work for ITSM Academy, itsmacademy.com. I'm on LinkedIn as well. So go ahead and hop on over to LinkedIn, and I'd love to hear from people. 

John Willis: We're going to have a part 2 because we'll do the whole breakdown of situational awareness and root cause analysis. 

Donna Knapp: Yeah, I would love to talk about root cause analysis. Let's do it.

John Willis: It'll be fun. It'll be fun because I was like you. I had to battle some really smart people on this, and I had to do the homework to be able to have those conversations. I've had two-hour conversations with John Allspaw and things like this, and Sidney.

Donna Knapp: Okay. Good. Good. Okay. So I'm not alone. 

John Willis: Yeah, no, no, I didn't come in and go, like, "Oh, you're right, I'm wrong." But with the minute I've got left: I did have that visceral reaction. I think everybody who is old-school-ish has that reaction to it. And I unknowingly did exactly what [00:49:00] you said that you do, which is I had to go find out from the people... Like, John Allspaw, I think the world of John in every way.

And I'm like, why is this person who's so smart saying something that I so disagree with? Through that, I found Sidney Dekker, and I just had a sort of worldview shift where I was able to get to a place where, like, we're saying the same thing. It's just that we were painting with different colors. But we'll do that.

Donna Knapp: I've had 3 insights that have gotten me a little closer to understanding. So hopefully you're going to give me that 4th insight. 

John Willis: I think it'll be fascinating. Either way, it'll be a fascinating podcast. So I do got to go, and this was just delightful. 

Donna Knapp: Yeah, I agree. I had a good time. Thanks so much. Bye-bye. Have a good one.