Profound

S4 E27 - Dr. Bill Bellows - Bridging Deming, DevOps, and the Power of Systems Thinking Part 1

John Willis Season 4 Episode 27

In this episode, I engage with Dr. Bill Bellows in a deep dive into the application of W. Edwards Deming’s systems thinking in software development and DevOps. Dr. Bellows, a veteran in quality management and an expert in Deming’s principles, shares insights on variation, Taguchi loss functions, and the synthesis of parts in a system to highlight gaps in current industry metrics like DORA.

Key Topics:

  1. Misconceptions About Managing Parts vs. Systems:
    • Dr. Bellows references Russell Ackoff’s assertion that managing individual parts optimally doesn’t guarantee an effective system. He relates this to the tendency in software and manufacturing to assess components in isolation rather than as part of a larger system.
  2. The Role of Variation in Quality:
    • Building on Shewhart’s work, Dr. Bellows explains how statistical process control examines stability and variation within components. Taguchi’s insights are introduced to show how variation in individual parts impacts the whole system's functionality.
  3. Applying Taguchi to Modern Metrics:
    • The conversation examines how DORA metrics, such as deployment frequency and mean time to recovery, serve as output measures but fail to address the underlying inputs driving these metrics. Dr. Bellows highlights the importance of understanding "failure" through operational definitions and its nuanced variations.
  4. Systems Thinking in Feedback Loops:
    • Emphasizing tighter feedback loops, Dr. Bellows ties traditional Deming concepts to the promise of continuous improvement in DevOps. He advocates for a systemic view, where the interplay of individual variances contributes to collective outcomes.

Key Insights:

  • Systems must be analyzed holistically to manage complexity and leverage opportunities effectively.
  • Outputs like DORA metrics should inform adjustments to input characteristics rather than serve as the sole focus.
  • Precision in defining failure and understanding its economic implications is critical to refining processes and delivering value.

John Willis: Hey, it's John Willis. We got another great podcast with Dr. Bill Bellows, and you've heard him a couple of times, and it's always pretty great, and we're gonna sort of catch up on where we left off last time, but I'm gonna ask him some questions related to DevOps-y things, so this will sort of be an intro. Hey, Bill, how's it going?

Dr. Bill Bellows: Very good, John. Great to be back with you.

John Willis: No, it is fun. It's fun you know, having some of the conversations we've had. So the, you know, the, we've had a bunch of conversations, right? Now, you know, it's been a little while, but we, you know, we talked about you know, sort of variation, misunderstanding variation. We led into sort of Taguchi, what's the difference between sort of what, sort of a statistical process control and Taguchi's sort of loss function?

We can sort of catch everybody up quickly on that. And, and I wanted to start transitioning our conversation because one of the things that, like, I, I always feel like is now the [00:01:00] time to ask him to solve our problems. Well, not yet. Not yet. You know, in other words, like, and then, like, I know that's a question.

A lot of people who listen, like, Oh, this guy is like, he's, he's spot on. Okay. Like, okay. Can I just hand them my data and say, you know, Bill, how do I fix this? And, you know, I don't think we're quite ready for that yet, but, you know, I, you know, I was telling you about how, like at a high level, we, you know, you know, there's a lot of measurement going on.

In fact, developer productivity is one of the more interesting conversations. And, you know, we can have a longer conversation in another podcast about what's going on there, but in this sort of the, the software world or DevOps or whatever you want to call it these days. There's a lot of conversation about what kind of metrics make sense for developer productivity.

How does developer productivity translate into organizational productivity? Like a lot of things there. And the, the, the gold standard has been really four metrics. And again, I'm not going to force you to sort of deep dive into explaining how you'd Taguchi them. But like to give you sort of a general sense of [00:02:00] like, we've been you know, for about maybe 10 years now, maybe a little less.

We've created this gold standard. I think it's a little outdated, but we measure sort of the effectiveness in terms of how often do we deploy software. If we're deploying more often, then maybe we're doing a better job because we get better feedback. How long does it take from the time that we sort of hit the button on software to when it is authoritatively running on the system?

So there's sort of delivery and speed aspects. And then there's two others that we measure. One is, you know, what is the sort of change success? Like we make a change. How often does it succeed or fail? And then the sort of last one, which there's a lot of controversy around, is the sort of what they call MTTR, or mean time to repair, or mean time to restore.

And there are a lot of problems with averages, but it's some sort of measurement to say, how good are we? You know, if we can measure how fast we're going, can we correlate that [00:03:00] to, are we getting better at our change success rate? Are the feedback loops happening quick enough so maybe we're showing that, when we make changes, we make fewer errors and we're getting better at reducing the amount of time that it takes to fix a problem?
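To make the four measures just described concrete, here is a minimal sketch of how they might be computed from a log of deployments. The `Deployment` record, its field names, the `dora_summary` helper, and the sample data are all hypothetical and chosen only for illustration; real teams would need their own operational definitions of "deployed", "failed", and "restored".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """One change going to production (hypothetical record layout)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it was running in production
    failed: bool                             # did this change cause a failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    """Convert a time difference to hours."""
    return delta.total_seconds() / 3600


def dora_summary(deployments, window_days):
    """Summarize the four measures over an observation window of `window_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deployments) / window_days,
        "mean_lead_time_hours": mean(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # An average, with all the caveats about averages raised in the conversation.
        "mean_time_to_restore_hours": mean(restore_times) if restore_times else None,
    }


# Made-up sample: four deployments over a two-day window.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2), failed=False),
    Deployment(t0 + timedelta(hours=3), t0 + timedelta(hours=7), failed=True,
               restored_at=t0 + timedelta(hours=8)),
    Deployment(t0 + timedelta(hours=24), t0 + timedelta(hours=30), failed=False),
    Deployment(t0 + timedelta(hours=26), t0 + timedelta(hours=30), failed=True,
               restored_at=t0 + timedelta(hours=33)),
]
summary = dora_summary(sample, window_days=2)
print(summary)
```

Note that every value in the summary is an output of the delivery process; nothing in it says which inputs to change.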

And so we've, we've sort of anchored a lot of what we talk about in DevOps. We call them the DORA metrics, right? DevOps Research and Assessment, I think it is. But, but I guess so as a starting point, you know, then again, I can put you on the spot, oh, you're the consultant, we just hired you, fix this, or, but like, so, like, you were talking about, like, how would you, you, how should we start thinking about variation and loss function in relation to, it's really any metric, but in, if any of that sort of resonates, does that all make sense to, I guess, well, 

Dr. Bill Bellows: Let me start with, oh, a quote from Russ Ackoff, which I have found characterizes [00:04:00] the standard way of managing anything: managing a nonprofit, a for-profit, a software company, a library, a university, an aerospace company.

So Russ said the... Again, Russell Ackoff, who taught for many years at the University of Pennsylvania, and he died in 2009, wrote dozens of books. A C K O F F, Russell Ackoff. And one quote I'll use that I think fits as a great starting point is, he said, the characteristic way of management we have taught in the Western world is to take, is to take an organization and divide it into parts.

Wait, wait, wait, let me back up. The characteristic way of management we have taught in the Western world is to divide an organization into parts and manage each part as well as possible. And if that's done, the belief is the system will perform well. What [00:05:00] I've done with that quote is shown it to audiences in seminars or university courses.

And it's, again, it starts with the characteristic way of management is to take a complex system, divide it into parts, and manage each part as well as possible. And if that's done, the system will dot, dot, dot. Or actually, I end it, and if that is done, then I say to the audience how, how would you complete this sentence?

Okay. And it's kind of a 50 50 split. Okay.

Essentially, you'll get "a mess" or "will behave well." So what's interesting is there are people that, you know, in the opening hour of a presentation, have no reason to be curious about that. So it's very easy to look at that and say, "and will behave well." I then follow by showing Russ saying the system will behave well.

The people, like, smile, and then he says, "and that's [00:06:00] completely false." All right. So the first group gets like, oh, yeah, we got it right. Then the other group says, not so fast. So I have found this in presenting to project management audiences, which are awfully filled with software people.

And they will say, Bill, that's what we do. We manage the parts really well. So that's a universal thing. Then from a quality perspective, and this is part of the focus of an ongoing series I'm doing for the Deming Institute, and the series is called Misunderstanding Quality, this applies very, very well to the previous quote from Russ.

You break it up into parts, each part gets a set of requirements, and I'm sure that applies to software, that a given module, a given piece has a set of [00:07:00] requirements that that team is working on meeting. And, you know, the challenge becomes not how good the parts are, because what somebody is getting at the end of the day is an integrated piece of software, integrated pieces of an airplane.

You know, I used to joke when I would do seminars at Boeing commercial airplane sites, I'd say 747, you know, you can say 787, but back in the day when I went up there, I'd say a 747 is not a bunch of parts that fly in close formation. But from what Russ was saying, that's really what's going on.

And so back to quality, each of those elements has a set of requirements. And if the requirements for each element are met, then we'd say that that is complete from a quality perspective. We say that is [00:08:00] good. It is done. It is error free. And then we just pass our module on to the next people to put those out.

Because at the end of the day, no matter what you're working on, you're working on the lyrics for a song, and someone's going to overlay the music. They have to come together at the end of the day. And so no matter what you're working on, it gets integrated into something. The variation piece, now I go back to Shewhart, is when we look at each element, and we look at, you know, how it performs, and we, you know, we turn it on, and we measure its performance.

And that's kind of what you're talking about with this data and you start to get some data that tells you how it's working for an automobile. It could [00:09:00] be, you know, miles per gallon, but on the component level, on the very intricate component level, it could be looking at the, you know, pieces of wood that are going into a house and a 2x4 piece of wood has a, we cut it to a given length.

It's a certain width. It's a certain height. And then that goes into some wall. And so we could monitor. If we're cutting those pieces of wood, we can monitor them, that they're being cut to the same length, the same width, the same height. And Walter Shewhart found, Walter Shewhart, back in the 20s, is that, no, you don't get the same length each time, you don't get the same width each time.

And so the role of statistical process control is to monitor those characteristics of the components. It could be a length, a thickness, a hardness. So in software, we're going to have [00:10:00] characteristics of the components. Again, my limits in software are a freshman class writing BASIC back in the mid-seventies.

But I'm sure these components have performance characteristics, whatever they are. And you're going to watch how those things operate on an individual basis. And so the role of statistical process control is to look at that functionality of the components, whatever those characteristics are that are deemed as, you know, important performance measures.

We'll just say that, and we can look at those and ask, are they stable? Which means, are the ups and downs of that measure consistent within some band? Are [00:11:00] they randomly above and below some average within some, you know, some limits referred to as control limits? That's process control.
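As an illustration of the stability question posed here, the sketch below computes control limits for a single measured characteristic. It assumes an individuals (XmR) chart, which is one common form of Shewhart chart and is not named in the conversation; the limits are the average plus or minus 2.66 times the average moving range. The function names and the sample measurements are hypothetical.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)  # standard XmR scaling constant
    return center - spread, center, center + spread


def points_outside(values):
    """List (index, value) pairs that fall outside the control limits."""
    lower, _, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Made-up cut lengths for a run of boards, in millimetres.
lengths_mm = [2438, 2440, 2437, 2439, 2441, 2438, 2440, 2439]
lower, center, upper = xmr_limits(lengths_mm)
print(f"center={center:.1f}  limits=({lower:.1f}, {upper:.1f})")
print("signals:", points_outside(lengths_mm))
```

A point outside the limits is a prompt to go and find out what changed; points inside the limits are treated as the routine ups and downs of a stable process. A fuller treatment would also look for runs and trends within the limits.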

That's looking at components and at any characteristic, whatever characteristic you want to plot. Now, what Dr. Taguchi's work includes with his loss function is brilliantly tied to what Ackoff is talking about. Because if you look at any characteristic, any measure, and you look at it on its own, and, you know, the characteristic may perform within a set of requirements, and we say, oh, it's performing within the requirements, what Shewhart's saying is, well, but it might be drifting up, it might be drifting down, and that might be an indication that something's not right.

What Taguchi's talking about, relative to variation, is [00:12:00] how does the variation in one characteristic impact how that component is merged with another component. So, you know, in a mechanical situation, you could have the outer diameter of a tube, and it has a quality characteristic called diameter. We monitor that.

And then someone else could be manufacturing a plate which has a hole in it, and we could monitor the variation of those holes, that they are not drifting higher or lower, that they're within some realm. What Taguchi's talking about is not those parts taken separately, but what he's talking about is, when you take the tube and put it into the hole.

That's a joining operation. That's an integration operation. And it may be that the clearance between the two is [00:13:00] essential. Well, the person who owns the tube is focusing on the outer diameter, that aspect of the software piece. The person who owns the plate that has the hole in it, they're focused on the hole. But it may well be that for you, the user of this system, the clearance between the two is essential for how this software works. And that's where Dr. Taguchi comes in, which is very consistent with Ackoff, because Ackoff is talking about the tube by itself, the plate by itself.

And even though Dr. Russell Ackoff doesn't talk explicitly about variation, he is implicitly talking about how those two come together. And so for our audience, when Ackoff says the characteristic way is to take the parts, manage them separately, and if that's good, what [00:14:00] Taguchi's talking about is very much the same thing, but his lens is, how does the variation in one come together with the variation in another to affect the person who's got to use that software, use that thing?

That's the system aspect that I think is essential for our listeners. 
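The tube and plate example can be sketched in a few lines. The code below assumes the usual quadratic form of the Taguchi loss function, loss = k × (y − target)², applied to the clearance between the two parts. The nominal sizes, tolerances, target clearance, and cost coefficient `k` are invented for illustration, and dimensions are kept in whole micrometres to avoid floating-point noise.

```python
TUBE_NOMINAL = 10_000     # tube outer diameter, micrometres (hypothetical)
HOLE_NOMINAL = 10_100     # hole inner diameter, micrometres (hypothetical)
TOLERANCE = 50            # each part is "good" within +/- 50 micrometres
TARGET_CLEARANCE = 100    # clearance at which the joint works best (assumed)
K = 1                     # cost per squared micrometre off target (assumed)


def in_spec(value, nominal, tolerance=TOLERANCE):
    """Part-by-part view: is this single characteristic within its requirement?"""
    return abs(value - nominal) <= tolerance


def clearance_loss(tube_od, hole_id, target=TARGET_CLEARANCE, k=K):
    """System view: quadratic loss on how far the clearance is from its target."""
    clearance = hole_id - tube_od
    return k * (clearance - target) ** 2


# Four tube/hole pairs; every individual part meets its own requirement.
pairs = [
    (10_000, 10_100),  # both nominal
    (10_020, 10_130),  # both slightly large
    (10_050, 10_050),  # largest allowed tube, smallest allowed hole
    (9_950, 10_150),   # smallest allowed tube, largest allowed hole
]
for tube_od, hole_id in pairs:
    both_good = in_spec(tube_od, TUBE_NOMINAL) and in_spec(hole_id, HOLE_NOMINAL)
    print(tube_od, hole_id, "parts in spec:", both_good,
          "loss:", clearance_loss(tube_od, hole_id))
```

All four pairs pass a part-by-part inspection, yet the loss on the combined clearance differs from pair to pair, which is the gap between managing the parts and managing how they come together.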

John Willis: Now I have a ton of questions, right? So, like, in some ways, you know, sort of holding off Taguchi for a minute, right, like when we use, like, the analytical statistics or statistical process control, right, you know, sort of measuring variance, a lot of times the way I see it or the way I would sort of envision it being used in our software world is at a process level and not a component level.

In other words, we're looking at, you know, again, going back to those metrics, like, that's sort of one example, but it is more of, I will say, a systems approach, because I guess that's where I want to go with my question. Like, in other words, one of the things we might measure is how effective we are in our testing of our software.

Like, [00:15:00] you know, a lot of people will write a test for a different component of software that's part of a service. The service has lots of different programs in it. Every time you write a program, you know, in one of the philosophies, called TDD, test-driven development, you put your test first, you write your code to validate the test, right?

It's sort of a methodology, you know, some people love it, some people hate it, right? But, you know, I've used examples and some of the people who are trying to understand how to use Deming's work in, in sort of knowledge work, we've measured sort of TDD in a, in a, you know, statistical process control, right?

So to see the voice of the process is like, this is what it is, we're looking for trends. Like, is it sort of increasing? Did something affect that? Right, this is classic statistical process control, right? Like, you know, why was it variant and now all of a sudden it started having a dotted line going straight up?

Let's go find out what happened. In a sense, that is one variant of, excuse the sort of overloading of the term, a systems [00:16:00] way to look at things, right? In other words, I'm not just looking at, you know, the library itself, its component measurements, and then how did it fit with this library.

I'm looking at how they look together in a process. And so, if I go back to those variables, right, in some ways, like, how often do you deploy is somewhat a layer of a system view of how fast you're delivering all the stuff that you're developing for this particular service, like an ATM processing machine.

Dr. Bill Bellows: Well, first I would say, going back to what Ackoff talked about, you know, he says the characteristic way of Western management is to take a system and divide it into parts. Okay. So that could be an automobile, [00:17:00] software, a rocket engine, an airplane, broken into parts.

Now, I think another theme, maybe, relevant here: to look inward at the airplane, at the software, and then look inward at the pieces within, Russ would call analysis. And maybe we talk about knowledge as the ability to look at how those pieces fit together, how the parts of an automobile fit together.

Synthesis is looking outward and asking, why does the driver sit on the left side versus the right side, at least, you know, in the United States? You know, why is the washing machine designed for a family of four? Because there's a lot of families of four. So, I mean, why is it designed that way? And so synthesis, the reason I want to bring this up is, Russ talks about knowledge as synonymous with analysis.

Analysis is looking inward. To [00:18:00] analyze is to look inward; to synthesize is to look outward at the containing system. To look inward, Russ also associates that with knowledge. You know, this is how these things work together. To look outward, Russ calls that understanding. So understanding is not about looking inward at how the parts of the car work together.

Understanding is looking outward, to say if you want to understand how the White House operates, you look outside the White House. You don't look inside the White House. That's the understanding piece. Well, the reason I bring up those terms is, to design the thing is to look inward; to understand the usage is to look outward.

So if you want to understand the features this system should have, that it should be designed for people who are left-handed or right-handed, then you have to look outward. But then when we talk about the system, there's no end to how big the [00:19:00] system is, because the system includes time. So we're not just looking at a software product used today, but are we thinking about the use of this product five years from now, 10 years from now, 20 years from now? Somebody may say, okay.

Yeah, we've designed it to last five years, and then we expect it's going to be... But the reason I want to throw those terms out is there's no limit to how far out you could look, because the system includes the future and you can go to infinity there. 

John Willis: Yeah, that's, you know, I mean, it's the temporal nature of design, right, which is a whole other complex conversation by itself.

Dr. Bill Bellows: Well, and the other thing I'd throw on that is, not Russ, Dr. Deming would say, the bigger the system, the more complicated to manage, but the more opportunities. And so when people talk about a system, I think it's important [00:20:00] that they get agreement as to what the size of the system is. But that's just an agreement that we're going to look at how the software is going to be used for the first year or the second year.

And by all accounts, the system could be bigger than that, but they have to agree to that. How big a system do we want to think about? 

John Willis: And in a sense, I, again, I'm like sort of his right, but like a process is a system, right? Like, in other words, if I'm looking. Well, again, I'll go back to those four metrics, right?

I think those are, based on what you described from an Ackoff perspective, the synthesis right there. One is, how often are you deploying? One is how fast are you deploying? Right? That's the sort of, that's you know looking from outside, you know, I guess, you know you know, understanding, like, in a certain sense, it's understanding, right?

And I guess then the question is, okay, then, yeah, like, if those are sort of components at this level of the system, then, like, [00:21:00] I guess the question I was going to earlier, like, where does, how do I do that Taguchi fitting? Like, if a synthesis or sort of an understanding of a system, let's say the DORA metrics, gives us a lens for trying to understand the system, right.

To tell me how often I'm doing it, how fast I'm doing it, how quick I can fix it when it breaks, which maybe tells me a little bit about, maybe I'm going too fast, maybe I'm not going fast enough. And then how effective am I in the aggregate, like, if I'm doing 100 changes an hour, am I failing 5 percent of the time, am I failing 10, 30 percent of the time, right?

I guess you know, at that level, then, like, how, if, if you could then apply Taguchi of like, oh, like, let's say those are things in the Taguchi world. Now, I'd wanted to figure out what the sort of the Taguchi's view of how would I like, you know, what's the analogous of that of, like, the plate and the sort of rod [00:22:00] in a plate or, or is that so?

Dr. Bill Bellows: think the

John Willis: And I, you know, this is hard. I mean, I don't think in software, 

Dr. Bill Bellows: I would say what we have to think about is the difference between managing the parts and managing the system. And so it could be, on the one hand, the elements of the software module and how well those play together. And then, you know, module one, module two, one is the tube, one is the hole.

The question comes down to is what is the potential for,

I guess the question becomes, how do we define how well those two modules come together? So, mechanically, the tube and the hole, it could be that we're joining them, we're welding them, and we look at the weld quality as a function of the [00:23:00] clearance between the two. So the weld quality is a function of the two and how they come together, and the coming together is the gap.

So I would say, relative to the modules, what is the performance metric of the two combined that we can look at, is one thing. And then the next thing is, what is the characteristic of each of them that we're looking at, that each have variation?

And what we're trying to do is look at how they fit. And the idea being, what Ackoff's talking about with his quote is, the characteristic way of management we have taught is to break it into parts and manage each part as well as possible. The traditional way of looking at things is to just say, this module is good. This part is good.

This component is good. And what "good" means is all the requirements are met. What's missing in that is that there's variation [00:24:00] in good. So one, you have to ask, what are these characteristics? And then the big thing is, it's moving from the model that fit is absolute. If this is good, which is an absolute, and if this is good, that's absolute, then they fit, which is absolute.

What Taguchi is talking about is there's degrees of good in each. And because there's degrees of good, then instead of saying they integrate, my question is, how well do they integrate? And what does that measure? 

John Willis: Right, so I think I've got, I'm sort of narrowing in maybe to hopefully be making sense here of my questions, right?

The, so I think everything you just talked about from a software delivery sounds like an analysis and knowledge base. Like, I could look inward at the delivery of a software component. Yeah, yeah. Oh, the dependence in that there's a lot of work that goes on there. But I think to me, what's more interesting is the industry level and maybe Taguchi doesn't apply.

But [00:25:00] one of the promises of these four metrics is that the quicker you get a feedback loop, right? We didn't invent this. This came from you guys. It came from Deming. The quicker we can get feedback, the better we're going to be, the quicker we can deal with, you know, as opposed to sort of waiting long windows to find out. You know, it's just sort of the Shewhart stuff, right?

If I'm doing the inspection at the end, if I can sort of move quicker to find out exactly what's going on, yeah. So this increased feedback loop, so the, so the promise of these four metrics is that, or hypothesis around them is that if I can deliver faster and I can decrease my lead time to deliver faster, if I can see these two metrics going in a certain direction, decreasing.

You know, like, I can go faster, right? I'm going faster. It's increasing on the deployment, it's decreasing on the lead time, the time it takes, right? So measurement of [00:26:00] speed, how quick I am. The promise is, the faster I go, or at some point, maybe there's a point where if I go any faster it doesn't matter, that I should actually decrease the amount of failures I have.

Right, the change failures, because I'm getting faster feedback, right, and that I should be able to resolve issues quicker, because of the tighter loop between knowing and delivering and not waiting. For example, the classic software thing is, I spend a year developing a product, I put it in, they call it waterfall, I put it in, and you're tracking bugs in code that was written a year ago, and it's like, it's going to take forever to fix these thousands of bugs. But if I'm breaking it up into a lot of small chunks, I know I'm going deep into software, but where I want to get to is there is a hypothesis built into these metrics that if I can get tighter feedback loops between delivering more often and breaking it up into smaller [00:27:00] pieces, I will reduce the amount of outage time that I have in terms of how many failures I get.

The promise is tighter feedback loops. I know what I'm doing better. I'm putting better code in place, and it will take me less time to resolve the problem because it's a smaller piece, and speed and effectiveness of the feedback loop. So, the question I guess I'm having is, at that level, if I was to use those sort of four values in a hypothesis-driven way, like, these are the things that should happen, is there a way to sort of use Taguchi to, you know, like, at that level, let's call those the components.

And they have, 

Dr. Bill Bellows: they aren't, 

John Willis: is it not really? 

Dr. Bill Bellows: Those are not components. Those are measures. Okay. Those are, as you call them, four key metrics. Those are metrics. Those are outputs. Those are measures of the performance of a [00:28:00] system, and we can apply those four, deployment frequency, lead time for changes,

mean time to recovery, we could apply that to rocket engines, airplane parts, but 

John Willis: doesn't the variation of those give us some clue? Because I mean. Because, I mean, even like, if I'm just measuring, I don't know if I sort of measure, like, the libraries to spec. You know, like, like, and I'm like, these were the requirements for this code.

I met the requirements. These are the requirements for this code. I met those requirements. In sort of the Ackoff version of it, that could fail miserably because the two had requirements that didn't really have an overarching requirement for where they were components, but it still is a measure of variation of some measurement of those components.

And I guess I'm trying to wonder, and I guess maybe I'm just failing me here, but 

Dr. Bill Bellows: What I'm looking at is those are metrics and they do have variation, but those are [00:29:00] measures. Those are outputs. What it has gotten me to think about is those are characteristics of how well something is performing, mean time between failure, deployment changes, but what I don't see in those is,

if I don't like those rates, what am I going to do? And so if I don't like the weld quality, then I go in and change the weld temperature, the weld time, something about the weld schedule I'll go change, and then try again, or I look at the clearance between the parts and I'll go change that. So what I'm wondering is, those are measures, those I could look at as measures of weld quality and how often I changed the process, but what I'm not getting out of those measures is, what are the characteristics of the inputs?

These are outputs. [00:30:00] So I would say, what would be the corresponding inputs? And so if we want to change the failure rate and have fewer failures, if it's welding, then I go in and change the gases, or I change flow rates, or I change pressures, or times, or current levels. And so what I'm wondering here is, in the world of software, what are the inputs that I'm going to go change if I don't like these outputs?

And also, when I look at failure rate, failure rate is synonymous with a requirement not being met, but it doesn't tell me, if I'm not meeting the requirement, how am I not meeting the requirement? So a bathroom tile may have a defect. Right. So we're [00:31:00] making bathroom tile. Even in welding, there could be a weld defect.

Is the weld defect a crack? Is it porosity? Is it what people call drop-through? So what I'm wondering is, what do we, 

John Willis: what 

Dr. Bill Bellows: is the characteristic that leads to? 

John Willis: Well, I mean, so I mean, that's one of the problems I have with these metrics. I'm just trying to figure out. Yeah, yeah, yeah. 

Dr. Bill Bellows: But the conversation is, what is failure?

I mean, 

John Willis: No, I agree. Like, again, that's why I said they're very much debated, MTTR, right? What is it? What average? I mean, it goes back to, like, another whole conversation if you don't have operational definitions, but let's assume you have them. Yeah. Proper operational definitions.

But again, and so help me here on this then, are we saying in general, if we look at it from how Taguchi integrates with sort of Ackoff's analysis and synthesis, that you really have to, you kind of, like, only apply, and I guess only is a strong word, but the application of sort of Taguchi methods really will only work [00:32:00] with analysis and not synthesis? Because to me, in my mind, those four DORA metrics are definitely synthesis, and you're right, they are.

Well, 

Dr. Bill Bellows: well, good, good point. I would say. You just gave a, a thought. So, so looking process control, control charts, that's analysis. So I'm seeing, I'm, I'm, I am looking inward and I'm seeing this diameter variation, that's analysis, right? Synthesis. And so I'm looking at, I'm using analysis to look inward, look at control charts, to look at the components by themselves, right?

And now synthesis is when I put them together and start wondering why, because I'm looking at one part in isolation, but when I look upward and see how that feeds something else, that is where Taguchi's work is coming in. The loss function aspect is not how good the [00:33:00] parts are, but how well do they integrate?

John Willis: Well, then couldn't you then again, couldn't you sort of like look at those sort of four metrics of how well they integrate? 

Dr. Bill Bellows: Yeah. Yes, I would say 

John Willis: I guess that's what I'm trying to, like, this 

Dr. Bill Bellows: is great john failure rate A lousy weld is a failure rate issue. 

John Willis: Yeah. And, and let's assume that we have the operational definitions clarity to the extent that you did, but 

Dr. Bill Bellows: what I'm suggesting is.

What I'm suggesting is that failure rate is not a monolithic thing. We have to get in and say, what do you mean by failure rate? So it's the software going out looking for a number coming back, and that time is taking too long. But I would say, in all likelihood, the failure rate could be 10 different things that constitute failure, and then we look at each of [00:34:00] them, and each of them has a characteristic, which is something is too long.

It's too short. I mean, hard to put together. 
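A minimal Python sketch of the point that failure rate is not a monolithic thing. The deployment records and the category names (`timeout`, `bad_config`, `schema_mismatch`) are hypothetical and are not named in the conversation; the only idea illustrated is that one overall rate can hide several distinct characteristics.

```python
from collections import Counter

# Hypothetical deployment records: each failed change is tagged with the
# characteristic that failed, instead of a bare pass/fail flag.
deployments = [
    {"id": 1, "failure": None},
    {"id": 2, "failure": "timeout"},
    {"id": 3, "failure": None},
    {"id": 4, "failure": "bad_config"},
    {"id": 5, "failure": "timeout"},
    {"id": 6, "failure": None},
    {"id": 7, "failure": None},
    {"id": 8, "failure": "schema_mismatch"},
]


def failure_breakdown(records):
    """Return the overall failure rate and the rate per failure category."""
    total = len(records)
    counts = Counter(r["failure"] for r in records if r["failure"] is not None)
    overall = sum(counts.values()) / total
    per_category = {name: n / total for name, n in counts.items()}
    return overall, per_category


overall, per_category = failure_breakdown(deployments)
print(f"overall failure rate: {overall:.3f}")
for name, rate in sorted(per_category.items()):
    print(f"  {name}: {rate:.3f}")
```

Each category can then be given its own operational definition and its own measurable characteristic, which a single aggregate number cannot provide.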

John Willis: We, I mean, we could, and again, we definitely don't do this, we're not even close, and this is why I want to have these conversations, but is, like, we could say that failure rate is a variance, you know, and again, like, where we get that, and then, like, okay, now, now we've sort of done a, you know, sort of at least an apples to apples, and then we get the operational definition pretty clear, we say a failure rate is, in our mind, let's just say a failure rate is a latency issue, and the variance is between, you know, three milliseconds and five milliseconds of, that's now, 

Dr. Bill Bellows: now, but, but then, but then we're, that becomes operational when we say, oh.

So the issue is it's, it's a timing issue. It's taking too long. Now we can act on it. Now at least that becomes a measure [00:35:00] we can turn. We can then say, is the variation in that timing rate, what is the ideal timing rate?

John Willis: Okay, and this is where I was trying to get to. So if, you know, again, we don't get close to this, but if we could get it down to clear operational definitions and variance within what we're looking at, within those DORA metrics.

And it would probably be slices of lots of, lots of them. But, but then we could, I think I'm leading you to that. We, I could then try to apply a Taguchi loss function. 

Dr. Bill Bellows: Yes. Once you get to just talk failure and not discriminate what we mean by that is, is, is, is, first of all, that, that, that's the big step.

Taguchi is kind of a neat story, but I can't 
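A rough sketch of what applying a Taguchi loss function to a timing characteristic might look like. It assumes the standard nominal-the-best quadratic loss, L(y) = k(y - m)^2, whose average over a set of measurements equals k times the variance plus the squared offset of the mean from the target. The target, the cost constant `K`, and the latency samples are made up for illustration.

```python
from statistics import mean, pvariance


def expected_loss(samples, target, k):
    """Average quadratic loss: k * (variance + (mean - target)^2)."""
    return k * (pvariance(samples) + (mean(samples) - target) ** 2)


# Hypothetical latency measurements in milliseconds.
latencies_ms = [3.0, 4.0, 5.0, 4.0]
TARGET_MS = 3.0  # assumed "ideal timing"
K = 2.0          # assumed cost per squared millisecond of deviation

loss = expected_loss(latencies_ms, TARGET_MS, K)

# Cross-check: averaging the per-sample losses gives the same number.
direct = mean([K * (y - TARGET_MS) ** 2 for y in latencies_ms])

print(loss, direct)
```

The decomposition shows two separate levers: moving the average toward the target, and reducing the variation around it.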

John Willis: even remember if it was Ackoff again, but clearly you have that problem in not just software. I can't imagine that. Yes. All the [00:36:00] industrialization of creating rockets and planes. People don't have clear operating, and this is why Deming was screaming as loud as he could about operational definitions, that there are probably a lot of people who are trying to apply these types of methodologies and don't have clear operational definitions in what they're measuring.

Dr. Bill Bellows: Yeah, and but again, I think it becomes important to say. Again, what do we mean by failure? And then you, then you, ideally you find out, well, there's, there's timing failure, there's this failure. Oh, tell me about that. And then, then each of them has its own loss function, meaning that not all of them have equal impact on how the customer is using this.

I mean, so take the two by four that I'm trying to put in. If, if it's a little, if that two by four is, you know, when I think about the two inch width, if it's a little wide, a little narrow, [00:37:00] that's not a big deal in construction. I mean, the, the width of it, the four inch piece becomes an issue, right?

Because I'm putting sheetrock on both sides, right? So that becomes an issue. The height becomes an issue, but that's a situation where a little wide, a little narrow on the two inch, no big deal, right? So what you end up with is each of those dimensions has its own potential sensitivity to how it's being used.

And so each of those is a characteristic. But that's, I think, when you dive in here is you start asking, what do we mean by failure? 
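The two by four example can be sketched as one loss function per characteristic. This assumes the usual quadratic Taguchi loss, k * (y - target)^2; the targets and the sensitivity constants `k` below are illustrative guesses, chosen only so that the two inch side is less sensitive than the four inch side and the height.

```python
def taguchi_loss(y, target, k):
    """Nominal-the-best quadratic loss: k * (y - target)^2."""
    return k * (y - target) ** 2


# Hypothetical characteristics of a stud; every number here is assumed.
characteristics = {
    "two_inch_side": {"target": 2.0, "k": 1.0},    # low sensitivity
    "four_inch_side": {"target": 4.0, "k": 50.0},  # sheetrock on both sides
    "height": {"target": 96.0, "k": 50.0},
}

DEVIATION = 0.1  # same deviation (inches) applied to every characteristic

losses = {
    name: taguchi_loss(c["target"] + DEVIATION, c["target"], c["k"])
    for name, c in characteristics.items()
}

for name, loss in losses.items():
    print(f"{name}: {loss:.3f}")
```

The same deviation produces very different losses, which is the sense in which each dimension has its own sensitivity to how it is being used.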

John Willis: Right, right. 

Dr. Bill Bellows: But I think that becomes the big breakthrough, is realizing that there's different ways to fail. And then the question becomes

You know, ideally I can improve each, but also the appreciation that improving one may make another one worse. 

John Willis: Well, you got that too, right? But I guess that goes back to like my favorite sort of quote that you pointed out to me about, [00:38:00] you know, that the Taguchi's and I always forget the long version of it, but in the end, it's the economic value to this.

And so when we're talking about what the failure is in the Taguchi sort of definition, it's that failure that causes an economic loss.

Dr. Bill Bellows: Well, and the idea there being, the software may not, this thing we're delivering may not constitute a failure, but what Taguchi is talking about is the fact that those characteristics have variation, which could create an impact on the user somehow, and then you find out that somewhere downstream, there are timing differences that they're trying.

There are some differences that we don't call failure, but yet it's a question of, is there a [00:39:00] synchronicity issue that is costing us? So you take three Space Shuttle main engines on the Space Shuttle, and they can all perform to within requirements, but if one engine is performing with a higher level of thrust than the other, then that could cause the vehicle to steer one way or the other way.

So the engine to the engine, it doesn't matter, but when you take three engines at random, how they mesh together could create a headache for the vehicle and how they integrate into the vehicle in terms of if one is pushing much harder than the other, then it steers the vehicle in one direction. And then the question is, what does that, does that make the astronauts' life miserable?
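A small sketch of the engine example: every part passes its own requirement, yet the set can still integrate badly. The spec limits and thrust values are invented for illustration (expressed as percent of nominal thrust), and `imbalance` is a hypothetical stand-in for whatever system-level effect the mismatch causes.

```python
SPEC_LOW, SPEC_HIGH = 98.0, 102.0  # assumed limits, percent of nominal thrust


def within_spec(thrust):
    """Part-level check: does this one engine meet its requirement?"""
    return SPEC_LOW <= thrust <= SPEC_HIGH


def imbalance(thrusts):
    """System-level check: spread between strongest and weakest engine."""
    return max(thrusts) - min(thrusts)


# Two hypothetical sets of three engines, all individually acceptable.
matched = [100.1, 99.9, 100.0]
mismatched = [98.2, 101.8, 100.0]

for label, engines in (("matched", matched), ("mismatched", mismatched)):
    ok = all(within_spec(t) for t in engines)
    print(f"{label}: all within spec={ok}, imbalance={imbalance(engines):.1f}")
```

Both sets look identical to a pass/fail inspection of the parts; only the system-level measure distinguishes them.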

John Willis: Well, you know, I can go right to the sort of like, you know, like algorithmic trading, right? Like, or just in software. You know, a one millisecond defect in a web page getting to a consumer that's looking up [00:40:00] the best sneaker to buy is completely different than a one millisecond latency in black box algo trading, which could mean billions of dollars, right?

Yeah. 

Dr. Bill Bellows: And that's, and that becomes the economics is, does anybody notice that little float? And could we say we don't, it doesn't constitute a failure. But then if you look at the characteristic and you look at that variation, does that variation impact the next person, the person after that, the person into that?

And that's the idea that it may not impact you, but it may impact the person after you. And this is where Dr. Deming says, the bigger the system, the more complicated, but the more opportunities. When I began to realize that this, to you it looks good, but to the person after you, it's a nightmare, and that, you know, that becomes a challenge when we look at the system.

So is the system the module? Is the system how you use [00:41:00] it? Is the system how the person after you uses it?