
Profound
Ramblings about W. Edwards Deming in the digital transformation era. The general idea of the podcast is derived from Dr. Deming's seminal work, the System of Profound Knowledge (SoPK), described in his book The New Economics. We'll try to get a mix of interviews from IT, healthcare, and manufacturing, with the goal of aligning these ideas with digital transformation possibilities. Everything related to Dr. Deming's ideas is on the table (e.g., Goldratt, C.I. Lewis, Ohno, Shingo, Lean, Agile, and DevOps).
S5 E3 - Joseph Enochs – DeepSeek, Emergent Behavior, and the Future of Intelligence
In this episode, I talk with returning guest Joseph Enochs about the artificial intelligence (AI) world and its implications for businesses and innovation. A major highlight of the conversation is an analysis of DeepSeek, an open-source AI model developed by a Chinese company. Joseph explains how DeepSeek and similar models demonstrate that AI development is becoming increasingly accessible globally. With only a fraction of the computing resources used by giants like OpenAI and Meta, DeepSeek has replicated the performance of cutting-edge models like GPT-4. This, Joseph notes, is a clear example of how creativity and resourcefulness can overcome technological constraints, further accelerating the democratization of AI.
The conversation also dives into emergent behaviors, where AI models demonstrate the ability to reason about new and unseen data, similar to human problem-solving. Joseph discusses critical benchmarks like GPQA (Google-Proof Question Answering) and the ARC Prize, which measure these capabilities. He highlights how modern models use reinforcement learning to develop reasoning skills, making them capable of tackling complex tasks at an unprecedented level of sophistication.
We also touch on practical business considerations, such as how organizations can evaluate AI models for cost-efficiency and task-specific performance. Joseph advises leaders to use AI-driven frameworks to determine when to invest in high-cost, high-performance models like GPT-4 Omni versus smaller, fine-tuned models for less complex problems. He underscores that open-source innovations will continue to push costs down and improve accessibility for businesses of all sizes.
The discussion wraps up with a reflection on the importance of knowledge sharing, applied research, and collaborative learning to accelerate the adoption of AI in solving real-world problems.
John Willis: [00:00:00] Hey, this is John Willis. Another really fun podcast with a regular now, I will say, my good friend Joseph. Hey Joseph, what's going on?
Joseph Enochs: Hey, John, thank you for having me. Really excited to be here again, and I'm excited to hear that I'm a regular now. So I'm very, very thankful for that, and looking forward to a great, you know, 2025.
Let's go.
John Willis: Yeah, I've gotten some really good feedback on some of your other podcasts. I mean, you know, it's always sort of interesting. I love them all, but every once in a while I'll do one and I'll think, okay, that might have been too tactical. And then I'll get a whole bunch of really positive feedback.
So I've gotten really good positive feedback on some of our prior podcasts that we've done. And to those who listen, Joseph has been an incredible mentor to me. You know, to the extent that I know anything about this stuff, I will attribute a lot of it to Joseph allowing me to ask him dumb questions. So, I really appreciate that.
Joseph Enochs: And as you know, for over a decade, John, you've been a big mentor, going [00:01:00] way back to old school DevOps days. So it's exciting to me to even hear that. I get goosebumps just thinking about it. Really appreciate the kind words, John.
John Willis: All right, so now we'll stop this love fest. But no, one of the things I've been thinking a lot about, and we've talked a little bit about this, right, is within the last, you know, couple of months.
I mean, like, all right, this whole thing has been insane, right? We hear about GPT-3, some of us are paying attention. Then we get to 3.5 and it's a little more interesting. ChatGPT makes things crazy interesting, for some good reasons and some bad reasons. And then you get GPT-4, then you get GPT-4 Omni, and like, okay, this is all good. But then it seems like there's been another avalanche of stuff starting around November, you know, because then you got OpenAI with o1, you got o3, and there's been an avalanche of, it seems like, open [00:02:00] source models that are catching up.
So what I'd like to have a dialogue with you about is a couple of things. The ultimate goal would be how people figure this out, because some of these models now have economic impact, right? Like, before, it got cheap, right? Even GPT-4 Omni, like, whatever you were paying for ChatGPT, you basically just got it free.
Now the game has changed, right? And, you know, from my own experience, there are things I'd like to experiment with, like, okay, do I really want to go pay $2,000 or $500 to do this experimentation? And I've got to believe, and I know from talking to some executives, they need help understanding the landscape of the differences of the things that are here now.
And we're going to keep seeing more sort of horizontalization, sorry y'all, of this. So if you could sort of walk us through, like, what are the differences, you know, going from GPT-4o to o3, some of that's kind of [00:03:00] public, but, like, sort of give us a little more behind the curtain. And then let's dive into, like, how are people going to make decisions?
And then throw in some of these new, you know, open projects, including the, what's the, the 19 hours, $450 story out of Berkeley. Right. So that's a lot to throw on you, but, like, you're up.
Joseph Enochs: Well, let's jump into it. Let's jump into it. So I think, to your point, maybe I'll try to throw some numbers at it a little bit.
And maybe we can start with one of the most difficult benchmarks out there. There are a lot of different benchmarks you can look up, Elo ratings, they have the arena where these models can be tested side by side in the leaderboards. You've seen a lot of those.
One of the most challenging benchmarks, if you will, for these models is what they call GPQA. I'm not sure if you've [00:04:00] seen or your audience has seen GPQA, but it's what they call Google-proof question answering. So what does that mean? Well, that means that even with Google, when you try to search a question out, right,
a regular person without domain expertise could not answer these questions. So it's one of the most challenging benchmarks, you know, that there is. You would need to be, let's say, a PhD-level person in a specific field. Not just a PhD-level person in general.
A PhD-level person in general gets about a 35 percent score on these tests. A PhD specialist, in their own area, gets around 80 percent on these questions, right? They're not even at 100 percent.
John Willis: So let me start with this. Are these questions, or is it just, like, if [00:05:00] I don't know anything about a subject, is it sort of a benchmark of how well it can answer a question in a field that I know nothing about?
Or is it purely how well it does on a benchmark set of questions and test answers?
Joseph Enochs: It's two pieces. The one part is, what are we searching for? Well, we're searching for what we call out-of-distribution answers and responses from an LLM. And we talked about this before. This is about emergent behaviors.
So when we look at a model, we look at the amount of data that went into the model. That's the distribution of data. When a model can answer questions about what's inside of its training data, that's within distribution. When a model can answer questions on data that it hasn't necessarily been trained on,
that's what we call out-of-distribution [00:06:00] responses. And it's an emergent behavior, right? Which happens in nature. And I think we've spoken about this a lot, but if you look at biology, one of the exciting things that I like to research is the biology of, like, an octopus, for instance, right?
And we've spoken about this before. Well, in its tentacles, it's got a lot of neurons, and each individual tentacle has connections to the others, it's like a mesh. And then at the top, the main brain, if you will, is connected to all of those other legs.
At some point, these neurons have emergent behaviors, meaning it hasn't necessarily seen this, but it can react to it, right, outside of its data. Well, measurements of that inside of these models, that's what we're looking for. We're trying to find emergent behaviors inside of these models. So by giving it a simple benchmark on something that it's been trained on, like an Elo [00:07:00] rating or things of that nature, we're testing for in distribution.
But with these new models, right, like you mentioned, like the o1 and the o3, we're starting to see out-of-distribution responses, potentially meaning that we have emergent behaviors. And this is kind of the beginning of artificial general intelligence. One of those benchmarks is GPQA, and another one of those benchmarks is the ARC Prize.
And both of those benchmarks are meant to try to look at human equivalence in these tests and figure out how well, or where, AI currently is on these benchmarks. The ARC Prize is relatively easy for people, but it's very difficult for learning systems in AI. Whereas GPQA is near impossible for regular people unless you're an expert in a [00:08:00] specific field.
Even those people have to really work at these difficult problems. So I would say two benchmarks: one trying to test for these emergent behaviors, and the second one trying to see human equivalence on various facets. Like, GPQA is, again, these PhD-level questions that you and I would have difficulty even finding the answer for, right?
Even if we had Google right in front of us. And then the other one, like we said, is about figuring out, you know, simpler tasks, as far as in distribution and out of distribution. Okay. So maybe just to resonate with you a little bit more on that: if we look at GPT-4 on GPQA, right, just the base GPT-4,
it had like a 35 percent on GPQA. So it was about as good as a generalist, like [00:09:00] a PhD person that was not in their field. They could answer 30 to 40 percent. Okay. Then when we went up to GPT-4o, right, it went to about 50 percent. So it improved on these GPQA sort of problems.
Now with o1 and then o3, you're getting 78 to 80 percent on o1, and 88, over 90 percent, on o3. So this thing, on this test, is even better than the experts now, right? The o3 model is even better than the experts on these GPQA tests. And then looking at...
John Willis: It used to be, just because the, sort of, the meat and potatoes of this is basically benchmarking,
that my original question wasn't framed properly. That means that the emergent behavior of the answers is [00:10:00] the model figuring things out, like an intelligent specialist would do within his or her field.
Joseph Enochs: Yeah, the exciting thing about that is it's reasoning about the question and saying, hey, how can I look at this question?
How can I break it out into smaller pieces? And then as it goes through those pieces, it can actually stop itself and go, wait a minute, I think I need to think about this step again, and reevaluate itself. And that reevaluation piece is kind of like what we do, what humans do, right? We'll step back and go, wait a minute, I need to take a step back and think about this. We can see that in the reasoning steps.
The R1 model that was just recently released from DeepSeek, they open sourced this. For the o1 and o3 models, right, they haven't released all of the testing components. But with DeepSeek, there was a, you know, an aha moment, where they saw, in the [00:11:00] reasoning about a complex math problem,
it looked at, hey, maybe I should square this, and then maybe I should square it again. Well, wait a minute, let me take a step back. Right. And when they saw that, they were like, wait a minute, this is how we reason. Right. And this is where we're starting to see the entrance of these emergent behaviors and potentially out-of-distribution responses, again, meaning that...
John Willis: So DeepSeek is the one, like, DeepSeek is a mirror of what maybe o1 or o3 are doing, because we're getting to see the transparency of it.
But can you tell everybody what DeepSeek is, exactly?
Joseph Enochs: Yeah, so DeepSeek is a model that was made by a Chinese company, right? And they've basically open sourced this model, along with other models as well. So we've got, again, o1 and o3, which are these reasoning models released by OpenAI.
And then we [00:12:00] have other organizations, right now primarily outside of the United States, like Tencent, which is a Chinese organization, doing the same sort of testing and validation and getting the same results as an OpenAI o1 or o3 model, but in the open source community. So DeepSeek is a model released by a Chinese company.
John Willis: Okay, cool. All right. There are so many ways I can go here, because I want to keep on this thread, but does that sort of mean, like, you know, there are no moats?
Joseph Enochs: There absolutely is no moat. And I'm glad that you keyed on that, right? Because, you know, the sanctions on these different countries, right?
They're only giving them access to so many GPUs. And so, [00:13:00] you know, there's historical precedent for this, right? Once countries or organizations are constrained, they have to start being more creative. And that's what we saw with DeepSeek. They didn't have as many GPUs as the Metas and OpenAIs of the world.
They only had a few thousand of them, but they were able to, you know, fast follow and replicate the findings from the o1 and o3. So, to your point, the intellectual property laws, you know, are kind of proving that those things can be overcome; multiple companies can do this.
The replication of these models, right? You know, the knowledge is out there. The secret sauce, so to speak, of AI is becoming more widely understood. So to your point, unlike a traditional industry, intelligence is becoming widely available, [00:14:00] which is exciting, because it's not going to be exclusive.
What we're seeing is that it's not going to be exclusive to a few large entities. It will actually be something that can be, you know, democratized over time.
John Willis: So, yeah, open source actually will win, maybe, right? So going back to, like, okay, so then the o1, o3, DeepSeek, and we can talk about some of the others out there, are reasoning engines. You know, we've had these discussions with you about sort of single-shot, few-shot, chain of thought. Like, how does this all, what are these reasoning engines?
Why are they different? And why, on certain things, are the results so phenomenally different? And what's sort of inherently built in? Like, maybe one more layer, not two layers, deeper: are they doing something at the attention mechanism layer or, you [00:15:00] know...
Joseph Enochs: Well, primarily what they're doing, from the papers that just released in the last, you know, week, is they're leveraging reinforcement learning.
Right. So we were postulating, before the R1 model came out and the Kimi models came out, we were kind of theorizing on how they were getting reasoning in question answers, and there were some papers that released about Strawberry, the Strawberry models. And in that scenario, what
they were talking about is that there are various reasoning steps that a person would go through. And in that paper, there were 95 reasoning steps. Like, use an analogy, right? Use it in this sort of, you know, this sort of word problem. All these 95 different traditional ways that people reason.
And they would ask [00:16:00] the model, the regular model, once it was benchmarked, right, on what it could do zero-shot, they would then ask the same exact model, utilizing these reasoning steps, 95 of them, and some algorithms, they didn't want it to brute force it, to use these reasoning steps to determine the correct answer. And so they would take, like what you mentioned, chain of thought, which is like, hey, think about this problem. Think about the steps.
Break those steps down. If a step's too complicated, break that step down further, right? They give it some prompting, and then they give it these reasoning steps, and the model will then try to look through these reasoning steps to see if it can figure out the answer. Well, we as researchers are keeping track of those questions and answers and responses and chain of thought, and over time, the models do solve the problems.
And the beauty of that is, they figure out the type of problem that they were given, right, what their original capability was zero-shot, [00:17:00] and then the steps that they took, the reasoning path that they took, to solve that next-level problem. And we keep track of that, and we train it into the model, so that it knows, on these types of problems, these are the types of reasoning steps that are more appropriate for that task.
And then we benchmark the difference between what it was zero-shot and what it was after we do the test-time compute, and then ultimately the test-time training piece of these reasoning models. So we were thinking, for a while before these papers released, that this was the way to initialize the model and, you know, start up the model for reasoning.
However, what we're actually finding out is, it's kind of like "Attention Is All You Need." Those steps that we were talking about, reinforcement learning on top of just the traditional models, is now becoming like, hey, we don't need to initialize this thing. Just start hitting it with reinforcement learning and it will reason about these [00:18:00] problems until it can figure it out.
And both the DeepSeek paper and the Kimi 1.5 paper that came out give insights into how this was done. Specifically, the Kimi 1.5 paper breaks down the reinforcement learning steps that they used to sort of replicate the capabilities of these reasoning models.
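To make that pattern concrete, here is a minimal sketch of the zero-shot-versus-guided-reasoning loop Joseph describes, logging the reasoning trace so successful paths can later be trained back into the model. The `llm_complete` function and the example strategies are hypothetical placeholders, not any specific vendor's API or the exact procedure from the papers.

```python
# Sketch: benchmark zero-shot, then re-ask with explicit reasoning strategies,
# keeping the trace so good reasoning paths can be trained into the model later.

REASONING_STRATEGIES = [
    "Use an analogy to a simpler problem you already know how to solve.",
    "Break the problem into smaller steps; if a step is too complicated, break it down further.",
    # ...the paper Joseph mentions catalogs roughly 95 of these
]

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for a real completion API; returns a canned reply."""
    return "Step 1: restate the problem. Step 2: check the step. Answer: 42"

def zero_shot(question: str) -> str:
    # Baseline capability: the question alone, no reasoning scaffold.
    return llm_complete(question)

def with_reasoning(question: str, strategy: str) -> dict:
    prompt = (
        f"Strategy: {strategy}\n"
        "Think step by step and check each step before moving on.\n"
        f"Question: {question}\nReasoning:"
    )
    trace = llm_complete(prompt)
    # Record (question, strategy, trace) as candidate fine-tuning data.
    return {"question": question, "strategy": strategy, "trace": trace}
```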
John Willis: And just to sort of bring that all the way back, the assumption is, and it's probably reasonably accurate, that through their own experimentation, these open projects are pretty much honing in on what
has to be what, you know, OpenAI is doing for o1 and o3, and then that also literally sort of doubles down on why there's no moat, right?
Joseph Enochs: They released the benchmarks, right? And they released the process that they used, and their benchmarks are very, very close, if not at parity, to the o1 and o3.
John Willis: So it almost [00:19:00] has to be, based on the results, right?
Joseph Enochs: Well, it's so simple, and I say simple, right, it's a relative term, but it's so elegant, kind of like "Attention Is All You Need," right? They're saying, utilizing this method, we don't require a bunch of gymnastics in order to do this.
So this is sort of the path of least resistance to get the results. So we think this is going to stick around for a while, right? Anytime a solution is elegant, and the steps are easy and things are easy to replicate, those sorts of things have staying power.
John Willis: And I guess at that point it doesn't matter, to your point, right?
If it's the same results, it's elegant, it's basically easy in that sense of AI, then does it matter if they're doing it a different way? Right?
Joseph Enochs: Yeah, exactly. Anytime, you know, these papers come out and they achieve a result and it's very complex and challenging to replicate, those things are usually replaced by [00:20:00] the simpler and more elegant things.
Just exactly like the, you know, attention mechanisms.
John Willis: Got it. And my fear is I'll go deeper than I want to, but you had mentioned that what they're finding is, like, the sort of basic attention mechanisms are working, you don't really have to modify that algorithmically, because the reinforcement learning sort of augments that, and sort of creates this, what you talked about, emergent behavior on its own.
Does that make sense?
Joseph Enochs: It does. I think there are still opportunities for advancement with attention mechanisms. You probably saw the Titans paper that was recently released. The Titans paper makes some modifications to your traditional attention mechanisms and transformer architectures.
I have not studied it 100 percent, right? It's just another paper out in the last [00:21:00] few weeks, but that's from Google Research. And many others, the RWKV and Mamba models, they've challenged the attention mechanism. So there's still investigation for attention and transformers on how to improve upon them. But to your point, in this scenario with R1 and Kimi, they're leveraging sort of traditional mechanisms with reinforcement learning to get similar results as the o1 and o3 models.
John Willis: Okay. And, you know, these conversations are always reasonably dynamic, because I still think there's more to unravel in the original question, but that's another thing I'd sort of been meaning to ask you, because there are a lot of conversations about,
like, you know, transformers are dead, or the transformer model is, you know... and I guess part [00:22:00] of that is sort of true, but, you know, right? Or...
Joseph Enochs: I think "dead" is, you know, I mean, relative. That's the way somebody who's trying to make a point would say it.
John Willis: I didn't say it that way, but there are some bloggers and stuff that will sort of propose that,
or some variant of that.
Joseph Enochs: You know, I would say that these are all tools in your toolbox, right? And depending on the task and the job, you want to use a specific tool. Attention mechanisms are very good for adapters. Even dense neural networks are very good for various things. If you're looking at gating networks between, you know, expert models,
if you're looking at, again, testing the layers between these models, all of these things are traditional machine learning concepts that have been around for a long time. And so I would say "dead" is relative, right? Because the underpinnings of these models [00:23:00] are all based on very similar constructs from applied machine learning.
They're just doing them in elegant ways. So, you know, RoPE attention versus masked attention versus these other attention mechanisms, ring attention, they all have pros and cons, and we're all learning together. So I think it really depends, again, on what tools are in your toolbox for your task. And, thinking back to your original question, which one's most efficient?
Right? A GPT-4-level capable model that's been distilled down to eight billion parameters or three billion parameters, that can do the work of GPT-4o, may be just fine, because I may be able to run that on an inexpensive GPU in my data center, and it can do a single task just fine, right? And maybe it costs me fractions of a penny to do that.
If I take that same problem and I hit it with an o1 or an o3-level model, it may be exponentially [00:24:00] more costly to run that model. As an example, when they were building out o3 for the ARC Prize, the results were great, right? Ninety-one percent results
on the test, but it cost something like $375,000, you know, $350,000 to $375,000 worth of cost for it to take a test. And, let's say, you know, a five-year-old could do most of these ARC Prize tests, right? Humans are really good at these sorts of ARC Prize visual tests, but the AI, $350,000 to do it.
So do I want to do it?
John Willis: You mean train, though, right? The $375,000 is training, not inference?
Joseph Enochs: No, it was the inferencing, for it to do the ARC Prize test, right? So the ARC Prize is a series of challenges, visual challenges, and they [00:25:00] tokenize them and pass them in. And it was trained on the questions and answers, but not the holdback, you know, the holdout set.
So after that, when they went to actually run it and have it inferencing, they gave it a few settings. They said, hey, here's the maximum number of iterations you can do. They called that high efficiency, and then they had another setting that was low efficiency, which is just like, go until you can solve this problem, right? Okay.
And so in the high efficiency, they put, you know, a certain number of tasks and a certain number of chances that it had to basically figure out the problem. And in that scenario, it maybe cost them $20 a task to do it on the high efficiency. On the low efficiency, this thing ran for,
you know, 10 to 12 hours, crunching on the tests on a huge [00:26:00] cluster. And they estimated that the token cost for the 400 tasks on low efficiency, they ended up having a cost of something like $3,000 a task to run that ARC Prize test.
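For a back-of-the-envelope feel for where numbers like that come from, total inference cost is just tasks times tokens times unit price. The token counts and per-million-token price below are invented placeholders, chosen only so the outputs land near the figures mentioned in the conversation; they are not OpenAI's actual rates.

```python
# Back-of-the-envelope estimate of benchmark inference cost.
# All token counts and prices below are illustrative placeholders.

def run_cost(tasks: int, tokens_per_task: int, usd_per_million_tokens: float) -> float:
    """Total cost = tasks x tokens per task x price per token."""
    return tasks * tokens_per_task * usd_per_million_tokens / 1_000_000

# "High efficiency": an iteration cap keeps the token budget per task modest.
high = run_cost(tasks=400, tokens_per_task=250_000, usd_per_million_tokens=60.0)

# "Low efficiency": the model searches until it solves the task.
low = run_cost(tasks=400, tokens_per_task=15_000_000, usd_per_million_tokens=60.0)

print(f"high efficiency: ~${high:,.0f}")  # ~$6,000
print(f"low efficiency:  ~${low:,.0f}")   # ~$360,000
```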
John Willis: So then, so it is inference. But, like, today, I'm assuming you're talking about the o3 in that example, right?
But today, if I wanted to run that ARC Prize, 400, whatever, questions you're saying, it would literally cost me $370,000? Like, I mean, let's say I want to run that thing, it would cost me that much money, or has that now been baked into the model itself?
Joseph Enochs: Not yet, right? If you wanted to replicate that test, the ARC Prize test, using o3, you know, if you had access to it, it would cost you a lot of money, right?
That is to say, if you wanted to get a 91, I believe the result was like a 91.5 [00:27:00] on the test. Even on the high efficiency, using o3 on high efficiency, which got around an 82 percent, it costs around $6,000 to $7,000 just to run that inference on the high efficiency one.
John Willis: Now, here's the question then, right? Like, you know, Reuven Cohen wrote something that was really interesting about, like, o3: it might cost you $1,500 to solve this particular academic question, but depending on, like, if you really needed that answer, it was going to be far cheaper than the non-AI version of it, right?
So the question then becomes, and I'm fascinated by the question I had asked in the beginning: how are leaders going to figure out when to use what? It's great for all these, like, the five-year-old questions or the, you know, PhD-level, you know, astrophysics [00:28:00] questions, right? But the real meat on the bone is,
I'm an executive at a candy bar company, or I'm an executive at, you know, an airline. Well, airlines, you know, they do some really fascinating stuff, which probably will get better. But a lot of the sort of traditional things that come up, you know, you mentioned our history with DevOps, like the things that we know we're trying to do that make businesses profitable.
Like, you know, how are we going to be able to figure out what are those questions that are worth spending a hundred grand on versus the ones that aren't? Where are we going to get guidance on that? Because if all our guidance today is, and I'm being really generic here, but if the benchmarks are five-year-olds or PhD-level astrophysics questions, how do we get to, for some of the customers you and I have [00:29:00] visited, helping them understand when they should be using what?
Like, that task, I'm reasonably certain I could do a lot with GPT-4o. That one probably is worth investing, you know, investing $100,000, because if it makes me, you know, $5 million in revenue in one year, it was worth it. I don't know, like, again, these are...
Joseph Enochs: ROI and TCO of let's say running an API from you know, open AI and paying 2, 000 a month to get access to this model versus you Just using the GPT 4.
0 to accomplish it versus fine tuning a model and having a, you know, an 8 billion parameter model that's fine tuned on this, this one problem. I think it's just a cycle in time. Where are we at in time to your point? If if it's a very challenging [00:30:00] problem, then we want to use the best tool for it. Right?
But how are we doing that now? Well, in our agent frameworks, in many of the agent frameworks, we give agency to the planning model to determine which tools to use, right? And part of that tool selection can be cost, right? And so, when you're prompting your planning agent, you say, hey, there is a cost, you know, I want to be very efficient with this particular question.
So give me access to a model with the highest efficiency to solve this problem. So that may be baked into a default prompt. And then, based on that, if the model has access to multiple tools, a smaller, you know, 4o-capable model, or an o1 type of model, or an o3 type of model, I think by default we can allow them to choose that.
Now, what we could [00:31:00] also do in training is create a dense layer in between that main agent and those models that's trained to understand: math problems of this level of complexity go to this tool, math problems of that level of complexity go to that tool, physics problems of this level of complexity go to this tool. And we can fine-tune that gating network so it will choose the appropriate expert based on the level of complexity and the level of efficiency that we have.
So I think we'll rely on AI to assist us with those things, and then we'll train models in order to improve that selection criteria. It's not going to be a trivial task. But the exciting part about it is that the knowledge is being distilled into these models,
the costs are going down rapidly, and I think there may be some strategy here right now with the OpenAIs of the world, where they're like, you know what, I don't really have enough [00:32:00] compute to run o3 full bore. So what am I going to do? Well, maybe I raise the price to $2,000 a month, and just by the fact that I've raised this to $2,000 a month, there's only going to be a small number of people that are going to pay that to have access to it.
So, again, you get the shiny new object, but I do anticipate that levelers, like these open source papers that are coming out, will assist in reducing those costs over time. If you're planning on using those models now, you definitely do have to do some ROI
calculations, and you should bake that into your prompts. If you have a planning agent that's determining which model to choose, you should bake that into your prompting strategy: hey, this is a very efficient model for this; this model can answer complex questions,
it's not as efficient, right? And give the agent frameworks the ability to choose the appropriate model based on that.
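To picture what that cost-aware routing might look like, here is a minimal sketch of a gating step that sends each task to the cheapest tier judged capable enough. The tier names, prices, and keyword-based complexity scorer are invented placeholders; in practice the scorer would be the fine-tuned gating network Joseph describes, not a hard-coded heuristic.

```python
# Sketch: route each task to the cheapest model tier judged capable enough.
# Model tiers, prices, and the complexity heuristic are illustrative only.
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    usd_per_million_tokens: float
    max_complexity: float  # highest task complexity this tier handles well

TIERS = [  # ordered cheapest first
    ModelTier("fine-tuned-8b", 0.10, max_complexity=0.3),
    ModelTier("gpt-4o-class", 5.00, max_complexity=0.7),
    ModelTier("o1/o3-class-reasoner", 60.00, max_complexity=1.0),
]

def estimate_complexity(task: str) -> float:
    """Placeholder for a trained gating network; returns a 0.0-1.0 score."""
    hard_markers = ("prove", "multi-step", "physics", "optimize")
    return min(1.0, 0.2 + 0.2 * sum(m in task.lower() for m in hard_markers))

def route(task: str) -> ModelTier:
    score = estimate_complexity(task)
    # Cheapest tier whose capability ceiling covers the task wins.
    for tier in TIERS:
        if score <= tier.max_complexity:
            return tier
    return TIERS[-1]

print(route("Summarize this meeting transcript").name)   # fine-tuned-8b
print(route("Prove this multi-step physics bound").name)  # o1/o3-class-reasoner
```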
John Willis: And I [00:33:00] guess that's the thing. Like, you're right. I mean, at the end of the day, an organization that's got a good handle on this, you know, not the sort of technical debt tsunami version from the paper we wrote,
it would say, oh yeah, for this R&D group that's sort of spearheading this, $2,000 a month is nothing to experiment with; you're a multi-billion-dollar, you know, market-cap company. The fear would be the in-between, where it's, you know, the sort of technical debt tsunami: go out, every business unit, figure this out, use the tools that you want.
And it's a perfect storm, worst-case scenario, where you've just told everybody to go forth, and now you're in, like, the horrific cloud version of, oh my God,
our cloud costs are insane. Because everybody's feeling their group is so [00:34:00] important, I'm going to use the $2,000 one, and now you've got a large, you know, 70,000 or 80,000-person organization where, you know, 30,000 people are using the o3. I mean, again, I'm trying to... I think you've answered all of this.
Joseph Enochs: I'm giving the happy path, in my opinion. Yeah, I hear you. But believe me, I think more folks right now are going to be, I don't know if this is the right term, but stubbing their toe on figuring out how to gain efficiencies, on which model to select for which task, and on putting guardrails on how many iterations these models can do.
I was just doing a meetup in Bentonville last week, and one of the participants in the meetup in northwest Arkansas said, we're trying to do a migration from one version of Java to the next version of Java, just based on, you know, vulnerabilities and things of that nature. And these [00:35:00] agents keep getting stuck in loops.
Like, how do we protect against this? And this is an expensive model. And during that conversation, I tried to give them some intuition on, hey, if it's a task that's going to have to iterate multiple times, this is one where you potentially try a smaller model for that specific task. But your planning agent, that may be your intelligent agent that's more costly,
and it can build the plan. In that scenario, the brain model, the smart model, is building the plan, and then it's issuing out individual tasks to the smaller models that are more inexpensive. So in that scenario, even if it does loop on itself multiple times to solve a problem, it's a small model.
And again, if you have local GPUs, it can run for however long you want it to run for, and it's not going to break the bank. Now, if you're in the API scenario, we're going to have ballooning costs. It's coming, [00:36:00] it's going to come.
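A minimal sketch of that planner/worker split, with an iteration guardrail so a stuck loop can't run up the bill, might look like the following. The `call_model` function, model names, success marker, and budget are hypothetical placeholders, not any particular agent framework's API.

```python
# Sketch: one call to the costly planner model builds the plan; cheap worker
# models execute each step under an iteration cap instead of looping forever.

MAX_ITERATIONS_PER_STEP = 5  # guardrail: arbitrary illustrative budget

def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real inference client; canned replies."""
    if model == "planner-large":
        return "update build files\nfix deprecated APIs\nrun test suite"
    return "DONE: " + prompt

def run_migration(task: str) -> list[str]:
    # Expensive "brain" model is called once to produce the step-by-step plan.
    plan = call_model("planner-large", f"Break this into small steps: {task}")
    results = []
    for step in plan.splitlines():
        # Cheap worker retries each step, but only up to the iteration budget.
        for _attempt in range(MAX_ITERATIONS_PER_STEP):
            out = call_model("worker-small", f"Do this step, say DONE when finished: {step}")
            if "DONE" in out:  # illustrative success marker
                results.append(out)
                break
        else:
            # Budget exhausted: flag for review instead of burning more tokens.
            results.append(f"NEEDS REVIEW: {step}")
    return results

print(run_migration("Migrate the service from Java 8 to Java 17"))
```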
John Willis: Would you argue, just, you know, based on the sort of literature, that if you weren't prepared to set up your own infrastructure, o1 is designed as the sort of lower, more efficient version that maybe you can use?
Joseph Enochs: I doubt it.
If you're going to, yeah, they have o1-mini, right? They turned that into o1 low, medium, and high, like in the testing. But yes, there are ways to reduce its, you know, ability to run amok and take all your money, so to speak, right? So yes.
John Willis: So it sounds like, though, even if you go with the $2,000 a month, right, they're going to have some built-in, you know, iteration limits already, right? Or is it one of those things, like with Amazon's cloud, where, you know, you wanted to start up, like, 200 cloud instances, and you found out the first time you tried to do that, you had [00:37:00] to actually go talk to Amazon, and then you'd have to sort of make a request?
Like, let's say you're a financial institution. Well, here's one I think is very relevant, due to a lot of the recent air travel over those sort of late holidays: you know, IROPs in airlines, interrupted resource operations, right, or irregular operations, in airlines, right.
Like, they've been pretty good at this. But let's say, you know, they've been doing this for years, and they have to schedule... Yeah, think about the complexity of an airline, like, you know, flight disruptions. I got stuck, you know, I went to the Penn State Notre Dame game down with a good friend, Alan Shimel, and it took me two days to get home because of the storms and the flight operations.
But I mean, I was trying to explain to my son, like, it starts with food, catering. It starts with flight attendants, with [00:38:00] pilots, with fuel. You know, people think, well, why did they cancel my flight? It could have flown into Atlanta.
Well, it has little to do with whether it could make it through the weather from Fort Lauderdale to Atlanta. It has a lot to do with, like, fuel. Where's the best economics? All right, long story short, let's say they were saying, well, you know, this whole o3 stuff might be way better than what we've developed over the last 10 years.
And so say, Delta, they want to go ahead and experiment with that. Let's say they were like, hey, we don't mind spending, you know, a hundred grand on experimenting. Now, they'd probably go the in-house route and do the open source, but let's just say they were going to follow OpenAI, like a lot of these big companies did with cloud. They didn't build their own cloud, right?
Like, would they be limited to, like, the $2,000, or would they basically get to, and I know o3 is still sort of emerging and all, but would it be similar [00:39:00] to what you had to do with cloud? Would you sort of call OpenAI and say, hey, we're going to really invest heavily in some research in this, and therefore turn the throttles off, or...
Joseph Enochs: I don't know.
I think, depending on the organization, they will probably work with you on something. But when we're building out a use case, model selection is one of the very first things that we talk about. And in many cases, all cases, we try to choose the most efficient tool for that AI task.
If this AI task, to your point, based on our experimentation, is one that we know that GPT-4 cannot solve, right, we know that GPT-4o cannot solve, so if we have validated and tested that the smaller models are not sufficient for this particular task, then, to your point, we're using the best tool based on the return on the investment [00:40:00] that we want to spend. We've got to go through that
to see if it can solve the problem. Then we start moving on to what's going to be the carrying cost. Is it just going to solve this thing once, and then we're going to use its output to do this task? Or are we going to have to come back, right, and update this thing weekly? And what's then our total cost of ownership over the span of this particular use case?
So, right?
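As a toy illustration of that carrying-cost question, the arithmetic is just one-time setup plus cost per run times the number of runs. Every dollar figure below is an invented placeholder, not a quote from any vendor.

```python
# Sketch: total cost of ownership for one-shot vs. recurring use of a model.
# All dollar figures are invented placeholders.

def tco(setup_usd: float, run_usd: float, runs: int) -> float:
    """TCO = one-time setup cost + per-run cost x number of runs."""
    return setup_usd + run_usd * runs

one_shot   = tco(setup_usd=0.0,      run_usd=1_500.0, runs=1)   # big model, solve it once
weekly_api = tco(setup_usd=0.0,      run_usd=1_500.0, runs=52)  # big model, weekly for a year
fine_tuned = tco(setup_usd=20_000.0, run_usd=5.0,     runs=52)  # fine-tune once, run cheap

print(f"one-shot big model: ${one_shot:,.0f}")    # $1,500
print(f"weekly big model:   ${weekly_api:,.0f}")  # $78,000
print(f"fine-tuned small:   ${fine_tuned:,.0f}")  # $20,260
```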
John Willis: And this is how I circle back to the original question, right? Like, you know, Joseph works for UBT; he'll be in the show notes. If you listen to me, you know Joseph, and as you can see, he's very good at this. If you're having these kinds of problems, you should reach out to me to get to him, or reach out to him directly, because he's solving real problems with big companies.
He's been doing this from the first day I met him; he was telling me incredible use cases a couple of years ago. I grabbed his leg and didn't let go, you know, because he had real-world experience. But it goes back to the core of my original question, which is, there's only [00:41:00] one Joseph.
I know you're more than that in your organization, but, like, how do we sort of scale that kind of knowledge? Leaders, you know, can't clone you a million times. And again, I'm not saying you're the only one. But that, to me, is sort of what I wanted to get to: is there a path where some of us can do a better job?
How do people learn what you've learned? Again, obviously, doing is always the proper path. But that was my point: what you just described, there's a lot of human in it. And for people that don't know, you know, Joseph, before this whole ChatGPT revolution happened, he had some insight to go back and get a degree in ML and AI before this gold rush.
And so he's literally a technician, a DevOps expert, a sysadmin, who literally goes back and gets a [00:42:00] degree. So when he talks about all those things earlier, like all the patterns, he's been formally trained in all that stuff, right? Is there no shortcut? Or, like, in other words...
Joseph Enochs: Wow, there's a lot to unpack there.
So, is there no shortcut? I think the answer is, yes, there are a lot of shortcuts. And what do I mean by that? Well, we're all taking shortcuts by being fast followers on the research that's being done, like we talked about with DeepSeek and these other models.
By following in the footsteps of this research, it's not a cheat code. You still have to put in the work, but at least you're not the one who has to go through every experiment. You're able to follow in the footsteps of these researchers. So I think the core sort of developers create the foundational technologies.[00:43:00]
We can research what they have done, and we can build upon the research that these, you know, foundation model builders have provided to us. Then I think this gets into what our responsibility is, right? Our responsibility for AI in our various roles. And that's, like, you know, education and impact.
What we're talking about, and what you're doing, John, is outreach to people who haven't necessarily had the time, like we've had, to distill this down over the last two years. So if you learn something, like you talked about with Reuven Cohen, share that with folks, and be able to spend and invest the time to learn these things.
So it's an educational sort of impact that we're teaching, but people have to receive that. So I think the [00:44:00] cheat code is: teach people and learn, right, and get excited about it.
John Willis: That leads to the sort of part B to that question, which I hadn't thought about until you answered it: are we doing enough?
You know, in other words, a lot of the time I'll struggle to read a lot of those papers, because, again, to oversimplify, they're talking about five-year-old benchmarks of human capabilities, or PhD-level ones. What I want to see more of is discussions about what these papers are directly telling us.
How do I solve the problems in the supply chain of a candy bar company? How do I solve problems for logistics in, you know, a retail organization? You know, the things that we face every day in our work. And I guess I think maybe there's just not a lot of that coming out.
Joseph Enochs: Well, not yet. But I think if we [00:45:00] look at a historical precedent: recently, there's a YouTuber that I follow, and he did a talk comparing AI to the printing press. And in that scenario, you know, what happened is they printed all these books, and then knowledge just distributed throughout the world.
Right. And there were some countries that, you know, outlawed the printing press, the Ottoman Empire, some of these empires outlawed it. Well, those empires don't exist anymore, right? So, to your point, the technology is getting out there with those papers. It's just a matter of time before it's synthesized into best practices that we can use in our lives.
Right. And for us, you know, the royal we, yourself and I, we try to accelerate people's learning through these conversations, by doing the experiments, by running [00:46:00] them, and then, through sort of individual actions, like the network effect of sharing these things. I think they will naturally come out.
But what we then try to do is distill them into training programs, into, like, the hackathons that you and I have done together, right? These ideation hackathons on getting people to think about how they're going to use this technology to solve the business problems, the big problems that are costing a lot of money, and how to then apply those tools.
And we've talked about it many times, right? Building these sandbox environments for people to test this technology. And your idea, I always still refer back to this, of the hourglass, right? Take those sandboxes and ideation, and then transform those into some best practices. This cost consciousness that you're talking about has got to be part of those best practices, right?
Along with the security and ethics and all of those components, the financial elements also have to be distilled down into our best practices. [00:47:00] We're all learning, right? All of us are learning together. I think the exciting thing about it is, like you said about the moat, there's this democratization of AI that's happening, because it's being thrust out to the world.
It's almost like everybody in the world got the printing press instantaneously. So now all this knowledge is coming about, and we're seeing the replication happen all over the world. It's not just in OpenAI, right? We're seeing these small labs produce the same results. So I think, long term...
John Willis: That's a great example, right?
There's a place where we're almost blocked, you know, the geopolitical stance today between anything going on with China and the U.S. Like, you know, I don't know anybody who's got any type of label on their name running off to go to China. But because of this democratization, [00:48:00] you're seeing things like the DeepSeek stuff show up on our desktop.
Joseph Enochs: Yeah, they used 2,500 of the, I don't know if it's the right term, nerfed, right, downgraded GPUs to train DeepSeek. 2,500, versus the 32,000, right, when Meta did Llama 2, right? And they're talking about X doing 100,000 H100s. So they've had to, you know, they've had to be scrappy with their capabilities.
So it's very interesting. But I would say this: I'm an optimist, and I try to state, like, research and facts as much as I can, without opinions. And if we look at the data centers in the United States versus the data centers across the globe, and you [00:49:00] mentioned China, I think we have between 10 and 15 times more data center compute than China has right now.
So we've got, and I say we, I mean stateside, here in the United States, when it comes to data center space, we are light years ahead. So I don't think there's any, you know, there shouldn't be any fear or anything like that, because, you know, it's going to take a while to build the infrastructure.
It's going to take a while to build the data centers. And I think it's actually exciting to see that other folks can replicate these things, because it's going to, again, make it so that this is a technology that we all can consume over time.
John Willis: Yeah, you know, there are a billion things, and we're almost at an hour, but there are only so many things I can ask you, like your thoughts about where this is all going.
I guess one last question I'd like to [00:50:00] ask you, and we'll definitely do another one of these, because I think there's a lot here that I'd like to talk to you about, about the future. You know, the thing that comes up a lot now, people ask me, and I don't even want to start this conversation, but, like, what is the landscape of talent going to look like, right?
What is a programmer really going to be? Is there going to... like, I think we'll save that. But I guess the last question I want to ask you is, how do you keep up? I mean, you talk about all these papers. It just seems like it would be overwhelming.
You know, I'm always fascinated, and I know you have a background, you've gotten your degree in AI and all that, so it helps you a little bit. But what I'm always fascinated by is that you seem to know what to read, when to read it, and you're literally able to understand it at the time when it needs to be understood.
It seems like you've really got that nailed down. And I guess, how do people, like...
Joseph Enochs: I would say my calm demeanor [00:51:00] probably shields a lot of anxiety, because I go through the same, you know, information overload as everyone else. I think I have sort of mantras, you know, personally, like, you've got to keep going, and this is all going to make sense. Just persevere through each individual step and component.
And when you understand something, try to distill it down into some simple terms so that you can share it with folks, right? I learned that from you, right? Take something that's complex and try to explain it. You have the perfect analogy when you write your books, right? You say, I have my mother-in-law, because I want her to be able to understand it, because she reads a lot and she's engaged in reading, and she's not a subject matter expert.
So for me, I've adapted your strategy to my learning strategy. I want to learn something, and identify what the latest and greatest papers are, what's trending, but then I want to learn it and distill it down to something simple, so I can [00:52:00] try to share it with just regular folks. And what does that do?
That helps me learn, right? If I can't explain it in simple terms, then that proves I don't really know it, right? It's not a perfect strategy, but it's my strategy. And I think we'll continue to learn together, right? And we'll continue to work together. So: research, learn, share. I learn a lot from other people, just like in these conversations.
I've learned a lot here, just us talking about what people are interested in. I would just say, get out there and don't be shy, right? We're all in this together; try to learn and share things. I would say we all have a little bit of responsibility to share best practices, to share things that we've learned, even if it's as simple as a prompt that you like, right? This is how I prompt, and this is how I do it, share these things.
Because what that's going to do is, you know, expand out the horizons for everybody.
John Willis: No, just one final note. I'm hoping to [00:53:00] have my AI book, it's called Rebels of Reason, out in a first early, early edition, you know, the first week of February.
And if you don't know this, just like in my Deming book, I had two target readers. My brother-in-law is always target reader one. And with Deming, it was Ben Rockwood for, as you know, sort of the DevOps story. In Rebels of Reason, it's Joseph. So you are the second reader, right?
So, at the end of the day, when you get your first copy, and my mother gets the first copy, the litmus test for me as a writer is: did you both enjoy it? Did you both get value out of it? Would you recommend this book to somebody else? And in the Deming book, it was mission accomplished.
So we'll see what happens. And by the way, sometimes it takes a while; the Deming book took about eight revisions to get there, like eight months and eight revisions. But hopefully it won't take that long with this book.
Joseph Enochs: I'm so excited to be able to read it, and there's no doubt [00:54:00] that I'll enjoy it.
Just receiving the book is going to be enjoyable, but reading through it, I'm very thankful and excited to see the book come out, John.
John Willis: Sounds good, my friend. All right, well, thanks, Joseph, as always. You know, I can tell during a podcast when it's going really well, and I could definitely feel that.
Joseph Enochs: Thank you, buddy. Thank you very much. Looking forward to doing more of these together out here and making it a great 2025. Sounds good to me.