The Root Cause Analysis and More...
By Matt Richter
So, it's time to delve into the root cause process. But first, let's review your goals when conducting a root cause analysis. First, you want to understand exactly what the problem is. In other words, what happened? What about the issue leads you to consider it a problem? Next, figure out why it matters. What frustrates you or others about this issue? What are the consequences to you, to others, to the business, or to your clients if you don't resolve it? In other words, why does this problem bug you? Finally, you always want to resolve the issue. You want to determine how to ensure it won't happen again or, if it does, that it will be less critical or at least mitigated.
Within a system, problems usually have three components. There are usually physical issues: something literally broke and needs to be fixed. Maybe your car won't stop because the brakes are worn out. There is usually some kind of human contribution to the problem. In the brake scenario, your significant other has been riding the brakes and didn't have them replaced during the last service check at the dealer. And finally, there is usually some organizational or procedural issue at play as well. In the brake scenario, your significant other was doing you a favor by bringing the car in for service. You usually do it, and she didn't know the brakes were due to be replaced.
One problem… three potential causal areas.
1. Define the problem. As always, start here. What are the indicators that you have a problem? In other words, what are the symptoms, and what do those symptoms tell you?
2. Collect data. Gather evidence for your hypothesis of what the problem is.
3. Identify possible causal factors. Dig into the actual causes of the problem. We'll give you some tools for doing that shortly.
4. Identify root causes for each factor. Okay, you got me: Step Four is really still Step Three, just pushing you to go deeper. Pull that string.
5. Define solutions. Finally, for each cause, define a solution.
Now, let’s take each one of these steps in more detail.
1. DEFINE THE PROBLEM
First and foremost, you have to properly define the issue. What's wrong? What's bothering you or one of your key stakeholders in the system? Something isn't right, and your job is to figure it out.
In all honesty, when I am the system owner, I am a pessimist! I expect something to be off. I expect a problem. I expect things to go wrong! Why? Because there always is one! Or at least there is always the potential for one, and my job as the system owner is to catch it before it becomes catastrophic, or to identify it as a risk before it even becomes a real problem.
So, I am constantly scanning…
There are three questions I look to answer…
What’s broken?
What can be improved?
What big, bad risk could hurt me later?
Something broken could be mechanical, it could be a break in a workflow process, or it could be a disconnect between what we offer and what a client buys. Anything not working is something broken. Alternatively, something may not be broken today, but there might be a potential problem that could hurt us in the future. This is the start of risk management, a process we will discuss in later posts. Risks are just that: potential problems that are not real… yet! So, a problem in our system could be a risk. Risks might include shifting market landscapes, changes in regulatory standards, clients leaving us, and so on. Or perhaps everything is fine. All processes work well… today. No big, bad thing is likely to hurt us tomorrow. But we do have an opportunity to grow, to get better at something. Perhaps we can improve a product we offer, develop a new product, or acquire a competitor. WHAT CAN BE IMPROVED is really about opportunities.
Remember, the break, the risk, or the opportunity can stem from something physical, something human, or something organizational, or any combination of the three. Keep that in mind.
In a simple system, the problem is likely just one of these things. In more complex systems, the issue might be a combination of these. In super complex systems, the problem might be so wicked as to encompass several issues under each of these umbrellas.
But, in the end, we can go no further down the root cause analysis path without defining the issue at hand.
But beware of mistaking the symptom for the problem!
One of the greatest dangers, beyond your assumptions, biases, and attribution errors, is mistaking the symptoms you experience for the actual problem. Remember the beach… or the car… from last month's post? Remember the flu and the cold weather (same post)? For example, if I sneeze a lot, is sneezing my problem? Or do I have allergies? A cold? The flu? Something else? If you stop at the symptom, you risk curing the sneeze while the cold you actually have turns into pneumonia and kills you later. My father is a doctor, and he has always held that medicating a symptom should be reserved for the rare cases when the symptom itself is truly dangerous. Masking the symptom can make it more difficult to identify the underlying CAUSE, which in turn makes it difficult to actually cure the problem.
Problems in systems are the same. We fix a broken process by adding a new team member to handle those steps. But maybe the real issue is that the process is designed inefficiently, and the break in the workflow is just a sign of that. We put a band-aid on a gaping wound. It might help for a bit, but it will likely exacerbate the situation later.
So… be wary of the symptoms. Use them to help diagnose the actual issue.
Now that you've explored what the problem is… categorize it.
Is it wicked or tame? If it was critical, I would hope that you skipped all of this and just started commanding, or that someone else did. Then you can work out why it happened afterward. But if it isn't critical, now you need to explore whether it is wicked or tame.
2. COLLECT DATA
So, you've defined the problem and determined whether it is wicked or tame. Simple systems might have one problem; complex systems might have more than one. Regardless, you now need to validate that you are right. How do you do that?
For tame problems, you validate that you are indeed focusing on the right situation, and then you know what to do. It's simple. Begin by gathering evidence for the problem to ensure you are indeed right.
For wicked problems, you are going to have to gather evidence on multiple levels, and this gets complex.
There are two types of evidence to gather.
Anecdotal evidence is the personal experience of those involved. Get input from those affected by the situation. Get evidence from those involved in it. Get experts from both inside and outside the organization to provide their insights.
Data is more objective and evaluative. Get the numbers! If it is a procedural breakdown, like dropped calls at an inbound customer service call center, you should have tons of data from your phone systems, customer relationship management (CRM) software, and so on. Run queries. Run analyses. Gather as much data as possible.
Now have multiple experts review the data from your systems and the input from your people, and get their interpretations of what is happening.
Ask yourself what you think.
Every time someone gives you input, see whether two others provide similar input. I always look for at least three independent confirmations of the same conclusion. We may well still be wrong, but at least we were all wrong together. Seriously, though, it is possible but unlikely for three independent sources to draw the same wrong conclusion.
In other words, if all three agree, we can still be wrong, but we are more likely to be right.
If all three provide completely different views, all might be wrong, all might be right but with differing perspectives shaping how they view the situation, or anything in between. Either way, this is an indication that we are not ready to move on yet. We still don't know enough, or understand enough.
3. IDENTIFY POSSIBLE CAUSE FACTORS
Once you have collected and analyzed the evidence, jump into figuring out why. In other words, what caused the problem in the first place?
Remember, the more wicked the problem, the more paths you will have to explore, the more strings you will have to pull, and the more root causes there are likely to be. The tamer the problem, the more likely the explanation is simple.
So, you have data. You have analyses of what the data mean. Good.
Now ask yourself and your team the following three questions using the data and analyses as fodder for the conversation. Ask:
What sequence led to the problem?
What conditions allowed the problem to happen?
What other problems surround this problem, either as antecedents or as consequences?
Now… revert to your childhood…
Two-year-olds have it right. For every conclusion… for every answer, take your group of experts and begin with “WHY?”
Why was “that”?
Get the answer to that question.
Ask “WHY” that?
And again… and again… and again.
And please don't trust your eyes (well, your ears in this case). Go back to Phase 2. Think of Phases 2 and 3 as an interwoven cycle: ask “WHY,” then get more evidence for the answers to those “WHYs.”
Don’t trust the conclusion of causality without evidence, either.
For more complex problems, you may well need to engage statisticians, data analysts, or financial experts. That's OK. But now you run the risk of mistaking correlation for causation, like concluding that cold weather causes the flu. So, really question whether that is the case. Always, always engage real experts, not pseudo-experts or self-anointed experts.
4. IDENTIFY ROOT CAUSES FOR EACH FACTOR
Now simply begin labeling the actual root causes based on your conclusions from Phases 2 and 3.
In other words, simply ask yourself to finish this sentence: “Therefore, our problem X was caused by…”
It’s that simple.
Of course, this assumes you are right…
So, go back to Phase 2 and ask for evidence.
Or, better yet, think of this like a math proof. You have worked through the problem and gotten an answer. Now reverse your work and see whether it holds up.
In other words, validate! Validate!! VALIDATE!!!
5. DEFINE SOLUTIONS
Now, and only now, do you begin to explore how to solve the problem in your system.
We will dive into that more over the coming monthly posts.
But, in the meantime, answer these questions…
What can you do to prevent the problem from happening again?
How will the solution be implemented?
Who will be responsible for it?
What are the risks of implementing the solution?