By: Robert J. Latino, CEO, Reliability Center, Inc.
Abstract: Root Cause Analysis, commonly referred to as RCA, has become a widely used term, especially in the healthcare and industrial sectors. However, while widely used, it is even more widely misunderstood, for a multitude of reasons that we will discuss in this paper. The result of this misunderstanding is that everyone thinks they are doing “RCA”, yet the results are disappointing, and the conclusion drawn is that RCA does not work. The reality is that what people are calling RCA is not really RCA! In this paper we will demonstrate the breadth and depth of real RCA and provide technical reasoning for our conclusions.
What is THE Definition of Root Cause Analysis?
Go ahead, Google® it! See what you get.
Wikipedia defines RCA (at least as of today) in general as: “Root cause analysis (RCA) is a class of problem solving methods aimed at identifying the root causes of problems or events”.
You can go to each RCA provider and get a different definition of RCA from each of them. This is because there is no universally accepted definition of RCA. It is not difficult to understand why so many people call their own problem-solving approach Root Cause Analysis. In a literal sense they are not incorrect in doing so, because there is no standard they are deviating from.
As an RCA insider and provider (The PROACT® RCA Approach), we are equally at fault for contributing to this confusion. As businesspeople we must provide uniqueness to our RCA approach, therefore we use certain words and terms to differentiate our firm’s RCA approach from others. Other providers must do the same. As a result we will have just as many definitions of RCA as we do providers. If there were a standardized definition of RCA, then some providers’ approaches might not “fit” that definition. Those providers would therefore fight the acceptance of such a definition, as it would not support their business interests. This is the cold, hard reality of the business side of RCA.
The lack of a universally accepted definition of RCA is a disservice to those who must apply such approaches in the field to improve worker safety, patient safety, operations, maintenance, quality, reliability and ultimately financial stability. Again, we say this as an industry insider and are not proud of this fact. However, we cannot go it alone in the industry as this must be a collaborative effort. We have been included in an attempt to develop such an RCA standard and it quickly got mired down in “paralysis-by-analysis” for the above business reasons.
Essentially, after this long-winded explanation, we have established that RCA is the noun and the various brands on the market are the adjectives that describe the noun. RCA is not the only acronym victim here; others such as Reliability Centered Maintenance (RCM), Performance Improvement (PI), Total Productive Maintenance (TPM) and a host of others suffer the same fate of being ill-defined in the marketplace.
So what! Is this a reason to lose hope or give up? Absolutely not! We must learn to focus on processes and not on labels.
How Could the Different Definitions of RCA Cause Poor Results?
Think about this question. This is like looking at any sport from a generic standpoint and asking why one person is better at that sport than others playing the same sport. Golf is a sport. Not everyone plays like Tiger Woods. Why not? His level of skill involves every aspect of the game from the physical to the psychological. His breadth and depth of knowledge of the game differentiate him from his competition. His drive and will to succeed are apparent in how he carries himself on and off the golf course.
Bridge this over to RCA. Without a standardized definition of what RCA is, then unlike golf, the “rules of the game” are inconsistent. Everyone is playing by their own rules as they individually define RCA for their purposes. Therefore processes like trial-and-error, troubleshooting, brainstorming and problem solving all can fall under the umbrella of RCA and be treated equally in terms of their ability to provide consistent results across the board. Unfortunately, this is an unfair and inaccurate comparison because the RCA processes vary widely in breadth-and-depth and therefore yield equally wide variations in results.
Anyone who has ever conducted an “RCA” using a tool like the 5-Whys knows the results vary greatly from those of an RCA using a tool like a Logic Tree or Cause-and-Effect Diagram. Treating the two tools’ results as equal would not be a fair comparison.
So what! Different RCA processes require different RCA rules; therefore comparing these results as being equivalent would not be an accurate assessment and could produce counterproductive conclusions.
Focus on Processes and Not Labels
In order to recognize what is Root Cause Analysis and what is NOT Root Cause Analysis (or what is often referred to as Shallow Cause Analysis), we would have to define what criteria must be met in order for a process and its tools to be called Root Cause Analysis. Think about any investigative occupation such as a police detective, an aircraft accident investigator or a safety incident investigator and what their commonalities would be in terms of the process of investigating an incident or accident. The following are the essential elements of a true investigative process:
- Identification of the Real Problem to be Analyzed in the First Place
- Disciplined Data Collection and Preservation of Evidence to Support Cause-And-Effect Relationships
- Identification of the Cause-And-Effect Relationships that Combined to Cause the Undesirable Outcome
- Identification of All Physical, Human and Latent Root Causes Contributing to the Undesirable Outcome
- Development of Corrective Actions/Countermeasures to Prevent Same and Similar Outcomes in the Future
- Effective Communication to Others in the Organization of Lessons Learned from Analysis Conclusions to Avoid Similar Outcomes Materializing in the Future (No Matter which Location)
The essential elements of an investigation listed above could equally serve as the essential elements of an RCA.
Troubleshooting is usually a “band-aid” type of approach to fixing a situation quickly and restoring the status quo. It is typically done by individuals as opposed to teams and requires no proof or evidence to back up assumptions. This off-the-cuff process is often referred to as RCA in many circles, but clearly falls short of the criteria to qualify as RCA.
Brainstorming is traditionally where a collection of experts throw out disconnected ideas as to the causes of a particular event. Usually such sessions are not structured in a manner that explores cause-and-effect relationships. Rather people just express their opinions and come to a consensus on solutions. When comparing this approach to the essential elements listed above, brainstorming falls short of the criteria to be called RCA and therefore falls into the Shallow Cause Analysis category.
Problem Solving comes the closest to meeting the RCA criteria. Essentially Problem Solving is the combination of Brainstorming plus the use of a structured tool. Problem Solving usually is team-based and uses structured tools such as comparative timelines, cause-and-effect diagrams, logic trees or fishbone diagrams. Some of these tools may be cause-and-effect based, some may not be. Problem solving oftentimes falls short of the RCA criteria because it does not “require” evidence to back up what the team members hypothesize. When assumption is permitted to fly as fact in any investigative process, it is not true RCA.
So what! Focusing on the common steps involved in an investigation will let us know whether or not the RCA process we are using is really RCA or Shallow Cause Analysis.
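The comparison of a process against the essential elements can be sketched as a simple checklist. The following is only an illustrative sketch in Python: the key names and the sample brainstorming profile are assumptions made for the example and are not part of any RCA methodology.

```python
from typing import Dict, List, Tuple

# Short keys standing in for the six essential elements listed above.
# The key names are assumptions made for this sketch.
ESSENTIAL_ELEMENTS = (
    "real_problem_identified",
    "evidence_collected_and_preserved",
    "cause_and_effect_identified",
    "physical_human_latent_roots_identified",
    "corrective_actions_developed",
    "lessons_communicated",
)


def classify(process: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return a label and the list of essential elements the process lacks."""
    missing = [e for e in ESSENTIAL_ELEMENTS if not process.get(e, False)]
    label = "Root Cause Analysis" if not missing else "Shallow Cause Analysis"
    return label, missing


# Hypothetical profile of a brainstorming session: opinions and consensus
# on solutions, but no required evidence and no cause-and-effect structure.
brainstorming = {
    "real_problem_identified": True,
    "corrective_actions_developed": True,
}

label, missing = classify(brainstorming)
print(label)
for element in missing:
    print("  missing:", element)
```

A process earns the RCA label here only when every element is present; anything less is reported as Shallow Cause Analysis together with the elements it lacks.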
Technical Comparison of the 5-Why’s, Fishbone Diagram and a Logic Tree
The goal of this comparison is not to teach the reader how to use these tools properly, but to demonstrate how they can lack breadth and depth of approach when compared to each other. Analytical processes and tools are only as good as their users. Used properly, any of these tools can be used comprehensively to produce desired results. However, experience shows that oftentimes the very attractiveness of these tools could be their actual drawbacks as well. Some of these tools are attractive only because they are quick to produce a result, require few resources and are inexpensive. These are the very same reasons they often lack breadth and depth.
Let’s start with the 5-Whys. While there are varying forms of this simplistic approach, the most common understanding is the analyst is to ask the question “WHY?” five times sequentially and they will uncover the root cause.
This approach may take the following form:
There is a reason we do not hear about NTSB investigators using the 5-Why approach when addressing a press conference after an aircraft incident. The main drawback of this concept is that failure does not always occur in a linear pattern. More often than not, multiple factors combine in parallel to allow the undesirable outcomes to occur. Also, there is almost never a “single” root cause, and this is a misleading aspect of this approach. People tend to use this tool by themselves and not in a team, and rarely back up their assertions with evidence.
The fishbone diagram is another popular analytical Quality tool on the market. This approach gets its name from its form, which is in the shape of a fish. The spine of the fish represents the sequence of events leading to the undesirable outcome. The fish’s bones themselves represent selected categories that are evaluated as to having been a contributor to the sequence of events.
Figure 2: The Fishbone Diagram Sample – These category sets change from user to user. Some of the more popular category sets tend to be:
The 4 M’s: Methods, Machines, Materials, Manpower
The 4 P’s: Place, Procedure, People, Policies
The 4 S’s: Surroundings, Suppliers, Systems, Skills
The fishbone is often a tool used for brainstorming. Team members decide on the category sets and continue to ask what sub-factors within the category caused the event to occur. When the team feels enough detail has been reached there is a shift in focus towards solutions.
As a brainstorming technique this tool is less likely to depend on evidence to support hypotheses and more likely to let hearsay fly as fact. This process is also not cause-and-effect based, but categorically based. The users must pick the category set they wish to use and throw out ideas within that category set. If the correct category sets for the event at hand were not selected, key root causes could be missed.
The PROACT® Logic Tree is representative of a tool specifically designed for use within RCA. The logic tree is an expression of cause-and-effect relationships that queued up in a particular sequence to cause an undesirable outcome to occur. These cause-and-effect relationships are validated with hard evidence as opposed to hearsay. The data leads the analysis, not the loudest expert in the room. The strength of the tool is such that it can be, and is, used in court to express “solid” cases.
A logic tree starts off with a description of the facts associated with an event. These facts will comprise what is called the Top Box (the Event and the Modes). Contrary to popular belief, the Event should be defined as the Negative Consequence of the incident. The Event is usually the reason an RCA was commissioned. It is usually a business level issue where someone was hurt, there was unacceptable risk revealed, there was excessive damage/costs or some type of regulatory violation.
Modes are the manifestations of the failure (often the incident itself) and the Event is the final consequence (effect) that triggered the need for an RCA. While we may know what the Modes are, we do not always know how they were permitted to occur. So we proceed with the questioning of how could the Mode have occurred?
Many have been conditioned to ask the question “why” during such analyses. However, using the PROACT® Logic Tree methodology, the question used initially is “how could?” When looking at the differences between these two questions, we find that simply asking “why” connotes a desire for a singular answer, and likely an opinion. When asking “how could”, we are seeking all the possibilities (not only the most likely) along with evidence to back up what did or did not occur.
This questioning process is iterative as we follow the cause-and-effect chain backwards. Simply ask the questions, answer them with hypotheses and use evidence to back them up. This holds true until we uncover the Human Roots, or the points at which a human made a decision error. Human Roots represent decision errors, or errors of omission or commission made by the human being. Either we did something we should not have done or we did not do something we should have done. At this point we are exploring the reasoning of why someone made the decision they did. We are between the ears of an individual at this point, seeking to identify a rationale for a specific decision at the time it was made.
This is an important point in the analysis because we are seeking to understand why someone thought the decision they made was the correct one at the time. At this point in the analysis we do switch the questioning to “why” because we are exploring a set of answers particular to an individual or group. Our answers are what we will call Latent Root Causes, or the organizational systems in place to help us make better decisions. The Latent Roots represent the rationale for the decision that triggered the consequences to occur. These are called latent because they are always there, lying dormant. They require a human action to be triggered and, when triggered, they set a sequence of Physical Root Causes in motion. This error chain continues, if unbroken, to the point where it results in an adverse outcome that requires an immediate response.
As this description shows, the logic tree approach is certainly cause-and-effect related, “requires” evidence to back up what people say and “requires” an in-depth understanding of the flaws in the systems that contributed to poor decisions.
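The structure described above (a Top Box of Event and Modes, hypotheses backed by evidence, and a switch from “how could” to “why” at the Human Root) can be sketched as a small data structure. This is a minimal illustration in Python, not the PROACT® software or its data model; the class, field and function names, the sample labels and the evidence strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    kind: str                       # "event", "mode", "hypothesis", "physical", "human" or "latent"
    evidence: Optional[str] = None  # what supports (or refutes) this node
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def next_question(node: Node) -> str:
    """'How could?' while tracing the chain; 'Why?' once a Human Root is reached."""
    return "Why?" if node.kind == "human" else "How could?"


def lacking_evidence(node: Node) -> List[str]:
    """List every node below the Top Box that has no evidence recorded."""
    found = []
    for child in node.children:
        # Modes belong to the Top Box and are treated as known facts here.
        if child.kind != "mode" and child.evidence is None:
            found.append(child.label)
        found.extend(lacking_evidence(child))
    return found


# Placeholder tree: the labels and evidence strings are purely illustrative.
event = Node("Event (negative consequence)", "event")
mode = event.add(Node("Mode (how the failure showed up)", "mode"))
physical = mode.add(Node("Physical Root", "physical", evidence="lab report"))
human = physical.add(Node("Human Root (decision error)", "human", evidence="interview notes"))
human.add(Node("Latent Root (system flaw)", "latent"))  # no evidence recorded yet

print(next_question(mode))      # How could?
print(next_question(human))     # Why?
print(lacking_evidence(event))  # ['Latent Root (system flaw)']
```

The point of `lacking_evidence` is the “requires” in the paragraph above: any box accepted without evidence is flagged rather than allowed to stand as fact.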
The failure of a process to achieve its designed objective has to do with the design of the linkages between steps in the process: how the steps relate to one another – the hand-offs. It is the interrelationships that are themselves prone to failure and that propagate the effects of a failure to other parts of the process, often in ways that are unexpected (side effects) or not immediately evident (long-term effects). The logic tree’s strict adherence to graphically representing these tightly coupled relationships makes it more accurate than other tools described for that reason.
Figure 3: The PROACT Logic Tree
In addition to the most commonly used approaches described above, many simply use form-based Root Cause Analysis. This is basically a one-size-fits-all mentality. It is root cause by the numbers, similar to painting by the numbers. The same questions are asked no matter the incident, and opinions are input as acceptable evidence. Pick lists are often provided, which give people the false sense that the correct answer must be within the listed items. No “pick-list” RCA process can ever be comprehensive enough to consider all the possibilities that could exist in each working environment. However, the innate human tendency to follow the path of least resistance makes using pick lists very attractive. As noted author Eli Goldratt (The Goal) says, “An expert is not someone that gives you the answer, it is someone that asks you the right question”. That is exactly what RCA is all about.
Many people choose to use form-based RCA systems because the regulatory authority seeking compliance provides them free of charge and suggests they be used. The paradigm is that if “we are using their forms, we will have a better chance of being viewed as compliant”. This may indeed be true, but it does not mean the analysis was comprehensive enough to ensure the undesirable outcome will not recur. Hence, once again, compliance does not necessarily ensure worker or patient safety!
So what? Choosing the right tool for the situation at hand is imperative. Using a shallow cause approach and tool when a root cause approach and tool is warranted, could result in the recurrence of an undesirable outcome. This unnecessarily puts lives, property and viability at risk.
Adapting the 5-Whys to the Logic Tree
Now that we have described the basic premise of the 5-Whys and the Fishbone Diagram, along with their pros and cons, let’s try to meld them together into a Logic Tree format so that the benefits from all approaches are expressed as one.
Figure 4: Basic 5-Why Format Using Logic Tree Labels
Figure 4 shows the basic 5-Why structure of five blocks in linear succession. Typically the labels would simply require the analyst to ask WHY? as the analyst answers the questions and continues to drill down until they reach the 5th level and then they would stop.
If we migrated the concept of this 5-Why tool into the Logic Tree, the Top Box would be the Event, the box subordinate to that would be a Mode and the boxes subordinate to the Mode would be hypotheses. The enhancement to the breadth and depth of this approach would be that instead of asking “Why” we would be asking “How Could”. This would expand our population of possibilities and push us out-of-the-box to consider other possibilities than what is most likely or blatantly obvious.
Figure 5: Migrating 5-Whys into a Logic Tree
In Figure 5 above, notice the difference when asking the question “How Could” as opposed to “Why”. We now see that there are more possibilities to explore using the evidence collected from the failure scene. By applying an “X” over the hypotheses that were proven not to be true, we are also showing our reviewers that all possibilities were explored and only what was found to be true was followed and drilled down further.
In Figure 6, as we continue to drill down past level 5 (where 5-Whys would typically stop), we can find fruit in uncovering the human decision errors and the reasons they were made in the form of Human Roots (HR) and Latent Roots (LR). By addressing the Latent Roots, the organizational systems that provide information for decision making, we will indirectly be affecting the behavior of decision makers. By correcting flawed organizational systems, better decisions will be made and therefore we will avoid the undesirable outcomes we have seen in the past. By simply addressing a Physical Root (PR) we may end up replacing physical parts but not address the systems that allowed the parts to get into the facility (e.g., inspections, purchasing practices, training on installing the parts, proper maintenance of the parts, proper storage of the parts, etc.). Therefore by not drilling down to latency, we run the risk of the undesirable outcome recurring.
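The difference between a linear 5-Why chain and a tree in which disproven hypotheses are crossed out can be sketched as follows. This is an illustrative sketch in Python under stated assumptions: the class and function names are invented for the example, and the hypothesis labels are placeholders, not the contents of the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    label: str
    verified: Optional[bool] = None  # True = proven, False = disproven ("X"), None = untested
    children: List["Hypothesis"] = field(default_factory=list)


def render(node: Hypothesis, depth: int = 0) -> List[str]:
    """Return one text line per box; disproven boxes are marked and not drilled further."""
    mark = {True: "[true]", False: "[X]", None: "[?]"}[node.verified]
    lines = ["  " * depth + f"{mark} {node.label}"]
    if node.verified is not False:
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines


def is_linear(node: Hypothesis) -> bool:
    """True when every box has at most one box beneath it, as in a 5-Why chain."""
    return len(node.children) <= 1 and all(is_linear(c) for c in node.children)


# Placeholder hypotheses; a real tree would carry the evidence behind each mark.
tree = Hypothesis("Pump failure (Mode)", True, [
    Hypothesis("Hypothesis A", False),
    Hypothesis("Hypothesis B", True, [
        Hypothesis("Hypothesis B1", True),
        Hypothesis("Hypothesis B2", False),
    ]),
    Hypothesis("Hypothesis C", False),
])

print("\n".join(render(tree)))
print("linear like a 5-Why chain:", is_linear(tree))
```

Keeping the crossed-out boxes in the rendered tree, rather than deleting them, is what lets a reviewer see that the alternatives were considered and ruled out.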
Note that when we identified a Human Root (HR), our questioning switched from “How Could” to “Why” for the reasons mentioned previously.
Figure 6: Drilling Down to Latency
Adapting the Fishbone Diagram to the Logic Tree
The Fishbone Diagram has a different structure than the 5-Whys or the Logic Tree. However, if we stick to process and not labels, it can be migrated into a Logic Tree while still capturing the benefits.
For the sake of example, let’s use the 4-M’s category set for the Fishbone Diagram. These would be Methods, Machines, Manpower and Materials.
Figure 7: Drilling Down to Latency (diagram labels: Methods, Machines, Manpower, Materials)
So in the interest of keeping things consistent, we will use the basic pump failure example that we used above to show the migration of the 5-Whys into the Logic Tree. So how would we take this fishbone structure and adapt it to the Logic Tree?
Figure 8: The Fishbone Migration into a Logic Tree
In Figure 8, notice the first row of hypotheses represents the 4-M’s in the Fishbone Diagram. At this point when using the Fishbone Diagram by itself, the team would throw out disconnected ideas about what they felt could have happened in the category. Surely some of those ideas would crop up using the Logic Tree approach as well.
However, if we were to now start applying the rules of the Logic Tree to the structure migrated over from the Fishbone, we would start asking “How Could” the 4-M’s have contributed to Pump CP-235 failing. While the answers to these questions do not line up sequentially in a cause-and-effect fashion, they should nonetheless be captured in a more disciplined fashion than when using strictly the Fishbone Diagram itself.
Many people believe they cannot drill down adequately using the Fishbone Diagram because they run out of space to list possibilities. Yes, they stop because of a space restriction on the piece of paper! Using the Logic Tree with the PROACT® Approach rules, we would not have that problem as the drilling goes straight down.
The following figures show the next line of questioning for the 4-M’s. From that point on, the reader will understand that the more we ask “How Could”, the deeper we will go towards uncovering the Latent Roots or organizational system flaws that affect decision making.
Figure 9a: 4-M Category of METHODS
Figure 9b: 4-M Category of MACHINES
Figure 9c: 4-M Category of MANPOWER
Figure 9d: 4-M Category of MATERIALS
Of course this was for example’s sake to show the migration of data over from one tool to another, and then further expansion of the possibilities using the PROACT® Logic Tree. The Logic Tree is adaptable to most any RCA tool on the market and can enhance the analysis by adding breadth and depth while adding exceptional discipline.
The drawback to this expression of the Fishbone Diagram in the Logic Tree format is that it is not expressed in a totally cause-and-effect manner. The 4-M’s are basically cause categories, and the effort to use cause-and-effect only begins beneath them. The categories themselves are lateral to each other, not allowing a correlation to be made between them in terms of linkages (cause-and-effect) over a timeline.
Let’s reflect back on the root system in the PROACT Approach we described earlier; Physical, Human and Latent Root causes. Think of the linkages these roots have to each other in time. A bad system exists (latent) which produces flawed information. A person makes a decision based on this bad information (human). As a result of the decision, a physical or observable string of consequences occurs (physical).
Now let’s look at the 4-M’s and their linkages. If they are all lateral to each other on the Logic Tree, none is expressed as subordinate to another over time. Therefore this gives the perception that they all carry equal weight. Let’s correlate the 4-M’s now to the PROACT Approach Root system and see how they pan out.
Figure 10: Correlating the 4-M’s to the PROACT Root System
So what? This expression in Figure 10 shows that most of the time our methods (organizational systems) impact the decision making of our manpower (humans). The direct impact is on decision making. Therefore the actions or inactions of the humans will cause physical (observable) consequences in the form of issues with our materials and/or machines. These may be of the wrong type, contaminated, defective, broken due to improper operation or installation, etc. So in the end, just like in the PROACT® Approach, systems impact humans and humans trigger consequences. By applying the Fishbone Diagram to the Logic Tree format we are able to use the categories of the Fishbone while getting the benefit of the drill down and time stamp associated with the expression of the Logic Tree.
The purpose and intent of this paper was to demonstrate the variability of how people define and interpret RCA and its impact on bottom-line results. This ambiguity in the marketplace causes people to believe that whatever they are doing to resolve their problems is considered to be RCA. We tried to demonstrate the technical pros and cons of the popular Quality tools of the 5-Whys and the Fishbone Diagram as compared to the Logic Tree applying the PROACT RCA Methodology rules. By removing labels and focusing on process, we can meld all of these tools into a unified expression of cause-and-effect over a spectrum of time while yielding the benefit of all the tools.
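The correlation described above (methods as latent, manpower as human, materials and machines as physical) can be sketched as a simple mapping that re-orders fishbone findings along the error chain. This is an illustrative sketch in Python; the names are assumptions, the sample findings are invented placeholders, and the mapping reflects the “most of the time” relationship stated above rather than a fixed rule.

```python
from typing import List, Tuple

# Mapping of the 4-M categories onto the root system, as correlated above.
CATEGORY_TO_ROOT = {
    "Methods": "Latent",
    "Manpower": "Human",
    "Machines": "Physical",
    "Materials": "Physical",
}

# Order in which the roots link to each other over time.
ROOT_ORDER = ("Latent", "Human", "Physical")


def to_error_chain(findings: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Re-label fishbone findings by root type and order them latent, human, physical."""
    labelled = [(CATEGORY_TO_ROOT[category], text) for category, text in findings]
    # sorted() is stable, so findings of the same root type keep their input order.
    return sorted(labelled, key=lambda item: ROOT_ORDER.index(item[0]))


# Hypothetical findings, listed in the lateral order a fishbone might yield them.
findings = [
    ("Materials", "wrong type of part"),
    ("Manpower", "decision to install the part"),
    ("Methods", "purchasing practice"),
    ("Machines", "broken due to improper installation"),
]

for root, text in to_error_chain(findings):
    print(f"{root}: {text}")
```

The lateral fishbone categories carry no sequence of their own; the ordering step is what restores the sense of systems influencing decisions and decisions triggering physical consequences.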
PROACT is a registered trademark of Reliability Center, Inc. (www.reliability.com)
Croteau, Richard et al. Error Reduction in Health Care: A Systems Approach to Improving Patient Safety (San Francisco: Jossey Bass Publishers, 2000), p. 181.
C. Perrow. Normal Accidents: Living With High Risk Technologies (New York: Basic Books, 1984), pp. 89-100.