Is There a Direct Correlation Between Reliability and Safety?
Why Explore this Potential Correlation?
By Robert J. Latino, CEO, Reliability Center Inc.
Presented at: SMRP and submitted to Solutions Magazine
I recently presented at a conference called the Human Performance, Root Cause & Trending (HPRCT) conference. I listened with great interest to a presentation on Human Performance Improvement (HPI) by Dr. Todd Conklin and Dr. Sidney Dekker, advocating a ‘Learning Team’ approach. I came to the conclusion at this conference that these new learning teams were being viewed as the basis for Human Performance Investigations. These learning teams were certainly being positioned by the speakers as a replacement for traditional RCA as known in the Maintenance and Reliability fields. So I wanted to know more: what is HPI?
Human Performance Investigation: HPI strives to understand and explain what happened without judgment, in order to understand the story and to provide a just and honest conclusion in each case. This gives the organization information that is incredibly comprehensive and makes it easier to identify what to correct than ‘old school’ methods do (Conklin, 2014, p. 45). HPI constructs the event context, and looks not at the individual pieces but at the relationships between those pieces (Conklin, 2014, p. 68).
So essentially, to me, this contrasted the basis of a Safety investigation with what we in Reliability would call a ‘Root Cause Analysis’ or RCA. Being in the RCA business, this naturally piqued my curiosity.
I was not very familiar with this HPRCT conference, but I quickly learned that it was predominantly attended by progressive Safety professionals in high-hazard industries (especially power generation/nuclear).
This was the first time I had heard Root Cause Analysis (RCA) referred to as ‘old school’ and ‘obsolete’, and by leading researchers and academics, no less. Given that I have been in the RCA business for decades, this got me thinking: is what I do for a living…obsolete?
To be honest, up until this point I had always assumed there was a direct correlation between Safety and Reliability, but I now realized that not everyone outside of the Reliability field feels the same. So I set out to understand why the differences in perspective exist, and whether there is a valid correlation between the two.
An Ironic LinkedIn (LI) Post Caught My Attention
Shortly after this conference, I came across this graphic (See Figure #1) used in a LinkedIn post. It was quite a hot topic based on the responses it received.
The paper cited in the post drew the following conclusions from this graphic:
The probability of an injury is significantly increased with non-routine maintenance activity resulting from equipment failures.
Connecting the importance of human safety to the importance of equipment reliability is critical in driving an injury-free culture.
While this appears to make logical sense on the surface, is it true? Does a direct correlation exist between Reliability and Safety as these conclusions suggest? I wanted to understand why experts in the Safety world would not agree with such a correlation.
A prevalent position in Safety is that the Heinrich Pyramid was debunked decades ago, which is one reason they would likely not fully agree with the overlay of this Safety curve.
In an article entitled ‘Examining the Foundation: Were Heinrich’s Theories Valid and Do They Still Matter?’, James Howe (Safety Solutions in Medford, OR) is quoted as stating the following:
“The pyramid theory has really done a disservice to the safety profession because it has misled people running safety programs into thinking that if they work on minor incidents, major incidents will go away. And many, many companies are aware that that is not the case. In fact certain companies with award-winning low injury rates have suffered some of the worst catastrophic incidents during the past 10 years.”
So as you can tell, among many in the Safety community there is no love lost for Heinrich’s research. However, I am looking in general terms to see if there is a valid correlation between injury rates and organizational Reliability, and not seeking a debate on the validity of Heinrich’s pyramid.
Keep in mind as you read this paper that comparisons are being made between the perspectives of Safety researchers/academics and those of career Reliability practitioners in the field. I think those dynamics play a role in the worldview of each group.
The Safety Research Perspective
As part of my exploration, I read Dr. Nancy Leveson’s ‘Engineering for a Safer World: Systems Thinking Applied to Safety’. Dr. Leveson is a highly respected researcher, and her text is considered the ‘Safety Bible’ by many. I will add that I thoroughly enjoyed the read and learned a great deal. I pulled the following relevant excerpts from this text:
“Assumption 1: Safety is increased by increasing system or component reliability. If components or systems do not fail, then accidents will not occur.
This assumption is one of the most pervasive in engineering and other fields. The problem is that it is not true. Safety is a system property, not a component property, and must be controlled at the system level, not the component level.” (Leveson, 2011, p. 7)
Her proposed ‘New Assumption’ was stated as:
“New Assumption 1: High reliability is neither necessary nor sufficient for safety.” (Leveson, 2011, p. 13)
This contradicts the common belief that there is a direct correlation between Safety and Reliability. Having been in the Reliability field for 30+ years, I personally have always believed there is a correlation between Reliability and Safety, but I would assert it is not a direct correlation. This is because we can have a reliable operation that is still unsafe, and we can also have a safe operation that is unreliable. As a word of caution, please note that correlation does not necessarily imply causation.
But I firmly believe (and have experienced) that a reliable operation is inherently a safer operation, as opposed to an unreliable one. In a reliable operation, there are fewer stops and starts and fewer unexpected situations that deviate from the control systems in place (requiring a reactive response). It stands to reason, then, that under reliable conditions there is less need to quickly correct a deviation from a standard or norm.
However, Reliability is viewed by many in Safety as strictly a component property and as not having system properties (as Safety does). Many in Reliability would take issue with that assumption. But we have to concede that while we experience Safety incidents due to poor Reliability, we also experience Safety incidents that have nothing to do with operational (component) Reliability. Injuries occur all the time in areas unrelated to the operation of an industrial facility.