Failure has become a part of every industrial culture around the world; it permeates everything we do in an industrial facility. It is so much a part of our existence that we create elaborate work management and data systems to manage the sheer volume.
It is time to change our paradigm to a culture where failure is the exception and certainly not the rule. This is easy to say but a bit more of a challenge to accomplish.
The first step is to stop accepting the inevitability of failure. We have become conditioned to accept that failures must exist. We develop sophisticated measurements to track them, but very little is done to truly wage war on them. We ask our plant personnel to take copious notes on failures so we can track every detail of their existence, but after all that effort, what have we accomplished?
I propose a simple, yet extremely effective, eight-point strategy to wage war on the events we commonly know as failures.
Identify and Document Failure
The first step in battling failures is to know where they are hiding. If we were to ask a group of people from the same plant what their most significant failures were, we would undoubtedly get a variety of answers. We rarely know where our most significant issues are. If we do not actively collect failure data as the events occur, we will be stuck in a cycle I refer to as the “failure of the day.” The “failure of the day” is whichever failure is freshest in people’s minds, and the attention it receives is usually politically motivated.
Experience, however, has proven that the “failure of the day” is rarely the most significant issue confronting the plant. To truly know which failures are the most significant, we have to do a thorough job of collecting data on failures as they occur. This way we can measure each failure’s impact over time in comparison to all the other failures that exist. As a famous management consultant once said, “You can’t improve what you can’t measure.”
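To make “collecting data on failures as they occur” concrete, here is a minimal Python sketch of a failure event log and a per-failure summary that accumulates impact over time. The `FailureEvent` fields, the `summarize` helper, and the sample records are all hypothetical; in practice these events would be captured in the plant’s work management system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class FailureEvent:
    """One failure, recorded when it happens (hypothetical fields)."""
    asset: str             # equipment tag, e.g. a pump or conveyor
    failure_mode: str      # what went wrong, e.g. "seal leak"
    occurred: date
    downtime_hours: float
    repair_cost: float


def summarize(events):
    """Total occurrences, downtime, and repair cost per (asset, failure mode)."""
    totals = defaultdict(lambda: {"count": 0, "downtime_hours": 0.0, "repair_cost": 0.0})
    for e in events:
        t = totals[(e.asset, e.failure_mode)]
        t["count"] += 1
        t["downtime_hours"] += e.downtime_hours
        t["repair_cost"] += e.repair_cost
    return dict(totals)


# Made-up sample records for illustration only.
log = [
    FailureEvent("P-101", "seal leak", date(2024, 1, 3), 4.0, 800.0),
    FailureEvent("C-7", "belt tear", date(2024, 1, 9), 12.0, 1500.0),
    FailureEvent("P-101", "seal leak", date(2024, 2, 14), 5.0, 750.0),
]
summary = summarize(log)
print(summary[("P-101", "seal leak")])
```

Because every event is logged, a repeat failure such as the second seal leak adds to a running total instead of being remembered only while it is fresh.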
Imagine a person going on a diet but not being able to measure whether he’s losing any weight; he could quickly lose momentum and revert to old eating habits. This is precisely what happens in an industrial plant. We collect data inconsistently and, even worse, we do not use the data to effect positive change.
Determine Failure’s Real Significance
Maintenance cost is not a true reflection of a failure’s impact, although you would not know it from the enormous effort that goes into estimating it. Is a plant in business to reduce maintenance cost or to meet production goals at the lowest possible unit cost? Most managers would choose the latter, and maintenance cost is a very small factor in determining a failure’s true business impact. We need to factor in not only the cost of repair but also the cost of lost opportunity. A failure that does not affect the ability to produce product is not nearly as significant as one that prevents the plant from meeting production schedules and customer orders.
Imagine a failure that has a maintenance repair cost of $100 but causes a day of lost production. If we look only at maintenance cost, we could miss a huge opportunity for improvement. Many initiatives to reduce maintenance cost actually hurt production, because we tend to eliminate the proactive maintenance activities that help the plant produce at higher levels. Once again, what are the business goals: reduced maintenance cost or increased production and lower unit cost?
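The $100 example can be worked through with a simple cost model. The Python sketch below assumes lost opportunity is approximated as downtime hours times production rate times contribution margin per unit; the `failure_cost` name, the 50 units per hour, and the $40 margin are hypothetical figures chosen for illustration, not plant data.

```python
def failure_cost(repair_cost, downtime_hours, units_per_hour, margin_per_unit):
    """Repair cost plus the lost-opportunity cost of production not made.

    Assumed, simplified model: lost opportunity = downtime x production
    rate x contribution margin per unit.
    """
    lost_opportunity = downtime_hours * units_per_hour * margin_per_unit
    return repair_cost + lost_opportunity


# The $100 repair from the text with one full day (24 h) of lost production.
# Rate and margin are hypothetical.
total = failure_cost(repair_cost=100, downtime_hours=24,
                     units_per_hour=50, margin_per_unit=40)
print(total)  # 48100
```

Under these assumptions the repair is roughly 0.2% of the failure’s total cost. The model only holds if the lost production cannot be made up later; if spare capacity exists, the lost-opportunity term shrinks accordingly.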
Focus on the Top 10 Failures
Once the most significant failures have been determined, it is critical to create a plant initiative to eliminate them. Plant management needs to embrace the philosophy that unplanned failures will not be tolerated.
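One common way to pick the top failures is a Pareto ranking of total business impact. This Python sketch assumes impacts have already been totaled per failure (for example, repair cost plus lost opportunity over a review period); the `pareto` helper and the sample figures are hypothetical.

```python
def pareto(impacts, n=10):
    """Rank failures by total impact and report cumulative share.

    `impacts` maps a failure name to its total cost over the review period.
    Returns up to n tuples of (name, cost, cumulative percent of all impact).
    """
    ranked = sorted(impacts.items(), key=lambda kv: kv[1], reverse=True)
    grand_total = sum(impacts.values())
    result = []
    running = 0.0
    for name, cost in ranked[:n]:
        running += cost
        result.append((name, cost, round(100 * running / grand_total, 1)))
    return result


# Hypothetical totals for illustration.
impacts = {"seal leak": 48100, "belt tear": 9000,
           "motor trip": 30000, "valve stick": 1200}
for name, cost, cum_pct in pareto(impacts, n=3):
    print(f"{name:12s} {cost:>8,}  {cum_pct:5.1f}% cumulative")
```

The cumulative column shows how much of the plant’s total failure impact the analysis teams would address by working down the list.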
Management must also provide the time and resources for teams of analysts to study these most significant failures on an ongoing basis. These teams should be composed of source experts on each failure and led by an impartial facilitator. Each team should be assigned a single significant failure to analyze and should give management a team charter describing what it plans to accomplish. Once the charter is approved, the team is accountable for determining the underlying causes of the selected failure and developing a detailed plan to eliminate them.
Use Root Cause Analysis
Left to their own devices, teams are likely to stray, so they should be armed with a disciplined approach to root cause analysis (RCA) that has a track record of success. The RCA method should be accompanied by a software solution that guides teams through the process. A software tool helps teams stay consistent both in their analyses and in how they report their findings.
Critics who downplay the effectiveness of RCA software argue that a software tool cannot anticipate every possible cause of every type of failure event. I agree, which is why software should be used only to enforce the discipline of the methodology and to communicate and archive the findings and recommendations from an analysis. Many plants do a relatively good job of performing RCA but then store their analyses in a folder on a shared network drive or in a filing cabinet that virtually no one can access. I believe there is much to be learned not only from a single analysis but also from patterns across many analyses.
Consider a scenario where misalignment is uncovered as a root cause in five separate RCAs. This would typically indicate a more pervasive issue, but it could easily be overlooked without the ability to query RCA results across analyses.
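A cross-analysis query of this kind can be quite simple once findings are archived in a structured form. The Python sketch below assumes each archived RCA is a record with an `id` and a list of `root_causes`; the record layout, helper names, and sample archive are hypothetical, and an RCA software package would expose its own query interface.

```python
from collections import Counter

# Hypothetical archive of completed analyses.
analyses = [
    {"id": "RCA-01", "root_causes": ["misalignment", "inadequate procedure"]},
    {"id": "RCA-02", "root_causes": ["misalignment"]},
    {"id": "RCA-03", "root_causes": ["misalignment", "poor lubrication"]},
    {"id": "RCA-04", "root_causes": ["misalignment"]},
    {"id": "RCA-05", "root_causes": ["misalignment", "inadequate procedure"]},
    {"id": "RCA-06", "root_causes": ["poor lubrication"]},
]


def recurring_root_causes(analyses, min_count=2):
    """Root causes cited by at least `min_count` analyses, most frequent first.

    Each analysis counts at most once per cause (duplicates are dropped).
    """
    counts = Counter(cause for a in analyses
                     for cause in dict.fromkeys(a["root_causes"]))
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]


def analyses_citing(analyses, cause):
    """IDs of the analyses that list `cause` as a root cause."""
    return [a["id"] for a in analyses if cause in a["root_causes"]]


print(recurring_root_causes(analyses))
print(analyses_citing(analyses, "misalignment"))
```

Five hits for misalignment across otherwise separate analyses is the kind of signal that would point toward a plant-wide alignment practice rather than five unrelated fixes.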
Develop Effective Strategies
Most RCA efforts fail not in the analysis itself but in carrying out the corrective actions. Plant management must set the expectation that an analysis is not complete until an effective strategy is in place to eliminate the effects of the underlying root cause(s). Performing an RCA without effective follow-up on its recommendations is not only a waste of time but can also hurt team morale.
Teams must present their recommendations to plant management for evaluation, accompanied by a detailed implementation plan that gives the cost of each step. Once the plan has been approved, resources have to be assigned to implement it. The analysis team should not necessarily be assigned to implement the corrective action plan, because a good analysis team does not always make a good implementation team.
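The rule that an analysis is not complete until its corrective actions are in place can be built directly into how RCA records are tracked. This Python sketch assumes a hypothetical record structure in which each recommendation carries an estimated cost and a named implementation owner (who need not be on the analysis team); the class and field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recommendation:
    description: str
    estimated_cost: float
    owner: str                 # implementation owner, not necessarily an analyst
    implemented: bool = False


@dataclass
class RootCauseAnalysis:
    failure: str
    recommendations: List[Recommendation] = field(default_factory=list)

    def plan_cost(self):
        """Total estimated cost of the corrective action plan."""
        return sum(r.estimated_cost for r in self.recommendations)

    def is_complete(self):
        """Complete only when there is a plan and every step is implemented."""
        return bool(self.recommendations) and all(
            r.implemented for r in self.recommendations)


# Hypothetical example.
rca = RootCauseAnalysis("P-101 seal leak", [
    Recommendation("Laser-align pump and motor", 2500.0, "Maintenance crew B"),
    Recommendation("Revise alignment procedure", 400.0, "Reliability engineer"),
])
print(rca.plan_cost(), rca.is_complete())
```

Treating an RCA with no recommendations as incomplete reflects the same expectation: finding root causes without a plan to eliminate them does not close the analysis.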
Track Key Performance Indicators
The same system used to track failure events can also provide the metrics that show whether the analysis work is effective. Key Performance Indicators (KPIs) need to be generated and evaluated on a monthly basis. The metrics selected must be specific to the failure being studied.
For instance, if maintenance cost was the business driver behind initiating the analysis, then maintenance cost should be tracked to measure the bottom-line return. In a continuous process plant, asset utilization is a key factor that typically drives the initiation of an RCA. Whatever the case may be, it is critical to select the metrics that will demonstrate success and maintain the momentum to solve other problems.
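A monthly KPI comparison can be derived from the same failure log. The Python sketch below assumes the chosen metric is downtime hours for one failure and that corrective actions took effect in a known month; the helper names, the sample events, and the April cutover are hypothetical. Months with no events are filled with zero so that failure-free months count toward the average.

```python
from datetime import date


def month_range(start, end):
    """All (year, month) pairs from start to end, inclusive."""
    year, month = start
    months = []
    while (year, month) <= end:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def monthly_totals(events, start, end):
    """Sum a metric per month; `events` is a list of (date, value) pairs."""
    totals = {m: 0.0 for m in month_range(start, end)}
    for day, value in events:
        key = (day.year, day.month)
        if key in totals:
            totals[key] += value
    return totals


def before_after(monthly, change_month):
    """Average monthly value before and from `change_month` onward."""
    before = [v for m, v in monthly.items() if m < change_month]
    after = [v for m, v in monthly.items() if m >= change_month]

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(before), mean(after)


# Hypothetical downtime hours for one failure.
events = [(date(2024, 1, 5), 10), (date(2024, 1, 20), 6), (date(2024, 2, 11), 8),
          (date(2024, 3, 2), 12), (date(2024, 5, 9), 3), (date(2024, 6, 15), 2)]
monthly = monthly_totals(events, (2024, 1), (2024, 6))
before, after = before_after(monthly, (2024, 4))
print(f"before: {before:.1f} h/month, after: {after:.1f} h/month")
```

The same functions work for any metric that can be attached to a dated event, such as repair cost or lost production units.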
Celebrate Success
As with any accomplishment, we need to celebrate when we achieve a level of success. This is a vital step in the RCA process that is easily overlooked. Most analysis teams are modest groups that do not like to take public credit for their hard work. It is up to management to recognize these major accomplishments and sponsor frequent celebrations to encourage future success.
However, monetary or other large rewards can backfire. One team may feel that it did not get as large a reward for its accomplishments as another team, and bad feelings can result. Group celebrations and symbolic rewards, such as a team jacket or a baseball cap with the team’s name on it, are well received and less likely to cause conflict among teams. These rewards may seem insignificant, but they are the same types of rewards that have been used for over 30 years to promote improved safety in plants. Employees wear them with pride in their collective accomplishment.
Another lesson that we can learn from our plant safety achievements is the need to advertise and communicate our success. Most plants have a progress board at the front gate to communicate their safety record for all to see. Many meetings begin by talking about a safety issue. These techniques are very effective in communicating the need for improved safety.
We can apply these same techniques to communicating our RCA successes. We can publish an article in the company newsletter or an industry trade magazine. We can post RCA team activities and successes on the company intranet. When RCA activities are communicated effectively and continually, they are less likely to be perceived as the “program of the month” and more likely to become part of the plant culture.
Repeat the Process
In the early 1990s we heard a lot about continuous improvement, a term that stemmed from the quality movement in the U.S. We do not hear it quite as much today, but the concept still prevails. Once we solve a critical problem, there is always another concern to address. RCA on significant issues must be done on a continuous basis. We must change our culture to a mindset in which failure is no longer accepted or tolerated.
Every person should view problem solving and plant reliability as a personal responsibility and part of the job. Just as we are responsible for our own safety, we must also take the same responsibility for making our plants reliable.