Maintenance Resources On-Line Magazine, May 2003
Ronald L. Hughes, Senior Consultant, RCI
To be a good failure analyst, one must also be a good manager. After all, failure analysis, or problem solving, is more than just brainstorming a solution to an identified problem. Successful analysis can be achieved only when a structured technique that uncovers the facts of the incident under investigation is used and adhered to at every step of the analysis process. As the manager, or Principal Analyst, of the failure investigation, you will find that your management skills are not only put to the test but are an integral part of the investigation.
Managing the Failure Definition
The first step in the analysis effort is to clearly define what constitutes a failure. This may sound simple, but I can assure you that it is not. Ask anyone and they will tell you that they know what their failures are. Explore a little deeper, however, and you will find that while they all know what is breaking down, each of them cares about it for a different reason. The fact is that we all tend to care for different reasons, and many factors directly affect why we care, thereby changing our definition of failure. For example, consider a plant whose production levels are low and whose maintenance, downtime, and parts costs are high. The Operations Manager considers the low production levels to be the failure, while the Maintenance Manager considers poor Mean Time Between Failure (MTBF) and Mean Time To Repair (MTTR) to be the failure. The Plant Manager considers the low bottom line to be the failure, while the maintenance staff cares about the number of times they must repair the equipment. What we have here is clearly a failure, but a different failure definition at every level of the organization. Now add another factor that affects how we feel about the failure: the business environment. In a non-sold-out condition, low production levels are not as big a problem as high maintenance costs. Conversely, in a sold-out condition, maintenance costs are not nearly as important as production levels and downtime. The job of the Principal Analyst is to recognize these factors and apply the necessary focusing tools (Impact-Effort Matrix, Decision by Pairs, Force Field Analysis, Failure Modes and Effects Analysis, etc.) to uncover the failures that represent the greatest potential return or unrealized opportunity, based on the right definition of failure for the facility.
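To see how the business environment can change which failure matters most, the sketch below ranks failure modes by annual loss, one simple way of quantifying opportunity. It is a minimal Python illustration under stated assumptions, not RCI's method: the `FailureMode` fields, the loss formula (events per year × (downtime hours × value of lost output per hour + repair cost per event)), and all of the numbers are hypothetical. The sold-out case values every lost hour at a margin; the non-sold-out case assumes lost production can be made up later, so its value is set to zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    events_per_year: float   # how often the failure occurs
    downtime_hours: float    # production time lost per event
    repair_cost: float       # parts and labour per event ($)


def annual_loss(mode: FailureMode, lost_output_per_hour: float) -> float:
    """Yearly cost of a failure mode for a given value of lost output ($/hour)."""
    per_event = mode.downtime_hours * lost_output_per_hour + mode.repair_cost
    return mode.events_per_year * per_event


def rank(modes: List[FailureMode], lost_output_per_hour: float) -> List[str]:
    """Names of the failure modes, largest annual loss first."""
    ordered = sorted(modes, key=lambda m: annual_loss(m, lost_output_per_hour), reverse=True)
    return [m.name for m in ordered]


# Hypothetical plant data, for illustration only.
modes = [
    FailureMode("Pump seal leak", events_per_year=12, downtime_hours=2, repair_cost=3000),
    FailureMode("Dryer drive trip", events_per_year=4, downtime_hours=20, repair_cost=500),
]

print(rank(modes, lost_output_per_hour=1000))  # sold out -> ['Dryer drive trip', 'Pump seal leak']
print(rank(modes, lost_output_per_hour=0))     # not sold out -> ['Pump seal leak', 'Dryer drive trip']
```

The same two failure modes swap places depending on the business environment, which is why the failure definition has to be settled before choosing what to analyze.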
Managing the Scope of the Analysis
Don’t bite off more than you can chew! The size and scope of the analysis you intend to tackle should not exceed the resources available for the analysis effort. In other words, the scope of the analysis should be directly proportional to the resources available to conduct it. Always remember that the bigger the scope, the bigger the analysis. Process- or system-related analyses tend to be the largest because of the many variables associated with their modes of failure, whereas single-component analyses tend to be the smallest because relatively few variables are associated with a single item. The key is to determine what is really important and what you can reasonably manage. This is easily done if you have already quantified the opportunity by performing a Failure Modes and Effects Analysis (FMEA) and know what resources are on hand, because the scope and the opportunity have then already been identified. The goal is to eliminate failure and recover opportunity as quickly as possible by going after the biggest “bang for the buck”. In essence, limit the scope of the analysis at an early stage and get a payback as soon as possible. Doing so makes it easier to dedicate resources to analyses that are larger in scope and therefore take longer to resolve. Although the analysis with the largest scope may have the greatest potential return, it is not always the best one to go after first. Managing scope matters because an incomplete effort is worse than a smaller, completed problem resolution. In effect, don’t go after world hunger on your first attempt: although it is an attractive opportunity, it may be more than you can chew with the resources at hand.
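One way to put “bang for the buck” into practice is a simple greedy plan: rank candidate analyses by expected return per hour of effort and take them in that order while resources remain. This is a hypothetical Python sketch, not a method described in the article; the `Candidate` fields, the person-hour budget, and the figures are invented for illustration, and greedy ordering by ratio is a heuristic rather than a guaranteed optimal selection.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    annual_return: float   # opportunity recovered if the root causes are eliminated ($/yr)
    effort_hours: float    # estimated team effort to complete the analysis


def plan_analyses(candidates: List[Candidate], available_hours: float) -> Tuple[List[str], float]:
    """Greedy selection: best return per hour first, skipping any analysis
    that would exceed the hours still available. Returns (chosen, hours left)."""
    chosen: List[str] = []
    remaining = available_hours
    for c in sorted(candidates, key=lambda c: c.annual_return / c.effort_hours, reverse=True):
        if c.effort_hours <= remaining:
            chosen.append(c.name)
            remaining -= c.effort_hours
    return chosen, remaining


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("Whole-line throughput", annual_return=500_000, effort_hours=2_000),
    Candidate("Pump seal leaks", annual_return=60_000, effort_hours=120),
    Candidate("Conveyor belt tracking", annual_return=30_000, effort_hours=80),
]

print(plan_analyses(candidates, available_hours=300))
# -> (['Pump seal leaks', 'Conveyor belt tracking'], 100)
```

The largest-scope analysis has the biggest potential return but does not fit the current budget, so it waits until the quicker paybacks free up resources.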
Managing the Failure Data
One of the most challenging aspects of any failure analysis effort is managing the data needed to solve the failure. Failure data is the key that unlocks the mystery when problem solving, because the data tells you the facts of the failure. Therefore, the management of failure data is vital to the successful outcome of the analysis.
It is not enough merely to sit down and identify the data needed to find the root cause(s) of failure; you must also develop and implement a data collection strategy that preserves the integrity of the failure data. That means identifying not just the person responsible for collecting the data, but also how they are going to obtain it and what they are going to do with it once it has been collected. Think of it like a police investigation. The forensic strategy ensures that all the evidence is collected and stored until needed: pictures are taken, evidence is bagged and tagged for use in the investigation and in court, all the witnesses are interviewed and their statements recorded, and locations and times are noted to establish all the positional information. The collection of failure data should receive the same stringent attention to detail as the evidence collected at any crime scene.
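A data collection strategy of this kind can be written down as a plan in which every item of evidence names who collects it, how, and what happens to it afterward. The Python sketch below is hypothetical: the `DataItem` fields, the evidence categories (loosely following the police analogy of parts, positions, and people), the names, and the dates are illustrative assumptions. It flags items missing a collector, method, or disposition, and lists items in the order they are needed.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DataItem:
    description: str   # the evidence needed
    category: str      # illustrative: "part", "position", "people", ...
    collector: str     # person responsible for collecting it
    method: str        # how it will be obtained
    disposition: str   # where it goes once collected
    due: date          # date the data is required


def incomplete_items(plan: List[DataItem]) -> List[str]:
    """Items that are missing a collector, a method, or a disposition."""
    return [item.description for item in plan
            if not (item.collector and item.method and item.disposition)]


def by_due_date(plan: List[DataItem]) -> List[str]:
    """Item descriptions in the order they are needed."""
    return [item.description for item in sorted(plan, key=lambda item: item.due)]


# Hypothetical collection plan, for illustration only.
plan = [
    DataItem("Failed bearing", "part", "J. Smith", "bag and tag at removal",
             "locked evidence cabinet", date(2003, 5, 12)),
    DataItem("Operator statements", "people", "A. Lee", "interview within 24 hours",
             "", date(2003, 5, 9)),
    DataItem("Photos of pump alignment", "position", "", "photograph before disassembly",
             "analysis file", date(2003, 5, 8)),
]

print(incomplete_items(plan))  # -> ['Operator statements', 'Photos of pump alignment']
print(by_due_date(plan))       # -> ['Photos of pump alignment', 'Operator statements', 'Failed bearing']
```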
Managing the Analysis Team
Managing the analysis team consists of more than just managing the people. It includes making sure you have the right team, not only in size but also in makeup. A common mistake made by most organizations is to form an ad hoc committee composed entirely of subject matter experts (led by the most senior or experienced of the experts) to deal with the egregious effects of the incident being investigated. The results tend to be pre-tailored solutions to the specific problem, based on the expertise of the team. Make no mistake about it: although subject matter experts are absolutely necessary to solve the failure, they should be complemented by individuals who have little or no knowledge of the failure being investigated, to make sure all the possibilities are covered. Non-subject-matter experts bring the element of questioning to the table. When they ask a question such as “Can this happen or occur?”, the subject matter experts must then think about the possibility and answer yes or no. The problem with a team composed solely of subject matter experts is that they often overlook possibilities because of their intimate knowledge of the failure. They believe that they already know why the failure is occurring and want to follow that path to uncover the root cause(s). Non-subject-matter experts want to explore all the possibilities because they have no pre-conceived notions.
It is not necessary for the Principal Analyst to be a subject matter expert in the failure. Quite the contrary: being one is often a detriment to the analysis effort, because he will also have developed pre-conceived notions as to why the failure is occurring. What the Principal Analyst needs to be an expert in is the science of Problem Solving or Failure Analysis.
The ideal analysis team is usually made up of 5 to 7 cross-functional people who share a common goal and commitment to solving the failure under investigation. Proper management of the team involves not only selecting the right people but also assigning them correctly. Each must have clearly defined roles and duties based on his or her unique strengths and weaknesses. For example, every team needs a critic to keep the team honest. Fortunately, every organization seems to have an abundance of people with this characteristic. The job of the Principal Analyst is to make sure this individual is critical but not to the point of disruption.
Managing the Analysis Effort
The first step in managing the actual analysis effort is to determine what you expect from the final outcome. This is easily accomplished by developing a charter that clearly states the terminal objective of the analysis, and it is further enhanced by developing critical success factors that will tell you whether or not the terminal objective has been achieved. For example, if you are solving an administrative problem such as slow invoice processing, your charter could read something like the following:
“Uncover the root causes of the recurring invoice processing problems. This includes identifying deficiencies in or lack of management systems. Appropriate recommendations for root causes will be communicated to management for rapid resolution.”
Examples of possible critical success factors could include the following:
- Reduce invoice processing turnaround time from two weeks to one week.
- No lost invoices.
- No incorrect invoices.
- Maintain an invoice tracking system that is 100% accurate.
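Critical success factors work best when each one can be checked against a measurement. The Python sketch below expresses the four example factors as hypothetical pass/fail tests; the measurement names, the reading of “one week” as seven calendar days, and the sample results are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Measurements = Dict[str, float]


@dataclass
class SuccessFactor:
    name: str
    is_met: Callable[[Measurements], bool]   # pass/fail test on measured results


CSFS = [
    SuccessFactor("Turnaround of one week or less", lambda m: m["turnaround_days"] <= 7),
    SuccessFactor("No lost invoices", lambda m: m["lost_invoices"] == 0),
    SuccessFactor("No incorrect invoices", lambda m: m["incorrect_invoices"] == 0),
    SuccessFactor("Tracking system 100% accurate", lambda m: m["tracking_accuracy"] == 1.0),
]


def unmet_factors(factors: List[SuccessFactor], measured: Measurements) -> List[str]:
    """Names of the critical success factors not yet achieved."""
    return [f.name for f in factors if not f.is_met(measured)]


# Hypothetical results after a first round of corrective actions.
measured = {"turnaround_days": 6, "lost_invoices": 0,
            "incorrect_invoices": 2, "tracking_accuracy": 1.0}

print(unmet_factors(CSFS, measured))  # -> ['No incorrect invoices']
```

An empty result would indicate that every factor, and therefore the terminal objective, has been met.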
Developing a good charter and critical success factors gives the team a common goal and a focusing mechanism that keeps it on track and stops it from straying off on tangents. Once the failure analysis begins, the goal of the Principal Analyst is to make sure that the logic is sound and that all hypotheses have been proven or disproved. Here it is important to understand that the Principal Analyst manages the analysis and is responsible for its successful outcome: he owns the process, and the team owns the failure. With this in mind, if the team can prove it to the Principal Analyst, then he can subsequently prove it to management.
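Checking that every hypothesis has been proven or disproved can be pictured as a walk over the logic tree. In the hypothetical Python sketch below, each hypothesis carries a status and a note on how it was verified; a disproved hypothesis closes its branch, while the causes beneath a proven or untested hypothesis remain open until they are tested. The class, the branch-closing rule, and the bearing example are assumptions for illustration, not a description of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    statement: str
    proven: Optional[bool] = None   # None = not yet tested, True = proven, False = disproved
    verification: str = ""          # how the hypothesis was tested
    causes: List["Hypothesis"] = field(default_factory=list)


def untested(node: Hypothesis) -> List[str]:
    """Hypotheses that still need to be proven or disproved.
    A disproved hypothesis closes its branch, so its causes are skipped."""
    pending = [node.statement] if node.proven is None else []
    if node.proven is not False:
        for cause in node.causes:
            pending.extend(untested(cause))
    return pending


# Hypothetical logic tree, for illustration only.
tree = Hypothesis("Bearing failed", True, "inspection of the failed part", [
    Hypothesis("Lubrication failure", True, "oil analysis", [
        Hypothesis("Wrong lubricant specified"),
        Hypothesis("Lubrication interval missed", False, "PM records reviewed"),
    ]),
    Hypothesis("Overload", False, "load data reviewed", [
        Hypothesis("Process upset"),
    ]),
])

print(untested(tree))  # -> ['Wrong lubricant specified']
```

Until this list is empty, the logic is not yet ready to be taken to management.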
Often during the logic tree development portion of the analysis, team members will disagree and some conflict will result. This conflict is not necessarily a bad thing. With conflict comes valuable discussion. As long as the conversation is pertinent to the analysis and provides benefit, it should be allowed to continue. The trick is to keep this conflict from becoming confrontational and therefore detrimental to the analysis. One management technique for maintaining control during the analysis is for the Principal Analyst to ask questions that help clarify points. Questioning not only minimizes conflict between team members but also keeps the team focused. This is especially important for those team members who are not subject matter experts in the failure under investigation.
Managing the Final Report
The final report is the alpha and omega of the failure. It represents the culmination of the analysis effort and the beginning of failure elimination. Remember that the goal of any failure analysis should be the elimination of the identified causes. The final report is the tool used to obtain the resources necessary to implement solutions to the uncovered root cause(s) of the failure, thereby achieving that goal. In essence, the final report can be thought of as a sales tool and should be developed with that in mind. At a minimum, the final report should not only provide solutions with expected returns on investment but also identify how the failure occurred in the first place. To accomplish this, the report should include an event summary, a description of the failure mechanism, and a list of recommendations.
The event summary is nothing more than a brief description of how the failure was first noticed, how long it has been going on, and the method(s) used to isolate it or mitigate its consequences.
The failure mechanism can be thought of as a summary of the root cause(s) that led to failure occurrence. It chronologically characterizes the things that must occur in order for the failure to manifest itself.
The list of recommendations should explain not only what is to be done, when, and who is responsible for implementation, but also include a detailed cost-benefit ratio for each recommendation.
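The article does not specify how the cost-benefit ratio is calculated. As a hedged illustration, the Python sketch below assumes one common form, expected benefit over a chosen horizon divided by one-time implementation cost, alongside a simple payback period. The `Recommendation` fields, owners, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    action: str                  # what is to be done
    owner: str                   # who is responsible for implementation
    due: str                     # when it is to be done
    implementation_cost: float   # one-time cost ($)
    annual_benefit: float        # expected yearly savings ($/yr)

    def benefit_cost_ratio(self, years: float = 1.0) -> float:
        """Benefit over `years` divided by cost; above 1.0 the benefit exceeds the cost."""
        return self.annual_benefit * years / self.implementation_cost

    def payback_months(self) -> float:
        """Simple payback: months of savings needed to recover the cost."""
        return 12 * self.implementation_cost / self.annual_benefit


# Hypothetical recommendations, for illustration only.
recs = [
    Recommendation("Revise seal installation procedure", "Maintenance planner", "Q3",
                   implementation_cost=5_000, annual_benefit=40_000),
    Recommendation("Install seal flush system", "Project engineer", "Q4",
                   implementation_cost=30_000, annual_benefit=20_000),
]

for r in recs:
    print(f"{r.action} ({r.owner}, {r.due}): ratio {r.benefit_cost_ratio():.2f}, "
          f"3-yr ratio {r.benefit_cost_ratio(3):.2f}, payback {r.payback_months():.1f} months")
```

The second action fails a one-year test but pays for itself within eighteen months, so the report should state the time horizon behind each ratio.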
The success or failure of your problem solving efforts often depends on the management strategies used to conduct the analysis. A sound management strategy must be devised and put into place for every step in the Root Cause Analysis process in order for the analysis to be both effective and efficient.
Obviously, collecting and maintaining the paperwork associated with the failure investigation can be a daunting task. For this reason, software designed specifically for this purpose is extremely beneficial and highly recommended. Although there are several packages on the market, RCI’s PROACT® is by far the best and most complete of the software packages designed for this purpose.
RCI’s PROACT® software not only makes this difficult job seem almost effort free, but also provides a mechanism that allows easy and ready access to all the pertinent data associated with the analysis, including the structured logic tree. Failure data is maintained in a database unique to the failure and can be sorted by type, person responsible for its collection, date required, etc.
Of equal importance to the analysis is keeping track of the verification techniques used for the hypotheses about how the failure occurred. PROACT® automatically requires the completion of a verification log once a hypothesis is identified. This log can then be retrieved at any time to determine how to proceed with the analysis. In addition, PROACT® has many features that help the analyst do his job. It will help you determine the critical success factors for the analysis, write a report on the analysis, communicate your findings to management, and track the results of your analysis efforts, to name just a few.
As a failure analyst I find that PROACT® is an invaluable tool for doing my job. My analysis efforts are not only easily managed, but are much quicker than ever before.
Mr. Hughes, a mechanical engineer, is a member of the American Society of Mechanical Engineers (ASME) and the American Society for Training and Development (ASTD). He is currently a Senior Training and Reliability Consultant with Reliability Center, Inc. (an engineering and consulting firm). His expertise encompasses all areas of Human and Plant Reliability, including the training/mentoring and facilitation of Root Cause and Opportunity Analysis efforts worldwide for client companies.