What is Failure Modes and Effects Analysis?

At the most fundamental level, Failure Modes and Effects Analysis (FMEA) provides a structured and systematic approach to aid the identification of potential failures of any engineered system, process or product. But what does this entail, when should it be used and what are the benefits and limitations?

WHAT IS FMEA?

FMEA is a bottom-up, or inductive, risk analysis technique: failures of the individual components of a system or process design are considered one by one to determine their effect upon the wider subsystems and the overall system, in terms of either safety or operational availability. In this sense, FMEA complements top-down, or deductive, analytical methods such as master logic diagrams or fault tree analysis, which start with the overall undesirable consequence and then consider how it may be caused by subsystem and component level failures.

The FMEA method is particularly suited to the assessment of material and equipment failures. It can be carried out to assess safety, reliability, availability or maintenance, and is typically used to identify potential weaknesses in a design, where the failure of a single component may result in wholesale failure.

THE HISTORY OF FMEA

The military origins of the FMEA technique, developed in the 1950s for the U.S. Armed Forces, have been obscured by its subsequent widespread application across almost every industry imaginable. From the 1960s onwards, the U.S. National Aeronautics and Space Administration (NASA) applied FMEA in some form to all its programmes, not least the safety-critical missions involving manned spaceflight, such as Apollo and Skylab. Car manufacturers soon embraced the technique once production volumes had expanded to the extent that the widespread promulgation of hidden, but potentially fatal, design flaws had begun to impact significantly upon corporate reputation. Today, FMEA is employed by reliability engineers working in industries as diverse as food production and semiconductor processing.

BENEFITS

In the developed world, 21st-century society has very little tolerance for the marketing and distribution of products that prove dangerous or life-threatening.

Whether the issue is a design or manufacturing defect (such as the sticking gas pedal affecting an estimated 9 million Toyota cars made between 2004 and 2010) or an unforeseen failure mode (such as the spurious behaviour of the Boeing 737 MAX anti-stall system), any resulting product recall and redesign will inevitably be expensive, and the loss of revenue and reputation could well be crippling.

FMEA is a proven method for the identification of such vulnerabilities, and is, therefore, ideally carried out early in the design process when there is scope to make changes cheaply. As the design evolves, and detail increases, so should the FMEA, continually informing, shaping and optimising the final product. It is important to consider FMEA as an iterative technique that should be carried out at all stages of the design process.

LIMITATIONS

A particular limitation of FMEA is that it focuses on a single component at a time, and does not address the effects of common mode or common cause failures, which arise between components that are similar or identical in design or can otherwise be affected by a shared cause resulting in multiple simultaneous failures.

THE FMEA PROCESS

The practice of FMEA requires that the system or process under assessment be broken down into its fundamental constituent parts, normally grouped by their specific functions, whether they be subsystems, replaceable units or individual components. The level of breakdown chosen depends upon the objectives of the FMEA, as well as the maturity of the design and the availability of design information.

The FMEA may be carried out in a multi-disciplinary workshop or as a desktop exercise by an experienced reliability engineer, and covers the following basic steps:

  • Identify individual system components and their function within the system
  • Deduce credible failure modes for each component
  • Determine failure causes for each component failure mode
  • Establish failure effects for each failure mode at a local and system level
  • Identify available failure detection means and safeguards (both preventive and mitigative) for each failure mode
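As a minimal sketch, the steps above can be captured as one worksheet row per failure mode. The field names, the example components (P-101, V-102) and the helper `undetected_modes` are all hypothetical, chosen only to illustrate the structure; they are not taken from any standard template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureModeEntry:
    """One row of an FMEA worksheet (illustrative field names)."""
    component: str          # step 1: the component...
    function: str           # ...and its function within the system
    failure_mode: str       # step 2: a credible failure mode
    causes: List[str]       # step 3: causes of that failure mode
    local_effect: str       # step 4: effect at the component level
    system_effect: str      # step 4: effect on the overall system
    detection: List[str] = field(default_factory=list)   # step 5
    safeguards: List[str] = field(default_factory=list)  # step 5


def undetected_modes(worksheet: List[FailureModeEntry]) -> List[FailureModeEntry]:
    """Return the entries that have no recorded means of detection."""
    return [entry for entry in worksheet if not entry.detection]


# Hypothetical example system: a cooling water pump and its discharge valve.
worksheet = [
    FailureModeEntry(
        component="Cooling water pump P-101",
        function="Circulate cooling water",
        failure_mode="Fails to start on demand",
        causes=["Motor winding failure", "Loss of electrical supply"],
        local_effect="No flow from pump",
        system_effect="Loss of cooling, plant trip",
        detection=["Low flow alarm"],
        safeguards=["Standby pump"],
    ),
    FailureModeEntry(
        component="Discharge valve V-102",
        function="Isolate pump discharge",
        failure_mode="Stuck partially closed",
        causes=["Corrosion of valve stem"],
        local_effect="Reduced flow through valve",
        system_effect="Degraded cooling capacity",
    ),
]

for entry in undetected_modes(worksheet):
    print(f"No detection recorded: {entry.component} ({entry.failure_mode})")
```

Keeping local and system level effects as separate fields makes it harder to conflate the two when the worksheet is filled in.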

It is important that practitioners distinguish between local failure effects, which describe the loss of function of the component under consideration, and system level effects, which capture the impact of the failure on the overall system function and wider objectives, such as personnel safety or asset availability.

The FMEA family of techniques (see also FMECA and FMEDA, discussed below) is described in a dedicated but generic standard, IEC 60812. Whilst this does not offer any specific guidance concerning the application of FMEA to safety, it nonetheless addresses reliability assurance and recognises the role played by FMEA studies in support of safety assessments.

OUTPUT

In general, the details of the FMEA will be compiled into a table. In some applications, such as availability or reliability modelling, FMEA also includes an estimation of the probability of occurrence of each failure mode and of the severity of the effect of the failure. This enhances the analysis by providing a quantitative measure of the importance of each failure mode, but requires caution where there is a high level of redundancy (in which case more appropriate modelling techniques, such as fault tree analysis or reliability block diagrams, should be used).

Knowledge is power: armed with a rigorous and robust FMEA, the design team then has the opportunity to intervene, typically through applying risk scoring to each entry in the table, to identify and prioritise those areas where a redesign is most likely to be beneficial.
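One widely used form of risk scoring, offered here only as an illustration, is a risk priority number: the product of severity, occurrence and detection ratings. The 1 to 10 scales, the ratings and the failure modes below are assumptions for the sake of the sketch, and real studies define their own scales.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply three 1-10 ratings; a larger product means a higher priority.

    The detection rating is assumed to increase as the failure becomes
    harder to detect, so poorly detected failures score higher.
    """
    ratings = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10, got {value}")
    return severity * occurrence * detection


def rank_by_risk(entries):
    """Sort table entries so that the highest-scoring failure modes come first."""
    return sorted(
        entries,
        key=lambda e: risk_priority_number(e["severity"], e["occurrence"], e["detection"]),
        reverse=True,
    )


# Hypothetical table entries with assumed ratings.
entries = [
    {"failure_mode": "Pump fails to start", "severity": 7, "occurrence": 4, "detection": 2},
    {"failure_mode": "Valve stuck closed", "severity": 8, "occurrence": 2, "detection": 6},
    {"failure_mode": "Sensor drifts high", "severity": 5, "occurrence": 5, "detection": 3},
]

ranked = rank_by_risk(entries)
for e in ranked:
    score = risk_priority_number(e["severity"], e["occurrence"], e["detection"])
    print(f"{score:4d}  {e['failure_mode']}")
```

A simple product like this treats the three ratings as interchangeable, so a ranking of this kind is best read as a prompt for engineering judgement and not as a precise measure of risk.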

VARIATIONS

FMECA
The FMEA technique lends itself readily to adaptation and extension, which explains its widespread application across diverse industries and design disciplines. For example, Failure Modes, Effects and Criticality Analysis (FMECA) incorporates an additional ranking of the severity of failure modes to indicate which are most likely to adversely influence safe or reliable operation. FMECA allows the calculation of criticality by combining the severity metric with the frequency of occurrence of the failure, thereby facilitating the prioritisation of associated countermeasures.
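A minimal sketch of how severity and frequency might be combined is a qualitative criticality matrix. The category names, the numeric weights and the band thresholds below are assumptions made for illustration; a real FMECA would take them from the project's own risk criteria.

```python
# Assumed category scales: a higher number is worse.
SEVERITY = {"minor": 1, "marginal": 2, "critical": 3, "catastrophic": 4}
FREQUENCY = {"remote": 1, "occasional": 2, "probable": 3, "frequent": 4}


def criticality(severity: str, frequency: str) -> int:
    """Combine the severity and frequency categories into a single score."""
    return SEVERITY[severity] * FREQUENCY[frequency]


def criticality_band(score: int) -> str:
    """Map a score to a priority band (thresholds are illustrative only)."""
    if score >= 9:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


# Hypothetical failure modes with assumed categories.
failure_modes = [
    ("Relief valve fails closed", "catastrophic", "remote"),
    ("Controller output freezes", "critical", "probable"),
    ("Indicator lamp fails", "minor", "occasional"),
]

for name, severity, frequency in failure_modes:
    score = criticality(severity, frequency)
    print(f"{name}: score {score}, {criticality_band(score)} priority")
```

Countermeasures would then be directed first at the failure modes falling in the highest band.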

FMEDA
A further extension to FMEA is Failure Modes, Effects and Diagnostics Analysis (FMEDA), which adds an assessment of the diagnostic coverage of a safety instrumented system design. The FMEDA technique is useful where achieving high reliability requires a comprehensive online diagnostic capability.
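As a rough sketch, diagnostic coverage is commonly expressed in functional safety practice as the fraction of the dangerous failure rate that the diagnostics reveal. The failure modes and rates below are invented for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModeRates:
    """Dangerous failure rates for one failure mode, in failures per 10^9 hours."""
    name: str
    dangerous_detected: float    # revealed by online diagnostics
    dangerous_undetected: float  # hidden until a proof test or a demand


def diagnostic_coverage(modes: List[ModeRates]) -> float:
    """Detected dangerous failure rate divided by total dangerous failure rate."""
    detected = sum(m.dangerous_detected for m in modes)
    total = detected + sum(m.dangerous_undetected for m in modes)
    if total == 0:
        raise ValueError("no dangerous failures recorded; coverage is undefined")
    return detected / total


# Hypothetical pressure transmitter with assumed failure rates.
transmitter = [
    ModeRates("Output stuck at last value", dangerous_detected=90, dangerous_undetected=10),
    ModeRates("Output drifts low", dangerous_detected=30, dangerous_undetected=70),
]

coverage = diagnostic_coverage(transmitter)
print(f"Diagnostic coverage: {coverage:.0%}")
```

Working per failure mode shows which modes contribute most to the undetected rate, and therefore where additional diagnostics would be most useful.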

CONCLUSION

The FMEA method is particularly suited to the assessment of single failures of systems, processes or products as they relate to safety, reliability, availability or maintenance, and is a powerful tool for identifying potential design weaknesses. Its widespread use across diverse industries bears testimony to its versatility and effectiveness.