An introduction to safety-critical software

 

WHY SOFTWARE IS DIFFERENT

Software is often used to implement the functionality of safety systems because it can handle complex functionality, is accurate and repeatable, and can be cheaper than hardware solutions. However, many safety systems have failed because of software-related faults; a small sample of these failures is presented in Box 1.

HOW SOFTWARE FAILS

Failures of safety systems based entirely on “hardwired technology” tend to be dominated by so-called random failures, which are typically related to age or wear. Software-based systems, by contrast, fail predominantly because of systematic errors. This distinction arises because systematic errors can often be identified and removed from a hardwired design, whereas doing so is much more difficult for software because of its greater design complexity and abstract nature. Moreover, software does not wear out, so it does not fail randomly in the same sense as hardware (although the platform on which the software runs will be subject to random failure mechanisms).

The factors that can lead to a software error, which if triggered can cause a system-level failure, are those characteristic of systematic errors, affecting both how such errors are introduced and how they are detected [see Box 2].

Box 2

SAFETY ASSURANCE PROCESSES

The uniqueness and complexity of software-based safety systems mean that a wide range of factors can influence the success or failure of such developments. Fortunately, some steps are generally effective in reducing the risks associated with developing software-based safety systems.

These steps revolve around safety assurance, i.e. the planning, development, verification and configuration management processes that ensure the software meets its safety objectives. Key steps include:

  • Explicitly identify all safety functional and integrity requirements before starting the software design phase. Mistakes or omissions become more difficult and expensive to rectify the later they are discovered, and significant software modifications can be a major cause of systematic error.
  • Identify at the outset how evidence will be generated to show that each safety requirement has been met. This informs the design process and ensures that the necessary evidence is produced as the software is developed, since generating it retrospectively is usually very expensive.
  • When considering the integration of previously developed software components, confirm that safety assurance evidence is available for them, in order to reduce cost and project risk.
  • Minimise the number of personnel developing the software system and ensure all interfaces are well defined. Increasing the number of personnel in order to shorten the development timescale will increase the number of interfaces, potentially leading to a greater number of errors.
  • For commercial off-the-shelf components (e.g. electrical protection relays, PLC shutdown systems), consider whether the generic safety assurance evidence is adequate in the context of the specific safety system in which each component will be deployed. For example, a failure that causes a valve to close could be safe in one system but result in a disastrous over-pressurisation event in another.
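The second step above, deciding at the outset how each safety requirement will be evidenced, is commonly supported by a requirements-to-evidence traceability record. The following Python sketch is purely illustrative: the `SafetyRequirement` record, the helper names `unplanned` and `outstanding`, and the example requirement IDs and evidence references are hypothetical and are not drawn from any standard or tool. It flags requirements that have no agreed means of demonstration, and planned evidence that has not yet been produced.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyRequirement:
    """Hypothetical record linking a safety requirement to its evidence."""
    req_id: str
    description: str
    # Kinds of evidence agreed at the outset (e.g. "test", "review").
    planned: set[str] = field(default_factory=set)
    # Evidence produced so far: kind -> reference to the evidence artefact.
    produced: dict[str, str] = field(default_factory=dict)


def unplanned(reqs: list[SafetyRequirement]) -> list[str]:
    """IDs of requirements with no agreed means of showing they are met."""
    return [r.req_id for r in reqs if not r.planned]


def outstanding(reqs: list[SafetyRequirement]) -> dict[str, set[str]]:
    """For each requirement, the planned kinds of evidence not yet produced."""
    gaps = {}
    for r in reqs:
        # Iterating a dict yields its keys, i.e. the kinds already produced.
        missing = r.planned.difference(r.produced)
        if missing:
            gaps[r.req_id] = missing
    return gaps


# Hypothetical example data.
reqs = [
    SafetyRequirement("SR-1", "Close isolation valve on high pressure",
                      planned={"test", "review"}, produced={"test": "TR-014"}),
    SafetyRequirement("SR-2", "Raise alarm on sensor fault"),
]

print(unplanned(reqs))    # ['SR-2']
print(outstanding(reqs))  # {'SR-1': {'review'}}
```

Running such a check throughout development, rather than only at the end, reflects the point above that retrospective evidence generation is usually very expensive.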

CONCLUSION

Identifying software errors in safety systems is not easy, but applying targeted safety assurance processes should help reduce the associated risks to an acceptable level.

This article first appeared in RISKworld Issue 15.