
Root Cause Analysis

27 February 2015 • Ype Wijnia
risk management, policy development, planning, program management, change management

While studying the history of asset management, someone[1] pointed me to the importance of the Piper Alpha accident for the development of the discipline. For those who do not remember it or have never seen it: the Piper Alpha was an oil platform in the North Sea that was completely lost to an explosion and fire on 6 July 1988. A total of 167 people died. To determine the cause, the British government established a commission of inquiry, which in 1990 produced the famous Cullen report. Although the report itself is difficult to obtain[2], there is a video on YouTube[3] in which a member of the inquiry commission explains, in a presentation of about 45 minutes, what exactly went wrong. The story is told through a very sober but also very penetrating analysis. The incident is peeled back layer by layer: from the direct cause, to the reason that cause existed, to why the cause could become dangerous, why it could develop into an acute emergency, and why the emergency ended in a terrible disaster. In other words, a truly excellent example of a root cause analysis. At the same time, it leaves a mixed feeling. On the one hand you are looking at a piece of excellent craftsmanship (the analysis), but on the other hand it is the analysis of an accident in which many people died. I had the impression that the presenter occasionally found it hard as well.

To summarize the story briefly: the accident started with a gas leak from a pump whose safety valve had been removed for routine maintenance and replaced by a cover that was only hand-tightened. The released gas exploded almost immediately. After the explosion the emergency shutdown was activated, which in theory should have isolated the fault. However, the explosion had damaged an oil pipeline, which started a new fire. That fire would eventually have burnt itself out, had oil not continued to be pumped toward it from two nearby platforms. The heat of the fire eventually ruptured two large gas pipelines. The enormous amount of gas that was then released caused a conflagration that engulfed the entire platform[4]. After that, all was lost.

The big question, which is also raised in the presentation, is why a pump that was under maintenance was put back into operation. The reason, of course, is that the operators did not know it was under maintenance. When the other pump failed, the temptation was great to rush the pump under maintenance back into service, because otherwise production would have to be stopped. That the pump absolutely must not be put into operation was recorded on the work order form, but this was not explicitly communicated to the shift because the shift was busy; the form was simply left behind for a signature. The maintenance order for the pump itself contained no reference to the maintenance of the safety valve, just as the work order for the safety valve contained no reference to the maintenance of the pump. From the maintenance order for the pump it appeared that maintenance had not yet begun, so commissioning should have been possible without too many problems. And because the missing safety valve was not clearly visible (it sat at some distance from the pump, separated by other equipment), it was also not noticed during the work to prepare the pump for service.

This can be seen as an unfortunate coincidence. A pump just happened to fail while two maintenance jobs were to be performed simultaneously on the other pump. One of those jobs had not yet started (if it had, nobody would have tried to bring the pump back into operation), while the other was not completed within the normal shift. Moreover, the temporary provision applied in that job was not clearly visible from the pump. A confluence of four coincidences, then, with an astronomically small likelihood... or not?

What the analysis makes crystal clear is that this was not a coincidence, but an accident waiting to happen. The system for managing maintenance orders was meant precisely to avoid this kind of situation, but it was not used as intended. People were not trained in its proper use; they learned on the job, so usage errors were passed on instead of corrected. A culture had crept in which the forms merely had to be signed (a compliance culture), rather than one in which the next shift had to be informed about the risk status of the platform.

This disregard for risk went much deeper than the system for maintenance orders alone. There was a firewall between the oil and gas sections, but it was not explosion-proof. For an oil platform (which it originally was) that is understandable, but on a gas platform the risk of explosion is at least as high as the risk of fire. When multiple platforms were linked, no thought was given (or at least no training provided) to events on another platform. For the gas pipelines it was known that a rupture would lead to an uncontrollable fire, yet no additional measures were taken to reduce that risk. The one mitigation measure that did exist, an automatic sprinkler system, had been switched to manual control to protect divers, who could be sucked in by the pumps. A legitimate concern, but on balance a measure that, without additional provisions, created more risk than it resolved. The staff was also not trained in what to do in emergencies. When the disaster occurred, most of the crew waited for helicopters to pick them up, even though it was already clear that no helicopter could land on the platform. And even during the disaster, the instruction to evacuate via the water was never given. The summarized conclusion was that there was a total lack of systematic attention to the major risks attached to North Sea oil extraction.

The report on the accident was published in 1990. It contained many recommendations, all of which were implemented in the following years. Possibly the most important was the Safety Case, with which an operator has to demonstrate that the safety of the system is reasonably guaranteed. This requires good risk awareness. The Piper Alpha can thus be seen as the beginning of risk thinking within asset management. Before 1990 you will find hardly anything about risk in the asset management literature, whereas it is now widely recognized as one of its pillars, as evidenced by the focus on risk management in the ISO 55000 standard.

That risk thinking really is the distinctive part of asset management is something we see in practice every day. Many organizations that want to introduce asset management cannot help claiming that they have been practicing it for a long time already, and that it is therefore nothing new. For cost/benefit analysis that may be the case, though you may well question the value assigned to certain benefits. But in the risk analysis these deniers are mercilessly exposed. Not complying with certain rules is presented as a cause of risk: the car is unsafe because it fails the MOT. Sometimes it takes some persuasion, but eventually the penny drops: cause and effect are the other way around. The car fails the MOT because it is unsafe. Only when this shift in thinking from compliance to risk is achieved can you really start with asset management.


[1] This person was John Woodhouse, at the 9th World Conference on Engineering Asset Management

[2] No digital version is available

[3] Brian Appleton, on https://www.youtube.com/watch?v=S9h8MKG88_U

[4] For the transition from the oil fire to the gas fire, see https://www.youtube.com/watch?v=pHriwdaEbms at 6:00 minutes


Ype Wijnia is partner at AssetResolutions BV, a company he co-founded with John de Croon. Each in turn, they give their view on an aspect of asset management in a biweekly column. The columns are published on the website of AssetResolutions, http://www.assetresolutions.nl/en/column
