High reliability organization

A high reliability organization (HRO) is an organization that strives to achieve the highest possible reliability. The HRO concept was developed at the University of California, Berkeley, where researchers studied similarities in how aircraft carriers, air traffic control, and nuclear power plants operate. These are complex, high-risk systems. Like Charles Perrow's normal accident theory (NAT), HRO focuses on system safety. Where Perrow argued that accidents are inevitable and therefore normal, HRO emphasizes ways of working in environments that tolerate few or no errors.

Two approaches are distinguished within HRO: prevention through anticipation and containment through resilience.

Prevention

To prevent incidents as far as possible, HROs devote considerable attention to identifying possible causes and circumstances of accidents, after which alternatives are sought or procedures are drawn up. These procedures are regularly revised to keep them in line with practice, and they are meant to reduce variation and uncertainty in execution. This approach has limitations, however, because not every situation and variation can be foreseen. Nancy Leveson, among others, argues that procedures are almost never followed exactly, for rational reasons, and that it is therefore not surprising that accident investigations find human 'error' to be the cause of 70 to 80% of accidents.^[1]

This points to another limitation of reliability: it is not the same as safety. A system can be very reliably unsafe, and vice versa.^[2] This shows, for example, when operators prevent accidents precisely by not following the procedures. They are then not reliable, but they are safe.^[3] One reason is that it is impossible to cover every situation in procedures. In addition, a growing number of procedures fosters complexity and makes the organization less flexible in adopting new methods and techniques.

Resilience

To compensate for the limitations of prevention, HROs need sufficient resilience to keep errors from escalating. This means:

absorbing errors (cushioning)
recovering from errors (rebounding)

The first is often achieved through redundancy and a degree of slack or loose coupling. The second is pursued by trying to detect and resolve even small errors quickly.
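
As a rough illustration of how redundancy can absorb errors, consider a simplified model that does not come from the literature cited here: n redundant components, each failing independently with probability p. The symbols n and p, the example values, and the independence assumption are introduced only for this sketch.

```latex
P(\text{all } n \text{ components fail}) = p^{n},
\qquad \text{e.g. } p = 0.01,\ n = 3 \;\Rightarrow\; p^{n} = 10^{-6}.
```

Independence rarely holds fully in practice. A common cause that affects several components at once makes the real figure worse than this estimate.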

This is pursued in resilience engineering, among other fields.

Literature

Leveson, N. G. (2012): Engineering a Safer World. Systems Thinking Applied to Safety, The MIT Press

Notes

[1] As many human factors experts have found, instructions and written procedures are almost never followed exactly as operators try to become more efficient and productive and to deal with time pressures. In studies of operators, even in such highly constrained and high-risk environments as nuclear power plants, modification of instructions is repeatedly found. When examined, these violations of rules appear to be quite rational, given the workload and timing constraints under which the operators must do their job. The explanation lies in the basic conflict between error viewed as a deviation from normative procedure and error viewed as a deviation from the rational and normally used effective procedure.
One implication is that following an accident, it will be easy to find someone involved in the dynamic flow of events that has violated a formal rule by following established practice rather than specified practice. Given the frequent deviation of established practice from normative work instructions and rules, it is not surprising that operator “error” is found to be the cause of 70 percent to 80 percent of accidents. As noted in the discussion of assumption 2, a root cause is often selected because that event involves a deviation from a standard. Leveson (2012)
[2] Safety and reliability are different properties. One does not imply nor require the other: A system can be reliable but unsafe. It can also be safe but unreliable. In some cases, these two properties even conflict, that is, making the system safer may decrease reliability and enhancing reliability may decrease safety. The confusion on this point is exemplified by the primary focus on failure events in most accident and incident analysis. Some researchers in organizational aspects of safety also make this mistake by suggesting that high reliability organizations will be safe. [...] In complex systems, accidents often result from interaction among perfectly functioning components. Leveson (2012)
[3] If a human operator does not follow the specified procedures, then they are not operating reliably. In some cases that can lead to an accident. In other cases, it may prevent an accident when the specified procedures turn out to be unsafe under the particular circumstances. Examples abound of operators ignoring prescribed procedures in order to prevent an accident. At the same time, accidents have resulted precisely because the operators did follow the predetermined instructions provided to them in their training, such as at Three Mile Island. When the results of deviating from procedures are positive, operators are lauded but when the results are negative, they are punished for being unreliable. Leveson (2012)
