System Failure Case Study – Ice Falling from Port Mann Bridge Cables

Background:  The Port Mann Bridge is a ten-lane cable-stayed bridge that spans the Fraser River east of Vancouver, B.C.  At the time of its opening in 2012, it was the widest bridge in the world at 65 m (213 ft).  It is a key artery for Metro Vancouver, with about 100,000 crossings per day.

What Happened:  Shortly after opening, during a winter storm in December 2012, “ice bombs” – clumps of ice and snow – dropped from the overhead cables, hitting and damaging over 300 vehicles.  Windshields were broken, and dents were left in roofs and along the sides of cars.  A few people suffered injuries during the incident.

Proximate Cause:  Insufficient measures were in place to mitigate the hazard of ice and snow falling from the overhead cables, which, in the case of the Port Mann Bridge, cross over the roadway.  The contract between Transportation Investment Corporation (TI Corp.), the Crown corporation responsible for the Port Mann, and Kiewit/Flatiron General Partnership, included the requirement that “Cables and structure shall be designed to avoid ice build-up from falling into traffic.”  The bridge’s designer maintained that measures including spacing the cable anchors away from traffic, using central pylons that avoid large cross frames over traffic, and using HDPE stay pipes were sufficient to minimize the risks associated with snow and ice accumulation.

Underlying Issues: There were many parties involved in the issue – the owner/operator, the design-build contractor, the maintenance contractor, and consultants.  An analysis of the documentation and internal emails on the issue (from a freedom of information request done by an independent journalist) shows that none of the parties really “owned” the issue, but rather they assumed that either the risk was negligible or at least sufficiently managed, or that someone else would take care of the risk mitigation.

The issue of falling ice and snow from cable-supported bridges is well known.  Examples can be found with bridges in Denmark, Sweden, Japan, the U.K., and in the U.S., including the Tacoma Narrows Bridge in Washington state.  Methods to manage the risk are also well documented, although in many cases there are few practical solutions beyond traffic closure.

Ice and snow accretion removal systems can be mechanical, thermal, or passive in nature.  In addition, localized monitoring of weather and ice/snow build-up can be used to pre-emptively close individual traffic lanes or the entire bridge when necessary.  Finally, the basic choice of having vertical cable planes instead of inclined cable planes (“fanned” cables suspended over traffic) can reduce the risk.

Aftermath:  B.C.’s transportation minister blamed the design-build contractor, Kiewit/Flatiron, stating that the bridge did not meet the requirements.  The provincial automobile insurance agency, ICBC, paid $400,000 in claims, and lawsuits were filed by people who were injured during the incident.  In response to one of the civil claims, TI Corp. and Kiewit/Flatiron stated, “The buildup and subsequent release of ice and snow from the bridge structure was the result of a confluence of extreme environmental conditions, both unforeseen and unforeseeable to the defendants or any of them and was the inevitable result of an Act of God.”

Despite that statement, cable collars were installed in the fall of 2013 to help mitigate the hazard.  These are manually released from the top of the towers and slide down the cable stays to dislodge accumulated snow and ice.  The intent is to do this frequently enough to ensure the dislodged pieces are small enough to avoid damage to vehicles below.

In addition, a weather station and cameras were installed to allow for monitoring of conditions that could lead to ice and snow accumulation on the cables.  Operating procedures were put in place to shut down the bridge if dangerous conditions were detected.

The Port Mann collar system is said to be one of the most successful systems in the world.  However, it is not fool-proof.  Snowfalls during December 2016 have so far resulted in about 50 insurance claims from falling ice and snow.  This may have been a result of the operators not deploying the collars early enough during rapidly-changing weather conditions.

Interestingly, over the same period, 95 claims resulted from the same situation occurring on another Vancouver area bridge, the Alex Fraser, which has cables that do not hang over the bridge deck.  High winds together with rising temperatures can blow ice and snow off the cables into traffic.  The bridge was shut down for several hours on two days, due to “ice bombs”.  Falling ice also occurred in 2005, 2008, and 2012, but overall the Alex Fraser is less prone to these incidents than the Port Mann.  As a temporary measure, the Ministry of Transportation indicated that for the Alex Fraser, it will use a heavy-lift helicopter to blow snow accumulations off the cables.

The experiences on the Port Mann as well as the Alex Fraser bridges will be applied to the upcoming bridge to replace the George Massey Tunnel.  In particular, the cable stays will not be allowed to cross over traffic, and cable collars or an improved alternative will be required.

Lessons Learned:

  • The requirements were written in a way that left the criticality and expectations somewhat open to interpretation. For some, the word “avoid” means you must completely ensure no risk of an incident.  For others, avoid means “try” or “use typical means and methods”.  One other section in the listed standards specified “practical” solutions were acceptable.  Yet, unless specifically defined, the word “practical” allows for many different measures of acceptable implementation.  The expectations around acceptable public safety were not met in this case, and best practices around requirements for public safety are typically better defined than what existed for the Port Mann Bridge.     
  • Designers should account for the complete system and its operating environment. The micro-climates of Vancouver are well known, and hazardous when heavy wet snow is mixed with freezing and thawing conditions.
  • Repeated questions raised regarding the risks of falling ice and snow could have resulted in a risk analysis leading to a more effective ice hazard mitigation strategy, rather than simply assuming the original design would be adequate.
  • Public safety issues need to be considered carefully and critically, and receive considerable attention from management.
  • Ultimately, the Owner is most likely to be held responsible for the performance of the implemented design and its impact on third parties (in this example, motorists being hit by “ice bombs”).  While many projects have multiple contractors and parties involved in the design, construction, operation, and maintenance, the Owner’s team needs to ensure that these parties have sufficient capability, staffing, mandate, and expertise to deliver quality in requirements definition and through the design, build, verification, validation, and operation and maintenance stages.

Michael Eiche, P.Eng. Principal, SysEne Consulting

Why Conventional FMEAs fail too often, and why the Absolute Assessment Method FMEA is much better.

(Failure Modes and Effects Analysis)


On Oct 1, 2016, a commuter train crashed in New Jersey, killing one person and injuring 108, with excessive speed a factor.  The root cause of the crash is under investigation.

A similar crash happened in Amagasaki, Japan in April 2005, where 106 were killed and 562 injured; high speed around a curve was a factor.  The conventional explanation of the root cause of the Amagasaki crash was corporate pressure on the driver to be on time.  Drivers faced harsh penalties for lateness, including humiliating “training” programs with weeding and grass-cutting duties.  In this case, the driver was speeding.  The resulting countermeasure in Amagasaki has been an expensive $1-billion train speed control system on the small line to help mitigate a potential accident.

There have been many other high-speed passenger train derailments, such as the Santiago de Compostela derailment in Spain in 2013 (79 dead and 139 injured out of 218 passengers) and the Fiesch derailment in Switzerland in 2010 (1 dead, 42 injured).  The root cause explanations of these accidents tend to focus on drivers exceeding safe speeds, and countermeasures tend to focus on semi-automated systems to control train speed.

Do we really know the root cause of these accidents, and are the countermeasures both effective and economic?

One of the best root cause analyses I’ve seen on the Amagasaki crash comes from Unuma Takashiro, and his conclusion is unconventional.  Unuma-san is a Failure Modes and Effects Analysis (FMEA) consultant from Japan.  FMEA is one of the best methods to analyze a design to help prevent failures.  FMEA was developed in the aviation and space industries in the 1960s, adopted by the automotive industry in the 1990s, and is now prevalent in many industries including health care.

Unuma-san argues that in the case of the Amagasaki crash, the speed control system is expensive and not fail-safe.  One economic and effective countermeasure would be to add a $250,000 guard rail, which at the very least would likely prevent a recurrence, and definitely be useful as an additional layer of countermeasure.  The advantage of low-cost and effective countermeasures is that they can be widely-deployed.


He argues the real root cause of this failure is that the overall engineering and management approach to mitigating failures was not adequate – both initially to prevent the accident in the first place, and subsequently after the crash by putting in the speed control system but not (also) the guard rail.

Unuma-san has a very interesting and useful website on FMEA practices, and uses the Amagasaki crash as one of many examples.  He promotes a FMEA method that uses an absolute evaluation method of countermeasures, as compared to the conventional FMEA which uses a relative evaluation method of countermeasures.  The problem with the relative evaluation method is that it can easily miss important failure modes that do not make an arbitrary priority cutoff.  Missing important failure modes often leads to unexpected incidents.

He also analyzes the conventional FMEA approach and teachings, and points out many problems seen in industry:

  • ineffective because of missing failure modes,
  • done too late in the design process, making it more difficult and less likely to implement countermeasures,
  • led by team members from other departments that are not responsible for the design, which both lowers the effectiveness of the analysis and can allow the designer to not be held fully accountable for the FMEA results,
  • doesn’t promote economical countermeasures, and
  • many of the common FMEA teachings contain flaws that promote the above problems.

Unuma-san shows that many FMEAs confuse failure mechanisms (the physical, chemical, thermal, electrical, biological, or other stresses leading to the failure mode) and the actual failure modes (ways a product or process can fail), leading to missing failure modes.  If a failure mode is missed, then there may be no countermeasure identified, and subsequently incorporated into the design.

He points out that the relative evaluation FMEA method promotes doing the FMEA on the entire design once enough of the design is complete; all of the issues are then prioritized and acted upon.  The problem with this approach is that FMEAs take a lot of time, and by the time the results are available, the recommended changes to the design can be too late to implement easily.  He promotes instead that the designers do the FMEA as they are doing the design, in a very concurrent and “local” manner, while evaluating the countermeasures in an absolute manner against each individual failure mode.  This more easily allows countermeasures to get into the design of the product or process in the early stages.

When non-designers take too much of the FMEA responsibility and scope, the effectiveness of the FMEA is reduced and the results are available late in the design process.  The effectiveness is reduced because non-designers are unable to know all the key information in the heads of the designers, and the designers may feel less accountable for the FMEA quality.  Results are delayed because instead of countermeasures being considered at the time of the design decision, they are made available after the design decision has been made and it is then more difficult and less likely to have any countermeasure implemented.

Unuma-san’s method is simpler than many FMEAs, using a four-point scale to the third power (64 ratings) vs. the 10-point scale to the third power (1000 ratings) of many conventional approaches.  He promotes determining countermeasures per failure mode, evaluating the likely success of those countermeasures, and looking for opportunities for optimization and lower cost from reducing overdesign.
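The difference in rating-space size can be sketched in a few lines of Python.  The three factors (severity, occurrence, detection) are the conventional FMEA triple; whether Unuma-san’s method uses these exact factor names is an assumption here – the point is simply the size of the space a team must reason about:

```python
from itertools import product

def rating_space(scale_max):
    """All (severity, occurrence, detection) combinations on a 1..scale_max scale."""
    ratings = range(1, scale_max + 1)
    return list(product(ratings, repeat=3))

conventional = rating_space(10)  # 10-point scale: 10^3 = 1000 combinations
absolute = rating_space(4)       # 4-point scale:   4^3 = 64 combinations

print(len(conventional))  # 1000
print(len(absolute))      # 64
```

A 64-cell rating space is small enough that a team can attach a clear meaning to each rating; a 1000-cell space invites false precision in prioritization.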

Unuma-san goes on to analyze the common teachings of FMEA by referring to much of the most common reference material available in books, training materials, and websites, and he shows many flaws, inconsistencies, and interpretation issues that tend to exacerbate the above problems.  Much of the trouble with conventional FMEAs can be traced to poor teachings.

Unuma-san has consulted for a very long and impressive list of Japanese companies on FMEA in the transportation, health care, manufacturing, and consumer goods industries.

I’ve been both a lead designer of multiple complex systems and a consultant helping clients improve their product development processes, including FMEA.  The teachings of Unuma-san resonate strongly with me.  Too often I have seen poorly done FMEAs that miss critical failure modes, late FMEAs whose recommendations arrive too late to be useful, and FMEA study teams without enough participation from the design team.  The absolute evaluation method FMEA is a substantial improvement over the relative evaluation method, mostly because it evaluates the likely success of countermeasures.  I highly recommend his webpage on FMEAs, and it is linked here.  It is a little hard to read, as the website’s translation to English isn’t the best, but worthwhile.

I think one of the reasons why FMEA teachings have many issues is that few FMEA teachers have been skilled design engineers; they are instead people who gravitate to process design.  The idea behind the FMEA is good and includes teaching early and effective analysis; unfortunately, much of the applied practice falls short.  A skilled design engineer naturally considers failure modes and tries to design them out, while simultaneously considering many other design tradeoffs, such as performance, function, economics, aesthetics, and ergonomics.  I think that since the conventional FMEA trainers developed the applied practice of the FMEA, they have continued to build upon the original process of the relative assessment method, and have struggled to develop effective practices that overcome its shortcomings.  In my experience, many design engineers have found FMEA to be a good idea but too slow, too time-consuming, and not effective enough to really embrace.

What I like about Unuma-san’s method is that it is practical, effective, time-efficient, and evaluates the likely success of countermeasures.  It can be very useful to have FMEA experts, trained in this method, who can help designers with training, facilitation, documentation, and review.

There are a few other improved FMEA methods that try to address some of the effectiveness and lateness problems of conventional FMEAs, such as FMMEA (Failure Modes, Mechanisms, and Effects Analysis), and there are good teachings in these methods as well.  I have found Unuma-san’s method to be among the best, and it really resonates with me.

FMEA is one of the best methods to help avoid failures.  By making the method more effective, products, processes, projects, and infrastructure can have fewer problems and be more economical.  I highly recommend further study on this topic for engineers and managers delivering any system.

Craig Louie, P.Eng., Co-Founder, SysEne Consulting

There isn’t a dilemma in autonomous vehicles having to choose between harming their passengers or others.


A recent study in Science “The Social Dilemma of Autonomous Vehicles” has highlighted how self-driving cars need to have algorithms to decide on actions in extreme situations – and even having to choose between protecting the passengers vs. pedestrians.  The study results indicate that participants favor minimizing the overall number of public deaths even if it puts the vehicle in harm’s way.  But when asked about which cars they would actually buy, participants would choose a car that would protect them first.  This study highlights an apparent conflict between morality and autonomy.

I like the study, as it raises good questions, and it describes part of one of the many issues in autonomous vehicles.  I also like the many news articles that have been written on this topic based on the study, as they help raise awareness of the complexity of the issue – that it is both social and technical.  At the same time, for the sake of being newsworthy and controversial, most narratives I read on the topic frame the study as a social dilemma.  Yet when examined through a technical perspective, autonomous vehicles will give us dramatically safer situations for both passengers and pedestrians, and there isn’t any dilemma.

Traffic-related deaths exceed 1.25 million worldwide per year, with aging drivers, distracted driving, higher speeds, and the prevalence of substance abuse all keeping the rate stubbornly high.  For every person killed in a motor-vehicle accident, 8 are hospitalized and 100 are treated and released from emergency rooms.  Autonomous driving, when implemented well, will easily reduce this by 90%, and perhaps by 99% when fully implemented.  The response time, sensing, spatial awareness, decision-making, and reliability of an autonomous vehicle will be better than most of us, except perhaps for highly trained and talented drivers, and far better than the portion of our driving population that causes most accidents (distracted, drunk, inexperienced, tired, reduced reflexes, etc.).  The autonomous capability allows us to have a safer response for both the passenger and the pedestrian.

Consider that the autonomous vehicle can respond faster than most humans.  I have a lane departure warning system on my car, and it is much faster than me.  An autonomous vehicle will be able to brake faster, more optimally, and steer a better adaptive path that is more likely to minimize injury to both passenger and pedestrian.  Most drivers can’t brake as fast, optimize the braking pressure, or optimize the steering adjustments during an emergency maneuver as well as a well-implemented autonomous vehicle.  The following picture shows a better braking and adaptive steering path with the best overall outcome for both passenger and pedestrian.  In the event of a collision, the overall speed, impact angle, etc. will be reduced.
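A back-of-envelope sketch makes the reaction-time advantage concrete.  The speed, reaction times, and deceleration below are illustrative assumptions, not measured values:

```python
def stopping_distance(speed_kmh, reaction_s, decel_ms2):
    """Reaction distance plus braking distance, in metres."""
    v = speed_kmh / 3.6                          # convert km/h to m/s
    return v * reaction_s + v ** 2 / (2 * decel_ms2)

SPEED = 50.0   # km/h, a typical urban speed limit (assumed)
DECEL = 7.0    # m/s^2, firm braking on dry pavement (assumed)

human = stopping_distance(SPEED, reaction_s=1.5, decel_ms2=DECEL)    # ~34.6 m
machine = stopping_distance(SPEED, reaction_s=0.1, decel_ms2=DECEL)  # ~15.2 m

print(f"human: {human:.1f} m, autonomous: {machine:.1f} m")
```

Under these assumptions the machine stops in less than half the distance, before even counting its ability to modulate brake pressure and steering during the maneuver.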


With autonomous vehicles, there will still be accidents, and there will be cases where it will be determined that the autonomous vehicle did not make the best decision.  But the overall absolute level of safety will go up so dramatically, that the question will not be “isn’t this the wrong car to buy because it may decide wrongly in an extreme case?” but “isn’t this the right car to buy because it is overall so much safer for me and everyone else?”.  The moral path is to embrace autonomous vehicles, and work towards a proper system design and implementation in industry, government, and with consumers.

Why do so many industrial projects underperform these days?


About seven out of ten industrial projects underperform in production or operability, and/or have significant cost or schedule overruns.  Everyone working on the project, including the sponsors, wants a successful, on-budget, and on-schedule project.  There are thousands of reference projects that have been done in the past decade, yet why is it so hard to learn from experience?

There are many reasons, and I’d like to comment on a few of the key ones that are of heightened risk because of today’s environment.  There is much material available on industrial project underperformance, and we have talked to many in industry, and unfortunately we hear painful stories too often.  As a general background:

  • Project complexity increases daily, with harder-to-reach resources resulting in a continual need to deploy new combinations of technologies, more difficult environmental regulations, more difficult community relations, etc.
  • We have been through 15 years of an economic boom in global industry, and even the slowdown blip in 2008 was just a short-term, 10-month bust cycle with one of the fastest rebounds of industrial activity in 2009.  During this boom, the underlying cost structure for engineering and construction services has increased much faster than inflation.  On a typical project in the North Sea, companies are having to pay $300/hr for mediocre-quality engineering – mediocre since, after 15 years of boom, engineering companies have often been taking on less and less capable staff.
  • In boom times, many unhealthy projects still make money.
  • For many years now, many owner companies have been shedding internal experts in the technical functions, and they try to offload work and risks to EPC(m) or other contracting firms.  But much of the work and risk cannot be transferred from owner to contractor because they are structurally different: owners make money from the capital asset and can still survive a budget overrun, while contractors cannot afford the financial liability of an underperforming project.  Many owners try to offload their project management or technical work to the contracting firm, but this can be problematic, mostly because owners and contractors have very different perspectives.  Owners’ teams need to be able to provide enough business and technical direction, and also provide contractor oversight.  When they struggle to do so because they lack the resources, the whole project suffers.  Owner companies also struggle with internal coherence between all their internal departments and managers when they don’t have enough project resources.
  • Engineering and EPC(m) firms are always in search of the next project and don’t provide or develop enough long-term continuity, R&D, productivity, or innovative support to the project over its entire life-cycle, or to the next project.  These contractors cannot hold specialty resources or afford to invest in innovation.  Engineering and EPC(m) firms are more service firms than total-solutions firms – in part because this is what owners ask of them through the procurement process.
  • Much of the supply base, where much of the innovation does happen, struggles to afford or acquire all the necessary expertise needed to develop reliable and cost-effective solutions.

And now the global economic macro-environment has weakened, especially in Canada’s energy and resource sectors.

With today’s drop in energy and commodity prices, and a general shortage of industrial capital financing, industrial companies are slashing their technical and project teams and departments to reduce operating expenses.  Until mid-2014 or so, production was king.  Now we see significant consolidations, downsizing, and a focus on company survival.  An overly lean team without enough access to critical skills is going to make it even more difficult for current and future industrial projects to meet expectations, budget, and schedule.

With weakened balance sheets, industrial companies are going to need successful projects more than ever.

Keys for Improvement

We need to do better and we can do better with an improved application of management, strategy, approaches, and more respect for the complexity of today’s industrial projects.  While all key stakeholders have to improve, the greatest leverage is with the project sponsors.  They control the highest level need, budget, scope, risk profile, etc., and so they have the largest leverage on the outcome.

  • There needs to be a common understanding by both business and technical professionals on why there are so many issues with these projects, and going forward, how these projects should be developed, governed, and executed.
  • The project team needs to have the right skills, adequate staffing levels, and then a robust training program on how to best manage and implement the industrial project.
  • The up-front design and planning work needs to be adequately funded and given enough time. A weak design and/or poor plan causes too many problems downstream when the activity and capital spend ramps up.
  • The right contracting strategy should be chosen, and the overall team constructed in a complete way and consistent with the strategy. The owner’s team must have the right skills and do all the scoping, concept work, requirements development, and overall management that is typical of successful contracts.  The contracting must be done so that the professional service firms deliver quality and get paid well enough for doing so.
  • Experienced and systematic approaches are needed for the:
    • technical solution,
    • process of doing the project,
    • build and organization of the team

While the above roadmap seems obvious, the root causes of problematic projects lie in the above five points – in the understanding, approach, strategy, or implementation.  Furthermore, each must be done well enough for the level of sophistication required by the complexity of today’s projects.

When owner companies become more open to longer-term value and improved partnering with the contracting firms and the supply base, it can enhance the productivity and innovation their products and services bring to the owner’s projects over the life of the asset.  For example, engineering firms could provide more long-term asset support.  They have significant data on all the projects from the design phase, and can get operational data from the currently operating assets.  Currently, after the project build is finished, the engineering contractor moves its resources onto other projects (or, if business is slow, lets them go).  The owner’s operating department then struggles without the contractor’s engineering support, design models, people continuity, etc., and often the result is that the asset does not operate to its potential.  There can be a great business case for further optimization and operational improvements to the operating asset that could be turned into a long-term support contract.  Everyone wins.

We must change the way we do things for a better outcome, and the ways do exist.



Movie Review: The Challenger Disaster



This excellent 90-minute movie brings to life the great story of Richard Feynman’s investigation into the Space Shuttle Challenger disaster.  I found the movie had good pacing, rang very true to what actually happened, and featured very good acting by William Hurt as Feynman, along with Bruce Greenwood and Brian Dennehy.

The movie is based on Feynman’s book “What Do You Care What Other People Think?”, which is also terrific.  The story follows Feynman’s instrumental role in uncovering the truth about the root cause of the disaster – both technically and politically.  Feynman’s personal heroism against strong headwinds and personal illness makes for a compelling story.

The movie does great justice to key scenes – the dramatic O-ring experiment, Feynman’s personal difficulties, and the political machinations both supporting and opposing his investigation.


William Hurt’s performance drew me emotionally into the story.  I’ve not really been a big fan of Hurt’s performances in other movies – I didn’t like him as Duke Leto in Frank Herbert’s Dune (too stiff), and he was just OK in Dark City.  Yet in this movie he captured Feynman’s unique character very well.

The movie inspired me to re-read “What Do You Care What Other People Think?”, which I had first read over 20 years ago.  The overall story of Feynman and the Challenger remains sharply relevant today: complex system development is widespread, often carries significant safety consequences, and involves large, often conflicting, multi-stakeholder interests – interests sometimes inclined to bury the truth.

One of the most interesting short stories in “What Do You Care What Other People Think?” is the story of Richard and his first wife, Arline.  It is a great love story, despite its tragic nature.  The book’s title came from her.

This movie (and book) is highly recommended!

For a successful technology, reality must take precedence over public relations, for nature cannot be fooled. – Richard Feynman

Is your Complex System Project on track for Ultraquality Implementation?


We expect complex systems like an airplane, a nuclear power plant, or an LNG plant to practically never fail.  Yet systems are becoming increasingly complex, and the more components there are in a system, the more reliable each component must be – to the point where, at the element level, defects become impractical to measure within the time and resources available.

Additionally, in the future, our expectations for complex systems’ durability, reliability, total cost of ownership, and return on investment will increase as energy and raw materials increase in cost.

Ultraquality is defined as a level of quality so demanding that it is impractical to measure defects, much less certify the system prior to use.  It is a limiting case of quality driven to an extreme, a state beyond acceptable quality limits (AQLs) and statistical quality control.

One example of ultraquality is commercial aircraft failure rates.  Complexity is increasing: the Boeing 767 has 190,000 software lines of code, the Boeing 777 has 4 million, and the Boeing 787 about 14 million.  The allowable failure rate of the flight control system continues to be one in 10 billion hours, which is not testable, yet the number of failures to date is consistent with this order of magnitude.
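A rough calculation shows why such a rate is "not testable."  The fleet size, utilization, and confidence level below are assumptions chosen only to illustrate the scale of the problem:

```python
import math

TARGET_RATE = 1e-10   # failures per hour (from the requirement above)
CONFIDENCE = 0.90     # demonstration confidence (assumed)

# With an exponential failure model, demonstrating the rate with zero
# observed failures requires T hours where exp(-rate * T) = 1 - confidence.
hours_needed = -math.log(1 - CONFIDENCE) / TARGET_RATE

FLEET = 1000            # aircraft accumulating hours (assumed)
HOURS_PER_YEAR = 3000   # flight hours per aircraft per year (assumed)
years = hours_needed / (FLEET * HOURS_PER_YEAR)

print(f"{hours_needed:.1e} failure-free hours needed, ~{years:,.0f} fleet-years")
```

Even with a thousand aircraft flying continuously, demonstrating the requirement by test alone would take millennia – hence the reliance on process quality rather than end-item testing.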


Another example of ultraquality is a modern microprocessor, which has the same per-chip defect rate even though the number and complexity of operations have increased by factors of thousands.  The corresponding failure rate per individual operation is now so low as to be almost unmeasurable.


What are the best practices to achieve ultraquality in complex systems?

Maier and Rechtin make a strong case that while analytical techniques like Six Sigma and Robust Engineering Design will get you close, the addition of heuristic methods will get you over the top.  This includes using a zero-defects approach not only in manufacturing, but also in design, engineering, assembly, test, operation, maintenance, adaptation, and retirement – the complete lifecycle.

There are many examples of how analytical techniques alone underestimate failure; for example, the nuclear industry’s analyses of core damage frequency have proven to be off by an order of magnitude in practice.


A sample of applicable heuristics include:

  • Everyone in the production line is a customer and a supplier [also extended to each person in the development team – engineering, supply, etc.]
  • The Five Why’s
  • Some of the worst failures are system failures
  • Fault avoidance is preferable to fault tolerance in system designs
  • The number of defects remaining in a system after a given level of test or review (design review, unit test, system test, etc.) is proportional to the number found during that test or review.
  • Testing can indicate the absence of defects in a system only when: (1) The test intensity is known from other systems to find a high percentage of defects, and (2) Few or no defects are discovered in the system under test.
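The last two heuristics can be turned into a small estimator.  This is a hedged sketch of the proportionality idea, not a method from the article: it assumes each review or test stage catches a roughly constant, known fraction of the defects present, so the defects slipping through are proportional to those found.

```python
def defects_remaining(found: int, detection_fraction: float) -> float:
    """Estimate defects left after a stage that found `found` defects,
    assuming the stage catches `detection_fraction` of what was present."""
    present = found / detection_fraction   # inferred defects entering the stage
    return present - found                 # estimated defects slipping through

# Assumed numbers for illustration only: a unit-test stage known (from
# comparable systems) to find ~80% of defects reports 80 defects found.
print(defects_remaining(found=80, detection_fraction=0.8))  # 20.0
```

The second heuristic follows directly: only when the detection fraction is known to be high *and* few defects are found can the estimate of remaining defects approach zero.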


[pie chart courtesy Boeing.  FBW = Fly By Wire]

There is much more "how-to" material in the works of Maier and Rechtin, Juran, and Phadke.

Ultraquality in the product requires ultraquality throughout all the development processes, and by extension throughout the delivering organization.  That is, certify a lack of defects in the final product by insisting on a lack of defects anywhere in the development process.  Developing both the processes and the organization to achieve this state is possible, is being done in some organizations, and enables superior business performance.

There are many examples of how organizations lack ultraquality in their processes or organization.  General Motors is under heavy criticism these days following the Valukas report, which exposed poor organization and development practices.  Anecdotally, this is hurting GM dealerships and turning them into ghost towns.

So back to the tagline: is your complex development project on track for ultraquality implementation?

Why Do Risk Assessments So Underestimate Risk?

In light of this week's terrible train derailment tragedy in Lac-Mégantic, one question stands out: why do risk assessments so underestimate risk?

Figure 1: Train derailment consequences, Lac-Mégantic, July 2013



Engineers, scientists, and managers do risk assessments all the time as a normal course of business.  Yet system failures occur much more frequently than the risk assessments report.

Typical nuclear power industry and regulator estimates of core damage frequency are between 1 in 20,000 and 1 in 50,000 reactor-years, which, across the world's operating fleet, implies a core damage event roughly every 40 to 100 years; over our operating history to date, there should have been fewer than one such incident.  Yet we have had more than 10.  The risk assessment and management methodology in this case underestimates the risk by over an order of magnitude.
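The gap is easy to check with rough arithmetic.  The cumulative reactor-year figure below is my own illustrative assumption, not a number from the article:

```python
# Rough check of the claim above: how many core damage events would the
# assessed frequencies predict over the industry's operating history?
reactor_years = 15_000   # assumed cumulative commercial reactor-years (illustrative)

for cdf in (1 / 20_000, 1 / 50_000):
    expected = reactor_years * cdf
    print(f"CDF {cdf:.1e} per reactor-year -> {expected:.2f} expected events")

# Both frequencies predict fewer than one event, versus the 10+ core
# damage incidents actually recorded: an order-of-magnitude gap.
```

Whichever assessed frequency is used, the expectation is well below one event, so the observed count implies the assessments understate the true frequency by more than a factor of ten.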

While it is still too early to understand the root cause and systemic failures in the Lac-Mégantic train derailment, clearly the risks were underestimated.  The appropriate safeguards, whether design or human, failed.

There are many risk assessment techniques used by industry and regulators: Failure Modes and Effects Analysis, Probabilistic Risk Assessment, and Hazard and Operability Study, to name a few.  Where they tend to underestimate risk has been studied by many independent sources* and found to be especially weak in human factors:

  • complacency in design
  • failure to anticipate vulnerabilities from external sources to the system
  • unjustified trust in safety margins
  • poor training
  • cutting corners to cut costs
  • cosy relationship between regulators and the regulated
  • cultural factors
  • handovers between individuals or groups from different organizations

Hollywood likes to produce action/disaster movies that illustrate the consequences of accidents and incidents.  Sometimes they are overdramatic (the fuel cell explosion in Terminator 3 was like a huge nuclear bomb!  If only fuel cells could be so powerful…).


Figure 2: Fuel cell explosion (!) in Terminator 3



Other times Hollywood seems to be pretty prescient, as in the movie Unstoppable, though that had a happy ending.

Considering the catastrophic consequences of the Lac-Mégantic derailment, we need to reconsider oil transport, and not necessarily in favour of pipelines, as pipelines have their own unique risks and consequences.  The Lac-Mégantic derailment is bad for oil overall.

One advantage of many clean energy sources is that the inherent accident risk and consequences are much lower than conventional forms.  When assessed through that lens, the overall project and financial returns can be superior.


* Contact me for links