Why Conventional FMEAs fail too often, and why the Absolute Assessment Method FMEA is much better.

(Failure Modes and Effects Analysis)


On September 29, 2016, a commuter train crashed in New Jersey, killing one person and injuring 108, with high speed being a factor.  The root cause of the crash is under investigation.

A similar crash happened in Amagasaki, Japan, in April 2005, where 106 were killed and 562 injured, and high speed around a curve was a factor.  The conventional explanation of the root cause of the Amagasaki crash was corporate pressure on the driver to be on time.  Drivers faced harsh penalties for lateness, including humiliating “training” programs that included weeding and grass-cutting duties.  In this case, the driver was speeding.  The resulting countermeasure in Amagasaki has been to install a $1 billion train speed control system on the small line to help prevent a similar accident.

There have been many other passenger train derailments involving excessive speed, such as the Santiago de Compostela derailment in Spain in 2013 (79 dead, 139 injured out of 218 passengers) and the Fiesch derailment in Switzerland in 2010 (1 dead, 42 injured).  Root cause explanations of these accidents tend to focus on drivers going faster than they should, and countermeasures tend to focus on semi-automated systems to control train speed.

Do we really know the root cause of these accidents, and are the countermeasures both effective and economic?

One of the best root cause analyses I’ve seen on the Amagasaki crash comes from Unuma Takashiro, and his conclusion is unconventional.  Unuma-san is a Failure Modes and Effects Analysis (FMEA) consultant from Japan.  FMEA is one of the best methods to analyze a design to help prevent failures.  FMEA was developed in the aviation and space industries in the 1960s, adopted by the automotive industry in the 1990s, and is now prevalent in many industries, including health care.

Unuma-san argues that in the case of the Amagasaki crash, the speed control system is expensive and not fail-safe.  One economical and effective countermeasure would be to add a $250,000 guard rail, which would likely prevent a recurrence and would certainly be useful as an additional layer of protection.  The advantage of low-cost, effective countermeasures is that they can be widely deployed.
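To see why a cheap extra layer can matter even next to an expensive one, here is a minimal sketch of layered protection. It assumes the layers fail independently, and the probabilities and the `layered_failure_probability` helper are hypothetical illustrations, not figures from Unuma-san’s analysis.

```python
from math import prod

def layered_failure_probability(layer_failure_probs):
    """Probability that a hazard gets through every protective layer.

    Assumes (hypothetically) that the layers fail independently, so the
    chance that all of them fail is the product of their failure probabilities.
    """
    for p in layer_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(layer_failure_probs)

# Hypothetical per-demand failure probabilities, for illustration only.
scenarios = {
    "driver only": [0.01],                    # driver fails to slow for the curve
    "+ speed control": [0.01, 0.001],         # speed control system also fails
    "+ guard rail": [0.01, 0.001, 0.1],       # guard rail fails to contain derailment
}

for name, layers in scenarios.items():
    print(f"{name:16s} {layered_failure_probability(layers):.0e}")
```

Independence is the key assumption: a common-cause failure (for example, one power loss disabling several electronic layers) makes the real risk higher than the product suggests, which is one argument for adding a simple, passive layer such as a guard rail.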


He argues the real root cause of this failure is that the overall engineering and management approach to mitigating failures was not adequate – both initially to prevent the accident in the first place, and subsequently after the crash by putting in the speed control system but not (also) the guard rail.

Unuma-san has a very interesting and useful website on FMEA practices, and uses the Amagasaki crash as one of many examples.  He promotes an FMEA method that uses an absolute evaluation of countermeasures, as opposed to the conventional FMEA, which uses a relative evaluation of countermeasures.  The problem with the relative evaluation method is that it can easily miss important failure modes that do not make an arbitrary priority cutoff.  Missing important failure modes often leads to unexpected incidents.
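A small hypothetical example shows how a relative cutoff can drop an important failure mode. The sketch assumes the conventional Risk Priority Number (RPN = severity × occurrence × detection, each rated 1 to 10) with an arbitrary cutoff of 100. The `absolute_check` rule (any mode with severity 9 or more needs an adequate countermeasure) is an illustrative stand-in, not Unuma-san’s actual criteria, and the failure modes are invented.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int      # 1 (minor) .. 10 (catastrophic)
    occurrence: int    # 1 (rare) .. 10 (frequent)
    detection: int     # 1 (easily detected) .. 10 (undetectable)
    countermeasure_adequate: bool = False

    @property
    def rpn(self) -> int:
        # Conventional Risk Priority Number
        return self.severity * self.occurrence * self.detection

def relative_selection(modes, cutoff=100):
    """Conventional relative method: act only on modes whose RPN meets the cutoff."""
    return [m.name for m in modes if m.rpn >= cutoff]

def absolute_check(modes, severity_limit=9):
    """Hypothetical absolute rule: judge each mode on its own and flag any
    high-severity mode that lacks an adequate countermeasure."""
    return [m.name for m in modes
            if m.severity >= severity_limit and not m.countermeasure_adequate]

# Hypothetical failure modes for a curved section of a rail line.
modes = [
    FailureMode("overspeed derailment on curve", severity=10, occurrence=2, detection=4),
    FailureMode("door fails to open at station", severity=4, occurrence=6, detection=5),
    FailureMode("cabin light flicker", severity=2, occurrence=8, detection=7),
]

print("relative method acts on:", relative_selection(modes))
print("absolute method flags:  ", absolute_check(modes))
```

Here the derailment mode scores an RPN of 80 and falls below the cutoff, while the flickering light (RPN 112) is acted on; judging each mode against its own criterion catches the derailment regardless of how it ranks against the others.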

He also analyzes the conventional FMEA approach and teachings, and points out many problems seen in industry:

  • ineffective because of missing failure modes,
  • done too late in the design process, making it more difficult and less likely to implement countermeasures,
  • led by team members from other departments that are not responsible for the design, which both lowers the effectiveness of the analysis and can allow the designer to not be held fully accountable for the FMEA results,
  • doesn’t promote economical countermeasures, and
  • many of the common FMEA teachings contain flaws that promote the above problems.

Unuma-san shows that many FMEAs confuse failure mechanisms (the physical, chemical, thermal, electrical, biological, or other stresses leading to the failure mode) with the actual failure modes (the ways a product or process can fail), which leads to missing failure modes.  If a failure mode is missed, no countermeasure may be identified for it, let alone incorporated into the design.
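One way to keep mechanisms and modes apart is to record them at different levels, so every failure mode is listed explicitly and can be checked for a countermeasure. The structure, class names, and bolted-joint example below are hypothetical illustrations, not a format from Unuma-san’s site.

```python
from dataclasses import dataclass, field

@dataclass
class FailureMode:
    """A way the item can fail (what the user or system experiences)."""
    description: str
    # Stresses that can cause this mode: physical, thermal, chemical, ...
    mechanisms: list[str] = field(default_factory=list)
    countermeasures: list[str] = field(default_factory=list)

@dataclass
class DesignItem:
    name: str
    failure_modes: list[FailureMode] = field(default_factory=list)

    def modes_without_countermeasure(self) -> list[str]:
        """Failure modes that still have no countermeasure in the design."""
        return [m.description for m in self.failure_modes if not m.countermeasures]

# Hypothetical bolted joint: "vibration" and "corrosion" are mechanisms,
# while "joint loosens" and "bolt fractures" are the failure modes.
joint = DesignItem("bolted bracket", [
    FailureMode("joint loosens",
                mechanisms=["vibration", "thermal cycling"],
                countermeasures=["prevailing-torque nut"]),
    FailureMode("bolt fractures", mechanisms=["fatigue", "corrosion"]),
])

print(joint.modes_without_countermeasure())
```

Listing "vibration" as if it were a failure mode would hide the fact that two distinct modes exist, and the second one would then never get its own countermeasure.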

He points out that the relative evaluation FMEA method encourages doing the FMEA on the entire design once enough of it is done; when the FMEA reaches a certain level, all of the issues are prioritized and then acted upon.  The problem with this approach is that FMEAs take a lot of time, and by the time the results are available, the recommended design changes can be too late to implement easily.  Instead, he recommends that designers perform the FMEA while they are doing the design, in a concurrent and “local” manner, evaluating the countermeasures against each individual failure mode in an absolute manner.  This makes it easier for countermeasures to get into the design of the product or process in the early stages.

When non-designers take too much of the FMEA responsibility and scope, the effectiveness of the FMEA is reduced and the results are available late in the design process.  The effectiveness is reduced because non-designers are unable to know all the key information in the heads of the designers, and the designers may feel less accountable for the FMEA quality.  Results are delayed because instead of countermeasures being considered at the time of the design decision, they are made available after the design decision has been made and it is then more difficult and less likely to have any countermeasure implemented.

Unuma-san’s method is simpler than many FMEAs: it rates on a four-point scale to the third power (4 × 4 × 4 = 64 rating combinations), whereas many conventional approaches use a 10-point scale to the third power (10 × 10 × 10 = 1,000 combinations).  He promotes determining countermeasures for each failure mode, evaluating the likely success of those countermeasures, and looking for opportunities to optimize and lower costs by reducing overdesign.
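The arithmetic behind these scale sizes can be checked directly. This sketch assumes the three ratings are multiplied into one score, as in the conventional RPN; Unuma-san’s method may combine its ratings differently, so treat it only as an illustration of how coarse or fine each scale is.

```python
from itertools import product
from math import prod

def rating_space(scale_points, factors=3):
    """Return (number of rating combinations, number of distinct products).

    Assumes the ratings are multiplied together, as in the conventional RPN.
    """
    scale = range(1, scale_points + 1)
    combos = list(product(scale, repeat=factors))
    distinct_scores = {prod(c) for c in combos}
    return len(combos), len(distinct_scores)

print("10-point scale:", rating_space(10))
print(" 4-point scale:", rating_space(4))
```

On the 10-point scale, the 1,000 combinations collapse to only 120 distinct products (and the 64 four-point combinations to 16), so the apparent precision of the larger scale is smaller than it looks.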

Unuma-san goes on to analyze the common teachings of FMEA by referring to many of the most common reference materials available in books, training materials, websites, etc., and he shows many flaws, inconsistencies, and interpretation issues that tend to exacerbate the above problems.  Much of the trouble with conventional FMEAs can be traced to poor teachings.

Unuma-san has consulted for a very long and impressive list of Japanese companies on FMEA in the transportation, health care, manufacturing, and consumer goods industries.

I have been the lead designer of multiple complex systems, and I have also been helping clients improve their product development processes, including FMEA.  The teachings of Unuma-san resonate strongly with me.  Too often I have seen poorly done FMEAs that miss critical failure modes, late FMEAs whose recommendations come too late to be useful, and FMEA study teams without enough participation from the design team.  The absolute evaluation method FMEA is a substantial improvement over the relative evaluation method, mostly because it evaluates the likely success of countermeasures.  I highly recommend his webpage on FMEAs, and it is linked here.  The English translation of the site is a little hard to read, but it is worthwhile.

I think one of the reasons FMEA teachings have many issues is that few FMEA teachers have been skilled design engineers; instead, they tend to be people who gravitate to process design.  The idea behind the FMEA is good and includes teaching early and effective analysis; unfortunately, much of the applied practice falls short.  A skilled design engineer naturally considers failure modes and tries to design them out, while simultaneously considering many other design tradeoffs, such as performance, function, economics, aesthetics, ergonomics, etc.  I think that since conventional FMEA trainers developed the applied practice of the FMEA, they have continued to build upon the original relative assessment process and have struggled to develop effective practices that overcome its shortcomings.  In my experience, many design engineers find FMEA a good idea but too slow, too time-consuming, and not effective enough to really embrace.

What I like about Unuma-san’s method is that it is practical, effective, and time-efficient, and that it evaluates the likely success of countermeasures.  It can be very useful to have FMEA experts trained in this method who can help designers with training, facilitation, documentation, review, etc.

There are a few other improved FMEA methods that try to address some of the effectiveness and lateness problems of conventional FMEAs, such as “FMMEA” (Failure Modes, Mechanisms, and Effects Analysis), and there are good teachings in these methods as well.  I have found Unuma-san’s method to be among the best, and it really resonates with me.

FMEA is one of the best methods to help avoid failures.  By making the method more effective, products, processes, projects, and infrastructure can have fewer problems and be more economical.  I highly recommend further study on this topic for engineers and managers delivering any system.


There isn’t a dilemma in autonomous vehicles having to choose between harming their passengers or others.


A recent study in Science, “The Social Dilemma of Autonomous Vehicles”, has highlighted how self-driving cars need algorithms to decide on actions in extreme situations, even choosing between protecting passengers and protecting pedestrians.  The study results indicate that participants favor minimizing the overall number of public deaths even if it puts the vehicle in harm’s way.  But when asked which cars they would actually buy, participants would choose a car that protects them first.  This study highlights an apparent conflict between morality and autonomy.

I like the study, as it raises good questions, and it describes part of one of the many issues in autonomous vehicles.  I also like the many news articles that have been written on this topic based on the study, as they help raise awareness of the complexity of the issue: that it is both social and technical.  At the same time, for the sake of being newsworthy and controversial, most narratives I have read on the topic frame the study as a social dilemma.  Yet when examined from a technical perspective, autonomous vehicles will create dramatically safer situations for both passengers and pedestrians, and there isn’t any dilemma.

Traffic-related deaths exceed 1.25 million worldwide per year, with aging drivers, distracted driving, higher speeds, and the prevalence of substance abuse all helping to keep the rate stubbornly high.  For every person killed in a motor-vehicle accident, 8 are hospitalized and 100 are treated and released from emergency rooms.  Autonomous driving, when implemented well, will easily reduce this by 90%, and perhaps by 99% when fully implemented.  The response time, sensing, spatial awareness, decision-making, and reliability of an autonomous vehicle will be better than those of most of us, except perhaps highly trained and talented drivers, and far better than those of the part of our driving population that causes most accidents (distracted, drunk, inexperienced, tired, reduced reflexes, etc.).  The autonomous capability allows a safer response for both the passenger and the pedestrian.
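The scale of the potential benefit is easy to work out from the figures above. This sketch assumes the 8-hospitalized and 100-treated-per-death ratios apply worldwide and that injuries fall in proportion to deaths, neither of which the source data necessarily supports, and it takes the 90% and 99% reductions as the author’s projections rather than measured results.

```python
ANNUAL_DEATHS = 1_250_000      # worldwide traffic deaths per year (from the text)
HOSPITALIZED_PER_DEATH = 8     # per-death ratios from the text; applying them
ER_TREATED_PER_DEATH = 100     # worldwide is an assumption

def remaining_casualties(reduction_pct: int) -> dict[str, int]:
    """Annual casualties left after a given percentage reduction in deaths.

    Assumes injuries fall in proportion to deaths (an assumption).
    Integer arithmetic keeps the results exact.
    """
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    deaths = ANNUAL_DEATHS * (100 - reduction_pct) // 100
    return {
        "deaths": deaths,
        "hospitalized": deaths * HOSPITALIZED_PER_DEATH,
        "er_treated": deaths * ER_TREATED_PER_DEATH,
    }

baseline = remaining_casualties(0)
for pct in (90, 99):
    left = remaining_casualties(pct)
    print(f"{pct}% reduction: "
          f"{baseline['deaths'] - left['deaths']:,} deaths and "
          f"{baseline['hospitalized'] - left['hospitalized']:,} hospitalizations avoided per year")
```

Under these assumptions, a 90% reduction would avoid about 1.1 million deaths and 9 million hospitalizations each year.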

Consider that an autonomous vehicle can respond faster than most humans.  I have a lane departure warning system on my car, and it is much faster than I am.  An autonomous vehicle will be able to brake faster and more optimally, and steer a better adaptive path that is more likely to minimize injury to both passenger and pedestrian.  Most drivers can’t brake as fast, optimize the braking pressure, or optimize the steering adjustments during an emergency maneuver as well as a well-implemented autonomous vehicle.  The following picture shows a better braking and adaptive steering path with the best overall outcome for both passenger and pedestrian.  In the event of a collision, the impact speed, impact angle, etc. will be reduced.
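A little kinematics shows why a faster response reduces impact speed. The vehicle travels at constant speed during the reaction time and then brakes at constant deceleration. The 7 m/s² deceleration (about 0.7 g on dry pavement), the 1.5 s human reaction time, and the 0.2 s automated reaction time are illustrative assumptions, not measured values for any particular car.

```python
import math

def impact_speed(speed_kmh, obstacle_m, reaction_s, decel_mps2=7.0):
    """Speed (km/h) at which a vehicle reaches an obstacle, or 0.0 if it stops first.

    Constant speed during the reaction time, then constant deceleration
    (7 m/s^2 by default, an assumed dry-pavement value).
    """
    v = speed_kmh / 3.6                        # km/h -> m/s
    braking_room = obstacle_m - v * reaction_s
    if braking_room <= 0:
        return float(speed_kmh)                # obstacle reached before braking starts
    v_sq = v * v - 2 * decel_mps2 * braking_room
    return 0.0 if v_sq <= 0 else math.sqrt(v_sq) * 3.6

# Hypothetical scenario: 50 km/h, pedestrian steps out 25 m ahead.
for label, reaction in [("human, 1.5 s", 1.5), ("automated, 0.2 s", 0.2)]:
    print(f"{label:17s} impact at {impact_speed(50, 25, reaction):.1f} km/h")
```

With these assumed numbers, the human-reaction case still reaches the obstacle at roughly 42 km/h, while the faster response stops short; the steering path shown in the picture would add a further margin that this straight-line model ignores.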


With autonomous vehicles, there will still be accidents, and there will be cases where it will be determined that the autonomous vehicle did not make the best decision.  But the overall absolute level of safety will go up so dramatically, that the question will not be “isn’t this the wrong car to buy because it may decide wrongly in an extreme case?” but “isn’t this the right car to buy because it is overall so much safer for me and everyone else?”.  The moral path is to embrace autonomous vehicles, and work towards a proper system design and implementation in industry, government, and with consumers.

Staying ahead of upcoming restrictive drone regulations


How can drone developers avoid being shut down by an accident?

With the gradual increase in the use of commercial and consumer drones, we constantly hear about near-collisions and other incidents. Some of these incidents might be over-dramatized by the media or by whoever reported them, yet the overall risk is rising. In the UK, multiple close encounters of the drone kind have been reported recently, with some very close calls between small drones and large passenger jets. In a recent incident in LA, a helicopter was reportedly struck by what was probably a small drone, fracturing the windshield. Not only aircraft and their passengers are at risk; people on the ground can also get injured. In another incident, an innocent hobbyist’s drone clipped a tree and dropped towards the ground, causing serious eye injury to his friend’s son, a young toddler. In the Netherlands, a small drone lost contact with its operator and flew away, gradually running out of battery, and eventually descended onto a busy highway. Although the incident did not result in any damage to people or equipment, it did damage the local drone industry, which was shortly thereafter subjected to extensive flight restrictions. In Vancouver, there have been a number of reports of drones near the final approach to YVR, and at least a couple of cases in which commercial drones crashed and caused minor damage to parked vehicles. YVR airport has recently launched a drone awareness program.

Small drones can help reduce emissions (by replacing larger aircraft for similar operations), save lives through search & rescue operations, replace manned aircraft in dangerous operations, aid in “precision agriculture” that helps produce better yields, help inspect smoke stacks and wind turbines without the need for downtime, and serve many other applications. When designed and used appropriately, the utility of drones is enormous, and this utility is the essence of why the commercial drone industry has been growing so quickly. Drones can do a lot to advance society.

One of the major barriers to the full public acceptance of drones is that they pose safety concerns to the public. Typical root causes of incidents to date include:

  • Irresponsible operation. Not all drone operators are as experienced, cautious, or responsible as the best commercial operators. The fact that consumer drones are relatively inexpensive makes it possible to start a small business using drones at relatively low (apparent) risk, or for a user to purchase a unit for hobby purposes, with little experience or knowledge. Remote control airplane hobbyists are generally responsible, knowledgeable, and have the required skill to safely fly their models; however, modern drones, and particularly the multirotor type, are easy to operate anywhere, even for the uninformed or inexperienced operator. Owning and operating a drone safely requires knowledge, skill, and responsibility. One can begin by taking not-too-costly drone courses, either online or in class.
  • Technical problems associated with performance, reliability, or other shortfalls, for example:
    • Drone flyaway, where a drone suddenly flies away due to a broken communication link with the remote control unit (because of remote control failure or radio interference), a software glitch, operator error, design issues, etc.
    • Engine failure. A variety of solutions have been developed that allow a multirotor drone to recover from engine failure, while many of the products currently on the market do not have such recovery capability.   With fixed wing drones, loss of an engine is generally easier to recover from.
    • Loss of control, for example because of interference confusing the unit’s compass, GPS, or inertial sensors, or due to incorrect orientation or calibration of the unit’s compass (“toilet bowl effect”).
    • Power failure due to battery failure or wiring issues.
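Flyaways like the one in the Netherlands are usually addressed with a failsafe that decides what to do when the control link drops or the battery runs low. The function below is a hypothetical sketch of that decision logic: the thresholds, the linear battery-per-kilometre model, and all names are assumptions, not values or interfaces from any real flight controller.

```python
from enum import Enum, auto

class Action(Enum):
    CONTINUE = auto()
    HOLD = auto()             # hover and wait briefly for the link to return
    RETURN_TO_HOME = auto()
    LAND_NOW = auto()

def failsafe_action(seconds_since_link, battery_pct, home_distance_m,
                    link_timeout_s=2.0, hold_limit_s=10.0,
                    land_reserve_pct=15.0, pct_per_km=10.0, return_margin_pct=10.0):
    """Choose a failsafe action. All thresholds are hypothetical defaults."""
    # Battery needed to fly home (assumed linear in distance) plus a landing reserve.
    needed_to_return = land_reserve_pct + (home_distance_m / 1000) * pct_per_km
    if battery_pct < needed_to_return:
        # Cannot safely make it home: land under control now rather than
        # flying on until the battery is exhausted.
        return Action.LAND_NOW
    if seconds_since_link > hold_limit_s:
        return Action.RETURN_TO_HOME      # link lost for too long
    if battery_pct < needed_to_return + return_margin_pct:
        return Action.RETURN_TO_HOME      # head home while there is still margin
    if seconds_since_link > link_timeout_s:
        return Action.HOLD                # brief dropout: hold position first
    return Action.CONTINUE

print(failsafe_action(seconds_since_link=30, battery_pct=80, home_distance_m=500))
```

Real autopilots expose comparable behaviour through their own configuration settings; the specific autopilot’s documentation, not these numbers, should set the actual limits.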

Drone technology improvements are enabling more capable and lower-cost drones, increasing the number of drones and the overall safety risk. Improvements include enabling technologies such as lithium polymer batteries, flight control and ground control station software, small (low-weight) cameras and other sensors, and small flight controllers based on solid-state electronic components. There are also numerous technologies and techniques that enhance the reliable operation of drones, such as monitoring battery hours, cycles, and performance, setting parameters properly, performing calibration, etc. Flight control has become more affordable with software-enabled augmentation of lower-accuracy inertial sensors.
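Battery monitoring of this kind can be as simple as a pre-flight check on cycle count and measured capacity. The record fields, the 300-cycle limit, and the 80% capacity threshold below are hypothetical; real limits should come from the battery manufacturer.

```python
from dataclasses import dataclass

@dataclass
class BatteryRecord:
    serial: str
    cycles: int                  # charge/discharge cycles logged so far
    rated_capacity_mah: int
    measured_capacity_mah: int   # from the most recent full capacity test

def battery_fit_for_flight(rec, max_cycles=300, min_health=0.80):
    """Return (ok, reasons). Thresholds are hypothetical defaults."""
    reasons = []
    health = rec.measured_capacity_mah / rec.rated_capacity_mah
    if rec.cycles >= max_cycles:
        reasons.append(f"{rec.cycles} cycles (limit {max_cycles})")
    if health < min_health:
        reasons.append(f"capacity at {health:.0%} of rated (minimum {min_health:.0%})")
    return len(reasons) == 0, reasons

for rec in [BatteryRecord("A1", 120, 5000, 4600), BatteryRecord("B2", 310, 5000, 3800)]:
    ok, reasons = battery_fit_for_flight(rec)
    print(rec.serial, "OK" if ok else "RETIRE: " + "; ".join(reasons))
```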

There is significant opportunity for improving the safety and reliability of small drones to the necessary level, by developing more robust system architectures, improving operational procedures and operator qualifications, technology innovation, improved regulation, and by following more rigorous techniques in the design and manufacturing of these products. Most drones do not fully employ the proven and robust approaches used in the manned aircraft industry in the areas of design, testing, regulation, maintenance, inspection, and other best practices. While some of these approaches are more rigorous and expensive than necessary, and can be relaxed somewhat to be suitable for the drone industry, there is high value in many of these approaches that can lead to both adequate safety and risk profiles, and low enough cost and weight.

Regulations are developing worldwide, all in the direction of greater restrictions or higher required capability and proof. In some countries, there is a move towards restricting the operation of drones near built-up areas and airfields to “compliant systems” only, which is a challenge for all drone developers and will likely leave some of the lower-cost manufacturers behind. Manufacturers who are proactive in economically developing reliable, safer, compliant products will be the most successful in the marketplace, as they will be able to operate where others cannot and will avoid reliability issues in the field. Even consumer drones are complex products, and their safe operation is an even more complex challenge. One major incident caused by technical failure, or by a design that does not prevent user error, could damage the national or global drone market.

A systems approach to this complex issue that combines proactive strategy, careful risk analysis, economically innovative solutions, and best practices tailored to the drone industry will enable leading drone developers to get ahead on this issue. Such effort may seem costly, but its cost is far outweighed by the potential repercussions of failing to prevent an incident, even one caused by “operator error”. If you think safety is too costly, try an accident!

A more useful Requirements Process Maturity Model

Maturity models are a useful diagnostic tool for identifying problem areas and opportunities for improvement.  Both the client and the consultant can use them to determine the current level of performance.  The target level does not need to be level 5 (highest capability) for everything, as that is likely too expensive or difficult to achieve, or simply not needed.

One of the best ways to improve technology and product development is for your organization to be good at developing and managing requirements.  About 70% of problems in technology and product development come from requirements and system interaction errors, and fixing these problems at the final acceptance test or in the customer’s hands costs about 100 times more than fixing them in the requirements development and management phases of the project.  In short: build the right thing, build things right, and find problems early.
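Using only the two cost points given above (fixing a defect during requirements versus about 100 times that at final acceptance or in the field), a sketch shows how strongly total rework cost depends on catching defects early. The per-defect base cost and the defect counts are hypothetical.

```python
REQUIREMENTS_FIX_COST = 1_000   # hypothetical cost to fix one defect during requirements work
LATE_FIX_MULTIPLIER = 100       # "about 100 times more" at acceptance test or in the field (from the text)

def total_fix_cost(defects, fraction_caught_early):
    """Total rework cost when a fraction of defects is caught in requirements
    and the rest escape to final acceptance or the customer."""
    early = round(defects * fraction_caught_early)
    late = defects - early
    return (early * REQUIREMENTS_FIX_COST
            + late * REQUIREMENTS_FIX_COST * LATE_FIX_MULTIPLIER)

for caught in (0.2, 0.8):
    print(f"{caught:.0%} caught early: ${total_fix_cost(50, caught):,}")
```

With 50 hypothetical defects, moving from 20% to 80% caught early cuts rework cost from about $4.0 million to about $1.0 million, because each escaped defect dominates the total.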

For requirements development and management, a few maturity models have been published, but I have found them too specific to one industry (such as business analysis in the software industry), limited to certain aspects, or weak on integration, training, or culture.  So I developed the following model based on similar models from consulting houses, CMMI, Six Sigma, Model Based Systems Engineering, PLM, and my own background.  I think it can apply to all kinds of systems, whether hardware-oriented (manufacturing, construction), software-oriented, or a combination of both.

Requirements Process Maturity Model


Using this tool can then help structure the problem, ask the right questions and prioritize opportunities. Where does your organization stack up?
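Because the model figure is not reproduced in text here, the area names and levels below are hypothetical. The sketch shows one way to turn current and target levels into a prioritized list of gaps, with targets deliberately set below level 5 where that is good enough.

```python
# Hypothetical assessment: area -> (current level, target level), on a 1-5 scale.
assessment = {
    "requirements elicitation": (2, 4),
    "traceability": (1, 3),
    "verification planning": (3, 3),
    "change management": (2, 3),
    "training and culture": (1, 4),
}

def prioritized_gaps(assessment):
    """Return (area, gap) pairs where the target exceeds the current level,
    largest gap first (ties in alphabetical order)."""
    gaps = []
    for area, (current, target) in assessment.items():
        if not (1 <= current <= 5 and 1 <= target <= 5):
            raise ValueError(f"levels for {area!r} must be between 1 and 5")
        if target > current:
            gaps.append((area, target - current))
    return sorted(gaps, key=lambda item: (-item[1], item[0]))

for area, gap in prioritized_gaps(assessment):
    print(f"{area:26s} +{gap}")
```

Gap size is only a starting point; in practice the ranking would also weigh cost and business impact, which this sketch leaves out.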

If you have comments or questions on the model, or have ideas for improvements, please contact me.





System Level Website Failures – Technical, Process, and Organization

In BC, we recently had a windstorm that knocked out power to over 700,000 people in the province, some for 4 days.  One of the most difficult parts of the outage was that the BC Hydro website that provides outage updates also went down at the same time.  This made it very difficult for people without power to decide what to do, where to go, what to do with the food in the refrigerator, etc., and made for many unhappy customers.

Many critical websites are complex systems and fail more often than desired.  A good example is the failed launch of the HealthCare.gov website for ObamaCare, where serious technical problems at rollout subsequently took about 6 months to fix.  On launch day, as soon as the website reached about 2,000 simultaneous users, its performance became unusable, which was a serious problem since 250,000 simultaneous users tried to access it on the first day.  HealthCare.gov had many other problems as well: the project had large budget overruns, with $1.7 billion spent, about 10 times the budget and what it should have cost.  There are also lasting data and security problems with the website and internal database.


The majority of the root causes of the Healthcare.gov failure were systems-level failures in all three major dimensions of any complex system delivery: technical, process, and organization.

  1. Technical: The system design used an outdated 1990s database server model that doesn’t scale well with many concurrent users, rather than a more typical e-commerce server model that can scale with users.
  2. Process: The system development process used a waterfall approach to build most of the website and then test it, vs. an agile approach where you test the important parts all throughout the development process.  Additionally there was very little testing during the development.  They were even off by a factor of five on the concurrent user requirement.
  3. Organization: The organizational system of the Government and the Contractor was poor, with too many delays, last-minute changes, poor subcontracting, poor reporting, and poor coordination.
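Using the launch-day figures above (the site became unusable at about 2,000 concurrent users while about 250,000 tried to use it), a simple capacity check shows the size of the gap that early load testing against a stated requirement is meant to expose. Treating 2,000 as a measured capacity, and the `headroom` safety factor, are assumptions for illustration.

```python
def capacity_check(tested_capacity, expected_peak, headroom=1.5):
    """Compare load-tested concurrent-user capacity with the required capacity.

    `headroom` is a hypothetical safety factor on the expected peak.
    Returns (required capacity, shortfall factor); a factor <= 1 means it passes.
    """
    if tested_capacity <= 0:
        raise ValueError("tested_capacity must be positive")
    required = expected_peak * headroom
    return required, required / tested_capacity

required, factor = capacity_check(tested_capacity=2_000, expected_peak=250_000, headroom=1.0)
print(f"need {required:,.0f} concurrent users; short by a factor of {factor:.0f}")
```

Even with no safety margin at all, the launch-day demand exceeded the observed breaking point by a factor of about 125.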

BC Hydro is conducting a root cause investigation of their website failure.  Perhaps the root cause was a simple and isolated issue, but I am interested to hear, when the investigation is done, whether the failure had similar systems-level causes to the HealthCare.gov launch failure.  For any complex problem with interrelated technical, process, and organizational aspects, the Systems Approach is the best way to develop a solution that satisfies the overall needs and meets the expected behaviours of the system.

Why do so many industrial projects underperform these days?


About seven out of ten industrial projects underperform in production or operability, and/or have significant cost or schedule overruns.  Everyone working on the project, including the sponsors, wants a successful, on-budget, on-schedule project.  There are thousands of reference projects from the past decade, so why is it so hard to learn from experience?

There are many reasons, and I’d like to comment on a few of the key ones that are of heightened risk because of today’s environment.  There is much material available on industrial project underperformance, and we have talked to many in industry, and unfortunately we hear painful stories too often.  As a general background:

  • Project complexity increases daily: more difficult-to-reach resources create a continual need to deploy new combinations of technologies, environmental regulations are more demanding, community relations are more difficult, etc.
  • We have been through 15 years of economic boom in the global industry, and even the 2008 slowdown was just a short-term, 10-month bust cycle, followed by one of the fastest rebounds of industrial activity in 2009.  During this boom, the underlying cost structure for engineering and construction services has increased much faster than inflation.  On a typical project in the North Sea, companies are having to pay $300/hr for mediocre-quality engineering, mediocre because after 15 years of boom, engineering companies have often been taking on less capable staff in recent years.
  • In boom times, many unhealthy projects still make money.
  • For many years now, many owner companies have been shedding internal experts in the technical functions and trying to offload work and risk to EPC(m) or other contracting firms.  But much of the work and risk cannot be transferred from owner to contractor, because the two are structurally different.  Owners make money from the capital asset and can survive a budget overrun.  Contractors cannot afford to take on the financial liability of an underperforming project.  Many owners try to offload their project management or technical work to the contracting firm, but this can be problematic, mostly because owners and contractors have very different perspectives.  Owner’s teams need to be able to provide enough business and technical direction, and also provide contractor oversight.  When they lack the resources to do so, the whole project suffers.  Owner companies also struggle with internal coherence among their departments and managers when they don’t have enough project resources.
  • Engineering and EPC(m) firms are always in search of the next project and don’t provide or develop enough long-term continuity, R&D, productivity, or innovative support to the project over its entire life-cycle, or to the next project.  These contractors cannot hold specialty resources or afford to invest in innovation.  Engineering and EPC(m) firms are more service firms than total-solutions firms, in part because this is what owners ask of them through the procurement process.
  • Much of the supply base, where much of the innovation does happen, struggles to afford or acquire all the expertise needed to develop reliable and cost-effective solutions.

And now the Global economic macro-environment has weakened, especially in Canada’s Energy and Resource sectors.

With today’s drop in energy and commodity prices and a general shortage of industrial capital financing, industrial companies are slashing their technical and project teams and departments to reduce their operating expenses.  Until mid-2014 or so, production was King.  Now we see significant consolidation, downsizing, and a focus on industrial company survival.  An overly lean team without enough access to critical skills will make it even harder for current and future industrial projects to meet expectations, budgets, and schedules.

With weakened balance sheets, industrial companies are going to need successful projects more than ever.

Keys for Improvement

We need to do better and we can do better with an improved application of management, strategy, approaches, and more respect for the complexity of today’s industrial projects.  While all key stakeholders have to improve, the greatest leverage is with the project sponsors.  They control the highest level need, budget, scope, risk profile, etc., and so they have the largest leverage on the outcome.

  • There needs to be a common understanding by both business and technical professionals on why there are so many issues with these projects, and going forward, how these projects should be developed, governed, and executed.
  • The project team needs to have the right skills and adequate staffing levels, and then a robust training program on how to best manage and implement the industrial project.
  • The up-front design and planning work needs to be adequately funded and given enough time. A weak design and/or poor plan causes too many problems downstream when the activity and capital spend ramps up.
  • The right contracting strategy should be chosen, and the overall team constructed in a complete way and consistent with the strategy. The owner’s team must have the right skills and do all the scoping, concept work, requirements development, and overall management that is typical of successful contracts.  The contracting must be done so that the professional service firms deliver quality and get paid well enough for doing so.
  • Experienced and systematic approaches to the:
    • technical solution,
    • process of doing the project,
    • build and organization of the team

While the above roadmap seems obvious, the root causes of problematic projects lie in these five points, whether in understanding, approach, strategy, or implementation.  Furthermore, each must be done well enough to meet the sophistication that the complexity of today’s projects requires.

When owner companies become more open to longer-term value and improved partnering with contracting firms and the supply base, they can gain more productivity and innovation from those firms’ products and services over the life of the asset.  For example, engineering firms could provide more long-term asset support.  They have significant data on all their projects from the design phase, and can get operational data from currently operating assets.  Currently, after the project build is finished, the engineering contractor moves its resources onto other projects (or, if business is slow, lets them go).  The owner’s asset operations department struggles without the contractor’s engineering support, design models, people continuity, etc., and the asset often does not operate to its potential.  There can be a strong business case for further optimization and operational improvements to the operating asset that could be turned into a long-term support contract.  Everyone wins.

We must change the way we do things for a better outcome, and the ways do exist.



Tailoring Product Development Processes

There is a wide spectrum of product development processes, from stage gate to spiral processes.  Stage gate processes are able to stage scope and investment decisions and are typically employed in capital intensive industries.  Spiral processes take advantage of many repetitions of the design-build-test cycle and are typically employed in software development.  There are many variants in between.



To best tailor the product development process for the organization, it is important to understand the:

  • business and strategy of the organization
  • architecture and complexity of the product
  • product/project schedule, budget, and requirements
  • risks and uncertainties
  • needed iterations in the process
  • capability and culture of the organization, including Global aspects
  • customers, stakeholders, and suppliers
  • best practices

The resulting product development process is then "systems engineered", as it is an integration of systems and system elements: technical, process, and people.

There are many useful methods to choose from during this design process, including:

  • Design Structure Matrix (Eppinger)
  • Agile Methods
  • Lean Methods
  • Model Based Engineering
  • Collaborative Supplier Integration
  • Risk-based Planning
  • Quality Approaches

A key aspect of product development is dealing with all the risks and uncertainties, which means iteration is inherent in the process.  There are both planned iterations and unplanned iterations (to fix it when it’s not right).  It is important to understand the linkages, interactions, and drivers behind how the iterations will happen.  From that understanding, iteration can be accelerated through information technology, coordination techniques, or decreased coupling.  After that, by prioritizing risks, planning the needed iterations, planning the integration and test activities, and scheduling reviews to control the process, the project risks can be addressed.
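The Design Structure Matrix (DSM) listed above is one way to make those linkages visible. The following is a minimal Python sketch under stated assumptions, not a specific DSM tool: each row records which other tasks a task needs information from, and tasks that depend on each other, directly or indirectly, are grouped into coupled blocks. Each multi-task block is a feedback loop where iteration is inherent. The task names and dependencies are hypothetical, and the grouping uses a simple transitive closure (Warshall's algorithm) chosen for brevity.

```python
# Minimal DSM sketch (hypothetical tasks): find coupled blocks where iteration is inherent.

def transitive_closure(dsm):
    """Warshall's algorithm on a square 0/1 matrix.

    dsm[i][j] == 1 means task i needs information from task j.
    Returns reach[i][j] == True if task i depends on task j directly or indirectly.
    """
    n = len(dsm)
    reach = [[bool(dsm[i][j]) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def coupled_blocks(tasks, dsm):
    """Group mutually dependent tasks into blocks.

    A block with more than one task is a feedback loop: those tasks must iterate.
    """
    reach = transitive_closure(dsm)
    seen = set()
    blocks = []
    for i in range(len(tasks)):
        if i in seen:
            continue
        members = [i] + [j for j in range(len(tasks))
                         if j != i and reach[i][j] and reach[j][i]]
        seen.update(members)
        blocks.append([tasks[m] for m in members])
    return blocks


# Hypothetical example: a 1 in column j means "this row's task needs input from task j".
tasks = ["requirements", "architecture", "thermal analysis",
         "structural analysis", "test plan"]
dsm = [
    [0, 1, 0, 0, 0],  # requirements <- architecture (feedback)
    [1, 0, 0, 0, 0],  # architecture <- requirements
    [0, 1, 0, 1, 0],  # thermal <- architecture, structural
    [0, 1, 1, 0, 0],  # structural <- architecture, thermal
    [1, 0, 1, 1, 0],  # test plan <- requirements, thermal, structural
]

for block in coupled_blocks(tasks, dsm):
    label = "iterate together" if len(block) > 1 else "single pass"
    print(f"{label}: {', '.join(block)}")
```

With this example data, requirements and architecture form one loop, thermal and structural analysis form another, and the test plan can be done in a single pass once both loops settle. Planning can then treat each block as one iterated unit, and reducing the coupling inside a block shrinks the iteration it forces.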

The process must also be tailored to the organization, specific people, and key stakeholders.  This is probably the most difficult part, as it is all about dealing with people, managing change, and shifting cultures.  It is important to pick and choose the most important methods, implement them, and sustain them in a practical way. Too many processes fail because they go unused, or are unwieldy, inflexible, incoherent, too conservative, too bureaucratic, or too resource-hungry, or are only partly implemented.  Beyond process definition, there is training, coaching, fine-tuning, and ensuring the team sees that adopting the change is in its own interest and really "owns" any new processes.

While improving the overall process is a complex and difficult initiative, having a competitive Product Development process is key to quality products, low costs, speed to market, satisfied customers, and good business.

How to Dramatically Improve Health Care: Speed, Quality, Costs

After my recent in-depth experiences with both the Japanese and Canadian Health Care systems, I've continued investigating why the Japanese system has dramatically shorter wait times, better outcomes, and lower costs than the Canadian system.  I have included the US system as well.  It is clear to me that applying systems engineering to health care will both improve the system and lower its costs.  When I experienced the Japanese health care system, I was so shocked at how much better and faster it was than the Canadian system that I wrote a post on it in September 2013, and I repeat one of the key tables here:


In Canada, the wait times for critical diagnoses are getting worse (see this recent article in the Globe and Mail).  Cancer diagnosis can take 1 to 6 months (!!!) in BC whereas in Japan it can often be done in one day.

The US is getting serious about applying Systems Engineering to its Health Care System.  The White House's President's Council of Advisors on Science and Technology (PCAST) published an excellent report in May 2014 called "Report to the President, Better Health Care and Lower Costs: Accelerating Improvement through Systems Engineering".  The report is well worth reading and surprisingly bold in its recommendations.   One of the main recommendations is to transition from a fee-for-service model, which discourages efficient care, to one that pays for value rather than volume.


Health Care Systems are very complex, with evolving medical science and technology, multiple stakeholders, increased specialization, rising expectations of what can be done to treat illnesses, and a lot of realpolitik.  Systems engineering has been used successfully and widely in many other complex industries, such as manufacturing and aviation.  It has also been used to good effect in health care, but rarely, narrowly, and hardly at all at the macro scale.

Improving health care is essential given population growth, aging, and budgetary pressures.  The opportunity for improvement is massive.  In the US, approximately 33% of health care spending is wasted; 20-33% of hospitalized patients experience a medical error, about half of them preventable; quality issues are widespread; and caregivers and patients often lack necessary information when they need it.  In Canada, we see many of the same issues as in the US, and while we have universal coverage, wait times for necessary diagnostics or treatment are unnecessarily and often crazily long.  Even in Japan, with its world-leading overall outcomes, low costs, and low wait times, significant improvements are possible in overall efficiency, information flow, costs, and caregiver working conditions.

Examples of how systems engineering can improve health care include:

  • Denver Health saved $200 million in 2006 through a systems redesign of its operations. In one waste-reduction example, an industrial engineer found that trauma surgery residents walked 8.5 miles in a 24-hour shift!
  • Kaiser Permanente identified 3x as many sepsis cases and cut mortality from sepsis by 50%
  • Virginia Mason has the lowest rate of serious medical infections and falls and reduced medical malpractice liability by 40%

[Figure: Systems Engineering Process]

How impactful could systems engineering be if applied at all levels of Health Care? The promise is outcomes as good as Japan's or top-tier American care, applied universally, at lower cost to governments and patients, with essentially no wait times.  It won't happen overnight, but with the right strategy we could get there in 3-5 years. It is very feasible: if others can do it, we can too. That will also prepare us for the greying of our populations.  It is good business too, as the improved systems can be exported to other parts of the world.  A good system can dominate its market: consider Amazon, with its great portal, logistics, and network; the Internet, with its scalability and extensibility; or air travel, with its convenience, low costs, widespread usage, and high safety.

The best studies on how to improve health care by applying systems engineering tools and principles come from the US.  An excellent collection of papers and studies was published in 2005 by the combined efforts of the National Academy of Engineering and the Institute of Medicine, called "Building a Better Delivery System: A New Engineering/Health Care Partnership".  I highly recommend it.  Much of this material formed the basis for the 2014 PCAST report to President Obama.  Yet one of the last papers in the collection highlights how hard it is to make improvements in the US Health Care system, analyzing and giving painful examples of the political obstacles created by so many interests and organizations and the huge amount of money involved.

Other barriers include:

  • Misaligned incentive structure – fee-for-service vs fee-for-outcomes or value
  • Availability of data and relevant analytics
  • Limited technical capabilities, especially in small practices that make up the bulk of health care
  • Workforce competencies – limited knowledge of systems engineering tools and practices
  • Leadership / culture / politics

Difficult as it is, governments, organizations, and people around the world understand the need for change, the urgency of change, and that change is coming to Health Care.  It is hard work, it will take time, and there are many barriers, especially political ones.  Slowly and steadily, I expect systems engineering tools, principles, and activities to be applied to the Health Care system.  You can help by reading the PCAST and other reports and supporting the application of Systems Engineering to Health Care.

For my part, I am approaching Industry, Government, and Academic leaders with this message and analysis, and participating in consultations.


Improving Systems Education and Research at Canadian Universities

In today's world, products and processes are becoming more complex, and systems engineering is the best method to manage change and complexity.  Students with academic and practical experience in systems engineering will be more useful and attractive to potential employers.  Universities that provide a strong program in Systems will attract better students and improve academic and industry collaborations.  Industry and Government will benefit from improved systems development.


Engineering education worldwide has begun to broaden from preparing students for technical careers in a particular discipline to also preparing technical leaders who will develop complex systems, or make their "subsystem" fit better into the next higher-level system.  Engineers today are expected to be capable in management concepts and social science that encompass supply chains, politics, economics, and customers.  The leading Universities have created cross-functional organizations that often combine engineering, management, and social science into systems-oriented "Engineering Systems" schools.  These organizations can better cut across the more siloed traditional disciplines to offer integrated systems education and research that benefits from the fusion of disciplines.

Universities at the forefront of Engineering Systems education and research include MIT ESD, Georgia Tech ISyE, Stevens SSE and SERC, Keio SDM, TUDelft TPM, and others.  The Council of Engineering Systems Universities (CESUN), with about 60 member universities, helps coordinate the development of this field of study.  SFU and the University of Waterloo are members of CESUN.

[Figure: Engineering Systems]

Overall, I find much of the best Systems content comes from MIT's Engineering Systems Division and its associated community, such as the work of Steven Eppinger or the book Engineering Systems by de Weck, Roos, and Magee.  There is a lot of other great material out there from many others, but if I had to choose the best Engineering Systems University program, it would be MIT's ESD program.  MIT's ESD Strategic Plan is a worthwhile read.  To see that other regions are also at the forefront of Systems education, the "SDM in Two Minutes" video from Keio University's program is worth watching.

There are also strong Systems Engineering Professional Education Programs available from places like Caltech or Georgia Tech, and many organizations send mid-career engineers, project managers, business analysts, and managers to these programs.  INCOSE, the International Council on Systems Engineering, also provides links to training and certification as a Systems Engineering Professional, again primarily for professionals in the workforce.

The Systems Engineering discipline primarily came from Industry and Government, especially Defense and Aviation, and has grown to be applied to developing and managing the complex systems in Energy, Transportation, Health Care, and other industries.  Systems Engineering Professional Education and University Education in Engineering Systems are complementary and synergistic.

Universities that provide Systems education offer Undergraduate programs, Graduate programs, Professional Certificate programs, or some combination of the three.  Undergraduates with Systems education can contribute as Systems Engineers right away. Charles Wasson makes a great argument for comprehensive systems engineering training for all engineers at the undergraduate level in this paper. At the same time, it can also be good to become well educated in one of the disciplines, like Mechanical or Software Engineering, and then take a Graduate degree in Systems, often with some work experience in between.  Many engineers in the workforce find that their background in one discipline is not enough to lead the development of complex multi-disciplinary systems, so they return for a Graduate degree or Professional courses in Systems.  The average age of students in MIT's System Design and Management Program is 34, reflecting a more mature student body.


The Canadian University Programs in Engineering Systems or Systems Engineering are not as well developed as those at the leading Universities in this field.  UBC and SFU have undergraduate programs in Integrated Engineering and Systems Engineering respectively, and both are a good first step towards multi-disciplinary engineering, but neither school has Graduate Level or Professional Programs, and the current curricula do not generally include Systems Engineering fundamentals or the same level of fusion with social science or management science found at other leading Universities.  SFU's program is more of a Mechatronics program than what Systems Engineering is typically known for.  The University of Waterloo has perhaps one of the best Systems programs in Canada, its Systems Design Engineering program, offered at both the Undergraduate and Graduate level, though it has more of a "subsystems engineering" flavour than a "macro systems engineering" one.  Concordia also seems to have a good graduate-level Systems program focused on Information Systems.   U of T offers graduate certificates in global engineering and multidisciplinary engineering, as well as final-project programs, but the bulk of instruction is still in the traditional disciplines, and there isn't the same level of Systems education or Research as at the leading Universities.  Overall, Canadian Universities have made a good start, but there is much room for improvement.

Note that the naming of these "Systems" programs varies widely since, to a certain degree, each University likes to brand its program as unique.

In my home region of Vancouver, many local companies heavily use systems engineering in their development, including MDA, Westport, Ballard, and many small tech start-ups.  They have all had to bring in external resources to teach the Systems Engineering discipline, as BC graduates arrive with little Systems education.  Future BC developments, such as a new LNG plant or an improved Health Care System, would benefit greatly from Systems Engineering.  In the rest of Canada, we have world-leading companies like Bombardier, GE Canada, SNC-Lavalin, Cisco, and Blackberry that all heavily use Systems Engineering.

Canada is shifting from a more Resource-centric economy to more of a Knowledge-based economy.  One of the most effective pillars of that shift is ensuring Canada has a very strong systems-centric engineering education at our academic institutions to complement the traditional disciplines.  Canadian Universities must improve their Systems education and Research, and the leading Universities offer great examples they can incorporate.

These changes are difficult: they require organizational change, can raise tenure and political issues, must work around fixed budgets and five-year plans already in place, and can involve fusing departments across the faculties of Engineering, Management, and Social Science.  Still, the incredible benefits of improved Systems education to Canada, the Provinces, Industry, Students, and the Universities are well worth the investment.

The Power of Systems Engineering

The Problem

Complexity is increasing everywhere, with more software, more connectivity, more public policy issues, and accelerating development cycles. The trend is to add features and functionality, often implemented in software. Software size and complexity are growing exponentially, and the marriage of hardware and software is enabling systems-of-systems, such as mobile phones or airliner cockpits.

New integration problems result from combining rapid technological advancement and obsolescence, increasingly complex hardware and software evolution, and a migration to increasingly software-based systems. More software means more interfaces, and more interfaces mean more integration problems. Software interfaces are not as "transparent" as mechanical interfaces and go beyond inputs and outputs. Additionally, complex system interfaces cross multiple suppliers, both hardware and software. Overall, the trend will not get better; it will only get worse as lines of code increase and integration, verification, and validation efforts grow with them.

Success is getting harder in the “New Normal”.

  • 50% of product launches fail to live up to company expectations
  • 33% of new products fail to provide a satisfactory return
  • 70% of resources spent on new launches are allocated to products that are not successful in the market
  • 80% of projects cost 20% more person-hours to launch than initially forecast

Problems that arise from unmanaged complexity are no longer affordable. For example, there were 18 million vehicle recalls in the US in 2012, more recalls than vehicles sold that year, and at $100 per vehicle per recall this led to $1.8 billion in direct costs!
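As a quick check of the direct-cost figure, using only the two numbers quoted above (recall count and cost per recalled vehicle):

```latex
18 \times 10^{6}\ \text{vehicles recalled} \times \$100\ \text{per vehicle}
  = \$1.8 \times 10^{9} = \$1.8\ \text{billion}
```

This counts only the direct cost per recall; indirect costs such as brand damage or lost sales are not included.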

The Solution

The Systems Engineering Approach is the most effective way to manage complexity and change. Reducing the risk associated with new systems or modifications to complex systems is one of the primary goals of the systems engineer. Solving problems early in the development cycle saves enormous cost and time in the later phases of development. Cost and schedule overruns shrink as systems engineering effort increases.

Better to Find Problems Early!

Things that go wrong include:

  • Weak or non-existent basis to requirements
  • Inadequate costing and time-scale estimation
  • Weak control of suppliers and subcontractors
  • Integration problems
  • Inadequate test and acceptance strategy

The project whose turn it is to be "it":

  • Scope problems
  • No overall orchestrated cracking of the whip
  • No idea of costs at the outset
  • No system-wide vision
  • An ‘odds and sods’ approach
  • Bitten by unproven technology

Hidden benefits:

  • Risks that didn’t materialize
  • Rework that didn’t need to be done
  • Customer complaints that didn’t occur
  • Product deficiencies that were circumvented

Engineering roles are changing: what engineers do, and what is expected of them, now includes broader market, financial, and social issues.

It is difficult to train good systems engineers, foster "systems thinking", and get organizations to apply the "systems approach". The systems approach has both an art and a science component, and, like similar disciplines such as medicine, it can be taught by expert practitioners, typically from industries that successfully develop complex systems.

Whether you are developing a product or a process, a well implemented systems approach produces superior performance.