Why Conventional FMEAs fail too often, and why the Absolute Assessment Method FMEA is much better.

(Failure Modes and Effects Analysis)


On Oct 1, 2016, a commuter train crashed in New Jersey killing one and injuring 108 with high speed being a factor. The root cause of the crash is under investigation.

A similar crash happened in Amagasaki, Japan in April 2005 where 106 were killed and 562 injured, and high speed around a curve was a factor.  The conventional explanation of the root cause of the Amagasaki crash was corporate pressure on the driver to be on time.  Drivers faced harsh penalties for lateness, including humiliating “training” programs that included weeding and grass cutting duties.  In this case, the driver was speeding.  The resulting countermeasure in Amagasaki has been to install an expensive $1 billion train speed control system on the small line to help mitigate a potential accident.

There have been many other high speed passenger train derailments, such as the Santiago de Compostela derailment in Spain in 2013 (79 dead, 139 injured out of 218 passengers), and the Fiesch derailment in Switzerland in 2010 (1 dead, 42 injured).  The root cause explanation of these accidents tends to focus on the drivers driving faster than they should, and countermeasures tend to focus on semi-automated systems to control train speed.

Do we really know the root cause of these accidents, and are the countermeasures both effective and economic?

One of the best root cause analyses I’ve seen on the Amagasaki crash comes from Unuma Takashiro, and his conclusion is unconventional.  Unuma-san is a Failure Modes and Effects Analysis (FMEA) consultant from Japan. FMEA is one of the best methods to analyze a design to help prevent failures.  FMEA was developed in the aviation and space industries in the 1960’s, adopted by the automotive industry in the 1990’s, and is now prevalent in many industries including health care.

Unuma-san argues that in the case of the Amagasaki crash, the speed control system is expensive and not fail-safe.  One economic and effective countermeasure would be to add a $250,000 guard rail, which at the very least would likely prevent a recurrence, and definitely be useful as an additional layer of countermeasure.  The advantage of low-cost and effective countermeasures is that they can be widely-deployed.


He argues the real root cause of this failure is that the overall engineering and management approach to mitigating failures was not adequate – both initially to prevent the accident in the first place, and subsequently after the crash by putting in the speed control system but not (also) the guard rail.

Unuma-san has a very interesting and useful website on FMEA practices, and uses the Amagasaki crash as one of many examples.  He promotes an FMEA method that evaluates countermeasures in absolute terms, in contrast to the conventional FMEA, which evaluates them in relative terms.  The problem with the relative evaluation method is that it can easily miss important failure modes that do not make an arbitrary priority cutoff.  Missing important failure modes often leads to unexpected incidents.
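As a rough illustration of how a priority cutoff can hide a failure mode, the sketch below ranks a few failure modes by the conventional risk priority number (severity × occurrence × detection, each rated 1 to 10).  The failure modes, ratings, and cutoff value are all invented for illustration and are not taken from Unuma-san’s material.

```python
# Hypothetical failure modes rated on conventional 1-10 scales.
# All names, ratings, and the cutoff are illustrative assumptions.
failure_modes = [
    # (name, severity, occurrence, detection)
    ("door seal wears early", 4, 7, 6),
    ("display flickers", 3, 8, 7),
    ("derailment on tight curve", 10, 2, 4),
]

RPN_CUTOFF = 100  # arbitrary priority threshold (assumption)


def rpn(severity, occurrence, detection):
    """Conventional risk priority number: product of the three ratings."""
    return severity * occurrence * detection


def below_cutoff(modes, cutoff):
    """Return the names of failure modes that a relative ranking would drop."""
    return [name for name, s, o, d in modes if rpn(s, o, d) < cutoff]


dropped = below_cutoff(failure_modes, RPN_CUTOFF)
print(dropped)
```

In this made-up data set the most severe failure mode has the lowest product of ratings, so it is the one that falls below the cutoff and receives no countermeasure.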

He also analyzes the conventional FMEA approach and teachings, and points out many problems seen in industry:

  • ineffective because of missing failure modes,
  • done too late in the design process, making it more difficult and less likely to implement countermeasures,
  • led by team members from other departments that are not responsible for the design, which both lowers the effectiveness of the analysis and can allow the designer to not be held fully accountable for the FMEA results,
  • doesn’t promote economical countermeasures, and
  • many of the common FMEA teachings contain flaws that promote the above problems.

Unuma-san shows that many FMEAs confuse failure mechanisms (the physical, chemical, thermal, electrical, biological, or other stresses leading to the failure mode) and the actual failure modes (ways a product or process can fail), leading to missing failure modes.  If a failure mode is missed, then there may be no countermeasure identified, and subsequently incorporated into the design.

He points out that the relative evaluation FMEA method promotes doing the FMEA on the entire design when enough of the design is done, then once the FMEA is done to a certain level, all of the issues are prioritized, and then acted upon.  The problem with this approach is that FMEAs take a lot of time, and by the time the results are done, the recommended changes to the design can be too late to be easily implemented.  He promotes instead that the designers do the FMEA as they are doing the design in a very concurrent and “local” manner, while evaluating the countermeasures in an absolute manner against the individual failure mode.  This more easily allows for countermeasures to get into the design of the product or process in the early stages.

When non-designers take too much of the FMEA responsibility and scope, the effectiveness of the FMEA is reduced and the results are available late in the design process.  The effectiveness is reduced because non-designers are unable to know all the key information in the heads of the designers, and the designers may feel less accountable for the FMEA quality.  Results are delayed because instead of countermeasures being considered at the time of the design decision, they are made available after the design decision has been made and it is then more difficult and less likely to have any countermeasure implemented.

Unuma-san’s method is simpler than many FMEAs: it uses a four-point scale to the third power (64 ratings), whereas many conventional approaches use a 10-point scale to the third power (1000 ratings).  He promotes determining countermeasures per failure mode, evaluating the likely success of those countermeasures, and whether there is opportunity for optimization and lower costs from reducing overdesign.
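The size of the two rating spaces can be checked by enumeration.  This is only a sketch of the arithmetic; the function name is hypothetical, and the three factors are assumed from the “third power” wording above.

```python
from itertools import product


def rating_combinations(points, factors=3):
    """Count distinct rating tuples for a scale with `points` levels per factor."""
    return sum(1 for _ in product(range(1, points + 1), repeat=factors))


four_point = rating_combinations(4)
ten_point = rating_combinations(10)
print(four_point, ten_point)
```

The smaller space means far fewer distinctions for a team to argue over when rating each failure mode.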

Unuma-san goes on to analyze the common teachings of FMEA by referring to many of the most common reference material available in books, training material, websites, etc. and he shows many flaws, inconsistencies, interpretation issues etc. that tend to exacerbate the above issues.  Much of the trouble with conventional FMEAs can be traced to poor teachings.

Unuma-san has consulted for a very long and impressive list of Japanese companies on FMEA in the transportation, health care, manufacturing, and consumer goods industries.

I’ve been a lead designer of multiple complex systems, and I’ve been helping clients improve their product development processes, including FMEA.  The teachings of Unuma-san resonate strongly with me.  Too often I have seen poorly done FMEAs that miss critical failure modes, late FMEAs whose recommendations arrive too late to be useful, and FMEA study teams that don’t have enough participation by the design team.  The absolute evaluation method FMEA is a substantial improvement over the relative evaluation method, mostly because it evaluates the likely success of countermeasures.  I highly recommend his webpage on FMEAs, and it is linked here.  It is a little hard to read as the website translation to English isn’t the best, but it is worthwhile.

I think one of the reasons why FMEA teachings have many issues is that few FMEA teachers have been skilled design engineers; they are instead people who gravitate to process design.  The idea behind the FMEA is good and includes teaching early and effective analysis, but unfortunately much of the applied practice falls short.  A skilled design engineer naturally considers failure modes and tries to design them out, while simultaneously considering many other design tradeoffs, such as performance, function, economics, aesthetics, ergonomics, etc.  I think that since the conventional FMEA trainers developed the applied practice of the FMEA, they have continued to build upon the original process of the relative assessment method, and have struggled to develop effective practices that overcome the conventional process shortcomings.  In my experience many design engineers have found FMEA to be a good idea but too slow, too time-consuming, and not effective enough to really embrace.

What I like about Unuma-san’s method is that it is practical, effective, time-efficient, and evaluates the likely success of countermeasures.  It can be very useful to have FMEA experts, trained in this method, who can help designers with training, facilitation, documentation, review, etc.

There are a few other improved FMEA methods available that are trying to address some of the effectiveness and lateness problems with conventional FMEAs, such as “FMMEA” (Failure Modes, Mechanisms, and Effects Analysis), and there are good teachings in these methods as well.  I have found Unuma-san’s method to be among the best, and it really resonates with me.

FMEA is one of the best methods to help avoid failures.  By making the method more effective, products, processes, projects, and infrastructure can have fewer problems and be more economical.  I highly recommend further study on this topic for engineers and managers delivering any system.

Craig Louie, P.Eng., Co-Founder, SysEne Consulting

Why are Kei cars so popular in Japan and will they be popular elsewhere?

During my stay in Japan, small 660 cc engine Kei cars were very noticeable, especially in the more rural areas.  In the past few years in Japan, approximately 40% of new cars sold have been Kei cars.


Kei cars are very popular in Japan because they are inexpensive – about half the price of a Prius, they get the same fuel economy as a Prius, they are very practical and roomy, they are easy to park in crowded Japan, and they have lower taxes and licensing costs.  Women make up approximately 65% of the owners, and in some Prefectures, 99% of households own one, often as a second car.  They are more popular in rural areas than in a big city like Tokyo, where it is more convenient to take public transit and owning a car is not as necessary as in a more rural area.

Kei cars are not planned for the US or Canada because small cars are not really popular here, nor would they meet the US or Canadian safety standards because of their small size and lightweight build.  The safety of Kei cars is not much of a concern in Japan, because the road traffic accident rate in Japan is amongst the lowest in the world (about 1/3 the victim rate of the US), and it continues to drop every year, even with so many Kei cars on the road.  Japan has good road safety measures, good driver training, and Kei cars are more popular in the more rural areas where road speed tends to be lower (though Japan has the highest rate of elderly traffic deaths at 54% vs. the US at 17%).

The Kei cars in Japan are made by all the major Japanese auto manufacturers, and they are becoming increasingly loaded with high technology based on customer demand – turbochargers, infotainment systems, airbags, remote controlled doors, keyless start, collision avoidance systems, CVT, and four wheel drive.


One of the car reviewers I like is Bertel Schmitt, of the blog “The Truth About Cars”, and he writes a pretty good review of a typical Kei car.  It is a positive review for its market segment.

The Japanese Government is concerned that the Kei cars are too popular in Japan, as they are not manufactured for export because of their small size and insufficient safety equipment.  The Japanese Government would prefer Japanese automakers to develop “world cars” for the economies of scale to compete in the global market.  Therefore, the Japanese Government raised taxes on Kei cars in 2014 and plans further tax increases in 2015.

One advantage of these small cars is that they contribute to lowering the amount of oil imports into Japan.  At current oil prices, Japan has a net outflow of $100 billion/year from their economy from oil imports.  The automotive sector in Japan has been steadily lowering its oil consumption over the past 10 years.

Honda is planning to raise their production from 4 million cars today to 6 million by 2017, and they see these minicars as one of the main ways to achieve their targets, by targeting markets in India or Southeast Asia.  They will then be able to realize a return on their Kei car technology investments.

I think Honda is on the right track.  The Japanese market is purchasing Kei cars much more than expected because for a large percentage of Japanese consumers – especially younger people, women, families that need a second car, small business owners, and people in rural areas – these vehicles make sense.  Even with the 2014 Kei car tax increases, Kei car sales are up 12% year to date over 2013, and higher than forecasts.  While the numbers may drop with the 2015 Kei car tax increase, Kei cars have gone from a car to settle for to a desired car.  There will be many worldwide markets with similar conditions where these cars, or similar minicars, will make sense.

I don’t expect Kei cars will come to the US or Canada for a long time, if ever.  In the US and Canada, the safety standards are not going to change, small cars are not popular, and fuel and other operational costs are relatively low compared to many other countries.

Honda has also unveiled their S660 Roadster, which they plan to introduce in 2015 as a Kei car for Japan, and perhaps with a 1 liter motor for other markets.  I’ll be interested to see how this vehicle sells.


Movie Review: The Challenger Disaster


9/10

This excellent 90 minute movie brings to life the great story of Richard Feynman’s investigation into the Space Shuttle Challenger disaster.  I found the movie had good pacing, rang very true to what actually happened, and had very good acting by William Hurt as Feynman, Bruce Greenwood, and Brian Dennehy.

The movie is based on Feynman’s book “What Do You Care What Other People Think?”, which is also a terrific book.  The story follows Feynman’s instrumental role in uncovering the truth about the root cause of the disaster – both technically and politically.  Feynman’s personal heroism against strong headwinds and personal illness makes for a compelling story.

The movie does great justice to key scenes, from the dramatic O-ring experiment and Feynman’s personal difficulties to the political conspiracy surrounding his investigation, with forces both supporting and opposing it.


William Hurt’s performance drew me emotionally into the story.  I’ve not really been a big fan of Hurt’s performances in other movies, as I didn’t like him as Duke Leto in Frank Herbert’s Dune (too stiff), and he was ok in Dark City.  Yet in this movie he was able to capture Feynman’s unique character very well.

The movie inspired me to re-read “What Do You Care What Other People Think?”, which I had read over 20 years ago.  The overall story of Feynman and the Challenger continues to be sharply relevant today, with the widespread development of complex systems that have significant safety consequences and large, often conflicting, multi-stakeholder interests, some of which are inclined to bury the truth.

One of the most interesting short stories in “What Do You Care What Other People Think?” is the story of Richard and his first wife, Arline.  It is a great love story, despite its tragic nature.  The book’s title came from her.

This movie (and book) is highly recommended!

For a successful technology, reality must take precedence over public relations, for nature cannot be fooled. – Richard Feynman

Is your Complex System Project on track for Ultraquality Implementation?



We expect complex systems like an airplane, a nuclear power plant, or an LNG plant to practically never fail.  Yet systems are becoming increasingly complex, and the more components there are in a system, the more reliable each component must be, to the point where, at the element level, defects become impractical to measure within the time and resources available.
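A minimal sketch of why component reliability must rise with component count, assuming a simple series model in which the system works only if every one of its independent components works.  The model, the 0.99 system target, and the component counts are assumptions for illustration, not figures for any real aircraft or plant.

```python
def system_reliability(component_reliability, n_components):
    """Series model: the system works only if all independent components work."""
    return component_reliability ** n_components


def required_component_reliability(system_target, n_components):
    """Component reliability needed to hit a system target in the series model."""
    return system_target ** (1.0 / n_components)


for n in (10, 1_000, 100_000):
    r = required_component_reliability(0.99, n)
    print(f"{n:>7} components -> each must be at least {r:.8f} reliable")
```

Under this model the allowable defect rate per component shrinks roughly in proportion to the number of components, which is what pushes element-level defects below what can practically be measured.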

Additionally, in the future, our expectations will increase for complex system durability, reliability, total cost of ownership, and return on investment, as energy and raw materials increase in cost.

Ultraquality is defined as a level of quality so demanding that it is impractical to measure defects, much less certify the system prior to use.  It is a limiting case of quality driven to an extreme, a state beyond acceptable quality limits (AQLs) and statistical quality control.

One example of ultraquality is commercial aircraft failure rates.  Complexity is increasing: the Boeing 767 has 190k software lines of code, whereas the Boeing 777 has 4 million lines of code, and the Boeing 787 about 14 million lines of code.  The allowable failure rate of the flight control system continues to be one in 10 billion hours, which is not testable, yet the number of failures to date is consistent with this order of magnitude.
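To see why a rate of one failure in 10 billion hours cannot be demonstrated by test, it helps to convert the figure into calendar time.  The sketch below does that arithmetic; the fleet size is a hypothetical number chosen only for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years

target_hours_per_failure = 10_000_000_000  # one failure in 10 billion hours

# Continuous operation of a single unit needed to accumulate that many hours.
years_single_unit = target_hours_per_failure / HOURS_PER_YEAR

# Hypothetical test fleet size, chosen only for illustration.
fleet_size = 1_000
years_with_fleet = years_single_unit / fleet_size

print(f"{years_single_unit:,.0f} years on one unit")
print(f"{years_with_fleet:,.0f} years on a fleet of {fleet_size:,}")
```

A single unit would have to run for more than a million years to accumulate the hours, and even a large test fleet running around the clock would need centuries.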


Another example of ultraquality is a modern microprocessor, which has the same per-chip defect rates despite the number and complexity of operations having increased by factors of thousands.  The corresponding failure rate per individual operation is now so low as to be almost unmeasurable.

 

What are the best practices to achieve ultraquality in complex systems?

Maier and Rechtin make a strong case that while analytical techniques like Six Sigma and Robust Engineering Design will get you close, the addition of heuristic methods will get you over the top.  This includes using a zero defects approach not only in manufacturing, but also design, engineering, assembly, test, operation, maintenance, adaptation, and retirement – the complete lifecycle.

There are many examples of how analytical techniques alone underestimate failure; for example, the nuclear industry’s analysis of core damage frequency has proven to be off by an order of magnitude in practice.


A sample of applicable heuristics includes:

  • Everyone in the production line is a customer and a supplier [also extended to each person in the development team – engineering, supply, etc.]
  • The Five Why’s
  • Some of the worst failures are system failures
  • Fault avoidance is preferable to fault tolerance in system designs
  • The number of defects remaining in a system after a given level of test or review  (design review, unit test, system test, etc.) is proportional to the number found during that test or review.
  • Testing can indicate the absence of defects in a system only when: (1) The test intensity is known from other systems to find a high percentage of defects, and (2) Few or no defects are discovered in the system under test.

[Figure: pie chart courtesy Boeing.  FBW = Fly By Wire]

There is a lot more material on “how-to” in the works of Maier and Rechtin, Juran, and Phadke.

Ultraquality requires ultraquality throughout all the development processes, and by extension throughout the delivering organization.  That is, certify a lack of defects in the final product by insisting on a lack of defects anywhere in the development process.  Developing both the processes and organization to achieve this state is possible, is being done in some organizations, and allows for superior business performance.

There are many examples of how organizations lack ultraquality in their processes or organization.  General Motors is under heavy criticism these days following the Valukas report, which exposes the company’s poor organization and development practices.  Anecdotally, this is hurting GM dealers and turning them into ghost towns.

So back to the tagline: is your complex development project on track for ultraquality implementation?

Model Based Systems Engineering Readiness for Complex Product Development


The increasing complexity of today’s systems and systems-of-systems makes it increasingly difficult for systems engineers and program managers to ensure their product satisfies the customer. As an example, in this year alone, General Motors has recalled more vehicles in the US than it made in 2009 to 2013 – and it is only May!


May 21, 2014, http://www.huffingtonpost.com/2014/05/21/gm-recall-more-than-sold_n_5367478.html

Over the past 5-10 years, a formal discipline of Model Based Systems Engineering (MBSE) has been developed by the Systems Engineering community to catch up with rigorous model tools available to the other domains, such as CAD/FEA for mechanical engineering, or VMGSim/Hysys for chemical engineering, or C++ code generators for software development.

The combination of increased complexity, increased domain model usage, and the drive towards virtual product development and simulation capability has made it very difficult to make sure there is consistency in all the models, documents, and data sets for a complex product.  Without one single truth in the data set, there is increased likelihood of downstream problems.  MBSE is now in a position to allow systems engineers to develop a rigorous, coherent, flexible system model that can be an integrating design and development function across the program lifecycle, enabling this future vision:

[Figure: MBSE vision.  Source: INCOSE MBSE Workshop, Jan 2014]

The main benefits of MBSE are:

  • Reduced rework, earlier visibility into risk and issues
  • Reduced cycle time, reduce development cost, cost avoidance
  • Better communication and more effective analysis
  • Potential for increased re-use (product line reusability: engineering done once, reuse elsewhere)
  • Ability to generate and regenerate current reports and work products
  • Knowledge management (long-term and short-term)
  • Single source of truth
  • Competitiveness (our partners and competitors are doing it)
  • Think about how much of an engineer’s time is spent on data management rather than critical thinking (Change that ratio! Shift the nature of my hours)

While models have always been a part of the document-centric systems engineering process, they are typically limited in scope or duration, and not integrated into a coherent model of the entire system.

MBSE uses a graphical modelling language, called SysML, which is an extension of UML (Unified Modeling Language) developed by the software industry.  The SysML language and an MBSE modelling tool allow systems engineers to develop descriptive models of the system.  As an example:

[Figure: SysML model.  Source: INCOSE MBSE Workshop, Jan 2014]

There are several MBSE tools available, including Rhapsody (IBM), MagicDraw (No Magic), and Enterprise Architect (Sparx).  These tools have been successfully used by companies like Ford, Boeing, and Lockheed Martin, and they continue to improve.  MBSE is still relatively early in development compared to other domain tools, like CAD, FEA, or PLM (Product Lifecycle Management), but is now at a stage where it can have an immediate impact on the developing system.  There are many tools that connect to PLM tools, to Requirements Management tools like Rational DOORS, or to other disciplines.

I have found it really tough (and to a certain degree impractical) using the document-centric systems engineering approach to keep all the various design documents and models up to date and consistent with each other.  I’ve been using MBSE tools from both No Magic and Sparx, and they are both pretty good at capturing all the necessary systems engineering information in one model.  There aren’t many good tutorials and examples available in the public domain, but still enough to learn from.  I have been able to steadily and productively apply MBSE to my system design and analysis work.

I highly recommend that any organization doing complex product development consider MBSE.  It is the future for fast and high quality product development.

 

Insight and Heuristics in System Architecting

 One insight is worth a thousand analyses

[Image: Engineering and Art, Iron Man 3]

Systems Architecting is as much art as it is science.  The best book on this subject is from Maier and Rechtin, and I highly recommend it.


-Maier and Rechtin, The Art of Systems Architecting, second edition, CRC Press, 2000

One of the best sections of the book deals with using the method of Heuristics in architecting.  Insight, or the ability to structure a complex situation in a way that greatly increases one’s understanding of it, is strongly guided by lessons learned from one’s own or others’ experiences and observations.  Given enough lessons, their meaning can be codified into “heuristics”.  Heuristics are an essential complement to analytics.

As in the previous post, where the system engineer is to consider the whole and apply wisdom, Maier and Rechtin also promote the use of wisdom, but they note that “Wisdom does not come easy”:

  • Success comes from wisdom
  • Wisdom comes from experience
  • Experience comes from mistakes

While the required mistakes can come from the profession as a whole, or from predecessors, this also highlights the importance of systems engineering education from those skilled in the art.

Examples of heuristics are:

  1. Don’t assume that the original statement of the problem is necessarily the best, or even the right one
  2. In partitioning, choose the elements so that they are as independent as possible; that is, elements of low external complexity and high internal complexity
  3. Simplify. Simplify. Simplify.
  4. Build in and maintain options as long as possible in the design and implementation of complex systems.  You will need them.
  5. In introducing technological and social change, how you do it is often more important than what you do
  6. If the politics don’t fly, the hardware never will.
  7. Four questions, the Four Whos, need to be answered as a self-consistent set if a system is to succeed economically; namely, who benefits? who pays? who provides? and, as appropriate, who loses?
  8. Relationships among the elements are what give systems their added value
  9. Sometimes it is necessary to expand the concept in order to simplify the problem.
  10. The greatest leverage in architecting is at the interfaces.

-taken from Maier and Rechtin, The Art of Systems Architecting, second edition, CRC Press, 2000

 Heuristics are tools, and must be used with judgement.  The ones presented in the book are trusted and time-tested.  They may not apply specifically to your complex systems architecting work, though I think you will find most of them do.

Clean Energy Power Using the Elements of Hope

 

The Clean Energy industry depends on significant quantities of precious and rare earth materials, and if these power systems and vehicles were scaled to mass quantity levels, the demand would exceed economic supply.  China is the dominant mining source for rare earth metals, and has recently put in place yearly export quotas, which creates uncertainty in supply and raises prices.  An easy-to-read summary infographic by Vouchercloud connects rare earth materials, their uses, and their sources (excerpt below).


There are many industries that also use precious and rare earth metals – IT, Defence and Health – and this affects worldwide prices and supply.  Even iPhones contain significant amounts – and many consumers don’t recycle them, nor are their rare earth metals recycled (check out this gorgeous infographic from 911Metallurgist on the iPhone).

If your Clean Energy product or project depends on rare and precious materials, the cost engineering prognosis is especially difficult, as material prices and supply have significant uncertainty, and recycling/reuse/remanufacturing operates on much longer timeframes than an 18-month iPhone.  An automobile is typically on the road for 17 years before disposal, and stationary power systems can last 30 years or longer.

A paper from Diederen defines which elements are ideal for Clean Energy and calls them the “Elements of Hope” (for example, Fe, Al, Mg). Using the “Elements of Hope” in your product may safeguard you from material supply and cost risks, and potentially give you a competitive advantage.  These elements are likely to have long-term demand below the economically practical supply.

It is possible to choose designs and materials from these elements, and the tradeoffs can overall be beneficial.  We give three examples:

  1. Electric Motors: many electric motors today use neodymium or dysprosium.  Toyota has recently teamed with Tesla to produce an induction motor powertrain for the RAV4 EV to avoid these rare earth metals.  Even for strong permanent magnets it could be possible to avoid rare metals:

    Figure 1: Reference Matthias Katter, "Industrial development of materials for sustainable development (magnets + magneto-caloric materials)", September 2009


  2. Solar PV industry.
  3. Fuel cell bipolar plates can be either carbon plates or metallic plates with a low-cost coating.

Your Clean Energy Power/Transportation Product/Project will have the best chance of success when the entire energy, value, recycle, and material chain is an integral part of the strategy, design, and planning process, using the most up to date methods.