System Failure Case Study – Ice Falling from Port Mann Bridge Cables

Posted on December 22, 2016 by Michael Eiche

Background: The Port Mann Bridge is a ten-lane cable-stayed bridge that spans the Fraser River east of Vancouver, B.C. At the time of its opening in 2012, it was the widest bridge in the world at 65 m (213 ft). It is a key artery for Metro Vancouver, with about 100,000 crossings per day.

What Happened: Shortly after opening, during a winter storm in December 2012, “ice bombs” (clumps of ice and snow) dropped from the overhead cables, hitting and damaging over 300 vehicles. Windshields were broken, and dents were left in roofs and along the sides of cars. A few people suffered injuries during the incident.

Proximate Cause: Insufficient measures were in place to mitigate the hazard of ice and snow falling from the overhead cables, which, in the case of the Port Mann Bridge, cross over the roadway. The contract between Transportation Investment Corporation (TI Corp.), the Crown corporation responsible for the Port Mann, and Kiewit/Flatiron General Partnership included the requirement that “Cables and structure shall be designed to avoid ice build-up from falling into traffic.” The bridge’s designer maintained that measures including spacing the cable anchors away from traffic, using central pylons which avoid large cross frames over traffic, and using HDPE stay pipes were sufficient to minimize the risks associated with snow and ice accumulation.

Underlying Issues: There were many parties involved in the issue – the owner/operator, the design-build contractor, the maintenance contractor, and consultants. An analysis of the documentation and internal emails on the issue (from a freedom of information request done by an independent journalist) shows that none of the parties really “owned” the issue, but rather they assumed that either the risk was negligible or at least sufficiently managed, or that someone else would take care of the risk mitigation.

The issue of falling ice and snow from cable-supported bridges is well known. Examples can be found with bridges in Denmark, Sweden, Japan, the U.K., and in the U.S., including the Tacoma Narrows Bridge in Washington state. Methods to manage the risk are also well documented, although in many cases there are few practical solutions beyond traffic closure.

Ice and snow accretion removal systems can be mechanical, thermal, or passive in nature. In addition, localized monitoring of weather and ice/snow build-up can be used to pre-emptively close individual traffic lanes or the entire bridge when necessary. Finally, the basic choice of having vertical cable planes instead of inclined cable planes (“fanned” cables suspended over traffic) can reduce the risk.

Aftermath: B.C.’s transportation minister blamed the design-build contractor, Kiewit/Flatiron, stating that the bridge did not meet the requirements. The provincial automobile insurance agency, ICBC, paid $400,000 in claims, and lawsuits were filed by people who were injured during the incident. In response to one of the civil claims, TI Corp. and Kiewit/Flatiron stated “The buildup and subsequent release of ice and snow from the bridge structure was the result of a confluence of extreme environmental conditions, both unforeseen and unforeseeable to the defendants or any of them and was the inevitable result of an Act of God”.

Despite that statement, cable collars were installed in the fall of 2013 to help mitigate the hazard. These are manually released from the top of the towers and slide down the cable stays to dislodge accumulated snow and ice. The intent is to do this frequently enough to ensure the dislodged pieces are small enough to avoid damage to vehicles below.

In addition, a weather station and cameras were installed to allow for monitoring of conditions that could lead to ice and snow accumulation on the cables. Operating procedures were put in place to shut down the bridge if dangerous conditions were detected.

The Port Mann collar system is said to be one of the most successful systems in the world. However, it is not fool-proof. Snowfalls during December 2016 have so far resulted in about 50 insurance claims from falling ice and snow. This may have been a result of the operators not deploying the collars early enough during rapidly-changing weather conditions.

Interestingly, over the same period, 95 claims resulted from the same situation occurring on another Vancouver area bridge, the Alex Fraser, which has cables that do not hang over the bridge deck. High winds together with rising temperatures can blow ice and snow off the cables into traffic. The bridge was shut down for several hours on two days, due to “ice bombs”. Falling ice also occurred in 2005, 2008, and 2012, but overall the Alex Fraser is less prone to these incidents than the Port Mann. As a temporary measure, the Ministry of Transportation indicated that for the Alex Fraser, it will use a heavy-lift helicopter to blow snow accumulations off the cables.

The experiences on the Port Mann as well as the Alex Fraser bridges will be applied to the upcoming bridge to replace the George Massey Tunnel. In particular, the cable stays will not be allowed to cross over traffic, and cable collars or an improved alternative will be required.

Lessons Learned:

The requirements were written in a way that left the criticality and expectations somewhat open to interpretation. For some, the word “avoid” means you must completely ensure no risk of an incident. For others, avoid means “try” or “use typical means and methods”. One other section in the listed standards specified “practical” solutions were acceptable. Yet, unless specifically defined, the word “practical” allows for many different measures of acceptable implementation. The expectations around acceptable public safety were not met in this case, and best practices around requirements for public safety are typically better defined than what existed for the Port Mann Bridge.
Designers should account for the complete system and its operating environment. The micro-climates of Vancouver are well known, and hazardous when heavy wet snow is mixed with freezing and thawing conditions.
Repeated questions raised regarding the risks of falling ice and snow could have resulted in a risk analysis leading to a more effective ice hazard mitigation strategy, rather than simply assuming the original design would be adequate.
Public safety issues need to be considered carefully and critically, and receive considerable attention from management.
Ultimately, the Owner is most likely to be held responsible for the performance of the implemented design and its impact on third parties (in this example, motorists being hit by “ice bombs”). While many projects have multiple contractors and parties involved in the design, construction, operation and maintenance, the Owner team needs to ensure that there is sufficient capability, staffing, mandate, and expertise held by these parties to be able to ensure quality in requirements definition and in the design, build, verification, validation, and operation and maintenance stages to mitigate issues.

Michael Eiche, P.Eng. Principal, SysEne Consulting

Addressing the Elephant in the room at COP22

Posted on November 20, 2016 by Flyn McCarthy

The Conference of the Parties (COP22), titled the ‘COP of Action’, is wrapping up here in Marrakesh, Morocco, with policy makers and negotiators from nearly 200 countries outlining with more clarity their Nationally Determined “Carbon” Contributions (NDCs) and the policy measures intended to meet the Paris Agreement. There was obvious momentum behind the deployment of key measures such as carbon pricing and renewable energy, but the elephant in the room was the ample supply of low-cost fossil fuels as well as the election of Donald Trump.

Addressing the elephant in the room was US Secretary of State John Kerry, who reinforced that the US would keep its commitments and stated that he was “convinced the pledges could not be reversed”. Kerry left the crowd with a much-needed sense of optimism, stating that “In a time of uncertainty, actionable plans to avoid runaway climate change matter more than ever – and that’s what we got today.” He was speaking about a mid-century carbon plan which commits the US to even further reductions of 80% below 2005 levels by 2050. In addition to Kerry’s statement, 360 US businesses, including a dozen Fortune 500 companies, issued an open letter at COP22 urging President-elect Donald Trump to follow through on US commitments made under the Paris Agreement. US companies including Nike, Hewlett Packard, General Mills and DuPont argue in the letter that participation on climate change is good for business.

Both the US and Canada, as well as Mexico, Sweden and others, joined Germany at COP22 in releasing 2050 plans to guide investment and drive reductions in carbon emissions. The plans identified tools intended to chart the quickest possible path to carbon reductions, including fossil fuel subsidy removal, carbon pricing, renewable energy requirements and energy efficiency standards. Ontario, Quebec and California held a tri-lateral meeting focused on the benefits of their linked carbon market, which helps companies realize the lowest possible costs for carbon reductions. China’s nationwide emissions trading scheme (ETS), to be rolled out in 2017, was also indicated as potentially linking up to the EU system, taking a step towards an international carbon trading market.

Financial impacts on Emissions-Intensive and Trade-Exposed industry were also recognized in various high-level talks with representatives from oil and gas, agriculture, mining and manufacturing. Oil and gas companies were a major focus: just prior to COP22, a number of the world’s biggest oil companies, including Saudi Aramco and Royal Dutch Shell, pledged to invest $1 billion to develop climate-friendly technologies as part of the Oil and Gas Climate Initiative (OGCI). Discussions at the COP22 Innovation Forum illustrated the opportunity for significant reductions from the mining industry, including emission reductions from the integration of renewables, application of big data for energy efficiency gains and use of green financing to develop innovative technologies. The COP22 Low Carbon Solutions Forum showcased some of the leading companies that are putting carbon competitiveness as a priority, including Moroccan miner OCP.

Direct impacts resulting from climate change and adaptation were another major focus of the talks. Impacts including desertification, major droughts and extreme weather events were a key focus of this “Africa focused COP”. Phosphate miner OCP illustrated how their investment in desalination would help to mitigate the risk of drought to their operations. Closer to home, impacts due to disproportionate warming in the far reaches of Canada’s north were also discussed. The lack of cold temperatures required for the development of ice roads, the supply lifelines for northern communities and mining operations in Canada’s North, is a risk that is already being felt as indicated in this year’s Arctic warmth and the lack of sea ice (see below).

Now, after the dust has settled on the conference floor what is clear is that despite the elephant in the room, important global stakeholders are committed to continuing the momentum of the Paris Agreement with legally binding actions and a tight timeframe. Certain aspects of this international climate regime are already in place and others are in the development process. What is also clear is that businesses that have a good understanding of the financial risks and opportunities are already taking action. These companies will be able to adapt and will thrive in the future carbon constrained world.

Flyn McCarthy, P.Eng. Principal, SysEne Consulting Inc.

Why Conventional FMEAs fail too often, and why the Absolute Assessment Method FMEA is much better.

Posted on October 20, 2016 by Craig Louie

(Failure Modes and Effects Analysis)

On Oct 1, 2016, a commuter train crashed in New Jersey killing one and injuring 108 with high speed being a factor. The root cause of the crash is under investigation.

A similar crash happened in Amagasaki, Japan in April 2005, where 106 were killed and 562 injured, and high speed around a curve was a factor. The conventional explanation of the root cause of the Amagasaki crash was corporate pressure on the driver to be on time. Drivers would face harsh penalties for lateness, including humiliating “training” programs which included weeding and grass cutting duties. In this case, the driver was speeding. The resulting countermeasure in Amagasaki has been to put in an expensive $1-billion train speed control system on the small line to help mitigate a potential accident.

There have been many other high speed passenger train derailments, such as the Santiago de Compostela derailment in Spain in 2013 (79 dead, 139 injured out of 218 passengers), and the Fiesch derailment in Switzerland in 2010 (1 dead, 42 injured). The root cause explanation of these accidents tends to focus on the drivers driving faster than they should, and countermeasures tend to focus on semi-automated systems to control train speed.

Do we really know the root cause of these accidents, and are the countermeasures both effective and economic?

One of the best root cause analyses I’ve seen on the Amagasaki crash comes from Unuma Takashiro, and his conclusion is unconventional. Unuma-san is a Failure Modes and Effects Analysis (FMEA) consultant from Japan. FMEA is one of the best methods to analyze a design to help prevent failures. FMEA was developed in the aviation and space industries in the 1960’s, adopted by the automotive industry in the 1990’s, and is now prevalent in many industries including health care.

Unuma-san argues that in the case of the Amagasaki crash, the speed control system is expensive and not fail-safe. One economic and effective countermeasure would be to add a $250,000 guard rail, which at the very least would likely prevent a recurrence, and definitely be useful as an additional layer of countermeasure. The advantage of low-cost and effective countermeasures is that they can be widely-deployed.

He argues the real root cause of this failure is that the overall engineering and management approach to mitigating failures was not adequate – both initially to prevent the accident in the first place, and subsequently after the crash by putting in the speed control system but not (also) the guard rail.

Unuma-san has a very interesting and useful website on FMEA practices, and uses the Amagasaki crash as one of many examples. He promotes an FMEA method that uses an absolute evaluation method of countermeasures, as compared to the conventional FMEA, which uses a relative evaluation method of countermeasures. The problem with the relative evaluation method is that it can easily miss important failure modes that do not make an arbitrary priority cutoff. Missing important failure modes often leads to unexpected incidents.
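To make the cutoff problem concrete, here is a minimal sketch. It is not Unuma-san's actual worksheet: the failure modes, the ratings, the cutoff of 100 and the `countermeasure_adequate` flag are all hypothetical. The relative approach ranks by the conventional risk priority number (severity x occurrence x detection), while the absolute approach simply asks, for each failure mode on its own, whether its countermeasure is judged good enough.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # hypothetical 1..10 rating (10 = catastrophic)
    occurrence: int  # hypothetical 1..10 rating (10 = frequent)
    detection: int   # hypothetical 1..10 rating (10 = undetectable)
    countermeasure_adequate: bool  # hypothetical absolute judgement

    @property
    def rpn(self) -> int:
        """Risk priority number used by conventional (relative) FMEAs."""
        return self.severity * self.occurrence * self.detection


def relative_selection(modes, cutoff):
    """Relative method: act only on modes whose RPN reaches the cutoff."""
    return [m.name for m in modes if m.rpn >= cutoff]


def absolute_selection(modes):
    """Absolute method: act on every mode whose own countermeasure is inadequate."""
    return [m.name for m in modes if not m.countermeasure_adequate]


# Illustrative data only; none of these numbers come from a real analysis.
modes = [
    FailureMode("connector corrosion", 6, 6, 5, True),    # RPN 180
    FailureMode("sensor drift", 5, 7, 4, False),          # RPN 140
    FailureMode("derailment on curve", 10, 1, 8, False),  # RPN 80
]

flagged_relative = relative_selection(modes, cutoff=100)
flagged_absolute = absolute_selection(modes)
missed = [name for name in flagged_absolute if name not in flagged_relative]

print("relative:", flagged_relative)
print("absolute:", flagged_absolute)
print("missed by the cutoff:", missed)
```

In this made-up data set the rare but catastrophic failure mode falls below the cutoff and is never acted on, even though its countermeasure is inadequate. That is the kind of miss the paragraph above describes.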

He also analyzes the conventional FMEA approach and teachings, and points out many problems seen in industry:

ineffective because of missing failure modes,
done too late in the design process, making it more difficult and less likely to implement countermeasures,
led by team members from other departments that are not responsible for the design, which both lowers the effectiveness of the analysis and can allow the designer to not be held fully accountable for the FMEA results,
doesn’t promote economical countermeasures, and
many of the common FMEA teachings contain flaws that promote the above problems.

Unuma-san shows that many FMEAs confuse failure mechanisms (the physical, chemical, thermal, electrical, biological, or other stresses leading to the failure mode) with the actual failure modes (the ways a product or process can fail), leading to missing failure modes. If a failure mode is missed, then no countermeasure may be identified and subsequently incorporated into the design.

He points out that the relative evaluation FMEA method promotes doing the FMEA on the entire design when enough of the design is done, then once the FMEA is done to a certain level, all of the issues are prioritized, and then acted upon. The problem with this approach is that FMEAs take a lot of time, and by the time the results are done, the recommended changes to the design can be too late to be easily implemented. He promotes instead that the designers do the FMEA as they are doing the design in a very concurrent and “local” manner, while evaluating the countermeasures in an absolute manner against the individual failure mode. This more easily allows for countermeasures to get into the design of the product or process in the early stages.

When non-designers take too much of the FMEA responsibility and scope, the effectiveness of the FMEA is reduced and the results are available late in the design process. The effectiveness is reduced because non-designers are unable to know all the key information in the heads of the designers, and the designers may feel less accountable for the FMEA quality. Results are delayed because instead of countermeasures being considered at the time of the design decision, they are made available after the design decision has been made and it is then more difficult and less likely to have any countermeasure implemented.

Unuma-san’s method is simpler than many FMEAs, using a four-point scale to the third power (64 ratings) vs. the 10-point scale to the third power (1000 ratings) used by many conventional approaches. He promotes determining countermeasures per failure mode, evaluating the likely success of those countermeasures, and checking whether there is opportunity for optimization and lower costs from reducing overdesign.
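The two rating counts quoted above follow directly from three factors each rated on the same scale. A small check, assuming three factors (the post does not name them, so the function below just counts combinations):

```python
from itertools import product


def rating_combinations(points: int, factors: int = 3) -> int:
    """Count distinct rating tuples for `factors` factors on a `points`-point scale."""
    return len(list(product(range(1, points + 1), repeat=factors)))


four_point = rating_combinations(4)   # 4 ** 3
ten_point = rating_combinations(10)   # 10 ** 3

print(four_point, ten_point)
```

With far fewer distinct combinations to choose between, a team plausibly spends less time debating adjacent scores.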

Unuma-san goes on to analyze the common teachings of FMEA by referring to many of the most common reference materials available in books, training material, websites, etc., and he shows many flaws, inconsistencies, and interpretation issues that tend to exacerbate the above issues. Much of the trouble with conventional FMEAs can be traced to poor teachings.

Unuma-san has consulted for a very long and impressive list of Japanese companies on FMEA in the transportation, health care, manufacturing, and consumer goods industries.

I’ve been a lead designer of multiple complex systems, and I’ve been helping clients improve their product development processes, including FMEA. The teachings of Unuma-san resonate strongly with me. Too often I have seen poorly done FMEAs that miss critical failure modes, late FMEAs whose recommendations are too late to be useful, and FMEA study teams that don’t have enough participation by the design team. The absolute evaluation method FMEA is a substantial improvement over the relative evaluation method, mostly because it evaluates the likely success of countermeasures. I highly recommend his webpage on FMEAs, and it is linked here. It is a little hard to read as the website translation to English isn’t the best, but it is worthwhile.

I think one of the reasons why FMEA teachings have many issues is that few FMEA teachers have been skilled design engineers, but are instead people that gravitate to process design. The idea behind the FMEA is good and includes teaching early and effective analysis; unfortunately, much of the applied practice falls short. A skilled design engineer naturally considers failure modes and tries to design them out, while simultaneously considering many other design tradeoffs, such as performance, function, economics, aesthetics, ergonomics, etc. I think that since the conventional FMEA trainers developed the applied practice of the FMEA, they have continued to build upon the original process of the relative assessment method, and have struggled to develop effective practices that overcome the conventional process shortcomings. In my experience, many design engineers have found FMEA to be a good idea but too slow, too time-consuming, and not effective enough to really embrace it.

What I like about Unuma-san’s method is that it is practical, effective, and time-efficient, and that it evaluates the likely success of countermeasures. It can be very useful to have FMEA experts, trained in this method, who can help designers with training, facilitation, documentation, review, etc.

There are a few other improved FMEA methods available that are trying to address some of the effectiveness and lateness problems with conventional FMEAs, such as “FMMEA” (Failure Modes, Mechanisms, and Effects Analysis), and there are good teachings in these methods as well. I have found Unuma-san’s method to be among the best, and it really resonates with me.

FMEA is one of the best methods to help avoid failures. By making the method more effective, products, processes, projects, and infrastructure can have fewer problems and be more economical. I highly recommend further study on this topic for engineers and managers delivering any system.

Craig Louie, P.Eng., Co-Founder, SysEne Consulting

There isn’t a dilemma in autonomous vehicles having to choose between harming their passengers or others.

Posted on June 28, 2016 by Craig Louie

A recent study in Science “The Social Dilemma of Autonomous Vehicles” has highlighted how self-driving cars need to have algorithms to decide on actions in extreme situations – and even having to choose between protecting the passengers vs. pedestrians. The study results indicate that participants favor minimizing the overall number of public deaths even if it puts the vehicle in harm’s way. But when asked about which cars they would actually buy, participants would choose a car that would protect them first. This study highlights an apparent conflict between morality and autonomy.

I like the study, as it raises good questions, and it describes part of one of the many issues in autonomous vehicles. I also like the many news articles that have been written on this topic based on the study, as it helps raise awareness of the complexity of the issue – that it is both social and technical. At the same time, for the sake of being newsworthy, and controversial, most narratives I read on the topic frame the study and topic as a social dilemma. Yet when examined through a technical perspective, we will have dramatically safer situations for both passengers and pedestrians with autonomous vehicles, and there isn’t any dilemma.

Traffic-related deaths exceed 1.25 million worldwide per year, with aging drivers, distracted driving, higher speeds, and the prevalence of substance abuse all contributing to keep the rate stubbornly high. For every person killed in a motor-vehicle accident, 8 are hospitalized and 100 are treated and released from emergency rooms. Autonomous driving, when implemented well, will easily reduce this by 90%, and perhaps by 99% when fully implemented. The response time, sensing, spatial awareness, decision-making, and reliability of an autonomous vehicle will be better than most of us, except perhaps for highly trained and talented drivers, and definitely far better than the too many drivers in our driving population who cause most accidents (distracted, drunk, inexperienced, tired, reduced reflexes, etc.). The autonomous capability allows us to have a safer response for both the passenger and pedestrian.
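A back-of-the-envelope sketch shows what those percentages would mean in absolute terms. It uses only the figures quoted above and rests on two simplifying assumptions of this sketch, not claims from the study: that the 8:1 and 100:1 injury ratios apply to the worldwide death toll, and that injuries fall by the same percentage as deaths.

```python
# Figures quoted in the post.
DEATHS_PER_YEAR = 1_250_000   # worldwide traffic deaths per year
HOSPITALIZED_PER_DEATH = 8    # hospitalizations per death
TREATED_PER_DEATH = 100       # treated-and-released ER visits per death


def casualties_avoided(reduction_percent: int) -> dict:
    """Yearly casualties avoided, assuming injuries fall in step with deaths."""
    deaths_avoided = DEATHS_PER_YEAR * reduction_percent // 100
    return {
        "deaths_avoided": deaths_avoided,
        "deaths_remaining": DEATHS_PER_YEAR - deaths_avoided,
        "hospitalizations_avoided": deaths_avoided * HOSPITALIZED_PER_DEATH,
        "er_visits_avoided": deaths_avoided * TREATED_PER_DEATH,
    }


for pct in (90, 99):
    print(f"{pct}% reduction:", casualties_avoided(pct))
```

Under these assumptions, a 90% reduction would avoid 1,125,000 deaths a year, and a 99% reduction would leave 12,500.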

Consider that the autonomous vehicle can respond faster than most humans. I have the lane departure warning system on my car, and it is much faster than me. An autonomous vehicle will be able to brake faster and more optimally, and steer a better adaptive path that is more likely to minimize injury to both passenger and pedestrian. Most drivers can’t brake as fast, or optimize the braking pressure, or optimize the steering adjustments during the emergency maneuver as well as a well-implemented autonomous vehicle. The following picture shows a better braking and adaptive steering path with the best overall outcome for both passenger and pedestrian. In the event of a collision, the overall speed, impact angle, etc. will be reduced.
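The effect of reaction time alone can be sketched with the standard stopping-distance relation: distance travelled during the reaction delay plus braking distance, d = v·t + v²/(2a). The numbers below (50 km/h, a 1.5 s human reaction time, a 0.2 s automated response and 7 m/s² of deceleration for both) are illustrative assumptions, not measurements of any particular vehicle.

```python
def stopping_distance_m(speed_kmh: float, reaction_s: float, decel_ms2: float) -> float:
    """Reaction distance plus braking distance, in metres: v*t + v^2 / (2*a)."""
    v = speed_kmh / 3.6  # convert km/h to m/s
    return v * reaction_s + v ** 2 / (2 * decel_ms2)


SPEED_KMH = 50.0        # assumed urban speed
DECEL_MS2 = 7.0         # assumed deceleration on dry pavement, same for both
HUMAN_REACTION_S = 1.5  # assumed typical human reaction time
AUTO_REACTION_S = 0.2   # assumed automated system latency

human = stopping_distance_m(SPEED_KMH, HUMAN_REACTION_S, DECEL_MS2)
auto = stopping_distance_m(SPEED_KMH, AUTO_REACTION_S, DECEL_MS2)

print(f"human driver: {human:.1f} m")
print(f"automated:    {auto:.1f} m")
print(f"difference:   {human - auto:.1f} m")
```

Even with identical brakes, the shorter response delay accounts for the whole difference in this sketch. Better brake modulation and steering would come on top of that.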

With autonomous vehicles, there will still be accidents, and there will be cases where it will be determined that the autonomous vehicle did not make the best decision. But the overall absolute level of safety will go up so dramatically, that the question will not be “isn’t this the wrong car to buy because it may decide wrongly in an extreme case?” but “isn’t this the right car to buy because it is overall so much safer for me and everyone else?”. The moral path is to embrace autonomous vehicles, and work towards a proper system design and implementation in industry, government, and with consumers.

A more useful Requirements Process Maturity Model

Posted on September 14, 2015 by Craig Louie

Maturity models are a useful diagnostic tool to help determine problem areas and areas for improvement. They can be used by both the client and consultant to determine the current level of performance. The target level of performance doesn’t need to be level 5 (highest capability) for everything, as that is likely too expensive or difficult to achieve, or not necessarily needed.

One of the best ways of improving technology and product development is for your organization to be good at developing and managing requirements. About 70% of problems in technology and product development come from requirements and system interaction errors, and fixing these problems at the final acceptance test or in the customer’s hands costs about 100 times more than fixing them in the requirements development and management phases of the project. Basically: build the right thing, build things right, and find problems early.
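To get a feel for what the 70% and 100-times figures imply, here is a small worked example. The project size (50 problems) and the early fix cost ($1,000 each) are hypothetical; only the two ratios come from the text above.

```python
TOTAL_PROBLEMS = 50              # hypothetical number of problems on a project
EARLY_FIX_COST = 1_000           # hypothetical cost (dollars) to fix one problem early
REQUIREMENTS_SHARE_PERCENT = 70  # share from requirements / system interaction errors
LATE_COST_MULTIPLIER = 100       # late fix costs about 100x an early fix

requirements_problems = TOTAL_PROBLEMS * REQUIREMENTS_SHARE_PERCENT // 100
cost_if_fixed_early = requirements_problems * EARLY_FIX_COST
cost_if_fixed_late = cost_if_fixed_early * LATE_COST_MULTIPLIER

print("requirements-related problems:", requirements_problems)
print("cost if caught in the requirements phase: $", cost_if_fixed_early)
print("cost if caught at acceptance or by the customer: $", cost_if_fixed_late)
```

On these made-up numbers, the same 35 problems cost $35,000 to fix early and $3.5 million to fix late.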

For requirements development and management, there are a few maturity models published, but I have found them too specific to an industry (like for business analysts in the software industry), limited to certain aspects, or weak on integration, training, or culture. So I’ve developed the above model based on similar models from consulting houses, CMMI, Six Sigma, Model Based Systems Engineering, PLM, and my own background. I think this can apply to all kinds of systems, from hardware-oriented (manufacturing, construction) to software-oriented, or combinations of both.


Using this tool can then help structure the problem, ask the right questions and prioritize opportunities. Where does your organization stack up?

If you have comments or questions on the model, or have ideas for improvements, please contact me.

System Level Website Failures – Technical, Process, and Organization

Posted on September 8, 2015 by Craig Louie

In BC, we recently had a windstorm that knocked out power in the province for over 700,000 people, some for 4 days. One of the most difficult parts of the outage was that the BC Hydro website that provides outage updates also went down at the same time. This made it very difficult for people without power to decide what to do, where to go, what to do with the food in the refrigerator, etc., and made for many unhappy customers.

Many critical websites are complex systems, and fail more often than desired. A good example was the failed launch of ObamaCare’s HealthCare.gov website, where there were serious technical problems at the rollout and the major issues subsequently took about 6 months to fix. On launch day, as soon as the website hit about 2,000 simultaneous users, its performance became unusable, which was an issue since on the first day 250,000 simultaneous users tried to access the website. There are many other problems with HealthCare.gov: the project had large budget overruns, with $1.7 billion spent, which is about 10 times more than the budget and what it should have cost. There are also lasting data and security problems with the website and internal database.
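The size of the launch-day gap is easy to work out from the two figures above. This is only arithmetic on the quoted numbers. It compares demand with the load at which the site became unusable, which is a different comparison from any gap in the written requirement.

```python
USABLE_CONCURRENT_USERS = 2_000        # load at which performance became unusable
LAUNCH_DAY_CONCURRENT_USERS = 250_000  # simultaneous users on the first day

overload_factor = LAUNCH_DAY_CONCURRENT_USERS / USABLE_CONCURRENT_USERS
served_share_percent = 100 * USABLE_CONCURRENT_USERS / LAUNCH_DAY_CONCURRENT_USERS

print(f"demand was {overload_factor:.0f}x the usable capacity")
print(f"capacity covered {served_share_percent:.1f}% of launch-day demand")
```

A gap of two orders of magnitude is far more than a tuning problem, which supports the point about the underlying architecture.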

The majority of the root causes of the Healthcare.gov failure were systems-level failures in all three major dimensions of any complex system delivery: technical, process, and organization.

Technical: The system design used an outdated 1990’s database server model that doesn’t scale well with many concurrent users, as opposed to using a more typical e-commerce server model that can scale with users.
Process: The system development process used a waterfall approach to build most of the website and then test it, vs. an agile approach where you test the important parts all throughout the development process. Additionally there was very little testing during the development. They were even off by a factor of five on the concurrent user requirement.
Organization: The organizational system of the Government and the Contractor were poor with too many delays, last minute changes, poor subcontracting, poor reporting, and poor coordination.

BC Hydro is conducting a root cause investigation of their website failure. Perhaps the root cause was a simple and isolated issue, but I am interested to hear, when the investigation is done, whether the failure had systems-level causes similar to the HealthCare.gov launch failure. For any complex problem with interrelated technical, process, and organizational dimensions, the Systems Approach is the best way to develop a solution that satisfies the overall needs and meets the expected behaviours of the system.

Why do so many industrial projects underperform these days?

Posted on July 6, 2015 by Craig Louie

About seven out of ten industrial projects underperform in production or operability, and/or have significant cost or schedule overruns. Everyone working on the project, including the sponsors, wants a successful, on-budget, and on-schedule project. There are thousands of reference projects that have been done in the past decade, so why is it so hard to learn from experience?

There are many reasons, and I’d like to comment on a few of the key ones that are of heightened risk because of today’s environment. There is much material available on industrial project underperformance, and we have talked to many in industry, and unfortunately we hear painful stories too often. As a general background:

Project complexity increases daily, with more difficult to reach resources resulting in a continual need to deploy new combinations of technologies, more difficult environmental regulations, more difficult community relations, etc.
We have been through 15 years of an economic boom in the Global Industry, and even the slowdown blip in 2008 was just a short-term 10-month bust cycle with one of the fastest rebounds of industrial activity in 2009. During this boom, the underlying cost structure for engineering and construction services has increased much faster than inflation. On a typical project in the North Sea, companies are having to pay $300/hr for mediocre quality engineering – mediocre since after 15 years of boom, engineering companies have often been taking on less and less capable staff in recent years.
In boom times, many unhealthy projects still make money.
For many years now, many owner companies have been shedding internal experts in the technical functions, and they try to offload work and risks to EPC(m) or other contracting firms. But much of the work and risk cannot be transferred from owner to contractor because they are structurally different. Owners make money from the capital asset and they can still survive a budget overrun. Contractors cannot afford to take any financial liability of an underperforming project. Many owners often try to offload their project management or technical work to the contracting firm, but this can be problematic, mostly because owners and contractors have very different perspectives. Owners’ teams need to be able to provide enough business and technical direction, and also provide contractor oversight. When they struggle to do so because they don’t have the resources, the whole project suffers. Owner companies also struggle with internal coherence between all their internal departments and managers when they don’t have enough project resources.
Engineering and EPC(m) firms are always in search of the next project and don’t provide or develop enough long-term continuity, R&D, productivity, or innovative support to the project over its entire life-cycle, or to the next project. These contractors cannot hold specialty resources or afford to invest in innovation. Engineering and EPC(m) firms are more service firms than total solutions firms – in part because this is what owners ask of them through the procurement process.
Much of the supply base, where much of the innovation does happen, struggles to afford or acquire all the necessary expertise needed to develop reliable and cost-effective solutions.

And now the Global economic macro-environment has weakened, especially in Canada’s Energy and Resource sectors.

With today’s drop in energy and commodity prices, and a general shortage of industrial capital financing, industrial companies are slashing their technical and project teams and departments to reduce their operating expenses. Until mid-2014 or so, production was King. Now we see significant consolidations, downsizing, and a focus on industrial company survival. An overly-lean team without enough access to critical skills will make it even more difficult for current and future industrial projects to meet expectations, budget, and schedule.

With weakened balance sheets, industrial companies are going to need successful projects more than ever.

Keys for Improvement

We need to do better and we can do better with an improved application of management, strategy, approaches, and more respect for the complexity of today’s industrial projects. While all key stakeholders have to improve, the greatest leverage is with the project sponsors. They control the highest level need, budget, scope, risk profile, etc., and so they have the largest leverage on the outcome.

There needs to be a common understanding by both business and technical professionals on why there are so many issues with these projects, and going forward, how these projects should be developed, governed, and executed.
The project team needs to have the right skills, adequate staffing levels, and then a robust training program on how to best manage and implement the industrial project.
The up-front design and planning work needs to be adequately funded and given enough time. A weak design and/or poor plan causes too many problems downstream when the activity and capital spend ramps up.
The right contracting strategy should be chosen, and the overall team constructed in a complete way and consistent with the strategy. The owner’s team must have the right skills and do all the scoping, concept work, requirements development, and overall management that is typical of successful contracts. The contracting must be done so that the professional service firms deliver quality and get paid well enough for doing so.
Experienced and systematic approaches need to be applied to the:
- technical solution,
- process of doing the project,
- build and organization of the team.

While the above roadmap seems obvious, the root causes of problematic projects are issues in the above five points, whether in the understanding, approach, strategy, or implementation. Furthermore, these points have to be done well enough to match the level of sophistication required by the complexity of today’s projects.

When owner companies become more open to longer term value and improved partnering with the contracting firms and the supply base, it can enhance the productivity and innovation that those firms’ products and services bring to the owner’s projects over the life of the asset. For example, engineering firms could provide more long term asset support. They have significant data on all the projects from the design phase, and can get operational data from the currently operating assets. Currently, after the project build is finished, the engineering contractor moves their resources onto other projects (or, if it is slow, lets them go). The owner’s operating department for the asset struggles without the contractor engineering support, design models, people continuity, etc., and often the result is that the asset does not operate to its potential. There can be a great business case for further optimization and operational improvements to the operating asset that could be turned into a long term support contract. Everyone wins.

We must change the way we do things for a better outcome, and the ways do exist.

Tailoring Product Development Processes

Posted on May 27, 2015 by Craig Louie

There is a wide spectrum of product development processes, from stage gate to spiral processes. Stage gate processes are able to stage scope and investment decisions and are typically employed in capital intensive industries. Spiral processes take advantage of many repetitions of the design-build-test cycle and are typically employed in software development. There are many variants in between.

To best tailor the product development process for the organization, it is important to understand the:

business and strategy of the organization
architecture and complexity of the product
product/project schedule, budget, and requirements
risks and uncertainties
needed iterations in the process
capability and culture of the organization, including Global aspects
customers, stakeholders, and suppliers
best practices

The resulting product development process is then “systems engineered” as it is an integration of systems and systems elements – technical, process, and people.

There are many useful methods to choose from during this design process, including:

Design Structure Matrix (Eppinger)
Agile Methods
Lean Methods
Model Based Engineering
Collaborative Supplier Integration
Risk-based Planning
Quality Approaches

A key aspect of product development is dealing with all the risks and uncertainties, which means iteration is inherent in the process. There are both planned iterations and unplanned iterations (to fix it when it’s not right). It is important to understand the linkages, interactions, and drivers behind how the iterations will happen. From that understanding, iteration can be accelerated through information technology, coordination techniques, or decreased coupling. After that, by prioritizing risks, planning the needed iterations, planning the integration and test activities, and scheduling reviews to control the process, the project risks can be addressed.
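As a rough illustration of how a Design Structure Matrix can expose the linkages and coupling behind iterations, here is a minimal Python sketch. The five tasks, their dependencies, and the function names are all hypothetical, invented for this example rather than taken from any real project or DSM tool. A dependency is flagged as "feedback" when a task needs input from a task that is scheduled later, which is where rework loops tend to come from.

```python
# Hypothetical tasks and dependencies, invented for illustration only.
# DEPS[task] = set of tasks whose output this task needs.
DEPS = {
    "requirements": set(),
    "concept": {"requirements", "test"},  # concept gets revised after test results
    "detail_design": {"concept"},
    "prototype": {"detail_design"},
    "test": {"prototype", "requirements"},
}

# Planned execution order of the tasks (also an assumption).
ORDER = ["requirements", "concept", "detail_design", "prototype", "test"]


def build_dsm(order, deps):
    """Return a square 0/1 matrix where row i needs input from column j."""
    index = {task: i for i, task in enumerate(order)}
    n = len(order)
    dsm = [[0] * n for _ in range(n)]
    for task, inputs in deps.items():
        for source in inputs:
            dsm[index[task]][index[source]] = 1
    return dsm


def feedback_marks(order, deps):
    """List (task, source) pairs where the source comes later in the order."""
    index = {task: i for i, task in enumerate(order)}
    return sorted(
        (task, source)
        for task, inputs in deps.items()
        for source in inputs
        if index[source] > index[task]
    )


def coupled_tasks(order, deps):
    """Tasks spanned by any feedback mark, i.e. the block likely to iterate."""
    index = {task: i for i, task in enumerate(order)}
    spanned = set()
    for task, source in feedback_marks(order, deps):
        spanned.update(order[index[task]: index[source] + 1])
    return [task for task in order if task in spanned]


if __name__ == "__main__":
    for task, row in zip(ORDER, build_dsm(ORDER, DEPS)):
        print(f"{task:>14} " + " ".join(str(v) for v in row))
    print("feedback:", feedback_marks(ORDER, DEPS))
    print("iteration block:", coupled_tasks(ORDER, DEPS))
```

In this toy case the single feedback mark (concept needing test results) ties four of the five tasks into one iteration block. Removing or weakening that dependency is the kind of "decreased coupling" that would shorten the loop, while better tooling or coordination would speed up each pass through it.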

The process must also be tailored to the organization, specific people, and key stakeholders. This is probably the most difficult part, as it is all about dealing with people, managing change, and shifting cultures. It is important to pick and choose the most important methods, implement them, and sustain them, in a practical way. Too many processes fail because they are not used, unwieldy, inflexible, not fully coherent, too conservative, too bureaucratic, take too many resources, or are only partly implemented. Beyond process definition, there is training, coaching, fine-tuning, and ensuring the team sees that the change is in their self-interest to adopt, and really “owns” any new processes.

While overall improving the process is a complex and difficult initiative, having a competitive Product Development process is key to quality products, low costs, speed to market, satisfied customers, and good business.

How to Dramatically Improve Health Care; Speed, Quality, Costs

Posted on November 23, 2014 by Craig Louie

After my recent in-depth experiences with both the Japanese and Canadian Health Care systems, I’ve continued my investigation into why the Japanese system has dramatically shorter wait times, better outcomes, and lower costs than the Canadian system. I have included the US system as well. It is clear to me that applying systems engineering to health care will both improve the system and lower its costs. When I experienced the Japanese health care system, I was so shocked at how much better and faster it was than the Canadian system that I wrote a post on it in Sept 2013, and I repeat one of the key tables here:

In Canada, the wait times for critical diagnoses are getting worse (see this recent article in the Globe and Mail). Cancer diagnosis can take 1 to 6 months (!!!) in BC whereas in Japan it can often be done in one day.

The US is getting serious about applying Systems Engineering to their Health Care System. The White House’s President’s Council of Advisors on Science and Technology (PCAST) published an excellent report in May 2014 called “Report to the President, Better Health Care and Lower Costs: Accelerating Improvement through Systems Engineering”. The Report is surprisingly bold in its recommendations. One of the main recommendations is to transition from a fee-for-service model, which is a disincentive to efficient care, to one that pays for value instead of volume.

Health Care Systems are very complex, with evolving medical science and technology, multiple stakeholders, increased specialization, and rising expectations of what can be done to treat illnesses, and a lot of realpolitik. Systems engineering has been used successfully and widely in many other complex industries, such as manufacturing or aviation. Systems engineering has also been used to good effect in health care, but too rarely and not widely, and barely at all on the macro scale.

Improving health care is a necessity, given population growth, aging, and budgetary pressures. The opportunity for improvement is massive. In the US, approximately 33% of health care costs are wasted, 20-33% of hospitalized patients experience a medical error (about half of them preventable), there are many quality issues, and caregivers and patients do not have the necessary information when needed. In Canada, we see many of the same issues as the US, and while we have Universal coverage, wait times for necessary diagnostics or treatment are unnecessarily and often crazily long. Even in Japan, with its worldwide overall best outcomes, low costs, and low wait times, significant improvements are possible in overall efficiency, information flow, costs, and caregiver conditions.

Examples of how systems engineering can improve health care include:

Denver Health saving $200 million in 2006 by doing a systems redesign of their operations. As an example to reduce waste, one industrial engineer found the trauma surgery resident physicians walk 8.5 miles in a 24 hour shift!
Kaiser Permanente identified 3x as many sepsis cases and cut mortality from sepsis by 50%
Virginia Mason has the lowest rate of serious medical infections and falls and reduced medical malpractice liability by 40%

Systems Engineering Process

How impactful could systems engineering be if applied at all levels of Health Care? The promise is outcomes as good as Japan or the top tier American care, Universally applied, and lower costs to Government and Patients, with essentially no wait times. It won’t happen overnight, but with the right strategy we could get there in 3-5 years. It is very feasible – if others can do it, we can too. That will then allow us to also be prepared for the greying of our populations. It is good business too, as the improved systems can be exported to other parts of the world. When you have a good system, look how it can dominate the market – like Amazon with its great portal, logistics, and network; or the Internet, with its scalability and extensibility, or air travel with its convenience, low costs, widespread usage, and high safety.

The best studies on how to improve health care by applying systems engineering tools and principles come from the US. An excellent paper and collection of studies was published in 2005 by the combined efforts of the National Academy of Engineering and the Institute of Medicine, called “Building a Better Delivery System: A New Engineering/Health Care Partnership”. I highly recommend this paper. Much of this material formed the basis for the 2014 PCAST report to President Obama. Yet one of the last papers in this collection highlights the real difficulties with making improvements in the US Health Care system by analyzing and giving painful examples of the political difficulty, especially with so many interests and organizations, and the huge amount of money, in the Health Care system.

Other barriers include:

Misaligned incentive structure – fee-for-service vs fee-for-outcomes or value
Availability of data and relevant analytics
Limited technical capabilities, especially in small practices that make up the bulk of health care
Workforce competencies – limited knowledge of systems engineering tools and practices
Leadership / culture / politics

Yet while difficult, governments, organizations, and people around the world understand the need for change, the urgency for change, and that there will be change in Health Care. It is hard work, it will take time, and there are many barriers, especially politically. Slowly and steadily, I expect systems engineering tools, principles, and activities to be applied into the Health Care system. You can help by reading the PCAST and other reports and supporting the application of Systems Engineering to Health Care.

For me, I am approaching Industry, Government and Academic leaders with this message and analysis, participating in consultations, etc.

Improving Systems Education and Research at Canadian Universities

Posted on November 18, 2014 by Craig Louie

In today’s world, products and processes are becoming more complex, and systems engineering is the best method to manage change and complexity. Students that have academic and experiential capability in systems engineering will be more useful and attractive to potential employers. Universities that provide a strong program in Systems will attract better students and improve academic and industry collaborations. Industry and Government will benefit by improved systems development.

Worldwide

Engineering education worldwide has begun to broaden from preparing students for technical careers in a particular discipline to also prepare technical leaders that will develop complex systems or have their “subsystem” fit better into the next higher level system. Engineers today are expected to be capable in management concepts and social science that encompass supply chains, politics, economics, and customers. The leading Universities have made cross-functional organizations that often combine engineering, management, and social science into “Engineering Systems” systems-oriented schools. These organizations can better cut across the more siloed traditional disciplines to offer integrated systems education and research which benefits from discipline fusion.

The Universities at the forefront of Engineering Systems Education and Research include MIT ESD, Georgia Tech ISyE, Stevens SSE and SERC, Keio SDM, TUDelft TPM, and others. There is a Council of Engineering Systems Universities (CESUN) that helps coordinate the development of this field of study, with about 60 universities as members. SFU and the University of Waterloo are members of CESUN.

Overall I find much of the best Systems content comes from MIT’s Engineering Systems Division and associated community, such as from Steven Eppinger, or their book on Engineering Systems by de Weck, Roos, and Magee. There is a lot of other great material out there from many others, but if I had to choose the best Engineering Systems University program, it would be MIT’s ESD program. MIT’s ESD Strategic Plan is a worthwhile read. The “SDM in Two Minutes” video from Keio University’s program is also worthwhile, and shows that other regions are at the forefront of Systems education too.

There are also strong Systems Engineering Professional Education Programs available from places like Caltech or Georgia Tech, and many organizations send mid-career engineers, project managers, business analysts and management to these programs. INCOSE, the International Council on Systems Engineering, also provides links to training and certification as a Systems Engineering Professional, again primarily for professionals in the workforce.

The Systems Engineering discipline primarily came from Industry and Government, especially Defense and Aviation, and has now grown to be applied to develop and manage the complex systems in Energy, Transportation, Health Care and other industries. The Systems Engineering Professional Education and the University Education in Engineering Systems are complementary and synergistic.

Universities that provide Systems education provide Undergraduate programs, Graduate programs, or Professional Certificate programs, or a combination of all three. Undergraduates with Systems education are able to become useful as a Systems Engineer right away. Charles Wasson makes a great argument for comprehensive systems engineering training at the undergraduate level to all engineers in this paper. At the same time, it can also be good to become well educated in one of the disciplines, like Mechanical or Software Engineering, and then take a Graduate degree in Systems, often with some work experience in between. Many engineers in the workforce find that their background in one of the disciplines is not enough for being a leader in developing complex multi-disciplinary systems, so they return to get either a Graduate degree or take Professional courses in Systems. The average age of students in MIT’s System Design and Management Program is 34, reflecting more mature students.

Canada

The Canadian University Programs in Engineering Systems or Systems Engineering are not as well developed as those at the leading Universities in this field. UBC and SFU have undergraduate programs in Integrated Engineering and Systems Engineering respectively, and both are a good first step towards multi-disciplined engineering, but neither school has Graduate Level or Professional Programs, and the current curriculum does not generally include the Systems Engineering fundamentals or have the same level of fusion with social sciences or management science as at the leading Universities. SFU’s program is more of a Mechatronics program than what Systems Engineering is typically known for. The University of Waterloo has perhaps one of the best Systems programs in Canada, with their Systems Design Engineering program, which is both Undergraduate and Graduate level, though it has a flavour of more “subsystems engineering” than “macro systems engineering”. Concordia also seems to have a good Systems program, at the graduate level and focused on Information Systems. U of T has graduate certificates in global engineering or multidisciplinary engineering final project programs, but the bulk of instruction is still in the traditional disciplines, and there isn’t the same level of Systems education or Research as at the leading Universities. Overall for Canadian Universities there is a good start but there is much room for improvement.

Note there is a large diversity in the naming of these “Systems” programs, as to a certain degree, each University likes to brand their program as unique.

In my home region of Vancouver, there are many local companies that heavily use systems engineering in their development. They include MDA, Westport, Ballard, and many small tech start-ups. They have all had to teach the Systems Engineering discipline by bringing in external resources, as BC graduates don’t come with much Systems educational background. For future BC developments, such as a new LNG plant, or improving our Health Care System, Systems Engineering is of great benefit. In the rest of Canada, we have world leading companies like Bombardier, GE Canada, SNC-Lavalin, Cisco, and Blackberry that all heavily use Systems Engineering.

Canada is shifting from a more Resource-centric economy to more of a Knowledge-based economy. One of the most effective pillars to do that is to ensure Canada has a very strong systems-centric engineering education at our academic institutions to complement the traditional disciplines. Canadian Universities must improve their Systems education and Research. There are great examples by the leading Universities that Canadian Universities can incorporate.

While these changes are difficult to make, because they require organizational changes, there can be tenure and political issues, there are fixed budgets and five year plans already in place, and it can be hard to fuse departments between different faculties of Engineering, Management, and Social Science – the incredible benefits of improved Systems education to Canada, the Provinces, Industry, Students, and the Universities are well worth the investment.

The SysEne Blog

Systems thinking for the real world