Proton failure investigation finds quality control the root problem

In the heat of competition: The Russian investigation into the most recent Proton rocket launch failure has now found that the turbo pump failure was caused by significant management failures.

The investigation into the MexSat-1 failure established that a fast-spinning shaft inside a turbine of the RD-0212 engine propelling the third stage can break easily due to excessive vibrations. (The turbine is designed to pump propellant into four thrusters which steer the rocket in flight.) Yet, despite the problem lingering in the engine’s design for decades, the fact that two of these three accidents had happened in the past 15 months was itself not an accident!

In an interview with the Russian business web site, the head of Roskosmos Igor Komarov disclosed that due to recent easing of requirements for the quality of metal that had gone into the production of the shaft, the turbine became more vulnerable to vibrations. Additional fascinating details on the same issue had surfaced on the online forum of the Novosti Kosmonavtiki magazine.

As it turned out, dangerously low requirements for the turbine shaft were set in the design documentation during the development of the rocket. However, the issue was identified early during testing and the production team self-imposed extra margins for the affected components to remedy the problem. However, in 2013, the new management began questioning why so many manufactured parts had been disqualified during production, even when they had met the lowest requirements set in the design documentation. By that time, the new generation of workers and mid-level production managers no longer saw a reason to fight for more stringent requirements, which were actually making their own work more difficult. As a result, the hardware which was barely making it through quality control was certified for installation on the engine, thus giving the old design flaw more chances to surface. [emphasis mine]

The description above reminds me strongly of the circumstances that took place prior to the Challenger failure in 1986: Engineers trying to fix a problem that managers don’t want to see.


  • Edward

    An especially interesting article, Robert.

    A good question to ask: Is that “new management” the result of Russia’s decision to consolidate its space industry, or was that management “new” before that decision?

    As the article noted:
    “The obvious question is why the original design documentation was not updated right after the problem was first discovered. Apparently, attempts by mid-level engineers to bring up the issue were resisted by the management on the manufacturing side and by the military certification service, which had existed in the USSR, because no real accident had been known at the time as resulting from that problem.”

    This brings up the ‘change is bad’ philosophy of engineering. If it works properly, don’t change it. An exception is an economic improvement: if something is less economical than it could or should be, then it falls under the classification of ‘not working.’

    Since no accident had been known to result from the design, there was no pressing need to make changes. On the other hand, “new management” made a change in the material without fully understanding the system. It is clear to me that they *thought* that they understood it, but they didn’t have the experience and knowledge of the original designers. Another philosophy is to not make changes until you understand the system. Otherwise you are likely to break it (and — oh! — that’s just what happened).

    Which brings us back to change is bad. They chose to ease the requirements for the quality of the metal because of the rate of rejection of the manufactured parts. It seemed to fall into the area of economic improvement.

    The Challenger comparison is another topic, with some similarities, but I am running out of time, so I will save that for later.

  • David M. Cook

    Let me assist you, Edward, with my favorite NASA quote:

    “Take off your engineering hat and put on your management hat!”

    In other words, damn the engineering! Just launch it! We don’t care about people’s lives!

  • pzatchok

    More than likely a case of bad paperwork.

    A design improvement was made years ago and blindly followed by everyone in the workforce. It was known but never documented completely.
    As soon as the management team that knew of the problem and the reasons for it retired, the next group came in and simply followed the only documented procedures and requirements.

    I run into this every day, all day at work, and it is a real and constant fight. We are still updating things years after the engineers who implemented the changes are gone and left no reasons behind for the changes.
    It takes a well-trained staff of workers to watch for and spot the wrong steps or the extraneous steps in a procedure. Then they need to be smart enough to work around them or call in the correct engineers to make the right changes, and then you have to make sure the engineers actually make the correct changes to the documentation. (And engineers are always forgetting to make those documented changes.)

  • Edward


    David, that is a quote from Thiokol, not from NASA. Misconceptions like these are exactly why I wanted to revisit the Challenger disaster that Robert mentioned. (However, this time I won’t get into the lie the press told us about waivers.) Too many people believe untruths or explain away the poor launch decision on silly things such as “group-think.” They miss that there was a lot of engineering and analysis that had gone into the problem that was being considered that night, and that what is obvious in hindsight was not obvious at the time.

    When Thiokol got the solid rocket booster (SRB) contract, they redesigned a booster they already used for unmanned rockets, and part of that redesign was to add a second – a redundant – O-ring for additional safety. It did not work out as they expected, as conditions caused the added O-ring (now considered the primary O-ring) to relax and even, under certain conditions, form a gap instead of being a tight seal. This phenomenon was called “rotation,”* and it caused the joint to open a little at the primary O-ring. Under cold conditions and other conditions, the hot gases (flames) from the motor would leak past the primary O-ring, a problem called blow-by. This was unexpected; it deviated from the engineers’ expectations, but because it did not cause damage to the second O-ring, NASA and Thiokol accepted it as normal for the joint. The author of “The Challenger Launch Decision” used the phrase “normalization of deviance.”

    Thiokol’s Roger Boisjoly noted that the damage to the primary O-ring was especially bad during the Shuttle launch on a particularly cold day in January 1985, and he was so concerned that he got Thiokol to start a redesign of the joint during the summer of 1985. However, Boisjoly’s report to NASA in March of 1985 said that it was OK to launch under those conditions, because there was a requirement that the Shuttle launch on cold days (down to 29 deg. F), and he did not expect that there would ever be another day quite so cold in the future. What is the harm in downplaying his concern in order to keep the Shuttle flying?

    Unfortunately, the night of 27 January 1986 was another sub-freezing night, and Boisjoly feared that the next morning’s launch would be unsafe unless the O-rings were allowed to warm up to a higher temperature than they would be at the scheduled launch time. Thiokol and NASA held a teleconference, for which NASA had brought their own Viton expert (the O-ring material) out of retirement; his opinion was that the O-rings should be safe (though less resilient) down to 25 deg. F.

    During the teleconference, Thiokol presented arguments using many of the charts that Boisjoly had used in his March 1985 report. This confused the NASA engineers: which was true, the well-thought-out March report or the hastily prepared oral presentation that evening?

    Adding to the confusion, Thiokol insisted that the Shuttle not launch until the O-rings had warmed to 53 deg. F, a new (and probably unreasonable) condition that NASA had never heard of before.

    To make matters worse, Thiokol presented data that showed the blow-by problem existed even for a launch that took place with a 73 deg. F O-ring. For years, the understanding among most of the engineers was that temperature was not driving the blow-by problem, that there were other factors that contributed more, and this data point reminded everyone of this opinion, even those at Thiokol.

    It seems to me that (with the probable exception of Boisjoly) everyone believed that even in the worst case, the secondary O-ring would perform properly. As I recall, there was never any blow-by evidence or damage to the secondary from any previous launch.

    Thiokol took a few minutes with the speakerphones on mute (even NASA’s) to consider their position. The data that they presented did not make their case, and the new launch criterion contradicted previous statements from Thiokol, including contradicting their qualification test at 40 deg. F. Thiokol realized that they had not made a good case, and I believe that many of them also did not believe the case that they were making, that they believed the other factors were more of the cause of blow-by than temperature was.

    It was during that time that the Thiokol person said that they should put on their management hats. (Everyone involved in the teleconference was an engineer, even those in management positions, and they were all familiar with the design and the blow-by situation, so everyone understood the engineering and the data involved.)

    The NASA engineers were expecting Thiokol to come up with a better argument, such as not to launch below the qualification temperature of 40 deg. F. That was a tenable argument, as it did not come from absolutely nowhere, as the 53 deg. F argument had, so the NASA people (many, most, or all) were already expecting that there would be a launch delay. However, Thiokol did not try such a negotiation in order to gain more time to study the issue; they backed down from their position and recommended launch.

    At some point that evening, one of the NASA managers said, “I will not agree to launch against the contractor’s recommendation.” He also said, “For God’s sake, don’t let me make a dumb mistake.” It is clear to me that had Thiokol stuck to their position, despite the poor engineering reasoning, the flight would have been delayed. NASA was not so eager to fly that they would violate a contractor’s recommendation, but they seemed to desire a reasonable and consistent explanation as to why they were delaying a launch.

    In my opinion, had Boisjoly expressed his actual opinion in his March report, that he thought the launch temperature was too cold for safety, then NASA would have been immediately on board the next January. There probably would not have been a need for a conference call, as NASA probably would have deemed the temperature too low for launch without any additional Thiokol input. Over the rest of 1985, Thiokol and NASA would have worked out new and reasonable launch temperature criteria, and all would be right with the world (although the launch temperature likely would have been above the original requirement of the contract, and some at NASA and in Congress might have been unhappy about that).

    I also believe that had Boisjoly admitted that evening that he gave a bogus conclusion in his March report then NASA would have listened more attentively. To me, the 53 deg. requirement still seems arbitrary (it is based upon his review of various blow-by events) but I think that Boisjoly could have convinced NASA to delay launch in order to revisit the temperature concerns. He probably would have lost some credibility by admitting to a lie, but at least he might have stopped the tragic launch the next morning.

    Several people have suggested that NASA’s preflight readiness reviews included proving that it was safe to fly, but that night — people think — NASA’s position was that Thiokol had to prove that it was not safe to fly, the opposite of the normal process. In my (not so humble) opinion, Thiokol had already proved it was safe on every other flight, and a confused NASA could not understand what had changed to make Thiokol think that it was suddenly unsafe even above the qualification temperature. Qualification testing already proved the safety of the SRB at temperatures below Thiokol’s newly recommended temperature, so what was Thiokol saying?

    Qualification testing proves that the design works during the qualification conditions. To suggest that the SRBs were unsafe at the qualification temperature rendered useless the test and perhaps even the concept of qualification testing. What an appalling thought.

    The temperatures were low on other recent launch attempts. The Shuttle was delayed several times for reasons other than temperature, but Thiokol had not said that those other cold days were unsafe. What had changed that day? (Answer: The temperature was similar to the January 1985 launch.) Thiokol did not say why those other cold days were OK, but the 28th was not. This probably helped Thiokol’s engineers to believe that it was OK to launch. Boisjoly had not made a sufficiently good case even within Thiokol about temperature. He had made a case for redesign for the general reason that several factors contributed to blow-by, but did not clearly state his concern that temperature seemed to be a primary cause at lower temperatures. Boisjoly had not raised concerns on those other sub-53 deg. days.

    Ultimately, it was Thiokol itself that was still convinced that it was safe to launch despite the low temperature. Otherwise, they would have recommended against flying. They had already heard NASA say that they wouldn’t launch against Thiokol’s recommendation. NASA was not asking Thiokol to prove that it was unsafe; they were merely confused that Thiokol inexplicably changed position using the same data. Why was it safe on the previous launch attempt, but it isn’t safe today?

    In several people’s opinion, had the data been presented differently that evening, NASA would have clearly seen that low temperature was a problem. After the accident, a Temperature vs. O-ring damage chart was generated that showed that above a certain temperature at launch there were no blow-by problems, in a certain range there sometimes were problems, and below a certain temperature there were always problems, and they got worse at lower temperatures. The temperature for Challenger was deep in the “worse” range.

    This temperature vs. O-ring damage chart would have answered the question of what was different about January 28th; it could have been the new data that showed that launch that morning was unsafe. ‘We made a chart that shows, at low temperatures, temperature becomes a driving cause of blow-by.’ That chart could have helped everyone better understand the system and the problem. It was certainly conclusive evidence after the fact. Many people, in hindsight, thought the NASA and Thiokol engineers were stupid for making a “go” launch decision when this post-tragedy chart showed otherwise.

    It isn’t just the data that is important; it is the presentation of the data. If you can’t see the problem, you don’t understand the problem.

    Thus, when you think you have an understanding of the system, but you don’t really, it is easy to make incorrect management decisions, such as relaxing the requirements for the metal in a turbopump shaft. And when we have 20/20 hindsight, it is easy to criticize those who made the decisions without sufficient knowledge.

    A major difference between the manager and the entry-level wage earner is that the manager is required to make the right decisions with insufficient input. It is his experience and his good judgment that are why he gets paid the big bucks. Sometimes his decisions are wrong, and he does not earn those big bucks, but those are two of the characteristics for making someone a manager.

    I don’t think that it was so much a case of managers not wanting to see a problem, as Robert suggests, but rather that the problem was not made clear to the managers. In both cases, Challenger and Proton, other issues clouded the relevant data.

    I presented a bunch of “if onlys,” which is a useless exercise, but that seems to be the nature of discussions about Challenger’s last flight. Even I criticize the decisions made by those involved, but then again, it is easy to do.

    * Rotation occurred because the pressure in the rocket caused the thin metal of the structure to expand, or bulge, a little – kind of like a bulging soda can. The joints between sections were thicker and did not expand as much, but the bulging of the rest of the structure caused a motion of the joints and each “rotated” a little in the direction of the bulge of the rest of the segment, and that caused the primary O-ring to be less well seated in its groove. See figure 6 at this link for a figure that should help explain the phenomenon:

  • David M. Cook

    Wow! Thanks for the extremely detailed reply. I was not aware who exactly made this quote, so I was not correct.

    However, I stand by my attitude toward NASA, best described by Betty Grissom: “They just don’t care about you.”

    I remember when the shuttle was being designed & constructed. Experts in the field said there is a black zone on take off where you could lose a crew, and the flimsy TPS meant you could lose a crew on re-entry. Both of these issues resulted in a total crew loss.

    I was angry at the government for cancelling Apollo for this way-too-complicated system. I was then angry at NASA for allowing Skylab to de-orbit (because of the shuttle delays).
    I’m still upset about all of the money wasted on NASA. Pork is pork.

    NASA should be buying off-the-shelf spacecraft, not designing them. Private contractors need to be able to point to a successful safety record, not write waivers so they can put humans on a non-man-rated ship.

  • Edward

    Betty Grissom’s attitude has some merit, but NASA probably cares as much as any organization whose employee dies on the job: the colleagues and bosses go to the funeral, and there is an insurance policy paid for by the company. I don’t know many companies that memorialize their fallen, but NASA and the CIA do.

    If she said that after her husband’s Mercury flight, well… she didn’t get a ticker tape parade because Gus was second, not first. Glenn insisted all the astronauts be in his parade, because he understood that there wouldn’t be another one until Americans landed on the moon.

    Be careful about how little NASA, Thiokol, or any organization’s employees care about people’s lives. Most in positions where health and safety are concerns are professional about that. The airline mechanic does not figure that it is just too much work to tighten that stubborn bolt, and if the engine falls off, well then it won’t be his family on the plane. They take their responsibilities seriously and make sure that the job is done right.

    Even though there is nothing that they can do to prevent it, train engineers get emotional when they hit someone or something. They can apply the brakes they have, but that train just ain’t gonna stop in time or slow enough soon enough.

    As for commercial space, we spent far too much time counting on NASA and the Congress that controlled it to make space a common destination. Around 1980, Robert Truax thought that starting a private rocket company was a good idea. It wasn’t until the 1990s, when the space community finally wearied of waiting for NASA, that they decided that Truax was right. Even then, early companies, such as Kistler, did not do well.

    Fortunately, we are changing paradigms in this country. We are starting to make some of our rocketry commercial. The only company to do well is the one that made price a serious goal.

    Let that be a lesson. Price is an important consideration in the space business. Even the low-cost cube-sats and other inexpensive small satellites are becoming popular — maybe too popular.

    I think that we have seen that safety is also an important consideration. Failed rockets and lost crews result in loss of confidence and business as well as in failed companies or programs. This is what happened to Sea Launch as well as the Shuttle and Apollo. I hope that it does not happen to Virgin Galactic.
