My $500M Mars Rover Mistake: A Failure Story

Some mistakes feel worse than death. 

A February evening in 2003 started out routine at NASA’s Jet Propulsion Laboratory (JPL) in Pasadena, CA. I gowned up in cleanroom garb and passed into the High Bay 1 airlock in Building 179 where nearly all of NASA’s historic interplanetary spacecraft have been built since the Moon-bound Ranger series in the 1960s. After years of work by thousands of engineers, technicians, and scientists, there were only two weeks remaining before the Spirit Mars Rover would be transported to Cape Canaveral in Florida for launch ahead of its sibling, Opportunity. 

I was into my unofficial second shift having already logged 12 hours that Wednesday. Long workdays are a nominal scenario for the assembly and test phase. Every system of a spacecraft is thoroughly tested and confirmed to be in perfect working order before it is buttoned up for the last time on Earth. Spirit and Opportunity, part of a now-historic twin mission, were among the most complex spacecraft ever built at that time and represented nearly a billion dollars invested by NASA. No pressure. 

The rovers, between them, had 62 brushed-type motors to drive and steer the wheels, control the robotic arm, aim the cameras, point the antenna to Earth, and perform the various robotic-origami unfoldings and deployments following landing. The rover had undergone extensive testing to simulate the harsh conditions it would face on Mars as a field geologist. Especially critical are events involving pyrotechnics, as the explosive shockwaves can damage brittle carbon components inside these motors. That night, while my colleagues focused on testing the rover itself, I was tasked with verifying the integrity of the motors in the Rock Abrasion Tool (RAT) attached to the end of Spirit’s robotic arm.

Spirit (left), Opportunity (right) and Marie Curie (flight spare to Sojourner rover), Monday Feb 10, 2003. PIA04422 Courtesy NASA/JPL-Caltech

Disassembling and inspecting motor components after each round of environmental testing is not practical. However, we can check their internal condition by examining their electrical performance. To do this, using a device called a break-out-box, we disconnect the motor from the spacecraft and hook it up to an external power supply and strip chart recorder. A functioning motor will show a smooth, exponential decrease in electrical current during spin-up, while any problems show up as blips in the signal.
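For readers who like to see the idea in code, here is a minimal sketch of that check. It assumes an idealized brushed DC motor (no inductance, no friction), for which the spin-up current decays exponentially from the stall value. The function names, the supply voltage, and every motor constant below are illustrative assumptions, not the rover's actual parameters or the procedure we used.

```python
import math

def spin_up_current(t, voltage=28.0, resistance=10.0, k=0.05, inertia=1e-5):
    """Ideal spin-up current (A) at time t (s) for a simplified brushed motor.

    All default values are made-up illustration numbers.
    """
    tau = inertia * resistance / k**2  # mechanical time constant, seconds
    return (voltage / resistance) * math.exp(-t / tau)

def find_blips(times, currents, tolerance=0.05, **motor):
    """Return the sample times where the measured current departs from the
    ideal curve by more than `tolerance` times the stall current."""
    stall = spin_up_current(0.0, **motor)
    return [
        t for t, i in zip(times, currents)
        if abs(i - spin_up_current(t, **motor)) > tolerance * stall
    ]

# Two synthetic traces sampled every 5 ms.
times = [n * 0.005 for n in range(40)]
healthy = [spin_up_current(t) for t in times]   # smooth exponential decay
flat = [spin_up_current(0.0)] * len(times)      # current that never decays

print(find_blips(times, healthy))    # no samples flagged
print(len(find_blips(times, flat)))  # nearly every sample flagged
```

A real strip chart is read by eye rather than compared against a formula, but the principle is the same: a healthy motor traces the smooth curve, and anything else stands out.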

It was a test I had performed numerous times. My various roles on the project had given me the experience to decipher the maze of diagrams mapping the 10,000 pin-to-pin connections that made everything on the spacecraft work, and my responsibility in writing the instructions on how to connect and control all the motors on the rovers made me the obvious choice for this test campaign.

Inside the cleanroom, John, the electrical chief in charge there, helped me find the equipment I needed. Then Mary, our cabling expert, did the careful work of unplugging connectors and inserting test equipment on the interface I asked for. We ran our pre-test confirmation routine. The connection interface was operational, the power supply settings and strip chart setup were correct, and a quick test pulse to a reference motor validated the configuration. With everything in order, the reference motor was removed and we jumpered-in Spirit’s RAT-Revolve motor, responsible for rotating the grinder and brush on a Mars rock. The testing steps were confirmed one last time, and we had a green light for pulsing the waiting motor with energy.

To get the clearest signal and reveal the smallest of imperfections from the motor, the standard procedure is to give it as much power as it wants. This makes it vitally important to send the inrush of electrons to the right place. A wrong connection could do blue-smoke-releasing catastrophic damage. Our pre-test routine was an important precaution to verify that this potentially-dangerous configuration was correct.

The pulse was sent to the motor. As always, the result was immediate, but this time, alarmingly unfamiliar. The strip chart did not look like anything we had seen before. It did not even look like a broken motor. It was decidedly — something else. My mind raced for explanations and in what seemed like an instant, arrived at the most likely explanation. My eyes followed the wires from our breakout box on the test cart to the spacecraft, and the reason for the unfamiliar signal landed like a dagger through my heart. All that power we just released did not go into the RAT-Revolve motor. Due to a mistake I had made with the break-out-box, it went the other direction on the connector interface, sending a surge of electricity straight into the spacecraft, instead of the motor. 

Ooooohhhh ssshhhhiiiiitttt.

The strip-chart plot from the test that night. It is not supposed to be flat, and instead should tail exponentially downward.

The possible consequences rolled over me in nauseating waves. I may have just created a $500M piece of scrap. With only two weeks until the spacecraft was delivered for launch operations, THERE WAS NO TIME to recover from a big problem. I was instantly aware that there may be only one rover launched to Mars on this synodic cycle. And my hands were holding the still-warm rover murder weapon.

I had learned from countless experiences in this and other projects that bad news doesn’t get better with age, so I immediately keyed the mic on my headset and told Leo, the test conductor running the other testing in parallel, what had just happened. His response twisted the knife in my chest. ‘Yeah, we seem to have lost all spacecraft telemetry just a bit ago.’ NOT a good sign.

Everyone in my vicinity was listening in on the voice loop on their headsets, and off-mic, John unleashed a string of profanities about me that could serve as an advanced tutorial for even the most seasoned sailors. The team immediately ran the spacecraft’s emergency shutdown procedure and we were instructed to leave the cleanroom for what would probably be a damage assessment briefing.

I had turned 28 less than a month prior, looked and felt much younger, and was a few years into my first big job after college. This first significant step in my dream career as an interplanetary spacecraft engineer, which I had aspired to since junior high, was perhaps also going to be my last. Others in the system test area moved away from me as dark reality descended. Matt, the Assembly Test and Launch Operations manager, firmly instructed me to write down everything I could remember about what had just transpired. I’m not sure when the tears started, but they were probably flowing as I recorded those details alone in a conference room.

With my notes in hand, Leo and my colleagues meticulously examined the evening’s events. There were two obvious things that had happened. One, a large pulse of electricity had gone somewhere other than intended, and two, telemetry had stopped coming from the spacecraft. Ominously, but perhaps with a ray of hope, there was not an obvious link between these two things. As the team reasoned through the problem, it seemed the surge of electricity likely ended up in the H-Bridge motor driver circuit, essentially a smart traffic controller for electricity. What I did was NOT GOOD, but luckily because of something called back-EMF[1], this was one part of the rover actually designed to handle extra energy.

We decided that the errant pulse had somehow glitched the system enough to interrupt the data flow without permanently disabling it. With the spacecraft already powered off, we would do what you do with your own consumer electronics: we would turn it back on to see if the power cycle had cleared the problem.

It was close to midnight and notifications about the incident had made it up the chain of management to Pete, the Project Manager. Replanning across the entire project of a thousand people was at stake. The team, now with a lot of extra attention and oversight, re-grouped and ran the standard spacecraft power-on procedure. When booting up the spacecraft, it takes a bit for the electronics to come online, then for the software to boot up and start producing telemetry. There is a circuit that produces a pulse every clock cycle (8 times a second), turning a red light on the ground support instrument rack into a robot heartbeat indicator. The spacecraft power supply went through its familiar progression of voltage steps and currents, but after too much time, the heartbeat remained dark, and the telemetry never came. 

I don’t really remember what happened next. Probably something about meetings in the morning to figure out what the hell do we do now?! What I do remember is the feeling of emotional devastation that followed me home where I recounted the story to my wife. I was convinced I would lose my job in the morning and space exploration history would attach my name to a particular chapter of infamy.

Back at JPL in the morning, in a meeting with a fresh shift and holdouts from the prior night of disaster, we once again worked through the detailed sequence of reconstructed events looking for clues or a possible path to recovery, which felt more and more fleeting until one crucial piece of the puzzle was recognized.

The Fluke 87III digital multimeter is a ubiquitous tool in the labs of JPL. When I entered the cleanroom the previous night, I needed one and asked John, the sailor linguist, where I could get one for my test. All were in use, so he pointed near the spacecraft to one apparently monitoring bus voltage but not involved in any testing. I carefully removed the leads and proceeded on to my date with destiny in testing the RAT motors. The monitoring multimeter I disconnected was actually completing the circuit that powered the spacecraft’s ground test telemetry. I inadvertently disabled the connection the instant I removed the leads.

We immediately realized that the next thing to do was to restore the multimeter to this duty and power up the spacecraft.

We did just that. It worked. There was a collective gasp as the telemetry flickered back to life — Spirit was not dead after all!

The team resumed testing, having lost only a few hours, and I exhaled the most monumental sigh of relief in my lifetime, reassured that I might not have actually doomed the mission to a single-rover endeavor.

The rest of that morning was a blur. Weeks of analyses followed on the RAT-Revolve motor H-bridge channel leading to detailed discussions of possible thin-film demetallization. Ultimately the project gained the confidence to disposition the hardware: Use As Is.

The long days continued. I moved to Cape Canaveral to begin the final preparations before launching the rovers to Mars, and more thrilling stress-filled moments punctuated the days and weeks. Then Spirit was on Mars, and after a year of latent stress, it turned out the RAT-Revolve motor worked just fine, and the whole experience became a life lesson.

The Lesson

As I’ve recounted this tale over time, it has not only enriched my understanding but also inspired others to explore and share their own brushes with failure. The act of sharing transforms these experiences into valuable lessons, both for the storytellers and their audience. Later in my career, at my asteroid mining startup Planetary Resources, we recognized the power of these narratives in our hiring process and team culture. We deliberately asked job candidates to share a failure story of theirs, inviting them to acknowledge and learn from their past challenges, while recognizing that failure is a natural process of learning. The core lesson I’ve drawn from my rover ordeal is best expressed in these words:

“Let your scars serve you; they are an invaluable learning experience and investment in your capability and resilience.”

In the depths of the crisis, when the tears were flowing and everyone else in the system test center was moving away, one person walked toward me. Ernie, a wise and kind man who had come out of retirement to help with the round-the-clock spacecraft shift work, approached me, put his arm around my shoulder, and in a gentle grandfatherly voice quietly reassured me. He then uttered the clear words that I will never forget: ‘Remember this feeling the next time you have to sign-off that something is OK.’

I went on to become Flight Director for Spirit and Opportunity as they explored the surface of Mars, earning NASA’s Exceptional Achievement Medal for my efforts, so obviously I didn’t get fired for this incident. But that wasn’t clear until a few days later in one of the more pivotal meetings of my life. In the tense period following the mishap, with definitive analysis still pending, passionate and polarized debates ensued about the tests’ hazards, and many argued for stopping them altogether. The debate concluded, and the criticality of these tests — making sure our motors would function flawlessly on Mars — was still paramount. The tests needed to continue. And I still remember the shock when Project Manager Pete delivered the decision and the follow-on news: ‘These tests will continue. And Chris will continue to lead them as we have paid for his education. He’s the last person on Earth who would make this mistake again.’ 

I found myself returning to the ‘scene of the crime’ for many more tests, after I had carefully revised the procedures to eliminate the chance of repeating the same mistake. Each time I conducted this test again, Pete’s vote of confidence combined with Ernie’s words of wisdom brought with it a moment of nausea, a stark reminder of the past incident, but also the readiness and confidence to continue. The trust management showed in me, despite the initial error, marked a key moment in my career, highlighting growth and the ability to overcome challenges.

Now, whenever I’m called upon to give my approval or endorsement for something significant, I’m instantly transported back to that moment — the room, the lighting, the chair I was in, the table, the pit in my stomach, the intense mix of fear, anxiety and regret for an oversight that nearly led to catastrophe. Ernie’s wisdom that day, combined with his compassionate approach during my moment of vulnerability, left an indelible mark on me. Now, when faced with critical decisions, I not only recall that experience but also strive to assist others in navigating their own challenging moments. And like Pete did for me, my aim is to aid in transforming these experiences into catalysts for growth and resilience, reinforcing the notion that our responses to adversity can define our path forward.

These stories of near misses, learning curves, and eventual triumphs are not just mine, but are shared by many who build things. In space exploration, failure is not an option — it comes pre-installed. Every misstep is a stepping stone towards greater success, and together, our collective wisdom can pave the way for future innovations, achievements and breakthroughs in developing and growing our presence in and benefit from space.

I’d love to learn from fellow space entrepreneurs, engineers, scientists, technicians, and others who would share their own ‘Failure Stories.’ If you’ve been able to overcome a failure and benefit from it, share your story on my threads on LinkedIn, the service formerly known as Twitter/X, or BlueSky.

Spirit Mars rover under construction. A yellow Fluke digital multimeter (bottom left), and its essential place in line with spacecraft telemetry (January 2003)

“It’s in the valley of failure that we sow the seeds of success.”

— James Altucher

“No experience is in itself a cause of our success or failure. We do not suffer from the shock of our experiences—the so-called trauma—but instead we make out of them whatever suits our purposes. We are not determined by our experiences, but the meaning we give them is self-determining.”

— Ichiro Kishimi, The Courage to Be Disliked

“The very best news is bad news delivered early enough to fix it.”

— Lindy Elkins-Tanton, Principal Investigator of the Psyche mission

[1] Back-EMF (ElectroMotive Force): the energy a motor creates when it starts acting like a mini power generator, especially during times when it slows down or the timing isn’t quite right

Copyright for syndicated content belongs to the linked Source : Hacker News – https://www.chrislewicki.com/articles/failurestory