by Gerard J. Holzmann

The amount of software that was used for the first moon landing in 1969 was the equivalent of perhaps 7500 lines of C code. Of course at that time the language C didn't exist - the code was written in assembly and had to fit within the 36864 words of memory that the computer in the lunar lander supported. A lot has changed since then. Today, a desktop PC can have up to a million times more memory. What is fascinating, though, is that today there are hardly any applications that require fewer lines of code than that first lunar lander. Clearly, few of these applications solve problems that are more difficult than landing a spacecraft on the moon.

All this becomes even more interesting if we consider NASA's program to return astronauts to the moon [1]. The design of the hardware for the new spacecraft is already well under way, and so is the development of the software. My best guess for the final size of the code that will support the new landings (based on trend-lines for growth of software sizes for both manned and unmanned missions over the last few decades) would be in the range of 5 to 10 million lines of code. Yet it is clear that the problem of landing on the moon has not become a thousand times harder in the last forty years.

A concept image of the Ares I rocket, now in development, on the launch pad at NASA's Kennedy Space Center. The software that controls a spacecraft is a good example of a safety-critical application. Image: NASA/MSFC.

The software that controls a spacecraft is a good example of a safety-critical application: there is very little room for error. Does it really matter how many lines of code are written? An industry rule of thumb is that one should expect to see roughly one residual defect per one thousand lines of code, after all reviews and tests have been completed. With exceptional effort the defect rates can sometimes be pushed back further, to say one residual defect per ten thousand lines of code, but we do not know how to reliably reduce it to zero in large applications like the ones we are considering here.

If we start with 7500 lines of code, it is easy to see that we can reduce the chances of software failure reasonably effectively. Still, even the first lunar missions saw some unanticipated software issues in flight [2]. If we move to 7.5 million lines of code, even under the best of circumstances we should expect to see more residual defects during a mission. Most of these defects are likely benign and can be worked around, but there is always the chance of the one killer bug that can end a mission completely. We want to do everything we can to catch those serious defects before they catch us.
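
To make the arithmetic concrete, here is a minimal sketch in C that applies the two rule-of-thumb rates quoted above to both code sizes. The function name expected_residual_defects and its parameters are illustrative assumptions, not part of any NASA tool, and the rates are rough industry estimates rather than measurements.

```c
#include <assert.h>

/* Hypothetical helper: expected number of residual defects in a code base,
 * assuming a rule-of-thumb rate of one defect per `lines_per_defect` lines. */
double expected_residual_defects(long lines_of_code, long lines_per_defect)
{
    assert(lines_of_code >= 0);     /* a code base cannot have negative size */
    assert(lines_per_defect > 0);   /* guards against division by zero */
    return (double)lines_of_code / (double)lines_per_defect;
}
```

Under these assumptions, 7500 lines at one defect per thousand lines leaves about 7.5 expected residual defects, and fewer than one at the exceptional rate of one per ten thousand lines. For 7.5 million lines the same two rates leave roughly 7500 and 750.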

The standard approach in dealing with safety-critical systems is expressed by the acronym PDCR: prevent, detect, contain and recover. The best strategy is to prevent defects from entering the software design cycle entirely. This can be achieved by strengthening the way in which software requirements are captured, checked and tracked. Formalized requirements can also be used both for test generation and for formal design verification with tools such as Spin [3]. Another form of prevention is to look at the defects that have plagued earlier space missions. Alas, this is a richer set than we would like. These software 'lessons learned' can be captured in coding standards, ideally with machine-checkable rules [4]. After all, who is going to read through 7.5 million lines of code to find all deviations from the standard?

Our ability to detect defects as early as possible depends increasingly on tool-based verification strategies. Logic model-checking techniques [3], for instance, can be invaluable for identifying subtle design errors in multi-threaded software systems. More basic still is the use of state-of-the-art static source code analysis tools [5]. The best tools can intercept a range of common coding defects with low false positive rates.

Since it would be unwise to plan only for perfect software, the next strategy is to structure the system in such a way that the failure of one part does not jeopardize the correct functioning of unrelated parts. This defect containment strategy requires not only a well-vetted software architecture, but also coding discipline, such as modularity, and a generous use of runtime assertions and software safety margins.
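
As a rough illustration of what runtime assertions and safety margins can look like in code, consider the following sketch in C. Everything in it is hypothetical: the cmd_queue type, the capacity of 8, the margin of 2 and the function names are invented for this example and do not come from any flight software. The point is only that a module can check its own invariants and refuse an operation, rather than corrupt memory that other modules depend on.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8u  /* hypothetical size of the command buffer */
#define SAFETY_MARGIN  2u  /* hypothetical number of slots kept in reserve */

/* Compile-time check: the margin must leave at least one usable slot. */
static_assert(SAFETY_MARGIN < QUEUE_CAPACITY, "margin must leave usable slots");

typedef struct {
    int    items[QUEUE_CAPACITY];
    size_t count;
} cmd_queue;

/* Runtime assertion written as a check the caller can react to:
 * the queue must exist and must never have eaten into its margin. */
static bool queue_is_sane(const cmd_queue *q)
{
    return q != NULL && q->count <= QUEUE_CAPACITY - SAFETY_MARGIN;
}

/* Adds a command; returns false instead of writing when an invariant
 * is violated or when only the reserved slots are left. */
bool queue_push(cmd_queue *q, int cmd)
{
    if (!queue_is_sane(q)) {
        return false;                       /* contain: do not touch memory */
    }
    if (q->count == QUEUE_CAPACITY - SAFETY_MARGIN) {
        return false;                       /* full up to the safety margin */
    }
    q->items[q->count] = cmd;
    q->count++;
    return true;
}
```

With these assumed numbers the queue accepts six commands and rejects the seventh, so a caller that misbehaves is told so well before the underlying array can overflow.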

Finally, when a software defect reveals itself and is successfully contained, a recovery strategy can help us find a path back to a functional system. This can be done, for instance, by replacing a failing complex module with a simpler one that was more thoroughly verifiable before flight.
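
The idea of falling back to a simpler module can be sketched in a few lines of C. This is only an assumption about how such a switch might be structured: the names primary_guidance, fallback_guidance and guidance_step, the 15000 m altitude limit and the thrust values are all made up for illustration and bear no relation to real guidance code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool   ok;      /* false when the module reports a failure */
    double thrust;  /* commanded thrust as a fraction, 0.0 to 1.0 */
} guidance_result;

typedef guidance_result (*guidance_fn)(double altitude_m);

/* Stand-in for a complex module: it reports failure for inputs
 * outside the (hypothetical) range it was designed for. */
guidance_result primary_guidance(double altitude_m)
{
    guidance_result r = { false, 0.0 };
    if (altitude_m < 0.0 || altitude_m > 15000.0) {
        return r;
    }
    r.ok = true;
    r.thrust = 1.0 - altitude_m / 15000.0;
    return r;
}

/* Stand-in for a simpler module that is easier to verify before flight:
 * it always succeeds and commands a fixed, conservative thrust. */
guidance_result fallback_guidance(double altitude_m)
{
    (void)altitude_m;
    guidance_result r = { true, 0.5 };
    return r;
}

typedef struct {
    guidance_fn active;    /* module currently in use */
    bool        degraded;  /* set once the fallback has taken over */
} guidance_selector;

/* Runs the active module; on failure, switches permanently to the fallback. */
double guidance_step(guidance_selector *sel, double altitude_m)
{
    assert(sel != NULL && sel->active != NULL);
    guidance_result r = sel->active(altitude_m);
    if (!r.ok) {
        sel->active = fallback_guidance;
        sel->degraded = true;
        r = fallback_guidance(altitude_m);
    }
    return r.thrust;
}
```

In this sketch the switch is one-way: once the primary module has failed, the selector keeps using the fallback and records that the system is running in a degraded mode. Whether and when to switch back is a design decision that the sketch deliberately leaves out.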

Building reliable software systems is not really rocket science. It often boils down to that precious commodity that we all possess but sometimes forget to use: common sense.

Links:
[1] http://www.nasa.gov/missions/solarsystem/cev.html
[2] http://www.hq.nasa.gov/alsj/a11/a11.landing.html
[3] http://spinroot.com/
[4] http://spinroot.com/p10/
[5] http://spinroot.com/static/

Please contact:
Gerard J. Holzmann
NASA/JPL Laboratory for Reliable Software, Caltech, USA
