Why Program Testing is Important

 

The effect of errors in a program written for a programming assignment is usually not serious. Perhaps the student loses a few points on that assignment, or she may be lucky and the grader doesn't even notice the error. For real-world problems, however, instead of a course grade, much more may be at stake: money, jobs, and even lives. Here are a few examples selected from a plethora of software horror stories:

 

•       In September 1999, the Mars Climate Orbiter was lost on arrival at Mars instead of reaching a safe orbit. A report by a NASA investigation board stated that the main reason for the loss of the spacecraft was a failure to convert measurements of rocket thrusts from English units to metric units in a section of ground-based navigation-related mission software.

 

•       In June 1996, an unmanned Ariane 5 rocket, developed by the European Space Agency at a cost of 7 billion dollars, exploded 37 seconds after lift-off on its maiden flight. A report by a board of inquiry identified the cause of the failure as a complete loss of guidance and attitude information due to specification and design errors in the inertial reference system software. More specifically, a run-time error occurred when a 64-bit floating-point number was converted to a 16-bit integer.

 

•       In March 1991, DSC Communications shipped a software upgrade to its Bell customers for a product used in high-capacity telephone call routing and switching systems. During the summer, major telephone outages occurred in these systems in California, the District of Columbia, Maryland, Virginia, West Virginia, and Pennsylvania. These were caused by an error introduced into the signaling software when three lines out of several million lines of code were changed, and the company felt it was unnecessary to retest the program.

 

•       On February 25, 1991, during the Gulf War, a Patriot missile defense system at Dhahran, Saudi Arabia, failed to track and intercept an incoming Scud missile. This missile hit an American Army barracks, killing 28 soldiers and injuring 98 others. An error in the guidance software produced an inaccurate calculation of the time since system start-up, due to accumulated roundoff errors that result from inexact binary representations of real numbers. This time calculation was a key factor in determining the exact location of the incoming missile. The sad epilogue is that corrected software arrived in Dhahran on February 26, the next day.
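
The accumulated roundoff in the Patriot example can be illustrated with a small Python sketch. It assumes a clock that counts tenths of a second and multiplies the count by a binary approximation of 1/10 chopped to a fixed number of fraction bits. The function names and the choice of 23 fraction bits are illustrative assumptions, not details taken from the account above or from the actual system.

```python
from fractions import Fraction


def truncated_tenth(fraction_bits: int) -> Fraction:
    """Return 1/10 chopped (not rounded) to the given number of binary fraction bits."""
    scale = 2 ** fraction_bits
    # scale // 10 is floor(scale / 10), i.e. the bits that survive chopping.
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours: int, fraction_bits: int) -> float:
    """Error in elapsed time after `hours` of counting tenths of a second.

    Hypothetical model: each tick should add exactly 1/10 s but instead adds
    the chopped binary value, so the error grows linearly with up-time.
    """
    ticks = hours * 3600 * 10  # number of 0.1 s ticks since start-up
    error_per_tick = Fraction(1, 10) - truncated_tenth(fraction_bits)
    return float(ticks * error_per_tick)


if __name__ == "__main__":
    # 23 fraction bits is an assumed register width chosen for illustration.
    for hours in (1, 10, 100):
        print(hours, "hours:", clock_drift_seconds(hours, 23), "seconds of drift")
```

Under these assumed parameters the drift after 100 hours comes to roughly a third of a second, and it grows in proportion to the time since start-up. For a target moving at well over a kilometer per second, a timing error of that size translates into a position error of hundreds of meters.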

 

These are but a few examples of program errors that are more than just a nuisance and can lead to very serious and even tragic results. In such cases, careful software design, coding, and extensive and thorough testing are mandatory. In safety-critical situations where errors cannot be tolerated, relying on the results of test runs may not be sufficient because testing can show only the presence of errors, not their absence. It may be necessary to give a deductive proof that the program is correct and that it will always produce the correct results (assuming no system malfunction).
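
The limits of testing can be seen in a small Python sketch built around a narrowing conversion like the one in the Ariane 5 example. Two hypothetical functions convert a floating-point value to a 16-bit signed integer: one silently wraps out-of-range values, the other rejects them. Neither is a reconstruction of the flight software; the names and test values are assumptions chosen for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def narrow_unchecked(value: float) -> int:
    """Keep only the low 16 bits, as an unchecked narrowing might (defective)."""
    low_bits = int(value) & 0xFFFF  # int() truncates toward zero
    return low_bits - 0x10000 if low_bits >= 0x8000 else low_bits


def narrow_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return n


def nominal_tests_pass(convert) -> bool:
    """A hypothetical test suite that only exercises in-range inputs."""
    cases = {0.0: 0, 1.9: 1, -1.9: -1, 32767.0: 32767, -32768.0: -32768}
    return all(convert(x) == expected for x, expected in cases.items())


if __name__ == "__main__":
    print(nominal_tests_pass(narrow_unchecked))  # both suites pass ...
    print(nominal_tests_pass(narrow_checked))
    print(narrow_unchecked(40000.0))             # ... yet this result is wrong
```

Both functions pass the in-range test suite, so those tests cannot reveal the defect in the unchecked version. Only an out-of-range input such as 40000.0 exposes it, which is why passing tests alone do not establish that a program is correct.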