The Case For Unit And Integration Testing In Software Development

November 05, 2020

Long ago, Leonardo da Vinci had many ideas. Some were great; others, not so much. Among them was an early prototype of what we know today as the helicopter. Unfortunately, he never realized the idea himself, at least not successfully. Although his idea of several men on a platform spinning a screw-like machine to propel it skyward was interesting in theory, the low power-to-weight ratio of the men and the machine itself doomed it to fail. Nevertheless, there was only one way he could know for sure: by testing his theory, either by doing mathematical calculations to determine its feasibility or by building the machine in question and trying to fly it. Either way, he had to test, because testing is the only way to separate ideas from actual results.

This also happens every day to software developers, largely because of human factors. Even when a protocol or model has worked for many years, errors may creep into its implementation. A misplaced curly brace ({}) can completely alter the flow of an algorithm. A document in the wrong encoding (UTF-8, ISO-8859-1 …) can turn a special character into a wildcard character, making searches for records far more complex. A user's input may be completely different from what was expected and cause garbage in, garbage out (GIGO). Many factors can fail (ask a college student what happens when, after hours of calculations, you realize the calculator was set to radians instead of degrees). All of these issues, and others, can be partially or even totally mitigated with automated software testing.

In this blog post I will show you what automated software testing is and why it is so important. Since showing is much more powerful than telling, I will also present real-life stories that support my case for unit and integration testing.

First, let’s define automated testing. In software development, automated testing is a technique for comparing the actual outcome of a component or process against its expected outcome without human intervention. There are several types of automated tests, ranging from smoke tests to functional and regression tests. For the sake of brevity, however, I will focus this article on the two types developers use most: unit tests and integration tests.
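To make the "actual versus expected outcome" idea concrete, here is a minimal sketch in Python. The function under test, `degrees_to_radians`, is hypothetical, chosen to echo the calculator anecdote above; the `test_*` naming convention is the one test runners such as pytest discover automatically.

```python
import math

# Hypothetical unit under test: converts degrees to radians.
def degrees_to_radians(degrees: float) -> float:
    return degrees * math.pi / 180.0

# A unit test exercises this single function in isolation,
# comparing the actual outcome against the expected outcome.
def test_right_angle():
    assert math.isclose(degrees_to_radians(90), math.pi / 2)

def test_zero_degrees():
    assert degrees_to_radians(0) == 0.0

# Run the checks directly; a runner such as pytest would
# discover and run any function named test_* on its own.
test_right_angle()
test_zero_degrees()
print("unit tests passed")
```

Once such tests exist, they run on every change with no human intervention, which is the whole point of automating them.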

A perfect example of why integration tests matter is the case of the Mars Climate Orbiter.

Mars Climate Orbiter

In 1998, NASA launched an expensive ($327.6 million) space probe to orbit Mars and, among other things, collect climate information from the Red Planet. Each of its components had been tested many times. However, the mission was a failure. What happened? Although all components worked perfectly on their own, the different teams that developed them never agreed to use the same system of measurement. As a result, some components performed calculations in English (imperial) units and others in metric units. The space probe approached Mars at the wrong angle, and all communication and tracking was lost. It is not even known whether the probe crashed or is still wandering through space.

As you can see, an integration test can identify issues that only surface when two or more components work together, especially at their communication boundaries. For more component-specific issues, a unit test may be the better tool.
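The Mars Climate Orbiter failure can be sketched as an integration test over a seam between two teams' code. Everything here is illustrative and hypothetical (the function names, the 100 lbf·s impulse, the 638 kg mass); only the lbf·s-to-N·s conversion factor is a real physical constant. The point is that the test runs both components together against an independently derived expectation, so a missing unit conversion fails loudly even when each component passes its own unit tests.

```python
# Component A (hypothetical, one team's code): reports an engine
# burn's impulse in pound-force seconds. Value is illustrative.
def impulse_from_telemetry() -> float:
    return 100.0  # lbf·s

# Component B (hypothetical, another team's code): computes the
# velocity change, assuming its input is in newton-seconds.
def delta_v(impulse_n_s: float, mass_kg: float) -> float:
    return impulse_n_s / mass_kg  # m/s

LBF_S_TO_N_S = 4.44822  # 1 lbf·s is about 4.44822 N·s

# The integration test: wire A into B and compare the end-to-end
# result with an expectation computed independently by hand.
def test_delta_v_end_to_end():
    mass_kg = 638.0  # illustrative spacecraft mass
    impulse = impulse_from_telemetry() * LBF_S_TO_N_S  # the glue
    expected = (100.0 * 4.44822) / 638.0
    assert abs(delta_v(impulse, mass_kg) - expected) < 1e-9

test_delta_v_end_to_end()
print("integration test passed")
```

If the conversion at the seam were omitted, as it effectively was in the real mission, this test would fail by a factor of roughly 4.45, which is exactly the kind of cross-team mismatch unit tests alone cannot see.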

Unit Testing

A unit test, as the name implies, tests a single part (unit) of a system. It lets you detect specific failures in individual components early in the process. Yet it is common for software developers and clients to assume that only integration tests are necessary. The thinking is that if the whole works, the units must work too, but that simply is not true. Remember Murphy’s Law: “Anything that can go wrong will go wrong.” Pessimistic, yes, but it has saved many lives. The story of the infamous Therac-25 shows why.

Therac-25

The Therac-25 was a radiation therapy machine developed by Atomic Energy of Canada Limited (AECL) in the 1980s. Unlike its predecessor, the Therac-20, which used software alongside hardware (physical) safety interlocks, the Therac-25 would be the first radiation machine controlled entirely by software. Since they were deemed no longer necessary, AECL removed all physical safeguards, taking for granted that the Therac-20’s long history of safe operation, whose components and software the Therac-25 inherited, was sufficient proof that they would work perfectly. The result of this thinking was the overexposure to radiation of at least six people (three of whom died), a Class I recall from the Food and Drug Administration (FDA), and several lawsuits. The cause? Several software errors that, according to the investigations, had been present since the Therac-20.

Among the many mistakes, two were the most notable:

  1. If a physician mistakenly pressed X and tried to proceed, but then quickly corrected the entry to E (the correct code for applying the maximum safe amount of radiation), the machine could end up configured to deliver the radiation beam to the patient in a lethal dose.
  2. The safety check meant to prevent a lethal amount of radiation from being applied used an integer variable instead of a Boolean. A value of 0 meant it was safe to continue (TRUE, safe); any other value meant the opposite (FALSE, unsafe). In theory, although bad practice, this should work. However, the software was coded so that each security validation that passed assigned 0 to the variable, while each failure, instead of assigning a sentinel value such as -1, incremented the variable by 1. This was a lethal error. The programmers forgot that their validation code ran hundreds of times per minute and that data types cannot hold infinitely many values. As a result, from time to time, in both the Therac-20 and the Therac-25, an integer overflow occurred and the variable wrapped back to 0.

The reason fatalities never occurred with the Therac-20 was its hardware safety controls: when the error occurred in the software and the machine was about to kill someone, the hardware raised an internal error that reset the process and forced the physician to start over.

As you can see, a unit test can reveal component-specific errors that may be overshadowed during an integration test. In software development this happens frequently, and it is our responsibility to prevent it.

Both unit and integration tests are very important. Skipping either one, for whatever reason, always carries high risks. When those risks are counted in human lives, no reset button is worth it.
