The way computers do arithmetic can be surprisingly different from how we were all taught in school. As a result, programs that do even simple-seeming arithmetic can result in software defects that range from merely embarrassing to extremely expensive. In this article I’ll explain some of the ways that computer arithmetic differs from grade school arithmetic, so that you can avoid the resulting problems. The examples I give will be in C#, but these problems are common to Java, C, C++ and many other common programming languages.
Integer Addition
What do you suppose the output of this little C# program is?
using System; class Program { static void Main() { int x = 2; int y = 2; int z = x + y; Console.WriteLine(z); } }
If you guessed 4, you’re right. Arithmetic isn’t that difficult in C#! But what about this one? (I’m going to elide the “ceremony” of declaring a class, a main method, and so on, for the remainder of this article.)
int x = 2000000000; int y = 2000000000; int z = x + y; Console.WriteLine(z); // prints -294967296
Two plus two is four, so two billion plus two billion is… -294,967,296 ? What on earth is going on here?
In many languages an integer (“int” in C#) can only represent values within a certain range:
Console.WriteLine(int.MinValue); // prints -2147483648 Console.WriteLine(int.MaxValue); // prints 2147483647
Those are the largest and smallest possible integers in this system. (These unusual-seeming numbers are in fact carefully chosen; the total number of possible values is 2^{32} and therefore an integer can be represented in 32 bits of memory.) What if we try to go outside of those bounds?
int x = 2147483645; int y = 5; int z = x + y; Console.WriteLine(z); // prints -2147483646
Integer arithmetic “wraps around”! If your result is larger than the largest possible integer then you start again from the smallest possible integer. As we were adding five we hit the largest possible integer, wrapped around to the smallest possible integer, and kept on counting up from there.
If you work it out, you’ll see that the difference between four billion and the largest possible integer is exactly the same as the difference between the smallest possible integer and our result of -294,967,296.
Why do we have this crazy system? Because first, the vast majority of integer calculations carried out in real programs never have results that are anywhere even close to the maximum and minimum values, and second, because this kind of arithmetic can be made extraordinarily fast. These oddities are the price you pay for achieving billions of computations per second on commodity hardware.
In C# the “checked” statement can be used to turn wrap-arounds into exceptions, but doing so makes arithmetic slower.
Integer Division and Multiplication
Here’s an error I see all the time. Can you predict what this prints? Can you spot the defect?
int original = 20000; int discountPercentage = 70; int result = (discountPercentage / 100) * original; Console.WriteLine(result);
Seventy percent of 20,000 is 14,000, but this prints 0! Why?
Integer division results in an integer. So what is 70 / 100? Clearly it cannot be 0.7, because that’s not an integer. When integer division is not exact in C# it always rounds towards zero, not to the nearest integer, so the first calculation of 70 / 100 results in 0, which is then multiplied by the original value. You might think that the code should read:
int result = (discountPercentage * original) / 100;
But that then becomes problematic when the numbers get larger:
int original = 200000000; int discountPercentage = 70; int result = (discountPercentage * original) / 100; Console.WriteLine(result); // prints 11150981
Clearly 11,150,981 is not 70% of 200,000,000. Again, we are wrapping around! First 70 multiplied by 200,000,000 is computed, but 14,000,000,000 is massively larger than the largest possible integer. This “wraps around” several times before landing on 1,115,098,112, which is then divided by 100. Be very careful when mixing division and multiplication.
Late Integer Promotion
Sometimes you do really do need to do integer arithmetic where a result is larger than 2,147,483,647 or smaller than -2,147,483,648. In C#, as in many other languages, there is a “long integer” type which has a truly enormous range:
Console.WriteLine(long.MinValue); // prints -9223372036854775808 Console.WriteLine(long.MaxValue); // prints 9223372036854775807
Long integers take up twice the memory of regular integers and on some hardware platforms calculations are slower. Like regular integers they “wrap around” when the bounds are exceeded in a calculation. But their range is so absurdly huge that they should be able to handle anything we throw at them, right? Let’s try our calculations again in longs:
int x = 2000000000; int y = 2000000000; long z = x + y; Console.WriteLine(z); int original = 200000000; int discountPercentage = 70; long result = (discountPercentage * original) / 100; Console.WriteLine(result);
Does this solve the problem? No! We get the same “wrong” answers as before. Why?
The arithmetic is still being done in regular integers; the conversion from regular integer to long integer is happening after the calculations are done. What we need to do here is perform all the calculations in longs:
long x = 2000000000; long y = 2000000000; long z = x + y; Console.WriteLine(z); // prints 4000000000 long original = 200000000; long discountPercentage = 70; long result = (discountPercentage * original) / 100; Console.WriteLine(result); // prints 140000000
Finally the results are correct.
What if you need to do integer arithmetic on numbers that do not fit into a long? Many languages now support arbitrary-precision integer arithmetic, where an integer has as much memory as it needs to represent an arbitrarily large number. In C# you can use the BigInteger type. The price you pay is, of course, more memory and more time, but you get to do mathematics that much more closely resembles grade school arithmetic.
Binary Floating Point Representation Error
So far we’ve only seen arithmetic on integers. We can also use computers to do arithmetic on fractions as well. In school we learned that there are two standard ways to do arithmetic on fractions. The first is to represent a fraction as a pair of integers, called the numerator and denominator: 2 / 3, for example is a fraction with numerator 2 and denominator 3. The second is decimal notation: 1.234 is in fact just a compact way to write the fraction 1234 / 1000. In any decimal notation with finitely many digits the fraction has a denominator which is a power of ten.
Many programming languages have a low-precision type for representing fractions called “float” and a high-precision type called “double” – so-called because it has twice the precision of a float. The exact differences between floats and doubles are not relevant for the purposes of this article, so for the rest of it we’ll just consider doubles. What should the output of this program be? (Notice that to emphasize that an integer is in fact a fraction we’ll append an unnecessary “.0” to the literal representation of the number in the program.)
double c = 10.0; double e = 92.0; double f = e / c; Console.WriteLine(f); Console.WriteLine(f == 9.2);
If you guessed “9.2” and “True”, you’re right. Apparently fractions work perfectly normally. Not so fast!
double g = c * c * f; Console.WriteLine(g); Console.WriteLine(g == 920.0);
This produces “920” and… “False” ? What is going on here?
To demonstrate we need to use a trick:
Console.WriteLine(g.ToString("r")); // 919.99999999999989
“g” is almost exactly but not quite equal to 920.0. Somehow an error has crept in to our calculations. Where?
Doubles do not represent numbers the same way we all learned to in school. We saw earlier that a decimal is a fraction where the denominator is a power of ten. A double is a fraction where the denominator is a power of two. When we divided 92.0 by 10.0 the exact value would be 92/10, but in its most simplified form, 46 / 5, it doesn’t have a power of two in the denominator. The runtime therefore attempts to find the best fraction that does have a power of two in the denominator which fits into the size reserved for a double. The best fraction is
9.199999999999999289457264239899814128875732421875
If you were to write that out as a fraction in its simplest form you would discover that all the factors of five disappear, leaving only factors of two in the denominator. This is the best fraction because any other fraction would either be farther away from the true value of 9.2 or require more memory than would fit into a double.
This then is the value that is actually stored in local variable “f”. It is a value extremely close to, but slightly smaller than 9.2. There is some representation error accumulated. Think about the actual number stored as being “9.2 + error”. Now we multiply that value by 100. We know from regular school arithmetic that “9.2 + error” multiplied by 100 is “920 + 100 x error”; we have made the error one hundred times bigger!
With our first case, the error in “f” is small enough that the runtime considers 9.2 and the value stored in “f” to be the same number. But when that error has grown by a factor of 100, it is now large enough that 920 and the value stored in “g” are not the same number; in fact the exact value stored in “g” is:
919.9999999999998863131622783839702606201171875
As we saw, when printing out a double the runtime will typically round off numbers that are extremely close to integers. If you use the special version of ToString with the “r” argument, this instructs the runtime to print out a more accurate, less rounded-off version of the number.
In C# if you are performing a financial calculation that requires that decimal numbers be represented with no error then you should use the aptly-named “decimal” type. It represents fractions with powers of ten in the denominator with no error. Though it is larger and slower than the double type, it gives you accurate results for financial calculations. The guidance I give people is to always use “decimal” for financial calculations and “double” for physical calculations.
Binary Floating Point Rounding Error
What do you think the output of this program fragment is?
double d = 10000000000000.0; double e = 1.0 / 1024.0; double f = d; double g = 0.0; for (int i = 0; i < 100000; ++i) { f = f + e; g = g + e; } g = g + d; Console.WriteLine(f.ToString("r")); Console.WriteLine(g.ToString("r"));
All the fractions here have powers of two in the denominator, so there is no representation error. It looks like both “f” and “g” should be the results of computing 10,000,000,000,000 + 100,000 / 1,024. But the results are different; “f” is actually unchanged! It is still 10,000,000,000,000, and “g” is 10,000,000,000,097.656. What explains this discrepancy?
A double can only hold so much precision; any operation that produces a result that requires more precision may be rounded off. In this case the initial value of “f” is so large and the value being added each time is so small that the result is rounded off to the original value every time. By contrast with “g” we start at zero and successfully compute the sum, which is then large enough to be added to “d” without being rounded back to the original value of “d”.
The takeaway here is that in floating point arithmetic, unlike integer arithmetic or school arithmetic, the order in which you do addition matters. Always add small quantities together before adding them to large quantities to ensure the most accurate possible result.
Summing Up (Pun Intended)
These are just a few of the ways in which computer math differs from pencil-and-paper grade school math; there are many others I could mention. By understanding those differences you can avoid causing software defects. Keep an eye out for:
- Integer arithmetic that can overflow
- Unexpected rounding in integer division
- Conversion to long integer types after a calculation has already overflowed
- Representation and rounding errors in floating point arithmetic
- Financial calculations performed with doubles that should be performed with decimals
- Integer arithmetic that should be performed with a “big integer” library instead of range-bounded integers
-------------
Eric Lippert is a software architect at Coverity, where he designs and implements static analysis software to detect defects in C# code. He is a former member of the C# language design team at Microsoft. Eric writes about programming and language design at www.ericlippert.com, and is a regular contributor to the Ask The Bug Guys series on the Coverity Development Testing blog.