## 4.4 Decimal Arithmetic

The 80x86 CPUs use the binary numbering system for their native internal representation. The binary numbering system is, by far, the most common numbering system in use in computer systems today. In days long since past, however, there were computer systems that were based on the decimal (base 10) numbering system rather than the binary numbering system. Consequently, their arithmetic system was decimal based rather than binary. Such computer systems were very popular in systems targeted for business/commercial systems1. Although systems designers have discovered that binary arithmetic is almost always better than decimal arithmetic for general calculations, the myth still persists that decimal arithmetic is better for money calculations than binary arithmetic. Therefore, many software systems still specify the use of decimal arithmetic in their calculations (not to mention that there is lots of legacy code out there whose algorithms are only stable if they use decimal arithmetic). Therefore, despite the fact that decimal arithmetic is generally inferior to binary arithmetic, the need for decimal arithmetic still persists.

Of course, the 80x86 is not a decimal computer; therefore we have to play tricks in order to represent decimal numbers using the native binary format. The most common technique, even employed by most so-called decimal computers, is to use the binary coded decimal, or BCD representation. The BCD representation (see "Nibbles" on page 56) uses four bits to represent the 10 possible decimal digits. The binary value of those four bits is equal to the corresponding decimal value in the range 0..9. Of course, with four bits we can actually represent 16 different values. The BCD format ignores the remaining six bit combinations.

```

Table 1:  Binary Code Decimal (BCD) Representation

BCD Representation
Decimal Equivalent

0000
0

0001
1

0010
2

0011
3

0100
4

0101
5

0110
6

0111
7

1000
8

1001
9

1010
Illegal

1011
Illegal

1100
Illegal

1101
Illegal

1110
Illegal

1111
Illegal

```

Since each BCD digit requires four bits, we can represent a two-digit BCD value with a single byte. This means that we can represent the decimal values in the range 0..99 using a single byte (versus 0..255 if we treat the value as an unsigned binary number). Clearly it takes a bit more memory to represent the same value in BCD as it does to represent the same value in binary. For example, with a 32-bit value you can represent BCD values in the range 0..99,999,999 (eight significant digits) but you can represent values in the range 0..4,294,967,295 (better than nine significant digits) using the binary representation.

Not only does the BCD format waste memory on a binary computer (since it uses more bits to represent a given integer value), but decimal arithmetic is slower. For these reasons, you should avoid the use of decimal arithmetic unless it is absolutely mandated for a given application.

Binary coded decimal representation does offer one big advantage over binary representation: it is fairly trivial to convert between the string representation of a decimal number and the BCD representation. This feature is particularly beneficial when working with fractional values since fixed and floating point binary representations cannot exactly represent many commonly used values between zero and one (e.g., 1/10). Therefore, BCD operations can be efficient when reading from a BCD device, doing a simple arithmetic operation (e.g., a single addition) and then writing the BCD value to some other device.

### 4.4.1 Literal BCD Constants

HLA does not provide, nor do you need, a special literal BCD constant. Since BCD is just a special form of hexadecimal notation that does not allow the values \$A..\$F, you can easily create BCD constants using HLA's hexadecimal notation. Of course, you must take care not to include the symbols 'A'..'F' in a BCD constant since they are illegal BCD values. As an example, consider the following MOV instruction that copies the BCD value '99' into the AL register:

```	mov( \$99, al );

```

The important thing to keep in mind is that you must not use HLA literal decimal constants for BCD values. That is, "mov( 95, al );" does not load the BCD representation for ninety-five into the AL register. Instead, it loads \$5F into AL and that's an illegal BCD value. Any computations you attempt with illegal BCD values will produce garbage results. Always remember that, even though it seems counter-intuitive, you use hexadecimal literal constants to represent literal BCD values.

### 4.4.2 The 80x86 DAA and DAS Instructions

The integer unit on the 80x86 does not directly support BCD arithmetic. Instead, the 80x86 requires that you perform the computation using binary arithmetic and use some auxiliary instructions to convert the binary result to BCD. To support packed BCD addition and subtraction with two digits per byte, the 80x86 provides two instructions: decimal adjust after addition (DAA) and decimal adjust after subtraction (DAS). You would execute these two instructions immediately after an ADD/ADC or SUB/SBB instruction to correct the binary result in the AL register.

Two add a pair of two-digit (i.e., single-byte) BCD values together, you would use the following sequence:

```	mov( bcd_1, al );    // Assume that bcd1 and bcd2 both contain

add( bcd_2, al );    // value BCD values.

daa();

```

The first two instructions above add the two byte values together using standard binary arithmetic. This may not produce a correct BCD result. For example, if bcd_1 contains \$9 and bcd_2 contains \$1, then the first two instructions above will produce the binary sum \$A instead of the correct BCD result \$10. The DAA instruction corrects this invalid result. It checks to see if there was a carry out of the low order BCD digit and adjusts the value (by adding six to it) if there was an overflow. After adjusting for overflow out of the L.O. digit, the DAA instruction repeats this process for the H.O. digit. DAA sets the carry flag if the was a (decimal) carry out of the H.O. digit of the operation.

The DAA instruction only operates on the AL register. It will not adjust (properly) for a decimal addition if you attempt to add a value to AX, EAX, or any other register. Specifically note that DAA limits you to adding two decimal digits (a single byte) at a time. This means that for the purposes of computing decimal sums, you have to treat the 80x86 as though it were an eight-bit processor, capable of adding only eight bits at a time. If you wish to add more than two digits together, you must treat this as a multiprecision operation. For example, to add four decimal digits together (using DAA), you must execute a sequence like the following:

```	// Assume "bcd_1:byte[2];", "bcd_2:byte[2];", and "bcd_3:byte[2];"

mov( bcd_1[0], al );

daa();

mov( al, bcd_3[0] );

mov( bcd_1[1], al );

daa();

mov( al, bcd_3[1], al );

// Carry is set at this point if there was unsigned overflow.

```

Since a binary addition of a word requires only three instructions, you can see why decimal arithmetic is so expensive2.

The DAS (decimal adjust after subtraction) adjusts the decimal result after a binary SUB or SBB instruction. You use it the same way you use the DAA instruction. Examples:

```	// Two-digit (one byte) decimal subtraction:

mov( bcd_1, al );    // Assume that bcd1 and bcd2 both contain

sub( bcd_2, al );    // value BCD values.

das();

// Four-digit (two-byte) decimal subtraction.

// Assume "bcd_1:byte[2];", "bcd_2:byte[2];", and "bcd_3:byte[2];"

mov( bcd_1[0], al );

sub( bcd_2[0], al );

das();

mov( al, bcd_3[0] );

mov( bcd_1[1], al );

sbb( bcd_2[1], al );

das();

mov( al, bcd_3[1], al );

// Carry is set at this point if there was unsigned overflow.

```

Unfortunately, the 80x86 only provides support for addition and subtraction of packed BCD values using the DAA and DAS instructions. It does not support multiplication, division, or any other arithmetic operations. Because decimal arithmetic using these instructions is so limited, you'll rarely see any programs use these instructions.

### 4.4.3 The 80x86 AAA, AAS, AAM, and AAD Instructions

In addition to the packed decimal instructions (DAA and DAS), the 80x86 CPUs support four unpacked decimal adjustment instructions. Unpacked decimal numbers store only one digit per eight-bit byte. As you can imagine, this data representation scheme wastes a considerable amount of memory. However, the unpacked decimal adjustment instructions support the multiplication and division operations, so they are marginally more useful.

The instruction mnemonics AAA, AAS, AAM, and AAD stand for "ASCII adjust for Addition, Subtraction, Multiplication, and Division" (respectively). Despite their name, these instructions do not process ASCII characters. Instead, they support an unpacked decimal value in AL whose L.O. four bits contain the decimal digit and the H.O. four bits contain zero. Note, though, that you can easily convert an ASCII decimal digit character to an unpacked decimal number by simply ANDing AL with the value \$0F.

The AAA instruction adjusts the result of a binary addition of two unpacked decimal numbers. If the addition of those two values exceeds 10, then AAA will subtract 10 from AL and increment AH by one (as well as set the carry flag). AAA assumes that the two values you add together were legal unpacked decimal values. Other than the fact that AAA works with only one decimal digit at a time (rather than two), you use it the same way you use the DAA instruction. Of course, if you need to add together a string of decimal digits, using unpacked decimal arithmetic will require twice as many operations and, therefore, twice the execution time.

You use the AAS instruction the same way you use the DAS instruction except, of course, it operates on unpacked decimal values rather than packed decimal values. As for AAA, AAS will require twice the number of operations to add the same number of decimal digits as the DAS instruction. If you're wondering why anyone would want to use the AAA or AAS instructions, keep in mind that the unpacked format supports multiplication and division, while the packed format does not. Since packing and unpacking the data is usually more expensive than working on the data a digit at a time, the AAA and AAS instruction are more efficient if you have to work with unpacked data (because of the need for multiplication and division).

The AAM instruction modifies the result in the AX register to produce a correct unpacked decimal result after multiplying two unpacked decimal digits using the MUL instruction. Because the largest product you may obtain is 81 (9*9 produces the largest possible product of two single digit values), the result will fit in the AL register. AAM unpacks the binary result by dividing it by 10, leaving the quotient (H.O. digit) in AH and the remainder (L.O. digit) in AL. Note that AAM leaves the quotient and remainder in different registers than a standard eight-bit DIV operation.

Technically, you do not have to use the AAM instruction immediately after a multiply. AAM simply divides AL by ten and leaves the quotient and remainder in AH and AL (respectively). If you have need of this particular operation, you may use the AAM instruction for this purpose (indeed, that's about the only use for AAM in most programs these days).

If you need to multiply more than two unpacked decimal digits together using MUL and AAM, you will need to devise a multiprecision multiplication that uses the manual algorithm from earlier in this chapter. Since that is a lot of work, this section will not present that algorithm. If you need a multiprecision decimal multiplication, see the next section; it presents a better solution.

The AAD instruction, as you might expect, adjusts a value for unpacked decimal division. The unusual thing about this instruction is that you must execute it before a DIV operation. It assumes that AL contains the least significant digit of a two-digit value and AH contains the most significant digit of a two-digit unpacked decimal value. It converts these two numbers to binary so that a standard DIV instruction will produce the correct unpacked decimal result. Like AAM, this instruction is nearly useless for its intended purpose as extended precision operations (e.g., division of more than one or two digits) are extremely inefficient. However, this instruction is actually quite useful in its own right. It computes AX = AH*10+AL (assuming that AH and AL contain single digit decimal values). You can use this instruction to easily convert a two-character string containing the ASCII representation of a value in the range 0..99 to a binary value. E.g.,

```	mov( '9', al );

mov( '9', ah );    // "99" is in AH:AL.

and( \$0F0F, ax );  // Convert from ASCII to unpacked decimal.

aad();             // After this, AX contains 99.

```

The decimal and ASCII adjust instructions provide an extremely poor implementation of decimal arithmetic. To better support decimal arithmetic on 80x86 systems, Intel incorporated decimal operations into the FPU. The next section discusses how to use the FPU for this purpose. However, even with FPU support, decimal arithmetic is inefficient and less precise than binary arithmetic. Therefore, you should carefully consider whether you really need to use decimal arithmetic before incorporating it into your programs.

### 4.4.4 Packed Decimal Arithmetic Using the FPU

To improve the performance of applications that rely on decimal arithmetic, Intel incorporated support for decimal arithmetic directly into the FPU. Unlike the packed and unpacked decimal formats of the previous sections, the FPU easily supports values with up to 18 decimal digits of precision, all at FPU speeds. Furthermore, all the arithmetic capabilities of the FPU (e.g., transcendental operations) are available in addition to addition, subtraction, multiplication, and division. Assuming you can live with only 18 digits of precision and a few other restrictions, decimal arithmetic on the FPU is the right way to go if you must use decimal arithmetic in your programs.

The first fact you must note when using the FPU is that it doesn't really support decimal arithmetic. Instead, the FPU provides two instruction, FBLD and FBSTP, that convert between packed decimal and binary floating point formats when moving data to and from the FPU. The FBLD (float/BCD load) instruction loads an 80-bit packed BCD value unto the top of the FPU stack after converting that BCD value to the IEEE binary floating point format. Likewise, the FBSTP (float/BCD store and pop) instruction pops the floating point value off the top of stack, converts it to a packed BCD value, and stores the BCD value into the destination memory location.

Once you load a packed BCD value into the FPU, it is no longer BCD. It's just a floating point value. This presents the first restriction on the use of the FPU as a decimal integer processor: calculations are done using binary arithmetic. If you have an algorithm that absolutely positively depends upon the use of decimal arithmetic, it may fail if you use the FPU to implement it3.

The second limitation is that the FPU supports only one BCD data type: a ten-byte 18-digit packed decimal value. It will not support smaller values nor will it support larger values. Since 18 digits is usually sufficient and memory is cheap, this isn't a big restriction.

A third consideration is that the conversion between packed BCD and the floating point format is not a cheap operation. The FBLD and FBSTP instructions can be quite slow (more than two orders of magnitude slower than FLD and FSTP, for example). Therefore, these instructions can be costly if you're doing simple additions or subtractions; the cost of conversion far outweighs the time spent adding the values a byte at a time using the DAA and DAS instructions (multiplication and division, however, are going to be faster on the FPU).

You may be wondering why the FPU's packed decimal format only supports 18 digits. After all, with ten bytes it should be possible to represent 20 BCD digits. As it turns out, the FPU's packed decimal format uses the first nine bytes to hold the packed BCD value in a standard packed decimal format (the first byte contains the two L.O. digits and the ninth byte holds the H.O. two digits). The H.O. bit of the tenth byte holds the sign bit and the FPU ignores the remaining bits in the tenth byte. If you're wondering why Intel didn't squeeze in one more digit (i.e., use the L.O. four bits of the tenth byte to allow for 19 digits of precision), just keep in mind that doing so would create some possible BCD values that the FPU could not exactly represent in the native floating point format. Hence the limitation to 18 digits.

The FPU uses a one's complement notation for negative BCD values. That is, the sign bit contains a one if the number is negative or zero and it contains a zero if the number is positive or zero (like the binary one's complement format, there are two distinct representations for zero).

HLA's tbyte type is the standard data type you would use to define packed BCD variables. The FBLD and FBSTP instructions require a tbyte operand. Unfortunately, the current version of HLA does not let you (directly) provide an initializer for a tbyte variable. One solution is to use the @NOSTORAGE option and initialize the data following the variable declaration. For example, consider the following code fragment:

```static

tbyteObject: tbyte; @nostorage

byte  \$21, \$43, \$65, 0, 0, 0, 0, 0, 0, 0;

```

This tbyteObject declaration tells HLA that this is a tbyte object but does not explicitly set aside any space for the variable (see "The Static Sections" on page 167). The following BYTE directive sets aside ten bytes of storage and initializes these ten bytes with the value \$654321 (remember that the 80x86 organizes data from the L.O. byte to the H.O. byte in memory). While this scheme is inelegant, it will get the job done. The chapters on Macros and the Compile-Time Language will discuss a better way to initialize tbyte and qword data.

Because the FPU converts packed decimal values to the internal floating point format, you can mix packed decimal, floating point, and (binary) integer formats in the same calculation. The following program demonstrate how you might achieve this:

```

program MixedArithmetic;

#include( "stdlib.hhf" )

static

tb: tbyte; @nostorage;

byte \$21,\$43,\$65,0,0,0,0,0,0,0;

begin MixedArithmetic;

fbld( tb );

fmul( 2.0 );

fbstp( tb );

stdout.put( "bcd value is " );

stdout.puttb( tb );

stdout.newln();

end MixedArithmetic;

Program 4.7	 Mixed Mode FPU Arithmetic

```

The FPU treats packed decimal values as integer values. Therefore, if your calculations produce fractional results, the FBSTP instruction will round the result according to the current FPU rounding mode. If you need to work with fractional values, you need to stick with floating point results.

1In fact, until the release of the IBM 360 in the middle 1960's, most scientific computer systems were binary based while most commercial/business systems were decimal based. IBM pushed their system\360 as a single purpose solution for both business and scientific applications. Indeed, the model designation (360) was derived from the 360 degrees on a compass so as to suggest that the system\360 was suitable for computations "at all points of the compass" (i.e., business and scientific).

2You'll also soon see that it's rare to find decimal arithmetic done this way. So it hardly matters.

3An example of such an algorithm might by a multiplication by ten by shifting the number one digit to the left. However, such operations are not possible within the FPU itself, so algorithms that misbehave inside the FPU are actually quite rare.