11.2.3 The FPU Instruction Set

The FPU adds over 80 new instructions to the 80x86 instruction set. We can classify these instructions as data movement instructions, conversions, arithmetic instructions, comparisons, constant instructions, transcendental instructions, and miscellaneous instructions. The following sections describe each of the instructions in these categories.

11.2.4 FPU Data Movement Instructions

The data movement instructions transfer data between the internal FPU registers and memory. The instructions in this category are FLD, FST, FSTP, and FXCH. The FLD instruction always pushes its operand onto the floating point stack. The FSTP instruction always pops the top of stack after storing the top of stack (tos). The remaining instructions do not affect the number of items on the stack.

11.2.4.1 The FLD Instruction

The FLD instruction loads a 32 bit, 64 bit, or 80 bit floating point value onto the stack. This instruction converts 32 and 64 bit operands to an 80 bit extended precision value before pushing the value onto the floating point stack.

The FLD instruction first decrements the top of stack (TOS) pointer (bits 11-13 of the status register) and then stores the 80 bit value in the physical register specified by the new TOS pointer. If the source operand of the FLD instruction is a floating point data register, STi, then the actual register the FPU uses for the load operation is the register number before decrementing the tos pointer. Therefore, "fld( st0 );" duplicates the value on the top of the stack.

The FLD instruction sets the stack fault bit if stack overflow occurs. It sets the denormalized exception bit if you load a 32 bit or 64 bit denormalized value (loading an 80 bit value does not raise this exception). It sets the invalid operation bit if you attempt to load an empty floating point register onto the top of stack (or perform some other invalid operation).

Examples:
		fld( st1 );
 
		fld( real32_variable );
 
		fld( real64_variable );
 
		fld( real80_variable );
 
		fld( real_constant );
 

 
Note that there is no way to directly load a 32-bit integer register onto the floating point stack, even if that register contains a REAL32 value. To accomplish this, you must first store the integer register into a memory location; then you can push that memory location onto the FPU stack using the FLD instruction. E.g.,
		mov( eax, tempReal32 );								// Save REAL32 value in EAX to memory.
 
		fld( tempReal32 );								// Push that real value onto the FPU stack.
 

 
Note: loading a constant via FLD is actually an HLA extension. The FPU doesn't support this instruction type. HLA creates a REAL80 object in the "constants" segment and uses the address of this memory object as the true operand for FLD.

11.2.4.2 The FST and FSTP Instructions

The FST and FSTP instructions copy the value on the top of the floating point register stack to another floating point register or to a 32, 64, or 80 bit memory variable. When copying data to a 32 or 64 bit memory variable, the 80 bit extended precision value on the top of stack is rounded to the smaller format as specified by the rounding control bits in the FPU control register.

The FSTP instruction pops the value off the top of stack when moving it to the destination location. It does this by incrementing the top of stack pointer in the status register after accessing the data in ST0. If the destination operand is a floating point register, the FPU stores the value at the specified register number before popping the data off the top of the stack.

Executing an "fstp( st0 );" instruction effectively pops the data off the top of stack with no data transfer. Examples:
		fst( real32_variable );
 
		fst( real64_variable );
 
		fst( realArray[ ebx*8 ] );
 
		fstp( real80_variable );		// 80 bit stores require FSTP (see the note below).
 
		fst( st2 );
 
		fstp( st1 );
 
The last example above effectively pops ST1 while leaving ST0 on the top of the stack.

The FST and FSTP instructions will set the stack exception bit if a stack underflow occurs (attempting to store a value from an empty register stack). They will set the precision bit if there is a loss of precision during the store operation (this will occur, for example, when storing an 80 bit extended precision value into a 32 or 64 bit memory variable and some bits are lost during conversion). They will set the underflow exception bit when storing an 80 bit value into a 32 or 64 bit memory variable if the value is too small to fit into the destination operand. Likewise, these instructions will set the overflow exception bit if the value on the top of stack is too big to fit into a 32 or 64 bit memory variable. The FST and FSTP instructions set the denormalized flag when you try to store a denormalized value into an 80 bit register or variable. They set the invalid operation flag if an invalid operation (such as storing into an empty register) occurs. Finally, these instructions set the C1 condition bit if rounding occurs during the store operation (this only occurs when storing into a 32 or 64 bit memory variable and you have to round the mantissa to fit into the destination).

Note: Because of an idiosyncrasy in the FPU instruction set related to the encoding of the instructions, you cannot use the FST instruction to store data into a real80 memory variable. You may, however, store 80-bit data using the FSTP instruction.
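
One possible way around this restriction, sketched here under the assumption that at least one FPU register is free and that real80_variable is a real80 object declared elsewhere, is to duplicate the top of stack and let FSTP pop the copy:
		fld( st0 );						// Duplicate the value on tos.
		fstp( real80_variable );		// Store the copy and pop it; the original remains in ST0.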

11.2.4.3 The FXCH Instruction

The FXCH instruction exchanges the value on the top of stack with one of the other FPU registers. This instruction takes two forms: one with a single FPU register as an operand, the second without any operands. The first form exchanges the top of stack (tos) with the specified register. The second form of FXCH swaps the top of stack with ST1.

Many FPU instructions, e.g., FSQRT, operate only on the top of the register stack. If you want to perform such an operation on a value that is not on the top of stack, you can use the FXCH instruction to swap that register with tos, perform the desired operation, and then use the FXCH to swap the tos with the original register. The following example takes the square root of ST2:
		fxch( st2 );
 
		fsqrt();
 
		fxch( st2 );
 
The FXCH instruction sets the stack exception bit if the stack is empty. It sets the invalid operation bit if you specify an empty register as the operand. This instruction always clears the C1 condition code bit.

11.2.5 Conversions

The FPU performs all arithmetic operations on 80 bit real quantities. In a sense, the FLD and FST/FSTP instructions are conversion instructions as well as data movement instructions because they automatically convert between the internal 80 bit real format and the 32 and 64 bit memory formats. Nonetheless, we'll simply classify them as data movement operations, rather than conversions, because they are moving real values to and from memory. The FPU provides five other instructions that convert to or from integer or binary coded decimal (BCD) format when moving data. These instructions are FILD, FIST, FISTP, FBLD, and FBSTP.

11.2.5.1 The FILD Instruction

The FILD (integer load) instruction converts a 16, 32, or 64 bit two's complement integer to the 80 bit extended precision format and pushes the result onto the stack. This instruction always expects a single operand. This operand must be the address of a word, double word, or quad word integer variable. You cannot specify one of the 80x86's 16 or 32 bit general purpose registers. If you want to push an 80x86 general purpose register onto the FPU stack, you must first store it into a memory variable and then use FILD to push the value of that memory variable.

The FILD instruction sets the stack exception bit and C1 (accordingly) if stack overflow occurs while pushing the converted value. Examples:
		fild( word_variable );
 
		fild( dword_val[ ecx*4 ] );
 
		fild( qword_variable );
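
The register workaround described above might look like the following sketch; tempInt32 is a hypothetical int32 variable declared elsewhere in the program:
		mov( eax, tempInt32 );		// Save the integer in EAX to memory.
		fild( tempInt32 );			// Convert it to 80 bit real format and push it onto the FPU stack.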
 
11.2.5.2 The FIST and FISTP Instructions

The FIST and FISTP instructions convert the 80 bit extended precision value on the top of stack to an integer and store the result into the memory variable specified by the single operand. FIST accepts a 16 or 32 bit destination; FISTP accepts a 16, 32, or 64 bit destination. These instructions convert the value on tos to an integer according to the rounding setting in the FPU control register (bits 10 and 11). As for the FILD instruction, the FIST and FISTP instructions will not let you specify one of the 80x86's general purpose 16 or 32 bit registers as the destination operand.

The FIST instruction converts the value on the top of stack to an integer and then stores the result; it does not otherwise affect the floating point register stack. The FISTP instruction pops the value off the floating point register stack after storing the converted value.

These instructions set the stack exception bit if the floating point register stack is empty (this will also clear C1). They set the precision (imprecise operation) and C1 bits if rounding occurs (that is, if there is any fractional component to the value in ST0). These instructions set the underflow exception bit if the result is too small (i.e., less than one but greater than zero or less than zero but greater than -1). Examples:
		fist( word_var[ ebx*2 ] );
 
		fistp( qword_var );		// 64 bit stores require FISTP.
 
		fistp( dword_var );
 

 
Don't forget that these instructions use the rounding control settings to determine how they will convert the floating point data to an integer during the store operation. By default, the rounding control is usually set to "round to nearest" mode; yet most programmers expect FIST/FISTP to truncate the fractional portion during conversion. If you want FIST/FISTP to truncate floating point values when converting them to an integer, you will need to set the rounding control bits appropriately in the floating point control register, e.g.,
static
 
	fcw16:				word;
 
	fcw16_2:				word;
 
	IntResult:				int32;
 
		.
 
		.
 
		.
 
		fstcw( fcw16 );
 
		mov( fcw16, ax );
 
		or( $0c00, ax );       // Rounding control=%11 (truncate).
 
		mov( ax, fcw16_2 );    // Store into memory and reload the ctrl word.
 
		fldcw( fcw16_2 );
 

 
		fistp( IntResult );							// Truncate ST0 and store as int32 object.
 

 
		fldcw( fcw16 );							// Restore original rounding control
 

 
11.2.5.3 The FBLD and FBSTP Instructions

The FBLD and FBSTP instructions load and store 80 bit BCD values. The FBLD instruction converts a BCD value to its 80 bit extended precision equivalent and pushes the result onto the stack. The FBSTP instruction pops the extended precision real value on TOS, converts it to an 80 bit BCD value (rounding according to the bits in the floating point control register), and stores the converted result at the address specified by the destination memory operand. Note that there is no FBST instruction which stores the value on tos without popping it.

The FBLD instruction sets the stack exception bit and C1 if stack overflow occurs. It sets the invalid operation bit if you attempt to load an invalid BCD value. The FBSTP instruction sets the stack exception bit and clears C1 if stack underflow occurs (the stack is empty). It sets the underflow flag under the same conditions as FIST and FISTP. Examples:
// Assuming fewer than eight items on the stack, the following
 
// code sequence is equivalent to an fbst instruction:
 
		fld( st0 );
 
		fbstp( tbyte_var );
 
// The following example easily converts an 80 bit BCD value to
 
// a 64 bit integer:
 
		fbld( tbyte_var );
 
		fistp( qword_var );		// FIST has no 64 bit form; FISTP also pops the value.
 

 
11.2.6 Arithmetic Instructions

The arithmetic instructions make up a small, but important, subset of the FPU's instruction set. These instructions fall into two general categories - those which operate on real values and those which operate on a real and an integer value.

11.2.6.1 The FADD and FADDP Instructions

These two instructions take the following forms:
		fadd();
 
		faddp();
 
		fadd( st0, sti );
 
		fadd( sti, st0 );
 
		faddp( st0, sti );
 
		fadd( mem_32_64 );
 
		fadd( real_constant );
 
The first two forms are equivalent. They pop the two values on the top of stack, add them, and push their sum back onto the stack.

The next two forms of the FADD instruction, those with two FPU register operands, behave like the 80x86's ADD instruction. They add the value in the source register operand to the value in the destination register operand. Note that one of the register operands must be ST0.

The FADDP instruction with two operands adds ST0 (which must always be the source operand) to the destination operand and then pops ST0. The destination operand must be one of the other FPU registers.

The last form above, FADD with a memory operand, adds a 32 or 64 bit floating point variable to the value in ST0. This instruction will convert the 32 or 64 bit operands to an 80 bit extended precision value before performing the addition. Note that this instruction does not allow an 80 bit memory operand.

These instructions can raise the stack, precision, underflow, overflow, denormalized, and illegal operation exceptions, as appropriate. If a stack fault exception occurs, C1 denotes stack overflow or underflow.

Like FLD( real_constant), the FADD( real_constant ) instruction is an HLA extension. Note that it creates a 64-bit variable holding the constant value and emits the FADD( mem64 ) instruction, specifying the read-only object it creates in the constants segment.
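
As a short sketch of the memory and register forms, the following fragments both compute z := x + y. The names x, y, and z are hypothetical real32 or real64 variables declared elsewhere:
		fld( x );					// ST0 = x
		fadd( y );					// ST0 = x + y (y is converted to 80 bits first).
		fstp( z );					// Store the sum and pop it.

		fld( x );
		fld( y );					// ST0 = y, ST1 = x
		faddp( st0, st1 );			// ST1 = x + y, then pop; the sum is now in ST0.
		fstp( z );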

11.2.6.2 The FSUB, FSUBP, FSUBR, and FSUBRP Instructions

These four instructions take the following forms:
		fsub();
 
		fsubp();
 
		fsubr();
 
		fsubrp();
 
		fsub( st0, sti );
 
		fsub( sti, st0 );
 
		fsubp( st0, sti );
 
		fsub( mem_32_64 );
 
		fsub( real_constant );
 
		fsubr( st0, sti );
 
		fsubr( sti, st0 );
 
		fsubrp( st0, sti );
 
		fsubr( mem_32_64 );
 
		fsubr( real_constant );
 
With no operands, the FSUB and FSUBP instructions operate identically. They pop ST0 and ST1 from the register stack, compute ST1-ST0, and then push the difference back onto the stack. The FSUBR and FSUBRP instructions (reverse subtraction) operate in an almost identical fashion except they compute ST0-ST1 and push that difference.

With two register operands ( source, destination ) the FSUB instruction computes destination := destination - source. One of the two registers must be ST0. With two registers as operands, the FSUBP also computes destination := destination - source and then it pops ST0 off the stack after computing the difference. For the FSUBP instruction, the source operand must be ST0.

With two register operands, the FSUBR and FSUBRP instructions work in a similar fashion to FSUB and FSUBP, except they compute destination := source - destination.

The FSUB(mem) and FSUBR(mem) instructions accept a 32 or 64 bit memory operand. They convert the memory operand to an 80 bit extended precision value and subtract this from ST0 (FSUB) or subtract ST0 from this value (FSUBR) and store the result back into ST0.

These instructions can raise the stack, precision, underflow, overflow, denormalized, and illegal operation exceptions, as appropriate. If a stack fault exception occurs, C1 denotes stack overflow or underflow.

Note: the instructions that have real constants as operands aren't true FPU instructions. These are extensions provided by HLA. HLA generates a constant segment memory object initialized with the constant's value.
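
The difference between the normal and reverse memory forms can be sketched as follows. The names x, y, diff, and rdiff are hypothetical real64 variables declared elsewhere:
		fld( x );					// ST0 = x
		fsub( y );					// ST0 = x - y
		fstp( diff );

		fld( x );					// ST0 = x
		fsubr( y );					// ST0 = y - x
		fstp( rdiff );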

11.2.6.3 The FMUL and FMULP Instructions

The FMUL and FMULP instructions multiply two floating point values. These instructions allow the following forms:
		fmul()
 
		fmulp()
 
		fmul( sti, st0 );
 
		fmul( st0, sti );
 
		fmul( mem_32_64 );
 
		fmul( real_constant );
 
		fmulp( st0, sti );
 
With no operands, FMUL and FMULP both do the same thing - they pop ST0 and ST1, multiply these values, and push their product back onto the stack. The FMUL instructions with two register operands compute destination := destination * source. One of the registers (source or destination) must be ST0.

The FMULP( ST0, STi ) instruction computes STi := STi * ST0 and then pops ST0. This instruction uses the value for i before popping ST0. The FMUL(mem) instruction requires a 32 or 64 bit memory operand. It converts the specified memory variable to an 80 bit extended precision value and then multiplies ST0 by this value.

These instructions can raise the stack, precision, underflow, overflow, denormalized, and illegal operation exceptions, as appropriate. If rounding occurs during the computation, these instructions set the C1 condition code bit. If a stack fault exception occurs, C1 denotes stack overflow or underflow.

Note: the instruction that has a real constant as its operand isn't a true FPU instruction. It is an extension provided by HLA (see the note at the end of the previous section for details).
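
As an illustration of the no-operand form, the following sketch computes area := pi * r**2. The names r and area are hypothetical real64 variables declared elsewhere, and FLDPI is one of the constant instructions described later in this chapter:
		fld( r );					// ST0 = r
		fld( st0 );					// Duplicate r on tos.
		fmul();						// ST0 = r**2
		fldpi();					// ST0 = pi, ST1 = r**2
		fmul();						// ST0 = pi * r**2
		fstp( area );				// Store the result and pop it.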

11.2.6.4 The FDIV, FDIVP, FDIVR, and FDIVRP Instructions

These four instructions allow the following forms:
		fdiv()
 
		fdivp()
 
		fdivr()
 
		fdivrp()
 
		fdiv( sti, st0 );
 
		fdiv( st0, sti );
 
		fdivp( st0, sti );
 
		fdivr( sti, st0 );
 
		fdivr( st0, sti );
 
		fdivrp( st0, sti );
 
		fdiv( mem_32_64 );
 
		fdivr( mem_32_64 );
 
		fdiv( real_constant );
 
		fdivr( real_constant );
 
With no operands, the FDIV and FDIVP instructions pop ST0 and ST1, compute ST1/ST0, and push the result back onto the stack. The FDIVR and FDIVRP instructions also pop ST0 and ST1 but compute ST0/ST1 before pushing the quotient onto the stack.

With two register operands, these instructions compute the following quotients:
		fdiv( sti, st0 );						// ST0 := ST0/STi
 
		fdiv( st0, sti );						// STi := STi/ST0
 
		fdivp( st0, sti );						// STi := STi/ST0  then pop ST0
 
		fdivr( sti, st0 );						// ST0 := STi/ST0
 
		fdivr( st0, sti );						// STi := ST0/STi
 
		fdivrp( st0, sti );						// STi := ST0/STi then pop ST0
 
The FDIVP and FDIVRP instructions also pop ST0 after performing the division operation. The value for i in these two instructions is computed before popping ST0.

These instructions can raise the stack, precision, underflow, overflow, denormalized, zero divide, and illegal operation exceptions, as appropriate. If rounding occurs during the computation, these instructions set the C1 condition code bit. If a stack fault exception occurs, C1 denotes stack overflow or underflow.

Note: the instructions that have real constants as operands aren't true FPU instructions. These are extensions provided by HLA.
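
To make the operand order of the no-operand forms concrete, here is a sketch that computes the reciprocal 1.0/x in two ways. The names x and recip are hypothetical real64 variables declared elsewhere:
		fld1();						// ST0 = 1.0
		fld( x );					// ST0 = x, ST1 = 1.0
		fdiv();						// Pop both, push ST1/ST0 = 1.0/x.
		fstp( recip );

		fld( x );					// ST0 = x
		fld1();						// ST0 = 1.0, ST1 = x
		fdivr();					// Pop both, push ST0/ST1 = 1.0/x.
		fstp( recip );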

11.2.6.5 The FSQRT Instruction

The FSQRT instruction does not allow any operands. It computes the square root of the value on top of stack (TOS) and replaces ST0 with this result. The value on TOS must be zero or positive; otherwise FSQRT will generate an invalid operation exception.

This instruction can raise the stack, precision, denormalized, and invalid operation exceptions, as appropriate. If rounding occurs during the computation, FSQRT sets the C1 condition code bit. If a stack fault exception occurs, C1 denotes stack overflow or underflow.

Example:
// Compute Z := sqrt(x**2 + y**2);
 
		fld( x );				// Load X.
 
		fld( st0 );				// Duplicate X on TOS.
 
		fmul();				// Compute X**2.
 
		fld( y );				// Load Y
 
		fld( st0 );				// Duplicate Y.
 
		fmul();				// Compute Y**2.
 
		fadd();				// Compute X**2 + Y**2.
 
		fsqrt();				// Compute sqrt( X**2 + Y**2 ).
 
		fstp( z );				// Store result away into Z.
 
11.2.6.6 The FPREM and FPREM1 Instructions

The FPREM and FPREM1 instructions compute a partial remainder. Intel designed the FPREM instruction before the IEEE finalized their floating point standard. In the final draft of the IEEE floating point standard, the definition of FPREM was a little different than Intel's original design. Unfortunately, Intel needed to maintain compatibility with the existing software that used the FPREM instruction, so they designed a new version to handle the IEEE partial remainder operation, FPREM1. You should always use FPREM1 in new software you write; therefore, we will only discuss FPREM1 here, although you use FPREM in an identical fashion.

FPREM1 computes the partial remainder of ST0/ST1. If the difference between the exponents of ST0 and ST1 is less than 64, FPREM1 can compute the exact remainder in one operation. Otherwise you will have to execute FPREM1 two or more times to get the correct remainder value. The C2 condition code bit determines when the computation is complete. Note that FPREM1 does not pop the two operands off the stack; it leaves the partial remainder in ST0 and the original divisor in ST1 in case you need to compute another partial remainder to complete the result.

The FPREM1 instruction sets the stack exception flag if there aren't two values on the top of stack. It sets the underflow and denormal exception bits if the result is too small. It sets the invalid operation bit if the values on tos are inappropriate for this operation. It sets the C2 condition code bit if the partial remainder operation is not complete. Finally, it loads C1, C3, and C0 with bits zero, one, and two of the quotient, respectively.

Example:
// Compute Z := X mod Y
 
		fld( y );
 
		fld( x );
 
		repeat
 

 
			fprem1();
 
			fstsw( ax );     // Get condition code bits into AX.
 
			and( 4, ah );    // See if C2 is set (bit 10 of the status word is bit 2 of AH).
 

 
		until( @z );        // Repeat until C2 is clear.
 
		fstp( z );          // Store away the remainder.
 
		fstp( st0 );        // Pop old Y value.
 

 
11.2.6.7 The FRNDINT Instruction

The FRNDINT instruction rounds the value on the top of stack (TOS) to the nearest integer using the rounding algorithm specified in the control register.

This instruction sets the stack exception flag if there is no value on the TOS (it will also clear C1 in this case). It sets the precision and denormal exception bits if there was a loss of precision. It sets the invalid operation flag if the value on the tos is not a valid number. Note that the result on tos is still a floating point value; it simply does not have a fractional component.
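
A minimal sketch of its use follows, assuming x is a hypothetical real64 variable declared elsewhere and that the rounding control bits already select the desired mode:
		fld( x );					// ST0 = x
		frndint();					// ST0 = x rounded to an integral value (still a real).
		fstp( x );					// Store the rounded value back into x.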

11.2.6.8 The FABS Instruction

FABS computes the absolute value of ST0 by clearing the mantissa sign bit of ST0. It sets the stack exception bit and invalid operation bits if the stack is empty.

Example:
// Compute X := sqrt(abs(x));
 

 
		fld( x );
 
		fabs();
 
		fsqrt();
 
		fstp( x );
 

 
11.2.6.9 The FCHS Instruction

FCHS changes the sign of ST0's value by inverting the mantissa sign bit (that is, this is the floating point negation instruction). It sets the stack exception bit and invalid operation bits if the stack is empty. Example:
// Compute X := -X if X is positive, X := X if X is negative.
 
		fld( x );
 
		fabs();
 
		fchs();
 
		fstp( x );
 

 
11.2.7 Comparison Instructions

The FPU provides several instructions for comparing real values. The FCOM, FCOMP, and FCOMPP instructions compare the two values on the top of stack and set the condition codes appropriately. The FTST instruction compares the value on the top of stack with zero.

Generally, most programs test the condition code bits immediately after a comparison. Unfortunately, there are no conditional jump instructions that branch based on the FPU condition codes. Instead, you can use the FSTSW instruction to copy the floating point status register (see "The FPU Status Register" on page 547) into the AX register; then you can use the SAHF instruction to copy the AH register into the 80x86's condition code bits. After doing this, you can use the conditional jump instructions to test some condition. This technique copies C0 into the carry flag, C2 into the parity flag, and C3 into the zero flag. The SAHF instruction does not copy C1 into any of the 80x86's flag bits.

Since the SAHF instruction does not copy any FPU status bits into the sign or overflow flags, you cannot use signed comparison instructions. Instead, use unsigned operations (e.g., SETA, SETB) when testing the results of a floating point comparison. Yes, these instructions normally test unsigned values and floating point numbers are signed values. However, use the unsigned operations anyway; the FSTSW and SAHF instructions set the 80x86 flags register as though you had compared unsigned values with the CMP instruction.

The Pentium II and (upwards) compatible processors provide an extra set of floating point comparison instructions that directly affect the 80x86 condition code flags. These instructions circumvent having to use FSTSW and SAHF to copy the FPU status into the 80x86 condition codes. These instructions include FCOMI and FCOMIP. You use them just like the FCOM and FCOMP instructions except, of course, you do not have to manually copy the status bits to the FLAGS register. Do be aware that these instructions are not available on many processors in common use today (as of 1/1/2000). However, as time passes it may be safe to begin assuming that everyone's CPU supports these instructions. Since this text assumes a minimum Pentium CPU, it will not discuss these two instructions any further.

11.2.7.1 The FCOM, FCOMP, and FCOMPP Instructions

The FCOM, FCOMP, and FCOMPP instructions compare ST0 to the specified operand and set the corresponding FPU condition code bits based on the result of the comparison. The legal forms for these instructions are
		fcom()
 
		fcomp()
 
		fcompp()
 
		fcom( sti )
 
		fcomp( sti )
 
		fcom( mem_32_64 )
 
		fcomp( mem_32_64 )
 
		fcom( real_constant )
 
		fcomp( real_constant )
 
With no operands, FCOM, FCOMP, and FCOMPP compare ST0 against ST1 and set the FPU condition code bits accordingly. In addition, FCOMP pops ST0 off the stack and FCOMPP pops both ST0 and ST1 off the stack.

With a single register operand, FCOM and FCOMP compare ST0 against the specified register. FCOMP also pops ST0 after the comparison.

With a 32 or 64 bit memory operand, the FCOM and FCOMP instructions convert the memory variable to an 80 bit extended precision value and then compare ST0 against this value, setting the condition code bits accordingly. FCOMP also pops ST0 after the comparison.

These instructions set C2 (which winds up in the parity flag) if the two operands are not comparable (e.g., NaN). If it is possible for an illegal floating point value to wind up in a comparison, you should check the parity flag for an error before checking the desired condition.

These instructions set the stack fault bit if there aren't two items on the top of the register stack. They set the denormalized exception bit if either or both operands are denormalized. They set the invalid operation flag if either or both operands are quiet NaNs. These instructions always clear the C1 condition code.

Note: the instructions that have real constants as operands aren't true FPU instructions. These are extensions provided by HLA. When HLA encounters such an instruction, it creates a real64 read-only variable in the constants segment and initializes this variable with the specified constant. Then HLA translates the instruction to one that specifies a real64 memory operand. Note that because of the precision differences (64 bits vs. 80 bits), if you use a constant operand in a floating point instruction you may not get results that are as precise as you would expect.

Example of a floating point comparison:
		fcompp();
 
		fstsw( ax );
 
		sahf();
 
		setb( al );   // AL = true if ST0 < ST1 (the values before the pops).
 
			.
 
			.
 
			.
 

 
Note that you cannot compare floating point values in an HLA run-time boolean expression (e.g., within an IF statement).
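
The parity flag check recommended earlier for operands that might not be comparable could be sketched as follows. The names x and y are hypothetical real64 variables, and Unordered is a hypothetical label, defined elsewhere in the program, that handles the NaN case:
		fld( y );
		fld( x );					// ST0 = x, ST1 = y
		fcompp();					// Compare x against y and pop both values.
		fstsw( ax );
		sahf();
		jp Unordered;				// C2 (now the parity flag) is set: operands not comparable.
		setb( al );					// AL = 1 if x < y.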

11.2.7.2 The FTST Instruction

The FTST instruction compares the value in ST0 against 0.0. It behaves just like the FCOM instruction would if ST1 contained 0.0. Note that this instruction does not differentiate -0.0 from +0.0. If the value in ST0 is either of these values, FTST will set C3 to denote equality. Note that this instruction does not pop ST0 off the stack. Example:
		ftst();
 
		fstsw( ax );
 
		sahf();
 
		sete( al );					// Set AL to 1 if TOS = 0.0
 
11.2.8 Constant Instructions

The FPU provides several instructions that let you load commonly used constants onto the FPU's register stack. These instructions set the stack fault, invalid operation, and C1 flags if a stack overflow occurs; they do not otherwise affect the FPU flags. The specific instructions in this category include:
		fldz();			// Pushes +0.0.
 
		fld1();			// Pushes +1.0.
 
		fldpi();			// Pushes pi.
 
		fldl2t();			// Pushes log2(10).
 
		fldl2e();			// Pushes log2(e).
 
		fldlg2();			// Pushes log10(2).
 
		fldln2();			// Pushes ln(2).
 
11.2.9 Transcendental Instructions

The FPU provides eight transcendental (log and trigonometric) instructions to compute sin, cos, sin and cos together, partial tangent, partial arctangent, 2**x - 1, y * log2(x), and y * log2(x+1). Using various algebraic identities, it is easy to compute most of the other common transcendental functions using these instructions.

11.2.9.1 The F2XM1 Instruction

F2XM1 computes 2**ST0 - 1. The value in ST0 must be in the range -1.0 <= ST0 <= +1.0. If ST0 is out of range F2XM1 generates an undefined result but raises no exceptions. The computed value replaces the value in ST0. Example:
// Compute 10**x using the identity: 10**x = 2**(x*lg(10)) (lg = log2).
// This sequence is valid only when x*lg(10) lies in the range -1.0..+1.0.
 
		fld( x );
 
		fldl2t();
 
		fmul();
 
		f2xm1();
 
		fld1();
 
		fadd();
 
Note that F2XM1 computes 2**x - 1, which is why the code above adds 1.0 to the result at the end of the computation.

11.2.9.2 The FSIN, FCOS, and FSINCOS Instructions

These instructions pop the value off the top of the register stack and compute the sine, cosine, or both, and push the result(s) back onto the stack. The FSINCOS pushes the sine followed by the cosine of the original operand, hence it leaves cos(ST0) in ST0 and sin(ST0) in ST1.

These instructions assume ST0 specifies an angle in radians and this angle must be in the range -2**63 < ST0 < +2**63. If the original operand is out of range, these instructions set the C2 flag and leave ST0 unchanged. You can use the FPREM1 instruction, with a divisor of 2*pi, to reduce the operand to a reasonable range.

These instructions set the stack fault/C1, precision, underflow, denormalized, and invalid operation flags according to the result of the computation.
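
The range reduction mentioned above might be sketched as follows. The names angle and sine are hypothetical real64 variables declared elsewhere; the loop repeats FPREM1 until C2 (bit 10 of the status word, which is bit 2 of AH) is clear:
		fldpi();					// ST0 = pi
		fld( st0 );					// Duplicate pi.
		fadd();						// ST0 = 2*pi
		fld( angle );				// ST0 = angle, ST1 = 2*pi
		repeat

			fprem1();				// ST0 = partial remainder of angle / (2*pi).
			fstsw( ax );			// Get condition code bits into AX.
			and( 4, ah );			// Isolate C2.

		until( @z );				// Repeat until C2 is clear.
		fsin();						// ST0 = sin( reduced angle )
		fstp( sine );				// Store the result.
		fstp( st0 );				// Pop the 2*pi value.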

11.2.9.3 The FPTAN Instruction

FPTAN computes the tangent of ST0 and pushes this value and then it pushes 1.0 onto the stack. Like the FSIN and FCOS instructions, the value of ST0 is assumed to be in radians and must be in the range -2**63 < ST0 < +2**63. If the value is outside this range, FPTAN sets C2 to indicate that the conversion did not take place. As with the FSIN, FCOS, and FSINCOS instructions, you can use the FPREM1 instruction to reduce this operand to a reasonable range using a divisor of 2*pi.

If the argument is invalid (i.e., an odd multiple of pi/2 radians, where the tangent is undefined because of a division by zero) the result is undefined and this instruction raises no exceptions. FPTAN will set the stack fault, precision, underflow, denormal, invalid operation, C2, and C1 bits as required by the operation.

11.2.9.4 The FPATAN Instruction

This instruction expects two values on the top of stack. It pops them and computes the following:

ST0 = arctan( ST1 / ST0 )

The resulting value is the arctangent of the ratio on the stack expressed in radians. If you have a value you wish to compute the arctangent of, use FLD1 to create the appropriate ratio and then execute the FPATAN instruction.

This instruction affects the stack fault/C1, precision, underflow, denormal, and invalid operation bits if a problem occurs during the computation. It sets the C1 condition code bit if it has to round the result.
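
For example, the arctangent of a single value could be computed along the following lines; x and angle are hypothetical real64 variables declared elsewhere:
		fld( x );					// ST0 = x
		fld1();						// ST0 = 1.0, ST1 = x
		fpatan();					// Pop both, push arctan( x / 1.0 ).
		fstp( angle );				// Store the angle (in radians).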

11.2.9.5 The FYL2X Instruction

This instruction expects two operands on the FPU stack: y is found in ST1 and x is found in ST0. This function computes:

ST0 = ST1 * log2( ST0 )

This instruction has no operands (to the instruction itself). The instruction uses the following syntax:
		fyl2x();
 

 
Note that this instruction computes the base two logarithm. Of course, it is a trivial matter to compute the log of any other base by multiplying by the appropriate constant.
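
As a sketch of such a conversion, the natural logarithm follows from the identity ln(x) = ln(2) * log2(x). The names x and lnx are hypothetical real64 variables declared elsewhere, and x is assumed to be positive:
		fldln2();					// ST0 = ln(2)
		fld( x );					// ST0 = x, ST1 = ln(2)
		fyl2x();					// Pop both, push ln(2) * log2(x) = ln(x).
		fstp( lnx );				// Store the result.

Replacing FLDLN2 with FLDLG2 in this sequence would yield log10(x) instead, since log10(x) = log10(2) * log2(x).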

11.2.9.6 The FYL2XP1 Instruction

This instruction expects two operands on the FPU stack: y is found in ST1 and x is found in ST0. This function computes:

ST0 = ST1 * log2( ST0 + 1.0 )

The syntax for this instruction is
		fyl2xp1();
 

 
Otherwise, the instruction is identical to FYL2X.
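
This instruction is mainly of interest when x is very close to zero, where adding 1.0 to x before taking the logarithm would discard most of x's significant bits. Note that the x operand must be small in magnitude (less than about 0.29, that is, 1 - sqrt(2)/2) for the result to be defined. A sketch that computes ln( 1.0 + x ) for such a value, assuming x and result are hypothetical real64 variables, might look like this:
		fldln2();		// ST0 = ln( 2 )
		fld( x );		// ST0 = x, ST1 = ln( 2 )
		fyl2xp1();		// ST0 = ln( 2 ) * log2( x + 1.0 ) = ln( 1.0 + x )
		fstp( result );		// store the result and pop it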

11.2.10 Miscellaneous instructions

The FPU includes several additional instructions that control the FPU, synchronize operations, and let you test or set various status bits. These instructions include FINIT/FNINIT, FLDCW, FSTCW, FCLEX/FNCLEX, and FSTSW/FNSTSW.

11.2.10.1 The FINIT and FNINIT Instructions

The FINIT instruction initializes the FPU for proper operation. Your applications should execute this instruction before executing any other FPU instructions. This instruction initializes the control register to 37Fh (see "The FPU Control Register" on page 544), the status register to zero (see "The FPU Status Register" on page 547) and the tag word to 0FFFFh. The other registers are unaffected. Examples:
	finit();
 
	fninit();
 

 
The difference between FINIT and FNINIT is that FINIT first checks for any pending floating point exceptions before initializing the FPU; FNINIT does not.

11.2.10.2 The FLDCW and FSTCW Instructions

The FLDCW and FSTCW instructions require a single 16 bit memory operand:
		fldcw( mem_16 );
 
		fstcw( mem_16 );
 
These two instructions load the control register (see "The FPU Control Register" on page 544) from a memory location (FLDCW) or store the control word to a 16 bit memory location (FSTCW).

When using the FLDCW instruction to turn on one of the exceptions, if the corresponding exception flag is already set when you enable that exception, the FPU will generate an immediate interrupt before the CPU executes the next instruction. Therefore, you should use the FCLEX instruction to clear any pending exceptions before changing the FPU exception enable bits.
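
A common use of these two instructions is to change the rounding mode temporarily. The following sketch sets the rounding control field (bits 10 and 11 of the control word) to %11, which selects truncation, stores a value as an integer, and then restores the original control word. The names oldcw and newcw (word variables), value (a real64 variable), and intResult (an int32 variable) are hypothetical. Because this code does not enable any exceptions, it does not need FCLEX:
		fstcw( oldcw );			// save the current control word
		mov( oldcw, ax );
		or( $0C00, ax );		// rounding control = %11 (truncate)
		mov( ax, newcw );
		fldcw( newcw );			// activate truncation
		fld( value );
		fistp( intResult );		// store value as an integer, truncated toward zero
		fldcw( oldcw );			// restore the original control word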

11.2.10.3 The FCLEX and FNCLEX Instructions

The FCLEX and FNCLEX instructions clear all exception bits, the stack fault bit, and the busy flag in the FPU status register (see "The FPU Status Register" on page 547). Examples:
	fclex();
 
	fnclex();
 

 
The difference between these instructions is the same as the difference between FINIT and FNINIT.

11.2.10.4 The FSTSW and FNSTSW Instructions
		fstsw( ax );
 
		fnstsw( ax );
 
		fstsw( mem_16 );
 
		fnstsw( mem_16 );
 
These instructions store the FPU status register (see "The FPU Status Register" on page 547) into a 16 bit memory location or the AX register. These instructions are unusual in the sense that they can copy an FPU value into one of the 80x86 general purpose registers (specifically, AX). Of course, the whole purpose behind allowing the transfer of the status register into AX is to allow the CPU to easily test the condition code bits: the SAHF instruction copies them from AH into the 80x86 flags, where the usual conditional instructions can test them. The difference between FSTSW and FNSTSW is the same as for FCLEX and FNCLEX.
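
The following sketch shows the typical pattern after a floating point comparison. It assumes a and b are hypothetical real64 variables. After SAHF, the C0 condition code bit lands in the carry flag, so the unsigned "below" condition corresponds to a < b:
		fld( a );		// ST0 = a
		fcomp( b );		// compare ST0 with b, then pop ST0
		fstsw( ax );		// copy the FPU status register into AX
		sahf();			// copy AH (C0, C2, and C3) into the 80x86 flags
		setb( al );		// al = 1 if a < b, else al = 0

Keep in mind that an unordered comparison (for example, one involving a NaN) sets C0, C2, and C3, so this simple test also produces one in that case.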

11.2.11 Integer Operations

The 80x87 FPUs provide special instructions that combine integer to extended precision conversion along with various arithmetic and comparison operations. These instructions are the following:
		fiadd( int_16_32 );
 
		fisub( int_16_32 );
 
		fisubr( int_16_32 );
 
		fimul( int_16_32 );
 
		fidiv( int_16_32 );
 
		fidivr( int_16_32 );
 
		ficom( int_16_32 );
 
		ficomp( int_16_32 );
 
These instructions convert their 16 or 32 bit integer operands to an 80 bit extended precision floating point value and then use this value as the source operand for the specified operation. These instructions use ST0 as the destination operand.
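
For instance, adding an integer count to a floating point running total does not require a separate FILD instruction. In the sketch below, total is a hypothetical real64 variable and count is a hypothetical int32 variable:
		fld( total );		// ST0 = total
		fiadd( count );		// ST0 = total + count (count converted to real80 first)
		fstp( total );		// store the new total and pop it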

¹Storing a denormalized value into a 32 or 64 bit memory variable will always set the underflow exception bit.
