The application itself may also test to see if a math coprocessor is installed. During initialization, it tests for the math coprocessor in the same way the BIOS does. In fact, it is becoming more common for applications themselves to test for the presence of the math coprocessor as well as other parameters, and then configure themselves automatically to operate with the hardware that is installed.
If no math coprocessor is present, the CPU will perform the math calculations using lengthy software instructions that emulate the math coprocessor’s built-in functions. While the emulation may occur within the operating system, it is more likely to occur within the application program. Thus, when a math coprocessor instruction is encountered, the application will execute the emulation subroutine. Software emulation is much slower than a math coprocessor. This is the primary reason why many CAD programs can’t operate without a math coprocessor: their execution speed would be unacceptably slow.
Computers operate using base 2, or binary numbers. That is, only two numerals (0 and 1) represent all values. We’re accustomed to the decimal system, which has 10 numerals, 0 through 9. The binary system is used with computers (and digital electronics in general) because binary values can be represented as “off” and “on.”
INTEGERS AND REAL NUMBERS
In mathematics, there are several number systems. The two that are relevant here are integers and real numbers. Integers are the set of all whole numbers, both negative and positive, and zero; there are no fractions. The real number system includes the integers, all fractions, and the irrational numbers that lie between them.
A microprocessor is optimized to handle integer arithmetic. In other words, CPUs are adept at performing mathematical operations on whole numbers (in this case, whole binary numbers). Calculations using real numbers are executed using integers to approximate the real values. The math coprocessor, on the other hand, is optimized to handle real numbers.
FLOATING POINT REPRESENTATION
The 32-bit word of a 386 or 486 CPU can represent the integers -[2.sup.31] through +[2.sup.31] - 1 (or approximately -2 billion to +2 billion in decimal). The 16-bit word in the 286 CPU can represent the values -32,768 through +32,767. In both cases, one bit must be reserved for the sign. Neither range covers enough values to be useful in math-intensive personal computer applications.
To accommodate larger values, multiple precision representation is used. For example, double precision uses two 32-bit words to represent a single integer value. This results in 63 bits, plus a bit for the sign. Larger values can be handled using multiple words. This, however, requires more CPU time. Arithmetic performed on each word of a multiple precision number can result in a carry (or borrow), which must be added (or subtracted) from the upper word. Software can handle this easily, but must carry out several instructions to do so. Multiple precision arithmetic takes longer than single precision.
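The carry handling described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular CPU or compiler: the function name add_double_word and the use of unsigned 32-bit words are assumptions made for the example, and sign handling is left out.

```python
MASK32 = 0xFFFFFFFF  # keeps a value within one 32-bit word


def add_double_word(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values, each given as (upper word, lower word)."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                  # 1 if the lower words overflowed, else 0
    lo = lo_sum & MASK32                  # lower word of the result
    hi = (a_hi + b_hi + carry) & MASK32   # the carry is added to the upper word
    return hi, lo


# 0x00000001_FFFFFFFF + 0x00000000_00000001 = 0x00000002_00000000
print(add_double_word(0x1, 0xFFFFFFFF, 0x0, 0x1))
```

Even in this small sketch, one addition has become two additions, a shift, and two masking steps, which is why multiple precision arithmetic costs extra CPU time.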
Math-intensive applications use real numbers, not just integers. Without a math coprocessor, a method must be used to represent real numbers within the capabilities of the CPU’s format. To do this, real numbers are scaled so they can be represented as integers. Scaling simply means multiplying an integer by another value.
Using this method, the 32-bit word could be scaled to represent a much larger range of numbers, and could represent real numbers as well as integers. There are, however, limitations imposed when the CPU does arithmetic with scaled, fixed precision integers. For example, if two large 32-bit numbers are multiplied, the result will be larger than 32 bits; an overflow will then occur, resulting in an error. Another error can result when dividing two very large, but nearly equal numbers. The calculated answer will be too small to represent and an underflow occurs. The third limitation is rounding error. These limitations can be avoided by having the CPU read multiple words, but this slows down the computer considerably.
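A minimal Python sketch of scaled integers shows two of these limitations. The scaling factor of 1,000 and the names to_fixed and fixed_mul are assumptions chosen for illustration; a real program would pick its own scale and would detect overflow in hardware rather than with a range check.

```python
SCALE = 1000             # assumed scaling factor: three decimal places
INT32_MAX = 2**31 - 1    # largest value a signed 32-bit word can hold
INT32_MIN = -2**31       # smallest value a signed 32-bit word can hold


def to_fixed(value):
    """Scale a real number so it can be stored as an integer."""
    return round(value * SCALE)


def fixed_mul(a, b):
    """Multiply two scaled integers and rescale the product."""
    product = (a * b) // SCALE
    if not INT32_MIN <= product <= INT32_MAX:
        raise OverflowError("result does not fit in a 32-bit word")
    return product


print(fixed_mul(to_fixed(1.5), to_fixed(2.0)))  # the scaled form of 3.0
print(to_fixed(0.0004))                          # too small for this scale
```

With this scale, 0.0004 is stored as zero, and multiplying 2,000,000.0 by itself raises the overflow error even though each operand fits in a 32-bit word on its own.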
A better way of representing a large range of real numbers is to use scientific notation. Scientific notation is simply a way of scaling values. The scaling factor is always a power of 10, and the number being scaled always has a single digit to the left of the decimal point, so it is not written as an integer. Here are some examples of scientific notation:
3.2 x [10.sup.1] = 32
-6.250 x [10.sup.3] = -6,250
2.5 x [10.sup.-1] = 0.25 (note that [10.sup.-1] is 0.1)
3.0 x [10.sup.-4] = 0.0003
Note that in floating point representation, the decimal point “floats,” so that it always follows the first digit. This makes it easy to keep track of where it belongs.
INTEGER VS. FLOATING POINT
As mentioned, computers work with two different representations of numbers: integers and floating point numbers. Integers are “whole” numbers such as 1, 13, and 529. Much of the math used in computer application programs is performed on integers. For example, if you give a spreadsheet the command to “go to line 115” from line 10, the program moves down 105 lines (115 – 10). Lines, of course, are only expressed in whole numbers. On the other hand, other mathematical operations require fractions. Fractions are always represented as decimals rather than as a ratio (such as 1/2).
It’s sometimes difficult to work with two numbers that differ greatly in size, such as 1,593.0 and 0.0001. Scientific notation was invented to make such calculations easier. In scientific notation, only one digit precedes the decimal point, while the remaining digits follow behind the point. This number is then multiplied (or “scaled”) by a power of 10. For example, the scientific notation of the two numbers above would be 1.593 x [10.sup.3] and 1.0 x [10.sup.-4]. The power of 10 is always the number of digits that the decimal point has been moved–positive when moving to the left and negative when moving to the right. This notation, using a base number (called the significand) and the power (called the exponent), is also called floating point, since the decimal point “floats” to that position which leaves one digit to its left.
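The rule for moving the decimal point can be sketched in Python. The helper name to_scientific is hypothetical, and the standard decimal module is used here only so the decimal digits stay exact; the sketch works in base 10, as the text does, not in the binary form a computer uses.

```python
from decimal import Decimal


def to_scientific(text):
    """Split a decimal number into (significand, exponent)."""
    value = Decimal(text)
    # adjusted() is the power of 10 that leaves one digit before the point
    exponent = value.adjusted()
    # scaleb() moves the decimal point by the given number of places
    significand = value.scaleb(-exponent)
    return significand, exponent


for number in ("1593.0", "0.0001"):
    significand, exponent = to_scientific(number)
    print(number, "=", significand, "x 10 to the power", exponent)
```

The exponent comes out as 3 for 1,593.0 (point moved three places left) and -4 for 0.0001 (point moved four places right), matching the examples above.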
Computers work with binary numbers rather than the more familiar decimals. Floating point numbers in binary are represented the same way as they are in decimal, except they have a “binary point” instead of a decimal point and they are calculated in base 2, rather than base 10.
In floating point representation, there are three parts to the number: the sign, preceding the number; the number to be scaled, called the significand; and the scaling factor, called the exponent.
A 32-bit word can be used to hold the sign, the significand, and the exponent (all in binary), and represents a wide range of real numbers. Double precision extends the range even further.
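To make the three parts concrete, the following Python sketch pulls them out of a 32-bit floating point value. It assumes the IEEE 754 single precision layout (1 sign bit, 8 exponent bits, 23 significand bits), which the article does not name; the function name decode_single is hypothetical.

```python
import struct


def decode_single(value):
    """Return (sign, stored exponent, significand bits) of a 32-bit float."""
    # Pack the number as a 32-bit float, then reread the same bytes as an integer
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 means negative
    exponent = (bits >> 23) & 0xFF    # stored with a bias of 127 in this layout
    fraction = bits & 0x7FFFFF        # digits after the binary point
    return sign, exponent, fraction


sign, exponent, fraction = decode_single(-6.25)
print(sign, exponent - 127, hex(fraction))
```

For -6.25, which is -1.5625 x [2.sup.2], the sign bit is 1 and the unbiased exponent is 2.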
Floating point representation makes real-number arithmetic easier, too. For addition and subtraction, the scaled numbers are simply added or subtracted. For example: (2.345 x [10.sup.4]) + (3.227 x [10.sup.4]) = 5.572 x [10.sup.4]
If the exponent is not the same, the significand must be adjusted. To add 4.453 x [10.sup.5] and 2.372 x [10.sup.3], the second number must be adjusted so its exponent is [10.sup.5]; thus, the addition would be: (4.453 x [10.sup.5]) + (0.02372 x [10.sup.5]) = 4.47672 x [10.sup.5]
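The alignment step can be sketched in Python as follows. The function name fp_add and the (significand, exponent) pair format are assumptions made for this example, and the arithmetic is done in decimal to match the text rather than in binary.

```python
from decimal import Decimal


def fp_add(a, b):
    """Add two numbers given as (significand, power of 10)."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    if exp_a < exp_b:
        # Make sure the first number carries the larger exponent
        (sig_a, exp_a), (sig_b, exp_b) = (sig_b, exp_b), (sig_a, exp_a)
    # Adjust the second significand so both share the larger exponent
    sig_b = sig_b.scaleb(exp_b - exp_a)
    total = sig_a + sig_b
    # Renormalize so one digit again precedes the decimal point
    shift = total.adjusted()
    return total.scaleb(-shift), exp_a + shift


print(fp_add((Decimal("4.453"), 5), (Decimal("2.372"), 3)))
```

The final renormalizing step matters when the sum reaches 10 or more: 9.5 plus 1.5 gives 11.0, which has to be rewritten as 1.1 with the exponent raised by one.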
Scientific notation makes calculations of large numbers simple. The following example shows how floating point notation eases the execution of mathematical functions that have very large or very small numbers (or worse, a combination of the two). Avogadro’s Number is an example of a very large number and the intrinsic charge on an electron is an example of a very small number. Avogadro’s Number is 6.022 x [10.sup.23] and the charge on an electron is 1.602 x [10.sup.-19] coulomb. Neither of these numbers could “fit” into the CPU without floating point representation; one is too large and the other too small.
To multiply the two, simply multiply the significand values and add the exponents: (6.022 x [10.sup.23]) x (1.602 x [10.sup.-19]) = 9.647 x [10.sup.4]
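The same multiplication can be sketched in Python. As before, this is a decimal illustration under assumed names (fp_mul, and pairs of significand and exponent), not the binary procedure a CPU or coprocessor follows.

```python
from decimal import Decimal


def fp_mul(a, b):
    """Multiply two numbers given as (significand, power of 10)."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    significand = sig_a * sig_b     # multiply the significands
    exponent = exp_a + exp_b        # add the exponents
    # Renormalize in case the product has two digits before the point
    shift = significand.adjusted()
    return significand.scaleb(-shift), exponent + shift


avogadro = (Decimal("6.022"), 23)
electron_charge = (Decimal("1.602"), -19)
print(fp_mul(avogadro, electron_charge))
```

The unrounded product of the significands is 9.647244, which the text rounds to 9.647, and the exponents 23 and -19 sum to 4.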
This is how the computer handles floating point arithmetic, except it uses binary instead of decimal. While this eases the processor’s workload, it still requires several memory fetches for the CPU; double precision requires even more. The CPU actually emulates floating point arithmetic by using multiple registers and multiple instructions to perform a single floating point arithmetic operation.
The math coprocessor is designed to eliminate potential problems (such as overflow and underflow), as well as the time-consuming emulation process necessary to complete floating point operations.
First, the internal registers on a coprocessor are very large, making overflow and underflow almost impossible. In fact, the internal registers of the math coprocessor can represent numbers as large as [10.sup.4,932] or as small as [10.sup.-4,932]. To put this into perspective, the larger number is a “1” followed by nearly 5,000 zeros, and the smaller number is a decimal point followed by almost 5,000 zeros and a “1.” These large registers also practically eliminate any problems with rounding.
Second, the math coprocessor’s internal operating instructions are written specifically to work with floating point numbers. Because the microcode is optimized, the math coprocessor executes floating point arithmetic very quickly.
Finally, the math coprocessor is built with direct instructions for trigonometric and logarithmic functions. These calculations would normally require that a fairly long algorithm be calculated by the CPU. This is why programs with trig functions or logarithms show the most substantial improvement after the addition of a math coprocessor.
The math coprocessor usually has six different types of instructions. Three of these are non-mathematical and are used for moving data, comparing data, and controlling the coprocessor. The other three instruction types are mathematical.
The first of these involves constant instructions. These allow a mathematical constant, such as 1.0 or Pi, to be quickly retrieved for calculations, making the process much faster, since the constants don’t have to be retrieved from memory.
Non-transcendental functions consist of common mathematical operations, including addition, subtraction, multiplication, division, square root, absolute value, rounding, and other numerical manipulations.
Finally, transcendental functions allow the math coprocessor to execute trigonometric and logarithmic operations. These include sine, cosine, tangent, and several base 2 logs and antilogs.
This rich set of mathematical operations allows the math coprocessor to execute many operations with a single instruction–operations that, if emulated with the CPU, would require many instructions.
The i486 integrated CPU/math coprocessor is even more advanced, as the two components are completely coupled on one silicon device and can achieve higher performance than the i387 non-integrated model. The actual floating point registers used within the math coprocessors are identical, thereby ensuring software compatibility.
Math coprocessor operation centers around six internal register types: Data, Tag Word, Status Word, Instruction and Data pointers, and Control Word.
Data registers are composed of eight 80-bit registers. Depending on how much precision is required by the software, a portion or all of these registers will be used. These registers can be thought of as a stack; the math coprocessor’s numeric instructions can operate either on registers addressed relative to the “top” of the stack or on the data in the “top” register itself. This provides more flexibility for programmers creating subroutines in their code.
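The stack arrangement can be modeled with a short Python class. This is a simplified sketch: the class and method names are hypothetical, ordinary Python floats stand in for the 80-bit registers, and the Tag Word is reduced to a simple depth counter.

```python
class RegisterStack:
    """Toy model of eight data registers addressed relative to the top."""

    SIZE = 8

    def __init__(self):
        self.regs = [None] * self.SIZE
        self.top = 0      # index of the register currently acting as the top
        self.depth = 0    # number of registers in use

    def push(self, value):
        if self.depth == self.SIZE:
            raise OverflowError("all eight registers are in use")
        self.top = (self.top - 1) % self.SIZE   # the top moves down one register
        self.regs[self.top] = value
        self.depth += 1

    def pop(self):
        if self.depth == 0:
            raise IndexError("register stack is empty")
        value = self.regs[self.top]
        self.regs[self.top] = None
        self.top = (self.top + 1) % self.SIZE
        self.depth -= 1
        return value

    def st(self, i):
        """Read the register i places below the top; st(0) is the top itself."""
        if not 0 <= i < self.depth:
            raise IndexError("register is empty")
        return self.regs[(self.top + i) % self.SIZE]


stack = RegisterStack()
stack.push(2.0)       # first operand
stack.push(3.5)       # second operand becomes the new top
print(stack.st(0), stack.st(1))
```

Because operands are named by their distance from the top rather than by a fixed register number, a subroutine can push its own values without knowing how many registers its caller has already used.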
Tag Word marks the content of each of the data registers and helps optimize the math coprocessor’s performance by identifying empty registers. Tag Word also simplifies exception handling by eliminating complex decoding operations typically required in a CPU exception routine.
The 16-bit Status Word is used to report the overall status of the math coprocessor. Through a series of codes, a host of exception conditions and busy codes can be reported by Status Word. For example, if the math coprocessor detects an underflow, overflow, precision error, or other invalid operation, it will indicate this in Status Word.
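A program that reads Status Word has to pick the individual codes out of the 16 bits. The Python sketch below assumes the bit positions commonly documented for the 387 (exception flags in bits 0 through 5, the stack-top pointer in bits 11 through 13, the busy flag in bit 15); these positions are not given in this article and should be checked against the manufacturer's reference. The function name is hypothetical.

```python
# Assumed bit positions of the exception flags in Status Word
EXCEPTION_FLAGS = {
    0: "invalid operation",
    1: "denormalized operand",
    2: "zero divide",
    3: "overflow",
    4: "underflow",
    5: "precision",
}


def decode_status_word(word):
    """Report the exceptions, stack top, and busy flag held in a status word."""
    raised = [name for bit, name in EXCEPTION_FLAGS.items() if word & (1 << bit)]
    return {
        "exceptions": raised,
        "top": (word >> 11) & 0b111,   # which data register is the stack top
        "busy": bool(word & 0x8000),
    }


print(decode_status_word(0x0018))   # bits 3 and 4 set
```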
The Instruction and Data pointers are used to pass information about instructions or data in memory back to the CPU in case of an exception. The math coprocessor can operate in one of four modes (32-bit protected, 32-bit real, 16-bit protected, or 16-bit real), and these registers will appear differently depending on the operating mode. Programmers can use the information in these registers to initiate their own error handlers or subroutines.
Control Word is used by the software to define numeric precision, rounding, and exception masking operations. The precision options are used primarily to provide compatibility with earlier generations of math coprocessors that have less than 64-bit precision.
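The three kinds of settings can be read back from a Control Word value as sketched below in Python. The field positions (exception masks in bits 0 through 5, precision control in bits 8 and 9, rounding control in bits 10 and 11) and the sample value 0x037F are assumptions based on the commonly documented 387 layout, not on this article, and the function name is hypothetical.

```python
# Assumed encodings of the precision and rounding fields
PRECISION = {0b00: "24-bit", 0b10: "53-bit", 0b11: "64-bit"}
ROUNDING = {0b00: "to nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}


def decode_control_word(word):
    """Split a control word into exception masks, precision, and rounding."""
    return {
        "masked": word & 0x3F,   # one mask bit per exception type
        "precision": PRECISION.get((word >> 8) & 0b11, "reserved"),
        "rounding": ROUNDING[(word >> 10) & 0b11],
    }


print(decode_control_word(0x037F))
```

Under these assumptions, 0x037F masks all six exceptions, selects the full 64-bit precision, and rounds to the nearest value.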
In addition to the main registers discussed here, the math coprocessor also provides six debug and five test registers. These registers are intended for programmers’ use during application development.
MATH-COPROCESSOR OPERATION IN YOUR COMPUTER
For programmers, newer math coprocessors are viewed as part of the CPU. That is, programmers can write their code with math-coprocessor instructions included along with the CPU instructions. The code can easily test for the presence of a math coprocessor in the PC. Then, if the application created is running on a PC with a math coprocessor installed, the math-specific instruction will execute on the math coprocessor.
In assembly language, all math-coprocessor instructions start with an “F,” as in FADD, whereas the corresponding CPU instruction is ADD. Above right is a short example of some 386/387 assembly-language code that uses the math coprocessor to calculate the circumference of a circle.
In most programs, if the math coprocessor is absent, the CPU will automatically emulate the math function using a long series of CPU instructions. As expected, however, the math coprocessor will execute the specific math function much faster than the CPU.
For example, a floating point division calculation takes about 24 microseconds with an 8086 CPU and 8087 math coprocessor combination. Without the math coprocessor, the 8086 takes about 2,000 microseconds to complete the calculation.
If the programmer is certain that a math coprocessor is present, the code can be highly optimized to rely heavily on the functions performed best by the math coprocessor.
Instructions for the math coprocessor differ from those for the CPU. To alert the CPU that a math-coprocessor instruction is coming, it is preceded by an ESCape command. When the CPU reads this instruction, it knows the following instruction and data (if any) are for the math coprocessor. However, all communications between the CPU and math coprocessor are transparent to the application program. Master synchronization between the chips is handled by hardware. In most systems, the math coprocessor operates at the same clock rate as the CPU, although some math coprocessors have the capability to operate from a separate, asynchronous clock.
The CPU then passes the instruction to the math coprocessor, which signals the CPU when it is ready to accept the data. The data, or operand, is either held by the math coprocessor, or used in an arithmetic operation in conjunction with a number already in the math coprocessor. When the math coprocessor has all the data it needs, it executes the proper mathematical function by accessing the internal microcode defined by that particular instruction.
The instruction for the math coprocessor does not always require data to be fetched. For example, if your spreadsheet cell had the equation SQRT(C4*D2), the math coprocessor would first retrieve the data for cells C4 and D2. It would then multiply them and hold the result. Next, it would be given the SQRT (square root) instruction. The data for this instruction (the product of C4 and D2) is already held, so it’s unnecessary to fetch it from memory.
Therefore, not only does the specialized SQRT function itself save a lot of time, but because the data was already held in the math coprocessor, the calculation as a whole takes less time. The CPU, executing this same function, might require many more memory accesses and a great deal more time, since it would have to execute an algorithm to calculate the square root.
You might be wondering at this point what the CPU is doing while the math coprocessor is performing the calculation. With many applications, it is briefly waiting for the math coprocessor to finish. However, newer application programs take advantage of this time to execute CPU instructions concurrently.
That is, while the math coprocessor is performing its calculations, the CPU continues to execute the application program. If the CPU gets to an instruction that requires the results from the math coprocessor, it has to wait until the math coprocessor is finished.
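This overlap can be imitated in Python with a worker thread standing in for the math coprocessor. It is only an analogy for the behavior described above, not a model of the hardware: the function names are hypothetical, and the spreadsheet values reuse the SQRT(C4*D2) and line-counting examples from earlier in the article.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def coprocessor_task(c4, d2):
    """Stand-in for the floating point work handed to the math coprocessor."""
    return math.sqrt(c4 * d2)


def run(c4, d2):
    with ThreadPoolExecutor(max_workers=1) as coprocessor:
        # Hand the calculation over and carry on immediately
        pending = coprocessor.submit(coprocessor_task, c4, d2)
        # Meanwhile the "CPU" continues with unrelated integer work
        lines_to_move = 115 - 10
        # This step needs the result, so it waits until the work is finished
        root = pending.result()
    return lines_to_move, root


print(run(8.0, 2.0))
```

The call to result() plays the role of the wait: if the calculation has already finished it returns at once, otherwise the caller is held up until it does.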
In spite of the brief waiting, the CPU/math coprocessor combination will still execute the program faster than the CPU could by itself. Borland’s Quattro Pro is an example of a program that takes advantage of concurrent processing when a math coprocessor is present.
Installing a math coprocessor in your PC may be the most effective performance boost you can buy, without moving to a faster, more expensive machine. This is especially true for programs that are specifically written to take advantage of a math coprocessor. You can contact your applications-software developer to determine precisely how much you will benefit by adding a math coprocessor to your PC.