In general, computations with floating-point numbers are only approximate. The precision of a floating-point number is not necessarily correlated at all with the accuracy of that number. For instance, 3.142857142857142857 is a more precise approximation to π than 3.14159, but the latter is more accurate. The precision refers to the number of bits retained in the representation. When an operation combines a short floating-point number with a long one, the result will be a long floating-point number. This rule is made to ensure that as much accuracy as possible is preserved; however, it is by no means a guarantee. Common Lisp numerical routines do assume, however, that the accuracy of an argument does not exceed its precision. Therefore when two small floating-point numbers are combined, the result will always be a small floating-point number. This assumption can be overridden by first explicitly converting a small floating-point number to a larger representation. (Common Lisp never converts automatically from a larger size to a smaller one.)
Rational computations cannot overflow in the usual sense (though of course there may not be enough storage to represent one), as integers and ratios may in principle be of any magnitude. Floating-point computations may get exponent overflow or underflow; this is an error.
X3J13 voted in June 1989 to address certain problems relating to floating-point overflow and underflow, but certain parts of the proposed solution were not adopted, namely to add the macro without-floating-underflow-traps to the language and to require certain behavior of floating-point overflow and underflow. The committee agreed that this area of the language requires more discussion before a solution is standardized.
For the record, the proposal that was considered and rejected (for the nonce) introduced a macro without-floating-underflow-traps that would execute its body in such a way that, within its dynamic extent, a floating-point underflow must not signal an error but instead must produce either a denormalized number or zero as the result. The rejected proposal also specified the following treatment of overflow and underflow:
These points refer to conditions floating-point-overflow and floating-point-underflow that were approved by X3J13 and are described in section 29.5.
When rational and floating-point numbers are compared or combined by a numerical function, the rule of floating-point contagion is followed: when a rational meets a floating-point number, the rational is first converted to a floating-point number of the same format. For functions such as + that take more than two arguments, it may be that part of the operation is carried out exactly using rationals and then the rest is done using floating-point arithmetic.
X3J13 voted in January 1989 to apply the rule of floating-point contagion stated above to the case of combining rational and floating-point numbers. For comparing, the following rule is to be used instead: When a rational number and a floating-point number are to be compared by a numerical function, in effect the floating-point number is first converted to a rational number as if by the function rational, and then an exact comparison of two rational numbers is performed. It is of course valid to use a more efficient implementation than actually calling the function rational, as long as the result of the comparison is the same. In the case of complex numbers, the real and imaginary parts are handled separately. ________________________________________________________________
Rationale: In general, accuracy cannot be preserved in combining operations, but it can be preserved in comparisons, and preserving it makes that part of Common Lisp algebraically a bit more tractable. In particular, this change prevents the breakdown of transitivity. Let a be the result of (/ 10.0 single-float-epsilon), and let j be the result of (floor a). (Note that (= a (+ a 1.0)) is true, by the definition of single-float-epsilon.) Under the old rules, all of (<= a j), (< j (+ j 1)), and (<= (+ j 1) a) would be true; transitivity would then imply that (< a a) ought to be true, but of course it is false, and therefore transitivity fails. Under the new rule, however, (<= (+ j 1) a) is false.
For functions that are mathematically associative (and possibly commutative), a Common Lisp implementation may process the arguments in any manner consistent with associative (and possibly commutative) rearrangement. This does not affect the order in which the argument forms are evaluated, of course; that order is always left to right, as in all Common Lisp function calls. What is left loose is the order in which the argument values are processed. The point of all this is that implementations may differ in which automatic coercions are applied because of differing orders of argument processing. As an example, consider this expression:
One implementation might process the arguments from left to right, first adding 1/3 and 2/3 to get 1, then converting that to a double-precision floating-point number for combination with 1.0D0, then successively converting and adding 1.0 and 1.0E-15. Another implementation might process the arguments from right to left, first performing a single-precision floating-point addition of 1.0 and 1.0E-15 (and probably losing some accuracy in the process!), then converting the sum to double precision and adding 1.0D0, then converting 2/3 to double-precision floating-point and adding it, and then converting 1/3 and adding that. A third implementation might first scan all the arguments, process all the rationals first to keep that part of the computation exact, then find an argument of the largest floating-point format among all the arguments and add that, and then add in all other arguments, converting each in turn (all in a perhaps misguided attempt to make the computation as accurate as possible). In any case, all three strategies are legitimate. The user can of course control the order of processing explicitly by writing several calls; for example:
The user can also control all coercions simply by writing calls to coercion functions explicitly.
In general, then, the type of the result of a numerical function is a floating-point number of the largest format among all the floating-point arguments to the function; but if the arguments are all rational, then the result is rational (except for functions that can produce mathematically irrational results, in which case a single-format floating-point number may result).
There is a separate rule of complex contagion. As a rule, complex numbers never result from a numerical function unless one or more of the arguments is complex. (Exceptions to this rule occur among the irrational and transcendental functions, specifically expt, log, sqrt, asin, acos, acosh, and atanh; see section 12.5.) When a non-complex number meets a complex number, the non-complex number is in effect first converted to a complex number by providing an imaginary part of zero.
If any computation produces a result that is a ratio of two integers such that the denominator evenly divides the numerator, then the result is immediately converted to the equivalent integer. This is called the rule of rational canonicalization.
If the result of any computation would be a complex rational with a zero imaginary part, the result is immediately converted to a non-complex rational number by taking the real part. This is called the rule of complex canonicalization. Note that this rule does not apply to complex numbers whose components are floating-point numbers. Whereas #C(5 0) and 5 are not distinct values in Common Lisp (they are always eql), #C(5.0 0.0) and 5.0 are always distinct values in Common Lisp (they are never eql, although they are equalp).