What is the safest way to divide two IEEE 754 floating point numbers?
In my case the language is JavaScript, but I guess this isn't important. The goal is to avoid the normal floating point pitfalls.
I've read that one could use a "correction factor" (cf) (e.g. 10 uplifted to some number, for instance 10^10) like so:
(a * cf) / (b * cf)
But I'm not sure this makes a difference in division?
Edit:
Apparently people find this question so bad that they feel the need to down-vote it. Maybe I should have mentioned that I've already looked at the other floating point posts on StackOverflow and I've still not found a single SO post on how to divide two floating point numbers. If the answer is that there is no difference between the solutions for working around floating point issues when adding and when dividing, then just answer that please. Otherwise please tell my why you down-voted this question to avoid me making this mistake in the future.
Edit 2:
I've been asked in the comments which pitfalls I'm referring to, so I thought I'd just add a quick note here as well for the people who don't read the comments:
When adding 0.1 and 0.2, you would expect to get 0.3, but with floating point arithmetic you get 0.30000000000000004 (at least in JavaScript). This is just one example of a common pitfall.
The above issue is discussed many times here on StackOverflow, but I don't know what can happen when dividing and if it differs from the pitfalls found when adding or multiplying. It might be that that there is no risks, in which case that would be a perfectly good answer.
The safest way is to simply divide them. Any prescaling will either do nothing, or increase rounding error, or cause overflow or underflow.
If you prescale by a power of two you may cause overflow or underflow, but will otherwise make no difference in the result.
If you prescale by any other number, you will introduce additional rounding steps on the multiplications, which may lead to increased rounding error on the division result.
If you simply divide, the result will be the closest representable number to the ratio of the two inputs.
IEEE 754 64-bit floating point numbers are incredibly precise. A difference in one part in almost 10^16 can be represented.
There are a few operations, such as floor and exact comparison, that make even extremely low significance bits matter. If you have been reading about floating point pitfalls you should have already seen examples. Avoid those. Round your output to an appropriate number of decimal places. Be careful adding numbers of very different magnitude.
The following program demonstrates the effects of using each power of 10 from 10 through 1e20 as scale factor. Most get the same result as not multiplying, 6.0, which is also the rational number arithmetic result. Some get a slightly larger result.
You can experiment with different division problems by changing the initializers for a
and b
. The program prints their exact values, after rounding to double.
import java.math.BigDecimal;
public class Test {
public static void main(String[] args) {
double mult = 10;
double a = 2;
double b = 1.0 / 3.0;
System.out.println("a=" + new BigDecimal(a));
System.out.println("b=" + new BigDecimal(b));
System.out.println("No multiplier result="+(a/b));
for (int i = 0; i < 20; i++) {
System.out.println("mult="+mult + " result="+((a * mult) / (b * mult)));
mult *= 10;
}
}
}
Output:
a=2
b=0.333333333333333314829616256247390992939472198486328125
No multiplier result=6.0
mult=10.0 result=6.000000000000001
mult=100.0 result=6.000000000000001
mult=1000.0 result=6.0
mult=10000.0 result=6.000000000000001
mult=100000.0 result=6.000000000000001
mult=1000000.0 result=6.0
mult=1.0E7 result=6.000000000000001
mult=1.0E8 result=6.0
Floating point division will produce exactly the same "pitfalls" as addition or multiplication operations, and no amount of pre-scaling will fix it - the end result is the end result and it's the internal representation of that in IEEE-754 that causes the "problem".
The solution is to completely forget about these precision issues during calculations themselves, and to perform rounding as late as possible, i.e. only when displaying the results of the calculation, at the point at which the number is converted to a string using the .toFixed()
function provided precisely for that purpose.