Why are floating point numbers inaccurate?
Why do some numbers lose accuracy when they are stored as floating point numbers?
For example, the decimal number 9.2 can be expressed exactly as a ratio of two decimal integers (92/10), both of which can be expressed exactly in binary (0b1011100/0b1010). However, the same ratio stored as a floating point number is never exactly 9.2:
32-bit "single precision" float: 9.19999980926513671875
64-bit "double precision" float: 9.199999999999999289457264239899814128875732421875
How can such a simple number be "too big" to express in 64 bits of memory?
In most programming languages, floating point numbers are represented a lot like scientific notation: with an exponent and a mantissa (also called the significand). A very simple number, 9.2, is actually this fraction:
5179139571476070 * 2^-49
where the exponent is -49 and the mantissa is 5179139571476070. The reason it is impossible to represent some decimal numbers this way is that both the exponent and the mantissa must be integers. In other words, all floats must be an integer multiplied by an integer power of 2.
9.2 is simply 92/10, but 10 cannot be expressed as 2^n if n is restricted to integer values.
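You can see both of these facts directly from Python's standard library. This is a quick aside of mine, not part of the original answer; float.as_integer_ratio() and decimal.Decimal are built-in:

from decimal import Decimal

num, den = (9.2).as_integer_ratio()
print(num, "/", den)   # 2589569785738035 / 281474976710656 (the reduced form of 5179139571476070 / 2**49)
print(Decimal(9.2))    # 9.199999999999999289457264239899814128875732421875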
Viewing the Data
First, here is a function to see the components that make up a 32- and 64-bit float. Gloss over it if you only care about the output (example in Python):
import struct
from itertools import islice

def float_to_bin_parts(number, bits=64):
    if bits == 32:           # single precision
        int_pack      = 'I'
        float_pack    = 'f'
        exponent_bits = 8
        mantissa_bits = 23
        exponent_bias = 127
    elif bits == 64:         # double precision. all python floats are this
        int_pack      = 'Q'
        float_pack    = 'd'
        exponent_bits = 11
        mantissa_bits = 52
        exponent_bias = 1023
    else:
        raise ValueError('bits argument must be 32 or 64')
    # Pack the float into raw bytes, reinterpret those bytes as an unsigned integer,
    # then slice its binary string into the sign, exponent and mantissa fields.
    bin_iter = iter(bin(struct.unpack(int_pack, struct.pack(float_pack, number))[0])[2:].rjust(bits, '0'))
    return [''.join(islice(bin_iter, x)) for x in (1, exponent_bits, mantissa_bits)]
There's a lot of complexity in that function that would be a tangent to explain, but if you're interested, the important resource for our purposes is the struct module.
Python's float is a 64-bit, double-precision number. In other languages such as C, C++, Java and C#, double precision has a separate type, double, which is often implemented as 64 bits.
Now, calling that function with our example number, 9.2, here's what we get:
>>> float_to_bin_parts(9.2)
['0', '10000000010', '0010011001100110011001100110011001100110011001100110']
Interpreting the Data
You'll see I've split the return value into three components. These components are:
- Sign
- Exponent
- Mantissa (also called Significand, or Fraction)
Sign
The sign is stored in the first component as a single bit. It's easy to explain: 0 means the float is a positive number; 1 means it's negative. Because 9.2 is positive, our sign value is 0.
Exponent
The exponent is stored in the middle component as 11 bits. In our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to 2^(# of bits - 1) - 1 to get the true exponent; in our case, that means subtracting 0b1111111111 (decimal number 1023) to get the true exponent, 0b00000000011 (decimal number 3).
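As a quick sanity check of that bias arithmetic, here is a small snippet of my own (Python 3 assumed):

stored_exponent = int('10000000010', 2)   # 1026
bias = 2**(11 - 1) - 1                    # 1023 for 64-bit doubles
true_exponent = stored_exponent - bias
print(stored_exponent, bias, true_exponent)   # 1026 1023 3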
Mantissa
The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand this quirk, consider a number in scientific notation, like this:
6.0221413 x 10^23
The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit. The same holds true for binary, except that binary only has two digits: 0 and 1. So the binary mantissa always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa:
1.0010011001100110011001100110011001100110011001100110
This involves more than just a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point.
When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2. Since our third element has 52 bits, we divide it by 2^52 to move it 52 places to the right:
0.0010011001100110011001100110011001100110011001100110
In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.1499999999999999. (This is one example of a ratio that can be expressed exactly in binary, but only approximately in decimal; for more detail, see: 675539944105574 / 4503599627370496.)
Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa.
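Here is a minimal sketch of that step in Python (my own addition, Python 3 assumed): interpret the 52 stored bits as an integer, divide by 2**52 to shift them to the right of the radix point, then add back the implicit leading 1.

bits = '0010011001100110011001100110011001100110011001100110'
fraction = int(bits, 2) / 2**52   # the 0.1499999999999999... fractional part shown above
true_mantissa = 1 + fraction      # approximately 1.15 (inexact!)
print(fraction, true_mantissa)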
Recapping the Components
- Sign (first component): 0 for positive, 1 for negative
- Exponent (middle component): Subtract 2^(# of bits - 1) - 1 to get the true exponent
- Mantissa (last component): Divide by 2^(# of bits) and add 1 to get the true mantissa
Calculating the Number
Putting all three parts together, we're given this binary number:
1.0010011001100110011001100110011001100110011001100110 x 10^11
Which we can then convert from binary to decimal:
1.1499999999999999 x 2^3 (inexact!)
And multiply to reveal the final representation of the number we started with (9.2) after being stored as a floating point value:
9.1999999999999993
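To check the whole decoding end to end, here is a small sketch of my own (Python 3 assumed) that rebuilds the float from the three bit strings returned by float_to_bin_parts above, using the standard decoding for a normal double: (-1)^sign * (1 + fraction) * 2^(exponent - 1023).

sign, exponent, mantissa = float_to_bin_parts(9.2)
value = (-1) ** int(sign, 2) * (1 + int(mantissa, 2) / 2**52) * 2 ** (int(exponent, 2) - 1023)
print(value)         # 9.2 - i.e. exactly the same (slightly too small) double we started with
print(value == 9.2)  # True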
Representing as a Fraction
9.2
Now that we've built the number, it's possible to reconstruct it into a simple fraction:
1.0010011001100110011001100110011001100110011001100110 x 10^11
Shift mantissa to a whole number:
10010011001100110011001100110011001100110011001100110 x 10^(11-110100)
Convert to decimal:
5179139571476070 x 2^(3-52)
Subtract the exponent:
5179139571476070 x 2^-49
Turn negative exponent into division:
5179139571476070 / 2^49
Multiply exponent:
5179139571476070 / 562949953421312
Which equals:
9.1999999999999993
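For anyone who wants to double-check that fraction, Python's fractions module agrees (a standard-library module; this snippet is my own addition, not part of the original answer):

from fractions import Fraction

stored = Fraction(5179139571476070, 2**49)
print(stored)                   # 2589569785738035/281474976710656 (the same fraction in lowest terms)
print(stored == Fraction(9.2))  # True: this is exactly the value Python stores for 9.2
print(float(stored))            # 9.2 (converting back gives the same double)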
9.5
>>> float_to_bin_parts(9.5)
['0', '10000000010', '0011000000000000000000000000000000000000000000000000']
Already you can see the mantissa is only 4 digits followed by a whole lot of zeroes. But let's go through the paces.
Assemble the binary scientific notation:
1.0011 x 10^11
Shift the decimal point:
10011 x 10^(11-100)
Subtract the exponent:
10011 x 10^-1
Binary to decimal:
19 x 2^-1
Negative exponent to division:
19 / 2^1
Multiply exponent:
19 / 2
Equals:
9.5
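As a quick cross-check of my own (using the built-in float.as_integer_ratio method and the decimal module), Python confirms that 9.5 is stored exactly:

from decimal import Decimal

print((9.5).as_integer_ratio())  # (19, 2) - the exact ratio derived above
print(Decimal(9.5))              # 9.5 - no trailing garbage, because 19/2 is exactly representable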
Further reading
- The Floating-Point Guide: What Every Programmer Should Know About Floating-Point Arithmetic, or, Why don’t my numbers add up? (floating-point-gui.de)
- What Every Computer Scientist Should Know About Floating-Point Arithmetic (Goldberg 1991)
- IEEE Double-precision floating-point format (Wikipedia)
- Floating Point Arithmetic: Issues and Limitations (docs.python.org)
- Floating Point Binary
This isn't a full answer (mhlester already covered a lot of good ground I won't duplicate), but I would like to stress how much the representation of a number depends on the base you are working in.
Consider the fraction 2/3
In good-ol' base 10, we typically write it out as something like
- 0.666...
- 0.666
- 0.667
When we look at those representations, we tend to associate each of them with the fraction 2/3, even though only the first representation is mathematically equal to the fraction. The second and third representations/approximations have an error on the order of 0.001, which is actually much worse than the error between 9.2 and 9.1999999999999993. In fact, the second representation isn't even rounded correctly! Nevertheless, we don't have a problem with 0.666 as an approximation of the number 2/3, so we shouldn't really have a problem with how 9.2 is approximated in most programs. (Yes, in some programs it matters.)
Number bases
So here's where number bases are crucial. If we were trying to represent 2/3 in base 3, then
(2/3)₁₀ = 0.2₃
In other words, we have an exact, finite representation for the same number by switching bases! The take-away is that even though you can convert any number to any base, all rational numbers have exact finite representations in some bases but not in others.
To drive this point home, let's look at 1/2. It might surprise you that even though this perfectly simple number has an exact representation in base 10 and 2, it requires a repeating representation in base 3.
(1/2)₁₀ = 0.5₁₀ = 0.1₂ = 0.1111...₃
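If you want to play with this yourself, here is a small helper of my own (not part of the answer; expand() and its parameters are invented for illustration) that prints the fractional digits of a rational number in a given base:

from fractions import Fraction

def expand(frac, base, digits=12):
    """Return the fractional digits of `frac` (0 <= frac < 1) in `base`."""
    out = []
    for _ in range(digits):
        frac *= base
        digit = int(frac)        # the integer part is the next digit
        out.append(str(digit))
        frac -= digit
        if frac == 0:            # the expansion terminates
            break
    return ''.join(out)

print(expand(Fraction(2, 3), 10))  # 666666666666 - repeats forever in base 10
print(expand(Fraction(2, 3), 3))   # 2            - terminates in base 3
print(expand(Fraction(1, 2), 3))   # 111111111111 - repeats forever in base 3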
Why are floating point numbers inaccurate?
Because often-times, they are approximating rationals that cannot be represented finitely in base 2 (the digits repeat), and in general they are approximating real (possibly irrational) numbers which may not be representable in finitely many digits in any base.
While all of the other answers are good there is still one thing missing:
It is impossible to represent irrational numbers (e.g. π, sqrt(2), log(3), etc.) precisely!
And that actually is why they are called irrational. No amount of bit storage in the world would be enough to hold even one of them. Only symbolic arithmetic is able to preserve their precision.
If you limit your math needs to rational numbers only, though, the problem of precision becomes manageable. You would need to store a pair of (possibly very big) integers a and b to hold the number represented by the fraction a/b. All your arithmetic would have to be done on fractions just like in highschool math (e.g. a/b * c/d = ac/bd), as sketched below.
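Python's fractions module (a standard-library type) does exactly this kind of exact fraction bookkeeping; a minimal illustration of my own, not the answer author's:

from fractions import Fraction

a = Fraction(92, 10)    # exactly 9.2 as a ratio of integers
b = a + Fraction(1, 3)
print(a, b, float(b))   # 46/5 143/15 and roughly 9.5333 (the float conversion is, of course, approximate)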
But of course you would still run into the same kind of trouble when pi, sqrt, log, sin, etc. are involved.
TL;DR
For hardware accelerated arithmetic only a limited amount of rational numbers can be represented. Every not-representable number is approximated. Some numbers (i.e. irrational) can never be represented no matter the system.
There are infinitely many real numbers (so many that you can't enumerate them), and there are infinitely many rational numbers (it is possible to enumerate them).
The floating-point representation is a finite one (like anything in a computer) so unavoidably many many many numbers are impossible to represent. In particular, 64 bits only allow you to distinguish among only 18,446,744,073,709,551,616 different values (which is nothing compared to infinity). With the standard convention, 9.2 is not one of them. Those that can are of the form m.2^e for some integers m and e.
You might come up with a different numeration system, 10 based for instance, where 9.2 would have an exact representation. But other numbers, say 1/3, would still be impossible to represent.
Also note that double-precision floating-points numbers are extremely accurate. They can represent any number in a very wide range with as much as 15 exact digits. For daily life computations, 4 or 5 digits are more than enough. You will never really need those 15, unless you want to count every millisecond of your lifetime.
Why can we not represent 9.2 in binary floating point?
Floating point numbers are (simplifying slightly) a positional numbering system with a restricted number of digits and a movable radix point.
A fraction can only be expressed exactly using a finite number of digits in a positional numbering system if the prime factors of the denominator (when the fraction is expressed in its lowest terms) are factors of the base.
The prime factors of 10 are 5 and 2, so in base 10 we can represent any fraction of the form a/(2^b * 5^c).
On the other hand, the only prime factor of 2 is 2, so in base 2 we can only represent fractions of the form a/(2^b).
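Here is a hedged sketch of that rule in code (the terminates() helper and its name are my own invention for illustration): a reduced fraction has a finite expansion in a base exactly when every prime factor of its denominator divides the base.

from math import gcd

def terminates(numerator, denominator, base):
    d = denominator // gcd(numerator, denominator)   # reduce to lowest terms
    g = gcd(d, base)
    while g > 1:                                      # strip prime factors shared with the base
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    return d == 1

print(terminates(92, 10, 10))  # True  - 9.2 is finite in base 10
print(terminates(92, 10, 2))   # False - 9.2 repeats forever in base 2
print(terminates(19, 2, 2))    # True  - 9.5 is finite in base 2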
Why do computers use this representation?
Because it's a simple format to work with and it is sufficiently accurate for most purposes. Basically the same reason scientists use "scientific notation" and round their results to a reasonable number of digits at each step.
It would certainly be possible to define a fraction format, with (for example) a 32-bit numerator and a 32-bit denominator. It would be able to represent numbers that IEEE double precision floating point could not, but equally there would be many numbers that can be represented in double precision floating point that could not be represented in such a fixed-size fraction format.
However the big problem is that such a format is a pain to do calculations on. For two reasons.
- If you want to have exactly one representation of each number, then after each calculation you need to reduce the fraction to its lowest terms. That means that for every operation you basically need to do a greatest common divisor calculation.
- If after your calculation you end up with an unrepresentable result because the numerator or denominator has grown too large, then you need to find the closest representable result. This is non-trivial.
Some languages do offer fraction types, but usually they do it in combination with arbitrary precision. This avoids needing to worry about approximating fractions, but it creates its own problem: when a number passes through a large number of calculation steps, the size of the denominator, and hence the storage needed for the fraction, can explode.
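A tiny illustration of that blow-up (my own example, using Python's arbitrary-precision Fraction type as a stand-in for such a fraction format - it never approximates, it just grows):

from fractions import Fraction

x = Fraction(1, 3)
for _ in range(5):
    x = x * x + x        # each step squares the denominator here
print(x.denominator)     # 1853020188851841, i.e. 3**32, after only five exact steps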
Some languages also offer decimal floating point types. These are mainly used in scenarios where it is important that the results the computer gets match pre-existing rounding rules that were written with humans in mind (chiefly financial calculations). They are slightly more difficult to work with than binary floating point, but the biggest problem is that most computers don't offer hardware support for them.
Try this
DecimalFormat decimalFormat = new DecimalFormat("#.##");
String formatted = String.valueOf(decimalFormat.format(decimalValue));
'decimalValue' is your value to convert.
Reference URL: https://stackoverflow.com/questions/21895756/why-are-floating-point-numbers-inaccurate