PostgreSQL query returning zero rows with double precision field/function

I've got:
SELECT x(point), y(point) WHERE x(point) = 3.69334468807005
x and y are of type double precision.
I can see that this value is indeed in the table, yet running the query in PostgreSQL returns nothing. Why could that be the case? Maybe a precision problem?
Thanks!

When dealing with floating point numbers (whether single or double precision), an exact comparison is futile in 99% of all cases. This is true not only for PostgreSQL but for every computer language that uses FP arithmetic.
There are three reasons: the internal representation of a double can contain many more digits than are displayed; at the same time, many numbers cannot be represented exactly in FP (0.1 is an often cited example); and displayed values are therefore rounded to something a human can comprehend (i.e. "0.1" rather than something like "0.09999999999999999").
Therefore it is necessary to avoid direct comparison as soon as one of the numbers being compared has been calculated (rounding errors) or has been converted from a string. Instead, some "range" must be admitted, like:
where x between 3.69334468807004 and 3.69334468807006 -- note the different numbers
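An equivalent formulation uses an explicit tolerance instead of a range. A minimal sketch, assuming a table named points; the epsilon of 1e-9 is a placeholder you must adapt to the expected rounding error of your data:
SELECT x(point), y(point)
FROM points                                       -- hypothetical table name
WHERE abs(x(point) - 3.69334468807005) < 1e-9;    -- epsilon chosen for illustration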
The only valid cases for direct comparison are those where the value has just been copied. A contrived example:
SELECT x, y, f1(x,y), f2(x,y), ... INTO TEMP temp_xy FROM points;
SELECT * FROM points p JOIN temp_xy t on p.x = t.x and p.y = t.y;
x and y have just been copied, so they can be used as join criteria.
Edit: A good starting point for this and some more non-intuitive problems with floats is this article.

Old school answer: "Don't compare floating point numbers solely for equality." (Elements of Programming Style, Kernighan and Plauger, 1978)
Why? Because comparing two floats for equality may work under certain circumstances, yet fail under slightly different ones. That's due to the nature of floating-point numbers, not to programmer skill.
The canonical article for floating-point math is What Every Computer Scientist Should Know About Floating-Point Arithmetic.
In your case, you might be able to adapt the relative difference function from this C language FAQ. (Scroll down, look for RelDif().)
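For reference, the same relative-difference idea can be sketched directly in SQL; the table name points and the 1e-12 tolerance below are assumptions, and the CASE guards against division by zero just as RelDif() does:
SELECT *
FROM points                                                       -- hypothetical table name
WHERE CASE
        WHEN greatest(abs(x(point)), abs(3.69334468807005)) = 0 THEN 0
        ELSE abs(x(point) - 3.69334468807005)
             / greatest(abs(x(point)), abs(3.69334468807005))     -- scale by the larger magnitude
      END < 1e-12;                                                -- relative tolerance, for illustration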

You could certainly test whether it is a precision problem: expand the WHERE clause of your statement into a range, then tighten that range (by adding more precision) until you find your record or can confirm the issue is precision-related:
SELECT x(point), y(point)
WHERE x(point) > 3.69
AND x(point) < 3.70
The other thing I would look at is using some other form of key when filtering your data. Does your table have some sort of natural key you could use, or could you add an auto-incremented field to serve as a primary key?
I have also seen indexes behave badly when functions are involved. Are there any indexes on this table?
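On that last point, if the query filters on x(point) frequently, an expression index may be worth a try. This is only a sketch, assuming the table is named points and that x is IMMUTABLE (a requirement for expression indexes):
CREATE INDEX points_x_idx ON points ((x(point)));  -- note the double parentheses around the expression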

Related

What accounts for most of the integer multiply instructions?

The majority of integer multiplications don't actually need a multiply instruction:
Floating-point is, and has been since the 486, normally handled by dedicated hardware.
Multiplication by a constant, such as for scaling an array index by the size of the element, can be reduced to a left shift in the common case where the constant is a power of two, or to a sequence of left shifts and additions in the general case (see the sketch just after this list).
Multiplications associated with accessing a 2D array can often be strength-reduced to addition when they occur in the context of a loop.
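A quick illustration of that strength reduction, written in SQL only for convenience (a compiler would emit the equivalent machine instructions):
SELECT 13 * 8 AS multiply,       -- 104
       13 << 3 AS shift_left_3;  -- also 104: shifting left by 3 multiplies by 2^3 = 8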
So what's left?
Certain library functions like fwrite that take a number of elements and an element size as runtime parameters.
Exact decimal arithmetic e.g. Java's BigDecimal type.
Such forms of cryptography as require multiplication and are not handled by their own dedicated hardware.
Big integers e.g. for exploring number theory.
Other cases I'm not thinking of right now.
None of these jump out at me as wildly common, yet all modern CPU architectures include integer multiply instructions. (RISC-V omits them from the minimal version of the instruction set, but has been criticized for even going this far.)
Has anyone ever analyzed a representative sample of code, such as the SPEC benchmarks, to find out exactly what use case accounts for most of the actual uses of integer multiply (as measured by dynamic rather than static frequency)?

PostgreSQL: When casting an integer to a non-integer type to force floating point division in PostgreSQL, which number type should I use?

I know there are many integer division questions on StackOverflow, but I did not see this one.
Similar to many programming languages, PostgreSQL performs integer division if both operands are integers.
If one has:
SELECT s.id AS student_id,
COUNT(DISTINCT(bc.book_id)) / COUNT(c.id) AS average_books_per_class
FROM student s
LEFT JOIN class c
ON c.student_id = s.id
LEFT JOIN book_class bc
ON bc.class_id = c.id
GROUP BY s.id
Then to get what one intends, one must cast COUNT(DISTINCT(bc.book_id)) to a non-integer number type. If one does not do so, then Postgres, like many programming languages, performs integer division, which is unlikely to give the desired result.
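A minimal demonstration of the difference; the displayed results are what PostgreSQL typically prints:
SELECT 7 / 2;                    -- 3 (integer division truncates)
SELECT 7 / 2.0;                  -- 3.5000000000000000 (numeric division)
SELECT 7::double precision / 2;  -- 3.5 (floating point division)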
Postgres supports two syntaxes for doing this cast:
CAST( value AS type )
for example:
CAST( COUNT(DISTINCT(bc.book_id)) AS DOUBLE PRECISION )
It also supports the syntax:
value::type
for example:
COUNT(DISTINCT(bc.book_id))::decimal
Both syntaxes work. Personally I prefer the one using CAST because it is more explicit (I think explicit is good). Others may prefer value::type because it is expressive yet terse; shorter is often (up to a limit) better.
My question is about the number type to use.
In casting COUNT(DISTINCT(bc.book_id)) to a non-integer number type, Postgres gives the following types:
decimal
numeric
real
double precision
In my query I chose DOUBLE PRECISION.
I am wondering, specifically in the case of division, but also in any other context where one might need to cast an integer number type in PostgreSQL to a non-integer number type, which of the four choices is the best one and why?
decimal and numeric are synonyms, so there is one less choice.
This is the proper type if you either need very high precision (use numeric without any type modifiers) or if you want to round to a certain number of decimal positions (use numeric(20,2) for two decimal positions).
This data type is precise, but slow, because it is a “binary coded decimal” type.
real and double precision are 4-byte and 8-byte floating point numbers, fast but with rounding errors and limited precision.
Never use real; it has very low precision.
In most practical applications it shouldn't matter which of the two remaining types you use for that specific purpose.
Using the CAST syntax has the advantage that it complies with the standard.
Remark: DISTINCT(col) is syntactically correct, but misleading, because it is not a function. Write DISTINCT col instead.
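Finally, a quick way to see the tradeoffs among the candidate types discussed above; the exact display may vary slightly across PostgreSQL versions:
SELECT 1::real / 3::real;         -- 0.33333334 (4 bytes, low precision)
SELECT 1::double precision / 3;   -- 0.3333333333333333 (8 bytes)
SELECT 1::numeric / 3;            -- 0.33333333333333333333 (exact type, slower)
SELECT round(1::numeric / 3, 2);  -- 0.33 (numeric supports rounding to a given scale)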

Getting around floating point error with logarithms?

I'm trying to write a basic digit counter (an integer is inputted and the number of digits of that integer is outputted) for positive integers. This is my general formula:
dig(x) := Math.floor(Math.log(x,10))
I tried implementing the equivalent of dig(x) in Ruby and found that dig(1000) gave me 2 instead of 3, because Math.log returned 2.9999999999999996, which was then truncated down to 2. What is the proper way to handle this problem? (I'm assuming the problem can occur regardless of the implementation language; if that's not the case, please explain in your answer.)
To get an exact count of the number of digits in an integer, you can do the usual thing (in C/C++, assuming n is positive):
int digits = 0;
while (n > 0) {
    n = n / 10;  // integer division: drops the ones digit and shifts right
    digits = digits + 1;
}
I'm not certain but I suspect running a built-in logarithm function won't be faster than this, and this will give you an exact answer.
I thought about it for a minute and couldn't come up with a way to make the logarithm-based approach work with any guarantees, and almost convinced myself that it is probably a doomed pursuit in the first place because of floating point rounding errors, etc.
From The Art of Computer Programming, volume 2: we can eliminate one bit of error before the floor function is applied by adding that bit back in.
Let x be the result of the log, then do x += x / 0x10000000 for a single precision floating point number (C's float), and pass the adjusted value into floor.
This should also be fast (assuming you already have the answer in numerical form), because it uses only a few floating point instructions.
Floating point is always subject to roundoff error; that's one of the hazards you need to be aware of, and actively manage, when working with it. The proper way to handle it, if you must use floats, is to figure out the expected amount of accumulated error and allow for that in comparisons and printouts: round off appropriately, test whether a difference is within that range rather than comparing for equality, et cetera.
There is no exact binary-floating-point representation of simple things like 1/10th, for example.
(As others have noted, you could rewrite the problem to avoid the floating-point-based solution entirely, but since you asked specifically about making log() work, I wanted to address that question; apologies if I'm off target. Some of the other answers give specific suggestions for rounding off the result. That would "solve" this particular case, but as your floating-point operations get more complicated, you will have to keep allowing for roundoff accumulating at each step, dealing with the error either at each step or cumulatively; the latter is the more complicated but more accurate approach.)
If this is a serious problem for an application, folks sometimes use scaled fixed point instead (running financial computations in terms of pennies rather than dollars, for example). Or they use one of the "big number" packages which computes in decimal rather than in binary; those have their own round-off problems, but they round off more the way humans expect them to.
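For example, a minimal sketch of the scaled fixed point idea in SQL; the table and column names are invented for illustration:
CREATE TABLE payments (amount_cents bigint NOT NULL);  -- store cents, not fractional dollars
INSERT INTO payments VALUES (1999);                    -- $19.99, represented exactly
SELECT amount_cents / 100.0 AS dollars FROM payments;  -- 19.99 as numeric (exact; display may show trailing zeros)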

Selecting floating point numbers in decimal form

I've a small number in a PostgreSQL table:
test=# CREATE TABLE test (r real);
CREATE TABLE
test=# INSERT INTO test VALUES (0.00000000000000000000000000000000000000000009);
INSERT 0 1
When I run the following query it returns the number as 8.96831e-44:
test=# SELECT * FROM test;
r
-------------
8.96831e-44
(1 row)
How can I show the value in psql in its decimal form (0.00000000000000000000000000000000000000000009) instead of the scientific notation? I'd be happy with 0.0000000000000000000000000000000000000000000896831 too. Unfortunately I can't change the table and I don't really care about loss of precision.
(I've played with to_char for a while with no success.)
real in Postgres is a floating point datatype stored in 4 bytes, that is, 32 bits.
Your value,
0.00000000000000000000000000000000000000000009
cannot be represented exactly as a 32-bit IEEE 754 floating point number. You can check the exact value in this calculator.
You could try double precision (64 bits) to store it; according to the calculator, that seems to be an exact representation. NOT TRUE: Patricia showed that it was just the calculator rounding the value, even though it was explicitly asked not to... double would mean a bit more precision, but still no exact value, as this number is not representable using a finite number of binary digits. (Thanks, Patricia; a lesson learnt (again): don't believe what you see on the Intertubez.)
Under normal circumstances you should use the NUMERIC(precision, scale) format, which stores the number exactly and gives back the correct value.
However, the value you want to store seems to have a larger scale than Postgres allows for exact decimal representations (which seems to be 30). If you don't need to do calculations and just want to store the values (not a very common situation, I admit), you could try storing them as strings... (but this is ugly...)
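One thing that may be worth trying, since numeric output never uses scientific notation: casting the float column to numeric on output. This is only a sketch, and the behavior of float-to-numeric casts has varied across PostgreSQL versions, so verify it on yours:
SELECT r::numeric FROM test;  -- may print the plain decimal form instead of 8.96831e-44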
EDIT
This to_char problem seems to be a known bug...
Quote:
My immediate reaction to that is that float8 values don't have 57 digits
of precision. If you are expecting that format string to do something
useful you should be applying it to a numeric column not a double
precision one.
It's possible that we can kluge things to make this particular case work
like you are expecting, but there are always going to be similar-looking
cases that can't work because the precision just isn't there.
In a quick look at the code, the reason you just get "0." is that it's
rounding off after 15 digits to ensure it doesn't print garbage. Maybe
it could be a bit smarter for cases where the value is very much smaller
than 1, but it wouldn't be a simple change.
(from here)
However, I find this hard to defend. IMHO a double (an IEEE 754 64-bit floating point number, to be exact) always has ~15 significant decimal digits, as long as the value fits into the type...
Recommended reading:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Postgres numeric types
BUG #6217: to_char() gives incorrect output for very small float values

Matlab precision: simple subtraction is not zero

I compute this simple sum on Matlab:
2*0.04-0.5*0.4^2 = -1.387778780781446e-017
but the result is not zero. What can I do?
Aabaz and Jim Clay have good explanations of what's going on.
It's often the case that, rather than exactly calculating the value of 2*0.04 - 0.5*0.4^2, what you really want is to check whether 2*0.04 and 0.5*0.4^2 differ by an amount that is small enough to be within the relevant numerical precision. If that's the case, then rather than checking whether 2*0.04 - 0.5*0.4^2 == 0, you can check whether abs(2*0.04 - 0.5*0.4^2) < thresh. Here thresh can be either some arbitrary smallish number or an expression involving eps, which gives the precision of the numerical type you're working with.
EDIT:
Thanks to Jim and Tal for suggested improvement. Altered to compare the absolute value of the difference to a threshold, rather than the difference.
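The same threshold pattern carries over to other environments. Here is a sketch in SQL with double precision values; the 1e-12 threshold is arbitrary, and the exact residue may differ by platform:
SELECT 2*0.04::float8 - 0.5*(0.4::float8)^2 AS raw_difference,                  -- roughly -1.4e-17, not 0
       abs(2*0.04::float8 - 0.5*(0.4::float8)^2) < 1e-12 AS effectively_zero;   -- true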
Matlab uses double-precision floating-point numbers to store real numbers. These are numbers of the form m*2^e where m is an integer between 2^52 and 2^53 (the mantissa) and e is the exponent. Let's call a number a floating-point number if it is of this form.
All numbers used in calculations must be floating-point numbers. Often, this can be done exactly, as with 2 and 0.5 in your expression. But for other numbers, most notably most numbers with digits after the decimal point, this is not possible, and an approximation has to be used. What happens in this case is that the number is rounded to the nearest floating-point number.
So, whenever you write something like 0.04 in Matlab, you're really saying "get me the floating-point number that is closest to 0.04." In your expression, there are two numbers that need to be approximated: 0.04 and 0.4.
In addition, the exact result of operations like addition and multiplication on floating-point numbers may not itself be a floating-point number: although it is always of the form m*2^e, the mantissa may be too large. So you get an additional error from rounding the results of operations.
At the end of the day, a simple expression like yours will be off by about 2^-52 times the size of the operands, or about 10^-17.
In summary: the reason your expression does not evaluate to zero is two-fold:
Some of the numbers you start out with are different (approximations) to the exact numbers you provided.
The intermediate results may also be approximations of the exact results.
What you are seeing is quantization error. Matlab uses doubles to represent numbers, and while they are capable of a lot of precision, they still cannot represent all real numbers because there are an infinite number of real numbers. I'm not sure about Aabaz's trick, but in general I would say there isn't anything you can do, other than perhaps massaging your inputs to be double-friendly numbers.
I do not know if it is applicable to your problem but often the simplest solution is to scale your data.
For example:
a = 0.04;
b = 0.2;
a - 0.2*b               % prints ans = -6.9389e-018
c = a/min(abs([a b]));
d = b/min(abs([a b]));
c - 0.2*d               % prints ans = 0
EDIT: Of course I did not mean to give a universal solution to these kinds of problems, but it is still a good practice that can help you avoid a few problems in numerical computation (curve fitting, etc.). See Jim Clay's answer for the reason why you are experiencing these problems.
I'm pretty sure this is a case of ye olde floating point accuracy issues.
Do you need 1e-17 accuracy? Is this merely a case of wanting 'pretty' output?
In that case, you can just use a formatted sprintf to display the number of significant digits you want.
Realize that this is not a Matlab problem but a fundamental limitation of how numbers are represented in binary.
For fun, work out what .1 is in binary...
Some references:
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
http://www.mathworks.com/support/tech-notes/1100/1108.html