Why does round sometimes produce extra digits? [duplicate] - tsql

Occasionally, when selecting aggregate data (usually AVG()) from a database, I'll get a repeating decimal such as:
2.7777777777777777
When I apply ROUND(AVG(x), 2) to the value, I sometimes get a result like:
2.7800000000000002
I happen to know that the actual sample has 18 rows with SUM(x) = 50. So this command should be equivalent:
SELECT ROUND(50.0/18, 2)
However, it produces the expected result (2.78). Why does rounding sometimes produce wrong results with aggregate functions?
In order to verify the above result, I wrote a query against a dummy table:
create table #temp (
x float
)
insert into #temp values (1.0);
insert into #temp values (2.0);
insert into #temp values (1.0);
insert into #temp values (1.0);
insert into #temp values (1.0);
insert into #temp values (1.0);
insert into #temp values (1.0);
insert into #temp values (8.0);
insert into #temp values (9.0);
select round(avg(x), 2),
sum(x),count(*)
from #temp
I'm aware of the gotchas of floating point representation, but this simple case seems not to be subject to those.

Decimal numbers don't always (or even usually) map 1:1 to an exact representation in floating point.
In your case, the two closest numbers that double precision floating point can represent are:
40063D70A3D70A3D which is approximately 2.77999999999999980460074766597
40063D70A3D70A3E which is approximately 2.78000000000000024868995751604
There is no double precision number between those two. In this case the database returned the higher one, which, as you can see, displays as 2.7800000000000002.
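The two bit patterns above can be checked outside the database. The following is a minimal Python sketch, not what SQL Server does internally; the helper name double_from_hex is made up for illustration. It decodes each hex string as an IEEE 754 double and prints both the short form and the exact decimal expansion.

```python
import struct
from decimal import Decimal


def double_from_hex(bits):
    """Interpret a 16-digit hex string as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(bits))[0]


lo = double_from_hex("40063D70A3D70A3D")
hi = double_from_hex("40063D70A3D70A3E")

# repr() gives the shortest string that round-trips; Decimal() gives the exact value.
print(repr(lo), Decimal(lo))
print(repr(hi), Decimal(hi))

# The lower neighbour is the double that the literal 2.78 itself maps to.
print(lo == 2.78)
```

The higher neighbour is the one that needs 17 significant digits to be written unambiguously, which is where the trailing 2 comes from.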

Your table uses floats. When given floats as parameters, both AVG() and ROUND() return floats, and most decimal fractions cannot be represented exactly as floats. When you run ROUND(50.0/18, 2), the argument is a NUMERIC(8,6), and the result comes back as DECIMAL(8,6), which is an exact decimal type.
Try declaring your columns to use a more precise type like DECIMAL(18,9). The results should be more predictable.
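The difference between binary floats and an exact decimal type can be illustrated outside the database. This is a rough Python analogy using the standard decimal module, not a model of SQL Server's DECIMAL arithmetic; the variable names are arbitrary.

```python
from decimal import Decimal

total = Decimal("50.0")   # the SUM(x) from the question
count = Decimal(18)       # the row count from the question

average = total / count                       # decimal division, 28 significant digits by default
rounded = average.quantize(Decimal("0.01"))   # round to two decimal places

print(average)
print(rounded)
```

Because the rounded value is stored as decimal digits rather than as a binary fraction, it has no stray trailing digits.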

Related

Esper: Take the last value from each id and take the mean of all but the most extreme

I have 5 temperature sensors. I want to calculate the mean temperature of 4 - excluding the most extreme value (high or low).
Firstly: will std:unique(id) create a window of the last temperature readings for each id 1-5?
select
avg(tempEvent.temp) as meantemp
from
Event(id in (1, 2, 3, 4, 5)).std:unique(id) as tempEvent
Secondly: how could I change the select statement (possibly using an expression if necessary) to only calculate the mean of four values excluding the most extreme?
The background is, I want to know the deviations of each temperature from the average, but I don't want the average to include an anomalous id. Otherwise all temperatures will look like they are deviating from the average but really only one is.
Finding the average of the middle four values is simple enough, though not as elegant as your solution. The code below will work for any number of temps.
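SELECT
AVG(temp) AS meantemp
FROM (
SELECT
temp,
COUNT(*) OVER () AS c, -- total number of readings
ROW_NUMBER() OVER (ORDER BY temp) AS r -- position in ascending order
FROM
[table]
) AS ranked
WHERE
r > 1 -- drop the lowest reading
AND r < c -- drop the highest reading
;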
As for your second question, I'm not sure I understand. Do you want the value from among the four middle values that has the greatest absolute deviation from the mean of those four values?
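One possible reading of the original question is that only the single reading farthest from the rest should be dropped before averaging. A minimal Python sketch of that reading follows. It is not Esper EPL, and the function name, the choice of the median as the reference point, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean, median


def mean_without_outlier(temps):
    """Average all readings except the one farthest from the median."""
    if len(temps) < 2:
        raise ValueError("need at least two readings")
    mid = median(temps)
    # Index of the reading with the largest absolute deviation from the median.
    worst = max(range(len(temps)), key=lambda i: abs(temps[i] - mid))
    kept = temps[:worst] + temps[worst + 1:]
    return mean(kept)


readings = [20.0, 21.0, 22.0, 21.0, 35.0]  # hypothetical values, one per sensor id
print(mean_without_outlier(readings))
```

Using the median rather than the mean as the reference keeps a single anomalous sensor from shifting the point that deviations are measured against.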

Dividing AVG of column1 by AVG of column2

I am trying to divide the average value of column1 by the average value of column2, which will give me an average price from my data. I believe there is a problem with the syntax or structure of my code, or I am making a rookie mistake.
I have searched stack and cannot find many examples of dividing two averaged columns, and checked the postgres documentation.
The individual average query is working fine (as shown here)
SELECT (AVG(CAST("Column1" AS numeric(4,2))),2) FROM table1
But when I combine two of them in an attempt to divide, it simply does not work.
SELECT (AVG(CAST("Column1" AS numeric(4,2))),2) / (AVG(CAST("Column2" AS numeric(4,2))),2) FROM table1
I am receiving the following error: "ERROR: row comparison operator must yield type boolean, not type numeric". I have tried a few other variations, which have mostly given me syntax errors.
I don't know what you are trying to do with your current approach. However, if you want to take the ratio of two averages, you could also just take the ratio of the sums:
SELECT SUM(CAST("Column1" AS numeric(4,2))) / SUM(CAST("Column2" AS numeric(4,2)))
FROM table1;
Note that SUM() just takes a single input, not two inputs. The reason we can use the sums is that each average divides its sum by the same count, which is the number of rows in table1 (assuming neither column contains NULLs). Hence, this factor just cancels out.
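The cancellation can be checked with exact arithmetic. The sketch below uses Python's fractions module and made-up column values; it only illustrates the algebra and assumes both columns have a value in every row.

```python
from fractions import Fraction

# Hypothetical data standing in for "Column1" and "Column2".
col1 = [Fraction("10.50"), Fraction("20.25"), Fraction("30.00")]
col2 = [Fraction("1.50"), Fraction("2.25"), Fraction("3.00")]

n = len(col1)  # both columns have the same number of rows

ratio_of_avgs = (sum(col1) / n) / (sum(col2) / n)
ratio_of_sums = sum(col1) / sum(col2)

print(ratio_of_avgs == ratio_of_sums)
```

If some rows held a NULL in only one of the columns, the two averages would be taken over different counts and the two ratios could differ.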

Postgres: Get percentile of number not necessarily in table column

Imagine I have a column my_variable of floats in a table my_table. I know how to convert each of the rows in this my_variable column into percentiles, but my question is: I have a number x that is not necessarily in the table. Let's call it 7.67. How do I efficiently compute where 7.67 falls in that distribution of my_variable? I would like to be able to say "7.67 is in the 16.7th percentile" or "7.67 is larger than 16.7% of rows in my_variable." Note that 7.67 is not something taken from the column, but I'm inputting it in the SQL query itself.
I was thinking about ordering my_variable in ascending order and counting the number of rows that fall below the number I specify and dividing by the total number of rows, but is there a more computationally efficient way of doing this, perhaps?
If your data does not change too often, you can use a materialized view or a separate table, call it percentiles, in which you store 100 or 1,000 rows (depending on the precision you need). This table should have a descending index on the value column.
Each row contains the minimum value needed to reach a certain percentile and the percentile itself.
Then you just need the first row, in descending order of value, whose value does not exceed the given number, and read its percentile.
In your example the table would contain 1,000 rows, and you could have something like:
Percentile  value
16.9        7.71
16.8        7.69
16.7        7.66
16.6        7.65
16.5        7.62
And your query could be something like:
SELECT percentile FROM percentiles WHERE value <= 7.67 ORDER BY value DESC LIMIT 1
This is a valid solution if the number of SELECTs you run is much larger than the number of updates to my_table.
I ended up doing:
select (avg(dummy_var::float))
from (
select case when var_name < 3.14 then 1 else 0 end as dummy_var
from table_name where var_name is not null
) as t
Where var_name was the variable of interest, table_name was the table of interest, and 3.14 was the number of interest.
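The same "fraction of rows below x" idea can be sketched outside the database. The Python example below assumes the column has already been loaded and sorted once; the function name and the sample values are hypothetical. A binary search then answers each lookup without scanning every row.

```python
from bisect import bisect_left


def percentile_of(sorted_values, x):
    """Percentage of values strictly below x; sorted_values must be ascending."""
    if not sorted_values:
        raise ValueError("empty sample")
    return 100.0 * bisect_left(sorted_values, x) / len(sorted_values)


values = sorted([7.2, 7.9, 8.4, 9.1, 6.5, 10.0])  # hypothetical column contents
print(percentile_of(values, 7.67))
```

Inside PostgreSQL, an index on the column plays a similar role to the sorted list here, although counting the rows below x still has to visit those index entries.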

Permutations with ordering and restrictions

I have some symbols and I need to arrange them and determine the number of ways in which they can be arranged:
(x11,x12,x13); (x21,x22,x23); (y11,y12,y13); (y21, y22, y23); (y31,y32,y33); (z11, z12, z13)
Note that while calculating the permutations, the order within the round brackets must be maintained. For instance, one possible order could be x11, x21, x12, x22, x13, x23 ... Note that here x13 occurs after x12 which occurs after x11.
How do we determine the total number of permutations with this restriction?
One way would be to create 2 tables and do a cartesian join:
create table letters (letter char(1));
create table numbers (num numeric);
insert into letters values ('A');
insert into letters values ('B');
insert into letters values ('C');
insert into numbers values (1);
insert into numbers values (2);
insert into numbers values (3);
select letter, num
from letters, numbers;
With more code, you can do this without the "hard tables" and instead hardcode the data within an array, then have postgres use it the same as a table-- just depends on how much data we're talking about.
Then again, I could be completely misunderstanding the problem you're trying to solve as the number of permutations in my example is "number of rows in letters table" times "Number of rows in numbers table"
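If the restriction is instead read as "all 18 symbols are arranged in one sequence, and the three symbols of each bracketed group must keep their relative order", the count is a multinomial coefficient: 18! divided by 3! once for each of the six groups. A short Python sketch under that assumed reading follows; the function name is made up.

```python
from math import factorial


def interleavings(group_sizes):
    """Count the ways to merge ordered groups while keeping each group's internal order."""
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        # Each group's internal order is fixed, so divide out its size! orderings.
        total //= factorial(size)
    return total


print(interleavings([3, 3, 3, 3, 3, 3]))  # six groups of three symbols
```

For six groups of three this gives 137225088000 arrangements.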

Getting float values out of PostgreSQL

I am having trouble retrieving float/real values out of PostgreSQL.
For example, I would like to store 123456789123456, then retrieve that exact same number with a select statement.
create table tbl (num real);
insert into tbl(num) values('123456789123456');
As it is now, if I "select num from tbl" the result is "1.23457e+14"
If I run "select CAST(num AS numeric) as num from tbl" the result is 123457000000000
If I run "select CAST(num AS float) as num from tbl" the result is 123456788103168 (where did this number come from?)
How on earth can I select the value and get "123456789123456" as the result?
Thanks so much in advance
You declared the table with the column having a type of "real", which is a fairly low-precision floating-point number.
You probably want to use the type "double precision" (aka "float" or "float8") for a reasonable degree of floating-point accuracy. If you know the magnitude and precision of numbers you need to store, you may be better off declaring the column type as numeric(PREC,SCALE) instead - PREC being the total number of digits to keep, and SCALE the number of digits that will be to the right of the decimal point.
The real type has only 6 decimal digits of precision, so it can't store your number exactly. You may need to use "double precision" or "numeric/decimal" type.
Source: http://www.postgresql.org/docs/8.4/static/datatype-numeric.html .
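The number 123456788103168 from the question can be reproduced outside PostgreSQL. This Python sketch assumes that "real" is a 4-byte IEEE 754 float and "double precision" an 8-byte one, as on common platforms. It round-trips the value through each width using the struct module.

```python
import struct

original = 123456789123456

# Round-trip through a 4-byte float, the assumed storage format of "real".
as_real = int(struct.unpack("f", struct.pack("f", float(original)))[0])

# Round-trip through an 8-byte float, the assumed format of "double precision".
as_double = int(struct.unpack("d", struct.pack("d", float(original)))[0])

print(as_real)    # nearest value a 4-byte float can hold
print(as_double)  # unchanged, since the value is below 2**53
```

The 4-byte result is the nearest representable neighbour of the original, which is what the cast to float displays in full.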