MS SQL Float Decimal Comparison Problems - tsql

I'm in the process of normalising a database, and part of this involves converting a column from one table from a FLOAT to a DECIMAL(28,18). When I then try to join this converted column back to the source column, it returns no results in some cases.
It seems to bee something to do with the ways its converted. For example, the FLOAT converted to a DECIMAL(28,18) produces:
51.051643260000006000
The original FLOAT is
51.05164326
I have tried various ways of modifying the FLOAT, and none of these work either:
CAST(51.05164326 AS DECIMAL(28,18)) = 51.051643260000000000
STR(51.05164326 , 28,18) = 51.0516432599999990
The reason for the conversion is due to improving the accuracy of these fields.
Has anyone got a consistent strategy to convert these numbers, and be able to ensure subsequent joins work?
Thanks in advance
CM

For your application, you need to consider how many decimal places you need. It looks like in reality you require about 8-14 decimal places not 18.
One way to do the conversion is cast(cast(floatColumn as decimal(28,14)) as decimal(28,18)).
To do a join between a decimal and float column, you can do something like this:
ON cast(cast(floatColumn as decimal(28,14)) as decimal(28,18)) = decimalColumn
Provided the double-cast is the same double-cast used to create the decimalColumn, this will allow you to make use of an index on the decimalColumn.
Alternatively you can use a range join:
ON floatColumn > decimalColumn - #epsilon AND floatColumn < decimalColumn + #epsilon
This should still make use of the index on decimalColumn.
However, it is unusual to join on decimals. Unless you actually need to join on them or need to do a direct equality comparision (as opposed to a range comparison), it may be better to simply do the conversion as you are, and document the fact that there is a small loss of accuracy due to the initial choice of an inappropriate data type.
For more information see:
Is it correct to compare two rounded floating point numbers using the == operator?
Dealing with accuracy problems in floating-point numbers

Related

In PostgreSQL, is it possible to have a default format for real columns?

In PostgreSQL, I have a column with people's height in meters. If the height is, say 1.75 m, it shows properly, but if the height is 1.70 m, it shows as 1.7. I would like to have this already formatted to two decimal places, showing as 1.70 without formatting in each and every SQL call. Can I specify this in the table creation? Or a stored procedure, or something? I've seen a few things about timestamps, but not for real fields. Knowing how to format the decimal point as a colon (1,70) would be a plus.
Basically, presentation and "cosmetics" are the job of the application, not the database.
Having a default number of decimal places for floats would also create a problem, because the data returned by the database would not be the actual data in the column. So if you did a SELECT and it returned a value of 1.75, then if you searched for this value, you might not find it because the actual value stored was not 1.75 but 1.7499999999 and it was only rounded for display.
Potential solutions:
If you want to store a specified number of digits, use NUMERIC. This will solve the 1.7499999999 problem above. If you use NUMERIC, when doing a SELECT you get the actual contents of the column.
In your app, if you use an ORM, use a Decimal (or similar) type for the column with the appropriate settings so it displays the way you want.
Or create a view with the format applied to the column, but in this case if you want the trailing zero, the type will be text and not float, and it will not be searchable unless you create an extra index on it.
Generated column with the number formatted as you want, maybe easier than a view

AWS Athena: Handling big numbers

I have files on S3 where two columns contain only positive integers which can be of 10^26. Unfortunately, according to AWS Docs Athena only supports values in a range up to 2^63-1 (approx 10^19). So at the moment these column are represented as a string.
When it comes to filtering it is not that big of an issue, as I can use regex. For example, if I want to get all records between 5e^21 and 6e^21 my query would look like:
SELECT *
FROM database.table
WHERE (regexp_like(col_1, '^5[\d]{21}$'))
I have approx 300M rows (approx 12GB in parquet) and it takes about 7 seconds, so performance wise it ok.
However, sometimes I would like to perform some math operation on these two big columns, e.g subtract one big column from another. Casting these records to DOUBLE wouldn't work due to approximation error. Ideally, I would want to stay within Athena. At the moment, I have about 100M rows that are greater then 2^63-1, but this number can grow in a future.
What would be the right way to approach problem of having numerical records that exceed available range? Also what are your thoughts on using regex for filtering? Is there a better/more appropriate way to do it?
You can cast numbers of the form 5e21 to an approximate 64bit double or an exact numeric 128bit decimal. First you'll need to remove the caret ^, with the replace function. Then a simple cast will work:
SELECT CAST(replace('5e^21', '^', '') as DOUBLE);
_col0
--------
5.0E21
or
SELECT CAST(replace('5e^21', '^', '') as DECIMAL);
_col0
------------------------
5000000000000000000000
If you are going to this table often, I would rewrite it the new data type to save processing time.

Rounding to a variable number of decimal places in SSRS

I am trying to find a way to round a field in SSRS to a dynamic number of decimal places. I know I can format it dynamically, and it may eventually come to that, but many of my users are going to take this report directly to Excel and are going to want to have actual numeric fields.
My t-SQL code includes these declared variables:
NumLong01 DECIMAL(23,8)
, NumLongDP01 INTEGER
The first set of entries in this table is for headers and rounding parameters. So I add values for these two as:
NULL
,4
and then I add the actual table values as:
543210987654321.87654321
,NULL
That way I can put a whole series of numbers into the table but they all have to be formatted the same way.
Running this query yields:
When I go to ReportBuilder, my field has this expression:
=Fields!NumLong01.Value
If I want to format a certain number of decimal places, I can just do this:
=Round(Fields!NumLong01.Value,2) or some such. What I tried to do, though, was to make it dynamic:
=Round(Fields!NumLong01.Value,First(Fields!NumLongDP01.Value, "DataSet1"))
This ended up rounding to 0 decimal places. I subsequently learned--by just using the second half in my field--that this was a NULL value. So I tried Sum instead of First--again, just in my field--and got the 4 that I expected. Great, so now I had my number, and I just put that in as my rounding:
=Round(Fields!NumLong01.Value,Sum(Fields!NumLongDP01.Value, "DataSet1"))
Only problem is, this yields an error. Next I asked myself if maybe it wasn't seeing this as a number for some reason. So i just added it onto my field. No problems. So I really don't know what it's doing. Is it thinking that this field might become so long that it will round to an illegal number of decimals places?
Now, I can do this:
=IIf(Sum(Fields!NumLongDP01.Value, "DataSet1") = 8,Round(Fields!NumLong01.Value,8),IIf(Sum(Fields!NumLongDP01.Value, "DataSet1") = 7,Round(Fields!NumLong01.Value,7),IIf(Sum(Fields!NumLongDP01.Value, "DataSet1") = 6,Round(Fields!NumLong01.Value,6),IIf(Sum(Fields!NumLongDP01.Value, "DataSet1") = 5,Round(Fields!NumLong01.Value,5),IIf(Sum(Fields!NumLongDP01.Value, "DataSet1") = 4,Round(Fields!NumLong01.Value,4),IIf(Sum(Fields!NumLongDP01.Value, "DataSet1") = 3,Round(Fields!NumLong01.Value,3),IIf(Sum(Fields!NumLongDP01.Value, "DataSet1") = 2,Round(Fields!NumLong01.Value,2),IIf(Sum(Fields!NumLongDP01.Value, "DataSet1") = 1,Round(Fields!NumLong01.Value,1),Round(Fields!NumLong01.Value,0)))))))))
...and that works. But it seems like such a ridiculous way to go about it.
I'm also comfortable passing only rounded numbers out of t-SQL. But then I run into the problem of showing only a certain number of decimals on the report, because in the number formatting it doesn't allow for a dynamic number of decimal places for some reason.
Any ideas would be appreciated.
This isn't an exhaustive list of ways to accomplish dynamic rounding or number formatting as you can achieve this using custom code in the report or by adapting your dataset's SQL query.
Using Rounding:
The first set of entries in this table is for headers and rounding parameters. That way I can put a whole series of numbers into the table but they all have to be formatted the same way.
To avoid building expressions in your report that require aggregate functions such First and Sum and generating a blank row that you then have to remove, consider just entering the number of decimal places for every row instead of using a header row. The costs (storage and expression evaluation) are low even if it seems redundant.
This means that instead of using: =Round(Fields!NumLong01.Value,First(Fields!NumLongDP01.Value, "DataSet1")) you can use =Round(Fields!NumLong01.Value,Fields!NumLongDP01.Value) either as an expression or as a calculated field in DataSet1 or whatever your dataset is called.
Using Number Formatting:
But then I run into the problem of showing only a certain number of decimals on the report, because in the number formatting it doesn't allow for a dynamic number of decimal places for some reason.
You can define custom formatting for the NumLong01 field in the report and make it dynamic using an expression to build your custom formatting string.
Open the Text Box Properties for the NumLong01 textbox or tablix field
Open Number tab and select Custom from the Category list
Click the fx button and use the following expression ="0." + StrDup(First(Fields!NumLongDP01.Value, "DataSet1"), "0")
Using your example data, this expression would produce the custom formatting string 0.0000 which changes 543210987654321.87654321 to 543210987654321.8765. For your information, StrDup duplicates the specified string X number of times.
In cases where the fractional part of the number is less than the decimal precision required, this formatting string will pad it with 0s. If that's not desired, change the string to be duplicated to "#" like so: StrDup(First(Fields!NumLongDP01.Value, "DataSet1"), "#").
You can also use this method as a calculated field in the dataset but only if you have removed the header row and are entering the decimal places for every row as mentioned earlier. This is because you can't use the aggregate function in the calculated field expression.
To do this, add a calculated field to your dataset with the expression: =Format(Fields!NumLong01.Value, "0." + StrDup(Fields!NumLongDP01.Value, "0"))

rethinkdb: group documents by price range

I want to group documents in rethinkdb by price range (0-100, 100-200, 200-300, and so on), instead of a single price value. How do I do that?
Unfortunately, ReQL doesn't support rounding at the moment (see github issue #866), but you can get something similar through some minor annoyances.
First of all, I would recommend making this an index on the given table if you're going to be running this regularly or on large data sets. The function I have here is not the most efficient because we can't round numbers, and an index would help mitigate that a lot.
These code samples are in Python, since I didn't see any particular language referenced. To create the index, run something like:
r.db('foo').table('bar').index_create('price_range',
lambda row: row['price'].coerce_to('STRING').split('.')[0].coerce_to('NUMBER')
.do(lambda x: x.sub(x.mod(100)))).run()
This will create a secondary index based on the price where 0 indicates [0-100), 100 is [100-200), and so on. At this point, a group-by is trivial:
r.db('foo').table('bar').group(index='price_range').run()
If you would really rather not create an index, the mapping can be done during the group in a single query:
r.db('foo').table('bar').group(
lambda row: row['price'].coerce_to('STRING').split('.')[0].coerce_to('NUMBER')
.do(lambda x: x.sub(x.mod(100)))).run()
This query is fairly straight-forward, but to document what is going on:
coerce_to('STRING') - we obtain a string representation of the number, e.g. 318.12 becomes "318.12".
split('.') - we split the string on the decimal point, e.g. "318.12". becomes ["318", "12"]. If there is no decimal point, everything else should still work.
[0] - we take the first value of the split string, which is equivalent the original number rounded down. e.g. "318".
coerce_to('NUMBER') - we convert the string back into an integer, which allows us to do modulo arithmetic on it so we can round, e.g. "318" becomes 318.
.do(lambda x: x.sub(x.mod(100))) - we round the resulting integer down to the nearest 100 by running (essentially) x = x - (x % 100), e.g. 318 becomes 300.

Converting an int representing number of cents to money

Like this question, except T-SQL instead of php.
206275947 = 2062759.47
etc.
The problem I'm running into is that an attempt to SUM the values in this column is overflowing the integer datatype in SQL.
SUM(CONVERT(money,[PaymentInCentsAmt]))
Is just tacking on ".00" to the end of the value. What obvious thing am I missing?
how about use money/100?
If you are counting money and especially if youare getting overflows you should try making variables and columns as type decimal that allows as much significance as the calcualations need