I was doing some tests on Postgres using the tinyint extension when I came across something surprising regarding its range. Typing select -128::tinyint gave me an ERROR: tinyint out of range message, which was not what I was expecting at all.
Assuming the negative limit should be one greater in magnitude than the positive maximum (127 for single-byte integers), I thought it was a bug in the extension; however, trying the same thing with the built-in integer types, I found exactly the same behaviour.
select -32768::smallint -> out of range
select -2147483648::integer -> out of range
select -9223372036854775808::bigint -> out of range
Referring to the numeric data type documentation (https://www.postgresql.org/docs/current/datatype-numeric.html), these values should all be possible. The negative numbers one closer to zero (-32767, -2147483647, -9223372036854775807) all work correctly, so I am curious why this is happening, and whether it happens on other people's installations too.
I tried both PostgreSQL 10 and PostgreSQL 11 on an Ubuntu 16.x desktop.
I think this is because the cast operator :: has a higher precedence than the minus sign.
So -32768::smallint is executed as -1 * 32768::smallint which indeed is invalid.
Using parentheses fixes this: (-32768)::smallint, or use the SQL-standard cast() syntax: cast(-32768 as smallint).
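To see the precedence at work on the built-in types (plain PostgreSQL, no extension needed):
select -32768::smallint;          -- ERROR: parsed as -(32768::smallint), and 32768 overflows smallint
select (-32768)::smallint;        -- works: the cast now sees the whole negative literal
select cast(-32768 as smallint);  -- works: SQL-standard cast syntax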
I'm porting a procedure from Oracle to Postgres.
In the SELECT list of a query, I have TO_CHAR(v_numeric, '990.000').
It seems the same TO_CHAR(v_numeric, '990.000') works in Postgres with the same result.
Can someone please explain what the '990.000' in the query does?
TO_CHAR(123.4, '990.000') returns 123.400 in both Oracle and Postgres, whereas TO_CHAR(1234.400, '990.000') returns ######## in Oracle and ###.### in Postgres. Do the ######## and ###.### outputs still hold the numeric value that was passed in?
to_char is a function to format a number as a string for output. The PostgreSQL function exists expressly for Oracle compatibility, but it is not totally compatible, as you can see.
The format 990.000 means that there will be one to three digits before the decimal point and three digits after it. A 9 stands for a digit position that is left blank if it would only hold a leading zero, while a 0 stands for a position that always prints a digit, even if it is zero.
The # characters signify that the number cannot be represented in that format. The reason is that there are more than three digits before the decimal point.
The resulting string does not "hold" a number; it is the rendering of a number as a string, and it contains nothing but the characters it consists of.
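A few concrete calls make the 9-versus-0 distinction visible (leading sign and padding spaces are omitted from the comments; the overflow output is the one reported in the question):
select to_char(123.4, '990.000');   -- 123.400  the unused leading 9 stays blank
select to_char(0.5, '990.000');     -- 0.500    both 9 positions blank, the 0 position still prints a digit
select to_char(1234.4, '990.000');  -- ###.###  four integer digits cannot fit the format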
I have files on S3 where two columns contain only positive integers that can be as large as 10^26. Unfortunately, according to the AWS docs, Athena only supports values in a range up to 2^63-1 (approx 10^19). So at the moment these columns are represented as strings.
When it comes to filtering it is not that big of an issue, as I can use regex. For example, if I want to get all records between 5e^21 and 6e^21, my query would look like:
SELECT *
FROM database.table
WHERE (regexp_like(col_1, '^5[\d]{21}$'))
I have approx 300M rows (approx 12GB in Parquet) and it takes about 7 seconds, so performance-wise it is OK.
However, sometimes I would like to perform some math operations on these two big columns, e.g. subtract one big column from the other. Casting these records to DOUBLE wouldn't work due to approximation error. Ideally, I would want to stay within Athena. At the moment I have about 100M rows that are greater than 2^63-1, but this number can grow in the future.
What would be the right way to approach the problem of having numerical records that exceed the available range? Also, what are your thoughts on using regex for filtering? Is there a better or more appropriate way to do it?
You can cast numbers of the form 5e^21 to an approximate 64-bit double or an exact 128-bit decimal. First you'll need to remove the caret ^ with the replace function. Then a simple cast will work:
SELECT CAST(replace('5e^21', '^', '') as DOUBLE);
_col0
--------
5.0E21
or
SELECT CAST(replace('5e^21', '^', '') as DECIMAL);
_col0
------------------------
5000000000000000000000
If you are going to query this table often, I would rewrite it with the new data type to save processing time.
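For example, the rewrite could be done with a CTAS statement. This is only a sketch: database.table, col_1 and col_2 are the names from the question, DECIMAL(38,0) is Athena's widest exact type and comfortably holds 10^26, and the replace is only needed if the stored strings actually contain a caret.
CREATE TABLE database.table_decimal
WITH (format = 'PARQUET') AS
SELECT CAST(replace(col_1, '^', '') AS DECIMAL(38,0)) AS col_1,
       CAST(replace(col_2, '^', '') AS DECIMAL(38,0)) AS col_2
       -- carry over any remaining columns unchanged
FROM database.table;
After that, arithmetic such as col_1 - col_2 stays exact within DECIMAL(38,0), and range filters can be plain comparisons instead of regexes.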
I looked through the forum but I couldn't find an issue like mine.
Essentially I have a table called [p005_MMAT].[dbo].[Storage_Max]. It has three columns: Date, HistValue and Tag_ID. I want all the values in the HistValue column to have 2 decimal places. For example, if a number is 1.1, I want it to be 1.10, and if it is 1, I want it to look like 1.00.
Here is the SQL update statement I am using:
update [p005_MMAT].[dbo].[Storage_Max]
set [HistValue] = cast([HistValue] as decimal (10,2))
where [Tag_ID] = 94
After executing the query it says 3339 rows affected, but when I perform a simple select statement it appears the column was not affected at all. I have used that cast function in a select statement and there it does add the two decimal places.
Please advise.
The problem is the datatype and SQL Server. Float and real do not keep trailing zeros; they store approximate values with no notion of a fixed number of decimal places, so the cast in your update is converted straight back to float when the value is stored and nothing visible changes. You either have to change the datatype of the column, or just live with it and handle the formatting in your queries or application.
You could run something like the following:
select
    cast([HistValue] as decimal(10,2))
from [p005_MMAT].[dbo].[Storage_Max]
where [Tag_ID] = 94
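If changing the column's datatype is acceptable, a minimal sketch of that route (check first that decimal(10,2) covers the range of HistValue, since the change applies to every row, not just Tag_ID 94):
alter table [p005_MMAT].[dbo].[Storage_Max]
alter column [HistValue] decimal(10,2);
Otherwise keep the column as float/real and format only when reading, as in the select above.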
I want to group documents in rethinkdb by price range (0-100, 100-200, 200-300, and so on), instead of a single price value. How do I do that?
Unfortunately, ReQL doesn't support rounding at the moment (see GitHub issue #866), but you can get the same effect with a small workaround.
First of all, I would recommend making this an index on the given table if you're going to be running this regularly or on large data sets. The function I have here is not the most efficient because we can't round numbers, and an index would help mitigate that a lot.
These code samples are in Python, since I didn't see any particular language referenced. To create the index, run something like:
r.db('foo').table('bar').index_create('price_range',
    lambda row: row['price'].coerce_to('STRING').split('.')[0].coerce_to('NUMBER')
                   .do(lambda x: x.sub(x.mod(100)))).run()
This will create a secondary index based on the price where 0 indicates [0-100), 100 is [100-200), and so on. At this point, a group-by is trivial:
r.db('foo').table('bar').group(index='price_range').run()
If you would really rather not create an index, the mapping can be done during the group in a single query:
r.db('foo').table('bar').group(
    lambda row: row['price'].coerce_to('STRING').split('.')[0].coerce_to('NUMBER')
                   .do(lambda x: x.sub(x.mod(100)))).run()
This query is fairly straightforward, but to document what is going on:
coerce_to('STRING') - we obtain a string representation of the number, e.g. 318.12 becomes "318.12".
split('.') - we split the string on the decimal point, e.g. "318.12" becomes ["318", "12"]. If there is no decimal point, everything else still works.
[0] - we take the first part of the split string, which is equivalent to the original number rounded down, e.g. "318".
coerce_to('NUMBER') - we convert the string back into an integer, which allows us to do modulo arithmetic on it so we can round, e.g. "318" becomes 318.
.do(lambda x: x.sub(x.mod(100))) - we round the resulting integer down to the nearest 100 by running (essentially) x = x - (x % 100), e.g. 318 becomes 300.
I am new to PostgreSQL (Redshift).
I am copying CSV files from S3 to Redshift and there's an error about trying to save the number 2.35555E7 into a numeric(18,0) column. What is the right datatype for this datum?
Thanks
numeric (18,0) implies a scale of zero, which is a way of saying no decimals -- it's a bit like a smaller bigint.
http://www.postgresql.org/docs/current/static/datatype-numeric.html
If you want to keep it as numeric, use plain numeric instead -- with no precision or scale.
If not, just use a real or a double precision type, depending on the number of significant digits (6 vs 15, respectively) you want to keep around.
Your example data (2.35555E7) suggests you're using real, so probably try that one first.
Note: select 2.35555E7::numeric(18,0) works fine per the comments, but I assume there's some other data in your set that is causing issues.
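For a quick side-by-side of the types discussed above (these casts run on plain PostgreSQL and should behave the same on Redshift):
select 2.35555E7::numeric(18,0);     -- 23555500, exact with zero decimals
select 2.35555E7::real;              -- single precision, roughly 6 significant digits
select 2.35555E7::double precision;  -- double precision, roughly 15 significant digits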