Unsigned field in Amazon Redshift?

I was looking for a way to create a table with an unsigned integer column (I know I will only have positive integers, so why not double the range?). To create an integer field, I do:
create table funny_table(
my_field bigint
);
So I thought that using my_field bigint unsigned would solve my problem, but a syntax error tells me otherwise. Looking through the documentation turns up nothing about unsigned integers. Is it even possible?

Unfortunately Amazon Redshift doesn't support unsigned integers. As a workaround, we use numeric(20, 0) for bigint unsigned data. Here is an example.
create table funny_table(
my_field numeric(20, 0)
);
insert into funny_table values ( 18446744073709551614 );
select * from funny_table;
my_field
----------------------
18446744073709551614
(1 row)
See the Redshift documentation for the details of the NUMERIC type.

As already mentioned, Redshift does not support unsigned types. Given that, please take a closer look at what you need to achieve.
bigint occupies 8 bytes giving you a range of -9223372036854775808 to 9223372036854775807
numeric is variable-length, occupying up to 128 bits, and offers a bigger range at the expense of storage.
I believe the idea behind using unsigned is doubling the range WITHOUT EXTRA EXPENSE TO STORAGE. So if you are comfortable with a highest positive value of 2^63 - 1, go with bigint and forget about unsigned, because bigint costs 8 bytes anyway.
If you have bigger positive integers, go with numeric(20, 0) (or higher precision), but be aware that it is still signed and occupies more than 8 bytes.
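To make the trade-off concrete, here is a minimal sketch contrasting the two column choices (the column names are just illustrative):
create table funny_table(
small_positive bigint, -- fixed 8 bytes; positive values up to 2^63 - 1
big_positive numeric(20, 0) -- up to 20 decimal digits; wider range, more storage
);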

For anyone who is running into the issue that they can't copy the unsigned int value from the source table:
You have to change the tuple mapping from the previous
("id", "int", "id", "int")
to
("id", "id", "decimal(20,0)")
We always had null values in our Redshift cluster for the id. The change in the mapping changed this behaviour, which resulted in the value being copied correctly from the source table.

Related

What is the difference between numeric(19,0) and bigint in POSTGRES?

I am trying to map Java types to SQL types and I encountered this scenario.
To elaborate: I was using the auto-ddl API to apply scripts to my database while starting my Spring container. Now I am trying to generate the scripts with Liquibase by generating the db-changelog for a Postgres server.
Are numeric(19,0) and BIGINT the same in Postgres? Please shed some light on this.
The main difference is the storage:
bigint (and smallint and integer) are stored as integer values in the processor's native format (usually two's complement binary numbers).
The range is limited (but high), the storage space occupied is 8 bytes, and arithmetic is blazingly fast.
numeric is stored as binary coded decimal of variable length.
The range and the precision is almost unlimited (up to 131072 digits before the decimal point; up to 16383 digits after the decimal point), but arithmetic operations are comparatively slow.
BIGINT's range is -9223372036854775808 to 9223372036854775807, so you can't store a number greater than 9223372036854775807, but NUMERIC(19,0) can.
Please find the following example:
CREATE TABLE TestTable (NumVal NUMERIC (19,0), IntVal BIGINT);
INSERT INTO TestTable (NumVal, IntVal) VALUES
('9223372036854775808', 9223372036854775807);
SELECT * FROM TestTable;
Here you can't store 9223372036854775808 in the BIGINT column, but you can store that value, or greater, in the NUMERIC(19,0) column.
db<>fiddle for the same.
Numeric has variable storage size, while bigint is always 8 bytes.
SELECT pg_column_size(123456789112345678911111555678::numeric(30,0)) AS numeric30,
pg_column_size(1234567891123456789::numeric(19,0)) AS numeric19,
pg_column_size(123::numeric(19,0)) AS numeric3,
pg_column_size(1234567891123456789::bigint) AS bigint;
numeric30 | numeric19 | numeric3 | bigint
----------+-----------+----------+-------
       22 |        16 |        8 |      8
Additionally, from the documentation (emphasis mine):
Calculations with numeric values yield exact results where possible,
e.g. addition, subtraction, multiplication. However, calculations on
numeric values are very slow compared to the integer types, or to the
floating-point types described in the next section.
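If you want to observe the slowdown yourself, here is a quick sketch (timings depend entirely on your hardware and version, so none are quoted):
\timing on
-- the same aggregation with integer arithmetic and with numeric arithmetic
SELECT sum(g::bigint) FROM generate_series(1, 10000000) g;
SELECT sum(g::numeric) FROM generate_series(1, 10000000) g;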

PostgreSQL: Difference between "bytea" and "bit varying" types

The PostgreSQL types bytea and bit varying sound similar:
bytea stores binary strings.
bit varying stores strings of 1's and 0's.
The documentation does not mention a maximum size for either. Is it 1GB like character varying?
I have two separate use cases, both over a table with millions of rows:
Storing MD5 hashes
That would be a bytea with a length of 16 bytes or a bit(128). It would be used for:
Deduplication: Heavy use of GROUP BY, with an index I suppose.
Querying with WHERE md5 = for exact matches only.
Displaying as a hex string for human use.
Storing arbitrary binary data
Strings of binary data of varying length up to 4kB for:
Bitwise operations to find the strings matching a certain mask. Example at the end of this post.
Extracting some bytes, for instance getting the integer value of byte 14 in my string.
Some deduplication.
A working example for the bitwise operation, using bit varying: the mask is X'00FF00' and it returns only the row X'AAAAAA'. I shortened the strings for the example, but it would be over their full length, up to 4kB. Is it possible to do something similar with bytea?
CREATE TABLE test1 (mystring bit varying);
INSERT INTO test1 VALUES (X'AAAAAA'), (X'ABCABC');
SELECT * FROM test1 WHERE mystring & X'00FF00' = X'00AA00';
Which of bytea and bit varying is the more appropriate?
I saw the UUID type is made to store exactly 16 bytes, would that be any advantage to store the MD5's?
In general, if you're not using bitwise operations you should be using bytea.
I store larger values in bytea and then convert substrings to bit varying for bitwise operations where possible, mostly because clients understand bytea much more consistently than bit varying and the I/O format is more compact.
MD5 values should be stored as bytea. Bitwise operations on them make no sense, and you generally want to fetch them as binary.
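For instance, a minimal sketch of that MD5 workflow (the table name file_hashes is hypothetical; md5() returns hex text, which decode() turns into 16 raw bytes):
CREATE TABLE file_hashes (digest bytea CHECK (octet_length(digest) = 16));
INSERT INTO file_hashes VALUES (decode(md5('some file contents'), 'hex'));
SELECT encode(digest, 'hex') AS hex_for_humans FROM file_hashes;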
I think bit varying really has two uses:
To store flags fields that are literally bit strings; and
As an interim data type for internal calculations
For pretty much everything else, use bytea.
There's nothing stopping you storing a 4k bitfield if that's what it is, though.
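As a sketch of what the question's filter can look like with bytea when the mask covers whole bytes (the table name test1b is hypothetical; get_byte() reads one byte as an integer):
CREATE TABLE test1b (mystring bytea);
INSERT INTO test1b VALUES ('\xAAAAAA'), ('\xABCABC');
-- the mask X'00FF00' = X'00AA00' from above means: the middle byte must equal 0xAA (170)
SELECT * FROM test1b WHERE get_byte(mystring, 1) = 170;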
It appears the maximum length of bytea is 1 GB. [1]
For bitwise operations use bit varying (explanation below)
For storing an MD5 hash use bytea. It will take less storage than bit varying
The benefit of using UUID is that the UUID algorithm guarantees uniqueness, not only in your table but also in your database, or even across databases (even if you generate the UUID in your application). I think storing the UUID without dashes will be more efficient for storing, comparing and sorting (a comparison between bytea and UUID is below).
For bitwise operations use bit varying
If you are concerned about storage: bit varying takes more storage than bytea. If you are okay with that, then you should try comparing the functions they each offer for bit varying vs bytea.
So far I can see bit varying will be more suitable for you to do bitwise operations, though bytea is the generally accepted way to store arbitrary data.
PostgreSQL offers a single bytea operator: concatenation. You can append one bytea value to another using the concatenation operator ||. [1]
Note that you cannot compare two bytea values, even for equality/inequality. You can, of course, convert a bytea value into another type using CAST(), and that opens up other operators. [1]
Comparison between UUID and bytea
create table u(uuid uuid primary key, payload character(300));
create table b( bytea bytea primary key, payload character(300));
INSERT INTO u
SELECT uuid_generate_v4()
FROM generate_series(1,1000*1000);
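-- Note: random_bytea is not built in; here is a minimal sketch of such a helper
-- (assumption: any function returning n random bytes would do)
CREATE FUNCTION random_bytea(n integer) RETURNS bytea AS $$
SELECT decode(string_agg(lpad(to_hex((random() * 255)::int), 2, '0'), ''), 'hex')
FROM generate_series(1, n);
$$ LANGUAGE sql;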
INSERT INTO b
SELECT random_bytea(16)
FROM generate_series(1,1000*1000);
VACUUM ANALYZE u;
VACUUM ANALYZE b;
## Your table size
SELECT pg_size_pretty(pg_total_relation_size('u'));
pg_size_pretty
----------------
81 MB
SELECT pg_size_pretty(pg_total_relation_size('b'));
pg_size_pretty
----------------
101 MB
## Speed comparison
\timing on
## Common select
select * from u limit 1000;
Time: 1.433 ms
select * from b limit 1000;
Time: 1.396 ms
## Random Select
SELECT * FROM u OFFSET random()*1000 LIMIT 10000;
Time: 42.453 ms
SELECT * FROM b OFFSET random()*1000 LIMIT 10000;
Time: 10.962 ms
Conclusion: I don't think there is much benefit to using UUID except for its uniqueness guarantee and smaller size (it will be faster to insert).
Note: no additional indexes, and only one connection.
Source:
The book "PostgreSQL: The Comprehensive Guide to Building, Programming, and Administering PostgreSQL Databases"

How do I store an unsigned int in PostgreSQL?

How do I store an unsigned int (uint32) in Postgres? I noticed that numeric(10,0) would fit the number of digits, but is this the best way?
On further research, a similar problem is storing a uint64. I've found numeric(20,0) with a constraint like CHECK (value BETWEEN 0 AND '18446744073709551615'::numeric(20,0)). I believe there aren't any native types for this.
Arithmetic on numeric values is very slow compared to the integer types.
Use bigint. It stores an 8-byte integer up to 2^63 - 1 = 9223372036854775807
[You probably don't need the entire unsigned range]
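If you just need uint32 storage with the unsigned range enforced, a minimal sketch (the table and column names are hypothetical; bigint comfortably holds all of 0 to 4294967295):
CREATE TABLE example (
my_uint32 bigint NOT NULL CHECK (my_uint32 >= 0 AND my_uint32 < 4294967296)
);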

Why are unsigned integers not available in PostgreSQL?

I came across this post (What is the difference between tinyint, smallint, mediumint, bigint and int in MySQL?) and realized that PostgreSQL does not support unsigned integers.
Can anyone explain why that is?
Most of the time, I use an unsigned integer as an auto-incremented primary key in MySQL. With such a design, how can I overcome this when I port my database from MySQL to PostgreSQL?
Thanks.
It's not in the SQL standard, so the general urge to implement it is lower.
Having too many different integer types makes the type resolution system more fragile, so there is some resistance to adding more types into the mix.
That said, there is no reason why it couldn't be done. It's just a lot of work.
It has already been answered why PostgreSQL lacks unsigned types. However, I would suggest using domains for unsigned types.
http://www.postgresql.org/docs/9.4/static/sql-createdomain.html
CREATE DOMAIN name [ AS ] data_type
[ COLLATE collation ]
[ DEFAULT expression ]
[ constraint [ ... ] ]
where constraint is:
[ CONSTRAINT constraint_name ]
{ NOT NULL | NULL | CHECK (expression) }
A domain is like a type but with an additional constraint.
For a concrete example, you could use
CREATE DOMAIN uint2 AS int4
CHECK(VALUE >= 0 AND VALUE < 65536);
Here is what psql gives when I try to abuse the type.
DS1=# select (346346 :: uint2);
ERROR: value for domain uint2 violates check constraint "uint2_check"
You can use a CHECK constraint, e.g.:
CREATE TABLE products (
id integer,
name text,
price numeric CHECK (price > 0)
);
Also, PostgreSQL has serial, smallserial and bigserial types for auto-increment.
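For example, a minimal sketch of the auto-increment part (the table name is hypothetical):
CREATE TABLE products_serial (
id bigserial PRIMARY KEY, -- auto-incremented, like MySQL's AUTO_INCREMENT
name text
);
INSERT INTO products_serial (name) VALUES ('widget'); -- id is assigned automatically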
The talk about DOMAINs is interesting but not relevant to the only possible origin of that question. The desire for unsigned ints is to double the range of ints with the same number of bits; it's an efficiency argument, not the desire to exclude negative numbers. Everybody knows how to add a check constraint.
When asked by someone about it, Tom Lane stated:
Basically, there is zero chance this will happen unless you can find
a way of fitting them into the numeric promotion hierarchy that doesn't
break a lot of existing applications. We have looked at this more than
once, if memory serves, and failed to come up with a workable design
that didn't seem to violate the POLA.
What is the "POLA"? Google gave me 10 results that are meaningless. Not sure if it's politically incorrect thought and therefore censored. Why would this search term not yield any result? Whatever.
You can implement unsigned ints as extension types without too much trouble. If you do it with C functions, there will be almost no performance penalty at all. You won't need to extend the parser to deal with literals, because PostgreSQL has an easy way to interpret strings as literals: just write '4294966272'::uint4 for your literals. Casts shouldn't be a huge deal either. You don't even need to raise range exceptions; you can just treat the semantics of '4294966273'::uint4::int as -1024. Or you can throw an error.
If I wanted this, I would have done it. But since I'm using Java on the other side of SQL, to me it is of little value since Java doesn't have those unsigned integers either. So I gain nothing. I'm already annoyed if I get a BigInteger from a bigint column, when it should fit into long.
Another thing: if I did have the need to store 32-bit or 64-bit types, I could use PostgreSQL int4 or int8 respectively, just remembering that the natural ordering and arithmetic won't work reliably. But storing and retrieving are unaffected by that.
Here is how I can implement a simple unsigned int8:
First I will use
CREATE TYPE uint8 (
INPUT = uint8_in,
OUTPUT = uint8_out
[ , RECEIVE = uint8_receive ]
[ , SEND = uint8_send ]
[ , ANALYZE = uint8_analyze ],
INTERNALLENGTH = 8,
PASSEDBYVALUE,
ALIGNMENT = 8,
STORAGE = plain,
CATEGORY = 'N',
PREFERRED = false,
DEFAULT = null
)
with the two minimal functions uint8_in and uint8_out, which I must first define.
CREATE FUNCTION uint8_in(cstring)
RETURNS uint8
AS 'uint8_funcs'
LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION uint8_out(uint8)
RETURNS cstring
AS 'uint8_funcs'
LANGUAGE C IMMUTABLE STRICT;
These I need to implement in C in uint8_funcs.c. So I take the complex-type example from the documentation and make it simple:
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(uint8_in);
Datum uint8_in(PG_FUNCTION_ARGS) {
char *str = PG_GETARG_CSTRING(0);
unsigned long long result;
/* parse the decimal text representation; "%llx" would expect hex input */
if (sscanf(str, "%llu", &result) != 1)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("invalid input syntax for uint8: \"%s\"", str)));
return (Datum)SET_8_BYTES(result);
}
Ah well, or you can just find it done already.
According to the latest documentation, signed integers are supported, but there is no unsigned integer type in the table below. However, the serial type is kind of similar to unsigned, except that it starts from 1, not from zero; the upper limit is the same as for the signed type. So the system truly does not have unsigned support. As pointed out by Peter, the door is open to implementing the unsigned version; the code may have to be updated a lot, which is just too much work, from my experience working with C programming.
https://www.postgresql.org/docs/10/datatype-numeric.html
Name     Storage size  Description                 Range
integer  4 bytes       typical choice for integer  -2147483648 to +2147483647
serial   4 bytes       autoincrementing integer    1 to 2147483647
Postgres does have an unsigned integer type that is unbeknownst to many: OID.
The oid type is currently implemented as an unsigned four-byte integer. […]
The oid type itself has few operations beyond comparison. It can be
cast to integer, however, and then manipulated using the standard
integer operators. (Beware of possible signed-versus-unsigned confusion
if you do this.)
It is not a numeric type though, and trying to do any arithmetic (or even bitwise operations) with it is going to fail. Also, it's just 4 bytes (like INTEGER); there is no corresponding 8-byte (BIGINT) unsigned type.
So it's not really a good idea to use this yourself, and I agree with all the other answers that in a PostgreSQL database design you should always use an INTEGER or BIGINT column for your serial primary key, having it start in the negative (MINVALUE) or allowing it to wrap around (CYCLE) if you want to exhaust the full domain.
However, it is quite useful for input/output conversion, like your migration from another DBMS. Inserting the value 2147483648 into an integer column will lead to an "ERROR: integer out of range", while using the expression 2147483648::OID works just fine.
Similarly, when selecting an integer column as text with mycolumn::TEXT, you will get negative values at some point, but with mycolumn::OID::TEXT you will always get a natural number.
See an example at dbfiddle.uk.
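As a small sketch of those conversions (assuming the documented casts between integer and oid):
SELECT 2147483648::oid; -- accepted, although the value is out of integer's range
SELECT (-1)::integer::oid; -- reinterpreted as unsigned: 4294967295
SELECT (-1)::integer::oid::text; -- always a natural number as text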

How can I assign a data type decimal to a column in Postgresql?

I've been working with PostgreSQL 9.1 recently.
For some reason I have to use a technology that does not support the data type numeric, only decimal. Unfortunately, the data type of the columns to which I've assigned decimal in my PostgreSQL database is always numeric. I tried to alter the type, but it did not work, though I got messages like "Query returned successfully with no result in 12 ms".
So, I want to know how I can get the columns to be decimal.
Any help will be highly appreciated.
e.g.
My CREATE statement:
CREATE TABLE IF NOT EXISTS htest
(
dsizemin decimal(8,3) NOT NULL,
dsizemax decimal(8,3) NOT NULL,
hidentifier character varying(10) NOT NULL,
tgrade character varying(10) NOT NULL,
fdvalue decimal(8,3),
CONSTRAINT htest_pkey PRIMARY KEY (dsizemin , dsizemax , hidentifier , tgrade )
);
My ALTER statement:
ALTER TABLE htest
ALTER COLUMN dsizemin TYPE decimal(8,3);
But it does not work.
In PostgreSQL, "decimal" is an alias for "numeric" which poses some problems when your app thinks it expects a type called "decimal" from the database. As Craig noted above, you can't even create a domain called "decimal"
There is no good workaround in the database side. The only thing you can do is change the application to expect a numeric data type back.
Use numeric(precision, scale) to store decimals.
precision is the total number of expected digits, counting both sides of the decimal point; scale is the number of decimal places you wish to store.
So numeric(5,5) would imply you only want numbers less than 1 (negative or positive) with 5 decimal places. When debugging, it may need to be numeric(6,5) if PostgreSQL errors out because it thinks the leading 0 is a digit.
0.12345 would be an example of the above.
1.12345 would need a field Numeric (6,5)
100.12345 would need a field Numeric (8,5)
-100.12345 would need a field Numeric (8,5)
When you write a SELECT statement to see the decimals, the display may be rounded to 2 places; but if you do something like SELECT 100 * field FROM table, then the extra decimals should start appearing.
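A minimal sketch of those precision/scale rules (the table name is hypothetical):
CREATE TABLE decimal_demo (
a numeric(5,5), -- values below 1, e.g. 0.12345
b numeric(6,5), -- e.g. 1.12345
c numeric(8,5) -- e.g. -100.12345
);
INSERT INTO decimal_demo VALUES (0.12345, 1.12345, -100.12345);
SELECT * FROM decimal_demo;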