Signature of the Redshift internal "identity" function - amazon-redshift

While working on a legacy Redshift database I discovered unfamiliar pattern for default identity values for an autoincrement column. E.g.:
create table sometable (row_id bigint default "identity"(24078855, 0, '1,1'::text), ...
And surprisingly I wasn't able to find any docs about that identity function. The only thing I was able to dig up is the following:
select * from pg_proc proc
join pg_language lang on proc.prolang = lang.oid
where proc.proname = 'identity';
So I've found out that function to be internal, and it's prosrc column is just ff_identity_int64 (not googleable, unfortunately).
Could someone please provide me with some info about its first and second arguments? I mean 24078855 and 0 from that example "identity"(24078855, 0, '1,1'::text). ('1,1'::text -- here first 1 is the start value and second 1 is the step of increment). But 24078855 and 0 are still mysterious for me.

"identity"(24078855, 0, '1,1'::text)
table OID
0-based column index
text representation of parameters provided with IDENTITY clause
For reference look at pg_attrdef table

The IDENTITY clause is documented in the CREATE TABLE docs here: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html#identity-clause
IDENTITY(seed, step)
Clause that specifies that the column is an IDENTITY column. An IDENTITY column contains unique auto-generated values. The data type for an IDENTITY column must be either INT or BIGINT. When you add rows using an INSERT statement, these values start with the value specified as seed and increment by the number specified as step. When you load the table using a COPY statement, an IDENTITY column might not be useful. With a COPY operation, the data is loaded in parallel and distributed to the node slices. To be sure that the identity values are unique, Amazon Redshift skips a number of values when creating the identity values. As a result, identity values are unique and sequential, but not consecutive, and the order might not match the order in the source files.

I've found that the first argument is the oid of the table. So, in your example (without the schema specified)
select oid from pg_class where relname = 'sometable'

Related

PostgreSQL: serial vs identity

To have an integer auto-numbering primary key on a table, you can use SERIAL
But I noticed the table information_schema.columns has a number of identity_ fields, and indeed, you could create a column with a GENERATED specifier...
What's the difference? Were they introduced with different PostgreSQL versions? Is one preferred over the other?
serial is the "old" implementation of auto-generated unique values that has been part of Postgres for ages. However that is not part of the SQL standard.
To be more compliant with the SQL standard, Postgres 10 introduced the syntax using generated as identity.
The underlying implementation is still based on a sequence, the definition now complies with the SQL standard. One thing that this new syntax allows is to prevent an accidental override of the value.
Consider the following tables:
create table t1 (id serial primary key);
create table t2 (id integer primary key generated always as identity);
Now when you run:
insert into t1 (id) values (1);
The underlying sequence and the values in the table are not in sync any more. If you run another
insert into t1 default_values;
You will get an error because the sequence was not advanced by the first insert, and now tries to insert the value 1 again.
With the second table however,
insert into t2 (id) values (1);
Results in:
ERROR: cannot insert into column "id"
Detail: Column "id" is an identity column defined as GENERATED ALWAYS.
So you can't accidentally "forget" the sequence usage. You can still force this, using the override system value option:
insert into t2 (id) overriding system value values (1);
which still leaves you with a sequence that is out-of-sync with the values in the table, but at least you were made aware of that.
identity columns also have another advantage: they also minimize the grants you need to give to a role in order to allow inserts.
While a table using a serial column requires the INSERT privilege on the table and the USAGE privilege on the underlying sequence this is not needed for tables using an identity columns. Granting the INSERT privilege is enough.
It is recommended to use the new identity syntax rather than serial

PostgreSQL id column not defined

I am new in PostgreSQL and I am working with this database.
I got a file which I imported, and I am trying to get rows with a certain ID. But the ID is not defined, as you can see it in this picture:
so how do I access this ID? I want to use an SQL command like this:
SELECT * from table_name WHERE ID = 1;
If any order of rows is ok for you, just add a row number according to the current arbitrary sort order:
CREATE SEQUENCE tbl_tbl_id_seq;
ALTER TABLE tbl ADD COLUMN tbl_id integer DEFAULT nextval('tbl_tbl_id_seq');
The new default value is filled in automatically in the process. You might want to run VACUUM FULL ANALYZE tbl to remove bloat and update statistics for the query planner afterwards. And possibly make the column your new PRIMARY KEY ...
To make it a fully fledged serial column:
ALTER SEQUENCE tbl_tbl_id_seq OWNED BY tbl.tbl_id;
See:
Creating a PostgreSQL sequence to a field (which is not the ID of the record)
What you see are just row numbers that pgAdmin displays, they are not really stored in the database.
If you want an artificial numeric primary key for the table, you'll have to create it explicitly.
For example:
CREATE TABLE mydata (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
obec text NOT NULL,
datum timestamp with time zone NOT NULL,
...
);
Then to copy the data from a CSV file, you would run
COPY mydata (obec, datum, ...) FROM '/path/to/csvfile' (FORMAT 'csv');
Then the id column is automatically filled.

oracle how to change the next autogenerated value of the identity column

I've created table projects like so:
CREATE TABLE projects (
project_id NUMBER(10,0) GENERATED BY DEFAULT ON NULL AS IDENTITY ,
project_name VARCHAR2(75 CHAR) NOT NULL
Then I've inserted ~150,000 rows while importing data from my old MySQL table. the MySQL had existing id numbers which i need to preserve so I added the id number to the SQL during the insert. Now when I insert new rows into the oracle table, the id is a very low number. Can you tell me how to reset my counter on the project_id column to start at 150,001 so not to mess up any of my existing id numbers? essentially i need the oracle version of:
ALTER TABLE tbl AUTO_INCREMENT = 150001;
Edit: Oracle 12c now supports the identity data type, allowing an auto number primary key that does not require us to create a sequence + insert trigger.
SOLUTION:
after some creative google search terms I was able to find this thread on the oracle docs site. here is the solution for changing the identity's nextval:
ALTER TABLE projects MODIFY project_id GENERATED BY DEFAULT ON NULL AS IDENTITY ( START WITH 150000);
Here is the solution that i found on this oracle thread:. The concept is to alter your identity column rather than adjust the sequence. Actually, the sequences that are automatically created aren't editable or drop-able.
ALTER TABLE projects MODIFY project_id GENERATED BY DEFAULT ON NULL AS IDENTITY ( START WITH 150000);
According to this source, you can do it like this:
ALTER TABLE projects MODIFY project_id
GENERATED BY DEFAULT ON NULL AS IDENTITY (START WITH LIMIT VALUE);
The START WITH LIMIT VALUE clause can only be specified with an ALTER TABLE statement (and by implication against an existing identity column). When this clause is specified, the table will be scanned for the highest value in the PROJECT_ID column and the sequence will commence at this value + 1.
The same is also stated in the oracle thread referenced in OP's own answer:
START WITH LIMIT VALUE, which is specific to identity_options, can only be used with ALTER TABLE MODIFY. If you specify START WITH LIMIT VALUE, then Oracle Database locks the table and finds the maximum identity column value in the table (for increasing sequences) or the minimum identity column value (for decreasing sequences) and assigns the value as the sequence generator's high water mark. The next value returned by the sequence generator will be the high water mark + INCREMENT BY integer for increasing sequences, or the high water mark - INCREMENT BY integer for decreasing sequences.
The following statement creates the sequence customers_seq in the sample schema oe. This sequence could be used to provide customer ID numbers when rows are added to the customers table.
CREATE SEQUENCE customers_seq
START WITH 1000
INCREMENT BY 1
NOCACHE
NOCYCLE;
The first reference to customers_seq.nextval returns 1000. The second returns 1001. Each subsequent reference will return a value 1 greater than the previous reference.
http://docs.oracle.com/cd/B12037_01/server.101/b10759/statements_6014.htm

How can I copy an IDENTITY field?

I’d like to update some parameters for a table, such as the dist and sort key. In order to do so, I’ve renamed the old version of the table, and recreated the table with the new parameters (these can not be changed once a table has been created).
I need to preserve the id field from the old table, which is an IDENTITY field. If I try the following query however, I get an error:
insert into edw.my_table_new select * from edw.my_table_old;
ERROR: cannot set an identity column to a value [SQL State=0A000]
How can I keep the same id from the old table?
You can't INSERT data setting the IDENTITY columns, but you can load data from S3 using COPY command.
First you will need to create a dump of source table with UNLOAD.
Then simply use COPY with EXPLICIT_IDS parameter as described in Loading default column values:
If an IDENTITY column is included in the column list, the EXPLICIT_IDS
option must also be specified in the COPY command, or the COPY command
will fail. Similarly, if an IDENTITY column is omitted from the column
list, and the EXPLICIT_IDS option is specified, the COPY operation
will fail.
You can explicitly specify the columns, and ignore the identity column:
insert into existing_table (col1, col2) select col1, col2 from another_table;
Use ALTER TABLE APPEND twice, first time with IGNOREEXTRA and the second time with FILLTARGET.
If the target table contains columns that don't exist in the source
table, include FILLTARGET. The command fills the extra columns in the
source table with either the default column value or IDENTITY value,
if one was defined, or NULL.
It moves the columns from one table to another, extremely quickly; took me 4s for 1GB table in dc1.large node.
Appends rows to a target table by moving data from an existing source
table.
...
ALTER TABLE APPEND is usually much faster than a similar CREATE TABLE
AS or INSERT INTO operation because data is moved, not duplicated.
Faster and simpler than UNLOAD + COPY with EXPLICIT_IDS.

auto-increment column in PostgreSQL on the fly?

I was wondering if it is possible to add an auto-increment integer field on the fly, i.e. without defining it in a CREATE TABLE statement?
For example, I have a statement:
SELECT 1 AS id, t.type FROM t;
and I am can I change this to
SELECT some_nextval_magic AS id, t.type FROM t;
I need to create the auto-increment field on the fly in the some_nextval_magic part because the result relation is a temporary one during the construction of a bigger SQL statement. And the value of id field is not really important as long as it is unique.
I search around here, and the answers to related questions (e.g. PostgreSQL Autoincrement) mostly involving specifying SERIAL or using nextval in CREATE TABLE. But I don't necessarily want to use CREATE TABLE or VIEW (unless I have to). There are also some discussions of generate_series(), but I am not sure whether it applies here.
-- Update --
My motivation is illustrated in this GIS.SE answer regarding the PostGIS extension. The original query was:
CREATE VIEW buffer40units AS
SELECT
g.path[1] as gid,
g.geom::geometry(Polygon, 31492) as geom
FROM
(SELECT
(ST_Dump(ST_UNION(ST_Buffer(geom, 40)))).*
FROM point
) as g;
where g.path[1] as gid is an id field "required for visualization in QGIS". I believe the only requirement is that it is integer and unique across the table. I encountered some errors when running the above query when the g.path[] array is empty.
While trying to fix the array in the above query, this thought came to me:
Since the gid value does not matter anyways, is there an auto-increment function that can be used here instead?
If you wish to have an id field that assigns a unique integer to each row in the output, then use the row_number() window function:
select
row_number() over () as id,
t.type from t;
The generated id will only be unique within each execution of the query. Multiple executions will not generate new unique values for id.