PostgreSQL: serial vs identity - postgresql

To have an integer auto-numbering primary key on a table, you can use SERIAL
But I noticed the table information_schema.columns has a number of identity_ fields, and indeed, you could create a column with a GENERATED specifier...
What's the difference? Were they introduced with different PostgreSQL versions? Is one preferred over the other?

serial is the "old" implementation of auto-generated unique values that has been part of Postgres for ages. However that is not part of the SQL standard.
To be more compliant with the SQL standard, Postgres 10 introduced the syntax using generated as identity.
The underlying implementation is still based on a sequence, the definition now complies with the SQL standard. One thing that this new syntax allows is to prevent an accidental override of the value.
Consider the following tables:
create table t1 (id serial primary key);
create table t2 (id integer primary key generated always as identity);
Now when you run:
insert into t1 (id) values (1);
The underlying sequence and the values in the table are not in sync any more. If you run another
insert into t1 default_values;
You will get an error because the sequence was not advanced by the first insert, and now tries to insert the value 1 again.
With the second table however,
insert into t2 (id) values (1);
Results in:
ERROR: cannot insert into column "id"
Detail: Column "id" is an identity column defined as GENERATED ALWAYS.
So you can't accidentally "forget" the sequence usage. You can still force this, using the override system value option:
insert into t2 (id) overriding system value values (1);
which still leaves you with a sequence that is out-of-sync with the values in the table, but at least you were made aware of that.
identity columns also have another advantage: they also minimize the grants you need to give to a role in order to allow inserts.
While a table using a serial column requires the INSERT privilege on the table and the USAGE privilege on the underlying sequence this is not needed for tables using an identity columns. Granting the INSERT privilege is enough.
It is recommended to use the new identity syntax rather than serial

Related

Remove "identity flag" from a column in PostgreSQL

I have some tables in PostgreSQL 12.9 that were declared as something like
-- This table is written in old style
create table old_style_table_1 (
id bigserial not null primary key,
...
);
-- This table uses new feature
create table new_style_table_2 (
id bigint generated by default as identity,
...
);
Second table seems to be declared using the identity flag introduced in 10th version.
Time went by, and we have partitioned the old tables, while keeping the original sequences:
CREATE TABLE partitioned_old_style_table_1 (LIKE old_style_table_1 INCLUDING DEFAULTS) PARTITION BY HASH (user_id);
CREATE TABLE partitioned_new_style_table_2 (LIKE new_style_table_2 INCLUDING DEFAULTS) PARTITION BY HASH (user_id);
DDL for their id columns seems to be id bigint default nextval('old_style_table_1_id_seq') not null and id bigint default nextval('new_style_table_2_id_seq') not null.
Everything has worked fine so far. Partitioned tables proved to be a great boon and we decided to retire the old tables by dropping them.
DROP TABLE old_style_table_1, new_style_table_2;
-- [2BP01] ERROR: cannot drop desired object(s) because other objects depend on them
-- Detail: default value for column id of table old_style_table_1 depends on sequence old_style_table_1_id_seq
-- default value for column id of table new_style_table_2 depends on sequence new_style_table_2_id_seq
After some pondering I've found out that sequences may have owners in postgres, so I opted to change them:
ALTER SEQUENCE old_style_table_1_id_seq OWNED BY partitioned_old_style_table_1.id;
DROP TABLE old_style_table_1;
-- Worked out flawlessly
ALTER SEQUENCE new_style_table_2_id_seq OWNED BY partitioned_new_style_table_2.id;
ALTER SEQUENCE new_style_table_2_id_seq OWNED BY NONE;
-- Here's the culprit of the question:
-- [0A000] ERROR: cannot change ownership of identity sequence
So, apparently the fact that this column has pg_attribute.attidentity set to 'd' forbids me from:
• changing the default value of the column:
ALTER TABLE new_style_table_2 ALTER COLUMN id SET DEFAULT 0;
-- [42601] ERROR: column "id" of relation "new_style_table_2" is an identity column
• dropping the default value:
ALTER TABLE new_style_table_2 ALTER COLUMN id DROP DEFAULT;
-- [42601] ERROR: column "id" of relation "new_style_table_2" is an identity column
-- Hint: Use ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY instead.
• dropping the identity, column or the table altogether (new tables already depend on the sequence):
ALTER TABLE new_style_table_2 ALTER COLUMN id DROP IDENTITY IF EXISTS;
-- or
ALTER TABLE new_style_table_2 DROP COLUMN id;
-- or
DROP TABLE new_style_table_2;
-- result in
-- [2BP01] ERROR: cannot drop desired object(s) because other objects depend on them
-- default value for column id of table partitioned_new_style_table_2 depends on sequence new_style_table_2_id_seq
I've looked up the documentation, it provides the way to SET IDENTITY or ADD IDENTITY, but no way to remove it or to change to a throwaway sequence without attempting to drop the existing one.
➥ So, how am I able to remove an identity flag from the column-sequence pair so it won't affect other tables that use this sequence?
UPD: Tried running UPDATE pg_attribute SET attidentity='' WHERE attrelid=16816; on localhost, still receive [2BP01] and [0A000]. :/
Though I managed to execute the DROP DEFAULT value bit, but it seems like a dead end.
I don't think there is a safe and supported way to do that (without catalog modifications). Fortunately, there is nothing special about sequences that would make dropping them a problem. So take a short down time and:
remove the default value that uses the identity sequence
record the current value of the sequence
drop the table
create a new sequence with an appropriate START value
use the new sequence to set new default values
If you want an identity column, you should define it on the partitioned table, not on one of the partitions.

Unexpected creation of duplicate unique constraints in Postgres

I am writing an idempotent schema change script for a Postgres 12 database. However I noticed that if I include the IF NOT EXISTS in an ADD COLUMN statement then even if the column already exists it is adding duplicate Indexes for the uniqueness constraint which already exists. Simple example:
-- set up base table
CREATE TABLE IF NOT EXISTS test_table
(id SERIAL PRIMARY KEY
);
-- statement intended to be idempotent
ALTER TABLE test_table
ADD COLUMN IF NOT EXISTS name varchar(50) UNIQUE;
Running this script creates a new index test_table_name_key[n] each time it is run. I can't find anything in the Postgres documentation and don't understand why this is allowed to happen? If I break it into two parts eg:
ALTER TABLE test_table
ADD COLUMN IF NOT EXISTS name varchar(50);
ALTER TABLE
ADD CONSTRAINT test_table_name_key UNIQUE (name);
Then the transaction fails because Postgres rejects the creation of a constraint which already exists (which I can then catch in a DO EXCEPTION block). As far as I can tell this is because doing it by this approach I am forced to give the constraint a name. This constrasts with the ALTER COLUMN SET NOT NULL which can be run multiple times without error or side effects as far as I can tell.
Question: why does it add a duplicate unique constraint and are there any problems with having multiple identical indexes on a table column? (I think this is a subtle 'error' and only spotted it by chance so am concerned it may arise in a production situation)
You can create multiple unique constraints on the same column as long as they have different names, simply because there is nothing in the PostgreSQL code that forbids that. Each unique constraint will create a unique index with the same name, because that is how unique constraints are implemented.
This can be a valid use case: for example, if the index is bloated, you could create a new constraint and then drop the old one.
But normally, it is useless and does harm, because each index will make data modifications on the table slower.

Postgresql: why no automatic updates of a id sequence after bulk insert

Assuming, I have this table:
CREATE TABLE IF NOT EXISTS public.test
(
"Id" smallint NOT NULL GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
"Value" character varying(10)
);
and I insert some rows:
INSERT INTO public.test ("Id", "Value") VALUES
(1, 'Val1'),
(2, 'Val2'),
(3, 'Val3');
Everything is fine. But now I want to insert another row
INSERT INTO public.test ("Value") VALUES ('Val9');
and I get the error: duplicate key violates unique constraint
That is, how I learned, because the pk-sequence is out of sync. But why? Is there any senseful reason why pg do not update the sequence automatically? After single INSERTs it works well, but after BULK not? Is there any way to avoid these manually updates? Or is it standard to correct every pk-sequence of every table after every little bulk insert (if there is a serial id)?
For futher information, I use GENERATED BY DEFAULT because I want to migrate a database from mysql and I want to preserve the IDs. And I would like the idea of having a lot of flexibility with the keys (similar to mysql).
But what do I not understand here?
Is it possible to correct the sequence automatically without knowing it's concrete name?
Sorry, more questions than I wanted to ask. But ... I don't understand this concept. Would appreciate some explanation.
The problem is the BY DEFAULT in GENERATED BY DEFAULT AS IDENTITY. That means that you can override the automatically generated keys, like your INSERT statements do.
When that happens, the sequence that implements the identity column is not modified. So when you insert a row without specifying "Id", it will start counting at 1, which causes the conflict.
Never override the default. To make sure this doesn't happen by accident, use GENERATED ALWAYS AS IDENTITY.

oracle how to change the next autogenerated value of the identity column

I've created table projects like so:
CREATE TABLE projects (
project_id NUMBER(10,0) GENERATED BY DEFAULT ON NULL AS IDENTITY ,
project_name VARCHAR2(75 CHAR) NOT NULL
Then I've inserted ~150,000 rows while importing data from my old MySQL table. the MySQL had existing id numbers which i need to preserve so I added the id number to the SQL during the insert. Now when I insert new rows into the oracle table, the id is a very low number. Can you tell me how to reset my counter on the project_id column to start at 150,001 so not to mess up any of my existing id numbers? essentially i need the oracle version of:
ALTER TABLE tbl AUTO_INCREMENT = 150001;
Edit: Oracle 12c now supports the identity data type, allowing an auto number primary key that does not require us to create a sequence + insert trigger.
SOLUTION:
after some creative google search terms I was able to find this thread on the oracle docs site. here is the solution for changing the identity's nextval:
ALTER TABLE projects MODIFY project_id GENERATED BY DEFAULT ON NULL AS IDENTITY ( START WITH 150000);
Here is the solution that i found on this oracle thread:. The concept is to alter your identity column rather than adjust the sequence. Actually, the sequences that are automatically created aren't editable or drop-able.
ALTER TABLE projects MODIFY project_id GENERATED BY DEFAULT ON NULL AS IDENTITY ( START WITH 150000);
According to this source, you can do it like this:
ALTER TABLE projects MODIFY project_id
GENERATED BY DEFAULT ON NULL AS IDENTITY (START WITH LIMIT VALUE);
The START WITH LIMIT VALUE clause can only be specified with an ALTER TABLE statement (and by implication against an existing identity column). When this clause is specified, the table will be scanned for the highest value in the PROJECT_ID column and the sequence will commence at this value + 1.
The same is also stated in the oracle thread referenced in OP's own answer:
START WITH LIMIT VALUE, which is specific to identity_options, can only be used with ALTER TABLE MODIFY. If you specify START WITH LIMIT VALUE, then Oracle Database locks the table and finds the maximum identity column value in the table (for increasing sequences) or the minimum identity column value (for decreasing sequences) and assigns the value as the sequence generator's high water mark. The next value returned by the sequence generator will be the high water mark + INCREMENT BY integer for increasing sequences, or the high water mark - INCREMENT BY integer for decreasing sequences.
The following statement creates the sequence customers_seq in the sample schema oe. This sequence could be used to provide customer ID numbers when rows are added to the customers table.
CREATE SEQUENCE customers_seq
START WITH 1000
INCREMENT BY 1
NOCACHE
NOCYCLE;
The first reference to customers_seq.nextval returns 1000. The second returns 1001. Each subsequent reference will return a value 1 greater than the previous reference.
http://docs.oracle.com/cd/B12037_01/server.101/b10759/statements_6014.htm

T-SQL create table with primary keys

Hello I wan to create a new table based on another one and create primary keys as well.
Currently this is how I'm doing it. Table B has no primary keys defined. But I would like to create them in table A. Is there a way using this select top 0 statement to do that? Or do I need to do an ALTER TABLE after I created tableA?
Thanks
select TOP 0 *
INTO [tableA]
FROM [tableB]
SELECT INTO does not support copying any of the indexes, constraints, triggers or even computed columns and other table properties, aside from the IDENTITY property (as long as you don't apply an expression to the IDENTITY column.
So, you will have to add the constraints after the table has been created and populated.
The short answer is NO. SELECT INTO will always create a HEAP table and, according to Books Online:
Indexes, constraints, and triggers defined in the source table are not
transferred to the new table, nor can they be specified in the
SELECT...INTO statement. If these objects are required, you must
create them after executing the SELECT...INTO statement.
So, after executing SELECT INTO you need to execute an ALTER TABLE or CREATE UNIQUE INDEX in order to add a primary key.
Also, if dbo.TableB does not already have an IDENTITY column (or if it does and you want to leave it out for some reason), and you need to create an artificial primary key column (rather than use an existing column in dbo.TableB to serve as the new primary key), you could use the IDENTITY function to create a candidate key column. But you still have to add the constraint to TableA after the fact to make it a primary key, since just the IDENTITY function/property alone does not make it so.
-- This statement will create a HEAP table
SELECT Col1, Col2, IDENTITY(INT,1,1) Col3
INTO dbo.MyTable
FROM dbo.AnotherTable;
-- This statement will create a clustered PK
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable_Col3 PRIMARY KEY (Col3);