postgres autoincrement not updated on explicit id inserts - postgresql

I have the following table in postgres:
CREATE TABLE "test" (
"id" serial NOT NULL PRIMARY KEY,
"value" text
);
I am doing the following insertions:
insert into test (id, value) values (1, 'alpha');
insert into test (id, value) values (2, 'beta');
insert into test (value) values ('gamma');
In the first two inserts I explicitly specify the id. However, the table's auto-increment pointer is not updated in this case, so the third insert fails with:
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(1) already exists.
I never faced this problem in MySQL with either the MyISAM or the InnoDB engine. Explicit or not, MySQL always updates the auto-increment pointer based on the max row id.
What is the workaround for this problem in Postgres? I need it because I want tighter control over some ids in my table.
UPDATE:
I need it because some values must have a fixed id. For other new entries I don't mind generating new ones.
I think it may be possible by manually setting the sequence to max(id) + 1 whenever I explicitly insert ids, but I am not sure how to do that.

That's how it's supposed to work: nextval('test_id_seq') is only called when the system needs a value for this column and you have not provided one. If you provide a value, no such call is performed and consequently the sequence is not "updated".
You could work around this by manually setting the value of the sequence after your last insert with explicitly provided values:
SELECT setval('test_id_seq', (SELECT MAX(id) from "test"));
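The name of the sequence is autogenerated and is normally tablename_columnname_seq (it can differ, for example when the combined name is too long), so pg_get_serial_sequence('test', 'id') is the safer way to look it up.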
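A minimal sketch of the question's scenario, assuming PostgreSQL and the `test` table from the question. Instead of hard-coding the sequence name, it asks `pg_get_serial_sequence()` for it:

```sql
-- Recreate the table from the question
DROP TABLE IF EXISTS test;
CREATE TABLE test (
    id    serial PRIMARY KEY,
    value text
);

-- Explicit ids bypass the sequence, so it has not handed out anything yet
INSERT INTO test (id, value) VALUES (1, 'alpha');
INSERT INTO test (id, value) VALUES (2, 'beta');

-- Move the sequence to the current maximum; the next nextval() returns MAX(id) + 1
SELECT setval(pg_get_serial_sequence('test', 'id'), (SELECT MAX(id) FROM test));

-- The default value no longer collides: this row gets id 3
INSERT INTO test (value) VALUES ('gamma') RETURNING id;
```

If the table can be empty, MAX(id) is NULL and this setval() call does nothing; the three-argument form in the Django statements below handles that case.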

Recent versions of Django discuss this topic in the documentation:
Django uses PostgreSQL’s SERIAL data type to store auto-incrementing
primary keys. A SERIAL column is populated with values from a sequence
that keeps track of the next available value. Manually assigning a
value to an auto-incrementing field doesn’t update the field’s
sequence, which might later cause a conflict.
Ref: https://docs.djangoproject.com/en/dev/ref/databases/#manually-specified-autoincrement-pk
There is also a management command, manage.py sqlsequencereset app_label ..., that generates SQL statements for resetting the sequences of the given app(s).
Ref: https://docs.djangoproject.com/en/dev/ref/django-admin/#django-admin-sqlsequencereset
For example, these SQL statements were generated by manage.py sqlsequencereset my_app_in_my_project:
BEGIN;
SELECT setval(pg_get_serial_sequence('"my_project_aaa"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_aaa";
SELECT setval(pg_get_serial_sequence('"my_project_bbb"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_bbb";
SELECT setval(pg_get_serial_sequence('"my_project_ccc"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_ccc";
COMMIT;
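The third argument of setval() (`is_called`) is what makes these statements safe on empty tables: with `false`, the next nextval() returns the value just set instead of the one after it. A minimal sketch of both cases, assuming PostgreSQL and a hypothetical table `reset_demo`:

```sql
-- Hypothetical demo table
DROP TABLE IF EXISTS reset_demo;
CREATE TABLE reset_demo (id serial PRIMARY KEY, value text);

-- Use a few ids, then empty the table again
INSERT INTO reset_demo (value) VALUES ('a'), ('b');   -- ids 1 and 2
DELETE FROM reset_demo;

-- max(id) is NULL here, so this runs setval(seq, 1, false):
-- the next nextval() returns 1 itself
SELECT setval(pg_get_serial_sequence('"reset_demo"', 'id'),
              coalesce(max("id"), 1), max("id") IS NOT null)
FROM "reset_demo";
INSERT INTO reset_demo (value) VALUES ('first');      -- id 1

-- After an explicit id this runs setval(seq, 10, true):
-- 10 counts as already used, so the next nextval() returns 11
INSERT INTO reset_demo (id, value) VALUES (10, 'explicit');
SELECT setval(pg_get_serial_sequence('"reset_demo"', 'id'),
              coalesce(max("id"), 1), max("id") IS NOT null)
FROM "reset_demo";
INSERT INTO reset_demo (value) VALUES ('next');       -- id 11
```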

It can be done automatically using a trigger. This way the sequence always continues from the largest id currently in the table (note that after a DELETE of the highest ids it moves backwards, so those ids can be reused).
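CREATE OR REPLACE FUNCTION set_serial_id_seq()
RETURNS trigger AS
$BODY$
BEGIN
-- TG_ARGV[0] is the name of the serial column; move its sequence to MAX(column)
EXECUTE format('SELECT setval(pg_get_serial_sequence(%L, %L), (SELECT MAX(%I) FROM %I.%I))',
quote_ident(TG_TABLE_SCHEMA) || '.' || quote_ident(TG_TABLE_NAME),
TG_ARGV[0],
TG_ARGV[0],
TG_TABLE_SCHEMA,
TG_TABLE_NAME);
RETURN NULL; -- the return value of an AFTER ... FOR EACH STATEMENT trigger is ignored
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER set_mytable_id_seq
AFTER INSERT OR UPDATE OR DELETE
ON mytable
FOR EACH STATEMENT
EXECUTE PROCEDURE set_serial_id_seq('id');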
The function can be reused for multiple tables: create one trigger per table, replacing mytable with the table of interest and passing the name of its serial column as the trigger argument.
For more info regarding triggers:
https://www.postgresql.org/docs/9.1/plpgsql-trigger.html
https://www.postgresql.org/docs/9.1/sql-createtrigger.html

Related

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This can likely be accomplished by using another datastore, one without foreign key constraints, and then, once all the needed data is available, loading it into the database that has the FK constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
CREATE TABLE "order" (
id INTEGER NOT NULL,
order_number INTEGER,
PRIMARY KEY (id),
UNIQUE (order_number)
);
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES "order"(order_number),
PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) VALUES (123) before order 123 has been inserted. This will of course fail the FK constraint. But this might be the order in which I receive the data, since it's read from a stream that collects it from multiple sources.
Also, to address @philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints: a mechanism that postpones the FK checks to the end of a transaction. I don't think that's possible in my case, however, since these insert statements will be run at random times whenever the data is received.
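For reference, a deferred foreign key looks like the sketch below, assuming PostgreSQL and hypothetical tables `orders` and `line_items` (renamed to avoid the reserved word). It only helps when the parent row arrives before the same transaction commits, which is exactly the limitation described above:

```sql
DROP TABLE IF EXISTS line_items;
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
    id           serial PRIMARY KEY,
    order_number integer NOT NULL UNIQUE
);
CREATE TABLE line_items (
    id           serial PRIMARY KEY,
    -- The FK check runs at COMMIT instead of after each statement
    order_number integer REFERENCES orders (order_number)
                 DEFERRABLE INITIALLY DEFERRED
);

BEGIN;
INSERT INTO line_items (order_number) VALUES (123);  -- no order 123 yet, still accepted
INSERT INTO orders (order_number) VALUES (123);      -- the parent arrives later
COMMIT;                                              -- the FK is checked here and passes
```

If the parent is still missing at COMMIT, the commit fails and the whole transaction is rolled back.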
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
INSERT INTO "order" (order_number)
SELECT NEW.order_number
WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
RETURN NEW;
END
$$
LANGUAGE plpgsql;
-- trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order();
Note: I am assuming that the id column of the order table in fact is auto increment, in which case Postgres would automatically assign a value to it when inserting as above. Most likely, this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
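A variant of the same trigger, sketched for PostgreSQL 9.5 or later, that uses INSERT ... ON CONFLICT DO NOTHING instead of WHERE NOT EXISTS. With NOT EXISTS, two concurrent inserts for the same new order number can both see no row, and one of them then fails on the unique constraint; ON CONFLICT avoids that. The tables `orders`/`line_items` and the function and trigger names are hypothetical, and `orders.id` is assumed to be serial as in the note above:

```sql
DROP TABLE IF EXISTS line_items;
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
    id           serial PRIMARY KEY,
    order_number integer NOT NULL UNIQUE
);
CREATE TABLE line_items (
    id           serial PRIMARY KEY,
    order_number integer REFERENCES orders (order_number)
);

CREATE OR REPLACE FUNCTION fn_insert_order_upsert() RETURNS trigger AS $$
BEGIN
    IF NEW.order_number IS NOT NULL THEN
        -- Create a placeholder order, or do nothing if it already exists
        INSERT INTO orders (order_number)
        VALUES (NEW.order_number)
        ON CONFLICT (order_number) DO NOTHING;
    END IF;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_insert_order_upsert
BEFORE INSERT ON line_items
FOR EACH ROW EXECUTE PROCEDURE fn_insert_order_upsert();

-- Line items arrive first; the trigger creates order 123 once
INSERT INTO line_items (order_number) VALUES (123);
INSERT INTO line_items (order_number) VALUES (123);
```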
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger you check whether a matching row exists in order, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO "order" ...
ON CONFLICT (order_number) DO UPDATE SET
id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.
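A minimal sketch of the negative-id placeholder idea, assuming PostgreSQL 9.5+ and hypothetical tables `orders` and `line_items`, where `line_items` references `orders.id` and therefore needs ON UPDATE CASCADE. Placeholder ids come from a hypothetical descending sequence `placeholder_ids`, and 42 stands in for a real order id:

```sql
DROP TABLE IF EXISTS line_items;
DROP TABLE IF EXISTS orders;
DROP SEQUENCE IF EXISTS placeholder_ids;
CREATE SEQUENCE placeholder_ids INCREMENT BY -1 MAXVALUE -1;  -- yields -1, -2, ...

CREATE TABLE orders (
    id           integer PRIMARY KEY,
    order_number integer NOT NULL UNIQUE
);
CREATE TABLE line_items (
    id       serial PRIMARY KEY,
    -- Follows the placeholder id when it is replaced by the real one
    order_id integer NOT NULL REFERENCES orders (id) ON UPDATE CASCADE
);

-- A line item arrives first: create a placeholder order with a negative id
INSERT INTO orders (id, order_number) VALUES (nextval('placeholder_ids'), 123);
INSERT INTO line_items (order_id)
SELECT id FROM orders WHERE order_number = 123;

-- The real order arrives: the upsert swaps the placeholder id for the real one
INSERT INTO orders (id, order_number) VALUES (42, 123)
ON CONFLICT (order_number) DO UPDATE SET id = EXCLUDED.id;
```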

Getting error for auto increment fields when inserting records without specifying columns

We're in the process of converting over from SQL Server to Postgres. I have a scenario that I am trying to accommodate. It involves inserting records from one table into another WITHOUT listing all of the columns. I realize this is not recommended practice, but let's set that aside for now.
drop table if exists pk_test_table;
create table public.pk_test_table
(
recordid SERIAL PRIMARY KEY NOT NULL,
name text
);
--example 1: works and will insert a record with an id of 1
insert into pk_test_table values(default,'puppies');
--example 2: fails
insert into pk_test_table
select first_name from person_test;
Error I receive in the second example:
column "recordid" is of type integer but expression is of type
character varying Hint: You will need to rewrite or cast the
expression.
The default keyword will tell the database to grab the next value.
Is there any way to use this keyword in the second example? Or some way to tell the database to ignore auto-incremented columns and just let them be populated as normal?
I would prefer to not use a subquery to grab the next "id".
This functionality works in SQL Server and hence the question.
Thanks in advance for your help!
If you can't list column names, you should instead use the DEFAULT keyword, as you've done in the simple insert example. However, that won't work with an insert into ... select ....
For that, you need to invoke nextval. A subquery is not required, just:
insert into pk_test_table
select nextval('pk_test_table_recordid_seq'), first_name from person_test;
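You do need to know the sequence name. For a serial column it is normally <table>_<column>_seq (here pk_test_table_recordid_seq), and pg_get_serial_sequence('pk_test_table', 'recordid') returns it from the table and column names. You could also look it up in information_schema based on the table name and its primary key, using a function that takes just the table name as an argument. It'd be ugly, but it'd work. I don't think there's any way around needing to know the table name.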
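A self-contained sketch of that approach, assuming PostgreSQL and a hypothetical `person_test` source table. `pg_get_serial_sequence()` looks up the sequence behind `recordid`, so its generated name does not have to be hard-coded:

```sql
DROP TABLE IF EXISTS pk_test_table;
DROP TABLE IF EXISTS person_test;

-- Hypothetical source table
CREATE TABLE person_test (first_name varchar(50));
INSERT INTO person_test VALUES ('Ada'), ('Grace');

CREATE TABLE public.pk_test_table (
    recordid serial PRIMARY KEY NOT NULL,
    name     text
);

-- No column list: the SELECT must supply every column in order,
-- so the first expression draws the next id from the column's sequence
INSERT INTO pk_test_table
SELECT nextval(pg_get_serial_sequence('public.pk_test_table', 'recordid')), first_name
FROM person_test;
```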
You're inserting the value into the first column (recordid), but it belongs in the second position (name).
Therefore you can use the INSERT INTO table(field) VALUES(value) syntax and list only the target column.
Since you need to fetch the values from another table, replace VALUES with the SELECT:
insert into pk_test_table(name)
select first_name from person_test;
I hope it helps
I do it this way via a separate function, though I think I'm getting around the issue because the table has DEFAULT settings on a per-field basis.
CREATE SEQUENCE pk_test_table_id_seq; -- must exist before the DEFAULT below can reference it
create table public.pk_test_table
(
recordid integer NOT NULL DEFAULT nextval('pk_test_table_id_seq'),
name text,
field3 integer NOT NULL DEFAULT 64,
null_field_if_not_set integer,
CONSTRAINT pk_test_table_pkey PRIMARY KEY ("recordid")
);
With function:
CREATE OR REPLACE FUNCTION func_pk_test_table() RETURNS void AS
$BODY$
INSERT INTO pk_test_table (name)
SELECT first_name FROM person_test;
$BODY$
LANGUAGE sql VOLATILE;
Then just execute the function via SELECT func_pk_test_table();
Notice that it doesn't have to list all the fields, as long as the constraints allow it (every omitted column either has a default or accepts NULL).
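A self-contained sketch showing the omitted columns picking up their defaults, assuming PostgreSQL and a hypothetical `person_test` table. It runs the statement the function wraps directly, and the OWNED BY line is an optional addition so the sequence is dropped together with the table:

```sql
DROP TABLE IF EXISTS pk_test_table;
DROP TABLE IF EXISTS person_test;
DROP SEQUENCE IF EXISTS pk_test_table_id_seq;

-- Hypothetical source table
CREATE TABLE person_test (first_name varchar(50));
INSERT INTO person_test VALUES ('Ada'), ('Grace');

-- The sequence has to exist before a DEFAULT can reference it
CREATE SEQUENCE pk_test_table_id_seq;
CREATE TABLE public.pk_test_table (
    recordid              integer NOT NULL DEFAULT nextval('pk_test_table_id_seq'),
    name                  text,
    field3                integer NOT NULL DEFAULT 64,
    null_field_if_not_set integer,
    CONSTRAINT pk_test_table_pkey PRIMARY KEY (recordid)
);
ALTER SEQUENCE pk_test_table_id_seq OWNED BY pk_test_table.recordid;

-- Only name is listed: recordid and field3 take their defaults,
-- null_field_if_not_set stays NULL
INSERT INTO pk_test_table (name)
SELECT first_name FROM person_test;
```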

Manipulate rows automatically before the `INSERT` statement

I'm looking for a way to manipulate rows automatically before adding them to a table in PostgreSQL. Say, for instance, we have the following table:
CREATE TABLE foo (
id serial NOT NULL,
value integer NOT NULL,
CONSTRAINT "Foo_pkey" PRIMARY KEY (id),
CONSTRAINT "Foo_value_check" CHECK (value >= 0)
);
Now one can insert rows:
INSERT INTO foo (id,value) VALUES ('0','2');
And when one enters:
INSERT INTO foo (id,value) VALUES ('1','-2');
An error will occur. Is it possible to define a "rewrite rule" that given the value column contains a value less than zero, zero is used (for instance)?
Yes, it is possible. One way is to use triggers. A trigger causes a procedure to be run on particular actions, which can allow you to modify the data to be inserted (amongst other things).
To set up a trigger, you first create a function that performs the checks and modifications you want. The variable NEW in your function is implicitly declared and contains the new row to be inserted or updated, so you can check and modify its values before they reach the table.
You then specify that this function is to be called before insert or update on one or more tables.
Example:
CREATE FUNCTION validate_foo_row()
RETURNS TRIGGER AS $$
BEGIN
IF NEW.value < 0 THEN
NEW.value := 0;
END IF;
RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER trig_validate_foo BEFORE INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE validate_foo_row();
The simple example above only fires on inserts; you might want it to fire on updates as well.
You can read more about triggers in the postgresql manual. They are powerful and are capable of a lot more than this simple example shows.
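Extending the example to updates only needs OR UPDATE in the trigger definition. A minimal sketch, assuming PostgreSQL and the `foo` table from the question; the function and trigger names are hypothetical:

```sql
DROP TABLE IF EXISTS foo;
CREATE TABLE foo (
    id serial NOT NULL,
    value integer NOT NULL,
    CONSTRAINT "Foo_pkey" PRIMARY KEY (id),
    CONSTRAINT "Foo_value_check" CHECK (value >= 0)
);

CREATE OR REPLACE FUNCTION clamp_foo_value() RETURNS trigger AS $$
BEGIN
    -- BEFORE ROW triggers run before CHECK constraints are evaluated,
    -- so clamping here keeps "Foo_value_check" from failing
    IF NEW.value < 0 THEN
        NEW.value := 0;
    END IF;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER trig_clamp_foo
BEFORE INSERT OR UPDATE ON foo
FOR EACH ROW EXECUTE PROCEDURE clamp_foo_value();

INSERT INTO foo (value) VALUES (2), (-2);   -- stored as 2 and 0
UPDATE foo SET value = -5 WHERE value = 2;  -- stored as 0
```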

PostgreSQL: trivial INSERT fails the first time, succeeds afterwards

I am puzzled by a weird Postgres problem I encounter in the trivial database shown below: If I first insert a tag and explicitly specify its ID and then try to insert another tag without passing an ID, then this second insert fails. If I try a third time (again without ID), the insert succeeds.
DROP DATABASE IF EXISTS mydb;
CREATE DATABASE mydb;
\c mydb
DROP SCHEMA public;
CREATE SCHEMA core;
CREATE TABLE core.tag
(
id serial PRIMARY KEY,
title text NOT NULL
);
-- this works: all columns specified explicitly
INSERT INTO core.tag(id, title) VALUES (1, 'known tag');
-- omitting the tag ID fails with
-- ERROR: duplicate key value violates unique constraint "tag_pkey"
-- DETAIL: Key (id)=(1) already exists.
INSERT INTO core.tag(title) VALUES ('unknown tag');
-- this works again ?!?
INSERT INTO core.tag(title) VALUES ('unknown tag');
The issue only seems to occur on a freshly created database and once it does, it does not seem to happen again. I have never come across anything like this - so far, I have just inserted data with or without explicit ID and AFAICS, nothing ever failed like this...
Does anyone have an idea what's going on here ?!?
Environment: PostgreSQL 9.1.3 on Mac OSX 10.7.5
Of course this fails.
What happens?
When you create the table, a sequence is also created that generates the values for the ID column. The sequence starts with 1 but it is only used if you do not specify a value for the ID column.
Now when you run
INSERT INTO core.tag(id, title) VALUES (1, 'known tag');
you bypass Postgres' automatic assignment of the ID value, so the sequence "stays" at one.
Now when you run
INSERT INTO core.tag(title) VALUES ('unknown tag');
Postgres takes the next value from the sequence, which is 1. But that already exists, so the insert fails. The value has been taken from the sequence anyway, so its next value is 2, and the subsequent insert without an ID value gets 2 and succeeds.
The solution is either to never include the ID column in your inserts or, if you do, to request the ID from the sequence:
INSERT INTO core.tag(id, title) VALUES (nextval('core.tag_id_seq'), 'known tag');
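The sequence advance is not undone when the insert fails, which is why the second attempt succeeds. A sketch that reproduces this in one script, assuming PostgreSQL and a hypothetical table `tag_demo` in the default schema; the DO block only catches the expected error so the script can continue:

```sql
DROP TABLE IF EXISTS tag_demo;
CREATE TABLE tag_demo (
    id    serial PRIMARY KEY,
    title text NOT NULL
);

-- Explicit id: the sequence is not touched
INSERT INTO tag_demo (id, title) VALUES (1, 'known tag');

DO $$
BEGIN
    -- nextval() hands out 1, which collides with the existing row
    INSERT INTO tag_demo (title) VALUES ('unknown tag');
EXCEPTION WHEN unique_violation THEN
    RAISE NOTICE 'first attempt failed: %', SQLERRM;
END
$$;

-- The failed attempt still consumed 1, so this row gets 2
INSERT INTO tag_demo (title) VALUES ('unknown tag');
```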
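When a serial column is created, it is automatically associated with a sequence that lives in the same schema as the table and is by default named <table_name>_<column_name>_seq. That's the name used in the statement above, qualified with core because that schema is not on the default search_path.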
More details about how the serial "data type" works are in the manual: http://www.postgresql.org/docs/current/static/datatype-numeric.html#DATATYPE-SERIAL

Manual inserts on a postgres table with a primary key sequence

I'm converting a MySQL table to PostgreSQL for the first time in my life and running into the traditional newbie problem of having no auto_increment.
Now I've found out that the postgres solution is to use a sequence and then request the nextval() of this sequence as the default value every time you insert. I've also read that the SERIAL type creates a sequence and a primary key automatically, and that nextval() increments the counter even when called inside transactions to avoid locking the sequence.
What I can't find addressed is what happens when you manually insert values into a field that has a UNIQUE or PRIMARY KEY constraint and a nextval() of a sequence as its default. As far as I can see, this causes an INSERT to fail once the sequence reaches that value.
Is there a simple (or common) way to fix this ?
A clear explanation would be very much appreciated.
Update: If you feel I shouldn't do this, will never be able to fix this or am making some flawed assumptions, please feel free to point them out in your answers. Above all, please tell me what to do instead to offer programmers a stable and robust database that can't be corrupted with a simple insert (preferably without hiding everything behind stored procedures)
If you're migrating your data, then I would drop the nextval() default on the column, perform all of your inserts, use setval() to set the sequence to the maximum value of your data, and then reinstate the column's nextval() default.
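A minimal sketch of that migration, assuming PostgreSQL and a hypothetical table `items` whose `id` column was created as serial (so its sequence is `items_id_seq` and stays owned by the column when the default is dropped):

```sql
DROP TABLE IF EXISTS items;
CREATE TABLE items (id serial PRIMARY KEY, name text);

-- 1. Stop generating ids while the migrated rows are loaded
ALTER TABLE items ALTER COLUMN id DROP DEFAULT;

-- 2. Load the data with its original ids
INSERT INTO items (id, name) VALUES (5, 'five'), (17, 'seventeen');

-- 3. Continue the sequence after the highest migrated id
SELECT setval(pg_get_serial_sequence('items', 'id'), (SELECT max(id) FROM items));

-- 4. Reinstate the default
ALTER TABLE items ALTER COLUMN id SET DEFAULT nextval('items_id_seq');

INSERT INTO items (name) VALUES ('new');  -- gets id 18
```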
You can create a trigger that checks whether currval('id_sequence_name') >= NEW.id.
If the current session has not called nextval('id_sequence_name') yet, either directly or through the column default, currval throws an error, because it only works after the sequence has been advanced in the current session. If you use nextval and then try to insert a bigger primary key, the trigger raises its own error. In both cases the transaction is aborted.
This would prevent inserting any bad primary keys which would break serial.
Example code:
create table test (id serial primary key, value text);
create or replace function test_id_check() returns trigger language plpgsql as
$$ begin
if ( currval('test_id_seq')<NEW.id ) then
raise exception 'currval(test_id_seq)<id';
end if;
return NEW;
end; $$;
create trigger test_id_seq_check before insert or update of id on test
for each row execute procedure test_id_check();
Then inserting with default primary key will work fine:
insert into test(value) values ('a'),('b'),('c'),('d');
But inserting too big primary key will error out and abort:
insert into test(id, value) values (10,'z');
To expand on Tometzky's great answer, here is a more general version:
CREATE OR REPLACE FUNCTION check_serial() RETURNS trigger AS $$
BEGIN
IF currval(TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME || '_' || TG_ARGV[0] || '_seq') <
(row_to_json(NEW)->>TG_ARGV[0])::bigint
THEN RAISE SQLSTATE '55000'; -- same as currval() of uninitialized sequence
END IF;
RETURN NULL;
EXCEPTION
WHEN SQLSTATE '55000'
THEN RAISE 'manual entry of serial field %.%.% disallowed',
TG_TABLE_SCHEMA, TG_TABLE_NAME, TG_ARGV[0]
USING HINT = 'use DEFAULT instead of specifying value manually',
SCHEMA = TG_TABLE_SCHEMA, TABLE = TG_TABLE_NAME, COLUMN = TG_ARGV[0];
END;
$$ LANGUAGE plpgsql;
Which you can apply to any column, say test.id, thusly:
CREATE CONSTRAINT TRIGGER test_id_check
AFTER INSERT OR UPDATE OF id ON test
FOR EACH ROW EXECUTE PROCEDURE check_serial(id);
I don't exactly understand your question, but if your goal is just to do the insert and have a valid field (e.g. an id), then insert the values without the id field; that's what "default" is for. It will work.
E.g. having id serial NOT NULL and CONSTRAINT table_pkey PRIMARY KEY(id) in the table definition will auto-set the id and auto-increment a sequence named table_id_seq.
What about using a CHECK?
CREATE SEQUENCE pk_test
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 1
CACHE 1;
CREATE TABLE test (
id INT PRIMARY KEY CHECK (id=currval('pk_test')) DEFAULT nextval('pk_test'),
num int not null
);
ALTER SEQUENCE pk_test OWNED BY test.id;
-- Testing:
INSERT INTO test (num) VALUES (3) RETURNING id, num;
1,3 -- OK (first run)
2,3 -- OK (second run)
INSERT INTO test (id, num) values (30,3) RETURNING id, num;
/*
ERROR: new row for relation "test" violates check constraint "test_id_check"
DETAIL: Failing row contains (30, 3).
SQL state: 23514
*/
DROP TABLE test;