Postgres Upsert Cardinality Violation - postgresql

I am trying to insert extracted data from a sql table into a postgres table where the rows may or may not exist. If they do exist, I would like to set a specific column to its default (0)
The table is as
site_notes (
job_id text primary key,
attachment_id text,
complete int default 0);
My query is
INSERT INTO site_notes (
job_id,
attachment_id
)
VALUES
{jobs_sql}
ON CONFLICT (job_id) DO UPDATE
SET complete = DEFAULT;
However I am getting an error: psycopg2.errors.CardinalityViolation: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
Would anyone be able to advise on how to set the complete column to the default on event of a conflict ?
Many Thanks

An INSERT ... ON CONFLICT DO UPDATE statement (and indeed an UPDATE statement too) is not allowed to modify the same row more than once. It is not clear what {jobs_sql} in your question is, but it must contain several rows, and at least two of those have the same job_id.
Make sure that the same job_id does not occur more than once in the rows you want to INSERT.

Related

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This can likely be accomplished by using another datastore, one without foreign key constraints, and then when all the needed data is available, read into the database which has fk constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
CREATE TABLE order (
id INTEGER NOT NULL,
order_number,
PRIMARY KEY (id),
UNIQUE (order_number)
);
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES order(order_number),
PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) values (123) before order 123 was inserted. This will fail the fk constraint of course. But this might be the order I get the data, since it's reading from a stream that is collecting this data from multiple sources.
Also, to address #philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints. This is a mechanism that waits to do the fk constraints at the end of a transaction. I don't think it's possible to do that in my case however, since these insert statements will be run at random times whenever the data is received.
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
INSERT INTO "order" (order_number)
SELECT NEW.order_number
WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
RETURN NEW;
END
$$
LANGUAGE 'plpgsql'
# trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order()
Note: I am assuming that the id column of the order table in fact is auto increment, in which case Postgres would automatically assign a value to it when inserting as above. Most likely, this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger you query order if a matching item exists, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO order ...
ON CONFLICT ON (order_number) DO UPDATE SET
id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.

Appending array during Postgresql upsert & getting ambiguous column error

When trying to do a query like below:
INSERT INTO employee_channels (employee_id, channels)
VALUES ('46356699-bed1-4ec4-9ac1-76f124b32184', '{a159d680-2f2e-4ba7-9498-484271ad0834}')
ON CONFLICT (employee_id)
DO UPDATE SET channels = array_append(channels, 'a159d680-2f2e-4ba7-9498-484271ad0834')
WHERE employee_id = '46356699-bed1-4ec4-9ac1-76f124b32184'
AND NOT lower(channels::text)::text[] #> ARRAY['a159d680-2f2e-4ba7-9498-484271ad0834'];
I get the following error
[42702] ERROR: column reference "channels" is ambiguous Position: 245
The specific reference to channels it's referring to is the 'channels' inside array_append.
channels is a CITEXT[] data type
You may need to specify the EXCLUDED table in your set statement.
SET channels = array_append(EXCLUDED.channels, 'a159d680-2f2e-4ba7-9498-484271ad0834')
When using the ON CONFLICT DO UPDATE clause the values that aren't inserted because of the conflict are stored in the EXCLUDED table. Its an ephemeral table you don't have to actually make, the way NEW and OLD are in triggers.
From the PostgreSQL Manual:
conflict_action specifies an alternative ON CONFLICT action. It can be either DO NOTHING, or a DO UPDATE clause specifying the exact
details of the UPDATE action to be performed in case of a conflict.
The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
existing row using the table's name (or an alias), and to rows
proposed for insertion using the special excluded table. SELECT
privilege is required on any column in the target table where
corresponding excluded columns are read.
Note that the effects of all per-row BEFORE INSERT triggers are reflected in excluded values, since those effects may have contributed
to the row being excluded from insertion.

incorrect data update on Sybase trigger execution

I have a table test_123 with the column as:
int_1 (int),
datetime_1 (datetime),
tinyint_1 (tinyint),
datetime_2 (datetime)
So when column datetime_1 is updated and the value at column tinyint_1 = 1 that time i have to update my column datetime_2 with column value of datetime_1
I have created the below trigger for this.. but with my trigger it is updating all datetime2 column values with datetime_1 column when tinyint_1 = 1 .. but i just want to update that particular row where datetime_1 value has updated( i mean changed)..
Below is the trigger..
CREATE TRIGGER test_trigger_upd
ON test_123
FOR UPDATE
AS
FOR EACH STATEMENT
IF UPDATE(datetime_1)
BEGIN
UPDATE test_123
SET test_123.datetime_2 = inserted.datetime_1
WHERE test_123.tinyint_1 = 1
END
ROW-level triggers are not supported in ASE. There are only after-statement triggers.
As commented earlier, the problem you're facing is that you need to be able to link the rows in the 'inserted' pseudo-table to the base table itself. You can only do that if there is a key -- meaning: a column that uniquely identifies a row, or a combination of columns that does so. Without that, you simply cannot identify the row that needs to be updated, since there may be multiple rows with identical column values if uniqueness is not guaranteed.
(and on a side note: not having a key in a table is bad design practice -- and this problem is one of the many reasons why).
A simple solution is to add an identity column to the table, e.g.
ALTER TABLE test_123 ADD idcol INT IDENTITY NOT NULL
You can then add a predicate 'test_123.idcol = inserted.idcol' to the trigger join.

Duplicate Key error when using INSERT DEFAULT

I am getting a duplicate key error, DB2 SQL Error: SQLCODE=-803, SQLSTATE=23505, when I try to INSERT records. The primary key is one column, INTEGER 4, Generated, and it is the first column.
the insert looks like this: INSERT INTO SCHEMA.TABLE1 values (DEFAULT, ?, ?, ...)
It's my understanding that using the value DEFAULT will just let DB2 auto-generate the key at the time of insert, which is what I want. This works most of the time, but sometimes/randomly I get the duplicate key error. Thoughts?
More specifically, I'm running against DB2 9.7.0.3, using Scriptella to copy a bunch of records from one database to another. Sometimes I can process a bunch with no problems, other times I'll get the error right away, other times after 2 records, or 20 records, or 30 records, etc. Does not seem to be a pattern, nor is it the same record every time. If I change the data to copy 1 record instead of a bunch, sometimes I'll get the error one time, then it's fine the next time.
I thought maybe some other process was inserting records during my batch program, and creating keys at the same time. However, the tables I'm copying TO should not have any other users/processes trying to INSERT records during this same time frame, although there could be READS happening.
Edit: adding create info:
Create table SCHEMA.TABLE1 (
SYSTEM_USER_KEY INTEGER NOT NULL
generated by default as identity (start with 1 increment by 1 cache 20),
COL2...,
)
alter table SCHEMA.TABLE1
add constraint SYSTEM_USER_SYSTEM_USER_KEY_IDX
Primary Key (SYSTEM_USER_KEY);
You most likely have records in your table with IDs that are bigger then the next value in your identity sequence. To find out what the current value your sequence is about at, run the following query.
select s.nextcachefirstvalue-s.cache, s.nextcachefirstvalue-s.increment
from syscat.COLIDENTATTRIBUTES as a inner join syscat.sequences as s on a.seqid=s.seqid
where a.tabschema='SCHEMA'
and a.TABNAME='TABLE1'
and a.COLNAME='SYSTEM_USER_KEY'
So basically what happened is that somehow you got records in your table with ids that are bigger then the current last value of your identity sequence. So sooner or later these ids will collide with identity generated ids.
There are different reasons on how this could have happened. One possibility is that data was loaded which already contained values for the id column or that records were inserted with an actual value for the ID. Another option is that the identity sequence was reset to start at a lower value than the max id in the table.
Whatever the cause, you may also want the fix:
SELECT MAX(<primary_key_column>) FROM onsite.forms;
ALTER TABLE <table> ALTER COLUMN <primary_key_column> RESTART WITH <number from previous query + 1>;

postgres autoincrement not updated on explicit id inserts

I have the following table in postgres:
CREATE TABLE "test" (
"id" serial NOT NULL PRIMARY KEY,
"value" text
)
I am doing following insertions:
insert into test (id, value) values (1, 'alpha')
insert into test (id, value) values (2, 'beta')
insert into test (value) values ('gamma')
In the first 2 inserts I am explicitly mentioning the id. However the table's auto increment pointer is not updated in this case. Hence in the 3rd insert I get the error:
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(1) already exists.
I never faced this problem in Mysql in both MyISAM and INNODB engines. Explicit or not, mysql always update autoincrement pointer based on the max row id.
What is the workaround for this problem in postgres? I need it because I want a tighter control for some ids in my table.
UPDATE:
I need it because for some values I need to have a fixed id. For other new entries I dont mind creating new ones.
I think it may be possible by manually incrementing the nextval pointer to max(id) + 1 whenever I am explicitly inserting the ids. But I am not sure how to do that.
That's how it's supposed to work - next_val('test_id_seq') is only called when the system needs a value for this column and you have not provided one. If you provide value no such call is performed and consequently the sequence is not "updated".
You could work around this by manually setting the value of the sequence after your last insert with explicitly provided values:
SELECT setval('test_id_seq', (SELECT MAX(id) from "test"));
The name of the sequence is autogenerated and is always tablename_columnname_seq.
In the recent version of Django, this topic is discussed in the documentation:
Django uses PostgreSQL’s SERIAL data type to store auto-incrementing
primary keys. A SERIAL column is populated with values from a sequence
that keeps track of the next available value. Manually assigning a
value to an auto-incrementing field doesn’t update the field’s
sequence, which might later cause a conflict.
Ref: https://docs.djangoproject.com/en/dev/ref/databases/#manually-specified-autoincrement-pk
There is also management command manage.py sqlsequencereset app_label ... that is able to generate SQL statements for resetting sequences for the given app name(s)
Ref: https://docs.djangoproject.com/en/dev/ref/django-admin/#django-admin-sqlsequencereset
For example these SQL statements were generated by manage.py sqlsequencereset my_app_in_my_project:
BEGIN;
SELECT setval(pg_get_serial_sequence('"my_project_aaa"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_aaa";
SELECT setval(pg_get_serial_sequence('"my_project_bbb"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_bbb";
SELECT setval(pg_get_serial_sequence('"my_project_ccc"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_ccc";
COMMIT;
It can be done automatically using a trigger. This way you are sure that the largest value is always used as the next default value.
CREATE OR REPLACE FUNCTION set_serial_id_seq()
RETURNS trigger AS
$BODY$
BEGIN
EXECUTE (FORMAT('SELECT setval(''%s_%s_seq'', (SELECT MAX(%s) from %s));',
TG_TABLE_NAME,
TG_ARGV[0],
TG_ARGV[0],
TG_TABLE_NAME));
RETURN OLD;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER set_mytable_id_seq
AFTER INSERT OR UPDATE OR DELETE
ON mytable
FOR EACH STATEMENT
EXECUTE PROCEDURE set_serial_id_seq('mytable_id');
The function can be reused for multiple tables. Change "mytable" to the table of interest.
For more info regarding triggers:
https://www.postgresql.org/docs/9.1/plpgsql-trigger.html
https://www.postgresql.org/docs/9.1/sql-createtrigger.html