My PSQL after insert trigger fails to insert into another table when ON DUPLICATE encounters a dupilcate - postgresql

I am slowly working through a feature where I am importing large csv files. The contents of the csv file has a chance that when it is uploaded the contents will trigger a uniqueness conflict. I've combed stack overflow for some similar resources but I still can't seem to get my trigger to update another table when a duplicate entry is found. The following code is what I have currently implemented with my line of logic for this process. Also, this is implemented in a rails app but the underlying sql is the following.
When a user uploads a file, the following happens when its processed.
CREATE TEMP TABLE codes_temp ON COMMIT DROP AS SELECT * FROM codes WITH NO DATA;
create or replace function log_duplicate_code()
returns trigger
language plpgsql
as
$$
begin
insert into duplicate_codes(id, campaign_id, code_batch_id, code, code_id, created_at, updated_at)
values (gen_random_uuid(), excluded.campaign_id, excluded.code_batch_id, excluded.code, excluded.code_id, now(), now());
return null;
end;
$$
create trigger log_duplicate_code
after insert on codes
for each row execute procedure log_duplicate_code();
INSERT INTO codes SELECT * FROM codes_temp ct
ON CONFLICT (campaign_id, code)
DO update set updated_at = excluded.updated_at;
DROP TRIGGER log_duplicate_code ON codes;
When I try to run this process nothing happens at all. If I were to have a csv file with this value CODE01 and then upload again with CODE01 the duplicate_codes table doesn't get populated at all and I don't understand why. There is no error that gets triggered or anything so it seems like DO UPATE..... is doing something. What am I missing here?
I also have some questions that come to my mind even if this were to work as intended. For example, I am uploading millions of these codes, etc.
1). Should my trigger be a statement trigger instead of a row for scalability?
2). What if someone else tries to upload another file that has millions of codes? I have my code wrapped in a transaction. Would a new separate trigger be created? Will this conflict with a previously processing process?
####### EDIT #1 #######
Thanks to Adriens' comment I do see that After Insert does not have the OLD key phrase. I updated my code to use EXCLUDED and I receive the following error for the trigger.
ERROR: missing FROM-clause entry for table "excluded" (PG::UndefinedTable)
Finally, here are the S.O. posts I've used to try to tailor my code but I just can't seem to make it work.
####### EDIT #2 #######
I have a little more context on to how this is implemented.
When the CSV is loaded, a staging table called codes_temp is created and dropped at the end of the transaction. This table contains no unique constraints. From what I read only the actual table that I want to insert codes should have the unique constraint error.
In my INSERT statement, the DO update set updated_at = excluded.updated_at; doesn't trigger a unique constraint error. As of right now, I don't know if it should or not. I borrowed this logic taken from this s.o. question postgresql log into another table with on conflict it seemed to me like I had to update something if I specify the DO UPDATE SET clause.
Last, the correct criteria for codes in the database is the following:
For example, this is an example entry in my codes table
id, campaign_id, code
1, 1, CODE01
2, 1, CODE02
3, 1, CODE03
If any of these codes appear again somewhere, This should not be inserted into the codes table but it needs to be inserted into the duplicate_codes table because they were already uploaded before.
id, campaign_id, code
1, 1, CODE01.
2, 1, CODE02
3, 1, CODE03
As for the codes_temp table I don't have any unique constraints, so there is no criteria to select the right one.
postgresql log into another table with on conflict
Postgres insert on conflict update using other table
Postgres on conflict - insert to another table
How to do INSERT INTO SELECT and ON DUPLICATE UPDATE in PostgreSQL 9.5?

Seems to me something like:
INSERT INTO
codes
SELECT
distinct on(campaign_id, code) *
FROM
codes_temp ct
ORDER BY
campaign_id, code, id DESC;
Assuming id was assigned sequentially, the above would select the most recent row into codes.
Then:
INSERT INTO
duplicate_codes
SELECT
*
FROM
codes_temp AS ct
LEFT JOIN
codes
ON
ct.id = codes.id
WHERE
codes.id IS NULL;
The above would select the rows in codes_temp that where not selected into codes into the duplicates table.
Obviously not tested on your data set. I would create a small test data set that has uniqueness conflicts and test with.

Related

PostgreSQL - Left pad value on COPY

I'm bringing in data from Excel into a PostgreSQL Db. There's a lot wrong with this data, but one thing that seems to connect several tables is a customer_id.
However, in the customer table I've a unique char(8) that always has a leading zero. Yes, if it were up to me I'd enforce this data weren't so screwy upstream, but I'm dealing with sales folks here, manufacturing there, financing, etc.
And, the customer id ALMOST matches through these various sources! It is just that the customer_id some data doesn't have the leading zero, so customers.id = '01234567' does represent orders.customer_id = '1234567'.
I'm using COPY command in Postgres, which is a new thing to me. Unfortunately, I cannot define a foreign key relationship on customer.id because of this small discrepancy.
How would I do a COPY and tell the column value to add a leading zero? Is this possible? I'm hoping I can do it right in the COPY statement? Thanks for any insight in how to do this!
EDIT:
A comment lead me to this documentation. I'll update with an answer after I figure this out. Looks like an ON BEFORE INSERT is what I'll need.
CREATE TRIGGER trigger_name
{BEFORE | AFTER} { event }
ON table_name
[FOR [EACH] { ROW | STATEMENT }]
EXECUTE PROCEDURE trigger_function
I'm the original poster and this is the answer to my question. I was bringing in data from XLS to PG and the leading zeros on customer_id(s) were dropped when exporting XLS to CSV for a COPY into PG.
Thanks be to an answer here that really pointed me down the right path: Postgresql insert trigger to set value
-- create table
CREATE TABLE T (customer_id char(8));
-- draft function to be used by trigger. NOTE the double single quotes.
CREATE FUNCTION lpad_8_0 ()
RETURNS trigger AS '
BEGIN
NEW.customer_id := (SELECT LPAD(NEW.customer_id, 8, ''0''));
RETURN NEW;
END' LANGUAGE 'plpgsql';
-- setup on before insert trigger to execute lpad_8_0 function
CREATE TRIGGER my_on_before_insert_trigger
BEFORE INSERT ON T
FOR EACH ROW
EXECUTE PROCEDURE lpad_8_0();
-- some sample inserts
INSERT INTO T
VALUES ('1234'), ('7');
Here's a working fiddle: http://sqlfiddle.com/#!17/a176e/1/0
NOTE: If the value here were larger than char(8) the COPY will still fail.

PL/pgSQL query in PostgreSQL returns result for new, empty table

I am learning to use triggers in PostgreSQL but run into an issue with this code:
CREATE OR REPLACE FUNCTION checkAdressen() RETURNS TRIGGER AS $$
DECLARE
adrCnt int = 0;
BEGIN
SELECT INTO adrCnt count(*) FROM Adresse
WHERE gehoert_zu = NEW.kundenId;
IF adrCnt < 1 OR adrCnt > 3 THEN
RAISE EXCEPTION 'Customer must have 1 to 3 addresses.';
ELSE
RAISE EXCEPTION 'No exception';
END IF;
END;
$$ LANGUAGE plpgsql;
I create a trigger with this procedure after freshly creating all my tables so they are all empty. However the count(*) function in the above code returns 1.
When I run SELECT count(*) FROM adresse; outside of PL/pgSQL, I get 0.
I tried using the FOUND variable but it is always true.
Even more strangely, when I insert some values into my tables and then delete them again so that they are empty again, the code works as intended and count(*) returns 0.
Also if I leave out the WHERE gehoert_zu = NEW.kundenId, count(*) returns 0 which means I get more results with the WHERE clause than without.
--Edit:
Here is an example of how I use the procedure:
CREATE TABLE kunde (
kundenId int PRIMARY KEY
);
CREATE TABLE adresse (
id int PRIMARY KEY,
gehoert_zu int REFERENCES kunde
);
CREATE CONSTRAINT TRIGGER adressenKonsistenzTrigger AFTER INSERT ON Kunde
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE checkAdressen();
INSERT INTO kunde VALUES (1);
INSERT INTO adresse VALUES (1,1);
It looks like I am getting the DEFERRABLE INITIALLY DEFERRED part wrong. I assumed the trigger would be executed after the first INSERT statement but it happens after the second one, although the inserts are not inside a BEGIN; - COMMIT; - Block.
According to the PostgreSQL Documentation inserts are commited automatically every time if not inside such a block and thus there shouldn't be an entry in adresse when the first INSERT statement is commited.
Can anyone point out my mistake?
--Edit:
The trigger and DEFERRABLE INITIALLY DEFERRED seem to be working all right.
My mistake was to assume that since I am not using a BEGIN-COMMIT-Block each insert would be executed in an own transaction with the trigger being executed afterwards every time.
However even without the BEGIN-COMMIT all inserts get bundled into one transaction and the trigger is executed afterwards.
Given this behaviour, what is the point in using BEGIN-COMMIT?
You need a transaction plus the "DEFERRABLE INITIALLY DEFERRED" because of the chicken and egg problem.
starting with two empty tables:
you cannot insert a single row into the person table, because the it needs at least one address.
you cannot insert a single row into the address table, because the FK constraint needs a corresponding row on the person table to exist
This is why you need to bundle the two inserts into one operation: the transaction. You need the BEGIN+ COMMIT, and the DEFERRABLE allows transient forbidden database states to exists: it causes the check to be evaluated at commit time.
This may seem a bit silly, but the answer is you need to stop deferring the trigger and run it BEFORE the insert. If you run it after the insert, of course there is data in the table.
As far as I can tell this is working as expected.
One further note, you probably dont mean:
RAISE EXCEPTION 'No Exception';
You probably want
RAISE INFO 'No Exception';
Then you can change your settings and run queries in transactions to test that the trigger does what you want it to do. As it is, every insert is going to fail and you have no way to move this into production without editing your procedure.

Postgresql function not working as expected with INSERT INTO

I have function to insert data from one table to another
$BODY$
BEGIN
INSERT INTO backups.calls2 (uid,queue_id,connected,callerid2)
SELECT distinct (c.uid) ,c.queue_id,c.connected,c.callerid2
FROM public.calls c
WHERE c.connected is not null;
RETURN;
EXCEPTION WHEN unique_violation THEN NULL;
END;
$BODY$
And structure of table:
CREATE TABLE backups.nc_calls_id
(
uid character(30) NOT NULL,
queue_id integer,
callerid2 text,
connected timestamp without time zone,
id serial NOT NULL,
CONSTRAINT calls2_pkey PRIMARY KEY (uid)
)
WITH (
OIDS=FALSE
);
When I have first executed this query, everything went ok, 200000 rows was inserted to new table with unique Id.
But now, when I executing it again, no rows are being inserted
From the rather minimalist description given (no PostgreSQL version, no CREATE FUNCTION statement showing params etc, no other table structure, no function invocation) I'm guessing that you're attempting to do a merge, where you insert a row only if it doesn't exist by skipping rows if they already exist.
What the above function will do is skip all rows if any row already exists.
You need to either use a loop to do the insert within individual BEGIN ... EXCEPTION blocks (slow) or LOCK the table and do an INSERT INTO ... SELECT ... FROM newtable WHERE NOT EXISTS (SELECT 1 FROM oldtable where oldtable.key = newtable.key).
The INSERT INTO ... SELECT ... WHERE NOT EXISTS method will perform a lot better but will fail if more than one runs concurrently or if anything else inserts into the destination table at the same time. LOCKing the destination table before running it will make sure it's safe.
The PL/PgSQL looping BEGIN ... EXCEPTION method sounds nice and safe at first glance. Then you think about what happens when you run two of them at once. One will insert some keys first, one will insert other keys first, so they have a split of the values between them. That's OK, together they make up the full set. But what if only one of them commits and the other fails for some reason? You'll have an interesting sparsely inserted result. For that reason it's probably best to lock the destination table if using this approach too ... in which case you might as well use the vastly more efficient single pass INSERT with subquery-based uniqueness violation check.

use trigger to insert into table if data is not already present

I have two tables with the same structure. Table 1 has multiple rows which can have same values. Now i want to insert the same rows into table 2 excluding duplicate rows. I am able to do this normally using 'minus', but i want to write a trigger such that if a new row is inserted into table 1 and is not present in table 2 then insert in table 2 otherwise not. I am new to triggers. The trigger i have written gives me "trigger is mutating" error when i insert in table 1.
INSERT INTO t3(name1,name2,num1,num2) select name1,name2,num1,num2 from t1 group by name1,name2,num1,num2 minus select * from t3
when i write the above code it works fine but when i include this into my trigger it gives error. How do i perform the above with the help of a trigger?
Please help,
Thanks
Pranay
You don't need to requery the table from a row-level trigger. That's what the :NEW. syntax is for, e.g.:
INSERT INTO t3(name1,name2,num1,num2)
select :NEW.name1,:NEW.name2,:NEW.num1,:NEW.num2 from DUAL
minus select name1,name2,num1,num2 from t3;
Although I think the above code looks a bit silly. I'd prefer to put a unique constraint on t3 then add a handler in the trigger to take care of any DUP_VAL_ON_INDEX exceptions.

how to emulate "insert ignore" and "on duplicate key update" (sql merge) with postgresql?

Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.
What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?
With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):
INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")
9.5 brings support for "UPSERT" operations.
INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.
...
Further example of new syntax:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!
Building on Bill Karwin's answer, to spell out what a rule based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):
CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
WHERE EXISTS(SELECT 1 FROM my_table
WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";
Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.
For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;
For those of us who have an earlier version, this right join will work instead:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
Try to do an UPDATE. If it doesn't modify any row that means it didn't exist, so do an insert. Obviously, you do this inside a transaction.
You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a loop for the very rare race condition in that thinking.
There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.
That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.
This works for single row, or few row, values. If you're dealing with large amounts of rows for example from a subquery, you're best of splitting it into two queries, one for INSERT and one for UPDATE (as an appropriate join/subselect of course - no need to write your main filter twice)
To get the insert ignore logic you can do something like below. I found simply inserting from a select statement of literal values worked best, then you can mask out the duplicate keys with a NOT EXISTS clause. To get the update on duplicate logic I suspect a pl/pgsql loop would be necessary.
INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
('935',' Citroën Brazil','Citroën'),
('ABC', 'Toyota', 'Toyota'),
('ZOM',' OM','OM')
) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
WHERE NOT EXISTS (
--ignore anything that has already been inserted
SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
INSERT INTO mytable(col1,col2)
SELECT 'val1','val2'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')
As #hanmari mentioned in his comment. when inserting into a postgres tables, the on conflict (..) do nothing is the best code to use for not inserting duplicate data.:
query = "INSERT INTO db_table_name(column_name)
VALUES(%s) ON CONFLICT (column_name) DO NOTHING;"
The ON CONFLICT line of code will allow the insert statement to still insert rows of data. The query and values code is an example of inserted date from a Excel into a postgres db table.
I have constraints added to a postgres table I use to make sure the ID field is unique. Instead of running a delete on rows of data that is the same, I add a line of sql code that renumbers the ID column starting at 1.
Example:
q = 'ALTER id_column serial RESTART WITH 1'
If my data has an ID field, I do not use this as the primary ID/serial ID, I create a ID column and I set it to serial.
I hope this information is helpful to everyone.
*I have no college degree in software development/coding. Everything I know in coding, I study on my own.
Looks like PostgreSQL supports a schema object called a rule.
http://www.postgresql.org/docs/current/static/rules-update.html
You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.
I haven't tried this myself, so I can't speak from experience or offer an example.
This solution avoids using rules:
BEGIN
INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION
WHEN unique_violation THEN
UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END;
but it has a performance drawback (see PostgreSQL.org):
A block containing an EXCEPTION clause is significantly more expensive
to enter and exit than a block without one. Therefore, don't use
EXCEPTION without need.
On bulk, you can always delete the row before the insert. A deletion of a row that doesn't exist doesn't cause an error, so its safely skipped.
For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:
DO
$do$
BEGIN
PERFORM id
FROM whatever_table;
IF NOT FOUND THEN
-- INSERT stuff
END IF;
END
$do$;