Postgresql function not working as expected with INSERT INTO - postgresql

I have function to insert data from one table to another
$BODY$
BEGIN
INSERT INTO backups.calls2 (uid,queue_id,connected,callerid2)
SELECT distinct (c.uid) ,c.queue_id,c.connected,c.callerid2
FROM public.calls c
WHERE c.connected is not null;
RETURN;
EXCEPTION WHEN unique_violation THEN NULL;
END;
$BODY$
And structure of table:
CREATE TABLE backups.nc_calls_id
(
uid character(30) NOT NULL,
queue_id integer,
callerid2 text,
connected timestamp without time zone,
id serial NOT NULL,
CONSTRAINT calls2_pkey PRIMARY KEY (uid)
)
WITH (
OIDS=FALSE
);
When I have first executed this query, everything went ok, 200000 rows was inserted to new table with unique Id.
But now, when I executing it again, no rows are being inserted

From the rather minimalist description given (no PostgreSQL version, no CREATE FUNCTION statement showing params etc, no other table structure, no function invocation) I'm guessing that you're attempting to do a merge, where you insert a row only if it doesn't exist by skipping rows if they already exist.
What the above function will do is skip all rows if any row already exists.
You need to either use a loop to do the insert within individual BEGIN ... EXCEPTION blocks (slow) or LOCK the table and do an INSERT INTO ... SELECT ... FROM newtable WHERE NOT EXISTS (SELECT 1 FROM oldtable where oldtable.key = newtable.key).
The INSERT INTO ... SELECT ... WHERE NOT EXISTS method will perform a lot better but will fail if more than one runs concurrently or if anything else inserts into the destination table at the same time. LOCKing the destination table before running it will make sure it's safe.
The PL/PgSQL looping BEGIN ... EXCEPTION method sounds nice and safe at first glance. Then you think about what happens when you run two of them at once. One will insert some keys first, one will insert other keys first, so they have a split of the values between them. That's OK, together they make up the full set. But what if only one of them commits and the other fails for some reason? You'll have an interesting sparsely inserted result. For that reason it's probably best to lock the destination table if using this approach too ... in which case you might as well use the vastly more efficient single pass INSERT with subquery-based uniqueness violation check.

Related

Accidentally detected all data from a table, insert dummy data to table using loop psql

I had accidentally deleted most of the rows in my Postgres table (data is not important its in my test environment, but I need a dummy data to be insert in to these table).
Let us take three tables,
MAIN_TABLE(main_table_id, main_fields)
ADDRESS_TABLE(address_table_id, main_table_id, address_type, other_fielsds)
CHAID_TABLE(chaid_table_id,main_table_id, shipping_address_id, chaild_fields)
I had accidentally deleted most of the data from ADDRESS_TABLE.
ADDRESS_TABLE has a foreign key from MAIN_TABLE ,i.e. main_table_id. for each row in MAIN_TABLE there is two entries in ADDRESS_TABLE, in which one entry is its address_type is "billing/default" and other entry is for address_type "shipping".
CHAID_TABLE has two foreign keys one from MAIN_TABLE, i.e. main_table_id and other from ADDRESS_TABLE i.e., shipping_address_id. this shipping_address_id is address id of ADDRESS_TABLE, its address_type is shipping and ADDRESS_TABLE.main_table_id = CHAID_TABLE.main_table_id.
These are the things that I needed.
I need to create two dummy address entries for each raw in MAIN_TABLE one is of address type "billing/default" and other is of type "shipping".
I need to insert address_table_id to the CHAID_TABLE whose ADDRESS_TABLE.main_table_id = CHAID_TABLE.main_table_id. and addres_type ="shipping"
if first is done I know how to insert second, because it is a simple insert query. I guess.
it can be done like,
UPDATE CHAID_TABLE
SET shipping_address_id = ADDRESS_TABLE.address_table_id
FROM ADDRESS_TABLE
WHERE ADDRESS_TABLE.main_table_id = CHAID_TABLE.main_table_id
AND ADDRESS_TABLE.addres_type ='shipping';
for doing first one i can use loop in psql, ie loop through all the entries in MAIN_TABLE and insert two dummy rows for each rows. But I don't know how to do these please help me to solve this.
I hope your solution is this, Create a function that loop through all the rows in MAIN_TABLE, inside the loop do the action you want, here two insert statement, one issue of this solution is you have same data in all address.
CREATE OR REPLACE FUNCTION get_all_MAIN_TABLE () RETURNS SETOF MAIN_TABLE AS
$BODY$
DECLARE
r MAIN_TABLE %rowtype;
BEGIN
FOR r IN
SELECT * FROM MAIN_TABLE
LOOP
-- can do some processing here
INSERT INTO ADDRESS_TABLE ( main_table_id, address_type, other_fielsds)
VALUES('shipping', r.main_table_id,'NAME','','other_fielsds');
INSERT INTO ADDRESS_TABLE ( main_table_id, address_type, other_fielsds)
VALUES('billing/default',r.main_table_id,'NAME','','other_fielsds');
END LOOP;
RETURN;
END
$BODY$
LANGUAGE plpgsql;
SELECT * FROM get_all_MAIN_TABLE ();

Is it safe to insert data from inside of a CTE expression?

Background
I have a function that generates detail rows and inserts these into a detail table. I want to automatically ensure that the referenced master row is available before inserting the detail rows.
A BEFORE INSERT trigger does the job, but unfortunately the job is done too well. If a detail row is prevented due to a unique index, the master row will still be inserted, leaving me with a master without children (I don't want that).
I managed to solve this by inserting the master rows inside of a cte and then inserting the detail rows in the actual query. This works, but I'm worried that it's not a safe way of doing it.
INSERT rows from inside of a CTE expression
Here is the code to try this out. The input_cte is just generating static dummy data in the example. This particular example could be built with separate static insert SQLs, but in my real case the input_cte is dynamic.
Is this a safe way to solve this, or is it out of spec and could blow up in the next PG version?
CREATE TABLE master (
id INT NOT NULL GENERATED BY DEFAULT AS IDENTITY,
PRIMARY KEY (id)
);
CREATE TABLE detail (
id INT NOT NULL GENERATED BY DEFAULT AS IDENTITY,
master_id INT,
PRIMARY KEY (id, master_id)
);
ALTER TABLE detail
ADD CONSTRAINT detail_master_id_fkey
FOREIGN KEY (master_id)
REFERENCES master (id);
WITH input_cte AS (
SELECT 1 AS master_id, 1 AS detail_id
UNION SELECT 1, 2
UNION SELECT 2, 1
),
insert_cte AS (
--Ignore conflicts, as the master row could already exist
INSERT INTO master (id)
SELECT DISTINCT master_id
FROM input_cte
ON CONFLICT DO NOTHING
)
INSERT INTO detail (id, master_id)
SELECT detail_id, master_id
FROM input_cte;
SELECT * FROM master;
SELECT * FROM detail;
Edit
Bummer.. I just realised that this method will also insert master rows even if the detail rows would be stopped by my unique index.
I see two options. Which should I choose?
Check exactly what I can insert before trying to insert. I.e. do the same thing as my unique index does.
Go ahead with the above solution or the BEFORE INSERT trigger concept and then clean up unused master rows afterwards with a separate DELETE query.
Edit 2 as a reponse to Haleemur Ali's comment
I agree with Haleemur, but in my case it's a bit more complicated than the small example I created. The unique key on my detail can actually have null values. The index in my detail table (project_sequence) looks like this to enable null values:
CREATE UNIQUE INDEX
project_sequence_unique_combinations ON main.project_sequence
(project_id, controlpoint_type_id, COALESCE(drawing_id, 0), COALESCE(layer_guid, '00000000-0000-0000-0000-000000000000'));
Due to possible NULL values I cannot use these fields in my primary key, so I have a surrogate integer key. I'm calculating these key values in my CTE so they will always be unique. I.e. they can be inserted into the master table (sequence) even if the detail rows gets stopped by the unique index.*
To clarify I've inserted my actual code below. This code works as it should, but it sure would be nice to be able to utilize the new fancy DEFERRED and REFERENCES triggers.
SELECT COALESCE(MAX(id), 0)
INTO _max_sequence_id
FROM main.sequence;
WITH cte AS (
SELECT
d.project_id,
pct.controlpoint_type_id,
d.id as drawing_id,
DENSE_RANK() OVER(ORDER BY d.id, sequence_group_key) + _max_sequence_id + 1 AS new_sequence_id
FROM main.drawing d
CROSS JOIN main.project_controlpoint_type pct
--This left JOIN along with "ps.project_id IS NULL" is my
--current solution, i.e. its "option 1" from above.
LEFT JOIN main.project_sequence ps
ON ps.project_id = d.project_id
AND ps.drawing_id = d.id
AND ps.controlpoint_type_id = pct.controlpoint_type_id
WHERE d.project_id = _project_id
AND pct.project_id = _project_id
AND pct.sequence_level_id = 2
AND ps.project_id IS NULL
),
insert_sequence_cte AS (
INSERT INTO main.sequence
(id, project_id, last_value)
SELECT DISTINCT cte.new_sequence_id, cte.project_id, 0
FROM cte
ON CONFLICT DO NOTHING
)
INSERT INTO main.project_sequence
(project_id, controlpoint_type_id, drawing_id, sequence_id)
SELECT
project_id,
controlpoint_type_id,
drawing_id,
new_sequence_id
FROM cte;
The manual page on WITH queries states that your use case is legitimate and supported:
You can use data-modifying statements (INSERT, UPDATE, or DELETE) in WITH.
and
... data-modifying statements are only allowed in WITH clauses that are attached to the top-level statement. However, normal WITH visibility rules apply, so it is possible to refer to the WITH statement's output from the sub-SELECT.
Further:
If a data-modifying statement in WITH lacks a RETURNING clause, then it forms no temporary table and cannot be referred to in the rest of the query. Such a statement will be executed nonetheless.
If the rule "no master without a detail" is important to your model, you should let the DB enforce it. This will free you from "auto-inserting" master to reduce the potential for error and let you use conventional methods again.
Take a look at CONSTRAINT TRIGGERS. They allow you to just detect and reject violations of your no-master-without-detail rule while leaving the actual INSERTs to application code.
Your use case would need a CONSTRAINT TRIGGER on your master table that is DEFERRABLE INITIALLY DEFERRED. This allows you to INSERT master, then INSERT detail and still be sure that the transaction only commits if all is consistent.
From the manual linked above:
Constraint triggers must be AFTER ROW triggers on plain tables (not foreign tables). They can be fired either at the end of the statement causing the triggering event, or at the end of the containing transaction; in the latter case they are said to be deferred.
You'll need two triggers, one handling INSERT/UPDATE on master and another one handling DELETE/UPDATE on detail:
CREATE CONSTRAINT TRIGGER trigger_assert_master_has_detail
AFTER INSERT OR UPDATE OF id ON master
DEFERRABLE INITIALLY DEFERRED FOR EACH ROW
EXECUTE PROCEDURE assert_master_has_detail();
CREATE CONSTRAINT TRIGGER trigger_assert_no_leftover_master
AFTER DELETE OR UPDATE OF master_id ON detail
DEFERRABLE INITIALLY DEFERRED FOR EACH ROW
EXECUTE PROCEDURE assert_no_leftover_master();
Note that UPDATEs will only fire the trigger if they concern the FK/PK column.
The two trigger functions will then check if there are 1-n details for the master:
CREATE FUNCTION assert_master_has_detail() RETURNS trigger
AS
$$
BEGIN
IF NOT EXISTS (SELECT 1 FROM detail WHERE master_id = NEW.id)
THEN
RAISE EXCEPTION 'no detail for master_id=%', NEW.id;
ELSE
RETURN NEW;
END IF;
END;
$$
LANGUAGE plpgsql;
CREATE FUNCTION assert_no_leftover_master() RETURNS trigger
AS
$$
BEGIN
IF NOT EXISTS (SELECT 1 FROM detail WHERE master_id = OLD.master_id)
AND EXISTS (SELECT 1 FROM master WHERE id = OLD.master_id)
THEN
RAISE EXCEPTION 'last detail for master_id=% removed, but master still exists', OLD.master_id;
ELSE
RETURN NULL;
END IF;
END;
$$
LANGUAGE plpgsql;
Example of a violation:
INSERT INTO master
VALUES (1);
-- ERROR: no detail for master_id=1
-- CONTEXT: PL/pgSQL function assert_master_has_detail() line 5 at RAISE
and a legal scenario:
INSERT INTO master
VALUES (1);
INSERT INTO detail
VALUES (10, 1);
-- trigger fired at end of transaction, finds everything is OK
Here's a complete solution as dbfiddle.

PL/pgSQL query in PostgreSQL returns result for new, empty table

I am learning to use triggers in PostgreSQL but run into an issue with this code:
CREATE OR REPLACE FUNCTION checkAdressen() RETURNS TRIGGER AS $$
DECLARE
adrCnt int = 0;
BEGIN
SELECT INTO adrCnt count(*) FROM Adresse
WHERE gehoert_zu = NEW.kundenId;
IF adrCnt < 1 OR adrCnt > 3 THEN
RAISE EXCEPTION 'Customer must have 1 to 3 addresses.';
ELSE
RAISE EXCEPTION 'No exception';
END IF;
END;
$$ LANGUAGE plpgsql;
I create a trigger with this procedure after freshly creating all my tables so they are all empty. However the count(*) function in the above code returns 1.
When I run SELECT count(*) FROM adresse; outside of PL/pgSQL, I get 0.
I tried using the FOUND variable but it is always true.
Even more strangely, when I insert some values into my tables and then delete them again so that they are empty again, the code works as intended and count(*) returns 0.
Also if I leave out the WHERE gehoert_zu = NEW.kundenId, count(*) returns 0 which means I get more results with the WHERE clause than without.
--Edit:
Here is an example of how I use the procedure:
CREATE TABLE kunde (
kundenId int PRIMARY KEY
);
CREATE TABLE adresse (
id int PRIMARY KEY,
gehoert_zu int REFERENCES kunde
);
CREATE CONSTRAINT TRIGGER adressenKonsistenzTrigger AFTER INSERT ON Kunde
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE checkAdressen();
INSERT INTO kunde VALUES (1);
INSERT INTO adresse VALUES (1,1);
It looks like I am getting the DEFERRABLE INITIALLY DEFERRED part wrong. I assumed the trigger would be executed after the first INSERT statement but it happens after the second one, although the inserts are not inside a BEGIN; - COMMIT; - Block.
According to the PostgreSQL Documentation inserts are commited automatically every time if not inside such a block and thus there shouldn't be an entry in adresse when the first INSERT statement is commited.
Can anyone point out my mistake?
--Edit:
The trigger and DEFERRABLE INITIALLY DEFERRED seem to be working all right.
My mistake was to assume that since I am not using a BEGIN-COMMIT-Block each insert would be executed in an own transaction with the trigger being executed afterwards every time.
However even without the BEGIN-COMMIT all inserts get bundled into one transaction and the trigger is executed afterwards.
Given this behaviour, what is the point in using BEGIN-COMMIT?
You need a transaction plus the "DEFERRABLE INITIALLY DEFERRED" because of the chicken and egg problem.
starting with two empty tables:
you cannot insert a single row into the person table, because the it needs at least one address.
you cannot insert a single row into the address table, because the FK constraint needs a corresponding row on the person table to exist
This is why you need to bundle the two inserts into one operation: the transaction. You need the BEGIN+ COMMIT, and the DEFERRABLE allows transient forbidden database states to exists: it causes the check to be evaluated at commit time.
This may seem a bit silly, but the answer is you need to stop deferring the trigger and run it BEFORE the insert. If you run it after the insert, of course there is data in the table.
As far as I can tell this is working as expected.
One further note, you probably dont mean:
RAISE EXCEPTION 'No Exception';
You probably want
RAISE INFO 'No Exception';
Then you can change your settings and run queries in transactions to test that the trigger does what you want it to do. As it is, every insert is going to fail and you have no way to move this into production without editing your procedure.

PostgreSQL, triggers, and concurrency to enforce a temporal key

I want to define a trigger in PostgreSQL to check that the inserted row, on a generic table, has the the property: "no other row exists with the same key in the same valid time" (the keys are sequenced keys). In fact, I has already implemented it. But since the trigger has to scan the entire table, now i'm wondering: is there a need for a table-level lock? Or this is managed someway by the PostgreSQL itself?
Here is an example.
In the upcoming PostgreSQL 9.0 I would have defined the table in this way:
CREATE TABLE medicinal_products
(
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
market_time PERIOD,
EXCLUDE USING gist
(aic_code CHECK WITH =,
market_time CHECK WITH &&)
);
but in fact I have been defined it like this:
CREATE TABLE medicinal_products
(
PRIMARY KEY (aic_code, vs),
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
vs DATE NOT NULL,
ve DATE,
CONSTRAINT valid_time_range
CHECK (ve > vs OR ve IS NULL)
);
Then, I have written a trigger that check the costraint: "two distinct medicinal products can have the same code in two different periods, but not in same time".
So the code:
INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
return an error.
One solution is to have a second table to use for detecting clashes, and populate that with a trigger. Using the schema you added into the question:
CREATE TABLE medicinal_product_date_map(
aic_code char(9) NOT NULL,
applicable_date date NOT NULL,
UNIQUE(aic_code, applicable_date));
(note: this is the second attempt due to misreading your requirement the first time round. hope it's right this time).
Some functions to maintain this table:
CREATE FUNCTION add_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
INSERT INTO medicinal_product_date_map
SELECT $1, $2 + offset
FROM generate_series(0, $3 - $2)
$$;
CREATE FUNCTION clr_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
DELETE FROM medicinal_product_date_map
WHERE aic_code = $1 AND applicable_date BETWEEN $2 AND $3
$$;
And populate the table first time with:
SELECT count(add_medicinal_product_date_range(aic_code, vs, ve))
FROM medicinal_products;
Now create triggers to populate the date map after changes to medicinal_products: after insert calls add_, after update calls clr_ (old values) and add_ (new values), after delete calls clr_.
CREATE FUNCTION sync_medicinal_product_date_map()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
IF TG_OP = 'UPDATE' OR TG_OP = 'DELETE' THEN
PERFORM clr_medicinal_product_date_range(OLD.aic_code, OLD.vs, OLD.ve);
END IF;
IF TG_OP = 'UPDATE' OR TG_OP = 'INSERT' THEN
PERFORM add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve);
END IF;
RETURN NULL;
END;
$$;
CREATE TRIGGER sync_date_map
AFTER INSERT OR UPDATE OR DELETE ON medicinal_products
FOR EACH ROW EXECUTE PROCEDURE sync_medicinal_product_date_map();
The uniqueness constraint on medicinal_product_date_map will trap any products being added with the same code on the same day:
steve#steve#[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT 0 1
steve#steve#[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
ERROR: duplicate key value violates unique constraint "medicinal_product_date_map_aic_code_applicable_date_key"
DETAIL: Key (aic_code, applicable_date)=(1 , 2010-03-01) already exists.
CONTEXT: SQL function "add_medicinal_product_date_range" statement 1
SQL statement "SELECT add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve)"
PL/pgSQL function "sync_medicinal_product_date_map" line 6 at PERFORM
This depends on the values being checked for having a discrete space- which is why I asked about dates vs timestamps. Although timestamps are, technically, discrete since Postgresql only stores microsecond-resolution, adding an entry to the map table for every microsecond the product is applicable for is not practical.
Having said that, you could probably also get away with something better than a full-table scan to check for overlapping timestamp intervals, with some trickery on looking for only the first interval not after or not before... however, for easy discrete spaces I prefer this approach which IME can also be handy for other things too (e.g. reports that need to quickly find which products are applicable on a certain day).
I also like this approach because it feels right to leverage the database's uniqueness-constraint mechanism this way. Also, I feel it will be more reliable in the context of concurrent updates to the master table: without locking the table against concurrent updates, it would be possible for a validation trigger to see no conflict and allow inserts in two concurrent sessions, that are then seen to conflict when both transaction's effects are visible.
Just a thought, in case the valid time blocks could be coded with a number or something, creating a UNIQUE index on Id+TimeBlock would be blazingly fast and resolve all table lock problems.
It is managed by PostgreSQL itself. On a select it acquires an ACCESS_SHARE lock which means that you can query the table but do not perform updates.
A radical solution which might help you is to use a cache like ehcache or memcached to store the id/timeblock info and not use the postgresql at all. Many can be persisted so they would survive a server restart and they do not exhibit this locking behavior.
Why can't you use a UNIQUE constraint? Will be much faster (it's an index) and easier.

how to emulate "insert ignore" and "on duplicate key update" (sql merge) with postgresql?

Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.
What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?
With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):
INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")
9.5 brings support for "UPSERT" operations.
INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.
...
Further example of new syntax:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!
Building on Bill Karwin's answer, to spell out what a rule based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):
CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
WHERE EXISTS(SELECT 1 FROM my_table
WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";
Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.
For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;
For those of us who have an earlier version, this right join will work instead:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
Try to do an UPDATE. If it doesn't modify any row that means it didn't exist, so do an insert. Obviously, you do this inside a transaction.
You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a loop for the very rare race condition in that thinking.
There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.
That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.
This works for single row, or few row, values. If you're dealing with large amounts of rows for example from a subquery, you're best of splitting it into two queries, one for INSERT and one for UPDATE (as an appropriate join/subselect of course - no need to write your main filter twice)
To get the insert ignore logic you can do something like below. I found simply inserting from a select statement of literal values worked best, then you can mask out the duplicate keys with a NOT EXISTS clause. To get the update on duplicate logic I suspect a pl/pgsql loop would be necessary.
INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
('935',' Citroën Brazil','Citroën'),
('ABC', 'Toyota', 'Toyota'),
('ZOM',' OM','OM')
) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
WHERE NOT EXISTS (
--ignore anything that has already been inserted
SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
INSERT INTO mytable(col1,col2)
SELECT 'val1','val2'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')
As #hanmari mentioned in his comment. when inserting into a postgres tables, the on conflict (..) do nothing is the best code to use for not inserting duplicate data.:
query = "INSERT INTO db_table_name(column_name)
VALUES(%s) ON CONFLICT (column_name) DO NOTHING;"
The ON CONFLICT line of code will allow the insert statement to still insert rows of data. The query and values code is an example of inserted date from a Excel into a postgres db table.
I have constraints added to a postgres table I use to make sure the ID field is unique. Instead of running a delete on rows of data that is the same, I add a line of sql code that renumbers the ID column starting at 1.
Example:
q = 'ALTER id_column serial RESTART WITH 1'
If my data has an ID field, I do not use this as the primary ID/serial ID, I create a ID column and I set it to serial.
I hope this information is helpful to everyone.
*I have no college degree in software development/coding. Everything I know in coding, I study on my own.
Looks like PostgreSQL supports a schema object called a rule.
http://www.postgresql.org/docs/current/static/rules-update.html
You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.
I haven't tried this myself, so I can't speak from experience or offer an example.
This solution avoids using rules:
BEGIN
INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION
WHEN unique_violation THEN
UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END;
but it has a performance drawback (see PostgreSQL.org):
A block containing an EXCEPTION clause is significantly more expensive
to enter and exit than a block without one. Therefore, don't use
EXCEPTION without need.
On bulk, you can always delete the row before the insert. A deletion of a row that doesn't exist doesn't cause an error, so its safely skipped.
For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:
DO
$do$
BEGIN
PERFORM id
FROM whatever_table;
IF NOT FOUND THEN
-- INSERT stuff
END IF;
END
$do$;