Is there a way to generate some set of rows exactly once on demand in Postgres? - postgresql

The basic problem is that we are managing a significant amount of generated rows, and it is mission critical that this data is generated exactly once and only if necessary. Suppose you have a data relation:
CREATE TABLE sometable (
id SERIAL,
refID INTEGER,
...
);
Now, in some PL/PGSQL function we have:
...
-- Advisory locks didn't help here? :(
IF FALSE = SELECT EXISTS( SELECT 1 FROM sometable WHERE refID = dataID) THEN
-- Generate fixed number of new rows in sometable that reference dataID.
END IF;
...
In short, the rows that should not be generated more than once some times are. As noted, advisory locks of the form PERFORM pg_advisory_lock(dataID) sadly did not help prevent this. Is there any hope?
EDIT: Forgot to mention that I ran into the duplicate data issue when testing with pgbench.
EDIT 2: Incorrect code fix, clarify issue.

Perhaps the simplest solution is just to have a separate processed_ids table with a unique constraint on the id in question. Your function can try to insert to that table and if there is an exception then that ID is already processed.

Related

Generate a unique uuid for each row in a table with Postgres

I have a UUID constraint set up as my Id field in one of my tables, however despite this (I think due to the fact that uuid_generate_v4 only creates on UUID per transaction?) when I imported a load of CSV data into my table, each row in the table was given the same UUID.
I want to be able to change this and give each row a unique UUID, however running
update monitors_nontest set id = uuid_generate_v1()
Again only produces one UUID for each row.
How can I change this command so that each row gets a different UUID?
Just a possibility. How did you actually generate the uuid? The function uuid_generate_v1() actually generates a different value each time it is called. But that is the key each time it is called. It seems if called within a sub select the Postgres optimizer may feel it can bypass the call and use cached result. Try the following.
create table uuid_gen( id1a uuid
, id1b uuid
, num1 integer
);
insert into uuid_gen(num1)
select generate_series(1,50);
update uuid_gen set id1a = uuid_generate_v1();
update uuid_gen set id1b = (select uuid_generate_v1());
select count(distinct id1a), count(distinct id1b) from uuid_gen;
Unfortunately, I could not find a fiddle processor that had the function uuid_generate_v1() available, nor uuid_generate_v4() which worked exactly the same.
I had not exactly the same problem, but something similar.
I was working with pre-Postgres13 version (so I did not have any function ready).
I had a table where I needed to insert new rows into the table while generating a new UUID(v4) for each new row.
I was looking everywhere but couldn't find anything w/o creating a function or installing extensions.
So I made it this way:
INSERT INTO monitors_nontest (id, col1, col2, col3)
SELECT uuid_in(md5(random()::text || random()::text)::cstring), mn.col1, mn.col2, mn.col3
FROM monitors_nontest mn
WHERE mn.col1 = 'some-text'
This could be adjusted for the UPDATE query. I hope it will help somebody else.

PostgreSQL - Left pad value on COPY

I'm bringing in data from Excel into a PostgreSQL Db. There's a lot wrong with this data, but one thing that seems to connect several tables is a customer_id.
However, in the customer table I've a unique char(8) that always has a leading zero. Yes, if it were up to me I'd enforce this data weren't so screwy upstream, but I'm dealing with sales folks here, manufacturing there, financing, etc.
And, the customer id ALMOST matches through these various sources! It is just that the customer_id some data doesn't have the leading zero, so customers.id = '01234567' does represent orders.customer_id = '1234567'.
I'm using COPY command in Postgres, which is a new thing to me. Unfortunately, I cannot define a foreign key relationship on customer.id because of this small discrepancy.
How would I do a COPY and tell the column value to add a leading zero? Is this possible? I'm hoping I can do it right in the COPY statement? Thanks for any insight in how to do this!
EDIT:
A comment lead me to this documentation. I'll update with an answer after I figure this out. Looks like an ON BEFORE INSERT is what I'll need.
CREATE TRIGGER trigger_name
{BEFORE | AFTER} { event }
ON table_name
[FOR [EACH] { ROW | STATEMENT }]
EXECUTE PROCEDURE trigger_function
I'm the original poster and this is the answer to my question. I was bringing in data from XLS to PG and the leading zeros on customer_id(s) were dropped when exporting XLS to CSV for a COPY into PG.
Thanks be to an answer here that really pointed me down the right path: Postgresql insert trigger to set value
-- create table
CREATE TABLE T (customer_id char(8));
-- draft function to be used by trigger. NOTE the double single quotes.
CREATE FUNCTION lpad_8_0 ()
RETURNS trigger AS '
BEGIN
NEW.customer_id := (SELECT LPAD(NEW.customer_id, 8, ''0''));
RETURN NEW;
END' LANGUAGE 'plpgsql';
-- setup on before insert trigger to execute lpad_8_0 function
CREATE TRIGGER my_on_before_insert_trigger
BEFORE INSERT ON T
FOR EACH ROW
EXECUTE PROCEDURE lpad_8_0();
-- some sample inserts
INSERT INTO T
VALUES ('1234'), ('7');
Here's a working fiddle: http://sqlfiddle.com/#!17/a176e/1/0
NOTE: If the value here were larger than char(8) the COPY will still fail.

Postgresql function not working as expected with INSERT INTO

I have function to insert data from one table to another
$BODY$
BEGIN
INSERT INTO backups.calls2 (uid,queue_id,connected,callerid2)
SELECT distinct (c.uid) ,c.queue_id,c.connected,c.callerid2
FROM public.calls c
WHERE c.connected is not null;
RETURN;
EXCEPTION WHEN unique_violation THEN NULL;
END;
$BODY$
And structure of table:
CREATE TABLE backups.nc_calls_id
(
uid character(30) NOT NULL,
queue_id integer,
callerid2 text,
connected timestamp without time zone,
id serial NOT NULL,
CONSTRAINT calls2_pkey PRIMARY KEY (uid)
)
WITH (
OIDS=FALSE
);
When I have first executed this query, everything went ok, 200000 rows was inserted to new table with unique Id.
But now, when I executing it again, no rows are being inserted
From the rather minimalist description given (no PostgreSQL version, no CREATE FUNCTION statement showing params etc, no other table structure, no function invocation) I'm guessing that you're attempting to do a merge, where you insert a row only if it doesn't exist by skipping rows if they already exist.
What the above function will do is skip all rows if any row already exists.
You need to either use a loop to do the insert within individual BEGIN ... EXCEPTION blocks (slow) or LOCK the table and do an INSERT INTO ... SELECT ... FROM newtable WHERE NOT EXISTS (SELECT 1 FROM oldtable where oldtable.key = newtable.key).
The INSERT INTO ... SELECT ... WHERE NOT EXISTS method will perform a lot better but will fail if more than one runs concurrently or if anything else inserts into the destination table at the same time. LOCKing the destination table before running it will make sure it's safe.
The PL/PgSQL looping BEGIN ... EXCEPTION method sounds nice and safe at first glance. Then you think about what happens when you run two of them at once. One will insert some keys first, one will insert other keys first, so they have a split of the values between them. That's OK, together they make up the full set. But what if only one of them commits and the other fails for some reason? You'll have an interesting sparsely inserted result. For that reason it's probably best to lock the destination table if using this approach too ... in which case you might as well use the vastly more efficient single pass INSERT with subquery-based uniqueness violation check.

c# ora-02291 error

This is an odd one...
I have two tables tableA and tableB
tableB has a foreign key in tableA.
I have 2 sprocs, one inserts to tableA, the other to tableB.
using odp.net I run the first sproc, inserting a record into tableA. I can then open SQLPlus and select this record
I then run the second sproc, inserting into tableB.
It fails with "ora-02291-integrity-constraint-violated-parent-key-not-found"
I have double, triple, quadruple checked for typos etc... nothing.
To make things even more odd when I do this same operation manually in SQLPlus, with the same sprocs, it works without a problem.
This is killing me 12+ hours looking for something I know has to be simple.
Here are the sprocs.
SPROCA
CREATE OR REPLACE PROCEDURE genData_TestTrackerSegment
(
INTX_ID IN IntxSegment.IntxID%TYPE,
siteid IN INT
)
AS
BEGIN
INSERT INTO INTXSEGMENT(INTXID,INTXTYPEID,VERSION,ISPRIVATE,
SEGMENTTYPE,STARTDATETIME,INTXDIRECTION,SITEID)
VALUES(INTX_ID,1,1,0,1,SYSDATE,1,siteid);
COMMIT;
END;
SPROCB
CREATE OR REPLACE PROCEDURE genData_TestTrackerPart
(
INTX_ID IN IntxSegment.IntxID%TYPE,
INTX_PART_ID IN INTX_PARTICIPANT.INTX_PART_ID%TYPE,
INDIVID IN INDIVIDUAL.INDIVID%TYPE,
CALLID IN INTX_PARTICIPANT.CALLIDKEY%TYPE
)AS
BEGIN
INSERT INTO
INTX_PARTICIPANT(INTXID,INTX_PART_ID,INDIVID,ROLE,
CALLIDKEY,RECORDED,VERSION,STARTDATETIME)
VALUES(INTX_ID,INTX_PART_ID,INDIVID,1,CALLID,1,1,SYSDATE);
COMMIT;
END;
Yeah I am more than certain - it is without a shadow of a doubt that FKEY.
That being said I fixed this..... This is sooooooo stupid by the way.
I was under the (mistaken) assumption that 'named parameters' in ODP.NET meant I did not have to add these parameters in the same order they are referenced in the stored procedure. Long story short - after I re-wrote this about 4 times I modified the order of the parameters and it is now fixed. –
For me, ORA-02291 error means that one of the FK used is not a valid pk or not an existing pk from the other table.
Just Make sure that each FK inserted in your TABLE is a valid one.

how to emulate "insert ignore" and "on duplicate key update" (sql merge) with postgresql?

Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.
What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?
With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):
INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")
9.5 brings support for "UPSERT" operations.
INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.
...
Further example of new syntax:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!
Building on Bill Karwin's answer, to spell out what a rule based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):
CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
WHERE EXISTS(SELECT 1 FROM my_table
WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";
Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.
For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;
For those of us who have an earlier version, this right join will work instead:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
Try to do an UPDATE. If it doesn't modify any row that means it didn't exist, so do an insert. Obviously, you do this inside a transaction.
You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a loop for the very rare race condition in that thinking.
There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.
That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.
This works for single row, or few row, values. If you're dealing with large amounts of rows for example from a subquery, you're best of splitting it into two queries, one for INSERT and one for UPDATE (as an appropriate join/subselect of course - no need to write your main filter twice)
To get the insert ignore logic you can do something like below. I found simply inserting from a select statement of literal values worked best, then you can mask out the duplicate keys with a NOT EXISTS clause. To get the update on duplicate logic I suspect a pl/pgsql loop would be necessary.
INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
('935',' Citroën Brazil','Citroën'),
('ABC', 'Toyota', 'Toyota'),
('ZOM',' OM','OM')
) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
WHERE NOT EXISTS (
--ignore anything that has already been inserted
SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
INSERT INTO mytable(col1,col2)
SELECT 'val1','val2'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')
As #hanmari mentioned in his comment. when inserting into a postgres tables, the on conflict (..) do nothing is the best code to use for not inserting duplicate data.:
query = "INSERT INTO db_table_name(column_name)
VALUES(%s) ON CONFLICT (column_name) DO NOTHING;"
The ON CONFLICT line of code will allow the insert statement to still insert rows of data. The query and values code is an example of inserted date from a Excel into a postgres db table.
I have constraints added to a postgres table I use to make sure the ID field is unique. Instead of running a delete on rows of data that is the same, I add a line of sql code that renumbers the ID column starting at 1.
Example:
q = 'ALTER id_column serial RESTART WITH 1'
If my data has an ID field, I do not use this as the primary ID/serial ID, I create a ID column and I set it to serial.
I hope this information is helpful to everyone.
*I have no college degree in software development/coding. Everything I know in coding, I study on my own.
Looks like PostgreSQL supports a schema object called a rule.
http://www.postgresql.org/docs/current/static/rules-update.html
You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.
I haven't tried this myself, so I can't speak from experience or offer an example.
This solution avoids using rules:
BEGIN
INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION
WHEN unique_violation THEN
UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END;
but it has a performance drawback (see PostgreSQL.org):
A block containing an EXCEPTION clause is significantly more expensive
to enter and exit than a block without one. Therefore, don't use
EXCEPTION without need.
On bulk, you can always delete the row before the insert. A deletion of a row that doesn't exist doesn't cause an error, so its safely skipped.
For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:
DO
$do$
BEGIN
PERFORM id
FROM whatever_table;
IF NOT FOUND THEN
-- INSERT stuff
END IF;
END
$do$;