PostgreSQL: update if a row with some unique value exists, else insert

I have a URLs table with these columns:
(id int primary key,
url character varying unique,
content character varying,
last_analyzed date).
I want to create a trigger or something (a rule, maybe) so that each time I insert from my Java program, it updates the single existing row if a row with that URL already exists. Otherwise it should perform an insert.
Can you please provide complete code for PostgreSQL? Thanks.

This has been asked many times. A possible solution can be found here:
https://stackoverflow.com/a/6527838/552671
This solution requires both an UPDATE and INSERT.
UPDATE my_table SET field='C', field2='Z' WHERE id=3;
INSERT INTO my_table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM my_table WHERE id=3);
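Applied to the question's URLs table, the two-statement pattern might look like the sketch below. It assumes a simplified temporary table (column names follow the other answers; the URL and content values are made up). On its own it is not safe when two sessions upsert the same new URL at the same moment: both can find no row, and both then attempt the INSERT.

```sql
-- Hypothetical, simplified version of the question's table.
CREATE TEMP TABLE urls (
    id            serial PRIMARY KEY,
    url           text UNIQUE,
    content       text,
    last_analyzed date
);

-- Step 1: update the row if the URL is already present (no-op otherwise).
UPDATE urls
SET    content = 'new content', last_analyzed = current_date
WHERE  url = 'https://example.com/';

-- Step 2: insert only if no row with that URL exists yet.
INSERT INTO urls (url, content, last_analyzed)
SELECT 'https://example.com/', 'new content', current_date
WHERE  NOT EXISTS (SELECT 1 FROM urls WHERE url = 'https://example.com/');
```

Running both statements again updates the existing row instead of inserting a duplicate.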
With Postgres 9.1 it is possible to do it with one query:
https://stackoverflow.com/a/1109198/2873507

If INSERTs are rare, I would avoid doing a NOT EXISTS (...), since it runs an extra SELECT on every update. Instead, take a look at wildpeaks' answer: https://dba.stackexchange.com/questions/5815/how-can-i-insert-if-key-not-exist-with-postgresql
CREATE OR REPLACE FUNCTION upsert_tableName(arg1 type, arg2 type) RETURNS VOID AS $$
DECLARE
BEGIN
UPDATE tableName SET col1 = value WHERE colX = arg1 and colY = arg2;
IF NOT FOUND THEN
INSERT INTO tableName values (value, arg1, arg2);
END IF;
END;
$$ LANGUAGE plpgsql;
This way Postgres first tries an UPDATE. If no rows were affected, it falls back to an INSERT.
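A concrete version of that function for the question's table could look like this sketch. The table definition, the function name upsert_url_content, and the sample values are assumptions for illustration. It shares the race of the two-statement approach: two sessions can both miss in the UPDATE and both try to INSERT.

```sql
-- Hypothetical, simplified version of the question's table.
CREATE TEMP TABLE urls (
    id            serial PRIMARY KEY,
    url           text UNIQUE,
    content       text,
    last_analyzed date
);

CREATE OR REPLACE FUNCTION upsert_url_content(_url text, _content text)
  RETURNS void AS
$$
BEGIN
    -- Try the (presumably common) UPDATE first.
    UPDATE urls
    SET    content = _content, last_analyzed = current_date
    WHERE  url = _url;

    -- FOUND is false when the UPDATE touched no rows.
    IF NOT FOUND THEN
        INSERT INTO urls (url, content, last_analyzed)
        VALUES (_url, _content, current_date);
    END IF;
END;
$$ LANGUAGE plpgsql;

SELECT upsert_url_content('https://example.com/', 'first');   -- inserts
SELECT upsert_url_content('https://example.com/', 'second');  -- updates
```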

I found this post more relevant in this scenario:
WITH upsert AS (
UPDATE spider_count SET tally=tally+1
WHERE date='today' AND spider='Googlebot'
RETURNING *
)
INSERT INTO spider_count (spider, tally)
SELECT 'Googlebot', 1
WHERE NOT EXISTS (SELECT * FROM upsert);
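To try that statement, a table shaped like the following is assumed (this definition is hypothetical; the original post does not show one). The date column defaults to the current date because the INSERT branch does not set it. Data-modifying CTEs need PostgreSQL 9.1 or later.

```sql
-- Hypothetical table for the spider_count example above.
CREATE TEMP TABLE spider_count (
    spider text    NOT NULL,
    date   date    NOT NULL DEFAULT current_date,
    tally  integer NOT NULL,
    PRIMARY KEY (spider, date)
);

-- First run: the UPDATE matches nothing, so the INSERT adds tally = 1.
-- Later runs: the UPDATE returns a row, so the INSERT is skipped.
WITH upsert AS (
    UPDATE spider_count SET tally = tally + 1
    WHERE  date = 'today' AND spider = 'Googlebot'
    RETURNING *
)
INSERT INTO spider_count (spider, tally)
SELECT 'Googlebot', 1
WHERE  NOT EXISTS (SELECT * FROM upsert);
```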

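It first tries to insert. If there is a conflict on the url column, it updates the content and last_analyzed fields instead. If updates are rare, this might be the better option. This requires PostgreSQL 9.5 or later; the %(url)s and %(content)s placeholders are Python (psycopg2) named parameters, so from JDBC you would use ? markers instead.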
INSERT INTO URLs (url, content, last_analyzed)
VALUES
(
%(url)s,
%(content)s,
NOW()
)
ON CONFLICT (url)
DO
UPDATE
SET content=%(content)s, last_analyzed = NOW();
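A variant of that statement, sketched as a server-side prepared statement: $1 and $2 take the place of the %(url)s / %(content)s placeholders, and EXCLUDED refers to the row that was proposed for insertion, so the new content does not have to be passed twice. The table definition and the statement name upsert_url_stmt are assumptions; ON CONFLICT (url) needs a unique constraint or unique index on url.

```sql
-- Hypothetical, simplified table; url must be UNIQUE for ON CONFLICT (url).
CREATE TEMP TABLE urls (
    id            serial PRIMARY KEY,
    url           text UNIQUE,
    content       text,
    last_analyzed timestamptz
);

PREPARE upsert_url_stmt(text, text) AS
    INSERT INTO urls (url, content, last_analyzed)
    VALUES ($1, $2, now())
    ON CONFLICT (url) DO UPDATE
    SET content       = EXCLUDED.content,       -- value from the rejected row
        last_analyzed = EXCLUDED.last_analyzed;

EXECUTE upsert_url_stmt('https://example.com/', 'first');   -- inserts
EXECUTE upsert_url_stmt('https://example.com/', 'second');  -- updates
```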

create table urls (
url_id serial primary key,
url text unique,
content text,
last_analyzed timestamptz);
insert into urls(url) values('hello'),
('How'),('are'),
('you'),('doing');
By creating a procedure, you can also do an upsert (procedures require PostgreSQL 11 or later):
CREATE OR REPLACE PROCEDURE upsert_url(_url text) LANGUAGE plpgsql
as $$
BEGIN
INSERT INTO URLs (url) values (_url)
ON CONFLICT (url)
DO UPDATE SET last_analyzed = NOW();
END
$$;
Test it by calling the procedure:
call upsert_url('I am is ok');
call upsert_url('hello');

Related

Return the value changed by an update without a trigger

Postgres has a great RETURNING clause for INSERT, DELETE and UPDATE...and it's made me a bit greedy. In a few cases, what I'd like to get is not only the current value, but the previous value:
UPDATE analytic_productivity
SET points = 1000
WHERE points > 1000
RETURNING id,
points,
OLD.points;
I don't believe there's any way to access previous values outside of the lifespan and context of a trigger. So, I'll guess what I'd like isn't possible as such. If that's right, can anyone suggest an alternative? I'm overwriting outliers with some set values, and would like to record the modified values in another table. This is why I don't know the current value in advance. This is a rare (and clearly suspect) operation, and I don't want to record the change on normal inserts and updates.
As an alternative, I'm thinking that I can select the outliers, revise them, and then write back the modifications. So, do most of the work on the client side with a couple of requests to Postgres. If so, can someone suggest the right locking level to apply between my initial SELECT and my following UPDATE? I believe that the FOR UPDATE lock is right.
Any suggestions on a smart way to capture previous values, during an update, without a trigger would be great to hear about.
Follow-up
Thanks to comments here, I experimented a bit and came up with a solution that works in my case. To make my objectives clearer:
I've got a table named outlier_rule that defines values that are too high for a specific column.
The goal is to loop over the table, and apply the rules to set outliers to a fixed value.
Stomping on outliers like this is...questionable. There must be leaks in the app's UI that allow for unreasonable values. To help track these down, I'm recording the large values in a table named outlier_change.
I'd like to push this behavior into server-side function so that any of our servers, regardless of their codebase version, can invoke the current logic.
The client servers compose and send an email with a result summary, when outliers are found and corrected.
So, a server-side function to do everything, log some data, and return a result. I've got that working, but it's got the smell of You Don't Know What You're Doing So Just Keep Adding Code Until it Works. I've at least got a better handle on using FORMAT and think I understand now that a single function can do many things, and that you can choose what to return with the RETURN clause. For reference, the various bits of code:
CREATE TABLE IF NOT EXISTS data.outlier_rule (
id uuid NOT NULL DEFAULT extensions.gen_random_uuid(),
schema_name text NOT NULL DEFAULT NULL,
table_name text NOT NULL DEFAULT NULL,
column_name text NOT NULL DEFAULT NULL,
threshold integer,
set_to integer,
CONSTRAINT outlier_rule_id_pkey
PRIMARY KEY (schema_name,table_name,column_name)
);
For tracking the modifications, I've got a second table named outlier_change:
------------------------------
-- Table
------------------------------
DROP TABLE IF EXISTS data.outlier_change CASCADE;
CREATE TABLE IF NOT EXISTS data.outlier_change (
id uuid NOT NULL DEFAULT NULL,
outlier_rule_id uuid NOT NULL DEFAULT NULL,
value_was integer NOT NULL DEFAULT NULL,
set_to integer NOT NULL DEFAULT NULL,
change_count integer NOT NULL DEFAULT 0,
last_changed_dts timestamptz NOT NULL DEFAULT NOW(),
CONSTRAINT outlier_change_id_pkey
PRIMARY KEY (id,outlier_rule_id)
);
ALTER TABLE data.outlier_change OWNER TO user_change_structure;
------------------------------
-- Trigger Function
------------------------------
CREATE OR REPLACE FUNCTION data.on_outlier_change_upsert()
RETURNS pg_catalog.trigger AS $BODY$
BEGIN
NEW.last_changed_dts := NOW();
IF TG_OP = 'UPDATE' THEN -- OLD is not usable in an INSERT trigger
NEW.change_count := OLD.change_count + 1;
END IF;
RETURN NEW; -- important!
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
------------------------------
-- Trigger
------------------------------
CREATE TRIGGER outlier_change_upsert BEFORE INSERT OR UPDATE ON data.outlier_change
FOR EACH ROW
EXECUTE PROCEDURE data.on_outlier_change_upsert();
DROP FUNCTION IF EXISTS data.outlier_fix ();
CREATE OR REPLACE FUNCTION data.outlier_fix ()
RETURNS TABLE (
schema_name text,
table_name text,
column_name text,
id uuid,
value_was integer,
set_to integer,
change_count integer
)
AS $$
DECLARE
rule record;
now_ timestamptz = NOW();
BEGIN
FOR rule IN SELECT * FROM data.outlier_rule LOOP
EXECUTE FORMAT (
'INSERT INTO outlier_change (
outlier_rule_id,
set_to,
id,
value_was)
SELECT %6$L,
%5$s,
%2$I.id,
%2$I.%3$I
FROM %1$I.%2$I
WHERE %3$I > %4$s
ON CONFLICT(id,outlier_rule_id) DO UPDATE SET
value_was = EXCLUDED.value_was,
set_to = EXCLUDED.set_to
RETURNING outlier_rule_id,
id,
value_was,
set_to,
change_count;
UPDATE %1$I.%2$I
SET %3$I = %5$s
WHERE %3$I > %4$s;',
rule.schema_name,
rule.table_name,
rule.column_name,
rule.threshold,
rule.set_to,
rule.id);
END LOOP;
RETURN QUERY EXECUTE ('
SELECT outlier_rule.schema_name,
outlier_rule.table_name,
outlier_rule.column_name,
outlier_change.id,
outlier_change.value_was,
outlier_change.set_to,
outlier_change.change_count
FROM outlier_change
JOIN outlier_rule ON (outlier_rule.id = outlier_change.outlier_rule_id)
WHERE last_changed_dts = $1')
USING now_;
END;
$$ LANGUAGE plpgsql;
ALTER FUNCTION data.outlier_fix() OWNER TO user_bender;
You could achieve that with a bit of a hack. You can self join the table in your update query like this:
UPDATE analytic_productivity NEW
SET points = 1000
FROM analytic_productivity OLD
WHERE NEW.points > 1000
and NEW.id = OLD.id
RETURNING NEW.id,
NEW.points,
OLD.points as old_points;
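Since the goal is to record the overwritten values in another table, the self-join can be combined with a data-modifying CTE (PostgreSQL 9.1 or later), so the update and the logging happen in one statement. This is a sketch with hypothetical tables and data; the aliases cur and prev stand in for NEW and OLD only for readability.

```sql
-- Hypothetical tables and data for illustration.
CREATE TEMP TABLE analytic_productivity (id int PRIMARY KEY, points int);
CREATE TEMP TABLE outlier_log (id int, old_points int, new_points int);
INSERT INTO analytic_productivity VALUES (1, 500), (2, 1500), (3, 2500);

-- The self-joined alias "prev" reads the table as it was before this
-- statement, so its columns expose the pre-update values.
WITH changed AS (
    UPDATE analytic_productivity AS cur
    SET    points = 1000
    FROM   analytic_productivity AS prev
    WHERE  cur.points > 1000
    AND    cur.id = prev.id
    RETURNING cur.id, prev.points AS old_points, cur.points AS new_points
)
INSERT INTO outlier_log (id, old_points, new_points)
SELECT id, old_points, new_points
FROM   changed;
```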

How to use variable settings in trigger functions?

I would like to record the id of a user in the session/transaction, using SET, so that I can access it later in a trigger function using current_setting. Basically, I'm trying option no. 2 from a very similar question posted previously, except that I'm using PG 10.1.
I've been trying 3 approaches to setting the variable:
SET local myvars.user_id = 4, thereby setting it locally in the transaction;
SET myvars.user_id = 4, thereby setting it in the session;
SELECT set_config('myvars.user_id', '4', false), which, depending on the last argument, is a shortcut for either of the previous two options.
None of them is usable in the trigger, which receives NULL when getting the variable through current_setting. Here is a script I've devised to troubleshoot it (can be easily used with the postgres docker image):
database=$POSTGRES_DB
user=$POSTGRES_USER
[ -z "$user" ] && user="postgres"
psql -v ON_ERROR_STOP=1 --username "$user" $database <<-EOSQL
DROP TRIGGER IF EXISTS add_transition1 ON houses;
CREATE TABLE IF NOT EXISTS houses (
id SERIAL NOT NULL,
name VARCHAR(80),
created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now(),
PRIMARY KEY(id)
);
CREATE TABLE IF NOT EXISTS transitions1 (
id SERIAL NOT NULL,
house_id INTEGER,
user_id INTEGER,
created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now(),
PRIMARY KEY(id),
FOREIGN KEY(house_id) REFERENCES houses (id) ON DELETE CASCADE
);
CREATE OR REPLACE FUNCTION add_transition1() RETURNS TRIGGER AS \$\$
DECLARE
user_id integer;
BEGIN
user_id := current_setting('myvars.user_id')::integer || NULL;
INSERT INTO transitions1 (user_id, house_id) VALUES (user_id, NEW.id);
RETURN NULL;
END;
\$\$ LANGUAGE plpgsql;
CREATE TRIGGER add_transition1 AFTER INSERT OR UPDATE ON houses FOR EACH ROW EXECUTE PROCEDURE add_transition1();
BEGIN;
SELECT current_setting('myvars.user_id'); -- 1.
SELECT set_config('myvars.user_id', '55', false); -- 2.
SELECT current_setting('myvars.user_id'); -- 3.
INSERT INTO houses (name) VALUES ('HOUSE PARTY') RETURNING houses.id;
SELECT * from houses;
SELECT * from transitions1;
COMMIT;
DROP TRIGGER IF EXISTS add_transition1 ON houses;
DROP FUNCTION IF EXISTS add_transition1;
DROP TABLE transitions1;
DROP TABLE houses;
EOSQL
The conclusion I came to was that the function is triggered in a different transaction and a different (?) session. Is this something that one can configure, so that all happens within the same context?
Handle all possible cases for the customized option properly:
1. Option not set yet
All references to it raise an exception, including current_setting() unless called with the second parameter missing_ok. The manual:
If there is no setting named setting_name, current_setting throws an error unless missing_ok is supplied and is true.
2. Option set to a valid integer literal
3. Option set to an invalid integer literal
4. Option reset (which boils down to a special case of 3.)
For instance, if you set a customized option with SET LOCAL or set_config('myvars.user_id3', '55', true), the option value is reset at the end of the transaction. It still exists and can be referenced, but it now returns an empty string (''), which cannot be cast to integer.
Obvious mistakes in your demo aside, you need to prepare for all 4 cases. So:
CREATE OR REPLACE FUNCTION add_transition1()
RETURNS trigger AS
$func$
DECLARE
_user_id text := current_setting('myvars.user_id', true); -- see 1.
BEGIN
IF _user_id ~ '^\d+$' THEN -- one or more digits?
INSERT INTO transitions1 (user_id, house_id)
VALUES (_user_id::int, NEW.id); -- valid int, cast is safe
ELSE
INSERT INTO transitions1 (user_id, house_id)
VALUES (NULL, NEW.id); -- use NULL instead
RAISE WARNING 'Invalid user_id % for house_id % was reset to NULL!'
, quote_literal(_user_id), NEW.id; -- optional
END IF;
RETURN NULL; -- OK for AFTER trigger
END
$func$ LANGUAGE plpgsql;
Notes:
Avoid variable names that match column names. Very error prone. One popular naming convention is to prepend variable names with an underscore: _user_id.
Assign at declaration time to save one assignment. Note the data type text. We'll cast later, after sorting out invalid input.
Avoid raising / trapping an exception if possible. The manual:
A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.
Test for valid integer strings. This simple regular expression allows only digits (no leading sign, no white space): _user_id ~ '^\d+$'. I reset to NULL for any invalid input. Adapt to your needs.
I added an optional WARNING for your debugging convenience.
Cases 3. and 4. only arise because customized options are stored as strings (type text), so valid data types cannot be enforced automatically.
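A small end-to-end check of the four cases, assuming simplified temporary versions of the question's tables and a condensed copy of the function above (the optional WARNING is left out; the house names are made up). It also shows that the trigger runs in the same session and transaction as the INSERT, so it does see the setting. Case 1 only holds in a fresh session where myvars.user_id has never been set.

```sql
-- Simplified stand-ins for the question's tables (temporary, demo only).
CREATE TEMP TABLE houses (id serial PRIMARY KEY, name varchar(80));
CREATE TEMP TABLE transitions1 (
    id       serial PRIMARY KEY,
    house_id integer REFERENCES houses (id) ON DELETE CASCADE,
    user_id  integer
);

-- Condensed copy of the trigger function above.
CREATE OR REPLACE FUNCTION add_transition1()
  RETURNS trigger AS
$func$
DECLARE
    _user_id text := current_setting('myvars.user_id', true);  -- NULL if unset
BEGIN
    IF _user_id ~ '^\d+$' THEN
        INSERT INTO transitions1 (user_id, house_id) VALUES (_user_id::int, NEW.id);
    ELSE
        INSERT INTO transitions1 (user_id, house_id) VALUES (NULL, NEW.id);
    END IF;
    RETURN NULL;  -- result is ignored for AFTER triggers
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER add_transition1
AFTER INSERT OR UPDATE ON houses
FOR EACH ROW EXECUTE PROCEDURE add_transition1();

-- Case 1: option never set in this session -> NULL is logged.
INSERT INTO houses (name) VALUES ('unset');

-- Case 2: valid integer, visible to the trigger in the same transaction.
BEGIN;
SET LOCAL myvars.user_id = '55';
INSERT INTO houses (name) VALUES ('valid');
COMMIT;

-- Case 4: after COMMIT the option still exists but is '' -> NULL.
INSERT INTO houses (name) VALUES ('reset');

-- Case 3: invalid integer literal, set for the whole session -> NULL.
SELECT set_config('myvars.user_id', 'abc', false);
INSERT INTO houses (name) VALUES ('invalid');
```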
Related:
User defined variables in PostgreSQL
Is there a way to define a named constant in a PostgreSQL query?
All that aside, there may be more elegant solutions for what you are trying to do without customized options, depending on your exact requirements. Maybe this:
Fastest way to get current user's OID in Postgres?
It is not clear why you are concatenating NULL to user_id, but that is obviously the cause of the problem. Get rid of it:
CREATE OR REPLACE FUNCTION add_transition1() RETURNS TRIGGER AS $$
DECLARE
user_id integer;
BEGIN
user_id := current_setting('myvars.user_id')::integer;
INSERT INTO transitions1 (user_id, house_id) VALUES (user_id, NEW.id);
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Note that
SELECT 55 || NULL
always gives NULL.
You can catch the exception when the value doesn't exist. Here are the changes I made to get this to work:
CREATE OR REPLACE FUNCTION add_transition1() RETURNS TRIGGER AS $$
DECLARE
user_id integer;
BEGIN
BEGIN
user_id := current_setting('myvars.user_id')::integer;
EXCEPTION WHEN OTHERS THEN
user_id := 0;
END;
INSERT INTO transitions1 (user_id, house_id) VALUES (user_id, NEW.id);
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION insert_house() RETURNS void as $$
DECLARE
user_id integer;
BEGIN
PERFORM set_config('myvars.user_id', '55', false);
INSERT INTO houses (name) VALUES ('HOUSE PARTY');
END; $$ LANGUAGE plpgsql;

postgres trigger creates index: BEFORE INSERT ON hides one row

I have a trigger AFTER INSERT ON mytable that calls a function
CREATE OR REPLACE FUNCTION myfunction() RETURNS trigger AS
$BODY$
DECLARE
index TEXT;
BEGIN
index := 'myIndex_' || NEW.id2::text;
IF to_regclass(index::cstring) IS NULL THEN
EXECUTE 'CREATE INDEX ' || index || ' ON mytable(id) WITH (FILLFACTOR=100) WHERE id2=' || NEW.id2|| ';';
RAISE NOTICE 'Created new index %',index;
END IF;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
SECURITY DEFINER
COST 100;
ALTER FUNCTION myfunction()
OWNER TO theadmin;
This works wonderfully. For each distinct id2 I create an index. Speeds up relevant queries by a lot.
As mentioned above I trigger this AFTER INSERT ON. Before doing that however I had the trigger set to BEFORE INSERT ON. And the function did some strange things. (Yes, I had changed the RETURN NULL to RETURN NEW)
inserting a new row with insert into mytable VALUES(1391, 868, 0.5, 0.5);
creates the corresponding index myIndex_868
the inserted row does not appear in mytable when doing a select :(
trying to insert the same row results in ERROR: duplicate key value violates unique constraint "mytable_pkey" because of course DETAIL: Key (id, id2)=(1391, 868) already exists.
inserting other rows for the same id2 works as expected :)
DELETE FROM mytable WHERE id = 1391 and id2 = 868 does nothing
DROP INDEX myIndex_868; drops the index, and suddenly the initial row that never appeared in the table is there!
Why does BEFORE INSERT ON behave so differently? Is this a bug in postgres 9.4 or did I overlook something?
Just for completeness' sake:
CREATE TRIGGER mytrigger
AFTER INSERT ON mytable
FOR EACH ROW EXECUTE PROCEDURE myfunction();
vs.
CREATE TRIGGER mytrigger
BEFORE INSERT ON mytable
FOR EACH ROW EXECUTE PROCEDURE myfunction();
I'd argue that this is a bug in PostgreSQL. I could reproduce it with 9.6.
It is clear that the row is not contained in the index as it is created in the BEFORE trigger, but the fact that the index is not updated when the row is inserted is a bug in my opinion.
I have written to pgsql-hackers to ask for an opinion.
But apart from that, I don't see the point of the whole exercise.
Better than creating a gazillion indexes would be to create a single one:
CREATE INDEX ON mytable(id2, id);

Postgresql function: get id of updated or inserted row

I have this function in my PostgreSQL database that updates a row if it exists, or inserts a new one if it doesn't:
CREATE OR REPLACE FUNCTION insert_or_update(val1 integer, val2 integer) RETURNS VOID AS $$
DECLARE
BEGIN
UPDATE my_table SET col2 = val2 WHERE col1 = val1;
IF NOT FOUND THEN
INSERT INTO my_table (col2) values ( val2 );
END IF;
END;
$$ LANGUAGE 'plpgsql';
For now it works perfectly, but I want to get the id of the row, whether it was updated or inserted.
How can I do it?
Your function is declared as returns void so it can't return anything.
Assuming col1 is the primary key and is also defined as a serial, you can do something like this:
CREATE OR REPLACE FUNCTION insert_or_update(val1 integer, val2 integer)
RETURNS int
AS $$
DECLARE
l_id integer;
BEGIN
l_id := val1; -- initialize the local variable.
UPDATE my_table
SET col2 = val2
WHERE col1 = val1; -- !! IMPORTANT: this assumes col1 is unique !!
IF NOT FOUND THEN
INSERT INTO my_table (col2) values ( val2 )
RETURNING col1 -- this makes the generated value available
into l_id; -- and this stores it in the local variable
END IF;
return l_id; -- return whichever was used.
END;
$$ LANGUAGE plpgsql;
I changed four things compared to your function:
the function is declared as returns integer in order to be able to return something
you need a variable where you can store the returned value from the insert statement
the generated value needs to be returned
and finally, the language name is an identifier, so it should not be quoted with single quotes (the quoted form is only accepted for backward compatibility).
If you want to distinguish between an update or an insert from the caller, you could initialize l_id to null. In that case the function will return null if an update occurred and some value otherwise.
You can get the last inserted ID using currval(SEQUENCE_NAME_OF_TABLE).
But the best way is always to use INSERT or UPDATE queries with a RETURNING clause, storing the result with INTO:
CREATE OR REPLACE FUNCTION insert_or_update(val1 integer, val2 integer) RETURNS integer AS $$
DECLARE
l_id integer; -- receives the id from RETURNING ... INTO
BEGIN
UPDATE my_table SET col2 = val2 WHERE col1 = val1 RETURNING col1 INTO l_id;
IF NOT FOUND THEN
INSERT INTO my_table (col2) values ( val2 ) RETURNING col1 INTO l_id;
END IF;
RETURN l_id;
END;
$$ LANGUAGE plpgsql;
You can refer to the following examples:
Insert Command - Last Example
Postgres with RETURNING clause
Note: your UPDATE query's WHERE clause is col1 = val1. I assume val1 is a unique value; otherwise multiple records will be updated. I also assume col1 is your primary key (an ID or similar).
INSERT ... ON CONFLICT DO UPDATE, as described in the PostgreSQL wiki's entry on UPSERT, was added in PostgreSQL 9.5. It lets you express the operation you want directly, without resorting to a stored procedure and/or introducing race conditions.
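On 9.5 or later, a sketch of the question's function using INSERT ... ON CONFLICT DO UPDATE ... RETURNING, which returns the id on both paths in a single statement. It assumes col1 is a unique key supplied by the caller (in the question col1 is a serial generated only on insert, which ON CONFLICT cannot express directly); the table and the name upsert_my_table are hypothetical.

```sql
-- Hypothetical table: col1 is a caller-supplied unique key here.
CREATE TEMP TABLE my_table (col1 integer PRIMARY KEY, col2 integer);

CREATE OR REPLACE FUNCTION upsert_my_table(val1 integer, val2 integer)
  RETURNS integer AS
$$
    INSERT INTO my_table (col1, col2)
    VALUES (val1, val2)
    ON CONFLICT (col1) DO UPDATE
    SET col2 = EXCLUDED.col2
    RETURNING col1;  -- id of the inserted or the updated row
$$ LANGUAGE sql;

SELECT upsert_my_table(1, 10);  -- inserts, returns 1
SELECT upsert_my_table(1, 20);  -- updates, returns 1
```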
This operation is otherwise surprisingly tricky to express in earlier PostgreSQL versions without the risk of database corruption and/or a race condition. The code fragments posted so far all contain an error in that if two callers happen to want to upsert the same nonexistent row, the initial UPDATE will update zero rows and then they will both attempt an INSERT, one of which will fail. It should at least fail safe, aborting the query and any transaction in progress.
The PostgreSQL documentation on INSERT (search on that page for the text "Attempt to insert a new stock item along with the quantity of stock") shows how to do it safely and correctly on PostgreSQL 9.4 and earlier. Of particular note is that it tries the INSERT first to avoid any races on that front, and if that fails, does an UPDATE of the row it now knows exists. It uses a SAVEPOINT to ensure that a failed INSERT does not abort the transaction.
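For versions before 9.5, the insert-first approach can be wrapped in a PL/pgSQL function: a BEGIN ... EXCEPTION block runs under an implicit savepoint, which plays the role of the explicit SAVEPOINT. This is a sketch with a hypothetical kv table and function name. The manual notes that blocks with an EXCEPTION clause are more expensive to enter and exit, so this suits workloads where the insert usually succeeds.

```sql
-- Hypothetical key/value table for the sketch.
CREATE TEMP TABLE kv (k integer PRIMARY KEY, v integer);

CREATE OR REPLACE FUNCTION upsert_kv(_k integer, _v integer)
  RETURNS integer AS
$$
BEGIN
    LOOP
        -- Try the INSERT first. A unique_violation rolls back only this
        -- inner block (implicit savepoint), not the whole transaction.
        BEGIN
            INSERT INTO kv (k, v) VALUES (_k, _v);
            RETURN _k;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- row already exists: fall through to the UPDATE
        END;

        UPDATE kv SET v = _v WHERE k = _k;
        IF FOUND THEN
            RETURN _k;
        END IF;
        -- The row was deleted concurrently in between: try again.
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT upsert_kv(1, 10);  -- INSERT succeeds
SELECT upsert_kv(1, 20);  -- INSERT fails, UPDATE succeeds
```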

postgresql: nested insert

I have two tables; let's say tblA and tblB.
I need to insert a row in tblA and use the returned id as a value to be inserted as one of the columns in tblB.
I tried to find this in the documentation but could not. Is it possible to write a statement (intended to be used as a prepared statement) like
INSERT INTO tblB VALUES
(DEFAULT, (INSERT INTO tblA (DEFAULT, 'x') RETURNING id), 'y')
like we do for SELECT?
Or should I do this by creating a stored procedure? I'm not sure whether I can create a prepared statement out of a stored procedure.
Please advise.
Regards,
Mayank
You'll need to wait for PostgreSQL 9.1 for this:
with
ids as (
insert ...
returning id
)
insert ...
from ids;
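A concrete version of that skeleton, with hypothetical tables standing in for tblA and tblB (PostgreSQL 9.1 or later). The CTE inserts into tblA and hands the generated id to the outer INSERT in the same statement.

```sql
-- Hypothetical tables standing in for tblA and tblB.
CREATE TEMP TABLE tblA (id serial PRIMARY KEY, val text);
CREATE TEMP TABLE tblB (
    id      serial PRIMARY KEY,
    tbla_id integer REFERENCES tblA (id),
    val     text
);

WITH ids AS (
    INSERT INTO tblA (val) VALUES ('x')
    RETURNING id
)
INSERT INTO tblB (tbla_id, val)
SELECT id, 'y'
FROM   ids;
```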
In the meantime, you need to use PL/pgSQL, a temporary table, or some extra logic in your app...
This is possible with 9.0 and the new DO for anonymous blocks:
do $$
declare
new_id integer;
begin
insert into foo1 (id) values (default) returning id into new_id;
insert into foo2 (id) values (new_id);
end$$;
This can be executed as a single statement. I haven't tried creating a PreparedStatement out of that though.
Edit
Another approach would be to simply do it in two steps: first run the insert into tblA using the RETURNING clause, get the generated value through JDBC, then fire the second insert, something like this:
PreparedStatement stmt_1 = con.prepareStatement("INSERT INTO tblA VALUES (DEFAULT, ?) returning id");
stmt_1.setString(1, "x");
stmt_1.execute(); // important! Do not use executeUpdate()!
ResultSet rs = stmt_1.getResultSet();
long newId = -1;
if (rs.next()) {
newId = rs.getLong(1);
}
PreparedStatement stmt_2 = con.prepareStatement("INSERT INTO tblB VALUES (default,?,?)");
stmt_2.setLong(1, newId);
stmt_2.setString(2, "y");
stmt_2.executeUpdate();
You can do this in two inserts, using currval() to retrieve the foreign key (provided that key is serial):
create temporary table tb1a (id serial primary key, t text);
create temporary table tb1b (id serial primary key,
tb1a_id int references tb1a(id),
t text);
begin;
insert into tb1a values (DEFAULT, 'x');
insert into tb1b values (DEFAULT, currval('tb1a_id_seq'), 'y');
commit;
The result:
select * from tb1a;
id | t
----+---
3 | x
(1 row)
select * from tb1b;
id | tb1a_id | t
----+---------+---
2 | 3 | y
(1 row)
Using currval in this way is safe whether in or outside of a transaction. From the Postgresql 8.4 documentation:
currval
Return the value most recently obtained by nextval for this sequence in the current session. (An error is reported if nextval has never been called for this sequence in this session.) Because this is returning a session-local value, it gives a predictable answer whether or not other sessions have executed nextval since the current session did.
You may want to use an AFTER INSERT trigger for that. Something along the lines of:
create function dostuff() returns trigger as $$
begin
insert into table_b(field_1, field_2) values ('foo', NEW.id);
return new; --values returned by after triggers are ignored, anyway
end;
$$ language plpgsql;
create trigger trdostuff after insert on table_name for each row execute procedure dostuff();
after insert is needed because you need to have the id to reference it. Hope this helps.
Edit
A trigger runs in the same transaction as the command that fired it, even if you don't use an explicit transaction; in other words, it effectively becomes part of that command. Therefore, there is no risk of something changing the referenced id between inserts.