upsert and sub select - postgresql

I have an upsert statement (http://www.the-art-of-web.com/sql/upsert/) that inserts a row when no row with the given id exists and increments the counter column when it does:
WITH upsert AS
(UPDATE foo SET counter=counter+1 WHERE id='bar' RETURNING *)
INSERT INTO foo(id, counter) SELECT 'bar', 0
WHERE NOT EXISTS (SELECT * FROM upsert) RETURNING counter;
id is the primary key column (as expected). Up to here everything works fine.
But there is a third column, 'position', which can be used for custom ordering.
In case of an update I want to keep its current value.
The insert, however, needs an additional subquery that returns the lowest possible position not yet in use:
WITH upsert AS
(UPDATE foo SET counter=counter+1 WHERE id='bar' RETURNING *)
INSERT INTO foo(id, counter, position) SELECT 'bar', 0, MIN( position)-1 from foo
WHERE NOT EXISTS (SELECT * FROM upsert) RETURNING counter;
With this statement I get an error:
ERROR: duplicate key value violates unique constraint "id"
What's wrong here?

The problem is that MIN() applied to zero rows still returns one row (with a NULL value).
Example:
test=> select min(1) where false;
min
-----
(1 row)
This differs from the same WHERE clause without MIN():
test=> select 1 where false;
?column?
----------
(0 rows)
So when using MIN() in the subquery feeding the INSERT, it will insert a new row even when the WHERE clause evaluates to false, which defeats the logic of this UPSERT.
I think this can be worked around by introducing another subquery:
WITH upsert AS
(UPDATE foo SET counter=counter+1 WHERE id='bar' RETURNING *)
INSERT INTO foo(id, counter, position)
SELECT * FROM (SELECT 'bar', 0, MIN( position)-1 from foo) s
WHERE NOT EXISTS (SELECT * FROM upsert)
RETURNING counter;
Note, however, that cramming this into a single SQL statement does not guarantee that it succeeds when run concurrently.
For more, see:
How do I do an UPSERT (MERGE, INSERT … ON DUPLICATE UPDATE) in PostgreSQL?
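On PostgreSQL 9.5 or later, the same intent can probably be expressed with INSERT ... ON CONFLICT, which the linked question covers. A minimal sketch, assuming the table definition below (the question does not show it) and assuming that an empty table should start at position -1; the COALESCE is an addition for that case and is not part of the original statement:

```sql
-- Assumed table definition, inferred from the question
CREATE TABLE foo (
    id       text PRIMARY KEY,
    counter  integer NOT NULL DEFAULT 0,
    position integer
);

-- Insert 'bar' with the next lower position, or bump the counter
-- of the existing row and leave its position untouched.
INSERT INTO foo (id, counter, position)
SELECT 'bar', 0, COALESCE(MIN(position), 0) - 1
FROM foo
ON CONFLICT (id) DO UPDATE
    SET counter = foo.counter + 1
RETURNING counter;
```

Here the one row that MIN() always produces is exactly what is wanted, because the conflict handling decides between insert and update. Two concurrent inserts of different ids can still compute the same position, so this is not a complete answer to the concurrency caveat above.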

Related

how can I translate a query to CTE?

I'm still having trouble understanding how CTE works.
I want to do an insert. In case of a conflict I use ON CONFLICT DO NOTHING, but I still want the id returned to me (whether the insert succeeded or hit the conflict):
WITH inserted AS (
INSERT INTO fiche(label)
VALUES ('label')
ON CONFLICT (label) DO NOTHING
RETURNING *
)
SELECT * FROM inserted
WHERE NOT EXISTS (SELECT 1 FROM inserted);
Note that
SELECT * FROM some_relation
WHERE NOT EXISTS (SELECT 1 FROM some_relation);
will always give you an empty result. Either some_relation is itself empty, or it is not empty, in which case SELECT 1 FROM some_relation returns rows, NOT EXISTS ... evaluates to false, and no record matches the WHERE clause.
What you want is to have the VALUES as a CTE. You can then reference the values from your INSERT statement and in a SELECT to compare those values to the result of the RETURNING clause.
WITH
vals AS (
VALUES ('label')
),
inserted AS (
INSERT INTO fiche(label)
SELECT * FROM vals
ON CONFLICT (label) DO NOTHING
RETURNING label, id
)
SELECT
vals.column1,
inserted.id
FROM vals
LEFT JOIN inserted ON vals.column1 = inserted.label
This should give you a row for each row in your VALUES clause; the second column is the inserted id, or NULL if the row was not inserted because of a conflict.
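If the id of the already existing row is wanted as well, one possible extension is to join back to the table itself. This is only a sketch: the table definition is an assumption (the question does not show it), and the explicit column name and cast on vals are additions for readability.

```sql
-- Assumed table definition
CREATE TABLE fiche (
    id    serial PRIMARY KEY,
    label text NOT NULL UNIQUE
);

WITH vals(label) AS (
    VALUES ('label'::text)
),
inserted AS (
    INSERT INTO fiche (label)
    SELECT label FROM vals
    ON CONFLICT (label) DO NOTHING
    RETURNING id, label
)
SELECT v.label,
       -- i.id is set for freshly inserted rows,
       -- f.id for rows that existed before the statement started
       COALESCE(i.id, f.id) AS id
FROM vals v
LEFT JOIN inserted i ON i.label = v.label
LEFT JOIN fiche f ON f.label = v.label;
```

The join on fiche only sees rows that were visible when the statement started, so a label inserted by a concurrent, not yet visible transaction can still come back with a NULL id.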

Output Inserted.id equivalent in Postgres

I am new to PostgreSQL and trying to convert MSSQL scripts to Postgres.
For the MERGE statement we can use INSERT ... ON CONFLICT DO UPDATE or DO NOTHING, but I am using the statement below and am not sure whether it is the correct way.
MSSQL code:
Declare #tab2(New_Id int not null, Old_Id int not null)
MERGE Tab1 as Target
USING (select * from Tab1
WHERE ColumnId = #ID) as Source on 0 = 1
when not matched by Target then
INSERT
(ColumnId
,Col1
,Col2
,Col3
)
VALUES (Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
)
OUTPUT INSERTED.Id, Source.Id into #tab2(New_Id, Old_Id);
Postgres Code:
Create temp table tab2(New_Id int not null, Old_Id int not null)
With source as( select * from Tab1
WHERE ColumnId = ID)
Insert into Tab1(ColumnId
,Col1
,Col2
,Col3
)
select Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
from source
My question is how to convert OUTPUT INSERTED.Id to Postgres. I need this id to insert records into another table (say, child tables based on the values inserted into Tab1).
In PostgreSQL's INSERT statements you can choose what the query should return. From the docs on INSERT:
The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (or updated, if an ON CONFLICT DO UPDATE clause was used). This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned.
Example (shortened form of your query):
WITH [...] INSERT INTO Tab1 ([...]) SELECT [...] FROM [...] RETURNING Tab1.id
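RETURNING can only refer to columns of the inserted row, so it does not give the old Id by itself. One way to get both values is to draw the new ids from the sequence before inserting. The following is a sketch under assumptions: Tab1.Id is taken to be a serial column, and 42 stands in for the ID parameter.

```sql
CREATE TEMP TABLE tab2 (new_id int NOT NULL, old_id int NOT NULL);

WITH source AS (
    -- Pre-assign a new id to every row that is going to be copied
    SELECT nextval(pg_get_serial_sequence('tab1', 'id'))::int AS new_id,
           t.id AS old_id,
           t.columnid, t.col1, t.col2, t.col3
    FROM tab1 t
    WHERE t.columnid = 42   -- placeholder for the ID parameter
),
ins AS (
    -- Copy the rows using the pre-assigned ids
    INSERT INTO tab1 (id, columnid, col1, col2, col3)
    SELECT new_id, columnid, col1, col2, col3
    FROM source
)
-- Record the mapping, like OUTPUT ... INTO in the MSSQL version
INSERT INTO tab2 (new_id, old_id)
SELECT new_id, old_id
FROM source;
```

If only the new ids are needed, a plain RETURNING inside a CTE that feeds the child-table insert should be enough.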

CTE based insert of multiple rows into "one-per-group" table violates unique index

I have a table where only one row per group can be true.
This is enforced by a partial unique index (which can't be deferred).
CREATE TABLE test
(
id SERIAL PRIMARY KEY,
my_group INTEGER,
last BOOLEAN DEFAULT TRUE
);
CREATE UNIQUE INDEX "test.last" ON test (my_group) WHERE last;
INSERT INTO test (my_group)
VALUES (1), (2);
I'm trying to insert a new row into this table that shall replace the "last" element of the corresponding group. I also want to accomplish this in a single statement.
With some CTE trickery I'm able to do this:
-- the statement is structured this way to closely resemble my actual usecase
WITH
new_data AS (
VALUES (1)
),
uncheck_old_last AS (
UPDATE test
SET last = FALSE
WHERE last AND my_group in (SELECT * FROM new_data)
RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true);
So far so good: the insert happens with no conflicts.
I don't quite understand why this works, since as far as I understand all CTEs read the same initial DB state and can't see the changes made by other CTEs.
The problem is that I now get a unique violation when I try to do the same with multiple rows at once:
-- the statement is structured this way to closely resemble my actual usecase
WITH
new_data AS (
VALUES (1), (2) -- <- difference to above query
),
uncheck_old_last AS (
UPDATE test
SET last = FALSE
WHERE last AND my_group in (SELECT * FROM new_data)
RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true);
-- Schema Error: error: duplicate key value violates unique constraint "test.last"
Is there any way to insert multiple rows with one statement? And can someone explain to me why the first query works and the second doesn't?
This was caused by PostgreSQL simplifying my always-true clause:
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true)
It was supposed to create a dependency between the main query and the CTE, to enforce the execution order from the main query's point of view.
It broke with more than one entry because the LIMIT 1 allowed PostgreSQL to ignore the second row, since only one row was required to evaluate the expression.
I fixed it by comparing COUNT(*) > -1 instead:
COALESCE((SELECT COUNT(*) FROM uncheck_old_last) > -1, true)
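Put together, the multi-row statement with that change would look roughly like this. It is simply the question's query with the condition swapped in, against the test table defined in the question:

```sql
WITH
new_data AS (
    VALUES (1), (2)
),
uncheck_old_last AS (
    UPDATE test
    SET last = FALSE
    WHERE last AND my_group IN (SELECT * FROM new_data)
    RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
-- COUNT(*) has to read every row the UPDATE returns,
-- so all old "last" flags are cleared before the first insert
WHERE COALESCE((SELECT COUNT(*) FROM uncheck_old_last) > -1, true);
```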

Postgres: Update a query counter once a result is returned

I have a situation where a particular row can only be queried a fixed number of times, say 1000, after which it is made permanently unavailable to that particular party. Each query returns only one result, i.e. LIMIT 1.
I intend to implement this with a counter that starts at 1000 and is decremented each time the row is queried.
Is there any way to have the counter updated immediately when the result is returned?
This is as opposed to waiting for the application layer to receive the result and then send an UPDATE statement, because another query can arrive between the time the result is returned and the time the UPDATE is received.
You can use a CTE that does the SELECT and the UPDATE in one query:
CREATE TABLE foo(
id int primary key,
content text,
counter int default 0);
INSERT INTO foo(id, content, counter) VALUES(1, 'foo', default);
INSERT INTO foo(id, content, counter) VALUES(2, 'bar', default);
INSERT INTO foo(id, content, counter) VALUES(3, 'baz', default);
-- select the data and update the counter:
WITH step_1 AS (
SELECT * FROM foo WHERE counter < 5 ORDER BY id LIMIT 1 -- now you can use LIMIT as well
), step_2 AS (
UPDATE foo SET counter = foo.counter + 1 FROM step_1 WHERE foo.id = step_1.id RETURNING foo.*
)
SELECT * FROM step_2;
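A possible variant, in case concurrent sessions might pick the same row: let the UPDATE select its target through a locking subquery. This is a sketch against the foo table above; the FOR UPDATE clause is an addition and not part of the original answer.

```sql
UPDATE foo
SET counter = counter + 1
WHERE id = (
    SELECT id
    FROM foo
    WHERE counter < 5
    ORDER BY id
    LIMIT 1
    FOR UPDATE      -- lock the chosen row until the statement's transaction ends
)
RETURNING id, content, counter;
```

How this behaves when another session changes the chosen row while this one waits for the lock depends on the isolation level, so it is worth testing under realistic load; an empty result would have to be retried by the caller.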
Unfortunately you cannot create SELECT triggers in PostgreSQL, but you can achieve this with a transaction:
BEGIN;
SELECT something FROM Some_table WHERE <where_criteria>;
UPDATE Some_table SET value = value - 1 WHERE <where_criteria>;
COMMIT;
The where_criteria should be the same for both statements.
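For the transaction approach, a row lock on the SELECT keeps another session from reading the same counter value in between. A sketch using the foo table from the previous answer as a stand-in; the id value, the limit of 5 and the FOR UPDATE clause are assumptions, not part of this answer:

```sql
BEGIN;

-- Lock the row so that concurrent transactions wait here
SELECT id, content
FROM foo
WHERE id = 1 AND counter < 5
FOR UPDATE;

-- Same criteria as the SELECT above
UPDATE foo
SET counter = counter + 1
WHERE id = 1 AND counter < 5;

COMMIT;
```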

SELECT or INSERT a row in one command

I'm using PostgreSQL 9.0 and I have a table with just an artificial key (auto-incrementing sequence) and another unique key. (Yes, there is a reason for this table. :)) I want to look up an ID by the other key or, if it doesn't exist, insert it:
SELECT id
FROM mytable
WHERE other_key = 'SOMETHING'
Then, if no match:
INSERT INTO mytable (other_key)
VALUES ('SOMETHING')
RETURNING id
The question: is it possible to save a round-trip to the DB by doing both of these in one statement? I can insert the row if it doesn't exist like this:
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING id
... but that doesn't give the ID of an existing row. Any ideas? There is a unique constraint on other_key, if that helps.
Have you tried to union it?
Edit - this requires Postgres 9.1:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH new_row AS (
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING *
)
SELECT * FROM new_row
UNION
SELECT * FROM mytable WHERE other_key = 'SOMETHING';
results in:
id | other_key
----+-----------
1 | SOMETHING
(1 row)
No, there is no special SQL syntax that allows you to do select or insert. You can do what Ilia mentions and create a sproc, which avoids the extra round trip from the client to the server, but it will still result in two queries (three actually, if you count the sproc itself).
Using 9.5 I successfully tried this, based on Denis de Bernardy's answer. Its properties:
only 1 parameter
no union
no stored procedure
atomic, thus no concurrency problems (I think...)
The query:
WITH neworexisting AS (
INSERT INTO mytable(other_key) VALUES('hello 1')
ON CONFLICT(other_key) DO UPDATE SET existed=true -- some update is needed so that RETURNING yields the row
RETURNING *
)
SELECT * FROM neworexisting
first call:
id|other_key|created            |existed|
--|---------|-------------------|-------|
 6|hello 1  |2019-09-11 11:39:29|false  |
second call:
id|other_key|created            |existed|
--|---------|-------------------|-------|
 6|hello 1  |2019-09-11 11:39:29|true   |
First create your table ;-)
CREATE TABLE mytable (
id serial NOT NULL,
other_key text NOT NULL,
created timestamptz NOT NULL DEFAULT now(),
existed bool NOT NULL DEFAULT false,
CONSTRAINT mytable_pk PRIMARY KEY (id),
CONSTRAINT mytable_uniq UNIQUE (other_key) --needed for on conflict
);
You can use a stored procedure (PL/pgSQL):
IF NOT EXISTS (SELECT 1 FROM mytable WHERE other_key = 'SOMETHING') THEN
INSERT INTO mytable (other_key) VALUES ('SOMETHING');
END IF;
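The stored-procedure idea could be fleshed out into a complete function that also returns the id, which is what the question asks for. A minimal sketch; the function name get_or_create_id and its parameter are hypothetical, and mytable is assumed to be defined as in the question (serial id, unique other_key):

```sql
CREATE OR REPLACE FUNCTION get_or_create_id(p_key text)
RETURNS integer
LANGUAGE plpgsql
AS $$
DECLARE
    v_id integer;
BEGIN
    -- Look for an existing row first
    SELECT id INTO v_id
    FROM mytable
    WHERE other_key = p_key;

    -- Insert only when nothing was found
    IF NOT FOUND THEN
        INSERT INTO mytable (other_key)
        VALUES (p_key)
        RETURNING id INTO v_id;
    END IF;

    RETURN v_id;
END;
$$;

-- Usage: one round trip from the client
SELECT get_or_create_id('SOMETHING');
```

Two sessions calling it at the same moment with the same new key can still collide on the unique constraint, so the caller would need to retry in that case.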
I have an alternative to Denis's answer that I think is less database-intensive, although a bit more complex:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH table_sel AS (
SELECT id
FROM mytable
WHERE other_key = 'test'
UNION
SELECT NULL AS id
ORDER BY id NULLS LAST
LIMIT 1
), table_ins AS (
INSERT INTO mytable (id, other_key)
SELECT
COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)),
'test'
FROM table_sel
ON CONFLICT (id) DO NOTHING
RETURNING id
)
SELECT * FROM table_ins
UNION ALL
SELECT * FROM table_sel
WHERE id IS NOT NULL;
In the table_sel CTE I look for the matching row. If I don't find it, the UNION with SELECT NULL ensures that table_sel still returns one row.
In the table_ins CTE I try to insert the same row I was looking for earlier. COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)) says: if id is defined, use it; if id is NULL, advance the sequence on id and use the new value to insert a row. The ON CONFLICT clause ensures that nothing is inserted if id is already in mytable.
At the end I put everything together with a UNION ALL between table_ins and table_sel, so that I get my id value either way and both CTEs are executed.
This query searches for the other_key value only once, and it does so as a "find this value" lookup rather than a "check that this value does not exist" test, which is much heavier; Denis's alternative uses other_key in both kinds of search. In my query the "does not exist" check happens only on id, an integer primary key, which is fast by construction.
Minor tweak a decade late to Denis's excellent answer:
-- Create the table with a unique constraint
CREATE TABLE mytable (
id serial PRIMARY KEY
, other_key varchar NOT NULL UNIQUE
);
WITH new_row AS (
-- Only insert when we don't find anything, avoiding a table lock if
-- possible.
INSERT INTO mytable ( other_key )
SELECT 'SOMETHING'
WHERE NOT EXISTS (
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
)
RETURNING *
)
(
-- This comes first in the UNION ALL since it'll almost certainly be
-- in the query cache. Marginally slower for the insert case, but also
-- marginally faster for the much more common read-only case.
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
-- Don't check for duplicates to be removed
UNION ALL
-- If we reach this point in iteration, we needed to do the INSERT and
-- lock after all.
SELECT *
FROM new_row
) LIMIT 1 -- Just return whatever comes first in the results and allow
-- the query engine to cut processing short for the INSERT
-- calculation.
;
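The UNION ALL tells the planner it doesn't have to collect results for de-duplication. The LIMIT 1 at the end allows the planner to stop reading further rows once it knows there's an answer available. Note that, per the PostgreSQL documentation, a data-modifying statement in WITH is still executed to completion regardless of how much of its output the main query reads, so the INSERT step itself is not skipped; it simply inserts nothing when the row already exists.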
NOTE: There is a race condition present here and in the original answer. If two sessions run the query concurrently and the entry does not exist yet, one of the INSERTs can fail with a unique constraint violation. The error can be suppressed with ON CONFLICT DO NOTHING, but the query will then return an empty set instead of the new row. This is a difficult problem because getting that info from another transaction would violate the I in ACID.
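The ON CONFLICT DO NOTHING variant mentioned in the note might look like this on PostgreSQL 9.5 or later. It is a sketch against the mytable definition above, not a fix for the race:

```sql
WITH new_row AS (
    INSERT INTO mytable (other_key)
    VALUES ('SOMETHING')
    ON CONFLICT (other_key) DO NOTHING   -- suppresses the unique violation
    RETURNING id
)
-- Row that existed before the statement started
SELECT id FROM mytable WHERE other_key = 'SOMETHING'
UNION ALL
-- Row inserted by this statement
SELECT id FROM new_row
LIMIT 1;
```

In the concurrent case described in the note the result can be empty, because the row committed by the other session is not visible to this statement's snapshot; running the statement again is then expected to find it.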