SELECT or INSERT a row in one command - postgresql

I'm using PostgreSQL 9.0 and I have a table with just an artificial key (auto-incrementing sequence) and another unique key. (Yes, there is a reason for this table. :)) I want to look up an ID by the other key or, if it doesn't exist, insert it:
SELECT id
FROM mytable
WHERE other_key = 'SOMETHING'
Then, if no match:
INSERT INTO mytable (other_key)
VALUES ('SOMETHING')
RETURNING id
The question: is it possible to save a round-trip to the DB by doing both of these in one statement? I can insert the row if it doesn't exist like this:
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING id
... but that doesn't give the ID of an existing row. Any ideas? There is a unique constraint on other_key, if that helps.

Have you tried to union it?
Edit - this requires Postgres 9.1:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH new_row AS (
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING *
)
SELECT * FROM new_row
UNION
SELECT * FROM mytable WHERE other_key = 'SOMETHING';
results in:
id | other_key
----+-----------
1 | SOMETHING
(1 row)

No, there is no special SQL syntax that allows you to do select or insert. You can do what Ilia mentions and create a sproc, which means it will not do a round trip fromt he client to server, but it will still result in two queries (three actually, if you count the sproc itself).

using 9.5 i successfully tried this
based on Denis de Bernardy's answer
only 1 parameter
no union
no stored procedure
atomic, thus no concurrency problems (i think...)
The Query:
WITH neworexisting AS (
INSERT INTO mytable(other_key) VALUES('hello 2')
ON CONFLICT(other_key) DO UPDATE SET existed=true -- need some update to return sth
RETURNING *
)
SELECT * FROM neworexisting
first call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|false |
second call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|true |
First create your table ;-)
CREATE TABLE mytable (
id serial NOT NULL,
other_key text NOT NULL,
created timestamptz NOT NULL DEFAULT now(),
existed bool NOT NULL DEFAULT false,
CONSTRAINT mytable_pk PRIMARY KEY (id),
CONSTRAINT mytable_uniq UNIQUE (other_key) --needed for on conflict
);

you can use a stored procedure
IF (SELECT id FROM mytable WHERE other_key = 'SOMETHING' LIMIT 1) < 0 THEN
INSERT INTO mytable (other_key) VALUES ('SOMETHING')
END IF

I have an alternative to Denis answer, that I think is less database-intensive, although a bit more complex:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH table_sel AS (
SELECT id
FROM mytable
WHERE other_key = 'test'
UNION
SELECT NULL AS id
ORDER BY id NULLS LAST
LIMIT 1
), table_ins AS (
INSERT INTO mytable (id, other_key)
SELECT
COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)),
'test'
FROM table_sel
ON CONFLICT (id) DO NOTHING
RETURNING id
)
SELECT * FROM table_ins
UNION ALL
SELECT * FROM table_sel
WHERE id IS NOT NULL;
In table_sel CTE I'm looking for the right row. If I don't find it, I assure that table_sel returns at least one row, with a union with a SELECT NULL.
In table_ins CTE I try to insert the same row I was looking for earlier. COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)) is saying: id could be defined, if so, use it; whereas if id is null, increment the sequence on id and use this new value to insert a row. The ON CONFLICT clause assure
that if id is already in mytable I don't insert anything.
At the end I put everything together with a UNION between table_ins and table_sel, so that I'm sure to take my sweet id value and execute both CTE.
This query needs to search for the value other_key only once, and is a "search this value" not a "check if this value not exists in the table", that is very heavy; in Denis alternative you use other_key in both types of searches. In my query you "check if a value not exists" only on id that is a integer primary key, that, for construction, is fast.

Minor tweak a decade late to Denis's excellent answer:
-- Create the table with a unique constraint
CREATE TABLE mytable (
id serial PRIMARY KEY
, other_key varchar NOT NULL UNIQUE
);
WITH new_row AS (
-- Only insert when we don't find anything, avoiding a table lock if
-- possible.
INSERT INTO mytable ( other_key )
SELECT 'SOMETHING'
WHERE NOT EXISTS (
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
)
RETURNING *
)
(
-- This comes first in the UNION ALL since it'll almost certainly be
-- in the query cache. Marginally slower for the insert case, but also
-- marginally faster for the much more common read-only case.
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
-- Don't check for duplicates to be removed
UNION ALL
-- If we reach this point in iteration, we needed to do the INSERT and
-- lock after all.
SELECT *
FROM new_row
) LIMIT 1 -- Just return whatever comes first in the results and allow
-- the query engine to cut processing short for the INSERT
-- calculation.
;
The UNION ALL tells the planner it doesn't have to collect results for de-duplication. The LIMIT 1 at the end allows the planner to short-circuit further processing/iteration once it knows there's an answer available.
NOTE: There is a race condition present here and in the original answer. If the entry does not already exist, the INSERT will fail with a unique constraint violation. The error can be suppressed with ON CONFLICT DO NOTHING, but the query will return an empty set instead of the new row. This is a difficult problem because getting that info from another transaction would violate the I in ACID.

Related

Coalesce sentence containing an insert into clause fails in PostgreSQL

This is my trivial test table,
create table test (
id int not null generated always as identity,
first_name. varchar,
primary key (id),
unique(first_name)
);
As an alternative to insert-into-on-conflict sentences, I was trying to use the coalesce laziness to execute a select whenever possible or an insert, only when select fails to find a row.
coalesce laziness is described in documentation. See https://www.postgresql.org/docs/current/functions-conditional.html
Like a CASE expression, COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated. This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other database systems.
I also want to get back the id value of the row, having being inserted or not.
I started with:
select coalesce (
(select id from test where first_name='carlos'),
(insert into test(first_name) values('carlos') returning id)
);
but an error syntax error at or near "into" was found.
See it on this other DBFiddle
https://www.db-fiddle.com/f/t7TVkoLTtWU17iaTAbEhDe/0
Then I tried:
select coalesce (
(select id from test where first_name='carlos'),
(with r as (
insert into test(first_name) values('carlos') returning id
) select id from r
)
);
Here I am getting a WITH clause containing a data-modifying statement must be at the top level error that I don't understand, as insert is the first and only sentence within the with.
I am testing this with DBFiddle and PostgreSQL 13. The source code can be found at
https://www.db-fiddle.com/f/hp8T1iQ8eS4wozDCBhBXDw/5
Different method: chained CTEs:
CREATE TABLE test
( id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, first_name VARCHAR UNIQUE
);
WITH sel AS (
SELECT id FROM test WHERE first_name = 'carlos'
)
, ins AS (
INSERT INTO test(first_name)
SELECT 'carlos'
WHERE NOT EXISTS (SELECT 1 FROM test WHERE first_name = 'carlos')
RETURNING id
)
, omg AS (
SELECT id FROM sel
UNION ALL
SELECT id FROM ins
)
SELECT id
FROM omg
;
It seems that the returning value from the insert into clause is not equivalent in nature to the scalar query of a select clause. So I try encapsulating the insert into into an SQL function and it worked.
create or replace function insert_first_name(
_first_name varchar
) returns int
language sql as $$
insert into test (first_name)
values (_first_name)
returning id;
$$;
select coalesce (
(select id from test where first_name='carlos'),
(select insert_first_name('carlos'))
);
See it on https://www.db-fiddle.com/f/73rVXgqGfrG4VmjrAk6Z3i/2
This is a refinement on #wildplasser accepted answer. it avoids comparing first_name twice and uses coalesce instead of union all. Kind of an selsert in just one sentence.
with sel as (
select id from test where first_name = 'carlos'
)
, ins as (
insert into test(first_name)
select 'carlos'
where (select id from sel) is null
returning id
)
select coalesce (
(select id from sel),
(select id from ins)
);
See it at https://www.db-fiddle.com/f/goRh4TyAebTkEZFHk6WbtK/6

Output Inserted.id equivalent in Postgres

I am new to PostgreSQL and trying to convert mssql scripts to Postgres.
For Merge statement, we can use insert on conflict update or do nothing but am using the below statement, not sure whether it is the correct way.
MSSQL code:
Declare #tab2(New_Id int not null, Old_Id int not null)
MERGE Tab1 as Target
USING (select * from Tab1
WHERE ColumnId = #ID) as Source on 0 = 1
when not matched by Target then
INSERT
(ColumnId
,Col1
,Col2
,Col3
)
VALUES (Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
)
OUTPUT INSERTED.Id, Source.Id into #tab2(New_Id, Old_Id);
Postgres Code:
Create temp table tab2(New_Id int not null, Old_Id int not null)
With source as( select * from Tab1
WHERE ColumnId = ID)
Insert into Tab1(ColumnId
,Col1
,Col2
,Col3
)
select Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
from source
My query is how to convert OUTPUT INSERTED.Id in postgres.I need this id to insert records in another table (lets say as child tables based on Inserted values in Tab1)
In PostgreSQL's INSERT statements you can choose what the query should return. From the docs on INSERT:
The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (or updated, if an ON CONFLICT DO UPDATE clause was used). This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned.
Example (shortened form of your query):
WITH [...] INSERT INTO Tab1 ([...]) SELECT [...] FROM [...] RETURNING Tab1.id

CTE based insert of multiple rows into "one-per-group" table violates unique index

I have a table where only one row per group can be true.
This is enforced by a partial unique index (which can't be deferred).
CREATE TABLE test
(
id SERIAL PRIMARY KEY,
my_group INTEGER,
last BOOLEAN DEFAULT TRUE
);
CREATE UNIQUE INDEX "test.last" ON test (my_group) WHERE last;
INSERT INTO test (my_group)
VALUES (1), (2);
I'm trying to insert a new row into this table that shall replace the "last" element of the corresponding group. I also want to accomplish this in a single statement.
With some CTE trickery I'm able to do this: link to Fiddle
-- the statement is structured this way to closely resemble my actual usecase
WITH
new_data AS (
VALUES (1)
),
uncheck_old_last AS (
UPDATE test
SET last = FALSE
WHERE last AND my_group in (SELECT * FROM new_data)
RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true);
So far so good, the insert happens... no conflicts.
I don't quite understand why this is working as from my understanding all CTEs should read the same initial DB state and can't see the changes made by other CTEs
The problem is now that I get a unique violation when I try to do the same with multiple rows at once: Link to Fiddle
-- the statement is structured this way to closely resemble my actual usecase
WITH
new_data AS (
VALUES (1), (2) -- <- difference to above query
),
uncheck_old_last AS (
UPDATE test
SET last = FALSE
WHERE last AND my_group in (SELECT * FROM new_data)
RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true);
-- Schema Error: error: duplicate key value violates unique constraint "test.last"
Is there any way to insert multiple rows with one statement /Can someone explain to me why the first query is working and the second isn't?
This was caused by PostgreSQL simplifying my always true clause:
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true)
was supposed to create a dependency between the main query and the CTE to enforce execution order from the main query's point of view.
It broke with more than one entry because the limit 1 allowed PostgreSQL to ignore the second row, as only one was required for evaluation.
I fixed it by comparing COUNT(*) > -1 instead:
COALESCE((SELECT COUNT(*) FROM uncheck_old_last) > -1, true)

Get row to swap tables on a certain condition

I currently have a parent table:
CREATE TABLE members (
member_id SERIAL NOT NULL, UNIQUE, PRIMARY KEY
first_name varchar(20)
last_name varchar(20)
address address (composite type)
contact_numbers varchar(11)[3]
date_joined date
type varchar(5)
);
and two related tables:
CREATE TABLE basic_member (
activities varchar[3])
INHERITS (members)
);
CREATE TABLE full_member (
activities varchar[])
INHERITS (members)
);
If the type is full the details are entered to the full_member table or if type is basic into the basic_member table. What I want is that if I run an update and change the type to basic or full the tuple goes into the corresponding table.
I was wondering if I could do this with a rule like:
CREATE RULE tuple_swap_full
AS ON UPDATE TO full_member
WHERE new.type = 'basic'
INSERT INTO basic_member VALUES (old.member_id, old.first_name, old.last_name,
old.address, old.contact_numbers, old.date_joined, new.type, old.activities);
... then delete the record from the full_member
Just wondering if my rule is anywhere near or if there is a better way.
You don't need
member_id SERIAL NOT NULL, UNIQUE, PRIMARY KEY
A PRIMARY KEY implies UNIQUE NOT NULL automatically:
member_id SERIAL PRIMARY KEY
I wouldn't use hard coded max length of varchar(20). Just use text and add a check constraint if you really must enforce a maximum length. Easier to change around.
Syntax for INHERITS is mangled. The key word goes outside the parens around columns.
CREATE TABLE full_member (
activities text[]
) INHERITS (members);
Table names are inconsistent (members <-> member). I use the singular form everywhere in my test case.
Finally, I would not use a RULE for the task. A trigger AFTER UPDATE seems preferable.
Consider the following
Test case:
Tables:
CREATE SCHEMA x; -- I put everything in a test schema named "x".
-- DROP TABLE x.members CASCADE;
CREATE TABLE x.member (
member_id SERIAL PRIMARY KEY
,first_name text
-- more columns ...
,type text);
CREATE TABLE x.basic_member (
activities text[3]
) INHERITS (x.member);
CREATE TABLE x.full_member (
activities text[]
) INHERITS (x.member);
Trigger function:
Data-modifying CTEs (WITH x AS ( DELETE ..) are the best tool for the purpose. Requires PostgreSQL 9.1 or later.
For older versions, first INSERT then DELETE.
CREATE OR REPLACE FUNCTION x.trg_move_member()
RETURNS trigger AS
$BODY$
BEGIN
CASE NEW.type
WHEN 'basic' THEN
WITH x AS (
DELETE FROM x.member
WHERE member_id = NEW.member_id
RETURNING *
)
INSERT INTO x.basic_member (member_id, first_name, type) -- more columns
SELECT member_id, first_name, type -- more columns
FROM x;
WHEN 'full' THEN
WITH x AS (
DELETE FROM x.member
WHERE member_id = NEW.member_id
RETURNING *
)
INSERT INTO x.full_member (member_id, first_name, type) -- more columns
SELECT member_id, first_name, type -- more columns
FROM x;
END CASE;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Trigger:
Note that it is an AFTER trigger and has a WHEN condition.
WHEN condition requires PostgreSQL 9.0 or later. For earlier versions, you can just leave it away, the CASE statement in the trigger itself takes care of it.
CREATE TRIGGER up_aft
AFTER UPDATE
ON x.member
FOR EACH ROW
WHEN (NEW.type IN ('basic ','full')) -- OLD.type cannot be IN ('basic ','full')
EXECUTE PROCEDURE x.trg_move_member();
Test:
INSERT INTO x.member (first_name, type) VALUES ('peter', NULL);
UPDATE x.member SET type = 'full' WHERE first_name = 'peter';
SELECT * FROM ONLY x.member;
SELECT * FROM x.basic_member;
SELECT * FROM x.full_member;

Getting a query to index seek (rather than scan)

Running the following query (SQL Server 2000) the execution plan shows that it used an index seek and Profiler shows it's doing 71 reads with a duration of 0.
select top 1 id from table where name = '0010000546163' order by id desc
Contrast that with the following with uses an index scan with 8500 reads and a duration of about a second.
declare #p varchar(20)
select #p = '0010000546163'
select top 1 id from table where name = #p order by id desc
Why is the execution plan different? Is there a way to change the second method to seek?
thanks
EDIT
Table looks like
CREATE TABLE [table] (
[Id] [int] IDENTITY (1, 1) NOT NULL ,
[Name] [varchar] (13) COLLATE Latin1_General_CI_AS NOT NULL)
Id is primary clustered key
There is a non-unique index on Name and a unique composite index on id/name
There are other columns - left them out for brevity
Now you've added the schema, please try this. SQL Server treats length differences as different data types and will convert the varchar(13) column to match the varchar(20) variable
declare #p varchar(13)
If not, what about collation coercien? Is the DB or server different to the column?
declare #p varchar(13) COLLATE Latin1_General_CI_AS NOT NULL
If not, add this before and post results
SET SHOWPLAN_TEXT ON
GO
If the name column is NVARCHAR then u need your parameter to be also of the same type. It should then pick it up by index seek.
declare #p nvarchar(20)
select #p = N'0010000546163'
select top 1 id from table where name = #p order by id desc