PostgreSQL with serial fields , rule and wrong currval / sequence - postgresql

Usually insert records into a log table based on CRUD events with PotsgreSQL requires (at least) one trigger and one trigger function. I tried to make it easier, using Postgres RULES. And it almost work, except for the scenario below:
create table mydata ( somedata char(4) , id1 bigserial );
create table mylog ( logid bigint , op char(1), id2 bigserial );
create rule mydata_ilog as on insert to mydata do insert into mylog( logid , op ) values ( new.id1 ,'i' );
insert into mydata( somedata ) values ('0001');
The main table and log table have a serial field, and I expect -- after insert one record into mydata -- to have one record in mydata with id1=1, and one record in mylog, with logid=1 ( as the id of mydata record), and id2=1... At mydata, id1=1, right ... ay mylog, id2 is 1, right ... but at mylog, logid field is 2, and the database sequence created to increment the id1 serial field is 2 ...
select currval('mydata_id1_seq'); -- return 2 ?!
select last_value from mydata_id1_seq; -- return 2 ?!
select id1 from mydata -- return 1 ok
select id2 from mydata -- return 1 ok
select logid from mylog -- return 2 ?!
I already solved this with a trigger and a trigger function, works fine. But, is there any reason to PostgreSQL do this ?!

Look at the execution plan:
EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO mydata (somedata) VALUES ('0001');
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Insert on myschema.mydata
-> Result
Output: '0001'::character(4), nextval('mydata_id1_seq'::regclass)
Insert on myschema.mylog
-> Result
Output: nextval('mydata_id1_seq'::regclass), 'i'::character(1), nextval('mylog_id2_seq'::regclass)
(7 rows)
A rule rewrites a query, using the same expression that gets used for mydata.id for the second statement, so you have the nextval call in both statements.
Rules for data modifying queries are difficult to get right, and you just stepped into a typical trap. Use triggers.

Related

Output Inserted.id equivalent in Postgres

I am new to PostgreSQL and trying to convert mssql scripts to Postgres.
For Merge statement, we can use insert on conflict update or do nothing but am using the below statement, not sure whether it is the correct way.
MSSQL code:
Declare #tab2(New_Id int not null, Old_Id int not null)
MERGE Tab1 as Target
USING (select * from Tab1
WHERE ColumnId = #ID) as Source on 0 = 1
when not matched by Target then
INSERT
(ColumnId
,Col1
,Col2
,Col3
)
VALUES (Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
)
OUTPUT INSERTED.Id, Source.Id into #tab2(New_Id, Old_Id);
Postgres Code:
Create temp table tab2(New_Id int not null, Old_Id int not null)
With source as( select * from Tab1
WHERE ColumnId = ID)
Insert into Tab1(ColumnId
,Col1
,Col2
,Col3
)
select Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
from source
My query is how to convert OUTPUT INSERTED.Id in postgres.I need this id to insert records in another table (lets say as child tables based on Inserted values in Tab1)
In PostgreSQL's INSERT statements you can choose what the query should return. From the docs on INSERT:
The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (or updated, if an ON CONFLICT DO UPDATE clause was used). This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned.
Example (shortened form of your query):
WITH [...] INSERT INTO Tab1 ([...]) SELECT [...] FROM [...] RETURNING Tab1.id

Function taking forever to run for large number of records

I have created the following function in Postgres 9.3.5:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS
$BODY
$Declare
result text;
BEGIN
select min(id) into result from table
where id_used is null and id_type = val2;
update table set
id_used = 'Y',
col1 = val1,
id_used_date = now()
where id_type = val2
and id = result;
RETURN result;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
When I run this function in a loop of over a 1000 or more records it just does freezing and just says "query is running". When I check my table nothing is being updated. When I run it for one or two records it runs fine.
Example of the function when being run:
select get_result('123','idtype');
table columns:
id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)
id is the table index.
Can someone help?
Most probably you are running into race conditions. When you run your function a 1000 times in quick succession in separate transactions, something like this happens:
T1 T2 T3 ...
SELECT max(id) -- id 1
SELECT max(id) -- id 1
SELECT max(id) -- id 1
...
Row id 1 locked, wait ...
Row id 1 locked, wait ...
UPDATE id 1
...
COMMIT
Wake up, UPDATE id 1 again!
COMMIT
Wake up, UPDATE id 1 again!
COMMIT
...
Largely rewritten and simplified as SQL function:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS
$func$
UPDATE table t
SET id_used = 'Y'
, col1 = val1
, id_used_date = now()
FROM (
SELECT id
FROM table
WHERE id_used IS NULL
AND id_type = val2
ORDER BY id
LIMIT 1
FOR UPDATE -- lock to avoid race condition! see below ...
) t1
WHERE t.id_type = val2
-- AND t.id_used IS NULL -- repeat condition (not if row is locked)
AND t.id = t1.id
RETURNING id;
$func$ LANGUAGE sql;
Related question with a lot more explanation:
Atomic UPDATE .. SELECT in Postgres
Explain
Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.
You don't need PL/pgSQL for the simple task. You still can use PL/pgSQL, the UPDATE stays the same.
You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you head because, per documentation:
The locking clauses cannot be used in contexts where returned rows
cannot be clearly identified with individual table rows; for example
they cannot be used with aggregation.
Bold emphasis mine. Luckily, you can replace min(id) easily with the equivalent ORDER BY / LIMIT 1 I provided above. Can use an index just as well.
If the table is big, you need an index on id at least. Assuming that id is indexed already as PRIMARY KEY, that would help. But this additional partial multicolumn index would probably help a lot more:
CREATE INDEX foo_idx ON table (id_type, id)
WHERE id_used IS NULL;
Alternative solutions
Advisory locks May be the superior approach here:
Postgres UPDATE ... LIMIT 1
Or you may want to lock many rows at once:
How to mark certain nr of rows in table on concurrent access

How to insert values from another table in PostgreSQL?

I have a table which references other tables:
CREATE TABLE scratch
(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
rep_id INT NOT NULL REFERENCES reps,
term_id INT REFERENCES terms
);
CREATE TABLE reps (
id SERIAL PRIMARY KEY,
rep TEXT NOT NULL UNIQUE
);
CREATE TABLE terms (
id SERIAL PRIMARY KEY,
terms TEXT NOT NULL UNIQUE
);
I wish to add a new record to scratch given the name, the rep and the terms values, i.e. I have neither corresponding rep_id nor term_id.
Right now the only idea that I have is:
insert into scratch (name, rep_id, term_id)
values ('aaa', (select id from reps where rep='Dracula' limit 1), (select id from terms where terms='prepaid' limit 1));
My problem is this. I am trying to use the parameterized query API (from node using the node-postgres package), where an insert query looks like this:
insert into scratch (name, rep_id, term_id) values ($1, $2, $3);
and then an array of values for $1, $2 and $3 is passed as a separate argument. At the end, when I am comfortable with the parameterized queries the idea is to promote them to prepared statements to utilize the most efficient and safest way to query the database.
However, I am puzzled how can I do this with my example, where different tables have to be subqueried.
P.S. I am using PostgreSQL 9.2 and have no problem with a PostgreSQL specific solution.
EDIT 1
C:\Users\markk>psql -U postgres
psql (9.2.4)
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
Type "help" for help.
postgres=# \c dummy
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
You are now connected to database "dummy" as user "postgres".
dummy=# DROP TABLE scratch;
DROP TABLE
dummy=# CREATE TABLE scratch
dummy-# (
dummy(# id SERIAL NOT NULL PRIMARY KEY,
dummy(# name text NOT NULL UNIQUE,
dummy(# rep_id integer NOT NULL,
dummy(# term_id integer
dummy(# );
NOTICE: CREATE TABLE will create implicit sequence "scratch_id_seq" for serial column "scratch.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "scratch_pkey" for table "scratch"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "scratch_name_key" for table "scratch"
CREATE TABLE
dummy=# DEALLOCATE insert_scratch;
ERROR: prepared statement "insert_scratch" does not exist
dummy=# PREPARE insert_scratch (text, text, text) AS
dummy-# INSERT INTO scratch (name, rep_id, term_id)
dummy-# SELECT $1, r.id, t.id
dummy-# FROM reps r, terms t
dummy-# WHERE r.rep = $2 AND t.terms = $3
dummy-# RETURNING id, name, $2 rep, $3 terms;
PREPARE
dummy=# DEALLOCATE insert_scratch2;
ERROR: prepared statement "insert_scratch2" does not exist
dummy=# PREPARE insert_scratch2 (text, text, text) AS
dummy-# INSERT INTO scratch (name, rep_id, term_id)
dummy-# VALUES ($1, (SELECT id FROM reps WHERE rep=$2 LIMIT 1), (SELECT id FROM terms WHERE terms=$3 LIMIT 1))
dummy-# RETURNING id, name, $2 rep, $3 terms;
PREPARE
dummy=# EXECUTE insert_scratch ('abc', 'Snowhite', '');
id | name | rep | terms
----+------+-----+-------
(0 rows)
INSERT 0 0
dummy=# EXECUTE insert_scratch2 ('abc', 'Snowhite', '');
id | name | rep | terms
----+------+----------+-------
1 | abc | Snowhite |
(1 row)
INSERT 0 1
dummy=# EXECUTE insert_scratch ('abcd', 'Snowhite', '30 days');
id | name | rep | terms
----+------+----------+---------
2 | abcd | Snowhite | 30 days
(1 row)
INSERT 0 1
dummy=# EXECUTE insert_scratch2 ('abcd2', 'Snowhite', '30 days');
id | name | rep | terms
----+-------+----------+---------
3 | abcd2 | Snowhite | 30 days
(1 row)
INSERT 0 1
dummy=#
EDIT 2
We can utilize the fact that rep_id is required, even though terms_id is optional and use the following version of INSERT-SELECT:
PREPARE insert_scratch (text, text, text) AS
INSERT INTO scratch (name, rep_id, term_id)
SELECT $1, r.id, t.id
FROM reps r
LEFT JOIN terms t ON t.terms = $3
WHERE r.rep = $2
RETURNING id, name, $2 rep, $3 terms;
This version, however, has two problems:
No distinction is made between a missing terms value (i.e. '') and an invalid terms value (i.e. a non empty value missing from the terms table entirely). Both are treated as missing terms. (But the INSERT with two subqueries suffers from the same problem)
The version depends on the fact that the rep is required. But what if rep_id was optional too?
EDIT 3
Found the solution for the item 2 - eliminating dependency on rep being required. Plus using the WHERE statement has the problem that the sql does not fail if the rep is invalid - it just inserts 0 rows, whereas I want to fail explicitly in this case. My solution is simply using a dummy one row CTE:
PREPARE insert_scratch (text, text, text) AS
WITH stub(x) AS (VALUES (0))
INSERT INTO scratch (name, rep_id, term_id)
SELECT $1, r.id, t.id
FROM stub
LEFT JOIN terms t ON t.terms = $3
LEFT JOIN reps r ON r.rep = $2
RETURNING id, name, rep_id, term_id;
If rep is missing or invalid, this sql will try to insert NULL into the rep_id field and since the field is NOT NULL an error would be raised - precisely what I need. And if further I decide to make rep optional - no problem, the same SQL works for that too.
INSERT into scratch (name, rep_id, term_id)
SELECT 'aaa'
, r.id
, t.id
FROM reps r , terms t -- essentially a cross join
WHERE r.rep = 'Dracula'
AND t.terms = 'prepaid'
;
Notes:
You don't need the ugly LIMITs, since r.rep and t.terms are unique (candidate keys)
you could replace the FROM a, b by a FROM a CROSS JOIN b
the scratch table will probably need an UNIQUE constraint on (rep_id, term_it) (the nullability of term_id is questionable)
UPDATE: the same as prepared query as found in the Documentation
PREPARE hoppa (text, text,text) AS
INSERT into scratch (name, rep_id, term_id)
SELECT $1 , r.id , t.id
FROM reps r , terms t -- essentially a cross join
WHERE r.rep = $2
AND t.terms = $3
;
EXECUTE hoppa ('bbb', 'Dracula' , 'prepaid' );
SELECT * FROM scratch;
UPDATE2: test data
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE reps ( id SERIAL PRIMARY KEY, rep TEXT NOT NULL UNIQUE);
CREATE TABLE terms ( id SERIAL PRIMARY KEY, terms TEXT NOT NULL UNIQUE);
CREATE TABLE scratch ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, rep_id INT NOT NULL REFERENCES reps, term_id INT REFERENCES terms);
INSERT INTO reps(rep) VALUES( 'Dracula' );
INSERT INTO terms(terms) VALUES( 'prepaid' );
Results:
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table tmp.reps
drop cascades to table tmp.terms
drop cascades to table tmp.scratch
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
PREPARE
INSERT 0 1
id | name | rep_id | term_id
----+------+--------+---------
1 | aaa | 1 | 1
2 | bbb | 1 | 1
(2 rows)

Get row to swap tables on a certain condition

I currently have a parent table:
CREATE TABLE members (
member_id SERIAL NOT NULL, UNIQUE, PRIMARY KEY
first_name varchar(20)
last_name varchar(20)
address address (composite type)
contact_numbers varchar(11)[3]
date_joined date
type varchar(5)
);
and two related tables:
CREATE TABLE basic_member (
activities varchar[3])
INHERITS (members)
);
CREATE TABLE full_member (
activities varchar[])
INHERITS (members)
);
If the type is full the details are entered to the full_member table or if type is basic into the basic_member table. What I want is that if I run an update and change the type to basic or full the tuple goes into the corresponding table.
I was wondering if I could do this with a rule like:
CREATE RULE tuple_swap_full
AS ON UPDATE TO full_member
WHERE new.type = 'basic'
INSERT INTO basic_member VALUES (old.member_id, old.first_name, old.last_name,
old.address, old.contact_numbers, old.date_joined, new.type, old.activities);
... then delete the record from the full_member
Just wondering if my rule is anywhere near or if there is a better way.
You don't need
member_id SERIAL NOT NULL, UNIQUE, PRIMARY KEY
A PRIMARY KEY implies UNIQUE NOT NULL automatically:
member_id SERIAL PRIMARY KEY
I wouldn't use hard coded max length of varchar(20). Just use text and add a check constraint if you really must enforce a maximum length. Easier to change around.
Syntax for INHERITS is mangled. The key word goes outside the parens around columns.
CREATE TABLE full_member (
activities text[]
) INHERITS (members);
Table names are inconsistent (members <-> member). I use the singular form everywhere in my test case.
Finally, I would not use a RULE for the task. A trigger AFTER UPDATE seems preferable.
Consider the following
Test case:
Tables:
CREATE SCHEMA x; -- I put everything in a test schema named "x".
-- DROP TABLE x.members CASCADE;
CREATE TABLE x.member (
member_id SERIAL PRIMARY KEY
,first_name text
-- more columns ...
,type text);
CREATE TABLE x.basic_member (
activities text[3]
) INHERITS (x.member);
CREATE TABLE x.full_member (
activities text[]
) INHERITS (x.member);
Trigger function:
Data-modifying CTEs (WITH x AS ( DELETE ..) are the best tool for the purpose. Requires PostgreSQL 9.1 or later.
For older versions, first INSERT then DELETE.
CREATE OR REPLACE FUNCTION x.trg_move_member()
RETURNS trigger AS
$BODY$
BEGIN
CASE NEW.type
WHEN 'basic' THEN
WITH x AS (
DELETE FROM x.member
WHERE member_id = NEW.member_id
RETURNING *
)
INSERT INTO x.basic_member (member_id, first_name, type) -- more columns
SELECT member_id, first_name, type -- more columns
FROM x;
WHEN 'full' THEN
WITH x AS (
DELETE FROM x.member
WHERE member_id = NEW.member_id
RETURNING *
)
INSERT INTO x.full_member (member_id, first_name, type) -- more columns
SELECT member_id, first_name, type -- more columns
FROM x;
END CASE;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Trigger:
Note that it is an AFTER trigger and has a WHEN condition.
WHEN condition requires PostgreSQL 9.0 or later. For earlier versions, you can just leave it away, the CASE statement in the trigger itself takes care of it.
CREATE TRIGGER up_aft
AFTER UPDATE
ON x.member
FOR EACH ROW
WHEN (NEW.type IN ('basic ','full')) -- OLD.type cannot be IN ('basic ','full')
EXECUTE PROCEDURE x.trg_move_member();
Test:
INSERT INTO x.member (first_name, type) VALUES ('peter', NULL);
UPDATE x.member SET type = 'full' WHERE first_name = 'peter';
SELECT * FROM ONLY x.member;
SELECT * FROM x.basic_member;
SELECT * FROM x.full_member;

SELECT or INSERT a row in one command

I'm using PostgreSQL 9.0 and I have a table with just an artificial key (auto-incrementing sequence) and another unique key. (Yes, there is a reason for this table. :)) I want to look up an ID by the other key or, if it doesn't exist, insert it:
SELECT id
FROM mytable
WHERE other_key = 'SOMETHING'
Then, if no match:
INSERT INTO mytable (other_key)
VALUES ('SOMETHING')
RETURNING id
The question: is it possible to save a round-trip to the DB by doing both of these in one statement? I can insert the row if it doesn't exist like this:
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING id
... but that doesn't give the ID of an existing row. Any ideas? There is a unique constraint on other_key, if that helps.
Have you tried to union it?
Edit - this requires Postgres 9.1:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH new_row AS (
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING *
)
SELECT * FROM new_row
UNION
SELECT * FROM mytable WHERE other_key = 'SOMETHING';
results in:
id | other_key
----+-----------
1 | SOMETHING
(1 row)
No, there is no special SQL syntax that allows you to do select or insert. You can do what Ilia mentions and create a sproc, which means it will not do a round trip fromt he client to server, but it will still result in two queries (three actually, if you count the sproc itself).
using 9.5 i successfully tried this
based on Denis de Bernardy's answer
only 1 parameter
no union
no stored procedure
atomic, thus no concurrency problems (i think...)
The Query:
WITH neworexisting AS (
INSERT INTO mytable(other_key) VALUES('hello 2')
ON CONFLICT(other_key) DO UPDATE SET existed=true -- need some update to return sth
RETURNING *
)
SELECT * FROM neworexisting
first call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|false |
second call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|true |
First create your table ;-)
CREATE TABLE mytable (
id serial NOT NULL,
other_key text NOT NULL,
created timestamptz NOT NULL DEFAULT now(),
existed bool NOT NULL DEFAULT false,
CONSTRAINT mytable_pk PRIMARY KEY (id),
CONSTRAINT mytable_uniq UNIQUE (other_key) --needed for on conflict
);
you can use a stored procedure
IF (SELECT id FROM mytable WHERE other_key = 'SOMETHING' LIMIT 1) < 0 THEN
INSERT INTO mytable (other_key) VALUES ('SOMETHING')
END IF
I have an alternative to Denis answer, that I think is less database-intensive, although a bit more complex:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH table_sel AS (
SELECT id
FROM mytable
WHERE other_key = 'test'
UNION
SELECT NULL AS id
ORDER BY id NULLS LAST
LIMIT 1
), table_ins AS (
INSERT INTO mytable (id, other_key)
SELECT
COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)),
'test'
FROM table_sel
ON CONFLICT (id) DO NOTHING
RETURNING id
)
SELECT * FROM table_ins
UNION ALL
SELECT * FROM table_sel
WHERE id IS NOT NULL;
In table_sel CTE I'm looking for the right row. If I don't find it, I assure that table_sel returns at least one row, with a union with a SELECT NULL.
In table_ins CTE I try to insert the same row I was looking for earlier. COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)) is saying: id could be defined, if so, use it; whereas if id is null, increment the sequence on id and use this new value to insert a row. The ON CONFLICT clause assure
that if id is already in mytable I don't insert anything.
At the end I put everything together with a UNION between table_ins and table_sel, so that I'm sure to take my sweet id value and execute both CTE.
This query needs to search for the value other_key only once, and is a "search this value" not a "check if this value not exists in the table", that is very heavy; in Denis alternative you use other_key in both types of searches. In my query you "check if a value not exists" only on id that is a integer primary key, that, for construction, is fast.
Minor tweak a decade late to Denis's excellent answer:
-- Create the table with a unique constraint
CREATE TABLE mytable (
id serial PRIMARY KEY
, other_key varchar NOT NULL UNIQUE
);
WITH new_row AS (
-- Only insert when we don't find anything, avoiding a table lock if
-- possible.
INSERT INTO mytable ( other_key )
SELECT 'SOMETHING'
WHERE NOT EXISTS (
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
)
RETURNING *
)
(
-- This comes first in the UNION ALL since it'll almost certainly be
-- in the query cache. Marginally slower for the insert case, but also
-- marginally faster for the much more common read-only case.
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
-- Don't check for duplicates to be removed
UNION ALL
-- If we reach this point in iteration, we needed to do the INSERT and
-- lock after all.
SELECT *
FROM new_row
) LIMIT 1 -- Just return whatever comes first in the results and allow
-- the query engine to cut processing short for the INSERT
-- calculation.
;
The UNION ALL tells the planner it doesn't have to collect results for de-duplication. The LIMIT 1 at the end allows the planner to short-circuit further processing/iteration once it knows there's an answer available.
NOTE: There is a race condition present here and in the original answer. If the entry does not already exist, the INSERT will fail with a unique constraint violation. The error can be suppressed with ON CONFLICT DO NOTHING, but the query will return an empty set instead of the new row. This is a difficult problem because getting that info from another transaction would violate the I in ACID.