If I have a sequence 'foo' in postgres, and do something like the following:
begin;
insert into ... values (nextval('foo'));
commit;
Is nextval evaluated on commit? In other words, if I only do such writes to that column, will it be visible as monotonically increasing, or is there a race there?
The docs make very clear that sequences are non-transactional, but not whether they could be used to order writes this way.
If yes, what about this?
begin;
select nextval('foo'); -- save the value
insert into ... values (<saved value>);
commit;
nextval is executed immediately, not at commit time. Every call to nextval is guaranteed to return a unique number, even if several statements call it on the same sequence at the same time.
Nothing can go wrong with sequences!
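A quick way to see this for yourself (a sketch using a throwaway sequence named foo):
CREATE SEQUENCE foo;
BEGIN;
SELECT nextval('foo'); -- returns 1 right away, not at commit
ROLLBACK;
SELECT nextval('foo'); -- returns 2: the rolled-back call still consumed 1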
Related
What's the best way to atomically update a sequence in Postgres?
Context: I'm bulk inserting objects with SQLAlchemy, and executemany can't return defaults, so I'd like to increment the primary key sequence by the number of objects I need to insert.
I know I can do:
ALTER SEQUENCE seq INCREMENT BY 1000;
But I'm not sure if that's safe to do in concurrent environments.
You can use setval() combined with nextval()
select setval('my_sequence', nextval('my_sequence') + 999);
This increments the current value by 1000; it does not set it to a fixed value.
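Since setval returns the value it sets, a sketch like the following (assuming the same my_sequence) also tells you exactly which block of ids you just reserved:
SELECT setval('my_sequence', nextval('my_sequence') + 999) AS last_reserved_id;
-- the reserved range is [last_reserved_id - 999, last_reserved_id]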
As Laurenz Albe suggests: I called nextval 1000 times.
SELECT nextval('common_id_seq') FROM generate_series(1, 1000);
The advantage of this over a_horse_with_no_name's suggestion is that I don't need to grant setval privileges to the user.
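A variation on the same idea, in case the application needs the reserved values themselves (same common_id_seq assumed):
SELECT array_agg(nextval('common_id_seq')) AS reserved_ids
FROM generate_series(1, 1000);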
That would be safe, since ALTER SEQUENCE takes an ACCESS EXCLUSIVE lock on the sequence.
There are two problems:
this will block all concurrent usage of the sequence until your transaction is completed
you don't know the starting value
You could work around the second problem like this:
BEGIN;
ALTER SEQUENCE seq INCREMENT BY 1000;
SELECT nextval('seq');
ALTER SEQUENCE seq INCREMENT BY 1; -- restore the normal increment
COMMIT;
Then you know that that value and the preceding 999 ones are yours.
But I think the best way is to call nextval 1000 times.
I have the following SQL script which sets the sequence value corresponding to max value of the ID column:
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
Should I lock 'mytable' in this case in order to prevent the ID from changing in a parallel request, as in the example below?
request #1              request #2
MAX(id)=5
                        inserted id 6
SETVAL=5
Or is setval(max(id)) an atomic operation?
Your suspicion is right, this approach is subject to race conditions.
But locking the table won't help, because it won't keep a concurrent transaction from fetching new sequence values. This transaction will block while the table is locked, but will happily continue inserting once the lock is gone, using a sequence value it got while the table was locked.
If it were possible to lock sequences, that might be a solution, but sequences cannot be locked.
I can think of two solutions:
Remove all privileges on the sequence while you modify it, so that concurrent requests to the sequence will fail. That causes errors, of course (a sketch of this follows the list below).
The pragmatic way: use
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1) + 100000) FROM mytable;
Here 100000 is a value that is safely bigger than the number of rows that might get inserted while your operation is running.
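A sketch of the first option, assuming the application connects as a hypothetical role app_user; any concurrent nextval call fails with a permission error between the REVOKE and the GRANT:
REVOKE ALL ON SEQUENCE mytable_id_seq FROM app_user;
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
GRANT USAGE ON SEQUENCE mytable_id_seq TO app_user;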
You can use two statements in the same transaction:
BEGIN;
ALTER SEQUENCE mytable_id_seq RESTART;
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
COMMIT;
Note: the first command takes an ACCESS EXCLUSIVE lock on the sequence, so it stays locked against other transactions until the commit.
I have a general function that can manipulate the sequence of any table (why is irrelevant to my question). It reads the current value, works out the new value, sets it, and returns its calculation, which is what's inserted. This is obviously a multi-step process.
I call it from a BEFORE INSERT trigger on tables where I need it.
All I need to know is am I guaranteed that the function will be called by only one caller at a time in a multi-user environment?
Specifically, does the BEFORE INSERT trigger have to complete before it is called again by another caller?
Logically, I would assume yes, but one never knows what may be going on under the hood.
If the answer is no, what minimal locking would I need on the function to guarantee I can read and write the sequence in a "thread-safe" manner?
I'm using PG 10.
EDIT
Here is the function updated with a lock:
CREATE OR REPLACE FUNCTION public.uts_set()
RETURNS TRIGGER AS
$$
DECLARE
    sv int8;
    seq text := format('%I.%I_uts_seq', tg_table_schema, tg_table_name);
BEGIN
    EXECUTE format('LOCK TABLE %I IN ROW EXCLUSIVE MODE;', tg_table_name);
    EXECUTE 'SELECT last_value+1 FROM ' || seq INTO sv; -- currval(seq) isn't useable
    PERFORM setval(seq, GREATEST(sv, (EXTRACT(epoch FROM localtimestamp) * 1000000)::int8), false);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
However, the INSERT that fires this trigger already acquires ROW EXCLUSIVE on the table, so this statement may be redundant and a stronger lock may be needed. Or, conversely, it may mean no lock is needed.
UPDATE
If I am reading this SO question correctly, my original version without the LOCK should work since the trigger acquires the same lock my updated function is redundantly taking.
All I need to know is am I guaranteed that the function will be called by only one caller at a time in a multi-user environment?
No. The function calls themselves are not serialized, but you can achieve this behaviour with the SERIALIZABLE transaction isolation level:
This level emulates serial transaction execution for all committed transactions; as if transactions had been executed one after another, serially, rather than concurrently.
But this approach introduces several tradeoffs, such as preparing your application to retry transactions that fail with serialization errors.
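A minimal sketch of what that looks like from the client side (the table name is hypothetical):
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO mytable (data) VALUES (0); -- fires the trigger
COMMIT;
-- if any statement fails with SQLSTATE 40001 (serialization_failure),
-- the application must retry the whole transaction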
Maybe I missed something, but I really believe that you just need NEXTVAL, something like below:
CREATE OR REPLACE FUNCTION public.uts_set()
RETURNS TRIGGER AS
$$
DECLARE
    sv int8;
    -- First, use the %I wildcard for identifiers instead of %s
    seq text := format('%I.%I', tg_table_schema, tg_table_name || '_uts_seq');
BEGIN
    -- Second, you can't call CURRVAL in a session
    -- that hasn't issued NEXTVAL before
    sv := NEXTVAL(seq);
    -- Do your logic here...
    -- Result is ignored since this is a STATEMENT trigger
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Remember that CURRVAL acts on session-local scope and NEXTVAL on global scope, so you have a reliable thread-safe mechanism at hand.
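A quick sketch of that session-local behaviour (my_seq is a placeholder name):
SELECT nextval('my_seq'); -- e.g. returns 42
SELECT currval('my_seq'); -- returns 42, but only in this session;
-- another session calling currval('my_seq') without a prior nextval gets:
-- ERROR: currval of sequence "my_seq" is not yet defined in this session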
The sequence itself handles thread safety across concurrent sessions, so it really comes down to the code that is interacting with the sequence. The following code is thread safe:
SELECT nextval('myseq');
If you are doing much fancier things with the sequence, like setval and currval, I would be more worried about that in a high-transaction, multi-user environment. Even so, the sequence itself should be locked from other queries while it is being manipulated.
Let's say I've written plpgsql function that does the following:
CREATE OR REPLACE FUNCTION foobar (_foo_data_id bigint)
RETURNS bigint AS $$
BEGIN
    DROP TABLE IF EXISTS tmp_foobar;
    CREATE TEMP TABLE tmp_foobar AS
    SELECT *
    FROM foo_table ft
    WHERE ft.foo_data_id = _foo_data_id;
    -- more SELECT queries on unrelated tables
    -- a final SELECT query that invokes tmp_foobar
END;
$$ LANGUAGE plpgsql;
First question:
If I simultaneously invoked this function twice, is it possible for the second invocation of foobar() to drop the tmp_foobar table while the first invocation of foobar() is still running?
I understand that SELECT statements create an ACCESS SHARE lock, but will that lock persist until the SELECT statement completes or until the implied COMMIT at the end of the function?
Second question:
If the latter is true, will the second invocation of foobar() indefinitely re-try DROP TABLE IF EXISTS tmp_foobar; until the lock is dropped or will it fail at some point?
If you simultaneously invoke a function twice, it means you're using two separate sessions to do so. Temporary tables are not shared between sessions, so the second session would not "see" tmp_foobar from the first session, and there would be no interaction. See http://www.postgresql.org/docs/9.2/static/sql-createtable.html#AEN70605 ("Temporary tables").
Locks persist until the end of the transaction (regardless of how you acquire them; the exception is advisory locks, but that's not what you're doing).
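If you want to see this for yourself, one way (a sketch) is to watch pg_locks from another session while the transaction is still open:
SELECT locktype, relation::regclass, mode, granted
FROM pg_locks
WHERE pid <> pg_backend_pid();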
The second question does not need an answer, because the premise is false.
One more thing. It might be useful to create indexes on that temporary table of yours, and ANALYZE it; that might make the final query faster.
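For example (a sketch; the column name some_key is hypothetical):
CREATE INDEX ON tmp_foobar (some_key);
ANALYZE tmp_foobar;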
I have a table
create table testtable(
    testtable_rid serial not null,
    data integer not null,
    constraint pk_testtable primary key(testtable_rid)
);
So let's say I run this block about 20 times:
begin;
insert into testtable (data) values (0);
rollback;
and then I do
begin;
insert into testtable (data) values (0);
commit;
And finally a
select * from testtable
Result:
row0: testtable_rid=21 | data=0
Expected result:
row0: testtable_rid=1 | data=0
As you can see, sequences do not appear to be affected by transaction rollbacks. They continue to increment as if the transaction was committed and then the row was deleted. Is there some way to prevent sequences from behaving in this way?
It would not be a good idea to roll back sequences. Imagine two transactions happening at the same time, each of which uses the sequence for a unique id. If the second transaction commits and the first rolls back, then the second has inserted a row with "2" while the first rolls the sequence back to "1".
The next time that sequence is used, it will produce "2" again, which would violate the unique constraint.
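Spelled out as a timeline (a hypothetical sketch of what transactional sequences would do):
T1: BEGIN; nextval() -> 1
T2: BEGIN; nextval() -> 2; insert row with id 2; COMMIT;
T1: ROLLBACK;         -- hypothetically "rolls the sequence back" to 1
T3: nextval() -> 2    -- collides with T2's committed row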
No, there isn't. See the note at the bottom of this page. It's a bad idea to do something like that anyway. If you have two transactions running at the same time, each inserting one row, you want them to insert rows with different IDs.