PostgreSQL Serialized Inserts Interleaving Sequence Numbers

I have multiple processes inserting into a Postgres (10.3) table using the SERIALIZABLE isolation level.
Another part of our system needs to read these records and be guaranteed that it receives all of them in sequence. For example, in the picture below, the consumer would need to
select * from table where sequanceNum > 2309 limit 5
and then receive sequence numbers 2310, 2311, 2312, 2313 and 2314.
The reading query is using the READ COMMITTED isolation level.
What I'm seeing though is that the reading query is only receiving the rows I've highlighted in yellow. Looking at the xmin, I'm guessing that transaction 334250 had begun but not finished, then transactions 334251, 334252 et al started and finished prior to my reading query starting.
My question is, how did they get sequence numbers interleaved with those of 334250? Why weren't those transactions blocked by virtue of all of the writing transactions being SERIALIZABLE?
Any suggestions on how to achieve what I'm after? Which is, a guarantee that different transactions don't generate interleaving sequence numbers? (It's ok if there are gaps.... but they can't interleave).
Thanks very much for your help. I'm losing hair over this one!
PS - I just noticed that 334250 has a non zero xmax. Is that a clue that I'm missing perhaps?

The SQL standard in its usual brevity defines SERIALIZABLE as:
The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable.
A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions
that produces the same effect as some serial execution of those same SQL-transactions. A serial execution
is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.
In the light of this definition, I understand that your wish is that the sequence numbers be in the same order as the “serial execution” that “produces the same effect”.
Unfortunately the equivalent serial ordering is not clear at the time the transactions begin, because statements later in the transaction can determine the “logical” order of the transactions.
Sequence numbers on the other hand are ordered according to the wall time when the number was requested.
In a way, you would need sequence numbers that are determined by something that is not certain until the transactions commit, and that is a contradiction in terms.
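To make the mismatch concrete, here is a small toy model in Python (not PostgreSQL itself). It assumes only two things: a sequence value is fixed at the moment it is requested, and a row becomes visible to a READ COMMITTED reader only once its transaction commits. The transaction ids are the ones from the question; which transaction drew which number is made up for illustration, and the helper names (insert, commit, read_after) are hypothetical.

```python
from itertools import count

sequence = count(2310)   # stands in for nextval(): hands out numbers in request order
committed = []           # sequence numbers visible to a READ COMMITTED reader
pending = {}             # txid -> numbers inserted but not yet committed

def insert(txid):
    """Draw the next sequence number now; the row stays invisible until commit."""
    pending.setdefault(txid, []).append(next(sequence))

def commit(txid):
    """Make all rows of this transaction visible at once."""
    committed.extend(pending.pop(txid))

def read_after(last_seen, limit):
    """Model of: select ... where seq > last_seen order by seq limit N."""
    return sorted(n for n in committed if n > last_seen)[:limit]

# Transaction 334250 starts first but commits last (assumed interleaving).
insert(334250)   # draws 2310
insert(334251)   # draws 2311
insert(334250)   # draws 2312
insert(334252)   # draws 2313
insert(334250)   # draws 2314
commit(334251)
commit(334252)

first_read = read_after(2309, 5)    # 334250 is still open, so its rows are skipped
commit(334250)
second_read = read_after(2309, 5)   # now the "missing" rows appear behind the reader

print(first_read)    # [2311, 2313]
print(second_read)   # [2310, 2311, 2312, 2313, 2314]
```

A consumer that remembers 2313 as its high-water mark after the first read would never see 2310 and 2312, which matches the symptom described in the question.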
So I think that it is not possible to get what you want, unless you actually serialize the execution, e.g. by locking the table in SHARE ROW EXCLUSIVE mode before you insert the data.
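As a rough sketch of what "actually serialize the execution" buys you, the following Python simulation uses an ordinary lock as a stand-in for the table-level lock. Everything here is an assumption for illustration: the names (table_lock, writer, is_interleaved) are hypothetical and threads stand in for database sessions. The point is only that when the lock is held from before the first number is drawn until the transaction finishes, each writer's numbers form one contiguous run.

```python
import threading
from itertools import count

sequence = count(1)
seq_guard = threading.Lock()    # makes drawing a number atomic, like nextval()
table_lock = threading.Lock()   # stand-in for the table-level lock
rows = []                       # (sequence number, writer id)

def next_value():
    with seq_guard:
        return next(sequence)

def writer(writer_id, n_rows):
    # Take the "table lock" before drawing any number and hold it until
    # the whole transaction is done, so no other writer can draw in between.
    with table_lock:
        for _ in range(n_rows):
            rows.append((next_value(), writer_id))

def is_interleaved(rows):
    """True if, ordered by sequence number, some writer appears in more than one run."""
    owners = [w for _, w in sorted(rows)]
    runs = 1 + sum(1 for a, b in zip(owners, owners[1:]) if a != b)
    return runs != len(set(owners))

threads = [threading.Thread(target=writer, args=(w, 50)) for w in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(rows), is_interleaved(rows))   # 400 False
```

The price is the one the answer implies: writers no longer run concurrently, they queue up behind the lock.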
My question is why you have that unusual demand. I cannot think of a good reason.

Related

unexplained variations in postgresql insert performance

I have a processing pipeline that generates two streams of data, then joins them to produce a third stream. Each stream of data is a timeseries over 1 year of 30 minute intervals (so 17520 rows). Both the generated streams and the joined stream are written into a single table keyed by a unique stream id and the timestamp of each point in the timeseries.
In abstract terms, the c and g series are generated by plpgsql functions which insert into the timeseries table from data stored elsewhere in the database (e.g. with a select) and then return the unique identifiers of the newly created series. The n series is generated with a join between the timeseries identified by c_id and g_id by the calculate_n() function which returns the id of the new n series.
To illustrate with some pseudo code:
-- generation transaction
begin;
c_id = select generate_c(p_c);
g_id = select generate_g(p_g);
end transaction;
-- calculation transaction
begin;
n_id = select calculate_n(c_id, g_id);
end transaction;
I observe that generate_c() and generate_g() typically run in well under a second; however, the first time calculate_n() runs, it typically takes 1 minute.
However if I run calculate_n() a second time with exactly the same parameters as the first run, it runs in less than a second. (calculate_n() generates a completely new series each time it runs - it is not reading or re-writing any data calculated by the first execution)
If I stop the database server, restart it, then run calculate_n() on c_id and g_id calculated previously, the execution of calculate_n() also takes less than a second.
This is very confusing to me. I could understand the second run of calculate_n() is taking only a second if, somehow, the first run had warmed a cache but if that is so, then why does the third run (after a server restart) still run quickly when any such cache would have been cleared?
It appears to me that perhaps some kind of write cache, generated by the first generation transaction, is (unexpectedly) impeding the first execution of calculate_n() but once calculate_n() completes, that cache is purged so that it doesn't get in the way of subsequent executions of calculate_n() when they occur. I have had a look at the activity of the shared buffer cache via pg_buffercache but didn't see any strong evidence that this was happening, although there was certainly evidence of cache activity across executions of calculate_n().
I may be completely off-base about this being the result of an interaction with a write-cache that was populated by the first transaction, but I am struggling to understand why the performance of calculate_n() is so poor immediately after the first transaction completes but not at other times, such as immediately after the first attempt or after the database server is restarted.
I am using postgres 11.6.
What explains this behaviour?
update:
So further on this. Running vacuum analyze between the two generate steps and the calculate step did improve the performance of the calculate step, but I found that if I repeated the steps again, I needed to run vacuum analyze between the generate steps and the calculate step every time I executed the sequence, which doesn't seem particularly practical (since you can't call vacuum analyze in a function or a procedure). I understand the need to run vacuum analyze at least once with a reasonable number of rows in the table. But do I really need to do it every time I insert 34000 more rows?

Postgres: incrementing a counter field

How does one increment a field in a database such that even if a thousand connections to the database try to increment it at once, it will always be 1000 at the end (if started from zero)?
I mean, so that no two connections increment the same number, resulting in a lost increment.
How do you synchronize and make sure the data is consistent? Is a stored procedure required for this? Database "locking"? How is that done?
What you're looking for is a Postgres SEQUENCE.
You call nextval('sequence_name') to get the next number in the sequence.
According to the docs,
sequences are designed with concurrency in mind:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions might leave unused "holes" in the sequence of assigned values.
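The behaviour described in that quote can be mimicked with a toy sequence in Python. This is only an in-process analogy, with threads standing in for connections and a hypothetical ToySequence class standing in for the real thing: values are handed out atomically, and a value drawn by a "transaction" that later gives up is never handed out again. The one-in-ten rollback rate is an arbitrary assumption.

```python
import threading

class ToySequence:
    """Toy stand-in for a sequence: nextval() is atomic and never undone."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def nextval(self):
        with self._lock:
            self.value += 1
            return self.value

seq = ToySequence()
kept = []                       # numbers belonging to "committed" transactions
kept_lock = threading.Lock()

def transaction(i):
    n = seq.nextval()
    if i % 10 == 0:
        return                  # "rollback": the number is consumed, no row is kept
    with kept_lock:
        kept.append(n)

threads = [threading.Thread(target=transaction, args=(i,)) for i in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No increment is lost and no number is handed out twice,
# but the rolled-back transactions leave holes.
print(seq.value, len(kept), len(set(kept)))   # 1000 900 900
```

So the counter itself reaches exactly 1000 after 1000 calls, while the set of numbers that ended up in committed rows can have gaps.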
EDIT:
If you're looking for a gapless sequence, in 2006 someone posted a solution to the PostgreSQL mailing list: http://www.postgresql.org/message-id/44E376F6.7010802#seaworthysys.com. It appears there's also a lengthy discussion on locking, etc.
The gapless-sequence question was also asked on SO even though there was never an accepted answer:
postgresql generate sequence with no gap

mongo save documents in monotonically increasing sequence

I know mongo docs provide a way to simulate auto_increment.
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
But it is not concurrency-proof as guaranteed by say MySQL.
Consider the sequence of events:
client 1 obtains an index of 1
client 2 obtains an index of 2
client 2 saves doc with id=2
client 1 saves doc with id=1
In this case, it is possible to save a doc with id less than the current max that is already saved. For MySql, this can never happen since auto increment id is assigned by the server.
How do I prevent this? One way is to do optimistic looping at each client, but for many clients, this will result in heavy contention. Any other better way?
The use case for this is to ensure id is "forward-only". This is important for say a chat room where many messages are posted, and messages are paginated, I do not want new messages to be inserted in a previous page.
But it is not concurrency-proof as guaranteed by say MySQL.
That depends on the definition of concurrency-proof, but let's see
In this case, it is possible to save a doc with id less than the current max that is already saved.
That is correct, but it depends on the definition of simultaneity and monotonicity. Let's say your code snapshots the state of some other part of the system, then fetches the monotonic key, then performs an insert that may take a while. In that case, this apparently non-monotonic insert might actually be 'more monotonic' in the sense that index 2 was indeed captured at a later time, possibly reflecting a more recent state. In other words: does the time it took to insert really matter?
For MySql, this can never happen since auto increment id is assigned by the server.
That sounds like folklore. Most relational dbs offer fine-grained control over these features, since strict guarantees severely impact concurrency.
MySQL guarantees neither that there are no gaps, nor that a transaction with a higher AUTO_INCREMENT id stays invisible to other readers until a transaction that acquired a lower AUTO_INCREMENT value has committed, unless you keep a table-level lock, which severely impacts concurrency.
For gaplessness, consider a transaction rollback of the first of two concurrent inserts. Does the second insert now get a new id assigned while it's being committed? No - from the InnoDB documentation:
You may see gaps in the sequence of values assigned to the AUTO_INCREMENT column if you roll back transactions that have generated numbers using the counter. (see end of 14.6.5.5.1, "Traditional InnoDB Auto-Increment Locking")
and
In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost”
Also, you're completely ignoring the problem of replication, where sequences lead to even more trouble:
Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time. (see 14.6.5.5.2 "Configurable InnoDB Auto-Increment Locking")
The sheer length of the documentation of the InnoDB behavior is a reminder of the true complexity of making apparently simple guarantees in a concurrent system. Yes, monotonicity of inserts is possible with table-level locks, but hardly desirable. If you take a distributed view of the system, things get worse, because we can't even be sure of the counter value in partition mode...

postgresql concurrent queries as stored procedures

I have 2 stored procedures that interact with the same tables.
The first executes for several hours and the second one is instant.
So if I run the first one, and after that the second one (on a second connection), then the second procedure will wait for the first one to end.
It is harmless for my data if both run at the same time. How can I do that?
The fact that the shorter query is blocked while being on a second connection suggests that the longer query is getting an exclusive lock on the table during the query.
That suggests it is doing writes, as if they were both reads there shouldn't be any locking issues. PgAdmin can show what locks are active during the longer query and also if the shorter query is indeed blocked on the longer one.
If the longer query is indeed doing writes, it's possible that you may be able to reduce the lock contention -- by chunking it, for example, which could allow readers in between chunked updates/inserts -- but if it's an operation that requires an exclusive write lock, then it will block everybody until it's done.
It's also possible that you may be able to optimize the query such that it needs to be a lower-level lock that isn't exclusive, but that would all depend on the specifics of what the query is doing and your data.

Can multiple threads cause duplicate updates on constrained set?

In postgres if I run the following statement
update table set col = 1 where col = 2
In the default READ COMMITTED isolation level, from multiple concurrent sessions, am I guaranteed that:
In a case of a single match only 1 thread will get a ROWCOUNT of 1 (meaning only one thread writes)
In a case of a multi match that only 1 thread will get a ROWCOUNT > 0 (meaning only one thread writes the batch)
Your stated guarantees apply in this simple case, but not necessarily in slightly more complex queries. See the end of the answer for examples.
The simple case
Assuming that col is unique, has exactly one row with the value 2, or has stable ordering so every UPDATE matches the same rows in the same order:
What'll happen for this query is that the threads will find the row with col=2 and all try to grab a write lock on that tuple. Exactly one of them will succeed. The others will block waiting for the first thread's transaction to commit.
That first tx will write, commit, and return a rowcount of 1. The commit will release the lock.
The other tx's will again try to grab the lock. One by one they'll succeed. Each transaction will in turn go through the following process:
Obtain the write lock on the contested tuple.
Re-check the WHERE col=2 condition after getting the lock.
The re-check will show that the condition no longer matches so the UPDATE will skip that row.
The UPDATE has no other rows so it will report zero rows updated.
Commit, releasing the lock for the next tx trying to get hold of it.
In this simple case the row-level locking and the condition re-check effectively serializes the updates. In more complex cases, not so much.
You can easily demonstrate this. Open say four psql sessions. In the first, lock the table with BEGIN; LOCK TABLE test;*. In the rest of the sessions run identical UPDATEs - they'll block on the table level lock. Now release the lock by COMMITting your first session. Watch them race. Only one will report a row count of 1, the others will report 0. This is easily automated and scripted for repetition and scaling up to more connections/threads.
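If you don't have a database handy, the same race can be imitated in-process. The Python sketch below is an analogy only: a threading.Lock stands in for the tuple's write lock, a Barrier plays the role of releasing the table lock so every session starts at once, and the names (update_session, rowcounts) are hypothetical. It follows the lock, re-check, skip-or-write steps described above.

```python
import threading

SESSIONS = 8
row = {"col": 2}                      # the single contested tuple
row_lock = threading.Lock()           # stand-in for the tuple's write lock
start = threading.Barrier(SESSIONS)   # let every session go at the same moment
rowcounts = [None] * SESSIONS         # row count reported by each session

def update_session(i):
    """Model of: update test set col = 1 where col = 2"""
    start.wait()
    with row_lock:                    # obtain the write lock; the others block here
        if row["col"] == 2:           # re-check the WHERE condition under the lock
            row["col"] = 1
            rowcounts[i] = 1
        else:
            rowcounts[i] = 0          # condition no longer matches: skip the row
    # leaving the with-block models the commit that releases the lock

threads = [threading.Thread(target=update_session, args=(i,)) for i in range(SESSIONS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(rowcounts))   # [0, 0, 0, 0, 0, 0, 0, 1]
```

Which session wins varies from run to run, but exactly one of them reports a row count of 1.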
To learn more, read rules for concurrent writing, page 11 of PostgreSQL concurrency issues - and then read the rest of that presentation.
And if col is non-unique?
As Kevin noted in the comments, if col isn't unique so you might match multiple rows, then different executions of the UPDATE could get different orderings. This can happen if they choose different plans (say one is via a PREPARE and EXECUTE and another is direct, or you're messing with the enable_ GUCs) or if the plan they all use uses an unstable sort of equal values. If they get the rows in a different order then tx1 will lock one tuple, tx2 will lock another, then they'll each try to get locks on each other's already-locked tuples. PostgreSQL will abort one of them with a deadlock exception. This is yet another good reason why all your database code should always be prepared to retry transactions.
If you're careful to make sure concurrent UPDATEs always get the same rows in the same order you can still rely on the behaviour described in the first part of the answer.
Frustratingly, PostgreSQL doesn't offer UPDATE ... ORDER BY so ensuring that your updates always select the same rows in the same order isn't as simple as you might wish. A SELECT ... FOR UPDATE ... ORDER BY followed by a separate UPDATE is often safest.
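The reason a fixed lock order helps can be shown with a small Python analogy. This is not PostgreSQL code: per-row threading.Lock objects stand in for tuple locks, and the session names and scan orders are invented. Each session finds the rows in a different order, but sorts them before locking, which is roughly what the ORDER BY in a SELECT ... FOR UPDATE gives you.

```python
import threading

row_locks = {row_id: threading.Lock() for row_id in range(5)}
updated_by = {row_id: None for row_id in row_locks}   # None means "col still matches"
rowcounts = {}

def update_batch(session, found_order):
    # Lock in one agreed order (sorted by row id), whatever order the rows were found in.
    ordered = sorted(found_order)
    for row_id in ordered:
        row_locks[row_id].acquire()
    try:
        n = 0
        for row_id in ordered:
            if updated_by[row_id] is None:     # re-check the condition under the lock
                updated_by[row_id] = session
                n += 1
        rowcounts[session] = n
    finally:
        for row_id in reversed(ordered):
            row_locks[row_id].release()

# Hypothetical scan orders: without the sort, tx1 and tx2 could each hold
# a lock the other one needs and wait on each other forever.
scan_orders = {
    "tx1": [0, 1, 2, 3, 4],
    "tx2": [4, 3, 2, 1, 0],
    "tx3": [2, 0, 4, 1, 3],
}
threads = [threading.Thread(target=update_batch, args=(s, o)) for s, o in scan_orders.items()]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(sorted(rowcounts.values()))   # [0, 0, 5]
```

Because every session has to win the lock on the lowest row id first, whoever gets it goes on to take all the others unopposed, so one session updates the whole batch and the rest see nothing left to do.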
More complex queries, queuing systems
If you're doing queries with multiple phases, involving multiple tuples, or conditions other than equality you can get surprising results that differ from the results of a serial execution. In particular, concurrent runs of anything like:
UPDATE test SET col = 1 WHERE col = (SELECT t.col FROM test t ORDER BY t.col LIMIT 1);
or other efforts to build a simple "queue" system will *fail* to work how you expect. See the PostgreSQL docs on concurrency and this presentation for more info.
If you want a work queue backed by a database there are well-tested solutions that handle all the surprisingly complicated corner cases. One of the most popular is PgQ. There's a useful PgCon paper on the topic, and a Google search for 'postgresql queue' is full of useful results.
* BTW, instead of a LOCK TABLE you can use SELECT 1 FROM test WHERE col = 2 FOR UPDATE; to obtain a write lock on just that one tuple. That'll block updates against it but not block writes to other tuples or block any reads. That allows you to simulate different kinds of concurrency issues.