What resources do subtransaction IDs consume? - postgresql

The PostgreSQL wiki advises an approach to implementing UPSERT that uses a retry-loop. Implicit in this solution is the use of "subtransaction IDs". On the wiki article there is the following warning:
The correct solution is slow and clumsy to use, and is unsuitable for significant amounts of data. It also potentially burns through a lot of subtransaction IDs - avoiding burning XIDs is an explicit goal of the current "native UPSERT in PostgreSQL" effort.
What is the consequence of using "a lot of subtransaction IDs"? I don't really know what a subtransaction ID is - is this just a way of numbering nested transactions, and is the implication that these numbers might run out?

The resource is the 32-bit XID transaction counter itself, which the engine uses to know whether the version of a row in a table is associated with an "old" transaction (committed or rolled back) or a not-yet-committed one, and whether it is visible from any given transaction.
Increasing XIDs at a super-high rate creates or increases the risk of a transaction ID wraparound issue. In the worst case, this escalates into a database self-shutdown to avoid data inconsistencies.
What avoids the transaction ID wraparound is routine vacuuming. This is detailed in the doc under Preventing Transaction ID Wraparound Failures.
But autovacuum is a background task, meant not to get in the way of foreground activity. Among other things, it cancels itself instead of locking out other queries. At times, it can lag far behind.
We can imagine a worst case where foreground database activity increases XID values so fast that autovacuum simply doesn't have time to freeze the rows carrying "old" XIDs before those values are claimed by new transactions or subtransactions, a situation PostgreSQL cannot cope with.
It might also be that those foreground transactions stay uncommitted while this is going on, so even an aggressive vacuum couldn't do anything about it.
That's why programmers should be cautious about techniques that make this event more likely, like opening/closing subtransactions in huge loops.
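To make that concrete, here is a rough sketch of the wiki-style retry loop with psycopg2 (2.8+ for the errors classes); the counters table and the connection string are made up. Every row processed opens a savepoint and therefore consumes at least one subtransaction XID:
# Sketch of the savepoint-based UPSERT loop the wiki warns about.
# Table "counters(name text primary key, hits int)" and the connection
# string are hypothetical.
import psycopg2
conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()
def upsert(name):
    cur.execute("SAVEPOINT upsert_try")  # opens a subtransaction
    try:
        cur.execute("INSERT INTO counters (name, hits) VALUES (%s, 1)", (name,))
    except psycopg2.errors.UniqueViolation:
        cur.execute("ROLLBACK TO SAVEPOINT upsert_try")
        cur.execute("UPDATE counters SET hits = hits + 1 WHERE name = %s", (name,))
    cur.execute("RELEASE SAVEPOINT upsert_try")
# Every call burns a subtransaction XID; looping over millions of rows
# advances the shared 32-bit XID counter very quickly.
for name in ("a", "b", "a"):
    upsert(name)
conn.commit()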
The range is about 2 billion transactions. That limit was essentially unreachable when the system was designed, but it becomes more of a concern as our hardware capabilities, and what we ask of our databases, keep increasing.
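To keep an eye on how close a cluster is getting to that limit, you can watch age(datfrozenxid) per database. A minimal sketch, again with psycopg2 and placeholder connection parameters:
# Sketch: report how many XIDs each database has consumed since its last
# aggressive freeze. Connection parameters are placeholders.
import psycopg2
conn = psycopg2.connect("dbname=postgres user=postgres")
with conn.cursor() as cur:
    cur.execute("""
        SELECT datname, age(datfrozenxid) AS xid_age
        FROM pg_database
        ORDER BY xid_age DESC
    """)
    for datname, xid_age in cur.fetchall():
        # autovacuum_freeze_max_age defaults to 200 million; the hard
        # wraparound limit is around 2 billion.
        print(f"{datname}: {xid_age}")
conn.close()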

Related

Why does the way Postgres handles XID assignments determine the overhead of a READ/WRITE transaction w/o writes (vs. that of a READ ONLY transaction)?

I don't understand this comment:
My understanding is that read-write transactions carry some overhead, but that you don't incur this overhead until you actually write something. In other words, in terms of performance, a READ ONLY transaction should be the same as a READ WRITE transaction which only contains reads. This stems from the way Postgres handles XID assignment (some info on this here).
The link in the comment states:
"Transactions and subtransactions are assigned permanent XIDs only when/if they first do something that requires one --- typically, insert/update/delete a tuple, though there are a few other places that need an XID assigned."
Is this the key point? That is, if a READ/WRITE transaction only has reads, then an XID isn't assigned, and assigning an XID would otherwise account for the overhead difference between a READ/WRITE transaction with no writes and a READ ONLY transaction.
Does this mean that other databases assign an XID even if no rows are changed, removed, or added?
The overhead difference is related to how read-only and read-write transactions are defined, and how permanent XIDs are assigned in PostgreSQL.
A transaction starts out as a virtual transaction (in effect read-only) and does not get assigned a true XID until it performs a data modification operation on the database. Virtual transactions do not affect the visibility of tuples (they can still trigger pruning of dead tuples in a page, depending on the free space left, but that is a different topic). No impact on visibility means no impact on snapshot isolation, and no need to assign a true XID, which normally requires synchronization of internal processes, page writes (xmin, xmax, and the related hint bits), additional I/O, etc. That is the extra overhead.
You can experiment with this yourself by starting a transaction block and observing that no permanent XID is assigned until a DML statement is executed, using a built-in function (for details, https://pgpedia.info/p/pg_current_xact_id_if_assigned.html):
postgres=# begin;
BEGIN
postgres=*# select pg_current_xact_id_if_assigned();
 pg_current_xact_id_if_assigned
--------------------------------

(1 row)
A virtual XID is still assigned for read-only transactions as well, but those IDs are memory-only, local to the backend process, and temporary, which makes them much less expensive.
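The same thing can be observed from a driver. A minimal sketch with psycopg2, assuming PostgreSQL 13+ (the function was called txid_current_if_assigned() before that) and a throwaway table t(x int):
# Sketch: a transaction only gets a permanent XID once it writes something.
import psycopg2
conn = psycopg2.connect("dbname=postgres user=postgres")
cur = conn.cursor()                       # psycopg2 opens a transaction implicitly
cur.execute("SELECT pg_current_xact_id_if_assigned()")
print(cur.fetchone()[0])                  # None: only a virtual XID exists so far
cur.execute("SELECT count(*) FROM t")     # plain reads still need no permanent XID
cur.execute("SELECT pg_current_xact_id_if_assigned()")
print(cur.fetchone()[0])                  # still None
cur.execute("INSERT INTO t VALUES (1)")   # the first write forces XID assignment
cur.execute("SELECT pg_current_xact_id_if_assigned()")
print(cur.fetchone()[0])                  # now a permanent XID
conn.rollback()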
As for other DBMSs: MS SQL Server also differentiates between transaction types and, if I am not mistaken, assigns transaction IDs in a similar way: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-tran-active-transactions-transact-sql?view=azuresqldb-current
Yes, that is the case: if you commit a transaction that did not request a new transaction ID, hardly anything happens (unless you created a WITH HOLD cursor). If the transaction got a transaction ID, a COMMIT record is written to WAL, and WAL is flushed to disk.

Is ROLLBACK TO SAVEPOINT better performing than ROLLBACK TRANSACTION?

I have a script that performs a bunch of updates on a moderately large (approximately 6 million rows) table, based on data read from a file.
It currently begins and then commits a transaction for each row it updates, and I wanted to improve its performance somehow. I wonder if starting a single transaction at the beginning of the script's run and then rolling back to individual savepoints whenever a validation error occurs would actually result in a performance increase.
I looked online but haven't had much luck finding any documentation or benchmarks.
COMMIT is mostly an I/O problem, because the transaction log (WAL) has to be synchronized to disk.
So using subtransactions (savepoints) will very likely boost performance. But beware that using more than 64 subtransactions per transaction will again hurt performance if you have concurrent transactions.
If you can live with losing some committed transactions in the event of a database server crash (which is rare), you could simply set synchronous_commit to off and stick with many small transactions.
Another, more complicated method is to process the rows in batches without using subtransactions and repeating the whole batch in case of a problem.
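A sketch of that batch approach with psycopg2 (the table, columns and batch size are invented; the point is one COMMIT per batch, with a per-row fallback only when a batch fails):
# Sketch: commit per batch instead of per row; redo a failed batch row by
# row to isolate the bad input. Table and columns are hypothetical.
import psycopg2
BATCH_SIZE = 1000
def apply_row(cur, row):
    cur.execute("UPDATE items SET price = %s WHERE id = %s", (row["price"], row["id"]))
def process(conn, rows):
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        try:
            with conn.cursor() as cur:
                for row in batch:
                    apply_row(cur, row)
            conn.commit()                   # one WAL flush per batch, not per row
        except psycopg2.Error:
            conn.rollback()                 # the whole batch is rolled back ...
            for row in batch:               # ... then redone row by row
                try:
                    with conn.cursor() as cur:
                        apply_row(cur, row)
                    conn.commit()
                except psycopg2.Error:
                    conn.rollback()         # skip only the offending row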
Having a single transaction with only one COMMIT should be faster than having multiple single-row update transactions, because each COMMIT must synchronize WAL writing to disk. But how much faster it is depends a lot on the environment (number of transactions, table structure, index structure, UPDATE statement, PostgreSQL configuration, system configuration, etc.): only you can benchmark that in your environment.

PostgreSQL limits itself to a single core CPU usage (debugging lock issue?)

Update: after some research, it seems this question was incorrect - the 100% was representing all cores, not a single core, making the whole question moot. My sincere apologies to the community.
On PostgreSQL 10, PostGIS 2.5.2, without any data modifications (SELECT queries only), I have 40 identical GIS queries running in parallel (with different params), each taking ~20-500ms. Server has lots of RAM, NVME SSDs.
The CPU usage consistently shows 100% of a single core, implying that all queries are stuck waiting for something that cannot execute in parallel, but I am not sure how to find it.
Examining pg_stat_activity multiple times shows that all queries are active, and their wait_event could be one of these cases:
wait_event is NULL for all
a few ClientRead and lock_manager, NULL everything else
a lot of lock_manager, and a few ClientRead and NULLs.
Is there a way to figure out what may be causing this?
That is surprising, as reading queries never block on anything short of an ACCESS EXCLUSIVE lock, which is required by operations like DROP TABLE, TRUNCATE, ALTER TABLE and similar statements.
Perhaps the locks are “light-weight locks” on internal PostgreSQL data structures, which are usually only held for a very short time. I don't know what in a PostGIS query could have high contention on such internal locks, but then you didn't show the statement or its execution plan, nor did you show the exact lock events.
If you have several concurrent queries that each take a long time, like 500 ms, they definitely should be running in parallel.
Apart from the possibilities of some internal lock contention, I can think of two explanations:
Most of the queries are short enough that a single core suffices to process all the queries. Each connection spends most of its time waiting for the client.
The system is I/O bound, so that most of the CPUs just twiddle their thumbs. That would be indicated by a CPU iowait% of 10 or more.
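To get harder evidence than eyeballing CPU usage, you can repeatedly sample pg_stat_activity and count wait events. A rough sketch, assuming PostgreSQL 10+ and placeholder connection parameters:
# Sketch: sample wait events a few times to see where the backends spend
# their time (lock_manager, ClientRead, IO, or plain CPU).
import time
import psycopg2
conn = psycopg2.connect("dbname=gis user=postgres")
conn.autocommit = True
with conn.cursor() as cur:
    for _ in range(10):
        cur.execute("""
            SELECT coalesce(wait_event_type, 'CPU/running') AS type,
                   coalesce(wait_event, '-')                AS event,
                   count(*)
            FROM pg_stat_activity
            WHERE state = 'active' AND pid <> pg_backend_pid()
            GROUP BY 1, 2
            ORDER BY 3 DESC
        """)
        print(cur.fetchall())
        time.sleep(1)
conn.close()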

What are the consequences of not ending a database transaction?

I have found a bug in my application code where I have started a transaction, but never commit or do a rollback. The connection is used periodically, just reading some data every 10s or so. In the pg_stat_activity table, its state is reported as "idle in transaction", and its backend_start time is over a week ago.
What is the impact on the database of this? Does it cause additional CPU and RAM usage? Will it impact other connections? How long can it persist in this state?
I'm using postgresql 9.1 and 9.4.
Since you only SELECT, the impact is limited. It is more severe for any write operations, where the changes are not visible to any other transaction until committed - and lost if never committed.
It does cost some RAM and permanently occupies one of your allowed connections (which may or may not matter).
One of the more severe consequences of very long running transactions: it blocks VACUUM from doing its job, since there is still an old transaction that can see old rows. The system will start bloating.
In particular, SELECT acquires an ACCESS SHARE lock (the least blocking of all) on all referenced tables. This does not interfere with other DML commands like INSERT, UPDATE or DELETE, but it will block DDL commands as well as TRUNCATE or VACUUM (including autovacuum jobs). See "Table-level Locks" in the manual.
It can also interfere with various replication solutions and lead to transaction ID wraparound in the long run if it stays open long enough / you burn enough XIDs fast enough. More about that in the manual on "Routine Vacuuming".
Blocking effects can mushroom if other transactions are blocked from committing and those have acquired locks of their own. Etc.
You can keep transactions open (almost) indefinitely - until the connection is closed (which also happens when the server is restarted, obviously.)
But never leave transactions open longer than needed.
There are two major impacts on the system.
The tables that have been used in those transactions:
are not vacuumed, which means they are not "cleaned up" and their statistics aren't updated, which might lead to bad (= slow) execution plans
cannot be changed using ALTER TABLE
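To find such sessions (and see how long their transaction has been open), something like the following sketch works. It assumes PostgreSQL 9.2 or later, where pg_stat_activity has the state column; the connection string is a placeholder:
# Sketch: list sessions sitting "idle in transaction" and the age of their
# open transaction.
import psycopg2
conn = psycopg2.connect("dbname=postgres user=postgres")
with conn.cursor() as cur:
    cur.execute("""
        SELECT pid, usename, now() - xact_start AS xact_age
        FROM pg_stat_activity
        WHERE state = 'idle in transaction'
        ORDER BY xact_start
    """)
    for pid, usename, xact_age in cur.fetchall():
        print(pid, usename, xact_age)
        # If a session has to go, SELECT pg_terminate_backend(pid) ends it
        # and rolls its open transaction back.
conn.close()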

MongoDB update guarantee using w=0

I have a large collection with more than half a million docs, which I need to update continuously. To achieve this, my first approach was to use w=1 to ensure the write result, which causes a lot of delay.
collection.update(
    {'_id': _id},
    {'$set': data},
    w=1
)
So I decided to use w=0 in my update method, and now the performance is significantly better.
Given my past bitter experience with MongoDB, I'm not sure all the updates are guaranteed when w=0. My question is: are updates guaranteed to be applied when using w=0?
Edit: Also, I would like to know how this works. Does it create an internal queue and perform the updates asynchronously, one by one? Using mongostat, I saw that some updates were still being processed even after the Python script quit. Or is the update instant?
Edit 2: According to the answer of Sammaye, link, any error can cause a silent failure. But what happens under a heavy load of updates? Do some updates fail then?
No, a write with w=0 can fail; it is only:
http://docs.mongodb.org/manual/core/write-concern/#unacknowledged
Unacknowledged is similar to errors ignored; however, drivers will attempt to receive and handle network errors when possible.
Which means that the write can fail silently within MongoDB itself.
It is not reliable if you need a specific guarantee. At the end of the day, if you wish to touch the database and get an acknowledgment from it, then you must wait; laws of physics.
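For reference, with a current PyMongo the write concern is usually attached to a collection handle via with_options() rather than passed per call. A sketch with made-up database and collection names:
# Sketch: unacknowledged (w=0) vs acknowledged (w=1) writes in PyMongo.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern
client = MongoClient()
coll = client.mydb.mycoll
# w=0: the driver returns as soon as the message is handed to the socket;
# result.acknowledged is False and server-side errors never surface here.
fire_and_forget = coll.with_options(write_concern=WriteConcern(w=0))
result = fire_and_forget.update_one({"_id": 1}, {"$set": {"x": 1}})
print(result.acknowledged)                      # False
# w=1 (the default): the call blocks until the primary acknowledges the
# write, and server-side failures raise WriteError/OperationFailure.
result = coll.update_one({"_id": 1}, {"$set": {"x": 2}})
print(result.acknowledged, result.modified_count)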
Does w:0 guarantee an update?
As Sammaye has written: no, since there might be a time when the data is only applied to the in-memory data set and is not written to the journal yet. So if there is an outage during this time - which, depending on the configuration, is somewhere between 10 ms (with j:1 and the journal and the data files living on separate block devices) and 100 ms by default - your update may be lost.
Please keep in mind that illegal updates (such as changing the _id of a document) will silently fail.
How does the update work with w:0?
Assuming there are no network errors, the driver will return as soon as it has sent the operation to the mongod/mongos instance with w:0. But let's look a bit further to give you an idea of what happens under the hood.
Next, the update will be processed by the query optimizer and applied to the in-memory data set. After successful application of the operation, a write with write concern w:1 would return at this point. The applied operations are synced to the journal every commitIntervalMs, which is divided by 3 with write concern j:1. With a write concern of {j:1}, the driver returns only after the operations have been stored in the journal successfully. Note that there are still edge cases in which data that made it to the journal won't be applied to replica set members if a very "well" timed outage occurs at this moment.
By default, every syncPeriodSecs, the data from the journal is applied to the actual data files.
Regarding what you saw in mongostat: its granularity isn't very high, so you may well see operations that took place in the past. As discussed, the update to the in-memory data isn't instant, since the update first has to pass through the query optimizer.
Will heavy load make updates silently fail with w:0?
In general, it is safe to say "No." And here is why:
For each connection, there is a certain amount of RAM allocated. If the load is so high that mongo can't allocate any further RAM, there would be a connection error – which is dealt with, regardless of the write concern, except for unacknowledged writes.
Furthermore, applying updates to the in-memory data is extremely fast - most likely still faster than they come in, even during load peaks. If mongod is totally overloaded (e.g. 150k updates a second on a standalone mongod with spinning disks), problems might occur, of course, though even that is usually mitigated, from a durability point of view, by the underlying OS.
However, updates still may silently disappear in case of an outage when the write concern is w:0,j:0 and the outage happens in the time the update is not synced to the journal.
Notes:
The optimal balance between maximum performance and minimal guaranteed durability is a write concern of j:1. With a proper setup, you can reduce the latency to slightly over 10ms.
To further reduce the latency per update, it might be worth having a look at bulk write operations, if those apply to your use case; see the sketch after these notes. In my experience, they do more often than not. Please read and try before dismissing the idea.
Doing write operations with w:0,j:0 is highly discouraged if you expect any guarantee of data durability. Use it at your own risk. This write concern is only meant for "cheap" data that is easy to re-obtain, or where speed matters more than the need for durability. Collecting real-time weather data on a large scale would be an example - the system still works even if one or two data points are missing here and there. For most applications, durability is a concern. Conclusion: use w:1,j:1 at least for durable writes.
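Along the lines of the notes above, here is a sketch of batched, journaled updates with PyMongo's bulk API; the collection name, batch size and update payload are made up:
# Sketch: acknowledged, journaled bulk updates (w=1, j=True) often beat many
# w=0 single updates while keeping durability.
from pymongo import MongoClient, UpdateOne
from pymongo.write_concern import WriteConcern
client = MongoClient()
coll = client.mydb.mycoll.with_options(
    write_concern=WriteConcern(w=1, j=True)     # wait for the journal
)
updates = [{"_id": i, "data": {"x": i}} for i in range(10000)]
BATCH = 1000
for start in range(0, len(updates), BATCH):
    batch = [
        UpdateOne({"_id": u["_id"]}, {"$set": u["data"]})
        for u in updates[start:start + BATCH]
    ]
    result = coll.bulk_write(batch, ordered=False)  # one round trip per batch
    # result.modified_count tells you how many documents actually changed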