Is there a way to measure how many write transactions are happening per second in Postgres? As I understand it, pg_stat_database.xact_commit shows the total number of committed transactions, but I want to exclude read-only queries and see only the number of commits that actually modified data.
Run
SELECT txid_current();
to get the current transaction number.
If you do that at two points in time and subtract the numbers, you know how many transactions (committed or rolled back) have occurred in the meantime.
Read-only transactions do not consume a transaction ID, so this effectively counts write transactions.
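A minimal sketch of that measurement (illustrative numbers; note that txid_current() itself assigns a transaction ID, so each call adds one to the count):
SELECT txid_current();  -- e.g. returns 10000
-- wait a measured interval, say 60 seconds, then:
SELECT txid_current();  -- e.g. returns 13001
-- (13001 - 10000 - 1) / 60 = 50 write transactions per second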
This script can be used to count the number of transaction commits performed between starting the script and killing it: https://gist.github.com/dmos62/aa754a04ff8bf36d6565d74b2dad6513
Usage looks like this:
./count_txs.sh psql postgresql://x:y@z:1234/w
ctrl-c to stop counting
^C
55
This means that 55 transaction commits have been performed between starting the script and killing it.
We have a Postgres 12 system running one master and two async hot-standby replica servers, and we use SERIALIZABLE transactions. All the database servers have very fast SSD storage for Postgres and 64 GB of RAM. Clients connect directly to the master server if they cannot accept delayed data for a transaction. Read-only clients that accept data up to 5 seconds old use the replica servers for querying data. Read-only clients use REPEATABLE READ transactions.
I'm aware that because we use SERIALIZABLE transactions, Postgres might give us false positive serialization failures and force us to repeat transactions. This is fine and expected.
However, the problem I'm seeing is that, seemingly at random, a single-row INSERT or UPDATE query stalls for a very long time. As an example, one error case was as follows (speaking directly to the master to allow modifying table data):
A simple single row insert
insert into restservices (id, parent_id, ...) values ('...', '...', ...);
stalled for 74.62 seconds before finally emitting the error
ERROR 40001 could not serialize access due to concurrent update
with error context
SQL statement "SELECT 1 FROM ONLY "public"."restservices" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
We log all queries exceeding 40 ms, so I know this kind of stall is rare: maybe a couple of queries a day. We average around 200-400 transactions per second during normal load, with 5-40 queries per transaction.
After finally getting the above error, the client code automatically released two savepoints, rolled back the transaction and disconnected from the database (this cleanup took 2 ms total). It then reconnected to the database 2 ms later and replayed the whole transaction from the start, finishing in 66 ms including the time to connect to the database. So I think this is not about the performance of the client or of the master server as a whole. The expected transaction time is 5-90 ms depending on the transaction.
Is there some PostgreSQL connection or master configuration setting that I can use to make PostgreSQL return the error 40001 faster, even if it causes more transactions to be rolled back? Does anybody know if setting
set local statement_timeout='250'
within the transaction has dangerous side-effects? According to the documentation (https://www.postgresql.org/docs/12/runtime-config-client.html), "Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions", but I could set the timeout only for transactions from this client, which is able to automatically retry the transaction very quickly.
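If I understand the docs correctly, SET LOCAL reverts at the end of the transaction and a bare number is interpreted as milliseconds, so the timeout would be scoped like this:
BEGIN;
SET LOCAL statement_timeout = '250';  -- 250 ms, reverts at COMMIT/ROLLBACK
insert into restservices (id, parent_id, ...) values ('...', '...', ...);
COMMIT;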
Is there anything else to try?
It looks like someone had the parent row of the row you were trying to insert locked. PostgreSQL doesn't know what to do about that until the lock is released, so it blocks. If you failed rather than blocking, and upon failure retried the exact same thing, the same parent row would (most likely) still be locked, so it would just fail again, and you would busy-wait. Busy-waiting is not good, so blocking rather than failing is generally a good thing here. It blocks and then unblocks only to fail, but once it does fail, a retry should succeed.
An obvious exception to blocking-better-than-failing is when, upon retry, you can pick a different parent row, if that makes sense in your context. In this case, maybe the best thing to do is to explicitly lock the parent row with NOWAIT before attempting the insert. That way you can perhaps deal with failures in a more nuanced way.
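A minimal sketch of that approach (restservices is from the question; the parent table name parentservices is a guess):
BEGIN;
-- fails immediately with error 55P03 (lock_not_available) instead of
-- blocking if the parent row is currently locked
SELECT 1 FROM parentservices WHERE id = '...' FOR KEY SHARE NOWAIT;
insert into restservices (id, parent_id, ...) values ('...', '...', ...);
COMMIT;
On the 55P03 error, the client can roll back and decide immediately whether to retry with the same parent or pick a different one.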
If you must retry with the same parent_id, then I think the only real solution is to figure out who is holding the parent row lock for so long, and fix that. I don't think that setting statement_timeout would be hazardous, but it also wouldn't solve your problem, as you would probably just keep retrying until the lock on the offending row is released. (Setting it on the other session, the one holding the lock, might be helpful, depending on what that session is doing while the lock is held.)
I'm trying to get a better understanding of the lock acquisition behavior of MongoDB transactions. I have a scenario where two concurrent transactions try to modify the same document. Since one transaction will get the write lock on the document first, the second transaction will run into a write conflict and fail.
I stumbled upon the maxTransactionLockRequestTimeoutMillis setting as documented here: https://docs.mongodb.com/manual/reference/parameters/#param.maxTransactionLockRequestTimeoutMillis and it states:
The maximum amount of time in milliseconds that multi-document transactions should wait to acquire locks required by the operations in the transaction.
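For reference, the parameter can be set at runtime with the documented setParameter command (the value here is just an example):
db.adminCommand( { setParameter: 1, maxTransactionLockRequestTimeoutMillis: 3000 } )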
However, changing this value does not seem to have any impact on the behavior I observe with a write conflict. Transaction 2 does not seem to wait for the lock to be released, but immediately runs into a write conflict when another transaction holds the lock (unlike concurrent writes outside a transaction, which block and wait for the lock).
Do I understand correctly that the time configured in maxTransactionLockRequestTimeoutMillis does not cover actually acquiring the write lock on the document, or is there something wrong with my tests?
A short description of the setup:
I'm trying to implement a "basic" event store / event-sourcing application using an RDBMS (in my case Postgres). The events are general-purpose events with only some basic fields like eventtime, location, action, formatted as XML. Due to this general structure, there is no way of partitioning them in a useful way. The events are captured via a Java application that validates the events and then stores them in an events table. Each event gets a uuid and a recordtime when it is captured.
In addition, there can be subscriptions from external applications, which should get all events matching custom criteria. When a new matching event is captured, the event should be PUSHED to the subscriber. To ensure that the subscriber does not miss any events, I'm currently forcing the capture process to be single-threaded. When a new event comes in, a lock is taken, the event gets a recordtime assigned to the current time, and the event is finally inserted into the DB table (explicitly waiting for the commit). Then the lock is released. For a subscription which runs on a schedule, for example every 5 seconds, I track the recordtime of the last sent event and execute a query for new events like where recordtime > subscription_recordtime. When the matching events are successfully pushed to the subscriber, the subscription_recordtime is set to the events' max recordtime.
Everything is actually working, but as you can imagine, a single-threaded capture process does not scale very well. Thus the main question is: how can I optimise this and allow, for example, multiple capture processes to run in parallel?
I already thought about setting the recordtime in the DB itself on insert, but since the order of commits cannot be guaranteed (JVM pauses), I think I might lose events when two capture transactions run at nearly the same time. If I understand DB-generated timestamps correctly, the timestamp is set before the actual commit. Thus a transaction with a recordtime t2 can already be visible to the subscription query while another transaction with a recordtime t1 (t1 < t2) is still ongoing and has therefore not been committed. The recordtime for the subscription will be set to t2, and so the event from transaction 1 will be lost...
Is there a way to guarantee the order on a DB level, so that events are visible in the order they are captured/committed? Every newly visible event must have a later timestamp than the event before (strictly monotonically increasing). I know about a full table lock, but I think that would bring back the same performance penalties as before.
Is it possible to set the DB to use a single-threaded writer? Then each capture process would also be waiting for other write transactions to finish, but on a DB level, which would be much better than a single-instance/single-threaded capture application. Or can I use a different field/id for tracking the current state? Normal sequence ids suffer from the same problem.
Is there a way to guarantee the order on a DB level, so that events are visible in the order they are captured/committed?
You should not be concerned with global ordering of events. Your events should contain a Version property. When writing events, you should always be inserting monotonically increasing Version numbers for a given Aggregate/Stream ID. That really is the only ordering that should matter when you are inserting. For Customer ABC, with events 1, 2, 3, and 4, you should only write event 5.
A database transaction can ensure the correct order within a stream using the rules above.
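A minimal sketch of what that constraint could look like in Postgres (table and column names are hypothetical):
CREATE TABLE events (
    id         bigserial PRIMARY KEY,       -- global insert order, used for reading
    stream_id  uuid        NOT NULL,
    version    integer     NOT NULL,
    payload    xml         NOT NULL,
    recordtime timestamptz NOT NULL DEFAULT now(),
    UNIQUE (stream_id, version)             -- rejects concurrent duplicate appends
);
-- Two writers appending version 5 to the same stream: the second one gets a
-- unique-constraint violation and must re-read the stream and retry.
INSERT INTO events (stream_id, version, payload)
VALUES ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 5, '<event/>');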
For a subscription which runs on a schedule, for example every 5 seconds, I track the recordtime of the last sent event and execute a query for new events like where recordtime > subscription_recordtime.
Reading events is a slightly different story. Firstly, you will likely have a serial column to uniquely identify events. That will give you ordering and allow you to determine whether you have read all events. When you read events from the store, you may detect a gap in the sequence. This will happen if an insert was in flight when you read the latest events. In this case, simply re-read the data and see if the gap is gone. This requires your subscription to maintain its position in the index. Alternatively or additionally, you can read only events that are at least N milliseconds old, where N is a threshold high enough to compensate for delays in transactions (e.g. 500 or 1000).
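A sketch of such a polling query (names follow the table sketched above; the 500 ms threshold is arbitrary):
-- Fetch events past the subscriber's last seen position, skipping anything
-- newer than the safety threshold so that in-flight inserts with lower ids
-- have time to commit and become visible.
SELECT id, payload
FROM events
WHERE id > 1041          -- the subscriber's last seen event id (example value)
  AND recordtime < now() - interval '500 milliseconds'
ORDER BY id;
-- If the returned ids contain a gap (e.g. 1042, 1043, 1045), re-read before
-- advancing the subscriber's position past it.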
Also, bear in mind that there are open source RDBMS event stores that you can either use or leverage in your process.
Marten: http://jasperfx.github.io/marten/documentation/events/
SqlStreamStore: https://github.com/SQLStreamStore/SQLStreamStore
A program which I developed uses PostgreSQL. The program runs a PL/pgSQL function that takes a very long time (hours or days). I want to be sure that the function is still running during that long time.
How can I know that? I don't want to use RAISE NOTICE in a loop in the function, because that would extend the running time.
You can see if it's running by examining pg_stat_activity for the process. However, this won't tell you if the function is progressing.
You can check whether that backend is blocked on any locks by joining pg_stat_activity against pg_locks and looking for ungranted (granted = false) lock requests for that backend. Again, this won't tell you if it's progressing, just that if it isn't, it's not stuck waiting on a lock.
If you want to monitor a function's progress you will need to emit log messages or use one of the other hacks for monitoring progress. You can (ab)use NOTIFY with a payload and LISTEN for progress messages. Alternatively, you could create a sequence and call nextval on it each time you process an item in your procedure; you can then run SELECT * FROM the_sequence_name; in another session to see the approximate progress.
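A sketch of the sequence hack (the sequence name is hypothetical); it works because nextval is not rolled back with the transaction, so the counter is visible from other sessions while the function is still running:
CREATE SEQUENCE progress_seq;
-- inside the long-running PL/pgSQL function, once per processed item:
PERFORM nextval('progress_seq');
-- from another session, to see how far it has got:
SELECT last_value FROM progress_seq;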
In general I'd recommend setting client_min_messages to notice or above and then using RAISE LOG, so you record messages that appear only in the logs without being sent to the client. To reduce overhead, keep a counter and log only every 100 or 1000 or however many iterations of your loop, so you log occasionally. There's a cost to updating the counter, for sure, but it's pretty low compared to the cost of a big, slow PL/pgSQL procedure like this.
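The loop body might look something like this (the every-1000-iterations interval is arbitrary):
DO $$
DECLARE
    counter bigint := 0;
BEGIN
    LOOP
        counter := counter + 1;      -- one unit of real work would go here
        IF counter % 1000 = 0 THEN
            -- goes to the server log; with client_min_messages at notice
            -- or above it is not sent to the client
            RAISE LOG 'processed % items', counter;
        END IF;
        EXIT WHEN counter >= 10000;  -- stand-in for the real loop condition
    END LOOP;
END
$$;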
At a client's request, I was asked to switch a web application to the read-uncommitted isolation level (it's probably a bad idea...).
While testing whether the isolation was in place, I inserted a row without committing (DbVisualizer: set autocommit off, then stop the VPN connection to the database) and started testing my application against that uncommitted insert.
select * from MYTABLE WHERE MYID = 'NON_COMMIT_INSERT_ID' WITH UR works fine. Now I would like to delete this row, and I have not found any way...
UPDATE: The row did disappear after some time (about 30 minutes). I guess there is some kind of timeout before a rollback is automatically issued. Is there any way to remove an uncommitted row before this happens?
I think this will not be possible using normal SQL statements - the only way to delete the row is to roll back the transaction that inserted it (or wait for the transaction to commit, then delete it). As you disconnected from the DB at the network level, the 30 minutes you mention is probably a TCP timeout enforced at the operating-system level. After the TCP connection was terminated, DB2 rolled back the client's transaction automatically.
Still, I think you could administratively force the application to disconnect from the database (using FORCE APPLICATION with a handle obtained from LIST APPLICATIONS), which should roll back the transaction. See http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/core/r0001951.htm for details on these commands.
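From the DB2 command line processor that would look something like this (1234 is a hypothetical application handle taken from the LIST APPLICATIONS output):
db2 list applications
db2 "force application (1234)"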
It's one thing to read uncommitted rows from a database; there are sometimes good reasons (avoiding read locks) for doing this.
It's another to leave inserted, updated, or deleted rows in a database without a commit or rollback. You should never do that: either commit or roll back after a database change.