POSTGRES: pg_cancel_backend does not always work (reason behind it)


I'm currently using Postgres as my database engine, hooked up to a web application.
I have noticed that on some occasions locks accumulate in the database, mainly AccessShareLock entries (seen when running the query: select * from pg_locks).
To cancel a process that is holding a lock you can use pg_cancel_backend(pid), but I've realised that this doesn't always work, and I'm curious to know why. Is it because this function sends a SIGINT to the process to shut it down gracefully, meaning that it won't stop immediately?
There is pg_terminate_backend, but I prefer not to use it.
Any advice on why pg_cancel_backend works only intermittently (or at least some explanation) would be appreciated.
Thanks.

pg_cancel_backend and pg_terminate_backend send signals (SIGINT and SIGTERM, respectively) to the backend process.
The backend checks every so often for pending interrupts, but execution can be in a place where it takes a while until the next check happens.
Canceling a query won't get rid of the locks until the transaction is closed.
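Because the cancel request is only a signal that the backend acts on at its next interrupt check, a caller can poll for the result instead of assuming it took effect immediately. The following is a minimal sketch of that idea, not a definitive implementation. The helper name cancel_and_wait and the run_query callable (assumed to execute SQL with psycopg-style %s placeholders and return a list of rows) are hypothetical; pg_cancel_backend and pg_stat_activity are the only PostgreSQL names used.

```python
import time

CANCEL_SQL = "SELECT pg_cancel_backend(%s)"
STILL_ACTIVE_SQL = (
    "SELECT 1 FROM pg_stat_activity WHERE pid = %s AND state = 'active'"
)


def cancel_and_wait(run_query, pid, timeout=5.0, poll_interval=0.5,
                    sleep=time.sleep):
    """Send a cancel request and poll until the backend stops running a query.

    run_query(sql, params) is a hypothetical callable that returns a list
    of row tuples. Returns True if the backend was no longer 'active'
    within `timeout` seconds, False otherwise.
    """
    rows = run_query(CANCEL_SQL, (pid,))
    # pg_cancel_backend returns true only if the signal could be sent.
    if not rows or not rows[0][0]:
        return False

    waited = 0.0
    while True:
        # An empty result means the backend is no longer executing a query.
        if not run_query(STILL_ACTIVE_SQL, (pid,)):
            return True
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval
```

A False result only says the query was still running when the timeout expired; the backend may still honour the request later, once it reaches an interrupt check.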

Related

Identify which service listens for notifications but doesn't consume them

I have a huge database where, in some places, we use Postgres notifications. We noticed that the queue size is increasing. The way we check is executing this simple command: select pg_notification_queue_usage();.
When it reaches 100%, all messages are gone. The problem I have is that I don't know who listens to notifications and what channels we have there. I identified only two services that listen for those notifications, but it seems they are not the only ones.
My task is to find other places where we use notifications (consume or produce) to find the root cause. How can I do it?
The only thing I found about it is the query select pg_notification_queue_usage(); but it seems that Postgres doesn't provide other useful functions related to this feature.
I did some experiments with it. I launched a local Postgres instance and started publishing notifications there. Everything worked as expected. When I did it again but without actually consuming the notifications, the queue size started to grow. That's what I expected, though.
Then, I restarted the process and the queue size dropped to 0. That's exactly what the docs say about it.
A session's listen registrations are automatically cleared when the session ends.
In production we did exactly the same: we restarted the known services, but the notification queue didn't drop to 0 as we expected.
That means something else is listening on one of the channels but doesn't consume the notifications, or consumes them too slowly.
Is there any way of identifying such listeners?

Debug Postgres 'too many notifications in the NOTIFY queue'

I am using a Postgres table which gets 2000-3000 updates per second.
To update this table I am using queries generated with the update helper of the pg-promise library.
Each update triggers a notification with the pg_notify() function. Some Node.js scripts handle these notifications. For some reason, 'too many notifications in the NOTIFY queue' messages keep appearing in the Postgres logs, along with an indication of the notify queue size, which keeps increasing up to 100%.
I read some posts like: https://postgrespro.com/list/thread-id/1557124
or https://github.com/hasura/graphql-engine/issues/6263
but I cannot find a way to debug this issue.
Which would be a good way to approach this situation?

Your listener doesn't seem to be consuming the notices fast enough, or possibly not at all. So the first step would be something like logging the processing of each notice from your app code, to figure out what is actually going on.

This might be because there is a long-running transaction that is blocking the release of older messages from the buffer. The process is explained in the manuals and is somewhat analogous to vacuuming: old transactions need to finish in order to clean up old data.
A gotcha here is that any long-running query can hold up the cleanup; for me it was the process that was running the LISTEN, which was designed to just keep running forever. The PG server log has a backend PID that might be the culprit, so you can look it up in pg_stat_activity and proceed from there.
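To make the pg_stat_activity lookup concrete, here is a small sketch that picks out sessions whose transaction has been open for a long time. It is an illustration under assumptions: the function name long_running_transactions and the five-minute threshold are made up for the example, and the rows are expected in the column order of the query shown (pid, xact_start, state, query), all of which are real pg_stat_activity columns.

```python
from datetime import datetime, timedelta, timezone

# Query whose result rows the helper below expects.
OPEN_TRANSACTIONS_SQL = """
SELECT pid, xact_start, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
"""


def long_running_transactions(rows, now, threshold=timedelta(minutes=5)):
    """Return rows whose transaction has been open for at least `threshold`.

    Each row is a (pid, xact_start, state, query) tuple. Sessions without
    an open transaction (xact_start is None) are ignored. The result is
    ordered oldest transaction first.
    """
    old = [
        row for row in rows
        if row[1] is not None and now - row[1] >= threshold
    ]
    return sorted(old, key=lambda row: row[1])
```

The oldest entries are the first candidates to compare against the PID mentioned in the server log.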

How to persist and replay NestJS CQRS event and saga across restart?

I am making an application which will need to use NestJS' CQRS module, as the requirements naturally lend themselves to that pattern.
Updates to the application logic are expected to be frequent and to happen during busy hours (that's just how my management works...), so the application needs to be able to restart gracefully. However, this means that events started just before the shutdown may not finish, or even if they do, some sagas may not trigger due to some events having happened before the restart... I'd like to ensure that doesn't happen.
I'm aware of NestJS' OnApplicationShutdown and OnApplicationBootstrap hooks, which exist exactly for this purpose, but what I'm not sure about is what I should do there. How can I capture all events that have unfinished handlers and sagas? Then after a restart, how can I make the event bus aware of the events monitored by sagas, without executing the already executed handlers?
I guess the second part could be worked around with a random ID per event/handler combo, that will be looked up in a log, and if present, the handler will be skipped, and if not, it will be executed and added to the log... But even with such a workaround, I don't see how I could do the first part. There will be a lot of events, and sagas (by definition) execute commands, meaning they have side effects... Even if all commands can become idempotent, the sheer quantity of events and frequent restarts means restarting from the very first command is a no go.
I've seen this package but I'm not sure if it solves this particular use case, or if it's really just logging the events, and pretty much nothing more.

PostgreSQL backend behavior upon receiving "Terminate" ('X') after "COMMIT"

We run a Postgres server v9.2.8, and use epgsql (Erlang) as a client library. In some cases, which we had in production but weren't able to reproduce in a dev environment, we're losing data.
A function in our application (it should be killed) allows an operator to change session parameters on a running connection. Since the connection is almost always busy in production, a "SET SESSION bla-bla" query always crashes the pgsql_connection process.
Before crashing, pgsql_connection sends a "Terminate" ('X') signal via pgsql_sock (a wrapper around tcp socket) to a backend. At the same time another erlang process (let's call it "worker") is waiting for a response from postgres backend using the same socket.
Now the question: is it possible that upon receiving a "Terminate" signal from a client, backend can cancel last transaction even if it has sent an "OK" on "COMMIT" statement already?
Because if it is possible, a worker will have a chance to report to the main application process about successfully written transaction while indeed the transaction has been cancelled.
Or, where can I read more details about this? Documentation says (http://www.postgresql.org/docs/9.2/static/protocol-flow.html):
For either normal or abnormal termination, any open transaction is
rolled back, not committed. One should note however that if a frontend
disconnects while a non-SELECT query is being processed, the backend
will probably finish the query before noticing the disconnection. If
the query is outside any transaction block (BEGIN ... COMMIT sequence)
then its results might be committed before the disconnection is
recognized.
– not a crystal clear statement.

Now the question: is it possible that upon receiving a "Terminate" signal from a client, backend can cancel last transaction even if it has sent an "OK" on "COMMIT" statement already?
No, that is fundamentally impossible. If it's committed, it's committed, and there's no going back. That's what "commit" means.
The only time Pg might return success before the commit hits disk and is persistent is if you told it to by setting synchronous_commit = off.
If you're seeing anything different happening then most likely it's a result of attempting to share a single connection between multiple processes (as you establish the connection before fork()) without proper locking or other mutual exclusion to ensure that the connection is locked while a command is in-flight.
Note that the reverse isn't true, which might be what you're thinking of with the quoted documentation passage. A transaction can get committed without returning a successful OK to the client if the client goes away (crashes, loses connection, etc) after issuing the commit command.
What the application is doing, where it sends out-of-sync messages on the wire protocol, is totally broken. It's guaranteed to cause unpredictable problems. The protocol is somewhat robust, so you're not likely to get things like an unintended commit, but you're very likely to get transactions aborted or whole sessions disconnected suddenly.
If you need to be able to roll back/abort committed transactions, then your application design has problems. You're not really ready to commit when you say COMMIT. You would have the same problem if the app process crashed or the whole server crashed between Pg committing the transaction and you doing whatever you need to do.
If you cannot fix the app design to avoid this then you will have to use two-phase transactions, either directly using PREPARE TRANSACTION then COMMIT PREPARED, or indirectly via the XA API. This has significant costs in performance and management overhead, but it's the only option if you need to do special work after database commit but before you're really "done".
The docs you quote are talking about the case where the app has sent a COMMIT but then disconnects before receiving the backend's acknowledgement of the commit. Because TCP/IP is buffered there's no guarantee the COMMIT got flushed to Pg, and if it did there's no guarantee it doesn't accompany the RST that terminates the connection. So in this specific case it's somewhat uncertain whether the transaction will commit or not. An application for which this is a problem would need to have a way of checking whether the last unit of work committed or not when it resumes work, or if it can't do that use two-phase transactions. The docs you quote say nothing about being able to cancel a commit after it's completed, because you can't. Ever.
Assuming that the app has to do some kind of extra work after commit, like moving a file or sending an email or doing work on another data store, then you're probably going to need two-phase transactions. Even then you're vulnerable to issues unless all parties in the distributed transaction support two phase commit, because your "other bit" could get done then your worker or server could crash before the confirmation of its completion is sent to the database to finish phase II of the commit.
You can keep your own two phase commit log of sorts in the DB instead of using true 2PC:
1. Do the main database work and write a record to the work log table that says "I've done the work in the database and I'm about to do the next part".
2. Do the next part; and
3. Update the work log to say the next part is done.
... but this has the same problem, where a crash between parts 2 and 3 causes the app to forget that it did part 2 and repeat it on startup. If you can't live with that, you need to find a way to make part 2 commit completion verifiable, so you can tell if it's done or not, or find a way to make it capable of doing 2-phase commit.
To learn more about this topic, read about XA, distributed transactions, two-phase commit, etc.

Sqlite database is locked and database is busy Issues

Database is locked: does this only happen because statements were not finalized or the database was not closed?
I am also accessing the database in the background, so some of my other methods may access the database at the same time.
Can anyone please tell me when the "database is locked" and "database is busy" errors occur?
My prepared statements execute without any database error, but I am still unable to get data.
Any help?

The "database is locked" error occurs when the same database is in use somewhere else, perhaps by another application that has acquired a lock on it and has not yet released it.
I don't know about the "database is busy" error. This link might answer your question: SQLite Exception: SQLite Busy
Hope it helps.

SQLITE_BUSY can come up in these cases:
1. When one connection has locked the database using BEGIN and another connection is trying to write to the same database.
2. When one connection is writing to the database and another is reading from it at the same time. SQLite locks the whole database file rather than individual rows.
In both cases, install a busy handler on the database connection. The busy handler should retry the statement after a few milliseconds.
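A retry of this kind can be sketched with Python's built-in sqlite3 module. This is an illustration rather than the C-level busy handler itself: the helper name run_with_retry, the attempt count and the delay are assumptions made for the example. In Python, the timeout argument of sqlite3.connect provides a built-in wait of a similar kind; the explicit loop below just makes the retry visible.

```python
import sqlite3
import time


def run_with_retry(conn, sql, params=(), attempts=5, delay=0.05,
                   sleep=time.sleep):
    """Execute a write statement, retrying while the database is locked.

    Returns the number of attempts that were needed. Errors other than
    "database is locked" / "busy" are re-raised immediately, and the
    last locked error is re-raised once `attempts` is exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            # `with conn` commits on success and rolls back on error.
            with conn:
                conn.execute(sql, params)
            return attempt
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            if "locked" not in message and "busy" not in message:
                raise
            if attempt == attempts:
                raise
            # Wait a little before trying the statement again.
            sleep(delay)
```

For example, run_with_retry(conn, "INSERT INTO t VALUES (?)", (1,)) waits up to roughly attempts × delay seconds in total before giving up.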
