Identify which service listens for notifications but doesn't consume them - postgresql

I have a huge database where, in some places, we use Postgres notifications. We noticed that the queue size is increasing. The way we check is by executing this simple command: select pg_notification_queue_usage();.
When it reaches 100%, all messages are gone. The problem I have is that I don't know who listens to the notifications and what channels exist. I have identified only two services that listen for those notifications, but it seems they are not the only ones.
My task is to find the other places where we use notifications (consuming or producing) in order to find the root cause. How can I do that?
The only thing I found about it is the query select pg_notification_queue_usage();, but it seems that Postgres doesn't provide other useful functions related to this feature.
I did some experiments. I launched a local Postgres instance and started publishing notifications there. Everything worked as expected. When I did it again, but without actually consuming the notifications, the queue size started to grow. That's what I expected, though.
Then, I restarted the process and the queue size dropped to 0. That's exactly what the docs say about it.
A session's listen registrations are automatically cleared when the session ends.
In production, we did exactly the same thing - we restarted the known services, but the notification queue didn't drop to 0 as we expected.
That means there is something else listening on one of the channels that either doesn't consume the notifications or consumes them too slowly.
Is there any way of identifying such listeners?
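For reference, a minimal sketch of what the server side can show. pg_listening_channels() only reports the current session's own registrations, so the closest thing to a cross-session view is pg_stat_activity, where a backend whose most recent statement was a LISTEN will still show that statement in its query column - a heuristic that misses services that have run other queries since their LISTEN:

-- Heuristic: backends whose most recent statement looks like a LISTEN.
SELECT pid, datname, usename, application_name, client_addr,
       backend_start, state, query
FROM   pg_stat_activity
WHERE  query ILIKE 'listen%';

-- Only reports the channels of the *current* session:
SELECT pg_listening_channels();

-- Fraction (0.0 - 1.0) of the shared notification queue in use:
SELECT pg_notification_queue_usage();

Failing that, the application_name and client_addr columns in the same view at least narrow down which hosts and applications hold connections at all.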

Related

Debug Postgres 'too many notifications in the NOTIFY queue'

I am using a Postgres table which gets 2000-3000 updates per second.
To update this table, I am using queries generated with the update helper of the pg-promise library.
Each update triggers a notification via the pg_notify() function, and some Node.js scripts handle these notifications. For some reason, 'too many notifications in the NOTIFY queue' messages keep appearing in the Postgres logs, along with an indication of the notify queue size, which keeps increasing up to 100%.
I read some posts like: https://postgrespro.com/list/thread-id/1557124
or https://github.com/hasura/graphql-engine/issues/6263
but I cannot find a way to debug this issue.
What would be a good way to approach this situation?
Your listener doesn't seem to be consuming the notifications fast enough, or possibly not at all. So the first step would be something like logging the processing of each notification from your app code, to figure out what is actually going on.
This might be because there is a long-running transaction that is blocking the release of older messages from the buffer. The process is explained in the manuals and is somewhat analogous to vacuuming - old transactions need to finish in order to clean up old data.
A gotcha here is that any long-running query can hold up the cleanup; for me it was the process that was running the LISTEN - it was designed to just keep running forever. The PG server log includes a backend PID that might be the culprit, so you can look it up in pg_stat_activity and proceed from there.
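A minimal sketch of that lookup (12345 is a placeholder for whatever PID the log message reports; the second query just lists the sessions with the oldest open transactions):

-- Replace 12345 with the backend PID reported in the server log.
SELECT pid, usename, application_name, client_addr, state,
       xact_start, state_change, query
FROM   pg_stat_activity
WHERE  pid = 12345;

-- Sessions with the oldest open transactions; these are what prevent older
-- notifications from being cleaned out of the queue.
SELECT pid, usename, application_name, state,
       now() - xact_start AS xact_age, query
FROM   pg_stat_activity
WHERE  xact_start IS NOT NULL
ORDER  BY xact_start
LIMIT  10;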

Is there a way to rely on Postgres Notify/Listen mechanism?

I have implemented a Notify/Listen mechanism, so when a special request is sent to the web server, I can use NOTIFY to tell the workers (in Python) that there's a pending request waiting to be processed.
The implementation works fine, but the problem is that if the worker server is restarting, the notification gets lost, since at that particular moment there's no listener.
I could implement a service like RabbitMQ or similar, but my needs are so simple that implementing such a monster is too much.
Is there any way, a configuration variable perhaps, that can give some persistence to the notification mechanism?
Thanks in advance
I don't think there is a way to persist notification channels, but you can simply store the pending requests in a table and have the worker check for any missed work on startup.
Either a timestamp or a pending/completed flag would work, depending on what kind of work it's doing.
For consistency, you can have the NOTIFY fire from an INSERT trigger on the queue table, and have the worker always check for any remaining work (not just a specific request) when notified.
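A minimal sketch of that shape (the work_queue table, channel, and column names here are placeholders, not from the original setup):

-- Queue table: producers only ever INSERT here.
CREATE TABLE work_queue (
    id         bigserial PRIMARY KEY,
    payload    jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    done_at    timestamptz            -- NULL = still pending
);

-- Fire the NOTIFY from an INSERT trigger on the queue table.
CREATE OR REPLACE FUNCTION notify_work_queue() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('work_queue', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER work_queue_notify
AFTER INSERT ON work_queue
FOR EACH ROW EXECUTE FUNCTION notify_work_queue();
-- (use EXECUTE PROCEDURE instead on Postgres versions before 11)

-- On startup, and on every notification, the worker picks up whatever is
-- still pending rather than trusting any one notification payload:
SELECT id, payload FROM work_queue WHERE done_at IS NULL ORDER BY id;

The notification then acts only as a wake-up signal; the table stays the source of truth, so a worker that was down during the NOTIFY simply finds the row on its next startup check.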

Silly WebSphere MQ questions

I have two very basic questions on WebSphere MQ - given that I have been sort of administering it for the past few months, I tend to think that these are silly questions.
Is there a way to "deactivate" a
queue ? (for example through a
runmqsc command or through the
explorer interface) - I think not. I
think what I can do is just delete
it.
What will happen if I create a
remote queue definition if the real
remote queue is not in place? Will
it cause any issues on the queue
manager? - I think not. I think all
I will have are error messages in
the logs.
Please let me know your thoughts.
Thanks!
1. Is there a way to "deactivate" a queue?
Yes. You can change the queue attributes like so:
ALTER Q(QUEUE_NAME) PUT(DISABLED) GET(DISABLED)
Any connected applications will receive a return code on the next API call telling them that the queue is no longer available for PUT/GET. If these are well-behaved programs they will then report the error and either end or go into a retry loop.
2. What will happen if I create a remote queue definition if the real remote queue is not in place?
The QRemote definition will resolve to a transmit queue. If the message can successfully be placed there your application will receive a return code of zero. (Any unsuccessful PUT will be due to hitting MAXDEPTH or other local problem not connected to the fact that the remote definition does not exist.)
The problem will be visible when the channel tries to deliver the message. If the remote QMgr has a Dead Letter Queue, the message will go there. If not, it will be backed out onto the local XMitQ and the channel will stop.

message queue for iOS / iPad - something like MSMQ?

I have an iPad app that works both online and offline, but when I am offline there are web service calls that will need to be made once connectivity is available again.
Example:
A new client is added in the app. This needs to be sent to the web service, but since we are offline we don't want to slow the user down, so we let them add the client locally and keep going; we just need to remember that the call still has to be made to the web service when we can. The same goes for placing orders and such.
Is there some sort of queue that can be set up that will fire once we have connectivity?
I don't think the overhead of a heavyweight tool like MSMQ is needed for a simple action. You can use Core Data, persist managed objects with the data needed to call the web service, and only delete each managed object after a successful post. There might or might not be a way to capture an event when connectivity starts, but you can certainly create a repeating NSTimer when the first message is queued and stop it when there are no messages in the queue.
This library handles offline persistent message queueing for situations like the one you describe. It has been marked alpha since a year ago, but I have confirmed it is used in production apps:
https://github.com/gcamp/IPOfflineQueue

Reconnect logic with connectivity notifications

Say I have an application that wants a persistent connection to a server. How do I implement connection/re-connection logic so that I'm not wasting resources (power/bandwidth) and I have fast reconnect time when connectivity appears/improves? If I only use connectivity notifications, I can get stuck on problems not related to the local network.
Bonus if you could show me the C# version.
This is a very "huge" question. I can say that we use an O/R Mapper and each "query" to the database needs an object called PersistenceBroker. This class is in charge of all the DB Stuff related to connecting, authenticating etc.
We've written a PersistenceBrokerFactory.GetCurrentBroker() which returns the "working" broker. If the DB suddenly fails (for whatever reason), the CONN object will "timeout()" after 30secs (or whatever you define). If that happens, we show the user that he/she is offline and display a reconnect button.
On the other hand, to provide a visual indication that the user has connectivity, we have a thread running in the background, that checks for Internet connectivity every 15 seconds. We do 1 ping to google.com. ;) If that fails, we assume Internet is somehow broken, and we update a status bar.
I could show you all that code for the network health monitor if you wanted. I took some bits from google and other I made myself :)