How to resolve slowdowns induced by HADR_FLAGS = STANDBY_RECV_BLOCKED? - db2

We had a severe slowdown of our applications in our HADR environment. We are seeing the following when we run db2pd -hadr:
HADR_FLAGS = STANDBY_RECV_BLOCKED
STANDBY_RECV_BUF_PERCENT = 100
STANDBY_SPOOL_PERCENT = 100
These recovered later and seem better now with STANDBY_SPOOL_PERCENT coming down gradually. Can you please help understand the implications of the values of the above parameters and what needs to be done to ensure we don't get into such a situation?

This issue is most likely triggered by a peak in transaction volume on the primary, which saturated the standby's receive buffer and spool. Unless you are running with the configuration parameter HADR_SYNCMODE set to SUPERASYNC, you can fall into this situation. The application slowdown was induced by the primary waiting for an acknowledgement from the standby that it had received the log data; since the standby's spool and receive buffer were full at the time, it was delaying this acknowledgement.
You could consider setting HADR_SYNCMODE to SUPERASYNC, but this would also make the system more vulnerable to data loss should there be a failure on the primary. To absorb these temporary peaks, you can make either of the following configuration changes:
Increase the size of the log receive buffer on the standby database by modifying the value of the DB2_HADR_BUF_SIZE registry variable.
Enable log spooling on the standby database by setting the HADR_SPOOL_LIMIT database configuration parameter.
For further details, you can refer to the HADR Performance Guide.
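As an illustrative sketch (not part of the original answer), the two changes might look like this on the standby; the database name SAMPLE and the sizes are placeholders that need tuning to your peak log volume:

```shell
# Registry variable: standby log receive buffer size, in 4 KB log pages
# (16384 pages = 64 MB). Takes effect after an instance restart.
db2set DB2_HADR_BUF_SIZE=16384

# Database configuration: allow log spooling up to 25000 4 KB pages;
# AUTOMATIC instead lets DB2 derive a limit from the logging configuration.
db2 update db cfg for SAMPLE using HADR_SPOOL_LIMIT 25000
```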

Related

RDS Postgres "canceling statement due to conflict with recovery" [duplicate]

I'm getting the following error when running a query on a PostgreSQL database in standby mode. The query works fine when it covers one month of data, but querying a longer range produces the error.
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
Any suggestions on how to resolve? Thanks
No need to touch hot_standby_feedback. As others have mentioned, setting it to on can bloat the master. Imagine opening a transaction on a slave and not closing it.
Instead, set max_standby_archive_delay and max_standby_streaming_delay to some sane value:
# /etc/postgresql/10/main/postgresql.conf on a slave
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s
This way queries on slaves with a duration less than 900 seconds won't be cancelled. If your workload requires longer queries, just set these options to a higher value.
Running queries on a hot-standby server is somewhat tricky: a query can fail because rows it needs might be updated or deleted on the primary while it runs. Since the primary does not know that a query has started on the secondary, it thinks it can clean up (vacuum) old versions of its rows. The secondary then has to replay this cleanup, and has to forcibly cancel all queries that might use those rows.
Longer queries will be canceled more often.
You can work around this by starting a repeatable read transaction on the primary that runs a dummy query and then sits idle while the real query runs on the secondary. Its presence prevents vacuuming of old row versions on the primary.
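The workaround above can be sketched as follows; the table name foo is a placeholder, and the session on the primary must stay open for as long as the standby query runs:

```sql
-- On the primary: open and hold a snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT 1;  -- dummy query that materializes the snapshot
-- leave this session idle; VACUUM will now retain the row
-- versions this snapshot can see

-- On the standby: run the real long-running query
SELECT * FROM foo;

-- On the primary, after the standby query finishes:
COMMIT;
```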
More on this subject and other workarounds are explained in the Hot Standby — Handling Query Conflicts section of the documentation.
There's no need to start idle transactions on the master. In postgresql-9.1 the
most direct way to solve this problem is by setting
hot_standby_feedback = on
This will make the master aware of long-running queries. From the docs:
The first option is to set the parameter hot_standby_feedback, which prevents
VACUUM from removing recently-dead rows and so cleanup conflicts do not occur.
Why isn't this the default? This parameter was added after the initial
implementation and it's the only way that a standby can affect a master.
As stated here about hot_standby_feedback = on:
Well, the disadvantage of it is that the standby can bloat the master,
which might be surprising to some people, too
And here:
With what setting of max_standby_streaming_delay? I would rather
default that to -1 than default hot_standby_feedback on. That way what
you do on the standby only affects the standby
So I added
max_standby_streaming_delay = -1
And no more pg_dump error for us, nor master bloat :)
For AWS RDS instance, check http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html
The table data on the hot-standby slave server is modified while a long-running query is running. A solution (PostgreSQL 9.1+) to make sure the table data is not modified is to suspend replication and resume it after the query:
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); -- resume
Note that in PostgreSQL 10 and later these functions are named pg_wal_replay_pause() and pg_wal_replay_resume().
I'm going to add some updated info and references to @max-malysh's excellent answer above.
In short, if you do something on the master, it needs to be replicated on the slave. Postgres uses WAL records for this, which are sent after every logged action on the master to the slave. The slave then executes the action and the two are again in sync. In one of several scenarios, you can be in conflict on the slave with what's coming in from the master in a WAL action. In most of them, there's a transaction happening on the slave which conflicts with what the WAL action wants to change. In that case, you have two options:
Delay the application of the WAL action for a bit, allowing the slave to finish its conflicting transaction, then apply the action.
Cancel the conflicting query on the slave.
We're concerned with #1, and two values:
max_standby_archive_delay - this is the delay used after a long disconnection between the master and slave, when the data is being read from a WAL archive, which is not current data.
max_standby_streaming_delay - delay used for cancelling queries when WAL entries are received via streaming replication.
Generally, if your server is meant for high availability replication, you want to keep these numbers short. The default setting of 30000 (milliseconds if no units given) is sufficient for this. If, however, you want to set up something like an archive, reporting- or read-replica that might have very long-running queries, then you'll want to set this to something higher to avoid cancelled queries. The recommended 900s setting above seems like a good starting point. I disagree with the official docs on setting an infinite value -1 as being a good idea--that could mask some buggy code and cause lots of issues.
The one caveat about long-running queries and setting these values higher is that other queries running on the slave in parallel with the long-running one which is causing the WAL action to be delayed will see old data until the long query has completed. Developers will need to understand this and serialize queries which shouldn't run simultaneously.
For the full explanation of how max_standby_archive_delay and max_standby_streaming_delay work and why, see the Standby Servers chapter of the PostgreSQL documentation.
It might be too late for an answer, but we faced the same kind of issue in production.
Earlier we had only one RDS instance, and as the number of users on the app side increased, we decided to add a read replica. The read replica worked properly on staging, but once we moved to production we started getting the same error.
We solved this by enabling the hot_standby_feedback property in the Postgres properties.
We referred to the following link:
https://aws.amazon.com/blogs/database/best-practices-for-amazon-rds-postgresql-replication/
I hope it will help.
Likewise, here's a second caveat to @Artif3x's elaboration of @max-malysh's excellent answer, both above.
With any delayed application of transactions from the master the follower(s) will have an older, stale view of the data. Therefore while providing time for the query on the follower to finish by setting max_standby_archive_delay and max_standby_streaming_delay makes sense, keep both of these caveats in mind:
the value of the follower as a standby / backup diminishes
any other queries running on the follower may return stale data.
If the value of the follower for backup ends up being too much in conflict with hosting queries, one solution would be multiple followers, each optimized for one or the other.
Also, note that several queries in a row can cause the application of WAL entries to keep being delayed. So when choosing the new values, it's not just the time for a single query, but a moving window that starts whenever a conflicting query starts and ends when the WAL entry is finally applied.

How long does pg_statio_all_tables data live for?

I have read https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STATIO-ALL-TABLES-VIEW
But it does not clarify whether this data covers the whole history of the DB (unless you reset the stats counters), or whether it covers only recent activity.
I am seeing a low cache-hit ratio on one of my tables, but I recently added an index to it. I'm not sure if it is low from all the pre-index usage, or if it is still low, even with the index.
Quote from the manual
When the server shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory, so that statistics can be retained across server restarts. When recovery is performed at server start (e.g., after immediate shutdown, server crash, and point-in-time recovery), all statistics counters are reset.
I read this as "the data is preserved as long as the server is restarted cleanly".
So the data is only reset if recovery was performed, or if it has been reset manually using pg_stat_reset().
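Since the counters are cumulative since the last reset, one way to answer the underlying question is to reset them after creating the index and re-measure. A sketch, assuming sufficient privileges on the database in question:

```sql
-- Reset the cumulative statistics counters for the current database
SELECT pg_stat_reset();

-- ...let the workload run for a while, then inspect the cache-hit ratio...
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 4) AS hit_ratio
FROM pg_statio_all_tables
ORDER BY heap_blks_read DESC;
```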

log shipping process (archive_timeout)

I'm very new to PostgreSQL. I want to ask about the log shipping replication process. I know that the timeout parameter is optional in log shipping; it specifies that we don't want PostgreSQL to wait until a WAL file contains 16 MB before it is sent, as it does by default. My question is: is it better to set the timeout parameter (e.g. archive_timeout = 60) or not? Does setting it make WAL shipping faster than the default (the default value of 0 means PostgreSQL waits until the WAL file is full)? Why?
I'm sorry, I'm still confused about this.
If you want timely replication, I suggest enabling streaming replication as well as log shipping.
The main purpose of archive_timeout is to ensure that, when you're using log shipping for PITR backups, there's a maximum time window of data loss in situations where the server isn't generating lots of WAL so segment rotation would otherwise be infrequent.
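As an illustrative postgresql.conf fragment on the primary (the archive_command path is a placeholder; archive_timeout = 60 forces a segment switch at most every 60 seconds, bounding potential data loss for archive-based recovery at the cost of more, possibly mostly empty, 16 MB segments):

```
archive_mode = on
archive_command = 'cp %p /path/to/archive/%f'  # placeholder command
archive_timeout = 60        # force a WAL file switch at least every 60 s
max_wal_senders = 5         # also allow streaming replication
```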

Mongo continues to insert documents, slowly, long after script is quit

Do I have a zombie somewhere?
My script finished inserting a massive amount of new data.
However, the server continues with high lock rates and slowly inserting new records. It's been about an hour since the script that did the inserts finished, and the documents are still trickling in.
Where are these coming from, and how do I purge the queue? (I refactored the code to use an index and want to redo the process to avoid the 100-200% lock rate.)
This could be because of one of the following scenarios.
1. Disk-I/O-bound throughput
One can look into following metrics using "mongostat" and "MongoDB Management Service":
Average Flush time (how long MongoDB's periodic sync to disk is taking)
IOStats in the hardware tab (look specifically IOWait)
Since disk I/O is slower than CPU processing, all the inserts get queued up, and this can continue for a long duration. You can check the server status using db.serverStatus() and look at the "globalLock" field (writes acquire the global lock) for the "currentQueue" associated with the lock, to see the number of writers in the queue.
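For example, from the mongo shell (the exact layout of the serverStatus output varies by MongoDB version):

```javascript
// Number of operations queued waiting on the global lock
var status = db.serverStatus();
printjson(status.globalLock.currentQueue);
// e.g. { "total": 120, "readers": 0, "writers": 120 }
```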
2. The sharded cluster balancer is on (which is the default)
If you are working in a sharded cluster environment, the balancer moves chunks from one shard to another to keep the cluster in a balanced state, and it can continue migrating chunks even after your script has finished. In such a case I would suggest keeping the balancer off during the bulk load; all your documents then go to a single shard, but the balancer can be turned back on during any downtime.
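From the mongo shell connected to a mongos, the standard balancer helpers are:

```javascript
sh.getBalancerState();  // check whether the balancer is enabled
sh.stopBalancer();      // disable it before the bulk load
// ...run the bulk inserts...
sh.startBalancer();     // re-enable it afterwards
```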
3. Write concern
This may also contribute slightly to the problem if it is set to Replica Acknowledged or Acknowledged mode. It is up to you, based on the type of your data, to decide on these concerns.

Under what circumstances is a change not written to a MongoDB database when safe mode is off?

I have safe mode turned off in my MongoDB database because none of the data being written is absolutely 100% mission critical and the gain in insertion speed is very important, but I would really prefer if all of the data is written to the database.
My understanding is that with journaling turned on and safe mode turned off, if the server crashes in the 100ms between when a write request is received and the data is output to the journal, the data can be lost.
If the data is successfully written to the journal, is it a pretty safe bet, even if the database is lagging due to heavy load, that the data will end up in the database when the database catches up and is able to process what's in the journal? Or is my understanding of what the journal does flawed? Are there any other circumstances under which inserted data may be lost?
What happens if I update a document a fraction of a second before another process attempts to read it, but the changes haven't been committed to the collection yet? Will the read block until the insert has completed?
The read will only be blocked until the insert has completed if the read is requested on the same connection as the write. There is no guarantee that once the data is written, it will be immediately visible to other connections unless proper getLastError is used.
Data is processed and placed in the memory-mapped data region before journaling; however, it may not be fsynced to disk as often as the journal. This means that even under high load, the data should eventually be updated and become visible to other connections. Journal data is only used to restore durability when the mongod instance crashes unexpectedly.
Your data may be lost due to network interruption, disk corruption, index dup, etc.
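In current shells and drivers, the old safe mode / getLastError machinery is expressed as a write concern. A hedged sketch in the mongo shell (the collection name events is a placeholder):

```javascript
// w: 0 is the fire-and-forget equivalent of safe mode off:
// fastest, but the client gets no acknowledgement at all
db.events.insertOne({ t: new Date(), v: 1 }, { writeConcern: { w: 0 } });

// w: 1 with j: true acknowledges only after the write
// has reached the journal on the primary
db.events.insertOne({ t: new Date(), v: 2 },
                    { writeConcern: { w: 1, j: true } });
```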