Log shipping process (archive_timeout) - PostgreSQL

I'm very new to PostgreSQL and want to ask about the log shipping replication process. I know that the timeout parameter is optional in log shipping; it tells PostgreSQL not to wait until a WAL file contains 16 MB of data before shipping it, as it does by default. My question is: is it better to set the timeout parameter (e.g. archive_timeout = 60) or not? When the timeout parameter is set, is the shipping of WAL files faster than the default (the default value of 0 means PostgreSQL waits until the WAL file is full)? Why?
I'm sorry, I'm still confused about this.

If you want timely replication, I suggest enabling streaming replication as well as log shipping.
The main purpose of archive_timeout is to ensure that, when you're using log shipping for PITR backups, there's a maximum time window of data loss in situations where the server isn't generating much WAL, so segment rotation would otherwise be infrequent.
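For illustration, a minimal postgresql.conf sketch of this setup; the archive directory and command are placeholders, not a recommendation:

# Ship each completed WAL segment to an archive (path is hypothetical)
wal_level = archive          # 'replica' on PostgreSQL 9.6 and later
archive_mode = on
archive_command = 'cp %p /mnt/archive/%f'
# Force a segment switch at most every 60 seconds even when activity is low,
# so the archive never lags more than about a minute behind:
archive_timeout = 60

Note that a forced switch still produces a full-size 16 MB file, so a very low archive_timeout bloats archive storage; it bounds the data-loss window rather than making shipping faster.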

How long does pg_statio_all_tables data live for?

I have read https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STATIO-ALL-TABLES-VIEW
But it does not clarify whether this data covers the whole history of the DB (unless you reset the stats counters) or whether it only covers recent activity.
I am seeing a low cache-hit ratio on one of my tables, but I recently added an index to it. I'm not sure whether the ratio is low because of all the pre-index usage, or whether it is still low even with the index.
Quote from the manual:
When the server shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory, so that statistics can be retained across server restarts. When recovery is performed at server start (e.g., after immediate shutdown, server crash, and point-in-time recovery), all statistics counters are reset.
I read this as "the data is preserved as long as the server is restarted cleanly".
So the data is only reset if recovery was performed, or if it has been reset manually using pg_stat_reset().
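If you want to know whether the ratio is still low now that the index exists, you can reset the counters for just that table and measure afresh. A sketch, where my_table is a placeholder name:

-- Heap cache-hit ratio for one table, from the cumulative counters
SELECT heap_blks_hit::float / NULLIF(heap_blks_hit + heap_blks_read, 0) AS hit_ratio
FROM pg_statio_all_tables
WHERE relname = 'my_table';

-- Reset the counters for this table only, then re-check the ratio later
SELECT pg_stat_reset_single_table_counters('my_table'::regclass);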

Is there a system procedure or method to find out what is causing a transaction log full issue?

I'm looking for an easy method to find the culprit process holding onto the transaction log, which is causing the pg_wal full issues.
The transaction log contains all transactions, and it does not contain a reference to the process that caused an entry to be written. So you cannot infer from WAL what process causes the data modification activity that fills your disk.
You can turn on logging (log_min_duration_statement = 0) and find the answer in the log file.
But I think that you are looking at the problem in the wrong way: the problem is not that WAL is generated, but that full WAL segments are not removed soon enough.
That can happen for a variety of reasons (the queries sketched after this list can help check the first two):
WAL archiving has problems or is too slow
a stale replication slot is blocking WAL removal
wal_keep_segments is too high
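Two quick checks along those lines, as a sketch (both views exist from PostgreSQL 9.4 on):

-- Replication slots: an inactive slot with an old restart_lsn pins WAL forever
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;

-- Archiver health: a growing failed_count means segments cannot be recycled
SELECT archived_count, failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;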

How to resolve slowdowns induced by HADR_FLAGS = STANDBY_RECV_BLOCKED?

We had a severe slowdown of our applications in our HADR environment. We are seeing the following when we run db2pd -hadr:
HADR_FLAGS = STANDBY_RECV_BLOCKED
STANDBY_RECV_BUF_PERCENT = 100
STANDBY_SPOOL_PERCENT = 100
These recovered later and seem better now, with STANDBY_SPOOL_PERCENT coming down gradually. Can you please help me understand the implications of the values of the above parameters and what needs to be done to ensure we don't get into such a situation again?
This issue is most likely triggered by a peak in the volume of transactions occurring on the primary: the standby's receive buffer and spool became saturated. Unless you are running with the configuration parameter HADR_SYNCMODE set to SUPERASYNC, you can fall into this situation. The slowdown on the application was induced by the primary waiting for an acknowledgement from the standby that it had received the log data; since the standby's spool and receive buffer were full at the time, it was delaying this acknowledgement.
You could consider setting HADR_SYNCMODE to SUPERASYNC, but this would also mean that the system will be more vulnerable to data loss should there be a failure on the primary. To manage these temporary peaks, you can make either of the following configuration changes:
Increase the size of the log receive buffer on the standby database by modifying the value of the DB2_HADR_BUF_SIZE registry variable.
Enable log spooling on the standby database by setting the HADR_SPOOL_LIMIT database configuration parameter.
For further details, you can refer to the HADR Performance Guide.
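For illustration, the commands could look like the following; the database name MYDB and both sizes are placeholders, and the registry variable only takes effect after the instance is restarted:

# Standby log receive buffer, in 4 KB log pages (hypothetical value)
db2set DB2_HADR_BUF_SIZE=16384

# Enable log spooling on the standby, limit in 4 KB pages (hypothetical value)
db2 "UPDATE DB CFG FOR MYDB USING HADR_SPOOL_LIMIT 100000"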

Setting wal_keep_segments for PostgreSQL hot_standby

I am having trouble setting up a PostgreSQL hot_standby. When attempting to start the database after running pg_basebackup, I receive FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000006440000008D has already been removed in postgresql.log. After a brief discussion in IRC, I came to understand that the error likely originates from a too-low wal_keep_segments setting for my write-intensive database.
How might I calculate, if possible, the proper setting for wal_keep_segments? What is an acceptable value for this setting?
What I am working with:
PostgreSQL 9.3
Debian 7.6
wal_keep_segments can be estimated as the average number of new WAL segments per minute in the pg_xlog directory, multiplied by the number of minutes you want to be safe for. Bear in mind that the rate is expected to increase after wal_level is changed from its default value of minimal to either archive or hot_standby. The only cost is disk space, which by default is 16 MB per segment.
I typically use powers of 2 as values. At the rate of about 1 segment per minute, a value of 256 gives me about 4 hours in which to set up the standby.
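To measure the rate on 9.3, you can sample the current WAL position twice and diff the two values; a sketch, where the two LSNs are placeholder samples (in PostgreSQL 10 and later these functions are called pg_current_wal_lsn and pg_wal_lsn_diff):

-- Run this twice, some minutes apart, and note the two positions
SELECT pg_current_xlog_location();

-- WAL bytes between the two samples, divided by 16 MB to get segments
SELECT pg_xlog_location_diff('644/8E000000', '644/8D000000') / (16 * 1024 * 1024.0) AS segments;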
You could alternatively consider using WAL streaming with pg_basebackup, via its --xlog-method=stream option. Unfortunately, at least as of 2013, per a discussion on a PostgreSQL mailing list, setting wal_keep_segments to a nonzero value may still be recommended, to avoid the risk of the stream being unable to keep up. If you do use pg_basebackup, also don't forget --checkpoint=fast.
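For illustration, an invocation along those lines; the host, user, and data directory are placeholders:

pg_basebackup -h primary.example.com -U replication -D /var/lib/postgresql/9.3/main \
    --xlog-method=stream --checkpoint=fast --progress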

Confirm basic understanding of MongoDB's acknowledged write concern

Using MongoDB (via PyMongo) in the default "acknowledged" write concern mode, is it the case that if I have a line that writes to the DB (e.g. a mapReduce that outputs a new collection) followed by a line that reads from the DB, the read will always see the changes from the write?
Further, is the above true for all stricter write concerns than "acknowledged," i.e. "journaled" and "replica acknowledged," but not true in the case of "unacknowledged"?
If the write has been acknowledged, it should have been written to memory, thus any subsequent query should get the current data. This won't work if you have a replica set and allow reads from secondaries.
Journaled writes are written to the journal file on disk, which protects your data in case of power / hardware failures, etc. This shouldn't have an impact on consistency, which is covered as soon as the data is in memory.
Any replica configuration in the write concern will ensure that writes need to be acknowledged by the majority / all nodes in the replica set. This will only make a difference if you read from replicas or to protect your data against unreachable / dead servers.
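A small PyMongo sketch of this point: with the default acknowledged write concern, a read routed to the primary sees the write, while a read allowed to hit a secondary might not (the collection and field names are made up):

from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://localhost:27017")
db = client.test
coll = db.events  # default write concern: acknowledged (w=1)

coll.insert_one({"k": 1})
print(coll.find_one({"k": 1}))  # routed to the primary: sees the write

# Allowing secondary reads reintroduces the lag described above:
lagging = db.get_collection("events", read_preference=ReadPreference.SECONDARY_PREFERRED)
lagging.find_one({"k": 1})  # may return None until replication catches up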
For example, with the WiredTiger storage engine there is a cache of pages in memory that are periodically written to and read from disk, depending on memory pressure. With the MMAPv1 storage engine there is a memory-mapped address space that corresponds to pages on disk. There is also a secondary structure called the journal: a log of every single operation the database processes. Notice that the journal, too, lives in memory at first.
When does the journal get written to disk?
When the app sends a request to the MongoDB server over a TCP connection, the server processes the request and writes the result into its in-memory pages. Those pages may not be written to disk for quite a while, depending on memory pressure. The server also records the operation in the journal. By default, when a MongoDB driver makes a database request, it waits for the response (say, an acknowledged insert/update), but it does not wait for the journal to be written to disk. The value that controls whether we wait for the write to be acknowledged by the server is called w.
w = 1
j = false
By default, w is set to 1, which means: wait for this server to respond to the write. By default, j, which stands for journal, equals false; it controls whether we wait for the journal to be written to disk before we continue.
So what are the implications of these defaults? When we do an update/insert, we are really doing the operation in memory, not necessarily on disk. This makes it very fast, and periodically (every few seconds) the journal gets written to disk. The window is short, but during this window of vulnerability - when the data has been written into the server's in-memory pages but the journal has not yet been persisted to disk - a server crash can lose the data. We also have to realize, as programmers, that just because the write came back as good and was written successfully to memory, it may never be persisted to disk if the server subsequently crashes.
Whether this is a problem depends on the application. For some applications doing lots of writes that each log a small amount of data, we might find it hard to even keep up with the data stream if we wait for the journal to be written to disk, because the disk is 100 to 1,000 times slower than memory for every single write. For other applications, it may be completely necessary to wait for the write to be journaled and to know it has been persisted to disk before we continue. So it's really up to us.
The w and j values together are called the write concern. They can be set in the driver at the client, database, or collection level (see the PyMongo sketch at the end of this answer).
w = 1 : wait for the write to be acknowledged
w = 0 : do not wait for the write to be acknowledged
j = true : sync to journal
j = false : do not sync to journal
There are other values for w as well that have their own significance. With w=1 and j=true we can make sure that writes have been persisted to disk. If the writes have been written to the journal and the server crashes, then even though the pages may not have been written back to disk yet, on recovery the mongod process can look at the journal on disk and recreate all the writes that were not yet persisted to the pages, because they had been written to the journal. That is why this gives us a greater level of safety.
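To tie this back to PyMongo, a sketch of setting w and j at the collection level (the collection name is made up):

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
db = client.test

# Default: w=1, j unset -> acknowledged once in memory, journal not yet on disk
fast = db.get_collection("events")

# w=1 plus j=True: the insert returns only after the journal is on disk
durable = db.get_collection("events", write_concern=WriteConcern(w=1, j=True))
durable.insert_one({"msg": "journaled before acknowledgement"})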