[RED] restartpoint complete: wrote 3392 buffers (10.9%);
0 transaction log file(s) added, 0 removed, 9 recycled;
write=340.005 s, sync=0.241 s, total=340.257 s;
sync files=86, longest=0.121 s, average=0.002 s
and
[RED] recovery restart point at 2AA/A90FB758 Detail: last completed transaction was at log time 2015-01-21 14:00:18.782442+00
The RED database is a standby (follower) of the master database. What do these log entries mean?
These are informational log messages telling you that the server finished creating a restart point. There should be corresponding messages telling you that the server started creating the restart point.
Restart points are described at http://www.postgresql.org/docs/9.0/static/wal-configuration.html
A restart point is a point in the transaction history from which, if the server crashes, a future run of this server could restart without needing to replay write-ahead log from before that point. The system creates restart points periodically, based on how much transaction history has accumulated and on the passage of time.
The log_checkpoints parameter in the PostgreSQL configuration file determines whether you get these log messages when restart points are created.
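For example, on a self-managed server a minimal sketch of turning it on looks like this (assuming $PGDATA points at the data directory; appending to postgresql.conf is just for illustration, editing the file by hand works equally well):
echo "log_checkpoints = on" >> $PGDATA/postgresql.conf
pg_ctl reload -D $PGDATA
# subsequent checkpoints and restart points will now be reported in the server log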
I have some code that will perform a pg_rewind if it detects that the standby is out of sync. However, I saw in the documentation that it requires the database to have been shut down gracefully before this command is run. I would like to be able to detect whether Postgres did not shut down gracefully, so I can:
start and stop postgres ahead of time so pg_rewind will work
know if I should run some checks on my data to see if it is ok
I'm assuming that a non-graceful shutdown means the database crashed, the server crashed, or it was told to shut down immediately, so it would be nice to know whether something bad happened and whether I should do something like run pg_checksums.
pg_rewind and pg_checksums both require a cleanly shut down server.
You could probably try to replicate PostgreSQL's own checks that normally lead to a "database system was not properly shut down" entry in the server log after startup. But you can just as well simply use those checks - and since they require a startup, you can also let the server attempt to recover and then perform a fresh, clean, graceful smart shutdown. To avoid any immediate connection attempts, you could use a different port for that cycle.
For a regular server, checking pg_is_in_recovery() after startup would be some indication that a non-graceful shutdown took place earlier, since that causes the database to enter recovery mode on the next startup. However, by design a standby always stays in recovery mode until promoted, so that won't mean the same thing here.
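As a minimal sketch, assuming you can run the server binaries against the standby's data directory (the paths and the alternate port below are placeholders):
pg_controldata /path/to/standby/data | grep 'Database cluster state'
# "shut down" or "shut down in recovery"  -> the last stop was clean, pg_rewind can proceed
# anything else (e.g. "in production", "in archive recovery") -> not shut down cleanly;
# start it once on a side port, let it recover, then stop it gracefully:
pg_ctl start -D /path/to/standby/data -o "-p 5433"
pg_ctl stop  -D /path/to/standby/data -m smart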
I have a database hosted at AWS RDS. The DB has been hosted for close to 3 years now and seemed to work without issues. Without any significant changes (besides writing new records to the DB), connections to it started to hang/time out. Sometimes a connection goes through, but it takes about 30-60 seconds, and then the connection drops shortly after.
In the postgres error log, there is nothing particularly concerning. The most relevant thing I found is:
2022-11-15 14:08:11 UTC::#:[365]:LOG: checkpoint starting: time
2022-11-15 14:08:12 UTC::#:[365]:LOG: checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.277 s, sync=0.040 s, total=0.838 s; sync files=1, longest=0.040 s, average=0.040 s; distance=65535 kB, estimate=65536 kB
2022-11-15 14:10:37 UTC::#:[368]:WARNING: worker took too long to start; canceled
2022-11-15 14:10:52 UTC::#:[3350]:WARNING: autovacuum worker started without a worker entry
Running telnet to the DB seems to pass, and on the AWS dashboard I don't see anything particularly concerning - normal number of connections, CPU/mem usage.
I tried restarting the db instance, upgrading to a newer version of postgres, and restarting the clients, but to no avail.
Any ideas what I can look at, in order to pinpoint the issue?
How do I stop collections from being cloned even when all the data has been restored on the recovered instance using oplog replay?
Replication Scenario
I have a 3-node replica set.
Load
There is continuous load; data keeps getting added every day, and we have oplog backups every 2 hours.
In spite of the oplog backups being taken every 2 hours, some transactions roll off the oplog. That means we might miss some records when we replay those oplogs.
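(As a quick check of how much history the oplog actually retains, you can compare its window against the 2-hour backup interval; the hostname and port below are placeholders:)
mongo --host secondary1:27012 --eval "rs.printReplicationInfo()"
# "log length start to end" is the oplog window; if it is shorter than the interval
# between oplog backups, entries will roll off before they are captured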
Scenario.
In a replication scenario, one of the secondaries stops responding, and by the time we join it back to the replica set, the minimum oplog timestamp has gone past the oplog on the failed instance; the failed instance tries to catch up but gets stuck in RECOVERING mode, as shown by the log messages on the recovering instance:
2019-02-13T15:49:42.346-0500 I REPL [replication-0] We are too stale to use primaryserver3:27012 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1550090168, 1) is before their earliest timestamp: Timestamp(1550090897, 28907) for 1min until: 2019-02-13T15:50:42.346-0500
2019-02-13T15:49:42.347-0500 I REPL [replication-0] sync source candidate: primaryserver3:27012
2019-02-13T15:49:42.347-0500 I ASIO [RS] Connecting to primaryserver3:27012
2019-02-13T15:49:42.348-0500 I REPL [replication-0] We are too stale to use primaryserver3:27012 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1550090168, 1) is before their earliest timestamp: Timestamp(1550090897, 22809) for 1min until: 2019-02-13T15:50:42.348-0500
2019-02-13T15:49:42.348-0500 I REPL [replication-0] could not find member to sync from
To bring this instance on par with the primary, we make this "RECOVERING" instance a "new PRIMARY" and apply all oplog backups taken up to the present. After the oplogs are applied, the record counts on both servers match. Now, when I join the recovering instance (i.e. the "new PRIMARY") back to the replica set,
I see logs showing an "initial sync", as it is supposed to do, and then I see the log below:
2019-03-01T12:11:58.327-0500 I REPL [repl writer worker 4] CollectionCloner ns:datagen_it_test.test finished cloning with status: OK
2019-03-01T12:12:40.518-0500 I REPL [repl writer worker 8] CollectionCloner ns:datagen_it_test.link finished cloning with status: OK
where the collections are cloned again.
My question is: why does it clone the data again? We have the data restored on the "recovering" instance and the records all match.
How do I stop the cloning from happening?
As per the MongoDB documentation:
A replica set member becomes “stale” when its replication process
falls so far behind that the primary overwrites oplog entries the
member has not yet replicated. The member cannot catch up and becomes
“stale.” When this occurs, you must completely resynchronize the
member by removing its data and performing an initial sync.
This tutorial addresses both resyncing a stale member and creating a
new member using seed data from another member, both of which can be
used to restore a replica set member. When syncing a member, choose a
time when the system has the bandwidth to move a large amount of data.
Schedule the synchronization during a time of low usage or during a
maintenance window.
MongoDB provides two options for performing an initial sync:
Restart the mongod with an empty data directory and let MongoDB’s
normal initial syncing feature restore the data. This is the more
simple option but may take longer to replace the data.
See Automatically Sync a Member.
Restart the machine with a copy of a recent data directory from
another member in the replica set. This procedure can replace the data
more quickly but requires more manual steps.
See Sync by Copying Data Files from Another Member.
The step-by-step procedure is available in
Resync a Member of a Replica Set
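As a minimal sketch of the first (automatic) option, assuming a member you start by hand with its data in /data/db and a replica set named rs0 (both placeholders):
mongod --shutdown --dbpath /data/db                 # stop the stale member
mv /data/db /data/db.old && mkdir -p /data/db       # empty its data directory, keeping a copy
mongod --replSet rs0 --dbpath /data/db --port 27017 # restart; the member runs a fresh initial sync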
I am trying to set up logical replication between 2 cloud instances, both with Debian 9 and PG 11.1. The command CREATE PUBLICATION on the master was successful, but when I run the command CREATE SUBSCRIPTION on the intended logical replica, the command hangs indefinitely.
On the master I can see that the replication slot was created and is active, I can see a new walsender process created and "waiting", and in the log on the master I see these lines:
2019-01-14 14:20:39.924 UTC [8349] repl_user#db LOG: logical decoding found initial starting point at 7B0/6C777D10
2019-01-14 14:20:39.924 UTC [8349] repl_user#db DETAIL: Waiting for transactions (approximately 2) older than 827339177 to end.
But that is all. The command CREATE SUBSCRIPTION never ends.
The master is a DB with heavy inserts, like hundreds per minute, but they are all committed promptly. So there should not be any long-running uncommitted transactions.
I tried to google for this problem but did not find anything. What am I missing?
Since the databases are “in the cloud”, you don't know where they really are.
Odds are that they are actually in the same database cluster, which would explain the deadlock you see: CREATE SUBSCRIPTION waits until all concurrent transactions on the cluster that contains the replication source database are finished before it can create its replication slot, but since both databases are in the same cluster, it waits for itself to finish, which obviously won't happen.
The solution is to explicitly create a logical replication slot in the source database and to use that existing slot when you create the subscription.
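A minimal sketch of that workaround, with placeholder names for the slot, publication, subscription, database and connection string:
# on the source (publisher) database:
psql -d db -U repl_user -c "SELECT pg_create_logical_replication_slot('my_sub_slot', 'pgoutput');"
# on the subscriber, point CREATE SUBSCRIPTION at the existing slot instead of letting it create one:
psql -d db -U repl_user -c "
  CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=master dbname=db user=repl_user'
    PUBLICATION my_pub
    WITH (create_slot = false, slot_name = 'my_sub_slot');"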
I am trying to take a backup with Ops Manager,
but the initial sync does not proceed.
When I check backup-agent.log, the following content is displayed:
[backup-agent/components/mothership.go:321] Total Slice #1031 - server syncStore is full. 0th attempt. Will resend this slice again soon.
I checked ulimit, the head database space, and the backing store space, but none of them is full.
So what is full?
Since the question is a little ambiguous, I am making a couple of assumptions.
Assumptions:
- You are setting up the Ops Manager for an existing replica set.
- The Agents are configured correctly and you have at least one Backup agent running on the replica set you are trying to backup.
Coming back to your question: syncStore is the oplog store, which syncs your initial sync data using the Backup Daemon. The issue here seems to be that the backup job has not been bound to a headDB. You can bind it by selecting Backup - Jobs - Job, and under the "Machine" label selecting "set binding" and choosing the appropriate headDB.
Hope the above solution solves the issue.
[backup-agent/components/mothership.go:321] Total Slice #1031 - server syncStore is full. 0th attempt. Will resend this slice again soon.
Please note that the "server syncStore is full" message indicates that the backup daemon is not processing (applying to the headDB) the incoming oplog/sync slices fast enough.
Hence, in order to know the rate at which the backup daemon is applying the slices to the headDB and to track the status of the ongoing backup initial-sync process, you need to analyze the Ops Manager log files. These are:
Diagnostic archive log (Admin-->General-->Overview-->System Overview-->Diagnostic Archive)
Backup daemon logs and mms0 logs [available in /opt/mongodb/mms/logs]
Backup agent log
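For example, to get a quick feel for how often the daemon reports the backlog (the path is the one mentioned above; adjust it to your install):
ls -lt /opt/mongodb/mms/logs/                        # locate the backup daemon and mms0 log files
grep -ri "syncStore is full" /opt/mongodb/mms/logs/  # see how often the backlog is being reported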
Thanks,
Karthick
I once encountered the same problem because I did not place the Mongo binaries in the MongoDB releases directory during configuration. Because of this, the Daemon could not start the head database, and so it failed to keep up with the transferred oplog, thus printing the "server syncStore is full" error, even though the store being full was not the real reason.