Postgres VACUUM and replication

Postgres VACUUM and replication - postgresql

I have a master postgres with 2 async replication salves
I run VACUUM FULL VERBOSE ANALYSE my_table on all tables ,after vacuuming the slaves get out of sync
My application read from slaves , currently everything is wrong!
How can I force to sync or run re-sync ?
Whats problem here? Why running vacuum issued a problem?!

Whats problem here?
Your server log files can probably answer that much more accurately than random strangers without access to your computer can. What do the log files say? The replica logs are probably more interesting then the master logs, but check both.
Do you get messages about requested WAL segment %s has already been removed? If so, you will have to recreate your replicas. (Unless you have a WAL archive someplace which the replicas aren't currently configred to use--but even then, recreating may be faster and easier).
If you are using replication slots, the master should be retaining all the necessary WAL. In that case the replicas would still be trying to catch up, it might just take them a long time to do so. Either wait, or re-create them if you think that that will be faster.
Why running vacuum issued a problem?!
The key here is the FULL. Doing that basically rewrote your entire database, generating massive amounts of WAL which needs to fetched over the network and then replayed. The bottleneck could be anything from the network to the CPU to the disk drive.
Don't do VACUUM FULL without a darn good reason.

Related

Changing osd while there are inconsistent objects inside

I have a disk in my cluster that started giving out pg read errors after scrubs the last 2 days. Manual repairs are taking too much time and the disk's smartctl output does not look good. My question is, can i change disks before the errors are fixed? Will it cause some corruption across cluster? every pg is active+clean, only some pgs (erasure coded) are active+clean+inconsistent. Should i start recovery process and then replace the osd?
PS: none of the data chunks with read error are the primary chunk.

Your question is not entirely clear to me, I answer how I understand it.
The safest way to replace a failed disk is to let ceph rebalance after the disk has been taken out and then let it remap after the new disk has been deployed. This is lots of network traffic and can take quite some time, but you reduce the risk of data loss.
There are ways to prevent ceph from backfilling and deploy a new disk in a degraded state, but this increases the risk of a disk failure during this degraded state which can lead to data loss. I would advise against it, though.
You write
"can I change disks before the errors are fixed?"
How many disks have failed? The PG availability is dependent on k + 1 active chunks for erasure coded pools (pool's min_size).
"Manual repairs are taking too much time"
You mean ceph pg repair <PG_ID>? Usually I would wait for it, it can help, not all scrub errors are corrupted data, but in your case with the smartctl output it's probably just a temporary fix, if it even succeeds.

Debezium causes Postgres to run out of disk space on RDS

I have a small Postgres development database running on Amazon RDS, and I'm running K8s. As far as I can tell, there is barely any traffic.
I want to enable change capture, I've enabled rds.logical_replication, started a Debezium instance, and the topics appear in Kafka, and all seems fine.
After a few hours, the free disk space starts tanking:
It started to consume disk at a constant rate, and eat up all of the 20Gb available within 24 hours. Stopping Debezium doesn't do anything. The way I got my disk space back was by:
select pg_drop_replication_slot('services_debezium')
and:
vacuum full
Then, after a few minutes, as you can see in the graph, disk space is reclaimed.
Any tips? I would love to see what is it what's actually filling up the space, but I don't think I can. Nothing seems to happen on the Debezium side (no ominous logs), and the Postgres logs don't show anything special either. Or is there some external event that triggers the start of this?

You need to periodically generate some movement in your database (perform an update on any record for example).
Debezium provides a feature called heartbeat to perform this type of operation.
Heartbeat can be configured in the connector as follows:
"heartbeat.interval.ms" : "300000",
"heartbeat.action.query": "update my_table SET date_column = now();"
You can find more information in the official documentation:
https://debezium.io/documentation/reference/connectors/postgresql.html#postgresql-wal-disk-space

The replication slot is the problem. It marks a position in the WAL, and PostgreSQL won't delete any WAL segments newer than that. Those files are in the pg_wal subdirectory of the data directory.
Dropping the replication slot and running CHECKPOINT will delete the files and free space.
The cause of the problem must be misconfiguration of Debrezium: it does not consume changes and move the replication slot ahead. Fix that problem and you are good.

Ok, I think I figured it out. There is another 'hidden' database on Amazon RDS, that has changes, but changes that I didn't make and I can's see, so Debezium can't pick them up either. If change my monitored database, it will show that change and in the process flush the buffer and reclaim that space. So the very lack of changes was the reason it filled up. Don't know if there is a pretty solution for this, but at least I can work with this.

Postgresql DB backup Ideal practices

• What are ideal practices for taking PostgreSQL logical backup using pg_dump?
• Is it ideal to take backup from a standby/slave node? If replication lag is less than 200ms
• Is it ideal to take backup from standby/slave node, and is there any specific configuration we need to change?
• Which method is a good way for taking backups logical backup or physical backup? where DB is getting updated frequently. As a backup is taken for disaster recovery which method is the faster and better backup and disaster recovery(restore).
updated
Our current database size is 5GB and replication is on hot standby mode.
We are running the Backup script on slave node but it takes remote backup from the master node every 30 minutes.
The reason I created this question is to understand when the backup is running some COPY statements takes 6 mins to complete, even though it will not affect other transactions on DB, is there any other issues occurs if a statement is taking more time.

I thought about what you wrote and here are some ideas for you:
If you need backup which will really be consistent to some point in time then you must use pg_basebackup or pg_barman (internally uses pg_basebackup) - explanation is in 1. link below. Latest pg_basebackup 10 streams WAL logs so you backup also all changes done during backup. Of course this backup takes only the whole PG instance. On the other hand it does not lock any table. And if you do it from remote instance then it causes only small CPU load on PG instance and disk IO is not as big as some texts suggests. See links 4 about my experiences. Restoration is quite simple - see link 5.
If you use pg_dump you must understand that you have no guarantee that your backup is really consistent to the point in time - again see link 1. There is a possibility to use snapshot of the database (see links 2 and 3) but even with it you cannot count on 100% consistency. We used pg_dump only on our analytical database which loads new only 1x per day (yesterdays partitions from production database). You can speed it with parallel option (works only for directory backup format). But downside is much higher load on PG instance - higher CPU usage, much higher disk IO. Even if you run pg_dump remotely - in such case you save only disk IO for saving of backup files. Plus pg_dump needs to place read lock on tables so it can collied either with new inserts or with replication (when taken on replica). But when your database reaches hundreds of GBs then even parallel dump can takes hours and in that moment you would need to switch to pg_basebackup anyway.
pg_barman is "comfortable version" of pg_basebackup + it allows you to prevent data loss even when your PG instance crashes very badly. Setting it to work requires more changes but it is definitely worth it. You will have to set WAL log archiving (see link 6) and if you PG is <10 you will have to set "max_wal_senders" and "max_replication_slots" (which you need for replication anyway) - everything is in pg-barman manual although description is not exactly great. pg_barman will stream and store WAL records even between backups so this way you can be sure that data loss in case of very bad crash will be almost none. But making it work can take many hours because descriptions are not exactly good. pg-barman does both backup and restoration with its commands.
Your database is 5GB big so any backup method will be quick. But you have to decide if you need point in time recovery and almost zero data loss or not - so if you will invest time to setting pg-barman or not.
Links:
PostgreSQL, Backups and everything you need to know
Review for Paper: 14-Serializable Snapshot Isolation in PostgreSQL - about snapshots
Parallel dumping of databases - example how to use snapshot
pg_basebackup experiencies
pg_basebackup - restore tar backup
Archiving WAL logs using script

Rebalance data after adding nodes

I'm using Cassandra 2.0.4 (with vnodes) and 2 days ago I added 2 nodes (.210 and .195.) I expected Cassandra to redistribute the existing data automatically, but today I still find this nodetool status
Issuing a nodetool repair on any of the nodes doesn't do anything either (the repair finishes within seconds.) The logs state that the repair is being executed as expected, but after preparing the repair plan it pretty much instantly finishes executing said plan.
Was I wrong to assume the existing data would be redistributed at all, or is something wrong? And if that isn't the case; how do I manually 'rebalance' the data?
Worth noting: I seem to have lost some data after adding this new nodes. Issuing a select on certain keys only returns data from the last couple of days rather than weeks, this makes me think the data is saved on .92 while Cassandra queries for it on one of the new servers. But that's really just an uneducated guess, I may have simple broken something during all of my trial & error tests meaning the data is actually gone (even though I don't issue deletes, ever.)
Can anyone enlighten me?

There is currently no manual rebalance option for vnode-enabled clusters.
But your cluster doesn't look unbalanced based on the nodetool status output you show. I'm curious as to why node .88 has only 64 tokens compared to the others but that isn't a problem per se. When a cluster is smaller there will be a slight variance in the balance of data across the nodes.
As for the data issues, you can try running nodetool repair -pr around the nodes in the ring and then nodetool cleanup and see if that helps.

PostgreSQL In Memory Database

I want to run my PostgreSQL database server from memory. The reason is that on my new server, I have 24 GB of memory, and hardly any of it is used.
I know I can run this command to make a ramdisk:
mdmfs -s 1024m md2 /mnt
And I could theoretically have PostgreSQL store its data there. But the problem with this is that if the server crashes or reboots, the data will be gone.
Basically, I want the database to be loaded in memory at all times so that it does not have to go to the hard disk drive to read every record, since I have TONS of memory and since memory is faster than hard disk drives.
Is there a way to do this while also having PostgreSQL write to disk so I don't lose any data in case the server goes down? Or is there a way to cache all data in memory?

I'm now using streaming replication which is async. This means my MASTER could be running all in memory, with the separate SLAVE instance using traditional disk.
A machine restart would involve stopping the SLAVE, copying the postgresql data back into ramdisk and then restarting the MASTER followed by the SLAVE. This would be an interesting possibility which compares well with something like REDIS, but with the advantage of redundancy / hotstandby / backup / sql / rich toolset etc.

have you seen the Server Configuration manual chapter? check it out, then google postgresql memory tuning.

I have to believe that Postgres is written in such a way as to take full advantage of available RAM in the server. As you may have guessed by now, there's no reliable way to do this outside of Postgres.
Within Postgres, transactions assure that all operations are atomic, so if the power goes down while you are writing to a Postgres database, you will only lose that particular operation, and not the entire database.

The answer is caching. Look into adding memory to the server, then tuning PostgreSQL to maximize memory usage. Also, the file system cache will help with this, doing some of it automatically. You will be able to speed up performance, almost as if it were in memory except for the first hit, while not having to manage it yourself, and being able to have a database larger than the physical memory.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse