Debezium causes Postgres to run out of disk space on RDS - postgresql

I have a small Postgres development database running on Amazon RDS, and I'm running K8s. As far as I can tell, there is barely any traffic.
I want to enable change capture, so I've enabled rds.logical_replication and started a Debezium instance. The topics appear in Kafka, and all seems fine.
After a few hours, the free disk space starts tanking:
It started to consume disk at a constant rate and ate up all of the available 20 GB within 24 hours. Stopping Debezium doesn't do anything. The way I got my disk space back was by running:
select pg_drop_replication_slot('services_debezium')
and:
vacuum full
Then, after a few minutes, as you can see in the graph, disk space is reclaimed.
Any tips? I would love to see what's actually filling up the space, but I don't think I can. Nothing seems to happen on the Debezium side (no ominous logs), and the Postgres logs don't show anything special either. Or is there some external event that triggers the start of this?

You need to periodically generate some movement in your database (perform an update on any record for example).
Debezium provides a feature called heartbeat to perform this type of operation.
Heartbeat can be configured in the connector as follows:
"heartbeat.interval.ms" : "300000",
"heartbeat.action.query": "update my_table SET date_column = now();"
You can find more information in the official documentation:
https://debezium.io/documentation/reference/connectors/postgresql.html#postgresql-wal-disk-space
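If you'd rather not have the heartbeat touch a real business table, one option (not from the original answer; the table and column names here are made up) is a tiny dedicated heartbeat table that heartbeat.action.query writes to:
-- one-row table used only to generate WAL traffic
CREATE TABLE IF NOT EXISTS debezium_heartbeat (
    id integer PRIMARY KEY,
    last_heartbeat timestamptz NOT NULL
);
-- the statement to put in heartbeat.action.query:
INSERT INTO debezium_heartbeat (id, last_heartbeat) VALUES (1, now())
ON CONFLICT (id) DO UPDATE SET last_heartbeat = EXCLUDED.last_heartbeat;
If the connector uses a table include list or a publication, remember to add this table to it so the heartbeat changes are actually captured.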

The replication slot is the problem. It marks a position in the WAL, and PostgreSQL won't delete any WAL segments newer than that. Those files are in the pg_wal subdirectory of the data directory.
Dropping the replication slot and running CHECKPOINT will delete the files and free space.
The cause of the problem must be a misconfiguration of Debezium: it does not consume changes and move the replication slot ahead. Fix that, and you are good.
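To confirm that a slot is what's pinning the WAL, you can check how much WAL each slot is retaining; a quick sketch, assuming PostgreSQL 10 or later (older versions use pg_current_xlog_location and pg_xlog_location_diff):
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
A slot whose consumer never confirms progress will show retained_wal growing steadily, which matches the disk graph described in the question.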

Ok, I think I figured it out. There is another 'hidden' database on Amazon RDS that has changes, but changes that I didn't make and can't see, so Debezium can't pick them up either. If I change my monitored database, that change shows up and in the process the buffer is flushed and the space reclaimed. So the very lack of changes was the reason it filled up. I don't know if there is a pretty solution for this, but at least I can work with it.

Related

RDS Postgres "canceling statement due to conflict with recovery" [duplicate]

I'm getting the following error when running a query on a PostgreSQL db in standby mode. The query works fine when it covers 1 month of data, but querying more than 1 month results in an error.
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
Any suggestions on how to resolve? Thanks
No need to touch hot_standby_feedback. As others have mentioned, setting it to on can bloat the master. Imagine opening a transaction on a slave and not closing it.
Instead, set max_standby_archive_delay and max_standby_streaming_delay to some sane value:
# /etc/postgresql/10/main/postgresql.conf on a slave
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s
This way queries on slaves with a duration less than 900 seconds won't be cancelled. If your workload requires longer queries, just set these options to a higher value.
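On a self-managed standby you can also set these without editing the file by hand; a sketch (on RDS you would change them in the DB parameter group instead):
-- on the standby
ALTER SYSTEM SET max_standby_archive_delay = '900s';
ALTER SYSTEM SET max_standby_streaming_delay = '900s';
SELECT pg_reload_conf();  -- a reload is enough, no restart needed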
Running queries on a hot-standby server is somewhat tricky: they can fail, because during the query some of the rows it needs might be updated or deleted on the primary. Since the primary does not know that a query has started on the secondary, it thinks it can clean up (vacuum) old versions of its rows. The secondary then has to replay this cleanup, and has to forcibly cancel all queries that might use those rows.
Longer queries will be canceled more often.
You can work around this by starting a repeatable read transaction on the primary that runs a dummy query and then sits idle while the real query runs on the secondary. Its presence will prevent vacuuming of old row versions on the primary.
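A rough sketch of that workaround on the primary (hold the session open for as long as the standby query runs):
-- on the primary
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;   -- dummy query; the first query takes the snapshot that pins old row versions
-- ...leave this session idle while the long query runs on the secondary...
COMMIT;     -- release it as soon as the standby query finishes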
More on this subject and other workarounds are explained in the Handling Query Conflicts section of the Hot Standby documentation.
There's no need to start idle transactions on the master. In postgresql-9.1 the
most direct way to solve this problem is by setting
hot_standby_feedback = on
This will make the master aware of long-running queries. From the docs:
The first option is to set the parameter hot_standby_feedback, which prevents
VACUUM from removing recently-dead rows and so cleanup conflicts do not occur.
Why isn't this the default? This parameter was added after the initial
implementation and it's the only way that a standby can affect a master.
As stated here about hot_standby_feedback = on:
Well, the disadvantage of it is that the standby can bloat the master,
which might be surprising to some people, too
And here:
With what setting of max_standby_streaming_delay? I would rather
default that to -1 than default hot_standby_feedback on. That way what
you do on the standby only affects the standby
So I added
max_standby_streaming_delay = -1
And no more pg_dump error for us, nor master bloat :)
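If you just want to verify what a given standby is currently running with, a quick check (a sketch, run on the standby):
SHOW hot_standby_feedback;
SHOW max_standby_streaming_delay;
SHOW max_standby_archive_delay;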
For AWS RDS instance, check http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html
The table data on the hot standby slave server is modified while a long-running query is running. A solution (PostgreSQL 9.1+) to make sure the table data is not modified is to suspend replication and resume it after the query:
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); --resume
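On PostgreSQL 10 and later the pg_xlog_* functions were renamed to pg_wal_*, so the equivalent sequence is:
select pg_wal_replay_pause();   -- suspend
select * from foo;              -- your query
select pg_wal_replay_resume();  -- resume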
I'm going to add some updated info and references to @max-malysh's excellent answer above.
In short, if you do something on the master, it needs to be replicated on the slave. Postgres uses WAL records for this, which are sent to the slave after every logged action on the master. The slave then executes the action and the two are in sync again. There are several scenarios in which what's coming in from the master in a WAL action can conflict with what's happening on the slave. In most of them, there's a transaction on the slave that conflicts with what the WAL action wants to change. In that case, you have two options:
Delay the application of the WAL action for a bit, allowing the slave to finish its conflicting transaction, then apply the action.
Cancel the conflicting query on the slave.
We're concerned with #1, and two values:
max_standby_archive_delay - this is the delay used after a long disconnection between the master and slave, when the data is being read from a WAL archive, which is not current data.
max_standby_streaming_delay - delay used for cancelling queries when WAL entries are received via streaming replication.
Generally, if your server is meant for high availability replication, you want to keep these numbers short. The default setting of 30000 (milliseconds if no units given) is sufficient for this. If, however, you want to set up something like an archive, reporting- or read-replica that might have very long-running queries, then you'll want to set this to something higher to avoid cancelled queries. The recommended 900s setting above seems like a good starting point. I disagree with the official docs on setting an infinite value -1 as being a good idea--that could mask some buggy code and cause lots of issues.
The one caveat about long-running queries and setting these values higher is that other queries running on the slave in parallel with the long-running one which is causing the WAL action to be delayed will see old data until the long query has completed. Developers will need to understand this and serialize queries which shouldn't run simultaneously.
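If you raise these delays, it's worth keeping an eye on how stale the standby gets while replay is held back; a simple check, run on the standby:
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;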
For the full explanation of how max_standby_archive_delay and max_standby_streaming_delay work and why, go here.
It might be too late for an answer, but we faced the same kind of issue in production.
Earlier we had only one RDS instance, and as the number of users on the app side increased, we decided to add a read replica for it. The read replica worked properly on staging, but once we moved to production we started getting the same error.
So we solved this by enabling the hot_standby_feedback property in the Postgres parameters.
We referred to the following link:
https://aws.amazon.com/blogs/database/best-practices-for-amazon-rds-postgresql-replication/
I hope it will help.
Likewise, here's a second caveat to @Artif3x's elaboration of @max-malysh's excellent answer, both above.
With any delayed application of transactions from the master, the follower(s) will have an older, stale view of the data. Therefore, while providing time for the query on the follower to finish by setting max_standby_archive_delay and max_standby_streaming_delay makes sense, keep both of these caveats in mind:
the value of the follower as a standby / backup diminishes
any other queries running on the follower may return stale data.
If the value of the follower for backup ends up being too much in conflict with hosting queries, one solution would be multiple followers, each optimized for one or the other.
Also, note that several queries in a row can cause the application of WAL entries to keep being delayed. So when choosing the new values, it's not just the time for a single query, but a moving window that starts whenever a conflicting query starts and ends when the WAL entry is finally applied.

Postgres insert slow after snapshot restore but not after restart

My setup
Postgres 11 running on an AWS EC2 t4g.xlarge instance (4 vCPU, 16GB) running Amazon Linux.
Set up to take a nightly disk snapshot (my workload doesn't require high reliability).
Database has table xtc_table_1 with ~6.3 million rows, about 3.2GB.
Scenario
To test some new data processing code, I created a new test AWS instance from the nightly snapshot of my production instance.
I create a new UNLOGGED table, and populate it with INSERT INTO holding_table_1 SELECT * FROM xtc_table_1;
It takes around 2 min 24 sec for the INSERT statement to execute.
I truncate holding_table_1 and run the INSERT statement again, and it completes in 30 sec. The ~30 second timing is consistent across successive truncates and re-inserts of the table.
I think this may be because of some caching of data. I tried restarting the Postgres service, then rebooting the AWS instance (after stopping Postgres with sudo service postgresql stop), then stopping and starting the AWS instance. However, it's still ~30 sec to populate the table.
If I rebuild a new instance from the snapshot, the first time I run the INSERT statement it's back to the ~2m+ time.
Similar behavior for other tables xtc_table_2, xtc_table_3.
Hypothesis
After researching and finding this answer, I wonder if what's happening is that the disk snapshot contains some WAL data that is being replayed the first time I do anything with xtc_table_n. And that subsequently, because Postgres was shut down "nicely" there is no WAL to playback.
Does this sound plausible?
I don't know enough about Postgres internals to be sure. I would have imagined that any WAL playback would happen on starting up postgres, but maybe it happens at the individual table level the first time a table is touched?
Knowing the reason is more than just theoretical; I'm using the test instance to do some tuning on some processing code, and need to be confident in having a consistent baseline to measure from.
Let me know if more information is needed about my setup or what I'm doing.
@jellycsc's suggestion was correct; adding more info here in case it's helpful to anyone else.
The problem I was encountering was not a postgres issue at all, but because of the way AWS handles volumes and snapshots.
From this page:
For volumes that were created from snapshots, the storage blocks must
be pulled down from Amazon S3 and written to the volume before you can
access them. This preliminary action takes time and can cause a
significant increase in the latency of I/O operations the first time
each block is accessed. Volume performance is achieved after all
blocks have been downloaded and written to the volume.
I used the fio utility as described in the linked AWS page to initialize the restored volume, and first-time performance was consistent with subsequent query times.
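The fio approach initializes the whole volume. If only a few tables matter for the benchmark, a narrower alternative (my suggestion, not what the AWS page describes) is to read just those tables once so their blocks get pulled down from S3, for example with the pg_prewarm extension:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('xtc_table_1');  -- reads every block of the table once
-- indexes are separate relations and would need to be prewarmed individually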

Postgres VACUUM and replication

I have a Postgres master with 2 async replication slaves.
I ran VACUUM FULL VERBOSE ANALYSE my_table on all tables; after vacuuming, the slaves got out of sync.
My application reads from the slaves, and currently everything is wrong!
How can I force a sync or re-sync?
What's the problem here? Why did running VACUUM cause a problem?!
What's the problem here?
Your server log files can probably answer that much more accurately than random strangers without access to your computer can. What do the log files say? The replica logs are probably more interesting than the master logs, but check both.
Do you get messages like requested WAL segment %s has already been removed? If so, you will have to recreate your replicas. (Unless you have a WAL archive someplace that the replicas aren't currently configured to use; but even then, recreating may be faster and easier.)
If you are using replication slots, the master should be retaining all the necessary WAL. In that case the replicas would still be trying to catch up; it might just take them a long time to do so. Either wait, or re-create them if you think that will be faster.
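To see whether the replicas are still catching up rather than broken, a quick check on the master; a sketch assuming PostgreSQL 10+ column names (older versions use replay_location and pg_current_xlog_location):
SELECT client_addr, state,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag
FROM pg_stat_replication;
If replay_lag is shrinking over time, the replicas are recovering on their own and just need time.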
Why did running VACUUM cause a problem?!
The key here is the FULL. Doing that basically rewrote your entire database, generating massive amounts of WAL which need to be fetched over the network and then replayed. The bottleneck could be anything from the network to the CPU to the disk drive.
Don't do VACUUM FULL without a darn good reason.
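If the goal was only to reclaim dead rows and refresh statistics, a plain vacuum is usually enough; it does not rewrite the table and generates far less WAL:
VACUUM (VERBOSE, ANALYZE) my_table;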

Postgresql DB backup Ideal practices

• What are ideal practices for taking a PostgreSQL logical backup using pg_dump?
• Is it ideal to take the backup from a standby/slave node if replication lag is less than 200 ms?
• If taking the backup from a standby/slave node, is there any specific configuration we need to change?
• Which method is better for a DB that is updated frequently: a logical backup or a physical backup? Since the backup is taken for disaster recovery, which method gives faster and better backup and restore?
Updated
Our current database size is 5 GB and replication is in hot standby mode.
We are running the backup script on the slave node, but it takes a remote backup from the master node every 30 minutes.
The reason I created this question is to understand why, while the backup is running, some COPY statements take 6 minutes to complete. Even though this doesn't affect other transactions on the DB, are there any other issues if a statement takes that much time?
I thought about what you wrote and here are some ideas for you:
If you need a backup which is really consistent to some point in time, then you must use pg_basebackup or pg_barman (which internally uses pg_basebackup); the explanation is in link 1 below. The latest pg_basebackup (version 10) streams WAL, so the backup also contains all changes made during the backup. Of course this kind of backup covers only the whole PG instance. On the other hand, it does not lock any table. And if you run it from a remote machine, it causes only a small CPU load on the PG instance, and the disk IO is not as big as some texts suggest. See link 4 for my experiences. Restoration is quite simple; see link 5.
If you use pg_dump, you must understand that you have no guarantee that your backup is really consistent to a point in time; again, see link 1. There is a possibility to use a snapshot of the database (see links 2 and 3), but even with that you cannot count on 100% consistency. We used pg_dump only on our analytical database, which loads new data only once per day (yesterday's partitions from the production database). You can speed it up with the parallel option (which works only for the directory backup format). But the downside is a much higher load on the PG instance: higher CPU usage and much higher disk IO. Even if you run pg_dump remotely, you only save the disk IO for writing the backup files. Plus, pg_dump needs to place a read lock on tables, so it can collide either with new inserts or with replication (when taken on a replica). But when your database reaches hundreds of GBs, even a parallel dump can take hours, and at that point you would need to switch to pg_basebackup anyway.
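The snapshot trick mentioned above boils down to exporting a snapshot from one session and importing it in the others; a minimal sketch of the exporting side (the importing sessions must also use REPEATABLE READ and set the snapshot before running any query):
-- session 1: export a snapshot that other sessions can share
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns a snapshot id
-- keep this transaction open; other sessions then run:
--   SET TRANSACTION SNAPSHOT '<the id returned above>';
COMMIT;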
pg_barman is "comfortable version" of pg_basebackup + it allows you to prevent data loss even when your PG instance crashes very badly. Setting it to work requires more changes but it is definitely worth it. You will have to set WAL log archiving (see link 6) and if you PG is <10 you will have to set "max_wal_senders" and "max_replication_slots" (which you need for replication anyway) - everything is in pg-barman manual although description is not exactly great. pg_barman will stream and store WAL records even between backups so this way you can be sure that data loss in case of very bad crash will be almost none. But making it work can take many hours because descriptions are not exactly good. pg-barman does both backup and restoration with its commands.
Your database is 5GB big so any backup method will be quick. But you have to decide if you need point in time recovery and almost zero data loss or not - so if you will invest time to setting pg-barman or not.
Links:
1. PostgreSQL, Backups and everything you need to know
2. Review for Paper: 14-Serializable Snapshot Isolation in PostgreSQL - about snapshots
3. Parallel dumping of databases - example how to use snapshot
4. pg_basebackup experiences
5. pg_basebackup - restore tar backup
6. Archiving WAL logs using script

PostgreSQL In Memory Database

I want to run my PostgreSQL database server from memory. The reason is that on my new server, I have 24 GB of memory, and hardly any of it is used.
I know I can run this command to make a ramdisk:
mdmfs -s 1024m md2 /mnt
And I could theoretically have PostgreSQL store its data there. But the problem with this is that if the server crashes or reboots, the data will be gone.
Basically, I want the database to be loaded in memory at all times so that it does not have to go to the hard disk drive to read every record, since I have TONS of memory and since memory is faster than hard disk drives.
Is there a way to do this while also having PostgreSQL write to disk so I don't lose any data in case the server goes down? Or is there a way to cache all data in memory?
I'm now using streaming replication which is async. This means my MASTER could be running all in memory, with the separate SLAVE instance using traditional disk.
A machine restart would involve stopping the SLAVE, copying the postgresql data back into ramdisk and then restarting the MASTER followed by the SLAVE. This would be an interesting possibility which compares well with something like REDIS, but with the advantage of redundancy / hotstandby / backup / sql / rich toolset etc.
Have you seen the Server Configuration manual chapter? Check it out, then google "postgresql memory tuning".
I have to believe that Postgres is written in such a way as to take full advantage of available RAM in the server. As you may have guessed by now, there's no reliable way to do this outside of Postgres.
Within Postgres, transactions assure that all operations are atomic, so if the power goes down while you are writing to a Postgres database, you will only lose that particular operation, and not the entire database.
The answer is caching. Look into adding memory to the server, then tuning PostgreSQL to maximize memory usage. Also, the file system cache will help with this, doing some of it automatically. You will be able to speed up performance, almost as if it were in memory except for the first hit, while not having to manage it yourself, and being able to have a database larger than the physical memory.
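As a concrete starting point, rather than a ramdisk you would raise the memory-related settings and let shared buffers plus the OS cache hold the working set. The values below are only illustrative guesses for a 24 GB machine, not recommendations:
-- illustrative values for a 24 GB server; tune for your workload
ALTER SYSTEM SET shared_buffers = '6GB';          -- needs a restart to take effect
ALTER SYSTEM SET effective_cache_size = '18GB';   -- planner hint, reload is enough
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();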