PostgreSQL 9.6 vacuum using 100% CPU

I am running PostgreSQL 9.6 on CentOS 7.
The database was migrated from a PostgreSQL 9.4 server that did not have this issue.
With autovacuum on, Postgres constantly uses 100% of one core (about 10% of total CPU). With autovacuum off, it uses no CPU other than when executing queries.
Is this expected or normal, or is something bad going on? Note that it is a very big database, with many schemas and tables.
I tried,
vacuumdb --all -w
and,
ANALYZE VERBOSE;
The "ANALYZE VERBOSE;" made the database run a lot faster, but did not change the CPU usage.

Related

Upgraded RDS postgres from 11.16 to 14.4, CPU has been stuck at 100% for hours while running the huge aggregate queries

We have upgraded our AWS RDS Postgres from 11.16 to 14.4. Since then, CPU usage stays around 100% for hours while running the aggregate queries.
I'm working with a large database; most of the main tables have about 1 million records. Running these aggregate queries on the previous version (11.16) returned results without driving up CPU usage.
Can we use the ANALYZE command to optimize the upgraded database?
Is there any impact from running this command on the database?
Any suggestions for resolving this?
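One detail worth knowing: a major-version upgrade does not carry the planner statistics over, so until ANALYZE has run the optimizer is working without statistics, which by itself can produce bad plans and high CPU. A sketch of the usual post-upgrade step (the host name is a placeholder):

vacuumdb --all --analyze-in-stages -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -U postgres

ANALYZE only reads a sample of each table, so its own cost is modest and temporary compared with leaving the planner without statistics.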

Why is pg_restore that slow and PostgreSQL almost not even using the CPU?

I just had to use pg_restore with a small dump of 30 MB, and it took 5 minutes on average! On my colleagues' computers it is ultra fast, on the order of a dozen seconds. The difference between the two is the CPU usage: while for the others the database uses quite a lot of CPU (60-70%) during the restore operation, on my machine it stays at only a few percent (0-3%), as if it were not active at all.
The exact command was: pg_restore -h 127.0.0.1 --username XXX --dbname test --no-comments test_dump.sql
The command used to produce this dump was: pg_dump --dbname=XXX --user=XXX --no-owner --no-privileges --verbose --format=custom --file=/sql/test_dump.sql
A screenshot taken in the middle of the restore operation and the corresponding vmstat 1 output accompanied the original question (not reproduced here).
I searched the web for a solution for a few hours, but this under-usage of the CPU remains quite mysterious. Any idea will be appreciated.
For the stack, I am on Ubuntu 20.04 and Postgres 13.6 is running in a Docker container. I have decent hardware, neither bad nor great.
EDIT: This very same command used to work on my machine with the same ordinary HDD, but now it is terribly slow. The only difference I saw compared to the others (for whom it is blazing fast) was really the CPU usage, from my point of view (even though they have SSDs, which shouldn't be the limiting factor at all, especially with a 30 MB dump).
EDIT 2: For those who suggested the problem was I/O-boundness and maybe a slow disk, I just tried, without much conviction, to run my command on an SSD partition I had just made, and nothing changed.
The vmstat output shows that you are I/O bound. Get faster storage, and performance will improve.
PostgreSQL, by default, is tuned for data durability. Usually transactions are flushed to disk at each and every commit, forcing write-through of any disk write cache, so a restore made up of many small transactions is very I/O-bound.
When restoring a database from a dump file, it can make sense to lower these durability settings, especially if the restore is done while your application is offline, and particularly in non-production environments.
I temporarily run postgres with these options: -c fsync=off -c synchronous_commit=off -c full_page_writes=off -c checkpoint_flush_after=256 -c autovacuum=off -c max_wal_senders=0
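Since the question mentions Postgres 13.6 in a Docker container, this is roughly what that could look like (a sketch assuming the official postgres image, which passes extra arguments straight to the postgres server process; the container name and password are made up):

docker run --name pg-fast-restore -e POSTGRES_PASSWORD=secret -d postgres:13.6 \
  -c fsync=off -c synchronous_commit=off -c full_page_writes=off \
  -c checkpoint_flush_after=256 -c autovacuum=off -c max_wal_senders=0

These settings trade crash safety for speed, so they only belong on a database you can afford to rebuild.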
Refer to these documentation sections for more information:
14.4.9. Some Notes about pg_dump
14.5. Non-Durable Settings.
Also this article:
Settings for a fast pg_restore

Do backups using pg_dump cause server outage if the database is too busy?

I have a Postgres database in a production environment with millions of records in its tables, and I want to take a backup using pg_dump for some investigation.
But this database is very busy, so I am afraid the backup operation might cause server issues such as slowing down the server or crashing the database.
Can anyone share whether there is any risk? And please give some idea of best practice for taking a backup from Postgres with no risk.
Running pg_dump will not cause a server crash, but it will add some extra CPU and particularly I/O load. You can test whether that is a problem; pg_dump can be canceled at any time.
On a busy database, it can also lead to table bloat, because old row versions have to be retained for the duration of pg_dump and cannot be vacuumed.
There are some alternatives:
Run pg_dump against a standby server.
Use pg_basebackup to perform a physical backup. That can be throttled to reduce the I/O load.
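As a rough illustration of the throttled physical backup (host, user, target directory and rate are placeholders, not values from the question):

pg_basebackup -h db.example.com -U backup_user -D /backups/base \
  -Ft -z -X stream --max-rate=32M --progress

--max-rate caps the transfer rate, which keeps the extra I/O load on the production database predictable.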

How to limit pg_dump's memory usage?

I have a ~140 GB Postgres DB on Heroku / AWS. I want to create a dump of it on an Azure Windows Server 2012 R2 virtual machine, as I need to move the DB into the Azure environment.
The DB has a couple of smaller tables, but mainly consists of a single table taking ~130 GB, including indexes. It has ~500 million rows.
I've tried to use pg_dump for this, with:
./pg_dump -Fc --no-acl --no-owner --host * --port 5432 -U * -d * > F:/051418.dump
I've tried various Azure virtual machine sizes, including some fairly large ones (D12_v2: 28 GB RAM, 4 vCPUs, 12000 max IOPS, etc.), but in all cases pg_dump stalls completely due to memory swapping.
On the above machine it is currently using all available memory and has spent the past 12 hours swapping memory to disk. I don't expect it to complete, because of the swapping.
From other posts I've understood it could be an issue with the network speed being much faster than the disk I/O speed, causing pg_dump to suck up all available memory and more, so I've tried using the Azure machine with the most IOPS. This hasn't helped.
So is there a way I can force pg_dump to cap its memory usage, or to wait before pulling more data until it has written to disk and freed memory?
Looking forward to your help!
Krgds.
Christian
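This does not address the swapping itself, but one small variation that is sometimes suggested is to let pg_dump write the output file itself with -f instead of using shell redirection, since in Windows PowerShell the > operator re-encodes the stream and can corrupt a custom-format dump (the connection details and path below are placeholders):

pg_dump -Fc --no-acl --no-owner --host myhost --port 5432 -U myuser -d mydb -f F:/051418.dump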

Migrating 200GB of Postgres data from 9.0 to 9.6

We have a simple database with just 5 tables, but one table is huge, around 100 GB of data by itself, and its indexes together are nearly double that size. The server is an old CentOS 5 server with PG 9.0. I'm moving to a more modern setup with SSDs, CentOS 7, and PG 9.6.
Question: what's the best way to migrate the data in a simple way? pg_dump it on the old server, move it via rsync or something to the new server, and pg_restore? I could do the pg_dump with the -Fc option so that we can pg_restore it easily (otherwise it's a text format and we'd have to use psql -f instead). But a trial run suggested that while the pg_dump is OK, the pg_restore on the destination server, which is much faster hardware, goes on and on. We did a pg_restore --verbose, but there was no verbosity at all. Perhaps the server was stuck doing I/O?
Our postgresql.conf settings for the pg_restore are as follows:
maintenance_work_mem = 1500MB
fsync = off
synchronous_commit = off
wal_level = minimal
full_page_writes = off
wal_buffers = 64MB
max_wal_senders = 0
wal_keep_segments = 0
archive_mode = off
autovacuum = off
What should we do to ensure that the pg_restore works? Right now both servers are offline, so I can do pretty much anything needed -- any settings can be changed.
Some more background info--
Old server: CentOS 5, SCSI RAID 1 disks, 4GB RAM (not much), PG 9.0
New server: CentOS 7 (latest), SSD disk, 16GB RAM, PG 9.6
Thank you for any pointers on moving large tables in the best way possible. The usual PG documentation doesn't seem to be helping. We've tried both the text dump way and the -Fc way.
I strongly suggest you use pg_upgrade:
Install 9.0.23 on the new server. From source if necessary.
Set up a streaming replica on the new server with a suitable recovery.conf (note that pg_basebackup requires 9.1 or later, so with 9.0 you take the base backup with pg_start_backup() plus rsync). Enable WAL archiving and a restore_command too, in case it becomes desynchronised for any reason.
Also install 9.6 on the new server
Do an upgrade test by stopping the replica and attempting a pg_upgrade to 9.6. Restart the replica, fix any issues and repeat until you succeed.
When you're confident pg_upgrade will succeed, plan a cut-over time. Stop the 9.0 master and stop the replica. pg_upgrade the replica. Start the new 9.6 server.
See the pg_upgrade documentation for more info.
Remember: KEEP BACKUPS.
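As a rough sketch of the test and cut-over upgrade on the new server (binary and data directory paths are assumptions, not taken from the question; both clusters must be stopped first):

# Dry run: --check reports incompatibilities without changing anything.
/usr/pgsql-9.6/bin/pg_upgrade \
  -b /usr/pgsql-9.0/bin -B /usr/pgsql-9.6/bin \
  -d /var/lib/pgsql/9.0/data -D /var/lib/pgsql/9.6/data \
  --check

# Real upgrade: --link hard-links data files instead of copying ~200GB.
/usr/pgsql-9.6/bin/pg_upgrade \
  -b /usr/pgsql-9.0/bin -B /usr/pgsql-9.6/bin \
  -d /var/lib/pgsql/9.0/data -D /var/lib/pgsql/9.6/data \
  --link

Once the new cluster has been started after a --link upgrade, the old cluster can no longer be used, which is one more reason to keep those backups.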
If you want simple, just pg_dumpall and then pipe to psql. But that'll be slow, and it'll cause problems if your restore fails partway through and you then try to resume, etc.
Better:
If you don't want to use replication, then use parallel-mode pg_dump and pg_restore with directory format input/output if you want to get things done quickly.
Configure your 9.0 database to accept connections from the 9.6 host and make sure there's a high-performance network connection (gigabit or better).
Using the 9.6 host, running the 9.6 versions of pg_dump and pg_dumpall:
Dump your global objects with pg_dumpall --globals-only -f globals.sql
Dump your database(s) with pg_dump -Fd -j4 -d dbname -f dbname.dumpdir or similar. -j is the number of parallel jobs. You'll need to dump each database separately if there are multiple ones.
Cleanly initdb a new PostgreSQL 9.6 install, removing whatever attempts you have previously made (since I don't know what is or isn't there). Alternatively, DROP any created roles, databases, etc., returning it to a clean state.
Use psql to run the globals script: psql -v ON_ERROR_STOP=1 --single-transaction -f globals.sql -d postgres
Use pg_restore to load the database dumps: pg_restore --create -d template1 -j4 dbname.dumpdir, repeating for each dumped DB. You can restore multiple DBs concurrently.
Yes, I know the handling of global objects sucks. And yes, it'd be nice if all this were wrapped up in a simple command. But it isn't. Designs and well-thought-out patches are welcome if you want to try to improve this. So far nobody has wanted it enough to do the work.
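Pulling those steps together, a minimal end-to-end sketch run from the 9.6 host (host name, database name and job count are assumptions):

# Dump globals and the database from the old 9.0 server using the 9.6 client tools.
# Against a pre-9.2 server, parallel dump also needs --no-synchronized-snapshots,
# which is safe here because the old cluster sees no writes during the migration.
pg_dumpall -h old-server --globals-only -f globals.sql
pg_dump -h old-server -Fd -j4 --no-synchronized-snapshots -d dbname -f dbname.dumpdir

# Restore into the freshly initdb'd 9.6 cluster.
psql -v ON_ERROR_STOP=1 --single-transaction -f globals.sql -d postgres
pg_restore --create -d template1 -j4 dbname.dumpdir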