I am facing a very strange issue on my server, my configuration is very straight-forward:
Small VPS, 500 MiB RAM, 40 GiB disk
Debian stable at install time, now probably old_stable
PostgreSQL v11.11
The data is very small, the use of a database for my purpose is probably overkill, but handy:
7 tables
7 views, including one of them which is a little bit scary
The biggest table have a few hundred records
The full dump of the database gives me a file of 93 KiB
Everything was very fast for 1.5 year. Yesterday, the database suddenly became very slow. My investigations showed that the size of the data on the disk was 34 GiB and I had no disk space available anymore.
After more investigations, I tried the command "vacuum full", which deleted the useless 34 GiB. The disk space changed from 100% usage to 10% usage and the performances came back immediately. One day later, the system is slow again, I saw the disk usage is now around 50%.
I have no clue about what is going on, any suggestion?
I'd recommend reading Optimize and Improve PostgreSQL Performance with VACUUM, ANALYZE, and REINDEX and Routine Vacuuming. Here's some relevant bits.
In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table
You must have done a lot of deletes and updates, so Postgres consumed a lot of disk space. vacuum recovers this space. vacuum full isn't normally necessary and will lock your database.
Normally there is an autovacuum daemon running which will vacuum periodically. It probably isn't running. Check with show autovacuum and show track_counts. Both need to be true for autovacuum to run.
You can see what is "bloating" your database with the check_postgres tool.
Related
We have a large table (1.6T) and deleted 60% of the records, and want to reclaim that space for the OS and file system. We're running PostgreSQL 9.4 (we're stuck on that pending a major software upgrade).
We need that space, as we're down to 100GB and when materialized views are refreshed we're running out of space on the server.
I tried running VACUUM(FULL, ANALYZE, VERBOSE) schema.tablename and let it run for 24 hours last weekend, but had to cancel it to get the server back online.
I'm running it again this weekend, after deleting the indexes (I'm hoping that will speed it up so it will finish). So far there is no output or indication of progress. I created a tablespace on another SSD array and set it up as temp space using temp_tablespaces = 'name_of_other_tablespaces', but du -chs shows it is still empty.
The query shows active, but since disk usage isn't increasing it just feels like it's just sitting there, making no noise and pretending it's not there.
This is on a server with 512GB of RAM and a RAID 10 array of very fast enterprise SSDs. Is there any way to get progress and know that something is actually happening and that it's working? Any guesses as to duration, or other suggestions?
I found out what was happening, by finally noticing that it was waiting for an autovacuum process to finish, which never happened (autovacuum: VACUUM pg_toast.pg_toast_nnnnn (to prevent wraparound)). Once I killed that the VACUUM ran quite quickly and cleared up over 1TB of space. Time to celebrate!
Since April 1st, the size of my DB storage space grows by 32GB a day. It's very unusual, and based on the 500GB disk, this will not last for much longer.
Why is the DB growing by 32GB a day?
For context, I've allocated a 500GB disk; binary logs are enabled; automated backups are enabled.
I tested further. The reason for the DB growing so dramatically every night is due to the binary logs. Every night Magento indexes run, and produce 32GB of binary logging data. Not all Magento stores will be the same, but large Magento stores beware.
The solution, temporarily at least, is to disable binary logging. Have a look at the image to see the reclaimed disk space after disabling the option.
This will make it a challenge when setting up read/failover replicas. It would be nice if the MySQL instance is configured to purge/prune binary logs after a set amount of time has passed, or at least once operations have been copied to slave instances. Maybe it does, but I haven't investigated. Given current time constraints, I was not going to wait until the purge/prune happened, if it even would.
Could it be your DB log that is growing at a rapid pace?
I have had this issue in the past and ended up creating a job for the SQL agent that runs once a week and purges the log.
I created a database containing a total of 3 tables for a specific purpose. The total size of all tables is about 850 MB - very lean... out of which one single table contains about 800 MB (including index) of data and 5 million records (daily addition of about 6000 records).
The system is PG-Windows with 8 GB RAM Windows 7 laptop with SSD.
I allocated 2048MB as shared_buffers, 256MB as temp_buffers and 128MB as work_mem.
I execute a single query multiple times against the single table - hoping that the table stays in RAM (hence the above parameters).
But, although I see a spike in memory usage during execution (by about 200 MB), I do not see memory consumption remaining at at least 500 MB (for the data to stay in memory). All postgres exe running show 2-6 MB size in task manager. Hence, I suspect the LRU does not keep the data in memory.
Average query execution time is about 2 seconds (very simple single table query)... but I need to get it down to about 10-20 ms or even lesser if possible, purely because there are just too many times, the same is going to be executed and can be achieved only by keeping stuff in memory.
Any advice?
Regards,
Kapil
You should not expect postgres processes to show large memory use, even if the whole database is cached in RAM.
That is because PostgreSQL relies on buffered reads from the operating system buffer cache. In simplified terms, when PostgreSQL does a read(), the OS looks to see whether the requested blocks are cached in the "free" RAM that it uses for disk cache. If the block is in cache, the OS returns it almost instantly. If the block is not in cache the OS reads it from disk, adds it to the disk cache, and returns the block. Subsequent reads will fetch it from the cache unless it's displaced from the cache by other blocks.
That means that if you have enough free memory to fit the whole database in "free" operating system memory, you won't tend to hit the disk for reads.
Depending on the OS, behaviour for disk writes may differ. Linux will write-back cache "dirty" buffers, and will still return blocks from cache even if they've been written to. It'll write these back to the disk lazily unless forced to write them immediately by an fsync() as Pg uses at COMMIT time. When it does that it marks the cached blocks clean, but doesn't flush them. I don't know how Windows behaves here.
The point is that PostgreSQL can be running entirely out of RAM with a 1GB database, even though no PostgreSQL process seems to be using much RAM. Having shared_buffers too high just leads to double-caching and can reduce the amount of RAM available for the OS to cache blocks.
It isn't easy to see exactly what's cached in RAM because Pg relies on the OS cache. That's why I referred you to pg_fincore.
If you're on Windows and this won't work, you really just have to rely on observing disk activity. Does performance monitor show lots of uncached disk reads? Does operating system memory monitoring show lots of memory used for disk cache in the OS?
Make sure that effective_cache_size correctly reflects the RAM used for disk cache. It will help PostgreSQL choose appropriate query plans.
You are making the assumption, without apparent evidence, that the query performance you are experiencing is explained by disk read delays, and that it can be improved by in-memory caching. This may not be the case at all. You need to look at explain analyze output and system performance metrics to see what's going on.
I am building a large postgres 9.1 database on ubuntu 12.04, with one table that holds about 80 million rows or so. Whenever I run a SELECT statement:
SELECT * FROM db WHERE ID=1;
It takes almost 2.5 minutes to execute the query which returns only a few thousand rows. After running a few diagnostics on the disk I/O, I think that is not the problem, but just in case below is the output from a diagnostic. (I have 2GB of RAM) I am not exactly sure what a good output is here, but it seems ballpark given stats found for other servers on the internet.
time sh -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
500000+0 records in
500000+0 records out
4096000000 bytes (4.1 GB) copied, 106.969 s, 38.3 MB/s
real 1m49.091s
user 0m0.248s
sys 0m9.369s
I have modified postgresql.conf considerably, boosting the effective_cache to 75% of ram, shared_buffers to 25%, checkpoint_segments to 15, work_mem to 256MB, autovacuum, SHMMAX on the kernel, etc. I have had some performance increases, but not more than 5% better. Networking shouldnt be an issue since it still takes a long time even running on localhost. I am planning to add even more data, and the query time seems to be growing quickly with the number of rows.
It seems like I should be able to run these SELECT statements in a few seconds, not a few minutes. Any suggestions on where this bottleneck could be?
sorry if this is inexcusably obvious, but do you have an index on the ID column?
also, though I'm not blaming the disk, you merely tested sequential bandwidth, which tells you very little about latency. though I have to say that 38 MB/s is underwhelming even for that measure...
I'm nearly out of disk space because of a query that tried to update every row in a huge table. I don't have enough space for CLUSTER (though it would barely fit if I dropped indexes first and recreated them afterwards).
How can I estimate how long VACUUM will take? How about VACUUM FULL? How do the three (with CLUSTER) compare in terms of running time and disk usage?
It's PostgreSQL 8.3.
use cluster, until 8.4 vacuum full is broke. if it takes to long you might as well dump and reload the table.