We have a PostgreSQL server (v13) with a lot of data in it.
The database contains documents.
The total database is around 1.5 TB. Today, someone called to tell me the disk was almost full. They added 1 TB of extra storage some time ago, but that extra storage filled up extremely quickly, which is very abnormal. The disk was 2 TB and is now 3 TB with the extra storage.
If I look at the table containing the documents, it has only grown by around 10 GB since 20/07/2022, so I really don't understand why the disk is filling up this fast. If I run this query on the database:
SELECT pg_size_pretty( pg_total_relation_size('documents') );
It returns '2.7 TB', which seems impossible, since not that many documents were added recently.
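Note that pg_total_relation_size counts the table's indexes and its TOAST data (where large values are stored out of line) on top of the heap itself, so a breakdown like this can show where the space actually went (a sketch; 'documents' is the table from above):
SELECT pg_size_pretty(pg_relation_size('documents')) AS heap,
       pg_size_pretty(pg_indexes_size('documents')) AS indexes,
       pg_size_pretty(pg_total_relation_size('documents')
                      - pg_relation_size('documents')
                      - pg_indexes_size('documents')) AS toast_and_rest;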
I ran a VACUUM as a test on one table (about 20 GB in total). The vacuum failed with this error:
ERROR: wrong tuple length
What does this error mean? I see the same errors in the PostgreSQL log files. A new antivirus system was recently installed on the server; I already asked for exclusions, but that didn't seem to solve the problem.
I now have only about 130 GB of free disk space left, and it keeps shrinking.
Is it possible that the vacuum takes disk space and, because of the error, never returns it to Windows?
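To see how much dead space vacuum has flagged per table, a query against the standard statistics view pg_stat_user_tables can rank tables by dead tuples (a sketch):
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;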
Any help is appreciated. I'm not a database expert, but I really need to solve this.
Related
I've got a postgres database which I recently vacuumed. I understand that process marks space as available for future use, but for the most part does not return it to the OS.
I need to track how close I am to using up that available "slack space" so I can ensure the entire database does not start to grow again.
Is there a way to see how much empty space the database has inside it?
I'd prefer to just do a VACUUM FULL and monitor disk consumption, but I can't lock the table for a prolonged period, nor do I have the disk space.
Running version 13 on headless Ubuntu if that's important.
Just like internal free space is not given back to the OS, it also isn't shared between tables or other relations (like indexes). So having free space in one table isn't going to help if a different table is the one growing. You can use pg_freespacemap to get a fast approximate answer for each table, or pgstattuple for more detailed data.
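For example (a sketch; 'mytable' is a placeholder, and both modules ship with PostgreSQL as contrib extensions):
-- fast, approximate: sum the free space map
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
SELECT pg_size_pretty(SUM(avail)::bigint) AS approx_free
FROM pg_freespace('mytable');
-- slower but exact: scans the whole table
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT pg_size_pretty(free_space) AS free, dead_tuple_percent, free_percent
FROM pgstattuple('mytable');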
Running on RHEL 7
PostgreSQL Version 12
The system has 28 GB of memory and 12 GB of shared memory
The DB uses over 6 TB on disk
Some tables have around 300 million rows.
I moved my DB from version 9 to version 12 and am running tests on the new DB. We have a process that generates summary data in a temporary table, queries that temporary table for different things, and then drops it; this was done because it is much faster than running very similar queries multiple times.
The query is similar to this:
CREATE TEMPORARY TABLE XXX AS
SELECT
    COUNT(t.id) AS count,
    t.tagged AS tagged,
    t.tag_state AS tag_state,
    t.error AS error,
    td.duplicate AS duplicate
FROM ttt t
INNER JOIN tweet_data td ON (td.tweet_id = t.id)
GROUP BY
    t.tagged,
    t.tag_state,
    t.error,
    td.duplicate;
Note that this works fine on V9, but I have not watched it very carefully on V9 to see what it does. On V12, shared memory usage grows slowly, then after about 15 minutes it kicks into high gear, grows to about 12 GB, and then tries to grow even bigger and fails:
The error is:
ERROR: could not resize shared memory segment "/PostgreSQL.868719775" to 2147483648 bytes: No space left on device
On a whim, we ran just the SELECT statement without creating the temporary table; it also failed while shared memory was increasing, but the error message said it was killed by the administrator.
I am currently running vacuum against the DB to see if that helps.
The biggest concern is that this does work on V9 but fails on V12. I also know that the query engine in V12 is very different and largely new compared to V9.
I had some crazy hope that running vacuum in stages would make a difference. The data was migrated using pg_upgrade.
vacuumdb -U postgres -p 5431 --all --analyze-in-stages
I don't know whether the temporary table gets created or not, but after running vacuum we ran the full query (creating the temp table) again, and it also failed.
Any thoughts? Is my only choice to try more shared memory?
These shared memory segments are used for communication between worker processes during parallel query execution.
PostgreSQL seems to be tight on resources, and while the error is a symptom rather than the cause of the problem, you can improve the situation by disabling parallel query for this statement:
SET max_parallel_workers_per_gather = 0;
Then your query will take more time, but use less resources, which might be enough to get rid of the problem.
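If you would rather not change the setting for the rest of the session, SET LOCAL scopes it to a single transaction (a sketch around the statement from the question):
BEGIN;
SET LOCAL max_parallel_workers_per_gather = 0;
-- run the CREATE TEMPORARY TABLE ... AS SELECT from above here
COMMIT;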
In the long run, you should review your configuration, which might be too generous with memory or the number of connections, but I cannot diagnose that from here.
At first I thought the restore was too big, so instead of a single 2 GB (compressed) DB backup I split it into several backups, one per schema. This schema's dump is 600 MB. The next step would be to split it by table.
This one has some spatial data from my country's map; I'm not sure if that is relevant.
As you can see, it has been almost 2 hours. The disks aren't really in use anymore. When the restore started, disk usage reached 100% several times, but the last hour has been flat at 0%.
And as you can see here, I can access the data in all the restored tables, so it looks like it is already done.
Is this normal?
Is there anything I can check to see what the restore is doing?
Hardware Setup:
Core i7 @ 3.4 GHz - 24 GB RAM
DB on a 250 GB SSD; backup files on a SATA disk
EDIT
SELECT application_name, query, *
FROM pg_stat_activity
ORDER BY application_name, query;
Yes, that seems perfectly normal.
Most likely you observe index or constraint creation. Look at the output of
SELECT * FROM pg_stat_activity;
to confirm that (it should contain CREATE INDEX or ALTER TABLE).
It is too late now, but increasing maintenance_work_mem will speed up index creation.
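For a future restore, the setting can be raised cluster-wide just for the duration (a sketch; '2GB' is an example value, not a recommendation):
ALTER SYSTEM SET maintenance_work_mem = '2GB';
SELECT pg_reload_conf();
-- run the restore, then put the setting back:
ALTER SYSTEM RESET maintenance_work_mem;
SELECT pg_reload_conf();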
I'm using a dev-level database on Heroku that was about 63 GB and approaching 9.9 million rows (close to the 10 million limit for this tier). I ran a script that deleted about 5 million rows I didn't need, and now (a few days later) the Postgres control panel / pginfo:table-size shows roughly 4.7 million rows, but it's still at 63 GB. 64 GB is the limit for the next tier, so I need to reduce the size.
I've tried vacuuming, but pginfo:bloat said the bloat was only about 3 GB. Any idea what's happening here?
If you have VACUUMed the table, don't worry about the size on disk remaining unchanged. The space has been marked as reusable for new rows, so you can easily add another 4.7 million rows and the size on disk won't grow.
The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.
If you want to shrink it on disk, you will need VACUUM FULL, which locks the table and needs extra disk space roughly equal to the table's size while the operation is in progress. So you will have to check your quota before you try this, and your site will be unresponsive in the meantime.
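If you do go ahead, it is a single statement (a sketch; 'mytable' is a placeholder, and VERBOSE just prints a progress report):
VACUUM (FULL, VERBOSE) mytable;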
Update:
You can get a rough idea about the size of your data on disk by querying the pg_class table like this:
SELECT SUM(relpages * 8192) FROM pg_class;
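The 8192 here assumes the default 8 kB block size; a variant that reads the server's actual block size (a sketch):
SELECT pg_size_pretty(SUM(relpages)::bigint * current_setting('block_size')::bigint)
FROM pg_class;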
Another method is a query of this nature:
SELECT pg_database_size('yourdbname');
This link: https://www.postgresql.org/docs/9.5/static/disk-usage.html provides additional information on disk usage.
From the start: I have a collection with about 51 million records. I indexed one of the fields, and the percent progress started climbing above 100% (it reached 800% complete), so I cancelled it and figured I must have some DB corruption.
So I validated the collections and found they were OK. Nonetheless, I tried a compact and a repair, and watched the temp folder (for the repair) or my db folder (for the compact).
I used to have 'collection.0' through 'collection.14', but checking after 14 hours I found it had counted up to 'collection.64', and I had to cancel it. From my previous experience, this is highly unusual behaviour.
Previously the database size was 20 GB, and during this compact it increased to well over 100 GB.
What could be wrong and how would I fix my database?
From the console, each new file logs:
Allocating new datafile ... Size: 2047MB, took 107 seconds
(for each of the additional files, collection.15 through collection.64)