Postgres determine size of all blobs - postgresql

Hi, I'm trying to find the total size of all blobs. I have always used this:
SELECT sum(pg_column_size(pg_largeobject)) lob_size FROM pg_largeobject
but now that my database has grown to ~40 GB, this takes several hours and puts too much load on the CPU.
Is there a more efficient way?

Some of the functions mentioned in Database Object Management Functions give an immediate result for the entire table.
I'd suggest pg_table_size(regclass), which is defined as:
Disk space used by the specified table, excluding indexes (but including TOAST, free space map, and visibility map)
It differs from sum(pg_column_size(tablename)) FROM tablename because it counts entire pages, so that includes the padding between the rows, the dead rows (updated or deleted and not reused), and the row headers.
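For example, a minimal sketch using the same system table as above (pg_size_pretty is only there for readable output; any regclass argument works):

-- total on-disk size of the large object table, including its TOAST data
SELECT pg_size_pretty(pg_table_size('pg_largeobject'));
-- the same, but also counting indexes
SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));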

Related

UPDATE of a single column taking a lot of time and a lot of disk space in PostgreSQL

I executed an UPDATE query on a single column of a table with 10 million records. It is taking a lot of time and a lot of disk space in PostgreSQL. Please find the query below.
"update test set record_status = 'I';"
The column's data type is "char(1)".
Please suggest why it is taking so much time and so much disk space.
Thanks,
Vittal
In PostgreSQL, an UPDATE creates a new version of the row (“tuple”) (see for example this part of the documentation).
So by updating every row in the table, you effectively double the size of the table. The old row versions get removed by autovacuum, but the size of the table stays the same (it becomes “bloated”).
The empty space can be reused by future INSERTs, but if you want to get rid of it, use VACUUM (FULL).
The long duration is also explained by the many new tuples. Increasing max_wal_size may help somewhat by reducing the number of checkpoints caused by the statement.
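A minimal sketch of raising max_wal_size without a restart (the 4GB value is only an illustrative assumption; pick a value that fits your WAL volume and disk):

-- takes effect after a configuration reload, no restart required
ALTER SYSTEM SET max_wal_size = '4GB';
SELECT pg_reload_conf();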

Space required for VACUUM FULL table

From the PostgreSQL 10.4 manual regarding a full vacuum:
Note that they also temporarily use extra disk space approximately equal to the size of the table, since the old copies of the table and indexes can't be released until the new ones are complete.
I've read this in many different places, phrased in a variety of ways. Some indicate that the space required is at most equal to the size of the vacuumed table, hinting that it may only require enough space to store the resulting vacuumed table, i.e. a size in the range [0, size_of_original_table], depending on how many dead rows are in the table.
My question is: Will doing a full vacuum of a table always require space equal to the original table size, or does it depend on the number of live rows in the table?
The additional space required by VACUUM (FULL) depends on the number of live rows in the table.
What happens during VACUUM (FULL) is that a new copy of the table is written.
All live tuples (= row versions) and the dead tuples that cannot be removed yet will be written to this new copy.
When the transaction finishes, the old copy will be removed.
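To get a rough idea of how much of the table is live, and therefore roughly how large the rewritten copy will be, you can look at the statistics collector's counters; a sketch, assuming the table is called test:

-- estimated live vs. dead tuples for the table
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'test';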
It is recommended to have free space at least equal to the size of the largest table in the database.
I.e., if your database is 10 GB and the largest table in it is 2 GB, then you must have at least 2 GB of extra space on your disk in order to complete the vacuum successfully.
That is because VACUUM FULL creates a new copy of the table, excluding the dead rows, and then removes the existing one.
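To see which tables are the largest, and therefore how much headroom a VACUUM FULL might need, something like this works (a sketch using the standard size functions):

-- largest tables by total on-disk size (including indexes and TOAST)
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 5;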

Postgres: Do we always need free space of at least 3-4 times the biggest table?

We are using Postgres to store ~2,000,000,000 samples. This ends up in tables with ~500 million entries and ~100 GB per table.
What I want to do:
E.g. update the table entries: UPDATE table SET flag = true;
After this, the table is twice as big, i.e. 200 GB.
To get the space back (the data is stored on an SSD) we run: VACUUM FULL table
Unfortunately, this step again needs a lot of space, which makes the VACUUM fail because too little space is left.
My Questions:
Does this mean that, in order to run this UPDATE query only once and get the space back for other tables in this DB, we need at least 300-400 GB of space for a 100 GB table?
In your scenario, you won't get away without having at least twice as much space as the table data would require.
The cheapest solution is probably to define the table with a fillfactor of 50, so that half of each block is left empty, thereby doubling the table size. Then the updated rows can all be placed in the same block as the original rows, and the UPDATE won't increase the table size, because PostgreSQL can use the heap-only tuple (HOT) update feature. The old versions are freed immediately if there are no long-running transactions that can still see them.
NOTE: This will only work if the column you are updating is not indexed.
The downside of this approach is that the table is always twice the necessary size, and all sequential scans will take twice as long. It won't bother you if you don't use sequential scans of the table.
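A sketch of how that could look (the table name samples is a placeholder; existing pages only honor the new fillfactor after a rewrite):

-- leave half of each page empty so future updates can be HOT
ALTER TABLE samples SET (fillfactor = 50);
-- rewrite the table so existing rows are stored with the new fillfactor
VACUUM (FULL) samples;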

Heroku Postgres database size doesn't go down after deleting rows

I'm using a Dev-level database on Heroku that was about 63 GB and approaching about 9.9 million rows (close to the 10 million row limit for this tier). I ran a script that deleted about 5 million rows I didn't need, and now (a few days later) the Postgres control panel / pginfo:table-size shows roughly 4.7 million rows, but the database is still at 63 GB. 64 GB is the limit for the next tier, so I need to reduce the size.
I've tried vacuuming, but pginfo:bloat said the bloat was only about 3 GB. Any idea what's happening here?
If you have VACUUMed the table, don't worry about the size on disk still remaining unchanged. The space has been marked as reusable by new rows, so you can easily add another 4.7 million rows and the size on disk won't grow.
The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.
If you want to shrink it on disk, you will need to run VACUUM FULL, which locks the table and needs extra disk space roughly equal to the size of the table while the operation is in progress. So check your quota before you try this, and expect your site to be unresponsive in the meantime.
Update:
You can get a rough idea about the size of your data on disk by querying the pg_class table like this:
SELECT sum(relpages * 8192) FROM pg_class
Another method is a query of this nature:
SELECT pg_database_size('yourdbname');
This link: https://www.postgresql.org/docs/9.5/static/disk-usage.html provides additional information on disk usage.
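For human-readable output, both approaches can be wrapped in pg_size_pretty (a small sketch; the 8192 factor assumes the default 8 kB block size):

-- rough total of all relations, assuming 8 kB pages
SELECT pg_size_pretty(sum(relpages) * 8192) FROM pg_class;
-- size of the current database
SELECT pg_size_pretty(pg_database_size(current_database()));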

What is the effect on record size of reordering columns in PostgreSQL?

Since Postgres can only add columns at the end of tables, I end up re-ordering by adding new columns at the end of the table, setting them equal to existing columns, and then dropping the original columns.
So, what does PostgreSQL do with the memory that's freed by dropped columns? Does it automatically re-use the memory, so a single record consumes the same amount of space as it did before? But that would require a re-write of the whole table, so to avoid that, does it just keep a bunch of blank space around in each record?
The question is old, but since both answers are wrong or misleading, I'll add another one.
When updating a row, Postgres writes a new row version and the old one is eventually removed by VACUUM after no running transaction can see it any more.
Plain VACUUM does not return disk space from the physical file that contains the table to the system, unless it finds completely dead or empty blocks at the physical end of the table. You need to run VACUUM FULL or CLUSTER to aggressively compact the table and return excess space to the system. This is not typically desirable in normal operation. Postgres can re-use dead tuples to keep new row versions on the same data page, which benefits performance.
In your case, since you update every row, the size of the table is doubled (from its minimum size). It's advisable to run VACUUM FULL or CLUSTER to return the bloat to the system.
Both take an exclusive lock on the table. If that interferes with concurrent access, consider pg_repack, which can do the same without exclusive locks.
To clarify: Running CLUSTER reclaims the space completely. No VACUUM FULL is needed after CLUSTER (and vice versa).
More details:
PostgreSQL 9.2: vacuum returning disk space to operating system
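A sketch of the two rewrite options mentioned above (mytable and the index mytable_pkey are placeholders):

-- rewrite the table compactly; takes an exclusive lock for the duration
VACUUM FULL mytable;
-- or: rewrite and physically reorder the table by an index at the same time
CLUSTER mytable USING mytable_pkey;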
From the docs:
The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.
You'll need to do a CLUSTER followed by a VACUUM FULL to reclaim the space.
Why do you "reorder"? There is no column order in SQL; it doesn't make sense. If you need a fixed order, tell your queries what order you need, or use a view; that's what views are made for.
Disk space will be reused after a vacuum; autovacuum will do the job, unless you have disabled that process.
Your current approach will kill overall performance (table locks), indexes have to be recreated, statistics go down the toilet, etc. And in the end, you end up in the same situation you already had. So why the effort?