According to the docs, after deleting rows in PostgreSQL, they remain in a dead state, hence the need to periodically vacuum to reclaim this space. Does this also apply to row width when removing columns from tables, or is that space forever allocated?
The answer is yes - the space is reclaimed.
From the documentation about VACUUM (bold emphasis mine):
it writes a new copy of the table and doesn't release the old copy until the operation is complete
And from the Notes section of the ALTER TABLE page:
To force immediate reclamation of space occupied by a dropped column, you can execute one of the forms of ALTER TABLE that performs a rewrite of the whole table. This results in reconstructing each row with the dropped column replaced by a null value.
and one more:
depending on the parameter you might need to rewrite the table to get the desired effects. That can be done with VACUUM FULL, CLUSTER or one of the forms of ALTER TABLE that forces a table rewrite.
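For example, either of the following rewrites a hypothetical table mytable and thereby reclaims the space of dropped columns (the table and index names are only placeholders):

VACUUM FULL mytable;
-- or cluster the table on one of its existing indexes (index name assumed):
CLUSTER mytable USING mytable_pkey;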
We use PostgreSQL for analytics. Three typical operations we do on tables are:
Create table as select
Create table followed by insert in table
Drop table
We are not doing any UPDATE, DELETE etc.
For this situation, can we assume that the estimate would just be accurate?
SELECT reltuples AS estimate FROM pg_class WHERE relname = 'mytable';
With autovacuum running (which is the default), ANALYZE and VACUUM are fired up automatically, and both update reltuples. The basic configuration parameters for ANALYZE (which typically runs more often) are, quoting the manual:
autovacuum_analyze_threshold (integer)
Specifies the minimum number of inserted, updated or deleted tuples
needed to trigger an ANALYZE in any one table. The default is 50
tuples. This parameter can only be set in the postgresql.conf file
or on the server command line; but the setting can be overridden for
individual tables by changing table storage parameters.
autovacuum_analyze_scale_factor (floating point)
Specifies a fraction of the table size to add to
autovacuum_analyze_threshold when deciding whether to trigger an
ANALYZE. The default is 0.1 (10% of table size). This parameter can
only be set in the postgresql.conf file or on the server command
line; but the setting can be overridden for individual tables by
changing table storage parameters.
Another quote gives insight to details:
For efficiency reasons, reltuples and relpages are not updated
on-the-fly, and so they usually contain somewhat out-of-date values.
They are updated by VACUUM, ANALYZE, and a few DDL commands such
as CREATE INDEX. A VACUUM or ANALYZE operation that does not
scan the entire table (which is commonly the case) will incrementally
update the reltuples count on the basis of the part of the table it
did scan, resulting in an approximate value. In any case, the planner
will scale the values it finds in pg_class to match the current
physical table size, thus obtaining a closer approximation.
Estimates are kept up to date accordingly. You can change autovacuum settings to be more aggressive. You can even do this per table. See:
Aggressive Autovacuum on PostgreSQL
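For example, a per-table override of the ANALYZE settings might look like this (the table name and values are only placeholders, not recommendations):

ALTER TABLE mytable SET (
    autovacuum_analyze_scale_factor = 0.02,  -- analyze after ~2 % of rows change
    autovacuum_analyze_threshold    = 100
);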
On top of that, you can scale estimates the same way Postgres itself does. See:
Fast way to discover the row count of a table in PostgreSQL
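A minimal sketch of that scaling, assuming a table named mytable (a placeholder) that has been analyzed or vacuumed at least once:

SELECT (reltuples / relpages) *
       (pg_relation_size(oid) / current_setting('block_size')::int) AS estimate
FROM   pg_class
WHERE  oid = 'mytable'::regclass
AND    relpages > 0;  -- guards against division by zero on never-analyzed tables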
Note that VACUUM (of secondary relevance to your case) wasn't triggered by only INSERTs before Postgres 13. Quoting the release notes:
Allow inserts, not only updates and deletes, to trigger vacuuming
activity in autovacuum (Laurenz Albe, Darafei
Praliaskouski)
Previously, insert-only activity would trigger auto-analyze but not
auto-vacuum, on the grounds that there could not be any dead tuples to
remove. However, a vacuum scan has other useful side-effects such as
setting page-all-visible bits, which improves the efficiency of
index-only scans. Also, allowing an insert-only table to receive
periodic vacuuming helps to spread out the work of “freezing” old
tuples, so that there is not suddenly a large amount of freezing work
to do when the entire table reaches the anti-wraparound threshold all
at once.
If necessary, this behavior can be adjusted with the new parameters
autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor, or the equivalent
table storage options.
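To adjust this for a single insert-only table, a sketch of the equivalent storage options could look like this (again, table name and values are only illustrative):

ALTER TABLE mytable SET (
    autovacuum_vacuum_insert_threshold    = 10000,
    autovacuum_vacuum_insert_scale_factor = 0.05
);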
I'm retrieving data from an AWS database using pgAdmin. This works well. The problem is that I have one column that I set to True after I retrieve the corresponding row, whereas originally it is Null. Doing so adds an enormous amount of data to my database.
I have checked that this is not due to other processes: it only happens when my program is running.
I am certain no rows are being added, I have checked the number of rows before and after and they're the same.
Furthermore, it only does this when changing specific tables; when I update other tables in the same database with the same process, the database size stays the same. It also does not increase the database size on every change; only once every couple of changes does the total size increase.
How can changing a single boolean from Null to True add 0.1 MB to my database?
I'm using the following commands to check my database makeup:
To get table sizes
SELECT
    relname AS "Table",
    pg_total_relation_size(relid) AS "Size",
    pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS "External Size"
FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;
To get number of rows:
SELECT schemaname,relname,n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
To get database size:
SELECT pg_database_size('mydatabasename');
If you have not changed it, then the fillfactor on your table is 100%, since that is the default.
This means that every change in your table marks the old row version as obsolete and creates a new version of the updated row. The issue is even worse if you have indexes on your table, since those have to be updated on every row change too. As you can imagine, this hurts UPDATE performance as well.
So if you read the whole table and update even the smallest column of every row after reading it, you can roughly double the table size while the fillfactor is 100.
What you can do is ALTER your table to lower its fillfactor, then VACUUM it:
ALTER TABLE your_table SET (fillfactor = 90);
VACUUM FULL your_table;
Of course, with this step your table will be about 10% bigger, but Postgres will reserve some space for your updates and the table won't keep growing with your process.
The reason autovacuum helps is that it cleans up the obsolete rows periodically and therefore keeps your table at roughly the same size, but it puts a lot of pressure on your database. If you know you'll be doing operations like the ones described in the question, I would recommend tuning the fillfactor for your needs.
The problem is that (source):
"In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table"
Furthermore, we did not always close the cursor, which also increased the database size while the program was running.
One last problem is that we were running one huge query, not allowing the system to autovacuum properly. This problem is described in more detail here
Our solution was to re-approach the problem such that the rows did not have to be updated. Other solutions that we could think of but have not tried is to stop the process every once in a while allowing the autovacuum to work correctly.
What do you mean by "adds data"? To all the data files? Specifically to some files?
To get a precise answer you should supply more details, but generally speaking, any DB operation will add data to the transaction logs, and possibly to other files.
I have a table with files and various relations to this table; the files are stored as bytea. I want to free up the space occupied by old files (according to a timestamp), however the rows should still be present in the table.
Is it enough to set the bytea field to NULL? Will the data actually be deleted from the table this way?
In PostgreSQL, updating a row creates a new tuple (row version), and the old one is left to be deleted by autovacuum.
Also, larger bytea attributes will be stored out-of-line in the TOAST table that belongs to the table.
When you set the bytea attribute to NULL (which is the right thing to do), two things will happen:
The main table will become bigger because of all the new tuples created by the UPDATE. Autovacuum will free the space, but not shrink the table (the empty space can be re-used by future data modifications).
Entries in the TOAST table will be deleted. Again, autovacuum will free the space, but the table won't shrink.
So what you will actually observe is that after the UPDATE, your table uses more space than before.
You can get rid of all that empty space by running VACUUM (FULL) on the table, but that will block concurrent access to the table for the duration of the operation, so be ready to schedule some down time (you'll probably do that for the UPDATE anyway).
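A sketch of both steps, assuming a table files with a bytea column content and a timestamp column created_at (all names and the cutoff are placeholders):

UPDATE files
SET    content = NULL
WHERE  created_at < now() - interval '1 year';

-- optionally reclaim the space; this locks the table for the duration:
VACUUM (FULL) files;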
I think I read somewhere that running an ALTER TABLE foo ADD COLUMN baz text on a postgres database will not cause a read or write lock. Setting a default value causes locking, but allowing a null default prevents a lock.
I can't find this in the documentation, though. Can anyone point to a place that says, definitively, if this is true or not?
The different sorts of locks and when they're used are mentioned in the doc in
Table-level Locks. For instance, Postgres 11's ALTER TABLE may acquire a SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, or ACCESS EXCLUSIVE lock.
Postgres 9.1 through 9.3 claimed to support two of the above three but actually forced ACCESS EXCLUSIVE for all variants of this command. This limitation was lifted in Postgres 9.4, but ADD COLUMN remains at ACCESS EXCLUSIVE by design.
It's easy to check in the source code because there's a function dedicated to establishing the lock level needed for this command in various cases: AlterTableGetLockLevel in src/backend/commands/tablecmds.c.
Concerning how long the lock is held once acquired:
When the column's default value is NULL, the column's addition should be very quick because it doesn't need a table rewrite: it's only an update in the catalog.
When the column has a non-NULL default value, it depends on PostgreSQL version: with version 11 or newer, there is no immediate rewriting of all the rows, so it should be as fast as the NULL case. But with version 10 or older, the table is entirely rewritten, so it may be quite expensive depending on the table's size.
Adding a new nullable column without a default locks the table only for a very short time, since there is no need to rewrite all the data on disk. Adding a column with a default value (on PostgreSQL 10 and older) requires PostgreSQL to make new versions of all rows and store them on disk, and the table is locked during that time.
So when you need to add a column with a default value to a big table, it's recommended to add it as a nullable column first and then update the rows in small portions. This way you'll avoid a high load on disk and allow autovacuum to do its job, so you won't end up doubling the table size.
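A sketch of that approach (table, column and id ranges are placeholders):

ALTER TABLE big_table ADD COLUMN flag boolean;  -- nullable, no default: no rewrite

UPDATE big_table
SET    flag = false
WHERE  flag IS NULL
AND    id BETWEEN 1 AND 100000;
-- repeat with the next id range until the whole table is back-filled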
http://www.postgresql.org/docs/current/static/sql-altertable.html#AEN57290
"Adding a column with a non-null default or changing the type of an existing column will require the entire table and indexes to be rewritten."
So the documentation only specifies, by exclusion, when the table is not rewritten.
There will always be a lock, but it will be very short in case the table is not to be rewritten.
I have a table with 4 columns: ID, NAME, AGE and COUNTRY.
For the time being, I have set my AGE column as unused with the command below:
alter table Personal set unused column AGE;
Now I want to use the same column AGE again. How can I do this in Oracle (10g)?
Also, between dropping a column and setting a column to unused, which is the better option? Please guide me.
You cannot reuse an unused column. The only possible action on the column is to remove it from the table.
But you can add a new column with the same name, even without removing the unused column.
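For instance, a new AGE column can be added even while the old one is still marked unused (the datatype is an assumption):

ALTER TABLE Personal ADD (AGE NUMBER(3));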
From the documentation
The ALTER TABLE...DROP UNUSED COLUMNS statement is the only action allowed on unused columns. It physically removes unused columns from the table and reclaims disk space.
In the ALTER TABLE statement that follows, the optional clause CHECKPOINT is specified. This clause causes a checkpoint to be applied after processing the specified number of rows, in this case 250. Checkpointing cuts down on the amount of undo logs accumulated during the drop column operation to avoid a potential exhaustion of undo space.
ALTER TABLE hr.admin_emp DROP UNUSED COLUMNS CHECKPOINT 250;
And this other one (emphasis mine):
Marking Columns Unused
If you are concerned about the length of time it could take to drop column data from all of the rows in a large table, you can use the ALTER TABLE...SET UNUSED statement. This statement marks one or more columns as unused, but does not actually remove the target column data or restore the disk space occupied by these columns. However, a column that is marked as unused is not displayed in queries or data dictionary views, and its name is removed so that a new column can reuse that name. All constraints, indexes, and statistics defined on the column are also removed.