PostgreSQL ADD COLUMN DEFAULT NULL locks and performance

I have a table in my PostgreSQL 9.6 database with 3 million rows. This table already has a null bitmap (it has 2 other DEFAULT NULL fields). I want to add a new nullable boolean column to this table. I'm stuck on the difference between these two statements:
ALTER TABLE my_table ADD COLUMN my_column BOOLEAN;
ALTER TABLE my_table ADD COLUMN my_column BOOLEAN DEFAULT NULL;
I think these statements are equivalent, but:
I can't find any proof of that in the documentation. The documentation says that providing a DEFAULT value for the new column makes PostgreSQL rewrite all the tuples, but I don't think that's true in this case, because the default value is NULL.
I ran some tests on a copy of this table, and the first statement (without DEFAULT NULL) took slightly longer than the second. I can't understand why.
My questions are:
Will PostgreSQL use the same lock type (ACCESS EXCLUSIVE) for these two statements?
Will PostgreSQL rewrite all tuples to add a NULL value to each of them if I use DEFAULT NULL?
Is there any difference between these two statements?

There's an issue in the response of Vao Tsun, in point 2.
If you use ALTER TABLE my_table ADD COLUMN my_column BOOLEAN; it won't rewrite all the tuples; it is just a change in the metadata.
But if you use ALTER TABLE my_table ADD COLUMN my_column BOOLEAN DEFAULT NULL, it will rewrite all the tuples, and it will take forever on large tables.
The documentation itself says this:
When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified). If there is no DEFAULT clause, this is merely a metadata change and does not require any immediate update of the table's data; the added NULL values are supplied on readout, instead.
This tells us that if there is a DEFAULT clause, even if it is NULL, it will rewrite all the tuples.
This is to avoid a performance penalty on later UPDATEs: if you update a tuple that was not rewritten, the tuple has to be moved to another place on disk, which takes more time.
I tested this myself on PostgreSQL 9.6 when I had to add a column to a table with 300+ million tuples. Without DEFAULT NULL it took 11 ms; with DEFAULT NULL it took more than 30 minutes.

https://www.postgresql.org/docs/current/static/sql-altertable.html
Yes - the same ACCESS EXCLUSIVE lock, with no exception mentioned for DEFAULT NULL or for no DEFAULT (statistics, "options", constraints and cluster would require a less strict lock I think, but not ADD COLUMN):
Note that the lock level required may differ for each subform. An
ACCESS EXCLUSIVE lock is held unless explicitly noted. When multiple
subcommands are listed, the lock held will be the strictest one
required from any subcommand.
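You can watch this yourself by running the ALTER in an open transaction and inspecting pg_locks from a second session (a sketch, using my_table from the question):
-- session 1: take the lock but do not commit yet
BEGIN;
ALTER TABLE my_table ADD COLUMN my_column BOOLEAN;
-- session 2: inspect the locks held on the table
SELECT locktype, mode, granted
FROM pg_locks
WHERE relation = 'my_table'::regclass;
-- mode shows AccessExclusiveLock for both forms of the statement
-- session 1: release the lock
COMMIT;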
No - it will instead supply NULL in the result on SELECT:
When a column is added with ADD COLUMN, all existing rows in the table
are initialized with the column's default value (NULL if no DEFAULT
clause is specified). If there is no DEFAULT clause, this is merely a
metadata change and does not require any immediate update of the
table's data; the added NULL values are supplied on readout, instead.
No - no difference AFAIK. Just a metadata change in both cases (I believe it is one case expressed with different syntax).
Edit - Demo:
db=# create table so(i int);
CREATE TABLE
Time: 9.498 ms
db=# insert into so select generate_series(1,10*1000*1000);
INSERT 0 10000000
Time: 13899.190 ms
db=# alter table so add column nd BOOLEAN;
ALTER TABLE
Time: 1025.178 ms
db=# alter table so add column dn BOOLEAN default null;
ALTER TABLE
Time: 13.849 ms
db=# alter table so add column dnn BOOLEAN default true;
ALTER TABLE
Time: 14988.450 ms
db=# select version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.1 on x86_64-apple-darwin15.6.0, compiled by Apple LLVM version 8.0.0 (clang-800.0.42.1), 64-bit
(1 row)
lastly, to avoid speculation that this is data-type specific:
db=# alter table so add column t text;
ALTER TABLE
Time: 25.831 ms
db=# alter table so add column tn text default null;
ALTER TABLE
Time: 13.798 ms
db=# alter table so add column tnn text default 'null';
ALTER TABLE
Time: 15440.318 ms
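Timing alone can be argued over, so a more direct check is to compare the relation's filenode before and after the ALTER; a table rewrite assigns a new filenode, while a pure metadata change does not (a sketch against the demo table above):
SELECT pg_relation_filenode('so');
ALTER TABLE so ADD COLUMN dn2 BOOLEAN DEFAULT NULL;
SELECT pg_relation_filenode('so');
-- the same value before and after means the table was not rewritten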

Related

Remove "identity flag" from a column in PostgreSQL

I have some tables in PostgreSQL 12.9 that were declared as something like
-- This table is written in old style
create table old_style_table_1 (
id bigserial not null primary key,
...
);
-- This table uses new feature
create table new_style_table_2 (
id bigint generated by default as identity,
...
);
The second table seems to be declared using the identity column feature introduced in version 10.
Time went by, and we have partitioned the old tables, while keeping the original sequences:
CREATE TABLE partitioned_old_style_table_1 (LIKE old_style_table_1 INCLUDING DEFAULTS) PARTITION BY HASH (user_id);
CREATE TABLE partitioned_new_style_table_2 (LIKE new_style_table_2 INCLUDING DEFAULTS) PARTITION BY HASH (user_id);
DDL for their id columns seems to be id bigint default nextval('old_style_table_1_id_seq') not null and id bigint default nextval('new_style_table_2_id_seq') not null.
Everything has worked fine so far. Partitioned tables proved to be a great boon and we decided to retire the old tables by dropping them.
DROP TABLE old_style_table_1, new_style_table_2;
-- [2BP01] ERROR: cannot drop desired object(s) because other objects depend on them
-- Detail: default value for column id of table old_style_table_1 depends on sequence old_style_table_1_id_seq
-- default value for column id of table new_style_table_2 depends on sequence new_style_table_2_id_seq
After some pondering I've found out that sequences may have owners in postgres, so I opted to change them:
ALTER SEQUENCE old_style_table_1_id_seq OWNED BY partitioned_old_style_table_1.id;
DROP TABLE old_style_table_1;
-- Worked out flawlessly
ALTER SEQUENCE new_style_table_2_id_seq OWNED BY partitioned_new_style_table_2.id;
ALTER SEQUENCE new_style_table_2_id_seq OWNED BY NONE;
-- Here's the culprit of the question:
-- [0A000] ERROR: cannot change ownership of identity sequence
So, apparently the fact that this column has pg_attribute.attidentity set to 'd' forbids me from:
• changing the default value of the column:
ALTER TABLE new_style_table_2 ALTER COLUMN id SET DEFAULT 0;
-- [42601] ERROR: column "id" of relation "new_style_table_2" is an identity column
• dropping the default value:
ALTER TABLE new_style_table_2 ALTER COLUMN id DROP DEFAULT;
-- [42601] ERROR: column "id" of relation "new_style_table_2" is an identity column
-- Hint: Use ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY instead.
• dropping the identity, column or the table altogether (new tables already depend on the sequence):
ALTER TABLE new_style_table_2 ALTER COLUMN id DROP IDENTITY IF EXISTS;
-- or
ALTER TABLE new_style_table_2 DROP COLUMN id;
-- or
DROP TABLE new_style_table_2;
-- result in
-- [2BP01] ERROR: cannot drop desired object(s) because other objects depend on them
-- default value for column id of table partitioned_new_style_table_2 depends on sequence new_style_table_2_id_seq
I've looked up the documentation; it provides ways to SET or ADD an identity, but no way to remove it or to change it to a throwaway sequence without attempting to drop the existing one.
➥ So, how am I able to remove an identity flag from the column-sequence pair so it won't affect other tables that use this sequence?
UPD: Tried running UPDATE pg_attribute SET attidentity='' WHERE attrelid=16816; on localhost, still receive [2BP01] and [0A000]. :/
I did manage to execute the DROP DEFAULT bit, but it seems like a dead end.
I don't think there is a safe and supported way to do that (without catalog modifications). Fortunately, there is nothing special about sequences that would make dropping them a problem. So take a short down time and (see the sketch after these steps):
• remove the default value that uses the identity sequence
• record the current value of the sequence
• drop the table
• create a new sequence with an appropriate START value
• use the new sequence to set new default values
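A minimal sketch of those steps, using the names from the question; the START value (41235 here) is a placeholder for the value you actually record:
-- remove the default that references the identity sequence
ALTER TABLE partitioned_new_style_table_2 ALTER COLUMN id DROP DEFAULT;
-- record the current position of the old sequence (say it returns 41234)
SELECT last_value FROM new_style_table_2_id_seq;
-- drop the old table; its identity sequence is dropped with it
DROP TABLE new_style_table_2;
-- create a replacement sequence starting just past the recorded value
CREATE SEQUENCE partitioned_new_style_table_2_id_seq
    START 41235 OWNED BY partitioned_new_style_table_2.id;
-- point the default at the new sequence
ALTER TABLE partitioned_new_style_table_2 ALTER COLUMN id
    SET DEFAULT nextval('partitioned_new_style_table_2_id_seq');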
If you want an identity column, you should define it on the partitioned table, not on one of the partitions.

Serial column takes up disproportional amount of space in PostgreSQL

I would like to create an auto-incrementing id column that is not a primary key in a PostgreSQL table. The table is currently just over 200M rows and contains 14 columns.
SELECT pg_size_pretty(pg_total_relation_size('mytable'));
The above query reveals that mytable takes up 57 GB on disk. I currently have 30 GB free space remaining on disk after checking with df -h (on Ubuntu 20.04)
What I don't understand is why, after trying to create a SERIAL column, I completely run out of disk space - the query ends up never finishing. I run the following command:
ALTER TABLE mytable ADD COLUMN id SERIAL;
and then see how gradually, my disk space runs out until there is nothing left and the query fails. I am no database expert but it does not make sense. Why would a simple serialized column take up more than half of the space of the table itself, especially when it is not a primary key and therefore has no index? Is there a known workaround to creating such an auto-incrementing id column?
As a proof of concept:
create table id_test(pk_fld integer primary key generated always as identity);
--FYI, in Postgres 14+ the overriding system value won't be needed.
--That is a hack around a bug in 13-
insert into id_test overriding system value values (default), (default);
select * from id_test;
pk_fld
--------
1
2
alter table id_test add column id_fld integer ;
update id_test set id_fld = 0;
alter table id_test alter COLUMN id_fld set not null;
alter table id_test alter COLUMN id_fld add generated always as identity;
update id_test set id_fld = default;
select * from id_test;
pk_fld | id_fld
--------+--------
1 | 1
2 | 2
Basically this breaks the process down into steps. Obviously this is just a toy table and not representative of your setup. I would try it on a test table that is a subset of your actual table to see what happens to disk space consumption. It would not hurt to run VACUUM after the updates to return the space from dead rows to the database.
Adding a serial column is adding an integer column with a non-constant DEFAULT value. This will cause PostgreSQL to rewrite the table, because the new column value has to be added to all existing rows. So PostgreSQL writes a new copy of the table and discards the old one after it is done. This will require more than double the disk space of the original table temporarily, which explains why you run out of disk space.
You can split the operation into several steps:
ALTER TABLE mytable ADD id bigint;
CREATE SEQUENCE mytable_id_seq OWNED BY mytable.id;
ALTER TABLE mytable ALTER id SET DEFAULT nextval('mytable_id_seq');
This will not rewrite the table, and it will leave the existing rows untouched. The value of id for these rows will be NULL.
You probably want to fill in values for the existing rows, but be careful: if you update them all at once, you will run out of disk space as well, because in PostgreSQL an UPDATE writes a complete new version of the row to the table. You'd have to update the rows in batches and run VACUUM between these updates, as sketched below.
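A hypothetical batching loop for that backfill, repeated until the UPDATE reports 0 rows; the batch size is arbitrary, and this assumes nothing else leaves new NULL ids behind concurrently:
UPDATE mytable
SET id = nextval('mytable_id_seq')
WHERE ctid IN (
    SELECT ctid FROM mytable
    WHERE id IS NULL
    LIMIT 100000
);
VACUUM mytable;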
All in all, this is rather annoying and complicated. So do yourself a favor and increase the disk space. That is the simple and best solution.

Is there a method to do an ALTER Column in postgres 12 on an huge table without waiting a lifetime?

Is there a method to do an ALTER COLUMN in postgres 12 on an huge table without waiting a lifetime?
I'm trying to convert a field from bigint to smallint:
ALTER TABLE huge ALTER COLUMN result_code TYPE SMALLINT;
It takes 28 hours; is there a smarter method?
The table has sequences, keys and foreign keys.
The table has to be rewritten, and you have to wait.
If you have several columns whose data type you want to change, you can use several ALTER COLUMN clauses in a single ALTER TABLE statement and save time that way, since the table is rewritten only once.
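For example (the second column name is hypothetical), both conversions then share a single table rewrite:
ALTER TABLE huge
    ALTER COLUMN result_code TYPE smallint,
    ALTER COLUMN other_result_code TYPE smallint;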
An alternative idea would be to use logical replication: set up an empty copy of the database (pg_dump -s), where your large table is defined with smallint columns. Replicate your database to that database, and switch over as soon as replication has caught up.
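A sketch of the logical replication route; the publication/subscription names and the connection string are placeholders, and the source must run with wal_level = logical:
-- on the source database
CREATE PUBLICATION mig_pub FOR ALL TABLES;
-- on the empty copy (schema restored from pg_dump -s, with smallint columns)
CREATE SUBSCRIPTION mig_sub
    CONNECTION 'host=source-host dbname=mydb user=repluser'
    PUBLICATION mig_pub;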

PostgreSQL: increasing a column's length in a very large table

Aurora PostgreSQL, version 10.4.
I have a table with several million rows. One of the columns is defined as character varying(255). Once upon a time, 255 was plenty of room, but now it's not, so I have to make more.
I found this in the PG 9.1 release notes:
Allow ALTER TABLE ... SET DATA TYPE to avoid table rewrites in appropriate cases (Noah Misch, Robert Haas)
For example, converting a varchar column to text no longer requires a rewrite of the table. However, increasing the length constraint on a varchar column still requires a table rewrite.
This suggests that changing to a longer varchar is not practical (since rewriting a table of that size would lock it for an ungodly amount of time), but changing to text would work. Is this correct?
Any other things I should know about when making such a change? I want to avoid data loss, obviously, and I can't afford to make this table inaccessible for more than a short period.
You should have read all the release notes, because just one version later (in PostgreSQL 9.2):
Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite.
You can test that easily for yourself:
postgres=# select version();
version
------------------------------------------------------------
PostgreSQL 10.5, compiled by Visual C++ build 1800, 64-bit
(1 row)
postgres=# \timing on
Timing is on.
postgres=# create table alter_test (id serial, some_col varchar(255));
CREATE TABLE
Time: 22.331 ms
postgres=# insert into alter_test (some_col) select md5(random()::text) from generate_series(1,10e6);
INSERT 0 10000000
Time: 40894.275 ms (00:40.894)
postgres=# alter table alter_test alter column some_col type varchar(500);
ALTER TABLE
Time: 5.297 ms
postgres=#

Postgres alter field type from float4 to float8 on huge table

I want to alter a column's data type from float4 to float8 on a table with a huge row count. If I do it the usual way, it takes a long time and my table is locked for that time.
Is there any hack to do it without rewriting the table contents?
ALTER TABLE ... ALTER COLUMN ... TYPE ... USING ... (or related things like ALTER TABLE ... ADD COLUMN ... DEFAULT ... NOT NULL) requires a full table rewrite with an exclusive lock.
You can, with a bit of effort, work around this in steps:
ALTER TABLE thetable ADD COLUMN thecol_tmp newtype, without NOT NULL.
Create a trigger on the table that, for every write to thecol, updates thecol_tmp as well, so newly created rows and updated rows get a value for thecol_tmp too (see the sketch after these steps).
In batches by ID range, UPDATE thetable SET thecol_tmp = CAST(thecol AS newtype) WHERE id BETWEEN .. AND ..
Once all values are populated in thecol_tmp, ALTER TABLE thetable ALTER COLUMN thecol_tmp SET NOT NULL;
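A hypothetical version of that sync trigger for the question's float4 -> float8 case, using the placeholder names above:
-- trigger function: keep thecol_tmp in step with every insert/update of thecol
CREATE FUNCTION sync_thecol_tmp() RETURNS trigger AS $$
BEGIN
    NEW.thecol_tmp := NEW.thecol::float8;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_thecol_tmp_trg
    BEFORE INSERT OR UPDATE ON thetable
    FOR EACH ROW EXECUTE PROCEDURE sync_thecol_tmp();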
Now swap the columns and drop the trigger in a single tx:
BEGIN;
ALTER TABLE thetable DROP COLUMN thecol;
ALTER TABLE thetable RENAME COLUMN thecol_tmp TO thecol;
DROP TRIGGER whatever_trigger_name ON thetable;
COMMIT;
Ideally we'd have an ALTER TABLE ... ALTER COLUMN ... CONCURRENTLY that did this within PostgreSQL, but nobody's implemented that. Yet.