Is there a method to do an ALTER COLUMN in postgres 12 on a huge table without waiting a lifetime? - postgresql

Is there a method to do an ALTER COLUMN in postgres 12 on a huge table without waiting a lifetime?
I am trying to convert a field from bigint to smallint:
ALTER TABLE huge ALTER COLUMN result_code TYPE SMALLINT;
It takes 28 hours; is there a smarter method?
The table has sequences, keys, and foreign keys.

The table has to be rewritten, and you have to wait.
If you have several columns whose data type you want to change, you can use several ALTER COLUMN clauses in a single ALTER TABLE statement and save time that way.
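For example, a minimal sketch of the combined form (other_code is a made-up second column, just to illustrate):
ALTER TABLE huge
    ALTER COLUMN result_code TYPE smallint,
    ALTER COLUMN other_code TYPE smallint;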
An alternative idea would be to use logical replication: set up an empty copy of the database (pg_dump -s), where your large table is defined with smallint columns. Replicate your database to that database, and switch over as soon as replication has caught up.
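A minimal sketch of that approach, assuming wal_level = logical on the source; the publication, subscription, and connection string are made-up names, and the existing bigint values must of course fit into smallint:
-- On the source database:
CREATE PUBLICATION full_pub FOR ALL TABLES;
-- On the copy restored from pg_dump -s, with result_code declared as smallint:
CREATE SUBSCRIPTION full_sub
    CONNECTION 'host=old-server dbname=mydb user=repl'
    PUBLICATION full_pub;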

Related

Serial column takes up disproportional amount of space in PostgreSQL

I would like to create an auto-incrementing id column that is not a primary key in a PostgreSQL table. The table is currently just over 200M rows and contains 14 columns.
SELECT pg_size_pretty(pg_total_relation_size('mytable'));
The above query reveals that mytable takes up 57 GB on disk. I currently have 30 GB of free space remaining on disk, according to df -h (on Ubuntu 20.04).
What I don't understand is why, after trying to create a SERIAL column, I completely run out of disk space - the query ends up never finishing. I run the following command:
ALTER TABLE mytable ADD COLUMN id SERIAL;
and then see how gradually, my disk space runs out until there is nothing left and the query fails. I am no database expert but it does not make sense. Why would a simple serialized column take up more than half of the space of the table itself, especially when it is not a primary key and therefore has no index? Is there a known workaround to creating such an auto-incrementing id column?
As a proof of concept:
create table id_test(pk_fld integer primary key generated always as identity);
--FYI, in Postgres 14+ the overriding system value won't be needed.
--That is a hack around a bug in 13-
insert into id_test overriding system value values (default), (default);
select * from id_test;
pk_fld
--------
1
2
alter table id_test add column id_fld integer ;
update id_test set id_fld = 0;
alter table id_test alter COLUMN id_fld set not null;
alter table id_test alter COLUMN id_fld add generated always as identity;
update id_test set id_fld = default;
select * from id_test;
pk_fld | id_fld
--------+--------
1 | 1
2 | 2
Basically this breaks the process down into steps. Obviously this is just a toy table and not representative of your setup. I would try it on a test table that is a subset of your actual table to see what happens to disk space consumption. It would not hurt to run VACUUM after the updates so the space from the dead row versions can be reused.
Adding a serial column is adding an integer column with a non-constant DEFAULT value. This will cause PostgreSQL to rewrite the table, because the new column value has to be added to all existing rows. So PostgreSQL writes a new copy of the table and discards the old one after it is done. This will require more than double the disk space of the original table temporarily, which explains why you run out of disk space.
You can split the operation into several steps:
ALTER TABLE mytable ADD id bigint;
CREATE SEQUENCE mytable_id_seq OWNED BY mytable.id;
ALTER TABLE mytable ALTER id SET DEFAULT nextval('mytable_id_seq');
This will not rewrite the table, and it will leave the existing rows untouched. The value of id for these rows will be NULL.
You probably want to update the existing rows so that id is not NULL, but be careful: if you update them all at once, you will run out of disk space as well, because in PostgreSQL an UPDATE writes a complete new version of the row to the table. You'd have to update the rows in batches and run VACUUM between these updates.
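A minimal sketch of such a batched backfill, assuming mytable has a primary key column pk (the batch size is arbitrary); repeat until no rows are updated, running VACUUM between rounds so the dead row versions can be reused:
UPDATE mytable
SET    id = nextval('mytable_id_seq')
WHERE  pk IN (SELECT pk FROM mytable WHERE id IS NULL LIMIT 100000);
VACUUM mytable;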
All in all, this is rather annoying and complicated. So do yourself a favor and increase the disk space. That is the simple and best solution.

Postgres: difference between DEFAULT in CREATE TABLE and ALTER TABLE in database dump

In a database dump created with pg_dump, some tables have DEFAULTs in the CREATE TABLE statement, e.g.:
CREATE TABLE test (
f1 integer DEFAULT nextval('test_f1_seq'::regclass) NOT NULL
);
But others have an additional ALTER statement:
ALTER TABLE ONLY test2 ALTER COLUMN f1 SET DEFAULT nextval('test2_f1_seq'::regclass);
What is the reason for this? All sequential fields were created with the type SERIAL, but in the dump they look different, and I can't work out any rule for this.
The difference must be that in the first case, the sequence is “owned” by the table column.
You can specify this dependency using the OWNED BY clause when you create a sequence. A sequence that is owned by a column will automatically be dropped when the column is.
If a sequence is implicitly created by using serial, it will be owned by the column.
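Roughly, this is what a serial column expands to behind the scenes (a sketch; the exact statements pg_dump emits may look different):
CREATE SEQUENCE test_f1_seq;
CREATE TABLE test (
    f1 integer NOT NULL DEFAULT nextval('test_f1_seq')
);
ALTER SEQUENCE test_f1_seq OWNED BY test.f1;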

PostgreSQL ADD COLUMN DEFAULT NULL locks and performance

I have a table in my PostgreSQL 9.6 database with 3 million rows. This table already has a null bitmap (it has 2 other DEFAULT NULL fields). I want to add a new nullable boolean column to this table. I am stuck on the difference between these two statements:
ALTER TABLE my_table ADD COLUMN my_column BOOLEAN;
ALTER TABLE my_table ADD COLUMN my_column BOOLEAN DEFAULT NULL;
I think that there is no difference between these statements, but:
I can't find any proof of this in the documentation. The documentation says that providing a DEFAULT value for the new column makes PostgreSQL rewrite all the tuples, but I don't think that is true in this case, because the default value is NULL.
I ran some tests on a copy of this table, and the first statement (without DEFAULT NULL) took a little more time than the second. I can't understand why.
My questions are:
Will PostgreSQL use the same lock type (ACCESS EXCLUSIVE) for those two statements?
Will PostgreSQL rewrite all the tuples to add a NULL value to each of them if I use DEFAULT NULL?
Is there any difference between those two statements?
There's an issue with point 2 of Vao Tsun's answer.
If you use ALTER TABLE my_table ADD COLUMN my_column BOOLEAN; it won't rewrite all the tuples; it is just a change in the metadata.
But if you use ALTER TABLE my_table ADD COLUMN my_column BOOLEAN DEFAULT NULL, it will rewrite all the tuples, and that takes forever on large tables.
The documentation itself says this:
When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified). If there is no DEFAULT clause, this is merely a metadata change and does not require any immediate update of the table's data; the added NULL values are supplied on readout, instead.
This tells us that if there is a DEFAULT clause, even if it is NULL, it will rewrite all the tuples.
This is for the benefit of later updates: if you have to update a tuple that has not been rewritten, it has to be moved to another place on disk, which takes more time.
I tested this myself on PostgreSQL 9.6, when I had to add a column to a table that had 300+ million tuples. Without the DEFAULT NULL it took 11 ms, and with the DEFAULT NULL it took more than 30 minutes.
https://www.postgresql.org/docs/current/static/sql-altertable.html
Yes - the same ACCESS EXCLUSIVE lock; no exception is mentioned for DEFAULT NULL or for omitting DEFAULT (statistics, "options", constraints, and cluster would require a less strict lock, I think, but not ADD COLUMN):
Note that the lock level required may differ for each subform. An ACCESS EXCLUSIVE lock is held unless explicitly noted. When multiple subcommands are listed, the lock held will be the strictest one required from any subcommand.
No - it will just supply NULL in the result on SELECT instead:
When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified). If there is no DEFAULT clause, this is merely a metadata change and does not require any immediate update of the table's data; the added NULL values are supplied on readout, instead.
No - no difference AFAIK. Just a metadata change in both cases (as I believe it is the same case expressed with different syntax).
Edit - Demo:
db=# create table so(i int);
CREATE TABLE
Time: 9.498 ms
db=# insert into so select generate_series(1,10*1000*1000);
INSERT 0 10000000
Time: 13899.190 ms
db=# alter table so add column nd BOOLEAN;
ALTER TABLE
Time: 1025.178 ms
db=# alter table so add column dn BOOLEAN default null;
ALTER TABLE
Time: 13.849 ms
db=# alter table so add column dnn BOOLEAN default true;
ALTER TABLE
Time: 14988.450 ms
db=# select version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.1 on x86_64-apple-darwin15.6.0, compiled by Apple LLVM version 8.0.0 (clang-800.0.42.1), 64-bit
(1 row)
Lastly, to rule out speculation that this is data-type specific:
db=# alter table so add column t text;
ALTER TABLE
Time: 25.831 ms
db=# alter table so add column tn text default null;
ALTER TABLE
Time: 13.798 ms
db=# alter table so add column tnn text default 'null';
ALTER TABLE
Time: 15440.318 ms

POSTGRESQL: autoincrement for varchar type field

I'm switching from MongoDB to PostgreSQL and was wondering how I can implement the same concept that MongoDB uses to uniquely identify each row by its MongoId.
After migration, the already existing unique fields in our database are stored as a character type. I am looking for minimal source code changes.
So, is there any way in PostgreSQL to generate an auto-incrementing unique ID for each row inserted into a table?
The closest thing to MongoDB's ObjectId in PostgreSQL is the uuid type. Note that ObjectId has only 12 bytes, while UUIDs have 128 bits (16 bytes).
You can convert your existing IDs by appending (or prepending) e.g. '00000000' to them.
alter table some_table
alter id_column
type uuid
using (id_column || '00000000')::uuid;
It would be best if you could do this while migrating the schema + data. If you can't do it during the migration, you need to update your IDs (while they are still varchars, so that the change propagates to the referencing columns), drop the foreign keys, do the ALTER ... TYPE, and then re-apply the foreign keys.
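A sketch of that order of operations, with made-up child table and constraint names:
-- child_table.some_id references some_table.id_column
ALTER TABLE child_table DROP CONSTRAINT child_table_some_id_fkey;
ALTER TABLE some_table
    ALTER id_column TYPE uuid USING (id_column || '00000000')::uuid;
ALTER TABLE child_table
    ALTER some_id TYPE uuid USING (some_id || '00000000')::uuid;
ALTER TABLE child_table
    ADD CONSTRAINT child_table_some_id_fkey
    FOREIGN KEY (some_id) REFERENCES some_table (id_column);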
You can generate various UUIDs (for default values of the column) with the uuid-ossp module.
create extension "uuid-ossp";
alter table some_table
alter id_column
set default uuid_generate_v4();
Use a sequence as a default for the column:
create sequence some_id_sequence
start with 100000
owned by some_table.id_column;
The start with value should be bigger than your current maximum number.
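You can check the current maximum first; a sketch, assuming the existing IDs in id_column are numeric strings:
SELECT max(id_column::bigint) FROM some_table;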
Then use that sequence as a default for your column:
alter table some_table
alter id_column set default nextval('some_id_sequence')::text;
The better solution would be to change the column to an integer column. Storing numbers in a text (or varchar) column is a really bad idea.
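A sketch of that conversion, assuming all existing values are numeric strings (any referencing foreign-key columns would have to be converted the same way):
ALTER TABLE some_table ALTER id_column DROP DEFAULT;
ALTER TABLE some_table ALTER id_column TYPE bigint USING id_column::bigint;
ALTER TABLE some_table ALTER id_column SET DEFAULT nextval('some_id_sequence');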

Increasing the size of character varying type in postgres without data loss

I need to increase the size of a character varying(60) field in a postgres database table without data loss.
I have this command:
alter table client_details alter column name set character varying(200);
Will this command increase the field size from 60 to 200 without data loss?
The correct query to change the length limit of that particular column is:
ALTER TABLE client_details ALTER COLUMN name TYPE character varying(200);
Referring to this documentation, there would be no data loss: ALTER COLUMN only casts the old data to the new type, and a cast between character types should be fine. But I don't think your syntax is correct; see the documentation I mentioned earlier. I think you should be using this syntax:
ALTER [ COLUMN ] column TYPE type [ USING expression ]
And as a note, wouldn't it be easier to just create a table, populate it and test :)
Yes. But it will rewrite the table and lock it exclusively for the duration of the rewrite; any query trying to access the table will wait until the rewrite finishes.
Consider changing the type to text and using a check constraint to limit the size; changing a constraint does not rewrite or lock the table.
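A sketch of that alternative (the constraint name is made up; it could also be added NOT VALID and validated later to keep the lock short):
ALTER TABLE client_details ALTER COLUMN name TYPE text;
ALTER TABLE client_details
    ADD CONSTRAINT client_details_name_len_chk CHECK (length(name) <= 200);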
You can use the SQL command below:
ALTER TABLE client_details
ALTER COLUMN name TYPE varchar(200)
From the PostgreSQL 9.2 Release Notes, E.15.3.4.2:
Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite.
Changing the column size in PostgreSQL 9.1: when changing a varchar column to a larger size, a table rewrite is required; during the rewrite a lock is held on the table, and users cannot access the table until the rewrite is done.
Table name: userdata
Column name: acc_no
ALTER TABLE userdata ALTER COLUMN acc_no TYPE varchar(250);