How to remove columns for real in PostgreSQL?

I have a large system whose table schemas are updated quite often. I noticed that after repeatedly dropping and recreating columns, the error "tables can have at most 1600 columns" shows up, even though only a few columns are listed in information_schema.columns.
I've tried VACUUM FULL ANALYZE, but it doesn't help. Is there any way to avoid this limitation?
DO $$
declare tbname varchar(1024);
BEGIN
FOR i IN 1..1599 LOOP
tbname := 'alter table vacuum_test add column test' || CAST(i AS varchar(8)) ||' int';
EXECUTE tbname;
END LOOP;
END $$;
alter table vacuum_test drop column test1;
VACUUM FULL ANALYZE vacuum_test;
alter table vacuum_test add column test1 int;
result:
alter table vacuum_test add column test1 int
> ERROR: tables can have at most 1600 columns
> Time: 0.054s

Unfortunately, VACUUM FULL does not remove dropped columns from the table (i.e. entries that have attisdropped = true in pg_attribute). I would have expected that, but apparently it does not happen.
The only way to get rid of the hidden columns is to create a brand new table and copy the data to the new table.
Something along the lines:
create table new_table (like old_table including all);
insert into new_table
select *
from old_table;
Then drop the old table and rename the new one to the old name. Constraint and index names will be generated differently, so you might want to rename them as well.
You will have to re-create all foreign keys (incoming and outgoing) manually, as they are not included when using CREATE TABLE (LIKE ...).
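A minimal sketch of the swap and the follow-up fixes, in the same style as above; the constraint name new_table_pkey, the referencing table orders, and the column names are illustrative assumptions, not taken from the question:
begin;
-- cascade also drops incoming foreign keys that still point at old_table
drop table old_table cascade;
alter table new_table rename to old_table;
-- rename the auto-generated constraint back to the original name (example names)
alter table old_table rename constraint new_table_pkey to old_table_pkey;
-- re-create an incoming foreign key that the cascade dropped (example names)
alter table orders
  add constraint orders_old_table_id_fkey
  foreign key (old_table_id) references old_table (id);
commit;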
Another option is to use pg_repack which does this transparently in the background without locking the table.

Related

Replacing table and renaming primary key - Postgres

To preface, I am trying to replace an entire table with a new table that has the same columns, but with updated values.
I have the following SQL code:
BEGIN;
ALTER TABLE "original" RENAME TO "original_old";
ALTER TABLE "original_new" RENAME TO "original";
ALTER TABLE "original" RENAME CONSTRAINT "temp_original_id" to "original_id";
DROP TABLE "original_old";
COMMIT;
Output:
ERROR: constraint "temp_original_id" for table "original" does not exist
However, if I do the following before the last ALTER statement:
SELECT * from original;
I see temp_original_id present in the table.
I can't seem to find any other sources that show how to update the primary key (at least none that worked).
The table I am replacing also has dependencies on other tables, so I was wondering whether this is even a viable approach to begin with.
Did you mean ALTER TABLE "original" RENAME COLUMN "temp_original_id" to "original_id"; ?
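If that is the case, the corrected transaction would look like this (a sketch based only on the statements in the question, with RENAME COLUMN substituted for RENAME CONSTRAINT):
BEGIN;
ALTER TABLE "original" RENAME TO "original_old";
ALTER TABLE "original_new" RENAME TO "original";
-- temp_original_id is a column here, so RENAME COLUMN is the right form
ALTER TABLE "original" RENAME COLUMN "temp_original_id" TO "original_id";
DROP TABLE "original_old";
COMMIT;
If the primary key constraint itself also carries a temporary name, renaming it would be a separate ALTER TABLE "original" RENAME CONSTRAINT statement.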

How to practically rename tables and columns in PostgreSQL on a production system?

It often happens that the name something is originally given is not the best name. Maybe requirements shifted slightly, or maybe as time went on a better understanding of the concept being represented developed. Sometimes one name is used while developing a feature, but testing with real users reveals that a better name is needed, and it'd be nice for the names of things in the DB to match the names used in the UI.
PostgreSQL lets you rename tables and columns via alter table, but often this isn't feasible in a production system without significant downtime. If existing clients are using the old name, you can't just yank it out from under them.
I was hoping there was some way to "add a name" to a table or column, so that old_name and new_name will both work, and then at a later time remove the old name. Then a name migration could work like this:
add new_name
modify all clients to use new_name instead of old_name
once all clients have been updated, remove old_name
Is there a way to do this? If not, is there a recommended procedure for renaming a column while minimizing downtime? How about a table?
Here are some recipes for renaming tables and/or columns in a production system that seem to work. However, I've only tested them on a small test database, not on a large production system. Renaming and view creation are both supposed to be very fast, though.
Renaming a table
Rename the table, and temporarily add an updatable view with the old name:
begin;
alter table old_table rename to new_table;
create view old_table as select * from new_table;
commit;
Migrate clients to use the new table name.
Once all clients are migrated, drop the view:
drop view old_table;
Renaming columns without renaming the table
Renaming a column without renaming the table is a bit more complicated,
because we can't have a view shadow a table (apparently).
Rename the column(s), temporarily rename the table, and add an updatable
view that adds the old name for the column with the table's correct name:
begin;
alter table my_table rename column old_name to new_name;
alter table my_table rename to my_table_tmp;
create view my_table as
select *, new_name as old_name from my_table_tmp;
commit;
Migrate clients to use the new column name.
Once all clients are migrated, drop the view, rename the table back:
begin;
drop view my_table;
alter table my_table_tmp rename to my_table;
commit;
Renaming a table and some of its columns simultaneously
Rename the table and columns, and temporarily add an updatable view with the old name and old columns:
begin;
alter table old_table rename to new_table;
alter table new_table rename column old_name to new_name;
create view old_table as
select *, new_name as old_name from new_table;
commit;
Instead of select *, new_name as old_name, it might be better to only have
the original set of columns, so clients have to migrate to the new table
name to get the new column names:
create view old_table as
select unchanged_name, new_name as old_name from new_table;
Migrate clients to use the new table and column names.
Once all clients are migrated, drop the view:
drop view old_table;

PostgreSQL: is it possible to ALTER TABLE and INSERT to it within a single transaction?

My use case is the following: I'm working on periodical data imports to a PostgreSQL database where some external data has to be imported at regular intervals.
The caveat is that the structure of the data might change from one import to the next, so I'm truncating the table and dropping all of its columns on every import before inserting the new data.
I would like to wrap the entire operation within a single transaction, so in case something goes wrong, the transaction would be rolled back and the old data would still be accessible (lesser of two evils kind-of-thing).
As an example, here is what a data import statement might look like:
BEGIN
ALTER TABLE "external_data" DROP "date"
ALTER TABLE "external_data" DROP "column1"
ALTER TABLE "external_data" DROP "column2"
ALTER TABLE "external_data" ADD "date" date DEFAULT NULL
ALTER TABLE "external_data" ADD "column1" text DEFAULT NULL
ALTER TABLE "external_data" ADD "column2" text DEFAULT NULL
ALTER TABLE "external_data" ADD "column3" text DEFAULT NULL
INSERT INTO "external_data" ("date","column1","column2","column3") VALUES ('20170523','Berlin','Chrome','1'),('20170524','Berlin','Chrome','2')
COMMIT
This is currently not working. The INSERT statement gets stuck because, when it's called, the table is still locked from the ALTER TABLE statement that preceded it.
Is there any way to achieve this within Postgres transactions or should I give up and go for some other solution application-side?
> This is currently not working. The INSERT statement gets stuck
> because, when it's called, the table is still locked from the ALTER
> TABLE statement that preceded it.
No, a transaction can't lock itself that way. The INSERT would be blocked if it was initiated by another transaction, but not by the one that already has a strong lock on the object. There is no problem in dropping the column and doing a subsequent INSERT in the same transaction.
The reason why it seems to be stuck is probably, as mentioned in the comments, that if you feed the sequence of queries from the question to an interactive interpreter, it will not execute any query at all, because there is no indication of where each query ends. If the interpreter is psql, this sequence lacks either semicolons or a \g meta-command at the end of each query.
A SQL query by itself does not need a semicolon at its end; it's only when several queries can be submitted together that one is required.
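For illustration, the same sequence from the question runs without getting stuck once statement terminators are added (nothing else is changed):
BEGIN;
ALTER TABLE "external_data" DROP "date";
ALTER TABLE "external_data" DROP "column1";
ALTER TABLE "external_data" DROP "column2";
ALTER TABLE "external_data" ADD "date" date DEFAULT NULL;
ALTER TABLE "external_data" ADD "column1" text DEFAULT NULL;
ALTER TABLE "external_data" ADD "column2" text DEFAULT NULL;
ALTER TABLE "external_data" ADD "column3" text DEFAULT NULL;
INSERT INTO "external_data" ("date","column1","column2","column3") VALUES ('20170523','Berlin','Chrome','1'),('20170524','Berlin','Chrome','2');
COMMIT;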
Yes, it is possible to ALTER TABLE and INSERT inside one transaction.
You can see an example at http://rextester.com/OTU89086
But be warned: you cannot repeat the drop/add-column cycle indefinitely.
There is a limit on how many columns you can add (even if you are dropping others). If you do it a lot, you get:
54011: tables can have at most 1600 columns
You can see that problem here:
--PostgreSQL 9.6
--'\\' is a delimiter
select version() as postgresql_version;
drop table if exists "external_data";
create table "external_data"(
"date" date,
"column1" integer,
"column2" text,
"column3" boolean
);
BEGIN TRANSACTION;
create or replace function do_the_import()
returns text
language plpgsql as
$body$
begin
ALTER TABLE "external_data" DROP "date";
ALTER TABLE "external_data" DROP "column1";
ALTER TABLE "external_data" DROP "column2";
ALTER TABLE "external_data" DROP "column3";
ALTER TABLE "external_data" ADD "date" date DEFAULT NULL;
ALTER TABLE "external_data" ADD "column1" text DEFAULT NULL;
ALTER TABLE "external_data" ADD "column2" text DEFAULT NULL;
ALTER TABLE "external_data" ADD "column3" text DEFAULT NULL;
INSERT INTO "external_data" ("date","column1","column2","column3") VALUES ('20170523','Berlin','Chrome','1'),('20170524','Berlin','Chrome','2');
return current_timestamp::text;
end;
$body$;
select count(do_the_import()) from generate_series(1,1000);
COMMIT;
Try it here: http://rextester.com/RPER86062

Postgres alter field type from float4 to float8 on huge table

I want to alter a column's data type from float4 to float8 on a table with a huge row count. If I do it the usual way, it takes a long time and my table is locked for that time.
Is there any hack to do it without rewriting the table content?
ALTER TABLE ... ALTER COLUMN ... TYPE ... USING ... (or related things like ALTER TABLE ... ADD COLUMN ... DEFAULT ... NOT NULL) requires a full table rewrite with an exclusive lock.
You can, with a bit of effort, work around this in steps:
ALTER TABLE thetable ADD COLUMN thecol_tmp newtype without NOT NULL.
Create a trigger on the table that, for every write to thecol, updates thecol_tmp as well, so new rows that are created, and rows that are updated, get a value for thecol_tmp as well as thecol (a sketch of this trigger and the batch backfill follows the list).
In batches by ID range, UPDATE thetable SET thecol_tmp = CAST(thecol AS newtype) WHERE id BETWEEN .. AND ..
Once all values are populated in thecol_tmp, ALTER TABLE thetable ALTER COLUMN thecol_tmp SET NOT NULL;
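A minimal sketch of steps 2 and 3 for the float4-to-float8 case from the question, assuming the table has an integer id primary key; the function and trigger names are made up:
-- step 2: keep thecol_tmp in sync for rows inserted or updated from now on
CREATE OR REPLACE FUNCTION thetable_sync_thecol_tmp()
RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
  NEW.thecol_tmp := NEW.thecol::float8;
  RETURN NEW;
END;
$$;
CREATE TRIGGER thetable_sync_thecol_tmp_trg
BEFORE INSERT OR UPDATE ON thetable
FOR EACH ROW EXECUTE PROCEDURE thetable_sync_thecol_tmp();
-- step 3: backfill existing rows in batches to keep each transaction short
UPDATE thetable SET thecol_tmp = thecol::float8 WHERE id BETWEEN 1 AND 100000;
UPDATE thetable SET thecol_tmp = thecol::float8 WHERE id BETWEEN 100001 AND 200000;
-- ...and so on until the whole id range is covered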
Now swap the columns and drop the trigger in a single tx:
BEGIN;
ALTER TABLE thetable DROP COLUMN thecol;
ALTER TABLE thetable RENAME COLUMN thecol_tmp TO thecol;
DROP TRIGGER whatever_trigger_name ON thetable;
COMMIT;
Ideally we'd have an ALTER TABLE ... ALTER COLUMN ... CONCURRENTLY that did this within PostgreSQL, but nobody's implemented that. Yet.

Flip flopping data tables in Postgres

I have a table of several million records which I am running a query against and inserting the results into another table which clients will query. This process takes about 20 seconds.
How can I run this query, building this new table without impacting any of the clients that might be running queries against the target table?
For instance, I'm running:
BEGIN;
DROP TABLE target_table;
SELECT blah, blahX, blahY
INTO target_table
FROM source_table
GROUP BY blahX, blahY;
COMMIT;
Which is then blocking queries to:
SELECT SUM(blah)
FROM target_table
WHERE blahX > x
From my days of working with some SQL Server DBAs, I recall them creating temporary tables and then flipping these in over the current table. Is this doable/practical in Postgres?
What you want here is to minimize the lock time, which of course is not going to work if you include a query that takes a while inside your transaction.
In this case, I assume you are in fact refreshing that target_table, which contains the positions of the "blah" objects, when you run your script, is that correct?
BEGIN;
CREATE TEMP TABLE temptable AS
SELECT blah, blahX, blahY
FROM source_table
GROUP BY blahX, blahY;
COMMIT;
BEGIN;
TRUNCATE TABLE target_table;
INSERT INTO target_table(blah,blahX,blahY)
SELECT blah,blahX,blahY FROM temptable;
DROP TABLE temptable;
COMMIT;
As mentioned in the comments, it will be faster to drop the indexes before truncating and re-create them just after loading the data, to avoid unneeded index maintenance during the load.
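A sketch of the second transaction with that optimization, assuming a single index on blahX (the index name and definition are illustrative):
BEGIN;
DROP INDEX IF EXISTS target_table_blahx_idx;
TRUNCATE TABLE target_table;
INSERT INTO target_table(blah,blahX,blahY)
SELECT blah,blahX,blahY FROM temptable;
CREATE INDEX target_table_blahx_idx ON target_table (blahX);
DROP TABLE temptable;
COMMIT;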
For the full details of what is and is not possible with PostgreSQL in that regard:
http://postgresql.1045698.n5.nabble.com/ALTER-TABLE-REPLACE-WITH-td3305036i40.html
There's ALTER TABLE ... RENAME TO ...:
ALTER TABLE name
RENAME TO new_name
Perhaps you could select into an intermediate table and then drop target_table and rename the intermediate table to target_table.
I have no idea how this would interact with any queries that may be running against target_table when you try to do the rename.
You can create a table, drop a table, and rename a table in every version of SQL I've ever used.
BEGIN;
SELECT blah, blahX, blahY
INTO new_table
FROM source_table
GROUP BY blahX, blahY;
DROP TABLE target_table;
ALTER TABLE new_table RENAME TO target_table;
COMMIT;
I'm not sure off the top of my head whether you could use a temporary table for this in PostgreSQL. PostgreSQL creates temp tables in a special schema; you don't get to pick the schema. But you might be able to create it as a temporary table, drop the existing table, and move it with SET SCHEMA.
At some point, any of these will require a table lock. (Duh.) You might be able to speed things up a lot by putting the swappable table on an SSD.
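Putting a table on an SSD is normally done with a tablespace; a minimal sketch, with a made-up tablespace name and mount path (the directory must already exist, be empty, and be owned by the postgres OS user):
CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pg_tblspc';
-- move the swappable table onto the SSD tablespace (this rewrites the table once)
ALTER TABLE target_table SET TABLESPACE fast_ssd;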