How to practically rename tables and columns in PostgreSQL on a production system? - postgresql

It often happens that the name something was originally given turns out not to be the best one. Maybe requirements shifted slightly, or maybe a better understanding of the concept being represented developed over time. Sometimes one name is used while developing a feature, but testing with real users reveals that a better name is needed, and it'd be nice to have the names for things in the DB match the names used in the UI.
PostgreSQL lets you rename tables and columns via alter table, but often this isn't feasible on a production system without significant downtime. If existing clients are using the old name, you can't just yank it out from under them.
I was hoping there was some way to "add a name" to a table or column, so that old_name and new_name will both work, and then at a later time remove the old name. Then a name migration could work like this:
add new_name
modify all clients to use new_name instead of old_name
once all clients have been updated, remove old_name
Is there a way to do this? If not, is there a recommended procedure for renaming a column while minimizing downtime? How about a table?

Here are some recipes for renaming tables and/or columns in a production system that seem to work. However, I've only tested these on a small test database, not on a large production system. Renaming and view creation are both supposed to be very fast, though.
Renaming a table
Rename the table, and temporarily add an updatable view with the old name:
begin;
alter table old_table rename to new_table;
create view old_table as select * from new_table;
commit;
Migrate clients to use the new table name.
Once all clients are migrated, drop the view:
drop view old_table;
Renaming columns without renaming the table
Renaming a column without renaming the table is a bit more complicated,
because we can't have a view shadow a table (apparently).
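For example, attempting to create a view with the same name as an existing table fails (a quick illustration, assuming a table named my_table already exists):
create view my_table as select 1 as dummy;
-- ERROR:  relation "my_table" already exists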
Rename the column(s), temporarily rename the table, and add an updatable
view that adds the old name for the column with the table's correct name:
begin;
alter table my_table rename column old_name to new_name;
alter table my_table rename to my_table_tmp;
create view my_table as
select *, new_name as old_name from my_table_tmp;
commit;
Migrate clients to use the new column name.
Once all clients are migrated, drop the view, rename the table back:
begin;
drop view my_table;
alter table my_table_tmp rename to my_table;
commit;
Renaming a table and some of its columns simultaneously
Rename the table and columns, and temporarily add an updatable view with the old name and old columns:
begin;
alter table old_table rename to new_table;
alter table new_table rename column old_name to new_name;
create view old_table as
select *, new_name as old_name from new_table;
commit;
Instead of select *, new_name as old_name, it might be better to only expose
the original set of columns, so clients have to migrate to the new table
name to get the new column names:
create view old_table as
select unchanged_name, new_name as old_name from new_table;
Migrate clients to use the new table and column names.
Once all clients are migrated, drop the view:
drop view old_table;

Related

add a column to a table which just references an existing column

Is there a way to add a column alias to an existing table, which just references another existing column in the table, such that reads and writes to the new column name go to the existing column? Sort of like how a view in Postgres can act as a read/write alias:
create view temp_order_contacts as (select * from order_emails)
This makes it possible to read from and write to the order_emails table, but by calling it temp_order_contacts instead.
Is there something similar but for columns?
Assuming this is for backwards compatibility: you want to rename a column, but you also want existing queries to keep working.
You can rename the table and create a view with the original name.
-- Move the existing table out of the way.
alter table some_table rename to _some_table;
-- Create a view in its place.
create view some_table as (
select
*,
-- provide a column alias
some_column as some_other_column
from _some_table
);

How to remove columns for real in postgresql?

I have a large system whose table schema is updated quite often. I noticed that after repeatedly dropping and re-adding columns, the "tables can have at most 1600 columns" limit is hit, even though only a few columns remain in information_schema.columns.
I've tried vacuum full analyze, but it doesn't help. Is there any way to avoid this limitation?
DO $$
DECLARE
    tbname varchar(1024);
BEGIN
    -- assumes a table named vacuum_test already exists
    FOR i IN 1..1599 LOOP
        tbname := 'alter table vacuum_test add column test' || CAST(i AS varchar(8)) || ' int';
        EXECUTE tbname;
    END LOOP;
END $$;
alter table vacuum_test drop column test1;
VACUUM FULL ANALYZE vacuum_test;
alter table vacuum_test add column test1 int;
result:
alter table vacuum_test add column test1 int
> ERROR: tables can have at most 1600 columns
> Time: 0.054s
Unfortunately vacuum full does not remove dropped columns from the table (i.e. entries that have attisdropped = true in pg_attribute). I would have expected it to, but apparently this does not happen.
The only way to get rid of the hidden columns is to create a brand new table and copy the data to the new table.
Something along the lines:
create table new_table (like old_table including all);
insert into new_table
select *
from old_table;
Then drop the old table and rename the new one to the old name. Constraints and indexes will have different names, so you might want to rename them as well.
You will have to re-create all foreign keys (incoming and outgoing) manually, as they are not included when using CREATE TABLE (LIKE ...).
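A minimal sketch of that swap, using the names from above; the commented-out rename is only illustrative, since the actual index and constraint names depend on what LIKE ... INCLUDING ALL generated:
begin;
drop table old_table;  -- add CASCADE if other tables still reference it (those FKs must be re-created anyway)
alter table new_table rename to old_table;
-- e.g. rename indexes that were created with names based on new_table:
-- alter index new_table_pkey rename to old_table_pkey;
commit;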
Another option is to use pg_repack which does this transparently in the background without locking the table.

Best practices for performing a table swap in Redshift

We're in the process of running a handful of hourly scripts on our Redshift cluster which build summary tables for data consumers. After assembling a staging table, the script then runs a transaction which deletes the existing table and replaces it with the staging table, as such:
BEGIN;
DROP TABLE IF EXISTS public.data_facts;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
COMMIT;
The problem with this operation is that long-running analysis queries will place an AccessShareLock on public.data_facts, preventing it from being dropped and thrashing our ETL cycle. I'm thinking a better solution would be one which renames the existing table, as such:
ALTER TABLE public.data_facts RENAME TO data_facts_old;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
DROP TABLE public.data_facts_old;
However, this approach presupposes that 1) public.data_facts exists, and 2) public.data_facts_old does not exist.
Do you know if there's a way to conduct this operation safely in SQL, without relying on application logic? (e.g. something like ALTER TABLE IF EXISTS).
I haven't tried it but looking at the documentation of CREATE VIEW it seems that this can be done with late-binding views.
The main idea would be a view public.data_facts that users interact with. Behind the scenes, you can load new data and then swap the view to “point” to the new table.
Bootstrap
-- load data into public.data_facts_v0
CREATE VIEW public.data_facts AS
SELECT * from public.data_facts_v0 WITH NO SCHEMA BINDING;
Update
-- load data into public.data_facts_v1
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * from public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
The WITH NO SCHEMA BINDING means the view will be late-binding. “A late-binding view doesn't check the underlying database objects, such as tables and other views, until the view is queried.” This means the update can even introduce a table with renamed columns or a completely new structure.
Notes:
It might be a good idea to wrap the swap operations into a transaction to make sure we don't drop the previous table if the VIEW swap failed.
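A sketch of that, assuming the _v0/_v1 naming from the example above and that your cluster allows both statements inside one transaction block:
begin;
create or replace view public.data_facts as
select * from public.data_facts_v1 with no schema binding;
drop table public.data_facts_v0;
commit;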
You can add a new load_time timestamp encode runlength default getdate() column to your target table, and make your ETL do this:
INSERT INTO public.data_facts
SELECT * FROM public.data_facts_staging;
DELETE FROM public.data_facts
WHERE load_time<(select max(load_time) from public.data_facts);
DROP TABLE public.data_facts_staging;
note: public.data_facts_staging should have exactly the same structure as public.data_facts except that the last column of public.data_facts is load_time, so that on insert it will be populated with the current timestamp.
The only implication is that it requires extra disk space for a moment between inserting the new rows and deleting the old ones, and load_time always has to be the last column. Also, you have to vacuum the table every time you do this.
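For reference, here is a sketch of what the target table could look like under this approach; the data columns are made up, the point is that load_time comes last and defaults to the load timestamp:
create table public.data_facts (
    fact_a    varchar(256),   -- hypothetical data columns
    fact_b    bigint,
    load_time timestamp encode runlength default getdate()
);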
Another good thing about this is that if your ETL fails and the staging table is empty, or there is no staging table, you won't lose your data. In the pure SQL scenario of swapping tables with DDL, you're not protected from dropping the target table when the staging table is missing. In the suggested scenario, if no new rows are inserted the delete statement deletes nothing (there are no rows with a load_time less than the maximum), so the worst case is just having the old version of the data.
P.S. There is a command that, instead of insert ... select ..., just changes the pointer from the staging table to the target table (alter table ... append from ...), but I guess it requires the same type of lock as alter table, so I don't suggest it.
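For reference, with the table names used above that command would look roughly like this; because the staging table here lacks the load_time column, the FILLTARGET option would be needed so that load_time is filled with its default:
alter table public.data_facts append from public.data_facts_staging filltarget;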

How to clone or copy records in same table in postgres?

How do you clone or copy records within the same table in PostgreSQL by creating a temporary table?
I'm trying to create clones of records from one table into the same table with a changed name (which is basically the composite key in that table).
You can do it all in one INSERT combined with a SELECT.
i.e. say you have the following table definition and data populated in it:
create table original
(
id serial,
name text,
location text
);
INSERT INTO original (name, location)
VALUES ('joe', 'London'),
('james', 'Munich');
And then you can INSERT doing the kind of switch you're talking about without using a TEMP TABLE, like this:
INSERT INTO original (name, location)
SELECT 'john', location
FROM original
WHERE name = 'joe';
Here's an sqlfiddle.
This should also be faster (although for tiny data sets probably not hugely so in absolute time terms), since it's doing only one INSERT and SELECT as opposed to an extra SELECT and CREATE TABLE plus an UPDATE.
Did a bit of research and came up with this logic:
Create temp table
Copy records into it
Update the records in temp table
Copy it back to original table
CREATE TEMP TABLE temporary AS SELECT * FROM ORIGINAL WHERE NAME='joe';
UPDATE temporary SET NAME='john' WHERE NAME='joe';
INSERT INTO ORIGINAL SELECT * FROM temporary WHERE NAME='john';
Was wondering if there was any shorter way to do it.

Flip flopping data tables in Postgres

I have a table of several million records which I am running a query against and inserting the results into another table which clients will query. This process takes about 20 seconds.
How can I run this query, building this new table without impacting any of the clients that might be running queries against the target table?
For instance. I'm running
BEGIN;
DROP TABLE target_table;
SELECT blah, blahX, blahY
INTO target_table
FROM source_table
GROUP BY blahX, blahY;
COMMIT;
Which is then blocking queries to:
SELECT SUM(blah)
FROM target_table
WHERE blahX > x
In the days of working with some SQL Server DBAs, I recall them creating temporary tables and then flipping these in over the current table. Is this doable/practical in Postgres?
What you want here is to minimize the lock time, which of course is not going to happen if you include a query that takes a while in your transaction.
In this case, I assume you're in fact refreshing that 'target_table', which contains the positions of the "blah" objects, when you run your script - is that correct?
BEGIN;
CREATE TEMP TABLE temptable AS
SELECT blah, blahX, blahY
FROM source_table
GROUP BY blahX, blahY;
COMMIT;
BEGIN;
TRUNCATE TABLE target_table;
INSERT INTO target_table(blah,blahX,blahY)
SELECT blah,blahX,blahY FROM temptable;
DROP TABLE temptable;
COMMIT;
As mentioned in the comments, it will be faster to drop the indexes before truncating and recreate them just after loading the data, to avoid the unneeded index updates.
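A sketch of that variant, using a hypothetical index on blahX named target_table_blahx_idx:
begin;
drop index if exists target_table_blahx_idx;
truncate table target_table;
insert into target_table(blah, blahX, blahY)
select blah, blahX, blahY from temptable;
create index target_table_blahx_idx on target_table (blahX);
drop table temptable;
commit;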
For the full details of what is and is not possible with PostgreSQL in that regard:
http://postgresql.1045698.n5.nabble.com/ALTER-TABLE-REPLACE-WITH-td3305036i40.html
There's ALTER TABLE ... RENAME TO ...:
ALTER TABLE name
RENAME TO new_name
Perhaps you could select into an intermediate table and then drop target_table and rename the intermediate table to target_table.
I have no idea how this would interact with any queries that may be running against target_table when you try to do the rename.
You can create a table, drop a table, and rename a table in every version of SQL I've ever used.
BEGIN;
SELECT blah, blahX, blahY
INTO new_table
FROM source_table
GROUP BY blahX, blahY;
DROP TABLE target_table;
ALTER TABLE new_table RENAME TO target_table;
COMMIT;
I'm not sure off the top of my head whether you could use a temporary table for this in PostgreSQL. PostgreSQL creates temp tables in a special schema; you don't get to pick the schema. But you might be able to create it as a temporary table, drop the existing table, and move it with SET SCHEMA.
At some point, any of these will require a table lock. (Duh.) You might be able to speed things up a lot by putting the swappable table on an SSD.