Flip flopping data tables in Postgres - postgresql

I have a table of several million records which I am running a query against and inserting the results into another table which clients will query. This process takes about 20 seconds.
How can I run this query, building this new table without impacting any of the clients that might be running queries against the target table?
For instance, I'm running:
BEGIN;
DROP TABLE target_table;
SELECT blah, blahX, blahY
INTO target_table
FROM source_table
GROUP BY blahX, blahY;
COMMIT;
This then blocks queries such as:
SELECT SUM(blah)
FROM target_table
WHERE blahX > x
Back when I worked with some SQL Server DBAs, I recall them creating temporary tables and then swapping them in place of the current table. Is this doable/practical in Postgres?

What you want here is to minimize the lock time, which of course is not going to happen if you include the slow query itself in the transaction that replaces the table.
In this case, I assume you're in fact refreshing that target_table, which contains the positions of the "blah" objects, every time you run your script. Is that correct?
BEGIN;
CREATE TEMP TABLE temptable AS
SELECT blah, blahX, blahY
FROM source_table
GROUP BY blahX, blahY;
COMMIT;
BEGIN;
TRUNCATE TABLE target_table;
INSERT INTO target_table(blah,blahX,blahY)
SELECT blah,blahX,blahY FROM temptable;
DROP TABLE temptable;
COMMIT;
As mentioned in the comments, it will be faster to drop the indexes before truncating and recreate them just after loading the data, to avoid unneeded index maintenance during the load.
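A minimal sketch of that pattern, reusing the temptable from above (the index name and definition are just an illustrative example):
BEGIN;
DROP INDEX IF EXISTS target_table_blahx_idx;      -- drop the index before the bulk load
TRUNCATE TABLE target_table;
INSERT INTO target_table (blah, blahX, blahY)
SELECT blah, blahX, blahY FROM temptable;
CREATE INDEX target_table_blahx_idx ON target_table (blahX);  -- rebuild it once, after loading
COMMIT;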
For the full details of what is and is not possible with PostgreSQL in that regard:
http://postgresql.1045698.n5.nabble.com/ALTER-TABLE-REPLACE-WITH-td3305036i40.html

There's ALTER TABLE ... RENAME TO ...:
ALTER TABLE name
RENAME TO new_name
Perhaps you could select into an intermediate table and then drop target_table and rename the intermediate table to target_table.
I have no idea how this would interact with any queries that may be running against target_table when you try to do the rename.

You can create a table, drop a table, and rename a table in every version of SQL I've ever used.
BEGIN;
SELECT blah, blahX, blahY
INTO new_table
FROM source_table
GROUP BY blahX, blahY;
DROP TABLE target_table;
ALTER TABLE new_table RENAME TO target_table;
COMMIT;
I'm not sure off the top of my head whether you could use a temporary table for this in PostgreSQL. PostgreSQL creates temp tables in a special schema; you don't get to pick the schema. But you might be able to create it as a temporary table, drop the existing table, and move it with SET SCHEMA.
At some point, any of these will require a table lock. (Duh.) You might be able to speed things up a lot by putting the swappable table on an SSD.

Related

How to practically rename tables and columns in PostgreSQL on a production system?

It often happens that the name something is originally given is not the best name. Maybe requirements shifted slightly, or maybe as time went on a better understanding of the concept being represented developed. Sometimes one name is used while developing a feature, but testing with real users reveals that a better name is needed, and it'd be nice to have the names of things in the DB match the names used in the UI.
PostgreSQL lets you rename tables and columns via alter table, but often this isn't feasible in a production system without significant downtime. If existing clients are using the old name, you can't just yank it out from under them.
I was hoping there was some way to "add a name" to a table or column, so that old_name and new_name will both work, and then at a later time remove the old name. Then a name migration could work like this:
add new_name
modify all clients to use new_name instead of old_name
once all clients have been updated, remove old_name
Is there a way to do this? If not, is there a recommended procedure for renaming a column while minimizing downtime? How about a table?
Here are some recipes for renaming tables and/or columns in a production system that seem to work. However, I've only tested these on a small test database, not on a large production system. Renaming and view creation are both supposed to be very fast, though.
Renaming a table
Rename the table, and temporarily add an updatable view with the old name:
begin;
alter table old_table rename to new_table;
create view old_table as select * from new_table;
commit;
Migrate clients to use the new table name.
Once all clients are migrated, drop the view:
drop view old_table;
Renaming columns without renaming the table
Renaming a column without renaming the table is a bit more complicated,
because we can't have a view shadow a table (apparently).
Rename the column(s), temporarily rename the table, and add an updatable
view that adds the old name for the column with the table's correct name:
begin;
alter table my_table rename column old_name to new_name;
alter table my_table rename to my_table_tmp;
create view my_table as
select *, new_name as old_name from my_table_tmp;
commit;
Migrate clients to use the new column name.
Once all clients are migrated, drop the view, rename the table back:
begin;
drop view my_table;
alter table my_table_tmp rename to my_table;
commit;
Renaming a table and some of its columns simultaneously
Rename the table and columns, and temporarily add an updatable view with the old name and old columns:
begin;
alter table old_table rename to new_table;
alter table new_table rename column old_name to new_name;
create view old_table as
select *, new_name as old_name from new_table;
commit;
Instead of select *, new_name as old_name, it might be better to only have
the original set of columns, so clients have to migrate to the new table
name to get the new column names:
create view old_table as
select unchanged_name, new_name as old_name from new_table;
Migrate clients to use the new table and column names.
Once all clients are migrated, drop the view:
drop view old_table;

How to remove columns for real in postgresql?

I have a large system whose table schemas are updated quite often. I noticed that after repeatedly dropping and recreating columns, the limitation "tables can have at most 1600 columns" is hit, even though only a few columns are shown in information_schema.columns.
I've tried VACUUM FULL ANALYZE, but it doesn't help. Is there any way to avoid this limitation?
-- Assumed setup (not shown in the original post): start from a table that already
-- has one column, so the loop below brings it up to the 1600-column limit.
create table vacuum_test (id int);

DO $$
DECLARE
    tbname varchar(1024);
BEGIN
    FOR i IN 1..1599 LOOP
        tbname := 'alter table vacuum_test add column test' || CAST(i AS varchar(8)) || ' int';
        EXECUTE tbname;
    END LOOP;
END $$;
alter table vacuum_test drop column test1;
VACUUM FULL ANALYZE vacuum_test;
alter table vacuum_test add column test1 int;
result:
alter table vacuum_test add column test1 int
> ERROR: tables can have at most 1600 columns
> Time: 0.054s
Unfortunately vacuum full does not remove dropped columns from the table (i.e. entries that have attisdropped = true in pg_attribute). I would have expected that, but apparently this does not happen.
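You can see those hidden entries, which still count toward the 1600-column limit, with a catalog query like this (using the vacuum_test table from the example above):
SELECT attnum, attname, attisdropped
FROM pg_attribute
WHERE attrelid = 'vacuum_test'::regclass
  AND attnum > 0      -- skip system columns
ORDER BY attnum;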
The only way to get rid of the hidden columns is to create a brand new table and copy the data to the new table.
Something along these lines:
create table new_table (like old_table including all);
insert into new_table
select *
from old_table;
Then drop the old table and rename the new one to the old name. Constraint and index names will be named differently, so you might want to rename them as well.
You will have to re-create all foreign keys (incoming and outgoing) manually, as they are not included when using CREATE TABLE (LIKE ...).
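A sketch of that swap, continuing the old_table/new_table names from above (the index rename is optional and the name shown is just a default-style example, so adjust it to your schema):
BEGIN;
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
-- optionally restore the original index/constraint names, e.g. for the copied primary key index:
ALTER INDEX new_table_pkey RENAME TO old_table_pkey;
COMMIT;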
Another option is to use pg_repack which does this transparently in the background without locking the table.

How to check whether table is busy or free before running the ALTER or creating TRIGGER on that table

We have thousands of tables, and a few of them are sometimes busy. If I execute an ALTER statement or create a trigger on one of those tables while it is busy, I am unable to do it. How can I check whether a table is busy or free before running an ALTER or creating a TRIGGER on it in a PostgreSQL database?
The easiest way would be to run
LOCK TABLE mytable NOWAIT;
If you get no error, the ALTER TABLE statement can proceed without waiting.
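Note that the lock is only held until the end of the current transaction, so for the check to be useful, run it in the same transaction as the DDL. A minimal sketch (the added column is just an illustrative example):
BEGIN;
LOCK TABLE mytable NOWAIT;                       -- errors immediately if the table is currently in use
ALTER TABLE mytable ADD COLUMN new_col integer;  -- illustrative DDL; any ALTER or CREATE TRIGGER works here
COMMIT;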
The query below returns the locked objects in a database:
select t.relname, l.locktype, page, virtualtransaction, pid, mode, granted
from pg_locks l, pg_stat_all_tables t
where l.relation=t.relid
order by relation asc;

How can I ensure synchronous DDL operations on a table that is being replaced?

I have multiple processes which are continually refreshing data in Redshift. They start a transaction, create a new table, COPY all the data from S3 into the new table, then drop the old table and rename the new table to the old table.
pseudocode:
start transaction;
create table foo_temp;
copy into foo_temp from S3;
drop table foo;
rename table foo_temp to foo;
commit;
I have several dozen tables that I update in this way. This works well but I would like to have multiple processes performing these table updates for redundancy purposes and to ensure that data is fairly fresh (different processes can update the data for different tables concurrently).
It works fine unless one process attempts to refresh a table that another process is working on. In that case the second process gets blocked by the first until it commits, and when it commits the second process gets the error:
ERROR: table 12345 dropped by concurrent transaction
Is there a simple way for me to guarantee that only one of my processes is refreshing a table so that the second process doesn't get into this situation?
I considered creating a special lock table for each of my real tables. The process would LOCK the special lock table before working on the companion real table. I think that will work but I would like to avoid creating a special lock table for each of my tables.
You need to protect readers from seeing the drop. Do this by renaming rather than dropping inside the transaction (see the SQL sketch after the steps):
begin transaction
rename main table to old_main_table
rename tmp table to main table
commit
drop table old_main_table
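In plain SQL, using the question's foo/foo_temp names (foo_old is just an illustrative name for the renamed old table), that sequence looks roughly like this:
BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_temp RENAME TO foo;
COMMIT;
-- outside the transaction, once the swap is committed:
DROP TABLE foo_old;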
Conn #1                                               Conn #2
--------------                                        ------------------------------------------
> create table bar (id int,id2 int,id3 int);
CREATE TABLE
> begin;
BEGIN
                                                      > begin;
                                                      BEGIN
> alter table bar rename to bar2;
ALTER TABLE
                                                      > select * from bar;
> create table bar (id int,id2 int,id3 int,id4 int);
CREATE TABLE
> commit; drop table bar2;
COMMIT
                                                       id | id2 | id3
                                                      ----+-----+-----
                                                      (0 rows)
                                                      > commit;
                                                      COMMIT
DROP TABLE

Copy Postgres table while maintaining primary key autoincrement

I am trying to copy a table with this Postgres command; however, the primary key's autoincrement feature does not copy over. Is there any quick and simple way to accomplish this? Thanks!
CREATE TABLE table2 AS TABLE table;
Here's what I'd do:
BEGIN;
LOCK TABLE oldtable;
CREATE TABLE newtable (LIKE oldtable INCLUDING ALL);
INSERT INTO newtable SELECT * FROM oldtable;
SELECT setval('the_seq_name', (SELECT max(id) FROM oldtable)+1);
COMMIT;
... though this is a moderately unusual thing to need to do and I'd be interested in what problem you're trying to solve.
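If you don't know the sequence name offhand, you can look it up with pg_get_serial_sequence rather than hard-coding 'the_seq_name' (a small sketch, assuming the autoincrementing column is id, as in the max(id) call above):
SELECT setval(pg_get_serial_sequence('oldtable', 'id'),
              (SELECT max(id) FROM oldtable) + 1);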