Is `CLUSTER` applied by `pg_dump`? - postgresql

If CLUSTER is set on a table, then is it applied by pg_dump?
Specifically, the following:
Is it used to order the rows in the dump? If not, is there a way to do this?
Is it set on the table when using pg_restore? If not, is there a way to do this?

The dump will contain the statement
ALTER TABLE mytable CLUSTER ON anindex;
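Restoring the dump will execute that statement. As the documentation explains,
"This form selects the default index for future CLUSTER operations. It does not actually re-cluster the table."
So the restored table remembers its clustering index, but the rows are not physically reordered until you run CLUSTER on the table yourself. pg_dump has no option to sort rows by that index either; it writes them in the order it reads them from the table.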
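A minimal sketch of the follow-up step after a restore, assuming the table is called mytable as in the answer above. CLUSTER without an index name reuses the index recorded by ALTER TABLE ... CLUSTER ON, and it holds an exclusive lock on the table while it runs:

```sql
-- Physically reorder the restored table using the index that the dump
-- recorded with ALTER TABLE mytable CLUSTER ON anindex.
CLUSTER mytable;

-- Refresh planner statistics after the rewrite.
ANALYZE mytable;
```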

Related

Migrating `int` to `bigint` in PostgreSQL without any downtime?

I have a database that is going to experience the integer exhaustion problem that Basecamp famously faced back in November. I have several months to figure out what to do.
Is there a no-downtime-required, proactive solution to migrating this column type? If so what is it? If not, is it just a matter of eating the downtime and migrating the column when I can?
Is this article sufficient, assuming I have several days/weeks to perform the migration now before I'm forced to do it when I run out of ids?
Use logical replication.
With logical replication you can have different data types at primary and standby.
Copy the schema with pg_dump -s, change the data types on the copy and then start logical replication.
Once all data is copied over, switch the application to use the standby.
For zero down time, the application has to be able to reconnect and retry, but that's always a requirement in such a case.
You need PostgreSQL v10 or later for that, and your database
should not modify the schema during the migration, as DDL is not replicated.
should not rely on sequences (SERIAL or IDENTITY), as the last used value is not replicated.
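As a rough sketch of the replication setup described above: the publication, subscription, table, and sequence names and the connection string are placeholders, not taken from the question. It assumes the source server runs with wal_level = logical and that the target already holds the schema with the column changed to bigint.

```sql
-- On the source (old) database: publish the table to be migrated.
CREATE PUBLICATION bigint_migration FOR TABLE mytable;

-- On the target (new) database, where mytable.id is already bigint:
-- the initial table copy starts automatically, then changes are streamed.
CREATE SUBSCRIPTION bigint_migration_sub
    CONNECTION 'host=old-db.example dbname=mydb user=replicator'
    PUBLICATION bigint_migration;

-- At cutover, on the target: sequence values are not replicated,
-- so advance the sequence past the highest copied id.
SELECT setval('mytable_id_seq', (SELECT max(id) FROM mytable));
```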
Another solution for pre-v10 databases where all transactions are short:
Add a bigint column to the table.
Create a BEFORE trigger that sets the new column whenever a row is added or updated.
Run a series of updates that set the new column from the old one where it IS NULL. Keep those batches short so you don't hold locks for long and don't deadlock much. Make sure these transactions run with session_replication_role = replica so they don't fire triggers.
Once all rows are updated, create a unique index CONCURRENTLY on the new column.
Add a unique constraint USING the index you just created. That will be fast.
Perform the switch:
BEGIN;
ALTER TABLE ... DROP oldcol;
ALTER TABLE ... RENAME newcol TO oldcol;
COMMIT;
That will be fast.
Your new column has no NOT NULL set. This cannot be done without a long invasive lock. But you can add a check constraint IS NOT NULL and create it NOT VALID. That is good enough, and you can later validate it without disruptions.
If there are foreign key constraints, things get a little more complicated. You have to drop these and create NOT VALID foreign keys to the new column.
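The preparatory steps above could look roughly like this. It is a sketch under assumptions, not a tested migration script: mytable, id, id_new, and every function, trigger, index, and constraint name are made up for illustration, and the batch range has to be repeated over the whole id space.

```sql
-- 1. Add the new column; nullable and without a default, so no table rewrite.
ALTER TABLE mytable ADD COLUMN id_new bigint;

-- 2. Keep the new column in sync for rows inserted or updated from now on.
CREATE FUNCTION mytable_sync_id_new() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    NEW.id_new := NEW.id;
    RETURN NEW;
END;
$$;

CREATE TRIGGER mytable_sync_id_new
    BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_sync_id_new();

-- 3. Backfill one short batch; repeat with successive id ranges.
BEGIN;
SET LOCAL session_replication_role = replica;  -- usually needs superuser
UPDATE mytable SET id_new = id
WHERE id_new IS NULL AND id BETWEEN 1 AND 10000;
COMMIT;

-- 4. Build the unique index without blocking writes
--    (must run outside a transaction block).
CREATE UNIQUE INDEX CONCURRENTLY mytable_id_new_idx ON mytable (id_new);

-- 5. Turn the index into a constraint; no new index build is needed.
ALTER TABLE mytable
    ADD CONSTRAINT mytable_id_new_key UNIQUE USING INDEX mytable_id_new_idx;

-- Stand-in for NOT NULL: add the check without scanning, validate later.
ALTER TABLE mytable
    ADD CONSTRAINT mytable_id_new_not_null CHECK (id_new IS NOT NULL) NOT VALID;
ALTER TABLE mytable VALIDATE CONSTRAINT mytable_id_new_not_null;
```

If the concurrent index build fails, it leaves an invalid index behind that has to be dropped before retrying.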
Create a copy of the old table but with a modified ID field. Next, create a trigger on the old table that inserts new data into both tables. Finally, copy the data from the old table to the new one (it would be a good idea to distinguish pre-trigger data from post-trigger data, for example by id if it is sequential). Once you are done, switch the tables and delete the old one.
This obviously requires twice as much space (and time for copy) but will work without any downtime.

PostgreSQL reset tables to original state

We have a large PostgreSQL dump with hundreds of tables that I can successfully import with pg_restore. We are developing software that inserts into a lot of these tables (~100), and for every run we need to return these tables to their original state (that is, to the content that was in the dump). Restoring the original dump again takes a lot of time and we just can't wait for half an hour before every debugging session. So I need a relatively fast way to revert these tables to the state they are in after restoring from the dump.
I've tried using pg_restore with -L switch and selecting these tables but I get either a duplicate key error when using both --data-only and --clean or a "cannot drop table X because other objects depend on it" error when using only --clean. Issuing a SET CONSTRAINTS ALL DEFERRED command before pg_restore did not work either. Maybe I have the rows in the table list all wrong, right now it's
491; 1259 39623998 TABLE public some_table some_user
8021; 0 0 COMMENT public TABLE some_table some_user
8022; 0 0 ACL public some_table some_user
for every table and then
6700; 0 39624062 TABLE DATA public some_table postgres
8419; 0 0 SEQUENCE SET public some_table_pk_id_seq some_user
for every table.
We only insert data and don't update existing rows so deleting all rows above an index and resetting the sequences might work, but I really don't want to have to manually create these commands for all the hundred tables and I'm not even sure it would work even if I set cascade to delete other objects depending on the given row.
Does anyone have any better idea how to handle this?
So you are looking for something like a snapshot in order to be able to revert quickly to a certain state.
I am not aware of a possibility in PostgreSQL to roll back to a certain timestamp.
While searching for a solution, I've found two ideas here
Use create database with the template option
Virtualize your PostgreSql installation using VMWare or VirtualBox, and use the snapshot feature of the virtual machines.
Again, both ideas are copied from the above source (I searched for "postgresql db snapshots").
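A minimal sketch of the template idea, assuming the dump is restored once into a database that is then left untouched. The names pristine_db and workdb are hypothetical. The copy is done at file level, so it is usually much faster than replaying a dump, but nobody may be connected to either database while the commands run.

```sql
-- Run while connected to a different database (for example postgres).
-- Throw away the database the last debugging session modified ...
DROP DATABASE IF EXISTS workdb;

-- ... and recreate it as a copy of the pristine restore.
CREATE DATABASE workdb TEMPLATE pristine_db;
```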
You can use PITR to create a snapshot before loads and use the PITR snapshot to take you back to any point that you have the logs for.

db2 reorganize a table

When I alter a table in DB2, I have to reorganize it,
so I execute the following statement:
Call Sysproc.admin_cmd ('reorg Table myTable');
I'm looking for an appropriate way to reorganize a table when it is altered, or to reorganize the whole schema after making various modifications.
You can determine when tables will require a REORG by looking at SYSIBMADM.ADMINTABINFO:
select tabschema, tabname
from sysibmadm.admintabinfo
where reorg_pending = 'Y'
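You may also want to look at the NUM_REORG_REC_ALTERS column, which counts the reorg-recommended ALTER TABLE operations performed on a table since its last reorganization. It may point to additional tables that are not yet flagged as reorg pending but have been changed by various ALTER TABLE statements.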
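Building on the query above, one way to avoid writing the REORG calls by hand is to let DB2 generate them. This is a sketch only: the generated statements still have to be reviewed and executed separately, and the double quotes assume the names should be used exactly as stored in the catalog.

```sql
-- Produce one ADMIN_CMD call per table that is in reorg-pending state.
SELECT 'CALL SYSPROC.ADMIN_CMD(''REORG TABLE "'
       || RTRIM(tabschema) || '"."' || RTRIM(tabname)
       || '"'');' AS reorg_call
FROM sysibmadm.admintabinfo
WHERE reorg_pending = 'Y';
```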
The reorg operation is similar to defragmenting a hard disk. It frees empty space in pages, and it can optionally reorganize the data according to an index. Depending on the features in use, it also creates the compression dictionary and compresses the data.
As you can see, reorg operation is an administrative task, and it is not necessary each time data is modified. A database could run without reorg.
In order to ease this, DB2 includes autonomic features such as automatic backup and automatic reorganization. However, this doesn't answer your own question: automatic reorganization will only trigger a reorg on tables that need one.
To reorg a table explicitly you need to execute the command reorg http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0001966.html
or via the admin_cmd http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0023582.html
In the DB2 database configuration we have:
Automatic reorganization (AUTO_REORG) = OFF
We can set AUTO_REORG to ON.

DB2 SQL Error: SQLCODE=-911, SQLSTATE=40001, SQLERRMC=68

I am getting this error when I run:
alter table tablename add column columnname varchar(1) default 'N';
DB2 SQL Error: SQLCODE=-911, SQLSTATE=40001, SQLERRMC=68
How to solve it?
Reason code 68 means the statement was rolled back because of a lock timeout. The ALTER statement wants to get an X lock on the table's row in SYSIBM.SYSTABLES. There is an open transaction that holds this row/index value in an incompatible lock state. The lock that caused the timeout could even come from an open cursor that reads this row with an RS or RR isolation level.
Terminate any other SQL currently trying to query SYSTABLES and any utilities that may be trying to update SYSTABLES like reorg and runstats then try the alter again.
See DB2 Info center (I picked the one for DB2 10, most likely this error code is the same in other versions, but doublecheck!).
It seems there is a transaction open on your table that prevents your ALTER command from executing.
After you have altered a table, you need to reorg it; read up on it here:
Run the runstats script, which is a DB2 script, at regular intervals and set the script to gather RUNSTATS WITH DISTRIBUTION AND DETAILED INDEXES ALL.
In addition to running the runstats scripts regularly, you can perform the following tasks to avoid the problem:
Use REOPT ONCE or REOPT ALWAYS with the command-line interface (CLI) packages to change the query optimization behavior.
In the DB2 database, change the table to make it volatile. Volatile tables indicate to the DB2 optimizer that the table cardinality can change significantly at run time (from empty to large and vice versa). Therefore, DB2 uses an index to access a table rather than a table scan.
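For illustration, the statistics and volatile-table suggestions above could be issued from SQL as follows. The schema and table names are placeholders; RUNSTATS needs the fully qualified table name.

```sql
-- Collect distribution and detailed index statistics for one table.
CALL SYSPROC.ADMIN_CMD(
    'RUNSTATS ON TABLE myschema.tablename WITH DISTRIBUTION AND DETAILED INDEXES ALL');

-- Mark the table as volatile so the optimizer prefers index access.
ALTER TABLE myschema.tablename VOLATILE CARDINALITY;
```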

PostgreSQL v7.4 ALTER TABLE to change column

I have a need to change the length of CHAR columns in tables in a PostgreSQL v7.4 database. This version did not support the ability to directly change the column type or size using the ALTER TABLE statement. So, directly altering a column from a CHAR(10) to CHAR(20) for instance isn't possible (yeah, I know, "use varchars", but that's not an option in my current circumstance). Anyone have any advice/tricks on how to best accomplish this? My initial thoughts:
-- Save the table's data in a new "save" table.
CREATE TABLE save_data AS SELECT * FROM table_to_change;
-- Drop the columns from the first column to be changed on down.
ALTER TABLE table_to_change DROP column_name1; -- for each column starting with the first one that needs to be modified
ALTER TABLE table_to_change DROP column_name2;
...
-- Add the columns back, using the new size for the CHAR column
ALTER TABLE table_to_change ADD column_name1 CHAR(new_size); -- for each column dropped above
ALTER TABLE table_to_change ADD column_name2...
-- Copy the data back from the "save" table
UPDATE table_to_change
SET column_name1=save_data.column_name1, -- for each column dropped/readded above
column_name2=save_data.column_name2,
...
FROM save_data
WHERE table_to_change.primary_key=save_data.primary_key;
Yuck! Hopefully there's a better way? Any suggestions appreciated. Thanks!
Not PostgreSQL, but in Oracle I have changed a column's type by:
Add a new column with a temporary name (e.g. TMP_COL) and the new data type (e.g. CHAR(20))
Run an update query: UPDATE TBL SET TMP_COL = OLD_COL;
Drop OLD_COL
Rename TMP_COL to OLD_COL
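Translated to PostgreSQL, the four steps above might look like the sketch below, reusing the table and column names from the question. It avoids a default or NOT NULL on the new column, on the assumption that 7.4 does not accept them in ADD COLUMN. Note that the resized column ends up as the last column of the table, and any index or constraint on the old column is dropped with it and must be recreated.

```sql
BEGIN;
-- Add the replacement column with the new size.
ALTER TABLE table_to_change ADD COLUMN tmp_col CHAR(20);
-- Copy the existing values across.
UPDATE table_to_change SET tmp_col = column_name1;
-- Swap the columns.
ALTER TABLE table_to_change DROP COLUMN column_name1;
ALTER TABLE table_to_change RENAME COLUMN tmp_col TO column_name1;
COMMIT;
```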
I would dump the table contents to a flat file with COPY, drop the table, recreate it with the correct column setup, and then reload (with COPY again).
http://www.postgresql.org/docs/7.4/static/sql-copy.html
Is it acceptable to have downtime while performing this operation? Obviously what I've just described requires making the table unusable for a period of time, how long depends on the data size and hardware you're working with.
Edit: But COPY is quite a bit faster than INSERTs and UPDATEs. According to the docs you can make it even faster by using BINARY mode. BINARY makes it less compatible with other PGSQL installs but you won't care about that because you only want to load the data to the same instance that you dumped it from.
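A rough sketch of the COPY round trip. The file path and the column list are invented for illustration, since the real table definition is not shown. Server-side COPY to a file needs superuser rights, and dropping the table also removes its indexes and anything else that depends on it.

```sql
-- Dump the rows to a flat file on the database server.
COPY table_to_change TO '/tmp/table_to_change.copy';

-- Recreate the table with the wider CHAR column (hypothetical definition).
DROP TABLE table_to_change;
CREATE TABLE table_to_change (
    primary_key  integer PRIMARY KEY,
    column_name1 CHAR(20)
);

-- Reload the saved rows into the new definition.
COPY table_to_change FROM '/tmp/table_to_change.copy';
```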
The best approach to your problem is to upgrade pg to something less archaic :)
Seriously. 7.4 is going to be removed from "supported versions" pretty soon, so I wouldn't wait for it to happen with 7.4 in production.