Postgres: alter table to unlogged - postgresql

I have seen this answer, How to apply PostgreSQL UNLOGGED feature to an existing table?, which basically suggests that the way to convert a table to unlogged is to run:
CREATE UNLOGGED TABLE your_table_unlogged AS SELECT * FROM your_table;
Is this still the case? While this is an obvious working solution, for a large table there are potential time and disk-space costs that could come into play. And if it is still the case, could someone please explain briefly how the architecture of Postgres means you need to rewrite an entire table in order to make it unlogged?

Update: In PostgreSQL 9.5+ there are ALTER TABLE ... SET LOGGED and ALTER TABLE ... SET UNLOGGED
Converting from UNLOGGED to LOGGED requires that the whole table's data be written to the xlogs (WAL) if wal_level is above minimal, so that replicas get a copy. So it's not free, but it can still be worth creating a table unlogged, populating it, and then setting it logged if you have a bunch of cleanup, deletion and merging work to do on the table after the initial load.
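That load-then-flip workflow can be sketched as follows (the table, columns and file path are illustrative, not from the question):

```sql
-- Create the staging table without WAL logging for a fast bulk load.
CREATE UNLOGGED TABLE measurements (
    id        bigint,
    reading   numeric,
    taken_at  timestamptz
);

-- Bulk load and do cleanup/merging while writes are cheap.
COPY measurements FROM '/tmp/measurements.csv' (FORMAT csv);
DELETE FROM measurements WHERE reading IS NULL;

-- Flip to logged once the data is final (9.5+); this writes the
-- whole table to the WAL if wal_level is above minimal.
ALTER TABLE measurements SET LOGGED;
```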
Yes, that's still the case in 9.4.
Converting from logged to UNLOGGED isn't theoretically hard AFAIK, but nobody's done the work to do it. The main thing is that all constraints, types, etc. referring to the table must be re-checked to make sure there's no reference from another logged table to this one. Most attention has been paid to the other case, so if this feature is important to you, consider funding its development or getting involved in development yourself.
Converting UNLOGGED to logged may become possible for nodes that aren't involved in streaming replication or using an archive_command. It's not simple otherwise because of the need to cope with the fact that the data for the table wasn't sent, but suddenly changes to it are - the replication protocol would need further enhancement to allow the table to be base-copied before continuing.

Apparently this (alter table ... set logged | unlogged) has been implemented in (the upcoming) postgresql 9.5.

Related

Any disadvantage to leaving a read only table in unlogged mode after bulk loading?

Loading a large table (150GB+) from Teradata into PostgreSQL 9.6 on AWS. Conventional load was extremely slow, so I set the table to "unlogged" and the load is going much faster. Once loaded, this database will be strictly read-only for archival purposes. Is there any need to alter it back to logged once loaded? I understand that process takes a lot of time and would like to avoid it if possible. We will be taking a backup of the data once this table is loaded and the data verified.
Edit: I should note I am using the copy command reading from a named pipe for this.
If you leave the table UNLOGGED, it will be empty after PostgreSQL recovers from a crash. So that is a bad idea if you need these data.

Postgres SET UNLOGGED takes a long time

Running Postgres-9.5. I have a large table that I'm doing ALTER TABLE table SET UNLOGGED on. I already dropped all foreign key constraints targeting the table since FK-referred tables can't be unlogged. The query took about 20 minutes and consumed 100% CPU the whole time. I can understand it taking a long time to make a table logged, but making it unlogged doesn't seem difficult... but is it?
Is there anything I could do to make it faster to set a table unlogged?
SET UNLOGGED involves a table rewrite, so for a large table, you can expect it to take quite a while.
As you said, it doesn't seem like making a table UNLOGGED should be that difficult. And simply converting the table isn't that difficult; the complicating factor is the need to make it crash-safe. An UNLOGGED table has an additional file associated with it (the init fork), and there's no way to synchronise the creation of this file with the rest of the commit.
So instead, SET UNLOGGED builds a copy of the table, with an init fork attached, and then swaps in the new relfilenode, which the commit can handle atomically. A more efficient implementation would be possible, but not without changing the representation of unlogged tables (which predate SET UNLOGGED by quite a while) or the logic behind COMMIT itself, both of which were deemed too intrusive for this relatively minor feature. You can read the discussion behind the design on the pgsql-hackers list.
If you really need to minimise downtime, you could take a similar approach to that taken by SET UNLOGGED: create a new UNLOGGED table, copy all of the records across, briefly lock the old table while you sync the last few changes, and swap the new table in with a RENAME when you're done.
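A rough sketch of that manual swap, assuming a table named events (all names here are illustrative, and the "sync the last few changes" step depends entirely on your workload):

```sql
-- Build an unlogged copy with the same structure.
CREATE UNLOGGED TABLE events_new (LIKE events INCLUDING ALL);

-- Bulk-copy the existing rows while the old table stays live.
INSERT INTO events_new SELECT * FROM events;

-- Briefly lock the old table, sync any rows that arrived since
-- the copy (workload-specific), then swap names atomically.
BEGIN;
LOCK TABLE events IN ACCESS EXCLUSIVE MODE;
-- ... copy the last few changes here ...
ALTER TABLE events RENAME TO events_old;
ALTER TABLE events_new RENAME TO events;
COMMIT;
```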

is it safe to enable autovacuum for a table in PostgreSQL

I'm a newbie in PostgreSQL (version 9.2) database development. While looking at one of my tables I saw an option called autovacuum.
Many of my tables contain 20,000+ rows. For testing purposes I've altered one of those tables like below:
ALTER TABLE theTable SET (
autovacuum_enabled = true
);
So, I wish to know: what are the benefits/disadvantages (if any) of autovacuuming a table?
Autovacuum is enabled by default in current versions of Postgres (and has been for a while). It's generally a good thing to have enabled for performance and other reasons.
Prior to autovacuuming, you would need to explicitly vacuum tables yourself (via cronjobs which executed psql commands to vacuum them, or similar) in order to get rid of dead tuples, etc. Postgres has for a while now managed this for you via autovacuum.
In some cases, with tables that have immense churn (i.e. very high rates of insertions and deletions), I have found it necessary to still explicitly vacuum via cron in order to keep the dead-tuple count low and performance high, because autovacuum doesn't kick in fast enough; but this is something of a niche case.
More info: http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html
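For such high-churn tables, another option is to tune autovacuum per table rather than vacuuming from cron; a sketch (the threshold values are illustrative, not recommendations):

```sql
-- Make autovacuum kick in sooner on a high-churn table: vacuum when
-- dead tuples exceed 1% of the table plus 500 rows, instead of the
-- default 20% plus 50.
ALTER TABLE busy_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 500
);
```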

What is supported as transactional in postgres

I am trying to find out what Postgres can handle safely inside of a transaction, but I cannot find the relevant information in the Postgres manual. So far I have found out the following:
UPDATE, INSERT and DELETE are fully supported inside transactions and rolled back when the transaction is not finished
DROP TABLE is not handled safely inside a transaction, and is undone with a CREATE TABLE, which recreates the dropped table but does not repopulate it
CREATE TABLE is likewise not truly transactional and is instead undone with a corresponding DROP TABLE
Is this correct? Also I could not find any hints as to the handling of ALTER TABLE and TRUNCATE. In what way are those handled and are they safe inside transactions? Is there a difference of the handling between different types of transactions and different versions of postgres?
DROP TABLE is transactional. To undo this, you need to issue a ROLLBACK not a CREATE TABLE. The same goes for CREATE TABLE (which is also undone using ROLLBACK).
ROLLBACK is always the only correct way to undo a transaction - that includes ALTER TABLE and TRUNCATE.
The only thing that is never transactional in Postgres are the numbers generated by a sequence (CREATE/ALTER/DROP SEQUENCE themselves are transactional though).
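A minimal illustration of both points (a scratch session; names are illustrative):

```sql
BEGIN;
CREATE TABLE t (id int);
DROP TABLE t;            -- both the CREATE and the DROP are transactional
ROLLBACK;                -- ... so neither ever happened

-- Sequences are the exception: values handed out by nextval()
-- are not returned on rollback.
CREATE SEQUENCE s;
BEGIN;
SELECT nextval('s');     -- returns 1
ROLLBACK;
SELECT nextval('s');     -- returns 2, not 1 again
```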
As best I'm aware, all of these commands are transaction-aware, except for TRUNCATE ... RESTART IDENTITY (and even that has been transactional since 9.1).
See the manual on concurrency control and transaction-related commands.

How to prevent Write Ahead Logging on just one table in PostgreSQL?

I am considering log-shipping of Write Ahead Logs (WAL) in PostgreSQL to create a warm-standby database. However I have one table in the database that receives a huge amount of INSERT/DELETEs each day, but which I don't care about protecting the data in it. To reduce the amount of WALs produced I was wondering, is there a way to prevent any activity on one table from being recorded in the WALs?
Ran across this old question, which now has a better answer. Postgres 9.1 introduced "Unlogged Tables", which are tables that don't log their DML changes to WAL. See the docs for more info, but at least now there is a solution for this problem.
See Waiting for 9.1 - UNLOGGED tables by depesz, and the 9.1 docs.
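With 9.1+, the high-activity table can simply be declared (or recreated) as unlogged, so its changes never hit the WAL; a sketch with an illustrative structure:

```sql
-- Changes to this table generate no WAL, so they are not shipped to
-- the standby (the table is unavailable there), and the table is
-- truncated on the primary after a crash.
CREATE UNLOGGED TABLE scratch_events (
    id       bigserial PRIMARY KEY,
    payload  text
);
```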
Unfortunately, I don't believe there is. The WAL logging operates on the page level, which is much lower than the table level and doesn't even know which page holds data from which table. In fact, the WAL files don't even know which pages belong to which database.
You might consider moving your high activity table to a completely different instance of PostgreSQL. This seems drastic, but I can't think of another way off the top of my head to avoid having that activity show up in your WAL files.
To offer one option to my own question: there are temp tables - "temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)" - which I think don't generate WALs. Even so, this might not be ideal, as the table creation and design will have to be in the code.
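For completeness, a temporary table with transaction-scoped lifetime looks like this (the ON COMMIT clause is what the quoted docs refer to; names are illustrative):

```sql
BEGIN;
-- Visible only to this session; dropped automatically at COMMIT.
CREATE TEMPORARY TABLE staging (
    id   int,
    val  text
) ON COMMIT DROP;

INSERT INTO staging VALUES (1, 'a');
-- ... work with staging ...
COMMIT;  -- staging disappears here
```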
I'd consider memcached for use-cases like this. You can even spread the load over a bunch of cheap machines too.