Running Postgres 9.5. I have a large table that I'm running ALTER TABLE table SET UNLOGGED on. I had already dropped all foreign key constraints targeting the table, since a table referenced by a foreign key from a logged table can't be made unlogged. The query took about 20 minutes and consumed 100% CPU the whole time. I can understand it taking a long time to make a table logged, but making it unlogged doesn't seem difficult... or is it?
Is there anything I could do to make it faster to set a table unlogged?
SET UNLOGGED involves a table rewrite, so for a large table, you can expect it to take quite a while.
As you said, it doesn't seem like making a table UNLOGGED should be that difficult. And simply converting the table isn't that difficult; the complicating factor is the need to make it crash-safe. An UNLOGGED table has an additional file associated with it (the init fork), and there's no way to synchronise the creation of this file with the rest of the commit.
So instead, SET UNLOGGED builds a copy of the table, with an init fork attached, and then swaps in the new relfilenode, which the commit can handle atomically. A more efficient implementation would be possible, but not without changing the representation of unlogged tables (which predate SET UNLOGGED by quite a while) or the logic behind COMMIT itself, both of which were deemed too intrusive for this relatively minor feature. You can read the discussion behind the design on the pgsql-hackers list.
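You can actually watch that swap happen by checking the table's relfilenode before and after the conversion (my_table here is just a placeholder name):
SELECT relfilenode FROM pg_class WHERE relname = 'my_table';
ALTER TABLE my_table SET UNLOGGED;
SELECT relfilenode FROM pg_class WHERE relname = 'my_table';  -- a different value: the rewritten copy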
If you really need to minimise downtime, you could take a similar approach to that taken by SET UNLOGGED: create a new UNLOGGED table, copy all of the records across, briefly lock the old table while you sync the last few changes, and swap the new table in with a RENAME when you're done.
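A rough sketch of that, assuming a table called big_table and some application-specific way (a change-log trigger, a modified-at column, or simply a write freeze) to catch rows that change during the copy:
-- 1. Bulk copy outside any long-held lock (reads a snapshot of big_table)
CREATE UNLOGGED TABLE big_table_new (LIKE big_table INCLUDING ALL);
INSERT INTO big_table_new SELECT * FROM big_table;
-- 2. Brief outage window: block writers, catch up, swap names
BEGIN;
LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;
-- ...apply whatever changed since step 1 (application-specific)...
ALTER TABLE big_table RENAME TO big_table_old;
ALTER TABLE big_table_new RENAME TO big_table;
COMMIT;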
Related
I have a script which imports data from CSV files and loads it into database tables. The tables are partitioned by a column such as customer_id. The script first loads the data into a staging table, checks unique key/FK constraints, and deletes the rows that violate them.
Then it drops the existing partition from the main table and adds the staging table as a partition in its place. My question: is this the best approach to importing the data? The other approach I can think of is to not use partitions at all: just use the staging table and, after cleaning the data, import it into the main table after deleting the existing data.
If you can load data partitionwise, that is going to be better.
Deleting many rows in a PostgreSQL table is painful, and DROP TABLE will always win.
Keep in mind that in PostgreSQL, a DELETE doesn't actually remove data from the table -- it merely flags the rows as invisible to later transactions. Later, when the table is vacuumed, those invisible rows are flagged as re-usable for future INSERTs and UPDATEs, but the space is only returned to the operating system when a VACUUM FULL is run. Therefore, "import[ing] it into the main table after deleting the existing data" would cause bloat.
A DROP TABLE will immediately reclaim space. As such, it would make more sense to work with partitions and drop partitions to reclaim space.
More information about this behavior can be found in the documentation.
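As a sketch of the partition-swap route, with declarative partitioning (PostgreSQL 10+) and made-up names where sales is partitioned by customer_id and staging_cust_42 holds the cleaned data for one customer:
BEGIN;
ALTER TABLE sales DETACH PARTITION sales_cust_42;
DROP TABLE sales_cust_42;                          -- space comes back immediately, no vacuum needed
ALTER TABLE sales ATTACH PARTITION staging_cust_42 FOR VALUES IN (42);
COMMIT;
On older releases with inheritance-based partitioning, the equivalent is ALTER TABLE ... NO INHERIT / INHERIT on the child tables.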
I have the following column in a PostgreSQL database:
column | character varying(10) | not null default 'default'::character varying
I want to drop it, but the database is huge, and if it blocks updates for an extended period of time I will be publicly flogged, and likely drawn and quartered. I found a blog post from Braintree, here, which suggests this is a safe operation, but it's a little vague.
The ALTER TABLE command needs to acquire an ACCESS EXCLUSIVE lock on the table, which will block everything trying to access that table, including SELECTs, and, as the name implies, needs to wait for existing operations to finish so it can be exclusive.
So, if your table is extremely busy, it may not get an opportunity to actually acquire the exclusive lock, and will simply block for what is functionally forever.
It also depends on whether this column has a lot of indexes and dependencies. If there are dependencies (e.g. foreign keys or views), you'll need to add CASCADE to the DROP COLUMN, and this will increase the work that needs to be done and the amount of time the exclusive lock is held.
So, it's not risk free. However, you should know fairly quickly after trying it whether it's likely to block for a long time. If you can try it and safely take a minute or two of potentially blocking that table, it's worth a shot -- try the drop and see. If it doesn't complete within a relatively short period of time, abort the command and you'll likely need to schedule some downtime of at least the app(s) that are hammering the table. (You can take a look at the server activity and the lock activity to try to surmise what's hammering that table.)
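One way to bound that risk is to set a lock_timeout (available since 9.3) before attempting the drop, so the ALTER gives up instead of queueing behind long-running transactions; the names below are placeholders:
SET lock_timeout = '5s';                         -- give up if the lock can't be acquired quickly
ALTER TABLE busy_table DROP COLUMN old_column;
RESET lock_timeout;
If it times out, nothing has changed and you can simply retry at a quieter moment.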
Does DROP COLUMN block a PostgreSQL database?
The answer to that is no: it does not block the whole database.
However, any DDL statement requires an exclusive lock on the table being changed, which means no other transaction can access that table while the statement runs. So the table is "blocked", not the database.
However, dropping a column is very fast, because the column isn't physically removed from the table; it is only marked in the catalog as no longer there.
And don't forget to commit the DDL statement (if you have turned autocommit off), otherwise the table will stay locked until you commit your change.
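If you want to convince yourself that it really is only a catalog change, you can look at pg_attribute after the drop (table and column names are placeholders):
ALTER TABLE my_table DROP COLUMN old_column;
-- The column's entry is still there, just flagged as dropped:
SELECT attname, attisdropped
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass AND attnum > 0;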
I am trying to add a new column to a table with upwards of 9 million records.
The issue is that the column needs to have a default value of 'N'. When updating the table, the database is running out of temp space, and the update is taking a huge amount of time.
I was wondering if anyone knows of any way to make this faster, or of a better way of doing this that avoids filling up the temp space.
The database is Oracle10g.
If you could move to 11g and the column were NOT NULL, Oracle has an optimization where the default value doesn't need to be stored in each row, so you could add the column very quickly. Unfortunately, it sounds like you're stuck with a deprecated version of Oracle where that isn't available.
Most likely, you don't have a lot of really good options other than waiting. It may be more efficient, assuming you're doing this during a period of downtime, to create a new table with the new column, do a direct-path insert of all the data from the old table into the new table, rename the tables, and re-point any constraints at the new table. Whether this is actually more efficient than waiting for the update will depend on your hardware and your table, but an INSERT is likely to be more efficient than an UPDATE. On the other hand, for a new single-character column that isn't going to create a lot of migrated rows, you're probably better off waiting for the UPDATE rather than going to this level of effort -- there are a lot of things that could potentially go wrong that you'd need to test and validate (e.g. making sure that you updated all the constraints correctly).
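A hedged sketch of that copy-and-swap approach in Oracle, with made-up names and deliberately ignoring the indexes, grants, and constraints you would still have to recreate:
-- CTAS does a direct-path load and adds the new column in a single pass
CREATE TABLE big_table_new AS
SELECT t.*, 'N' AS new_flag
FROM   big_table t;
-- Swap names during the downtime window
RENAME big_table TO big_table_old;
RENAME big_table_new TO big_table;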
I have seen this answer, How to apply PostgreSQL UNLOGGED feature to an existing table?, which basically suggests that the way to convert a table to unlogged is to run:
CREATE UNLOGGED TABLE your_table_unlogged AS SELECT * FROM your_table;
Is this still the case? Because while this is an obvious working solution, for a large table there are potential time and disk space factors that could come into play. And if it is, could someone please explain briefly how the architecture of Postgres means you need to rewrite an entire table in order to make it unlogged?
Update: In PostgreSQL 9.5+ there is ALTER TABLE ... SET LOGGED and ... SET UNLOGGED
Converting from UNLOGGED to LOGGED requires the whole table's data to be written to the WAL if wal_level is above minimal, so that replicas get a copy. So it's not free, but it can still be worth creating a table unlogged, populating it, and then setting it LOGGED if you have a bunch of cleanup, deletion and merging work to do on the table after the initial load.
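That pattern looks roughly like this (placeholder names and file path; SET LOGGED needs 9.5+):
CREATE UNLOGGED TABLE staging_import (id bigint, payload text);
COPY staging_import FROM '/path/to/data.csv' WITH (FORMAT csv);  -- or \copy from psql if you're not superuser
-- ...cleanup, dedup and merge work here, with no WAL overhead...
ALTER TABLE staging_import SET LOGGED;   -- the whole table is now written to WAL (and reaches replicas)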
Yes, that's still the case in 9.4.
Converting from logged to UNLOGGED isn't theoretically hard, AFAIK, but nobody's done the work to implement it. The main thing is that all constraints, types, etc. referring to the table must be re-checked to make sure there's no reference from another logged table to this one. Most attention has been paid to the other direction, so if this feature is important to you, consider funding its development or getting involved in the development yourself.
Converting UNLOGGED to logged may become possible for nodes that aren't involved in streaming replication or using an archive_command. It's not simple otherwise because of the need to cope with the fact that the data for the table wasn't sent, but suddenly changes to it are - the replication protocol would need further enhancement to allow the table to be base-copied before continuing.
Apparently this (ALTER TABLE ... SET LOGGED | UNLOGGED) has been implemented in (the upcoming) PostgreSQL 9.5.
I created a set of partitioned tables in Postgres and started inserting a lot of rows via the master table. When the load process blew up on me, I realized I should have declared the id column BIGSERIAL (BIGINT with a sequence, behind the scenes), but inadvertently set it as SERIAL (INTEGER). Now that I have a couple of billion rows loaded, I am trying to ALTER the column to BIGINT. The process seems to be working, but it is taking a long time, so in reality I don't know whether it is working or hung. I'd rather not restart the entire load process again.
Any suggestions?
When you update a row in PostgreSQL, that writes out a new copy of the row and then does some cleanup later to remove the original. This means that trying to fix the problem by doing updates can take longer than just loading all the data in from scratch again -- it's more disk I/O than loading a new copy, plus some extra processing time. The only situation where you'd want to do an update instead of a reload is when the original load was very inefficient, for example if a slow client program is inserting the data and it's the bottleneck in the process.
To figure out whether the process is still working, see if it's using CPU when you run top (UNIX-ish systems) or Task Manager (Windows). On Linux, "top -c" will even show you what the PostgreSQL backend processes are doing. You probably just expected it to take less time than the original load, which it won't; it's still running rather than hung up.
Restart it (clarifying edit: restart the entire load process again).
Altering a column value requires a new row version, and requires all indexes pointing to the old version to be updated to point to the new one.
Additionally, see how much of the advice on populating databases you can follow.
Correction from #archnid:
Altering the type of the column will trigger a table rewrite, so the row versioning isn't a big problem, but it will still take a lot of disk space temporarily. You can usually monitor progress by looking at which files in the database directory are being appended to...
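To find which files to watch, you can ask for the table's on-disk path before kicking off the ALTER (placeholder table name) and then keep an eye on that directory from the OS:
SELECT pg_relation_filepath('my_big_table');   -- e.g. base/16384/16723, relative to the data directory
-- While the rewrite runs, a new set of relfilenode files appears in that same
-- directory and keeps growing; their size gives a rough progress indicator.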