I have a process running that can not be killed safely:
autovacuum: VACUUM public.mytable (to prevent wraparound)
This table has been cleared (aside from some entries that can not be deleted due to the table's corruption during a hardware issue) and can not be dropped, because the vacuum is blocking this. I had to run a kill -9 to stop this process and restarted the database, but you can't disable this autovacuum (to prevent [transaction] wraparound), so the autovacuum is coming back up and immediately getting stuck by this corrupt table.
Any insight into this?
First of all, shut down the database server and make a physical copy of the data directory in a safe place.
Then you could truncate the datafile of the corrupted table. E.g.:
--Get datafile path (run this while the server is still up)
db=# SELECT pg_relation_filepath('corrupted_table');
pg_relation_filepath
----------------------
base/1234/56789
(1 row)
With the server stopped, enter the database directory (e.g. data/base/1234)
Rename the file to 56789_bkp
Create an empty file called 56789: touch 56789
Start the database server
Issue a TRUNCATE to force PostgreSQL to overwrite the datafile: TRUNCATE TABLE corrupted_table;
You may want to VACUUM and make a backup afterwards
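The steps above can be sketched as a small script that only prints the commands, so they can be reviewed before anything is touched. This is a rough sketch, not a tested procedure: the function name, the data directory path, the `_copy` suffix and the database name `db` are placeholders. It also assumes the table fits in a single datafile; larger tables have extra segment files (56789.1, ...) and `_fsm`/`_vm` forks that this does not handle.

```shell
#!/bin/sh
# Prints (does not run) the commands for the datafile reset described above.
# Arguments are placeholders: data directory, path returned by
# pg_relation_filepath(), table name, database name.
plan_datafile_reset() {
    data_dir=$1
    rel_path=$2
    table=$3
    db=$4
    file="$data_dir/$rel_path"

    echo "pg_ctl -D $data_dir stop -m fast"        # shut down the server
    echo "cp -a $data_dir ${data_dir}_copy"        # physical copy of the data directory
    echo "mv $file ${file}_bkp"                    # keep the corrupted datafile
    echo "touch $file"                             # empty replacement file
    echo "pg_ctl -D $data_dir start"               # start the server again
    echo "psql -d $db -c 'TRUNCATE TABLE $table;'" # let PostgreSQL rewrite the datafile
}

plan_datafile_reset /var/lib/pgsql/data base/1234/56789 corrupted_table db
```

Run the printed commands as the operating system user that owns the data directory, so the new empty file gets the same ownership as the other datafiles.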
Hope this helps.
I have PostgreSQL 10 installed on my VM. I want to test how recovery from WAL files works.
I have set up the following scenario.
I opened a transaction to insert records into a table, e.g. employee:
begin;
insert into employee select generate_series(1,200000), 'abc', 1;
I deliberately did not commit it, so that I could try recovering it from the WAL files and understand the crash recovery process.
Please let me know which steps I need to take to recover from the WAL files the lost data that was only in memory. Keep in mind that I have no streaming replication; it is a standalone single machine.
Thanks
Should I go with this below command to recover the tablespace in DEV environment, or is there a better solution?
db2 "backup database DEV tablespace (xyz) online to /dev/null"
If a table space is backup pending, it's usually due to some event that requires a new backup point, such as after a LOAD .. COPY NO. In such a situation it is best advised that you take a new backup to an actual location to save the image for future recoveries through this point in time.
If not, you could be exposed to data loss until a new backup including this table space is completed.
Thanks.
A few of the tables and indexes are bloated even though autovacuum is enabled.
To reclaim the space, I ran VACUUM FULL on the larger tables and also reindexed the larger indexes. Now the size of the database is under control.
After performing the VACUUM FULL and reindexing on the larger tables, I am facing the error below.
org.postgresql.util.PSQLException: Error could not open file "base/16384/19048": No such file or directory
Please guide me on how to resolve the above error, and let me know whether it has any relation to the VACUUM FULL or reindexing operations I performed.
You should connect to the database with the OID 16384 and find out what that file is:
SELECT oid::regclass FROM pg_class WHERE relfilenode = 19048;
The error should be independent of VACUUM or REINDEX runs. Did you experience OS crashes recently?
The error indicates data corruption; a file is missing in your data directory.
The easiest option is to restore from backup.
Other than that, you can delete the object that belongs to the missing file to restore integrity (while potentially suffering data loss).
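For a relation in the default tablespace, the path in the error message has the form base/<database OID>/<relfilenode>. As a small illustration, the helper below (a hypothetical name, not part of PostgreSQL) splits such a path and prints the two catalog queries to run: the first in any database, the second in the database it identifies.

```shell
#!/bin/sh
# Splits a path like base/16384/19048 into database OID and relfilenode
# and prints the catalog queries that identify both.
describe_relpath() {
    path=$1
    filenode=${path##*/}   # last component: relfilenode
    rest=${path%/*}        # strip the last component
    db_oid=${rest##*/}     # component before it: database OID

    echo "SELECT datname FROM pg_database WHERE oid = $db_oid;"
    echo "SELECT oid::regclass FROM pg_class WHERE relfilenode = $filenode;"
}

describe_relpath base/16384/19048
```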
I run an update on a large table (e.g. 8 GB). It is a simple update of 3 fields in the table. I had no problems running it under PostgreSQL 9.1; it would take 40-60 minutes but it worked. I run the same query on a 9.4 database (freshly created, not upgraded) and it starts the update fine but then slows down. It uses only ~2% CPU, the level of IO is 4-5 MB/s, and it just sits there. No locks, no other queries or connections, just this single update SQL on the server.
The SQL is below. "lookup" table has 12 records. The lookup can return only one row, it breaks a discrete scale (SMALLINT, -32768 .. +32767) into non-overlapping regions. "src" and "dest" tables are ~60 million records.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = (SELECT lookup.id FROM lookup WHERE src.value BETWEEN lookup.min AND lookup.max)
FROM src
WHERE dest.id = src.id;
I thought my disk slowed down but I can copy 1 GB files in parallel to query execution and it runs fast at >40MB/s and I have only one disk (it is a VM with ISCSI media). All other disk operations are not impacted, there is plenty of IO bandwidth. At the same time PostgreSQL is just sitting there doing very little, running very slowly.
I have 2 virtualized linux servers, one runs postgresql 9.1 and another runs 9.4. Both servers have close to identical postgresql configuration.
Has anyone else had similar experience? I am running out of ideas. Help.
Edit
The query "ran" for 20 hours; I had to kill the connections and restart the server. Surprisingly, this query did not kill the connection:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND datname = current_database();
and the server produced the following log:
2015-05-21 12:41:53.412 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT STATEMENT: UPDATE <... this is 60,000,000 record table update statement>
The server restart also took a long time, producing the following log:
2015-05-21 12:43:36.730 EDT LOG: received fast shutdown request
2015-05-21 12:43:36.730 EDT LOG: aborting any active transactions
2015-05-21 12:43:36.730 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.734 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.747 EDT LOG: autovacuum launcher shutting down
2015-05-21 12:44:36.801 EDT LOG: received immediate shutdown request
2015-05-21 12:44:36.815 EDT WARNING: terminating connection because of crash of another server process
2015-05-21 12:44:36.815 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
"The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory" - is this an indication of a bug in PostgreSQL?
Edit
I tested 9.1, 9.3 and 9.4. Neither 9.1 nor 9.3 experiences the slowdown; 9.4 consistently slows down on large transactions. I noticed that when a transaction starts, the htop monitor shows high CPU and the process status is "R" (running). Then it gradually changes to low CPU usage and status "D" - disk (see screenshot). My biggest question is why 9.4 is different from 9.1 and 9.3. I have a dozen servers and this effect is observed across the board.
Thanks everyone for the help. No matter how much I tried to emphasize the difference in performance between identically configured 9.4 and earlier versions, no one seemed to pay attention to that.
The problem was solved by disabling transparent huge pages:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
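To check what is currently in effect before or after the change: the kernel marks the active value in these files with square brackets (e.g. `always madvise [never]`). A minimal sketch, with `thp_active` being a made-up helper name:

```shell
#!/bin/sh
# Prints the bracketed (currently active) value from a THP sysfs file.
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active "$f")"
    fi
done
```

Note that values written to /sys this way do not survive a reboot, so the two echo commands have to be reapplied at boot (how to do that depends on the distribution).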
Here are some resources I found helpful in researching the issue:
* https://dba.stackexchange.com/questions/32890/postgresql-pg-stat-activity-shows-commit/34169#34169
* https://lwn.net/Articles/591723/
* https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
I'd suspect a lot of disk seeking - 5MB/s is just about right for a very random IO on ordinary (spinning) hard drive.
As you constantly replace basically all your rows I'd try to set dest table fillfactor to about 45% (alter table dest set (fillfactor=45);) and then cluster dest using dest_pkey;. This would allow updated row versions to be placed in the same page as the originals.
Additionally, running cluster src using src_pkey; so that both tables have their data in the same physical order on disk can also help.
Also remember to run vacuum dest; after every update that large, so old row versions can be reused in subsequent updates.
Your old server probably evolved its fillfactor naturally over multiple updates. On the new server the table is packed 100%, so updated rows have to be placed at the end.
If only a few of the target rows are actually changed, you can avoid generating new row versions by using IS DISTINCT FROM. This can prevent a lot of useless disk traffic.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = lu.id
FROM src
JOIN lookup lu ON src.value BETWEEN lu.min AND lu.max
WHERE dest.id = src.id
-- avoid generating unnecessary row versions
AND (dest.field1 IS DISTINCT FROM src.field1
OR dest.field2 IS DISTINCT FROM src.field2
OR dest.field3_id IS DISTINCT FROM lu.id
)
;
We do nightly full backups of our db and I then use that dump to create my own dev-db. The creation of the dev-db takes roughly 10 minutes, so it's scheduled every morning by cron before I get to work. So I can now work with an almost live db.
But when I'm testing things it would sometimes be convenient to rollback the full db or just some specific tables to the initial backup. Of course I could do the full recreation of the dev-db but that would make me wait for another 10 minutes before I could run the tests again.
So is there an easy way to restore/rewind the database/table to a specific point in time or from a dump?
I have tried to use pg_restore like this to restore specific tables:
pg_restore -d my-dev-db -n stuff -t tableA -t tableB latest-live-db.dump
I have also tried options like -c and --data-only. But there seem to be several issues here that I did not foresee:
The old data is not automatically removed when the restored data is copied back.
There are several foreign-key constraints that make this impossible (correct me if I'm wrong) without explicitly removing the FKs before the restore and then adding them back again.
PK sequences getting out of order do not concern me at all at this point, but that might be an issue as well.
Edit: more things I tested/looked into:
pg_basebackup
A more brute force alternative to pg_basebackup is to stop the db-server, copy the db-files, then start the db-server.
Both of the alternatives above fail because I have several local databases running in the same cluster and that sums up to a lot of data on disk. There is no way to separate the databases this way! So the file copy action here will not give me any speed gain.
I'm assuming you are asking about a database, not a cluster. The first thing that comes to my mind is to restore the backup to 2 different dbs, one named dev_db and the other with another name like dev_db_backup. Then when you need a fresh db, drop dev_db and rename dev_db_backup to dev_db with
drop database if exists dev_db;
alter database dev_db_backup rename to dev_db;
After that, to have another source to rename from, restore the backup to dev_db_backup again. This could be done by a script so the dropping, renaming and restoring would be automated. As dropping and renaming are instantaneous just start the script and the renaming is done without a need to wait for the new restore.
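Such a script could look roughly like this. It is only a sketch: the function name and the `RUN` dry-run switch are made up for illustration, the dump file name is taken from the question, and it assumes nobody is connected to dev_db or dev_db_backup at the time (dropping or renaming a database fails while it has open connections).

```shell
#!/bin/sh
# Dry run by default: RUN=echo only prints the commands.
# Set RUN to an empty string to actually execute them.
RUN=${RUN-echo}

swap_dev_db() {
    dump=$1
    # Swap in the spare copy; these two steps are near-instant.
    $RUN dropdb --if-exists dev_db
    $RUN psql -d postgres -c "ALTER DATABASE dev_db_backup RENAME TO dev_db"
    # Rebuild the spare copy for next time; this is the slow part.
    $RUN createdb dev_db_backup
    $RUN pg_restore -d dev_db_backup "$dump"
}

swap_dev_db latest-live-db.dump
```

Since the restore is the last step, dev_db is usable as soon as the rename has finished; the script can keep running in the background while you test.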
If you often need repeated restores at intervals of less than 10 minutes, I think you could try doing your work inside a transaction:
begin;
-- alter the db
-- test the alterations
commit; -- or ...
-- rollback;