In PostgreSQL 9.2.1, I'm running a VACUUM FREEZE in single-user mode. Partway through, the vacuuming process suddenly got stuck (it simply froze while vacuuming some transactions).
What could cause the vacuuming process to get stuck halfway through?
How can I prevent it?
I am running PostgreSQL 9.6 on CentOS 7.
The database was migrated from a PostgreSQL 9.4 server that did not have the issue.
With autovacuum on, Postgres constantly uses 100% of one core (10% of total CPU). With autovacuum off it uses no CPU other than when executing queries.
Is this expected or normal, or is something bad going on? Note that it is a very big database, with many schemas/tables.
I tried
vacuumdb --all -w
and
ANALYZE VERBOSE;
The ANALYZE VERBOSE; made the database run a lot faster, but did not change the CPU usage.
I have an issue that is driving me nuts. Despite all my efforts, I am not able to force my Postgres server to shut down. I have followed these instructions: http://www.question-defense.com/2008/10/17/pg_ctl-server-does-not-shut-down-force-postgres-to-shutdown
but still nothing happens, and all I get in the shell is
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
Any help much appreciated.
Update: Checking the logs, I see this recurring error:
LOG: checkpoints are occurring too frequently (25 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
After giving it a lot of thought, especially about how I installed it in the first place, I realized that I had set up the install so that a launch daemon starts Postgres when my machine boots. Thus, any manual kill simply results in the daemon recreating those processes.
To resolve the problem you need to stop the daemon using launchctl and remove the .plist file in your Postgres directory.
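On macOS with a Homebrew-style install, for example, that might look roughly like this; the plist name and location are assumptions and depend on how Postgres was installed:
# stop the launch agent so it no longer restarts Postgres at login/boot
launchctl unload -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
# then remove (or move aside) the .plist so it is not loaded again
rm ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist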
Good luck if you face the same problem.
You are probably running with the default setting of checkpoint_segments = 3, which produces these warnings. Your database does many writes, right? It takes some time to write all of this to disk, and your database is quite busy rotating WAL segments instead of doing real work.
If you increase checkpoint_segments, you will see better performance and less I/O.
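For example, in postgresql.conf (checkpoint_segments exists only up to 9.4; the values below are just a starting point to tune):
# postgresql.conf -- allow more WAL before a checkpoint is forced
checkpoint_segments = 16              # default is 3
checkpoint_completion_target = 0.9    # spread checkpoint writes over more time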
For further reading: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
What is the expected behavior of PostgreSQL processes when the system time is changed?
For example: we have seen a case where the Postgres autovacuum process stops working after the time is changed, and the only fix is to restart the database.
I am running an UPDATE on a large table (e.g. 8 GB). It is a simple update of 3 fields in the table. I had no problem running it under PostgreSQL 9.1; it would take 40-60 minutes, but it worked. I run the same query on a 9.4 database (freshly created, not upgraded) and it starts the update fine but then slows down. It uses only ~2% CPU, the level of I/O is 4-5 MB/s, and it just sits there. There are no locks, no other queries or connections, just this single UPDATE statement on the server.
The SQL is below. The "lookup" table has 12 records. The lookup can return only one row; it breaks a discrete scale (SMALLINT, -32768 .. +32767) into non-overlapping regions. The "src" and "dest" tables have ~60 million records each.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = (SELECT lookup.id FROM lookup WHERE src.value BETWEEN lookup.min AND lookup.max)
FROM src
WHERE dest.id = src.id;
I thought my disk had slowed down, but I can copy 1 GB files in parallel with the query and they copy fast at >40 MB/s, and I have only one disk (it is a VM with iSCSI media). All other disk operations are unaffected; there is plenty of I/O bandwidth. At the same time, PostgreSQL is just sitting there doing very little, running very slowly.
I have 2 virtualized Linux servers; one runs PostgreSQL 9.1 and the other runs 9.4. Both servers have close to identical PostgreSQL configurations.
Has anyone else had a similar experience? I am running out of ideas. Help.
Edit
The query "ran" for 20 hours I had to kill the connections and restart the server. Surprisingly it didn't kill the connection via query:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND datname = current_database();
and the server produced the following log:
2015-05-21 12:41:53.412 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT STATEMENT: UPDATE <... this is 60,000,000 record table update statement>
The server restart also took a long time, producing the following log:
2015-05-21 12:43:36.730 EDT LOG: received fast shutdown request
2015-05-21 12:43:36.730 EDT LOG: aborting any active transactions
2015-05-21 12:43:36.730 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.734 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.747 EDT LOG: autovacuum launcher shutting down
2015-05-21 12:44:36.801 EDT LOG: received immediate shutdown request
2015-05-21 12:44:36.815 EDT WARNING: terminating connection because of crash of another server process
2015-05-21 12:44:36.815 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
"The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory" - is this an indication of a bug in PostgreSQL?
Edit
I tested 9.1, 9.3 and 9.4. Neither 9.1 nor 9.3 experiences the slowdown; 9.4 consistently slows down on large transactions. I noticed that when a transaction starts, htop indicates high CPU usage and the process status is "R" (running). It then gradually changes to low CPU usage and status "D" (waiting on disk; see screenshot). My biggest question is why 9.4 is different from 9.1 and 9.3. I have a dozen servers and this effect is observed across the board.
Thanks everyone for the help. No matter how much I tried to emphasize the difference in performance between identically configured 9.4 and earlier versions, no one seemed to pay attention to it.
The problem was solved by disabling transparent huge pages:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
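Note that these echo commands do not persist across reboots. One way to make the setting permanent (the exact mechanism depends on the distribution and boot loader) is to add it to the kernel command line:
# /etc/default/grub -- append to the existing options, then regenerate the
# grub config (e.g. grub2-mkconfig -o /boot/grub2/grub.cfg) and reboot
GRUB_CMDLINE_LINUX="... transparent_hugepage=never"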
Here are some resources I found helpful in researching the issue:
* https://dba.stackexchange.com/questions/32890/postgresql-pg-stat-activity-shows-commit/34169#34169
* https://lwn.net/Articles/591723/
* https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
I'd suspect a lot of disk seeking - 5 MB/s is just about right for very random I/O on an ordinary (spinning) hard drive.
As you are basically replacing all of your rows, I'd try setting the dest table's fillfactor to about 45% (ALTER TABLE dest SET (fillfactor = 45);) and then running CLUSTER dest USING dest_pkey;. This allows updated row versions to be placed on the same disk sector.
Additionally, running CLUSTER src USING src_pkey; so that both tables have their data in the same physical order on disk can also help.
Also remember to VACUUM dest; after every update that large, so old row versions can be reused in subsequent updates.
Your old server probably evolved its fillfactor naturally over multiple updates. On the new server the table is packed to 100%, so updated rows have to be placed at the end.
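Putting those steps together, roughly (the *_pkey index names are assumptions based on default primary-key naming):
-- leave free space in each page so updated row versions can stay close to the old ones
ALTER TABLE dest SET (fillfactor = 45);
CLUSTER dest USING dest_pkey;
CLUSTER src USING src_pkey;
-- reclaim dead row versions after each large update so the space can be reused
VACUUM dest;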
If only a few of the target rows are actually updated, you can avoid generating new row versions by using IS DISTINCT FROM. This can prevent a lot of useless disk traffic.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = lu.id
FROM src
JOIN lookup lu ON src.value BETWEEN lu.min AND lu.max
WHERE dest.id = src.id
-- avoid unnecessary row versions to be generated
AND (dest.field1 IS DISTINCT FROM src.field1
OR dest.field2 IS DISTINCT FROM src.field2
OR dest.field3_id IS DISTINCT FROM lu.id
)
;
I wrote a bad infinite-loop recursive query while I was learning recursive Postgres queries, and it has affected the whole Postgres server.
Every database request takes an infinitely long time, giving me timeouts. I tried /etc/init.d/postgresql restart, and while that seems to have helped a little, queries still run slowly.
Are the unfinished queries cached even after the Postgres server restart? How can I escape this sticky situation? :(
Thank you ammoQ and Tometzky for your answers. It turned out the Postgres process was cleaning up a lot of stuff; everything got back to normal after a couple of minutes.
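For anyone hitting the same thing: one way to keep a runaway recursive query from dragging the whole server down again is a statement timeout. A sketch (the database name and the timeout values are assumptions to adjust):
-- cancel any statement in this database that runs longer than 5 minutes
ALTER DATABASE mydb SET statement_timeout = '5min';
-- or only for the current session while experimenting with recursive queries
SET statement_timeout = '1min';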