Connection lost after query runs for a few minutes in PostgreSQL

I am using PostgreSQL 8.4 and PostGIS 1.5. What I'm trying to do is INSERT data from one table into another (but not strictly the same data). For each column a few queries are run, and there are a total of 50143 rows stored in the table. But the query is quite resource-heavy: after it has run for a few minutes, the connection is lost. It's happening about 21-22k ms into the execution of the query, after which I have to start the DBMS manually again. How should I go about solving this issue?
The error message is as follows:
[Err] server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Additionally, here is the psql error log:
2013-07-03 05:33:06 AZOST WARNING: terminating connection because of crash of another server process
2013-07-03 05:33:06 AZOST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-07-03 05:33:06 AZOST HINT: In a moment you should be able to reconnect to the database and repeat your command.

My guess, reading your problem, is that you are hitting out-of-memory issues. Craig's suggestion to turn off overcommit is a good one. You may also need to reduce work_mem if this is a big query. This may slow down your query but it will free up memory. work_mem is per sort/hash operation, so a single query can use many times that setting.
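For example, you could lower it only for the session running the big INSERT rather than server-wide; a minimal sketch (the 16MB value is purely illustrative):
-- Lower the per-operation memory budget for this session only
SET work_mem = '16MB';
-- ... run the big INSERT ... SELECT here ...
-- Closing the session (or RESET work_mem;) restores the server default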
Another possibility is that you are hitting some sort of bug in a C-language module in PostgreSQL. If that is the case, try updating to the latest version of PostGIS and any other C extensions you use.

Related

Postgres: some process is cleaning all my table records

I have Postgres installed on a CentOS server, and another application uses Postgres to save data.
For some time now, and I can't find the reason, all the database tables become empty on the weekends.
I have been searching a lot to try to find some clue about the reason for this behaviour, but the logs are not giving me that info.
I am pretty sure the application is not executing anything to clean the records; my thoughts point to some process on the Postgres side.
The pg_log only shows this warning the day it happens:
LOG: checkpoints are occurring too frequently (11 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
Apart from that I have no other clues.
Performing a VACUUM ANALYZE VERBOSE says there is no dead data, so it has nothing to delete.
Can you tell me where I should look to find the reason? Could some Postgres process be doing this?
LOG: checkpoints are occurring too frequently (11 seconds apart)
This log message should also include all the information that log_line_prefix tells it to include. So you should set log_line_prefix to include more information, like the application name (voluntarily supplied by the client), the database user name, and the host name/IP the connection came from.
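A sketch of such a setting in postgresql.conf (which escapes you include is up to you; %a is only available on reasonably recent versions):
# %m = timestamp, %p = PID, %u = user, %d = database,
# %h = client host/IP, %a = application name
log_line_prefix = '%m [%p] user=%u db=%d host=%h app=%a '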
But perhaps more directly at issue, if things are connecting to your database and doing things you don't understand or approve of, it is time to change your passwords.

Terminating connection because of crash of another server process

Every time I run the same query I get this error:
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
I'm using phpPgAdmin 5.1.
A concurrent query in a different database session has crashed its server backend process. As a consequence, the whole database server resets: it terminates all connections and performs crash recovery from the latest checkpoint.
You should look into the database server log to see what the problem is.
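If you are not sure where that log is, the server itself can tell you; a sketch (run from any SQL client, e.g. phpPgAdmin):
-- Where is logging going?
SHOW log_destination;
SHOW logging_collector;
-- If the collector is on, these locate the log files
SHOW log_directory;   -- relative paths are under the data directory
SHOW log_filename;
SHOW data_directory;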

PostgreSQL large table update slows down

I run an update on a large table (e.g. 8 GB). It is a simple update of 3 fields in the table. I had no problems running it under PostgreSQL 9.1; it would take 40-60 minutes, but it worked. I run the same query on a 9.4 database (freshly created, not upgraded) and it starts the update fine but then slows down. It uses only ~2% CPU, the level of IO is 4-5 MB/s, and it is just sitting there. No locks, no other queries or connections, just this single update SQL on the server.
The SQL is below. The "lookup" table has 12 records. The lookup can return only one row: it breaks a discrete scale (SMALLINT, -32768 .. +32767) into non-overlapping regions. The "src" and "dest" tables are ~60 million records.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = (SELECT lookup.id FROM lookup WHERE src.value BETWEEN lookup.min AND lookup.max)
FROM src
WHERE dest.id = src.id;
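For reference, the lookup table has roughly this shape (a hypothetical reconstruction; only the column names id, min and max appear in the query above):
-- Hypothetical 12-row lookup table: non-overlapping [min, max]
-- ranges partitioning the SMALLINT scale
CREATE TABLE lookup (
    id  integer PRIMARY KEY,
    min smallint NOT NULL,
    max smallint NOT NULL
);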
I thought my disk had slowed down, but I can copy 1 GB files in parallel with the query execution and it runs fast at >40 MB/s, and I have only one disk (it is a VM with iSCSI media). All other disk operations are not impacted; there is plenty of IO bandwidth. At the same time PostgreSQL is just sitting there doing very little, running very slowly.
I have 2 virtualized Linux servers, one running PostgreSQL 9.1 and the other running 9.4. Both servers have close to identical PostgreSQL configurations.
Has anyone else had similar experience? I am running out of ideas. Help.
Edit
The query "ran" for 20 hours I had to kill the connections and restart the server. Surprisingly it didn't kill the connection via query:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND datname = current_database();
and the server produced the following log:
2015-05-21 12:41:53.412 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT STATEMENT: UPDATE <... this is 60,000,000 record table update statement>
Also the server restart took a long time, producing the following log:
2015-05-21 12:43:36.730 EDT LOG: received fast shutdown request
2015-05-21 12:43:36.730 EDT LOG: aborting any active transactions
2015-05-21 12:43:36.730 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.734 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.747 EDT LOG: autovacuum launcher shutting down
2015-05-21 12:44:36.801 EDT LOG: received immediate shutdown request
2015-05-21 12:44:36.815 EDT WARNING: terminating connection because of crash of another server process
2015-05-21 12:44:36.815 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
"The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory" - is this an indication of a bug in PostgreSQL?
Edit
I tested 9.1, 9.3 and 9.4. Both 9.1 and 9.3 don't experience the slowdown. 9.4 consistently slows down on large transactions. I noticed that when a transaction starts, htop indicates high CPU and the process status is "R" (running). Then it gradually changes to low CPU usage and status "D" (uninterruptible disk sleep). My biggest question is: why is 9.4 different from 9.1 and 9.3? I have a dozen servers and this effect is observed across the board.
Thanks everyone for the help. No matter how much I tried to emphasize the difference in performance between identical configurations of 9.4 and the previous versions, no one seemed to pay attention to that.
The problem was solved by disabling transparent huge pages:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
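Note that these echo commands only last until the next reboot. To make the change permanent, one common approach (a sketch; the right place is distro-specific) is to re-apply them from a boot script, or pass transparent_hugepage=never on the kernel command line:
# /etc/rc.local (or an equivalent boot script) -- re-apply at every boot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag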
Here are some resources I found helpful in researching the issue:
* https://dba.stackexchange.com/questions/32890/postgresql-pg-stat-activity-shows-commit/34169#34169
* https://lwn.net/Articles/591723/
* https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
I'd suspect a lot of disk seeking - 5 MB/s is just about right for very random IO on an ordinary (spinning) hard drive.
As you constantly replace basically all your rows, I'd try setting the dest table's fillfactor to about 45% (alter table dest set (fillfactor=45);) and then cluster dest using dest_pkey;. This allows updated row versions to be placed in the same disk pages as the originals.
Additionally, cluster src using src_pkey; so that both tables have their data in the same physical order on disk; that can also help.
Also remember to vacuum dest; after every update that large, so old row versions can be reused by subsequent updates.
Your old server probably evolved its fillfactor naturally over multiple updates. On the new server the table is packed 100% full, so updated rows have to be placed at the end.
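Putting those pieces together, a sketch of the maintenance sequence (it assumes the primary-key indexes are named dest_pkey and src_pkey, and that you can afford the exclusive locks CLUSTER takes):
-- Leave ~55% of each page free so updated row versions can stay
-- on the same page as the originals
ALTER TABLE dest SET (fillfactor = 45);
-- Rewrite both tables in primary-key order (exclusive lock, needs disk space)
CLUSTER dest USING dest_pkey;
CLUSTER src USING src_pkey;
-- After each large update, make old row versions reusable
VACUUM dest;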
If only a few of the target rows are actually updated, you can avoid generating new row versions by using IS DISTINCT FROM. This can prevent a lot of useless disk traffic.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = lu.id
FROM src
JOIN lookup lu ON src.value BETWEEN lu.min AND lu.max
WHERE dest.id = src.id
-- avoid unnecessary row versions to be generated
AND (dest.field1 IS DISTINCT FROM src.field1
OR dest.field2 IS DISTINCT FROM src.field2
OR dest.field3_id IS DISTINCT FROM lu.id
)
;

psql seems to time out with long queries

I am performing a bulk copy into postgres with about 80GB of data.
\copy my_table FROM '/path/csv_file.csv' csv DELIMITER ','
Before the transaction is committed I get the following error.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
In the PostgreSQL logs:
LOG: server process (PID 21122) was terminated by signal 9: Killed
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
Your backend process received a signal 9 (SIGKILL). This can happen if:
Somebody sends a kill -9 manually;
A cron job is set up to send kill -9 under some circumstances (very unsafe, do not do this); or
The Linux out-of-memory (OOM) killer triggers and terminates the process.
In the latter case you will see reports of OOM killer activity in the kernel's dmesg output. I expect this is what you'll see in your case.
PostgreSQL servers should be configured without virtual memory overcommit so that the OOM killer does not run and PostgreSQL can handle out-of-memory conditions itself. See the PostgreSQL documentation on Linux memory overcommit.
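On Linux that typically comes down to a kernel setting like the following (a sketch; see the linked documentation for details, including tuning vm.overcommit_ratio):
# /etc/sysctl.conf -- disable memory overcommit; apply with: sysctl -p
vm.overcommit_memory = 2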
The separate question of why this is using so much memory remains. Answering that requires more knowledge of your setup: how much RAM the server has, how much of it is free, what your work_mem and maintenance_work_mem settings are, etc. It isn't a very interesting problem to look into until you upgrade to the current PostgreSQL 8.4 patch release to make sure the problem isn't one that's already been fixed.

Timeout of remote connection to PostgreSQL

I have two EC2 instances; one of them needs to insert large amounts of data into a PostgreSQL db that lives on the other one. Incidentally, it's OpenStreetMap data and I am using the osm2pgsql utility to insert it; not sure if that's relevant, I don't think so.
For smaller inserts everything is fine, but for very large inserts something times out after around 15 minutes and the operation fails with:
COPY_END for planet_osm_point failed: SSL SYSCALL error: Connection timed out
Is this timeout enforced by PostgreSQL, Ubuntu, or AWS? I'm not too sure where to start troubleshooting.
Thanks
This could be caused by SSL renegotiation. Check the log, and maybe tweak
ssl_renegotiation_limit = 512MB (the default)
Setting it to zero will disable renegotiation entirely.
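If you cannot edit postgresql.conf on the server, the parameter can also be set per role; a sketch, where osm_loader is a hypothetical name for the role osm2pgsql connects as:
-- Disable SSL renegotiation for this role's future sessions only
ALTER ROLE osm_loader SET ssl_renegotiation_limit = 0;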