postgres corrupted shared memory - postgresql

i am trying to use postgresql to process queries on GPU using PG_strom.
whenever i try to process queries i got issue:
postmaster has commanded this server process to roll back the current
transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
i have checked that i have to increase shared buffer and work memory these are my current configuartions:
shared_buffers = 10GB
work_mem = 1GB
i want to know how can i increase them to its maximum limit as whenever i try to increase them i couldn't connect to postgresql.

Related

PGSQL could not fork new process for connection

when I try to connect my postgresql database I get an error message "could not fork new process for connection : No error could not fork new process for connection"
You either have exceeded the ulimit of processes for the PostgreSQL user, or you have run out of memory. There are perhaps other causes, but these are the most frequent. You will have to determine the cause and fix the operating system and/or PostgreSQL configuration.
Reduce your max_connections to a reasonable limit like 100 and make sure that work_mem is not excessively high.

How can I get the host name and query that has been executed on the slave postgres database that caused the system into the memory crashed

My slave database has undergone the memory crash(Out of memory) and in recovery stage.
I want to know the query that causes this issue.
I have checked logs I get one query just before the system goes into the recovery mode;But I want to confirm it.
I am using postgres 9.4
If any one has any idea?
If you set log_min_error_statement to error (the default) or lower, you will get the statement that caused the out-of-memory error in the database log.
But I assume that you got hit by the Linux OOM killer, that would cause a PostgreSQL process to be killed with signal 9, whereupon the database goes into recovery mode.
The correct solution here is to disable memory overcommit by setting vm.overcommit_ratio to 2 in /etc/sysctl.conf and activate the setting with sysctl -p (you should then also tune vm.overcommit_ratio correctly).
Then you will get an error rather than a killed process, which is easier to debug.

postgresql consumed 100% of storage

I'm running a Red Hat 4.8.3-9 server in AWS cloud with version 9.4 of Postgresql. The database has consumed 100% of my disk usage. I went in to the database and truncated the table with the most data. After viewing the size of the tables with \d+, there was not any tables over a couple MB's. I ran du -h * --max-depth=1 and found that /var/lib/pgsql94/data/base/16384 held 472G of the 500G total storage. Even after truncating the tables, my disk usage is still 99%. I'm wondering if there is a solution to release the data because I believe deleting all the OID's in 'data/base/16384 would be bad. I have tried stopping, restarting postgres service. I am not allowed to reboot the machine unfortunately.
df -ih shows inode usage is 1%
sudo lsof +L1 does not show any large files at all
Thank you
Log files: 8K worth of repeating string
LOG: could not write temporary statistics file
"pg_stat_tmp/db_16384.tmp": No space left on device
LOG: could not
close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left
on device
LOG: could not close temporary statistics file
"pg_stat_tmp/global.tmp": No space left on device
LOG: using stale
statistics instead of current ones because stats collector is not
responding

Postgresql large table update slows down

I run an update on a large table (e.g. 8 GB). It is a simple update of 3 fields in the table. I had no problems running it under postgresql 9.1, it would take 40-60 minutes but it worked. I run the same query in 9.4 database (freshly created, not upgraded) and it starts the update fine but then slows down. It uses only ~2% CPU, the level if IO is 4-5MB/s and it is sitting there. No locks, no other queries or connections, just this single update SQL on the server.
The SQL is below. "lookup" table has 12 records. The lookup can return only one row, it breaks a discrete scale (SMALLINT, -32768 .. +32767) into non-overlapping regions. "src" and "dest" tables are ~60 million records.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = (SELECT lookup.id FROM lookup WHERE src.value BETWEEN lookup.min AND lookup.max)
FROM src
WHERE dest.id = src.id;
I thought my disk slowed down but I can copy 1 GB files in parallel to query execution and it runs fast at >40MB/s and I have only one disk (it is a VM with ISCSI media). All other disk operations are not impacted, there is plenty of IO bandwidth. At the same time PostgreSQL is just sitting there doing very little, running very slowly.
I have 2 virtualized linux servers, one runs postgresql 9.1 and another runs 9.4. Both servers have close to identical postgresql configuration.
Has anyone else had similar experience? I am running out of ideas. Help.
Edit
The query "ran" for 20 hours I had to kill the connections and restart the server. Surprisingly it didn't kill the connection via query:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND datname = current_database();
and sever produced the following log:
2015-05-21 12:41:53.412 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT STATEMENT: UPDATE <... this is 60,000,000 record table update statement>
Also server restart took long time, producing the following log:
2015-05-21 12:43:36.730 EDT LOG: received fast shutdown request
2015-05-21 12:43:36.730 EDT LOG: aborting any active transactions
2015-05-21 12:43:36.730 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.734 EDT FATAL: terminating connection due to administrator command
2015-05-21 12:43:36.747 EDT LOG: autovacuum launcher shutting down
2015-05-21 12:44:36.801 EDT LOG: received immediate shutdown request
2015-05-21 12:44:36.815 EDT WARNING: terminating connection because of crash of another server process
2015-05-21 12:44:36.815 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
"The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory" - is this an indication of a bug in PostgreSQL?
Edit
I tested 9.1, 9.3 and 9.4. Both 9.1 and 9.3 don't experience the slow down. 9.4 consistently slows down on large transactions. I noticed that when a transaction starts htop monitor indicates high CPU and the process status is "R" (running). Then it gradually changes to low CPU usage and status "D" - disk (see screenshot ). My biggest question is why 9.4 is different from 9.1 and 9.3? I have a dozen of servers and this effect is observed across the board.
Thanks everyone for the help. No matter how much I tried to emphasize on the difference of performance between identical configuration of 9.4 and previous versions no one seemed to pay attention to that.
The problem was solved by disabling transparent huge pages:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
Here are some resources I found helpful in reserching the issue:
* https://dba.stackexchange.com/questions/32890/postgresql-pg-stat-activity-shows-commit/34169#34169
* https://lwn.net/Articles/591723/
* https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
I'd suspect a lot of disk seeking - 5MB/s is just about right for a very random IO on ordinary (spinning) hard drive.
As you constantly replace basically all your rows I'd try to set dest table fillfactor to about 45% (alter table dest set (fillfactor=45);) and then cluster test using test_pkey;. This would allow updated row versions to be placed in the same disk sector.
Additionally using cluster src using src_pkey; so both tables would have data in the same physical order on disk also can help.
Also remember to vacuum table dest; after every update that large, so old row versions could be used again in subsequent updates.
Your old server probably evolved it's fillfactor naturally during multiple updates. On new server it is packed 100%, so updated rows have to be placed at the end.
If only few of the target rows are actually updated, you can avoid new row versions to be generated by using DISTICNT FROM. This can prevent a lot of useless disk traffic.
UPDATE dest SET
field1 = src.field1,
field2 = src.field2,
field3_id = lu.id
FROM src
JOIN lookup lu ON src.value BETWEEN lu.min AND lu.max
WHERE dest.id = src.id
-- avoid unnecessary row versions to be generated
AND (dest.field1 IS DISTINCT FROM src.field1
OR dest.field1 IS DISTINCT FROM src.field1
OR dest.field3_id IS DISTINCT FROM lu.id
)
;

psql seems to timeout with long queries

I am performing a bulk copy into postgres with about 80GB of data.
\copy my_table FROM '/path/csv_file.csv' csv DELIMITER ','
Before the transaction is committed I get the following error.
Server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
In the PostgreSQL logs:
LOG:server process (PID 21122) was terminated by signal 9: Killed
LOG:terminating any other active server processes
WARNING:terminating connection because of crash of another server process
DETAIL:The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
Your backend process receiving a signal 9 (SIGKILL). This can happen if:
Somebody sends a kill -9 manually;
A cron job is set up to send kill -9 under some circumstances (very unsafe, do not do this); or
the Linux out-of-memory (OOM) killer triggers and terminates the process.
In the latter case you will see reports of OOM killer activity in the kernel's dmesg output. I expect this is what you'll see in your case.
PostgreSQL servers should be configured without virtual memory overcommit so that the OOM killer does not run and PostgreSQL can handle out-of-memory conditions its self. See the PostgreSQL documentation on Linux memory overcommit.
The separate question "why is this using so much memory" remains. Answering that requires more knowledge of your setup: how much RAM the server has, how much of it is free, what your work_mem and maintenance_work_mem settings are, etc. It isn't a very interesting problem to look into until you upgrade to the current PostgreSQL 8.4 patch release to make sure the problem isn't one that's already fixed.