PostgreSQL - could not create shared memory segment: Cannot allocate memory

This question has been asked many times, but the answers that worked for most people are not working for me, so I suspect something else is missing.
OS: $ cat /proc/version
Linux version 2.6.32-431.29.2.el6.x86_64 (mockbuild@x86-026.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Sun Jul 27 15:55:46 EDT 2014
When I try to start PostgreSQL, I get the following error:
[reportadmin /opt/birt/modules/BIRTiHub]$ ./startPostgreSQL.sh
Starting startpostgresql at Fri 05/15/2015 08:03:29
AC_SERVER_HOME=/opt/BIRTiHubFType/modules/BIRTiHub/iHub
2015-05-15 12:03:30 UTC 27379 LOG: loaded library "pg_stat_statements"
2015-05-15 12:03:30 UTC 27379 FATAL: could not create shared memory segment: Cannot allocate memory
2015-05-15 12:03:30 UTC 27379 DETAIL: Failed system call was shmget(key=8433001, size=290635776, 03600).
2015-05-15 12:03:30 UTC 27379 HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space, or exceeded your kernel's SHMALL parameter. You can either reduce the request size or reconfigure the kernel with larger SHMALL. To reduce the request size (currently 290635776 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
The PostgreSQL documentation contains more information about shared memory configuration.
pg_ctl: could not start server
Examine the log output.
Also, I have adjusted the shmmax and shmall values to be greater than and less than (respectively) the requested value.
[root /opt/birt]$ sysctl -p
....
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 134217728
In other words:
shmall    = 134217728
Requested = 290635776
shmmax    = 68719476736
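For what it's worth, kernel.shmall is counted in pages while kernel.shmmax and the requested size are in bytes, so the values above are not directly comparable. A quick check (assuming the usual 4 kB page size):
getconf PAGE_SIZE            # page size in bytes, typically 4096
echo $((134217728 * 4096))   # shmall in bytes: 549755813888 (512 GiB), far above the request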
Any idea what could be wrong ?
D

Related

Restore postgres db from folder

I have an old copy of my PostgreSQL db folder (/var/lib/postgresql/9.5/main/) from my server, and now I want to get the data out of those files. So I copied the main folder to my local machine and changed the PostgreSQL config (/etc/postgresql/9.5/main/postgresql.conf) to point to that directory. I also changed the ownership of the main directory to the user postgres. After restarting the PostgreSQL service (sudo service postgresql restart), it doesn't really work.
What am I doing wrong? (Yes, I know pg_dump is the preferred way, but in this case...)
So my question: does this even work?
Or is there another way to get the data out?
Everything is done on Ubuntu 16.04.
Edit:
Here is the log after changing postgresql.conf to point to the new directory:
2017-10-13 06:15:43 CEST [968-1] LOG: database system was shut down at 2017-10-13 00:21:04 CEST
2017-10-13 06:15:43 CEST [968-2] LOG: MultiXact member wraparound protections are now enabled
2017-10-13 06:15:43 CEST [959-1] LOG: database system is ready to accept connections
2017-10-13 06:15:43 CEST [975-1] LOG: autovacuum launcher started
2017-10-13 06:15:43 CEST [983-1] [unknown]@[unknown] LOG: incomplete startup packet
2017-10-13 06:47:55 CEST [975-2] LOG: autovacuum launcher shutting down
2017-10-13 06:47:55 CEST [959-2] LOG: received smart shutdown request
2017-10-13 06:47:55 CEST [972-1] LOG: shutting down
2017-10-13 06:47:55 CEST [972-2] LOG: database system is shut down
2017-10-13 06:47:55 CEST [4667-1] FATAL: database files are incompatible with server
2017-10-13 06:47:55 CEST [4667-2] DETAIL: The database cluster was initialized without USE_FLOAT8_BYVAL but the server was compiled with USE_FLOAT8_BYVAL.
2017-10-13 06:47:55 CEST [4667-3] HINT: It looks like you need to recompile or initdb.
OK, that pointed me to the problem. The server is armv7l, whereas the local machine is x86_64 (uname -m). So is there no chance to get the data out of it?
thx, Luc
If it's really true that your data directory is from an armv7l system and your local system is x86_64, you're going to have some difficulties.
The immediate error about USE_FLOAT8_BYVAL arises because armv7l is 32-bit and cannot pass 64-bit (8-byte) floating point values by value, while your 64-bit host can. But if you recompiled a custom PostgreSQL with USE_FLOAT8_BYVAL disabled, you'd likely just run into other issues.
I suggest installing PostgreSQL on a matching ARM system to recover the data. Data directories for PostgreSQL are not portable across architectures (for performance reasons).
If you do not have access to the ARM system anymore, an emulator like qemu should be able to help you.
Otherwise, maybe you can compile a modified PostgreSQL (probably starting with 32-bit x86) that can read the data directory, with appropriate configure options etc. I've never needed to try this.
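If it helps, pg_controldata (shipped with PostgreSQL) reports how a cluster was initialized, including the float8 passing convention; run on a matching architecture it is a quick way to confirm the mismatch (the path is the one from the question):
pg_controldata /var/lib/postgresql/9.5/main
# among the output, look for lines like:
#   Maximum data alignment:               8
#   Float8 argument passing:              by value    (a 32-bit ARM cluster would say "by reference")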

PostgreSQL unable to start: No space left on device

I took a dump of the database, and it left the server short of disk space for PostgreSQL.
I then restarted PostgreSQL, but it failed to start and kept giving me this error:
[FAIL] Starting PostgreSQL 9.4 database server: main[....] The PostgreSQL server failed to start. Please check the log output: ... failed!
failed!
and in log file there were following lines
2017-05-05 05:49:25 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
2017-05-05 05:49:30 UTC LOG: using stale statistics instead of current ones because stats collector is not responding
2017-05-05 05:49:35 UTC LOG: using stale statistics instead of current ones because stats collector is not responding
2017-05-05 05:49:35 UTC LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
2017-05-05 05:49:35 UTC LOG: could not write temporary statistics file "pg_stat_tmp/db_85990.tmp": No space left on device
2017-05-05 05:49:35 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
2017-05-05 05:49:40 UTC LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
2017-05-05 05:49:40 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
2017-05-05 05:49:45 UTC LOG: using stale statistics instead of current ones because stats collector is not responding
Please help me solve this issue if someone can.
Thanks.
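A common first step (a sketch, assuming the default Debian layout for 9.4) is to find out what filled the disk so the server can start again:
df -h /var/lib/postgresql                         # confirm the filesystem is actually full
du -sh /var/lib/postgresql/9.4/main/* | sort -h   # find the biggest subdirectories
du -sh /var/lib/postgresql/9.4/main/pg_xlog       # WAL often grows when archiving or replication stalls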

WAL contains references to invalid pages

CentOS 6.7
PostgreSQL 9.5.3
I have DB servers in master-standby replication.
Suddenly, the standby server's PostgreSQL process stopped with these logs:
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]WARNING: page 1671400 of relation base/16400/559613 is uninitialized
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]CONTEXT: xlog redo Heap2/VISIBLE: cutoff xid 1902107520
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]PANIC: WAL contains references to invalid pages
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]CONTEXT: xlog redo Heap2/VISIBLE: cutoff xid 1902107520
2016-07-14 18:14:21.026 JST [][5783e038.3cd9][0][15577]LOG: startup process (PID 15579) was terminated by signal 6: Aborted
2016-07-14 18:14:21.026 JST [][5783e038.3cd9][0][15577]LOG: terminating any other active server processes
And the master server's PostgreSQL logs showed nothing special.
But the master server's /var/log/messages listed the following:
Jul 14 05:38:44 host kernel: sbridge: HANDLING MCE MEMORY ERROR
Jul 14 05:38:44 host kernel: CPU 8: Machine Check Exception: 0 Bank 9: 8c000040000800c0
Jul 14 05:38:44 host kernel: TSC 0 ADDR 1f7dad7000 MISC 90004000400008c PROCESSOR 0:306e4 TIME 1468442324 SOCKET 1 APIC 20
Jul 14 05:38:44 host kernel: EDAC MC1: CE row 1, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#1": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x1f7dad7000 => socket=1, Channel=0(mask=1), rank=4
Jul 14 05:38:44 host kernel:
Jul 14 18:30:40 host kernel: sbridge: HANDLING MCE MEMORY ERROR
Jul 14 18:30:40 host kernel: CPU 8: Machine Check Exception: 0 Bank 9: 8c000040000800c0
Jul 14 18:30:40 host kernel: TSC 0 ADDR 1f7dad7000 MISC 90004000400008c PROCESSOR 0:306e4 TIME 1468488640 SOCKET 1 APIC 20
Jul 14 18:30:41 host kernel: EDAC MC1: CE row 1, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#1": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x1f7dad7000 => socket=1, Channel=0(mask=1), rank=4
Jul 14 18:30:41 host kernel:
The memory errors started a week ago, so I suspect they caused the PostgreSQL error.
My questions are:
1) Can a kernel-level memory error cause PostgreSQL's "WAL contains references to invalid pages" error?
2) Why are there no relevant logs in the master server's PostgreSQL?
thx.
Faulty memory can cause all kinds of data corruption, so that seems like a good enough explanation to me.
Perhaps there are no log entries at the master PostgreSQL server because all that was corrupted was the WAL stream.
You can run
oid2name
to find out which database has OID 16400 and then
oid2name -d <database with OID 16400> -f 559613
to find out which table belongs to file 559613.
Is that table larger than 12 GB (page 1671400 × the 8 kB block size ≈ 12.7 GB)? If not, that would mean that page 1671400 is indeed an invalid value.
You mention 9.5.3; there may be replication bugs fixed in later releases that could cause replication problems even without a hardware fault, so read the release notes.
I would perform a new pg_basebackup and reinitialize the slave system.
But what I'd really be worried about is possible data corruption on the master server. Block checksums are nice (they are turned on if pg_controldata <data directory> | grep checksum gives you 1), but they may well not detect the effects of memory corruption.
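For example (the data directory path here is just a placeholder):
pg_controldata /var/lib/pgsql/9.5/data | grep checksum
# Data page checksum version:           1     (0 would mean checksums are disabled)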
Try something like
pg_dumpall -f /dev/null
on the master and see if there are errors.
Keep your old backups in case you need to repair something!

Postgres gets out of memory errors despite having plenty of free memory

I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently Postgres will start getting "out of memory" errors on some SELECTs, and will continue doing so until I restart Postgres or some of the clients that are connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.
select version();:
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
uname -a:
Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
postgresql.conf (everything else is commented out/default):
max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0
I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much real progress. At the moment there are 68 connections, which is a typical number (I'm not using pgbouncer or any other connection pooler yet).
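For reference, a rough upper bound implied by these settings (a back-of-the-envelope sketch; real usage adds per-backend overhead, and complex queries can each use several work_mem allocations):
# shared_buffers + max_connections * work_mem + maintenance_work_mem + wal_buffers
echo $((500 + 100 * 2 + 128 + 16))   # ≈ 844 (MB), comfortably under 2 GB on paper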
/etc/sysctl.conf:
kernel.shmmax=1050451968
kernel.shmall=256458
vm.overcommit_ratio=100
vm.overcommit_memory=2
I first changed overcommit_memory to 2 about a fortnight ago after the OOM killer killed the Postgres server. Prior to that the server had been running fine for a long time. The errors I get now are less catastrophic but much more annoying because they are much more frequent.
I haven't had much luck pinpointing the first event that causes postgres to run "out of memory" - it seems to be different each time. The most recent time it crashed, the first three lines logged were:
2015-04-07 05:32:39 UTC ERROR: out of memory
2015-04-07 05:32:39 UTC DETAIL: Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT: automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:58 UTC ERROR: out of memory
2015-04-07 05:33:58 UTC DETAIL: Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:59 UTC ERROR: out of memory
2015-04-07 05:33:59 UTC DETAIL: Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT: SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used
The crash before that, a few hours earlier, just had three instances of that last query as the first three lines of the crash. That query gets run very often, so I'm not sure if the issues are because of this query, or if it just comes up in the error log because it's a reasonably complex SELECT getting run all the time. That said, here's an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00
This is what ulimit -a for the postgres user looks like:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15956
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15956
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'll try to get the exact numbers from free the next time there's a crash; in the meantime, this is a braindump of all the info I have.
Any ideas on where to go from here?
I just ran into this same issue with a ~2.5 GB plain-text SQL file I was trying to restore. I scaled my Digital Ocean server up to 64 GB RAM, created a 10 GB swap file, and tried again. I got an out-of-memory error with 50 GB free, and no swap in use.
I scaled back my server to the small 1 GB instance I was using (requiring a reboot) and figured I'd give it another shot for no other reason than I was frustrated. I started the import and realized I forgot to create my temporary swap file again.
I created it in the middle of the import. psql made it a lot further before crashing. It made it through 5 additional tables.
I think there must be a bug allocating memory in psql.
Can you check if there's any swap available when the error occurs?
I completely removed the swap on my Linux desktop (just for testing other things...) and got exactly the same error! I'm pretty sure this is what is going on with you too.
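Something like this, run at the moment of the error, would show it (a quick sketch):
free -m      # the Swap: row shows total/used/free swap in MB
swapon -s    # lists active swap devices; no entries below the header means no swap at all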
It is a bit suspicious that you report the same amount of free memory as your shared_buffers size. Are you sure you are looking at the right values?
The output of the free command at the time of the crash would be useful, as well as the contents of /proc/meminfo.
Beware that setting overcommit_memory to 2 is not very effective when overcommit_ratio is 100. It basically limits memory allocation to the size of swap (0 in this case) + 100% of physical RAM, which leaves no room for shared memory or disk caches.
You should probably set overcommit_ratio to 50.
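To see what these settings actually allow (a sketch; the numbers assume this 2 GB, no-swap box):
grep -E 'CommitLimit|Committed_AS' /proc/meminfo   # CommitLimit = swap + RAM * overcommit_ratio / 100
sysctl -w vm.overcommit_ratio=50                   # CommitLimit then drops to ~1 GB, leaving room for caches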

Too many open files during ensureIndex in MongoDB

I would like to create a text index on a mongo collection. I run
db.test1.ensureIndex({'text':'text'})
and then I see this in the mongod process output:
Sun Jan 5 10:08:47.289 [conn1] build index library.test1 { _fts: "text", _ftsx: 1 }
Sun Jan 5 10:09:00.220 [conn1] Index: (1/3) External Sort Progress: 200/980 20%
Sun Jan 5 10:09:13.603 [conn1] Index: (1/3) External Sort Progress: 400/980 40%
Sun Jan 5 10:09:26.745 [conn1] Index: (1/3) External Sort Progress: 600/980 61%
Sun Jan 5 10:09:37.809 [conn1] Index: (1/3) External Sort Progress: 800/980 81%
Sun Jan 5 10:09:49.344 [conn1] external sort used : 5547 files in 62 secs
Sun Jan 5 10:09:49.346 [conn1] Assertion: 16392:FileIterator can't open file: data/_tmp/esort.1388912927.0//file.233errno:24 Too many open files
I am working on Mac OS X 10.9.1.
Please help.
NB: This solution does not (or may not) work with recent macOS versions (comments indicate >10.13?). Apparently, changes have been made for security purposes.
Conceptually, the solution still applies; the following are a few sources of discussion:
https://wilsonmar.github.io/maximum-limits/
https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c
https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1
--
I've had the same problem (executing a different operation, but still a "Too many open files" error), and as lese says, it seems to come down to the 'maxfiles' limit on the machine running mongod.
On a Mac, it is better to check the limits with:
sudo launchctl limit
This gives you:
<limit name> <soft limit> <hard limit>
cpu unlimited unlimited
filesize unlimited unlimited
data unlimited unlimited
stack 8388608 67104768
core 0 unlimited
rss unlimited unlimited
memlock unlimited unlimited
maxproc 709 1064
maxfiles 1024 2048
What I did to get around the problem was to temporarily set the limit higher (mine was originally something like soft: 256, hard: 1000 or something weird like that):
sudo launchctl limit maxfiles 1024 2048
Then re-run the query/indexing operation and see if it breaks. If not, and to keep the higher limits (they will reset when you log out of the shell session you've set them on), create an '/etc/launchd.conf' file with the following line:
limit maxfiles 1024 2048
(or add that line to your existing launchd.conf file, if you already have one).
This will set the maxfiles limit via launchctl in every shell at login.
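You can verify that the new limits took effect with:
launchctl limit maxfiles   # should print: maxfiles 1024 2048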
I added a temporary ulimit -n 4096 before the restore command.
You can also use
mongorestore --numParallelCollections=1 ...
and that seems to help, but the connection pool still seems to get exhausted.
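Put together, a restore attempt looks something like this (the dump path is a placeholder):
ulimit -n 4096                                          # raise the soft open-files limit for this shell only
mongorestore --numParallelCollections=1 /path/to/dump   # restore collections one at a time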
It may be related to this.
Try checking your system configuration by issuing the following command in a terminal:
ulimit -a