MongoDB: out of memory

I am wondering about the MongoDB memory consumption. I have read the corresponding manual sections and the other questions on the topic, but I think this situation is different. May I ask you for your advice?
This is the error from the DB log file:
Fri Oct 26 20:34:00 [conn1] ERROR: mmap private failed with out of memory. (64 bit build)
Fri Oct 26 20:34:00 [conn1] Assertion: 13636:file /docdata/mongodb/data/xxx_letters.5 open/create failed in createPrivateMap (look in log for more information)
These are the data files:
total 4.0G
drwxr-xr-x 2 mongodb mongodb 4.0K 2012-10-26 20:21 journal
-rw------- 1 mongodb mongodb 64M 2012-10-25 19:34 xxx_letters.0
-rw------- 1 mongodb mongodb 128M 2012-10-20 22:10 xxx_letters.1
-rw------- 1 mongodb mongodb 256M 2012-10-24 09:10 xxx_letters.2
-rw------- 1 mongodb mongodb 512M 2012-10-26 10:04 xxx_letters.3
-rw------- 1 mongodb mongodb 1.0G 2012-10-26 19:56 xxx_letters.4
-rw------- 1 mongodb mongodb 2.0G 2012-10-03 11:32 xxx_letters.5
-rw------- 1 mongodb mongodb 16M 2012-10-26 19:56 xxx_letters.ns
This is the output of free -tm:
             total       used       free     shared    buffers     cached
Mem:          3836       3804         31          0         65       2722
-/+ buffers/cache:       1016       2819
Swap:         4094        513       3581
Total:        7930       4317       3612
Is it really necessary to have enough system memory for the largest data file to fit in? Why do the files grow that much? (From the sequence shown above, I expect the next file to be 4 GB.) I'll try to extend the RAM, but the data will eventually grow even more. Or maybe this is not a memory problem at all?
I have a 64-bit Linux system and use the 64-bit MongoDB 2.0.7-rc1. There is plenty of disk space, and the CPU load is 0.0. This is uname -a:
Linux xxx 2.6.32.54-0.3-default #1 SMP 2012-01-27 17:38:56 +0100 x86_64 x86_64 x86_64 GNU/Linux

ulimit -a solved the mystery:
core file size (blocks, -c) 1
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30619
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) 3338968
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 30619
virtual memory (kbytes, -v) 6496960
file locks (-x) unlimited
It worked after setting max memory size and virtual memory to unlimited and restarting everything. BTW, the next file was again 2 GB.
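For reference, the change amounts to something like this (a sketch; the limits.d file name and per-user syntax are assumptions, adjust for your distribution):

# for the current shell only:
ulimit -m unlimited
ulimit -v unlimited

# or persistently, e.g. in /etc/security/limits.d/mongodb.conf
# (rss = max memory size, as = virtual memory / address space):
mongodb - rss unlimited
mongodb - as unlimited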
Sorry for bothering you, but I was desperate. Maybe this helps somebody googling a similar problem.

Related

Kafka too many open files, many tiny logs, with high ulimit and about 5k segments

Kafka keeps reporting "Too many open files". I just did a clean restart, but after 10 minutes or so I end up with this:
lsof | grep cp-kafka | wc -l:
454225
process limits:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 96186 96186 processes
Max open files 800000 800000 files
Max locked memory 16777216 16777216 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 96186 96186 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
I have set retention.hours to -1, as I want to keep all logs from the past. In my server.properties I configured 100 MB segment files, but for some reason Kafka creates 10 MB logs. The strange thing is that I "only" have a relatively low number of files in the log directory:
find | wc -l
5884
I don't understand what I am doing wrong here.
I installed the confluent-kafka deb packages on Ubuntu 18.04.
Kafka 2.0
messages are about 500 bytes each
auto-create topic is true
Here is one directory; are my messages too small for the timeindex?
-rw-r--r-- 1 2.2K Sep 30 10:03 00000000000000000000.index
-rw-r--r-- 1 1.2M Sep 30 10:03 00000000000000000000.log
-rw-r--r-- 1 3.3K Sep 30 10:03 00000000000000000000.timeindex
-rw-r--r-- 1 560 Sep 30 10:03 00000000000000004308.index
-rw-r--r-- 1 293K Sep 30 10:03 00000000000000004308.log
-rw-r--r-- 1 10 Sep 30 10:03 00000000000000004308.snapshot
-rw-r--r-- 1 840 Sep 30 10:03 00000000000000004308.timeindex
-rw-r--r-- 1 10M Sep 30 10:03 00000000000000005502.index
-rw-r--r-- 1 97K Sep 30 10:04 00000000000000005502.log
-rw-r--r-- 1 10 Sep 30 10:03 00000000000000005502.snapshot
-rw-r--r-- 1 10M Sep 30 10:03 00000000000000005502.timeindex
I also added the following lines to the server config, but the index files remain 10 MB max:
log.segment.bytes=1073741824
log.segment.index.bytes=1073741824
BTW, I am sending messages with timestamps in the past, with log retention of 1000 years.
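One plausible explanation, given that last point: Kafka's time-based segment rolling (log.roll.ms / log.roll.hours, default 168 hours) is evaluated against message timestamps rather than wall-clock time, so records carrying old CreateTime timestamps can force a roll almost immediately and produce many tiny segments. A sketch of server.properties changes under that assumption (log.index.size.max.bytes and log.message.timestamp.type are standard broker settings, but verify names and defaults against the Kafka 2.0 documentation):

# cap segments at 1 GiB, as intended
log.segment.bytes=1073741824
# the offset-index file size is governed by this broker setting
# (default 10 MB), which would explain the 10M index files
log.index.size.max.bytes=1073741824
# stamp records at append time so old producer timestamps
# no longer trigger time-based rolling
log.message.timestamp.type=LogAppendTime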

MongoDB Install Simple Test Ops Manager java.lang.OutOfMemoryError on startup

I just installed a test evaluation of MongoDB Ops Manager and get an error on startup of the Backup HTTP server:
Migrate MMS data
Running migrations...[ OK ]
Start MMS server
Instance 0 starting..........[ OK ]
Start Backup HTTP Server
Instance 0 starting.......[FAILED]
2015-05-07T14:00:32.107+0000 [main] gid ERROR ServerMain:199 - Cannot start bslurp server [FATAL-EXITING] - instance: 0 - msg: unable to create new native thread
java.lang.OutOfMemoryError: unable to create new native thread
I appear to have plenty of memory
[root@krh60621 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         15951       4588      11362          0        364       2021
and I upped the max processes to unlimited to see if that would help....
[root@krh60621 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127421
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 94000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@krh60621 ~]# ps -eLF | grep -c java
593
[root@krh60621 ~]# ps -eLF | wc -l
1031
Any thoughts?
I encountered a similar issue in our Test Ops Manager deployment when we upgraded to Ops Manager 1.8.0. I ultimately opened up a ticket with MongoDB Support and this was the resolution for our issue:
The Ops Manager components are launched using the default username "mongodb-mms". Please adjust the ulimit settings for this user to match those of the "mongodb" user, currently defined in /etc/security/limits.d/99-mongodb-mms-automation-agent.conf.
You may wish to add a separate file under /etc/security/limits.d/ for the mongodb-mms user.
More information can be found here.
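A sketch of such a file (the file name and values are illustrative assumptions, not taken from the support ticket):

# /etc/security/limits.d/99-mongodb-mms.conf (illustrative)
mongodb-mms soft nofile 64000
mongodb-mms hard nofile 64000
mongodb-mms soft nproc 64000
mongodb-mms hard nproc 64000

Restart the Ops Manager processes afterwards so the new limits are picked up.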

Postgres gets out of memory errors despite having plenty of free memory

I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently Postgres will start getting "out of memory" errors on some SELECTs, and will continue doing so until I restart Postgres or some of the clients that are connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.
select version();:
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
uname -a:
Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
postgresql.conf (everything else is commented out or at its default):
max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0
I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much real progress. At the moment there are 68 connections, which is a typical number (I'm not using pgbouncer or any other connection pooler yet).
/etc/sysctl.conf:
kernel.shmmax=1050451968
kernel.shmall=256458
vm.overcommit_ratio=100
vm.overcommit_memory=2
I first changed overcommit_memory to 2 about a fortnight ago after the OOM killer killed the Postgres server. Prior to that the server had been running fine for a long time. The errors I get now are less catastrophic but much more annoying because they are much more frequent.
I haven't had much luck pinpointing the first event that causes Postgres to run "out of memory"; it seems to be different each time. The most recent time it crashed, the first three lines logged were:
2015-04-07 05:32:39 UTC ERROR: out of memory
2015-04-07 05:32:39 UTC DETAIL: Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT: automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:58 UTC ERROR: out of memory
2015-04-07 05:33:58 UTC DETAIL: Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:59 UTC ERROR: out of memory
2015-04-07 05:33:59 UTC DETAIL: Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT: SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used
The crash before that, a few hours earlier, just had three instances of that last query as the first three lines of the crash. That query gets run very often, so I'm not sure if the issues are because of this query, or if it just comes up in the error log because it's a reasonably complex SELECT getting run all the time. That said, here's an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00
This is what ulimit -a for the postgres user looks like:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15956
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15956
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'll try to get the exact numbers from free the next time there's a crash; in the meantime, this is a brain dump of all the info I have.
Any ideas on where to go from here?
I just ran into this same issue with a ~2.5 GB plain-text SQL file I was trying to restore. I scaled my Digital Ocean server up to 64 GB RAM, created a 10 GB swap file, and tried again. I got an out-of-memory error with 50 GB free, and no swap in use.
I scaled back my server to the small 1 GB instance I was using (requiring a reboot) and figured I'd give it another shot for no other reason than I was frustrated. I started the import and realized I forgot to create my temporary swap file again.
I created it in the middle of the import. psql made it a lot further before crashing. It made it through 5 additional tables.
I think there must be a bug allocating memory in psql.
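For reference, a temporary swap file like the one described above can be created with standard Linux tools (the size is illustrative):

sudo fallocate -l 10G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

It can be removed afterwards with sudo swapoff /swapfile && sudo rm /swapfile.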
Can you check if there is any swap available when the error occurs?
I completely removed swap on my Linux desktop (just for testing other things...) and I got exactly the same error! I'm pretty sure this is what is going on with you too.
It is a bit suspicious that you report the same free memory size as your shared_buffers size. Are you sure you are looking at the right values?
The output of free at the time of the crash would be useful, as well as the content of /proc/meminfo.
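For example, captured at the moment the error appears:

free -m
grep -i swap /proc/meminfo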
Beware that setting overcommit_memory to 2 is not very effective if you leave overcommit_ratio at 100. That basically limits memory allocation to the size of swap (0 in this case) + 100% of physical RAM, which doesn't leave any room for shared memory and disk caches.
You should probably set overcommit_ratio to 50.
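In sysctl.conf terms, the suggested change would look like this (apply with sysctl -p; 50 is a starting point, not a tuned value):

vm.overcommit_memory=2
vm.overcommit_ratio=50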

Too many open files while ensuring index in Mongo

I would like to create a text index on a Mongo collection. I write:
db.test1.ensureIndex({'text':'text'})
and then I saw this in the mongod process output:
Sun Jan 5 10:08:47.289 [conn1] build index library.test1 { _fts: "text", _ftsx: 1 }
Sun Jan 5 10:09:00.220 [conn1] Index: (1/3) External Sort Progress: 200/980 20%
Sun Jan 5 10:09:13.603 [conn1] Index: (1/3) External Sort Progress: 400/980 40%
Sun Jan 5 10:09:26.745 [conn1] Index: (1/3) External Sort Progress: 600/980 61%
Sun Jan 5 10:09:37.809 [conn1] Index: (1/3) External Sort Progress: 800/980 81%
Sun Jan 5 10:09:49.344 [conn1] external sort used : 5547 files in 62 secs
Sun Jan 5 10:09:49.346 [conn1] Assertion: 16392:FileIterator can't open file: data/_tmp/esort.1388912927.0//file.233errno:24 Too many open files
I work on Mac OS X 10.9.1.
Please help.
NB: This solution does/may not work with recent Mac OSs (comments indicate >10.13?). Apparently, changes have been made for security purposes.
Conceptually, the solution applies - following are a few sources of discussion:
https://wilsonmar.github.io/maximum-limits/
https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c
https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1
--
I've had the same problem (executing a different operation, but still, a "Too many open files" error), and as lese says, it seems to be down to the 'maxfiles' limit on the machine running mongod.
On a Mac, it is better to check limits with:
sudo launchctl limit
This gives you:
<limit name> <soft limit> <hard limit>
cpu unlimited unlimited
filesize unlimited unlimited
data unlimited unlimited
stack 8388608 67104768
core 0 unlimited
rss unlimited unlimited
memlock unlimited unlimited
maxproc 709 1064
maxfiles 1024 2048
What I did to get around the problem was to temporarily set the limit higher (mine was originally something like soft: 256, hard: 1000 or something weird like that):
sudo launchctl limit maxfiles 1024 2048
Then re-run the query/indexing operation and see if it breaks. If not, and to keep the higher limits (they will reset when you log out of the shell session you've set them on), create an '/etc/launchd.conf' file with the following line:
limit maxfiles 1024 2048
(or add that line to your existing launchd.conf file, if you already have one).
This will set maxfiles via launchctl for every shell at login.
I added a temporary ulimit -n 4096 before the restore command.
You can also use:
mongorestore --numParallelCollections=1 ...
and that seems to help. But the connection pool still seems to get exhausted.
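Combined, the two suggestions look something like this (the dump path is a placeholder):

ulimit -n 4096                                         # raise the open-file limit for this shell only
mongorestore --numParallelCollections=1 /path/to/dump  # placeholder dump directory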
It may be related to this. Try checking your system configuration by issuing the following command in a terminal:
ulimit -a

MongoDB: can't create new thread on FreeBSD?

We have experienced something strange on our MongoDB GridFS platform.
The platform is a dual Xeon E5 (two quad-core CPUs) with 128 GB of memory, running FreeBSD 9 with a ZFS pool dedicated to MongoDB.
[root@mongofile1 ~]# uname -sr
FreeBSD 9.1-RELEASE
Our /boot/loader.conf:
vfs.zfs.arc_min="2048M"
vfs.zfs.arc_max="7680M"
vm.kmem_size_max="16G"
vm.kmem_size="12G"
vfs.zfs.prefetch_disable="1"
kern.ipc.nmbclusters="32768"
/etc/sysctl.conf
net.inet.tcp.msl=15000
net.inet.tcp.keepidle=300000
kern.ipc.nmbclusters=32768
kern.ipc.maxsockbuf=2097152
kern.ipc.somaxconn=8192
kern.maxfiles=65536
kern.maxfilesperproc=32768
net.inet.tcp.delayed_ack=0
net.inet.tcp.sendspace=65535
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=57344
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535
We followed the recommendations for ulimit:
[root@mongofile1 ~]# su - mongodb
$ ulimit -a
cpu time (seconds, -t) unlimited
file size (512-blocks, -f) unlimited
data seg size (kbytes, -d) 33554432
stack size (kbytes, -s) 524288
core file size (512-blocks, -c) unlimited
max memory size (kbytes, -m) unlimited
locked memory (kbytes, -l) unlimited
max user processes (-u) 5547
open files (-n) 32768
virtual mem size (kbytes, -v) unlimited
swap limit (kbytes, -w) unlimited
sbsize (bytes, -b) unlimited
pseudo-terminals (-p) unlimited
This server has an identically configured twin for the replica set in another datacenter, and we have a virtualized arbiter.
Sometimes, roughly every 3 days, the mongod process exits.
The problem begins with:
Fri Nov 8 11:27:31.741 [conn774697] end connection 192.168.10.162:47963 (23 connections now open)
Fri Nov 8 11:27:31.770 [initandlisten] can't create new thread, closing connection
Fri Nov 8 11:27:31.771 [rsHealthPoll] replSet member mongofile2:27017 is now in state DOWN
Fri Nov 8 11:27:31.774 [initandlisten] connection accepted from 192.168.10.162:47968 #774702 (20 connections now open)
Fri Nov 8 11:27:31.774 [initandlisten] connection accepted from 192.168.10.161:28522 #774703 (21 connections now open)
Fri Nov 8 11:27:31.774 [initandlisten] connection accepted from 192.168.10.164:15406 #774704 (22 connections now open)
Fri Nov 8 11:27:31.774 [initandlisten] connection accepted from 192.168.10.163:25750 #774705 (23 connections now open)
Fri Nov 8 11:27:31.810 [initandlisten] connection accepted from 192.168.10.182:20779 #774706 (24 connections now open)
Fri Nov 8 11:27:31.855 [initandlisten] connection accepted from 192.168.10.161:28524 #774707 (25 connections now open)
Fri Nov 8 11:27:31.869 [initandlisten] connection accepted from 192.168.10.182:20786 #774708 (26 connections now open)
and after many "can't create new thread" messages:
[root@mongofile1 /usr/mongodb]# tail -n 15000 mongod.log.old | grep "create new thread" | wc
5020 55220 421680
and finishes with a magnificent:
Fri Nov 8 11:30:22.333 [rsMgr] replSet warning caught unexpected exception in electSelf()
pure virtual method called
Fri Nov 8 11:30:22.333 Got signal: 6 (Abort trap: 6).
Fri Nov 8 11:30:22.337 Backtrace:
0x599efc 0x8035cb516
0x599efc <_ZN5mongo10abruptQuitEi+988> at /usr/local/bin/mongod
0x8035cb516 <_pthread_sigmask+918> at /lib/libthr.so.3
Extract for mongod from top:
78126 mongodb 77 20 0 1253G 1449M sbwait 0 0:20 0.00% mongod
If I restart the process when it crashes, the problem is fixed for almost 3 days ;-) (the Windows method is not the solution...).
Has anyone seen this before?
The fact that the error message says "pure virtual method called" would indicate that this is at least a bug in the MongoDB code. The code that generates that message is here, I think.