I would like to create a text index on a MongoDB collection. I ran
db.test1.ensureIndex({'text':'text'})
and then I saw this in the mongod process output:
Sun Jan 5 10:08:47.289 [conn1] build index library.test1 { _fts: "text", _ftsx: 1 }
Sun Jan 5 10:09:00.220 [conn1] Index: (1/3) External Sort Progress: 200/980 20%
Sun Jan 5 10:09:13.603 [conn1] Index: (1/3) External Sort Progress: 400/980 40%
Sun Jan 5 10:09:26.745 [conn1] Index: (1/3) External Sort Progress: 600/980 61%
Sun Jan 5 10:09:37.809 [conn1] Index: (1/3) External Sort Progress: 800/980 81%
Sun Jan 5 10:09:49.344 [conn1] external sort used : 5547 files in 62 secs
Sun Jan 5 10:09:49.346 [conn1] Assertion: 16392:FileIterator can't open file: data/_tmp/esort.1388912927.0//file.233errno:24 Too many open files
I work on Mac OS X 10.9.1.
Please help.
NB: This solution may not work on recent versions of macOS (comments indicate > 10.13?); apparently changes have been made for security purposes.
Conceptually, the solution still applies - the following are a few sources of discussion:
https://wilsonmar.github.io/maximum-limits/
https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c
https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1
--
I've had the same problem (executing a different operation, but still a "Too many open files" error), and as lese says, it seems to come down to the 'maxfiles' limit on the machine running mongod.
On a Mac, check the limits with:
sudo launchctl limit
This gives you:
<limit name> <soft limit> <hard limit>
cpu unlimited unlimited
filesize unlimited unlimited
data unlimited unlimited
stack 8388608 67104768
core 0 unlimited
rss unlimited unlimited
memlock unlimited unlimited
maxproc 709 1064
maxfiles 1024 2048
What I did to get around the problem was to temporarily set the limit higher (mine was originally something weird like soft: 256, hard: 1000):
sudo launchctl limit maxfiles 1024 2048
Then re-run the query/indexing operation and see if it breaks. If it doesn't, and you want to keep the higher limits (they will reset when you log out of the shell session you set them in), create an '/etc/launchd.conf' file with the following line:
limit maxfiles 1024 2048
(or add that line to your existing launchd.conf file, if you already have one).
This will set the maxfiles limit via launchctl in every shell at login.
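For reference, a minimal check-raise-verify sequence looks something like this (the 1024/2048 values are just the example numbers used above; pick values that suit your workload):
# show the current maxfiles limits
launchctl limit maxfiles
# raise them for the current boot only
sudo launchctl limit maxfiles 1024 2048
# confirm the new values took effect
launchctl limit maxfiles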
I added a temporary ulimit -n 4096 before the restore command.
You can also use
mongorestore --numParallelCollections=1 ... and that seems to help.
But the connection pool still seems to get exhausted.
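As a rough sketch, those two workarounds combine like this (the dump path is a placeholder, and 4096 is just an example limit):
# raise the open-file limit for the current shell only
ulimit -n 4096
# restore one collection at a time to keep the number of open files down
mongorestore --numParallelCollections=1 /path/to/dump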
It may be related to this.
Try checking your system configuration by issuing the following command in a terminal:
ulimit -a
Related
I have a similar problem to what was reported in [1], in that the MongoDB log files are being kept open as deleted when a log rotation happens. However, I believe the underlying causes are quite different, so I created a new question. The long and short of it is that when this happens, which is not all the time, I end up with no Mongo logs at all; and sometimes the deleted log file is kept for so long that its size becomes a problem.
Unlike [1], I have set up log rotation directly in Mongo [2]. It is done as follows:
systemLog:
  verbosity: 0
  destination: file
  path: "/SOME_PATH/mongo.log"
  logAppend: true
  timeStampFormat: iso8601-utc
  logRotate: reopen
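For reference, with this configuration a rotation can also be triggered manually, which makes the behaviour easier to reproduce on demand (a sketch, assuming shell access to the server and the legacy mongo shell):
# ask mongod to close and reopen its log file via the server command...
mongo --eval 'db.adminCommand({ logRotate: 1 })'
# ...or by sending the same signal that logrotate sends
kill -SIGUSR1 "$(pidof mongod)"
With logRotate: reopen, mongod reopens the file at the configured path, so the external rotation tool is expected to have moved the old file aside first.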
In terms of the setup: I am running MongoDB 4.2.9 (WiredTiger) on RHEL 7.4. The database sits on an XFS filesystem. The mount options we have for XFS are as follows:
rw,nodev,relatime,seclabel,attr2,inode64,noquota
Any pointers as to what could be causing this behaviour would be greatly appreciated. Thanks in advance for your time.
Update 1
Thanks to everyone for all the pointers. I now understand the configuration better, but I still think there is something amiss. To recap, in addition to the settings telling MongoDB to reopen the file on log rotation, I am also using the logrotate command. The configuration is fairly similar to what is suggested in [3]:
# rotate log every day
daily
# or if size limit exceeded
size 100M
# number of rotations to keep
rotate 5
# don't error if log is missing
missingok
# don't rotate if empty
notifempty
# compress rotated log file
compress
# permissions of rotated logs
create 644 SOMEUSER SOMEGROUP
# run post-rotate once per rotation, not once per file (see 'man logrotate')
sharedscripts
# 1. Signal to MongoDB to start a new log file.
# 2. Delete the empty 0-byte files left from compression.
postrotate
/bin/kill -SIGUSR1 $(cat /SOMEDIR/PIDFILE.pid 2> /dev/null) > /dev/null 2>&1
find /SOMEDIR/ -size 0c -delete
endscript
The main difference really is the slightly more complex postrotate command, though it does seem to do the same thing semantically as in [3], e.g.:
kill -USR1 $(/usr/sbin/pidof mongod)
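To exercise this path without waiting for the daily schedule, the logrotate configuration can be tested directly (a sketch; the config path /etc/logrotate.d/mongod is an assumption, adjust it to wherever the snippet above actually lives):
# dry run: show what logrotate would do without touching any files
logrotate -d /etc/logrotate.d/mongod
# force a rotation now, running the postrotate script as configured
sudo logrotate -f /etc/logrotate.d/mongod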
At any rate, what seems to be happening with the present log rotation configuration is that, very infrequently, MongoDB appears to get the SIGUSR1 but does not create a new log file. I stress the "seems/appears" as I do not have any hard evidence of this, since it's a very tricky problem to replicate in a controlled environment. But we can compare the two scenarios. I can see that the log rolling is working in the majority of cases:
-rw-r--r--. 1 SOMEUSER SOMEGROUP 34M May 5 10:51 mongo.log
-rw-------. 1 SOMEUSER SOMEGROUP 9.2M Feb 25 2020 mongo.log-20200225.gz
-rw-------. 1 SOMEUSER SOMEGROUP 8.3M Nov 17 03:39 mongo.log-20201117.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 8.6M Jan 30 03:19 mongo.log-20210130.gz
-rw-------. 1 SOMEUSER SOMEGROUP 8.6M Feb 27 03:31 mongo.log-20210227.gz
...
However, on occasion it seems that instead of creating a new log file, MongoDB keeps hold of the deleted file handle (note the missing mongo.log):
$ ls -lh
total 74M
-rw-r--r--. 1 SOMEUSER SOMEGROUP 18M Feb 17 03:29 mongo.log-20210217.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 18M Feb 18 03:11 mongo.log-20210218.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 18M Feb 19 03:41 mongo.log-20210219.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 15M Feb 20 03:07 mongo.log-20210220.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 6.5M Mar 13 03:41 mongo.log-20210313.gz
$ lsof -p SOMEPID | grep deleted | numfmt --field=7 --to=iec
mongod SOMEPID SOMEUSER 679w REG 253,5 106M 1191182408 /SOMEDIR/mongo.log (deleted)
It's not entirely clear to me how one would get more information on what MongoDB is doing upon receiving the SIGUSR1 signal. I also noticed that I get a lot of successful rotations before hitting the issue - it may just be a coincidence, but I wonder if it's the final rotation that is causing the problem (e.g. rotate 5). I'll keep investigating, but any additional pointers are most welcome. Thanks in advance.
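One additional note: when the deleted handle does turn up, the stale log can at least be recovered and its space reclaimed through /proc without restarting mongod (a sketch; SOMEPID and the descriptor number 679 come from the lsof output above and will differ each time, and this needs to run as root or the mongod user):
# copy the still-open, deleted log out through the process's fd table
cp /proc/SOMEPID/fd/679 /SOMEDIR/mongo.log.recovered
# truncate the deleted file so it stops consuming disk space
: > /proc/SOMEPID/fd/679
# then signal mongod again so it reopens a fresh mongo.log
kill -SIGUSR1 SOMEPID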
[1] SO: MongoDB keeps deleted log files open after rotation
[2] MongoDB docs: Rotate Log Files
[3] How can I disable the logging of MongoDB?
I have just tried to install MongoDB on a fresh Ubuntu 18 machine.
For this I went through the tutorial from the website.
Everything went fine - including starting the server with
sudo systemctl start mongod
and checking that it runs with:
sudo systemctl status mongod
Only I can't seem to start a mongo console. When I type mongo, I get the following error:
2020-07-17T13:26:48.049+0000 F - [main] Failed to mlock: Cannot allocate locked memory. For more details see: https://dochub.mongodb.org/core/cannot-allocate-locked-memory: Operation not permitted
2020-07-17T13:26:48.049+0000 F - [main] Fatal Assertion 28832 at src/mongo/base/secure_allocator.cpp 255
2020-07-17T13:26:48.049+0000 F - [main]
***aborting after fassert() failure
I checked the suggested link, but there seems to be no limit problem, as resources are not limited (per a check with ulimit). The machine has 16 GB of RAM. Any idea what the problem/solution might be?
EDIT: the process limits are:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 64000 64000 processes
Max open files 64000 64000 files
Max locked memory unlimited unlimited bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 62761 62761 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
I was getting that exact error, and the linked MongoDB page wasn't helpful for me either. I'm running on FreeBSD and found a useful bit of detail in a bug report for the port. It turns out a system-level resource limit was the underlying problem. On FreeBSD, the key is these two sysctl settings:
sysctl vm.stats.vm.v_wire_count vm.max_wired
v_wire_count should be less than max_wired. Increasing max_wired solved the issue for me.
If you use some sort of virtualization to deploy your machine, you need to make sure that the memlock system calls are allowed. For example, for systemd-nspawn, check this answer: https://stackoverflow.com/a/69286781/16085315
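As a rough sketch of what that amounts to for systemd-nspawn (the option exists in reasonably recent systemd versions; the exact syscall list is an assumption, so check the linked answer for your setup):
# allow the mlock family of syscalls inside the container
systemd-nspawn --system-call-filter='mlock mlock2 mlockall' ...
The same filter can also be set via SystemCallFilter= in the container's .nspawn file.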
I just had this issue on my FreeBSD VM with MongoDB, and the following solved it, as mentioned previously:
# sysctl vm.stats.vm.v_wire_count vm.max_wired
vm.stats.vm.v_wire_count: 1072281
vm.max_wired: 411615
# sysctl -w vm.max_wired=1400000
vm.max_wired: 411615 -> 1400000
# service mongod restart
Stopping mongod.
Waiting for PIDS: 36308.
Starting mongod.
To make the setting persistent across reboots, add the value to /etc/sysctl.conf:
vm.max_wired=1400000
v_wire_count should be less than max_wired.
CentOS 6.7
PostgreSQL 9.5.3
I have DB servers in master-standby replication.
Suddenly, the standby server's PostgreSQL process stopped with these logs:
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]WARNING: page 1671400 of relation base/16400/559613 is uninitialized
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]CONTEXT: xlog redo Heap2/VISIBLE: cutoff xid 1902107520
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]PANIC: WAL contains references to invalid pages
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]CONTEXT: xlog redo Heap2/VISIBLE: cutoff xid 1902107520
2016-07-14 18:14:21.026 JST [][5783e038.3cd9][0][15577]LOG: startup process (PID 15579) was terminated by signal 6: Aborted
2016-07-14 18:14:21.026 JST [][5783e038.3cd9][0][15577]LOG: terminating any other active server processes
The master server's PostgreSQL logs showed nothing special.
However, the master server's /var/log/messages contained the following:
Jul 14 05:38:44 host kernel: sbridge: HANDLING MCE MEMORY ERROR
Jul 14 05:38:44 host kernel: CPU 8: Machine Check Exception: 0 Bank 9: 8c000040000800c0
Jul 14 05:38:44 host kernel: TSC 0 ADDR 1f7dad7000 MISC 90004000400008c PROCESSOR 0:306e4 TIME 1468442324 SOCKET 1 APIC 20
Jul 14 05:38:44 host kernel: EDAC MC1: CE row 1, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#1": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x1f7dad7000 => socket=1, Channel=0(mask=1), rank=4
Jul 14 05:38:44 host kernel:
Jul 14 18:30:40 host kernel: sbridge: HANDLING MCE MEMORY ERROR
Jul 14 18:30:40 host kernel: CPU 8: Machine Check Exception: 0 Bank 9: 8c000040000800c0
Jul 14 18:30:40 host kernel: TSC 0 ADDR 1f7dad7000 MISC 90004000400008c PROCESSOR 0:306e4 TIME 1468488640 SOCKET 1 APIC 20
Jul 14 18:30:41 host kernel: EDAC MC1: CE row 1, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#1": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x1f7dad7000 => socket=1, Channel=0(mask=1), rank=4
Jul 14 18:30:41 host kernel:
The memory errors started about a week ago, so I suspect they are the cause of the PostgreSQL error.
My questions are:
1) Can a kernel memory error cause PostgreSQL's "WAL contains references to invalid pages" error?
2) Why are there no related logs in the master server's PostgreSQL log?
Thanks.
Faulty memory can cause all kinds of data corruption, so that seems like a good enough explanation to me.
Perhaps there are no log entries at the master PostgreSQL server because all that was corrupted was the WAL stream.
You can run
oid2name
to find out which database has OID 16400 and then
oid2name -d <database with OID 16400> -f 559613
to find out which table belongs to file 559613.
Is that table larger than 12 GB? If not, that would mean that page 1671400 is indeed an invalid value.
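For reference, page 1671400 at PostgreSQL's default 8 kB block size sits roughly 1671400 * 8 kB ≈ 12.7 GB into the table, which is where the 12 GB figure comes from. One way to check the table's actual on-disk size (the table name here is a placeholder for whatever oid2name returns):
psql -d <database with OID 16400> -c "SELECT pg_size_pretty(pg_relation_size('some_table'));"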
You didn't say which PostgreSQL version you are using, but maybe there are replication bugs fixed in later versions that could cause replication problems even without a hardware fault present; read the release notes.
I would perform a new pg_basebackup and reinitialize the slave system.
But what I'd really be worried about is possible data corruption on the master server. Block checksums are cool (turned on if pg_controldata <data directory> | grep checksum gives you 1), but possibly won't detect the effects of memory corruption.
Try something like
pg_dumpall -f /dev/null
on the master and see if there are errors.
Keep your old backups in case you need to repair something!
I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently Postgres will start getting "out of memory" errors on some SELECTs, and will continue doing so until I restart Postgres or some of the clients that are connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.
select version();:
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
uname -a:
Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Postgresql.conf (everything else is commented out/default):
max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0
I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much real progress. At the moment there are 68 connections, which is a typical number (I'm not using pgbouncer or any other connection pooler yet).
/etc/sysctl.conf:
kernel.shmmax=1050451968
kernel.shmall=256458
vm.overcommit_ratio=100
vm.overcommit_memory=2
I first changed overcommit_memory to 2 about a fortnight ago after the OOM killer killed the Postgres server. Prior to that the server had been running fine for a long time. The errors I get now are less catastrophic but much more annoying because they are much more frequent.
I haven't had much luck pinpointing the first event that causes postgres to run "out of memory" - it seems to be different each time. The most recent time it crashed, the first three lines logged were:
2015-04-07 05:32:39 UTC ERROR: out of memory
2015-04-07 05:32:39 UTC DETAIL: Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT: automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:58 UTC ERROR: out of memory
2015-04-07 05:33:58 UTC DETAIL: Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:59 UTC ERROR: out of memory
2015-04-07 05:33:59 UTC DETAIL: Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT: SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used
The crash before that, a few hours earlier, just had three instances of that last query as the first three lines of the crash. That query gets run very often, so I'm not sure if the issues are because of this query, or if it just comes up in the error log because it's a reasonably complex SELECT getting run all the time. That said, here's an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00
This is what ulimit -a for the postgres user looks like:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15956
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15956
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'll try to get the exact numbers from free next time there's a crash; in the meantime, this is a brain dump of all the info I have.
Any ideas on where to go from here?
I just ran into this same issue with a ~2.5 GB plain-text SQL file I was trying to restore. I scaled my Digital Ocean server up to 64 GB RAM, created a 10 GB swap file, and tried again. I got an out-of-memory error with 50 GB free, and no swap in use.
I scaled back my server to the small 1 GB instance I was using (requiring a reboot) and figured I'd give it another shot for no other reason than I was frustrated. I started the import and realized I forgot to create my temporary swap file again.
I created it in the middle of the import. psql made it a lot further before crashing. It made it through 5 additional tables.
I think there must be a bug allocating memory in psql.
Can you check whether there is any swap memory available when the error comes up?
I completely removed the swap memory on my Linux desktop (just for testing other things...) and I got exactly the same error! I'm pretty sure this is what is going on with you too.
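A quick way to check, and to add a temporary swap file if there is none (the 2G size and the /swapfile path are just examples):
# show configured swap; empty output means no swap
swapon -s
free -m
# create and enable a temporary 2 GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile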
It is a bit suspicious that you report the same amount of free memory as your shared_buffers size. Are you sure you are looking at the right values?
The output of the free command at the time of the crash would be useful, as well as the contents of /proc/meminfo.
Beware that setting overcommit_memory to 2 is not very effective if overcommit_ratio is 100. It basically limits memory allocation to the size of swap (0 in this case) plus 100% of physical RAM, which doesn't leave any room for shared memory and disk caches.
You should probably set overcommit_ratio to 50.
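A minimal sketch of that change and of how to see its effect on the kernel's commit limit (the figures assume the 2 GB RAM, no-swap setup described above):
# apply immediately
sudo sysctl -w vm.overcommit_ratio=50
# CommitLimit should now be swap + 50% of RAM; compare it with Committed_AS
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
# persist across reboots
echo 'vm.overcommit_ratio=50' | sudo tee -a /etc/sysctl.conf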
I am wondering about the MongoDB memory consumption. I have read the corresponding manual sections and the other questions on the topic, but I think this situation is different. May I ask you for your advice?
This is the error from the DB log file:
Fri Oct 26 20:34:00 [conn1] ERROR: mmap private failed with out of memory. (64 bit build)
Fri Oct 26 20:34:00 [conn1] Assertion: 13636:file /docdata/mongodb/data/xxx_letters.5 open/create failed in createPrivateMap (look in log for more information)
These are the data files:
total 4.0G
drwxr-xr-x 2 mongodb mongodb 4.0K 2012-10-26 20:21 journal
-rw------- 1 mongodb mongodb 64M 2012-10-25 19:34 xxx_letters.0
-rw------- 1 mongodb mongodb 128M 2012-10-20 22:10 xxx_letters.1
-rw------- 1 mongodb mongodb 256M 2012-10-24 09:10 xxx_letters.2
-rw------- 1 mongodb mongodb 512M 2012-10-26 10:04 xxx_letters.3
-rw------- 1 mongodb mongodb 1.0G 2012-10-26 19:56 xxx_letters.4
-rw------- 1 mongodb mongodb 2.0G 2012-10-03 11:32 xxx_letters.5
-rw------- 1 mongodb mongodb 16M 2012-10-26 19:56 xxx_letters.ns
This is the output of free -tm:
total used free shared buffers cached
Mem: 3836 3804 31 0 65 2722
-/+ buffers/cache: 1016 2819
Swap: 4094 513 3581
Total: 7930 4317 3612
Is it really necessary to have enough system memory for the largest data file to fit? Why grow the files that much? (From the sequence shown above, I expect the next file to be 4 GB.) I'll try to extend the RAM, but the data will eventually grow even more. Or maybe this is not a memory problem at all?
I have got a 64 bit Linux system and use the 64 bit MongoDB 2.0.7-rc1. There is plenty of disk space, the CPU load is 0.0. This is uname -a:
Linux xxx 2.6.32.54-0.3-default #1 SMP 2012-01-27 17:38:56 +0100 x86_64 x86_64 x86_64 GNU/Linux
ulimit -a solved the mystery:
core file size (blocks, -c) 1
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30619
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) 3338968
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 30619
virtual memory (kbytes, -v) 6496960
file locks (-x) unlimited
It worked after setting max memory size and virtual memory to unlimited and restarting everything. BTW, the next file was again 2 GB.
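For completeness, the change amounts to something like this in the shell (or init script) that launches mongod - a sketch, assuming mongod is started from that same session and the hard limits allow it (otherwise adjust /etc/security/limits.conf):
# lift the per-process memory caps before starting mongod
ulimit -m unlimited   # max memory size
ulimit -v unlimited   # virtual memory / address space
mongod --config /etc/mongodb.conf   # config path is just an example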
Sorry for bothering you, but I was desperate. Maybe this helps somebody "googling" a similar problem.