I have a similar problem to the one reported in [1], in that MongoDB keeps its log file open as deleted after a log rotation. However, I believe the underlying causes are quite different, so I created a new question. The long and short of it is that when this happens (which is not every time) I end up with no Mongo logs at all, and sometimes the deleted log file is held open for so long that its size becomes a problem.
Unlike [1], I have set up log rotation directly in Mongo [2]. It is done as follows:
systemLog:
  verbosity: 0
  destination: file
  path: "/SOME_PATH/mongo.log"
  logAppend: true
  timeStampFormat: iso8601-utc
  logRotate: reopen
In terms of the setup: I am running MongoDB 4.2.9 (WiredTiger) on RHEL 7.4. The database sits on an XFS filesystem. The mount options we have for XFS are as follows:
rw,nodev,relatime,seclabel,attr2,inode64,noquota
Any pointers as to what could be causing this behaviour would be greatly appreciated. Thanks in advance for your time.
Update 1
Thanks to everyone for all the pointers. I now understand the configuration better, but I still think something is amiss. To recap: in addition to the settings telling MongoDB to reopen the file on log rotation, I am also using the logrotate utility. The configuration is fairly similar to what is suggested in [3]:
# rotate log every day
daily
# or if size limit exceeded
size 100M
# number of rotations to keep
rotate 5
# don't error if log is missing
missingok
# don't rotate if empty
notifempty
# compress rotated log file
compress
# permissions of rotated logs
create 644 SOMEUSER SOMEGROUP
# run post-rotate once per rotation, not once per file (see 'man logrotate')
sharedscripts
# 1. Signal to MongoDB to start a new log file.
# 2. Delete the empty 0-byte files left from compression.
postrotate
/bin/kill -SIGUSR1 $(cat /SOMEDIR/PIDFILE.pid 2> /dev/null) > /dev/null 2>&1
find /SOMEDIR/ -size 0c -delete
endscript
The main difference really is the slightly more complex postrotate script, though semantically it seems to do the same thing as the one in [3], e.g.:
kill -USR1 $(/usr/sbin/pidof mongod)
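For reference, this is roughly how I have been trying to reproduce a single rotation by hand (just a sketch, reusing the placeholder paths from above):
# simulate what logrotate does before signalling mongod
mv /SOMEDIR/mongo.log /SOMEDIR/mongo.log.manual
# with logRotate: reopen, SIGUSR1 should make mongod reopen its log path
/bin/kill -SIGUSR1 "$(cat /SOMEDIR/PIDFILE.pid)"
sleep 2
# a fresh mongo.log should exist again, and no "(deleted)" handle should remain
ls -l /SOMEDIR/mongo.log*
lsof -p "$(cat /SOMEDIR/PIDFILE.pid)" | grep mongo.log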
At any rate, what seems to be happening with the present logrotate configuration is that, very infrequently, MongoDB appears to get the SIGUSR1 but does not create a new log file. I stress the "seems/appears" as I do not have any hard evidence of this, since it's a very tricky problem to replicate in a controlled environment. But we can compare the two scenarios. I can see that the log rolling works in the majority of cases:
-rw-r--r--. 1 SOMEUSER SOMEGROUP 34M May 5 10:51 mongo.log
-rw-------. 1 SOMEUSER SOMEGROUP 9.2M Feb 25 2020 mongo.log-20200225.gz
-rw-------. 1 SOMEUSER SOMEGROUP 8.3M Nov 17 03:39 mongo.log-20201117.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 8.6M Jan 30 03:19 mongo.log-20210130.gz
-rw-------. 1 SOMEUSER SOMEGROUP 8.6M Feb 27 03:31 mongo.log-20210227.gz
...
However, on occasion it seems that instead of creating a new log file, MongoDB keeps hold of the deleted file handle (note the missing mongo.log):
$ ls -lh
total 74M
-rw-r--r--. 1 SOMEUSER SOMEGROUP 18M Feb 17 03:29 mongo.log-20210217.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 18M Feb 18 03:11 mongo.log-20210218.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 18M Feb 19 03:41 mongo.log-20210219.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 15M Feb 20 03:07 mongo.log-20210220.gz
-rw-r--r--. 1 SOMEUSER SOMEGROUP 6.5M Mar 13 03:41 mongo.log-20210313.gz
$ lsof -p SOMEPID | grep deleted | numfmt --field=7 --to=iec
mongod SOMEPID SOMEUSER 679w REG 253,5 106M 1191182408 /SOMEDIR/mongo.log (deleted)
It's not entirely clear to me how one would get more information on what MongoDB is doing upon receiving the SIGUSR1 signal. I also noticed that I get a lot of successful rotations before hitting the issue; that may just be a coincidence, but I wonder if it's the final rotation that is causing the problem (e.g. rotate 5). I'll keep investigating (one idea is sketched below), but any additional pointers are most welcome. Thanks in advance.
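One thing I am planning to try next is to trace what happens to the log file descriptor around a manual rotation. This is only a sketch; it assumes strace is available and that you are allowed to ptrace the mongod process:
lsof -p SOMEPID | grep mongo.log           # note which fd currently points at the log
timeout 10 strace -p SOMEPID -e trace=open,openat,close -o /tmp/mongod-rotate.trace &
sleep 1
/bin/kill -SIGUSR1 SOMEPID                 # trigger the reopen manually
wait
grep mongo.log /tmp/mongod-rotate.trace    # did mongod actually close and reopen the path?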
[1] SO: MongoDB keeps deleted log files open after rotation
[2] MongoDB docs: Rotate Log Files
[3] How can I disable the logging of MongoDB?
Related
The postgres image I am currently deploying with OpenShift is generally working great. However, I need to persistently store the database data (of course), and to do so I created a persistent volume claim and mounted it to the postgres data directory like so:
- mountPath: /var/lib/pgsql/data/userdata
  name: db-storage-volume
and
- name: db-storage-volume
  persistentVolumeClaim:
    claimName: db-storage
The problem I am facing now is that the initdb script wants to change the permissions of that data folder, but it can't, and the directory is assigned to a very odd user/group, as the output of ls -la /var/lib/pgsql/data shows (including the output of the failing command):
total 12
drwxrwxr-x. 3 postgres root 21 Aug 30 13:06 .
drwxrwx---. 3 postgres root 17 Apr 5 09:55 ..
drwxrwxrwx. 2 nobody nobody 12288 Jun 26 11:11 userdata
chmod: changing permissions of '/var/lib/pgsql/data/userdata': Permission denied
How can I handle this? I mean, the permissions are enough to read/write, but initdb (and the base image's initialization functions) really wants to change the permissions of that folder.
Just as I had sent my question I had an idea and it turns out it worked:
Change the mount to the parent folder /var/lib/pgsql/data/
Modify my entry script to include a mkdir /var/lib/pgsql/data/userdata when it runs for the first time (i.e. when the folder does not exist yet)
Now it is:
total 16
drwxrwxrwx. 3 nobody nobody 12288 Aug 30 13:19 .
drwxrwx---. 3 postgres root 17 Apr 5 09:55 ..
drwxr-xr-x. 2 1001320000 nobody 4096 Aug 30 13:19 userdata
Which works. Notice that the parent folder itself is still owned by nobody:nobody and is 777, but the newly created userdata folder is owned by the correct user.
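For completeness, the relevant bit of my entry script looks roughly like this (a sketch; the path is the same mount path as above):
#!/bin/bash
# entrypoint snippet (sketch): create the data subdirectory on first run,
# so initdb can own and populate it inside the mounted parent volume
PGDATA_DIR=/var/lib/pgsql/data/userdata
if [ ! -d "$PGDATA_DIR" ]; then
    mkdir -p "$PGDATA_DIR"
fi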
Zookeeper's rapidly pooping its internal binary files all over our production environment.
According to: http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
and
http://dougchang333.blogspot.com/2013/02/zookeeper-cleaning-logs-snapshots.html
this is expected behavior and you must call org.apache.zookeeper.server.PurgeTxnLog
regularly to rotate its poop.
So:
% ls -l1rt /tmp/zookeeper/version-2/
total 314432
-rw-r--r-- 1 root root 67108880 Jun 26 18:00 log.1
-rw-r--r-- 1 root root 947092 Jun 26 18:00 snapshot.e99b
-rw-r--r-- 1 root root 67108880 Jun 27 05:00 log.e99d
-rw-r--r-- 1 root root 1620918 Jun 27 05:00 snapshot.1e266
... many more
% sudo java -cp zookeeper-3.4.6.jar::lib/jline-0.9.94.jar:lib/log4j-1.2.16.jar:lib/netty-3.7.0.Final.jar:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar:conf \
org.apache.zookeeper.server.PurgeTxnLog \
/tmp/zookeeper/version-2 /tmp/zookeeper/version-2 -n 3
but I get:
% ls -l1rt /tmp/zookeeper/version-2/
... all the existing logs plus a new directory
/tmp/zookeeper/version-2/version-2
Am I doing something wrong?
ZooKeeper has had an autopurge feature since 3.4.0. Take a look at https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
It says you can use autopurge.snapRetainCount and autopurge.purgeInterval:
autopurge.snapRetainCount
New in 3.4.0: When enabled, ZooKeeper auto purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in the dataDir and dataLogDir respectively and deletes the rest. Defaults to 3. Minimum value is 3.
autopurge.purgeInterval
New in 3.4.0: The time interval in hours for which the purge task has to be triggered. Set to a positive integer (1 and above) to enable the auto purging. Defaults to 0.
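In practice that means adding something like the following to zoo.cfg (the values here are just an example):
# keep the 3 most recent snapshots and their transaction logs
autopurge.snapRetainCount=3
# run the purge task every 24 hours
autopurge.purgeInterval=24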
Since I'm not hearing a fix via Zookeeper, this was an easy workaround:
COUNT=6
DATADIR=/tmp/zookeeper/version-2/
ls -1drt ${DATADIR}/* | head --lines=-${COUNT} | xargs sudo rm -f
Should run once a day from a cron job or jenkins to prevent zookeeper from exploding.
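For example, a cron entry along these lines (the schedule, path and retention count are assumptions based on the snippet above):
# /etc/cron.d/zookeeper-purge (example): prune old snapshots/logs daily at 03:00
0 3 * * * root ls -1drt /tmp/zookeeper/version-2/* | head --lines=-6 | xargs -r rm -f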
You need to specify the dataDir and snapDir parameters with the value that is configured as dataDir in your zookeeper properties file.
If your configuration looks like the following.
dataDir=/data/zookeeper
You need to call PurgeTxnLog (version 3.5.9) like the following if you want to keep the last 10 logs/snapshots
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf org.apache.zookeeper.server.PurgeTxnLog /data/zookeeper /data/zookeeper -n 10
I would like to create a text index on a mongo collection. I write
db.test1.ensureIndex({'text':'text'})
and then I saw this in the mongod process output:
Sun Jan 5 10:08:47.289 [conn1] build index library.test1 { _fts: "text", _ftsx: 1 }
Sun Jan 5 10:09:00.220 [conn1] Index: (1/3) External Sort Progress: 200/980 20%
Sun Jan 5 10:09:13.603 [conn1] Index: (1/3) External Sort Progress: 400/980 40%
Sun Jan 5 10:09:26.745 [conn1] Index: (1/3) External Sort Progress: 600/980 61%
Sun Jan 5 10:09:37.809 [conn1] Index: (1/3) External Sort Progress: 800/980 81%
Sun Jan 5 10:09:49.344 [conn1] external sort used : 5547 files in 62 secs
Sun Jan 5 10:09:49.346 [conn1] Assertion: 16392:FileIterator can't open file: data/_tmp/esort.1388912927.0//file.233errno:24 Too many open files
I am working on Mac OS X 10.9.1.
Please help.
NB: This solution does not (or may not) work with recent macOS versions (comments indicate >10.13?). Apparently, changes have been made for security purposes.
Conceptually, the solution applies - following are a few sources of discussion:
https://wilsonmar.github.io/maximum-limits/
https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c
https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1
--
I've had the same problem (executing a different operation, but still a "Too many open files" error), and as lese says, it seems to come down to the 'maxfiles' limit on the machine running mongod.
On a Mac, it is better to check the limits with:
sudo launchctl limit
This gives you:
<limit name> <soft limit> <hard limit>
cpu unlimited unlimited
filesize unlimited unlimited
data unlimited unlimited
stack 8388608 67104768
core 0 unlimited
rss unlimited unlimited
memlock unlimited unlimited
maxproc 709 1064
maxfiles 1024 2048
What I did to get around the problem was to temporarily set the limit higher (mine was originally something like soft: 256, hard: 1000 or something weird like that):
sudo launchctl limit maxfiles 1024 2048
Then re-run the query/indexing operation and see if it breaks. If not, and to keep the higher limits (they will reset when you log out of the shell session you've set them on), create an '/etc/launchd.conf' file with the following line:
limit maxfiles 1024 2048
(or add that line to your existing launchd.conf file, if you already have one).
This will set the maxfiles limit via launchctl in every shell at login.
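You can verify that the new limits took effect after your next login with:
sudo launchctl limit maxfiles     # should report the soft/hard values from launchd.conf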
I added a temporary ulimit -n 4096 before the restore command.
You can also use
mongorestore --numParallelCollections=1 ...
and that seems to help. But the connection pool still seems to get exhausted.
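Roughly what I ran, for reference (the dump path is just a placeholder):
# raise the open-file limit for this shell only, then restore one collection at a time
ulimit -n 4096
mongorestore --numParallelCollections=1 /path/to/dump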
It may be related to this. Try checking your system configuration by issuing the following command in a terminal:
ulimit -a
I am new to Zookeeper and it has been a real struggle to install and run it. I am not sure what is wrong here, but I will explain what I've been doing to make it clearer:
1.- I've followed the installation guide provided by Apache. This means downloading the Zookeeper distribution (stable release), extracting the file and moving it into the home directory.
2.- As I am using Ubuntu 12.04, I've modified the .bashrc file to include this:
export ZOOKEEPER_INSTALL=/home/myusername/zookeeper-3.4.5
export PATH=$PATH:$ZOOKEEPER_INSTALL/bin
3.- Created a config file at conf/zoo.cfg:
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
and also tried with:
dataDir=/var/log/zookeeper
and
dataDir=/var/bin/zookeeper
4.- When running the start command (zkServer.sh start or bin/zkServer.sh start), nothing happens and it always returns this:
JMX enabled by default
Using config: /home/sasuke/zookeeper-3.4.5/bin/../conf/zoo.cfg
mkdir: cannot create directory `/var/zookeeper': Permission denied
Starting zookeeper ... /home/sasuke/zookeeper-3.4.5/bin/zkServer.sh: line 113: /var/zookeeper/zookeeper_server.pid: No such file or directory
FAILED TO WRITE PID
I have Java installed, and inside the zookeeper directory there is a zookeeper.jar file which I think is not being run.
Checking here on Stack Overflow, there was a guy who said he could run zookeeper after typing
ssh localhost
But when I try to do it I get this error
ssh: connect to host localhost port 22: Connection refused
Please help. I've been trying to solve this for too long.
Getting started guide of zookeeper:
http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html
Previous case solved with ssh localhost:
Zookeeper: FAILED TO WRITE PID
UPDATE:
The permissions for log are:
drwxr-xr-x 19 root root 4096 Oct 10 07:52 log
and for zookeeper:
drwxr-xr-x 2 zookeeper zookeeper 4096 Mar 23 2012 zookeeper
Should I change any of these?
I have had the same problem. In my case it was useful to start Zookeeper and directly specify a configuration file:
/bin/zkServer.sh start conf/zoo.conf
It seems you do not have the required permissions. The /var/log owner is going to be root. Zookeeper stores the process id and a snapshot of the data in that directory. The process id of the spawned zookeeper server is stored in the file zookeeper_server.pid (as of 3.3.6).
If you have root privileges, you could start zookeeper with sudo (root) privileges; it should work, but it is definitely not recommended. Make sure you start zookeeper with the same (or higher) permissions as the owner of the directory.
Create a new directory in your home folder like /home/username/zookeeper-data.
Let dataDir point to that directory and it should work.
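Something along these lines, for example (the username and path are placeholders):
mkdir -p /home/username/zookeeper-data
# then point dataDir at it in conf/zoo.cfg:
#   dataDir=/home/username/zookeeper-data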
The default zookeeper installation (tar extract) comes with the conf file named conf/zoo_sample.cfg, while the same extract's bin/zkServer.sh expects the conf file to be called zoo.cfg, thereby resulting in a "No such file or dir" and the "failed to write pid" error. So before running zkServer.sh to start or stop a zookeeper instance, either:
rename the zoo_sample.cfg in the conf dir to zoo.cfg (see the commands sketched after this list), or
give the name (and path) to the conf file (as suggested by Ilya Lapitan), or, of course
edit zkServer.sh ;-)
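A minimal sketch of the first option, run from the extracted zookeeper directory:
cp conf/zoo_sample.cfg conf/zoo.cfg
bin/zkServer.sh start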
When you create the directory for dataDir, make sure to use the -p option. This will allow subsequent directories to be created as required by the application placing files.
mkdir -p /var/log/zookeeperData
Then set:
dataDir=/var/log/zookeeperData
It seems there are all kinds of reasons this can happen. So many helpful answers here!
For me, I had improper line endings in my zoo.cfg file, and possibly invisible characters, so zookeeper was trying to create directories like /var/zookeeper? and /var/zookeeper\r. Reworking my zoo.cfg a bit fixed it for me, along with deleting zoo_sample.conf.
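If you suspect the same thing, stray carriage returns are easy to spot and strip (this assumes GNU cat and that dos2unix is installed):
cat -A conf/zoo.cfg       # CRLF line endings show up as "^M$" at the end of a line
dos2unix conf/zoo.cfg     # strips the carriage returns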
This happened to me due to low disk space, because zookeeper can't create the pid file inside the zookeeper data folder.
I have faced the same issue while starting zookeeper with this command:
hadoop@ubuntu:~/hadoop/zookeeper/zookeeper-3.4.8$ bin/zkServer.sh start
ERROR [main] client.ConnectionManager$HConnectionImplementation:
The node /hbase is not in ZooKeeper.
It should have been written by the master. Check the value configured in zookeeper.znode.parent. There could be a mismatch with the one configured in the master.
But running the script with sudo rectified the issue:
hadoop@ubuntu:~/hadoop/zookeeper/zookeeper-3.4.8$ sudo bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/hadoop/zookeeper/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Go to /usr/local/etc/, where you will find a zookeeper directory. Delete that directory and restart the server with zkServer start.
Change the path to dataDir=/tmp/zookeeper. If it works, then it is clearly an access issue.
But it is generally not advisable to use the tmp directory.
This seems to be an ownership issue; running the following solved this for me.
$ sudo chown -R $USER /var/lib/zookeeper
N.B.
I've outlined my steps below, which show the error I was getting (the same as the error in this SO question) and my attempt at the solution proposed by a user above, which advised providing zoo.cfg as an argument.
13:01:29 ✔ ~ :: $ZK/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/../conf/zoo.cfg
Starting zookeeper ... /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/zkServer.sh: line 149: /var/lib/zookeeper/zookeeper_server.pid: Permission denied
FAILED TO WRITE PID
13:01:32 ✘ ~ :: $ZK/bin/zkServer.sh start $ZK/conf/zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/conf/zoo.cfg
Starting zookeeper ... /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/zkServer.sh: line 149: /var/lib/zookeeper/zookeeper_server.pid: Permission denied
FAILED TO WRITE PID
13:04:45 ✔ /var/lib :: ls -la
total 0
drwxr-xr-x 4 root wheel 128 Apr 19 18:55 .
drwxr-xr-x 27 root wheel 864 Apr 19 18:55 ..
drwxr--r-- 3 root wheel 96 Mar 24 15:07 zookeeper
13:04:48 ✔ /var/lib :: echo $USER
tallamjr
13:06:03 ✔ /var/lib :: sudo chown -R $USER zookeeper
Password:
13:06:44 ✔ /var/lib :: ls -la
total 0
drwxr-xr-x 4 root wheel 128 Apr 19 18:55 .
drwxr-xr-x 27 root wheel 864 Apr 19 18:55 ..
drwxr--r-- 3 tallamjr wheel 96 Mar 24 15:07 zookeeper
13:06:48 ✔ ~ :: $ZK/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
REF:
- https://askubuntu.com/questions/6723/change-folder-permissions-and-ownership
For me this solution worked:
I granted read, write and execute permissions for everyone on the zookeeper directory (sudo chmod 777 zookeeper) by going inside the directory /var (i.e. /var/zookeeper).
After executing this command, try running zookeeper again. It ran in my case.
Try using sudo -E bin/zkServer.sh start
I imported a CSV file that is 230M in total size; the dimensions of the file are 3069055 rows and 13 columns.
The command I used to import was:
mongoimport -d taq -c mycollection --type csv --file myfile.csv --headerline
Before I did this import the taq database was empty. After the import completed (which took 4 minutes), I checked the size of the database files in the mongodb user directory. This is what I see:
-rw------- 1 mongod mongod 64M Jul 23 14:13 taq.0
-rw------- 1 mongod mongod 128M Jul 23 14:10 taq.1
-rw------- 1 mongod mongod 256M Jul 23 14:11 taq.2
-rw------- 1 mongod mongod 512M Jul 23 14:13 taq.3
-rw------- 1 mongod mongod 1.0G Jul 23 14:13 taq.4
-rw------- 1 mongod mongod 2.0G Jul 23 14:13 taq.5
-rw------- 1 mongod mongod 16M Jul 23 14:13 taq.ns
Six taq files have been created, numbered from 0 to 5. The total size of these files is multiple GBs. Why is this, when the CSV file I imported is only 230M? Is this a bug? Or am I missing something?
Cheers.
MongoDB stores data in a totally different format, called BSON, which is going to take up more disk space. Not only do the values need to be stored for each field, the column names also have to be stored again in each document (row). If you have long field names, this can easily increase the size in MongoDB to 8 to 10 times that of your CSV file. If this is too much for you, you can look at shortening your field names.
MongoDB also preallocates data files for you. For example, the moment it starts adding data to taq.2, it will create taq.3, and similarly, when it starts writing into taq.4 it creates taq.5. So in your case, even if your 230MB file only produced, say, 1.9GB of data, MongoDB had already allocated the 2.0GB taq.5. This behaviour can be turned off by specifying --noprealloc on the command line when starting mongod.
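For illustration, that would look something like this (the dbpath is a placeholder; keep in mind preallocation exists for performance reasons, so turning it off is a trade-off):
# start mongod without preallocating the next data file in advance
mongod --dbpath /var/lib/mongo --noprealloc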