I did a mongorestore of a gzipped mongodump:
mongorestore -v --drop --gzip --db bigdata /Volumes/Lacie2TB/backup/mongo20170909/bigdata/
But it kept going. I left it running, because I figure that if I just kill it now, my (important) data will be corrupted. Look at the percentages:
2017-09-10T14:45:58.385+0200 [########################] bigdata.logs.sets.log 851.8 GB/85.2 GB (999.4%)
2017-09-10T14:46:01.382+0200 [########################] bigdata.logs.sets.log 852.1 GB/85.2 GB (999.7%)
2017-09-10T14:46:04.381+0200 [########################] bigdata.logs.sets.log 852.4 GB/85.2 GB (1000.0%)
And it keeps going!
Note that the other collections have finished. Only this one goes beyond 100%. I do not understand.
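As a sanity check on those progress lines, here is a minimal Python sketch that re-derives the percentage from the two sizes printed on a line. The regular expression and the function name progress_ratio are my own, not part of mongorestore. Because the GB values are rounded for display, the recomputed figure only matches the reported one approximately, but it does suggest the percentage is simply bytes read over expected bytes.

```python
import re

# Matches the "<done> GB/<total> GB (<pct>%)" tail of a mongorestore progress line.
PROGRESS_RE = re.compile(r"([\d.]+) GB/([\d.]+) GB \(([\d.]+)%\)")

def progress_ratio(line):
    """Return (done_gb, total_gb, reported_pct, recomputed_pct), or None if no match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    done, total, reported = (float(g) for g in m.groups())
    # Recompute the percentage from the displayed (rounded) sizes.
    return done, total, reported, round(100 * done / total, 1)

line = ("2017-09-10T14:45:58.385+0200 [########################] "
        "bigdata.logs.sets.log 851.8 GB/85.2 GB (999.4%)")
done, total, reported, recomputed = progress_ratio(line)
print(f"{done} GB of {total} GB: reported {reported}%, recomputed {recomputed}%")
```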
This is mongo 3.2.7 on Mac OSX.
There is obviously a problem with the reported amount of data imported, because the disk does not even have that much space:
$ df -h
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk3 477Gi 262Gi 214Gi 56% 68749708 56193210 55% /
The amount of disk space used could be right, because the gzipped backup is about 200GB. I do not know if this would result in the same amount of data on the WiredTiger database with snappy compression.
However, the log keeps showing inserts:
2017-09-10T16:20:18.986+0200 I COMMAND [conn9] command bigdata.logs.sets.log command: insert { insert: "logs.sets.log", documents: 20, writeConcern: { getLastError: 1, w: 1 }, ordered: false } ninserted:20 keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { r: 19, w: 19 } }, Database: { acquireCount: { w: 19 } }, Collection: { acquireCount: { w: 19 } } } protocol:op_query 245ms
2017-09-10T16:20:19.930+0200 I COMMAND [conn9] command bigdata.logs.sets.log command: insert { insert: "logs.sets.log", documents: 23, writeConcern: { getLastError: 1, w: 1 }, ordered: false } ninserted:23 keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { r: 19, w: 19 } }, Database: { acquireCount: { w: 19 } }, Collection: { acquireCount: { w: 19 } } } protocol:op_query 190ms
Update
Disk space is still being consumed. This is roughly 2 hours later, and roughly 30 GB later:
$ df -h
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk3 477Gi 290Gi 186Gi 61% 76211558 48731360 61% /
The question is: Is there a bug in the progress indicator, or is there some kind of loop that keeps inserting the same documents?
Update
It finished.
2017-09-10T19:35:52.268+0200 [########################] bigdata.logs.sets.log 1604.0 GB/85.2 GB (1881.8%)
2017-09-10T19:35:52.268+0200 restoring indexes for collection bigdata.logs.sets.log from metadata
2017-09-10T20:16:51.882+0200 finished restoring bigdata.logs.sets.log (3573548 documents)
2017-09-10T20:16:51.882+0200 done
1604.0 GB/85.2 GB (1881.8%)
Interesting. :)
It looks similar to this bug: https://jira.mongodb.org/browse/TOOLS-1579
There seems to be a fix backported to 3.5 and 3.4. The fix might not be backported to 3.2. I'm thinking the problem might have something to do with using gzip and/or snappy compression.
I am trying to read the MongoDB log file located at /var/log/mongodb. Its contents look like this:
2019-11-04T05:04:00.390-0800 I COMMAND [conn38649] command loldb.$cmd command: update { update: "SUBSCRIPTION", ordered: true, writeConcern: { w: 1 }, $db: "loldb" } numYields:0 reslen:295 locks:{ Global: { acquireCount: { r: 460, w: 460 } }, Database: { acquireCount: { w: 460 } }, Collection: { acquireCount: { w: 459 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_query 568ms
2019-11-04T05:04:00.396-0800 I COMMAND [conn38657] command loldb.SUBSCRIPTION command: find { find: "SUBSCRIPTION", filter: { customerID: 75824180, policeDepartmentID: 1 }, projection: {}, $readPreference: { mode: "secondaryPreferred" }, $db: "loldb" } planSummary: COLLSCAN keysExamined:0 docsExamined:69998 cursorExhausted:1 numYields:550 nreturned:1 reslen:430 locks:{ Global: { acquireCount: { r: 1102 } }, Database: { acquireCount: { r: 551 } }, Collection: { acquireCount: { r: 551 } } } protocol:op_query 424ms
2019-11-04T05:04:00.402-0800 I COMMAND [conn38735] command loldb.SUBSCRIPTION command: find { find: "SUBSCRIPTION", filter: { customerID: 75824164 }, projection: {}, $readPreference: { mode: "secondaryPreferred" }, $db: "loldb" } planSummary: COLLSCAN keysExamined:0 docsExamined:58142 cursorExhausted:1 numYields:456 nreturned:1 reslen:417 locks:{ Global: { acquireCount: { r: 914 } }, Database: { acquireCount: { r: 457 } }, Collection: { acquireCount: { r: 457 } } } protocol:op_query 374ms
Each blockquote is a single log entry (one line).
The contents of the file are updated every second. I need to read the file and, if the query time (e.g. protocol:op_query 385ms) is more than 300 ms, save that entire log line into another text file, slow_queries.text.
The file I am reading is a .log file, but the content looks like JSON (please correct me if I am wrong) preceded by a timestamp and command type. Is there an efficient way to read data in this format? At the moment I am just reading it word by word, line by line.
Also, what do I do so that changes made to the .log file are picked up automatically, without running the script every time?
I just tried this on my local machine; it may need some work for your use case. I added some comments, so maybe this will help you:
EDIT: I added a check for the timestamp; you will have to adjust it to your needs.
#!/bin/bash
# Continuously read from the file and pipe it into the while loop
tail -F "test.log" | \
while IFS= read -r LINE
do
    # Get the timestamp (first field) from LINE and convert it to epoch seconds (needs GNU date)
    timeinseconds="$(date -d "${LINE%% *}" +%s 2>/dev/null)" || continue
    # Get the epoch time of five minutes ago
    timebeforefivemin="$(date -d '-5 minutes' +%s)"
    # Only log if the timestamp of the line is earlier than five minutes ago (adjust to your needs)
    if [[ "$timeinseconds" -lt "$timebeforefivemin" ]]
    then
        # Get the query time from the end of the line (the digits before the trailing "ms")
        querytime="$(echo "$LINE" | grep -oP '\d+(?=ms\s*$)')"
        # If the grep found a duration and the query time is greater than 300
        if [[ -n "$querytime" && "$querytime" -gt 300 ]]
        then
            # Append the line to the slow_queries file -> change it to the path you want
            echo "$LINE" >> slow_queries.txt
        fi
    fi
done
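If a script in a general-purpose language is easier to maintain, the same idea can be sketched in Python. This is only a sketch under assumptions: the function names (duration_ms, is_slow, follow, watch) are mine, the duration is assumed to be the last token on the line as in the sample entries, and the follow loop does not handle log rotation or partially written lines.

```python
import re
import time

# The duration is the last token of a COMMAND log line, e.g. "... protocol:op_query 568ms".
DURATION_RE = re.compile(r"(\d+)ms\s*$")

def duration_ms(line):
    """Return the trailing duration in milliseconds, or None if the line has none."""
    m = DURATION_RE.search(line)
    return int(m.group(1)) if m else None

def is_slow(line, threshold_ms=300):
    """True if the line ends with a duration strictly above threshold_ms."""
    d = duration_ms(line)
    return d is not None and d > threshold_ms

def follow(path, poll_seconds=1.0):
    """Yield lines appended to path, roughly like `tail -f` (no rotation handling)."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at the current end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll_seconds)
                continue
            yield line

def watch(log_path, out_path, threshold_ms=300):
    """Append every slow line from log_path to out_path; runs until interrupted."""
    with open(out_path, "a") as out:
        for line in follow(log_path):
            if is_slow(line, threshold_ms):
                out.write(line)
                out.flush()
```

Calling watch("/var/log/mongodb/mongod.log", "slow_queries.txt") would then run until interrupted; the log file name here is a placeholder, since the question only gives the directory.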
I'm having trouble with the following findOneAndUpdate MongoDB query:
planSummary: IXSCAN { id: 1 } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:3044791
locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } }
storage:{} protocol:op_query 135ms
writeConcern: { w: 0, j: false }
As you can see, it has an execution time of over 100 ms. The query part uses an index and takes less than 1 ms (according to 'Explain query'), so it is the write part that is slow.
The Mongo instance is the master of a 3 member replica set. Write concern is set to 0 and journaling is disabled.
What could be the cause of the slow write? Could it be the update of indices?
MongoDB version 4.0
Driver: Node.js native mongodb version 3.2
Edit: I think it might be the length of the result. After querying a document smaller in size, the execution time is halved.
reslen:3044791
This was the source of the bad performance. Reducing it by adding a projection option that returns only a specific field improved the execution time from ~90 ms on average to ~7 ms.
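To spot other operations with oversized responses in a log, one could pull reslen and the duration out of each entry. The following Python sketch assumes the field layout shown in the entry above; the helper names and the 1,000,000-byte cutoff are arbitrary choices of mine, and the sample entry is the one from the question with the locks section left out.

```python
import re

RESLEN_RE = re.compile(r"\breslen:(\d+)")
DURATION_RE = re.compile(r"(\d+)ms\s*$")

def response_stats(entry):
    """Return (reslen_bytes, duration_ms) for a log entry, or None if either is missing."""
    reslen = RESLEN_RE.search(entry)
    duration = DURATION_RE.search(entry)
    if not (reslen and duration):
        return None
    return int(reslen.group(1)), int(duration.group(1))

def large_response(entry, max_bytes=1_000_000):
    """True if the entry reports a response larger than max_bytes (arbitrary cutoff)."""
    stats = response_stats(entry)
    return stats is not None and stats[0] > max_bytes

# The entry from the question, with the locks section omitted for brevity.
entry = ("planSummary: IXSCAN { id: 1 } keysExamined:1 docsExamined:1 nMatched:1 "
         "nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:3044791 "
         "storage:{} protocol:op_query 135ms")
reslen, ms = response_stats(entry)
print(f"{reslen / 2**20:.1f} MiB returned in {ms} ms")
```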
As per the MongoDB documentation, the MongoDB shell command:
show dbs
Print a list of all databases on the server.
and
show databases
Print a list of all available databases.
I'm confused. From what I read and understood, these commands do not have the same effect, right? Is show databases not an alias of show dbs?
Could there be a database that is listed by show dbs but is not available, and therefore not listed by show databases?
If so, how can a database be on the server but not be available? Is it a matter of a user's access rights? Is that what is behind the filtering in show databases?
I don't think there is a difference between the two commands. Both of the operations call the listDatabases command with the same option.
Increasing the log level, the show dbs command logged:
2018-11-30T15:40:59.539-0800 I COMMAND [conn23] command admin.$cmd appName: "MongoDB Shell" command: listDatabases { listDatabases: 1.0, $clusterTime: { clusterTime: Timestamp(1543621253, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin" } numYields:0 reslen:708 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 10 } } } protocol:op_msg 38ms
whereas show databases logged:
2018-11-30T15:41:01.722-0800 I COMMAND [conn23] command admin.$cmd appName: "MongoDB Shell" command: listDatabases { listDatabases: 1.0, $clusterTime: { clusterTime: Timestamp(1543621253, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin" } numYields:0 reslen:708 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 10 } } } protocol:op_msg 5ms
For reference, this is from MongoDB 3.6.7.
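To compare the two entries without reading them character by character, a small Python sketch can strip the parts expected to differ (the leading timestamp and the trailing duration) and compare the rest. The helper names normalize and command_name are my own; the two strings are the log lines quoted above.

```python
import re

TIMESTAMP_RE = re.compile(r"^\S+\s+")      # leading ISO timestamp
DURATION_RE = re.compile(r"\s+\d+ms\s*$")  # trailing duration
COMMAND_RE = re.compile(r"command: (\w+)")

def normalize(entry):
    """Drop the timestamp and duration so only the logged operation remains."""
    return DURATION_RE.sub("", TIMESTAMP_RE.sub("", entry))

def command_name(entry):
    """Return the name of the logged server command, or None."""
    m = COMMAND_RE.search(entry)
    return m.group(1) if m else None

SHOW_DBS = (
    '2018-11-30T15:40:59.539-0800 I COMMAND [conn23] command admin.$cmd appName: "MongoDB Shell" '
    'command: listDatabases { listDatabases: 1.0, $clusterTime: { clusterTime: Timestamp(1543621253, 1), '
    'signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin" } '
    'numYields:0 reslen:708 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 10 } } } '
    'protocol:op_msg 38ms'
)
SHOW_DATABASES = (
    '2018-11-30T15:41:01.722-0800 I COMMAND [conn23] command admin.$cmd appName: "MongoDB Shell" '
    'command: listDatabases { listDatabases: 1.0, $clusterTime: { clusterTime: Timestamp(1543621253, 1), '
    'signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin" } '
    'numYields:0 reslen:708 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 10 } } } '
    'protocol:op_msg 5ms'
)

print(command_name(SHOW_DBS), command_name(SHOW_DATABASES))
print(normalize(SHOW_DBS) == normalize(SHOW_DATABASES))
```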
I tried to initiate a replica set in MongoDB, but it failed.
My mongod configuration file is as follows:
dbpath=C:\data\db
logpath=C:\data\log\mongo.log
storageEngine=mmapv1
After starting mongod with the command:
mongod --config "C:\data\mongo.conf" --replSet "rs0"
I opened the mongo shell and typed
rs.initiate()
and got the error "no configuration file specified" (code 8). I also tried passing an explicit configuration:
cfg = {"_id": "rs0", "version":1, "members":[{"_id":0,"host":"127.0.0.1:27017"}]}
rs.initiate(cfg)
However, the result is still the same (code 8).
Digging deeper into the log file, I found this:
replSetInitiate failed to store config document or create the oplog; UnknownError: assertion C:\data\mci\7751c6064ad5f370b9aea0db0164a05e\src\src\mongo/util/concurrency/rwlock.h:204
2017-08-26T18:36:41.760+0700 I COMMAND [conn1] command local.oplog.rs command: replSetInitiate { replSetInitiate: { _id: "rs0", version: 1.0, members: [ { _id: 0.0, host: "127.0.0.1:27017" } ] } } keyUpdates:0 writeConflicts:0 numYields:0 reslen:143 locks:{ Global: { acquireCount: { r: 1, W: 1 } }, MMAPV1Journal: { acquireCount: { w: 2 } }, Metadata: { acquireCount: { W: 6 } } } protocol:op_command 4782ms
Any hint for me please? Thank you a ton.
I'm running into an issue where one of my shards is constantly at 100% CPU usage while I'm storing files in MongoDB (using GridFS). When I stop writing to the DB, the usage drops to nearly 0%. However, the auto balancer is on and does not appear to be balancing anything. Roughly 50% of my data is on that one shard, which runs at nearly 100% CPU, while virtually all the others are at 7-8%.
Any ideas?
mongos> version()
3.0.6
Auto Balancing Enabled
Storage Engine: WiredTiger
I have this general architecture:
2 - routers
3 - config server
8 - shards (2 shards per server - 4 servers)
No replica sets!
https://docs.mongodb.org/v3.0/core/sharded-cluster-architectures-production/
Log Details
Router 1 Log:
2016-01-15T16:15:21.714-0700 I NETWORK [conn3925104] end connection [IP]:[port] (63 connections now open)
2016-01-15T16:15:23.256-0700 I NETWORK [LockPinger] Socket recv() timeout [IP]:[port]
2016-01-15T16:15:23.256-0700 I NETWORK [LockPinger] SocketException: remote: [IP]:[port] error: 9001 socket exception [RECV_TIMEOUT] server [IP]:[port]
2016-01-15T16:15:23.256-0700 I NETWORK [LockPinger] DBClientCursor::init call() failed
2016-01-15T16:15:23.256-0700 I NETWORK [LockPinger] scoped connection to [IP]:[port],[IP]:[port],[IP]:[port] not being returned to the pool
2016-01-15T16:15:23.256-0700 W SHARDING [LockPinger] distributed lock pinger '[IP]:[port],[IP]:[port],[IP]:[port]/[IP]:[port]:1442579303:1804289383' detected an exception while pinging. :: caused by :: SyncClusterConnection::update prepare failed: [IP]:[port] (IP) failed:10276 DBClientBase::findN: transport error: [IP]:[port] ns: admin.$cmd query: { getlasterror: 1, fsync: 1 }
2016-01-15T16:15:24.715-0700 I NETWORK [mongosMain] connection accepted from [IP]:[port] #3925105 (64 connections now open)
2016-01-15T16:15:24.715-0700 I NETWORK [conn3925105] end connection [IP]:[port] (63 connections now open)
2016-01-15T16:15:27.717-0700 I NETWORK [mongosMain] connection accepted from [IP]:[port] #3925106 (64 connections now open)
2016-01-15T16:15:27.718-0700 I NETWORK [conn3925106] end connection [IP]:[port](63 connections now open)
Router 2 Log:
2016-01-15T16:18:21.762-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' acquired, ts : 56997e3d110ccb8e38549a9d
2016-01-15T16:18:24.316-0700 I SHARDING [LockPinger] cluster [IP]:[port],[IP]:[port],[IP]:[port] pinged successfully at Fri Jan 15 16:18:24 2016 by distributed lock pinger '[IP]:[port],[IP]:[port],[IP]:[port]/[IP]:[port]:1442579454:1804289383', sleeping for 30000ms
2016-01-15T16:18:24.978-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' unlocked.
2016-01-15T16:18:35.295-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' acquired, ts : 56997e4a110ccb8e38549a9f
2016-01-15T16:18:38.507-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' unlocked.
2016-01-15T16:18:48.838-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' acquired, ts : 56997e58110ccb8e38549aa1
2016-01-15T16:18:52.038-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' unlocked.
2016-01-15T16:18:54.660-0700 I SHARDING [LockPinger] cluster [IP]:[port],[IP]:[port],[IP]:[port] pinged successfully at Fri Jan 15 16:18:54 2016 by distributed lock pinger '[IP]:[port],[IP]:[port],[IP]:[port]/[IP]:[port]:1442579454:1804289383', sleeping for 30000ms
2016-01-15T16:19:02.323-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' acquired, ts : 56997e66110ccb8e38549aa3
2016-01-15T16:19:05.513-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' unlocked.
Problematic Shard Log:
2016-01-15T16:21:03.426-0700 W SHARDING [conn40] Finding the split vector for Files.fs.chunks over { files_id: 1.0, n: 1.0 } keyCount: 137 numSplits: 200715 lookedAt: 46 took 17364ms
2016-01-15T16:21:03.484-0700 I COMMAND [conn40] command admin.$cmd command: splitVector { splitVector: "Files.fs.chunks", keyPattern: { files_id: 1.0, n: 1.0 }, min: { files_id: ObjectId('5650816c827928d710ef5ef9'), n: 1 }, max: { files_id: MaxKey, n: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 250000 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:216396 reslen:8318989 locks:{ Global: { acquireCount: { r: 432794 } }, Database: { acquireCount: { r: 216397 } }, Collection: { acquireCount: { r: 216397 } } } 17421ms
2016-01-15T16:21:03.775-0700 I SHARDING [LockPinger] cluster [IP]:[port],[IP]:[port],[IP]:[port] pinged successfully at Fri Jan 15 16:21:03 2016 by distributed lock pinger '[IP]:[port],[IP]:[port],[IP]:[port]/[IP]:[port]:1441718306:765353801', sleeping for 30000ms
2016-01-15T16:21:04.321-0700 I SHARDING [conn40] request split points lookup for chunk Files.fs.chunks { : ObjectId('5650816c827928d710ef5ef9'), : 1 } -->> { : MaxKey, : MaxKey }
2016-01-15T16:21:08.243-0700 I SHARDING [conn46] request split points lookup for chunk Files.fs.chunks { : ObjectId('5650816c827928d710ef5ef9'), : 1 } -->> { : MaxKey, : MaxKey }
2016-01-15T16:21:10.174-0700 W SHARDING [conn37] Finding the split vector for Files.fs.chunks over { files_id: 1.0, n: 1.0 } keyCount: 137 numSplits: 200715 lookedAt: 60 took 18516ms
2016-01-15T16:21:10.232-0700 I COMMAND [conn37] command admin.$cmd command: splitVector { splitVector: "Files.fs.chunks", keyPattern: { files_id: 1.0, n: 1.0 }, min: { files_id: ObjectId('5650816c827928d710ef5ef9'), n: 1 }, max: { files_id: MaxKey, n: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 250000 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:216396 reslen:8318989 locks:{ Global: { acquireCount: { r: 432794 } }, Database: { acquireCount: { r: 216397 } }, Collection: { acquireCount: { r: 216397 } } } 18574ms
2016-01-15T16:21:10.989-0700 W SHARDING [conn25] Finding the split vector for Files.fs.chunks over { files_id: 1.0, n: 1.0 } keyCount: 137 numSplits: 200715 lookedAt: 62 took 18187ms
2016-01-15T16:21:11.047-0700 I COMMAND [conn25] command admin.$cmd command: splitVector { splitVector: "Files.fs.chunks", keyPattern: { files_id: 1.0, n: 1.0 }, min: { files_id: ObjectId('5650816c827928d710ef5ef9'), n: 1 }, max: { files_id: MaxKey, n: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 250000 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:216396 reslen:8318989 locks:{ Global: { acquireCount: { r: 432794 } }, Database: { acquireCount: { r: 216397 } }, Collection: { acquireCount: { r: 216397 } } } 18246ms
2016-01-15T16:21:11.365-0700 I SHARDING [conn37] request split points lookup for chunk Files.fs.chunks { : ObjectId('5650816c827928d710ef5ef9'), : 1 } -->> { : MaxKey, : MaxKey }
For the splitting error: upgrading to MongoDB v3.0.8+ resolved it.
I am still having an issue with the balancing itself. The shard key is an md5 checksum, so unless all the keys have very similar md5 values (not very likely), there is still investigating to do. I am using range-based partitioning.
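The intuition that md5 values should not cluster can be checked with a rough Python sketch. It hashes made-up keys and counts how many fall into each of 8 equal-width ranges of the md5 value space. The 8 equal ranges and the sample keys are assumptions for illustration only; MongoDB's real chunk boundaries come from splits, not from fixed equal ranges.

```python
import hashlib
from collections import Counter

def bucket_for(key, buckets=8):
    """Map the md5 hex digest of key onto one of `buckets` equal-width ranges."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # The first 8 hex digits form a 32-bit prefix; scale it into [0, buckets).
    return int(digest[:8], 16) * buckets // 2**32

def distribution(keys, buckets=8):
    """Return the number of keys that fall into each range, in range order."""
    counts = Counter(bucket_for(k, buckets) for k in keys)
    return [counts.get(b, 0) for b in range(buckets)]

# Hypothetical file names standing in for the real GridFS documents.
sample = [f"file-{i}" for i in range(80_000)]
for index, count in enumerate(distribution(sample)):
    print(f"range {index}: {count} keys ({100 * count / len(sample):.1f}%)")
```

If the keys really are md5 checksums, each range should receive a similar share, so a shard holding about 50% of the data would point to something other than key clustering.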
There are multiple ways to check:
db.printShardingStatus() - lists all sharded collections and shows whether the auto balancer is on, and which collection is currently being balanced and since when.
sh.status(true) - gives chunk-level details. Check whether any chunk has jumbo: true. If a chunk is marked as jumbo, it will not be split properly.
db.collection.stats() - gives the collection stats, including the distribution details for each shard.