MongoDB - CPU utilisation goes beyond 70% on huge update - mongodb

I have a MongoDB collection `test`. An update operation runs on this collection every 30 seconds. Sometimes CPU utilization goes above 80% and the connection times out. Also, with profiling enabled, /var/log/mongodb/mongod.log shows the update operation taking more than 40 seconds to complete. The update query is very big and looks like this.
Update Query
test.update({ _id: xxxxxxxx }, { $set: {
    c_id: "5803a892b6646ad17b2b7a67", s_id: "58f29ee1d6ee0610543f152e",
    c_id: "58f29a38c91379637619605c", m: "78a3512ad29c",
    date: new Date(1505520000000),
    "l.0.h.0.t": 0, "l.0.m.0.0.t": 0,
    "l.0.h.0.t": 0, "l.0.m.0.0.t": 0, "l.0.h.0.r": 0, "l.0.m.0.0.r": 0,
    "l.0.h.0.r": 0, "l.0.m.0.0.r": 0, "l.1.h.0.t": 0, "l.1.m.0.0.t": 0,
    "l.1.h.0.t": 0, "l.1.m.0.0.t": 0, "l.1.h.0.r": 0, "l.1.m.0.0.r": 0,
    "l.1.h.0.r": 0, "l.1.m.0.0.r": 0, "l.0.m.0.1.t": 0, "l.0.m.0.1.t": 0,
    "l.0.m.0.1.rxb": 0, "l.0.m.0.1.r": 0,
    "l.1.m.0.1.t": 0, "l.1.m.0.1.t": 0, "l.1.m.0.1.r": 0, "l.1.m.0.1.r": 0,
    "l.0.m.0.2.t": 0,
    .....................
}})
The query is very large (I have only posted part of it) and the schema is deeply nested. How can I improve the performance of this update query? Should I optimize the schema to reduce the number of nested documents?
I would like to know what steps I can take to improve this update query.
.explain() output for Update Query
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.stats",
        "indexFilterSet" : false,
        "winningPlan" : {
            "stage" : "UPDATE",
            "inputStage" : {
                "stage" : "IDHACK"
            }
        },
        "rejectedPlans" : []
    },
    "serverInfo" : {
        "host" : "abcxyz",
        "port" : 27017,
        "version" : "3.4.4",
        "gitVersion" : "988390515874a9debd1b6c5d36559ca86b4babj"
    },
    "ok" : 1.0
}
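The IDHACK stage above means the _id lookup itself is already optimal, so the time is most likely going into applying the many $set paths and rewriting a large, deeply nested document. As a first diagnostic step (a sketch, not from the original post; xxxxxxxx stands for the real _id), you can check how big the documents actually are from the shell:

```javascript
// Object.bsonsize() reports the serialized size of one document in bytes.
// A multi-megabyte document would help explain a slow rewrite on every update.
// db.getCollection() is used because "stats" clashes with the db.stats() helper.
var doc = db.getCollection('stats').findOne({ _id: xxxxxxxx });  // xxxxxxxx: placeholder _id
print(Object.bsonsize(doc));
```

If the documents turn out to be large, sending $set only for the fields that actually changed since the last 30-second tick, rather than resending every counter, is one way to reduce the work per update.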

Related

How to measure the query run time in MongoDB

I am trying to measure query run time in MongoDB.
Steps:
I enabled profiling in MongoDB and ran my query.
When I ran `show profile`, I got the output below.
db.blogpost.find({post:/.* NATO .*/i})
blogpost is the collection name; I searched for the "NATO" keyword in the query.
Output: the query pulled out 20 records, and after running it to get execution results, I got the output below.
In the output I can see 3 time values. Which one corresponds to the duration time in MySQL?
query blogtrackernosql.blogpost **472ms** Wed Apr 11 2018 20:37:54
command:{
"find" : "blogpost",
"filter" : {
"post" : /.* NATO .*/i
},
"$db" : "blogtrackernosql"
} cursorid:99983342073 keysExamined:0 docsExamined:1122 numYield:19 locks:{
"Global" : {
"acquireCount" : {
"r" : NumberLong(40)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(20)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(20)
}
}
} nreturned:101 responseLength:723471 protocol:op_msg planSummary:COLLSCAN
execStats:{
**"stage"** : "COLLSCAN",
"filter" : {
"post" : {
"$regex" : ".* NATO .*",
"$options" : "i"
}
},
"nReturned" : 101,
**"executionTimeMillisEstimate" : 422**,
"works" : 1123,
"advanced" : 101,
"needTime" : 1022,
"needYield" : 0,
"saveState" : 20,
"restoreState" : 19,
"isEOF" : 0,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 1122
} client:127.0.0.1 appName:MongoDB Shell allUsers:[ ] user:
This ...
"executionTimeMillisEstimate" : 422
... is MongoDB's estimation of how long that query will take to execute on the MongoDB server.
This ...
query blogtrackernosql.blogpost 472ms
... must be the end-to-end time including some client side piece (e.g. forming the query and sending it to the MongoDB server) plus the data transfer time from the MongoDB server back to your client.
So:
472ms is the total start-to-finish time
422ms is the time spent inside the MongoDB server
Note: the output also tells you that MongoDB has to scan the entire collection ("stage": "COLLSCAN") to perform this query. FWIW, the reason it has to scan the collection is that you are using a case-insensitive $regex. According to the docs:
Case insensitive regular expression queries generally cannot use indexes effectively.
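One possible way to avoid the collection scan (an alternative approach, not part of the original answer) is a text index: $text search is case-insensitive by default and can use the index, though it matches whole words rather than arbitrary substrings the way the regex does:

```javascript
// Hypothetical alternative: whole-word, case-insensitive search via a text index.
db.blogpost.createIndex({ post: "text" })
db.blogpost.find({ $text: { $search: "NATO" } })
```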

Can I batch aggregations in MongoDB?

I have some read-only aggregation pipelines that must be run in parallel with only one connection available. Is that possible, or does Mongo only allow find and update operations in bulk, but not aggregate?
The MongoDB driver uses a connection pool and executes aggregation commands asynchronously. You don't need to do anything special, apart from ensuring your application doesn't wait for responses before executing the next query.
Consider a test collection:
mgeneratejs '{"num": {"$integer": {"min": 1, "max": 20}}, "text": {"$paragraph": {sentences: 5}}}' -n 100000 | mongoimport -d so -c text
a single aggregation query
db.text.aggregate([
{$match: {text: /ert.*duv/i}},
{$group:{_id:null, cnt:{$sum:1}, text:{$push: "$text"}}}
]);
takes circa 400 millis.
Running 10 of these in parallel (javascript):
const { MongoClient } = require('mongodb');

const started = new Date().getTime();
let client;
MongoClient.connect(url, { poolSize: 10 })   // url: your connection string
    .then(cl => {
        client = cl;
        const db = cl.db('so');
        return Promise.all([/ert.*duv/i, /kkd.*aql/i, /zop/i, /bdgtter/i, /ppa.*mcm/i, /ert.*duv/i, /kkd.*aql/i, /zop/i, /bdgtter/i, /ppa.*mcm/i]
            .map(regex => [{ $match: { text: regex } }, { $group: { _id: null, cnt: { $sum: 1 }, text: { $push: "$text" } } }])
            .map(pipeline => db.collection('text').aggregate(pipeline).toArray()));
    })
    .then(() => { client.close(); console.log("ended in " + (new Date().getTime() - started)); });
takes 1,883 millis (javascript time), of which ~1,830 are on the db side:
db.getCollection('system.profile').find({ns:"so.text", "command.aggregate": "text"}, {ts:1, millis:1})
{ "millis" : 442, "ts" : ISODate("2018-02-22T17:32:39.738Z") },
{ "millis" : 452, "ts" : ISODate("2018-02-22T17:32:39.747Z") },
{ "millis" : 445, "ts" : ISODate("2018-02-22T17:32:39.756Z") },
{ "millis" : 471, "ts" : ISODate("2018-02-22T17:32:39.762Z") },
{ "millis" : 448, "ts" : ISODate("2018-02-22T17:32:39.771Z") },
{ "millis" : 491, "ts" : ISODate("2018-02-22T17:32:39.792Z") },
{ "millis" : 566, "ts" : ISODate("2018-02-22T17:32:39.854Z") },
{ "millis" : 561, "ts" : ISODate("2018-02-22T17:32:39.856Z") },
{ "millis" : 1822, "ts" : ISODate("2018-02-22T17:32:41.118Z") },
{ "millis" : 1834, "ts" : ISODate("2018-02-22T17:32:41.124Z") }
If you do the math you can see all 10 started at about the same time (around 2018-02-22T17:32:39.300Z), and mongostat indeed shows 10 more connections at the time of script execution.
Limiting poolSize to 5 doubles the time, as the requests are executed in 2 batches of 5.
The driver uses about 1 MB of RAM per connection, so 100 connections per worker is not unrealistic.
To summarise: ensure your connection pool is configured properly, check the number of connections actually used at runtime, and make sure you handle requests asynchronously at the application level.
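The batching effect of poolSize can be illustrated without a driver at all (a simplified sketch; the real driver's connection queueing is more involved): a pool of N workers drains a task queue in waves, so 10 equal tasks through a pool of 5 run as two batches, roughly doubling the wall time compared with a pool of 10.

```javascript
// Simplified promise pool: at most `poolSize` tasks run concurrently.
async function runWithPool(tasks, poolSize) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++;              // claim the next task index
      results[i] = await tasks[i](); // run it; awaiting frees this worker's turn
    }
  }
  // Start `poolSize` workers and wait for all of them to drain the queue.
  await Promise.all(Array.from({ length: poolSize }, worker));
  return results;
}

// Ten simulated ~20 ms "queries" through a pool of 5 run as two waves.
const fakeQuery = () => new Promise(res => setTimeout(() => res("done"), 20));
runWithPool(Array.from({ length: 10 }, () => fakeQuery), 5)
  .then(results => console.log(results.length)); // prints 10
```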

How to improve/optimize speed of MongoDB query?

I have a small Mongo database with ~30k records.
A simple query that uses 5-6 parameters takes almost a second (even though the entire DB fits in RAM).
Can anyone suggest what I'm doing wrong?
2015-11-26T18:41:29.540+0200 [conn3] command vvpilotdb2.$cmd command:
count { count: "TestResults", query: { Test: 5.0, IsAC: true,
InputMode: 0.0, IsOfficialTest: true, IsSanity: false, IsStress:
false, IsUnderNoise: false, MetalRodSize: 9.0 }, fields: {} }
planSummary: COLLSCAN keyUpdates:0 numYields:1 locks(micros) r:1397227
reslen:48 944ms
Here is db.stats(). I haven't created any indexes myself; all settings are at their defaults:
> db.stats()
{
    "db" : "vvpilotdb2",
    "collections" : 5,
    "objects" : 28997,
    "avgObjSize" : 7549.571610856296,
    "dataSize" : 218914928,
    "storageSize" : 243347456,
    "numExtents" : 17,
    "indexes" : 3,
    "indexSize" : 964768,
    "fileSize" : 469762048,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "extentFreeList" : {
        "num" : 0,
        "totalSize" : 0
    },
    "ok" : 1
}
In MongoDB, only the _id field is indexed by default, and your log line shows planSummary: COLLSCAN, meaning the count is scanning every document in the collection.
You should index the fields you use in your query.
Compound indexes can also be created over multiple fields, specifying an order (ascending/descending) for each.
Here's the documentation for the same:
https://docs.mongodb.org/manual/indexes/
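For the count query in the question, that would mean something like the following (field order here is a judgment call, not from the original answer; equality-filtered fields with the highest selectivity usually go first):

```javascript
// Compound index covering the equality predicates of the count query.
db.TestResults.createIndex({
    Test: 1, IsAC: 1, InputMode: 1, IsOfficialTest: 1,
    IsSanity: 1, IsStress: 1, IsUnderNoise: 1, MetalRodSize: 1
})
```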

MongoDB: How to disable logging the warning: ClientCursor::staticYield can't unlock b/c of recursive lock?

I get the warning stated in the title
warning: ClientCursor::staticYield can't unlock b/c of recursive lock
ns....
in the log file literally gazillions of times (the log file reaches 200 GB in a single day from this message alone). As mentioned in this SO question, I want to adopt the "solution" of simply ignoring the message.
What I did (to no avail) to stop it is to:
set the param quiet = true
set the param oplog = 0
set the param logpath=/dev/null (hoping that nothing gets logged anymore)
set the param logappend=false
All of the above are useless - the message still floods the log file.
The solution I use now is to run a cron job every night to simply empty that log file.
Please, is there anything else I can try?
I use MongoDB 2.6.2 on Debian 6.0, accessed from Perl.
I have recently been looking into this error myself as I was seeing 25Gb a month generated from mongod.log with a similar message. However, I noticed that a query was included in the log message (I've formatted the message to fit in this post, it was actually all on one line):
warning: ClientCursor::yield can't unlock b/c of recursive lock ns: my-database.users top:
{
    opid: 1627436260,
    active: true,
    secs_running: 0,
    op: "query",
    ns: "my-database",
    query: {
        findAndModify: "users",
        query: { Registered: false, Completed: 0 },
        sort: { Created: 1 },
        update: { $set: { NextRefresh: "2014-12-07" } },
        new: true
    },
    client: "10.1.34.175:53582",
    desc: "conn10906412",
    threadId: "0x7f824f0f9700",
    connectionId: 10906412,
    locks: { ^: "w", ^my-database: "W" },
    waitingForLock: false,
    numYields: 0,
    lockStats: { timeLockedMicros: {}, timeAcquiringMicros: { r: 0, w: 3 } }
}
A bit of Googling revealed that this message is most commonly raised when the query cannot use any indexes. I tried using .explain() with the query in the log and sure enough it showed that a BasicCursor was being used with no index:
db.users.find( { Registered: false, Completed: 0 } ).sort( { Created: 1 } ).explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 0,
    "nscannedObjects" : 10453,
    "nscanned" : 10453,
    "nscannedObjectsAllPlans" : 10453,
    "nscannedAllPlans" : 10453,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 7,
    "indexBounds" : {
    },
    "server" : "mongodb-live.eu-west-1a.10_1_2_213:27017"
}
Adding an index for the elements in the query fixed the issue. The log was no longer generated and when I ran .explain() again it showed an index being used:
{
    "cursor" : "BtreeCursor Registered_1_Completed_1",
    "isMultiKey" : false,
    "n" : 0,
    "nscannedObjects" : 0,
    "nscanned" : 0,
    "nscannedObjectsAllPlans" : 0,
    "nscannedAllPlans" : 1,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "Registered" : [ [ false, false ] ],
        "Completed" : [ [ 0, 0 ] ]
    },
    "server" : "mongodb-live.eu-west-1a.10_1_2_213:27017"
}
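The answer doesn't show the index it created, but the BtreeCursor Registered_1_Completed_1 name in the second .explain() output implies something like this (2.6-era shell syntax):

```javascript
// Presumed index, reconstructed from the cursor name in the explain output.
db.users.ensureIndex({ Registered: 1, Completed: 1 })
```

Note that scanAndOrder is still true in the second plan, so the sort on Created is done in memory; appending Created to the index ({ Registered: 1, Completed: 1, Created: 1 }) would let the index satisfy the sort as well.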

Mongo is taking a HUGE amount of space to save this data?

https://gist.github.com/1173528#comments
shows the structure of the data file ...
the short version is
{
    "img_ref" : {
        "$ref" : "mapimage",
        "$id" : ObjectId("4e454599f404e8d51c000002")
    },
    "scale" : 128,
    "image" : "4e454599f404e8d51c000002",
    "tile_i" : 0, "tile_j" : 9,
    "w" : 9, "e" : 10, "n" : 0, "s" : 0,
    "heights" : [
        [ 0, 2, 0, 1, 515, 0, 256, ... ],
        [ ... ]
    ],
    _id: ObjectId("...")
}
The stats() on this collection is:
{
    "ns" : "ac2.mapimage_tile",
    "count" : 18443,
    "size" : 99513670744,
    "avgObjSize" : 5395742.056281516,
    "storageSize" : 100336473712,
    "numExtents" : 74,
    "nindexes" : 4,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "flags" : 0,
    "totalIndexSize" : 5832704,
    "indexSizes" : {
        "_id_" : 786432,
        "img_ref_1_tile_i_1_tile_j_1" : 2236416,
        "image_1" : 1212416,
        "image_1_tile_i_1_tile_j_1_scale_1" : 1597440
    },
    "ok" : 1
}
Note the average object size, 5,395,742 bytes - or 5 MB! 5 MB for storing 16,384 ints seems pretty extreme!
See http://bsonspec.org/#/specification for how things get serialized in MongoDB. Arrays are actually very space-inefficient, especially because the index number is stored as a string key for each element. This is less of a problem for small arrays of large elements like strings or objects, but for large arrays of 32-bit ints it is very expensive.
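The per-element key overhead can be estimated directly from the BSON spec: each array element is stored as a type byte, its decimal index as a NUL-terminated string key, and then the value (4 bytes for an int32). A rough calculation (an illustration only; it ignores the enclosing document's own framing):

```javascript
// Estimate the BSON size of an array of n int32s: a 4-byte length prefix,
// one trailing NUL byte, and per element 1 type byte + "<index>\0" key + 4 bytes.
function bsonInt32ArraySize(n) {
  let bytes = 4 + 1; // int32 length prefix + terminating NUL of the array document
  for (let i = 0; i < n; i++) {
    bytes += 1                    // element type byte (0x10 = int32)
           + String(i).length + 1 // decimal index as key, plus its NUL
           + 4;                   // the int32 value itself
  }
  return bytes;
}

console.log(bsonInt32ArraySize(16384)); // 169119 bytes vs 65536 bytes of raw int32s
```

So key overhead alone makes the array roughly 2.6x the raw data. That still doesn't account for 5 MB per document, which is where the pre-allocation point below comes in.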
MongoDB pre-allocates space for its databases: http://www.mongodb.org/display/DOCS/Developer+FAQ#DeveloperFAQ-Whyaremydatafilessolarge%3F
What you are likely seeing is that pre-allocation: if you add further items, you probably will not see a further increase in space usage for a long while.
Also: http://www.mongodb.org/display/DOCS/Excessive+Disk+Space