Why is MongoDB $geoWithin slow when index is hinted? - mongodb

I have been struggling with this for some time. I have a collection of over 100,000 documents. Each document has a geoLocation field, that uses GeoJSON format. I added a 2dsphere index to the geoLocation field.
If I run this simple query, it takes almost 1 second to complete:
db.guestBookPost.find({"geoLocation" : { "$geoWithin" : {$centerSphere:[[-118.3688331113197 , 34.1620417429723], .00068621014493]}}, $hint:"geoLocation_2dsphere"}).limit(10)
The explain shows:
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 100211,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 100211,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 910,
"indexBounds" : {
},
"nscanned" : 100211,
"matchTested" : NumberLong(100211),
"geoTested" : NumberLong(0),
"cellsInCover" : NumberLong(8),
"server" : "ip-10-245-26-151:27017"
}
It doesn't look like the $geoWithin query is using the hinted index. The cursor type is S2Cursor, which seems incorrect. Am I doing anything wrong? This is MongoDB 2.4.3
Thanks,
Les

s2cursor is the 2dsphere index, you can take a look at this:
http://blog.mongodb.org/post/50984169045/new-geo-features-in-mongodb-2-4?mkt_tok=3RkMMJWWfF9wsRovs67NZKXonjHpfsX74%2BktX6C1lMI%2F0ER3fOvrPUfGjI4JS8FnI%2BSLDwEYGJlv6SgFSrHCMahnybgIUhI%3D

Related

MongoDB index is not being used

I have a collection of questions with index on modified.
{
"v" : 1,
"key" : {
"modified" : 1
},
"name" : "modified_1",
"ns" : "app23568387.questions",
"background" : true,
"safe" : null
}
But when I query the questions with modified field, mongo does not use this index.
db.questions.find({modified: ISODate("2016-07-20T20:58:20.662Z")}).explain(true);
It returns
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 19315626,
"nscanned" : 19315626,
"nscannedObjectsAllPlans" : 19315626,
"nscannedAllPlans" : 19315626,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 384889,
"nChunkSkips" : 0,
"millis" : 43334,
"allPlans" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 19315626,
"nscanned" : 19315626,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0
}
],
"server" : "c387.candidate.37:10387",
"filterSet" : false,
"stats" : {
"type" : "COLLSCAN",
"works" : 19624020,
"yields" : 384889,
"unyields" : 384889,
"invalidates" : 3,
"advanced" : 0,
"needTime" : 19315627,
"needFetch" : 0,
"isEOF" : 1,
"docsTested" : 19315626,
"children" : []
}
}
When I use hint(), mongo throws an error bad hint.
I have another collection of folders which has exactly the same index and the query uses the index. (returns "cursor" : "BtreeCursor modified_1" for explain())
What could be the difference between questions and folders? Is it possible that the index is "broken" even though getIndexes() returns it? If so, what can I do to fix it?
It seems your index is not completely built in background. You can check this by using db.currentOp() command:
https://docs.mongodb.com/v3.0/reference/method/db.currentOp/#currentop-index-creation
Also check the mongod.log to see any error on the index building process.
The simple way to fix is to drop the index and create it again

MongoDB - Index scan low performance

I'm very new to MongoDB and i'm trying to test some performance in order to understand if my structure is fine.
I have a collection with 5 fields (3 date, one Int and one pointer to another ObjectId)
In this collection i've created an index on two fields:
_p_monitor_ref Asc (this is the pointer)
collected Desc (this is one Date field)
The index name is: _p_monitor_ref_1_collected_-1
I've created this index in the beginning and populated the table with some records. After that, i've duplicated the records many times with this script.
var bulk = db.measurements.initializeUnorderedBulkOp();
db.measurements.find().limit(1483570).forEach(function(document) {
document._id = new ObjectId();
bulk.insert(document);
});
bulk.execute();
Now, the collection have 3 million of document.
Now, i try to execute explain to see if the collection use the index and how many time is needed to be executed. This is the query:
db.measurements.find({ "_p_monitor_ref": "Monitors$iKNoB6Ga5P" }).sort({collected: -1}).explain()
As you see, i use _p_monitor_ref to search all documents by pointer, and then i order for collected -1 (this is the index)
This is the first result when i run it. MongoDB use the index (BtreeCursor _p_monitor_ref_1_collected_-1) but the execution time is very hight "millis" : 120286,:
{
"cursor" : "BtreeCursor _p_monitor_ref_1_collected_-1",
"isMultiKey" : false,
"n" : 126862,
"nscannedObjects" : 126862,
"nscanned" : 126862,
"nscannedObjectsAllPlans" : 126862,
"nscannedAllPlans" : 126862,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 23569,
"nChunkSkips" : 0,
"millis" : 120286,
"indexBounds" : {
"_p_monitor_ref" : [
[
"Monitors$iKNoB6Ga5P",
"Monitors$iKNoB6Ga5P"
]
],
"collected" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "my-pc",
"filterSet" : false
}
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2967141,
"nscannedObjects" : 2967141,
"nscanned" : 2967141,
"nscannedObjectsAllPlans" : 2967141,
"nscannedAllPlans" : 2967141,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 27780,
"nChunkSkips" : 0,
"millis" : 11501,
"server" : "my-pc",
"filterSet" : false
}
Now, if i execute the explain again this is the result and the time is "millis" : 201:
{
"cursor" : "BtreeCursor _p_monitor_ref_1_collected_-1",
"isMultiKey" : false,
"n" : 126862,
"nscannedObjects" : 126862,
"nscanned" : 126862,
"nscannedObjectsAllPlans" : 126862,
"nscannedAllPlans" : 126862,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 991,
"nChunkSkips" : 0,
"millis" : 201,
"indexBounds" : {
"_p_monitor_ref" : [
[
"Monitors$iKNoB6Ga5P",
"Monitors$iKNoB6Ga5P"
]
],
"collected" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "my-pc",
"filterSet" : false
}
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2967141,
"nscannedObjects" : 2967141,
"nscanned" : 2967141,
"nscannedObjectsAllPlans" : 2967141,
"nscannedAllPlans" : 2967141,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 23180,
"nChunkSkips" : 0,
"millis" : 651,
"server" : "my-pc",
"filterSet" : false
}
Why i have this two very different results ? Maybe the second execution take the data from some kind of cache...
Now, the collection have 3 million of record... what if the collection will grow and become 10/20/30 million ?
I dont know if i'm doing something wrong. Sure, i'm executing it on my Laptop (i dont have a SSD).
The reason why you have smaller execution time at second attempt is connected with fact, that first attempt forced mongo to load data into memory and data was still available in memory when second attempt was executed.
When your collection will grow, index will grow as well - so that could affect that it will be to big to fit in free memory blocks and mongodb engine will load/unload part of that index - so performance will vary.

mongodb $nearSphere performance issue with huge data(over 200w+)

mongodb version is 2.6
the total records in the collection is over 200w
details as below
collection table structure as below:
{
"postid":NumberLong(97040),
"accountid":NumberLong(348670),
"location":{
"type":"Point",
"coordinates":[
112.56531,
32.425657
]
},
"type":NumberLong(1),
"countspreads":NumberLong(6),
"countavailablespreads":NumberLong(6),
"timestamp":NumberLong(1428131578)
}
collection index is 2dsphere:
{
"v" : 1,
"key" : {
"location" : "2dsphere"
},
"name" : "location_2dsphere",
"ns" : "fly.postspreads",
"2dsphereIndexVersion" : 2
},
** query command ** as below
db.example.find({"location":{"$nearSphere":{"$geometry":{"type":"Point","coordinates":[113.547821,22.18648]},"$maxDistance":50000, "$minDistance":0}}}).explain()
result
{
"cursor" : "S2NearCursor",
"isMultiKey" : false,
"n" : 145255,
"nscannedObjects" : 1290016,
"nscanned" : 1290016,
"nscannedObjectsAllPlans" : 1290016,
"nscannedAllPlans" : 1290016,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 4087,
"indexBounds" : {
},
"server" : "DB-SH-01:27017",
"filterSet" : false
}
the value of $maxDistance is too large for now, from the above result, we will find that the command was scanned over 100w+ records and cost 4087ms.
if we reduce the value of $maxDistance to 500, new result as below:
{
"cursor" : "S2NearCursor",
"isMultiKey" : false,
"n" : 21445,
"nscannedObjects" : 102965,
"nscanned" : 102965,
"nscannedObjectsAllPlans" : 102965,
"nscannedAllPlans" : 102965,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 634,
"indexBounds" : {
},
"server" : "DB-SH-01:27017",
"filterSet" : false
}
now the command was scanned over 10w+ records and cost 634ms, the query speed is too slow too. even if i reduce the value of $maxDistance to 0.0001, the scanned records over 8w+, and the time is about 600ms too.
the query time is unacceptable, but i didn't find where was wrong.

Sort on $geoWithin geospatial query in MongoDB

I'm trying to retrieve a bunch of Polygons stored inside my db, and sort them by radius. So I wrote a query with a simple $geoWithin.
So, without sorting the code looks like this:
db.areas.find(
{
"geometry" : {
"$geoWithin" : {
"$geometry" : {
"type" : "Polygon",
"coordinates" : [ [ /** omissis: array of points **/ ] ]
}
}
}
}).limit(10).explain();
And the explain result is the following:
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 10,
"nscanned" : 367,
"nscannedObjectsAllPlans" : 10,
"nscannedAllPlans" : 367,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 2,
"indexBounds" : {
},
"nscanned" : 367,
"matchTested" : NumberLong(10),
"geoTested" : NumberLong(10),
"cellsInCover" : NumberLong(27),
"server" : "*omissis*"
}
(Even if it's fast, it shows as cursor S2Cursor, letting me understand that my compound index has not been used. However, it's fast)
So, whenever I try to add a sort command, simply with .sort({ radius: -1 }), the query becomes extremely slow:
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 58429,
"nscanned" : 705337,
"nscannedObjectsAllPlans" : 58429,
"nscannedAllPlans" : 705337,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 3,
"nChunkSkips" : 0,
"millis" : 3186,
"indexBounds" : {
},
"nscanned" : 705337,
"matchTested" : NumberLong(58432),
"geoTested" : NumberLong(58432),
"cellsInCover" : NumberLong(27),
"server" : "*omissis*"
}
with MongoDB scanning all the documents. Obviously I tried to add a compound index, like { radius: -1, geometry : '2dsphere' } or { geometry : '2dsphere' , radius: -1 }, but nothing helped. Still very slow.
I would know if I'm using in the wrong way the compound index, if the S2Cursor tells me something I should change in my indexing strategy, overall, what I am doing wrong.
(PS: I'm using MongoDB 2.4.5+, so the problem is NOT caused by second field ascending in compound index when using 2dsphere index as reported here https://jira.mongodb.org/browse/SERVER-9647)
First of all, s2Cursor means that the query uses a geographic index.
There can be multiple reasons why the sort operation is slow, sort operation require memory, maybe your server has very little memory, you should consider executing sort operations in code, not at the server side.

MongoDB - Query on nested field with index

I am trying to figure out how I must structure queries such that they will hit my index.
I have documents structured like so:
{ "attributes" : { "make" : "Subaru", "color" : "Red" } }
With an index of: db.stuff.ensureIndex({"attributes.make":1})
What I've found is that querying using dot notation hits the index while querying with a document does not.
Example:
db.stuff.find({"attributes.make":"Subaru"}).explain()
{
"cursor" : "BtreeCursor attributes.make_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"attributes.make" : [
[
"Subaru",
"Subaru"
]
]
}
}
vs
db.stuff.find({attributes:{make:"Subaru"}}).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Is there a way to get the document style query to hit the index? The reason is that when constructing queries from my persistent objects it's much easier to serialize them out as documents as opposed to something using dot notation.
I'll also add that we're using a home grown data mapper layer built w/ Jackson. Would using something like Morphia help with properly constructing these queries?
Did some more digging and this thread explains what's going with the sub-document query. My problem above was that to make the sub-document based query act like the dot-notation I needed to use elemMatch.
db.stuff.find({"attributes":{"$elemMatch" : {"make":"Subaru"}}}).explain()
{
"cursor" : "BtreeCursor attributes.make_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 0,
"millis" : 2,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"attributes.make" : [
[
"Subaru",
"Subaru"
]
]
}
}