MongoDB $nearSphere performance issue with huge data (over 2 million records)

The MongoDB version is 2.6.
The collection contains over 2 million records.
Details are below.
A document in the collection looks like this:
{
    "postid" : NumberLong(97040),
    "accountid" : NumberLong(348670),
    "location" : {
        "type" : "Point",
        "coordinates" : [
            112.56531,
            32.425657
        ]
    },
    "type" : NumberLong(1),
    "countspreads" : NumberLong(6),
    "countavailablespreads" : NumberLong(6),
    "timestamp" : NumberLong(1428131578)
}
The collection has a 2dsphere index:
{
    "v" : 1,
    "key" : {
        "location" : "2dsphere"
    },
    "name" : "location_2dsphere",
    "ns" : "fly.postspreads",
    "2dsphereIndexVersion" : 2
}
The query command is:
db.example.find({"location":{"$nearSphere":{"$geometry":{"type":"Point","coordinates":[113.547821,22.18648]},"$maxDistance":50000, "$minDistance":0}}}).explain()
The explain() result:
{
    "cursor" : "S2NearCursor",
    "isMultiKey" : false,
    "n" : 145255,
    "nscannedObjects" : 1290016,
    "nscanned" : 1290016,
    "nscannedObjectsAllPlans" : 1290016,
    "nscannedAllPlans" : 1290016,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 4087,
    "indexBounds" : { },
    "server" : "DB-SH-01:27017",
    "filterSet" : false
}
The value of $maxDistance is quite large here. The result above shows the query scanned over 1.29 million records and took 4087 ms.
If we reduce $maxDistance to 500, the new result is:
{
    "cursor" : "S2NearCursor",
    "isMultiKey" : false,
    "n" : 21445,
    "nscannedObjects" : 102965,
    "nscanned" : 102965,
    "nscannedObjectsAllPlans" : 102965,
    "nscannedAllPlans" : 102965,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 634,
    "indexBounds" : { },
    "server" : "DB-SH-01:27017",
    "filterSet" : false
}
Now the query scans just over 100,000 records and takes 634 ms, which is still too slow. Even if I reduce $maxDistance to 0.0001, it still scans over 80,000 records and takes about 600 ms.
The query time is unacceptable, and I can't find what is wrong.
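
One thing worth testing (a suggestion, not verified against this data set): $nearSphere has to return documents sorted by distance, so the S2NearCursor must visit every index cell within the search radius before it can emit results. If distance ordering is not actually needed, $geoWithin with $centerSphere performs the same containment test without the sort and typically touches far fewer index entries. A minimal sketch ($centerSphere takes the radius in radians, i.e. meters divided by the Earth's radius in meters):

// Same 50 km circle, but without the implicit distance sort.
// 50000 / 6378137 converts 50 km to radians.
db.example.find({
    "location" : {
        "$geoWithin" : {
            "$centerSphere" : [ [ 113.547821, 22.18648 ], 50000 / 6378137 ]
        }
    }
}).explain()

If the results must be ordered by distance this does not apply, but it is a quick way to confirm whether the implicit sort is what drives the scan count.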

Related

MongoDB - Index scan low performance

I'm very new to MongoDB and I'm trying to run some performance tests to check whether my structure is fine.
I have a collection with 5 fields (3 dates, one Int, and one pointer to another ObjectId).
In this collection I've created an index on two fields:
_p_monitor_ref Asc (this is the pointer)
collected Desc (this is one of the Date fields)
The index name is: _p_monitor_ref_1_collected_-1
I created this index at the beginning and populated the collection with some records. After that, I duplicated the records many times with this script:
// Duplicate every existing document by re-inserting it with a fresh _id.
var bulk = db.measurements.initializeUnorderedBulkOp();
db.measurements.find().limit(1483570).forEach(function(document) {
    document._id = new ObjectId(); // new _id avoids duplicate-key errors
    bulk.insert(document);
});
bulk.execute();
Now the collection has 3 million documents.
Next, I run explain to see whether the query uses the index and how long it takes to execute. This is the query:
db.measurements.find({ "_p_monitor_ref": "Monitors$iKNoB6Ga5P" }).sort({collected: -1}).explain()
As you can see, I use _p_monitor_ref to find all documents by pointer, and then I sort by collected descending (this matches the index).
This is the first result when I run it. MongoDB uses the index (BtreeCursor _p_monitor_ref_1_collected_-1), but the execution time is very high, "millis" : 120286:
{
    "cursor" : "BtreeCursor _p_monitor_ref_1_collected_-1",
    "isMultiKey" : false,
    "n" : 126862,
    "nscannedObjects" : 126862,
    "nscanned" : 126862,
    "nscannedObjectsAllPlans" : 126862,
    "nscannedAllPlans" : 126862,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 23569,
    "nChunkSkips" : 0,
    "millis" : 120286,
    "indexBounds" : {
        "_p_monitor_ref" : [
            [
                "Monitors$iKNoB6Ga5P",
                "Monitors$iKNoB6Ga5P"
            ]
        ],
        "collected" : [
            [
                { "$maxElement" : 1 },
                { "$minElement" : 1 }
            ]
        ]
    },
    "server" : "my-pc",
    "filterSet" : false
}
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 2967141,
    "nscannedObjects" : 2967141,
    "nscanned" : 2967141,
    "nscannedObjectsAllPlans" : 2967141,
    "nscannedAllPlans" : 2967141,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 27780,
    "nChunkSkips" : 0,
    "millis" : 11501,
    "server" : "my-pc",
    "filterSet" : false
}
Now, if I execute the explain again, this is the result, and the time is "millis" : 201:
{
    "cursor" : "BtreeCursor _p_monitor_ref_1_collected_-1",
    "isMultiKey" : false,
    "n" : 126862,
    "nscannedObjects" : 126862,
    "nscanned" : 126862,
    "nscannedObjectsAllPlans" : 126862,
    "nscannedAllPlans" : 126862,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 991,
    "nChunkSkips" : 0,
    "millis" : 201,
    "indexBounds" : {
        "_p_monitor_ref" : [
            [
                "Monitors$iKNoB6Ga5P",
                "Monitors$iKNoB6Ga5P"
            ]
        ],
        "collected" : [
            [
                { "$maxElement" : 1 },
                { "$minElement" : 1 }
            ]
        ]
    },
    "server" : "my-pc",
    "filterSet" : false
}
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 2967141,
    "nscannedObjects" : 2967141,
    "nscanned" : 2967141,
    "nscannedObjectsAllPlans" : 2967141,
    "nscannedAllPlans" : 2967141,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 23180,
    "nChunkSkips" : 0,
    "millis" : 651,
    "server" : "my-pc",
    "filterSet" : false
}
Why do I get two very different results? Maybe the second execution takes the data from some kind of cache...
Now the collection has 3 million records... what happens when it grows to 10/20/30 million?
I don't know if I'm doing something wrong. Granted, I'm running this on my laptop (I don't have an SSD).
The reason the execution time is smaller on the second attempt is that the first attempt forced MongoDB to load the data into memory, and the data was still in memory when the second attempt ran.
As your collection grows, the index will grow as well, so at some point it may become too big to fit in free memory, and the MongoDB engine will have to load and unload parts of it; performance will vary accordingly.
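
A quick way to check whether the index still fits in memory (a sketch, using the measurements collection from the question):

// Total size of all indexes on the collection, in bytes.
db.measurements.stats().totalIndexSize

// Memory used by mongod (resident/virtual, in MB), for comparison.
db.serverStatus().mem

If totalIndexSize approaches the available RAM, the load/unload behaviour described above starts to dominate query times.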

Geospatial index in MongoDB makes no difference to performance

I'm trying to find which documents are geo-located within a given rectangle. I have a Mongo collection looking a bit like:
{
    ...
    "metadata" : {
        ...
        "geometry" : { "type" : "Point", "coordinates" : [ -0.000, 51.477 ] }
    }
}
And my query looks like:
db.my_coll.find({ "$query" : {
    "metadata.geometry" : {
        "$geoIntersects" : {
            "$geometry" : { "type" : "Polygon", "coordinates" : [ [ [..., ...], ... ] ] }
        }
    }
}, "$explain" : 1 })
With no geospatial index I get:
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 646,
    "nscannedObjects" : 19539,
    "nscanned" : 19539,
    "nscannedObjectsAllPlans" : 19539,
    "nscannedAllPlans" : 19539,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 152,
    "nChunkSkips" : 0,
    "millis" : 125,
    ...
With the geospatial index db.my_coll.ensureIndex({"metadata.geometry" : "2dsphere"}); I get:
{
    "cursor" : "BtreeCursor metadata.geometry_2dsphere",
    "isMultiKey" : false,
    "n" : 646,
    "nscannedObjects" : 18726,
    "nscanned" : 18727,
    "nscannedObjectsAllPlans" : 18726,
    "nscannedAllPlans" : 18727,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 146,
    "nChunkSkips" : 0,
    "millis" : 161,
    ...
i.e. it's slower with the index when explaining. Querying from an outside application shows no significant difference in query time with or without the index (millisecond resolution). What am I doing wrong? Shouldn't the index make the query rather faster than this?
Thanks :-)
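
One way to see whether the index helps at all (a suggestion, not from the thread): explain the same query with a much smaller polygon. An index scan's cost tracks the area covered, so nscanned should drop sharply, while an unindexed BasicCursor scan always reads all 19,539 documents. A sketch, with the small test polygon left as a placeholder:

// nscanned was 18,727 of 19,539 for the polygon above, so the index was
// discarding almost nothing; a small polygon should show a much lower
// nscanned if the index is working.
db.my_coll.find({
    "metadata.geometry" : {
        "$geoIntersects" : {
            "$geometry" : { "type" : "Polygon", "coordinates" : [ /* small test polygon */ ] }
        }
    }
}).explain()

On a collection this small, even the full scan takes only ~125 ms, so the index has little room to shine either way.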

MongoDB indexing results on array fields

I have a collection with documents like { student_id : 1, teachers : [ "....", ... ] }.
The steps, done in sequence:
1) find by {teachers : "gore"}
2) set the index as { student_id : 1 }
3) find by {teachers : "gore"}
4) set the index as { teachers : 1 }
5) find by {teachers : "gore"}
The results (time taken) do not improve much after indexing teachers (an array field). Can someone explain what is happening? I may be doing something wrong here; please correct me. The results are:
d.find({teachers : "gore"}).explain()
{
    "cursor" : "BasicCursor",
    "nscanned" : 999999,
    "nscannedObjects" : 999999,
    "n" : 447055,
    "millis" : 1623,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : { }
}
d.ensureIndex({student_id : 1})
d.find({teachers : "gore"}).explain()
{
    "cursor" : "BasicCursor",
    "nscanned" : 999999,
    "nscannedObjects" : 999999,
    "n" : 447055,
    "millis" : 1300,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : { }
}
d.ensureIndex({teachers : 1})
d.find({teachers : "gore"}).explain()
{
    "cursor" : "BtreeCursor teachers_1",
    "nscanned" : 447055,
    "nscannedObjects" : 447055,
    "n" : 447055,
    "millis" : 1501,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : { "teachers" : [ [ "gore", "gore" ] ] }
}
The fact that it is showing a BtreeCursor is a positive, but the number of nscannedObjects is too large. Do you have the same data inserted over and over again? Is it possible that you have 447,055 "gore" values? If so, that's why it's taking such a long time.
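
A quick way to verify this (a suggestion; d is the collection handle used in the question):

// If this returns ~447055, the query genuinely matches ~45% of the
// collection, and no index can make fetching that many documents cheap.
d.count({ teachers : "gore" })

// Frequency of each teacher value, most common first (top 5).
d.aggregate([
    { $unwind : "$teachers" },
    { $group : { _id : "$teachers", docs : { $sum : 1 } } },
    { $sort : { docs : -1 } },
    { $limit : 5 }
])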

mongodb slow query with $near and other condition

I have a mongodb collection named rooms, and it has a 2d index for field location. I've queried like this:
db.rooms.find( { "location" : { "$near" : { "latitude" : 37.3356135, "longitude" : 127.12383030000001 } }, "status": "open", "updated" : { "$gt" : ISODate("2014-06-03T15:34:22.213Z") }}).explain()
The result:
{
    "cursor" : "GeoSearchCursor",
    "isMultiKey" : false,
    "n" : 7,
    "nscannedObjects" : 143247,
    "nscanned" : 143247,
    "nscannedObjectsAllPlans" : 143247,
    "nscannedAllPlans" : 143247,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 1457,
    "indexBounds" : { },
    "server" : "ip-10-162-39-56:27017",
    "filterSet" : false
}
Sometimes it takes more than 2000 ms. But if I remove the $gt condition on the updated field, the query is fast, about 5~30 ms:
> db.rooms.find( { "location" : { "$near" : { "latitude" : 37.3356135, "longitude" : 127.12383030000001 } }, "status": "open"}).explain()
{
    "cursor" : "GeoSearchCursor",
    "isMultiKey" : false,
    "n" : 100,
    "nscannedObjects" : 1635,
    "nscanned" : 2400,
    "nscannedObjectsAllPlans" : 1635,
    "nscannedAllPlans" : 2400,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 22,
    "indexBounds" : { },
    "server" : "ip-10-162-39-56:27017",
    "filterSet" : false
}
I've tried a compound index of {location: "2d", updated: -1}, but it didn't help. How can I make this query faster?
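
One mitigation worth trying (a suggestion, not from the thread; the 0.5 radius is an illustrative value, not tuned): bound the search with $maxDistance. A 2d $near cursor keeps expanding outward until it has collected enough matches, and a sparse filter like the $gt on updated forces it to wade through thousands of near-but-rejected rooms first; capping the radius stops that expansion.

// Legacy-coordinate $near with a capped radius. Units are in the same
// degree-based system as the 2d index; 0.5 is a hypothetical value.
db.rooms.find({
    "location" : {
        "$near" : { "latitude" : 37.3356135, "longitude" : 127.12383030000001 },
        "$maxDistance" : 0.5
    },
    "status" : "open",
    "updated" : { "$gt" : ISODate("2014-06-03T15:34:22.213Z") }
}).explain()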

MongoDB indexing

I have a query
db.messages.find({'headers.Date':{'$gt': new Date(2001,3,1)}},{'headers.From':1, _id:0}).sort({'headers.From':1})
I have set headers.From as an index. Now, which part of the query will use this index: the find part or the sort part?
The explain output is:
{
    "cursor" : "BtreeCursor headers.From_1",
    "isMultiKey" : false,
    "n" : 83057,
    "nscannedObjects" : 120477,
    "nscanned" : 120477,
    "nscannedObjectsAllPlans" : 120581,
    "nscannedAllPlans" : 120581,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 250,
    "indexBounds" : {
        "headers.From" : [
            [
                { "$minElement" : 1 },
                { "$maxElement" : 1 }
            ]
        ]
    },
    "server" : "Andrews-iMac.local:27017"
}
Any help is appreciated!
The index is being used for the sort part, not for the query, as your query doesn't use the headers.From field and your sort does.
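
If you want one index to serve both parts (an addition of mine, not part of the original answer): a compound index that puts the sort field first and the range field second lets the plan walk the index in headers.From order while skipping headers.Date values outside the range. A sketch:

// Sort field first, range field second.
db.messages.ensureIndex({ "headers.From" : 1, "headers.Date" : 1 })

// Same query; explain() should now show bounded headers.Date ranges
// instead of $minElement/$maxElement over the whole index.
db.messages.find(
    { "headers.Date" : { "$gt" : new Date(2001, 3, 1) } },
    { "headers.From" : 1, "_id" : 0 }
).sort({ "headers.From" : 1 }).explain()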