Mongo $near query takes 6s on 1.2 million documents

I inserted about 1.2 million identical documents to test the speed of a geospatial index in MongoDB.
Here is the query:
db.spreads.find({ loc: { '$near': { '$geometry': {type: "Point" , coordinates: [40,40]}, '$maxDistance': 10000000 } } }).explain();
And the result:
{
"cursor" : "S2NearCursor",
"isMultiKey" : false,
"n" : 1568220,
"nscannedObjects" : 12545154,
"nscanned" : 12545154,
"nscannedObjectsAllPlans" : 12545154,
"nscannedAllPlans" : 12545154,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 11413,
"indexBounds" : {
},
"server" : "s1.heychat.io:27017",
"filterSet" : false
}
Indexes:
db.spreads.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.spreads"
},
{
"v" : 1,
"key" : {
"loc" : "2dsphere"
},
"name" : "loc_2dsphere",
"ns" : "test.spreads",
"2dsphereIndexVersion" : 2
}
]
Why is it so slow?
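
"n" : 1568220 in the explain output means that the query returned about 1.57 million documents, which by itself explains why it took so long. For a GeoJSON point, $maxDistance is in meters, so 10000000 is 10,000 km: a circle that covers roughly half of the Earth's surface, so the query matches a huge number of documents.
Using a much smaller $maxDistance is probably a better test.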
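To make the distance argument concrete, here is a small plain-JavaScript sketch (runnable in Node.js, no database needed) that computes the great-circle distance between [longitude, latitude] pairs and the fraction of the globe lying within a given radius. It assumes a spherical Earth with a mean radius of 6,371 km; the helper names are hypothetical, and MongoDB's own spherical model may use a slightly different radius.

```javascript
// Mean Earth radius in meters (an assumption; MongoDB's spherical
// model may use a slightly different value).
const EARTH_RADIUS_M = 6371000;

const toRad = (deg) => (deg * Math.PI) / 180;

// Great-circle distance in meters between two GeoJSON-style
// [longitude, latitude] pairs, using the haversine formula.
function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Fraction of the sphere's surface within `radiusMeters` of a point:
// the area of a spherical cap divided by the area of the whole sphere.
function capFraction(radiusMeters) {
  const theta = Math.min(radiusMeters / EARTH_RADIUS_M, Math.PI);
  return (1 - Math.cos(theta)) / 2;
}

// $maxDistance from the question, in meters.
const maxDistance = 10000000;
console.log(capFraction(maxDistance).toFixed(3)); // "0.499", about half the globe
console.log(haversineMeters([40, 40], [40, 40])); // 0
```

Plugging in a smaller radius (for example 1000 meters) gives a vanishingly small fraction of the surface, which is why a smaller $maxDistance makes for a more meaningful benchmark.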

Related

Mongo Geospatial Index on Large Database - Not using index

I have a MongoDB collection with over 150 million records, and for some reason, even with the correct index, I get very poor performance on basic geospatial queries:
db.regions.find({
loc: { $near: {
$geometry: {
type: "Point" ,
coordinates: [ 15.8775 , 49.2177 ]
},
$maxDistance: 1000,
$minDistance: 1
} } }).limit(1).explain();
The explain shows that the index is not being used:
{
"cursor" : "S2NearCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 4102,
"nscanned" : 4102,
"nscannedObjectsAllPlans" : 4102,
"nscannedAllPlans" : 4102,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 2001,
"nChunkSkips" : 0,
"millis" : 18252,
"indexBounds" : {
},
"server" : "N/A:27017",
"filterSet" : false
}
However, the index is definitely there on a 2dsphere field:
> db.regions.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "hive.regions"
},
{
"v" : 1,
"key" : {
"checkin_id" : 1
},
"name" : "checkin_id_1",
"ns" : "hive.regions"
},
{
"v" : 1,
"key" : {
"bid" : 1
},
"name" : "bid_1",
"ns" : "hive.regions"
},
{
"v" : 1,
"key" : {
"loc" : "2dsphere"
},
"name" : "loc_2dsphere",
"ns" : "hive.regions",
"2dsphereIndexVersion" : 2
}
]
A quick query with a basic sort:
> db.regions.find().sort({"checkin_id":1}).limit(1).pretty();
{
"_id" : ObjectId("56645ce6e5bfa89d1f8b4567"),
"checkin_id" : 51548290,
"created_at" : ISODate("2013-10-29T04:15:43Z"),
"loc" : {
"type" : "Point",
"coordinates" : [
-117.236,
33.1557
]
},
"suburb" : "",
"state_district" : "",
"county" : "United States of America",
"state" : "California",
"vid" : 0,
"user_id" : 133661,
"bid" : 9288,
"item_id" : 0
}
With this query (using explain), I see the expected indexBounds:
> db.regions.find().sort({"checkin_id":1}).limit(1).explain();
{
"cursor" : "BtreeCursor checkin_id_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"checkin_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "XXXX:27017",
"filterSet" : false
}
Any idea what I am missing here? Why isn't it using any indexes?

Why is the indexOnly attribute false for this covered query?

I have a test database whose documents have the fields _id, name, age, and date.
Indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "blogger.users"
},
{
"v" : 1,
"key" : {
"name" : 1,
"age" : 1
},
"name" : "name_1_age_1",
"ns" : "blogger.users"
},
{
"v" : 1,
"key" : {
"age" : 1,
"name" : 1
},
"name" : "age_1_name_1",
"ns" : "blogger.users"
}
]
When running the following query:
> db.users.find({"name":"user10"},{"_id":0,"date":0}).explain()
I get the following:
{
"cursor" : "BtreeCursor name_1_age_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"name" : [
[
"user10",
"user10"
]
],
"age" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Johny-PC:27017",
"filterSet" : false
}
Without explain, the result is:
{ "name" : "user10", "age" : 68 }
Even though this is a covered query with proper projections, the indexOnly field is still false. I have also tried explicitly specifying the index with hint(), but nothing changed. In that case, nscannedObjectsAllPlans and nscannedAllPlans are 1, because the query doesn't try other indexes.

For a query to be "indexOnly" (covered), every field it returns must be contained in the index. So even though you have the index "name_1_age_1", the query engine still needs to be told that the only fields you want are those in the index. Your projection {"_id":0,"date":0} only says which fields to exclude, and the engine cannot know what other fields a document contains without fetching it. Use an inclusion projection instead:
db.users.find({"name":"user10"},{"_id":0, "name": 1, "age": 1 }).explain()
That will report "indexOnly" : true, because the query engine knows that the selected index contains all of the fields required for output, so there is no need to go back to the collection to fetch other fields.
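The rule in this answer can be expressed as a small check. The sketch below is a hypothetical, simplified model in plain JavaScript (not the real query planner): it ignores multikey indexes, sorts, and other edge cases, and only tests whether the filter fields, the included fields, and _id are all accounted for by the index.

```javascript
// Hypothetical helper: a simplified model of when a query can be covered
// by a single index. It ignores multikey indexes, sorting, and other
// edge cases that the real query planner considers.
function isCoveredQuery(indexKey, filter, projection) {
  const indexFields = new Set(Object.keys(indexKey));

  // Every field used in the filter must be part of the index.
  const filterOk = Object.keys(filter).every((f) => indexFields.has(f));

  // An exclusion projection (e.g. {_id: 0, date: 0}) only says what NOT to
  // return, so the engine cannot know which other fields a document holds
  // without fetching it. Only an inclusion projection can be covered.
  const included = Object.keys(projection).filter(
    (f) => f !== "_id" && Boolean(projection[f])
  );
  if (included.length === 0) return false;
  const fieldsOk = included.every((f) => indexFields.has(f));

  // _id is returned by default, so it must be excluded unless it is indexed.
  const idOk = projection._id === 0 || indexFields.has("_id");

  return filterOk && fieldsOk && idOk;
}

const nameAge = { name: 1, age: 1 };
const filter = { name: "user10" };
console.log(isCoveredQuery(nameAge, filter, { _id: 0, date: 0 })); // false
console.log(isCoveredQuery(nameAge, filter, { _id: 0, name: 1, age: 1 })); // true
```

Note that leaving out "_id": 0 from the inclusion projection would also make the query uncovered, since _id is not part of name_1_age_1.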

Why is MongoDB hitting this index?

Given that I have an index on my collection asd:
> db.system.indexes.find().pretty()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "asd.test", "name" : "_id_" },
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1,
"c" : 1
},
"ns" : "asd.test",
"name" : "a_1_b_1_c_1"
}
As far as I know, in theory the order of the fields in the query matters for hitting an index.
That is why I'm wondering how and why I'm actually hitting the index with this query:
> db.asd.find({c:{$gt: 5000},a:{$gt:5000}}).explain()
{
"cursor" : "BtreeCursor a_1_b_1_c_1",
"isMultiKey" : false,
"n" : 90183,
"nscannedObjects" : 90183,
"nscanned" : 94885,
"nscannedObjectsAllPlans" : 90288,
"nscannedAllPlans" : 94990,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 272,
"indexBounds" : {
"a" : [
[
5000,
1.7976931348623157e+308
]
],
"b" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"c" : [
[
5000,
1.7976931348623157e+308
]
]
}
}

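The order in which you pass fields in your query does not affect the index selection process; if it did, it would be a very fragile system.
The order of fields in the index definition, on the other hand, is very important. Perhaps you are confusing these two cases. Here the query constrains a, the leading field of a_1_b_1_c_1, so the index can be used: b is unconstrained, so its bounds run from $minElement to $maxElement, and c is then bounded within that scan. This is consistent with nscanned (94885) being only slightly higher than n (90183).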
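A hypothetical sketch of why key order in the query object does not matter: index bounds are derived per index field, walking the index definition in its own order and looking each field up in the filter. This is a toy model in plain JavaScript (only numeric equality and $gt/$gte/$lt/$lte, no inclusive/exclusive tracking, Infinity standing in for the $minElement/$maxElement and 1.79e+308 bounds shown in explain), not MongoDB's planner.

```javascript
// Toy model: bounds are computed per INDEX field, in index order,
// so the order of keys in the filter object is irrelevant.
function indexBounds(indexFields, filter) {
  return indexFields.map((field) => {
    const cond = filter[field];
    if (cond === undefined) {
      // Unconstrained field: scan its whole range ($minElement..$maxElement).
      return { field, lo: -Infinity, hi: Infinity };
    }
    if (typeof cond === "number") {
      return { field, lo: cond, hi: cond }; // equality match
    }
    let lo = -Infinity;
    let hi = Infinity;
    if ("$gt" in cond) lo = cond.$gt;
    if ("$gte" in cond) lo = cond.$gte;
    if ("$lt" in cond) hi = cond.$lt;
    if ("$lte" in cond) hi = cond.$lte;
    return { field, lo, hi };
  });
}

// The index can narrow the scan only if its leading field is constrained.
function leadingFieldConstrained(indexFields, filter) {
  return filter[indexFields[0]] !== undefined;
}

const index = ["a", "b", "c"];
const q1 = { c: { $gt: 5000 }, a: { $gt: 5000 } };
const q2 = { a: { $gt: 5000 }, c: { $gt: 5000 } };
console.log(JSON.stringify(indexBounds(index, q1)) === JSON.stringify(indexBounds(index, q2))); // true
console.log(leadingFieldConstrained(index, q1)); // true
console.log(leadingFieldConstrained(index, { b: 7 })); // false
```

Both filters produce identical bounds, with b spanning its full range, mirroring the indexBounds in the explain output above.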

Nested queries on a date range

I have a project where I embed date ranges in a document.
Something like the following:
{ "availabilities" : [
{ "start_date" : ISODate("2012-06-28T00:00:00Z"), "end_date" : ISODate("2012-10-03T00:00:00Z") },
{ "start_date" : ISODate("2012-10-08T00:00:00Z"), "end_date" : ISODate("2012-10-28T00:00:00Z") }]
}
What I need to do is find all the documents that are available during a certain period.
I use a query like this one:
db.faces.find({"availabilities" : {"$elemMatch" : {"$and" : [{"start_date" : {"$lte" : ISODate('2012-10-01 00:00:00 UTC')}}, {"end_date" : {"$gte": ISODate('2012-10-07 00:00:00 UTC')}}]}}})
But it won't use my indexes:
{
"v" : 1,
"key" : {
"availabilities.start_date" : 1,
"availabilities.end_date" : 1
},
"ns" : "faces_development.faces",
"name" : "availabilities.start_date_1_availabilities.end_date_1"
}
When I do an explain on the query, the output for the indexBounds is quite strange and I don't understand it.
{
"cursor" : "BtreeCursor availabilities.start_date_1_availabilities.end_date_1",
"isMultiKey" : true,
"n" : 71725,
"nscannedObjects" : 143019,
"nscanned" : 143019,
"nscannedObjectsAllPlans" : 143221,
"nscannedAllPlans" : 143221,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 2,
"nChunkSkips" : 0,
"millis" : 1608,
"indexBounds" : {
"availabilities.start_date" : [
[
true,
ISODate("2012-10-01T00:00:00Z")
]
],
"availabilities.end_date" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "foobar.local:27017"
}
Current MongoDB version: MongoDB shell version 2.2.0
What must I do to use my indexes?
I have tried to find related questions and bugs for MongoDB without much success.

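In 2.2, only the start_date bound is applied to this multikey index: the end_date bounds span $minElement to $maxElement, so the scan covers every availability that starts on or before 2012-10-01 (the true lower bound is just the start of the Date range, since Booleans sort immediately before Dates in BSON order). This will scan less of the index in 2.3: https://jira.mongodb.org/browse/SERVER-3104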
Meanwhile, I suggest moving each availability into its own document, instead of having many in one array, for more efficient querying.
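A minimal sketch of the suggested schema change, in plain JavaScript: each element of the availabilities array becomes its own document, and the $elemMatch condition becomes a plain per-document check. The face_id field name and both helper functions are hypothetical, chosen only for illustration.

```javascript
// Hypothetical: turn one document holding an `availabilities` array into
// one document per availability. `face_id` is an assumed field name.
function flattenAvailabilities(face) {
  return face.availabilities.map((a) => ({
    face_id: face._id,
    start_date: a.start_date,
    end_date: a.end_date,
  }));
}

// Same condition as the $elemMatch query: the availability must start
// on or before `from` and end on or after `to`.
function availableDuring(avail, from, to) {
  return avail.start_date <= from && avail.end_date >= to;
}

const face = {
  _id: 1,
  availabilities: [
    { start_date: new Date("2012-06-28T00:00:00Z"), end_date: new Date("2012-10-03T00:00:00Z") },
    { start_date: new Date("2012-10-08T00:00:00Z"), end_date: new Date("2012-10-28T00:00:00Z") },
  ],
};

const docs = flattenAvailabilities(face);
const from = new Date("2012-10-01T00:00:00Z");
const to = new Date("2012-10-07T00:00:00Z");
// Neither range covers the whole 2012-10-01..2012-10-07 window.
console.log(docs.filter((d) => availableDuring(d, from, to)).length); // 0
```

In the database, the flattened documents would be queried with something like db.availabilities.find({start_date: {$lte: ...}, end_date: {$gte: ...}}) against an index on {start_date: 1, end_date: 1}, where availabilities is a hypothetical collection name. Because that index would no longer be multikey, both bounds should be usable.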

Mongo index not being used (simple one column query)

Explain output of the find query:
> db.datasources.find({nid: 19882}).explain();
{
"cursor" : "BtreeCursor nid_1",
"nscanned" : 10161684,
"nscannedObjects" : 10161684,
"n" : 10161684,
"millis" : 8988,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"nid" : [
[
19882,
19882
]
]
}
}
Here are the indexes for the collection:
> db.datasources.getIndexes()
[
{
"name" : "_id_",
"ns" : "rocdocs_dev.datasources",
"key" : {
"_id" : 1
}
},
{
"_id" : ObjectId("4edcd725c605da5f200000a2"),
"ns" : "rocdocs_dev.datasources",
"key" : {
"nid" : 1
},
"name" : "nid_1"
},
{
"v" : 1,
"key" : {
"is_indexed" : 1
},
"ns" : "rocdocs_dev.datasources",
"name" : "is_indexed_1"
}
]

This is using an index, as shown by BtreeCursor; if it weren't, the cursor would be BasicCursor.
Though I do see that the query takes 9 seconds (8988 ms) and scans 10,161,684 entries, which appears to be the entire collection. Note that n is also 10161684, so every scanned document matched nid: 19882.
Did you add this index after inserting those documents? Perhaps it's not done building yet.
I would consider rebuilding the index:
db.datasources.reIndex()
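The BtreeCursor versus BasicCursor check can be done mechanically. The sketch below is a hypothetical helper, not part of the MongoDB shell, that reads only the legacy 2.x explain fields shown above (cursor, n, nscannedObjects, millis).

```javascript
// Hypothetical helper for legacy (MongoDB 2.x) explain output.
// "BasicCursor" means a full collection scan; "BtreeCursor <name>"
// means the named index was used.
function summarizeExplain(explain) {
  const usesIndex = explain.cursor !== "BasicCursor";
  // Fraction of scanned documents that matched the query; 1 means
  // every scanned document was returned.
  const matchRatio =
    explain.nscannedObjects === 0 ? 0 : explain.n / explain.nscannedObjects;
  return { usesIndex, matchRatio, seconds: explain.millis / 1000 };
}

// Values copied from the explain output in the question.
const datasources = {
  cursor: "BtreeCursor nid_1",
  nscanned: 10161684,
  nscannedObjects: 10161684,
  n: 10161684,
  millis: 8988,
};
console.log(summarizeExplain(datasources));
```

For this query, the helper reports that the index is used and that the match ratio is 1: the scan touches ten million documents because ten million documents match.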