Force use of index on complex MongoDB query? - mongodb

I have a large collection of "messages" with 'to', 'from', 'type', and 'visible_to' fields that I want to query with a fairly complex query that pulls only the messages to/from a particular user, of a particular set of types, that are visible to that user. Here is an actual example:
{
"$and": [
{
"$and": [
{
"$or": [
{
"to": "52f65f592f1d88ebcb00004f"
},
{
"from": "52f65f592f1d88ebcb00004f"
}
]
},
{
"$or": [
{
"type": "command"
},
{
"type": "image"
}
]
}
]
},
{
"$or": [
{
"public": true
},
{
"visible_to": "52f65f592f1d88ebcb00004f"
}
]
}
]
}
With indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"expires" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "expires_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"from" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "from_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"type" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "type_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"ts" : 1,
"type" : -1
},
"ns" : "n2-mongodb.messages",
"name" : "ts_1_type_-1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "to_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"visible_to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "visible_to_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"public" : 1,
"visible_to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "public_1_visible_to_1"
},
{
"v" : 1,
"key" : {
"to" : 1,
"from" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "to_1_from_1"
}
]
And here is the explain(true) output from our MongoDB 2.2.2 instance, which looks like a full scan:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 35702,
"nscanned" : 35702,
"nscannedObjectsAllPlans" : 35702,
"nscannedAllPlans" : 35702,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 85,
"indexBounds" : {
},
"allPlans" : [
{
"cursor" : "BasicCursor",
"n" : 0,
"nscannedObjects" : 35702,
"nscanned" : 35702,
"indexBounds" : {
}
}
],
"server" : "XXXXXXXX"
}
Looking at the explain output, MongoDB is not using any indexes for this - is there a way to get it to use at least the compound index {to: 1, from: 1} to dramatically narrow the search space? Or is there a better way to optimize this query? Or is MongoDB wholly unsuited for a query like this?

To force the MongoDB query optimizer to adopt a specific approach, you can use the $hint operator.
From the docs:
The $hint operator forces the query optimizer to use a specific index to fulfill the query. Specify the index either by the index name or by document.

The query optimizer in MongoDB 2.6 will include support for applying indexes to complex queries.
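As a rough illustration of the hint approach, here is a minimal mongo shell sketch. It assumes the collection is reachable as db.messages (taken from the "ns" values in the question) and uses the index name to_1_from_1 from the listing above. Rewriting the two-value $or on "type" as $in is a simplification introduced here, not part of the original query. Whether the hinted plan is actually faster than the full scan has to be checked with explain on your own data.

```javascript
// User id copied from the example query in the question.
var userId = "52f65f592f1d88ebcb00004f";

// Same filter as the question; the $or over "type" is written as $in
// (an equivalent form, chosen here only for brevity).
var query = {
  $and: [
    { $or: [{ to: userId }, { from: userId }] },
    { type: { $in: ["command", "image"] } },
    { $or: [{ public: true }, { visible_to: userId }] }
  ]
};

// Index name as reported by getIndexes() in the question.
var hintName = "to_1_from_1";

// `db` only exists inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // cursor.hint() accepts either the index name or the key document.
  printjson(db.messages.find(query).hint(hintName).explain(true));
}
```

The hint can also be given as the key document, { to: 1, from: 1 }, instead of the name.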

Related

MongoDB geospatial index on $center

Collection Schema
{
"_id" : ObjectId("5d3562bf1b48d90ea4b06a74"),
"name" : "19",
"location" : {
"type" : "Point",
"coordinates" : [
50.0480208,
30.5239127
]
}
}
Indexes
> db.places.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.places"
},
{
"v" : 2,
"key" : {
"location" : "2dsphere"
},
"name" : "location_2dsphere",
"ns" : "test.places",
"2dsphereIndexVersion" : 3
}
There are 2 million documents stored in the collection.
First I ran a query like this:
db.places.find({ location: {$geoWithin: { $center: [[60.0478308, 40.5237227], 10] } }})
But it takes 2 seconds, so I examined the query via explain().
> db.places.find({ location: {$geoWithin: { $center: [[60.0478308, 40.5237227], 10] } }}).explain('executionStats')
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.places",
"indexFilterSet" : false,
"parsedQuery" : {
"location" : {
"$geoWithin" : {
"$center" : [
[
60.0478308,
40.5237227
],
10
]
}
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"location" : {
"$geoWithin" : {
"$center" : [
[
60.0478308,
40.5237227
],
10
]
}
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1414213,
"executionTimeMillis" : 2093,
"totalKeysExamined" : 0,
"totalDocsExamined" : 2000000,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"location" : {
"$geoWithin" : {
"$center" : [
[
60.0478308,
40.5237227
],
10
]
}
}
},
"nReturned" : 1414213,
"executionTimeMillisEstimate" : 1893,
"works" : 2000002,
"advanced" : 1414213,
"needTime" : 585788,
"needYield" : 0,
"saveState" : 15681,
"restoreState" : 15681,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 2000000
}
},
"serverInfo" : {
"host" : "Johnui-iMac",
"port" : 27017,
"version" : "4.0.3",
"gitVersion" : "7ea530946fa7880364d88c8d8b6026bbc9ffa48c"
},
"ok" : 1
}
As you can see, the query stage is COLLSCAN.
I have already created an index on the location field, but it doesn't seem to be used.
So I created more indexes:
"v" : 2,
"key" : {
"location.coordinates" : 1
},
"name" : "location.coordinates_1",
"ns" : "test.places"
},
{
"v" : 2,
"key" : {
"location" : 1
},
"name" : "location_1",
"ns" : "test.places"
}
But that doesn't work either.
Is there an issue with my index configuration?
You seem to have created a 2dsphere index on your location field, but the MongoDB docs on $center specify that:
Only the 2d geospatial index supports $center.
Therefore, I suggest you create a 2d index on the location field so that the query can be served by that index.
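A minimal sketch of that suggestion for the mongo shell follows. Because the documents in the question store GeoJSON (a "type" string plus a "coordinates" array), this sketch assumes the 2d index has to go on the "location.coordinates" pair rather than on "location" itself, and that the query targets the same path. Both choices are assumptions to verify against your data, not something stated in the answer.

```javascript
// Assumed index: a 2d index on the legacy-style [x, y] pair inside the GeoJSON object.
var indexSpec = { "location.coordinates": "2d" };

// Same circle as in the question, pointed at the indexed path.
var query = {
  "location.coordinates": {
    $geoWithin: { $center: [[60.0478308, 40.5237227], 10] }
  }
};

// `db` only exists inside the mongo shell, so the calls are guarded.
if (typeof db !== "undefined") {
  db.places.createIndex(indexSpec);
  // Check whether winningPlan still reports COLLSCAN.
  printjson(db.places.find(query).explain("executionStats"));
}
```

Note that the explain output in the question reports 1414213 of 2000000 documents returned, so even an indexed plan still has to fetch most of the collection for this particular circle.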

What index to be added in MongoDB to support $elemMatch query on embedded document

Suppose we have a following document
{
embedded:[
{
email:"abc#abc.com",
active:true
},
{
email:"def#abc.com",
active:false
}]
}
What index should be used to support an $elemMatch query on the email and active fields of the embedded documents?
Update to the question:
db.foo.aggregate([{"$match":{"embedded":{"$elemMatch":{"email":"abc#abc.com","active":true}}}},{"$group":{_id:null,"total":{"$sum":1}}}],{explain:true});
Running this, I get the following explain output for the aggregate:
{
"stages" : [
{
"$cursor" : {
"query" : {
"embedded" : {
"$elemMatch" : {
"email" : "abc#abc.com",
"active" : true
}
}
},
"fields" : {
"_id" : 0,
"$noFieldsNeeded" : 1
},
"planError" : "InternalError No plan available to provide stats"
}
},
{
"$group" : {
"_id" : {
"$const" : null
},
"total" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
I think MongoDB is not using an index internally for this query.
Thanks in advance :)
Update with the output of db.foo.stats() and db.foo.getIndexes():
db.foo.stats()
{
"ns" : "test.foo",
"count" : 2,
"size" : 480,
"avgObjSize" : 240,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 3,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 1,
"totalIndexSize" : 24528,
"indexSizes" : {
"_id_" : 8176,
"embedded.email_1_embedded.active_1" : 8176,
"name_1" : 8176
},
"ok" : 1
}
db.foo.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.foo"
},
{
"v" : 1,
"key" : {
"embedded.email" : 1,
"embedded.active" : 1
},
"name" : "embedded.email_1_embedded.active_1",
"ns" : "test.foo"
},
{
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "test.foo"
}
]
Should you decide to stick to that data model and your queries, here's how to create indexes that match the query:
You can simply index "embedded.email", or use a compound index on the embedded fields, i.e. something like
> db.foo.ensureIndex({"embedded.email" : 1 });
- or -
> db.foo.ensureIndex({"embedded.email" : 1, "embedded.active" : 1});
Indexing boolean fields is often not too useful, since their selectivity is low.
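To check whether such an index is picked up, one option is to run the same $elemMatch filter through find().explain() instead of the aggregate explain, which reported a planError in the question. The sketch below assumes the db.foo collection from the question; ensureIndex is kept to match the answer, and newer shells call the same operation createIndex.

```javascript
// Compound index on the two embedded fields, as proposed in the answer.
var indexSpec = { "embedded.email": 1, "embedded.active": 1 };

// Same filter as the $match stage in the question.
var filter = {
  embedded: { $elemMatch: { email: "abc#abc.com", active: true } }
};

// `db` only exists inside the mongo shell, so the calls are guarded.
if (typeof db !== "undefined") {
  db.foo.ensureIndex(indexSpec); // createIndex in newer shells
  // Inspect which cursor or plan the find reports for this filter.
  printjson(db.foo.find(filter).explain());
}
```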

MongoDB does not combine 1d and 2d indexes; geo queries scan all documents irrespective of filters applied to limit the number of records

Below is the output from explain for one of the queries:
{
"cursor" : "GeoSearchCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 199564,
"nscanned" : 199564,
"nscannedObjectsAllPlans" : 199564,
"nscannedAllPlans" : 199564,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 1234,
"indexBounds" : {
},
"server" : "MongoDB",
"filterSet" : false
}
This query scans all 199564 records, whereas the constraints applied in the filter should narrow it down to only a few hundred records.
Any pointers would be much appreciated.
Adding the query and the indexes applied:
Query
{
"isfeatured" : 1 ,
"status" : 1 ,
"isfesturedseq" : 1 ,
"loc_long_lat" : {
"$near" : [ 76.966438 , 11.114906]
} ,
"city_id" : "40" ,
"showTime.0" : { "$exists" : true}}
Indexes
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"loc_long_lat" : "2d"
},
"name" : "loc_long_lat_2d",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"georand" : "2d"
},
"name" : "georand_2d",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"city_id" : 1
},
"name" : "city_id_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"endDatetime" : 1
},
"name" : "endDatetime_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"movieid" : 1
},
"name" : "movieid_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"theaterid" : 1
},
"name" : "theaterid_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"status" : 1
},
"name" : "status_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"isfeatured" : 1
},
"name" : "isfeatured_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"isfesturedseq" : 1
},
"name" : "isfesturedseq_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"is_popular" : 1
},
"name" : "is_popular_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"loc_name" : 1
},
"name" : "loc_name_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"est_city_id" : 1
},
"name" : "est_city_id_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"isfeatured" : 1,
"status" : 1,
"city_id" : 1
},
"name" : "isfeatured_1_status_1_city_id_1",
"ns" : "test_live.movies_theater_map",
"background" : true
},
{
"v" : 1,
"key" : {
"movieid" : 1,
"endDatetime" : 1,
"city_id" : 1,
"status" : 1
},
"name" : "movieid_1_endDatetime_1_city_id_1_status_1",
"ns" : "test_live.movies_theater_map",
"background" : 2
},
{
"v" : 1,
"key" : {
"movieid" : 1,
"endDatetime" : 1,
"city_id" : 1,
"status" : 1,
"georand" : 1
},
"name" : "movieid_1_endDatetime_1_city_id_1_status_1_georand_1",
"ns" : "test_live.movies_theater_map",
"background" : 2
},
{
"v" : 1,
"key" : {
"rand" : 1
},
"name" : "rand_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"isfeatured" : 1,
"city_id" : 1,
"status" : 1
},
"name" : "isfeatured_1_city_id_1_status_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"movieid" : 1,
"city_id" : 1
},
"name" : "movieid_1_city_id_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"loc_long_lat" : 1,
"is_popular" : 1,
"movieid" : 1,
"status" : 1
},
"name" : "loc_long_lat_1_is_popular_1_movieid_1_status_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"status" : 1,
"city_id" : 1,
"theaterid" : 1,
"endDatetime" : 1
},
"name" : "status_1_city_id_1_theaterid_1_endDatetime_1",
"ns" : "test_live.movies_theater_map",
"background" : true
}
The $near operator uses a 2d or 2dsphere index to return documents in order from nearest to furthest. For a 2d index, a max of 100 documents are returned. Your query scanned every document because there were no matching documents and every document, from nearest to furthest, had to be scanned to check if it matched all the conditions.
I would suggest the following to improve the query:
Use the $maxDistance option, which is specified in radians for legacy coordinates, to limit the maximum number of documents scanned.
Use a 2dsphere index, ideally with GeoJSON points instead of legacy coordinates. You can have compound indexes with prefix keys to a geo index with a 2dsphere index, so you could index the query in part on all the other conditions to reduce the number of documents that need to be scanned. What version of MongoDB are you using? You may not have all of these features available with an old version.
Use limit to limit the maximum number of documents scanned. However, when the query has fewer results than the value of limit, you'll still scan every document.
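The first and third suggestions could look roughly like this in the mongo shell. The collection name is taken from the "ns" values in the question; the $maxDistance value of 0.01 and the limit of 20 are arbitrary placeholders, not recommendations, and the distance unit should be checked against the documentation for your server version.

```javascript
// Query from the question with a distance cap added to the $near clause.
var query = {
  isfeatured: 1,
  status: 1,
  isfesturedseq: 1, // field name spelled as in the question
  loc_long_lat: {
    $near: [76.966438, 11.114906],
    $maxDistance: 0.01 // placeholder value
  },
  city_id: "40",
  "showTime.0": { $exists: true }
};

// Placeholder cap on the number of returned documents.
var maxResults = 20;

// `db` only exists inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // Compare nscanned here with the 199564 reported in the question.
  printjson(db.movies_theater_map.find(query).limit(maxResults).explain());
}
```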

Why is the indexOnly attribute false for this covered query?

I have a test db with fields _id, name, age, date
Indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "blogger.users"
},
{
"v" : 1,
"key" : {
"name" : 1,
"age" : 1
},
"name" : "name_1_age_1",
"ns" : "blogger.users"
},
{
"v" : 1,
"key" : {
"age" : 1,
"name" : 1
},
"name" : "age_1_name_1",
"ns" : "blogger.users"
}
]
When running the following query:
> db.users.find({"name":"user10"},{"_id":0,"date":0})
.explain()
I get following:
{
"cursor" : "BtreeCursor name_1_age_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"name" : [
[
"user10",
"user10"
]
],
"age" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Johny-PC:27017",
"filterSet" : false
}
Without explain the result is:
{ "name" : "user10", "age" : 68 }
Even though this is a covered query with proper projections, the indexOnly field is still false. I have also tried explicitly providing the index using hint, but nothing changed. In that case the values of nscannedObjectsAllPlans and nscannedAllPlans are 1, as the query doesn't try other indexes.
For a query to be "indexOnly" or "covered", the only fields returned must be contained in the index. So even though you have an index for "name_1_age_1", the query engine still expects to be "told" that the only fields you want are those in the index. With an exclusion-only projection such as {"_id":0,"date":0}, it cannot know which other fields a document holds without inspecting the document itself, so name the index fields explicitly:
db.users.find({"name":"user10"},{"_id":0, "name": 1, "age": 1 }).explain()
That will return "indexOnly" : true, as the query engine knows that the selected index contains all of the fields that are required for output. As such there is no need to go back to the collection in case there are other fields to return.
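The rule can be illustrated with a small helper. This is only a simplified model written for this explanation: isCoveredBy is a hypothetical function, not a MongoDB API, and it ignores other conditions such as multikey indexes and the fields used in the query filter.

```javascript
// Hypothetical helper: returns true when a projection only asks for fields
// that are present in the given list of index keys.
function isCoveredBy(indexKeys, projection) {
  var included = Object.keys(projection).filter(function (field) {
    return projection[field] === 1;
  });
  // An exclusion-only projection leaves the set of returned fields unknown.
  if (included.length === 0) return false;
  // _id is returned by default, so it must be excluded unless it is indexed.
  if (indexKeys.indexOf("_id") === -1 && projection._id !== 0) return false;
  return included.every(function (field) {
    return indexKeys.indexOf(field) !== -1;
  });
}

var indexKeys = ["name", "age"]; // keys of name_1_age_1

// Projection from the question: only exclusions.
var fromQuestion = isCoveredBy(indexKeys, { _id: 0, date: 0 }); // false

// Projection from the answer: explicit inclusions of index fields.
var fromAnswer = isCoveredBy(indexKeys, { _id: 0, name: 1, age: 1 }); // true
```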

Mongo index not being used (simple one column query)

Explain of find query:
> db.datasources.find({nid: 19882}).explain();
{
"cursor" : "BtreeCursor nid_1",
"nscanned" : 10161684,
"nscannedObjects" : 10161684,
"n" : 10161684,
"millis" : 8988,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"nid" : [
[
19882,
19882
]
]
}
}
Here are the indexes for the collection:
> db.datasources.getIndexes()
[
{
"name" : "_id_",
"ns" : "rocdocs_dev.datasources",
"key" : {
"_id" : 1
}
},
{
"_id" : ObjectId("4edcd725c605da5f200000a2"),
"ns" : "rocdocs_dev.datasources",
"key" : {
"nid" : 1
},
"name" : "nid_1"
},
{
"v" : 1,
"key" : {
"is_indexed" : 1
},
"ns" : "rocdocs_dev.datasources",
"name" : "is_indexed_1"
}
]
This is using an index, as indicated by BtreeCursor. If it weren't, it would say BasicCursor.
Though I do see that the query takes 9 seconds and scans what appears to be the entire collection.
Did you add this index after inserting those documents? Perhaps it's not done building yet?
I would consider rebuilding the index:
db.datasources.reIndex()