MongoDB $geoNear and aggregate very slow

My current MongoDB version is 2.4.9 and the collection has around 2.8 million documents. My query takes very long to finish when using $geoNear.
An example document from my collection:
"loc" : {
    "type" : "Point",
    "coordinates" : [
        100.46589473,
        5.35149077
    ]
},
"email" : "abc@123.com"
The loc index:
{
    "v" : 1,
    "key" : {
        "loc" : "2dsphere"
    },
    "ns" : "test.collect",
    "name" : "loc_2dsphere",
    "background" : true
}
This query took around 10 to 15 minutes to finish:
db.getCollection('collect').aggregate([
    { '$match': {
        'loc': {
            '$geoNear': {
                '$geometry': { 'type': 'Point', 'coordinates': [101.6862, 3.0829], '$maxDistance': 10000 }
            }
        }
    }},
    { '$group': { '_id': 'email', 'email': { '$last': '$email' }, 'loc': { '$last': '$loc' } } }
])
Below is the explain output:
{
"serverPipeline" : [
{
"query" : {
"loc" : {
"$geoNear" : {
"$geometry" : {
"type" : "Point",
"coordinates" : [
101.6862,
3.0829
],
"$maxDistance" : 10000
}
}
}
},
"projection" : {
"email" : 1,
"loc" : 1,
"_id" : 0
},
"cursor" : {
"cursor" : "S2NearCursor",
"isMultiKey" : true,
"n" : 111953,
"nscannedObjects" : 111953,
"nscanned" : 96677867,
"nscannedObjectsAllPlans" : 111953,
"nscannedAllPlans" : 96677867,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 183,
"nChunkSkips" : 0,
"millis" : 895678,
"indexBounds" : {},
"nscanned" : NumberLong(96677867),
"matchTested" : NumberLong(3472481),
"geoMatchTested" : NumberLong(3472481),
"numShells" : NumberLong(53),
"keyGeoSkip" : NumberLong(93205386),
"returnSkip" : NumberLong(20148837),
"btreeDups" : NumberLong(0),
"inAnnulusTested" : NumberLong(3472481),
"allPlans" : [
{
"cursor" : "S2NearCursor",
"n" : 111953,
"nscannedObjects" : 111953,
"nscanned" : 96677867,
"indexBounds" : {}
}
],
"server" : "xxx:xxx"
}
},
{
"$group" : {
"_id" : {
"$const" : "email"
},
"email" : {
"$last" : "$email"
},
"loc" : {
"$last" : "$loc"
}
}
}
],
"ok" : 1
}
Is my query inappropriate? Is there anything else I can do to improve the speed?

Try using the $geoNear aggregation pipeline stage directly:
https://docs.mongodb.com/v2.4/reference/operator/aggregation/geoNear/
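In the aggregation framework, $geoNear is its own pipeline stage: it must be the first stage, with a 2dsphere index you need spherical: true, and distances for GeoJSON points are in meters. A sketch against the collection from the question follows; distanceField is required, and the field name 'dist' here is arbitrary. Note also that the question's $group stage uses '_id':'email' (a constant string, as the $const in the explain output shows), so grouping per email needs '$email':

db.getCollection('collect').aggregate([
    { '$geoNear': {
        'near': { 'type': 'Point', 'coordinates': [101.6862, 3.0829] },
        'distanceField': 'dist',   // required; populated with the distance in meters
        'maxDistance': 10000,
        'spherical': true,
        'limit': 200000
    }},
    { '$group': { '_id': '$email', 'email': { '$last': '$email' }, 'loc': { '$last': '$loc' } } }
])

The $geoNear stage returns at most 'limit' documents (default 100 in this version), so raise it if you expect more matches.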

To add to venkat's answer (using $geoNear in the aggregation pipeline directly, instead of using it under a $match):
Try reducing the $maxDistance number if you don't need all of the results.
$geoNear returns documents in a sorted order by distance. If you don't need the results in a sorted order, you can use $geoWithin instead: https://docs.mongodb.com/v2.4/reference/operator/query/geoWithin/
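For example (a sketch using the coordinates from the question; $centerSphere takes its radius in radians, i.e. the distance divided by the Earth's radius of roughly 6378137 meters):

db.getCollection('collect').find({
    'loc': {
        '$geoWithin': {
            '$centerSphere': [ [101.6862, 3.0829], 10000 / 6378137 ]  // ~10 km radius in radians
        }
    }
})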
Also, MongoDB 2.4 was released in March 2013, and is not supported anymore. If possible, I would recommend you to upgrade to the latest version (currently 3.2.10). There are many bugfixes and performance improvements in the newer version.

$geoNear with the aggregation framework works much faster than the find query below:
"$near": {
    "$geometry": {
        "type": "Point",
        "coordinates": [lng, lat]
    }
}
At some point my MongoDB even ran into "no socket available" errors with the latter, but the former worked great (<20 ms); tested from Golang with the mgo driver as well.

Related

MongoDB - Query not working in 2.6

I'm currently migrating our database from Mongo 2.4.9 to 2.6.4.
I have an odd situation where a query that is giving good results in 2.4 is returning no documents in 2.6.
The query in question:
var dbSearch = {
    created: { $gte: new Date(1409815194808) },
    geolocation: {
        $geoWithin: {
            $center: [ [ 4.895167900000001, 52.3702157 ], 0.1125 ]
        }
    }
};
On that collection are the following (relevant) indexes:
{ "v" : 1, "key" : { "created" : -1 }, "name" : "createdIndex", "ns" : "prod.search", "background" : true }
{ "v" : 1, "key" : { "geolocation" : "2d", "created" : -1 }, "name" : "geolocationCreatedIndex", "ns" : "prod.search" }
Running this query against Mongo 2.6 gives the following query-log:
{ created: { $gte: new Date(1409815194808) }, geolocation: { $geoWithin: { $center: [ [ 4.895167900000001, 52.3702157 ], 0.1125 ] } } }
planSummary: IXSCAN { created: -1 } ntoreturn:0 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:8196 nreturned:0 reslen:20 8ms
I'm firing this query to the database with NodeJS using the node-mongodb-native module.
Note that when I remove either one of the search fields (created or geolocation), the query produces the right results on both 2.4 and 2.6. The combination (as posted above) gives no results on 2.6.
Edits with extra requested information
Explain on mongo 2.4 query:
> db.search.find({created: { $gte: new Date(1409815194808) }, geolocation: {$geoWithin: {$center: [ [ 4.895167900000001, 52.3702157 ], 0.1125 ] } } }).explain()
{
"cursor" : "GeoBrowse-circle",
"isMultiKey" : false,
"n" : 321,
"nscannedObjects" : 321,
"nscanned" : 321,
"nscannedObjectsAllPlans" : 321,
"nscannedAllPlans" : 321,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 69,
"indexBounds" : {
"geolocation" : [ ]
},
"lookedAt" : NumberLong(8940),
"matchesPerfd" : NumberLong(8538),
"objectsLoaded" : NumberLong(8538),
"pointsLoaded" : NumberLong(0),
"pointsSavedForYield" : NumberLong(0),
"pointsChangedOnYield" : NumberLong(0),
"pointsRemovedOnYield" : NumberLong(0),
"server" : "ubmongo24.local:27017"
}
Explain on mongo 2.6 query:
> db.search.find({created: { $gte: new Date(1409815194808) }, geolocation: {$geoWithin: {$center: [ [ 4.895167900000001, 52.3702157 ], 0.1125 ] } } }).explain();
{
"cursor" : "BtreeCursor createdIndex",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 1403,
"nscanned" : 1403,
"nscannedObjectsAllPlans" : 2808,
"nscannedAllPlans" : 2808,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 21,
"nChunkSkips" : 0,
"millis" : 8,
"indexBounds" : {
"created" : [
[
ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ"),
ISODate("2014-09-04T07:19:54.808Z")
]
]
},
"server" : "ubmongo26.local:27017",
"filterSet" : false
}
Based on the explain output, the issue appears to be the index chosen by the query optimizer (which was extensively overhauled in 2.6, mostly for the better, but it does mean there are new edge cases). It is using the single-field createdIndex rather than the compound index used in 2.4.
Try hinting the geolocationCreatedIndex index in 2.6 (.hint({ "geolocation" : "2d", "created" : -1 })) and see if that fixes your issues - it is choosing the createdIndex instead by default and hence not using the geo index at all.
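In the shell that would look like the following (the key pattern and index name are taken from the getIndexes() output above; hinting by name works too):

db.search.find({
    created: { $gte: new Date(1409815194808) },
    geolocation: { $geoWithin: { $center: [ [ 4.895167900000001, 52.3702157 ], 0.1125 ] } }
}).hint({ "geolocation" : "2d", "created" : -1 })

// or equivalently: .hint("geolocationCreatedIndex")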

MongoDB - Index Intersection with two multikey indexes

I have two arrays in my collection (one is an embedded document and the other one is just a simple collection of strings). A document for example:
{
"_id" : ObjectId("534fb7b4f9591329d5ea3d0c"),
"_class" : "discussion",
"title" : "A",
"owner" : "1",
"tags" : ["tag-1", "tag-2", "tag-3"],
"creation_time" : ISODate("2014-04-17T11:14:59.777Z"),
"modification_time" : ISODate("2014-04-17T11:14:59.777Z"),
"policies" : [
{
"participant_id" : "2",
"action" : "CREATE"
}, {
"participant_id" : "1",
"action" : "READ"
}
]
}
Since some of the queries will include only the policies and some will include the tags and the participants arrays, and considering the fact that I can't create a multikey index with two arrays, I thought this would be a classic scenario for index intersection.
I'm executing a query, but I can't see the intersection kick in.
Here are the indexes:
db.discussion.getIndexes()
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test-fw.discussion"
},
{
"v" : 1,
"key" : {
"tags" : 1,
"creation_time" : 1
},
"name" : "tags",
"ns" : "test-fw.discussion",
"dropDups" : false,
"background" : false
},
{
"v" : 1,
"key" : {
"policies.participant_id" : 1,
"policies.action" : 1
},
"name" : "policies",
"ns" : "test-fw.discussion"
}
Here is the query:
db.discussion.find({
    "$and" : [
        { "tags" : { "$in" : [ "tag-1", "tag-2", "tag-3" ] }},
        { "policies" : { "$elemMatch" : {
            "$and" : [
                { "participant_id" : { "$in" : [
                    "participant-1",
                    "participant-2",
                    "participant-3"
                ]}},
                { "action" : "READ" }
            ]
        }}}
    ]
}).limit(20000).sort({ "creation_time" : 1 }).explain();
And here is the result of the explain:
"clauses" : [
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-1",
"tag-1"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-2",
"tag-2"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-3",
"tag-3"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 20000,
"nscannedObjects" : 30000,
"nscanned" : 30000,
"nscannedObjectsAllPlans" : 30203,
"nscannedAllPlans" : 30409,
"scanAndOrder" : false,
"nYields" : 471,
"nChunkSkips" : 0,
"millis" : 165,
"server" : "User-PC:27017",
"filterSet" : false
Each of the tags in the query (tag-1, tag-2 and tag-3) has 10K documents.
Each of the policies ({participant-1,READ}, {participant-2,READ}, {participant-3,READ}) has 10K documents.
The AND operator results in 20K documents.
As I said earlier, I can't see why the intersection of the two indexes (the policies and the tags indexes) doesn't kick in.
Can someone please shed some light on what I'm missing?
There are two things that are actually important to your understanding of this.
The first point is that the query optimizer can only use one index when resolving the query plan, and cannot use both of the indexes you have specified. As such it picks the one that is the best "fit" by its own determination, unless you explicitly specify one with a hint. Intersection would seem to suit, but now for the next point:
The second point is documented in the limitations of compound indexes. This actually points out that even if you were to "try" to create a compound index that included both of the array fields you want, then you could not. The problem here is that as an array this introduces too many possibilities for the bounds keys, and a multi-key index already introduces a fair level of complexity when used in compound with a standard field.
The limitation on combining the two multikey indexes is the main problem here; much as at creation time, the complexity of "combining" the two produces too many permutations to make it a viable option.
It might just be the case that the policies index is actually going to be the better one to use for this type of search, and you could probably amend this by specifying that field in the query first:
db.discussion.find({
    "policies" : { "$elemMatch" : {
        "participant_id" : { "$in" : [
            "participant-1",
            "participant-2",
            "participant-3"
        ]},
        "action" : "READ"
    }},
    "tags" : { "$in" : [ "tag-1", "tag-2", "tag-3" ] }
})
That is if that will select the smaller range of data, which it probably does. Otherwise use the hint modifier as mentioned earlier.
If that does not actually directly help results, it might be worth re-considering the schema to something that would not involve having those values in array fields or some other type of "meta" field that could be easily looked up with an index.
Also note, in the edited form, that all the wrapping $and statements should not be required, as "and" is implicit in MongoDB queries. As a modifier it is only required if you want two different conditions on the same field.
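If the optimizer still prefers the tags index, you can force the policies index by its name from the getIndexes() output (a sketch; note that forcing this index means the sort on creation_time can no longer come from an index, so expect scanAndOrder to be true):

db.discussion.find({
    "policies" : { "$elemMatch" : {
        "participant_id" : { "$in" : [ "participant-1", "participant-2", "participant-3" ] },
        "action" : "READ"
    }},
    "tags" : { "$in" : [ "tag-1", "tag-2", "tag-3" ] }
}).hint("policies").sort({ "creation_time" : 1 }).limit(20000)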
After doing a little testing, I believe Mongo can, in fact, use two multikey indexes in an intersection. I created a collection with the following structure:
{
"_id" : ObjectId("54e129c90ab3dc0006000001"),
"bar" : [
"hgcmdflitt",
...
"nuzjqxmzot"
],
"foo" : [
"bxzvqzlpwy",
...
"xcwrwluxbd"
]
}
I created indexes on foo and bar and then ran the following query. Note the "true" passed in to explain. This enables verbose mode.
db.col.find({"bar":"hgcmdflitt", "foo":"bxzvqzlpwy"}).explain(true)
In the verbose results, you can find the "allPlans" section of the response, which will show you all of the query plans mongo considered.
"allPlans" : [
{
"cursor" : "BtreeCursor bar_1",
...
},
{
"cursor" : "BtreeCursor foo_1",
...
},
{
"cursor" : "Complex Plan"
...
}
]
If you see a plan with "cursor" : "Complex Plan" that means mongo considered using an index intersection. To find the reasons why mongo might not have decided to actually use that query plan, see this answer: Why doesn't MongoDB use index intersection?

MongoDB index not covering trivial query

I am working on a MongoDB application, and am having trouble with covered queries. After reworking many of my queries to perform better when paginating I found that my previously covered queries were no longer being covered by the index. I tried to distill the working set down as far as possible to isolate the issue but I'm still confused.
First, on a fresh (empty) collection, I inserted the following documents:
devdb> db.test.find()
{ "_id" : ObjectId("53157aa0dd2cab043ab92c14"), "metadata" : { "created_by" : "bcheng" } }
{ "_id" : ObjectId("53157aa6dd2cab043ab92c15"), "metadata" : { "created_by" : "albert" } }
{ "_id" : ObjectId("53157aaadd2cab043ab92c16"), "metadata" : { "created_by" : "zzzzzz" } }
{ "_id" : ObjectId("53157aaedd2cab043ab92c17"), "metadata" : { "created_by" : "thomas" } }
{ "_id" : ObjectId("53157ab9dd2cab043ab92c18"), "metadata" : { "created_by" : "bbbbbb" } }
Then, I created an index for the 'metadata.created_by' field:
devdb> db.test.getIndices()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "devdb.test",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"metadata.created_by" : 1
},
"ns" : "devdb.test",
"name" : "metadata.created_by_1"
}
]
Now, I tried to lookup a document by the field:
devdb> db.test.find({'metadata.created_by':'bcheng'},{'_id':0,'metadata.created_by':1}).sort({'metadata.created_by':1}).explain()
{
"cursor" : "BtreeCursor metadata.created_by_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"metadata.created_by" : [
[
"bcheng",
"bcheng"
]
]
},
"server" : "localhost:27017"
}
The correct index is being used and no extraneous documents are being scanned. Regardless of the presence of .hint(), limit(), or sort(), indexOnly remains false.
Digging through the documentation, I've seen that covered indices will fail to cover queries on array elements, but that isn't the case here (and isMultiKey shows false).
What am I missing? Are there other reasons for this behavior (e.g. insufficient RAM, disk space, etc.)? And if so, how can I best diagnose these issues in the future?
It is currently not supported. See this Jira issue.
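The underlying limitation is that queries on fields inside embedded documents (dotted paths such as metadata.created_by) are not covered by an index in this version, even when _id is excluded from the projection. A hypothetical workaround, if reshaping the schema is an option, is to promote the field to the top level:

// reshaped documents with created_by at the top level (hypothetical collection)
db.test2.insert({ "created_by" : "bcheng" })
db.test2.ensureIndex({ "created_by" : 1 })

// both the predicate and the projection now touch only top-level indexed fields
db.test2.find({ "created_by" : "bcheng" }, { "_id" : 0, "created_by" : 1 }).explain()

With this shape, the explain output should report "indexOnly" : true.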

force use of index on complex MongoDB query?

I have a large collection of "messages" with 'to', 'from', 'type', and 'visible_to' fields. I want to query against it with a fairly complex query that pulls only the messages to/from a particular user, of a particular set of types, that are visible to that user. Here is an actual example:
{
"$and": [
{
"$and": [
{
"$or": [
{
"to": "52f65f592f1d88ebcb00004f"
},
{
"from": "52f65f592f1d88ebcb00004f"
}
]
},
{
"$or": [
{
"type": "command"
},
{
"type": "image"
}
]
}
]
},
{
"$or": [
{
"public": true
},
{
"visible_to": "52f65f592f1d88ebcb00004f"
}
]
}
]
}
With indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"expires" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "expires_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"from" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "from_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"type" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "type_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"ts" : 1,
"type" : -1
},
"ns" : "n2-mongodb.messages",
"name" : "ts_1_type_-1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "to_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"visible_to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "visible_to_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"public" : 1,
"visible_to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "public_1_visible_to_1"
},
{
"v" : 1,
"key" : {
"to" : 1,
"from" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "to_1_from_1"
}
]
And here is the explain(true) output from our MongoDB 2.2.2 instance, which looks like a full scan:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 35702,
"nscanned" : 35702,
"nscannedObjectsAllPlans" : 35702,
"nscannedAllPlans" : 35702,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 85,
"indexBounds" : {
},
"allPlans" : [
{
"cursor" : "BasicCursor",
"n" : 0,
"nscannedObjects" : 35702,
"nscanned" : 35702,
"indexBounds" : {
}
}
],
"server" : "XXXXXXXX"
}
Looking at the explain output, MongoDB is not using any indexes for this - is there a way to get it to use at least the compound index {to: 1, from: 1} to dramatically narrow the search space? Or is there a better way to optimize this query? Or is MongoDB wholly unsuited for a query like this?
To force the MongoDB query optimizer to adopt a specific approach, you can use the $hint operator.
From the docs,
The $hint operator forces the query optimizer to use a specific index to fulfill the query. Specify the index either by the index name or by document.
The query optimizer in MongoDB 2.6 will include support for applying indexes to complex queries.
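For this query a hint might look like the following sketch, using the to_1_from_1 index name from the getIndexes() output above (whether it actually helps depends on how selective to/from are, and 2.2's handling of hints on $or queries is limited, so verify with explain()):

var userId = "52f65f592f1d88ebcb00004f";
db.messages.find({
    "$and": [
        { "$or": [ { "to": userId }, { "from": userId } ] },
        { "type": { "$in": [ "command", "image" ] } },
        { "$or": [ { "public": true }, { "visible_to": userId } ] }
    ]
}).hint("to_1_from_1")

Replacing the inner $or on type with $in, as above, also gives the planner a form it can match against an index more easily.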

Nested queries on date ranges

I have a project where I embed date ranges in a document.
Something like the following:
{ "availabilities" : [
{ "start_date" : ISODate("2012-06-28T00:00:00Z"), "end_date" : ISODate("2012-10-03T00:00:00Z") },
{ "start_date" : ISODate("2012-10-08T00:00:00Z"), "end_date" : ISODate("2012-10-28T00:00:00Z") }]
}
What I need to do is find all the documents that are available during a certain period.
I use a query like this one:
db.faces.find({
    "availabilities" : { "$elemMatch" : { "$and" : [
        { "start_date" : { "$lte" : ISODate('2012-10-01 00:00:00 UTC') } },
        { "end_date" : { "$gte" : ISODate('2012-10-07 00:00:00 UTC') } }
    ]}}
})
But it won't use my indexes:
{
"v" : 1,
"key" : {
"availabilities.start_date" : 1,
"availabilities.end_date" : 1
},
"ns" : "faces_development.faces",
"name" : "availabilities.start_date_1_availabilities.end_date_1"
}
When I do an explain on the query, the output for the indexBounds is quite strange and I don't understand it.
{
"cursor" : "BtreeCursor availabilities.start_date_1_availabilities.end_date_1",
"isMultiKey" : true,
"n" : 71725,
"nscannedObjects" : 143019,
"nscanned" : 143019,
"nscannedObjectsAllPlans" : 143221,
"nscannedAllPlans" : 143221,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 2,
"nChunkSkips" : 0,
"millis" : 1608,
"indexBounds" : {
"availabilities.start_date" : [
[
true,
ISODate("2012-10-01T00:00:00Z")
]
],
"availabilities.end_date" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "foobar.local:27017"
}
Current version of MongoDB: shell version 2.2.0.
What must I do to make it use the indexes?
I've tried to find related questions and bug reports for MongoDB, without great success.
This will scan less of the index in 2.3: https://jira.mongodb.org/browse/SERVER-3104
Meanwhile, I suggest moving each availability into its own document, instead of having many in one array, for more efficient querying.
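A hypothetical reshaped schema, with one document per availability window, could look like the sketch below; without arrays the index is no longer multikey, so the bounds on both fields can be used together:

// one document per availability window, referencing the parent face
{ "face_id" : 1, "start_date" : ISODate("2012-06-28T00:00:00Z"), "end_date" : ISODate("2012-10-03T00:00:00Z") }

db.availabilities.ensureIndex({ "start_date" : 1, "end_date" : 1 })

// the same availability search, now without $elemMatch
db.availabilities.find({
    "start_date" : { "$lte" : ISODate("2012-10-01T00:00:00Z") },
    "end_date" : { "$gte" : ISODate("2012-10-07T00:00:00Z") }
})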