MongoDB query not using index

I have an index:
{
    "sourceName" : 1,
    "addedDate" : 1,
    "sourceKey" : 1,
    "appKey" : 1
}
But when I try to do
db.myCollection.find({and: [
    {sourceName: "mySourceName"},
    {addedDate: 1414878162405},
    {sourceKey: "mySource Key"},
    {appKey: "test"}
]}).explain()
It shows the cursor is BasicCursor, i.e. it is not using the index:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 500,
"nscanned" : 500,
"nscannedObjectsAllPlans" : 500,
"nscannedAllPlans" : 500,
"scanAndOrder" : false,
"indexOnly" : false,
...
}
Can anyone please explain why my query is not using the defined index?

Your query object uses and instead of the $and operator, so it's looking for a field named 'and' in your documents that contains your query values.
But you don't need $and anyway, as multiple query terms are implicitly ANDed, so you can just do:
db.myCollection.find({
    sourceName: "mySourceName",
    addedDate: 1414878162405,
    sourceKey: "mySource Key",
    appKey: "test"
}).explain()
That should be able to use your index just fine.
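For completeness, the explicit form works too once the operator is spelled $and; a sketch with the same values as above:
db.myCollection.find({$and: [
    {sourceName: "mySourceName"},
    {addedDate: 1414878162405},
    {sourceKey: "mySource Key"},
    {appKey: "test"}
]}).explain()  // cursor should now be a BtreeCursor over the compound index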

Related

In MongoDB, how can I sort a 2d indexed $near query when there are over 100 records?

I have a query that uses $near to filter records down to a proximity. It is then supposed to sort the results by a separate field. However, I'm running into a situation where records are missing even though they match the criteria.
I suspect this is due to the fact that using $near with 2d indexes has a 100-record limit. What I believe is happening is that the geospatial sort occurs first, and my sort is only then applied to the top 100 records of that result.
Is there any way to overcome this behavior? Can I disregard the sort of $near and use my own as the primary sort or, alternatively, circumvent the 100-record limit so that my sort applies to the entire set?
Here is the explain() from the query I'm using:
db.properties.find({
    loc: {
        $near: [-80.173366, 34.07868],
        $maxDistance: 5
    }
}).sort({mls: -1}).explain()
{
"cursor" : "GeoSearchCursor",
"isMultiKey" : false,
"n" : 100,
"nscannedObjects" : 211,
"nscanned" : 700,
"nscannedObjectsAllPlans" : 211,
"nscannedAllPlans" : 700,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 2,
"indexBounds" : {
},
"server" : "slate:27017",
"filterSet" : false
}
I ran into the same problem a while ago; you can use aggregate with a $match stage. I used the following snippet (Node.js driver style) at a hackathon.
db.kickstarter.aggregate([
    {'$match':
        {geo2:
            {$geoWithin:
                {$centerSphere: [[parseFloat(lng), parseFloat(lat)], radius / 6371]}
            }
        }
    },
    {$sort: {'pledged': -1}},
    {$limit: 1000} // you can set your limit here
], function (err, data) {
    if (err) console.log(err);
});

Incorrect item count with specific index usage

I'm using MongoDB 2.4.8 on Windows Server 2008 R2, and I'm seeing strange index behaviour that I can't explain. Here is an example of the document structure in my collection:
{
"_id" : NUUID("67070100-4627-4aa5-8ab9-45624e5b82ad"),
"PropertyType" : "Cooperative",
"Address" : {
"Street" : "aaaaaaaaa",
"HouseNo" : "165",
"PostalCode" : 2860,
"City" : "bbbbb",
"Floor" : "1",
"DoorNumber" : ""
},
"Sales" : {
"Price" : 425000,
"Payout" : 0,
"AreaPrice" : 9042,
"GrossPrice" : 2340,
"NetPrice" : 800,
},
"WithdrawnFromSale" : true,
"UnitData" : {
"UnitType" : "aaaaa",
"Area" : 400,
"LivingArea" : 50,
"UnitArea" : 50,
"Rooms" : 2,
"BuildYear" : 1948,
"GroundArea" : 203,
"NoiseLevel" : 5
}
}
Also, I've created index for that collection:
db["UnitModel"].ensureIndex({ "Sales": 1, "PropertyType": 1, "UnitData.Rooms": 1, "UnitData.NoiseLevel": 1 })
The problem is that I get the wrong count of items when using this index.
When I issue this request:
db.UnitModel.find({Sales: {$ne: null}, WithdrawnFromSale: false}).explain({verbose: true})
I get following results:
{
"cursor" : "BtreeCursor Sales_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1 multi",
"isMultiKey" : false,
"n" : 19368,
"nscannedObjects" : 42875,
"nscanned" : 42876,
"nscannedObjectsAllPlans" : 43274,
"nscannedAllPlans" : 43276,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
Here we can see that the index has been used, but the number of items returned is "n" : 19368, which is wrong.
There should be 70986 items in the collection matching that criteria.
Why am I sure there should be more records? Well, here is the code:
var totalCount = 0;
db.UnitModel.find({WithdrawnFromSale: false}).forEach(
    function (e) {
        if (e.hasOwnProperty('Sales') && e.Sales != null)
            totalCount++;
    }
);
totalCount;
totalCount = 70986
To be sure that the query above does not use any indexes, let's check:
db.UnitModel.find({WithdrawnFromSale: false}).explain({verbose: true})
And result:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 70986,
"nscannedObjects" : 3204212,
"nscanned" : 3204212,
"nscannedObjectsAllPlans" : 3204212,
"nscannedAllPlans" : 3204212,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
So, for the UnitModel collection and the criteria Sales: {$ne: null}, WithdrawnFromSale: false, Mongo should return 70986 records. But as you can see, I get the wrong count.
Can someone explain why? What could be the reason?
BTW, when I drop that index and use the following index instead:
db["UnitModel"].ensureIndex({ "WithdrawnFromSale": 1 })
it works as expected. But I do not need that index; it's not optimal for my case.
As of MongoDB 2.4, the maximum size of an indexed value is 1024 bytes. The current behaviour for a key too large to index is to log a warning on the server side -- but this does not throw an exception. In this case, documents with excessively long keys will not be included in that index, but will be included in other indexes. This can lead to inconsistent results, such as incorrect counts and "missing" documents that cannot be found via one index but may be found via another index or with a $natural scan.
In the MongoDB 2.5 development/unstable branch (which will culminate in the MongoDB 2.6 production release later this year) this behaviour has changed. As of MongoDB 2.5.5, an exception will now be raised if an insert/update includes an index update where the keys would be too large. See SERVER-5290 in the MongoDB issue tracker for more details.
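A minimal shell sketch (my own illustration, not from the answer above) of how this shows up on 2.4, using a made-up demo collection:
// Hypothetical collection; on 2.4 the oversized key is skipped with only a server-side warning.
db.demo.drop();
db.demo.ensureIndex({big: 1});
db.demo.insert({big: "short"});                    // indexed normally
db.demo.insert({big: new Array(2000).join("x")});  // ~2 KB key: not added to the index
db.demo.find().count();                            // 2 -- collection scan sees both
db.demo.find({big: {$ne: null}}).count();          // 1 via the index: a "missing" document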
Figured out the reason for the issue. When I looked in the MongoDB log files, I saw tons of the following messages:
HBReadModel.system.indexes Btree::insert: key too large to index, skipping HBReadModel.UnitModel.$Sales_1_WithdrawnFromSale_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1
I was trying to create an index on the Sales field, which is actually an embedded document and not a scalar field. To avoid this I just re-created the index and specified a field inside the Sales document. The log is clean, and the query returns records as expected.
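For illustration, the reworked index might look like this (Sales.Price as the inner field is my assumption; the post does not name the field actually used):
// Index a scalar inside the Sales subdocument rather than the subdocument itself,
// keeping each indexed key well under the 1024-byte limit.
db["UnitModel"].ensureIndex({"Sales.Price": 1, "PropertyType": 1, "UnitData.Rooms": 1, "UnitData.NoiseLevel": 1})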

Efficiently sorting the results of a mongodb geospatial query

I have a very large collection of documents like:
{ loc: [10.32, 24.34], relevance: 0.434 }
and want to be able to efficiently do a query like:
{ "loc": {"$geoWithin":{"$box":[[-103,10.1],[-80.43,30.232]]}} }
with arbitrary boxes.
Adding a 2d index on loc makes this very fast and efficient. However, I now also want to get just the most relevant documents:
.sort({ relevance: -1 })
Which causes everything to grind to a crawl (there can be a huge number of results in any particular box, and I just need the top 10 or so).
Any advice or help greatly appreciated!
Have you tried using the aggregation framework?
A two-stage pipeline might work:
a $match stage that uses your existing $geoWithin query.
a $sort stage that sorts by relevance: -1
Here's an example of what it might look like:
db.foo.aggregate([
    {$match: { "loc": {"$geoWithin": {"$box": [[-103,10.1],[-80.43,30.232]]}} }},
    {$sort: {relevance: -1}}
]);
I'm not sure how it will perform. However, even if it's poor with MongoDB 2.4, it might be dramatically different in 2.6/2.5, as 2.6 will include improved aggregation sort performance.
When there is a huge result set matching a particular box, the sort operation is really expensive, so you definitely want to avoid it.
Try creating a separate index on the relevance field and using it (without the 2d index at all): the query will execute much more efficiently that way - documents (already sorted by relevance) are scanned one by one against the given geo box condition. Once the top 10 are found, you're done.
It might not be that fast if the geo box matches only a small subset of the collection, though. In the worst-case scenario it will need to scan the whole collection.
I suggest you create 2 indexes (loc vs. relevance) and run tests on queries that are common in your app (using Mongo's hint to force the needed index).
Depending on your test results, you may even want to add some app logic so that if you know the box is huge you run the query with the relevance index, and otherwise use the loc 2d index. Just a thought.
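A sketch of the relevance-first variant described above (field names taken from the question; illustrative, not tested against the asker's data):
// $geoWithin, unlike $near, does not require a geospatial index, so a plain
// relevance index can drive the scan while the box filter runs per document.
db.foo.ensureIndex({relevance: -1});
db.foo.find({"loc": {"$geoWithin": {"$box": [[-103,10.1],[-80.43,30.232]]}}})
      .sort({relevance: -1})
      .hint({relevance: -1})
      .limit(10);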
You cannot get the scanAndOrder value to be false when the sort is on a field that is not the leading part of the index used by the query. Unfortunately there is currently no clean solution for your problem, and that is not specific to your use of a 2d index.
When you run an explain() on your query, the value of "scanAndOrder" shows whether a sorting phase was needed after collecting the results: if it is true, a sort after the querying was needed; if it is false, no sort was needed.
To test the situation, I created a collection called t2 in a sample db this way:
db.createCollection('t2')
db.t2.ensureIndex({a:1})
db.t2.ensureIndex({b:1})
db.t2.ensureIndex({a:1,b:1})
db.t2.ensureIndex({b:1,a:1})
for(var i=0;i++<200;){db.t2.insert({a:i,b:i+2})}
Since a query can use only one index, I ran the following tests, results included:
mongos> db.t2.find({a:{$gt:50}}).sort({b:1}).hint("b_1").explain()
{
"cursor" : "BtreeCursor b_1",
"isMultiKey" : false,
"n" : 150,
"nscannedObjects" : 200,
"nscanned" : 200,
"nscannedObjectsAllPlans" : 200,
"nscannedAllPlans" : 200,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"b" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "localhost:27418",
"millis" : 0
}
mongos> db.t2.find({a:{$gt:50}}).sort({b:1}).hint("a_1_b_1").explain()
{
"cursor" : "BtreeCursor a_1_b_1",
"isMultiKey" : false,
"n" : 150,
"nscannedObjects" : 150,
"nscanned" : 150,
"nscannedObjectsAllPlans" : 150,
"nscannedAllPlans" : 150,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 1,
"indexBounds" : {
"a" : [
[
50,
1.7976931348623157e+308
]
],
"b" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "localhost:27418",
"millis" : 1
}
mongos> db.t2.find({a:{$gt:50}}).sort({b:1}).hint("a_1").explain()
{
"cursor" : "BtreeCursor a_1",
"isMultiKey" : false,
"n" : 150,
"nscannedObjects" : 150,
"nscanned" : 150,
"nscannedObjectsAllPlans" : 150,
"nscannedAllPlans" : 150,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 1,
"indexBounds" : {
"a" : [
[
50,
1.7976931348623157e+308
]
]
},
"server" : "localhost:27418",
"millis" : 1
}
mongos> db.t2.find({a:{$gt:50}}).sort({b:1}).hint("b_1_a_1").explain()
{
"cursor" : "BtreeCursor b_1_a_1",
"isMultiKey" : false,
"n" : 150,
"nscannedObjects" : 150,
"nscanned" : 198,
"nscannedObjectsAllPlans" : 150,
"nscannedAllPlans" : 198,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"b" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"a" : [
[
50,
1.7976931348623157e+308
]
]
},
"server" : "localhost:27418",
"millis" : 0
}
The indexes on the individual fields do not help much, so a_1 (does not support the sorting) and b_1 (does not support the querying) are out. The compound index a_1_b_1 is also unfortunate: it performs worse than the single a_1, as the MongoDB engine will not exploit the fact that the entries for any one 'a' value are already stored in b order. What is worth trying is the compound index b_1_a_1, which in your case would be relevance_1_loc_1: it returns the results in order, so scanAndOrder is false. I have not tested this with a 2d index, but I assume it can exclude some documents from being fetched based on the index value alone (which is why, in that test, nscanned is higher than nscannedObjects). The index will unfortunately be huge, but still smaller than the docs.
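In the asker's terms, the b_1_a_1 analogue would be roughly the following sketch (same untested-with-2d caveat as above), combined with the $geoWithin query from the question, hinted to this index:
// Sort field leading, geo field trailing; a reverse index walk serves sort({relevance: -1}).
db.foo.ensureIndex({relevance: 1, loc: 1});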
This solution is valid if you need to search inside a box (rectangle).
The problem with a geospatial index is that you can only place it at the front of a compound index (at least that is so for Mongo 3.2).
So I thought: why not create my own "geospatial" index? All I need is to create a compound index on lat/lng (X, Y) and put the sort field in the first place. Then I need to implement the logic of searching inside the box boundaries and specifically instruct Mongo to use that index (hint).
Translating to your problem:
db.collection.createIndex({ "relevance": 1, "loc_x": 1, "loc_y": 1 }, { "background": true } )
Logic:
db.collection.find({
"loc_x": { "$gt": -103, "$lt": -80.43 },
"loc_y": { "$gt": 10.1, "$lt": 30.232 }
}).hint("relevance_1_loc_x_1_loc_y_1") // or whatever name you gave it
Use $gte and $lte if you need inclusive results.
And you don't need to use .sort(), since the results are already sorted; or you can do a reverse sort on relevance if you need it.
The only issue I encountered is when the box area is small: it takes more time to find small areas than large ones. That is why I kept the geospatial index for small-area searches.

Why does an explicit hint provide better performance?

I'm a bit confused about how indexes work. I fill up a database with documents with keys a, b, and c, each of which has a random value (except c, which has an incrementing value).
Here is python code I used:
from pymongo import MongoClient
from random import Random
r = Random()
client = MongoClient("server")
test_db = client.test
fubar_col = test_db.fubar
for i in range(100000):
    doc = {'a': r.randint(10000, 99999), 'b': r.randint(100000, 999999), 'c': i}
    fubar_col.insert(doc)
Then I create an index on {c: 1}.
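In shell terms this step is presumably something like:
db.fubar.ensureIndex({c: 1})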
Now, if I perform
>db.fubar.find({'a': {$lt: 50000}, 'b': {$gt: 500000}}, {a: 1, c: 1}).sort({c: -1}).explain()
I got
{
"cursor" : "BtreeCursor c_1 reverse",
"isMultiKey" : false,
"n" : 24668,
"nscannedObjects" : 100000,
"nscanned" : 100000,
"nscannedObjectsAllPlans" : 100869,
"nscannedAllPlans" : 100869,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 478,
"indexBounds" : {
"c" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "nuclight.org:27017"
}
See, MongoDB uses the c_1 index and it takes about 478 milliseconds to perform. But if I specify which index I want to use (via hint({c: 1})):
> db.fubar.find({'a': {$lt: 50000}, 'b': {$gt: 500000}}, {a: 1, c: 1}).sort({c: -1}).hint({c:1}).explain()
It takes only about 167 milliseconds. Why does this happen?
Here is a link to a JSON dump of the fubar collection: fubar.tgz
P.S. I performed these queries several times and the results are the same.
explain forces MongoDB to re-evaluate all query plans. In a 'normal' query, the cached fastest query plan will be used. From the documentation (emphasis mine):
The explain() operation evaluates the set of query plans and reports on the winning plan for the query. In normal operations the query optimizer caches winning query plans and uses them for similar related queries in the future. As a result MongoDB may sometimes select query plans from the cache that are different from the plan displayed using explain().
Unless you really need to iterate the entire result set, you might want to include limit() in your query. In your particular example, using limit(100) without the hint makes explain show a BasicCursor rather than the index:
> db.fubar.find({'a': {$lt: 50000}, 'b': {$gt: 500000}}).sort({c: -1}).hint({c:1}).limit(100).explain();
{
"cursor" : "BtreeCursor c_1 reverse",
"n" : 100,
"nscanned" : 432,
"nscannedAllPlans" : 432,
"scanAndOrder" : false,
"millis" : 3,
"indexBounds" : {
"c" : [[{"$maxElement" : 1}, {"$minElement" : 1}]]
},
}
>
> db.fubar.find({'a': {$lt: 50000}, 'b': {$gt: 500000}}).sort({c: -1}).limit(100).explain();
{
"cursor" : "BasicCursor",
"n" : 100,
"nscanned" : 431,
"nscannedAllPlans" : 863,
"scanAndOrder" : true,
"millis" : 12,
"indexBounds" : { },
}
Note that this is a somewhat pathological case, because using the index doesn't help too much (compare nscanned).

MongoDB not using even the simplest index

Please look at the following example. It seems to me that the query should be covered by the index {a: 1}; however, explain() gives me indexOnly: false. What am I doing wrong?
> db.foo.save({a: 1, b: 2});
> db.foo.save({a: 2, b: 3});
> db.foo.ensureIndex({a: 1});
> db.foo.find({a: 1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 6,
"nscannedObjects" : 6,
"n" : 6,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
1,
1
]
]
}
}
indexOnly denotes a covered query (http://docs.mongodb.org/manual/applications/indexes/#indexes-covered-queries), whereby the query, its sort, and its data can all be found within a single index.
The problem with your query:
db.foo.find({a: 1}).explain();
is that it must retrieve the full document, which means it cannot find all the data within the index. Instead you can use:
db.foo.find({a: 1}, {_id: 0, a: 1}).explain();
which projects only the a field, making the entire query fit into the index, so indexOnly will be true.
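As a further sketch of my own (not from the answer): to keep a query covered once a sort is involved, the sort field must be in the index too, and the projection must stay within the indexed fields.
// Hypothetical compound index covering filter, sort, and projection.
db.foo.ensureIndex({a: 1, b: 1});
db.foo.find({a: 1}, {_id: 0, a: 1, b: 1}).sort({b: 1}).explain();
// indexOnly should be true here as well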