MongoDB - can not get a covered query - mongodb

So I have an empty database 'tests' and a collection named 'test'.
First I ensured that my index was set correctly.
db.test.ensureIndex({t:1})
db.test.getIndices()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "tests.test"
},
{
"v" : 1,
"key" : {
"t" : 1
},
"name" : "t_1",
"ns" : "tests.test"
}
]
After that I inserted some test records.
db.test.insert({t:1234})
db.test.insert({t:5678})
When I query the DB with following command and let Mongo explain the results I get the following output:
db.test.find({t:1234},{_id:0}).explain()
{
"cursor" : "BtreeCursor t_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"t" : [
[
1234,
1234
]
]
},
"server" : "XXXXXX:27017",
"filterSet" : false
}
Can anyone please explain to me why indexOnly is false?
Thanks in advance.

To be a covered index query you need to only retrieve those fields that are in the index:
> db.test.find({ t: 1234 },{ _id: 0, t: 1}).explain()
{
"cursor" : "BtreeCursor t_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 0,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"t" : [
[
1234,
1234
]
]
},
"server" : "ubuntu:27017",
"filterSet" : false
}
Essentially this means that only the index is used in order to retrieve the data, without the need to go back to the actual document and retrieve further information. This can be as many fields as you need ( within reason ), but they do need to be included within the index and the only fields that are returned.

Hmm the reason has not been clearly explained (confusing me actually) so here is my effort.
Essentially in order for MongoDB to know that said index covers the query it has to know what fields you want.
If you just say you don't want _id how can it know that * - _id = t without looking?
Here * represents all fields, like it does in SQL.
Answer is it cannot. That is why you need to provide the full field/select/projection/whatever word they use for it definition so that MongoDB can know that your return fits the index.

Related

MongoDB: degraded query performance

I have a user's collection in MongoDB with over 2.5 million of records which constitute to 30 GB. I have about 4 to 6 GB of indexes. It's in sharded environment with two shards, each consisting of replica set. Servers are dedicated especially to Mongo with no overhead. Total RAM is over 10 GB which more than enough for the kind of queries I am performing (shown below).
My concern is that despite of having indexes to the appropriate fields time to retrieve the result is huge (2 minutes to whopping 30 minutes), which is not acceptable. I am newbie to MongoDB & really in confused state as to why this is happening.
Sample schema is:
user:
{
_id: UUID (indexed by default),
name: string,
dob: ISODate,
addr: string,
createdAt: ISODate (indexed),
.
.
.,
transaction:[
{
firstTransaction: ISODate(indexed),
lastTransaction: ISODate(indexed),
amount: float,
product: string (indexed),
.
.
.
},...
],
other sub documents...
}
Sub document length varies from 0- 50 or so.
Queries which I performed are:
1) db.user.find().min({createdAt:ISODate("2014-12-01")}).max({createdAt:ISODate("2014-12-31")}).explain()
This query worked slow at first, but then was lightning fast(I guess because of warming up).
2) db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).explain()
This query took over 30 mins & warming up wasn't of help as every time the performance was same. It returned over half of the collection.
3) db.user.find({transaction:{$elemMatch:{product:'mobile'}}, firstTransaction:{$in:[ISODate("2015-01-01"),ISODate("2015-01-02")]}}}}).explain()
This is the main query which I want to be performant. But to my bad luck this query takes more than 30 mins to perform. I tried many versions of it such as this:
db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).min({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-01")}}}).max({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-02")}}}).explain()
This query gave me error:
planner returned error: unable to find relevant index for max/min
query & with hint():
planner returned error: hint provided does not work with min query
I used min max function because of the uncertainty of the range queries in MongoDB with $lt, $gt operators, which sometimes ignore either of the bound & end up scanning more documents than needed.
I used indexes such as:
db.user.ensureIndex({createdAt: 1})
db.user.ensureIndex({"transaction.firstTransaction":1})
db.user.ensureIndex({"transaction.lastTransaction":1})
db.user.ensureIndex({"transaction.product":1})
I tried to use compound indexing for the 3 query, which is:
db.user.ensureIndex({"transaction.firstTransaction":1, "transaction.product":1})
But this seems to give me no result. Query gets stuck & never returns the result. I mean it. NEVER. Like deadlocked. I don't know why. So I dropped this index & got the result after waiting for over half an hour (really frustrating).
Please help me out as I am really desperate to find out the solution & out of ideas.
This output might help:
Following is the output for:
query:
db.user.find({transaction:{$elemMatch:{product:"mobile", firstTransaction:{$gte:ISODate("2015-01-01"), $lt:ISODate("2015-01-02")}}}}).hint("transaction.firstTransaction_1_transaction.product_1").explain()
output:
{
"clusteredType" : "ParallelSort",
"shards" : {
"test0/mrs00.test.com:27017,mrs01.test.com:27017" : [
{
"cursor" : "BtreeCursor transaction.product_1_transaction.firstTransaction_1",
"isMultiKey" : true,
"n" : 622,
"nscannedObjects" : 350931,
"nscanned" : 352000,
"nscannedObjectsAllPlans" : 350931,
"nscannedAllPlans" : 352000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 119503,
"nChunkSkips" : 0,
"millis" : 375693,
"indexBounds" : {
"transaction.product" : [
[
"mobile",
"mobile"
]
],
"transaction.firstTransaction" : [
[
true,
ISODate("2015-01-02T00:00:00Z")
]
]
},
"server" : "ip-12-0-0-31:27017",
"filterSet" : false
}
],
"test1/mrs10.test.com:27017,mrs11.test.com:27017" : [
{
"cursor" : "BtreeCursor transaction.product_1_transaction.firstTransaction_1",
"isMultiKey" : true,
"n" : 547,
"nscannedObjects" : 350984,
"nscanned" : 352028,
"nscannedObjectsAllPlans" : 350984,
"nscannedAllPlans" : 352028,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 132669,
"nChunkSkips" : 0,
"millis" : 891898,
"indexBounds" : {
"transaction.product" : [
[
"mobile",
"mobile"
]
],
"transaction.firstTransaction" : [
[
true,
ISODate("2015-01-02T00:00:00Z")
]
]
},
"server" : "ip-12-0-0-34:27017",
"filterSet" : false
}
]
},
"cursor" : "BtreeCursor transaction.product_1_transaction.firstTransaction_1",
"n" : 1169,
"nChunkSkips" : 0,
"nYields" : 252172,
"nscanned" : 704028,
"nscannedAllPlans" : 704028,
"nscannedObjects" : 701915,
"nscannedObjectsAllPlans" : 701915,
"millisShardTotal" : 1267591,
"millisShardAvg" : 633795,
"numQueries" : 2,
"numShards" : 2,
"millis" : 891910
}
Query:
db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).explain()
Output:
{
"clusteredType" : "ParallelSort",
"shards" : {
"test0/mrs00.test.com:27017,mrs01.test.com:27017" : [
{
"cursor" : "BtreeCursor transaction.product_1",
"isMultiKey" : true,
"n" : 553072,
"nscannedObjects" : 553072,
"nscanned" : 553072,
"nscannedObjectsAllPlans" : 553072,
"nscannedAllPlans" : 553072,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 164888,
"nChunkSkips" : 0,
"millis" : 337909,
"indexBounds" : {
"transaction.product" : [
[
"mobile",
"mobile"
]
]
},
"server" : "ip-12-0-0-31:27017",
"filterSet" : false
}
],
"test1/mrs10.test.com:27017,mrs11.test.com:27017" : [
{
"cursor" : "BtreeCursor transaction.product_1",
"isMultiKey" : true,
"n" : 554176,
"nscannedObjects" : 554176,
"nscanned" : 554176,
"nscannedObjectsAllPlans" : 554176,
"nscannedAllPlans" : 554176,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 107496,
"nChunkSkips" : 0,
"millis" : 327928,
"indexBounds" : {
"transaction.product" : [
[
"mobile",
"mobile"
]
]
},
"server" : "ip-12-0-0-34:27017",
"filterSet" : false
}
]
},
"cursor" : "BtreeCursor transaction.product_1",
"n" : 1107248,
"nChunkSkips" : 0,
"nYields" : 272384,
"nscanned" : 1107248,
"nscannedAllPlans" : 1107248,
"nscannedObjects" : 1107248,
"nscannedObjectsAllPlans" : 1107248,
"millisShardTotal" : 665837,
"millisShardAvg" : 332918,
"numQueries" : 2,
"numShards" : 2,
"millis" : 337952
}
Please let me know if I have missed any of the details.
Thanks.
1st: Your queries are overly complicated. Using $elemMatch way too often.
2nd: if you can include your shard key in the query it will drastically improve speed.
I'm going to optimize your queries for you:
db.user.find({
createdAt: {
$gte: ISODate("2014-12-01"),
$lte: ISODate("2014-12-31")
}
}).explain()
db.user.find({
'transaction.product':'mobile'
}).explain()
db.user.find({
'transaction.product':'mobile',
firstTransaction:{
$in:[
ISODate("2015-01-01"),
ISODate("2015-01-02")
]
}
}).explain()
Bottom line is this: include your shard key each time is a time saver.
It might even save time to loop through your shard keys and make the same query multiple times.
Reason for performance degradation was the large working set. For some queries (mainly range queries) the set exceeded physical limit & page faults occurred. Due to this performance got degraded.
One solution I did was to apply some filters for the query which will limit the result set & tried to perform equality check instead of the range (iterating over range).
Those tweaks worked for me. Hope it helps others too.

MongoDB object field and range query index

I have the following structure in the database:
{
"_id" : {
"user" : 14197,
"date" : ISODate("2014-10-24T00:00:00.000Z")
},
...
}
I have a performance problem when I try to select data by user & date-range. Monogo doesn't use index & runs full-scan over collection.
db.timeuse.daily.find({ "_id.user": 289006, "_id.date" : {$gt: ISODate("2014-10-23T00:00:00Z"), $lte: ISODate("2014-10-30T00:00:00Z")}}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 6,
"nscannedObjects" : 66967,
"nscanned" : 66967,
"nscannedObjectsAllPlans" : 66967,
"nscannedAllPlans" : 66967,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 523,
"nChunkSkips" : 0,
"millis" : 1392,
"server" : "mongo-shard0003:27018",
"filterSet" : false,
"stats" : {
"type" : "COLLSCAN",
"works" : 66969,
"yields" : 523,
"unyields" : 523,
"invalidates" : 16,
"advanced" : 6,
"needTime" : 66962,
"needFetch" : 0,
"isEOF" : 1,
"docsTested" : 66967,
"children" : [ ]
},
"millis" : 1392
}
So far I found only one way - use $in.
db.timeuse.daily.find({"_id": { $in: [
{"user": 289006, "date": ISODate("2014-10-23T00:00:00Z")},
{"user": 289006, "date": ISODate("2014-10-24T00:00:00Z")}
]}}).explain()
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
{
"user" : 289006,
"date" : ISODate("2014-10-23T00:00:00Z")
},
{
"user" : 289006,
"date" : ISODate("2014-10-23T00:00:00Z")
}
],
[
{
"user" : 289006,
"date" : ISODate("2014-10-24T00:00:00Z")
},
{
"user" : 289006,
"date" : ISODate("2014-10-24T00:00:00Z")
}
]
]
},
If there's a more elegant way to run this kind of query?
TL;DR: Don't put your data in the _id field and use a compound index: db.timeuse.daily.ensureIndex( { "user" : 1, "date": 1 } ).
Explanation:
You're abusing the _id key convention, or more precisely the fact that MongoDB can index entire objects. What you want to achieve requires index intersection or a compound index, that is, either two separate indexes that can be combined (that feature is called index intersection and by now, it should be available in MongoDB, but it has limitations) or a special index for the set of keys which in MongoDB is called a compound index.
The _id field is indexed by default, but it's indexed as a whole, i.e. the _id index with only support equality queries on the entire object, rather than parts of the object. That also explains why the $in query works.
In general, that data structure with the default index will behave oddly. Consider this:
> db.sort.insert({"_id" : {"name" : "foo", value : 1} });
> db.sort.insert({"_id" : {"name" : "foo", value : 1, bla : "foo"} });
> db.sort.find();
{ "_id" : { "name" : "foo", "value" : 4343 } }
{ "_id" : { "name" : "foo", "value" : 4343, "bla" : "fooffo" } }
> db.sort.find({"_id" : {"name" : "foo", value : 4343} });
{ "_id" : { "name" : "foo", "value" : 4343 } }
// no second result here...
Imagine MongoDB basically hashed the entire object and was simply looking for the object hash - such an index can't support range queries based on some part of the hash.

MongoDB: Compound geospatial & ascending index issues

I have a compund index consisting of a simple ascending index and a geospatial index:
{ v: 1, key: { PlayerSortMask: 1, RandomGeoIdentifier: "2dsphere" }, ns: "JellyDev.Players", name: "Sort Mask + Random Geo ID", min: 0, max: 1 }
Now I have the following 2 problems:
1.
When I try to use the prefix index (querying only on the 1st index), I get a basic cursor used, and not the index I've created:
Query used:
{ "PlayerSortMask" : 2 }
Explain returned:
{ "cursor" : "BasicCursor", "isMultiKey" : false, "n" : 1, "nscannedObjects" : 1, "nscanned" : 1, "nscannedObjectsAllPlans" : 1, "nscannedAllPlans" : 1, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 0, "indexBounds" : { }, "allPlans" : [{ "cursor" : "BasicCursor", "n" : 1, "nscannedObjects" : 1, "nscanned" : 1, "indexBounds" : { } }], "server" : "widmore:10010" }
2.
Not sure if this is a problem or not, but when I query using both fields, using $eq & $near, I get the following explain:
{ "cursor" : "S2NearCursor", "isMultiKey" : true, "n" : 1, "nscannedObjects" : 1, "nscanned" : 6, "nscannedObjectsAllPlans" : 1, "nscannedAllPlans" : 6, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 0, "indexBounds" : { }, "nscanned" : NumberLong(6), "matchTested" : NumberLong(1), "geoMatchTested" : NumberLong(1), "numShells" : NumberLong(3), "keyGeoSkip" : NumberLong(5), "returnSkip" : NumberLong(0), "btreeDups" : NumberLong(0), "inAnnulusTested" : NumberLong(1), "allPlans" : [{ "cursor" : "S2NearCursor", "n" : 1, "nscannedObjects" : 1, "nscanned" : 6, "indexBounds" : { } }], "server" : "widmore:10010" }
And this is the query used to fetch the result:
{ "PlayerSortMask" : 2, "RandomGeoIdentifier" : { "$near" : { "$geometry" : { "type" : "Point", "coordinates" : [0.88434365572610107, 0.90583264916475525] } } } }
Now it says it uses the S2NearCursor, but it's obviously not the index I've created - as it has the name Sort Mask + Random Geo ID.
Any help would be greatly appreciated.
For problem 1 there's a known issue in MongoDB with compound geo indexes.
https://jira.mongodb.org/browse/SERVER-9257
The problem is fixed in 2.5.4 which is a beta release.
You can workaround this for now by creating an additional simple index on PlayerSortMask.
For problem 2, S2NearCursor means an index is being used. I think the explain "loses" the name and this is a known issue, but I can't remember the bug number.

Mongodb indexing

I have a query
db.messages.find({'headers.Date':{'$gt': new Date(2001,3,1)}},{'headers.From':1, _id:0}).sort({'headers.From':1})
I have set headers.From as index. Now which part of query will use this index ? i.e find part of query or sort part of query?
Explain output is
{
"cursor" : "BtreeCursor headers.From_1",
"isMultiKey" : false,
"n" : 83057,
"nscannedObjects" : 120477,
"nscanned" : 120477,
"nscannedObjectsAllPlans" : 120581,
"nscannedAllPlans" : 120581,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 250,
"indexBounds" : {
"headers.From" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Andrews-iMac.local:27017"
}
Any help is appreciated !!!
The index is being used for the sort part, not for the query, as your query doesn't use the headers.From field and your sort does.

Improve querying fields exist in MongoDB

I'm in progress with estimation of MongoDB for our customers. Per requirements we need associate with some entity ent variable set of name-value pairs.
db.ent.insert({'a':5775, 'b':'b1'})
db.ent.insert({'c':'its a c', 'b':'b2'})
db.ent.insert({'a':7557, 'c':'its a c'})
After this I need intensively query ent for presence of fields:
db.ent.find({'a':{$exists:true}})
db.ent.find({'c':{$exists:false}})
Per MongoDB docs:
$exists is not very efficient even with an index, and esp. with {$exists:true} since it will effectively have to scan all indexed values.
Can experts there provide more efficient way (even with shift the paradigm) to deal fast with vary name-value pairs
You can redesign your schema like this:
{
pairs:[
{k: "a", v: 5775},
{k: "b", v: "b1"},
]
}
Then you indexing your key:
db.people.ensureIndex({"pairs.k" : 1})
After this you will able to search by exact match:
db.ent.find({'pairs.k':"a"})
In case you go with Sparse index and your current schema, proposed by #WesFreeman, you will need to create an index on each key you want to search. It can affect write performance or will be not acceptable if your keys are not static.
Simply redesign your schema such that it's an indexable query. Your use case is infact analogous to the first example application given in MongoDB The Definitive Guide.
If you want/need the convenience of result.a just store the keys somewhere indexable.
instead of the existing:
db.ent.insert({a:5775, b:'b1'})
do
db.ent.insert({a:5775, b:'b1', index: ['a', 'b']})
That's then an indexable query:
db.end.find({index: "a"}).explain()
{
"cursor" : "BtreeCursor index_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"index" : [
[
"a",
"a"
]
]
}
}
or if you're ever likely to query also by value:
db.ent.insert({
a:5775,
b:'b1',
index: [
{name: 'a', value: 5775},
{name: 'b', value: 'b1'}
]
})
That's also an indexable query:
db.end.find({"index.name": "a"}).explain()
{
"cursor" : "BtreeCursor index.name_",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"index.name" : [
[
"a",
"a"
]
]
}
}
I think a sparse index is the answer to this, although you'll need an index for each field. http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes
Sparse indexes should help with $exists:true queries.
Even still, if your field is not really sparse (meaning it's mostly set), it's not going to help you that much.
Update I guess I'm wrong. Looks like there's an open issue ( https://jira.mongodb.org/browse/SERVER-4187 ) still that $exists doesn't use sparse indexes. However, you can do something like this with find and sort, which looks like it properly uses the sparse index:
db.ent.find({}).sort({a:1});
Here's a full demonstration of the difference, using your example values:
> db.ent.insert({'a':5775, 'b':'b1'})
> db.ent.insert({'c':'its a c', 'b':'b2'})
> db.ent.insert({'a':7557, 'c':'its a c'})
> db.ent.ensureIndex({a:1},{sparse:true});
Note that find({}).sort({a:1}) uses the index (BtreeCursor):
> db.ent.find({}).sort({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
And find({a:{$exists:true}}) does a full scan:
> db.ent.find({a:{$exists:true}}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Looks like you can also use .hint({a:1}) to force it to use the index.
> db.ent.find().hint({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
How about setting the non-exists field to null? Then you can query them with {field: {$ne: null}}.
db.ent.insert({'a':5775, 'b':'b1', 'c': null})
db.ent.insert({'a': null, 'b':'b2', 'c':'its a c'})
db.ent.insert({'a':7557, 'b': null, 'c':'its a c'})
db.ent.ensureIndex({"a" : 1})
db.ent.ensureIndex({"b" : 1})
db.ent.ensureIndex({"c" : 1})
db.ent.find({'a':{$ne: null}}).explain()
Here's the output:
{
"cursor" : "BtreeCursor a_1 multi",
"isMultiKey" : false,
"n" : 4,
"nscannedObjects" : 4,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 5,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
null
],
[
null,
{
"$maxElement" : 1
}
]
]
},
"server" : "my-laptop"
}