MongoDB not using compound index on '_id' - mongodb

I have a collection in MongoDB which has the following documents.
/* 0 */
{
"T" : [
374135056604448742
],
"_id" : {
"#" : 7778532275691,
"ts" : ISODate("2013-07-26T02:25:00Z")
}
}
/* 1 */
{
"T" : [
1056188940167152853
],
"_id" : {
"#" : 34103385525388,
"ts" : ISODate("2013-07-30T03:00:00Z")
}
}
/* 2 */
{
"T" : [
1056188940167152853
],
"_id" : {
"#" : 34103385525388,
"ts" : ISODate("2013-07-30T03:18:00Z")
}
}
Now, I'm trying to query some documents with the following query.
db.entries.find({
'_id.ts': {'$gte': beginTS, '$lte': endTS},
'_id.#' : 884327843395156951
}).hint([('_id', 1)]).explain()
As I understand it, _id is a compound field and Mongo always maintains an index on _id, so Mongo should be able to use the index on '_id' to answer the above query. However, the explain output for the query is as follows:
{u'allPlans': [{u'cursor': u'BtreeCursor _id_',
u'indexBounds': {u'_id': [[{u'$minElement': 1}, {u'$maxElement': 1}]]},
u'n': 2803,
u'nscanned': 4869528,
u'nscannedObjects': 4869528}],
u'cursor': u'BtreeCursor _id_',
u'indexBounds': {u'_id': [[{u'$minElement': 1}, {u'$maxElement': 1}]]},
u'indexOnly': False,
u'isMultiKey': False,
u'millis': 128415,
u'n': 2803,
u'nChunkSkips': 0,
u'nYields': 132,
u'nscanned': 4869528,
u'nscannedAllPlans': 4869528,
u'nscannedObjects': 4869528,
u'nscannedObjectsAllPlans': 4869528,
u'scanAndOrder': False,
As can be observed, MongoDB is scanning the entire collection to find just a handful of documents. I don't know what the hell is wrong here.
I tried changing the order of the fields in the query, but got the same result. I have no idea what is happening here. Any help is deeply appreciated.
UPDATE
I understood the nuance here. The index on _id is not a compound index, it's a mere exact-match index. This means that if _id is a document, then irrespective of the structure of that document and how many nested attributes or sub-documents it may have, the _id index will only contain one entry for the _id field. That entry represents the _id document as a whole, treated as a single value rather than as separate fields, and it is kept unique.

You are using an object as a key, but you're not using a compound index here.
The _id index is a bit special, because it is created automatically and is always unique. Normally, the _id value is an ObjectId, a UUID or maybe an integer or a string that contains some kind of hash. MongoDB also supports complex objects as keys. However, to MongoDB, such a key is still just a document. It can be compared to other documents, and documents that have the same fields and values will be equal. But since you didn't specify the index keys (and you can't create that index manually), MongoDB has no idea that the key contains a field # and a field ts.
A compound index, on the other hand, refers to the fields of a document explicitly, e.g. {"product.quantity" : 1, "product.created" : -1}. This must be specified when the index is created.
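To make the difference concrete, here is a minimal plain-JavaScript sketch. It is a simplified model and not MongoDB internals: `idIndex`, `compoundIndex`, `findByExactId` and `findByNumAndTsRange` are hypothetical names, and the sample documents are the ones from the question without the T field. It models an _id-style index as one opaque key per document, and a compound-style index as entries sorted by the two fields.

```javascript
// Simplified model only; names and structures are assumptions for illustration.
const docs = [
  { _id: { "#": 7778532275691, ts: new Date("2013-07-26T02:25:00Z") } },
  { _id: { "#": 34103385525388, ts: new Date("2013-07-30T03:00:00Z") } },
  { _id: { "#": 34103385525388, ts: new Date("2013-07-30T03:18:00Z") } },
];

// _id-style index: the whole key document is a single opaque value.
// It can answer "give me exactly this key" and nothing about its inner fields.
const idIndex = new Map(docs.map((doc) => [JSON.stringify(doc._id), doc]));

function findByExactId(id) {
  return idIndex.get(JSON.stringify(id));
}

// Compound-style index: one entry per document, sorted by (#, ts).
const compoundIndex = docs
  .map((doc) => ({ num: doc._id["#"], ts: doc._id.ts.getTime(), doc }))
  .sort((a, b) => a.num - b.num || a.ts - b.ts);

function findByNumAndTsRange(num, from, to) {
  const lo = from.getTime();
  const hi = to.getTime();
  const out = [];
  // This walk starts at the first entry; a real B-tree would seek to the lower bound.
  for (const entry of compoundIndex) {
    // The sort order lets the walk stop as soon as it is past the range.
    if (entry.num > num || (entry.num === num && entry.ts > hi)) break;
    if (entry.num === num && entry.ts >= lo) out.push(entry.doc);
  }
  return out;
}

// Exact lookup works, but only with the same fields in the same order.
console.log(findByExactId({ "#": 7778532275691, ts: new Date("2013-07-26T02:25:00Z") }));
console.log(findByExactId({ ts: new Date("2013-07-26T02:25:00Z"), "#": 7778532275691 }));

// A range on ts for one # value needs the field-aware index.
console.log(
  findByNumAndTsRange(
    34103385525388,
    new Date("2013-07-30T00:00:00Z"),
    new Date("2013-07-30T03:10:00Z")
  )
);
```

In MongoDB terms, this suggests storing # and ts as regular fields and creating a compound index on them, assuming the schema can be changed.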
It seems you're trying to basically store a timestamp in your primary key. MongoDB's ObjectId already contains a timestamp, so you can do date-based range queries on ObjectIds directly.
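As a rough sketch of the ObjectId approach: an ObjectId is 12 bytes, and its first 4 bytes hold the creation time in seconds since the Unix epoch. The helpers below (`objectIdHexFromDate` and `dateFromObjectIdHex`, both hypothetical names) build the smallest ObjectId hex string for a given date and read the date back. This only helps if _id actually is an ObjectId, which is not the case in the question as posted.

```javascript
// Hypothetical helpers, shown in plain JavaScript so they run outside the shell.

// Smallest ObjectId hex string for a date: 8 hex digits of seconds + 16 zeros.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

// Recover the embedded creation time from an ObjectId hex string.
function dateFromObjectIdHex(hex) {
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000);
}

const beginId = objectIdHexFromDate(new Date("2013-07-26T00:00:00Z"));
const endId = objectIdHexFromDate(new Date("2013-07-31T00:00:00Z"));

// In the mongo shell the bounds would be wrapped in ObjectId(...), for example:
//   db.entries.find({ _id: { $gte: ObjectId(beginId), $lt: ObjectId(endId) } })
console.log(beginId, endId);
console.log(dateFromObjectIdHex("543d563bde1e58511c264340").toISOString());
```

Because the timestamp has one-second resolution and reflects when the id was generated, this is only suitable when the creation time is the time you want to query on.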

Related

mongodb find documents with fields with a data doesn't exist

I have mongodb document like
{
"_id" : ObjectId("543d563bde1e58511c264340"),
...some fields ...
"pref" : [
{
"user_id" : 1,
"value" : 0.56
}
]
}
How can I find all the documents where pref does not contain an entry with user_id :1 ?
It's a little unclear what you're looking for here. If you want to find all entries where user_id has any other value than '1', then you'd want:
db.collection.find({"pref.user_id": {'$ne': 1}})
If you're looking for documents where the 'user_id' field doesn't exist at all:
db.collection.find({"pref.user_id": {'$exists': 0}})
Keep in mind how both of these queries behave on a nested array, though. The condition is tested against every object in the 'pref' array: the $ne query returns only documents in which no object has user_id 1, and the $exists query returns only documents in which no object has a user_id field at all.
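The result the question asks for can be modelled in plain JavaScript. This is only a sketch of the intended semantics, not MongoDB code: the sample documents and the `withoutUser` helper are made up for illustration, and the `filter` object is one way to phrase the condition as a query that you would want to verify against your server version.

```javascript
// Made-up sample documents covering the interesting cases.
const docs = [
  { _id: 1, pref: [{ user_id: 1, value: 0.56 }] },
  { _id: 2, pref: [{ user_id: 2, value: 0.1 }, { user_id: 1, value: 0.9 }] },
  { _id: 3, pref: [{ user_id: 2, value: 0.3 }] },
  { _id: 4, pref: [] },
  { _id: 5 },
];

// Keep a document only when NO element of pref has the given user_id.
// A missing or empty pref array trivially satisfies that.
function withoutUser(documents, userId) {
  return documents.filter(
    (doc) => !(doc.pref || []).some((entry) => entry.user_id === userId)
  );
}

// The same condition written as a MongoDB filter (an assumption to verify):
// negate an $elemMatch on the array.
const filter = { pref: { $not: { $elemMatch: { user_id: 1 } } } };

console.log(withoutUser(docs, 1).map((doc) => doc._id));
console.log(JSON.stringify(filter));
```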

Does query order affect compound index usage

MongoDB compound indexes support queries on any prefix of the index fields, but must the field order in the query match the order in the compound index itself?
Assuming we have the following index:
{ "item": 1, "location": 1, "stock": 1 }
Does it cover this query:
{"location" : "Antarctica", "item" : "Hamster Wheel"}
Yes. The order/sequence of the fields in the index creation matters.
In your example above, all queries that filter on "item" may use the index, but queries that do not filter on "item" and only use "location" and/or "stock" as filter conditions will NOT use this index.
The sequence of the fields in the filter in the "read" query does NOT matter. MongoDB is smart enough to know that
{"location" : "Antarctica", "item" : "Hamster Wheel"}
is the same as
{"item" : "Hamster Wheel", "location" : "Antarctica"}
As others have pointed out, the best way to ensure that your query is using the index, is to run an explain on your query http://bit.ly/1oE6zo1
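The prefix rule can be sketched in a few lines of plain JavaScript. This is a deliberately simplified model and not the real query planner: `usableIndexPrefix` is a hypothetical helper that only looks at which fields a query mentions, and the stock value 5 is made up.

```javascript
// Returns the leading index fields, in index order, that the query filters on.
function usableIndexPrefix(indexSpec, query) {
  const prefix = [];
  for (const field of Object.keys(indexSpec)) {
    if (!(field in query)) break; // the first gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

const index = { item: 1, location: 1, stock: 1 };

// The order of the fields in the query makes no difference.
console.log(usableIndexPrefix(index, { location: "Antarctica", item: "Hamster Wheel" }));
console.log(usableIndexPrefix(index, { item: "Hamster Wheel", location: "Antarctica" }));

// Without "item" there is no usable prefix at all.
console.log(usableIndexPrefix(index, { location: "Antarctica", stock: 5 }));
```

The model ignores sorts, range conditions and everything else the planner considers, so explain() remains the way to confirm what a real query does.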

Index for sorting while using $in to query field containing an array

I'm querying an array using $in operator and I'm also trying to sort results, but I keep getting this error:
too much data for sort() with no index. add an index or specify a
smaller limit
I know that a sort is limited to 32 megabytes unless it can use an index. The problem is that I do have a compound index on the field that I'm querying and the field that I'm sorting on.
The collection:
{
"a" : [ 1, 5, 7, 10 ],
... // other fields are not relevant for querying
}
The query looks like this:
db.mycol.find({ a: { $in : [ 1, 10, 19, 100, 2000 ] }}).sort({b : 1});
The $in query contains approx. 2000 IDs to match.
The index is
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"ns" : "db.mycol",
"name" : "a_1_b_1",
"background" : true
},
If I use explain() when doing the query without sort() I can see that MongoDB is using that index to perform the query, but it obviously cannot use that same index to perform the sort. I also tried to use a skip and limit, but if I use a skip that's too big I get the same error, probably because index is not used for sorting.
If I create an index only on field b, MongoDB will happily sort the data for me. But what I really want is to perform a search on an indexed array field and sort the data.
I looked at the documentation but I couldn't find anything helpful. Did I encounter a bug in MongoDB, or am I doing something wrong?

Handling optional/empty data in MongoDB

I remember reading somewhere that the mongo engine was more comfortable when the entire structure of a document was already in place in case of an update, so here is the question.
When dealing with "empty" data, for example when inserting an empty string, should I default it to null, "" or not insert it at all ?
{
_id: ObjectId("5192b6072fda974610000005"),
description: ""
}
or
{
_id: ObjectId("5192b6072fda974610000005"),
description: null
}
or
{
_id: ObjectId("5192b6072fda974610000005")
}
You have to remember that the description field may or may not be filled in every document (based on user input).
Introduction
If a document doesn't have a field, queries treat that field's value as null. Suppose a database with the following documents:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
If you create a query to find documents with the field desc different than null, you will get just one document:
db.test.find({desc: {$ne: null}})
// Output:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
The database doesn't distinguish between documents without a desc field and documents with a desc field whose value is null. One more test:
db.test.find({desc: null})
// Output:
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
But the differences are only ignored in the queries, because, as shown in the last example above, the fields are still saved on disk and you'll receive documents with the same structure of the documents that were sent to the MongoDB.
Question
When dealing with "empty" data, for example when inserting an empty string, should I default it to null, "" or not insert it at all ?
There isn't much difference from {desc: null} to {}, because most of the operators will have the same result. You should only pay special attention to these two operators:
$exists
$type
I'd save documents without the desc field, because the operators will continue to work as expected and I'd save some space.
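The matching rules can be modelled in plain JavaScript. This is a sketch of the semantics and not MongoDB code: the _id values are shortened to their last three characters, and the predicate names are made up. Each predicate's comment names the query it stands in for; BSON type 10 is Null.

```javascript
// The three documents from the example, with abbreviated ids.
const docs = [
  { _id: "d96", a: 1, desc: "" },
  { _id: "d97", a: 1, desc: null },
  { _id: "d98", a: 1 },
];

// Models {desc: null}: an explicit null and a missing field both match.
const matchesNull = (doc) => doc.desc === null || doc.desc === undefined;

// Models {desc: {$exists: false}}: matches only when the field is absent.
const fieldMissing = (doc) => !("desc" in doc);

// Models {desc: {$type: 10}}: matches only an explicit null.
const explicitNull = (doc) => "desc" in doc && doc.desc === null;

const ids = (predicate) => docs.filter(predicate).map((doc) => doc._id);

console.log(ids(matchesNull));
console.log(ids(fieldMissing));
console.log(ids(explicitNull));
```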
Padding factor
If you know the documents in your database grow frequently, then MongoDB might need to move the documents during an update, because there isn't enough space at the document's previous location. To avoid moving documents around, MongoDB allocates extra space for each document.
The amount of extra space allocated by MongoDB per document is controlled by the padding factor. You cannot (and don't need to) choose the padding factor, because MongoDB will adaptively learn it, but you can help MongoDB preallocate internal space for each document by filling the possible future fields with null values. The difference is very small (depending on your application) and might be even smaller after MongoDB learns the best padding factor.
Sparse indexes
This section isn't too important to your specific problem right now, but may help you when you face similar problems.
If you create a unique index on the field desc, you can't save more than one document with the same value, and in the previous database more than one document has the same indexed value for desc (a missing field is indexed as null). Let's try to create a unique index on the previously presented database and see what error we get:
db.test.ensureIndex({desc: 1}, {unique: true})
// Output:
{
"err" : "E11000 duplicate key error index: test.test.$desc_1 dup key: { : null }",
"code" : 11000,
"n" : 0,
"connectionId" : 3,
"ok" : 1
}
If we want to be able to create a unique index on some field and let some documents have this field empty, we should create a sparse index. Let's try to create the unique index again:
// No errors this time:
db.test.ensureIndex({desc: 1}, {unique: true, sparse: true})
So far, so good, but why am I explaining all this? Because sparse indexes have an obscure behaviour. In the following query, we expect to get ALL documents sorted by desc.
db.test.find().sort({desc: 1})
// Output:
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
The result seems weird. What happened to the missing document? Let's try the query without sorting it:
db.test.find()
// Output:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
All documents were returned this time. What's happening? It's simple, but not so obvious. When we sort the result by desc, we use the sparse index created previously, and it has no entries for the documents that don't have the desc field. The following query shows that the index is used to sort the result:
db.test.find().sort({desc: 1}).explain().cursor
// Output:
"BtreeCursor desc_1"
We can skip the index using a hint:
db.test.find().sort({desc: 1}).hint({$natural: 1})
// Output:
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
Summary
Sparse unique indexes don't work if you include {desc: null}
Sparse unique indexes don't work if you include {desc: ""}
Sparse indexes might change the result of a query
There is little difference between a field with a null value and a document without the field. The main difference is that the former consumes a little disk space, while the latter consumes none. They can be distinguished by using the $exists operator.
A field with an empty string is quite different from both. Though it depends on the purpose, I don't recommend using it as a replacement for null. To be precise, they should be used to mean different things. For instance, think about voting. A person who cast a blank ballot is different from a person who wasn't permitted to vote. The former vote is an empty string, while the latter vote is null.
There is already a similar question here.

matching fields internally in mongodb

I have the following documents in MongoDB:
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"students" : [
{
"id" : 23,
"class" : "a"
},
{
"id" : 55,
"class" : "b"
}
]
}
{
"_id" : ObjectId("517b9d05254e385a07fc4e71"),
"studentId" : 55,
"students" : [
{
"id" : 33,
"class" : "c"
}
]
}
Note: this is not actual data, but the schema is exactly the same.
Requirement: find the documents whose studentId matches a students.id (an id inside the students array) using a single query.
I have tried the code like below
db.data.aggregate({$match:{"students.id":"$studentId"}},{$group:{_id:"$student"}});
Result: an empty array. If I replace {"students.id":"$studentId"} with {"students.id":33}, it returns the second document shown in the JSON above.
Is it possible to get the documents for this scenario using single query?
If possible, I'd suggest that you set the condition while storing the data so that you can do a quick truth check (isInStudentsList). It would be super fast to do that type of query.
Otherwise, there is a relatively complex way of using the Aggregation framework pipeline to do what you want in a single query:
db.students.aggregate(
{$project:
{studentId: 1, studentIdComp: "$students.id"}},
{$unwind: "$studentIdComp"},
{$project : { studentId : 1,
isStudentEqual: { $eq : [ "$studentId", "$studentIdComp" ] }}},
{$match: {isStudentEqual: true}})
Given your input example the output would be:
{
"result" : [
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"isStudentEqual" : true
}
],
"ok" : 1
}
A brief explanation of the steps:
Build a projection of the document with just studentId and a new field with an array containing just the ids (so for the first document it would contain [23, 55]).
Using that structure, $unwind. That creates a new temporary document for each array element in the studentIdComp array.
Now, take those documents, and create a new document projection, which continues to have the studentId and adds a new field called isStudentEqual that compares the equality of two fields, the studentId and studentIdComp. Remember that at this point there is a single temporary document that contains those two fields.
Finally, check that the comparison value isStudentEqual is true and return those documents (which will contain the original document _id and the studentId).
If the student was in the list multiple times, you might need to group the results on studentId or _id to prevent duplicates (but I don't know that you'd need that).
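The four steps can be traced in plain JavaScript on the two sample documents. This is only a walk-through of what each stage does to the data, not MongoDB code, and the ObjectIds are written as plain strings.

```javascript
// The two sample documents from the question.
const docs = [
  {
    _id: "517b88decd483543a8bdd95b",
    studentId: 23,
    students: [{ id: 23, class: "a" }, { id: 55, class: "b" }],
  },
  {
    _id: "517b9d05254e385a07fc4e71",
    studentId: 55,
    students: [{ id: 33, class: "c" }],
  },
];

const result = docs
  // Step 1 ($project): keep studentId and collect the ids from the students array.
  .map((d) => ({
    _id: d._id,
    studentId: d.studentId,
    studentIdComp: d.students.map((s) => s.id),
  }))
  // Step 2 ($unwind): one temporary document per element of studentIdComp.
  .flatMap((d) => d.studentIdComp.map((id) => ({ ...d, studentIdComp: id })))
  // Step 3 ($project with $eq): compare the two fields.
  .map((d) => ({
    _id: d._id,
    studentId: d.studentId,
    isStudentEqual: d.studentId === d.studentIdComp,
  }))
  // Step 4 ($match): keep only the documents where the comparison is true.
  .filter((d) => d.isStudentEqual);

console.log(result);
```

On newer server versions it may be possible to express the same comparison more directly with the $expr query operator; treat that as something to check against the documentation for your version.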
Unfortunately it's impossible ;(
To solve this problem it is necessary to use a $where statement
(example: Finding embeded document in mongodb?),
but $where cannot be used with the aggregation framework.
db.data.find({students: {$elemMatch: {id: 23}} , studentId: 23});