Mongodb Indexing suggestions - mongodb

All,
I have a mangodb collection with below fields.
_ID
Title
Description
Tags , array
I have created 2 index on _id and tags field. I have created index for people to search the content with help of keywords.
I have created the index with tags:-1 to show the latest inserted records to show first. But even after that it is showing in the ascending order of _id.
How to create the index on tags field to show the last inserted to show first at the same time it should allow me to search on tags field faster .

If the _id field is the default ObjectId which reflects the insertion order, and you want to query all the documents with a specific Tags by descending insertion order, you can use the query as below:
find({ Tags : $value }).sort({ _id : -1 })
For this query, you can create a compound index on { Tags : 1, _id : -1 }. All the documents with the same Tags will be sorted in descending insertion order and this index should work well for this query.
Please note that if you are doing range query on Tags, like:
find({ Tags : { $in : [ $value1, $value2 ] }}).sort({ _id : -1 })
find({ Tags : { $gt : $value}}).sort({ _id : -1})
It wouldn’t be able to use the index to sort the result documents, and will need to sort the results in memory. You can run the query with .explain(true) to check the query plan. If scanAndOrder is true, it means the query cannot use the order of documents in the index for returning sorted results.
There are also some documents and blogs relate to indexes and that I'd recommend reading:
http://docs.mongodb.org/manual/tutorial/sort-results-with-indexes/
http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
http://blog.mongolab.com/2012/06/cardinal-ins/

Related

Created indexes on a mongodb collection, still fails while sorting a large data set

My Query below:
db.chats.find({ bid: 'someID' }).sort({start_time: 1}).limit(10).skip(82560).pretty()
I have indexes on chats collection on the fields in this order
{
"cid" : 1,
"bid" : 1,
"start_time" : 1
}
I am trying to perform sort, but when I write a query and check the result of explain(), I still get the winningPlan as
{
"stage":"SKIP",
"skipAmount":82560,
"inputStage":{
"stage":"SORT",
"sortPattern":{
"start_time":1
},
"limitAmount":82570,
"inputStage":{
"stage":"SORT_KEY_GENERATOR",
"inputStage":{
"stage":"COLLSCAN",
"filter":{
"ID":{
"$eq":"someID"
}
},
"direction":"forward"
}
}
}
}
I was expecting not to have a sort stage in the winning plan as I have indexes created for that collection.
Having no indexes will result into the following error
MongoError: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM [duplicate]
However I managed to make the sort work by increasing the size allocation on ram from 32mb to 64mb, looking for help in adding indexes properly
The order of fields in an index matters. To sort query results by a field that is not at the start of the index key pattern the query must include equality conditions on all the prefix keys that precede the sort keys. The cid field is not in the query nor used for sorting, so you must leave it out. Then you put the bid field first in the index definition as you use it in the equality condition. The start_time goes after that to be used in sorting. Finally, the index must look like this:
{"bid" : 1, "start_time" : 1}
See the documentation for further reference.

Return array index from $elemMatch query

Say I have a document like this
{
title : 'myTitle',
favorites : [{name : 'text', number : 6}, {name : 'other', number : 4}]
}
I would like to return the array index (if any) from where the embedded document was retrieved within the favorites array.
Say I have the following query
{title : 'myTitle'}
and the projection
{favorites : {$elemMatch : {name : 'text', number : 6} }}
if the projection returns the document AND it contains a favorites array with the sub document, is there a way to know at which index that sub document was found? Which in this case is index 0.
The reason I would like the index is because as soon as I retrieve the document, I proceed to update it, and it would boost the performance if I had a specific index to update instead of using $elemMatch again which would cause mongo to iterate through all the array entries until finding the document that matches.
Unfortunately, it is not possible using the $elemMatch operator. If you are using mongodb 3.2, you could make use of the $unwind operator and aggregate instead of doing a find()
db.collection.aggregate([
{$match:{"favorites.name":"text","favorites.number":6}},
{$unwind:{"path":"$favorites","includeArrayIndex":"index"}},
{$match:{"favorites.name":"text","favorites.number":6}}
])
The document would be returned, with its array index in the field - index. If you want the entire document, with other array elements, you would have to $group, after $unwind instead of $match.
For previous versions, iterating the array in the client side, and getting the sub document's index would be the way to go.

Can I do a second 'query' on a MongoDB cursor?

Imagine a collection with about 5,000,000 documents. I need to do a basicCursor query to select ~100 documents based on too many fields to index. Let's call this the basicCursorMatch. This will be immensely slow.
I can however to a bTreeCursor query on a few indexes that will limit my search to ~500 documents. Let's call this query the bTreeCursorMatch.
Is there a way I can do this basicCursorMatch directly on the cursor or collection resulting from the bTreeCursorMatch?
Intuitively I tried
var cursor = collection.find(bTreeCursorMatch);
var results = cursor.find(basicCursorMatch);
similar to collection.find(bTreeCursorMatch).find(basicCursorMatch), which doesn't seem to work.
Alternatively, I was hoping I could do something like this:
collection.aggregate([
{$match: bTreeCursorMatch}, // Uses index 5,000,000 -> 500 fast
{$match: basicCursorMatch}, // No index, 500 -> 100 'slow'
{$sort}
]);
.. but it seems that I cannot do this either. Is there an alternative to do what I want?
The reason I am asking is because this second query will differ a lot and there is no way I can index all the fields. But I do want to make that first query using a bTreeCursor, otherwise querying the whole collection will take forever using a basicCursor.
update
Also, through user input the subselection of 500 documents will be queried in different ways during a session with an unpredictable basicCursor query, using multiple $in $eq $gt $lt. But during this, the bTreeCursor subselection remains the same. Should I just keep doing both queries for every user query, or is there a more efficient way to keep a reference to this collection?
In practice, you rarely need to run second queries on a cursor. You specially don't need to break MongoDB's work into separate indexable / non-indexable chunks.
If you pass a query to MongoDB's find method that can be partially fulfilled by a look-up in an index, MongoDB will do that look-up first, and then do a full scan on the remaining documents.
For instance, I have a collection users with documents like:
{ _id : 4, gender : "M", ... }
There is an index on _id, but not on gender. There are ~200M documents in users.
To get an idea of what MongoDB is doing under the hood, add explain() to your cursor (in the Mongo shell):
> db.users.find( { _id : { $gte : 1, $lt : 10 } } ).explain()
{
"cursor" : "BtreeCursor oldId_1_state_1",
"n" : 9,
"nscannedObjects" : 9
}
I have cut out some of the fields returned by explain. Basically, cursor tells you if it's using an index, n tells you the number of documents returned by the query and nscannedObjects is the number of objects scanned during the query. In this case, mongodb was able to scan exactly the right number of objects.
What happens if we now query on gender as well?
> db.users.find( { _id : { $gte : 1, $lt : 10 }, gender : "F" } ).explain()
{
"cursor" : "BtreeCursor oldId_1_state_1",
"n" : 5,
"nscannedObjects" : 9
}
find returns 5 objects, but had to scan 9 documents. It was therefore able to isolate the correct 9 documents using the _id field. It then went through all 9 documents and filtered them by gender.

mongodb , wildcard in $in

I have a mongodb query where i want to get documents if a field has particular value.
db.collection.find({key:{$in:['value1','value2']}}) if i run above command i get documents containing either 'value1' or 'value2'. but lets just say there are no values. and i search db.collection.find({key:{$in:[]}}), nothing is displayed. and db.collection.find({key:{$in:[*]}}) gives unexpected token* which wild card do i use in $in to show all results.?
I think this is logically consistent behavior for $in. The query
db.collection.find({ "key" : { "$in" : [] } })
could be translated as "find all the documents where the value of key is one of the values contained in the array []". Since there are no values in the array [], there are no matching documents. If you want to find all of the extant values for key, use .distinct to return them as an array:
db.collection.distinct("key")
.distinct will use an index if possible.
If you want a query to match all documents, omit the query selector from .find:
db.collection.find()
as suggested in the comments.

MongoDB select subdocument with aggregation function

I have a mongo DB collection that looks something like this:
{
{
_id: objectId('aabbccddeeff'),
objectName: 'MyFirstObject',
objectLength: 0xDEADBEEF,
objectSource: 'Source1',
accessCounter: {
'firstLocationCode' : 283,
'secondLocationCode' : 543,
'ThirdLocationCode' : 564,
'FourthLocationCode' : 12,
}
}
...
}
Now, assuming that this is not the only record in the collection and that most/all of the documents contain the accessCounter subdocument/field how will I go with selecting the x first documents where I have the most access from a specific location.
A sample "query" will be something like:
"Select the first 10 documents From myCollection where the accessCounter.firstLocationCode are the highest"
So a sample result will be X documents where the accessCounter. will be the greatest is the database.
Thank your for taking the time to read my question.
No need for an aggregation, that is a basic query:
db.collection.find().sort({"accessCounter.firstLocation":-1}).limit(10)
In order to speed this up, you should create a subdocument index on accessCounter first:
db.collection.ensureIndex({'accessCounter':-1})
assuming the you want to do the same query for all locations. In case you only want to query firstLocation, create the index on accessCounter.firstLocation.
You can speed this up further in case you only need the accessCounter value by making this a so called covered query, a query of which the values to return come from the index itself. For example, when you have the subdocument indexed and you query for the top secondLocations, you should be able to do a covered query with:
db.collection.find({},{_id:0,"accessCounter.secondLocation":1})
.sort("accessCounter.secondLocation":-1).limit(10)
which translates to "Get all documents ('{}'), don't return the _id field as you do by default ('_id:0'), get only the 'accessCounter.secondLocation' field ('accessCounter.secondLocation:1'). Sort the returned values in descending order and give me the first ten."