How to fetch between a range of indexes in mongodb? - mongodb

I need help.. Is there any method available to fetch documents between a range of indexes while using find in mongo.. Like [2:10] (from 2 to 10) ?

If you are talking about the "index" position within an array in your document then you want the $slice operator. The first argument being the index to start with and the second is how many to return. So from a 0 index position 2 is the "third" index:
db.collection.find({},{ "list": { "$slice": [ 2, 8 ] })
Within a collection itself if you use the .limit() an .skip() modifiers to move through the range in the collection:
db.collection.find({}).skip(2).limit(8)
Keep in mind that in the collection context MongoDB has no concept of "ordered" records and is dependent on the query and/or sort order that is given

Related

How to optimize mongo query with two parallel array?

I have a query like this:
xml_db.find(
{
'high_performer': {
'$nin': [some_value]
},
'low_performer': {
'$nin': [some_value]
},
'expiration_date': {
'$gte': datetime.now().strftime('%Y-%m-%d')
},
'source': 'some_value'
}
)
I have tried to create an index with those fields but getting error:
pymongo.errors.OperationFailure: cannot index parallel arrays [low_performer] [high_performer]
So, how to efficiently run this query?
Compound indexing ordering should follow the equality --> sort --> range rule. A good description of this can be found in this response.
This means that the first field in the index would be source, followed by the range filters (expiration_date, low_performer and high_performer).
As you noticed, one of the "performer" fields cannot be included in the index since only a single array can be indexed. You should use your knowledge of the data set to determine which filter (low_performer or high_performer) would be more selective and choose that filter to be included in the index.
Assuming that high_performer is more selective, the only remaining step would be to determine the ordering between expiration_date and high_performer. Again, you should use your knowledge of the data set to make this determination based on selectivity.
Assuming expiration_date is more selective, the index to create would then be:
{ "source" : 1, "expiration_date" : 1, "high_performer" : 1 }

Mongo range query slow on objects inside array

I have a collection
orders
{
"_id": "abcd",
"last_modified": ISODate("2016-01-01T00:00:00Z"),
"suborders": [
{
"suborder_id": "1",
"last_modified: ISODate("2016-01-02T00: 00: 00Z")
}, {
"suborder_id":"2",
"last_modified: ISODate("2016-01-03T00:00:00Z")
}
]
}
I have two indexes on this collection:
{"last_modified":1}
{"suborders.last_modified": 1}
when I use range queries on last_modified, indexes are properly used, and results are returned instantly. eg query: db.orders.find({"last_modified":{$gt:ISODate("2016-09-15"), $lt:ISODate("2016-09-16")}});
However, when I am querying on suborders.last_modified, the query takes too long to execute. eq query:db.orders.find({"suborders.last_modified":{$gt:ISODate("2016-09-15"), $lt:ISODate("2016-09-16")}});
Please help debug this.
The short answer is to use min and max to set the index bounds correctly. For how to approach debugging, read on.
A good place to start for query performance issues is to attach .explain() at the end of your queries. I made a script to generate documents like yours and execute the queries you provided.
I used mongo 3.2.9 and both queries do use the created indices with this setup. However, the second query was returning many more documents (approximately 6% of all the documents in the collection). I suspect that is not your intention.
To see what is happening lets consider a small example in the mongo shell:
> db.arrayFun.insert({
orders: [
{ last_modified: ISODate("2015-01-01T00:00:00Z") },
{ last_modified: ISODate("2016-01-01T00:00:00Z") }
]
})
WriteResult({ "nInserted" : 1 })
then query between May and July of 2015:
> db.arrayFun.find({"orders.last_modified": {
$gt: ISODate("2015-05-01T00:00:00Z"),
$lt: ISODate("2015-07-01T00:00:00Z")
}}, {_id: 0})
{ "orders" : [ { "last_modified" : ISODate("2015-01-01T00:00:00Z") }, { "last_modified" : ISODate("2016-01-01T00:00:00Z") } ] }
Although neither object in the array has last_modified between May and July, it found the document. This is because it is looking for one object in the array with last_modified greater than May and one object with last_modified less than July. These queries cannot intersect multikey index bounds, which happens in your case. You can see this in the indexBounds field of explain("allPlansExecution") output, in particular one of the lower bound or upper bound Date will not be what you specified. This means that a large number of documents may need to be scanned to complete the query depending on your data.
To find objects in the array that have last_modified between two bounds, I tried using $elemMatch.
db.orders.find({"suborders": {
$elemMatch:{
last_modified:{
"$gt":ISODate("2016-09-15T00:00:00Z"),
"$lt":ISODate("2016-09-16T00:00:00Z")
}
}
}})
In my test this returned about 0.5% of all documents. However, it was still running slow. The explain output showed it was still not setting the index bounds correctly (only using one bound).
What ended up working best was to manually set the index bounds with min and max.
db.subDocs.find()
.min({"suborders.last_modified":ISODate("2016-09-15T00:00:00Z")})
.max({"suborders.last_modified":ISODate("2016-09-16T00:00:00Z")})
Which returned the same documents as $elemMatch but used both bounds on the index. It ran in 0.021s versus 2-4s for elemMatch and the original find.

Optimising queries in mongodb

I am working on optimising my queries in mongodb.
In normal sql query there is an order in which where clauses are applied. For e.g. select * from employees where department="dept1" and floor=2 and sex="male", here first department="dept1" is applied, then floor=2 is applied and lastly sex="male".
I was wondering does it happen in a similar way in mongodb.
E.g.
DbObject search = new BasicDbObject("department", "dept1").put("floor",2).put("sex", "male");
here which match clause will be applied first or infact does mongo work in this manner at all.
This question basically arises from my background with SQL databases.
Please help.
If there are no indexes we have to scan the full collection (collection scan) in order to find the required documents. In your case if you want to apply with order [department, floor and sex] you should create this compound index:
db.employees.createIndex( { "department": 1, "floor": 1, "sex" : 1 } )
As documentation: https://docs.mongodb.org/manual/core/index-compound/
db.products.createIndex( { "item": 1, "stock": 1 } )
The order of the fields in a compound index is very important. In the
previous example, the index will contain references to documents
sorted first by the values of the item field and, within each value of
the item field, sorted by values of the stock field.

How to efficiently page batches of results with MongoDB

I am using the below query on my MongoDB collection which is taking more than an hour to complete.
db.collection.find({language:"hi"}).sort({_id:-1}).skip(5000).limit(1)
I am trying to to get the results in a batch of 5000 to process in either ascending or descending order for documents with "hi" as a value in language field. So i am using this query in which i am skipping the processed documents every time by incrementing the "skip" value.
The document count in this collection is just above 20 million.
An index on the field "language" is already created.
MongoDB Version i am using is 2.6.7
Is there a more appropriate index for this query which can get the result faster?
When you want to sort descending, you should create a multi-field index which uses the field(s) you sort on as descending field(s). You do that by setting those field(s) to -1.
This index should greatly increase the performance of your sort:
db.collection.ensureIndex({ language: 1, _id: -1 });
When you also want to speed up the other case - retrieving sorted in ascending order - create a second index like this:
db.collection.ensureIndex({ language: 1, _id: 1 });
Keep in mind that when you do not sort your results, you receive them in natural order. Natural order is often insertion order, but there is no guarantee for that. There are various events which can cause the natural order to get messed up, so when you care about the order you should always sort explicitly. The only exception to this rule are capped collections which always maintain insertion order.
In order to efficiently "page" through results in the way that you want, it is better to use a "range query" and keep the last value you processed.
You desired "sort key" here is _id, so that makes things simple:
First you want your index in the correct order which is done with .createIndex() which is not the deprecated method:
db.collection.createIndex({ "language": 1, "_id": -1 })
Then you want to do some simple processing, from the start:
var lastId = null;
var cursor = db.collection.find({language:"hi"});
cursor.sort({_id:-1}).limit(5000).forEach(funtion(doc) {
// do something with your document. But always set the next line
lastId = doc._id;
})
That's the first batch. Now when you move on to the next one:
var cursor = db.collection.find({ "language":"hi", "_id": { "$lt": lastId });
cursor.sort({_id:-1}).limit(5000).forEach(funtion(doc) {
// do something with your document. But always set the next line
lastId = doc._id;
})
So that the lastId value is always considered when making the selection. You store this between each batch, and continue on from the last one.
That is much more efficient than processing with .skip(), which regardless of the index will "still" need to "skip" through all data in the collection up to the skip point.
Using the $lt operator here "filters" all the results you already processed, so you can move along much more quickly.

How do I do a "NOT IN" query in Mongo?

This is my document:
{
title:"Happy thanksgiving",
body: "come over for dinner",
blocked:[
{user:333, name:'john'},
{user:994, name:'jessica'},
{user:11, name: 'matt'},
]
}
What is the query to find all documents that do not have user 11 in "blocked"?
You can use $in or $nin for "not in"
Example ...
> db.people.find({ crowd : { $nin: ["cool"] }});
I put a bunch more examples here: http://learnmongo.com/posts/being-part-of-the-in-crowd/
Since you are comparing against a single value, your example actually doesn't need a NOT IN operation. This is because Mongo will apply its search criteria to every element of an array subdocument. You can use the NOT EQUALS operator, $ne, to get what you want as it takes the value that cannot turn up in the search:
db.myCollection.find({'blocked.user': {$ne: 11}});
However if you have many things that it cannot equal, that is when you would use the NOT IN operator, which is $nin. It takes an array of values that cannot turn up in the search:
db.myCollection.find({'blocked.user': {$nin: [11, 12, 13]}});
Try the following:
db.stack.find({"blocked.user":{$nin:[11]}})
This worked for me.
See http://docs.mongodb.org/manual/reference/operator/query/nin/#op._S_nin
db.inventory.find( { qty: { $nin: [ 5, 15 ] } } )
This query will
select all documents in the inventory collection where the qty field
value does not equal 5 nor 15. The selected documents will include
those documents that do not contain the qty field.
If the field holds an array, then the $nin operator selects the
documents whose field holds an array with no element equal to a value
in the specified array (e.g. , , etc.).