MongoDB: To Find Objects with Three Integer Values, Sort Range of Three Values or Perform Three Queries?

My basic question is this: Which is more efficient?
mongo_db[collection].find(year: 2000)
mongo_db[collection].find(year: 2001)
mongo_db[collection].find(year: 2002)
or
mongo_db[collection].find(year: { '$gte' => 2000, '$lte' => 2002 }).sort(year: 1)
More detail: I have a MongoDB query in which I'll be selecting objects with 'year' attribute values of either 2000, 2001, or 2002, but no others. Is this best done as a find() with a sort(), or three separate find()s for each value? If it depends on the size of my collection, at what size does the more efficient search pattern change?

The single query is going to be faster because Mongo only has to scan the collection (or its index) once instead of three times. But you don't need a sort clause for your range query unless you actually want the results sorted for a separate reason.
You could also use $in for this:
mongo_db[collection].find(year: { '$in' => [2000, 2001, 2002] })
Its performance should be very similar to your range query.
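If you want to verify this on your own data, here is a rough shell sketch (the collection name my_coll, the index, and the explain("executionStats") verbosity are assumptions on my part):
// Assumed index so that neither query has to scan the whole collection.
db.my_coll.createIndex({ year: 1 })
// Range form
db.my_coll.find({ year: { $gte: 2000, $lte: 2002 } }).explain("executionStats")
// $in form
db.my_coll.find({ year: { $in: [2000, 2001, 2002] } }).explain("executionStats")
// Compare totalKeysExamined, totalDocsExamined and executionTimeMillis in the two outputs.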

Related

MongoDB Scope of Indexes when Searching and Filtering

We have a huge collection where we query for specific documents each minute.
The query looks like:
db.mycoll.find({
  date: { $lt: now },
  public: true,
  "mail.delivered": false
}).sort({
  "remote.continent": 1,
  "remote.country": 1,
  "remote.city": 1
})
(The data is example data; the real schema looks a bit different.)
How would I need to define an index so that this query gets faster?
Do multiple indexes work together? E.g., if I define one index for the sorting, one for public and one for the rest?
How expensive is indexing?
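For a query of this shape, a single compound index that covers the equality filters first, then the sort fields, then the range filter is the usual starting point (a sketch only, using the example field names above, which the asker says differ from the real schema):
// Equality filters first, sort fields next (in sort order), range filter last.
db.mycoll.createIndex({
  public: 1,
  "mail.delivered": 1,
  "remote.continent": 1,
  "remote.country": 1,
  "remote.city": 1,
  date: 1
})
// A single query normally uses a single index, so one compound index like this
// tends to beat separate indexes for the sort, public, and the rest; confirm with explain().
Note that indexes cost extra disk and RAM and slow down writes, so benchmark against the real workload.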

MongoDB indexing on variable query

I have a collection of user generated posts. They contain the following fields:
_id: String
groupId: String // id of the group this was posted in
authorId: String
tagIds: [String]
latestActivity: Date // updated whenever someone comments on this post
createdAt: Date
numberOfVotes: Number
...some more...
My queries always look something like this...
Posts.find({
  groupId: { $in: [...] },
  authorId: 'xyz',         // only SOMETIMES included
  tagIds: { $in: [...] },  // only SOMETIMES included
}, {
  sort: { latestActivity/createdAt/numberOfVotes: +1/-1, _id: -1 }
});
So I'm always querying on the groupId, but only sometimes adding tagIds or authorId. I'm also switching out the field on which this is sorted. What would my best indexing strategy look like?
From what I've read so far here on SO, I would probably create multiple compound indices and have them always start with {groupId: 1, _id: -1} - because they are included in every query, they are good prefix candidates.
Now, I'm guessing that creating a new index for every possible combination wouldn't be a good idea memory wise. Therefore, should I just keep it like that and only index groupId and _id?
Thanks.
You are going in the right direction. With compound indexes, you want the most selective fields on the left and the range fields on the right. {groupId: 1, _id: -1} satisfies this.
It's also important to remember that compound indexes are used when the keys appear in the query from left to right. So, one compound index can cover many common scenarios. If, for example, your index was {groupId: 1, authorId: 1, tagIds: 1} and your query was Posts.find({groupId: {$in: [...]}, authorId: 'xyz'}), that index would get used (even though tagIds was absent). Posts.find({groupId: {$in: [...]}, tagIds: {$in: [...]}}) would also use this index (the first and third fields of the index are used, so unless Mongo finds a more specific index, this one is chosen). However, Posts.find({authorId: 'xyz', tagIds: {$in: [...]}}) would not use the index, because the first field of the index is missing.
Given all of that, I would suggest starting with {groupId: 1, authorId: 1, tagIds: 1, _id: -1}. groupId is the only non-optional field in your queries, so it goes on the left, before the optional ones. It looks like authorId is more selective than tagIds, so it should go on the left after groupId. You're sorting by _id, so that should go on the right. Be sure to analyze query performance for the different ways you query the data and make sure they are all choosing this index (otherwise you'll need to make more tweaks, or possibly a second compound index). You could then create other indexes and force the query to use them to do some A/B testing on performance.
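As a concrete sketch of that suggestion in shell syntax (the collection name posts and the sample values are placeholders, and the sort here is on _id only to keep the sketch short):
db.posts.createIndex({ groupId: 1, authorId: 1, tagIds: 1, _id: -1 })
// Only the mandatory field: still a left-to-right prefix of the index.
db.posts.find({ groupId: { $in: ["g1", "g2"] } }).sort({ _id: -1 }).explain("executionStats")
// With the optional authorId filter added: also a prefix of the index.
db.posts.find({ groupId: { $in: ["g1", "g2"] }, authorId: "xyz" }).sort({ _id: -1 }).explain("executionStats")
// In each output, check winningPlan for an IXSCAN on this index rather than a COLLSCAN.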

Improve distinct query performance using indexes

Documents in the collection follow the schema given below:
{
"Year": String,
"Month": String
}
I want to accomplish the following tasks:
I want to run distinct queries like
db.col.distinct('Month', {Year: '2016'})
I have a compound index on {'Year': 1, 'Month': 1}, so intuitively it looks like the answer can be computed by looking at the index alone. The query performance on a collection with millions of documents is really poor right now. Is there any way that this could be done?
I want to find the latest month of a given year. One way is to sort the result of the above distinct query and take the first element as the answer.
A much better and faster solution, as pointed out by @Blakes Seven in the discussion below, would be to use the query db.col.find({ "Year": "2016" }).sort({ "Year": -1, "Month": -1 }).limit(1)
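For reference, a shell sketch of that approach (collection and field names as in the question; projecting only Month is my addition, so the query can be answered from the {Year: 1, Month: 1} index alone, which is worth confirming with explain()):
db.col.createIndex({ Year: 1, Month: 1 })
// Latest month of a given year: descending sort plus limit(1), no full distinct needed.
db.col.find({ Year: "2016" }, { _id: 0, Month: 1 }).sort({ Year: -1, Month: -1 }).limit(1)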

Does MongoDB have some 'soft' indexing optimization?

I have a 10 GB collection with pretty small documents (~1 kB) which all contain the field 'date'. I need to do some daily mapReduce over the documents, only over the last day.
I have a few options:
1. no index
2. an index over 'date'
3. create a field 'day', which is the date without the time
4. have one collection per day, e.g. myCollection_20140106
I am thinking of option 3 because it looks to me like a good compromise between indexing (slow) and reading the entire unindexed collection (slow). Sorting the array 1, 3, 2, 3, 3, 2, 2, 1, 3, 3, 1, 2 might be faster than sorting 1, 13, 2, 8, 5, 4, 6, 3, 7, 11 because there are more equal-valued items. Does this apply to MongoDB indexes? Is solution 3 good for this, or is it just stupid and not faster than 2?
Any advice on this is most welcomed. Thank you very much!
EDIT: MR code:
db.my_col.mapReduce(map, reduce, {finalize: finalize, out: {merge: "day"},
query: {"date": {$gte: start_date, $lt: end_date, $exists: true}}})
map/reduce/finalize are basic functions to compute the average of a given field over the day, "grouped by" another field (e.g. date, name, price -> compute the average price per person for a given day). (This is not actually the case, but you can consider it is; I think the mapReduce/query are the things of interest here and I don't want to pollute the question by adding extra weight.)
Given the fact that you are using date for your initial selection criteria, having an index over date makes more sense than having an index over day. Date is a superset of day values, and in terms of entries the index would still be of a similar (to be cautious: not the same) order of magnitude.
M/R functions are not analyzed and cannot use any indexes in MongoDB. However, as in your case, the query and sort portions of the command can take advantage of the indexes defined in MongoDB.
I would also suggest taking a look at MongoDB MapReduce performance using Indexes.
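As a rough sketch of how the index and the query portion fit together (the map/reduce bodies and the date values below are placeholders, not the asker's real functions):
// Index on the field used in the query portion of the mapReduce.
db.my_col.createIndex({ date: 1 })
// Placeholder day boundaries and functions, just to make the call self-contained.
var start_date = ISODate("2014-01-06T00:00:00Z")
var end_date = ISODate("2014-01-07T00:00:00Z")
var map = function () { emit(this.name, 1); }
var reduce = function (key, values) { return Array.sum(values); }
db.my_col.mapReduce(map, reduce, {
  out: { merge: "day" },
  // The query filter runs before map/reduce and can use the { date: 1 } index.
  query: { date: { $gte: start_date, $lt: end_date, $exists: true } }
})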

Sorting on multiple fields in MongoDB

I have a query in mongo such that I want to give preference to the first field and then the second field.
Say I have to query such that
db.col.find({category: 'A'}).sort({updated: -1, rating: -1}).limit(10).explain()
So I created the following index
db.col.ensureIndex({category: 1, rating: -1, updated: -1})
It worked just fine, scanning only as many objects as needed, i.e. 10.
But now I need to query
db.col.find({category: { $ne: 'A' }}).sort({updated: -1, rating: -1}).limit(10)
So I created the following index:
db.col.ensureIndex({rating: -1, updated: -1})
but this leads to a scan of the whole collection, and when I create
db.col.ensureIndex({ updated: -1 ,rating: -1})
It scans fewer documents.
I just want to be clear about sorting on multiple fields and what order is preserved when doing so. From reading the MongoDB documentation, it seems clear that the field on which we need to perform the sort should be the last field. That is what I assumed in my $ne query above. Am I doing anything wrong?
The MongoDB query optimizer works by trying different plans to determine which approach works best for a given query. The winning plan for that query pattern is then cached for the next ~1,000 queries or until you do an explain().
To understand which query plans were considered, you should use explain(1), e.g.:
db.col.find({category:'A'}).sort({updated: -1}).explain(1)
The allPlans detail will show all plans that were compared.
If you run a query which is not very selective (for example, if many records match your criteria of {category: { $ne:'A'}}), it may be faster for MongoDB to find results using a BasicCursor (table scan) rather than matching against an index.
The order of fields in the query generally does not make a difference for the index selection (there are a few exceptions with range queries). The order of fields in a sort does affect the index selection. If your sort() criteria does not match the index order, the result data has to be re-sorted after the index is used (you should see scanAndOrder:true in the explain output if this happens).
It's also worth noting that MongoDB will only use one index per query (with the exception of $or queries).
So if you are trying to optimize the query:
db.col.find({category:'A'}).sort({updated: -1, rating: -1})
You will want to include all three fields in the index:
db.col.ensureIndex({category: 1, updated: -1, rating: -1})
FYI, if you want to force a particular query to use an index (generally not needed or recommended), there is a hint() option you can try.
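For example, a shell sketch of that comparison (explain("executionStats") in newer shells; the older explain(1) form is what the rest of this answer refers to):
db.col.ensureIndex({ category: 1, updated: -1, rating: -1 })
// Let the optimizer pick a plan.
db.col.find({ category: 'A' }).sort({ updated: -1, rating: -1 }).limit(10).explain("executionStats")
// Force the compound index for one run and compare the two outputs.
db.col.find({ category: 'A' }).sort({ updated: -1, rating: -1 }).limit(10).hint({ category: 1, updated: -1, rating: -1 }).explain("executionStats")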
That is true, but there are two layers of ordering here, since you are sorting on a compound index.
As you noticed, when the first field of the index matches the first field of the sort, it works and the index is used. However, the other way around it does not.
As such, by your own observations, the order that needs to be preserved is the query order of the fields, from first to last. The MongoDB query analyser can sometimes move fields around to match an index, but normally it will just try to match the first field; if it cannot, it will skip it.
Try this code; it will sort the data first by 'name' and then, keeping 'name' fixed, it will sort by 'filter':
var cursor = db.collection('vc').find({ "name": { $in: [/cpu/, /memo/] } }, { _id: 0 }).sort({ "name": 1, "filter": 1 });
Sort and Index Use
MongoDB can obtain the results of a sort operation from an index which includes the sort fields. MongoDB may use multiple indexes to support a sort operation if the sort uses the same indexes as the query predicate. ... Sort operations that use an index often have better performance than blocking sorts.
db.restaurants.find().sort( { "borough": 1, "_id": 1 } )
More information: https://docs.mongodb.com/manual/reference/method/cursor.sort/
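For the quoted example to obtain its sort from an index, the collection would need an index that includes the sort fields, something like this (a sketch, not taken from the linked page):
// With this index, the sort({ "borough": 1, "_id": 1 }) above can be satisfied without a blocking sort.
db.restaurants.createIndex({ borough: 1, _id: 1 })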