MongoDB covered query on embedded document - mongodb

I am learning indexing in MongoDB
My sample schema is:
name
location
    street
    number
I have created two indexes, on name and on location.number.
When I type
db.table.find({ 'name': 'Steve' }, { _id: 0, 'name': 1 }).explain('executionStats')
I got covered query, but when I type
db.table.find({ 'location.number': 46 }, { _id: 0, 'location.number': 1 }).explain('executionStats')
totalDocsExamined is not 0, so the query is not covered. Why? The query uses only one field, which is indexed, and _id is excluded, just as in the first query. Do covered queries not work with embedded documents?

No, they do not. It is a well-documented restriction:
An index cannot cover a query if any of the indexed fields in the query predicate or returned in the projection are fields in embedded documents.

The text quoted in Alex Blex's answer no longer appears on the linked page. I have not yet verified this against my own data, but I think this may be possible now.
According to the new docs:
Changed in version 3.6: An index can cover a query on fields within embedded documents.
Version 3.6 was released in November 2017, so it was definitely a new feature at the time of the original post.
See the docs for more examples.
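To check this on your own data, you can inspect the explain output programmatically. The helper below is only a minimal sketch: `isCovered` is a hypothetical name, and it assumes the classic explain shape (a `queryPlanner.winningPlan` tree linked through `inputStage`, plus `executionStats.totalDocsExamined`). Newer server versions may nest the plan differently.

```javascript
// Hypothetical helper: decides from an explain('executionStats') result
// whether the query was answered from the index alone.
function isCovered(explain) {
  // Walk the winning plan from the root down and collect the stage names.
  const stages = [];
  let node = explain.queryPlanner.winningPlan;
  while (node) {
    stages.push(node.stage);
    node = node.inputStage;
  }
  const usedIndex = stages.includes("IXSCAN");
  const fetchedDocs = stages.includes("FETCH");
  // Covered: an index scan, no document fetch, and no documents examined.
  return usedIndex && !fetchedDocs &&
    explain.executionStats.totalDocsExamined === 0;
}
```

In the shell this could be called as isCovered(db.table.find({ 'location.number': 46 }, { _id: 0, 'location.number': 1 }).explain('executionStats')).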

Related

Covered queries and find all (no filter)

I am not sure if it is possible, I'm just curious.
I have a regions collection whose purpose is to populate a web dropdown. Nothing complicated there.
{
_id:"some numeric id",
name:"region name",
}
and the index is created with db.regions.createIndex({name:1})
I tried both db.regions.find({},{_id:0}).sort({name:1}) and the same query without the sort.
explain shows that totalDocsExamined is greater than zero, which means it is not a covered query, if I understood the concept correctly.
For a covered query, you have to explicitly list each of the covered fields in the projection:
db.regions.find({}, {_id:0, name: 1}).sort({name:1})
If you only exclude _id, you're telling MongoDB to include all fields besides _id. You know that all your docs only have a name field, but MongoDB doesn't, so you have to be explicit.
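The rule can be written down as a small check. This is a simplified model and not MongoDB's actual planner logic: `projectionIsCoverable` is a made-up name, and it assumes projections that only use 1 and 0 as values.

```javascript
// Simplified model (an assumption, not the real planner): a projection can be
// covered only if it explicitly includes fields and all of them are index keys.
function projectionIsCoverable(indexKeys, projection) {
  const included = Object.keys(projection).filter((f) => projection[f] === 1);
  // Excluding _id alone means "return every other field", which the index
  // cannot be known to contain.
  if (included.length === 0) return false;
  // _id is returned by default unless it is explicitly excluded.
  const idNeeded = projection._id !== 0;
  const wanted = idNeeded && !included.includes("_id")
    ? [...included, "_id"]
    : included;
  return wanted.every((f) => f in indexKeys);
}
```

With the index { name: 1 }, the projection { _id: 0, name: 1 } passes the check, while { _id: 0 } and { name: 1 } do not.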

Mongodb update limited number of documents

I have a collection with 100 million documents. I want to safely update a number of the documents (by safely I mean update a document only if it hasn't already been updated). Is there an efficient way to do it in Mongo?
I was planning to use the $isolated operator with a limit clause but it appears mongo doesn't support limiting on updates.
This seems simple but I'm stuck. Any help would be appreciated.
Per Sammaye, it doesn't look like there is a "proper" way to do this.
My workaround was to create a sequence as outlined on the mongo site and simply add a 'seq' field to every record in my collection. Now I have a unique field which is reliably sortable to update on.
Reliably sortable is important here. I was going to just sort on the auto-generated _id, but I quickly realized that natural order is NOT the same as ascending order for ObjectIds (from this page it looks like the string value takes precedence over the object value, which matches the behavior I observed in testing). Also, it is entirely possible for a record to be relocated on disk, which makes the natural order unreliable for sorting.
So now I can query for the record with the smallest 'seq' which has NOT already been updated to get an inclusive starting point. Next I query for records with 'seq' greater than my starting point and skip (it is important to skip since the 'seq' may be sparse if you remove documents, etc.) the number of records I want to update. Put a limit of 1 on that query and you've got a non-inclusive endpoint. Now I can issue an update with a query of 'updated' = 0, 'seq' >= my starting point and < my endpoint. Assuming no other thread has beaten me to the punch, the update should give me what I want.
Here are the steps again:
create an auto-increment sequence using findAndModify
add a field to your collection which uses the auto-increment sequence
query to find a suitable starting point: db.xx.find({ updated: 0 }).sort({ seq: 1 }).limit(1)
query to find a suitable endpoint: db.xx.find({ seq: { $gt: startSeq }}).sort({ seq: 1 }).skip(updateCount).limit(1)
update the collection using the starting and ending points: db.xx.update({ updated: 0, seq: { $gte: startSeq, $lt: endSeq }, $isolated: 1 }, { $set: { updated: 1 } }, { multi: true })
Pretty painful but it gets the job done.
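To make the range logic easier to follow, here is a sketch that replays the three queries against a plain array instead of a real collection. `updateBatch` is a hypothetical name, and the handling of a missing endpoint (treating the range as open-ended) is my own assumption, not part of the steps above.

```javascript
// In-memory stand-in for the collection; `seq` and `updated` follow the answer.
function updateBatch(docs, count) {
  const bySeq = [...docs].sort((a, b) => a.seq - b.seq);

  // Starting point: smallest seq that has not been updated yet (inclusive).
  const start = bySeq.find((d) => d.updated === 0);
  if (!start) return 0;

  // Endpoint: among seq > start, skip `count` documents and take one (exclusive).
  // If fewer documents remain, there is no endpoint and the range stays open.
  const end = bySeq.filter((d) => d.seq > start.seq)[count];

  // Update: flag every not-yet-updated document in [start.seq, end.seq).
  let modified = 0;
  for (const d of docs) {
    const inRange = d.seq >= start.seq && (end === undefined || d.seq < end.seq);
    if (d.updated === 0 && inRange) {
      d.updated = 1;
      modified += 1;
    }
  }
  return modified;
}
```

Note that because the start is inclusive and the endpoint is found by skipping `count` documents past it, the range spans up to count + 1 seq values. Skipping count - 1 instead should give an exact batch size.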

searching with multiple parameters with mongodb

How is fine-grained search achievable with MongoDB without the use of external engines? Take this object as an example:
{
genre: 'comedy',
pages: 380,
year: 2013,
bestseller: true,
author: 'John Doe'
}
That is being searched by the following:
db.books.find({
pages: { $gt: 100 },
year: { $gt: 2000 },
bestseller: true,
author: "John Doe"
});
Pretty straightforward so far. Now suppose that there are a bit more values in the document, and that I am making more refined searches and I have a pretty big collection.
The first thing I would do is create indexes. But how does that work? I have read that index intersection, as described at https://jira.mongodb.org/browse/SERVER-3071, is not possible. That means that if I create indexes on "year" and "pages", I will not really optimize the AND operations in searches.
So how can the searches be optimized for having many parameters?
Thanks in advance.
It seems like you are asking about compound indexes in mongodb. Compound indexes allow you to create a single index on multiple fields in a document. By creating compound indexes you can make these large/complex queries while still using an index.
On a more general note, if you create a basic index on a field that is highly selective, your search can end up being very quick. Using your example, if you had an index on author, the query engine would use that index to find all the entries where author == "John Doe". Presumably there are not that many books with that specific author relative to the number of books in the entire collection. So, even if the rest of your query is fairly complex, it is only evaluated over those few documents with the matching author. Thus, by structuring your indexes properly you can get a significant performance gain without having to have any complex indexes.
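As a rough illustration, a compound index for the example query might look like the sketch below. It assumes a shell-style `db` handle where `db.books` resolves to the collection; `indexAndSearchBooks` is a hypothetical wrapper, and the key order (equality fields first, range fields last) is a common guideline rather than something tuned for your data.

```javascript
// `db` is assumed to be a mongo-shell-like handle exposing db.books.
function indexAndSearchBooks(db) {
  // One compound index over all four queried fields:
  // equality matches (author, bestseller) first, range fields (year, pages) last.
  db.books.createIndex({ author: 1, bestseller: 1, year: 1, pages: 1 });

  // The same search as in the question, now able to use a single index.
  return db.books.find({
    author: "John Doe",
    bestseller: true,
    year: { $gt: 2000 },
    pages: { $gt: 100 },
  });
}
```

Running the returned cursor through explain() should show whether the compound index was actually picked.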

Sorting on Multiple fields mongo DB

I have a query in mongo such that I want to give preference to the first field and then the second field.
Say I have to query such that
db.col.find({category: 'A'}).sort({updated: -1, rating: -1}).limit(10).explain()
So I created the following index
db.col.ensureIndex({category: 1, rating: -1, updated: -1})
It worked just fine, scanning only as many objects as needed, i.e. 10.
But now I need to query
db.col.find({category: { $ne: 'A'}}).sort({updated: -1, rating: -1}).limit(10)
So I created the following index:
db.col.ensureIndex({rating: -1, updated: -1})
but this leads to scanning the whole collection, whereas when I create
db.col.ensureIndex({ updated: -1, rating: -1 })
it scans fewer documents.
I just want to be clear about sorting on multiple fields and what order has to be preserved when doing so. From reading the MongoDB documentation, it's clear that the field on which we need to sort should be the last field in the index. That is what I assumed in my $ne query above. Am I doing anything wrong?
The MongoDB query optimizer works by trying different plans to determine which approach works best for a given query. The winning plan for that query pattern is then cached for the next ~1,000 queries or until you do an explain().
To understand which query plans were considered, you should use explain(1), eg:
db.col.find({category:'A'}).sort({updated: -1}).explain(1)
The allPlans detail will show all plans that were compared.
If you run a query which is not very selective (for example, if many records match your criteria of {category: { $ne:'A'}}), it may be faster for MongoDB to find results using a BasicCursor (table scan) rather than matching against an index.
The order of fields in the query generally does not make a difference for the index selection (there are a few exceptions with range queries). The order of fields in a sort does affect the index selection. If your sort() criteria do not match the index order, the result data has to be re-sorted after the index is used (you should see scanAndOrder:true in the explain output if this happens).
It's also worth noting that MongoDB will only use one index per query (with the exception of $ors).
So if you are trying to optimize the query:
db.col.find({category:'A'}).sort({updated: -1, rating: -1})
You will want to include all three fields in the index:
db.col.ensureIndex({category: 1, updated: -1, rating: -1})
FYI, if you want to force a particular query to use an index (generally not needed or recommended), there is a hint() option you can try.
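The matching rule for sort order can be sketched as a small function. This is a simplified model under my own assumptions, not the real planner: `indexSupportsSort` is a hypothetical name, it only strips leading index keys that the query pins with an equality match, and it ignores everything else the optimizer considers.

```javascript
// Simplified model: can `index` return documents already ordered by `sort`,
// given the fields the query matches by equality?
function indexSupportsSort(index, equalityFields, sort) {
  let keys = Object.entries(index);
  // Leading index keys pinned by an equality match do not affect ordering.
  while (keys.length > 0 && equalityFields.includes(keys[0][0])) {
    keys = keys.slice(1);
  }
  const sortKeys = Object.entries(sort);
  if (sortKeys.length > keys.length) return false;
  // The sort fields must line up with the remaining index keys, in order.
  const sameFields = sortKeys.every(([field], i) => keys[i][0] === field);
  if (!sameFields) return false;
  // Directions must all match, or all be reversed (index walked backwards).
  const forward = sortKeys.every(([, dir], i) => keys[i][1] === dir);
  const backward = sortKeys.every(([, dir], i) => keys[i][1] === -dir);
  return forward || backward;
}
```

Under this model, { category: 1, updated: -1, rating: -1 } supports the sort { updated: -1, rating: -1 } for an equality match on category, while { rating: -1, updated: -1 } does not support it for the $ne query.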
That is true, but there are two layers of ordering here since you are sorting on a compound index.
As you noticed, when the first field of the index matches the first field of the sort, it worked and the index was used. However, when it is the other way around, it does not.
As such, by your own observations, the order that needs to be preserved is the order of fields in the query, from first to last. The mongo analyser can sometimes move fields around to match an index, but normally it will just try to match the first field, and if it cannot, it will skip the index.
Try this code. It sorts the data by 'name' first and then, within each 'name', by 'filter':
var cursor = db.collection('vc').find({ "name" : { $in: [ /cpu/, /memo/ ] } }, { _id: 0 }).sort( { "name": 1, "filter": 1 } );
Sort and Index Use
MongoDB can obtain the results of a sort operation from an index which includes the sort fields. MongoDB may use multiple indexes to support a sort operation if the sort uses the same indexes as the query predicate. ... Sort operations that use an index often have better performance than blocking sorts.
db.restaurants.find().sort( { "borough": 1, "_id": 1 } )
More information: https://docs.mongodb.com/manual/reference/method/cursor.sort/

MongoDB - Query embedded documents

I have a collection named Events. Each Event document has a collection of Participants as embedded documents.
Now my question: is there a way to query an Event and get all Participants with, for example, Age > 18?
When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside of the event document called "Over18" or something. Insert them into that document (and possibly the other if you want) and then when you query the collection, you can instruct the database to only return the "Over18" subdocument. The downside to this is that you store your participants in two different subdocuments and you will have to figure out their age before inserting. This may or may not be feasible depending on your application. If you need to be able to check on arbitrary ages (i.e. sometimes it's 18 but sometimes it's 21 or 25, etc.) then this will not work.
Query the collection and retrieve the Participants subdocument and then filter it in your application code. Despite what some people may believe, this isn't terrible because you don't want your database to be doing too much work all the time. Offloading the computations to your application could actually benefit your database because it now can spend more time querying and less time filtering. It leads to better scalability in the long run.
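For the second option, the application-side filter can be as small as the sketch below. The `participants` array and `age` field are assumed names, since the question does not show the actual schema, and `participantsOlderThan` is just an illustrative helper.

```javascript
// Returns the participants of one event document who are older than minAge.
// `participants` and `age` are assumed field names.
function participantsOlderThan(event, minAge) {
  // Events without a participants array simply yield an empty list.
  return (event.participants || []).filter((p) => p.age > minAge);
}
```

You would call it on each event document returned by the query, for example participantsOlderThan(event, 18).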
Short answer: no. I tried to do the same a couple of months back, but mongoDB does not support it (at least in version <= 1.8). The same question has been asked in their Google Group for sure. You can either store the participants as a separate collection or get the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.
For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
db.events.aggregate([
{ $unwind: '$participants' },
{ $match: { 'participants.age': { $gt: 18 } } },
{ $project: { participants: 1 } }
])
This will return a list of n documents, where n is the number of participants older than 18 and each entry looks like this (note that the "participants" array field now holds a single entry instead):
{
_id: objectIdOfTheEvent,
participants: { firstName: 'only one', lastName: 'participant'}
}
It could probably even be flattened on the server to return a list of participants. See the official documentation for more information.
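One possible way to do that flattening is to lift the participant fields to the top level in the $project stage. The sketch below only builds the pipeline: `olderParticipantsPipeline` is a hypothetical helper, and the `age`, `firstName` and `lastName` field names are assumptions based on the example output above.

```javascript
// Builds an aggregation pipeline that returns one flat document per
// participant older than minAge. Field names are assumed, not confirmed.
function olderParticipantsPipeline(minAge) {
  return [
    // One output document per array entry.
    { $unwind: "$participants" },
    // After $unwind the age lives under the participants subdocument.
    { $match: { "participants.age": { $gt: minAge } } },
    // Drop the event _id and lift the participant fields to the top level.
    {
      $project: {
        _id: 0,
        firstName: "$participants.firstName",
        lastName: "$participants.lastName",
      },
    },
  ];
}
```

In the shell it would be used as db.events.aggregate(olderParticipantsPipeline(18)).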