MongoDB: Update field with size of embedded array - mongodb

I have a collection of documents with an array (set in this case): my_array
I'm adding things to this set periodically
collection.find({ "_id": target_id })
.upsert().update({ '$addToSet':{ 'my_array': new_entry }})
Many of the logical operations I perform on this DB are based on this sub-array's size. So I've created a field (indexed) called len_of_array. The index is quite critical to my use case.
If this were a true array and not a set, $inc would work beautifully in the same update.
However, since the sub-collection is a set, the length of the collection, my_array, may or may not have changed.
My current solution:
Call this periodically for each target_id, but it requires first performing a find in order to compute the correct new_length:
collection.find({ '_id': target_id})
.upsert().update({ '$set':{ 'len_of_array': new_length }})
My Question
Is there a way to set a field of a document to the indexed size of a sub-array in the same document in a single update?

You may not need the field len_of_array in order to query by size: there is the $size query operator, which would also save you the periodic update. Note, however, that $size matches only an exact length; it cannot be combined with range operators like $gt, and it cannot use an index on the array field.
Let's say you want to find all documents for which the length of my_array is exactly 3:
db.coll.find({ "my_array": { "$size": 3 }})
To express "greater than 2", test for the existence of the element at index 2 instead:
db.coll.find({ "my_array.2": { "$exists": true }})
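If you do want to keep maintaining the indexed counter, it is worth noting that since MongoDB 4.2 an update can take an aggregation pipeline, which allows doing exactly what the question asks in a single update. A sketch, assuming MongoDB 4.2+ (target_id and new_entry are placeholders, and $setUnion stands in for $addToSet, which is not available inside update pipelines):

```javascript
db.collection.updateOne(
  { "_id": target_id },
  [
    // emulate $addToSet: union the existing set (or [] on upsert) with the new entry
    { "$set": { "my_array": { "$setUnion": [ { "$ifNull": [ "$my_array", [] ] }, [ new_entry ] ] } } },
    // recompute the indexed counter from the array's new size, in the same atomic update
    { "$set": { "len_of_array": { "$size": "$my_array" } } }
  ],
  { "upsert": true }
)
```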

Related

Can an index on a subfield cover queries on projections of that field?

Imagine you have a schema like:
[{
  name: "Bob",
  naps: [{
    time: ISODate("2019-05-01T15:35:00"),
    location: "sofa"
  }, ...]
}, ...]
So lots of people, each with a few dozen naps. You want to find out 'what days do people take the most naps?', so you index naps.time, and then query with:
aggregate([
  { $unwind: "$naps" },
  { $group: { _id: { $dayOfMonth: "$naps.time" }, napsOnDay: { $sum: 1 } } }
])
But when doing explain(), mongo tells me no index was used in this query, when clearly the index on the time Date field could have been. Why is this? How can I get mongo to use the index for the more optimal query?
Indexes store pointers to actual documents, and can only be used when working with a material document (i.e. the document that is actually stored on disk).
$match and $sort do not mutate the actual documents, so indexes can be used in those stages.
In contrast, $unwind, $group, or any other stage that changes the actual document representation basically loses the connection between the index and the material documents.
Additionally, when those stages are processed without $match, you're basically saying that you want to process the whole collection. There is no point in using the index if you want to process the whole collection.
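One practical consequence: if you only care about a date range, put a $match on the indexed field before the $unwind so the index can narrow the working set first. A sketch, assuming a hypothetical people collection and the naps.time index from the question:

```javascript
db.people.aggregate([
  // this leading $match can use the index on naps.time
  { "$match": { "naps.time": { "$gte": ISODate("2019-05-01"), "$lt": ISODate("2019-06-01") } } },
  { "$unwind": "$naps" },
  // re-filter after unwinding, since the first $match keeps whole documents
  { "$match": { "naps.time": { "$gte": ISODate("2019-05-01"), "$lt": ISODate("2019-06-01") } } },
  { "$group": { "_id": { "$dayOfMonth": "$naps.time" }, "napsOnDay": { "$sum": 1 } } }
])
```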

FindAndUpdate first 5 documents

I am looking for a way to FindAndModify no more than 5 documents in MongoDB.
This is a collection for a queue which will be processed by multiple workers, so I want to put it into a single query.
Since I cannot control the number of updates via the UpdateOptions parameter, is it possible to limit the number of documents matched by the filterDefinition?
Problem 1: findAndModify() can only update a single document at a time, as per the documentation. This is an inherent limit in MongoDB's implementation.
Problem 2: There is no way to update a specific number of arbitrary documents with a simple update() query of any kind. You can update one or all depending on the boolean value of your multi option, but that's it.
If you want to update up to 5 documents at a time, you're going to have to retrieve those documents first and then update them, either in a single multi-update or individually in a forEach() call. Either way, you'll be using something like:
db.collection.update(
{_id: {$in: [ doc1._id, doc2._id, ... ]}},
{ ... },
{multi: true}
);
Or you'll be using something like:
db.collection.find({ ... }).limit(5).forEach(function(doc) {
//do something to doc
db.collection.update({_id: doc._id}, doc);
});
Whichever approach you choose to take, it's going to be a workaround. Again, this is an inherent limitation.
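For the multi-worker queue scenario specifically, a common workaround is to let each worker claim one document at a time with findAndModify in a loop of up to 5; each individual claim is atomic, so two workers can never grab the same document. A sketch (the status and claimedBy fields are illustrative, not from the question):

```javascript
var claimed = [];
for (var i = 0; i < 5; i++) {
  // atomically flip one pending document to "processing"
  var doc = db.queue.findAndModify({
    query: { status: "pending" },
    update: { $set: { status: "processing", claimedBy: workerId } },
    new: true
  });
  if (doc === null) break; // queue drained
  claimed.push(doc);
}
```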

mongodb limit operation retrieve newest docs

mongodb find returns docs in ascending order of "_id"; when limit(n) is applied to find(), it always returns the oldest n docs (assume doc1._id > doc2._id implies doc1 is newer than doc2, as with ObjectId). I want it to return the newest n docs, so I do:
col.find().sort({"_id":-1}).limit(n)
Is this inefficient? Will mongodb sort all docs in 'col'?
The _id field is essentially the "primary key" and therefore has an index so there is not actually a "sort" on the whole collection, it just traverses that primary index in reverse order in this case.
Provided that you are happy enough that this does reflect your "newest" documents, and in normal circumstances there is no reason to believe otherwise, then this will return what you want in an efficient manner.
If you actually want to sort by something else, such as a timestamp or other field, then just create an index on that field and sort as above. The general case will use that index as well and return documents in "descending order", or whatever direction your sort specifies:
db.collection.ensureIndex({ "created": 1 })
Or as default "descending" :
db.collection.ensureIndex({ "created": -1 })
Then query:
db.collection.find().sort({ "created": -1 })
So basically it does not "sort" the whole collection when an index is present to use. The _id key is always indexed.
Also see .ensureIndex() in the documentation (renamed .createIndex() as of MongoDB 3.0).
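You can confirm that the sort is walking the index rather than sorting in memory by looking at the query plan:

```javascript
db.collection.find().sort({ "created": -1 }).explain()
// an indexed sort shows an IXSCAN stage and no blocking SORT stage
// (in the legacy explain format, "scanAndOrder" : false)
```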

how to build index in mongodb in this situation

I have a mongodb database, which has following fields:
{"word":"ipad", "date":20140113, "docid": 324, "score": 98}
which is a reverse index over a log of documents (about 120 million).
there are two kinds of queries in my system:
one of which is:
db.index.find({"word":"ipad", "date":20140113}).sort({"score":-1})
this query fetches all docs for the word "ipad" on date 20140113 and sorts them by score.
another query is:
db.index.find({"word":"ipad", "date":20140113, "docid":324})
to speed up these two kinds of query, what index should I build?
Should I build two indexes like this?:
db.index.ensureIndex({"word":1, "date":1, "docid":1}, {"unique":true})
db.index.ensureIndex({"word":1, "date":1, "score":1})
but I think building the two indexes uses too much disk space.
So do you have some good ideas?
You are sorting by score descending (.sort({"score":-1})), which means that your index should also be descending on the score-field so it can support the sorting:
db.index.ensureIndex({"word":1, "date":1, "score":-1});
The other index looks good to speed up that query, but you still might want to confirm that by running the query in the mongo shell followed with .explain().
Indexes are always a tradeoff of space and write-performance for read-performance. When you can't afford the space, you can't have the index and have to deal with it. But usually the write-performance is the larger concern, because drive space is usually cheap.
But maybe you can save one of the three indexes you have. "Wait, three indexes?" Yes: keep in mind that every collection has a unique index on the _id field, which is created implicitly when the collection is initialized.
But the _id field doesn't have to be an auto-generated ObjectId; it can be anything you want. When you have another index with a uniqueness constraint and no use for the _id field, you can move that unique constraint to the _id field and save an index. Your documents would then look like this:
{
  "_id": {
    "word": "ipad",
    "date": 20140113,
    "docid": 324
  },
  "score": 98
}
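One caveat with an embedded document as _id: equality matches against a whole subdocument are field-order sensitive, so every writer and reader must build the _id with its fields in the same order. Also, a dot-notation query such as { "_id.word": "ipad" } generally cannot use the _id index, so the full-document form is the one that stays indexed:

```javascript
// matches only if the stored _id has its fields in exactly this order:
db.index.find({ "_id": { "word": "ipad", "date": 20140113, "docid": 324 } })

// a different field order is a different BSON value and matches nothing:
db.index.find({ "_id": { "date": 20140113, "word": "ipad", "docid": 324 } })
```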

MongoDB multikeys on _id + some value

In MongoDB I have a query which looks like this to find out for which comments the user has already voted:
db.comments.find({
  "_id": { "$in": [ /* ...some ids... */ ] },
  "votes.uid": "4fe1d64d85d4f4c00d000002"
});
As the documentation says you should have
One index per query
So which is better: creating a compound index on _id + votes.uid, or is it enough to index just votes.uid, since Mongo handles _id automatically anyway?
There is automatically an index on _id.
Depending on your queries (how many ids you have in the $in array) and your data (how many votes you have on one object), you may want to create an index on votes.uid.
Take care to check which index is used during query execution, and remember you can force Mongo to use the index you want by adding .hint({ field: 1 }) or .hint("indexname").