Mongo findAndUpdateMany atomically

Let's say there are 10000 documents in a collection. I have 3 app nodes doing something with those documents. I want each document to be processed only once. How I've currently done it: in the app there's a loop which calls findOneAndUpdate on the collection, finding a document where claimed=false and at the same time updating it to claimed=true. It works, but querying documents one by one is slow. What I'd like to do is "find up to 100 documents where claimed=false and at the same time update them to claimed=true". I need this to be atomic to avoid race conditions where multiple app nodes claim the same document. But I can't find anything like findManyAndUpdate() in Mongo's documentation. In the SQL world it's basically SELECT ... FOR UPDATE SKIP LOCKED. Is there something like this? Maybe I can utilise Mongo's transactions somehow?

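Assuming "find up to" is a soft limit, you can run 2 queries:
var ids = db.collection.find({claimed: false}, {_id: 1}).limit(100).toArray().map(doc => doc._id)
to collect the _ids into an array ids, then
db.collection.updateMany({claimed: false, _id: {$in: ids}}, {$set: {claimed: true}})
It will update 0 to 100 documents depending on concurrent updates: the claimed: false condition is re-checked for each document at update time, so any document another node claimed in the meantime is skipped.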
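To illustrate why the count can drop below 100, here is a small in-memory model of the two queries. It is a sketch of the semantics only: a plain array stands in for the collection, and findUnclaimedIds / updateManyClaim are hypothetical helper names, not driver APIs. Two workers read the same candidate _ids before either one updates.

```javascript
// In-memory model of: find({claimed: false}, {_id: 1}).limit(n), then
// updateMany({claimed: false, _id: {$in: ids}}, {$set: {claimed: true}}).
// `collection` is a plain array standing in for a MongoDB collection.

function findUnclaimedIds(collection, limit) {
  // Step 1: collect up to `limit` unclaimed _ids.
  return collection
    .filter((doc) => doc.claimed === false)
    .slice(0, limit)
    .map((doc) => doc._id);
}

function updateManyClaim(collection, ids) {
  // Step 2: the claimed:false condition is re-checked per document,
  // so documents another worker already claimed are skipped.
  const idSet = new Set(ids);
  let modifiedCount = 0;
  for (const doc of collection) {
    if (idSet.has(doc._id) && doc.claimed === false) {
      doc.claimed = true;
      modifiedCount += 1;
    }
  }
  return { modifiedCount };
}

const collection = Array.from({ length: 10 }, (_, i) => ({ _id: i, claimed: false }));
const idsA = findUnclaimedIds(collection, 5); // worker A reads _ids 0-4
const idsB = findUnclaimedIds(collection, 5); // worker B reads the same _ids
const resultA = updateManyClaim(collection, idsA);
const resultB = updateManyClaim(collection, idsB);
console.log(resultA.modifiedCount, resultB.modifiedCount); // 5 0
```

Note that modifiedCount only says how many documents a worker claimed, not which ones, which matters if the worker also needs the documents themselves.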
UPDATE
I guess I missed the point that you actually need to retrieve the documents too, not only update them.
There is no option but to update them individually. Select 100:
db.collection.find({claimed:false}).limit(100)
Then iterate for each _id:
db.collection.updateOne({_id: id, claimed:false}, {$set: {claimed:true}})
The result of each update contains modifiedCount with a value of 1 or 0. Discard the documents that were not modified; they were claimed by a concurrent update.
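A minimal in-memory sketch of this select-then-claim-individually pattern, assuming a plain array as the collection. readBatch, updateOneClaim and claimDocs are hypothetical names; updateOneClaim models the conditional updateOne and its modifiedCount. The two workers read overlapping batches, their updates interleave, and each still ends up with a disjoint set of documents.

```javascript
// `store` is a plain array standing in for the collection.

function updateOneClaim(store, id) {
  // Models updateOne({_id: id, claimed: false}, {$set: {claimed: true}}):
  // filter and update apply to a single document atomically.
  const doc = store.find((d) => d._id === id && d.claimed === false);
  if (!doc) return { modifiedCount: 0 };
  doc.claimed = true;
  return { modifiedCount: 1 };
}

function readBatch(store, limit) {
  // Models find({claimed: false}).limit(limit); returns copies, like a cursor.
  return store
    .filter((d) => d.claimed === false)
    .slice(0, limit)
    .map((d) => ({ ...d }));
}

function claimDocs(store, docs) {
  // Keep only the documents this worker actually claimed (modifiedCount === 1).
  return docs.filter((doc) => updateOneClaim(store, doc._id).modifiedCount === 1);
}

const store = Array.from({ length: 6 }, (_, i) => ({ _id: i, claimed: false }));
const batchA = readBatch(store, 4); // _ids 0-3
const batchB = readBatch(store, 4); // also _ids 0-3: nothing is claimed yet

// Interleave the two workers' updates.
const mineA = claimDocs(store, batchA.slice(0, 2)); // A claims 0 and 1
const mineB = claimDocs(store, batchB);             // B only gets 2 and 3
mineA.push(...claimDocs(store, batchA.slice(2)));   // 2 and 3 are gone; A gets nothing more

console.log(mineA.map((d) => d._id), mineB.map((d) => d._id)); // [ 0, 1 ] [ 2, 3 ]
```

The cost is one round-trip per document for the claim step, but no document ends up in two workers' results.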

Related

How to use mongodb change stream instead of periodic query?

I want to count the documents in my collection that satisfy a query, but I don't want to poll my collection. How can I do this with a MongoDB change stream?
For example, the documents in the database have a property such as {"destination": "Target1"}, and I want to know the number of documents that satisfy this requirement.
I don't want to run a query on every change to the collection, because the documents change very often.
I am looking for something similar to Oracle's CQN (Continuous Query Notification).
You can use a change stream and watch for changes as follows:
watchCursor = db.getSiblingDB("mydatabase").mycollection.watch()
while (!watchCursor.isExhausted()) {
    if (watchCursor.hasNext()) {
        printjson(watchCursor.next());
    }
}
See the changeStream docs.
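But perhaps you could just run a query backed by a good index?
It seems you can simply execute:
db.collection.countDocuments({destination: "Target1"})
and if you have an index on the "destination" field, it will be pretty quick. (countDocuments() replaces count(), which is deprecated in mongosh and the current drivers.)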
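If you do want to keep the count up to date from the change stream instead of re-querying, one approach is to adjust a running total per event. The sketch below assumes each event carries the document both before and after the change (fullDocumentBeforeChange / fullDocument); on a real deployment that requires pre- and post-images to be enabled (MongoDB 6.0+), which you should verify for your version. The events here are hand-written, not real driver output.

```javascript
// Running count of documents matching {destination: "Target1"}.
// ASSUMPTION: each event has fullDocumentBeforeChange and/or fullDocument.
const matches = (doc) => doc != null && doc.destination === "Target1";

function applyEvent(count, event) {
  // +1 if the document matches now, -1 if it matched before the change.
  const before = matches(event.fullDocumentBeforeChange) ? 1 : 0;
  const after = matches(event.fullDocument) ? 1 : 0;
  return count + after - before;
}

let count = 2; // e.g. an initial countDocuments({destination: "Target1"})
const events = [
  { operationType: "insert", fullDocument: { _id: 3, destination: "Target1" } },
  {
    operationType: "update",
    fullDocumentBeforeChange: { _id: 1, destination: "Target1" },
    fullDocument: { _id: 1, destination: "Target2" },
  },
  { operationType: "delete", fullDocumentBeforeChange: { _id: 2, destination: "Target1" } },
  { operationType: "insert", fullDocument: { _id: 4, destination: "Target2" } },
];
for (const event of events) count = applyEvent(count, event);
console.log(count); // 1
```

In practice you would open the stream before taking the initial count (or resume from a token) so that no events fall between the two.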

How does the limit() option work in mongodb?

Let's say you have a collection of 10,000 documents and I make a find query with the option limit(50). How will MongoDB choose which 50 documents to return?
Will it auto-sort them (maybe by their creation date) or not?
Will the query return the same documents every time it is called? How does the limit option work in MongoDB?
Does MongoDB limit the documents after they are returned or as it queries them? Meaning, will MongoDB query all documents and then limit the results to 50 documents, or will it query the 50 documents only?
The first 50 documents of the result set will be returned.
If you do not sort the documents (or if the order is not well-defined, such as sorting by a field with values that occur multiple times in the result set), the order may change from one execution to the next.
Will it auto-sort them(maybe by their creation date) or not?
No.
Will the query return the same documents every time it is called?
The query may produce the same results for a while and then start producing different results if, for example, another document is inserted into the collection.
Meaning, will MongoDB query all documents and then limit the results to 50 documents, or will it query the 50 documents only?
It depends on the query. If the query can stop early, for example when there is no sort or the sort is satisfied by an index, only the documents needed to fill the limit are read from the storage engine. If a blocking sort stage is used in the query execution, all matching documents are read from storage and sorted, then the required number is returned and the rest discarded.
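To make the point about ill-defined order concrete, here is a small in-memory model of sort-then-limit. Plain arrays stand in for query results arriving in two different storage orders; sortThenLimit is a hypothetical helper. In this model JavaScript's sort is stable, so ties keep their input order; MongoDB makes no promise about tie order at all. Adding _id as a tiebreaker makes the result independent of input order.

```javascript
function sortThenLimit(docs, keys, n) {
  // keys: [[field, 1 | -1], ...], compared left to right like a sort spec.
  return [...docs]
    .sort((a, b) => {
      for (const [field, dir] of keys) {
        if (a[field] < b[field]) return -dir;
        if (a[field] > b[field]) return dir;
      }
      return 0;
    })
    .slice(0, n)
    .map((d) => d._id);
}

const docs = [
  { _id: 1, score: 10 },
  { _id: 2, score: 5 },
  { _id: 3, score: 10 },
  { _id: 4, score: 5 },
];
const reordered = [docs[3], docs[2], docs[1], docs[0]];

// Sorting by a non-unique field: the winner depends on arrival order.
console.log(sortThenLimit(docs, [["score", 1]], 1));      // [ 2 ]
console.log(sortThenLimit(reordered, [["score", 1]], 1)); // [ 4 ]
// With _id as a tiebreaker the answer is the same either way.
console.log(sortThenLimit(reordered, [["score", 1], ["_id", 1]], 1)); // [ 2 ]
```

The mongo shell equivalent of the stable version would be a sort spec such as {score: 1, _id: 1} before limit().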

How to improve the performance of this MongoDB query

I am trying to take an extract from a huge MongoDB collection.
In particular, the collection contains 2.65TB data (unzipped), i.e., 600GB data (zipped). Each document has a deep hierarchy and a couple of arrays and I want to extract some parts out of them. In this collection we have multiple documents for each customer id. Since I want to export the most active document for each customer, I need to group and take the records with the maximum timestamp field and perform some further processing on them. I need some help in forming the query for the export. I have tried to sort the documents per customer id, but this could not be achieved in an acceptable time when combined with a 'match' construct (this is needed since it is a huge collection and we try to create the export in parts). Currently the query looks like this:
db.getCollection('CEM').aggregate([
    {'$match': {'LiveFeed.customer.profile.id': 'TCAYT2RY2PF93R93JVSUGU7D3'}},
    {'$project': {'LiveFeed.customer.profile.id': 1,
                  'LiveFeed.customer.profile.products.air.flights': 1,
                  'LiveFeed.context.timestamp': 1}},
    {'$sort': {'LiveFeed.customer.profile.id': 1, 'LiveFeed.context.timestamp': 1}},
    {'$group': {'_id': '$LiveFeed.customer.profile.id',
                'products': {'$last': '$LiveFeed.customer.profile.products.air.flights'}}},
    {'$unwind': '$products'},
    {'$unwind': '$products.sources'},
    {'$project': {'_id': 0,
                  'ceid': '$_id',
                  'coupon_no': {'$ifNull': ['$products.couponId.couponNumber', '']},
                  'ticket_no': {'$ifNull': ['$products.couponId.ticketId.number', '']},
                  'pnr_id': '$products.sources.id',
                  'departure_date': '$products.segment.departure.at',
                  'departure_airport': '$products.segment.departure.code',
                  'arrival_airport': '$products.segment.arrival.code',
                  'created_date': '$products.createdAt'}}
])
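Any ideas or suggestions on how to improve this query would be very helpful. Thanks in advance!
It is difficult to answer this without knowing the indexes on your collection. One caution before touching stage 3: $group does not preserve order in its output, but $last is only meaningful when the documents entering the group are in a defined order, so without the $sort the "last" flights value would be arbitrary rather than the one with the maximum timestamp. Since stage 1 already matches a single customer id, sorting on the customer id is redundant; sorting on 'LiveFeed.context.timestamp' alone is enough.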
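To see why the order of documents entering $group matters for $last, here is a small in-memory model. groupLast is a hypothetical helper that, like $last, keeps the value from the last document seen per group key; the documents are made up and only loosely mirror the question's fields.

```javascript
function groupLast(docs, keyField, valueField) {
  // Later documents overwrite earlier ones, so the last one seen wins.
  const out = new Map();
  for (const doc of docs) out.set(doc[keyField], doc[valueField]);
  return Object.fromEntries(out);
}

const docs = [
  { customer: "A", ts: 3, flights: "newest" },
  { customer: "A", ts: 1, flights: "oldest" },
  { customer: "A", ts: 2, flights: "middle" },
];

const sorted = [...docs].sort((a, b) => a.ts - b.ts); // like {$sort: {ts: 1}}
console.log(groupLast(docs, "customer", "flights"));   // { A: 'middle' }
console.log(groupLast(sorted, "customer", "flights")); // { A: 'newest' }
```

Without the sort, the result is whichever document happens to arrive last; with an ascending timestamp sort it is the most recent one, which is what the export needs.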

FindAndUpdate first 5 documents

I am looking for a way to FindAndModify no more than 5 documents in MongoDB.
This collection is a queue which will be processed by multiple workers, so I want to do it in a single query.
While I cannot control the number of updates in the UpdateOptions parameter, is it possible to limit the number of documents matched by the filterDefinition?
Problem 1: findAndModify() can only update a single document at a time, as per the documentation. This is an inherent limit in MongoDB's implementation.
Problem 2: There is no way to update a specific number of arbitrary documents with a simple update() query of any kind. You can update one or all depending on the boolean value of your multi option, but that's it.
If you want to update up to 5 documents at a time, you're going to have to retrieve these documents first and then update them, or update them individually in a forEach() call. Either way, you'll be using something like:
db.collection.update(
    {_id: {$in: [ doc1._id, doc2._id, ... ]}},
    { ... },
    {multi: true}
);
Or you'll be using something like:
db.collection.find({ ... }).limit(5).forEach(function(doc) {
    // do something to doc
    db.collection.update({_id: doc._id}, doc);
});
Whichever approach you choose to take, it's going to be a workaround. Again, this is an inherent limitation.

MongoDB - Query embedded documents

I have a collection named Events. Each Event document has a collection of Participants as embedded documents.
Now my question: is there a way to query an Event and get only the Participants that match a condition, e.g. Age > 18?
When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside of the event document called "Over18" or something. Insert them into that subdocument (and possibly the other one if you want), and then when you query the collection, you can instruct the database to return only the "Over18" subdocument. The downside is that you store your participants in two different subdocuments and you have to work out their age before inserting. This may or may not be feasible depending on your application. If you need to be able to check arbitrary ages (e.g. sometimes it's 18 but sometimes it's 21 or 25), then this will not work.
Query the collection, retrieve the Participants subdocument, and then filter it in your application code. Despite what some people may believe, this isn't terrible, because you don't want your database to be doing too much work all the time. Offloading the computations to your application could actually benefit your database, because it can spend more time querying and less time filtering. It leads to better scalability in the long run.
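A minimal sketch of the second option, filtering in application code after fetching the event. The event object and its field names (participants, name, age) are hypothetical stand-ins for the document you would get back from the query. Because the threshold is just a function argument, this option handles arbitrary ages, which the "Over18" subdocument approach cannot.

```javascript
function participantsOver(event, minAge) {
  // Filter the embedded array client-side; the database returns it whole.
  return event.participants.filter((p) => p.age > minAge);
}

const event = {
  _id: "e1",
  participants: [
    { name: "Ann", age: 17 },
    { name: "Bob", age: 30 },
    { name: "Cy", age: 19 },
  ],
};

console.log(participantsOver(event, 18).map((p) => p.name)); // [ 'Bob', 'Cy' ]
console.log(participantsOver(event, 21).map((p) => p.name)); // [ 'Bob' ]
```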
Short answer: no. I tried to do the same a couple of months back, but mongoDB does not support it (at least in version <= 1.8). The same question has been asked in their Google Group for sure. You can either store the participants as a separate collection or get the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.
For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
db.events.aggregate([
    { $unwind: '$participants' },
    { $match: {'participants.age': {$gte: 18}} },
    { $project: {participants: 1} }
])
This will return a list of n documents, where n is the number of participants aged 18 or over, and each entry looks like this (note that the "participants" field now holds a single entry instead of an array):
{
_id: objectIdOfTheEvent,
participants: { firstName: 'only one', lastName: 'participant'}
}
It could probably even be flattened on the server to return a list of participants. See the official documentation for more information.
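To show the output shape described above, here is an in-memory model of the three stages. It is a sketch, not the aggregation engine: unwind is a hypothetical helper that, like $unwind by default, emits one copy of the event per array element and drops events with a missing or empty array. The sample events are made up.

```javascript
function unwind(docs, field) {
  // One output document per array element; missing/empty arrays yield nothing.
  return docs.flatMap((doc) => (doc[field] || []).map((item) => ({ ...doc, [field]: item })));
}

const events = [
  { _id: "e1", title: "Meetup", participants: [{ firstName: "Ann", age: 17 }, { firstName: "Bob", age: 30 }] },
  { _id: "e2", title: "Talk", participants: [{ firstName: "Cy", age: 18 }] },
  { _id: "e3", title: "Empty", participants: [] },
];

const result = unwind(events, "participants")
  .filter((doc) => doc.participants.age >= 18)              // { $match: {'participants.age': {$gte: 18}} }
  .map(({ _id, participants }) => ({ _id, participants }));  // { $project: {participants: 1} }

console.log(result.length); // 2
```

Each result keeps the event's _id and a single participant object, matching the entry shape shown above.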