I'm doing multiple updates in a single bulk. Note: they are updates, not upserts. The problem doesn't allow it. Is there a way to find out which commands form the bulk matched (or didn't)?
From what I saw in the manual, you can only find the number of matches from BulkWriteResult, not which one matched, but I thought I'd ask anyway. Thanks for the help.
The BulkWriteResult doesn't contain this information and, as of MongoDB 2.6.3, there's no way to obtain it from the execution of the bulk operation. Of course, since you specify the criteria that determine which documents are updated, you can find out which documents were updated from the results of a find query with the same criteria, as long as the documents don't change in between. Keep in mind that during a multi-stage bulk operation, earlier updates may change which documents a later update matches.
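A minimal sketch of that workaround, assuming a hypothetical orders collection and made-up criteria (the collection, fields and values are only illustrative):

var criteria = { status: "pending", region: "EU" };
// Capture the matching _ids with the same criteria just before executing the bulk.
var matchedIds = db.orders.find(criteria, { _id: 1 }).toArray();

var bulk = db.orders.initializeUnorderedBulkOp();
bulk.find(criteria).update({ $set: { flagged: true } });
var result = bulk.execute();

print(result.nMatched);   // only a count -- matchedIds holds the actual documents

This is only as good as the gap between the find and the execute: if another writer changes matching documents in that window, the captured list will be off.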
Say I have a mongo $or query, something like { $or: [query1, query2, ... queryN] }, where each embedded query could be complex. Upon executing the query, a set of documents matching one or more of the embedded queries is returned. I would like to know which of the N embedded queries was satisfied for each document in the returned set, perhaps by adding a new field that I specify, e.g. marks, into each returned document that would hold a list of the indexes of whichever of the queries was satisfied. I need this information to indicate how each document was identified in my application's interface.
I realize I could inspect the set once it is returned and determine the queries that were satisfied, but these queries could be arbitrarily complex and expensive to inspect - besides, this must have already been done inside mongo itself while doing the search.
I also realize I could run each of the N queries sequentially and then mark and merge the results into a growing set, but I want to save that overhead by running a single query instead of N queries.
And I realize that Mongo will certainly stop once the first satisfying query is found for each document, so I may not be able to get the complete set, but then I would at least like some assurance that the queries are executed in a certain order, say 1...N, and each document could be marked with its first satisfying index.
Does anyone know if there's a mechanism in Mongo that can do this?
You can use aggregation.
Use $addFields to add a new field for each query.
You could either $match first and then add the fields, or add the fields first and then $match on the added fields.
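A minimal sketch, assuming a hypothetical items collection and two simple sub-queries; each sub-query's condition is repeated inside $addFields as an aggregation expression to record its index (note that $addFields requires MongoDB 3.4 or newer):

db.items.aggregate([
  // Keep only documents matching at least one sub-query.
  { $match: { $or: [ { status: "active" }, { qty: { $gt: 100 } } ] } },
  // Re-evaluate each sub-query to record which ones this document satisfied.
  { $addFields: {
      marks: {
        $concatArrays: [
          { $cond: [ { $eq: [ "$status", "active" ] }, [ 0 ], [ ] ] },
          { $cond: [ { $gt: [ "$qty", 100 ] }, [ 1 ], [ ] ] }
        ]
      }
  } }
])

This only works if each sub-query can be expressed as an aggregation expression.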
We're using the replaceOne operation on the collection, with the "filter" clause used to check that the document about to be replaced has a certain field set to a specific value.
All is good, but since the documents in the collection are updated by multiple parties working in parallel, I'd like to know whether it is guaranteed that, between the moment the filter matches a document and the moment that very document gets replaced, it is impossible for that document to be replaced by a concurrently running operation. I failed to find any statement on this in the MongoDB docs.
We're using MongoDB 3.6 (please don't ask why) so using transactions is out of the question.
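For reference, a minimal sketch of the kind of guarded replace being described, assuming a hypothetical configs collection and a version field as the value being checked (docId, expectedVersion and newPayload stand in for values read earlier):

// Only replace the document if the field we read earlier still has the expected value.
db.configs.replaceOne(
  { _id: docId, version: expectedVersion },
  { _id: docId, version: expectedVersion + 1, payload: newPayload }
)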
I have a Python application that is iteratively going through every document in a MongoDB (3.0.2) collection (typically between 10K and 1M documents) and adding new fields (probably doubling or tripling the number of fields in the document).
My initial thought was that I would upsert the entire revised documents (using PyMongo) - now I'm questioning that:
Given that the revised documents are significantly bigger, should I be inserting only the new fields, or just replacing the document?
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
This is actually a great question that can be solved a few different ways, depending on how you are managing your data.
If you are upserting additional fields, does this mean your data gains additional fields at a later point in time, with the only change being the addition of those fields? If so, you could set a TTL on your documents so that the old ones drop off over time. Keep in mind that if you do this, you will want to sort your results by descending _id so that the most recent additions are selected before the older ones.
The benefit of doing it this way is that you are continually writing data as opposed to seeking and updating data, so it is faster.
In regards to upserts vs. bulk inserts: bulk inserts are always faster than upserts, since an upsert first has to find the original document.
Given that the revised documents are significantly bigger, should I be inserting only the new fields, or just replacing the document?
You really need to understand your data fully to determine what is best, but if the only change to the data is additional fields, or changes that only need to be considered from that point forward, then bulk inserting and setting a TTL on your older data is the better method from the standpoint of write operations, as opposed to seek, find and update. When using this method you will want to use db.document.find_one() as opposed to db.document.find() so that only your current record is returned.
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
Bulk inserts will be faster than inserting each document sequentially.
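A minimal sketch of that append-plus-TTL pattern in the mongo shell (the events collection, recordKey and updatedAt fields are hypothetical stand-ins for your own schema):

// Expire superseded documents automatically 24 hours after their updatedAt value.
db.events.createIndex({ updatedAt: 1 }, { expireAfterSeconds: 86400 })

// Append revised documents as new inserts instead of seeking and updating in place.
db.events.insert([
  { recordKey: "abc", updatedAt: new Date(), existingField: "x", newField1: 1 },
  { recordKey: "def", updatedAt: new Date(), existingField: "y", newField1: 2 }
])

// Read only the newest version of a record; the default _id index supports the descending sort.
db.events.find({ recordKey: "abc" }).sort({ _id: -1 }).limit(1)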
What I mean by global search is searching for documents in specified collections, for example, searching for a name in both the User and Organization collections and returning both user and organization documents that match the criteria.
Is it possible to simply copy the documents in User and Organization into another collection and do a search in it?
No, it is not possible to do a multi-collection search automatically. There's no reason however that you couldn't perform the same query on multiple collections and combine the results.
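A minimal sketch of that approach in the shell, assuming hypothetical User and Organization collections that each have a name field; each hit is tagged with the collection it came from before the result sets are concatenated:

var criteria = { name: /smith/i };
var results = db.User.find(criteria).toArray().map(function (doc) {
  doc._source = "User";            // remember which collection the hit came from
  return doc;
}).concat(db.Organization.find(criteria).toArray().map(function (doc) {
  doc._source = "Organization";
  return doc;
}));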
While you could duplicate the data into another collection for query purposes, if you need a guarantee that the source collections' values match the "index" collection identically, you'll need to implement your own multi-phase transaction (example), as MongoDB doesn't have a multi-collection atomic commit. Or, you can accept that the "index" collection may be out of sync; it could, of course, be periodically updated through custom code. Further, it means your working set has increased, as you're storing the data twice. Also, if you then need to grab data from the individual collections (to fetch more of the source document), you've likely gained nothing and made things worse compared to doing multiple queries in the first place.
You could store related documents in the same collection and take advantage of the built-in indexing. This comes with the caveat that, with documents of different types now mixed together, you may find it more challenging to build efficient MongoDB indexes. Every changed or new document must go through the indexing pipeline, which may introduce significant overhead.
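A sketch of that single-collection alternative, assuming a hypothetical searchables collection with a type discriminator field:

// Index the searched field once, across both entity types.
db.searchables.createIndex({ name: 1 })

db.searchables.insert([
  { type: "user",         name: "Jane Smith",  userId: 1 },
  { type: "organization", name: "Smith & Co.", orgId: 7 }
])

// A single query returns matching documents of both types.
db.searchables.find({ name: /smith/i })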
Without knowing your requirements more deeply: if it's only a few collections, I'd just do multiple searches. The second-best option would be to combine the documents into a single collection. Copying the data would be my last choice.
Is there a way to update (or delete) many documents matching a certain criteria and get the list of IDs of the actually updated/deleted documents (or some other fields of those documents)? I cannot simply query the documents matching my criteria beforehand because I need some kind of atomicity for this operation. And I can't use findAndModify because it can only process one document at a time, which is too slow because of the round-trips. Suggestions?
MongoDB only supports atomic operations on a single document.
http://www.mongodb.org/display/DOCS/Atomic+Operations
The only way to do this is to do what you said you didn't want to:
First, query the collection to find the _ids matching your criteria:
db.things.find({"name":"john"}, {_id:1});
Then, use the same query to remove:
db.things.remove({"name":"john"});
Not ideal, and not atomic, but it's as good as you're going to get in this scenario.
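If the worry is that the criteria could match a different set of documents between the two calls, a slightly safer (though still non-atomic) variant is to remove by the _ids you just captured:

// Capture the _ids first, then remove exactly those documents.
var ids = db.things.find({"name":"john"}, {_id:1}).toArray().map(function (d) { return d._id; });
db.things.remove({_id: {$in: ids}});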