I am trying to update many documents in a single query. How can I do this so that I don't have to loop over a list and update each document individually?
You can create an array of the operations you want and pass it to bulkWrite (view the docs here).
That way you don't need to make a lot of requests to get all the updates done. You can choose whether you want the operations to be ordered or unordered, and each type of operation has its own behavior. You can also choose which level of write concern you want.
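For example, with pyMongo (which comes up later in this thread) a bulk update can be built as a list of UpdateOne operations and sent in one bulk_write call. This is only a sketch; the collection name, field name and id list are placeholders, not from the question:

    # Minimal pyMongo sketch: many updates in a single round-trip via bulk_write.
    from pymongo import MongoClient, UpdateOne

    client = MongoClient("mongodb://localhost:27017")
    coll = client["test"]["things"]                        # placeholder collection

    ids_to_update = [1, 2, 3]                              # placeholder ids
    ops = [
        UpdateOne({"_id": doc_id}, {"$set": {"status": "processed"}})
        for doc_id in ids_to_update
    ]

    result = coll.bulk_write(ops, ordered=False)           # unordered: keep going on per-op errors
    print(result.modified_count)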
Related
I am trying to replace all documents with new values in bulk.
Example
We have 500k docs in the db, and we have the same 500k docs with updated props. Now we need to update the old data.
The idea was to use insertMany with the lean option into a new collection and then remove the old one, to reduce the number of reads/writes.
The question: is there something easier for such a scenario?
Maybe even import/export is better in this case?
PS
Model.updateMany() takes a filter, but we do not need a filter here, because we know for sure that every document has updated properties; we just need to replace them.
There are different options.
1. insertMany to insert many documents.
2. updateMany with upsert will insert and update documents.
3. replace will replace the matching documents.
If your new doc has an altogether new set of fields, you could use replace.
If only the values change and new fields are added, then you could use updateMany.
On the other hand, there is something called bulkWrite; explore that as well.
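If the goal is to overwrite whole documents rather than patch fields, the same bulkWrite idea works with replace operations. A rough pyMongo sketch (the data and names are made up; it assumes the updated docs keep their original _id values):

    # Hypothetical sketch: replace each old document with its updated version in one bulk_write.
    from pymongo import MongoClient, ReplaceOne

    client = MongoClient("mongodb://localhost:27017")
    coll = client["mydb"]["docs"]                          # placeholder collection

    updated_docs = [
        {"_id": 1, "name": "a", "prop": "new value"},      # placeholder documents
        {"_id": 2, "name": "b", "prop": "other value"},
    ]

    ops = [ReplaceOne({"_id": d["_id"]}, d, upsert=True) for d in updated_docs]
    coll.bulk_write(ops, ordered=False)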
Edit:
The best option for this use case is creating a new collection every day; 500k documents is a number that is easy to handle. Then swap the new and old collections.
Otherwise you could use the replace option.
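One possible shape of the new-collection-and-swap approach in pyMongo (purely illustrative; the collection names are made up): bulk-insert the updated documents into a staging collection, then rename it over the old one.

    # Sketch of "build a new collection, then swap it in"; names are placeholders.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    updated_docs = [{"_id": i, "prop": "updated"} for i in range(3)]   # placeholder data
    db["docs_staging"].insert_many(updated_docs, ordered=False)

    # renameCollection with dropTarget swaps the staging collection in and drops the old one.
    db["docs_staging"].rename("docs", dropTarget=True)

Note that indexes are per collection, so any indexes the old collection had would need to be created on the staging collection before (or after) the swap.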
I have a Python application that is iteratively going through every document in a MongoDB (3.0.2) collection (typically between 10K and 1M documents), and adding new fields (probably doubling/tripling the number of fields in the document).
My initial thought was that I would upsert the entire revised documents (using pyMongo) - now I'm questioning that:
Given that the revised documents are significantly bigger should I be inserting only the new fields, or just replacing the document?
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
This is actually a great question that can be solved a few different ways, depending on how you are managing your data.
If you are upserting additional fields, does that mean your data gains additional fields at a later point in time, with the only change being those new fields? If so, you could keep writing the revised versions as new documents and set a TTL on your documents so that the old ones drop off over time. Keep in mind that if you do this you will want to sort your results by descending _id so that the most recent additions are selected before the older ones.
The benefit of doing it this way is that you are continually writing data, as opposed to seeking and updating data, so it is faster.
In regards to upserts vs. bulk inserts: bulk inserts are always faster than upserts, since upserting requires finding the original document first.
Given that the revised documents are significantly bigger should I be inserting only the new fields, or just replacing the document?
You really need to understand your data fully to determine what is best, but if the only change to the data is additional fields, or changes that only need to be considered from that point forward, then bulk inserting and setting a TTL on your older data is the better method from the standpoint of write operations, as opposed to seek, find and update. When using this method you will want to use db.document.find_one() as opposed to db.document.find(), so that only your current record is returned.
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
Bulk inserts will be faster than inserting each document one at a time.
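To make the write-instead-of-update idea concrete, here is a rough pyMongo sketch; the collection name, the "createdAt"/"docKey" fields and the 24-hour expiry are all assumptions for illustration, not something from the question:

    # Illustrative only: append revised documents instead of updating in place,
    # and let a TTL index expire the old versions.
    import datetime
    from pymongo import MongoClient, DESCENDING

    client = MongoClient("mongodb://localhost:27017")
    coll = client["mydb"]["documents"]

    # TTL index: documents are removed ~24h after their "createdAt" value (placeholder policy).
    coll.create_index("createdAt", expireAfterSeconds=24 * 3600)

    revised = [{
        "docKey": "abc",                                   # placeholder logical key
        "field1": 1,
        "newField": "x",
        "createdAt": datetime.datetime.utcnow(),
    }]
    coll.insert_many(revised, ordered=False)               # one bulk write, no seek-and-update

    # The newest version of a logical document: sort by descending _id
    # (the default _id index supports this sort).
    current = coll.find_one({"docKey": "abc"}, sort=[("_id", DESCENDING)])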
What I mean by global search is searching for documents across specified collections; for example, searching for a name in both the User and Organization collections and returning both user and organization documents that match the criteria.
Is it possible to simply copy the documents in User and Organization into another collection and do a search in it?
No, it is not possible to do a multi-collection search automatically. There's no reason however that you couldn't perform the same query on multiple collections and combine the results.
While you could duplicate the data into another collection for query purposes, if you need to be guaranteed that the source collection's values matches identically with the "index" collection, you'll need to implement your own multi-phase transaction (example) as MongoDb doesn't have a multi-collection atomic commit. Or, you can accept the fact that the "index" table may be out of sync. Of course, it could be periodically updated through custom code. Further, it means your working set has increased as you're double storing data. Also, if you then need to grab data from individual collections (to grab more of the source document), you've likely not gained anything and made things worse when compared to doing multiple queries in the first place.
You could store related documents in the same collection and take advantage of the built-in indexing offered. Of course, this comes with the caveat that with documents of different types now mixed in one collection, you may find it more challenging to build MongoDb indexes that are efficient. Every changed/new document must go through the indexing pipeline, which may introduce significant overhead.
Without knowing your requirements in more depth: if it's only a few collections, I'd just do multiple searches. If not, the second best option would be to combine documents into a single collection. The last choice would be to copy the data.
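A simple version of the "same query on multiple collections" approach in pyMongo could look like this (database, collection and field names are only examples):

    # Illustrative "global search": run the same query against each collection and merge the hits.
    import re
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    def global_search(term):
        query = {"name": re.compile("^" + re.escape(term), re.IGNORECASE)}   # placeholder query
        results = []
        for coll_name in ("users", "organizations"):        # placeholder collection names
            for doc in db[coll_name].find(query):
                doc["_collection"] = coll_name               # remember which collection it came from
                results.append(doc)
        return results

    matches = global_search("acme")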
Is there a way to update (or delete) many documents matching a certain criteria and get the list of IDs of actually updated/deleted documents (or some other fields of those documents)? I cannot simply query the documents matching my criteria beforehand because I need kinda atomicity for this operation. And I can't use findAndModify because it can only process one document at a time which is too slow because of round-trips. Suggestions?
MongoDB only supports atomic operations on a single document.
http://www.mongodb.org/display/DOCS/Atomic+Operations
The only way to do this is to do what you said you didn't want to:
First, query the collection to collect the _ids of the matching documents:
var ids = db.things.find({"name":"john"}, {_id:1}).map(function(doc) { return doc._id; });
Then remove exactly those documents by _id:
db.things.remove({"_id": {"$in": ids}});
Not ideal, and not atomic, but it's as good as you're going to get in this scenario.
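For completeness, the same two-step pattern from Python with pyMongo; deleting by the collected _ids at least guarantees the returned id list matches what was actually removed, though it is still two round-trips and not atomic (collection and field names are placeholders):

    # Collect the _ids first, then delete exactly those documents.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    coll = client["test"]["things"]

    ids = [doc["_id"] for doc in coll.find({"name": "john"}, {"_id": 1})]
    result = coll.delete_many({"_id": {"$in": ids}})
    print(len(ids), result.deleted_count)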
I'm looking for the equivalent of this sort of SQL query.
SELECT field, count(*) AS counter FROM table GROUP BY field ORDER BY counter DESC
What's the best way to achieve this?
Thanks
Use Map-Reduce. Map each document by emitting the key and a value 1, then aggregate them using a simple reduce operation. See http://www.mongodb.org/display/DOCS/MapReduce
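As a rough sketch of that map/reduce run from pyMongo via the mapReduce command (the collection name "table" and field name "field" are taken from the SQL example; note that map/reduce is deprecated on modern MongoDB servers in favour of the aggregation framework):

    # Emit (field, 1) per document, sum in reduce, then sort the counts descending.
    from bson.code import Code
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    mapper = Code("function () { emit(this.field, 1); }")
    reducer = Code("""
        function (key, values) {
            var total = 0;
            for (var i = 0; i < values.length; i++) { total += values[i]; }
            return total;
        }
    """)

    result = db.command("mapReduce", "table", map=mapper, reduce=reducer, out={"inline": 1})
    for row in sorted(result["results"], key=lambda r: r["value"], reverse=True):
        print(row["_id"], row["value"])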
I'd handle aggregation queries by keeping track of the respective counts separately, i.e. in their own collection. This way, you can simply query the "most frequently occurring" collection. Downside: you need to perform another write whenever the data changes.
Of course, you could also update that collection from time to time using Map/Reduce. This depends a bit on how accurate the information must be and how often it changes.
Make sure, however, not to call the Map/Reduce operation very often: it is not meant to be used in an interactive fashion (i.e. not in every page view) but rather sparingly, in an offline process that updates the counts every hour or so. Hence, if your counts change very quickly, use a counters collection instead.
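Keeping such a counters collection up to date is usually just an upsert with $inc on every write. A minimal pyMongo sketch (collection and field names are made up):

    # Maintain per-value counts in a separate "counters" collection (placeholder names).
    from pymongo import MongoClient, DESCENDING

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    def record(value):
        db["table"].insert_one({"field": value})
        # Upsert: create the counter document the first time, increment it afterwards.
        db["counters"].update_one({"_id": value}, {"$inc": {"count": 1}}, upsert=True)

    record("foo"); record("foo"); record("bar")

    # "Most frequently occurring" is now a simple sort on the counters collection.
    for doc in db["counters"].find().sort("count", DESCENDING):
        print(doc["_id"], doc["count"])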