According to the documentation, the $out stage in MongoDB's aggregation framework wipes any existing data in the target collection before writing.
Is there any way to force it not to remove existing documents but only add to the collection?
No, the aggregation framework doesn't have such a feature. You can either write a map-reduce job, which, if I remember correctly, can append to a collection, or have the aggregation return a cursor that you iterate over to update your collection.
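The cursor approach could look roughly like the sketch below. It assumes mongosh, that the aggregation results carry a meaningful _id, and that replacing a document with the same _id is acceptable. toUpsertOps and the collection names source and target are made up for illustration.

```javascript
// Sketch: turn aggregation results into upsert operations so that existing
// documents in the target collection are kept and only documents with a
// matching _id are replaced. toUpsertOps is a hypothetical helper.
function toUpsertOps(docs) {
  return docs.map((doc) => ({
    replaceOne: {
      filter: { _id: doc._id },
      replacement: doc,
      upsert: true, // insert when no document with this _id exists yet
    },
  }));
}

// Usage in mongosh (collection names and pipeline are placeholders):
//   const results = db.source.aggregate(pipeline).toArray();
//   if (results.length > 0) db.target.bulkWrite(toUpsertOps(results));
```

The length check is there because bulkWrite rejects an empty list of operations. For large result sets you would probably want to iterate the cursor and send the operations in batches instead of calling toArray().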
Related
I am using MongoDB's aggregation pipeline to generate a new collection B containing aggregated results from collection A. For this purpose I use the $out stage. Every time I run the aggregation pipeline, new documents might be added, some might be updated and some removed.
I would now like to have a change stream over the aggregated collection B in order to be notified when the aggregation generates different results from the previous one (i.e. at least one insertion/update/remove).
However, if I use the $out stage, the collection is recreated on every execution and I get a rename and invalidate change and then the stream is closed. I can use start_after with a resume token to open the stream again but I am not notified of the changes (rename and invalidate).
I tried using $merge to avoid recreating the collection. The change stream is working as I expect but I can no longer delete old documents from collection B.
Is there a way to make my use case work (i.e. the result of the aggregation pipeline is the new content of the collection + get change notification for insert/remove/update from previous collection content)?
$out does not diff the new result set vs the previous contents of a collection. It drops the previous contents and inserts new documents.
Therefore there is nothing in MongoDB that knows which documents were added to B and which were removed. I don't see how you would be able to get this information via a change stream on B.
You are going to need to come up with another solution, I'm afraid.
I would like to find out the most efficient way to duplicate documents in MongoDB, given that I want to take a bunch of documents from an existing collection, update one of their fields, unset _id to generate a new one, and push them back into the collection to create duplicates.
This is typically to create a "branching" feature in MongoDB, allowing users to modify data in two separate branches at the same time.
I've tried the following things:
In my server, get data chunks in multiple threads, modify data, and insert modified data with a new _id in the base
This basically works but performance is not super good (~ 20s for 1 million elements).
In the upcoming MongoDB version (tested on version 4.1.10), use the new $out aggregation mechanism to insert into the same collection
This does not seem to work and raises the error "errmsg" : "$out with mode insertDocuments is not supported when the output collection is the same as the aggregation collection"
Any ideas how to be faster than the first approach? Thanks!
I tried to create a single aggregation request but without any luck - I need to split it. I think I can do the following:
First aggregation request will filter/transform/sort/limit documents and save result to temporary collection by using $out
After that, I'll execute 2-3 aggregation requests on temporary collection
Finally, I'll delete temporary collection
By saving data to a temporary collection, I'll skip filter/sort/limit stages on subsequent aggregation requests.
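As a rough mongosh sketch of those three steps (runViaTempCollection and all of its parameter names are hypothetical, and it assumes nothing else writes to the temporary collection while it runs):

```javascript
// Sketch of the temporary-collection workflow, written for mongosh where
// aggregate(...).toArray() runs synchronously.
function runViaTempCollection(db, sourceName, sharedStages, followUps, tempName) {
  // Step 1: run the shared filter/transform/sort/limit stages once and
  // materialize the result with $out, which must be the last stage.
  db.getCollection(sourceName)
    .aggregate([...sharedStages, { $out: tempName }])
    .toArray();
  try {
    // Step 2: run each follow-up pipeline against the temporary collection.
    return followUps.map((stages) =>
      db.getCollection(tempName).aggregate(stages).toArray()
    );
  } finally {
    // Step 3: drop the temporary collection even if a follow-up fails.
    db.getCollection(tempName).drop();
  }
}
```

The main overhead is writing the intermediate result to disk and dropping it afterwards. If several clients can run this at the same time, each run would need its own temporary collection name.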
Is it ok? What's the overhead of this approach? What's the main usage of $out operator?
Yes; MongoDB does this itself when it runs map-reduce aggregations: Temporary Collection in MongoDB
It would be great to get specifics as to what you are trying to accomplish as it may be possible to do in a single aggregation or map-reduce operation.
I have a MongoDB collection that stores raw information coming from an app. I wrote a multi-pipeline aggregation method to generate more meaningful data from the raw documents.
Using the $out operator in my aggregation function I store the aggregation results in another collection.
I would like to be able to either delete raw documents that were already aggregated, or somehow mark those documents so I know not to aggregate again.
I am worried that I cannot guarantee I won't miss some documents that are created in between, or create duplicate aggregated documents.
Is there a way to achieve this?
I want to aggregate and insert the results into an existing collection, without deleting that collection. The documentation seems to suggest that this isn't directly possible. I find that hard to believe.
The map-reduce functionality has 'output modes', including 'merge', which does what I want. I'm looking for the equivalent for aggregation.
The new $out aggregation stage supports inserting into a collection, but it replaces the collection rather than updating it. If I did this I would (I think) have to run another map-reduce to merge this into another collection, which seems inefficient.
Am I missing something or is the functionality just missing from the aggregation feature?
I used the output from aggregation to insert/merge to collection:
db.coll2.insertMany(
  db.coll1.aggregate([]).toArray()
)
Reading the documentation answers this question quite precisely. At the moment, MongoDB is not able to do what you want.
The $out operation creates a new collection in the current database if one does not already exist. The collection is not visible until the aggregation completes. If the aggregation fails, MongoDB does not create the collection.
If the collection specified by the $out operation already exists, then upon completion of aggregation the $out stage atomically replaces the existing collection with the new results collection. The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the previous collection.
For anyone coming to this more recently: from version 4.2 you can do this with the $merge stage in an aggregation pipeline. It needs to be the last stage in the pipeline.
{ $merge: { into: "myOutput", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
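If the goal is strictly to add documents without touching ones that already exist, $merge also accepts whenMatched: "keepExisting". A small sketch that builds either variant (mergeStage is a hypothetical helper, and "myOutput" is just the placeholder name from above):

```javascript
// Sketch: build a $merge stage that either overwrites matching documents
// or leaves them untouched and only inserts new ones.
function mergeStage(into, overwrite) {
  return {
    $merge: {
      into: into,
      on: "_id",
      // "replace" overwrites a matching document; "keepExisting" leaves it as is.
      whenMatched: overwrite ? "replace" : "keepExisting",
      whenNotMatched: "insert",
    },
  };
}

// Usage in mongosh (pipeline stages are placeholders):
//   db.coll1.aggregate([ /* ...stages... */ mergeStage("myOutput", false) ]);
```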
If you're not stuck on using the aggregation operators, you could do an incremental map-reduce on the collection. Map-reduce allows you to merge results into an existing collection.
See documentation below:
http://docs.mongodb.org/manual/tutorial/perform-incremental-map-reduce/