MongoDB: keeping track of aggregated documents

I have a MongoDB collection that stores raw information coming from an app. I wrote a multi-stage aggregation pipeline to generate more meaningful data from the raw documents.
Using the $out operator in my aggregation function I store the aggregation results in another collection.
I would like to be able to either delete raw documents that were already aggregated, or somehow mark those documents so I know not to aggregate again.
I am worried that I cannot guarantee that I won't miss documents created in the meantime, or create duplicate aggregated documents.
Is there a way to achieve this?
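One possible approach, sketched here under assumptions rather than offered as a definitive answer: first stamp every not-yet-processed raw document with a batch id, then aggregate only the documents carrying that id. Documents inserted after the stamping step do not have the id, so they are left for the next run. The field names `batchId`, `userId`, `value` and `total` are hypothetical, as are the collection names `raw` and `summaries`. The `$merge` stage assumes MongoDB 4.2 or later; unlike `$out`, it adds to the target collection instead of replacing it.

```javascript
// Sketch only: all field and collection names are hypothetical.

// Filter and update that claim every unprocessed raw document for one batch.
function buildClaim(batchId) {
  return {
    filter: { batchId: { $exists: false } },
    update: { $set: { batchId: batchId } },
  };
}

// Pipeline that aggregates only the documents claimed by this batch.
// The batch id is part of the output _id, so re-running the same batch
// replaces its previous results instead of duplicating them.
function buildPipeline(batchId, targetCollection) {
  return [
    { $match: { batchId: batchId } },
    {
      $group: {
        _id: { userId: "$userId", batchId: "$batchId" },
        total: { $sum: "$value" },
      },
    },
    {
      $merge: {
        into: targetCollection,
        whenMatched: "replace",
        whenNotMatched: "insert",
      },
    },
  ];
}

// Usage in the mongo shell:
//   const batchId = new ObjectId();
//   const claim = buildClaim(batchId);
//   db.raw.updateMany(claim.filter, claim.update);
//   db.raw.aggregate(buildPipeline(batchId, "summaries"));
```

Once a batch has been aggregated, the claimed raw documents can be deleted with a filter on the same `batchId`, or simply kept, since the claim filter skips them on later runs.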

Related

How to query documents from an entire MongoDB database containing multiple collections at once (Pymongo)

I have a DB containing thousands of collections, each collection containing documents.
How would I query all these documents from all the collections at once?
I have considered using the $lookup aggregation method but from my knowledge, it only combines an additional collection at once, whereas I have thousands of collections that I want to combine and query altogether.
Is there a way to achieve this?
MongoDB operations work on a single collection at a time. As you point out, $lookup can perform a cross-collection lookup, but it won't help you with a large number of collections.
Re-design your data model.
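Until the data model is changed, the only option is to run the query against each collection in turn and combine the results on the client. A minimal sketch in mongo shell style follows; `findInAllCollections` is a hypothetical helper and the filter in the usage comment is an invented example. With thousands of collections this means thousands of separate queries, which is why a redesign is the better fix.

```javascript
// Sketch only: runs the same filter against every collection, one at a time,
// and tags each result with the collection it came from.
// `db` is assumed to behave like the mongo shell's `db` object.
function findInAllCollections(db, filter) {
  const results = [];
  for (const name of db.getCollectionNames()) {
    for (const doc of db.getCollection(name).find(filter).toArray()) {
      results.push({ collection: name, doc: doc });
    }
  }
  return results;
}

// Usage in the mongo shell (hypothetical filter):
//   findInAllCollections(db, { status: "active" });
```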

Efficiently duplicate documents in MongoDB

I would like to find the most efficient way to duplicate documents in MongoDB: I want to take a set of documents from an existing collection, update one of their fields, unset _id so that a new one is generated, and insert them back into the collection as duplicates.
This is typically to create a "branching" feature in MongoDB, allowing users to modify data in two separate branches at the same time.
I've tried the following things:
On my server, fetch chunks of data in multiple threads, modify them, and insert the modified documents with a new _id into the database.
This basically works, but the performance is not great (~20 s for 1 million documents).
In the upcoming MongoDB version (tested on 4.1.10), use the new $out aggregation mechanism to insert into the same collection.
This does not seem to work and raises the error "errmsg" : "$out with mode insertDocuments is not supported when the output collection is the same as the aggregation collection"
Any ideas how to be faster than the first approach? Thanks!
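One server-side alternative to try is the `$merge` stage, which, as far as I know, may write into the collection being aggregated from MongoDB 4.4 onward. The sketch below only builds the pipeline. The `branch` field, the collection name `docs` and the branch names are hypothetical, and whether this beats the threaded approach would have to be measured.

```javascript
// Sketch only: `branch` is a hypothetical field name.
// Assumes MongoDB 4.4+, where $merge may target the collection being read.
function buildBranchCopyPipeline(collectionName, fromBranch, toBranch) {
  if (fromBranch === toBranch) {
    // Copies must not match the $match stage below, otherwise the
    // aggregation could keep picking up its own output.
    throw new Error("source and target branch must differ");
  }
  return [
    // Select only the source branch.
    { $match: { branch: fromBranch } },
    // Re-label the copies with the new branch.
    { $set: { branch: toBranch } },
    // Drop _id so that a fresh one is generated for each inserted copy.
    { $unset: "_id" },
    {
      $merge: {
        into: collectionName,
        whenMatched: "fail",
        whenNotMatched: "insert",
      },
    },
  ];
}

// Usage in the mongo shell:
//   db.docs.aggregate(buildBranchCopyPipeline("docs", "master", "feature"));
```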

MongoDB: Recommended schema for search in trees

Currently I have a tree of varying depth that contains users' documents.
John\folder1\sub-folder2...\document
Peter\folder1...\document
But as far as I can see, Mongo does not support indexes in nested documents.
I tried to de-normalize my DB to User\documents with children ids. But it seems a search would scan the whole collection, not only the documents of the given app user.
Should I create a collection for every app user?
What is a better solution that uses Mongo's built-in aggregation methods?
Use a single collection; do not create a collection for every user.
With aggregation, the first stage should be a $match on userId, which filters the documents down to that user before any other aggregation operation runs.
Aggregation in MongoDB is a pipeline: documents flow from one stage to the next.
So if you match on userId first, only those documents are selected, and the next stage receives only the documents that match the userId. Your aggregation therefore stays fast, especially if userId is indexed.
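As an illustration, a pipeline of this shape could look like the sketch below. The `folder` field, the collection name `documents` and the grouping itself are hypothetical; only the leading `$match` on `userId` comes from the answer above.

```javascript
// Sketch only: `folder` and the grouping are hypothetical examples.
function buildUserPipeline(userId) {
  return [
    // First stage: keep only this user's documents.
    { $match: { userId: userId } },
    // Later stages see only the documents that passed the $match.
    { $group: { _id: "$folder", documents: { $sum: 1 } } },
    { $sort: { _id: 1 } },
  ];
}

// Usage in the mongo shell:
//   db.documents.createIndex({ userId: 1 });
//   db.documents.aggregate(buildUserPipeline("john"));
```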

Swift firestore get more than one collection's documents at same time?

How can I get the documents of more than one collection?
My Firebase tree looks like this:
"tahminler/'useruid'/'date as like (5.3.2018 or 6.3.2018)'/'documentID'/'Document data'"
So how can I fetch the documents of all the date collections in one function?
Firestore doesn't have a way to make a single query across multiple collections. You will have to query each collection individually, then merge the results in your client code.
If it's important to be able to make a single query for everything you need, consider restructuring or duplicating your data to suit that query.
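The query-then-merge step can be sketched as follows. This is written in JavaScript rather than Swift and assumes a Firestore handle `db` with the namespaced `db.collection(path).get()` API; `mergeSnapshots` and `fetchAll` are hypothetical helper names, and the list of date collections has to be known to the client in advance.

```javascript
// Flattens several query snapshots into one array of plain objects.
// A stored field named `id` would overwrite the document id added here.
function mergeSnapshots(snapshots) {
  const merged = [];
  for (const snapshot of snapshots) {
    for (const doc of snapshot.docs) {
      merged.push({ id: doc.id, ...doc.data() });
    }
  }
  return merged;
}

// Queries each collection path individually, then merges on the client.
async function fetchAll(db, paths) {
  const snapshots = await Promise.all(
    paths.map((path) => db.collection(path).get())
  );
  return mergeSnapshots(snapshots);
}

// Usage (uid is the signed-in user's id):
//   fetchAll(db, ["tahminler/" + uid + "/5.3.2018",
//                 "tahminler/" + uid + "/6.3.2018"]).then(console.log);
```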

List of updated documents

Is there a way to update (or delete) many documents matching certain criteria and get the list of IDs of the documents that were actually updated or deleted (or some other fields of those documents)? I cannot simply query the documents matching my criteria beforehand, because I need some kind of atomicity for this operation. And I can't use findAndModify, because it processes only one document at a time, which is too slow because of the round-trips. Suggestions?
MongoDB only supports atomic operations on a single document.
http://www.mongodb.org/display/DOCS/Atomic+Operations
The only way to do this is to do what you said you didn't want to:
First query the collection to find the ids of the documents matching the query:
db.things.find({"name":"john"}, {_id:1});
Then, use the same query to remove:
db.things.remove({"name":"john"});
Not ideal, and not atomic, but it's as good as you're going to get in this scenario.
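The two steps can be wrapped in one helper that also narrows the delete to the ids found in the first step. This is only a sketch: `removeAndReport` is a hypothetical name, `collection` is assumed to behave like a mongo shell collection, and the operation is still not atomic, so the reported ids are candidates and `deletedCount` may be lower if documents changed in between.

```javascript
// Sketch only: two round-trips, not atomic.
function removeAndReport(collection, criteria) {
  // Step 1: collect the _ids of the documents that currently match.
  const ids = collection
    .find(criteria, { _id: 1 })
    .toArray()
    .map((doc) => doc._id);

  // Step 2: delete only those documents, re-checking the criteria so that a
  // document modified in between is not deleted by mistake.
  const result = collection.deleteMany({
    $and: [{ _id: { $in: ids } }, criteria],
  });

  return { candidateIds: ids, deletedCount: result.deletedCount };
}

// Usage in the mongo shell:
//   removeAndReport(db.things, { name: "john" });
```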