MongoDB: Recommended schema for search in trees

Currently I have a tree of varying depth that contains each user's documents:
John\folder1\sub-folder2...\document
Peter\folder1...\document
But as far as I can see, Mongo does not support indexes in nested documents.
I tried to denormalize my DB into User\documents with children ids, but it seems a search would then scan the whole collection, not only the documents of the given app user.
Should I create a collection for every app user?
What is the better solution if I want to use the built-in Mongo aggregation methods?

Use a single collection; do not create a collection for every user.
With aggregation, the first thing to do is a $match on userId, which filters the collection down to that user's documents before any other aggregation operation runs.
Aggregation in Mongo is a pipeline: documents move from one stage to the next.
So if you match on userId first, only those documents are selected, and the next stage receives only the documents that match the userId. The later stages never touch other users' documents, and a $match at the start of the pipeline can use an index on userId, so the aggregation stays fast.
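To make the "match first" idea concrete, here is a minimal sketch of such a pipeline. The collection name `documents` and the fields `userId` and `folder` are assumptions for illustration; they are not from the original question, so adjust them to your own schema.

```javascript
// Builds an aggregation pipeline that filters by user before doing anything else.
// Field names (userId, folder) are hypothetical.
function buildUserPipeline(userId) {
  return [
    // Stage 1: keep only this user's documents.
    // A $match at the start of a pipeline can use an index on userId.
    { $match: { userId: userId } },
    // Stage 2: receives only the documents that passed $match;
    // here it counts that user's documents per folder.
    { $group: { _id: "$folder", documentCount: { $sum: 1 } } }
  ];
}

// Possible usage in the mongo shell (assumes a collection named "documents"):
//   db.documents.createIndex({ userId: 1 });
//   db.documents.aggregate(buildUserPipeline("john"));
```

The pipeline is built by a plain function so the same stages can be reused for any user; only the value passed to $match changes.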

Related

How to query documents from an entire MongoDB database containing multiple collections at once (Pymongo)

I have a DB containing thousands of collections, each collection containing documents.
How would I query all these documents from all the collections at once?
I have considered using the $lookup aggregation stage, but as far as I know it only joins one additional collection at a time, whereas I have thousands of collections that I want to combine and query together.
Is there a way to achieve this?
MongoDB operations work on a single collection at a time. As you point out, $lookup can perform a cross-collection lookup, but it won't help you with a large number of collections.
Re-design your data model.

Show only collections which have documents inside in MongoDB

Is it possible to get the list of names of the collections that have at least one document?
I mean, to leave out the collections which don't have any documents.
The listCollections command allows filtering, but the filter can only be applied to the fields it returns, and I don't see any field that returns the size of the collection.
Therefore you most likely need to query each collection with a find to find out if it has any documents.
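A minimal sketch of that per-collection check, written for the mongo shell. The function name `nonEmptyCollectionNames` is made up for this example; it assumes the shell helpers `db.getCollectionNames()` and `db.getCollection(name).findOne()`, and it costs one query per collection.

```javascript
// Returns the names of collections that contain at least one document.
// `db` is the shell's database object (or anything with the same two methods).
function nonEmptyCollectionNames(db) {
  return db.getCollectionNames().filter(function (name) {
    // findOne() returns null when the collection has no documents.
    return db.getCollection(name).findOne() !== null;
  });
}

// Possible usage in the mongo shell:
//   nonEmptyCollectionNames(db);
```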

MongoDB: How to remove all documents matching a query and return their ids

With MongoDB, is it possible to specify a query and remove all matching documents while returning the documents' ids (or, alternatively, the whole documents)?
I found "How to get removed document in MongoDB?", which explains how to remove a single document and return it using findAndModify. However, I need to remove a bunch of documents at once.
I'm OK with a solution involving the aggregation pipeline as long as it fulfils the requirements.
I'm using the official C# driver, but solutions in JS are OK.

mongodb - keeping track of aggregated documents

I have a MongoDB collection that stores raw information coming from an app. I wrote a multi-stage aggregation pipeline to generate more meaningful data from the raw documents.
Using the $out stage in my aggregation, I store the results in another collection.
I would like to be able to either delete raw documents that were already aggregated, or somehow mark those documents so I know not to aggregate them again.
I am worried that I cannot guarantee that I won't miss documents created in the meantime, or that I won't create duplicate aggregated documents.
Is there a way to achieve this?

List of updated documents

Is there a way to update (or delete) many documents matching certain criteria and get the list of ids of the documents that were actually updated or deleted (or some other fields of those documents)? I cannot simply query the documents matching my criteria beforehand, because I need some kind of atomicity for this operation. And I can't use findAndModify, because it can only process one document at a time, which is too slow because of the round-trips. Suggestions?
MongoDB only guarantees atomicity for operations on a single document (multi-document transactions only arrived later, in MongoDB 4.0).
http://www.mongodb.org/display/DOCS/Atomic+Operations
The only way to do this is to do what you said you didn't want to:
First, query the collection to find the ids of the documents matching your query:
db.things.find({"name":"john"}, {_id:1});
Then, use the same query to remove:
db.things.remove({"name":"john"});
Not ideal, and not atomic, but it's as good as you're going to get in this scenario.
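The two steps can be wrapped in a small helper. This is only a sketch for the mongo shell: the name `removeAndReturnIds` is made up, and it deletes by the collected `_id` values rather than re-running the original query, so the returned list and the delete target the same documents. It is still not atomic: a document can change or disappear between the two calls.

```javascript
// Collects the _ids matching `query`, then deletes exactly those documents.
// `coll` is a shell collection object such as db.things.
function removeAndReturnIds(coll, query) {
  // Step 1: fetch only the _id field of every matching document.
  const ids = coll.find(query, { _id: 1 }).toArray().map(function (doc) {
    return doc._id;
  });
  if (ids.length === 0) {
    return { ids: [], deletedCount: 0 };
  }
  // Step 2: delete by _id, so documents inserted after step 1 are left alone.
  const result = coll.deleteMany({ _id: { $in: ids } });
  // deletedCount can be lower than ids.length if another client
  // removed some of the documents in between.
  return { ids: ids, deletedCount: result.deletedCount };
}

// Possible usage in the mongo shell:
//   removeAndReturnIds(db.things, { name: "john" });
```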