Is it possible in Firestore to get the top items and then, for each of them, query their top subitems (in a subcollection), all in one go? - google-cloud-firestore

My Firestore is organized in a way similar to this:
├── posts/
│ ├── post1/
│ │ ├── comment1
│ │ ├── comment2
│ │ ├── ...
│ ├── post2/
│ │ ├── comment1
│ │ ├── comment2
│ │ ├── comment3
│ │ ├── ...
│ ├── ...
Is there a way, for example, for me to query the most recent posts and, for each of these posts, get the most recent comments, all in one go? (i.e. without storing the post query and then creating a local "for-each" to get their respective comments.)
Ideally, I would like to go beyond that even: I would like to get the posts themselves as well, all in one go. This is useful for creating a view with a summary of the whole app's activity, for example.
I have the feeling this isn't possible with Firestore though. As far as I know, they follow the policy of not mixing together documents on different levels/(sub)collections in the same query.
I'm using Dart right now, but this question is kind of language-agnostic, I guess. I imagine a query like that to be of this shape:
ref
    .collection('posts')
    .orderBy('date')
    .forEach()
    .collection('comments')
    .orderBy('date');

Firestore can only return documents from a single collection, or from all collections with the same name (a collection group query). Similarly, it can only filter on data that is present in a known field of the documents that it returns.
So if the comments are in a subcollection under each post, you will need to run a separate query to get them.
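To make the cost of "a separate query" per post visible, here is a sketch that uses a hypothetical in-memory `FakeStore` (not the Firestore SDK) which only counts queries. Getting the latest posts plus their latest comments takes one query for the posts and one more per post. With the Python client, the two steps would roughly correspond to `db.collection("posts").order_by("date", direction=firestore.Query.DESCENDING).limit(n)` and, for each result, a query on `doc.reference.collection("comments")` with its own `order_by` and `limit`.

```python
from dataclasses import dataclass


@dataclass
class FakeStore:
    """Hypothetical stand-in for posts/{postId}/comments; counts queries issued."""
    posts: dict  # post_id -> {"date": int, "comments": [{"date": int, "text": str}]}
    queries: int = 0

    def latest_posts(self, limit):
        # Query 1: newest posts (like ordering by date descending, then limit).
        self.queries += 1
        ranked = sorted(self.posts.items(), key=lambda kv: kv[1]["date"], reverse=True)
        return ranked[:limit]

    def latest_comments(self, post_id, limit):
        # One extra query per post, against that post's own comments subcollection.
        self.queries += 1
        comments = self.posts[post_id]["comments"]
        return sorted(comments, key=lambda c: c["date"], reverse=True)[:limit]


def activity_feed(store, n_posts, n_comments):
    """Client-side fan-out: 1 + n_posts queries in total."""
    feed = []
    for post_id, _post in store.latest_posts(n_posts):
        texts = [c["text"] for c in store.latest_comments(post_id, n_comments)]
        feed.append((post_id, texts))
    return feed


store = FakeStore(posts={
    "post1": {"date": 1, "comments": [{"date": 1, "text": "a"}, {"date": 2, "text": "b"}]},
    "post2": {"date": 2, "comments": [{"date": 3, "text": "c"}]},
    "post3": {"date": 3, "comments": []},
})
print(activity_feed(store, n_posts=2, n_comments=1))  # [('post3', []), ('post2', ['c'])]
print(store.queries)  # 3
```

In a real client the per-post queries can be issued in parallel, but they are still separate queries.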
A common workaround is to store the N most recent comments in the post document itself, in addition to storing them in the subcollection. That makes your writes a bit more complicated, but it allows you to read the data you need much more efficiently.
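A minimal sketch of that workaround, using plain Python dicts in place of Firestore documents. The `recentComments` field name and the limit of 3 are assumptions for illustration. In a real app, the two writes in `add_comment` would go into a single batched write or transaction, so the copy on the post cannot drift from the subcollection.

```python
RECENT_LIMIT = 3  # hypothetical "N most recent comments" kept on each post


def add_comment(post, comment, limit=RECENT_LIMIT):
    """Two writes per comment: the full copy (standing in for the subcollection)
    and the denormalized summary on the post document itself."""
    post["comments"].append(comment)
    recent = post.get("recentComments", []) + [comment]
    recent.sort(key=lambda c: c["date"], reverse=True)
    post["recentComments"] = recent[:limit]


def activity_feed(posts, n_posts):
    """A single read of the posts now also yields their latest comments."""
    latest = sorted(posts, key=lambda p: p["date"], reverse=True)[:n_posts]
    return [(p["id"], [c["text"] for c in p.get("recentComments", [])]) for p in latest]


post = {"id": "post1", "date": 1, "comments": []}
for day in range(1, 6):
    add_comment(post, {"date": day, "text": f"comment{day}"})
print(activity_feed([post], n_posts=10))
# [('post1', ['comment5', 'comment4', 'comment3'])]
print(len(post["comments"]))  # 5
```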
This type of data duplication is very common when dealing with NoSQL databases, and is one of the reasons they scale so well on read operations. To learn more about such trade-offs, I recommend reading "NoSQL data modeling" and watching "Get to know Cloud Firestore".

Related

mongoimport --mode merge issue with multiple level JSON objects

Looking at the following example from the MongoDB documentation: https://docs.mongodb.com/manual/reference/program/mongoimport/#ex-mongoimport-merge, --mode merge works for documents that are one level deep.
I'm trying to merge documents that are n levels deep, and every level below the first gets overwritten, just as it would with --mode upsert.
Is that a bug in the merge function or was it not intended to work recursively?
Thanks!
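To make the difference concrete, here is a small Python illustration (not mongoimport's code) of the two behaviors: a top-level merge, which matches what the question describes, versus the recursive merge the question expects. Deep-merging the JSON yourself before importing is one possible workaround; that is an assumption, not a mongoimport feature.

```python
def shallow_merge(existing, incoming):
    """Top-level merge: a nested object in `incoming` replaces the stored one."""
    return {**existing, **incoming}


def deep_merge(existing, incoming):
    """Recursive merge: nested objects are combined key by key."""
    result = dict(existing)
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


stored = {"_id": 1, "name": "a", "meta": {"tags": ["x"], "stats": {"views": 10, "likes": 2}}}
incoming = {"_id": 1, "meta": {"stats": {"likes": 3}}}
print(shallow_merge(stored, incoming))
# {'_id': 1, 'name': 'a', 'meta': {'stats': {'likes': 3}}}
print(deep_merge(stored, incoming))
# {'_id': 1, 'name': 'a', 'meta': {'tags': ['x'], 'stats': {'views': 10, 'likes': 3}}}
```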

MongoDB Schema: Nested, Flattened, or Independent Collections?

We are writing an application in which we have multiple 'Projects'. Each 'Project' has multiple 'Boards'. Each 'Board' has its own set of 'Comments'. What is the recommended way to structure this in MongoDB?
= Option I (nested collection)
-Project
  |
  |----- Board
         |
         |----- Comments
= Option II (flattened collection)
-Project
  |
  |----- Board
  |
  |----- Comment
         |----- Board_ID
= Option III (independent collections)
-Project
- Boards
  |----- Project_ID
- Comments
  |----- Board_ID
There are 10,000 projects. Each project has 5 boards, so there are 50,000 boards in total. Each board has 20 comments, so there are 1,000,000 comments in total. Only one project and one board can be open in the application at a time.
So, if we pick Option I, then to get the comments for a particular project/board combination, we only have to query/parse through 20 comments. However, if we pick Option III, then to get the comments for a given project/board combination, we will have to query/parse through 1,000,000 comments. So, in theory, Option I sounds faster and more efficient. However, Option I uses a nested collection: are there any disadvantages to a nested collection? Are there any reasons for not using nested collections in MongoDB, like Option I?
MongoDB experts: which option (I, II, or III) is the recommended practice for such cases?
Probably the most important question is: What do you read and write together?
Only one project, and one board can be open in the application at one time.
So basically one board and its 20 comments are mainly read and written together? Then I'd store them in one document (a board with embedded comments) and have a projects collection pointing to the boards collection.
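A rough sketch of that layout, reading it as boards with embedded comments plus a projects collection that points at the boards. Python dicts stand in for MongoDB documents, and field names such as `board_ids` and `project_id` are hypothetical. Opening one board is a single document read; in MongoDB a new comment would be appended with an update using `$push`. The `totals` helper only reproduces the counts from the question.

```python
def make_board(board_id, project_id, n_comments):
    # Board document with its comments embedded: read and written together.
    return {
        "_id": board_id,
        "project_id": project_id,
        "comments": [{"n": i, "text": f"comment {i}"} for i in range(n_comments)],
    }


def make_project(project_id, n_boards, n_comments):
    # The project document only points at its boards (boards live in their own collection).
    boards = [make_board(f"{project_id}-b{j}", project_id, n_comments) for j in range(n_boards)]
    project = {"_id": project_id, "board_ids": [b["_id"] for b in boards]}
    return project, boards


def add_comment(board, text):
    # In-memory equivalent of update_one({"_id": ...}, {"$push": {"comments": ...}}).
    board["comments"].append({"n": len(board["comments"]), "text": text})


def totals(n_projects, boards_per_project, comments_per_board):
    boards = n_projects * boards_per_project
    return boards, boards * comments_per_board


project, boards = make_project("p1", n_boards=5, n_comments=20)
print(project["board_ids"][:2])  # ['p1-b0', 'p1-b1']
print(len(boards[0]["comments"]))  # 20
print(totals(10_000, 5, 20))  # (50000, 1000000)
```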
Background:
Even if you read a single attribute from a document, you're always fetching the whole document from disk and loading it into RAM. Even if you limit the query to that single attribute, you'll still load the whole document; you just won't send it over the network.
Before MongoDB 4.0 there were no multi-document transactions, so put the things you want to write atomically into a single document. Even with transactions, a single-document write is the simplest way to get atomicity.
Avoid growing documents, since a document has to be stored as a single block on disk and moving it is expensive (this applied mainly to the older MMAPv1 storage engine). Either preallocate values (if possible) or use separate documents for things you want to add later on.

Proper tree NoSQL structure with focus on full-text searching

I'm developing an app with a tree (folder/file) structure on which I need to perform full-text searches with MongoDB. I researched best practices for tree structures and found this great article, but I still cannot decide which DB structure will fit my needs.
I have the following requirements in mind:
- I should be able to perform full-text search on individual folders, as well as on everything belonging to a specific user.
- The folders/files should be shareable, so I need to be able to perform full-text search on all items accessible to a specific user.
I've been thinking about the following structures.
Structure 1
Fields of Users collection
1. _id - objectid
2. name - string
Fields of Folders collection
1. _id - objectid
2. name - string
3. owner - objectid
4. sharedWith - array of objectIds
5. location - objectid of parent folder, null if in root
6. createDate - datetime
Fields of File collection
1. _id - objectid
2. name - string
3. owner - objectid
4. sharedWith - array of objectIds
5. data - string
6. location - objectId of folder
7. createDate - datetime
So here are my questions:
1. Should I use model tree structures with Parent References or Child References?
2. Should I use one collection for both files and folders (with a type field), or should I separate them?
3. Is it worth having only a folders collection and nesting documents in it?
These are my most important questions, though I would greatly appreciate any advice on how I can improve the structure. I'm sorry if this isn't the right place to ask such questions.
It depends a bit on how far you want/need the setup to scale. For small numbers of files, folders, and files per folder it doesn't matter too much. That said,
1. I'd use references from children to parents. Parents (folders) may have hundreds or thousands of children (files and folders). This might be OK to store as references in one folder document, but in that case you would most likely want to index the array to support fast queries like "is file x in folder y?", and the array would change frequently. A large, frequently changing, indexed array is a recipe for bad performance in MongoDB. If you have only a couple hundred or so children per folder, you might be able to get away with storing references to all children in the parent, as long as you don't rely on that array being indexed for your queries. In practice, that means you'd also put a reference from each child to its parent to support the same queries.
2. I'd use one collection, since you want to return both files and folders in response to many queries. Add a field to identify folders, like `folder: true` or something similar.
3. No, it won't work to have folder documents with many nested layers. MongoDB in general doesn't support recursive or arbitrary-depth operations, which makes such structures difficult to work with.
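The three answers above can be sketched with plain Python data. The field names follow Structure 1 plus the suggested `folder` flag; the sample items and helper functions are hypothetical. A real deployment would use a MongoDB text index and a `$text` query instead of the naive word match used here.

```python
# Hypothetical single "items" collection: files and folders share it,
# told apart by a boolean "folder" field; every item points at its parent via "location".
items = [
    {"_id": 1, "name": "projects", "folder": True, "location": None,
     "owner": "u1", "sharedWith": ["u2"]},
    {"_id": 2, "name": "notes.txt", "folder": False, "location": 1,
     "owner": "u1", "sharedWith": ["u2"], "data": "mongodb text search notes"},
    {"_id": 3, "name": "todo.txt", "folder": False, "location": 1,
     "owner": "u1", "sharedWith": [], "data": "buy milk"},
]


def children(items, folder_id):
    # Parent reference: like find({"location": folder_id}) on an indexed field.
    return [i["_id"] for i in items if i["location"] == folder_id]


def can_read(item, user):
    return item["owner"] == user or user in item["sharedWith"]


def search(items, user, word, folder_id=None):
    # Naive word match standing in for a text index; only items the user can
    # read, optionally restricted to the direct children of one folder.
    word = word.lower()
    hits = []
    for item in items:
        if not can_read(item, user):
            continue
        if folder_id is not None and item["location"] != folder_id:
            continue
        words = f'{item["name"]} {item.get("data", "")}'.lower().split()
        if word in words:
            hits.append(item["_id"])
    return hits


print(children(items, 1))  # [2, 3]
print(search(items, "u2", "mongodb"))  # [2]
print(search(items, "u2", "milk"))  # []
print(search(items, "u1", "milk", folder_id=1))  # [3]
```

Note that `folder_id` only matches direct children; searching a whole subtree would need something extra, such as the "array of ancestors" tree pattern described in the MongoDB documentation.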

How to draw a MongoDB diagram

I am currently working on the documentation for a Django project that uses MongoDB.
So I need to know how to make clear diagrams of the MongoDB collections.
I understand that there's no fixed way to do this due to MongoDB's flexible nature,
but the records in each of my collections follow a certain order and rules defined by the collection, and some of the records contain embedded documents.
So, is something like this the proper way, or is there a more standard way to show my MongoDB collections to other developers or package users?
I would suggest the tree mode.
collection
`-- _id
`-- field1
`-- field2
    `-- f1
    `-- f2
`-- field3
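If there are many collections, a tree like this can be generated from a sample document instead of being drawn by hand. The helper below is a hypothetical sketch, not a standard tool. It uses the same `-- markers and assumes one representative document per collection, so optional fields that the sample lacks will not appear.

```python
def schema_lines(doc, depth=0):
    """One line per field; fields of embedded documents are indented one level."""
    lines = []
    for key, value in doc.items():
        lines.append("    " * depth + "`-- " + key)
        if isinstance(value, dict):
            lines.extend(schema_lines(value, depth + 1))
    return lines


def schema_tree(collection, sample_doc):
    return "\n".join([collection] + schema_lines(sample_doc))


sample = {"_id": 1, "field1": "a", "field2": {"f1": 1, "f2": 2}, "field3": True}
print(schema_tree("collection", sample))
```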

Best way to keep MongoDB collection schema in external file

What is the best way to keep MongoDB collection schema (using mongoose) in external file and access them from main app.js considering n number of schema?
You can have a separate models directory and keep your DB models there. If you want, you can follow MVC. Here is what your directory structure could look like:
./project_dir
    app.js
    models
    views
    routes
    package.json
In models you have files where you have your db models (files where you have your schema).
You can also have a look at this sample app on GitHub.