Collection performance in Firestore - google-cloud-firestore

I am confused between two choices for structuring my Firestore collections.
In my Firestore database I can either create one main collection and nest every other collection as a subcollection under it, or create several top-level collections.
My question is: will nesting everything put that main collection under heavy pressure?
Should I create other top-level collections instead of subcollections, or is it the same?
In other words: will a collection with no subcollections perform better than a collection whose documents each have many subcollections?
What is the best choice?

The main performance guarantee that Firestore makes is that the performance on a query depends on the amount of data it reads, not on the amount of data it has to consider.
So there is no performance difference between getting the same set of data from a smaller subcollection or getting it from one big top-level collection.
There is, however, a difference in write performance between the two approaches, and that is usually the reason to pick one over the other.
Write pressure comes from updating the indexes on each write operation; having multiple subcollections allows better throughput, because writes to separate subcollections are isolated from each other.
The one exception to that is when you have a collection group index, as the writes to all collections in that group will have to update the same index.
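A minimal sketch of the two layouts, using the Node.js firebase-admin SDK (the users/orders collections and the total field are hypothetical, not from the question):

```typescript
import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";

initializeApp();
const db = getFirestore();

async function demo() {
  // Layout A: one big top-level collection.
  await db.collection("orders").add({ userId: "u1", total: 42 });

  // Layout B: orders nested as a subcollection under each user document.
  // Writes to different users' subcollections are isolated from each other.
  await db.collection("users").doc("u1").collection("orders").add({ total: 42 });

  // Reading one user's orders touches only that subcollection; the cost
  // depends on the documents read, not on the total amount of data stored.
  const one = await db.collection("users").doc("u1").collection("orders").get();

  // A collection group query spans every subcollection named "orders", but
  // all of them then share one collection group index (which may need to be
  // enabled for this field), so those writes are no longer isolated.
  const all = await db.collectionGroup("orders").where("total", ">", 10).get();
  console.log(one.size, all.size);
}

demo();
```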

Related

MongoDB - Downside to having different documents in the same collection?

What are the downsides of storing completely different kinds of documents in the same MongoDB collection?
Unlike in other questions, the documents I'm referring to are not related (like parent-child).
The motivation here is cost reduction: Azure Cosmos DB's Mongo API charges and scales per collection.
The collection will get a lot bigger a lot faster.
The speed of queries could be impacted, as you'll have to scan more documents than required (sparse indexes could maybe help).
Indexes will be a lot bigger and take longer to scan.
You'll need to store a discriminator with the documents so you can tell what type one document is compared to another.
If the documents are not related at all, I'd store them in completely separate collections.
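For illustration, a minimal sketch with the Node.js MongoDB driver (collection and field names are made up): a discriminator field tells the document types apart, and a partial index, one way to realize the sparse-index idea above, keeps the index limited to a single type:

```typescript
import { MongoClient } from "mongodb";

async function main() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const coll = client.db("app").collection("mixed");

  // The "kind" discriminator distinguishes the unrelated document types.
  await coll.insertMany([
    { kind: "invoice", total: 99 },
    { kind: "ticket", subject: "login fails" },
  ]);

  // A partial index covers only one document type, keeping the index small.
  await coll.createIndex(
    { total: 1 },
    { partialFilterExpression: { kind: "invoice" } }
  );

  const invoices = await coll.find({ kind: "invoice" }).toArray();
  console.log(invoices.length);
  await client.close();
}

main();
```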

Is it a good idea to use Capped Collections for reading queries with a few indexes defined?

I wanted to insert around 4 million records into a normal collection, but the bulk insert was very slow, so I created a Capped Collection and loaded my data into it. Someone suggested to me that there would not be any performance impact, so there was no need to create indexes.
But I am seeing that fetching the first 25 records with some filtering takes a lot of time. I have a few questions to understand this better.
What is the ideal situation where Capped Collections are suggested?
Can I create a compound index on a Capped Collection?
Is there any performance improvement with a Capped Collection over a normal collection?
A capped collection limits how much data it stores. It does not make retrieval of the data it does store any faster.
Generally if you need fast (or, realistically, reasonably performant) reads you should be using indexes.
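To illustrate, a minimal sketch with the Node.js MongoDB driver (collection name, fields, and size are hypothetical): the capped collection only bounds storage, while a compound index, which capped collections do support, is what makes the filtered fetch of the first 25 records fast:

```typescript
import { MongoClient } from "mongodb";

async function main() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("app");

  // A capped collection bounds storage (here ~1 GB); by itself it does
  // nothing to speed up filtered reads.
  const events = await db.createCollection("events", {
    capped: true,
    size: 1024 * 1024 * 1024,
  });

  // Compound indexes can be created on capped collections, and this is
  // what actually serves the filtered, sorted fetch below.
  await events.createIndex({ userId: 1, createdAt: -1 });

  const page = await events
    .find({ userId: "u1" })
    .sort({ createdAt: -1 })
    .limit(25)
    .toArray();
  console.log(page.length);
  await client.close();
}

main();
```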

MongoDB- dealing with huge collections

I have one huge MongoDB collection which contains hundreds of millions of documents (e.g. 300m, 400m, and still growing). What is the best solution to ensure that queries and aggregations will run fast? I have some ideas; which one is the proper one?
Splitting the data into a few smaller collections.
Storing initially aggregated data in separate collections so for the most common queries/aggregations the result can be returned quickly.
Adding proper indexes - does it make sense to add indexes to such a big collection?
Leave one collection and distribute this data across multiple machines (sharding)? Does MongoDB cope with collections which are distributed over a few or more machines?
Are there any better solutions which I missed?
Splitting the data into a few smaller collections.
This only makes sense when your queries and aggregations are limited to such smaller collections. If a query has to join several collections, you don't gain much, and your queries become more complex.
Storing initially aggregated data in separate collections so for the most common queries/aggregations the result can be returned quickly.
Could make sense; however, you create redundant data which may become inconsistent with your actual data. Apart from that, you need more disk space.
Adding proper indexes - does it make sense to add indexes to such a big collection?
Definitely a good idea. It would be very surprising if such a big collection did not have any indexes.
Leave one collection and distribute this data across multiple machines (sharding)?
Definitely also a good idea. To a certain extent this is similar to (1), but MongoDB deals with the splitting and joining, so you don't need to take care of it yourself.
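A minimal sketch of points (2) and (3) with the Node.js MongoDB driver (the events collection and its fields are made up): $merge materializes a common aggregation into its own small collection, and a compound index matches the common query shape:

```typescript
import { MongoClient } from "mongodb";

async function main() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("app");

  // Point (2): materialize a frequent aggregation with $merge, so readers
  // hit the small rollup collection instead of hundreds of millions of docs.
  await db
    .collection("events")
    .aggregate([
      { $group: { _id: "$userId", count: { $sum: 1 } } },
      { $merge: { into: "events_by_user", whenMatched: "replace" } },
    ])
    .toArray();

  // Point (3): an index matching the most common query shape.
  await db.collection("events").createIndex({ userId: 1, createdAt: -1 });

  const rollup = await db.collection("events_by_user").find().limit(10).toArray();
  console.log(rollup);
  await client.close();
}

main();
```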

Does having a lot of collections matter?

Do many MongoDB collections have a big impact on MongoDB performance, memory, and capacity? I am designing an API with the MVC pattern, and a collection is being created for each model. I am questioning the way I am doing it now.
MongoDB with the WiredTiger engine supports an unlimited number of collections, so you are not going to run into any hard technical limitations.
When you wonder if something should be in one collection or in multiple collections, these are some of the considerations you need to keep in mind:
More collections = more maintenance work. Sharding is configured at the collection level, so having a large number of collections will make shard configuration a lot more work. You also need to set up indexes for each collection separately, but this is quite easy to automate, because calling createIndex on an index which already exists does nothing (see the sketch after this answer).
The MongoDB API is designed in a way that every database query operates on one collection at a time. That means when you need to search for a document in n different collections, you need to perform n queries. When you need to aggregate data stored in multiple collections, you run into even more problems. So any data which is queried together should be stored together in the same collection.
Creating one collection for each class in your model is usually a good rule of thumb, but it is not a golden-hammer solution. There are situations where you want to embed objects in their parent documents instead of putting them into a separate collection. There are also cases where you want to put all objects with the same base class in the same collection to benefit from MongoDB's ability to handle heterogeneous collections. But that goes beyond the scope of this question.
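A minimal sketch of the index-automation point above, using the Node.js MongoDB driver (collection names and keys are hypothetical): because createIndex is idempotent, this can run safely at every application start:

```typescript
import { MongoClient } from "mongodb";

// One index spec per collection; calling createIndex for an index that
// already exists is a no-op, so re-running this is harmless.
const specs: Record<string, Record<string, 1 | -1>> = {
  users: { email: 1 },
  orders: { userId: 1, createdAt: -1 },
};

async function ensureIndexes() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("app");
  for (const [name, keys] of Object.entries(specs)) {
    await db.collection(name).createIndex(keys);
  }
  await client.close();
}

ensureIndexes();
```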
Why don't you use this to test your application?
https://docs.mongodb.com/manual/tutorial/evaluate-operation-performance/
By the way, your question is not completely clear... it reads more like a "discussion" than a question, and you're asking others to evaluate your work instead of searching the web for the right approach.

MongoDB - Using email id as identifier across collections

I have a user collection in which email_id and _id are both unique. I want to store user data across various collections, and I would like to use email_id as the identifier in those collections, because it is easier to query against those collections in the shell with an email_id than with a complex ObjectId.
Is this the right way? Will it cause any performance problems when building indexes on big email_ids?
Also, don't consider this option if you plan to let users change their email_id in the future.
While relational databases encourage you to normalize your data and spread it over many tables, this approach is usually not the best for MongoDB. MongoDB doesn't support JOINs over multiple collections or even multiple documents from the same collection (the $lookup aggregation stage is a limited exception), so you should try to design your documents in a way that each query can be satisfied by a single document. That means it is usually a good idea to store all information about a user in one document.
An exception to this is when certain pieces of the user's data grow indefinitely (like the posts made by a user in a forum). First, MongoDB documents have a size limit, and second, when the size of a document increases, the database needs to frequently reallocate its hard drive space. This slows down writes and leads to fragmentation in the database. In that case it's better to put each such entity in a separate collection.
The size of the fields covered by an index doesn't matter when you search for equality. When you have a unique index on email_id, the lookup should be just as fast as searching by _id.
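As a minimal sketch with the Node.js MongoDB driver (names are made up): a unique index on email_id in the user collection, and a plain index on a referencing collection, so that equality lookups by email stay fast:

```typescript
import { MongoClient } from "mongodb";

async function main() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("app");

  // Unique index: equality lookups on email_id are then comparable in
  // speed to lookups by _id.
  await db.collection("users").createIndex({ email_id: 1 }, { unique: true });

  // A referencing collection indexed on the same identifier.
  await db.collection("posts").createIndex({ email_id: 1 });

  const posts = await db
    .collection("posts")
    .find({ email_id: "alice@example.com" })
    .toArray();
  console.log(posts.length);
  await client.close();
}

main();
```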