Is sharding necessary for subcollections in Cloud Firestore? - google-cloud-firestore

I have read in the documentation that writes to a collection can be limited to 500 per second if its documents contain sequential values in an indexed field.
I am saving sequential timestamps in a subcollection.
I would like to know whether I need a shard field in this specific case to increase the maximum writes per second.
I am only using "normal" collection indexes, no collection group index.
Some additional explanation:
I have a top-level collection called "companies", and under every document there is a subcollection called "orders". Every order document has a timestamp field "created". This field is indexed, and I need this index. Orders could be created very frequently, so I am sure the 500 writes per second limit applies to this structure.

What I wonder is whether every "orders" subcollection gets its own limit of 500 writes per second, or whether all of the subcollections share one limit. I could add a shard field to avoid the write limit, as described in the documentation, but if every subcollection gets its own limit, that would not be necessary in my case: 500 writes per second per subcollection would be more than enough.

I think that every subcollection has its own index as long as I am not using a collection group index, and therefore the server should be able to split the data across multiple servers if necessary. But maybe I am wrong. I can't find any concrete information about this in the documentation or elsewhere.
Screenshots from the database show the structure: a root collection "companies" whose documents each contain a subcollection "orders".
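
If the limit does turn out to be shared (for example because a collection group index on "created" exists), the workaround described in the documentation is a shard field. Below is a minimal TypeScript sketch using the Firebase web SDK; the field name "shard", the shard count, and the createOrder helper are my own illustrative choices, not part of the schema above.

import { getFirestore, collection, addDoc, serverTimestamp } from "firebase/firestore";

// Assumes the Firebase app has already been initialized elsewhere.
const db = getFirestore();

const SHARD_COUNT = 5; // assumption: raises the ceiling to roughly 5 x 500 writes/sec

// Create an order under companies/{companyId}/orders with a random shard value.
// Per the documentation, this is combined with a composite index on (shard, created)
// and an exemption for the single-field "created" index, so that index entries are
// spread across key ranges instead of all landing next to the newest timestamp.
async function createOrder(companyId: string, data: Record<string, unknown>) {
  await addDoc(collection(db, "companies", companyId, "orders"), {
    ...data,
    created: serverTimestamp(),
    shard: Math.floor(Math.random() * SHARD_COUNT),
  });
}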

Related

Difference between .where() and .doc() related to number of reads

If I get a document by .collection('users').where('uid', isEqualTo: widget.uid) or by .collection('users').doc(widget.uid), will I read the database the same number of times?
You are charged one read for every document that matches your query when it is executed. It does not matter how you make that query - what matters is the number of documents returned to the app.
If your where query returns exactly one document, then it is not any different in price or performance than one using doc.
The former gives you a Query while the latter gives you a DocumentReference. The DocumentReference is preferable because it can be used to write, read, or listen to that single location.
Naturally, a query has to consult the index of the referenced collection to find matching IDs, while a reference points directly at a single document. As for the pricing of aggregation queries, the dedicated aggregate-query documentation page currently returns a page-not-found error.
See Understand Cloud Firestore billing:
For aggregation queries such as count(), you are charged one document read for each batch of up to 1000 index entries matched by the query. For aggregation queries that match 0 index entries, there is a minimum charge of one document read.
For example, count() operations that match between 0 and 1000 index entries are billed for one document read. For a count() operation that matches 1500 index entries, you are billed 2 document reads.
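
To make the comparison concrete, here is a small sketch using the Firebase web SDK in TypeScript. The "users" collection and the uid value come from the question; the count() call is only there to illustrate the aggregation billing quoted above.

import {
  getFirestore, collection, doc, getDoc, getDocs,
  query, where, getCountFromServer,
} from "firebase/firestore";

const db = getFirestore();

async function readExamples(uid: string) {
  // Query: billed one read per matching document. If exactly one document
  // has this uid, the cost is the same as the direct lookup below.
  const byQuery = await getDocs(query(collection(db, "users"), where("uid", "==", uid)));

  // Direct reference: always targets a single document, one read.
  const byRef = await getDoc(doc(db, "users", uid));

  // Aggregation: billed one read per batch of up to 1000 matched index entries.
  const countSnap = await getCountFromServer(collection(db, "users"));
  console.log(byQuery.size, byRef.exists(), countSnap.data().count);
}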

Is the Firestore collection write limit (imposed by sequentially-updated indexed fields) affected by collection-group queries?

As I understand it, if a collection has a monotonically increasing indexed field, a write limit is imposed on that collection. If that collection is split into two separate collections, each collection gets its own write limit. However, if we split that collection into two separate collections but give them the same name (putting them under different documents), would they still have their own independent write limits if the monotonically increasing field was part of a collection group query that queried them both together?
No, that's not the way it works. A collection group query requires its own index, and the limit you're talking about applies to the write rate of the index itself, not the collection. Each collection automatically indexes fields from its own documents, but those per-collection indexes do not serve collection group queries that span collections.
Note that the documentation states the limit as:
Maximum write rate to a collection in which documents contain sequential values in an indexed field
On a related note, disabling the indexing for a specific field on a collection allows you to bypass the normal monotonic write limits for that one field on that collection because it's no longer being indexed.
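
As a sketch of the difference, reusing the "orders" and "created" names from the main question above (TypeScript, Firebase web SDK; "companyA" is a placeholder document ID): the first query is served by the automatic single-field index of that one subcollection, while the second needs a collection-group-scoped index on "created", and it is that index's write rate that is shared across every "orders" subcollection.

import { getFirestore, collection, collectionGroup, query, orderBy, getDocs } from "firebase/firestore";

const db = getFirestore();

// Scoped to one subcollection: served by its automatic per-collection index.
const oneCompany = query(
  collection(db, "companies", "companyA", "orders"),
  orderBy("created", "desc"),
);

// Collection group query: requires a collection-group-scoped index on "created",
// which receives an entry for every order of every company.
const allCompanies = query(collectionGroup(db, "orders"), orderBy("created", "desc"));

async function runQueries() {
  await Promise.all([getDocs(oneCompany), getDocs(allCompanies)]);
}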

How is a Firestore collection write limit imposed between different composite index query scopes?

[collectionA]
  <someDocument>
    [subcollectionA]
      <someDocument>
        - lastActive: timestamp
        - joined: boolean
In this schema, lastActive is an indexed property, and it is sequential. Therefore, a write limit is imposed on subcollectionA. If I make a composite index of lastActive and joined on subcollectionA, I have the option to choose a query scope of collection or collection group. If I choose collection, the write limit is imposed on that specific subcollection instance, and if I choose collection group, the write limit is imposed on all subcollections called subcollectionA as if they were one giant collection. Is that correct?
Write limits are a physical limitation of how fast the indexes can be synchronized across multiple data centers before a write can be confirmed to the clients.
If you have a collection group query, the index needs to be updated for all collections in that group. So the limitation would then indeed apply to writes across all those collections.
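
Using the names from this question, the two query scopes roughly correspond to these two query shapes (a TypeScript sketch with the Firebase web SDK; the path segments other than subcollectionA, lastActive, and joined are placeholders). The first shape is served by a composite index with collection scope, the second by one with collection group scope, and only the latter ties all subcollectionA instances to a single shared index write rate.

import { getFirestore, collection, collectionGroup, query, where, orderBy } from "firebase/firestore";

const db = getFirestore();

// Composite index, query scope "Collection": one index per subcollectionA instance.
const singleScope = query(
  collection(db, "collectionA", "someDocument", "subcollectionA"),
  where("joined", "==", true),
  orderBy("lastActive", "desc"),
);

// Composite index, query scope "Collection group": one index covering every
// subcollectionA, so writes to lastActive in any of them hit the same index.
const groupScope = query(
  collectionGroup(db, "subcollectionA"),
  where("joined", "==", true),
  orderBy("lastActive", "desc"),
);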

How does the Firestore 500 writes/sec per collection limit work on subcollection groups?

A collection with sequentially-valued indexed fields is subject to a limit of 500 writes per second. How does a subcollection group affect that limit? For example, consider this data schema:
[collection]
  <documentId>
    + indexed field
    - index-exempt field
    [subcollection]
      <documentId>
        ...

[products]
  <productId>
    - name: string
    [sensors]
      <sensorId>
        + lastCalibrated: timestamp
Because lastCalibrated is a sequentially-valued indexed field, the 500 writes/sec collection limit comes into play. In this example, does that limit apply to each sensors subcollection independently or to all sensors subcollections in the aggregate as if they were one giant collection?
I find it easiest to keep in mind that the limit comes from the need to update indexes. With that knowledge, answering a question like yours becomes a lot easier.
If you want to perform a collection group query across all the sensors subcollections, then you will need an index that spans those collections. At that point those collections are all subject to the same 500 writes/sec limit.
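
If you do need that collection group index and its shared limit becomes a problem, the documented workaround is a shard field. A rough TypeScript sketch of the read side follows; the "shard" field, its count, and the merge logic are assumptions layered on top of the schema above, not part of it.

import { getFirestore, collectionGroup, query, where, orderBy, limit, getDocs } from "firebase/firestore";

const db = getFirestore();
const SHARD_COUNT = 5; // must match the value used when writing the documents

// Query each shard of the "sensors" collection group separately and merge the
// results; the shard field spreads the sequential lastCalibrated index entries
// across SHARD_COUNT key ranges of the composite (shard, lastCalibrated) index.
async function latestCalibrations(perShard: number) {
  const shardQueries = Array.from({ length: SHARD_COUNT }, (_, shard) =>
    getDocs(query(
      collectionGroup(db, "sensors"),
      where("shard", "==", shard),
      orderBy("lastCalibrated", "desc"),
      limit(perShard),
    )),
  );
  const snapshots = await Promise.all(shardQueries);
  return snapshots
    .flatMap((snap) => snap.docs.map((d) => ({ id: d.id, ...d.data() })))
    .sort((a: any, b: any) => b.lastCalibrated.toMillis() - a.lastCalibrated.toMillis());
}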

Is it better to have multiple collections with thousands of documents or one collection with 100 million documents?

I'm migrating a MySQL table with 100 million rows to a MongoDB database. This table stores companies' documents, and what differentiates them is the company_id column. I was wondering if having multiple collections in MongoDB would be faster than just one collection; for example, each company would have its own collection (collections: company_1, company_2, company_3, ...) storing only documents from that company, so I would not need to filter as I would with one big collection where every document has a company_id field used for filtering.
Which method would perform best in this case?
EDIT:
Here's a JSON document example: https://pastebin.com/T5m2tbaY
{"_id":"5d8b8241ae0f000015006142","id_consulta":45254008,"company_id":7,"tipo_doc":"nfe","data_requisicao":"2019-09-25T15:05:35.155Z","xml":Object...
You could have one collection and one document per company, with company-specific details in the document, assuming the details do not exceed the 16 MB document size limit. Place an index on company_id for performance reasons. If performance does not meet expectations, scale vertically, i.e., add memory, CPU, disk I/O, and network capacity. If that does not suffice, consider sharding the collection across multiple hosts.
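
A rough sketch of the single-collection approach with an index on company_id, which the answer suggests, using the Node.js MongoDB driver in TypeScript; the connection string, database name, and collection name are placeholders.

import { MongoClient } from "mongodb";

async function main() {
  const client = new MongoClient("mongodb://localhost:27017"); // placeholder URI
  await client.connect();
  const docs = client.db("mydb").collection("company_documents");

  // One index on company_id keeps per-company lookups cheap even at 100M documents.
  await docs.createIndex({ company_id: 1 });

  // Equivalent of "one big collection, filtered by company_id".
  const forCompany7 = await docs.find({ company_id: 7 }).limit(100).toArray();
  console.log(forCompany7.length);

  await client.close();
}

main().catch(console.error);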