How is a Firestore collection write limit imposed between different composite index query scopes? - google-cloud-firestore

[collectionA]
<someDocument>
[subcollectionA]
<someDocument>
- lastActive: timestamp
- joined: boolean
In this schema, lastActive is an indexed property, and it is sequential. Therefore, a write limit is imposed on subcollectionA. If I made a composite index of lastActive and joined on subcollectionA, I have the option to choose a query scope of collection and collection group. If I choose collection, the write limit is imposed on that specific subcollection instance, and if I choose collection group, then the write limit is imposed on all subcollections called subcollectionA as if they were one giant collection. Is that correct?

Write limits are a physical limitation about how fast the indexes can be synchronized between the multiple data centers, before a write can be confirmed to the clients.
If you have a collection group query, the index needs to be updated for all collections in that group. So the limitation would then indeed apply to writes across all those collections.

Related

Is sharding necessary for subcollections in Cloud Firestore?

I have read in the documentation, that writes per second can be limited to 500 per second if a collection has sequential values with an index.
I am saving sequential timestamps in a subcollection.
I would like to know if I need a shard field in this specific case to increase the maximum writes per second?
I am only using "normal" collection indexes, no collection group index.
Some additional explanations:
I have a top level collection called "companies" and under every document is a subcollection called "orders". Every order document has a timestamp "created". This field is indexed and I need this index. These orders could be created very frequently. I am sure that the 500 writes per second limit would apply to this constellation. But I wonder if every subcollection "orders" would have its own limit of 500 writes per second or if all subcollections share one limit. I could add a shard field to avoid the write limit as stated in the documentation but if every subcollection would get its own limit this would not be necessary for my specific case. 500 write per second per subcollection should be more than enough. I think that every subcollection has its own index as long as I am not using a collection group index. And therefore the server should be able to split the data across multiple servers if necessary. But maybe I am wrong. I can't find any concrete informations on that in the documentation or in the internet.
Screenshots from database:
root collection companies
subcollection orders

Is the Firestore collection write limit (imposed by sequentially-updated indexed fields) affected by collection-group queries?

From how I understand it, if a collection has a monotonically-increasing indexed field, a write limit is imposed on that collection. If that collection is split into two separate collections, each collection would have its own write limit. However, if we split that collection into two separate collections but give them the same name (putting them under different documents), would they still have their own independent write limits if the monotonically-indexed field was part of a collection-group query that queried them both together?
No, that's not the way it works. A collection group query requires its own index, and the limit you're talking about is the write rate of the index itself, not the collection. Each collection automatically indexes fields from documents for just that specific collection, but that would not apply the collection group queries that span collections.
Note that the documentation states the limit as:
Maximum write rate to a collection in which documents contain sequential values in an indexed field
On a related note, disabling the indexing for a specific field on a collection allows you to bypass the normal monotonic write limits for that one field on that collection because it's no longer being indexed.

MongoDB documents order shuffled [duplicate]

When we run a Mongo find() query without any sort order specified, what does the database internally use to sort the results?
According to the documentation on the mongo website:
When executing a find() with no parameters, the database returns
objects in forward natural order.
For standard tables, natural order is not particularly useful because,
although the order is often close to insertion order, it is not
guaranteed to be. However, for Capped Collections, natural order is
guaranteed to be the insertion order. This can be very useful.
However for standard collections (non capped collections), what field is used to sort the results?
Is it the _id field or something else?
Edit:
Basically, I guess what I am trying to get at is that if I execute the following search query:
db.collection.find({"x":y}).skip(10000).limit(1000);
At two different points in time: t1 and t2, will I get different result sets:
When there have been no additional writes between t1 & t2?
When there have been new writes between t1 & t2?
There are new indexes that have been added between t1 & t2?
I have run some tests on a temp database and the results I have gotten are the same (Yes) for all the 3 cases - but I wanted to be sure and I am certain that my test cases weren't very thorough.
What is the default sort order when none is specified?
The default internal sort order (or natural order) is an undefined implementation detail. Maintaining order is extra overhead for storage engines and MongoDB's API does not mandate predictability outside of an explicit sort() or the special case of fixed-sized capped collections which have associated usage restrictions. For typical workloads it is desirable for the storage engine to try to reuse available preallocated space and make decisions about how to most efficiently store data on disk and in memory.
Without any query criteria, results will be returned by the storage engine in natural order (aka in the order they are found). Result order may coincide with insertion order but this behaviour is not guaranteed and cannot be relied on (aside from capped collections).
Some examples that may affect storage (natural) order:
WiredTiger uses a different representation of documents on disk versus the in-memory cache, so natural ordering may change based on internal data structures.
The original MMAPv1 storage engine (removed in MongoDB 4.2) allocates record space for documents based on padding rules. If a document outgrows the currently allocated record space, the document location (and natural ordering) will be affected. New documents can also be inserted in storage marked available for reuse due to deleted or moved documents.
Replication uses an idempotent oplog format to apply write operations consistently across replica set members. Each replica set member maintains local data files that can vary in natural order, but will have the same data outcome when oplog updates are applied.
What if an index is used?
If an index is used, documents will be returned in the order they are found (which does necessarily match insertion order or I/O order). If more than one index is used then the order depends internally on which index first identified the document during the de-duplication process.
If you want a predictable sort order you must include an explicit sort() with your query and have unique values for your sort key.
How do capped collections maintain insertion order?
The implementation exception noted for natural order in capped collections is enforced by their special usage restrictions: documents are stored in insertion order but existing document size cannot be increased and documents cannot be explicitly deleted. Ordering is part of the capped collection design that ensures the oldest documents "age out" first.
It is returned in the stored order (order in the file), but it is not guaranteed to be that they are in the inserted order. They are not sorted by the _id field. Sometimes it can be look like it is sorted by the insertion order but it can change in another request. It is not reliable.

Mongodb performance of paging without sort vs. with sort? [duplicate]

When we run a Mongo find() query without any sort order specified, what does the database internally use to sort the results?
According to the documentation on the mongo website:
When executing a find() with no parameters, the database returns
objects in forward natural order.
For standard tables, natural order is not particularly useful because,
although the order is often close to insertion order, it is not
guaranteed to be. However, for Capped Collections, natural order is
guaranteed to be the insertion order. This can be very useful.
However for standard collections (non capped collections), what field is used to sort the results?
Is it the _id field or something else?
Edit:
Basically, I guess what I am trying to get at is that if I execute the following search query:
db.collection.find({"x":y}).skip(10000).limit(1000);
At two different points in time: t1 and t2, will I get different result sets:
When there have been no additional writes between t1 & t2?
When there have been new writes between t1 & t2?
There are new indexes that have been added between t1 & t2?
I have run some tests on a temp database and the results I have gotten are the same (Yes) for all the 3 cases - but I wanted to be sure and I am certain that my test cases weren't very thorough.
What is the default sort order when none is specified?
The default internal sort order (or natural order) is an undefined implementation detail. Maintaining order is extra overhead for storage engines and MongoDB's API does not mandate predictability outside of an explicit sort() or the special case of fixed-sized capped collections which have associated usage restrictions. For typical workloads it is desirable for the storage engine to try to reuse available preallocated space and make decisions about how to most efficiently store data on disk and in memory.
Without any query criteria, results will be returned by the storage engine in natural order (aka in the order they are found). Result order may coincide with insertion order but this behaviour is not guaranteed and cannot be relied on (aside from capped collections).
Some examples that may affect storage (natural) order:
WiredTiger uses a different representation of documents on disk versus the in-memory cache, so natural ordering may change based on internal data structures.
The original MMAPv1 storage engine (removed in MongoDB 4.2) allocates record space for documents based on padding rules. If a document outgrows the currently allocated record space, the document location (and natural ordering) will be affected. New documents can also be inserted in storage marked available for reuse due to deleted or moved documents.
Replication uses an idempotent oplog format to apply write operations consistently across replica set members. Each replica set member maintains local data files that can vary in natural order, but will have the same data outcome when oplog updates are applied.
What if an index is used?
If an index is used, documents will be returned in the order they are found (which does necessarily match insertion order or I/O order). If more than one index is used then the order depends internally on which index first identified the document during the de-duplication process.
If you want a predictable sort order you must include an explicit sort() with your query and have unique values for your sort key.
How do capped collections maintain insertion order?
The implementation exception noted for natural order in capped collections is enforced by their special usage restrictions: documents are stored in insertion order but existing document size cannot be increased and documents cannot be explicitly deleted. Ordering is part of the capped collection design that ensures the oldest documents "age out" first.
It is returned in the stored order (order in the file), but it is not guaranteed to be that they are in the inserted order. They are not sorted by the _id field. Sometimes it can be look like it is sorted by the insertion order but it can change in another request. It is not reliable.

How does MongoDB sort records when no sort order is specified?

When we run a Mongo find() query without any sort order specified, what does the database internally use to sort the results?
According to the documentation on the mongo website:
When executing a find() with no parameters, the database returns
objects in forward natural order.
For standard tables, natural order is not particularly useful because,
although the order is often close to insertion order, it is not
guaranteed to be. However, for Capped Collections, natural order is
guaranteed to be the insertion order. This can be very useful.
However for standard collections (non capped collections), what field is used to sort the results?
Is it the _id field or something else?
Edit:
Basically, I guess what I am trying to get at is that if I execute the following search query:
db.collection.find({"x":y}).skip(10000).limit(1000);
At two different points in time: t1 and t2, will I get different result sets:
When there have been no additional writes between t1 & t2?
When there have been new writes between t1 & t2?
There are new indexes that have been added between t1 & t2?
I have run some tests on a temp database and the results I have gotten are the same (Yes) for all the 3 cases - but I wanted to be sure and I am certain that my test cases weren't very thorough.
What is the default sort order when none is specified?
The default internal sort order (or natural order) is an undefined implementation detail. Maintaining order is extra overhead for storage engines and MongoDB's API does not mandate predictability outside of an explicit sort() or the special case of fixed-sized capped collections which have associated usage restrictions. For typical workloads it is desirable for the storage engine to try to reuse available preallocated space and make decisions about how to most efficiently store data on disk and in memory.
Without any query criteria, results will be returned by the storage engine in natural order (aka in the order they are found). Result order may coincide with insertion order but this behaviour is not guaranteed and cannot be relied on (aside from capped collections).
Some examples that may affect storage (natural) order:
WiredTiger uses a different representation of documents on disk versus the in-memory cache, so natural ordering may change based on internal data structures.
The original MMAPv1 storage engine (removed in MongoDB 4.2) allocates record space for documents based on padding rules. If a document outgrows the currently allocated record space, the document location (and natural ordering) will be affected. New documents can also be inserted in storage marked available for reuse due to deleted or moved documents.
Replication uses an idempotent oplog format to apply write operations consistently across replica set members. Each replica set member maintains local data files that can vary in natural order, but will have the same data outcome when oplog updates are applied.
What if an index is used?
If an index is used, documents will be returned in the order they are found (which does necessarily match insertion order or I/O order). If more than one index is used then the order depends internally on which index first identified the document during the de-duplication process.
If you want a predictable sort order you must include an explicit sort() with your query and have unique values for your sort key.
How do capped collections maintain insertion order?
The implementation exception noted for natural order in capped collections is enforced by their special usage restrictions: documents are stored in insertion order but existing document size cannot be increased and documents cannot be explicitly deleted. Ordering is part of the capped collection design that ensures the oldest documents "age out" first.
It is returned in the stored order (order in the file), but it is not guaranteed to be that they are in the inserted order. They are not sorted by the _id field. Sometimes it can be look like it is sorted by the insertion order but it can change in another request. It is not reliable.