Is the Firestore collection write limit (imposed by sequentially-updated indexed fields) affected by collection-group queries? - google-cloud-firestore

From how I understand it, if a collection has a monotonically-increasing indexed field, a write limit is imposed on that collection. If that collection is split into two separate collections, each collection would have its own write limit. However, if we split that collection into two separate collections but give them the same name (putting them under different documents), would they still have their own independent write limits if the monotonically-indexed field was part of a collection-group query that queried them both together?

No, that's not the way it works. A collection group query requires its own index, and the limit you're talking about is the write rate of the index itself, not the collection. Each collection automatically indexes fields from its own documents for just that specific collection, but that indexing does not apply to collection group queries that span collections.
Note that the documentation states the limit as:
Maximum write rate to a collection in which documents contain sequential values in an indexed field
On a related note, disabling the indexing for a specific field on a collection allows you to bypass the normal monotonic write limits for that one field on that collection because it's no longer being indexed.
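To make that concrete, here is a minimal sketch (assuming the Node.js Admin SDK; the "companies"/"orders" names and document IDs are just illustrative) of two same-named subcollections and a collection group query spanning them. Each subcollection gets its own single-collection index on "created", but the collection group query is served by one collection-group index, and it is that shared index whose write rate is limited:

import { initializeApp } from 'firebase-admin/app';
import { getFirestore, FieldValue } from 'firebase-admin/firestore';

initializeApp();
const db = getFirestore();

async function demo() {
  // Two subcollections that happen to share the name "orders" under different parents.
  // Each one gets its own single-collection index on "created".
  await db.collection('companies').doc('companyA').collection('orders')
    .add({ created: FieldValue.serverTimestamp() });
  await db.collection('companies').doc('companyB').collection('orders')
    .add({ created: FieldValue.serverTimestamp() });

  // A collection group query spans every subcollection named "orders" and is
  // served by a single collection-group index on "created"; it is that shared
  // index whose write rate is limited, not the individual subcollections.
  const recent = await db.collectionGroup('orders')
    .orderBy('created', 'desc')
    .limit(10)
    .get();
  console.log(recent.size);
}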

Related

Is sharding necessary for subcollections in Cloud Firestore?

I have read in the documentation that writes can be limited to 500 per second if a collection contains sequential values in an indexed field.
I am saving sequential timestamps in a subcollection.
I would like to know if I need a shard field in this specific case to increase the maximum writes per second?
I am only using "normal" collection indexes, no collection group index.
Some additional explanations:
I have a top-level collection called "companies", and under every document is a subcollection called "orders". Every order document has a timestamp "created". This field is indexed and I need this index. These orders could be created very frequently, and I am sure that the 500 writes per second limit would apply to this structure.

But I wonder if every "orders" subcollection would have its own limit of 500 writes per second, or if all the subcollections share one limit. I could add a shard field to avoid the write limit, as stated in the documentation, but if every subcollection gets its own limit, this would not be necessary in my specific case: 500 writes per second per subcollection would be more than enough.

I think that every subcollection has its own index as long as I am not using a collection group index, and therefore the server should be able to split the data across multiple servers if necessary. But maybe I am wrong. I can't find any concrete information on this in the documentation or on the internet.
Screenshots from database: root collection "companies" with subcollection "orders".
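For reference, the shard field workaround the question refers to might look roughly like the following sketch (assuming the Node.js Admin SDK and an already-initialized app; the shard count, helper names and field names are invented). Note that the where on shard combined with orderBy on created typically needs a composite index:

import { getFirestore, FieldValue } from 'firebase-admin/firestore';

const db = getFirestore(); // assumes initializeApp() has already run
const NUM_SHARDS = 5;      // made-up shard count; pick based on expected write rate

async function addOrder(companyId: string, order: Record<string, unknown>) {
  // Spreading writes across a random shard value breaks up the strictly
  // monotonic "created" index entries, which is what the documented
  // workaround relies on.
  await db.collection('companies').doc(companyId).collection('orders').add({
    ...order,
    created: FieldValue.serverTimestamp(),
    shard: Math.floor(Math.random() * NUM_SHARDS),
  });
}

async function recentOrders(companyId: string) {
  const ordersRef = db.collection('companies').doc(companyId).collection('orders');
  // Reads then have to fan out over the shards and merge the results.
  const snapshots = await Promise.all(
    Array.from({ length: NUM_SHARDS }, (_, shard) =>
      ordersRef.where('shard', '==', shard).orderBy('created', 'desc').limit(10).get()
    )
  );
  return snapshots.flatMap((snap) => snap.docs.map((d) => d.data()));
}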

How is a Firestore collection write limit imposed between different composite index query scopes?

[collectionA]
  <someDocument>
    [subcollectionA]
      <someDocument>
        - lastActive: timestamp
        - joined: boolean
In this schema, lastActive is an indexed property, and it is sequential. Therefore, a write limit is imposed on subcollectionA. If I make a composite index of lastActive and joined on subcollectionA, I have the option to choose a query scope of either collection or collection group. If I choose collection, the write limit is imposed on that specific subcollection instance, and if I choose collection group, then the write limit is imposed on all subcollections called subcollectionA as if they were one giant collection. Is that correct?
Write limits are a physical limitation on how fast the indexes can be synchronized between the multiple data centers before a write can be confirmed to the clients.
If you have a collection group query, the index needs to be updated for all collections in that group. So the limitation would then indeed apply to writes across all those collections.
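As an illustration of the two query scopes against the schema above (a sketch assuming the Node.js Admin SDK and an already-initialized app; the document ID is a placeholder):

import { getFirestore } from 'firebase-admin/firestore';

const db = getFirestore();

async function compare() {
  // Collection scope: the composite index (joined, lastActive) covers only
  // this one subcollectionA instance, so only its own writes hit that index.
  const one = await db
    .collection('collectionA').doc('someDocument').collection('subcollectionA')
    .where('joined', '==', true)
    .orderBy('lastActive', 'desc')
    .get();

  // Collection group scope: a single index covers every subcollection named
  // "subcollectionA", so writes to all of them update the same index.
  const all = await db
    .collectionGroup('subcollectionA')
    .where('joined', '==', true)
    .orderBy('lastActive', 'desc')
    .get();

  console.log(one.size, all.size);
}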

I have a MongoDB collection with 580+ fields: should I split it up?

The vast majority of the fields will not need to be indexed and will never be queried (think display ONLY). There are maybe 20-30 fields that need to be queried.
Likewise, the vast majority of the fields will be simple field/value pairs (not embedded docs).
Finally, there will be some fields that will store embedded docs, but not a lot and the embedded docs will not be large.
I was thinking of maybe breaking up the collection into two collections:
A collection with fields that need to be indexed/queried and fields that need to be displayed in larger result sets (really anything more than 1 single document).
A collection with: _id, related_id and data (where data is an embedded doc with all the extra data in it). This collection would only ever be accessed when viewing the "detailed" display of the document.
Also, there will be 100s of millions of documents (eventually).
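For what it's worth, a rough sketch of that two-collection split with the Node.js driver (collection names, index fields and the related_id convention are all made up for illustration):

import { MongoClient, ObjectId } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('appdb');

// Collection 1: the 20-30 indexed/queried fields plus anything shown in larger result sets.
const records = db.collection('records');
// Collection 2: _id, related_id and a "data" sub-document holding the
// display-only fields; only read for the detailed view.
const details = db.collection('record_details');

async function setup() {
  await client.connect();
  await records.createIndex({ accountId: 1, status: 1 }); // whatever you actually query on
  await details.createIndex({ related_id: 1 });           // so the detail lookup is an index hit
}

async function detailView(recordId: ObjectId) {
  const summary = await records.findOne({ _id: recordId });
  if (!summary) return null;
  const extra = await details.findOne({ related_id: recordId });
  return { ...summary, ...(extra?.data ?? {}) };
}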

Mapping datasets to NoSql (MongoDB) collection

What I have:
I have data for 'n' departments.
Each department has more than 1,000 datasets.
Each dataset has more than 10,000 CSV files (each larger than 10 MB), each with a different schema.
This data will grow even more in the future.
What I want to do:
I want to map this data into MongoDB.
What approaches I considered:
I can't map each dataset to a single document in MongoDB, since documents are limited to 4-16 MB.
I can't create a collection for each dataset, as the maximum number of collections is also limited (<24000).
So finally I thought of creating a collection for each department, with one document in that collection for each record of the CSV files belonging to that department.
I want to know:
Will there be a performance issue if we map each record to a document?
Is there a maximum limit on the number of documents?
Is there any other design I could use?
Will there be a performance issue if we map each record to a document?
Mapping each record to a document in MongoDB is not a bad design. You can have a look at the FAQ on the MongoDB site:
http://docs.mongodb.org/manual/faq/fundamentals/#do-mongodb-databases-have-tables
It says,
...Instead of tables, a MongoDB database stores its data in collections,
which are the rough equivalent of RDBMS tables. A collection holds one
or more documents, which corresponds to a record or a row in a
relational database table....
Along with the limitation on BSON document size (16 MB), there is also a maximum of 100 levels of document nesting:
http://docs.mongodb.org/manual/reference/limits/#BSON Document Size
...Nested Depth for BSON Documents Changed in version 2.2.
MongoDB supports no more than 100 levels of nesting for BSON document...
So it's better to go with one document for each record.
Is there a maximum limit on the number of documents?
No. It's mentioned in the MongoDB reference manual:
...Maximum Number of Documents in a Capped Collection Changed in version 2.4.
If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents...
Is there any other design I could use?
If your documents are too large, you can think about partitioning them at the application level, but that will have a high computation cost in the application layer.
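A sketch of that record-per-document layout with the Node.js driver (the department collection, dataset and file names are invented; each CSV row becomes one document tagged with where it came from):

import { MongoClient } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('departments');

// One collection per department, one document per CSV record.
const sales = db.collection('department_sales');

async function loadFile() {
  await client.connect();

  // Rows as parsed from one CSV file (stand-in data; in practice they come
  // from a CSV parser and each file has its own columns).
  const rows = [
    { region: 'north', amount: 120 },
    { region: 'south', amount: 85 },
  ];

  await sales.insertMany(
    rows.map((row) => ({
      dataset: 'dataset_0042',      // which dataset the file belongs to
      sourceFile: 'part-0001.csv',  // which file the row came from
      ...row,
    }))
  );

  // Index the tag fields so per-dataset or per-file queries don't scan
  // the whole department collection.
  await sales.createIndex({ dataset: 1, sourceFile: 1 });
}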
Will there be a performance issue if we map each record to a document?
That depends entirely on how you search them. When most of your queries affect only one document, it is likely even faster that way. When splitting the data into many small documents results in a lot of queries that span documents, it will get slower, because MongoDB can't combine those documents for you.
Is there a maximum limit on the number of documents?
No.
Is there any other design I could use?
Maybe, but that depends on how you want to query your data. If you are content with treating files as a BLOB that is retrieved as a whole but not searched or analyzed at the database level, you could consider storing them in GridFS. It's a way to store files larger than 16 MB in MongoDB.
In general, MongoDB database design doesn't depend so much on what and how much data you have, but rather on how you want to work with it.
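If the files really are only ever retrieved whole, a GridFS upload with the Node.js driver might look roughly like this (bucket name, file name and metadata are placeholders):

import { createReadStream } from 'node:fs';
import { MongoClient, GridFSBucket } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');

async function storeCsv() {
  await client.connect();
  // GridFS chunks the file across many small documents, sidestepping the
  // 16 MB per-document limit.
  const bucket = new GridFSBucket(client.db('departments'), { bucketName: 'csvFiles' });

  await new Promise<void>((resolve, reject) => {
    createReadStream('part-0001.csv')
      .pipe(bucket.openUploadStream('part-0001.csv', {
        metadata: { dataset: 'dataset_0042' }, // queryable via the csvFiles.files collection
      }))
      .on('finish', () => resolve())
      .on('error', reject);
  });
}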

Mongodb : multiple specific collections or one "store-it-all" collection for performance / indexing

I'm logging different actions users make on our website. Each action can be of a different type: a comment, a search query, a page view, a vote, etc. Each of these types has its own schema plus some common fields. For instance:
comment : {"_id":(mongoId), "type":"comment", "date":4/7/2012,
"user":"Franck", "text":"This is a sample comment"}
search : {"_id":(mongoId), "type":"search", "date":4/6/2012,
"user":"Franck", "query":"mongodb"} etc...
Basically, in OOP or RDBMS, I would design an Action class / table and a set of inherited classes / tables (Comment, Search, Vote).
As MongoDb is schema less, I'm inclined to set up a unique collection ("Actions") where I would store these objects instead of multiple collections (collection Actions + collection Comments with a link key to its parent Action etc...).
My question is: what about performance / response time if I try to search by specific fields?
As I understand indexing best practices, if I want "every user searching for mongodb", I would index the fields "type" + "query". But that index would not concern the whole set of data, only the documents of type "search".
Will the MongoDB engine scan the whole collection, or merely focus on the data having this specific schema?
If you create sparse indexes, Mongo will ignore any documents that don't have the indexed key. There is, however, the specific limitation that a sparse index can only cover one field.
However, if you are only going to query using common fields there's absolutely no reason not to use a single collection.
I.e. if an index on user+type (or date+user+type) will satisfy all your querying needs, there's no reason to create multiple collections.
Tip: use date objects for dates, and use object IDs rather than names where appropriate.
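For example, with the Node.js driver (the collection and field names follow the question; the exact indexes are just one plausible choice):

import { MongoClient } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
const actions = client.db('logs').collection('actions');

async function run() {
  await client.connect();

  // One compound index per access pattern on the shared collection.
  await actions.createIndex({ type: 1, query: 1 });          // "every user searching for X"
  await actions.createIndex({ user: 1, type: 1, date: -1 }); // per-user activity, newest first

  // The query can be answered from the bounds of the (type, query) index
  // rather than scanning documents of other types.
  const searchers = await actions
    .find({ type: 'search', query: 'mongodb' })
    .project({ user: 1, date: 1, _id: 0 })
    .toArray();

  console.log(searchers);
}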
Here is some useful information from MongoDB's Best Practices
Store all data for a record in a single document.
MongoDB provides atomic operations at the document level. When data
for a record is stored in a single document the entire record can be
retrieved in a single seek operation, which is very efficient. In some
cases it may not be practical to store all data in a single document,
or it may negatively impact other operations. Make the trade-offs that
are best for your application.
Avoid Large Documents.
The maximum size for documents in MongoDB is 16MB. In practice most
documents are a few kilobytes or less. Consider documents more like
rows in a table than the tables themselves. Rather than maintaining
lists of records in a single document, instead make each record a
document. For large media documents, such as video, consider using
GridFS, a convention implemented by all the drivers that stores the
binary data across many smaller documents.