Maximum size of a document in Firestore? - firebase-storage

I want to create a document containing about 20 million objects.
The structure looks like this:
documentID
  key1
    object1
      name: "test1"
      score: 123
I don't know the document size limit in Firestore, so could you point me to a reference or any information about that?
Thanks!

The maximum size is roughly 1 MiB. Storing such a large number of objects (maps) inside a single document is generally bad design, and 20 million objects would exceed the size limit anyway.
You should reconsider why they need to live in a single document rather than making each object its own document.
Cloud Firestore's limits are listed in the documentation.
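For illustration, a minimal sketch with the Python client (the "scores" collection name and field values are made up): each object becomes its own document instead of a map inside one huge document.

from google.cloud import firestore

db = firestore.Client()

# One document per object in a top-level collection;
# document IDs are auto-generated by add().
for obj in [{"name": "test1", "score": 123},
            {"name": "test2", "score": 456}]:
    db.collection("scores").add(obj)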

Have you looked at Firestore sub-collections?
You can store the main item as a document in one top-level collection, and all of its underlying data can live in a sub-collection of that document.
There is no limit on how many documents a sub-collection can contain.
So 20M records should not be an issue.
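A rough sketch of that layout with the Python client (all names here are hypothetical):

from google.cloud import firestore

db = firestore.Client()

# The main item is a document in a top-level collection...
parent = db.collection("items").document("documentID")
parent.set({"title": "main item"})

# ...and each of the 20M objects is its own document in a
# sub-collection, so none of them count toward the parent
# document's ~1 MiB limit.
parent.collection("objects").document("object1").set(
    {"name": "test1", "score": 123}
)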

If you want to save objects bigger than 1 MB, you should use Cloud Storage; the limit there is 5 TB per object:
There is a maximum size limit of 5 TB for individual objects stored in Cloud Storage. There is an update limit on each object of once per second, so rapid writes to a single object won't scale.
Google Cloud Storage
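For example, a minimal upload sketch with the google-cloud-storage Python client (the bucket and object names are assumptions):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-app-data")  # hypothetical bucket name

# Upload a payload that would be too large for a Firestore
# document; Cloud Storage objects can be up to 5 TB.
blob = bucket.blob("exports/large-payload.json")
blob.upload_from_filename("large-payload.json")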

If you want to check the size of a document against the maximum of 1 MiB (1,048,576 bytes), there is a library that can help you with that:
https://github.com/alexmamo/FirestoreDocument-Android/tree/master/firestore-document
That way, you'll always be able to stay below the limit.
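If you're not on Android, you can approximate the size yourself. The sketch below is a simplified version of the rules in Firestore's "Storage size calculations" documentation; it handles only a few field types and ignores the document-name overhead, so treat it as a rough estimate:

def estimate_field_value_size(value):
    # Very rough per-value sizes, loosely following Firestore's
    # documented rules (bool must be checked before int in Python).
    if value is None:
        return 1
    if isinstance(value, bool):
        return 1
    if isinstance(value, (int, float)):
        return 8
    if isinstance(value, str):
        return len(value.encode("utf-8")) + 1
    if isinstance(value, list):
        return sum(estimate_field_value_size(v) for v in value)
    if isinstance(value, dict):
        return estimate_document_size(value)
    raise TypeError(f"unhandled type: {type(value)}")

def estimate_document_size(data):
    # Sum of (field name bytes + 1) + value size for each field.
    return sum(
        len(name.encode("utf-8")) + 1 + estimate_field_value_size(value)
        for name, value in data.items()
    )

assert estimate_document_size({"name": "test1", "score": 123}) < 1_048_576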

Related

Firestore Document Size Limitations

I have a Google Cloud Firestore project. My database model is like this:
Each store has its own document. The sales and inventory collections have a lot of documents, and their size increases every day.
There is a maximum size limitation for documents in Firestore. The document named Store1 has sales and inventory collections that store every sale and item. Does the Store1 document have a maximum size limitation? Would the growing size of the sales and inventory documents be a problem? If so, my data model must be incorrect, and if it's incorrect, how should it look?
The document size limitation in Firestore is enforced per individual document, and it does not include the size of the documents in that document's subcollections. It is relatively uncommon for folks to hit the document size limit.
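In other words (a hypothetical sketch of the model described above, using the Python client), the sales and inventory documents live under Store1 but don't add to its size:

from google.cloud import firestore

db = firestore.Client()
store = db.collection("stores").document("Store1")

# Only the fields written directly to Store1 count toward its limit.
store.set({"name": "Store 1"})

# These subcollections can grow indefinitely without
# affecting Store1's document size.
store.collection("sales").add({"total": 42.50, "ts": firestore.SERVER_TIMESTAMP})
store.collection("inventory").add({"sku": "A-100", "qty": 7})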

Using a nested document structure in MongoDB

I am planning to use a nested document structure for my MongoDB schema design, as I don't want to go for a flat schema; in my case I need to fetch my results in a single query.
MongoDB has a size limit per document, though. Per MongoDB Limits and Thresholds, a MongoDB document has a size limit of 16 MB, and the usual advice is that if your embedded collection can grow without bound, you should go flat.
I never need to fetch the nested data itself; I only need it for filtering and querying purposes.
I want to know whether I will still be bound by MongoDB's size limits even if I use my embedded data only for querying and filtering and never for fetching, because as per my understanding, in that case MongoDB won't load the complete document into memory but only the selected fields?
Nested schema design example
{
  clinicName: "XYZ Hospital",
  clinicAddress: "ABC place.",
  doctorsWorking: {
    doctorId1: { doctorJoined: ISODate("2017-03-15T10:47:47.647Z") },
    doctorId2: { doctorJoined: ISODate("2017-04-15T10:47:47.647Z") },
    doctorId3: { doctorJoined: ISODate("2017-05-15T10:47:47.647Z") },
    ...
    // up to 30,000-40,000 more records
  }
}
I don't think your understanding is correct when you say "because as per my understanding, in this case, MongoDB won't load the complete document in memory but only the selected fields?".
If we look at the MongoDB docs, we read:
The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.
So there is a clear 16 MB limit on document size, and Mongo will stop you from saving a document larger than that.
Suppose for a moment that your understanding were right: MongoDB lets you save a document of any size but won't load more than 16 MB into RAM. At the time data is stored, MongoDB can't know what queries will later run against it, so you would end up inserting huge documents that can't be used later (we don't declare a query pattern at insert time, and we might even try to fetch the full document in one shot).
If the limit were on transmission (hypothetically), there are plenty of ways developers could bring data into RAM in chunks and never cross the 16 MB limit (that's how I/O on large files is done); they would make a mockery of the limit and render it useless. I expect MongoDB's creators knew that and didn't want it to happen.
Also, if the limit were on transmission, there would be no need for separate collections: we could put everything in a single collection, write smart queries, and fetch any result that crosses 16 MB in parts, forgetting the limit. But it doesn't work that way.
So the limit must be on document size itself; anything else would create all of these issues.
In my opinion, if you only need the doctorsWorking data for filtering or querying (and you expect it to push the document past the 16 MB limit), it's better to keep it in a separate collection.
Ultimately everything depends on your query and data patterns. If a doctor can serve shifts at multiple hospitals, that's another reason to keep doctors in a separate collection.
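A sketch of that flattened design with pymongo (collection and field names are my assumptions): each doctor becomes its own document referencing its clinic, so the clinic document stays small no matter how many doctors there are.

from datetime import datetime
from pymongo import MongoClient

db = MongoClient()["demo"]  # hypothetical database name

clinic_id = db.clinics.insert_one(
    {"clinicName": "XYZ Hospital", "clinicAddress": "ABC place."}
).inserted_id

# One document per doctor instead of a 30k-40k entry embedded map.
db.doctors.insert_one(
    {"clinicId": clinic_id, "doctorId": "doctorId1",
     "doctorJoined": datetime(2017, 3, 15, 10, 47, 47)}
)
db.doctors.create_index([("clinicId", 1), ("doctorJoined", 1)])

# Filtering/querying works directly on the doctors collection.
recent = db.doctors.find({"clinicId": clinic_id,
                          "doctorJoined": {"$gte": datetime(2017, 4, 1)}})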

Aggregate collection that has an aggregate collection

I am having some trouble deciding which schema design to pick. I have a document that holds user info, and each user has a very big set of items that can reach up to 20k items.
An item has a date, an id, and 19 other fields, plus an internal array that can hold 20-30 entries; items can be modified, deleted, newly inserted, and of course queried by any property they hold.
So I came up with two possible schemas.
1. Putting everything into a single document:
{_id: ObjectId(""), type: 'user', name: 'xxx', items: [{......., internalitems: []}, {......., internalitems: []}, ...]}
{_id: ObjectId(""), type: 'user', name: 'yyy', items: [{......., internalitems: []}, {......., internalitems: []}, ...]}
2. Separating the items from the user and letting each item have its own document:
{_id: ObjectId(""), type: 'user', username: 'xxx'}
{_id: ObjectId(""), type: 'user', username: 'yyy'}
{_id: ObjectId(""), type: 'useritem', username: 'xxx', item: {......., internalitems: []}}
{_id: ObjectId(""), type: 'useritem', username: 'xxx', item: {......., internalitems: []}}
{_id: ObjectId(""), type: 'useritem', username: 'yyy', item: {......., internalitems: []}}
{_id: ObjectId(""), type: 'useritem', username: 'yyy', item: {......., internalitems: []}}
As I explained before, a single user can have thousands of items and I have tens of users; internalitems can hold 20-30 entries, each with 9 fields.
Consider also that a single item can be queried by different users but modified only by its owner and one other process.
If performance is really important, which design would you pick?
If you'd pick neither of them, what schema would you suggest?
On a side note, I will be sharding, and I have a single collection for everything.
I wouldn't recommend the first approach; there is a limit on the maximum document size:
"The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS."
Source: http://docs.mongodb.org/manual/reference/limits/
There is also a performance implication if you exceed the current allocated document space when updating (http://docs.mongodb.org/manual/core/write-performance/ "Document Growth").
Your first solution is susceptible to both of these issues.
The second one (disclaimer: in the case of 20-30 internal items) is less susceptible to reaching the limit, but might still require reallocation when doing updates. I haven't had this issue in a similar scenario, so this might be the way to go. You might also want to look into record padding (http://docs.mongodb.org/manual/core/record-padding/) for more details.
And, if all else fails, you can always split the internal items out as well.
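For what it's worth, a rough pymongo sketch of the second approach (field names follow the question; the index choice and placeholder item fields are my assumptions):

from pymongo import MongoClient

coll = MongoClient()["demo"]["entities"]  # hypothetical names

# One document per item: the user document stays small, and
# item updates never grow the user document.
coll.insert_one({"type": "user", "username": "xxx"})
coll.insert_one({"type": "useritem", "username": "xxx",
                 "item": {"date": "2014-01-01", "internalitems": []}})

# Index so a user's items can be queried cheaply.
coll.create_index([("type", 1), ("username", 1)])
items = coll.find({"type": "useritem", "username": "xxx"})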
Hope this helps!

How should I use MongoDB GridFS to store my big-size data?

After reading the official MongoDB GridFS documentation, I know that GridFS is used by MongoDB to store large files (size > 16 MB); such a file can be a video, a movie, or anything else. But what I have now is large structured data, not a simple physical file, and its size exceeds the limit. To be more specific, I am dealing with thousands of gene sequences, and many of them exceed the BSON document size limit. You can think of each gene sequence as a simple string, and some of these strings are so large that they exceed MongoDB's BSON size limit. What can I do to solve this problem? Is GridFS still suitable?
GridFS splits the data into smaller chunks; that's how it overcomes the size limit. It's particularly useful for streaming data, because the chunks are indexed and you can quickly access data at any given offset.
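A minimal pymongo/GridFS sketch (database and file names are made up): the sequence is stored in indexed chunks behind the scenes and read back as one stream.

import gridfs
from pymongo import MongoClient

db = MongoClient()["genomics"]  # hypothetical database name
fs = gridfs.GridFS(db)

# A gene sequence far beyond the 16 MB BSON limit, as bytes.
sequence = b"ACGT" * 10_000_000  # ~40 MB

# GridFS splits this into indexed chunks behind the scenes.
file_id = fs.put(sequence, filename="gene-123.seq")

# Read it back (or seek to an offset within it).
restored = fs.get(file_id).read()
assert restored == sequence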
That said, storing 'structured' data in the tens of megabytes sounds a bit odd. Either you need to access parts of the data based on some criteria, in which case you need a different data structure that allows access to smaller pieces of the data; or you really do need to process the entire data set based on some criteria, in which case you'll want an efficiently indexed collection that you can query by your criteria and that contains the id of the file to be processed.
Without a concrete example of the problem, i.e., what the query and the data structure look like, it's hard to give a more detailed answer.

Maximum field and item size limits in DynamoDB and MongoDB

I want to use DynamoDB to store some attributes that are big string values...
Is there any maximum field size or maximum item size limitation in DynamoDB? Something that limits the data I can store for one item in a table?
Also, what is the equivalent limit (if any) in MongoDB?
I am evaluating these 2 nosql databases as possible solutions for one of my applications. Any advice/inputs you could give would be appreciated.
The MongoDB limit is currently 16 MB per document. It has increased a couple of times over the course of development, but that is the limit as of the current release (2.0.x at the time of writing); see here:
http://www.mongodb.org/display/DOCS/Documents
I'm not as familiar with the various limits in Dynamo, but you can find a list of them here:
http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/Limits.html
DynamoDB's maximum per item is currently 64 KB (which many see as a serious shortcoming). The best practice for documents larger than 64 KB is to store a pointer to the document (in S3, say) instead of the entire document, though this obviously has some associated issues as well.
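A sketch of that pointer pattern with boto3 (bucket, table, and attribute names are hypothetical; the table is assumed to have docId as its partition key):

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("Documents")

big_payload = "x" * 500_000  # larger than the item size limit

# Store the oversized blob in S3...
s3.put_object(Bucket="my-doc-bucket", Key="docs/doc-1", Body=big_payload)

# ...and keep only a small pointer item in DynamoDB.
table.put_item(Item={"docId": "doc-1",
                     "s3Key": "s3://my-doc-bucket/docs/doc-1"})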
The maximum size allowed for an item in DynamoDB is now 400 KB. This is mentioned here.