Does MongoDB's 16MB limit apply to a single denormalized document, or to each embedded document individually? In one-to-many relationships with embedded documents, do all the "many" documents together have to be less than 16 MB?
I'm asking because I can't choose my data model yet. The denormalized strategy is recommended for "contains" relationships (which is exactly my case), but I believe my sub-documents will exceed 16 MB in total. Will that work?
One whole document, regardless of the complexity of its shape and its level of embedding, if any, cannot exceed 16MB.
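As a quick illustration, the legacy mongo shell can report a document's BSON size (the collection and field names here are invented):

// Object.bsonsize() is available in the legacy mongo shell.
var doc = db.orders.findOne({ _id: 1 });
print(Object.bsonsize(doc));         // bytes for the whole document
print(Object.bsonsize(doc.items));   // bytes for one embedded part
// The 16MB cap (16777216 bytes) applies to the first number: the document
// as a whole, including every embedded document it contains.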
I'm not sure why I can't find any information on this, but I'd like to find out what performance consequences might exist when a MongoDB collection has a huge number of documents. I see a lot of answers about the size of documents, but not the document count.
I have a collection of small documents (about 600b each). There are about 2.3m documents in this collection at the time of this writing. They're indexed on two fields each.
I'm concerned about the scalability of this collection. Depending on how many users sign up, this collection could theoretically hit 875+ billion documents.
Will this impact query performance or the index?
875B documents at 600b each will definitely give you scaling challenges. 2.3M shouldn't be a problem on even modest hardware.
I see a lot of answers about the size of documents, but not the document count.
I would think about this in terms of total collection size (which is influenced by the document count) to get an idea of its scalability. A larger collection means more RAM is required to hold the indexes; MongoDB tries to keep indexes in RAM, which makes sense because indexes perform much better there.
Not sure whether you meant 600b as in bits or bytes, but that's either about 65 TB or 525 TB of data. Even with just an index or two on a single field each, those indexes will be large and difficult to fit in memory. Actual performance will probably depend entirely on your query patterns: if you keep accessing the same documents, they stay in cache and things are fast; if queries are distributed relatively evenly across the documents, they are slower and need much more memory to perform well.
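As a rough sanity check, you can watch those sizes yourself; a sketch in the mongo shell, with an invented collection name:

// Pass a scale of 1024*1024 so stats() reports sizes in MB.
var s = db.events.stats(1024 * 1024);
print("data size (MB):     " + s.size);
print("storage size (MB):  " + s.storageSize);
print("total indexes (MB): " + s.totalIndexSize);
// Compare totalIndexSize to the RAM you can dedicate to MongoDB: once the
// indexes stop fitting, evenly distributed queries start hitting disk.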
I am planning to use a nested document structure for my MongoDB schema design. I don't want a flat schema design because, in my case, I need to fetch my result in a single query.
MongoDB, however, has a size limit per document; see MongoDB Limits and Thresholds: a MongoDB document has a size limit of 16 MB (an amount of data), and if your subcollection can grow without limit, the usual advice is to go flat.
I don't need to fetch my nested data; I will only need it for filtering and querying purposes.
I want to know whether I will still be bound by MongoDB's size limit even if I use my embedded data only for querying and filtering and never fetch the nested data itself. As per my understanding, in that case MongoDB won't load the complete document into memory but only the selected fields; is that right?
Nested schema design example
{
    clinicName: "XYZ Hospital",
    clinicAddress: "ABC place.",
    doctorsWorking: {
        "doctorId1": {
            doctorJoined: ISODate("2017-03-15T10:47:47.647Z")
        },
        "doctorId2": {
            doctorJoined: ISODate("2017-04-15T10:47:47.647Z")
        },
        "doctorId3": {
            doctorJoined: ISODate("2017-05-15T10:47:47.647Z")
        }
        // ... up to 30,000-40,000 more entries
    }
}
I don't think your understanding is correct when you say "as per my understanding, in that case MongoDB won't load the complete document into memory but only the selected fields".
The MongoDB documentation reads:
The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.
So the limit is clearly 16 MB on document size, and MongoDB will stop you from saving any document greater than this size.
Suppose for a moment that your understanding were right: that MongoDB allowed saving a document of any size but refused to hold more than 16 MB of it in RAM. While storing the data, MongoDB cannot know what queries will later be run on it (we don't declare a query pattern at insert time, and we can always try to fetch the full document in a single shot later). So you could insert huge documents that could never be used afterwards.
If the limit were on transmission (hypothetically assuming), there are plenty of ways for developers to bring data into RAM in chunks and never cross a 16 MB limit, just as they do for I/O operations on large files. The limit would be trivially circumvented and useless; I assume the MongoDB creators knew this and didn't want it to happen.
Also, if the limit were on transmission, there would be no need for separate collections. We could put everything in a single collection, write smart queries, and whenever the fetched data crossed 16 MB, fetch it in parts and forget the limit. But it doesn't work that way.
So the limit must be on document size, or it would create a great many issues.
In my opinion, if you need the "doctorsWorking" data only for filtering or querying purposes (and you expect it to push the document past the 16 MB limit), it's better to keep it in a separate collection.
Ultimately everything depends on your query and data patterns. If a doctor can serve in multiple hospitals in shifts, it makes even more sense to keep doctors in a separate collection.
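A minimal sketch of that referenced design, with invented collection and field names:

// One document per doctor, referencing clinics instead of being embedded in them.
db.doctors.insertOne({
    doctorId: "doctorId1",
    clinicIds: ["clinicA", "clinicB"],   // a doctor may serve several hospitals
    doctorJoined: ISODate("2017-03-15T10:47:47.647Z")
});
db.doctors.createIndex({ clinicIds: 1 });   // makes the filter below an index lookup

// "Which doctors work at clinic A?" without loading any huge clinic document:
db.doctors.find({ clinicIds: "clinicA" });

This keeps each clinic document small no matter how many doctors it references, at the price of a second query (or a $lookup) when you need a clinic and its doctors together.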
I have two options when modeling MongoDB objects:
to increase the size of each document by storing more fields in one document, or
to increase the number of documents by storing the data separately in different documents.
In case 1 I have to apply complex queries, but the number of documents will be smaller.
In case 2 only simple queries are required, but the number of documents will be larger.
Which one is better in terms of speed and performance if my data scales up exponentially?
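For illustration, here is how the two cases might look in the shell (all names invented):

// Case 1: fewer, larger documents; one more complex query.
db.orders.find(
    { "items.sku": "A-42" },         // reach into the embedded array
    { "items.$": 1, customer: 1 }    // project only the matching item
);

// Case 2: more, smaller documents; simple queries, usually two round trips.
var order = db.orders.findOne({ _id: 7 });
var items = db.orderItems.find({ orderId: 7 }).toArray();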
This is about a recommendation on MongoDB. I have a collection whose document count grows constantly; it is at about 5 billion now. When I run requests on this collection, I sometimes get an error about the 16 MB size.
The first thing I want to ask is: what is the best way to structure collections whose document count grows this fast? What should I do about this kind of structure and its performance?
Just to clarify, the 16MB limitation is on documents, not collections. Specifically, it's the maximum BSON document size, as specified on this page in the documentation.
The maximum BSON document size is 16 megabytes.
If you're running into the 16MB limit in aggregation, it's because you are using MongoDB version 2.4 or older. In those versions, the aggregate() method returned a single document, which is subject to the same limitation as all other documents. Starting in 2.6, the aggregate() method returns a cursor, which is not subject to the 16MB limit; for more information, consult this page in the documentation. Note that each stage in the aggregation pipeline is still limited to 100MB of RAM.
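For example, on 2.6+ you can iterate the cursor instead of receiving one big result document; a sketch with invented names:

// aggregate() returns a cursor in 2.6+, so the combined result is not capped at 16MB.
var cur = db.events.aggregate(
    [
        { $match: { type: "click" } },
        { $group: { _id: "$userId", clicks: { $sum: 1 } } }
    ],
    { allowDiskUse: true }   // lets a stage spill past the 100MB per-stage RAM limit
);
while (cur.hasNext()) {
    printjson(cur.next());   // each individual result document is still capped at 16MB
}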
What I have:
I have data for 'n' departments.
Each department has more than 1,000 datasets.
Each dataset has more than 10,000 CSV files (each larger than 10 MB), each with a different schema.
This data will grow even more in the future.
What I want to do:
I want to map this data into MongoDB.
What approaches I considered:
I can't map each dataset to a single document, since MongoDB limits a document to 16 MB (4 MB in old versions).
I can't create a collection per dataset either, as the maximum number of collections is also limited (roughly 24,000 by default).
So finally I thought of creating one collection per department, with one document in that collection for each record of the CSV files belonging to that department.
I want to know:
Will there be a performance issue if we map each record to a document?
Is there any maximum limit on the number of documents?
Is there any other design I could use?
Will there be a performance issue if we map each record to a document?
Mapping each record to a document in MongoDB is not a bad design. Have a look at the FAQ on the MongoDB site, http://docs.mongodb.org/manual/faq/fundamentals/#do-mongodb-databases-have-tables . It says:
...Instead of tables, a MongoDB database stores its data in collections, which are the rough equivalent of RDBMS tables. A collection holds one or more documents, which corresponds to a record or a row in a relational database table....
Along with the 16MB limitation on BSON document size, MongoDB also has a maximum of 100 levels of document nesting; see http://docs.mongodb.org/manual/reference/limits/#BSON Document Size
...Nested Depth for BSON Documents. Changed in version 2.2. MongoDB supports no more than 100 levels of nesting for BSON documents...
So it's better to go with one document for each record.
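A minimal sketch of that mapping in a current shell, with hypothetical names (one collection per department, one document per CSV row):

// Each CSV row becomes its own small document; rows from files with different
// schemas can coexist, since documents in one collection need not share a structure.
db.department_sales.insertMany([
    { dataset: "ds-001", file: "2014-q1.csv", row: 1, region: "EU", amount: 1200 },
    { dataset: "ds-001", file: "2014-q1.csv", row: 2, region: "US", amount: 950 }
]);
db.department_sales.createIndex({ dataset: 1, file: 1 });   // fetch one file's rows quickly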
Is there any maximum limit on the number of documents?
No. It's mentioned in the MongoDB reference manual:
...Maximum Number of Documents in a Capped Collection. Changed in version 2.4. If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents...
Is there any other design I could use?
If your documents are too large, you can think about partitioning them at the application level, but that will carry a high computational cost in the application layer.
Will there be a performance issue if we map each record to a document?
That depends entirely on how you search them. With many queries that each touch only a single document, it is likely even faster that way. When finer document granularity results in many queries that span multiple documents, it will get slower, because MongoDB can't combine documents for you and that work falls back to your application.
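For instance (names invented), compare a query answered from one document with one that spans many:

// Answered from a single document: one index lookup, one document fetched.
db.files.findOne({ _id: "2014-q1.csv" });

// Spans many documents: the server visits every matching record, and any
// logic that combines records lands in your application code.
db.records.find({ department: "sales", year: 2014 }).forEach(function (r) {
    // merge or aggregate r on the application side
});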
Is there any maximum limit on the number of documents?
No.
Is there any other design I could use?
Maybe, but that depends on how you want to query your data. If you are content with treating each file as a BLOB that is retrieved as a whole but never searched or analyzed at the database level, you could consider storing the files in GridFS. It's a way to store files larger than 16MB in MongoDB.
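A sketch of that, assuming the Node.js driver's GridFSBucket (database and file names invented); GridFS splits a file into chunk documents, so the 16MB cap no longer applies to the file as a whole:

// Stream a large CSV into GridFS; it is stored as many small chunk documents.
const { MongoClient, GridFSBucket } = require("mongodb");
const fs = require("fs");

MongoClient.connect("mongodb://localhost:27017").then((client) => {
    const bucket = new GridFSBucket(client.db("departments"));
    fs.createReadStream("dataset.csv")
        .pipe(bucket.openUploadStream("dataset.csv"))
        .on("finish", () => client.close());   // the file is fully stored at this point
});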
In general, MongoDB database design doesn't depend so much on what and how much data you have, but rather on how you want to work with it.