Document size vs. number of documents in MongoDB

I have two options when modeling MongoDB objects:
1. Increase the size of each document by storing more fields in one document.
2. Increase the number of documents by storing the data separately in different documents.
In case 1 I have to apply complex queries, but the number of documents will be smaller.
In case 2 only simple queries are required, but the number of documents will be larger.
Which one is better in terms of speed and performance if my data scales up exponentially?
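For illustration, here is a minimal PyMongo sketch of the two cases, assuming a hypothetical order/items schema; the collection and field names are made up:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["shop"]

    # Case 1: fewer, larger documents (related fields embedded in one document).
    db.orders_embedded.insert_one({
        "_id": 1001,
        "customer": "alice",
        "items": [
            {"sku": "A1", "qty": 2},
            {"sku": "B7", "qty": 1},
        ],
    })

    # Case 2: more, smaller documents (one document per item, linked by order_id).
    db.order_items.insert_many([
        {"order_id": 1001, "customer": "alice", "sku": "A1", "qty": 2},
        {"order_id": 1001, "customer": "alice", "sku": "B7", "qty": 1},
    ])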

Related

Does a MongoDB collection's document count affect performance?

I'm not sure why I can't find any information on this, but I'd like to find out what performance consequences might exist when a MongoDB collection has a huge number of documents. I see a lot of answers about the size of documents, but not the document count.
I have a collection of small documents (about 600b each). There are about 2.3m documents in this collection at the time of this writing. They're indexed on two fields each.
I'm concerned about the scalability of this collection. Depending on how many users sign up, this collection could theoretically hit 875+ billion documents.
Will this impact query performance or the index?
875B documents at 600b each will definitely give you scaling challenges. 2.3M shouldn't be a problem on even modest hardware.
I see a lot of answers about the size of documents, but not the document count.
I would think about this in terms of the total collection size (which is influenced by the document count) to get an idea of its scalability. Higher collection size means more RAM required to satisfy the indexes. MongoDB tries to keep indexes in RAM, which makes sense because indexes perform much better when they're in RAM.
Not sure whether you meant 600b as in bits or bytes, but that's either 65 TB or 525 TB. Even with an index or two on only one field each, those are going to be large indexes that will be difficult to fit in memory. Actual performance will probably depend on your query patterns, such as whether you keep accessing the same documents (faster, stays in cache) or whether queries are spread relatively evenly across the documents (slower, needs more memory to perform well).
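To ground such estimates, you can ask the server how big the collection and its indexes actually are; a small PyMongo sketch, where the database and collection names are placeholders:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    # collStats reports the real data and index footprint of a collection.
    stats = db.command("collStats", "events")
    print("documents:       ", stats["count"])
    print("data size (MB):  ", stats["size"] / 1024 / 1024)
    print("index size (MB): ", stats["totalIndexSize"] / 1024 / 1024)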

Is it better to have multiple collections with thousands of documents or one collection with 100 million documents?

I'm migrating a MySQL table with 100 million rows to a MongoDB database. This table stores company documents, and what differentiates them is the company_id column. I was wondering whether having multiple collections in MongoDB would be faster than just one collection. For example, each company would have its own collection (company_1, company_2, company_3, ...) storing only documents from that company, so I would not need to filter them, as I would if I had one big collection in which every document has a company_id field used for filtering.
Which method would perform best in this case?
EDIT:
Here's a JSON document example: https://pastebin.com/T5m2tbaY
{"_id":"5d8b8241ae0f000015006142","id_consulta":45254008,"company_id":7,"tipo_doc":"nfe","data_requisicao":"2019-09-25T15:05:35.155Z","xml":Object...
You could have one collection with one document per company, with company-specific details in the document, assuming the details do not exceed 16 MB in size. Place an index on company_id for performance reasons. If performance does not meet expectations, scale vertically, i.e. add memory, CPU, disk I/O, and network capacity. If that does not suffice, consider sharding the collection across multiple hosts.
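Whichever granularity you settle on, the important part is the index on company_id; a minimal PyMongo sketch of the single-collection layout from the question, where the collection name documents is hypothetical:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    # One collection for every company, indexed on company_id so per-company
    # queries use the index instead of scanning all 100 million documents.
    db.documents.create_index("company_id")

    # Typical per-company query.
    for doc in db.documents.find({"company_id": 7}).limit(10):
        print(doc["_id"], doc.get("tipo_doc"))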

Is there a maximum number of records for MongoDB?

I am looking to use MongoDB to store a huge number of records: between 12 and 15 billion. Is it possible to store this many documents in MongoDB?
I saw on the net that there are limits for document size, index size, and the number of elements in a collection.
But is there a limit in terms of the number of records?
There is no limit on the number of documents in one collection. However, you will probably run into issues with disk, RAM, and lookup/update performance. If your data has some kind of logical grouping, I would suggest splitting it between multiple collections, or even multiple instances (on different servers, of course).
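As a sketch of that kind of splitting, the application can route each record to a per-group collection; the grouping by year here is an invented example:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["archive"]

    def insert_record(record):
        # Route each record to a collection named after its logical group,
        # e.g. one collection per year instead of one 15-billion-document collection.
        db["records_%d" % record["year"]].insert_one(record)

    insert_record({"year": 2019, "payload": "..."})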

Mapping datasets to a NoSQL (MongoDB) collection

What I have:
I have data for 'n' departments.
Each department has more than 1,000 datasets.
Each dataset has more than 10,000 CSV files (each larger than 10 MB), each with a different schema.
This data will grow even more in the future.
What I want to do:
I want to map this data into MongoDB.
What approaches I considered:
I can't map each dataset to a single document, since MongoDB limits document size to 16 MB (4 MB in older versions).
I can't create a collection for each dataset either, as the maximum number of collections is also limited (<24,000).
So finally I thought of creating one collection per department, with one document for each record in the CSV files belonging to that department.
I want to know from you:
Will there be a performance issue if we map each record to a document?
Is there a maximum limit on the number of documents?
Is there any other design I can use?
Will there be a performance issue if we map each record to a document?
Mapping each record to a document in MongoDB is not a bad design. You can have a look at the FAQ on the MongoDB site:
http://docs.mongodb.org/manual/faq/fundamentals/#do-mongodb-databases-have-tables .
It says,
...Instead of tables, a MongoDB database stores its data in collections,
which are the rough equivalent of RDBMS tables. A collection holds one
or more documents, which corresponds to a record or a row in a
relational database table....
Along with the BSON document size limit (16 MB), MongoDB also has a maximum of 100 levels of document nesting:
http://docs.mongodb.org/manual/reference/limits/#BSON Document Size
...Nested Depth for BSON Documents. Changed in version 2.2.
MongoDB supports no more than 100 levels of nesting for BSON documents...
So it's better to go with one document for each record.
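A rough PyMongo sketch of that layout, with one collection per department and one document per CSV row; the department and dataset identifiers are placeholders:

    import csv
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["departments"]

    def load_csv(department, dataset_id, path, batch_size=1000):
        # One collection per department; every CSV row becomes one document,
        # tagged with the dataset it came from.
        collection = db["dept_%s" % department]
        batch = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                row["dataset_id"] = dataset_id
                batch.append(row)
                if len(batch) >= batch_size:
                    collection.insert_many(batch)
                    batch = []
        if batch:
            collection.insert_many(batch)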
Is there a maximum limit on the number of documents?
No. It's mentioned in the MongoDB reference manual:
...Maximum Number of Documents in a Capped Collection. Changed in version 2.4.
If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents...
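For reference, that max parameter is set when a capped collection is created; a PyMongo sketch with arbitrary names and sizes:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    # A capped collection requires a size in bytes; max (a document count) is
    # optional and, per the limits page, must stay below 2^32 when specified.
    db.create_collection("recent_events", capped=True, size=10 * 1024 * 1024, max=100000)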
Is there any other design I can use?
If your documents are too large, you can think about partitioning them at the application level, but that will add computational overhead in the application layer.
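A hedged sketch of what such application-level partitioning could look like in PyMongo; the chunking scheme and collection name are invented:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    def save_large_record(record_id, values, chunk_size=100000):
        # Split one oversized logical record into several documents, each well
        # under the 16 MB BSON limit, keyed by (record_id, seq).
        chunks = [
            {"record_id": record_id, "seq": i // chunk_size, "values": values[i:i + chunk_size]}
            for i in range(0, len(values), chunk_size)
        ]
        db.record_chunks.insert_many(chunks)

    def load_large_record(record_id):
        # The application has to reassemble the chunks in order.
        values = []
        for chunk in db.record_chunks.find({"record_id": record_id}).sort("seq", 1):
            values.extend(chunk["values"])
        return values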
Will there be a performance issue if we map each record to a document?
That depends entirely on how you query them. When most of your queries affect only one document, it is likely even faster that way. When the finer document granularity results in a lot of document-spanning queries, it will get slower, because MongoDB can't combine the documents for you by itself.
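To make the distinction concrete, a small PyMongo sketch (collection and field names are invented): the first query touches a single document, the second spans many documents that the application then has to combine itself.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    records = client["departments"]["dept_physics"]

    # Touches exactly one document: cheap and index-friendly.
    one = records.find_one({"_id": "row-42"})

    # Spans many documents: the server streams every matching document back
    # and the application combines them on its side.
    total = 0.0
    for doc in records.find({"dataset_id": "dataset-7"}, {"value": 1}):
        total += doc.get("value", 0)
    print(total)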
Is there a maximum limit on the number of documents?
No.
Is there any other design I can use?
Maybe, but that depends on how you want to query your data. If you are content with treating files as BLOBs that are retrieved as a whole but not searched or analyzed at the database level, you could consider storing them in GridFS. It's a way to store files larger than 16 MB in MongoDB.
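A minimal GridFS sketch using PyMongo's gridfs module; the file name is a placeholder:

    import gridfs
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["filestore"]
    fs = gridfs.GridFS(db)

    # Store a blob that may exceed the 16 MB document limit; GridFS splits it
    # into chunk documents behind the scenes.
    with open("dataset_dump.csv", "rb") as f:
        file_id = fs.put(f, filename="dataset_dump.csv")

    # Retrieve it later as a whole.
    data = fs.get(file_id).read()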
In general, MongoDB database design doesn't depend so much on what data you have or how much of it, but rather on how you want to work with it.

Storing very large documents in MongoDB

In short: If you have a large number of documents with varying sizes, where relatively few documents hit the maximum object size, what are the best practices to store those documents in MongoDB?
I have a set of documents like:
{_id: ...,
values: [12, 13, 434, 5555 ...]
}
The length of the values list varies hugely from one document to another. For the majority of documents it will have a few elements; for a few it will have tens of millions of elements, so I will hit the maximum object size limit in MongoDB. The trouble is that any special solution I come up with for those very large (and relatively few) documents might have an impact on how I store the small documents, which would otherwise live happily in a MongoDB collection.
As far as I see, I have the following options. I would appreciate any input on pros and cons of those, and any other option that I missed.
1) Use another datastore: That seems too drastic. I like MongoDB, and it's not like I hit the size limit for many objects. In the worst case, my application could treat the very large objects and the rest differently. It just doesn't seem elegant.
2) Use GridFS to store the values: Like a BLOB in a traditional DB, I could keep the first few thousand elements of values in the document and, if there are more elements in the list, keep the rest in a GridFS object as a binary file. I wouldn't be able to search in this part, but I can live with that.
3) Abuse GridFS: I could keep every document in GridFS. For the majority of the (small) documents the binary chunk would be empty, because the files collection would be able to hold everything. For the rest, I could keep the excess elements in the chunks collection. Does that introduce overhead compared to option #2?
4) Really abuse GridFS: I could use the optional fields in the files collection of GridFS to store all elements of values. Does GridFS do smart chunking for the files collection as well?
5) Use an additional "relational" collection to store the one-to-many relation, but the number of documents in this collection would easily exceed a hundred billion rows.
If you have large documents, try to store some metadata about them in MongoDB, and put the rest of the data, the part you will not be querying on, outside.
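A sketch of that split along the lines of option 2, using PyMongo and GridFS; the inline threshold, the collection name, and the assumption that the values are 64-bit integers are all just illustrative choices:

    import struct

    import gridfs
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]
    fs = gridfs.GridFS(db)

    INLINE_LIMIT = 10000  # arbitrary cutoff, kept well below the 16 MB BSON cap

    def save(doc_id, values):
        head, tail = values[:INLINE_LIMIT], values[INLINE_LIMIT:]
        doc = {"_id": doc_id, "values": head}
        if tail:
            # Overflow goes to GridFS as a packed binary blob: not queryable,
            # but the searchable prefix and metadata stay in the document.
            doc["overflow_id"] = fs.put(struct.pack("<%dq" % len(tail), *tail))
        db.measurements.insert_one(doc)

    def load(doc_id):
        doc = db.measurements.find_one({"_id": doc_id})
        values = list(doc["values"])
        if "overflow_id" in doc:
            blob = fs.get(doc["overflow_id"]).read()
            values.extend(struct.unpack("<%dq" % (len(blob) // 8), blob))
        return values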