I currently have a MongoDB environment including shards and replica sets.
I was wondering if there is any way to limit the size of a database. I've read that I can limit the size with the --quota parameter, but that would globally limit the size of all the databases, while I'm trying to limit certain databases to different sizes.
Thank You!
Use a capped collection, which applies the limit per collection within the database.
Refer to the example in MongoDB's documentation: http://docs.mongodb.org/manual/core/capped-collections/
db.createCollection("log", { capped : true, size : 5242880, max : 5000 } )
Where "log" is the collection name.
Do consider that in a capped collection, the oldest documents get overwritten by newly inserted documents.
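Since the limit is per collection, you can give collections in different databases different caps. A minimal mongo shell sketch, assuming hypothetical database names appA/appB and collection names events/logs (exact stats field names can vary by server version and storage engine):
var appA = db.getSiblingDB("appA");
var appB = db.getSiblingDB("appB");
// 10 MB cap for appA.events
appA.createCollection("events", { capped: true, size: 10 * 1024 * 1024 });
// 50 MB cap, and at most 100000 documents, for appB.logs
appB.createCollection("logs", { capped: true, size: 50 * 1024 * 1024, max: 100000 });
// Verify the configured limits
appB.logs.isCapped();       // true
appB.logs.stats().maxSize;  // 52428800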
In the MongoDB local database, you can check oplog-related data by using db.oplog.rs.stats().
But what does the "count" field mean? And I see it decreasing every second on my db server.
The replica set oplog (oplog.rs) is a capped collection, which means it has a maximum total size for data. The underlying implementation varies by storage engine (e.g. WiredTiger vs MMAPv1), but the conceptual outcome is the same: capped collections make room for new documents by overwriting or expiring the oldest documents in the collection in FIFO order (First In, First Out).
But what does the "count" field mean?
As with any collection, the count information in db.collection.stats() indicates the number of documents currently in the collection.
For an explanation of collection stats output, see collStats in the MongoDB documentation.
Note: The output will vary depending on your version of MongoDB server and storage engine used.
I see it's decreasing every second in my db server.
The count of documents in the oplog will vary over time based on the size of the write operations being applied, so this is expected to fluctuate for an active deployment. For example, single field updates will generally write smaller oplog entries than full document updates. Once your oplog reaches its maximum data size, the count may also decrease as the oldest oplog documents are removed to make room for new oplog entries.
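To see this yourself, here is a quick mongo shell sketch against the local database of a replica set member (field availability varies by MongoDB version and storage engine):
var local = db.getSiblingDB("local");
var s = local.oplog.rs.stats();
s.capped;   // true -- the oplog is a capped collection
s.count;    // number of oplog entries currently retained
s.size;     // bytes of oplog data currently stored
s.maxSize;  // configured maximum oplog size in bytes
// Rough average entry size: larger entries (e.g. full-document updates) mean
// fewer entries fit before the oldest are removed, so "count" drops.
s.size / s.count;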
I am currently working on a time series project to store some sensor data. To achieve maximum insertion/write throughput I used a capped collection (as per the MongoDB documentation, capped collections increase read/write performance). When I tested insertion of a few thousand documents using the Python driver against a capped collection without an index versus a normal collection, I couldn't see much improvement in write performance for the capped collection. For example, I inserted 40K records on a single thread using the pymongo driver; the capped collection took around 25.4 seconds and the normal collection took 25.7 seconds.
Could anyone please explain when we can achieve maximum insertion/write throughput with a capped collection? Is it the right choice for time series data collections?
Data stored in a capped collection is rotated out once the collection exceeds its fixed size.
Capped collections don't require any indexes, as they preserve insertion order and data is retrieved in natural order, the same order in which the database refers to documents on disk. Hence they offer high performance for insertion and retrieval.
For a more detailed description of capped collections, please refer to the documentation:
https://docs.mongodb.com/manual/core/capped-collections/
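As a sketch of what this buys you in practice (the collection name and sizes below are made up for illustration), a capped collection returns documents in insertion order without any secondary index:
// 100 MB capped collection for sensor readings (hypothetical sizing)
db.createCollection("sensor_readings", { capped: true, size: 100 * 1024 * 1024 });
db.sensor_readings.insertMany([
  { sensorId: 1, temp: 21.4, ts: new Date() },
  { sensorId: 1, temp: 21.6, ts: new Date() },
  { sensorId: 2, temp: 19.8, ts: new Date() }
]);
// Natural order == insertion order for capped collections
db.sensor_readings.find();                                   // oldest first, no index needed
db.sensor_readings.find().sort({ $natural: -1 }).limit(10);  // newest first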
I haven't found any information about this.
How many documents can a single collection have in MongoDB before it is necessary to use sharding?
There is no limitation as you can see here:
If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents.
@M-A.Fernandes is right.
I can only add this information:
Maximum Number of Documents Per Chunk to Migrate
MongoDB cannot move a chunk if the number of documents in the chunk exceeds either 250000 documents or 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection.
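If you want to check this for one of your own collections, here is a rough mongo shell sketch (assuming the default 64 MB chunk size; "mycollection" is a placeholder name):
var chunkSizeBytes = 64 * 1024 * 1024;   // assumed default; check config.settings for your value
var stats = db.mycollection.stats();
var avgObjSize = stats.avgObjSize;       // average document size in bytes
// A chunk cannot be migrated once its document count exceeds either limit,
// so the effective ceiling is the smaller of the two.
var limitByRatio = 1.3 * (chunkSizeBytes / avgObjSize);
var limitAbsolute = 250000;
print("max migratable docs per chunk: " + Math.min(limitByRatio, limitAbsolute));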
When specifying the 'size' parameter for a capped collection, which of the values from db.collection.stats() is capped at that size?
Is it "size", "storageSize", "storageSize" + "totalIndexSize", or some other option?
Thanks!
According to the documentation:
Optional. Specifies a maximum size in bytes for a capped collection. The size field is required for capped collections. If capped is false, you can use this field to preallocate space for an ordinary collection.
So I would assume it is storageSize.
This would also suggest it is limited on storageSize:
https://jira.mongodb.org/browse/SERVER-5615
Where Elliot says:
Indexes don't count within the capped size. The capped size is for data alone. Indexes do use space, but b-trees have up to 50% extra space because of balancing. The guarantee is that it'll never be more than 50%.
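One way to check this yourself is to create a small capped collection, overfill it, and compare the fields in db.collection.stats(). A sketch (names and sizes are examples; exact field names and values depend on server version and storage engine):
db.createCollection("capped_test", { capped: true, size: 1024 * 1024 });  // 1 MB cap
for (var i = 0; i < 20000; i++) {
  db.capped_test.insertOne({ i: i, payload: "x".repeat(100) });
}
var s = db.capped_test.stats();
s.maxSize;         // the configured 1048576-byte cap
s.size;            // uncompressed BSON data size of the documents currently kept
s.storageSize;     // allocated/on-disk size; can differ from "size" (compression, allocation)
s.totalIndexSize;  // index space, which per the comment quoted above does not count within the cap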
What I have:
I have data for 'n' departments.
Each department has more than 1000 datasets.
Each dataset has more than 10,000 CSV files (each larger than 10MB), each with a different schema.
This data will grow even more in the future.
What I want to do:
I want to map this data into MongoDB.
What approaches have I tried?
I can't map each dataset to a single document in Mongo, since documents are limited to 4-16MB.
I cannot create a collection for each dataset, as the maximum number of collections is also limited (<24000).
So finally I thought of creating a collection for each department, and in that collection one document for each record in the CSV files belonging to that department.
I want to know from you:
Will there be a performance issue if we map each record to a document?
Is there any max limit on the number of documents?
Is there any other design I can use?
Will there be a performance issue if we map each record to a document?
Mapping each record to a document in MongoDB is not a bad design. You can have a look at the FAQ on the MongoDB site:
http://docs.mongodb.org/manual/faq/fundamentals/#do-mongodb-databases-have-tables .
It says,
...Instead of tables, a MongoDB database stores its data in collections, which are the rough equivalent of RDBMS tables. A collection holds one or more documents, which corresponds to a record or a row in a relational database table...
Along with the BSON document size limit (16MB), there is also a maximum of 100 levels of document nesting:
http://docs.mongodb.org/manual/reference/limits/#BSON Document Size
...Nested Depth for BSON Documents. Changed in version 2.2. MongoDB supports no more than 100 levels of nesting for BSON documents...
So it's better to go with one document for each record.
Is there any max limit on the number of documents?
No. It's mentioned in the MongoDB reference manual:
...Maximum Number of Documents in a Capped Collection. Changed in version 2.4. If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents...
Is there any other design I can use?
If your documents are too large, you can think about partitioning them at the application level, but that will add computation overhead at the application layer.
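To make the one-document-per-record suggestion concrete, here is a small mongo shell sketch (in practice you would likely insert from a driver such as pymongo while parsing the CSV; the database, collection, and field names below are made up):
var data = db.getSiblingDB("departments");
// One collection per department; one document per CSV record,
// tagged with its dataset and source file so it stays queryable.
data.physics.insertMany([
  { dataset: "dataset_42", file: "run_001.csv", row: 1, temperature: 21.3, pressure: 101.2 },
  { dataset: "dataset_42", file: "run_001.csv", row: 2, temperature: 21.5, pressure: 101.1 }
]);
// Records from CSV files with different schemas can share the collection,
// since MongoDB does not enforce a fixed schema per collection.
data.physics.find({ dataset: "dataset_42", row: 2 });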
Will there be a performance issue if we map each record to a document?
That depends entirely on how you search them. When you use a lot of queries which affect only one document, it is likely even faster that way. When a higher document-granularity results in a lot of document-spanning queries, it will get slower because MongoDB can't do that itself.
Is there any max limit on the number of documents?
No.
Is there any other design I can use?
Maybe, but that depends on how you want to query your data. If you are content with treating a file as a BLOB that is retrieved as a whole but not searched or analyzed at the database level, you could consider storing it in GridFS. That is a way to store files larger than 16MB in MongoDB.
In general, MongoDB database design doesn't depend so much on what and how much data you have, but rather on how you want to work with it.
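If you do go the GridFS route, the files end up in two ordinary collections that you can inspect from the shell. A sketch, assuming the files were already uploaded through a driver or the mongofiles tool into the default "fs" bucket (the filename below is hypothetical):
// GridFS splits each file into chunks (255 KB by default) stored as normal documents.
db.fs.files.findOne();                   // file metadata: filename, length, chunkSize, uploadDate, ...
db.fs.chunks.count();                    // total number of stored chunks
db.fs.chunks.findOne({}, { data: 0 });   // one chunk's metadata without its binary payload
db.fs.files.find({ filename: "run_001.csv" });  // metadata for a particular file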