I would like to know whether it makes sense to resize the storageSize in MongoDB.
I noticed that my size is larger than the storageSize. Could this hurt performance when I retrieve data, etc.?
"count" : 9622,
"size" : 9329997,
"avgObjSize" : 969,
"storageSize" : 3198976,
"capped" : false
If it is necessary, how can I resize the storageSize?
No. Per this doc, why-are-the-files-in-my-data-directory-larger-than-the-data-in-my-database, it is NOT necessary to resize the storageSize, because MongoDB preallocates data and journal files:
The data files in your data directory, which is the /data/db directory in default configurations, might be larger than the data set inserted into the database. Consider the following possible causes:
Preallocated data files
MongoDB preallocates its data files to avoid filesystem fragmentation, and because of this, the size of these files do not necessarily reflect the size of your data.
The storage.mmapv1.smallFiles option will reduce the size of these files, which may be useful if you have many small databases on disk.
The oplog
If this mongod is a member of a replica set, the data directory includes the oplog.rs file, which is a preallocated capped collection in the local database.
The default allocation is approximately 5% of disk space on 64-bit installations. In most cases, you should not need to resize the oplog.
The journal
The data directory contains the journal files, which store write operations on disk before MongoDB applies them to databases. See Journaling.
Empty records
MongoDB maintains lists of empty records in data files as it deletes documents and collections. MongoDB can reuse this space, but will not, by default, return this space to the operating system.
Also, here is a good blog post: how-big-is-your-mongodb
Related
db.collection.stats()
Response :
"count" : 20696445,
"size" : NumberLong("1478263842661"),
"storageSize" : 334732324864,
"totalIndexSize" : 676327424,
"indexSizes" : {
"_id_" : 377094144,
"leadID_1" : 128049152,
"leadID_hashed" : 171184128
},
"avgObjSize" : 71425.97884134208
My actual disk usage matches storageSize. So what do size and the other keys mean?
You haven't mentioned the version of MongoDB server you are using but given the size of your data is much larger than the storageSize on disk, I'm assuming you are using the WiredTiger storage engine which compresses data and indexes by default. The WiredTiger storage engine was first available as an option in the MongoDB 3.0 production series and became the default storage engine for new deployments in MongoDB 3.2+.
In your example output it looks like you have 1.4TB of uncompressed data which is currently occupying 334GB on disk (the storageSize value). Storage space used by indexes for this collection is reported separately under indexSizes and summed up as totalIndexSize.
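As a rough illustration of that arithmetic, here is a minimal sketch in plain JavaScript that works out the apparent compression ratio from the numbers in the question. The helper name compressionRatio is made up for this example and is not part of MongoDB; the figures are copied from the stats output above.

```javascript
// Values copied from the db.collection.stats() output in the question (bytes).
const stats = {
  count: 20696445,
  size: 1478263842661,        // uncompressed size of the documents
  storageSize: 334732324864,  // space allocated on disk for the documents
  totalIndexSize: 676327424   // space used by all indexes
};

// Hypothetical helper: ratio of uncompressed data size to on-disk size.
function compressionRatio(s) {
  return s.size / s.storageSize;
}

const ratio = compressionRatio(stats);

// Indexes are reported separately, so add them for the collection's disk footprint.
const diskFootprint = stats.storageSize + stats.totalIndexSize;

console.log("apparent compression ratio: " + ratio.toFixed(2));
console.log("documents + indexes on disk (bytes): " + diskFootprint);
console.log("average document size (bytes): " + (stats.size / stats.count).toFixed(0));
```

The ratio comes out at a little over 4, which is consistent with compressed storage rather than with preallocated files.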
The output of collection.stats() will vary depending on your MongoDB server version and configured storage engine, but is generally described in the MongoDB manual as part of the output of the collStats command which is called by the db.collection.stats() shell helper.
Note: MongoDB documentation is versioned so you should always make sure you are referencing documentation that matches your release series of MongoDB (i.e. 3.2, 3.4, ...). Default documentation links will point to the current production release.
See the documentation for these fields:
collStats.size
The total size in memory of all records in a collection. This value does not include the record header, which is 16 bytes per record, but does include the record’s padding. Additionally size does not include the size of any indexes associated with the collection, which the totalIndexSize field reports.
The scale argument affects this value.
collStats.storageSize
The total amount of storage allocated to this collection for document storage. The scale argument affects this value.
storageSize does not include index size. See totalIndexSize for index sizing.
I need to create a large number of MongoDB databases, around 1000+ at first, later growing to more than 3000.
They will be hosted on a server with SSD disks and most of the databases will have around 20-30 collections with no more than 500 objects inside. Most of the objects are between 10-50kb in size.
So the size of the data inside will be relatively small.
My question is how I should configure the creation of these MongoDB databases in order to use the disk space as effectively as possible. I've read that MongoDB allocates empty disk space and that an empty database can take up to 100MB. Is there a way to reduce this size?
You can set the storage.smallFiles configuration option to true. This will make the initial data and journal files smaller.
From the MongoDB docs:
The storage.smallFiles option reduces the initial size for data files
and limits the maximum size to 512 megabytes. storage.smallFiles also
reduces the size of each journal file from 1 gigabyte to 128
megabytes. Use storage.smallFiles if you have a large number of
databases that each holds a small quantity of data.
Depending on your workload, you can also change the record allocation strategy. Exact fit allocation will use less storage space than power of 2 allocation (which is the default allocation strategy for v2.6+). But exact fit allocation is ideal only for collections without update and delete workloads.
Edit
For an empty database with the smallFiles option (let's call it db01), MongoDB will create two files in your dbpath, each 16MB in size:
db01.0 - file holding the data
db01.ns - namespace file
As you add documents to your collection, MongoDB will create additional data files of increasing size: the next one will be 32MB (db01.1), the one after that 64MB (db01.2), and so on up to 512MB. So MongoDB will not preallocate e.g. 1GB for your database if you have only 50MB of data in the collection (if that's what you're worried about).
If you're only worried about exceeding the disk size (on a small SSD), you can also use storage.directoryPerDB. Each database will have its own directory, which you can link to another disk.
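To make the progression concrete, here is a small sketch in plain JavaScript that lists the data file sizes as described above for smallFiles (16MB first, doubling each time, capped at 512MB). The function name is made up for this example, and it models only the data files, not the 16MB namespace file or the journal.

```javascript
// Sketch of the data file progression described above for smallFiles:
// first file 16MB, each following file twice as large, capped at 512MB.
// Hypothetical helper, not a MongoDB API.
function smallFilesDataFileSizesMB(fileCount) {
  const FIRST_FILE_MB = 16;
  const MAX_FILE_MB = 512;
  const sizes = [];
  let next = FIRST_FILE_MB;
  for (let i = 0; i < fileCount; i++) {
    sizes.push(next);
    next = Math.min(next * 2, MAX_FILE_MB);
  }
  return sizes;
}

// Sizes of db01.0 through db01.6 and their combined size.
const sizes = smallFilesDataFileSizesMB(7);
const totalMB = sizes.reduce((sum, mb) => sum + mb, 0);

console.log(sizes.join(", "));
console.log("total size of these data files: " + totalMB + "MB");
```

For a database holding only a few megabytes, the first entry is the one that matters: 16MB of data file plus the 16MB namespace file.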
I have a db named log_test1 with only one capped collection, logs. The max size of the capped collection is 512M. After I inserted 200k documents, I found that the disk usage of the db is 1.6G. With db.stats(), I can see that storageSize is 512M, which is correct, but my actual fileSize is 1.6G. Why did this happen? How can I limit the disk usage to just my capped collection size plus index size?
> use log_test1
switched to db log_test1
> db.stats()
{
"db" : "log_test1",
"collections" : 3,
"objects" : 200018,
"avgObjSize" : 615.8577328040476,
"dataSize" : 123182632,
"storageSize" : 512008192,
"numExtents" : 3,
"indexes" : 8,
"indexSize" : 71907920,
"fileSize" : 1610612736,
"nsSizeMB" : 16,
"dataFileVersion" : {
"major" : 4,
"minor" : 5
},
"ok" : 1
}
This is probably because MongoDB preallocates data and journal files.
MongoDB 2
In the data directory, MongoDB preallocates data files to a particular size, in part to prevent file system fragmentation. MongoDB names the first data file <databasename>.0, the next <databasename>.1, etc. The first file mongod allocates is 64 megabytes, the next 128 megabytes, and so on, up to 2 gigabytes, at which point all subsequent files are 2 gigabytes. The data files include files with allocated space but that hold no data. mongod may allocate a 1 gigabyte data file that may be 90% empty. For most larger databases, unused allocated space is small compared to the database.
On Unix-like systems, mongod preallocates an additional data file and initializes the disk space to 0. Preallocating data files in the background prevents significant delays when a new database file is next allocated.
You can disable preallocation with the noprealloc run time option. However noprealloc is not intended for use in production environments: only use noprealloc for testing and with small data sets where you frequently drop databases.
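To see how much of the 1.6G in the question this could account for, here is a minimal sketch in plain JavaScript using the values from the db.stats() output above. It assumes that index data lives inside the same data files, so whatever is left after subtracting storageSize and indexSize from fileSize is treated as allocated but unused space; that split is an approximation, not something db.stats() reports directly.

```javascript
// Values copied from the db.stats() output in the question (bytes).
const dbStats = {
  dataSize: 123182632,
  storageSize: 512008192,
  indexSize: 71907920,
  fileSize: 1610612736
};

const MB = 1024 * 1024;

// Assumption: the data files hold collection extents and indexes, so the
// remainder is space that was allocated but is not holding either.
const unaccounted = dbStats.fileSize - dbStats.storageSize - dbStats.indexSize;

console.log("fileSize:    " + (dbStats.fileSize / MB).toFixed(0) + "MB");
console.log("storageSize: " + (dbStats.storageSize / MB).toFixed(0) + "MB");
console.log("indexSize:   " + (dbStats.indexSize / MB).toFixed(0) + "MB");
console.log("unaccounted: " + (unaccounted / MB).toFixed(0) + "MB");
```

Most of the gap is space in the data files that holds neither documents nor indexes, which fits the preallocation explanation.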
MongoDB 3
The data files in your data directory, which is the /data/db
directory in default configurations, might be larger than the data set
inserted into the database. Consider the following possible causes:
Preallocated data files
MongoDB preallocates its data files to avoid filesystem fragmentation,
and because of this, the size of these files do not necessarily
reflect the size of your data.
The storage.mmapv1.smallFiles option will reduce the size of these
files, which may be useful if you have many small databases on disk.
The oplog
If this mongod is a member of a replica set, the data
directory includes the oplog.rs file, which is a preallocated capped
collection in the local database.
The default allocation is approximately 5% of disk space on 64-bit
installations.
The journal
The data directory contains the journal files, which store
write operations on disk before MongoDB applies them to databases.
Empty records
MongoDB maintains lists of empty records in data files
as it deletes documents and collections. MongoDB can reuse this space,
but will not, by default, return this space to the operating system.
Taken from MongoDB Storage FAQ.
I started using MongoDB a few days ago, and I am having trouble understanding some of the database architecture. When I run db.stats(), I get fileSize, dataSize, storageSize and indexSize. While searching, I found the following:
Storagesize = datasize + free space allocated for collection
datasize = database size utilised by MongoDB
However, I could not understand what fileSize and dataSize represent. Does dataSize also include indexSize? Please give a precise explanation of these attributes, and correct me if I have stated anything wrong.
Thanks in advance,
dataSize : Sum of all actual data (BSON objects) used by the database, in bytes
indexSize : Sum of all indexes used by the database, in bytes
storageSize : dataSize plus all preallocated collection space, in bytes
fileSize : Sum of the sizes of all files allocated for this database (e.g. test.0 + test.1 etc.), in bytes
nsSizeMB : Size of namespace file for this database, in megabytes.
avgObjSize : Average size of document objects in database. This value includes padding and may therefore not change when you reduce the size of documents.
As explained in this post about the different MongoDB performance metrics you should monitor (with MMAPv1), here are all the storage size metrics returned by dbStats that you should track:
dataSize measures the space taken by all the documents and padding in the database. Because of padding, dataSize decreases if documents are deleted but not when they shrink or get bigger following an update—those operations just add to or borrow from the document’s padding.
indexSize returns the size of all indexes created on the database.
storageSize measures the size of all the data extents in the database. With MMAPv1, it’s always greater than or equal to dataSize because extents contain free space not yet used or freed by deleted and moved documents. storageSize is not affected when documents shrink or are moved.
fileSize corresponds to the size of your data files. It’s obviously always larger than storageSize and can be seen as the storage footprint of your database on disk. It decreases only if you delete a database and is not affected when collections, documents, or indexes are removed.
[Diagram: the different important storage metrics returned by dbStats]
NOTE: With the MMAPv1 storage engine, MongoDB pre-allocates extra space on the disk to documents so efficient in-place updates are possible since documents have room to grow without having to be relocated. This extra space is called padding.
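The relationships described above (dataSize within storageSize, storageSize within fileSize) can be sketched as a small check in plain JavaScript. The helper name and the sample numbers are made up for illustration; in practice you would pass in the object returned by db.stats() on an MMAPv1 deployment.

```javascript
// Hypothetical helper (not part of MongoDB): checks the MMAPv1 relationships
// described above and reports the free space inside the data extents.
function describeStorage(s) {
  return {
    dataFitsInExtents: s.dataSize <= s.storageSize,
    extentsFitInFiles: s.storageSize <= s.fileSize,
    freeInsideExtents: s.storageSize - s.dataSize
  };
}

// Made-up sample values in bytes, for illustration only.
const sample = {
  dataSize: 40 * 1024 * 1024,
  storageSize: 64 * 1024 * 1024,
  indexSize: 8 * 1024 * 1024,
  fileSize: 192 * 1024 * 1024
};

const report = describeStorage(sample);
console.log(report);
```

If dataFitsInExtents comes back false, the stats probably come from a storage engine that compresses data, and the MMAPv1 descriptions above do not apply as written.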
I realize this is an older question, but I figured I'd link to the official docs for db.stats() for anyone else looking for similar info (as I was).
Database Statistics Reference :: Fields
I read that MongoDB documents are limited to 4 MB in size. I also read that when you insert a document, MongoDB puts some padding in so that if you add something to the document, the entire document doesn't have to be moved and reindexed.
So I was wondering, does it store documents in 4MB chunks on disk?
Thanks
As of 1.8, individual documents are limited to 16MB in size (previously 4MB). This is an arbitrary limitation, imposed because when you read a document off disk, the whole document is read into RAM. So I think the intention is that this limitation is there to try and safeguard memory / make you think about your schema design.
Data is then stored across multiple data files on disk - I forget the initial file size, but every time the database grows, a new file is created to expand into, where each new file is created bigger than the previous file until a single file size of 2GB is reached. From this point on, if the database continues to grow, subsequent 2GB data files are created for documents to be inserted into.
"chunks" has a meaning in the sharding aspect of MongoDB, where documents are grouped into "chunks" of a configurable size, and when balancing needs to be done, it's these chunks of data (n documents) that are moved around.
The simple answer is "no." The actual space a document takes up in Mongo's files is variable, but it isn't the maximum document size. The DB engine watches to see how much your documents tend to change after insertion and calculates the padding factor based on that. So it changes all the time.
If you're curious, you can see the actual padding factor and storage space of your data using the .stats() function on a collection in the mongo shell. Here's a real-world example (with some names changed to protect the innocent clients):
{14:42} ~/my_directory ➭ mongo
MongoDB shell version: 1.8.0
connecting to: test
> show collections
schedule_drilldown
schedule_report
system.indexes
> db.schedule_report.stats()
{
"ns" : "test.schedule_report",
"count" : 16749,
"size" : 60743292,
"avgObjSize" : 3626.681712341035,
"storageSize" : 86614016,
"numExtents" : 10,
"nindexes" : 3,
"lastExtentSize" : 23101696,
"paddingFactor" : 1.4599999999953628,
"flags" : 1,
"totalIndexSize" : 2899968,
"indexSizes" : {
"_id_" : 835584,
"WeekEnd_-1_Salon_1" : 925696,
"WeekEnd_-1_AreaCode_1" : 1138688
},
"ok" : 1
}
So my test collection has 16,749 records in it, with an average size of about 3.6 KB ("avgObjSize") and a total data size of about 60 MB ("size"). However, it turns out they actually take up about 86 MB on disk ("storageSize") because of the padding factor. That padding factor has varied over time as the collection's documents have been updated, but if I inserted a new document right now, it'd allocate 1.46 times as much space as the document needs ("paddingFactor") to avoid having to move things around if I change it later. To me that's a fair size/speed tradeoff.
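For anyone who wants to redo that arithmetic on their own stats, here is a minimal sketch in plain JavaScript using the numbers from the output above. The helper paddedAllocation is made up for this example and simply multiplies a document size by the padding factor, as described above; the real allocator may round differently.

```javascript
// Values copied from the db.schedule_report.stats() output above.
const collStats = {
  count: 16749,
  size: 60743292,
  avgObjSize: 3626.681712341035,
  storageSize: 86614016,
  paddingFactor: 1.4599999999953628
};

// Hypothetical helper: bytes reserved for a new document, assuming
// allocation = document size * paddingFactor, rounded up to whole bytes.
function paddedAllocation(docBytes, paddingFactor) {
  return Math.ceil(docBytes * paddingFactor);
}

// Space an average-sized document would get if inserted right now.
const perDoc = paddedAllocation(collStats.avgObjSize, collStats.paddingFactor);

// Overall ratio of allocated storage to data for the whole collection.
const overallRatio = collStats.storageSize / collStats.size;

console.log("allocation for an average document (bytes): " + perDoc);
console.log("storageSize / size: " + overallRatio.toFixed(2));
```

The overall ratio is close to, but not the same as, the current padding factor, which is what you'd expect given that the factor has changed over the life of the collection.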