What is meant by filesize and datasize in MongoDB?

I started using MongoDB a few days ago, and I am having trouble understanding some of the database architecture. When I run db.stats() I get fileSize, dataSize, storageSize and indexSize. While searching I found the following:
Storagesize = datasize + free space allocated for collection
datasize = database size utilised by MongoDB
Here, I could not understand what fileSize and dataSize represent. Is indexSize also included in dataSize? Please explain these attributes precisely, and correct me if I have stated anything wrong.
Thanks in advance,

dataSize : Sum of all actual data (BSON objects) used by the database, in bytes
indexSize : Sum of all indexes used by the database, in bytes
storageSize : dataSize plus all preallocated collection space, in bytes
fileSize : Sum of the sizes of all files allocated for this database (e.g. test.0 + test.1 etc.), in bytes
nsSizeMB : Size of namespace file for this database, in megabytes.
avgObjSize : Average size of document objects in database. This value includes padding and may therefore not change when you reduce the size of documents.

As explained in this post about the different MongoDB performance metrics you should monitor (with MMAPv1), here are all the storage size metrics returned by dbStats that you should track:
dataSize measures the space taken by all the documents and padding in the database. Because of padding, dataSize decreases if documents are deleted but not when they shrink or get bigger following an update—those operations just add to or borrow from the document’s padding.
indexSize returns the size of all indexes created on the database.
storageSize measures the size of all the data extents in the database. With MMAPv1, it’s always greater than or equal to dataSize because extents contain free space not yet used or freed by deleted and moved documents. storageSize is not affected when documents shrink or are moved.
fileSize corresponds to the size of your data files. It’s always larger than storageSize and can be seen as the storage footprint of your database on disk. It decreases only if you delete a database and is not affected when collections, documents, or indexes are removed.
Here is a diagram with the different important storage metrics returned by dbStats:
NOTE: With the MMAPv1 storage engine, MongoDB pre-allocates extra space on the disk to documents so efficient in-place updates are possible since documents have room to grow without having to be relocated. This extra space is called padding.
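To make the relationship between these numbers concrete, here is a minimal sketch that derives the free space from a dbStats result, assuming MMAPv1. The field names are the ones returned by db.stats(); the helper name breakdownMmapv1, the derived field names and the sample numbers are made up for illustration, and the second figure is only a rough approximation.

```javascript
// Rough breakdown of MMAPv1 dbStats sizes (all values in bytes).
// breakdownMmapv1 and its output field names are illustrative, not a MongoDB API.
function breakdownMmapv1(stats) {
  return {
    // Space inside the data extents that is not yet used, or was freed by
    // deleted and moved documents.
    freeInExtents: stats.storageSize - stats.dataSize,
    // Space in the data files taken by neither extents nor indexes
    // (an approximation of the preallocated, still unused space).
    unallocatedInFiles: stats.fileSize - stats.storageSize - stats.indexSize,
  };
}

// Hypothetical numbers, only to show the arithmetic.
const sampleStats = { dataSize: 400, storageSize: 600, indexSize: 100, fileSize: 1000 };
console.log(breakdownMmapv1(sampleStats));
```

In the mongo shell the same function could be called as breakdownMmapv1(db.stats()).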

I realize this is an older question, but I figured I'd link to the official docs for db.stats() for anyone else looking for similar info (as I was).
Database Statistics Reference :: Fields

Related

why the size of json files is larger than the size of its collection in mongodb

I have a large number of JSON files (2M files, 5 GB). I imported them into MongoDB and noticed that the total size of my MongoDB database is only 0.9 GB, which means MongoDB helps me reduce storage. What is the explanation behind this? Is it some kind of compression, or is it just because it stores JSON objects instead of JSON files?
MongoDB actually stores objects in the BSON format. Since it is binary-encoded, it is naturally expected to demand less disk space.
BSON is more compact (as mentioned by @amiasato), but not by a lot. You don't have quotes or braces, but you have to specify a length for every field. The _id field and other binary blobs take half the space compared to JSON, but in a typical db these do not take up the majority of space.
The disk usage savings you're observing are coming from WiredTiger's compression of data files, I'm 99% sure. (WiredTiger is the default storage engine in current versions of MongoDB.)
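One way to check whether compression explains the difference is to compare dataSize (uncompressed data) with storageSize (space allocated on disk) from db.stats(). The sketch below assumes WiredTiger; the helper name compressionRatio and the sample numbers are hypothetical, and because storageSize can also include reusable free space the result is only an estimate.

```javascript
// Estimate how much smaller the data is on disk than in uncompressed form.
// compressionRatio is an illustrative helper, not part of MongoDB.
function compressionRatio(stats) {
  if (!stats.storageSize) {
    return null; // empty database: nothing to compare against
  }
  return stats.dataSize / stats.storageSize;
}

// Hypothetical values in bytes.
const wtStats = { dataSize: 5000, storageSize: 1000 };
console.log(compressionRatio(wtStats)); // a value well above 1 suggests compression
```

A ratio close to 1 would instead suggest that the savings come from somewhere else, such as the BSON encoding itself.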

Calculating database size percentage used in MongoDB

I'm currently running db.stats on a database and then I get back some information that includes dataSize and storageSize.
Now I'd imagine that
dataSize is how much data is in the database
storageSize is how much data I could store before the database is full.
BUT when I do a calculation like (dataSize/storageSize) * 100 to find the percentage of space that's used in the db, I get a number that's almost always greater than 100%.
How can I use db.stats to tell me how much of the database space I'm currently using?
I'm using Mongo 3.6 atm.
Briefly:
dataSize: The total size of the uncompressed data held in this database.
storageSize: The total amount of space allocated to collections in this database for document storage. This is the size (in bytes) of all the data extents in the database and it includes allocated-but-unused space in the data extents and space vacated by deleted documents within the data extents.
More details in the docs.
However, neither of these metrics tells you anything about the maximum available space for a MongoDB database, because storage footprint in MongoDB is not capped per database. Instead, you might want to think about how much disk space you have made available to your MongoDB instance and then assess how much of that is in use by your database. Mongo v3.6 added two new dbStats metrics which are useful here:
dbStats.fsUsedSize
Total size of all disk space in use on the filesystem where MongoDB stores data.
dbStats.fsTotalSize
Total size of all disk capacity on the filesystem where MongoDB stores data.
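As a rough sketch of how these two fields answer the original question, the functions below compute the percentage of the filesystem that is in use, and the share taken by this database's collections and indexes. The field names come from db.stats() on 3.6+; the helper names and the sample numbers are assumptions for illustration.

```javascript
// Percentage of the whole filesystem currently in use (all processes, not just MongoDB).
function diskUsagePercent(stats) {
  return (stats.fsUsedSize / stats.fsTotalSize) * 100;
}

// Percentage of the filesystem taken by this database's data extents and indexes.
function dbShareOfDiskPercent(stats) {
  return ((stats.storageSize + stats.indexSize) / stats.fsTotalSize) * 100;
}

// Hypothetical values; any unit works as long as all fields use the same scale.
const fsStats = { fsUsedSize: 32, fsTotalSize: 128, storageSize: 12, indexSize: 4 };
console.log(diskUsagePercent(fsStats), dbShareOfDiskPercent(fsStats));
```

Unlike dataSize/storageSize, both results stay between 0 and 100 because they are measured against the capacity of the disk.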
In my opinion, you can use db.stats(1024) or db.runCommand({ dbStats: 1, scale: 1024 }), where 1024 converts the sizes to KB. The fields below are only available from Mongo v3.6 onward; they give you the total disk space allocated and how much of it is used:
dbStats.fsUsedSize
Total size of the disk space in use on the filesystem where MongoDB stores the data.
dbStats.fsTotalSize
Total size of the disk capacity on the filesystem where MongoDB stores the data.
You can also cross-check it with a filesystem command (Linux only, if you have access), provided you know the partition path where the data is stored, which can be found in mongod.cfg
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg--data-data 70G 9.1G 61G 13% /data

Resize storagesize of Mongodb

I would like to know whether it makes sense to resize the storageSize of MongoDB.
I noticed that my size is larger than the storageSize. Could this decrease performance when I retrieve data, etc.?
"count" : 9622,
"size" : 9329997,
"avgObjSize" : 969,
"storageSize" : 3198976,
"capped" : false
If it is necessary, how can I resize the storageSize?
No. Per this doc, why-are-the-files-in-my-data-directory-larger-than-the-data-in-my-database, it is NOT necessary to resize the storageSize, because MongoDB preallocates data and journal files:
The data files in your data directory, which is the /data/db directory in default configurations, might be larger than the data set inserted into the database. Consider the following possible causes:
Preallocated data files
MongoDB preallocates its data files to avoid filesystem fragmentation, and because of this, the size of these files do not necessarily reflect the size of your data.
The storage.mmapv1.smallFiles option will reduce the size of these files, which may be useful if you have many small databases on disk.
The oplog
If this mongod is a member of a replica set, the data directory includes the oplog.rs file, which is a preallocated capped collection in the local database.
The default allocation is approximately 5% of disk space on 64-bit installations. In most cases, you should not need to resize the oplog.
The journal
The data directory contains the journal files, which store write operations on disk before MongoDB applies them to databases. See Journaling.
Empty records
MongoDB maintains lists of empty records in data files as it deletes documents and collections. MongoDB can reuse this space, but will not, by default, return this space to the operating system.
Also here is one good blog how-big-is-your-mongodb

default maxsize for mongodb database

Can anyone tell me what the default maximum size of a MongoDB database is?
I have installed MongoDB on my Windows server and created a database.
The database was created and shows a size of 65,536 KB. Is this the maximum amount of data I can write, or can I extend it?
The MongoDB manual says the following:
http://docs.mongodb.org/manual/reference/limits/
MongoDB Limits and Thresholds
This document provides a collection of hard and soft limitations of the MongoDB system.
BSON Documents
BSON Document Size
The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS.
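Note that the quoted limit applies to a single document, not to the database as a whole. A minimal sketch of a check against it is shown below; the constant and function names are made up for illustration, and the size in bytes is assumed to come from elsewhere (in the legacy mongo shell, Object.bsonsize(doc) reports it).

```javascript
// 16 megabytes expressed in bytes.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Returns true when a document of the given BSON size fits in one document;
// larger payloads would need to be split or stored through GridFS.
function fitsInSingleDocument(sizeInBytes) {
  return sizeInBytes <= MAX_BSON_DOCUMENT_BYTES;
}

console.log(fitsInSingleDocument(50 * 1024));        // a 50 KB document
console.log(fitsInSingleDocument(20 * 1024 * 1024)); // a 20 MB payload
```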

Multiple MongoDb databases configuration

I need to create a large number of MongoDb databases, something around 1000+, later it will grow to more than 3000.
They will be hosted on a server with SSD disks and most of the databases will have around 20-30 collections with no more than 500 objects inside. Most of the objects are between 10-50kb in size.
So the size of the data inside will be relatively small.
My question is how I should configure the creation of these MongoDB databases in order to use the disk space in the most effective manner. I've read that MongoDB allocates empty disk space and that an empty database can take up to 100MB in size. Is there a way to reduce this size?
You can set the storage.smallFiles configuration option to true. This will make the initial data and journal files smaller.
From the MongoDB docs:
The storage.smallFiles option reduces the initial size for data files
and limits the maximum size to 512 megabytes. storage.smallFiles also
reduces the size of each journal file from 1 gigabyte to 128
megabytes. Use storage.smallFiles if you have a large number of
databases that each holds a small quantity of data.
Depending on your workload, you can also change the record allocation strategy. The exact fit allocation will use less storage space than power of 2 (which is a default allocation strategy for v2.6+). But exact fit allocation is ideal only for collections without update and delete workloads.
Edit
For an empty database with the smallFiles option (let's call it db01), MongoDB will create two files in your dbpath that are 16MB large:
db01.0 - file holding the data
db01.ns - namespace file
As you add documents to your collection, MongoDB will create additional data files of increasing size: the next one will be 32MB (db01.1), the one after that will be 64MB (db01.2), and so on up to 512MB. So MongoDB will not preallocate e.g. 1GB for your database if you have only 50MB of data in the collection (if that's what you're worried about).
If you're only worried about exceeding the disk size (on a small SSD), you can also use storage.directoryPerDB. Each database will have its own directory, which you can link to another disk.
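The doubling pattern described above can be sketched as follows. This only models the sizes stated in this answer (16MB first file, doubling up to a 512MB cap); the function names are hypothetical, and the model ignores the namespace file, the journal, and any file MongoDB may preallocate ahead of need.

```javascript
// Sizes (in MB) of the first `fileCount` data files with smallFiles enabled,
// following the 16, 32, 64, ... up to 512 pattern described above.
function smallFilesDataFileSizesMB(fileCount) {
  const sizes = [];
  let size = 16;
  for (let i = 0; i < fileCount; i++) {
    sizes.push(size);
    size = Math.min(size * 2, 512); // each file doubles until the 512MB cap
  }
  return sizes;
}

// Data files needed before their combined size reaches `dataMB`.
function dataFilesToHoldMB(dataMB) {
  const sizes = [];
  let size = 16;
  let total = 0;
  while (total < dataMB) {
    sizes.push(size);
    total += size;
    size = Math.min(size * 2, 512);
  }
  return sizes;
}

console.log(smallFilesDataFileSizesMB(7));
console.log(dataFilesToHoldMB(50));
```

Under this model, 50MB of data ends up in three files totalling 112MB rather than in a single 1GB allocation.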