I need to create a large number of MongoDB databases: around 1000 at first, later growing to more than 3000.
They will be hosted on a server with SSD disks, and most of the databases will have around 20-30 collections with no more than 500 objects each. Most of the objects are between 10 and 50 KB in size.
So the amount of data in each database will be relatively small.
My question is how I should configure the creation of these MongoDB databases so that disk space is used as efficiently as possible. I've read that MongoDB preallocates empty disk space and that an empty database can take up to 100MB; is there a way to reduce this size?
You can set the storage.smallFiles configuration option to true. This will make the initial data and journal files smaller.
From the MongoDB docs:
The storage.smallFiles option reduces the initial size for data files
and limits the maximum size to 512 megabytes. storage.smallFiles also
reduces the size of each journal file from 1 gigabyte to 128
megabytes. Use storage.smallFiles if you have a large number of
databases that each holds a small quantity of data.
Depending on your workload, you can also change the record allocation strategy. Exact fit allocation uses less storage space than power of 2 allocation (the default strategy since v2.6), but it is ideal only for collections without update and delete workloads.
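To get a feel for the difference, here is a rough sketch of how much space one record would take under each strategy. It assumes power of 2 allocation simply rounds a record up to the next power of two, starting from 32 bytes, and it ignores record headers and any special handling of very large documents. powerOf2Size and overheadPercent are illustrative names, not MongoDB functions.

```javascript
// Simplified model (an assumption, not MongoDB code):
// round a record size in bytes up to the next power of two.
function powerOf2Size(bytes) {
  let size = 32; // assumed smallest bucket
  while (size < bytes) size *= 2;
  return size;
}

// Extra space, in percent, that power of 2 allocation reserves
// compared with an exact fit for the same record.
function overheadPercent(bytes) {
  return ((powerOf2Size(bytes) - bytes) / bytes) * 100;
}

// Document sizes from the question: roughly 10-50 KB each.
for (const kb of [10, 30, 50]) {
  const bytes = kb * 1024;
  console.log(
    `${kb} KB -> ${powerOf2Size(bytes)} bytes allocated, ` +
    `+${overheadPercent(bytes).toFixed(0)}% over exact fit`
  );
}
```

The overhead varies a lot with document size, so it is worth plugging in your own typical sizes before choosing a strategy.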
Edit
For an empty database with the smallFiles option enabled (let's call it db01), MongoDB will create two files in your dbpath, each 16MB in size:
db01.0 - file holding the data
db01.ns - namespace file
As you add documents to your collections, MongoDB will create additional data files of increasing size: the next one will be 32MB (db01.1), the one after that 64MB (db01.2), and so on up to 512MB. So MongoDB will not preallocate e.g. 1GB for your database if you have only 50MB of data in it (if that's what you're worried about).
If you're only worried about running out of disk space (on a small SSD), you can also use storage.directoryPerDB. Each database will then have its own directory, which you can link to another disk.
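As a back-of-the-envelope check, the file sizes quoted above can be turned into a small estimator. This is only a sketch: it uses the 16MB starting size, the doubling and the 512MB cap from this answer, and it ignores journal files and any extra file the server may preallocate ahead of need. dataFileSizes and footprintMB are made-up helper names.

```javascript
// Sizes in MB, taken from the answer above (smallFiles enabled).
const FIRST_DATA_FILE_MB = 16;
const MAX_DATA_FILE_MB = 512;
const NS_FILE_MB = 16;

// Sizes of the data files (db01.0, db01.1, ...) needed to hold dataMB of data.
// An empty database still gets its first data file.
function dataFileSizes(dataMB) {
  const sizes = [];
  let next = FIRST_DATA_FILE_MB;
  let capacity = 0;
  do {
    sizes.push(next);
    capacity += next;
    next = Math.min(next * 2, MAX_DATA_FILE_MB);
  } while (capacity < dataMB);
  return sizes;
}

// Total on-disk footprint: namespace file plus all data files.
function footprintMB(dataMB) {
  return NS_FILE_MB + dataFileSizes(dataMB).reduce((a, b) => a + b, 0);
}

console.log(dataFileSizes(50));      // data files for about 50MB of data
console.log(footprintMB(0) * 3000);  // MB needed for 3000 empty databases
```

Under these assumptions every database costs at least 32MB, so 3000 databases need on the order of 94GB before any data is stored.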
Related
After storing some binary data in MongoDB 4.2.5 (a 3-node replica set), the oplog.rs collection grew to about 700MB. The binary data was removed and the data model restructured, but the oplog.rs collection stays the same size (as expected). I do understand that it's a capped collection with a maximum size and that eventually it'll reuse the space. In my case, though, I'd like to reclaim the space and start over. The database is used mostly for internal testing purposes. I don't mind losing some data from the oplog, but I do mind having a big oplog file, since the whole database is just a few MB.
Is it safe to use the emptycapped command on the oplog.rs collection in a replica set? Do I need to run this command on each node? Do I need to compact the collection after the deletion (last part from https://docs.mongodb.com/manual/tutorial/change-oplog-size/)?
Is there any other way to gracefully "reset" the oplog and free up the space?
The OpLog is limited to the size you have defined in your config, or to the default size if you have not set one.
The OpLog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.
It fills up to the defined size as changes come through (or as periodic no-op entries are written).
If you want to reduce the size, reset the OpLog size in your config (on a running 4.2 replica set, the tutorial you linked does this with the replSetResizeOplog command, run on each member). But don't forget that a larger OpLog gives you a longer OpLog window.
The OpLog window tells you how long a secondary member can be offline and still catch up to the primary without doing a full resync.
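To see what a smaller OpLog would cost you, the window can be estimated with a little arithmetic. The sketch below assumes you already have the timestamps of the oldest and newest OpLog entries (in the shell, rs.printReplicationInfo() reports the OpLog size and this time range) and that the write rate stays roughly constant. Both function names and the sample timestamps are made up.

```javascript
// OpLog window in hours, from the timestamps (seconds since epoch)
// of the oldest and newest OpLog entries.
function oplogWindowHours(oldestSecs, newestSecs) {
  return (newestSecs - oldestSecs) / 3600;
}

// Rough projection of the window after a resize, assuming the OpLog is
// full and the write rate does not change.
function projectedWindowHours(currentWindowHours, currentSizeMB, newSizeMB) {
  return currentWindowHours * (newSizeMB / currentSizeMB);
}

// Hypothetical example: entries span one day in a 700MB OpLog.
const currentWindow = oplogWindowHours(1600000000, 1600086400);
console.log(currentWindow);
console.log(projectedWindowHours(currentWindow, 700, 100).toFixed(1));
```

On a test system with very few writes the real window is usually far longer than this kind of projection suggests, because the OpLog takes a long time to fill.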
I'm currently running db.stats on a database and then I get back some information that includes dataSize and storageSize.
Now I'd imagine that
dataSize is how much data is in the database
storageSize is how much data I could store before the database is full.
BUT when I do a calculation like (dataSize/totalSize) * 100 to find the percentage of space that's used in the db, I get a number that's almost always greater than 100%.
How can I use db.stats to tell me how much of the database space I'm currently using?
I'm using Mongo 3.6 atm.
Briefly:
dataSize: The total size of the uncompressed data held in this database.
storageSize: The total amount of space allocated to collections in this database for document storage. This is the size (in bytes) of all the data extents in the database and it includes allocated-but-unused space in the data extents and space vacated by deleted documents within the data extents.
More details in the docs.
However, neither of these metrics tells you anything about the maximum available space for a MongoDB database, because the storage footprint in MongoDB is not capped per database. Instead, you might want to think about how much disk space you have made available to your MongoDB instance and then assess how much of that is in use by your database. Mongo v3.6 added two new dbStats metrics which are useful here:
dbStats.fsUsedSize
Total size of all disk space in use on the filesystem where MongoDB stores data.
dbStats.fsTotalSize
Total size of all disk capacity on the filesystem where MongoDB stores data.
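A percentage can then be derived from those two fields. The sketch below works on a plain object shaped like the result of db.stats() on 3.6+; the sample numbers are invented and diskUsage is just an illustrative helper.

```javascript
// `stats` stands in for the object returned by db.stats() on MongoDB 3.6+.
function diskUsage(stats) {
  // Share of the whole filesystem that is in use (by anything, not only MongoDB).
  const usedPercent = (stats.fsUsedSize / stats.fsTotalSize) * 100;
  // Share of the filesystem taken by this database's collections and indexes.
  const dbPercent =
    ((stats.storageSize + stats.indexSize) / stats.fsTotalSize) * 100;
  return { usedPercent, dbPercent };
}

// Hypothetical values in bytes.
const fsSample = {
  storageSize: 3e9,
  indexSize: 1e9,
  fsUsedSize: 10e9,
  fsTotalSize: 80e9,
};

const usage = diskUsage(fsSample);
console.log(usage.usedPercent.toFixed(1) + "% of the filesystem is in use");
console.log(usage.dbPercent.toFixed(1) + "% is taken by this database");
```

In the shell you would pass db.stats() instead of the sample object. Note that fsUsedSize covers everything on that filesystem, so the two percentages answer different questions.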
In my opinion, use db.stats(1024) or db.runCommand({ dbStats: 1, scale: 1024 }), where 1024 converts the sizes to KB. The two fields below are only available from Mongo v3.6 onwards; they give you the total disk space allocated and how much of it is used:
dbStats.fsUsedSize
Total size of the disk space in use on the filesystem where MongoDB stores the data.
dbStats.fsTotalSize
Total size of the disk capacity on the filesystem where MongoDB stores the data.
You can also cross-check this with a filesystem command (Linux only, if you have access), provided you know the path of the partition where the data is stored, which can be found in mongod.cfg:
$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg--data-data    70G  9.1G   61G  13% /data
I would like to know whether it makes sense to resize the storageSize of MongoDB.
I noticed that my size is larger than the storage size. Might that decrease performance when I retrieve data, etc.?
"count" : 9622,
"size" : 9329997,
"avgObjSize" : 969,
"storageSize" : 3198976,
"capped" : false
If it is necessary, how can I resize the storageSize?
No. Per this doc, why-are-the-files-in-my-data-directory-larger-than-the-data-in-my-database, it is NOT necessary to resize the storageSize, because MongoDB preallocates data and journal files:
The data files in your data directory, which is the /data/db directory in default configurations, might be larger than the data set inserted into the database. Consider the following possible causes:
Preallocated data files
MongoDB preallocates its data files to avoid filesystem fragmentation, and because of this, the size of these files do not necessarily reflect the size of your data.
The storage.mmapv1.smallFiles option will reduce the size of these files, which may be useful if you have many small databases on disk.
The oplog
If this mongod is a member of a replica set, the data directory includes the oplog.rs file, which is a preallocated capped collection in the local database.
The default allocation is approximately 5% of disk space on 64-bit installations. In most cases, you should not need to resize the oplog.
The journal
The data directory contains the journal files, which store write operations on disk before MongoDB applies them to databases. See Journaling.
Empty records
MongoDB maintains lists of empty records in data files as it deletes documents and collections. MongoDB can reuse this space, but will not, by default, return this space to the operating system.
Also, here is a good blog post: how-big-is-your-mongodb
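For the numbers in the question, the relationship between the fields can be checked with a few lines. This is only a sketch over the values posted above; describeCollection is an illustrative name, and the interpretation of the ratio is an assumption, since the question does not say which storage engine is in use.

```javascript
// Values copied from the collection stats in the question (bytes).
const collStats = {
  count: 9622,
  size: 9329997,
  avgObjSize: 969,
  storageSize: 3198976,
};

function describeCollection(s) {
  return {
    // Average document size recomputed from size and count.
    avgObjSize: Math.floor(s.size / s.count),
    // How many bytes of document data each byte of storage holds.
    sizeToStorageRatio: s.size / s.storageSize,
  };
}

const described = describeCollection(collStats);
console.log(described.avgObjSize);
console.log(described.sizeToStorageRatio.toFixed(2));
```

A ratio above 1 means the documents take less room on disk than their uncompressed size. That would be consistent with a storage engine that compresses data on disk, and it is not by itself a sign that anything needs resizing.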
In MongoDB, if I continuously update the values of keys in a document in a collection, will it consume more space? If I update a value 100 thousand times, will space be wasted on the hard disk?
Basically, it won't use more space, because writes happen in place: if the new value doesn't require more space, MongoDB won't have to allocate more.
As for rapid updates: MongoDB writes are lazy, so it can group multiple writes into one physical write to disk.
You can find more info here.
Please note that if you have logging enabled, it will use more disk space, but how much depends on your configuration.
MongoDB's dbStats command reports the database's storage usage; try using it.
I started using MongoDB a few days ago, and I have a problem understanding some of the database architecture. When I run db.stats(); I get fileSize, dataSize, storageSize and indexSize. While searching, I found the following:
Storagesize = datasize + free space allocated for collection
datasize = database size utilised by MongoDB
Here, I could not understand what fileSize and dataSize represent. Is indexSize also included in dataSize? Please give a precise explanation of these attributes, and please correct me if I have mentioned anything wrong.
Thanks in advance,
dataSize : Sum of all actual data (BSON objects) used by the database, in bytes
indexSize : Sum of all indexes used by the database, in bytes
storageSize : dataSize plus all preallocated collection space, in bytes
fileSize : Sum of the sizes of all files allocated for this database (e.g. test.0 + test.1 etc.), in bytes
nsSizeMB : Size of namespace file for this database, in megabytes.
avgObjSize : Average size of document objects in database. This value includes padding and may therefore not change when you reduce the size of documents.
As explained in this post about the different MongoDB performance metrics you should monitor (with MMAPv1), here are all the storage size metrics returned by dbStats that you should track:
dataSize measures the space taken by all the documents and padding in the database. Because of padding, dataSize decreases if documents are deleted but not when they shrink or get bigger following an update—those operations just add to or borrow from the document’s padding.
indexSize returns the size of all indexes created on the database.
storageSize measures the size of all the data extents in the database. With MMAPv1, it’s always greater than or equal to dataSize because extents contain free space not yet used or freed by deleted and moved documents. storageSize is not affected when documents shrink or are moved.
fileSize corresponds to the size of your data files. It’s obviously always larger than storageSize and can be seen as the storage footprint of your database on disk. It decreases only if you delete a database and is not affected when collections, documents, or indexes are removed.
Here is a diagram with the different important storage metrics returned by dbStats:
NOTE: With the MMAPv1 storage engine, MongoDB pre-allocates extra space on the disk to documents so efficient in-place updates are possible since documents have room to grow without having to be relocated. This extra space is called padding.
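The ordering described above (dataSize, then storageSize, then fileSize) can be turned into a quick breakdown. This sketch assumes MMAPv1-style output from db.stats(); the sample numbers are invented and mmapv1Breakdown is an illustrative helper, not part of the shell.

```javascript
// `stats` stands in for db.stats() output on MMAPv1.
function mmapv1Breakdown(stats) {
  return {
    // Space inside the data extents not currently holding documents or padding.
    freeInExtents: stats.storageSize - stats.dataSize,
    // Space in the data files outside the data extents
    // (indexes plus preallocated space not yet used).
    outsideDataExtents: stats.fileSize - stats.storageSize,
    // The ordering the answer describes for MMAPv1.
    ordered:
      stats.dataSize <= stats.storageSize &&
      stats.storageSize <= stats.fileSize,
  };
}

// Hypothetical values in MB, e.g. from db.stats(1024 * 1024).
const mmapSample = { dataSize: 40, storageSize: 55, indexSize: 8, fileSize: 192 };
console.log(mmapv1Breakdown(mmapSample));
```

If ordered comes back false, the database is probably not on MMAPv1 and the descriptions in this answer do not apply as written.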
I realize this is an older question, but I figured I'd link to the official docs for db.stats() for anyone else looking for similar info (as I was).
Database Statistics Reference :: Fields