MongoDB data files become smaller after migration - mongodb

On my first server I get:
root#prod ~ # du -hs /var/lib/mongodb/
909G /var/lib/mongodb/
After migrating this database with mongodump/mongorestore,
on my second server I get:
root#prod ~ # du -hs /var/lib/mongodb/
30G /var/lib/mongodb/
After waiting a few hours for mongo to finish indexing, I got:
root#prod ~ # du -hs /var/lib/mongodb/
54G /var/lib/mongodb/
I tested the database and there is no corrupted or missing data.
Why is there such a big difference in size before and after migration?

MongoDB does not return disk space to the operating system when the actual data size drops, whether through deletions or other causes. There's a decent explanation in the online docs:
Why are the files in my data directory larger than the data in my database?
The data files in your data directory, which is the /data/db directory
in default configurations, might be larger than the data set inserted
into the database. Consider the following possible causes:
Preallocated data files.
In the data directory, MongoDB preallocates data files to a particular
size, in part to prevent file system fragmentation. MongoDB names the
first data file <databasename>.0, the next <databasename>.1, etc. The
first file mongod allocates is 64 megabytes, the next 128 megabytes,
and so on, up to 2 gigabytes, at which point all subsequent files are
2 gigabytes. The data files include files with allocated space but
that hold no data. mongod may allocate a 1 gigabyte data file that may
be 90% empty. For most larger databases, unused allocated space is
small compared to the database.
On Unix-like systems, mongod preallocates an additional data file and
initializes the disk space to 0. Preallocating data files in the
background prevents significant delays when a new database file is
next allocated.
You can disable preallocation by setting preallocDataFiles to false.
However do not disable preallocDataFiles for production environments:
only use preallocDataFiles for testing and with small data sets where
you frequently drop databases.
On Linux systems you can use hdparm to get an idea of how costly
allocation might be:
time hdparm --fallocate $((1024*1024)) testfile
The oplog.
If this mongod is a member of a replica set, the data directory
includes the oplog.rs file, which is a preallocated capped collection
in the local database. The default allocation is approximately 5% of
disk space on 64-bit installations, see Oplog Sizing for more
information. In most cases, you should not need to resize the oplog.
However, if you do, see Change the Size of the Oplog.
The journal.
The data directory contains the journal files, which store write
operations on disk prior to MongoDB applying them to databases. See
Journaling Mechanics.
Empty records.
MongoDB maintains lists of empty records in data files when deleting
documents and collections. MongoDB can reuse this space, but will
never return this space to the operating system.
To de-fragment allocated storage, use compact, which de-fragments
allocated space. By de-fragmenting storage, MongoDB can effectively
use the allocated space. compact requires up to 2 gigabytes of extra
disk space to run. Do not use compact if you are critically low on
disk space.
Important
compact only removes fragmentation from MongoDB data files and does
not return any disk space to the operating system.
To reclaim deleted space, use repairDatabase, which rebuilds the
database which de-fragments the storage and may release space to the
operating system. repairDatabase requires up to 2 gigabytes of extra
disk space to run. Do not use repairDatabase if you are critically low
on disk space.
http://docs.mongodb.org/manual/faq/storage/
What they don't tell you are the two other ways to recover disk space: mongodump/mongorestore, as you did, or adding a new member with an empty disk to the replica set so that it writes its database files from scratch.
If you are interested in monitoring this, the db.stats() command returns a wealth of data on data, index, storage and file sizes:
http://docs.mongodb.org/manual/reference/command/dbStats/

Over time the MongoDB files develop fragmentation. When you do a "migration", or wipe the data directory and force a re-sync, the files pack down. If your application does a lot of deletes, or updates that grow the documents, fragmentation develops fairly quickly. In our deployment it is updates that grow the documents that cause this: MongoDB moves a document when the updated version no longer fits in the space of the original. There is a way to add a padding factor to the collection to reduce this.

Related

Stop MongoDB from creating backups

Can anyone tell me how to stop MongoDB from creating backup copies?
If my DB name is "Database",
it creates backups like
DataBase1
Database2
Database3
.
.
.
DataBase.ns
I want to use only the working copy.
MongoDB allocates data files like this:
First, it creates a namespace file (mydb.ns) and a 64MB data file (mydb.0). If more space is required, it adds a 128MB file (mydb.1) and continues like this, doubling the file size every time until the files are 2GB each (mydb.5 and following).
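The doubling pattern can be sketched in a few lines of plain JavaScript (runnable in Node). The function name dataFileSizesMB is made up for illustration; the 64MB starting size and 2GB cap are the figures given above.

```javascript
// Sizes (in MB) of the first `count` data files of one database,
// following the pattern described above: 64, 128, ... capped at 2048.
function dataFileSizesMB(count) {
  const sizes = [];
  let size = 64; // mydb.0
  for (let i = 0; i < count; i++) {
    sizes.push(size);
    size = Math.min(size * 2, 2048); // files never exceed 2GB
  }
  return sizes;
}

// mydb.0 through mydb.6: 64, 128, 256, 512, 1024, 2048, 2048
console.log(dataFileSizesMB(7));
```

The sixth file (mydb.5) is the first one to reach 2GB, which matches the description.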
This is a somewhat aggressive allocation pattern. If you perform a lot of in-place updates and deletes, your datafiles can fragment severely. Running the repair database command via db.runCommand({repairDatabase:1}) can help, but it requires even more disk space while it runs and it stalls writes to the DB. Make sure to carefully read the documentation first.
Before you do that, run db.stats(), then compare dataSize (the amount of data you actually stored), storageSize (the allocated size including padding, but w/o indexes), and fileSize (the disk space allocated). If the differences are huge (factors of > 3), repair will probably reclaim quite a bit of disk space. If not, it can't help you because it can't magically shrink your data.
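As a rough sketch of that comparison, the helper below takes an object shaped like the output of db.stats() and reports the ratios. The function name repairLikelyToHelp, the choice to compare fileSize against dataSize, and the sample numbers are assumptions for illustration; the factor of 3 is the rule of thumb mentioned above, not a documented threshold.

```javascript
// Compare the sizes reported by db.stats() to judge whether a repair
// is likely to reclaim much disk space. `stats` only needs the fields
// dataSize, storageSize and fileSize (all in bytes).
function repairLikelyToHelp(stats, factor = 3) {
  const fileToData = stats.fileSize / stats.dataSize;
  const storageToData = stats.storageSize / stats.dataSize;
  return {
    fileToData,
    storageToData,
    worthTrying: fileToData > factor, // rule of thumb, not a guarantee
  };
}

// Hypothetical numbers: 20GB of data in 100GB of files.
const sample = { dataSize: 20e9, storageSize: 30e9, fileSize: 100e9 };
console.log(repairLikelyToHelp(sample));
```

In the mongo shell the same function could be called as repairLikelyToHelp(db.stats()), assuming the storage engine reports fileSize.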

mongodb Excessive Disk Space

My MongoDB data takes 114G, which is 85% of my disk.
Trying to free some space using db.repairDatabase() fails because I don't have enough free space.
I know that my data shouldn't take so much space: I used to have a big collection that took 90% of the disk.
I then dropped this collection and re-inserted only 20% of its data.
How can I free some space?
The disk space for data storage is preallocated by MongoDB and can only be reclaimed by rebuilding the database with new preallocated files. Typically this is done through a db.repairDatabase() or a backup & restore. As you've noted, a repair requires enough space to create a new copy of the database so may not be an option.
Here are a few possible solutions, but all involve having some free space available elsewhere:
If there is enough free space to mongodump that database, you could mongodump, drop, and mongorestore it. db.dropDatabase() will remove the data files on disk.
If you are using a volume manager such as LVM or ZFS, you could add extra disk space to the logical volume in order to repair or dump & restore the database.
If you have another server, you could set up a replica set to sync the data without taking down your current server (which would be the "primary" in the replica set). Once the data is synced to a secondary server, you could then step down the original primary and resync the database.
Note that for a replica set you need a minimum of three nodes: either three data nodes, or two data nodes plus an arbiter. In a production environment the arbiter would normally run on a third server so it can allow either of the data nodes to become a primary if the current primary is unavailable. In your reclaiming disk space scenario it would be OK to run the arbiter on one of the servers instead (presumably the new server).
Replica sets are generally very helpful for administrative purposes, as they allow you to step down a server for maintenance (such as running a database compact or repair) while still having a server available for your application. They have other benefits as well, such as maintaining redundant copies of your data for automatic failover/recovery.

mongo db --smallfiles switch drawbacks

I want to use MongoDB for my new project. The problem is that mongo uses preallocated files:
Each datafile is preallocated to a particular size. (This is done to prevent file system fragmentation, among other reasons.) The first filename for a database is .0, then .1, etc. .0 will be 64MB, .1 128MB, et cetera, up to 2GB. Once the files reach 2GB in size, each successive file is also 2GB. Thus, if the last datafile present is, say, 1GB, that file might be 90% empty if it was recently created.
from here : http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
And it's normal to have many 2GB files with nothing in them. There is a --smallfiles switch to limit these files to 512MB:
--smallfiles => Use a smaller initial file size (16MB) and maximum size (512MB)
I want to know whether using smallfiles is good for production, and what its drawbacks are.
There is a noprealloc switch, but it's not good in production. There is no such note about smallfiles.
You would usually only use smallfiles if you are creating a whole bunch of databases. If you're only operating out of a few databases, it doesn't save you enough to mess with.
We haven't seen any performance problems with it for customers that have many, many DBs (and actually benefit from small files). Their activity level is normally somewhat low compared to other installs, though. Based on what Mongo is doing, it might be slightly slower to do some operations, but I don't think you'll ever notice.
Additionally, if running in the AWS cloud and using the m3.small instances with SSDs, you are limited to 4GB storage. Setting this option will allow you to have a small SSD-backed mongodb node, which could be sufficient for small tasks.
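To put the --smallfiles numbers from the question in perspective, here is a minimal sketch that totals the space taken by the first few data files under each setting. It assumes that files keep doubling in size with --smallfiles just as they do by default (the quoted help text only gives the 16MB initial and 512MB maximum sizes), and the function name totalAllocatedMB is made up for illustration.

```javascript
// Total MB allocated for the first `count` data files, given an initial
// size and a cap. Each file is assumed to be double the previous one.
function totalAllocatedMB(count, initialMB, maxMB) {
  let total = 0;
  let size = initialMB;
  for (let i = 0; i < count; i++) {
    total += size;
    size = Math.min(size * 2, maxMB);
  }
  return total;
}

const defaultTotal = totalAllocatedMB(3, 64, 2048); // 64 + 128 + 256
const smallTotal = totalAllocatedMB(3, 16, 512);    // 16 + 32 + 64
console.log(defaultTotal, smallTotal); // 448 112
```

Under these assumptions the saving per database is a few hundred MB at most for small databases, which is why it mainly matters when there are many databases or very little disk.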

calculation logic of 0.203125GB MongoDB database?

How is 0.203125GB calculated for the MongoDB database size?
Is it the same for all OSes (32-bit and 64-bit)?
Can we see the current usage of a particular database?
> show dbs
local (empty)
tutorial 0.203125GB
MongoDB Documentation
Each datafile is preallocated to a particular size. (This is done to prevent file system fragmentation, among other reasons.) The first filename for a database is .0, then .1, etc. .0 will be 64MB, .1 128MB, et cetera, up to 2GB. Once the files reach 2GB in size, each successive file is also 2GB.
Thus, if the last datafile present is, say, 1GB, that file might be 90% empty if it was recently created.
Additionally, on Unix, mongod will preallocate an additional datafile in the background and do background initialization of this file. These files are prefilled with zero bytes. This initialization can take up to a minute (less on a fast disk subsystem) for larger datafiles. Pre-filling in the background prevents significant delays when a new database file is next allocated.
On Windows, additional datafiles are not preallocated. NTFS can allocate large files filled with zeroes relatively quickly, rendering preallocation unnecessary.
As soon as a datafile starts to be used, the next one will be preallocated.
You can disable preallocation with the --noprealloc command line parameter. This flag is nice for tests with small datasets where you drop the database after each test. It should not be used on production servers.
For large databases (hundreds of GB or more), this is of no significant consequence as the unallocated space is relatively small.
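As a worked example for the figure in the question: 0.203125GB is 208MB, which matches a 64MB tutorial.0, a preallocated 128MB tutorial.1, and a 16MB tutorial.ns namespace file. The 16MB namespace file size is an assumption here (it is the usual default but is not stated in the quoted documentation), so treat this as a sketch of the arithmetic rather than a guarantee for every configuration.

```javascript
// Reconstructing the 0.203125GB shown by "show dbs" for the tutorial database.
const nsFileMB = 16;           // assumed default size of tutorial.ns
const dataFilesMB = [64, 128]; // tutorial.0 plus the preallocated tutorial.1

const totalMB = nsFileMB + dataFilesMB.reduce((sum, mb) => sum + mb, 0);
const totalGB = totalMB / 1024;

console.log(totalMB, totalGB); // 208 0.203125
```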

Compact command not freeing up space in MongoDB 2.0

I just installed MongoDB 2.0 and tried to run the compact command instead of the repair command used in earlier versions. My database is empty at the moment, meaning there is only one collection with 0 entries and the two system collections (indices, users). Currently the db takes about 4 GB of space on the hard disk. The db is used as a temp queue, with all items being removed after they have been processed.
I tried to run the following in the mongo shell.
use mydb
db.theOnlyCollection.runCommand("compact")
It returns with
ok: 1
But the same space is still taken on the hard disk. I tried to compact the system collections as well, but this did not work.
When I run the normal repair command
db.repairDatabase()
the database is compacted and only takes 400 MB.
Does anyone have an idea why the compact command is not working?
Thanks a lot for your help.
Best
Alex
Collection compaction is not supposed to decrease the size of the data files. Its main point is to defragment collection and index data, combining unused gaps into contiguous space where new data can be stored. Moreover, it may actually increase the size of the data files:
Compaction may increase the total size of your data files by up to 2GB. Even in this case, total collection storage space will decrease.
http://www.mongodb.org/display/DOCS/compact+Command#compactCommand-Effectsofacompaction