Is there a way to determine the journal file size based on a data file size?
For example, I've arrived at a data file size of 10 GB (approximately) based on data + index length considerations and preallocation.
I understand the journal is also preallocated (in 1 GB files). So, for a 10 GB data file, can I assume the journal will also be 10 GB? Or is there another way to calculate it?
The MongoDB journal files are fixed-size 1 GB files (unless you use the smallfiles option). There will be at most three 1 GB journal files, so you will never have more than 3 GB of journal, regardless of the data file size.
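The bound above can be written out as a quick calculation. This is only a sketch: the function name is made up for illustration, and the 128 MB per-file figure for smallfiles is an assumption taken from the commonly cited value for that option.

```python
# Worst-case journal disk usage, assuming at most three journal files.
# The per-file sizes are assumptions for illustration, not measured values.
GIB = 1024 ** 3
MIB = 1024 ** 2

MAX_JOURNAL_FILES = 3


def max_journal_bytes(smallfiles: bool = False) -> int:
    """Return the worst-case journal size in bytes."""
    per_file = 128 * MIB if smallfiles else 1 * GIB
    return MAX_JOURNAL_FILES * per_file


if __name__ == "__main__":
    print(max_journal_bytes() / GIB)                 # 3.0
    print(max_journal_bytes(smallfiles=True) / MIB)  # 384.0
```

Under these assumptions the journal does not scale with a 10 GB data file; it stays at or below 3 GB.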
http://docs.mongodb.org/manual/core/journaling/
Related
I have done a full import with the planet from OSM website and scheduled the updates to run daily on a cronjob.
But I have noticed that the disk usage is growing very fast in size on a daily basis.
When running df -h, I noticed that disk usage grows by about 1 GB every day. I'm not sure whether this command rounds its output, but even so the growth seems very large.
I have a disk with 1TB free, but this would mean that the disk would be full in about 3 years.
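As a rough check of that estimate, here is the arithmetic as a small sketch. It assumes a constant growth of 1 GB per day and treats 1 TB as 1000 GB; the function name is only illustrative.

```python
def days_until_full(free_gb: float, growth_gb_per_day: float) -> float:
    """Days until the disk fills, assuming growth stays constant."""
    if growth_gb_per_day <= 0:
        raise ValueError("growth must be positive")
    return free_gb / growth_gb_per_day


if __name__ == "__main__":
    days = days_until_full(1000, 1)
    print(days, round(days / 365, 1))  # 1000.0 2.7
```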
I inspected the folders under /var/lib/postgresql/<version>/<cluster>, and the folders that contribute to this growth appear to be pg_wal and base/16390.
The folder base/16390 has many files of 1 GB each, and the folder pg_wal has around 40 files of 16 MB each.
I don't know which files are safe to remove, or whether there are settings in postgresql.conf that would prevent this large daily increase.
I also don't know whether this is related to backups or logs that Postgres creates by default, but I would like to reduce those to a minimum as well.
Any help on this would be appreciated.
Thanks in advance.
We have a cluster with 100 GB of storage, per the cluster's configuration in MongoDB Atlas.
The overview page for the cluster shows that 43.3 GB out of a 100 GB maximum are used.
Since the cluster's configuration also has 100 GB of storage selected, I am assuming the 100 GB of disk space is the same as the 100 GB of available storage?
When we click into our database, it shows the database size is 66.64 GB + 3.21 GB indexes, for a total size of about 70GB.
What is the difference between the 100 GB of available storage and disk, and the database size + index size of 70 GB? Should we be concerned that the 70 GB is approaching 100 GB, or is it only the 43.3 GB of disk usage that matters?
Edit: Since I posted this, MongoDB has removed database size and replaced it with both storage size and logical data size, which further complicates this. In most instances, the logical data size is 3-4x the storage size.
Your MongoDB database uses the WiredTiger storage engine with snappy compression by default, which means that your data most probably takes 43.3 GB on disk while the actual (uncompressed) data size is ~70 GB. There is no reason to worry, since you have used only 43.3% of your 100 GB of storage. Of course, you need to monitor your data growth, and if it increases faster you may need to allocate more space.
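To make the relationship between the numbers concrete, here is a small sketch using the figures from the question. The function and key names are made up for illustration, and treating the 43.3 GB as the on-disk footprint of exactly this data and its indexes is an assumption.

```python
def storage_summary(disk_used_gb: float, disk_total_gb: float,
                    data_gb: float, index_gb: float) -> dict:
    """Compare logical (uncompressed) size with what is actually on disk."""
    logical_gb = data_gb + index_gb
    return {
        "logical_gb": round(logical_gb, 2),
        "percent_disk_used": round(100 * disk_used_gb / disk_total_gb, 1),
        "logical_to_disk_ratio": round(logical_gb / disk_used_gb, 2),
    }


if __name__ == "__main__":
    # Figures from the question: 43.3 GB used of 100 GB,
    # 66.64 GB database size plus 3.21 GB of indexes.
    print(storage_summary(43.3, 100, 66.64, 3.21))
```

The percentage of disk used is the figure to watch against the 100 GB limit; the logical size only indicates how much the data would occupy uncompressed.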
I have been working with OrientDB and stored about 120 million records in it. The size on disk was 24 GB. I then deleted all the records by running the following commands in the console:
Delete from E unsafe
Delete from V unsafe
When I checked the DB size on disk, it was still 24 GB. Is there anything extra I need to do to free the disk space?
In OrientDB, when you delete a record the disk space remains allocated. The only way to free it is to export and then re-import the DB.
How can I control the size of large journal files, since the journal files take a large amount of space? How can space be saved using small files?
After doing some research:
Setting the smallfiles option did not appear to control the journal size.
However, the command-line option is --smallfiles.
Here are some ways to control journal file size:
Use the --smallfiles option in mongod, which will use 128 MB as the maximum size of a journal file instead of 1 GB.
ulimit is a way on Unix to control system settings. It allows you to set the maximum file size on the system. Consider this when you have sharding in place.
Reduce commitIntervalMs to a lower value, i.e. flush data to disk more frequently. Use this option only when you can tolerate a heavy periodic I/O load.
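To check whether any of these settings actually changes the space used, you can measure the journal directory before and after. The sketch below assumes the journal lives in a `journal` subdirectory of the dbpath; the function name and the example path are illustrative only.

```python
import os


def journal_size_bytes(dbpath: str) -> int:
    """Sum the sizes of the regular files in <dbpath>/journal."""
    journal_dir = os.path.join(dbpath, "journal")
    total = 0
    for entry in os.scandir(journal_dir):
        if entry.is_file():
            total += entry.stat().st_size
    return total


# Example call (the path is an assumption; use your own dbpath):
# print(journal_size_bytes("/data/db") / 1024 ** 2, "MB")
```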
I am currently using MongoDB to store a single collection of data. This data is 50 GB in size and has about 95 GB of indexes. I am running a machine with 256 GB RAM. I basically want to have all the index and working set in the RAM since the machine is exclusively allocated to mongo.
Currently I see that, although mongo is running with a collection size of 50 GB plus an index size of 95 GB, the total RAM used on the machine is less than 20 GB.
Is there a way to force mongo to leverage the available RAM so that it can keep all its indexes and working set in memory?
When your mongod process starts, it has none of your data in resident memory. Data is then paged in as it is accessed. Given that your collection fits in memory (which is the case here), you can run the touch command on it. On Linux this will call the readahead system call to pull your data and indexes into the filesystem cache, making them available in memory to mongod. On Windows, mongod will read the first byte of each page, pulling it into memory.
If your collection+indexes do not fit into memory, only the tail end of data accessed during the touch will be available.
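A quick way to sanity-check the "fits in memory" condition, and to see the shape of the touch command document, is sketched below. The helper names, the 10% headroom, and the collection name are assumptions for illustration. The touch command is also not available in every MongoDB release, so whether your server accepts it is an assumption you should verify.

```python
GB = 1024 ** 3


def fits_in_ram(data_bytes: int, index_bytes: int, ram_bytes: int,
                headroom: float = 0.1) -> bool:
    """True if data + indexes fit in RAM while leaving `headroom` free."""
    return data_bytes + index_bytes <= ram_bytes * (1 - headroom)


def touch_command(collection: str, data: bool = True, index: bool = True) -> dict:
    """Build the touch command document; the command name must be the first key."""
    return {"touch": collection, "data": data, "index": index}


if __name__ == "__main__":
    # Figures from the question: 50 GB data, 95 GB indexes, 256 GB RAM.
    print(fits_in_ram(50 * GB, 95 * GB, 256 * GB))  # True
    print(touch_command("mycollection"))            # hypothetical collection name
```

With a driver such as pymongo, a document like this would be passed to the database's `command` method against a running server.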