MongoDB - Database vs Collection memory-wise

Let's say, for example, I need 5 collections, each about 10 GB.
What is the difference in performance, with emphasis on memory usage, between assigning each of these collections to its own database, versus having all of them in the same database?
Also, in this scenario, what's the difference between the MMAPv1 storage engine and WiredTiger?

In MongoDB, a database is just a namespace; all data is stored in collections. This is true for both MMAPv1 (deprecated in 4.0 and no longer available in the upcoming 4.2) and the WiredTiger storage engine.
Because of this, separating your data into multiple databases makes no difference storage-wise, unless you specify the directoryPerDB setting, which simply puts the collections belonging to each database in their own folder.
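For example, starting mongod with that flag enabled (the dbpath here is just an illustration):

mongod --dbpath /data/db --directoryperdb

With this option, the data files of each database end up in their own subdirectory under the dbpath, named after the database.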

Related

What is the exact limit on the number of collections in a MongoDB database with the WiredTiger engine (MongoDB 4.0) as of today (May 2019)?

I'm trying to put a forum-like structure into a MongoDB 4.0 database: multiple threads under the same "topic", with each thread consisting of a bunch of posts, so usually there is no limit on the number of threads and posts. I want to fully utilize the benefits of NoSQL and grab the list of posts under any specified thread in one go, without having to scan for matching "thread_id" and "post_id" values in an RDBMS table the traditional way. So my idea is to make every thread a collection in one database, using the code-generated thread_id as the collection name, and to store all the posts of a thread as ordinary documents in that collection, so that the way to access a post looks like:
forum_db [database name].thread_id [collection name].post_id [document ID]
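In mongo shell terms the idea would look roughly like this (all names here are illustrative):

var threadId = "thread_123456";  // code-generated collection name
db.getSiblingDB("forum_db")[threadId].insertOne({post_id: 1, body: "hello"});  // add a post
db.getSiblingDB("forum_db")[threadId].find();  // all posts of one thread, no join needed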
But my concern is, despite the vague phrasing at https://docs.mongodb.com/manual/reference/limits/#data,
Number of Collections in a Database
Changed in version 3.0.
For the MMAPv1 storage engine, the maximum number of collections in a database is a function of the size of the namespace file and the number of indexes of collections in the database.
The WiredTiger storage engine is not subject to this limitation.
Is it safe to do it this way in terms of performance and scalability? Can we safely assume that there is no limit on the number of collections in a WiredTiger database (MongoDB 4.0+) today, just as there is practically no limit on the number of documents in a collection? Many thanks in advance.
To calculate how many collections you can store in a MongoDB database, you also need to account for the number of indexes in each collection.
The WiredTiger engine keeps an open file handle for each collection it uses (and for each of its indexes). A large number of open file handles can cause extremely long checkpoint operations.
Furthermore, each handle takes roughly 22 KB of memory outside the WiredTiger cache; this means that just to keep the files open, the mongod process needs about NUM_OF_FILE_HANDLES * 22 KB of RAM.
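As a back-of-the-envelope sketch in the mongo shell (the collection and index counts here are assumptions):

// 5,000 collections with 2 indexes each: one file per collection plus one per index
var handles = 5000 * (1 + 2);     // 15,000 open file handles
var estMB = handles * 22 / 1024;  // ~22 KB per handle
print(estMB.toFixed(0) + " MB of RAM outside the WT cache");  // ≈ 322 MB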
If that memory pressure pushes the machine into heavy swapping, performance will drop.
As you can probably tell from the above, different hardware (RAM size and disk speed) will behave differently.
In my view, you first need to understand the behavior of your application, and then calculate the hardware required for your MongoDB database server.

In MongoDB, does a lock apply to a collection, a database, or a server?

In a MongoDB server, there may be multiple databases, and each database can have multiple collections, and a collection can have multiple documents.
Does a lock apply to a collection, a database, or a server?
I ask because, when designing a MongoDB database, I want to decide what goes in a database and what goes in a collection. My data can be partitioned into different parts, and I would like to be able to move one part from a MongoDB server to a filesystem without being hindered by a lock held on another part, so I want to store the parts in such a way that different parts are covered by different locks.
Thanks.
From the official documentation: https://docs.mongodb.com/manual/faq/concurrency/
Basically, locking happens at the global, database, or collection level.
But with some storage engines it can happen at the document level too, for instance with WiredTiger (only in MongoDB 3.0+).
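You can observe this yourself on a running server; a minimal sketch in the mongo shell:

printjson(db.serverStatus().locks);  // counters per lock level: Global, Database, Collection, ...
printjson(db.currentOp());           // in-progress operations, including the locks each one holds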

How to choose from MMAPV1, WiredTiger or In-Memory StorageEngine for MongoDB?

In the MongoDB 3.2 documentation I saw that three storage engines are supported,
MMAPv1, WiredTiger, and In-Memory, and it is very confusing which one to choose.
From the descriptions I get the sense that WiredTiger is better than MMAPv1, but other sources say that MMAPv1 is better for heavy reads and WiredTiger for heavy writes...
Are there any constraints that dictate choosing one over the other?
Can someone suggest some best practices, for example:
"when I have this type of application, this engine is usually best; otherwise choose another"?
This is from personal experience; however, please have a look at this blog entry, which explains the different engine types very well:
Mongo Blog v3
Comparing the MongoDB WiredTiger and MMAPv1 storage engines. Higher Performance & Efficiency: Between 7x and 10x Greater Write Performance.
MongoDB 3.0 provides more granular document-level concurrency control, delivering between 7x and 10x greater throughput for most write-intensive applications, while maintaining predictable low latency.
For me the choice was very simple: I needed document-level locks, which makes WiredTiger the ideal choice, and we don't have the Enterprise version of Mongo, hence the in-memory engine is not available. MMAPv1's B-tree is a very basic technique for mapping memory to the hard drive and is not very efficient.
The MMAP storage engine uses a process called "record allocation" to grab disk space for document storage. All records are contiguously located on disk, and when a document becomes larger than its allocated record, a new record must be allocated. New allocations require moving the document and updating all indexes that refer to it, which takes more time than in-place updates and leads to storage fragmentation. Furthermore, MMAPv1 in its current iterations usually leads to high space utilization on your filesystem due to over-allocation of record space and its lack of support for compression.
As mentioned previously, a storage engine’s locking scheme is one of the most important factors in overall database performance. MMAPv1 has collection-level locking – meaning only one insert, update or delete operation can use a collection at a time. This type of locking scheme creates a very common scenario in concurrent workloads, where update/delete/insert operations are always waiting for the operation(s) in front of them to complete. Furthermore, oftentimes those operations are flowing in more quickly than they can be completed in serial fashion by the storage engine. To put it in context, imagine a giant supermarket on Sunday afternoon that only has one checkout line open: plenty of customers, but low throughput!
Everyone has different requirements, but for most cases WiredTiger is the ideal choice: the fact that it performs atomic operations at the document level rather than the collection level is a great advantage; you simply can't beat that.
More reads and not a lot of writes
If reading is your main concern, here is one way to address it.
You can tweak the Mongo driver's read preference modes in the following way:
Set up a replica set, say 1 primary and 3 secondaries.
Set the write concern to majority; this makes writes a bit slower (a trade-off).
Set the read preference to secondary.
This setup performs very well when you have a lot of reads; the trade-off is that writes become slower, but read throughput will be great.
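A minimal mongo shell sketch of those settings (assuming the replica set described above exists):

db.getMongo().setReadPref("secondary");                      // route this connection's reads to secondaries
db.test.insertOne({x: 1}, {writeConcern: {w: "majority"}});  // slower but safer write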
I hope this helps; if you have additional questions, add them as a comment and I will try to address them in this answer.
Also, you can check this MMAPv1 vs WiredTiger review and notice how the author changed his mind from MMAPv1 to WiredTiger. The selling point is document-level locking; that performance you just can't beat.
For new projects, I use WiredTiger now. Since a migration from a compressed to an uncompressed WiredTiger storage is rather easy, I tend to start with compression to enhance the CPU utilization ("get more bang for the buck"). Should the compression have a noticeable impact on performance or UX, I migrate to uncompressed WiredTiger.
MongoDB database profiler
The best way to determine your database needs is to set up a test cluster and run the application on it with the MongoDB profiler.
Like most database profilers, the MongoDB profiler can be configured to record only the queries that took longer than a given threshold. Once you know the slow queries, you can figure out whether they are reads or writes and whether you are CPU- or RAM-bound, and go from there.
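For example, a minimal sketch in the mongo shell (the 100 ms threshold is an arbitrary assumption):

db.setProfilingLevel(1, 100);  // profile operations slower than 100 ms
db.system.profile.find().sort({millis: -1}).limit(5).pretty();  // the five slowest operations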
You should use a replica set that combines the in-memory and WiredTiger storage engines, and shard your MongoDB in such a way that the most frequently accessed data is served by the in-memory storage engine while the rest uses the WiredTiger storage engine.
After acquiring WiredTiger in 2014, MongoDB introduced it as the default storage engine from version 3.2 and has encouraged users to adopt it because of the following advantages over MMAPv1:
WiredTiger uses document-level concurrency whereas MMAPv1 uses collection-level locking. That means multiple clients can write to a collection simultaneously with WiredTiger but not with MMAPv1.
Since WiredTiger manages its own memory, it can use compression, whereas MMAPv1 doesn't have any such feature.
WiredTiger doesn't do any in-place updates, so it eventually reclaims space that is no longer used.
The only advantages of MMAPv1 over WiredTiger that I have found so far are:
WiredTiger is not available on the Solaris platform, whereas MMAPv1 is.
Even when updating only a single element of a big document, WiredTiger re-writes the whole document, making such updates slower.
So you can always leave MMAPv1 out when choosing your storage engine. Now let's come to the in-memory storage engine. Starting in MongoDB Enterprise version 3.2.6, the in-memory storage engine is part of general availability (GA) in the 64-bit builds.
It has the following advantages over the other storage engines:
Like WiredTiger, the in-memory storage engine allows document-level concurrency.
The in-memory storage engine is a lot faster than the others.
By avoiding disk I/O, the in-memory storage engine allows for more predictable latency of database operations.
But this storage engine has quite a few disadvantages as well:
The in-memory storage engine does not persist data after process shutdown.
If your dataset is way too large, then the in-memory engine is not a good option.
The in-memory storage engine requires that all its data (including the oplog if mongod is part of a replica set, etc.) fit into the size specified by the --inMemorySizeGB command-line option or the storage.inMemory.engineConfig.inMemorySizeGB setting.
Check the MongoDB Manual for example Deployment Architectures using in-memory storage engine.
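For illustration, a hypothetical startup of such an in-memory node (Enterprise build; the path and size are assumptions):

mongod --storageEngine inMemory --dbpath /data/inmem --inMemorySizeGB 4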

MongoDB - Is there any advantage in moving frequently updated fields to different collections?

I'm new to MongoDB and document-oriented databases, and while migrating a relational database to this whole new storage concept a question arose:
In relational databases it's usually a good idea to create a separate table for frequently updated fields (say you have a users table and a last_activity one) so that the slow write operations don't lock the other tables.
Is there any advantage to doing the same in MongoDB, given that read operations seem very performant and doing two queries wouldn't be much of a problem?
Thank you all in advance.
Starting with MongoDB 3.2, WiredTiger is already the default storage engine, and with this engine it is not necessary to create additional collections.
Just don't forget to create an index on the field you select the documents by:
// createIndex replaces the deprecated ensureIndex
db.test.createIndex({name: 1});
// the update locates the document via the index and sets the new field in place
db.test.update({"name":"Alex"}, {$set:{"last_name":"alexeev"}});
If you use the MMAPv1 storage engine (the default before 3.2), then you have collection-level concurrency and it may be beneficial to create new collections for frequently updated fields.
However, the WiredTiger storage engine has document-level concurrency and there is no need to create additional tables.
https://docs.mongodb.org/v3.0/core/wiredtiger/

Creating a different database for each collection in MongoDB 2.2

MongoDB 2.2 has a write lock per database, as opposed to the global per-server write lock of previous versions. So would it be OK if I stored each collection in a separate database, to effectively get a write lock per collection? (This would make it look like MyISAM's table-level locking.) Is this approach faulty?
There's a key limitation to the locking, and that is the local database. That database includes the oplog collection, which is used for replication.
If you're running in production, you should be running with Replica Sets. If you're running with Replica Sets, you need to be aware of the write lock effect on that database.
Breaking out your 10 collections into 10 DBs is useless if they all block waiting for the oplog.
Before taking a large step to re-write, please ensure that the oplog will not cause issues.
Also, be aware that MongoDB implements DB-level security. If you're using any security features, you are now creating more DBs to secure.
Yes, that will work; 10gen actually offers this as an option in their talks on locking.
I wouldn't isolate every collection, though. Most databases seem to have 2-5 high-activity collections. For the sake of simplicity it's probably better to keep the low-activity collections grouped in one DB and put the high-activity collections in their own databases.
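A small shell sketch of that grouping (database and collection names are made up):

// high-activity collection isolated in its own database, with its own write lock
db.getSiblingDB("orders_db").orders.insert({item: "abc", qty: 1});
// low-activity collections grouped together in one database
db.getSiblingDB("misc_db").settings.insert({theme: "dark"});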