MongoDB In-Memory Storage Engine

We are currently using the WiredTiger storage engine with MongoDB 3.2.
We have tuned a server (196 GB RAM, journaling disabled) in order to use MongoDB as a cache server (no replication, write concern = 0 for fire-and-forget writes).
I'd like to know whether it would be worthwhile for us to switch to the In-Memory storage engine, given that our data already fits in memory. Are there any other benefits?
Thank you

As you have already disabled journaling and the data fits in your memory, switching to the In-Memory storage engine makes sense, as it provides the performance benefits that come with that engine.
Just make sure writeConcernMajorityJournalDefault is set to false; I'm not sure whether write concern = 0 serves the same purpose during write acknowledgements.
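For reference, a minimal mongod configuration sketch for the In-Memory engine (which ships with MongoDB Enterprise) might look like this; the dbPath and the inMemorySizeGB cap are assumptions you would adapt to your 196 GB server:

    storage:
      engine: inMemory
      dbPath: /var/lib/mongodb    # still required for some metadata; assumed path
      inMemory:
        engineConfig:
          inMemorySizeGB: 180     # assumed cap, leaving headroom for the OS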

Related

What is the recommended EC2 instance size for MongoDB

We plan to use MongoDB in production for our website, which will (at least initially) be hosted on a single EC2 instance.
What is the recommendation for a MongoDB deployment that will start with around 25k documents and low traffic? I am not yet familiar with AWS, so I have no comparison with other dedicated hosts.
The "storageSize" of the collection in question will be around 400 MB, and the "totalIndexSize" maybe around 20 MB.
If it's not a production instance, you can start with a t2.medium instance. If it's a production instance, start with m5.large.
Attach a new EBS volume of size 10 GB and configure MongoDB to use this new volume as the data directory. This makes it easy to scale up your storage later.
Make sure you format the EBS volume with the XFS file system before installing MongoDB, which MongoDB recommends for best performance.
Also, if you later want to increase the instance size when your traffic grows, just use the "instance modify" option to get it done.
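A rough sketch of those steps, assuming the new EBS volume shows up as /dev/xvdf (check with lsblk) and that mongod runs as the mongodb user:

    # Format the volume with XFS and mount it as the MongoDB data directory
    sudo mkfs.xfs /dev/xvdf
    sudo mkdir -p /data/mongodb
    sudo mount /dev/xvdf /data/mongodb
    sudo chown -R mongodb:mongodb /data/mongodb

    # Then point mongod at the new volume in /etc/mongod.conf:
    #   storage:
    #     dbPath: /data/mongodb

Remember to add the volume to /etc/fstab as well so the mount survives a reboot.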
I cannot give you a specific answer, but the cloud provides you with the option to test your setup very quickly. Just start with some instance (e.g. m3.medium) and you can create an Amazon Machine Image of your running MongoDB instance anytime and just start it on a larger or smaller instance type.
You can find the instance types here: https://aws.amazon.com/ec2/instance-types/
Deeper thought about the instance type choice can be found here: https://www.mongodb.com/blog/post/maximizing-mongodb-performance-on-aws
If you have any doubt about sizing, err on the side of going larger and then scaling down as needed. Undersizing your instances when deploying MongoDB can, and likely will, create performance problems. In order to achieve the best performance, your MongoDB working set should fit in memory.
The working set is the portion of data and related indexes that your clients access most frequently.
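If you want to sanity-check that against your instance, a small pymongo sketch like the following (database and collection names are placeholders) reads the same "storageSize" and "totalIndexSize" figures mentioned above so you can compare them with the instance's RAM:

    # Python / pymongo sketch: check collection and index sizes
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    stats = client["mydb"].command("collstats", "mycollection")
    print("storageSize MB:", stats["storageSize"] / 1024 / 1024)
    print("totalIndexSize MB:", stats["totalIndexSize"] / 1024 / 1024)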

How to performance test MongoDB Storage Engine for website Session data?

I'm looking to utilize MongoDB for session data storage, so we don't need sticky sessions in our load balanced environment.
As of 3.0, we can use different storage engines within MongoDB.
While MMapV1 and WiredTiger come out of the box, it's also possible to run other storage engines (RocksDB?).
What I would like to do is test out my website using MongoDB with the different storage engines behind it.
I currently have a JMeter script that will hit multiple pages on the site for many different users.
Between tests I can switch out the Mongo connection, to different Mongod instances on different storage engines.
All I can really take out of this is the average latency for the page loads in JMeter.
Are there better results I can get, possibly using different tools or techniques?
Or, for session data, which is heavily read/write, is there one storage engine that would be preferred over another?
I'm not sure if this question is too open-ended or not, but I thought I'd ask here to maybe get more direction about how to test this out.
An important advantage of WiredTiger over the default MMAP storage engine is that while MMAP locks the whole collection for a write, WiredTiger locks only the affected document(s). That means multiple users can change multiple documents at the same time. This is especially interesting in your case of session data, because you will likely have many website visitors at the same time, each one regularly updating their own session document. But when you want to test if this feature really provides a benefit in your use-case, you will have to build a more sophisticated test setup which simulates many simultaneous updates and requests from multiple users.
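As a minimal sketch of such a setup, assuming pymongo and a throwaway database (names, thread counts, and update counts are arbitrary), you could have many threads each upsert their own session document repeatedly, which roughly mimics simultaneous visitors and lets document-level locking show its effect:

    # Python / pymongo sketch: concurrent per-session updates
    import time
    from concurrent.futures import ThreadPoolExecutor
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    sessions = client["sessiontest"]["sessions"]

    def touch_session(session_id, updates=100):
        # Each simulated visitor repeatedly updates only their own document
        for _ in range(updates):
            sessions.update_one(
                {"_id": session_id},
                {"$set": {"last_seen": time.time()}},
                upsert=True,
            )

    start = time.time()
    with ThreadPoolExecutor(max_workers=50) as pool:
        list(pool.map(touch_session, range(500)))  # 500 simulated visitors
    print(f"elapsed: {time.time() - start:.2f}s")

Running the same script against a mongod on each storage engine gives you a directly comparable elapsed time.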
Another interesting feature of WiredTiger is that it compresses both data and indexes, which greatly reduces filesize. But this feature does of course cost performance. So when you only want to compare performance, you should switch off compression to have a fair comparison. The relevant config keys are:
    storage:
      wiredTiger:
        collectionConfig:
          blockCompressor: none
        indexConfig:
          prefixCompression: false
Keep in mind that changes to these keys will only take effect on newly created collections and indexes.
Another factor which could skew your results is cache size. The MMAP engine always uses all the RAM it can get to cache data. But WiredTiger is far more conservative and only uses half of the available RAM, unless you set a different value in
storage.wiredTiger.engineConfig.cacheSizeGB
So if you want a fair comparison, you should set this to the RAM size of the machine it runs on, minus the RAM required by other processes running on the same machine. But this will of course only make a difference when your test uses more data than fits into memory, so that the cache handling of the two engines starts to matter.
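For example, in the config file (the 14 GB figure is an assumption for a 16 GB test machine, leaving headroom for the OS and other processes):

    storage:
      wiredTiger:
        engineConfig:
          cacheSizeGB: 14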

Lock rate in mongo 3.0

I've been playing with mongo 3.0 rc7 and rc8, and I've discovered that mongostat doesn't show the lock rate column, whether I use the MMAPv1 or the WiredTiger engine. Similarly, in MMS the "lock %" chart is unavailable for 3.0 systems.
We've been using the lock rate in our monitoring systems, and also as one of the measurements during performance tests (we've been running the same sets of heavy-load tests via Gatling or Tsung and observing whether recent optimizations in our usage of the DB have a real impact, and also to discover whether new features have regressions in this area).
Is there a way to find this value somewhere in mongo 3? Right now we mainly want to run comparison tests on 2.6.7 and 3.0.0-rc8 to see what the difference is, and while we of course get a nice set of data from the application performance standpoint, we'd also like to compare some DB stats, and the lock rate was one of them. Or are we completely missing the point, and are collection-level locks in v3 MMAPv1 or document-level locks in WiredTiger now pointless to measure or compare? If so, how can we measure what the DB limit is at heavy load (in < 2.6.7 it was fairly easy: the lock rate was usually the first thing to fire, and once it got above 70-80% we knew that was the upper limit), or test regressions/improvements in how we use the DB?
Many thanks
It's not pointless to compare some kind of lock statistics for mmapv1 and WiredTiger, but I think the situation right now is that it's unclear what you should be looking at in WiredTiger for comparison. The operation of the storage engine is very different from mmapv1. Presently, I think you'll want to look at other statistics like throughput, and you can expect more statistics and more guidance on using them in future versions of MongoDB with WiredTiger.
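Until such guidance arrives, one way to compare throughput across versions is to sample the opcounters from serverStatus; here is a small pymongo sketch (the connection string and sampling interval are arbitrary):

    # Python / pymongo sketch: sample serverStatus opcounters over an interval
    import time
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")

    def opcounters():
        return client.admin.command("serverStatus")["opcounters"]

    before = opcounters()
    time.sleep(10)
    after = opcounters()
    for op in ("insert", "query", "update", "delete"):
        print(op, (after[op] - before[op]) / 10.0, "ops/s")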

Can you turn off MongoDB caching?

Can you turn off MongoDB caching? We are using Redis to store active data. We plan to use Redis to hold this data in a normalized way while it is being used. One of the sources that passes data into Redis is a MongoDB instance.
Since we are using Redis to keep the data in memory, can we turn off MongoDB's caching features?
Thanks!
MongoDB uses memory-mapped I/O, which means that the OS caches the data, not the database, and it's not possible to turn this caching off.
The OS typically uses a least-recently-used algorithm to drop cached pages if memory is needed, so if you don't request old data from MongoDB those pages are going to be freed, which means it won't really interfere with Redis.
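If you want to watch how much memory mongod itself is holding, serverStatus reports resident and virtual memory; a quick pymongo sketch (connection string assumed):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    mem = client.admin.command("serverStatus")["mem"]
    print("resident MB:", mem["resident"])
    print("virtual MB:", mem["virtual"])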

Is Memcache recommended when using MongoDB?

I would like to know if Memcache is recommended when using a NoSQL database like MongoDB.
The concept of using memcache stems from the idea that you have "extra RAM" sitting around somewhere. Both MongoDB and MySQL (and most DBs) will take every meg of RAM that they can get.
In the case of the very common MySQL / Memcache, it is very well documented that using Memcache is more about reducing query load on the server than it is about speeding up queries. A good memcache implementation basically just tries to keep the most common data in memory so that the database server can churn away on bigger stuff.
In fact, it's been my experience that use of memcache generally becomes a reliance on memcache to maintain system performance.
So back to the original question, where do you have extra RAM?
If you have extra RAM on web servers, you may be able to use Memcache. Of course, you could also run Mongo locally on the web server. Just slave the data you need from the master.
If you have extra RAM on other computers, then there's not really a point in using memcache. Just add more nodes to your MongoDB replica set or shard. This is where MongoDB actually shines. Because of sharding / replication, you can add more RAM to Mongo horizontally to increase performance. With SQL it's very difficult to "just add more servers" because joins don't scale very well. But with Mongo, it's quite possible to simply "add more nodes" to a problem.
MongoDB stores everything in memory anyway and works in a similar vein, being a key-value based system; however, I believe MongoDB is more flexible, as it allows for storing objects nested within objects.
(Just for clarification, MongoDB uses BSON, a binary form of JSON, for storing all its data, which includes objects within objects.)
At first, no. If you run into performance problems later, add a caching layer (Memcache). But you won't gain anything if you're going to use Redis, for example, as Redis already stores everything in memory.
The answer would depend on your use cases.
In general, accessing RAM is orders of magnitude faster than accessing disk.
Even the fastest SSD drives are about 100 times slower to access than RAM.
Now, I don't know if Mongo has a caching system in place (most likely it does), or what the eviction policy is, but as a programmer I would prefer a cache where I can store, retrieve, and delete items at will. Therefore I would prefer using a caching solution even with Mongo.
In summary, it really depends what you are using these solutions for. There is no one answer to cover all possible uses.