Can you turn off MongoDB caching?

Can you turn off MongoDB caching? We are using Redis to store active data, and we plan to have Redis hold this data in a normalized form while it is in use. One of the sources that feeds data into Redis is a MongoDB instance.
Since we are using Redis to keep the data in memory, can we turn off MongoDB's caching features?
Thanks!

MongoDB uses memory-mapped I/O, which means that the OS caches the data, not the database, and this caching cannot be turned off.
The OS typically uses a least-recently-used (LRU) algorithm to drop cached pages when memory is needed, so if you don't request old data from MongoDB, those pages will be freed, which means it won't really interfere with Redis.
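To illustrate the eviction policy itself (a minimal sketch in Python; the kernel's real implementation is far more involved), an LRU cache boils down to this:

    from collections import OrderedDict

    class LRUCache:
        # Minimal sketch of least-recently-used eviction, the same idea
        # the OS applies to cached pages of MongoDB's memory-mapped files.
        def __init__(self, capacity):
            self.capacity = capacity
            self.pages = OrderedDict()

        def access(self, page_id, data):
            if page_id in self.pages:
                self.pages.move_to_end(page_id)  # mark as most recently used
            self.pages[page_id] = data
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)   # evict least recently used

Pages that MongoDB stops touching drift to the least-recently-used end and are reclaimed first, which is why an idle MongoDB data set leaves memory free for Redis.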

Related

MongoDB In-Memory Storage engine

We are currently using the WiredTiger storage engine with MongoDB 3.2.
We have tweaked a server (196 GB RAM, journaling disabled) in order to use MongoDB as a cache server (no replication, write concern = 0 for fire-and-forget writes).
I'd like to know whether it would be worthwhile for us to switch to the In-Memory storage engine, given that our data already fits in memory. Are there other benefits?
Thank you
As you have already disabled journaling and the data fits in memory, switching to the In-Memory storage engine makes sense, as it provides the performance benefits that come with that engine.
Just make sure writeConcernMajorityJournalDefault is set to false; I'm not sure whether write concern = 0 serves the same purpose during write acknowledgements.
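As a sketch of setting that flag, assuming the pymongo driver and a replica set deployment (writeConcernMajorityJournalDefault is a replica-set configuration field, so it does not apply to a standalone server):

    from pymongo import MongoClient

    # Connect to a replica set member; the URI is a placeholder.
    client = MongoClient("mongodb://localhost:27017")

    # Fetch the current replica set config, flip the flag, and submit
    # the reconfiguration with a bumped version number.
    cfg = client.admin.command("replSetGetConfig")["config"]
    cfg["writeConcernMajorityJournalDefault"] = False
    cfg["version"] += 1
    client.admin.command("replSetReconfig", cfg)

Since the In-Memory engine keeps no journal at all, leaving the flag true would make majority write concerns error out.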

Performance of MongoDB using GlusterFS

We have several disk arrays that are shared in a distributed file system across multiple servers using GlusterFS. It works really well.
The problem is, we have no available storage that is not already allocated to the distributed file system. As a result, I have stored our MongoDB data within the distributed file system.
For now, I have no performance benchmarks, since this is the only available option for my setup. However, I've been thinking of dedicating a disk array and a server to Mongo alone, with the disk array plugged directly into the server.
Does anyone know why you should, or should not, store Mongo data on top of a distributed file system? I know Mongo has its own sharding solution for precisely this reason, so I'm thinking it's not ideal. If Mongo thinks multiple blocks of data are in the same location when they are actually on different storage media, can this cause a performance issue?

Which NoSQL database is best for high-volume inserts/writes?

Which NoSQL system is better equipped for handling high-volume inserts out of the box?
Preferably running on one physical machine (multiple instances allowed).
Has anyone done any benchmarks? (Googling did not help.)
Note: I understand that choosing a NoSQL database depends on what kind of data needs to be stored (document: MongoDB, graph: Neo4j, etc.).
If you want fast write speed, you can insert your data into memory and flush it to disk in the background every minute or so. That should be the fastest approach.
MongoDB and Redis actually do this. For example, in MongoDB you can run without the journal enabled and writes will be very fast. But keep in mind that if you store data in memory on a single server, there is a possibility of losing data that has not yet been flushed to disk when the server goes down.
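To illustrate the fire-and-forget style with MongoDB, here is a minimal pymongo sketch (the database and collection names are made up):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")

    # w=0: the driver does not wait for any acknowledgement from the
    # server. Fastest, but a write is lost silently if the server dies.
    events = client.mydb.get_collection(
        "events", write_concern=WriteConcern(w=0))

    events.insert_one({"type": "click", "user": "u42"})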
In general, which database to use depends heavily on the data you want to store and the task you are trying to solve.
Apache Cassandra is great at write operations, thanks to its unique persistence model. Some claim that it writes about 20 times faster than it reads, but I believe that really depends on your usage profile.
Read about it in their FAQ and in various blog posts.
That is, of course, if you have a "classical" DB profile with large amounts of data. If your data is small, or is used temporarily and/or as a cache layer, then of course opt for Redis, which has the fastest throughput for both reads and writes, since it is memory-based (with eventual disk persistence).
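For example, assuming the redis-py client, a cache-style write with an expiry is a one-liner:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Store a value that expires after 60 seconds; reads are plain GETs.
    r.set("session:u42", '{"cart": 3}', ex=60)
    print(r.get("session:u42"))  # b'{"cart": 3}' until the key expires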
If you're dealing with a complex object model for inserts, your best option is an object database like Versant's:
http://www.versant.com/vision/The_Magic_Cube.aspx
According to my benchmarks, Cassandra is better than MongoDB on large arrays, but MongoDB is more flexible.

Is memcached just instantiating another virtual operating system?

I have read a few tutorials on memcached, and I have a few questions about using it to ease the load of requests on the default database.
What is being instantiated to allow memcached to operate?
Is it virtual operating systems with, say, MySQL installed, or is the database in its entirety being stored in RAM?
My other question: say I have a blog that uses memcache, and a user's browser request first checks memcache for the data, finds that it exists, and the data is displayed to the user.
What if the data being requested doesn't match what is in the original database because I updated it myself? How will the cache know that I changed it?
Is it always checking whether the data in the DB is the same as what is cached?
From the memcached front-page:
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
Although memcached is frequently used with MySQL, it has no particular ties to MySQL or any other database. It is just a simple key-value store providing constant-time (O(1)) access to data cached by key. The data is stored in memory by the memcached process. (Much of this is explained in the FAQ.)
Regarding your second question, it is really your application's responsibility to ensure that memcached is notified of any changes. You can do this via reasonable expiration periods on your cached data, or by using a script or the command-line interface to manually purge stale entries. Some frameworks will notify memcached of changes, provided the change is made through the framework. Ultimately, if you need to ensure that users always have access to the latest data in real time, then caching is not a good solution for your problem. Caching works on the principle that it is OK to occasionally serve stale data: construct your application so that it caches data that can be stale, but always uses look-ups to authoritative sources for data that must be fresh.
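A minimal cache-aside sketch, assuming the pymemcache client and a hypothetical load_post_from_db() helper, shows both approaches, expiration and explicit purging on update:

    import json
    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))

    def load_post_from_db(post_id):
        # Hypothetical stand-in for the real database query.
        return {"id": post_id, "title": "Hello"}

    def get_post(post_id):
        key = "post:%d" % post_id
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)           # cache hit: skip the DB
        post = load_post_from_db(post_id)       # cache miss: query the DB
        cache.set(key, json.dumps(post), expire=300)  # stale after 5 min
        return post

    def update_post(post_id, fields):
        # ... write the change to the database here ...
        cache.delete("post:%d" % post_id)  # purge so the next read is fresh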
1.
You start a memcached server on every machine you need, assigning it an amount of memory dedicated to memcached.
Then, through a memcached client library, your application uses the combined memory of all those servers.
NB: there is no way to know on which server a particular object will be stored.
2.
The staleness mechanism is simple: you can set a timeout on each object, and when the timeout elapses the system deletes that object.
To store an object, you assign it a key, typically a hash, because you don't want two objects to end up with the same key.
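As a sketch of that multi-server setup, assuming pymemcache's HashClient (the server addresses are placeholders): the client hashes each key to pick a server, so your code never knows where an object lands:

    from pymemcache.client.hash import HashClient

    # Two memcached servers pooled together; each key is hashed to
    # exactly one of them.
    client = HashClient([("10.0.0.1", 11211), ("10.0.0.2", 11211)])

    client.set("user:42", "alice", expire=120)  # deleted after 2 minutes
    print(client.get("user:42"))                # b'alice' until then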

Is Memcache recommended when using MongoDB?

I would like to know whether Memcache is recommended when using a NoSQL database like MongoDB.
The concept of using memcache stems from the idea that you have "extra RAM" sitting around somewhere. Both MongoDB and MySQL (and most DBs) will take every meg of RAM that they can get.
In the case of the very common MySQL / Memcache, it is very well documented that using Memcache is more about reducing query load on the server than it is about speeding up queries. A good memcache implementation basically just tries to keep the most common data in memory so that the database server can churn away on bigger stuff.
In fact, it's been my experience that use of memcache generally becomes a reliance on memcache to maintain system performance.
So back to the original question, where do you have extra RAM?
If you have extra RAM on web servers, you may be able to use Memcache. Of course, you could also run Mongo locally on the web server. Just slave the data you need from the master.
If you have extra RAM on other computers, then there's not really a point in using memcache. Just add more nodes to your MongoDB replica set or shard. This is where MongoDB actually shines: because of sharding and replication, you can add more RAM to Mongo horizontally to increase performance. With SQL it's very difficult to "just add more servers" because joins don't scale very well, but with Mongo it's quite possible to simply add more nodes to a problem.
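For instance, assuming a sharded cluster and pymongo connected to a mongos router (the hostnames are placeholders), adding capacity is one admin command:

    from pymongo import MongoClient

    # Connect to the mongos router, not to an individual shard.
    mongos = MongoClient("mongodb://mongos.example.net:27017")

    # Register a new shard (here a replica set named rs2); the balancer
    # then migrates chunks onto the new node's RAM and disks.
    mongos.admin.command("addShard", "rs2/shard2.example.net:27017")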
MongoDB keeps everything in memory anyway and works in a similar vein, being a key-value-based system; however, I believe MongoDB is more flexible, as it allows storing BSON objects within one another.
(Just for clarification: MongoDB uses BSON, a binary serialization of JSON-like documents, for storing all its data, which includes objects within objects.)
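For example, with pymongo (the database and collection names are illustrative), a nested document round-trips as-is:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    posts = client.blog.posts

    # BSON stores the embedded "author" object and the array natively.
    posts.insert_one({
        "title": "Hello",
        "author": {"name": "Alice", "karma": 10},
        "tags": ["mongodb", "bson"],
    })
    print(posts.find_one({"author.name": "Alice"}))  # dotted-path query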
At first, no. If you run into performance problems later, add a caching layer (memcache). But you won't gain anything if you're going to use Redis, for example, as Redis already stores everything in memory.
The answer would depend on your use cases.
In general, accessing RAM is orders of magnitude faster than accessing disk.
Even the fastest SSD drives are about 100 times slower to access than RAM.
Now, I don't know whether Mongo has a caching system in place (most likely it does) or what its eviction policy is, but as a programmer I would prefer a cache where I can store, retrieve, and delete items at will. Therefore I would prefer using a caching solution even with Mongo.
In summary, it really depends on what you are using these solutions for. There is no single answer that covers all possible uses.