Using MongoDB and Redis together?

We started with Redis, storing active data, logged-in users, etc. We're also using some pub/sub for real-time data passing.
Recently we added Mongo to fit our geospatial needs, and it seems great for non-active data too.
How should these two work together? Is it dumb to use both? Is it dumb to pass chunks of data from Mongo to Redis when they become active?
Our thought was that we might store everything in Mongo, but then pass user data from Mongo to Redis when a user is active and the data is likely to be accessed. I know Mongo does some caching like this on its own. We are new to both of them and just want to know how they should be used together, if at all.
Thanks!!

Is it dumb to use both? Is it dumb to pass chunks of data from Mongo to Redis when they become active?
So I feel like there's actually a legitimate case for testing and validating this question. Redis is basically an "in-memory" DB, so how much better can you do by giving that RAM to Mongo?
Historically, we've used the Memcache/MySQL combo to basically "add RAM" to MySQL and limit the amount of writing it needed to do. We did this simply because it was complicated to shard MySQL.
However, MongoDB provides a sharding mechanism. So you can "add RAM" to a problem (along with "adding disks") simply by adding more shards.
Thanks to the way memory-mapped files work, MongoDB tends to keep recently used data in memory. So if you're pulling recent data into Redis, that data is probably also in memory on the MongoDB side, so it's not clear that you benefit from having it in two places.
Is it dumb ...
That's hard to say without some testing and analysis. MongoDB doesn't really have a pub/sub mechanism, but it does tend to have fast query times, so it may be appropriate in specific spots.

Related

MongoDB Document of Size 300kb taking 8-15s

I am using the MongoDB Atlas free tier hosted on GCP. I have documents with arrays containing 300kb of data. A simple get-by-ID query takes around 8-15 seconds. There are fewer than 50 records in the collection, so indexing is probably not the issue. Also, I have used my own custom ids, not the built-in ObjectIds, in my collection. Is this much query time normal? If yes, what are some ways to address this issue, as I need fast real-time analytics on the frontend? I already have Redis in mind, but is there any better way to address this?
Ensure your operations are not throttled: https://docs.atlas.mongodb.com/reference/free-shared-limitations/
Test performance with a different driver (another language), and verify you are using the most recent driver releases.
Test smaller documents to identify whether time is being expended on the server or over the network (see the sketch below).
Test with the mongo shell.
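For example, here is a minimal timing sketch using the Node.js "mongodb" driver; the URI, collection name, and the "bigArray" field are placeholders. If the second query is much faster, the time is being spent shipping the large array over the network rather than executing the query on the server.

```typescript
import { MongoClient } from "mongodb";

// Custom string _ids, as described in the question.
type Doc = { _id: string; bigArray?: unknown[] };

async function main() {
  // Placeholder Atlas connection string.
  const client = new MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net");
  await client.connect();
  const coll = client.db("test").collection<Doc>("records");

  let start = Date.now();
  await coll.findOne({ _id: "my-custom-id" });
  console.log(`full document: ${Date.now() - start} ms`);

  // Same query, but with the large array field projected out.
  start = Date.now();
  await coll.findOne({ _id: "my-custom-id" }, { projection: { bigArray: 0 } });
  console.log(`without the large field: ${Date.now() - start} ms`);

  await client.close();
}

main().catch(console.error);
```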
As for an answer, I highly recommend not dealing with the M0 Atlas tier, or at least choosing it wisely: don't pick a US-based cluster if you are thousands of miles away from the States. Don't get me wrong, it's a good product, but it depends on your costs.
As for myself, I prefer to use the MongoDB Community Edition and deploy it on my own VPS/VDS. Of course, it doesn't provide as nice a web interface as you have seen in Atlas, and there is no support for the Realm (Stitch) functionality, though you could design that yourself. It also means every performance issue is on you.
As for me, I use MongoDB not for real-time data but for visual snapshots on the frontend, and I have no problems with performance. If I do have problems, I deal with them myself, via indexing, increasing the VPS CPU/RAM, optimizing queries, and so on.
Also, one more thing about your problem: «I have documents which have arrays containing 300kb data».
If you have an array field in your schema, and it stores lots of data, especially embedded docs, are you sure you are using the right schema pattern?
You might want to take a look at the articles at MongoDB University about architecture patterns.
It will probably be much better for you to keep the embedded docs in a separate collection and request them via an aggregation $lookup when they are needed.
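For instance, a hypothetical split where "posts" holds the lean parent documents and "post_items" holds what used to be the 300kb embedded array; all names here are illustrative:

```typescript
import { MongoClient } from "mongodb";

type Post = { _id: string; title: string };

async function main() {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const db = client.db("app");

  // Cheap, frequent read: the parent document alone stays small.
  const post = await db.collection<Post>("posts").findOne({ _id: "post-123" });
  console.log(post);

  // Expensive read, on demand: join the heavy items back in.
  const withItems = await db.collection<Post>("posts").aggregate([
    { $match: { _id: "post-123" } },
    { $lookup: {
        from: "post_items",      // child collection holding the big array entries
        localField: "_id",       // parent key
        foreignField: "postId",  // reference stored on each child document
        as: "items",
    } },
  ]).toArray();
  console.log(withItems[0]);

  await client.close();
}

main().catch(console.error);
```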

Should I use different databases or just different collections in MongoDB to store user information and the rest of the data?

I am pretty new to MongoDB. I am creating an application where I will have users and a lot of other data. I have already created a database where I am storing user information using MongoDB. Now I have to create a new database or collection to store the rest of the data. What are the pros and cons of creating a different database versus a different collection?
I use MongoDB in a very similar way and have already thought a lot about dividing my database. Here are some of the things we considered:
Using two databases is harder to maintain: your application has to know which database to update, and it can increase costs (even more if you intend to monitor the databases or host them on different infrastructure).
Mongo 2 used to lock the entire database when updating, so back then I think it would have been better to separate them, but Mongo 3 with WiredTiger locks only the document, so you won't have the problems we used to have in the past.
One good thing about splitting the database in two is that even if your data blows up one database, the other will still work.
IMHO, if you use a decent machine to store your databases and monitor them the right way, you won't have any trouble keeping just one until your system is huge, with millions of active users. You can also use replica sets and sharding to increase efficiency.
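For what it's worth, one database with a collection per concern is also the simplest thing to express in code. A minimal sketch with the Node.js "mongodb" driver (all names are illustrative):

```typescript
import { MongoClient } from "mongodb";

async function main() {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();

  // One database; users and the rest of the data live in separate collections.
  const db = client.db("app");
  const users = db.collection("users");
  const orders = db.collection("orders");

  await users.insertOne({ name: "alice", createdAt: new Date() });
  await orders.insertOne({ user: "alice", total: 42 });

  // With two databases, every call site would instead have to pick
  // client.db("users_db") or client.db("data_db") correctly.
  await client.close();
}

main().catch(console.error);
```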

MongoDB with ElasticSearch

I'm looking for the right way to use Elasticsearch with MongoDB. I want to save several pieces of information in MongoDB. Additionally, I want to save a larger text with Elasticsearch to support complex full-text search.
My problem at the moment is:
I'm not sure what the best solution for this is. Most solutions I found for synchronizing MongoDB with Elasticsearch use a "river", which is deprecated!
What is the best way to combine these two technologies?
Is it even a good idea to save the data in both MongoDB and Elasticsearch?
I found multiple articles explaining that Elasticsearch alone is not safe enough and that you have to use another DBMS.
Also, under robustness on the MongoDB website, I found this:
Unfortunately, Elasticsearch (and the components it's made of) does not currently handle OutOfMemory-errors very well.
[source]
So saving the data redundantly is probably the best way.
Thanks in advance!
Hi,
We are also working with both Elasticsearch and MongoDB. We started with a river, and after having a lot of issues with it we got rid of it before it became deprecated. The way we do it is: when saving data to Mongo, we create a message in a queue which notifies the search storage to do the insert/delete operation with the given data (see the sketch below).
So basically we keep them in sync manually, and there will always be a delay between Mongo and Elasticsearch. The good part is that if Elasticsearch were to fail, we have implemented an endpoint which reimports the data from Mongo to ES. Also, the structure inside ES is different from the one in Mongo. Before, it was a lot more complicated to do this with the river; imagine that we even had our own custom implementation.
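Here is a sketch of that pattern, assuming Redis as the queue (the answer doesn't say which queue they use) and the official "mongodb", "redis", and "@elastic/elasticsearch" clients; all collection, index, and key names are illustrative:

```typescript
import { MongoClient } from "mongodb";
import { createClient } from "redis";
import { Client as EsClient } from "@elastic/elasticsearch";

type Article = { _id: string; title: string; body: string };

const mongo = new MongoClient("mongodb://localhost:27017");
const redis = createClient();
const es = new EsClient({ node: "http://localhost:9200" });

// Writer side: save to Mongo first, then enqueue a sync message.
async function saveArticle(article: Article) {
  await mongo.db("app").collection<Article>("articles").insertOne(article);
  await redis.rPush("es-sync", JSON.stringify({ op: "index", id: article._id }));
}

// Worker side: drain the queue and mirror changes into Elasticsearch.
// ES keeps its own structure, which can differ from the Mongo document.
async function syncWorker() {
  for (;;) {
    const msg = await redis.blPop("es-sync", 0); // block until a message arrives
    if (!msg) continue;
    const { op, id } = JSON.parse(msg.element);
    if (op === "delete") {
      await es.delete({ index: "articles", id });
      continue;
    }
    const doc = await mongo.db("app").collection<Article>("articles").findOne({ _id: id });
    if (doc) {
      await es.index({ index: "articles", id, document: { title: doc.title, body: doc.body } });
    }
  }
}

async function main() {
  await Promise.all([mongo.connect(), redis.connect()]);
  void syncWorker(); // run the consumer loop in the background
  await saveArticle({ _id: "a1", title: "hello", body: "full text to search" });
}

main().catch(console.error);
```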
Hope my answer helps at least a bit.

Configure a Mongo replica set to only replicate certain collections

I have a ~3GB mongo database with several dozen collections. Three of these collections handle ~300 queries per second, while the rest sustain a much lower volume. I expect the traffic to continue to grow quickly.
I'd like to set up a replica set to handle the high-traffic collections. It isn't necessary for this new instance to replicate the rest of the database. Is this possible?
This doesn't seem possible at the moment with MongoDB's built-in features; the only way to do it is to come up with your own manual replication algorithm or to use tools written by third parties.
The https://github.com/wordnik/wordnik-oss project might help you achieve this, according to the following post:
https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/Ap9V4ArGuFo
It describes a workaround for filtering documents in replication:
Replicate only documents where {'public':true} in MongoDB
Or just replicate the data yourself manually, which might be worth trying.
Good luck.
No, that isn't possible right now. What you could do is move those collections into another, unreplicated database. But this will cause headaches once those collections see higher traffic too, since you would then need to move them back into your replicated database.
In general, though, replication isn't the way to go if you need to scale; it's intended more for DR/failover. Replica set secondaries can only (optionally) answer read queries, not write queries; this is something you should keep in mind. So if you have a high write load, this may not cure your problem.
Once you allow your application to read from secondaries, you need to live with eventual consistency, meaning that your application isn't guaranteed to always see the latest data. This is caused by the asynchronous replication to the secondaries.
You can indeed cure this problem by configuring your write concern so that the write needs to succeed on all replicas before it is considered written and your driver returns, but this may slow down your write operations significantly.
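For instance, with the Node.js "mongodb" driver (the w: 3 value assumes a three-member replica set; w: "majority" is the more common choice when a member may be down):

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0");

async function main() {
  await client.connect();
  const events = client.db("app").collection("events");

  // The insert does not return until all three members have the write,
  // at the cost of higher write latency.
  await events.insertOne(
    { type: "click", at: new Date() },
    { writeConcern: { w: 3 } }
  );

  await client.close();
}

main().catch(console.error);
```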
So for scaling query execution capabilities, I would go with sharding instead. Sharding is enabled on a per-collection level, so you could shard only your three high-traffic collections; all unsharded collections remain on the database's primary shard.
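A sketch of the admin commands involved, issued through the Node.js driver against a mongos router; the database, collection, and shard key names are illustrative:

```typescript
import { MongoClient } from "mongodb";

async function main() {
  // Must be connected to a mongos router, not a plain replica set member.
  const client = new MongoClient("mongodb://mongos-host:27017");
  await client.connect();
  const admin = client.db("admin");

  await admin.command({ enableSharding: "app" });

  // Only this collection is sharded; a hashed key spreads writes evenly.
  await admin.command({
    shardCollection: "app.events",
    key: { userId: "hashed" },
  });
  // Collections never passed to shardCollection stay on the primary shard.

  await client.close();
}

main().catch(console.error);
```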
Not possible, but if the data size is this small and these collections aren't updated much, then the only overhead of having them replicated is the small amount of storage used on the secondary. That is a relatively small price to pay, especially since the collections won't grow much in size, compared with writing your own replication logic.
Instead of that, archive the data: keep only the latest data set on the production server, and archive the rest of the data on the new server.

MongoDB with Redis

Can anyone give example use cases of when you would benefit from using Redis and MongoDB in conjunction with each other?
Redis and MongoDB can be used together with good results. A company well known for running MongoDB and Redis (along with MySQL and Sphinx) is Craigslist. See this presentation from Jeremy Zawodny.
MongoDB is interesting for persistent, document-oriented data indexed in various ways. Redis is more interesting for volatile data, or latency-sensitive semi-persistent data.
Here are a few examples of concrete usage of Redis on top of MongoDB.
Pre-2.2 MongoDB does not yet have an expiration mechanism, and capped collections cannot really be used to implement a real TTL. Redis has a TTL-based expiration mechanism, making it convenient to store volatile data. For instance, user sessions are commonly stored in Redis, while user data will be stored and indexed in MongoDB. Note that MongoDB 2.2 has introduced a low-accuracy expiration mechanism at the collection level (to be used for purging data, for instance).
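A minimal sketch of that split, assuming node-redis and the Node.js "mongodb" driver; key names, field names, and TTLs are illustrative:

```typescript
import { createClient } from "redis";
import { MongoClient } from "mongodb";

async function main() {
  const redis = createClient();
  const mongo = new MongoClient("mongodb://localhost:27017");
  await Promise.all([redis.connect(), mongo.connect()]);

  // Volatile session in Redis: expires on its own after 30 minutes.
  await redis.set("session:abc123", JSON.stringify({ userId: "u42" }), { EX: 1800 });

  // Durable, indexed user data in MongoDB.
  await mongo.db("app").collection("users").updateOne(
    { email: "alice@example.com" },
    { $set: { name: "Alice", lastSeen: new Date() } },
    { upsert: true }
  );

  // MongoDB's coarser equivalent (2.2+): a TTL index, checked about
  // once a minute rather than precisely on expiry.
  await mongo.db("app").collection("loginEvents")
    .createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
}

main().catch(console.error);
```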
Redis provides a convenient set datatype and its associated operations (union, intersection, difference on multiple sets, etc.). It is quite easy to implement a basic faceted search or tagging engine on top of this feature, which is an interesting addition to MongoDB's more traditional indexing capabilities.
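For example, a small tagging sketch (node-redis; ids and tag names are illustrative) where each tag is a set of document ids and an intersection answers "which documents carry all of these tags"; the matching ids would then be hydrated from MongoDB:

```typescript
import { createClient } from "redis";

async function main() {
  const redis = createClient();
  await redis.connect();

  // Index documents under their tags.
  await redis.sAdd("tag:mongodb", ["doc:1", "doc:2"]);
  await redis.sAdd("tag:redis", ["doc:2", "doc:3"]);

  // Intersection: documents tagged with both "mongodb" and "redis".
  const both = await redis.sInter(["tag:mongodb", "tag:redis"]);
  console.log(both); // ["doc:2"]

  await redis.quit();
}

main().catch(console.error);
```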
Redis supports efficient blocking pop operations on lists. This can be used to implement an ad-hoc distributed queuing system. It is more flexible than MongoDB's tailable cursors IMO, since a backend application can listen to several queues with a timeout, transfer items to another queue atomically, etc. If the application requires some queuing, it makes sense to store the queue in Redis and keep the persistent functional data in MongoDB.
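A sketch of such a queue with node-redis (names are illustrative): BLMOVE blocks until a job arrives and atomically moves it into a per-worker "processing" list, so a crashing worker doesn't silently lose the item:

```typescript
import { createClient } from "redis";

async function main() {
  const redis = createClient();
  await redis.connect();

  // Producer: push a job onto the queue.
  await redis.lPush("jobs", JSON.stringify({ task: "resize", id: 7 }));

  // Worker: wait up to 5 seconds, atomically claiming the job.
  const job = await redis.blMove("jobs", "jobs:processing", "RIGHT", "LEFT", 5);
  if (job) {
    // ...do the work, then acknowledge by removing the claim.
    await redis.lRem("jobs:processing", 1, job);
  }

  await redis.quit();
}

main().catch(console.error);
```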
Redis also offers a pub/sub mechanism. In a distributed application, an event propagation system may be useful. This is again an excellent use case for Redis, while the persistent data are kept in MongoDB.
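A minimal pub/sub sketch with node-redis (channel name and payload are illustrative); a connection in subscriber mode can't issue other commands, hence the duplicated client:

```typescript
import { createClient } from "redis";

async function main() {
  const pub = createClient();
  const sub = pub.duplicate();
  await Promise.all([pub.connect(), sub.connect()]);

  await sub.subscribe("events:user", (message) => {
    console.log("got event:", JSON.parse(message));
  });

  // Fire-and-forget signal; the durable state itself lives in MongoDB.
  await pub.publish("events:user", JSON.stringify({ type: "signup", userId: "u42" }));
}

main().catch(console.error);
```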
Because it is much easier to design a data model with MongoDB than with Redis (Redis is more low-level), it is interesting to benefit from the flexibility of MongoDB for main persistent data, and from the extra features provided by Redis (low latency, item expiration, queues, pub/sub, atomic blocks, etc ...). It is indeed a good combination.
Please note you should never run a Redis and MongoDB server on the same machine. MongoDB memory is designed to be swapped out, Redis is not. If MongoDB triggers some swapping activity, the performance of Redis will be catastrophic. They should be isolated on different nodes.
Obviously there are far more differences than this, but for an extremely high-level overview:
For use-cases:
Redis is often used as a caching layer or shared whiteboard for distributed computation.
MongoDB is often used as a swap-out replacement for traditional SQL databases.
Technically:
Redis is an in-memory db with disk persistence (the whole db needs to fit in RAM).
MongoDB is a disk-backed db which only needs enough RAM for the indexes.
There is some overlap, but it is extremely common to use both. Here's why:
MongoDB can store more data cheaper.
Redis is faster across the entire dataset (because all of it lives in RAM).
MongoDB's culture is "store it all, figure out access patterns later"
Redis's culture is "carefully consider how you'll access data, then store"
Both have open source tools that depend on them, many of which are used together.
Redis can be used as a replacement for a traditional datastore, but it's most often used alongside another normal "long-term" data store, like Mongo, PostgreSQL, MySQL, etc.
Redis works excellently with MongoDB as a caching server. Here is what happens.
Anytime mongoose issues a query, it first goes over to the cache server.
The cache server checks to see if that exact query has ever been issued before.
If it hasn't, the cache server takes the query, sends it over to MongoDB, and Mongo executes the query.
The result of that query then goes back to the cache server, and the cache server stores the result of the query on itself.
It says: anytime I execute that query, I get this response. So it maintains a record between queries that are issued and the responses that come back from those queries.
The cache server takes the response and sends it back to mongoose; mongoose gives it to express, and it eventually ends up inside the application.
Anytime the same exact query is issued again, mongoose sends it to the cache server, but if the cache server sees that this query was issued before, it will not send the query on to MongoDB; instead it takes the response it got last time and immediately sends it back to mongoose. There are no indexes here, no full collection scan, nothing.
We are doing a simple lookup that asks: has this query been executed before? Yes? Okay, take the response and send it back immediately, and don't send anything to Mongo.
We have the mongoose server, the cache server (Redis), and MongoDB.
On the cache server there is a key-value data store where each key is a query that has been issued before and its value is the result of that query.
So maybe we are looking up a bunch of blog posts by _id.
So maybe the keys in there are the _ids of the records we have looked up before.
So let's imagine that mongoose issues a new query in which it tries to find a blog post with an _id of 123. The query flows into the cache server, and the cache server checks to see if it has a result for any query that was looking for an _id of 123.
If it does not exist on the cache server, the query is taken and sent on to the MongoDB instance. MongoDB will execute the query, get a response, and send it back.
That result is sent back over to the cache server, which takes it and immediately sends it back to mongoose, so we get as fast a response as possible.
Right after that, the cache server will also take the query that was issued, add it to its collection of queries that have been issued, and store the result of the query right up against the query.
So we can imagine that in the future, when we issue the same query again, it hits the cache server, the cache server looks at all the keys it has, says "oh, I already found that blog post", doesn't reach out to Mongo at all, and just takes the result of the query and sends it directly to mongoose.
We are not doing complex query logic, no indexes, nothing like that. It's as fast as possible. It's a simple key-value lookup.
That's an overview of how the cache server (Redis) works with MongoDB.
Now there are other concerns. Are we caching data forever? How do we update records?
We don't want to be storing data in the cache and reading from the cache indefinitely.
The cache server is not used for any write actions. The cache layer is only used for reading data. If we ever write data, the write always goes over to the MongoDB instance, and we need to ensure that anytime we write data we clear any data stored on the cache server that is related to the record we just updated in Mongo.
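Here is a condensed sketch of that whole flow, assuming mongoose and node-redis, with the cache keyed per record, a TTL as a safety net, and writes invalidating the related entry; the model and field names are illustrative:

```typescript
import mongoose from "mongoose";
import { createClient } from "redis";

const redis = createClient();
const Blog = mongoose.model("Blog", new mongoose.Schema({ title: String, body: String }));

// Read path: key-value lookup first; Mongo only runs the query on a miss.
async function getBlog(id: string) {
  const key = `blog:${id}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);          // hit: no Mongo round trip at all

  const doc = await Blog.findById(id).lean();     // miss: execute the real query
  if (doc) await redis.set(key, JSON.stringify(doc), { EX: 300 }); // store with a TTL
  return doc;
}

// Write path: never served from the cache, only invalidates it.
async function updateBlog(id: string, changes: { title?: string; body?: string }) {
  await Blog.updateOne({ _id: id }, { $set: changes });
  await redis.del(`blog:${id}`); // the next read repopulates from Mongo
}

async function main() {
  await mongoose.connect("mongodb://localhost:27017/app");
  await redis.connect();

  await updateBlog("64b000000000000000000001", { title: "hello" });
  console.log(await getBlog("64b000000000000000000001")); // misses Redis, then caches
  console.log(await getBlog("64b000000000000000000001")); // served straight from Redis
}

main().catch(console.error);
```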