MongoDB is not scaling - mongodb

I am building out some tools that require <1s response times, and I doubt that using Mongo (or SQL for that matter) will deliver that. I am pretty sure that I'll need to use either Redis or some in-memory data structure. Right now my Mongo aggregate takes almost 5 seconds to load 30 rows. All the collections are indexed, but there are lookups within lookups, which I think is where the issue lies (i.e. a document references multiple users, and I need to fetch the personal info of each). I also have pagination enabled and am on Mongo 5.0 (which uses indexes for sub lookups). A few questions:
1. Should I just switch to a SQL database?
2. Should I add change streams to the collection I need to query and update a redis instance and read from there?
3. Should I just embed the information I'm extracting from the lookups into the parent document?
4. Do 2 and 3?
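
To picture what option 3 means, here is a minimal sketch that uses plain Python dictionaries as stand-ins for MongoDB documents. The field names (`user_ids`, `users`, `name`, `email`) and the sample data are assumptions, not taken from the question. The idea is to copy the few user fields a page needs into the parent document at write time, so the read needs no nested lookup.

```python
from copy import deepcopy


def embed_users(parent, users_by_id, fields=("name", "email")):
    """Return a copy of `parent` with selected user fields embedded.

    `parent["user_ids"]` holds references; the result gains a `users`
    list so a read no longer needs one lookup per referenced user.
    """
    doc = deepcopy(parent)
    doc["users"] = [
        {"_id": uid, **{f: users_by_id[uid][f] for f in fields}}
        for uid in parent["user_ids"]
    ]
    return doc


# Hypothetical data standing in for two collections.
users_by_id = {
    1: {"name": "Ada", "email": "ada@example.com", "password_hash": "x"},
    2: {"name": "Bob", "email": "bob@example.com", "password_hash": "y"},
}
task = {"_id": 10, "title": "Review", "user_ids": [1, 2]}

embedded = embed_users(task, users_by_id)
```

The cost of this design is that the embedded copies go stale when a user changes, so something has to refresh them; that is where option 2 (reacting to changes) and option 3 could meet.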

Related

MongoDB multiple collections or multiple databases

We are using .NET Core and Node.js microservices, some of them with MongoDB.
Currently we have the following DB structure:
Every customer gets their own database.
So if we have a microservice for invoices, every new customer adds one new DB for that microservice.
Invoice_customerA
Invoice_customerB
etc...
The collections in each such DB remain the same (usually we have 1-3 collections in each DB).
In terms of logic, we choose the right DB from the request input at runtime.
I am thinking now about changing it a bit, to start making separation on the collections instead:
So if we take the same example from before this time around this Invoice Service will only have 1 DB,
Invoice_allCustomers
and there will be 1 new collection for each customer in it ( or more if there were more collections for this service).
collection_customerA
collection_customerB
What I am trying to understand is if there is any difference performance wise?
Or is it mostly a "cosmetic" change?
Or maybe there are some other considerations?
P.S.
If the change is mostly cosmetic, I think the new solution is better for us, since we usually have only 1-2 collections per microservice.
It will also be easier to navigate when there are significantly fewer databases.
As far as I know, in a microservices architecture each service should have its own database. If it is not a different service, then you can use one database with different collections in it. The change is mostly cosmetic, but I should also warn you that MongoDB still has its limits, which you can find here. It really depends on the amount of data that will be stored and retrieved.

Meteor MongoDB Server Aggregation into new Collection

I'm currently experimenting with a test collection on a LAN-accessible MongoDB server and data in a Meteor (v1.6) application. View layer of choice is React and right now I'm using the createContainer to bind the subscriptions to props.
The data that gets put in the MongoDB storage is updated on a daily basis and consists of a big set of data from several SQL databases, netting up to about 60000 lines of JSON per day. The data has been ever-so-slightly reshaped to be turned into a usable format whilst remaining as RAW as I'd like it to be.
The working solution right now is fetching all this data and doing further manipulations client-side to prepare the data for visualization. The issue should seem obvious: each client is fetching a set of documents that grows every day and repeats a lot of work on earlier entries before being ready to display. I want to do this manipulation on the server, through MongoDB's Aggregation Framework.
My initial idea is to do the aggregations on the server and to create new Collections containing smaller, more specific datasets without compromising the RAWness of the original Collection. That would mean the "reduced" Collections can still be reactive, as I've been able to confirm through testing in a Remote Desktop, subscribing to an aggregated Collection which I can update through Robo3T.
I don't know if this would be ideal. As far as storage goes, there's plenty of room for the extra Collections. But I have no idea how to set up an automated aggregation script on said server. And regarding Meteor, I've tried using meteorhacks:aggregate and jcbernack:reactive-aggregate but couldn't figure out how to deal with either one of them. If anyone is dealing, or has dealt with, something similar; I'd love to hear ideas / suggestions.
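
One way to approach the "reduced Collections" idea is an aggregation pipeline that ends in an `$out` stage, run on a schedule (for example from a cron job) rather than by each client. The sketch below only builds the pipeline as data; the field names (`day`, `source`, `value`) and the target collection name are hypothetical. A list like this could be passed to a driver's aggregate call, such as pymongo's `collection.aggregate(pipeline)`.

```python
def build_daily_rollup(target_collection):
    """Build an aggregation pipeline that writes a reduced dataset."""
    return [
        # Group raw rows by day and source system (hypothetical fields).
        {"$group": {
            "_id": {"day": "$day", "source": "$source"},
            "total": {"$sum": "$value"},
            "rows": {"$sum": 1},
        }},
        # $out must be the last stage; it replaces the target collection.
        {"$out": target_collection},
    ]


pipeline = build_daily_rollup("daily_summary")
```

Because `$out` writes an ordinary collection, the raw collection stays untouched and the reduced one can be published and subscribed to like any other.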

Caching query results in MongoDB

I will be working on a large data set that changes slowly, so I want to optimize query time by using a caching mechanism. For example, if I want to see some metrics about the data from the last 360 days, I don't need to query the database again because I can reuse the last query result.
Does MongoDB natively support caching, or do I have to use another database, for example Redis, as mentioned here?
EDIT: my question is different from "Caching repeating query results in MongoDB" because I asked about external caching systems, and the answer to that question was specific to working with MongoDB and Tornado.
The author of the Motor (MOngo + TORnado) package gives an example of caching his list of categories here: http://emptysquare.net/blog/refactoring-tornado-code-with-gen-engine/
Basically, he defines a global list of categories and queries the database to fill it in; then, whenever he needs the categories in his pages, he checks the list: if it exists, he uses it; if not, he queries again and fills it in. He has it set up to invalidate the list whenever he inserts into the database, but depending on your usage you could create a global timeout variable to keep track of when you next need to re-query. If you're doing something complicated, this could get out of hand, but if it's just a list of the most recent posts or something similar, I think it would be fine.
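
The approach described above (an in-process cached list, invalidated on insert or after a timeout) could look roughly like the following. This is a sketch and not the code from the linked post; `TimedCache`, `load_categories`, and the 60-second timeout are made-up names and values, and the loader stands in for a real database query.

```python
import time


class TimedCache:
    """Cache one query result in process memory, with a timeout."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # callable that runs the real query
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so it can be tested
        self._value = None
        self._expires_at = None      # None means "nothing cached"

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Call after an insert so the next get() re-queries."""
        self._expires_at = None


# Hypothetical loader standing in for a database query.
calls = []


def load_categories():
    calls.append(1)
    return ["news", "sport"]


fake_now = [0.0]
cache = TimedCache(load_categories, ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get()      # miss: runs the loader
second = cache.get()     # hit: served from memory
fake_now[0] = 61.0
third = cache.get()      # expired: runs the loader again
cache.invalidate()
fourth = cache.get()     # invalidated: runs the loader again
```

A cache like this lives in one process only, so with several application servers each one would hold its own copy; that is the point where an external cache such as Redis becomes relevant.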

Multitenancy in MongoDb

I am building a multitenant MongoDB system. How do I switch between databases depending on the request? I am using MongoDB with Node.js and the native MongoDB driver.
Your MongoClient object has a method .db(dbname) which returns a reference to a different database object using the same connection.
But you might want to consider just storing the data of all tenants in the same collections of a single database and adding a field tenant to every document, which you then include in every query. When you have individual collections or even individual databases per tenant, the maintenance effort for your database administrator increases linearly with the number of tenants you have, because many maintenance and configuration tasks (like configuring sharding, for example) need to be performed on every collection of every database separately.
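
If the single-database approach is used, the tenant field has to be present in every query, and forgetting it once leaks data across tenants. One way to reduce that risk is to build every filter through a small helper. The sketch below is in Python for illustration; `tenant_filter` is a hypothetical helper, not part of any driver, and the resulting dictionary is the kind of filter a driver's find call accepts.

```python
def tenant_filter(tenant_id, query=None):
    """Return a query filter that is always restricted to one tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    scoped = dict(query or {})
    # Set the tenant last so a caller-supplied value cannot override it.
    scoped["tenant"] = tenant_id
    return scoped


invoice_query = tenant_filter("customerA", {"status": "open"})
```

Here `invoice_query` combines the caller's conditions with the tenant restriction, and a query that tries to name another tenant is overwritten rather than trusted.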

MongoDB with redis

Can anyone give example use cases of when you would benefit from using Redis and MongoDB in conjunction with each other?
Redis and MongoDB can be used together with good results. A company well known for running MongoDB and Redis (along with MySQL and Sphinx) is Craigslist. See this presentation from Jeremy Zawodny.
MongoDB is interesting for persistent, document oriented, data indexed in various ways. Redis is more interesting for volatile data, or latency sensitive semi-persistent data.
Here are a few examples of concrete usage of Redis on top of MongoDB.
MongoDB before 2.2 does not have an expiration mechanism, and capped collections cannot really be used to implement a real TTL. Redis has a TTL-based expiration mechanism, making it convenient for storing volatile data. For instance, user sessions are commonly stored in Redis, while user data is stored and indexed in MongoDB. Note that MongoDB 2.2 introduced a low-accuracy expiration mechanism at the collection level (to be used for purging data, for instance).
Redis provides a convenient set datatype and its associated operations (union, intersection, difference on multiple sets, etc.). It is quite easy to implement a basic faceted search or tagging engine on top of this feature, which is an interesting addition to MongoDB's more traditional indexing capabilities.
Redis supports efficient blocking pop operations on lists. This can be used to implement an ad-hoc distributed queuing system. It is more flexible than MongoDB tailable cursors IMO, since a backend application can listen to several queues with a timeout, transfer items to another queue atomically, etc ... If the application requires some queuing, it makes sense to store the queue in Redis, and keep the persistent functional data in MongoDB.
Redis also offers a pub/sub mechanism. In a distributed application, an event propagation system may be useful. This is again an excellent use case for Redis, while the persistent data are kept in MongoDB.
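
To make the set-based tagging example from the list above more concrete, here is a small model of it that uses plain Python sets as stand-ins for Redis sets. In Redis itself, adding a member corresponds to SADD and intersecting sets to SINTER; the document ids and tag names are invented for illustration.

```python
from collections import defaultdict

# tag -> set of document ids; in Redis each entry would be one set key.
tag_index = defaultdict(set)


def tag_document(doc_id, tags):
    """Record doc_id under every tag (SADD in Redis terms)."""
    for tag in tags:
        tag_index[tag].add(doc_id)


def documents_with_all(tags):
    """Ids carrying every given tag (SINTER in Redis terms)."""
    sets = [tag_index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


tag_document("post:1", ["mongodb", "redis"])
tag_document("post:2", ["mongodb"])
tag_document("post:3", ["redis", "caching"])

both = documents_with_all(["mongodb", "redis"])
```

The ids returned by the intersection would then be used to fetch the full documents from MongoDB, which keeps the persistent data in one place.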
Because it is much easier to design a data model with MongoDB than with Redis (Redis is more low-level), it is interesting to benefit from the flexibility of MongoDB for main persistent data, and from the extra features provided by Redis (low latency, item expiration, queues, pub/sub, atomic blocks, etc ...). It is indeed a good combination.
Please note you should never run a Redis and MongoDB server on the same machine. MongoDB memory is designed to be swapped out, Redis is not. If MongoDB triggers some swapping activity, the performance of Redis will be catastrophic. They should be isolated on different nodes.
Obviously there are far more differences than this, but for an extremely high-level overview:
For use-cases:
Redis is often used as a caching layer or shared whiteboard for distributed computation.
MongoDB is often used as a swap-out replacement for traditional SQL databases.
Technically:
Redis is an in-memory db with disk persistence (the whole db needs to fit in RAM).
MongoDB is a disk-backed db which only needs enough RAM for the indexes.
There is some overlap, but it is extremely common to use both. Here's why:
MongoDB can store more data more cheaply.
Redis is faster for the entire dataset.
MongoDB's culture is "store it all, figure out access patterns later"
Redis's culture is "carefully consider how you'll access data, then store"
Both have open source tools that depend on them, many of which are used together.
Redis can be used as a replacement for a traditional datastore, but it's most often used with another normal "long" data store, like Mongo, Postgresql, MySQL, etc.
Redis works excellently with MongoDB as a caching server. Here is what happens.
Anytime that mongoose issues a cache query, it will first go over to the cache server.
The cache server will check to see if that exact query has ever been issued before.
If it hasn’t, the cache server sends the query on to MongoDB, and Mongo executes it.
The result of that query goes back to the cache server, which stores it.
In effect it says: any time I execute that query, I get this response. So it maintains a record of the queries that were issued and the responses that came back.
The cache server takes the response and sends it back to mongoose; mongoose gives it to express, and it eventually ends up inside the application.
Any time the exact same query is issued again, mongoose sends it to the cache server, but if the cache server sees that this query was issued before, it does not send the query on to MongoDB. Instead it takes the response it got the last time and immediately sends it back to mongoose. There are no indices here, no full table scan, nothing.
We are doing a simple lookup to ask: has this query been executed? Yes? Then take the stored response and send it back immediately, without sending anything to Mongo.
We have the mongoose server, the cache server (Redis) and Mongodb.
The cache server might hold a key-value data store in which each key is a query that was issued before and each value is the result of that query.
So maybe we are looking up a bunch of blogposts by _id.
Then the keys here might be the _id values of the records we have looked up before.
So let’s imagine that mongoose issues a new query that tries to find a blogpost with an _id of 123. The query flows into the cache server, which checks whether it has a result for any query that was looking for an _id of 123.
If it does not exist in the cache server, the query is sent on to the MongoDB instance. MongoDB executes the query, gets a response and sends it back.
This result goes back to the cache server, which immediately sends it on to mongoose so we get as fast a response as possible.
Right after that, the cache server adds the query to its collection of issued queries and stores the result alongside it.
So if we issue the same query again in the future, it hits the cache server, which looks at all the keys it has and says: I already found that blogpost. It doesn’t reach out to Mongo; it just takes the stored result and sends it directly to mongoose.
We are not doing complex query logic, no indices, nothing like that. It’s as fast as possible. It’s a simple key-value lookup.
That’s an overview of how the cache server (Redis) works with MongoDB.
Now there are other concerns. Are we caching data forever? How do we update records?
We don’t want to always be storing data in the cache and be reading from the cache.
The cache server is not used for any write actions. The cache layer is only used for reading data. If we ever write data, writing will always go over to the mongodb instance and we need to ensure that anytime we write data we clear any data stored on the cache server that is related to the record we just updated in Mongo.
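
The read path and the write rule described above can be put together in a short sketch. It uses two in-memory dictionaries as stand-ins, one for MongoDB and one for Redis, so it runs without either server; the class name, the `blogpost:` key prefix, and the `store_reads` counter are assumptions made for illustration. Values are stored as JSON strings to mirror the fact that a key-value cache holds serialized data.

```python
import json


class CachedStore:
    """Read-through cache in front of a slower document store."""

    def __init__(self, store):
        self._store = store      # stand-in for MongoDB: {_id: document}
        self._cache = {}         # stand-in for Redis: {key: JSON string}
        self.store_reads = 0     # counts queries that reached the store

    @staticmethod
    def _key(doc_id):
        return f"blogpost:{doc_id}"

    def find_by_id(self, doc_id):
        key = self._key(doc_id)
        cached = self._cache.get(key)
        if cached is not None:
            return json.loads(cached)           # hit: store not touched
        self.store_reads += 1
        doc = self._store.get(doc_id)           # miss: query the store
        if doc is not None:
            self._cache[key] = json.dumps(doc)  # remember the result
        return doc

    def update(self, doc_id, changes):
        # Writes always go to the store, never to the cache.
        self._store[doc_id] = {**self._store.get(doc_id, {}), **changes}
        # Drop the stale entry so the next read fetches fresh data.
        self._cache.pop(self._key(doc_id), None)


posts = CachedStore({123: {"_id": 123, "title": "Hello"}})
a = posts.find_by_id(123)    # miss: goes to the store
b = posts.find_by_id(123)    # hit: answered from the cache
posts.update(123, {"title": "Hello again"})
c = posts.find_by_id(123)    # miss again after invalidation
```

Keying by _id keeps invalidation simple, because a write knows exactly which entry to drop. Caching the results of arbitrary queries is harder, since one update can affect many cached results, which is why an expiry time on cached entries is a common safety net.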