Query and sort on multiple collections based on time - mongodb

Our application is planning to use MongoDB for reports.
Our reports are time-based (where the raw data is different events).
We were thinking of creating a separate collection for each day, so that we do not need to query one very large collection when we only need to query, aggregate, and sort events for a specific day.
One question is whether this design makes sense.
Another question is what will happen if we need to aggregate and sort events over more than one collection, for one week for example.
Does MongoDB support this? If it does, how should it be done so that it is efficient in terms of performance?

We were thinking of creating a separate collection for each day, so that we do not need to query one very large collection when we only need to query, aggregate, and sort events for a specific day.
One question is whether this design makes sense.
With proper indexes, MongoDB should not have problems with a very big collection.
You could read more here: MongoDB indexes
Another question is what will happen if we need to aggregate and sort events over more than one collection, for one week for example.
Does MongoDB support this? If it does, how should it be done so that it is efficient in terms of performance?
If you want to go your way, you could use aggregation pipelines with $facet (or, on MongoDB 4.4 and later, $unionWith) to combine the results from several collections in one call. However, this could become a little tricky because you have to generate the collection names from your query parameters. In fact, I think this could be slower and more prone to errors, so I don't recommend this approach.
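As a rough sketch of the single-collection alternative: if every event lives in one collection and carries its time in an indexed field, a day and a week are just two different date ranges over the same index. The collection name events, the field name timestamp, and the helper buildRangePipeline are assumptions made for illustration; they do not come from the question.

```javascript
// Hypothetical helper: builds an aggregation pipeline that selects and sorts
// events in the half-open interval [start, start + days).
// Assumes each event document stores its time in a `timestamp` Date field.
function buildRangePipeline(start, days) {
  const end = new Date(start.getTime() + days * 24 * 60 * 60 * 1000);
  return [
    { $match: { timestamp: { $gte: start, $lt: end } } },
    { $sort: { timestamp: 1 } },
  ];
}

// The same helper covers one day and one week.
const day = buildRangePipeline(new Date("2020-01-06T00:00:00Z"), 1);
const week = buildRangePipeline(new Date("2020-01-06T00:00:00Z"), 7);

// In the mongo shell this would be used roughly as:
//   db.events.createIndex({ timestamp: 1 })
//   db.events.aggregate(week)
```

Because the $match stage filters on the indexed field first, the query should only touch the documents inside the requested range rather than the whole collection.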

Related

I wonder if there are a lot of collections

Does having many MongoDB collections have a big impact on MongoDB performance, memory, and capacity? I am designing an API with the MVC pattern, and a collection is being created for each model. I am questioning the way I am doing it now.
MongoDB with the WiredTiger engine supports an unlimited number of collections. So you are not going to run into any hard technical limitations.
When you wonder if something should be in one collection or in multiple collections, these are some of the considerations you need to keep in mind:
More collections = more maintenance work. Sharding is configured on the collection level. So having a large number of collections will make shard configuration a lot more work. You also need to set up indexes for each collection separately, but this is quite easy to automate, because createIndex on an index which already exists does nothing.
The MongoDB API is designed in a way that every database query operates on one collection at a time. That means when you need to search for a document in n different collections, you need to perform n queries. When you need to aggregate data stored in multiple collections, you run into even more problems. So any data which is queried together should be stored together in the same collection.
Creating one collection for each class in your model is usually a good rule of thumb, but it is not a golden hammer solution. There are situations where you want to embed objects in their parent-object documents instead of putting them into a separate collection. There are also cases where you want to put all objects with the same base class in the same collection to benefit from MongoDB's ability to handle heterogeneous collections. But that goes beyond the scope of this question.
Why don't you use this and test your application?
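On the index maintenance point above: since re-creating an existing index is a no-op, the per-collection setup can be scripted and re-run safely. The following is only a sketch. The helper name ensureIndexes is hypothetical, and db is assumed to be a Db object from the official Node.js driver, whose collections expose createIndex(keys) returning a Promise.

```javascript
// Hypothetical helper: applies the same index to every collection in a list.
// Safe to run repeatedly, because createIndex on an index that already
// exists does nothing.
function ensureIndexes(db, collectionNames, keys) {
  return Promise.all(
    collectionNames.map((name) => db.collection(name).createIndex(keys))
  );
}

// Usage sketch (collection and field names are made up):
//   await ensureIndexes(db, ["users", "orders"], { createdAt: 1 });
```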
https://docs.mongodb.com/manual/tutorial/evaluate-operation-performance/
By the way, your question is not completely clear; it is more of a "discussion" than a question. And you're asking others to evaluate your work instead of searching the web for the right approach.

Reading similar data from more than two collections in MongoDB

I am a novice MongoDB user. In our application the data size for each table is quite large, so I decided to split the data into different collections even though the documents are all of the same kind. The only difference between the collections is the "id": the documents in one collection all belong to one category. So we decided to insert the data into a number of collections, each holding a certain number of documents. Currently I have 10 collections of the same kind of document data.
My requirements are:
1) to get the data from all the collections in a single query, to display on the application home page.
2) to sort and filter the data before fetching it.
I have gone through some posts on Stack Overflow that suggest using the $lookup aggregation stage from MongoDB 3.2 for this requirement, but I suspect that using $lookup across 10 collections would cause performance issues and make the query too complex.
I have divided the same kind of data into a number of collections (each collection holds the documents of one category, and I have 10 categories, so I need 10 collections).
Could anybody please tell me whether my approach is correct?
If you have a lot of data, how could you display all of it in a web page?
My understanding is that you will only display a portion of the dataset by querying the database. Since you didn't mention how many records you have, it's not easy to make a recommendation.
Based on the vague description, sharding looks like the solution; you should check out the official documentation.
However, before you do sharding, and since you mentioned you are a novice user, you probably want to check your database's indexing and data models, and benchmark your performance first.
Hope this helps.
You should store all 10 types of data in 1 collection, not 10. Don't make things more difficult than they need to be.
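A minimal sketch of what that could look like, assuming each document gets a category field and a createdAt date, and that everything lives in one collection called items (all three names are hypothetical). Filtering, sorting, and paging for the home page then become one query:

```javascript
// Hypothetical helper: describes a single find() over one `items` collection
// in which every document carries a `category` field.
function buildHomePageQuery(categories, page, pageSize) {
  return {
    filter: { category: { $in: categories } }, // filter before fetching
    sort: { createdAt: -1 },                   // newest first
    skip: page * pageSize,                     // zero-based page number
    limit: pageSize,
  };
}

const q = buildHomePageQuery(["books", "music"], 2, 20);

// mongo shell equivalent:
//   db.items.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit)
```

A compound index such as { category: 1, createdAt: -1 } would be a reasonable starting point for this query shape, but that should be checked against the real data.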

All vs All comparisons on MongoDB

We are planning to use MongoDB for a general purpose system and it seems well suited to the particular data and use cases we have.
However, we have one use case where we will need to compare every document (of which there could be tens of millions) with every other document. The 'distance measure' could be precomputed offline by another system, but we are concerned about the online performance of MongoDB when we want to query, e.g. when we want to see the top 10 closest documents in the entire collection to a list of specific documents ...
Is this likely to be slow? Also, can this be done across collections (e.g. query for the top 10 closest documents in one collection to a document in another collection)...
Thanks in advance,
FK

Denormalization vs Parent Referencing vs MapReduce

I have a highly normalized data model with me. Currently I'm using manual referencing by storing the _id and running sequential queries to fetch details from the deepest collection.
The referencing is one-way and the flow has around 5-6 collections. For one particular use case, I have to query down to the deepest collection by querying each subsequent "_id" from the higher-level collections. So technically I'm hitting the database every time I run a
db.collection_name.find({_id: ****}).
My prime goal is to optimize the read without hugely affecting the atomicity of the other collections. I have read about de-normalization and it does not make sense to me because I want to keep an option for changing the cardinality down the line and hence want to maintain a separate collection altogether.
I was initially thinking of using MapReduce to do an aggregation from the back and have a collection primarily for the particular use-case. But well even that does not sound that good.
In a relational db, I would be breaking the query in sub-queries and performing a join to get the data sets that intersect from the initial results. Since mongodb does not support joins, I'm having a tough time figuring anything out.
Please help if you have faced anything like this earlier or have any idea how to resolve it.
Denormalize your data.
MongoDB does not do JOINs the way a relational database does.
find() and MapReduce each operate on a single collection. aggregate() can pull in documents from another collection with the $lookup stage (MongoDB 3.2 and later), but that is no substitute for a data model that fits your queries. Beyond that, when you need to piece your data together from more than one collection, there is no other way than doing it on the application layer. For that reason you should organize your data in a way that any common and performance-relevant query can be resolved by querying just a single collection.
In order to do that you might have to create redundancies and transitive dependencies. This is normal in MongoDB.
When this feels "dirty" to you, then you should either accept the fact that your performance will be sub-optimal or use a different kind of database, like a classic relational database or a graph database.
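To make the idea of deliberate redundancy concrete, here is a small sketch. The helper embedFields and the post/author documents are invented for illustration; the point is only that the fields a hot query needs are copied into the referencing document, so that one read suffices.

```javascript
// Hypothetical helper: copies selected fields of a referenced document into
// the referencing document under `key`, keeping the _id so the full source
// document can still be looked up when needed.
function embedFields(doc, key, source, fields) {
  const copy = {};
  for (const f of fields) copy[f] = source[f];
  return { ...doc, [key]: { _id: source._id, ...copy } };
}

const author = { _id: 1, name: "Ann", email: "ann@example.com" };
const post = { _id: 10, title: "Hello", authorId: 1 };

const stored = embedFields(post, "author", author, ["name"]);
// stored: { _id: 10, title: "Hello", authorId: 1, author: { _id: 1, name: "Ann" } }
```

The cost is that a change to the author's name now has to be written to every post that embeds it, which is the redundancy trade-off described above.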

Use publications or separate into collections (performance)?

I have a collection, in which only two queries are ever called on it.
Ex. Cars.find({color: 'red'}); and Cars.find({color: 'blue'});
I was wondering if I should just create RedCars and BlueCars collections instead of using two publications on Cars.
Thinking of performance here, if the Cars collection were to get very large, would it be more performant to use two collections? Also, they are never called on the same template. Each has its own template.
Thanks
From a Mongo perspective, a field that every query filters on (as color does in what you have described above) is a natural candidate for an index, and with an index on that field those queries can be resolved without scanning the whole collection. MongoDB does not create such an index on its own (only _id is indexed by default), so if you have a lot of data that falls into a scenario like the one you have described, you should create and tune this index yourself using the standard Mongo indexing parameters. There is more to this performance as well. For example, if the workload is read-heavy and write-light, then Mongo will often keep part or all of the queried data in memory for quick retrieval if it can.
As for whether it is better to split these into two collections, that's a tough one. From a performance standpoint it might be about the same either way if you tune your indexes properly and allow Mongo to do what it does best. However, from the Meteor standpoint, I would consider it much easier to just keep them in a single collection, for code maintainability and testability.
In terms of performance, if the collection does get large, then your application will end up receiving a lot more data than you expected whenever changes are made to either blue or red cars. A good solution, rather than creating two collections, is to use a parameterized subscription that filters down to only the data set you are looking at.
e.g.
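Meteor.publish('cars', function (c) {
  // Validate the argument: only a string color is accepted.
  check(c, String);
  // Publish only the cars of the requested color.
  return Cars.find({color: c});
});
Then you can access the data by subscribing with Meteor.subscribe('cars', 'blue').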