I'm currently planning the development of a service which should handle a fair amount of requests and do some logging for each request.
Each log will have the following form
{event: "EVENTTYPE", userid: "UID", itemid: "ITEMID", timestamp: DATETIME}
I expect that a lot of writing will be done, while reading and analysis will only be done once per hour.
A requirement in the data analysis is that I have to be able to do the following query:
Are both events, A and B, on item (ITEMID) logged for user (UID)? (Maybe even tell if event A came before event B based on their timestamps)
I have thought about MongoDB as my storage solution.
Can the above query be (properly) carried out by the MongoDB aggregation framework?
In the future I might add to the analysis step a relation from ITEMID to ITEM.Categories (I have a collection of items, and each item has a series of categories). It could be interesting to know how many times event A occurred on items, grouped by the individual item's category, during the last 30 days. Will MongoDB then be a good fit for my requirements?
Some information about the data I'll be working with:
I expect to be logging on the order of 10,000 events a day on average.
I haven't decided yet, whether the data should be stored indefinitely.
Is MongoDB a proper fit for my requirements? Is there another NoSQL database that will handle my requirements better? Is NoSQL even usable in this case or am I better off sticking with relational databases?
If my requirement for the frequency of analysis changes, say from once an hour to real time, I believe Redis would serve my purpose better than MongoDB. Is this correctly understood?
Are both events, A and B, on item (ITEMID) logged for user (UID)? (Maybe even tell if event A came before event B based on their timestamps)
Can the above query be (properly) carried out by the MongoDB aggregation framework?
Yes, absolutely. You can use the $group operator to aggregate events by ITEMID and UID, filter results before the grouping via $match to limit them to a specific time period (or apply any other filter), and push the times (first, last) of each type of event into the document that $group creates. Then you can use $project to create a field indicating what came before what, if you wish.
All of the capabilities of the aggregation framework are outlined here:
http://docs.mongodb.org/manual/core/aggregation-pipeline/
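For concreteness, here is a rough sketch of such a pipeline in the mongo shell. The collection name ("logs"), the event names "A" and "B", and the date range are placeholders based on the log format in the question:

db.logs.aggregate([
  // Optionally restrict to a time window and to the two event types of interest
  { $match: {
      event: { $in: ["A", "B"] },
      timestamp: { $gte: ISODate("2024-01-01"), $lt: ISODate("2024-02-01") }
  } },
  // One result document per (user, item) pair; $min ignores the nulls produced
  // by $cond, so firstA/firstB hold the earliest timestamp of each event type
  { $group: {
      _id: { userid: "$userid", itemid: "$itemid" },
      firstA: { $min: { $cond: [{ $eq: ["$event", "A"] }, "$timestamp", null] } },
      firstB: { $min: { $cond: [{ $eq: ["$event", "B"] }, "$timestamp", null] } }
  } },
  // Keep only pairs where both events were logged, and flag which came first
  { $match: { firstA: { $ne: null }, firstB: { $ne: null } } },
  { $project: { firstA: 1, firstB: 1, aBeforeB: { $lt: ["$firstA", "$firstB"] } } }
])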
In the future I might add to the analysis step a relation from ITEMID to ITEM.Categories (I have a collection of items, and each item has a series of categories). It could be interesting to know how many times event A occurred on items, grouped by the individual item's category, during the last 30 days. Will MongoDB then be a good fit for my requirements?
Yes. Aggregation in MongoDB allows you to $unwind arrays so that you can group things by categories, if you wish. All of the things you've described are easy to accomplish with the aggregation framework.
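As a hedged sketch only: if you join the log entries to the items collection with $lookup (MongoDB 3.2+; alternatively you can denormalize the categories into the log documents), grouping event A by category over the last 30 days could look roughly like this. Collection and field names are assumptions based on the question:

var since = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
db.logs.aggregate([
  { $match: { event: "A", timestamp: { $gte: since } } },
  // Pull in the item document for each event (assumes itemid stores the item's _id)
  { $lookup: { from: "items", localField: "itemid", foreignField: "_id", as: "item" } },
  { $unwind: "$item" },
  // One document per (event, category) pair, then count per category
  { $unwind: "$item.categories" },
  { $group: { _id: "$item.categories", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])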
Whether or not MongoDB is the right choice for your application is outside the scope of this site, but the requirements you've listed in this question can be implemented in MongoDB.
Related
Our application is planning to use MongoDB for reports.
Our reports are time-based (where the raw data is different events).
We were thinking of creating a separate collection for each day, so we will not need to query a whole large collection when we need to query, aggregate, and sort events for a specific day only.
One question is whether this design makes sense.
Another question is what will happen if we need to aggregate and sort events over more than one collection - for one week, for example.
Does MongoDB support this? If it does, how should it be done so it will be efficient in terms of performance?
We were thinking of creating a separate collection for each day, so we will not need to query a whole large collection when we need to query, aggregate, and sort events for a specific day only.
One question is whether this design makes sense.
With proper indexes, MongoDB should not have problems with a very big collection.
You could read more here: MongoDB indexes
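For example, a single large events collection with an index on the timestamp field keeps per-day queries cheap; this is only a sketch, and the collection and field names are assumptions:

// Create the index once; date-range queries then use an index scan
db.events.createIndex({ timestamp: 1 })

// Query, aggregate and sort one day's worth of events
db.events.find({
  timestamp: {
    $gte: ISODate("2024-05-01T00:00:00Z"),
    $lt:  ISODate("2024-05-02T00:00:00Z")
  }
}).sort({ timestamp: 1 })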
Another question is what will happen if we need to aggregate and sort events over more than one collection - for one week, for example.
Does MongoDB support this? If it does, how should it be done so it will be efficient in terms of performance?
If you want to go your way, you could use aggregation pipelines and $facet to run multiple queries at once. However, this could become a little tricky because you have to generate the collection names from your query parameters. In fact, I think this could be slower and more prone to errors, so I don't recommend this approach.
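For illustration only, this is roughly what the multi-collection route looks like from the application side with the Node.js driver: generate one collection name per day, run the same pipeline against each, and merge the results yourself. The naming pattern and the final sort key are assumptions, and this bookkeeping is exactly what a single collection avoids:

// Hypothetical per-day collections such as "events_2024_05_01", "events_2024_05_02", ...
async function aggregateOverDays(db, dayCollectionNames, pipeline) {
  const perDay = await Promise.all(
    dayCollectionNames.map(name => db.collection(name).aggregate(pipeline).toArray())
  );
  // Each collection was aggregated independently, so the combined result
  // has to be flattened and re-sorted in application code
  return perDay.flat().sort((a, b) => a.timestamp - b.timestamp);
}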
In my case, there are 10 fields and all of them need to be searched with "or"; that is why I'm running multiple queries and filtering the common items on the client side using Promise.all().
The problem is that I would like to implement pagination. I don't want to fetch all the results of each query, which has too high a "read" cost, but I can't use .limit() on each query because what I want to limit is the final result.
For example, I would like to get the first 50 common results across the 10 queries' results; if I apply limit(50) to each query, the final result might contain fewer than 50.
Does anyone have ideas about pagination for multiple queries?
I believe that the best way for you to achieve that is using query cursors, so you can better manage the data that you retrieve from your searches.
I would recommend you to take a look at the below links, to find out more information - including a question answered by the community, that seems similar to your case.
Paginate data with query cursors
Multi query and pagination with Firestore
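As a minimal sketch (not your exact data model), cursor-based pagination with the Firestore web SDK looks roughly like this; the collection and field names are placeholders:

const pageSize = 50;

async function firstPage(db) {
  const snap = await db.collection("items")
    .orderBy("createdAt")
    .limit(pageSize)
    .get();
  const lastDoc = snap.docs[snap.docs.length - 1]; // cursor for the next page
  return { docs: snap.docs, lastDoc };
}

async function nextPage(db, lastDoc) {
  const snap = await db.collection("items")
    .orderBy("createdAt")
    .startAfter(lastDoc)  // resume right after the previous page's last document
    .limit(pageSize)
    .get();
  return snap.docs;
}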
Let me know if the information helped you!
Not sure if it's relevant, but I think I'm having a similar problem and have come up with 4 approaches that might serve as a workaround.
1. Instead of making 10 queries, fetch all the products matching a single selection filter, e.g. category (in my case a customer can only set a single category field), and do all the filtering on the client side. With this approach the app still reads lots of documents at once, but it can at least reuse them during the session and filter with more flexibility than Firestore's strict rules allow.
2. Run the multiple queries in a server environment, such as Cloud Functions with Node.js, and return only the first 50 documents that match all the filters. With this approach the client only receives the wanted data, but the server still reads a lot.
3. This is basically your approach combined with the accepted answer.
4. Create automatically maintained index documents in Firebase with the help of Cloud Functions, e.g. Colors: {red: [product1ID, product2ID, ...], ...}, storing only the document IDs. Depending on the filters, fetch the corresponding index documents on the server side with Cloud Functions, build the intersection of the matching arrays (AND logic), and push the first 50 elements of it to the client. Knowing which products to display, the client then fetches them with the client-side library (see the sketch below).
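A rough sketch of the intersection step in approach 4, assuming each filter document holds an array of product IDs as in the Colors example above:

// AND logic: keep only the IDs that appear in every selected filter array,
// then hand at most the first `limit` IDs to the client
function intersectFilters(idArrays, limit = 50) {
  const [first, ...rest] = idArrays;
  const sets = rest.map(arr => new Set(arr));
  return first.filter(id => sets.every(set => set.has(id))).slice(0, limit);
}

// e.g. intersectFilters([colors.red, sizes.large, brands.acme])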
Hope these help. Here is my original post: Firestore multiple `in` queries with `and` logic, query structure.
I'm using MongoDB and need some analytics queries to produce reports on my data. I know that MongoDB is not a good choice for OLAP applications, but if I want to use MongoDB, one solution could be pre-computing the required data. For instance, we can create a new collection for any specific OLAP query and just update that collection when some related event happens in the system. Consider this scenario:
In my app I'm storing sales information for some vendors in a sales collection. Each document in sales consists of a sale value, a vendor ID and a date. I want a report that, within a specified time period, finds the vendors that have sold the most. To avoid heavy aggregations I've created a middle collection that stores the total amount of sales for each vendor on each day. When I want to prepare the report, I just find the documents in the middle collection whose dates fall in the specified time period, group the results by vendor ID, and then sort them. I think this solution has less aggregation time because the documents in the middle collection are fewer than in the original collection, and it would be of O(n) time complexity.
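To make the idea concrete, the report over the middle collection would look roughly like this (the collection name "dailySales" and the field names are just examples):

// dailySales documents are assumed to look like { vendorId, date, total },
// updated whenever a sale is recorded
db.dailySales.aggregate([
  { $match: { date: { $gte: ISODate("2024-01-01"), $lt: ISODate("2024-02-01") } } },
  { $group: { _id: "$vendorId", total: { $sum: "$total" } } },
  { $sort: { total: -1 } },
  { $limit: 10 }  // top ten vendors in the period
])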
I want to know: is there any mechanism in MongoDB that makes it possible to avoid this aggregation too and make the query simpler?
I am looking into using Mongo to store my data. I want to store one document per change to a record. For example, a record represents a calendar event; each time this event is updated (via a web form), I want to store the new version in a new document. This will allow the historical details of an event to be retrieved upon request.
If I were to store this data using a relational database, I would have an 'events' table and an 'events_history' table:
'events' table:
event_id
last_event_history_id
'events_history' table:
event_history_id
event_id
event_date
So, when I want to retrieve a list of events (showing the latest history for each event), I would do this:
SELECT * FROM events_history eh, events e
WHERE
eh.event_history_id = e.last_event_history_id
However, I am unsure how to approach storing the data and generating this list using Mongo.
Joe,
Your question is a common one for folks coming from an RDBMS background to MongoDB (which is, by the way, exactly how I personally came to MongoDB).
I can relate to your question.
If I were to restate your question in a generic way, I would say:
How to model one-to-many relationships in MongoDB?
There are basically two approaches:
Embedded Documents
You can have a "events" collection. The documents in this collection can contain a "key" called "Event_history where each entry is an "old version" of the event itself.
The discussion on embedded documents for MongoDB is here.
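As a rough illustration only (the field names are made up), an embedded-history event document and the query to list events with their history could look like this:

// One "events" document per event, with its older versions kept inline
db.events.insertOne({
  title: "Team meeting",
  event_date: ISODate("2024-06-01T10:00:00Z"),
  event_history: [
    { event_date: ISODate("2024-05-01T09:00:00Z"), title: "Team meeting (draft)" },
    { event_date: ISODate("2024-05-20T09:00:00Z"), title: "Team meeting" }
  ]
})

// Listing events together with their full history is then a single query
db.events.find({}, { title: 1, event_date: 1, event_history: 1 })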
Document References
This is very similar to what you do in relational databases. You can have two collections, each with its own documents: one collection for "active" events and one collection for historical events.
The discussion for Document references in MongoDB is here.
Now back to your question: which of these two approaches is better?
There are a couple of factors to consider:
1 - MongoDB does not currently have database-side joins. If your workload is primarily reads and your documents/events do not change frequently, the approach with embedded documents will be easier and have better performance.
2 - Avoid growing documents. If your events change frequently, causing MongoDB documents to grow, then you should opt for design #2 with the references. "Document growth" at scale is usually not the best performance option with MongoDB. An in-depth discussion of why document growth should be avoided is here.
Without knowing details about your app, I am inclined to "guess" document references would be better for an event management system where history is an important feature. Have 2 separate collections, and perform the join inside of your app.
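As a sketch of the reference approach in the mongo shell, using the collections and fields from your question (the title field and the sample update are assumptions), the application-side join mirrors your SQL:

// Latest version of each event: fetch events, then resolve the referenced history document
var events = db.events.find().toArray();
var latest = events.map(function (e) {
  return db.events_history.findOne({ _id: e.last_event_history_id });
});

// Recording a new version: insert into events_history, then move the pointer
var someEventId = db.events.findOne()._id;
var historyId = db.events_history.insertOne({
  event_id: someEventId,
  event_date: new Date(),
  title: "Updated title"
}).insertedId;
db.events.updateOne({ _id: someEventId }, { $set: { last_event_history_id: historyId } });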
I am new to MongoDB and I have difficulties implementing a solution in it.
Consider a case where I have two collections, client and sales, with the following designs:
Client
==========
id
full name
mobile
gender
region
emp_status
occupation
religion
Sales
===========
id
client_id //this would be a DBRef
trans_date //date time value
products //an array of collections of product sold in the form {product_code, description, units, unit price, amount}
total sales
Now there is a requirement to develop another collection for analytical queries where the following questions can be answered
What is the distribution of sales by gender, region and emp_status?
What are the most purchased products for clients in a particular region?
I considered implementing a very denormalized collection: a flat and wide collection of the properties of the sales and client collections, so that I can use map-reduce to answer the questions.
In an RDBMS, an aggregation backed by a join would answer these questions, but I am at a loss as to how to make MapReduce or aggregation help out.
Questions:
How do I implement Map-Reduce to map across 2 collections?
Is it possible to chain MapReduce operations?
Regards.
MongoDB does not do JOINs - period!
MapReduce always runs on a single collection. You cannot have a single MapReduce job which selects from more than one collection. The same applies to aggregation.
When you want to do some data mining (not MongoDB's strongest suit), you could create a denormalized collection of all Sales with the corresponding Client object embedded. You will have to write a little program or script (a sketch follows this list) which iterates over all clients and:
finds all Sales documents for the client
merges the relevant fields from Client into each document
inserts the resulting document into the new collection
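A rough sketch of such a script in the mongo shell; the collection names ("clients", "sales", "salesDenormalized"), the copied fields, and the assumption that client_id stores the client's _id (rather than a full DBRef) are all illustrative:

db.clients.find().forEach(function (client) {
  db.sales.find({ client_id: client._id }).forEach(function (sale) {
    // Merge the relevant client fields into the sale document
    sale.client = {
      gender: client.gender,
      region: client.region,
      emp_status: client.emp_status
    };
    delete sale._id;  // let MongoDB assign a new _id in the target collection
    db.salesDenormalized.insertOne(sale);
  });
});

// The analytical questions then become single-collection aggregations,
// e.g. the distribution of sales by region:
db.salesDenormalized.aggregate([
  { $group: { _id: "$client.region", totalSales: { $sum: "$total_sales" } } }
])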
When your Client document is small and doesn't change often, you might consider always embedding it into each Sales document. This means you will have redundant data, which looks very evil from the viewpoint of a seasoned RDB veteran. But remember that MongoDB is not a relational database, so you should not apply all RDBMS dogmas without reflection. The "no redundancy" rule of database normalization is only practicable when JOINs are relatively inexpensive and painless, which isn't the case with MongoDB.
Besides, sometimes you might want redundancy to preserve historical data. When you want to know the historical development of sales by region, you want to know the region where the customer resided when they bought the product, not where they reside now. When each Sale only references the current Client document, that information is lost. Sure, you could solve this with separate Address documents that have date ranges, but that would make it even more complicated.
Another option would be to embed an array of Sales in each Client. However, MongoDB doesn't like documents which grow over time, so when your clients tend to return often, this might result in sub-par write-performance.