I have to generate a dynamic report.
I have a complex query that returns its result via db.collection.find() and contains millions of records.
Now I want to perform an aggregate operation on this result.
I tried inserting the result into a collection and then running the aggregate on that collection, like this:
db.users.find().forEach( function(myDoc) { db.usersDummy.insert(myDoc); } );
But temporarily inserting the data and then running the aggregation on it does not seem feasible.
Does MongoDB support temporary tables, or is there a way to perform an aggregate operation directly on a find() result?
As suggested by JohnnyHK, I am now using $match in the aggregation to filter the huge collection directly, instead of creating a dummy collection.
So, closing this question.
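The fix the asker settled on can be sketched as a single pipeline that runs $match first (so indexes can be used) and aggregates in the same call; the field names (status, region, amount) are assumptions for illustration, not from the question:

```javascript
// Hypothetical field names — adapt to your schema.
var pipeline = [
  { $match: { status: "active" } },                          // filter first; can use indexes
  { $group: { _id: "$region", total: { $sum: "$amount" } } } // aggregate the filtered docs
];
// In the mongo shell: db.users.aggregate(pipeline);
```

This avoids materializing a usersDummy collection entirely: the $match stage plays the role the find() filter played before.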
I have a reporting app that generates MongoDB commands, and it involves running three aggregate calls. Each aggregate call has [$match, $group, $project] stages in its pipeline.
RESULT OF AGGREGATE 1-3
{_id: <XXX>, ...}
The grouping _id for these calls is the same, but because their $match stages are different they cannot be in the same aggregate call. I need to join all of these aggregation results. I know that one way to solve this is to use conditions in the $group stage, but the problem is that those conditions are really complicated to mix into the already complex $group stage.
To give some context on why that solution is very difficult, if not impossible: the data is quite large, each doc has 700 attributes, and the docs are coming in at around 1k per day. Generating such a complicated condition for EACH field in the $group stage would make a mess.
I have seen answers that use map-reduce to combine these aggregation results, but I am looking for other solutions. As I have researched, aggregate has an $out stage. Is there any way I can use that $out stage to join these aggregation results? (The reason for thinking of $out is that I have to save ALL the results anyway as a report.)
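One way to picture the $out idea: each of the three aggregate calls keeps its own $match but ends in its own $out, so every result is persisted and the saved collections can be merged afterwards. The stage contents below (type, key, amount, report_a) are assumptions, not from the question:

```javascript
// Sketch of the first of the three calls; the other two would differ only in
// their $match and their $out target collection.
var pipeline1 = [
  { $match: { type: "A" } },                            // differs per call
  { $group: { _id: "$key", v1: { $sum: "$amount" } } }, // same grouping _id in all calls
  { $project: { v1: 1 } },
  { $out: "report_a" }                                  // persist this call's result
];
// In the mongo shell: db.data.aggregate(pipeline1);
```

Note that in these MongoDB versions $out replaces its target collection rather than merging into it, which is why the merge step has to read the $out collections back and combine them, e.g. with bulk upserts.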
If you indeed want to go ahead with merging the aggregation results, you can build an output collection using bulk upserts. For performance, create a compound index on this output collection over your grouping attributes.
var bulk = db.report.initializeUnorderedBulkOp(); // "report": the output collection
dataArray.forEach(function(data) {
  data.forEach(function(row) {
    var setOnInsert = { grouping_attrs: row.grouping_values, v1: row.v1 };
    var set = { v2: row.v2 };
    var query = { grouping_attrs: row.grouping_values };
    bulk.find(query).upsert().update({ $setOnInsert: setOnInsert, $set: set });
  });
});
bulk.execute();
Here, your dataArray is built by running find() on the $out collections.
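The compound index mentioned above could look like this, keyed on the grouping attributes (attr1/attr2 and the collection name report are placeholders):

```javascript
// Placeholder attribute names — use your actual grouping attributes.
var indexSpec = { attr1: 1, attr2: 1 };
// In the mongo shell: db.report.createIndex(indexSpec);
```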
I need to perform some aggregation on an existing collection and then use the aggregated collection to run a map-reduce.
The aggregated collection is a sort of temporary collection, used only so it can feed the map-reduce. The record count in this temporary collection reaches around 8M.
What is a way to avoid the temporary collection?
One way could be to write a find() query inside the map() function and emit the aggregated result (initially stored in the aggregation collection).
However, I am not able to implement this.
Is there a way? Please help.
You can use the "query" parameter of MongoDB's MapReduce. With this parameter, the data sent to the map function is filtered before processing.
More info in the MapReduce documentation.
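To illustrate what the "query" option does, here is a plain-JavaScript simulation (the sample data and field names are invented): only documents matching the filter ever reach map(), so no temporary collection is needed.

```javascript
// Invented sample data.
var docs = [
  { status: "complete", category: "a", amount: 10 },
  { status: "pending",  category: "a", amount: 5 },
  { status: "complete", category: "b", amount: 7 }
];

// The "query" option: filter documents BEFORE they reach map().
var filtered = docs.filter(function(d) { return d.status === "complete"; });

// map(): emit(this.category, this.amount)
var grouped = {};
filtered.forEach(function(d) {
  (grouped[d.category] = grouped[d.category] || []).push(d.amount);
});

// reduce(): sum the emitted values per key
var result = {};
Object.keys(grouped).forEach(function(k) {
  result[k] = grouped[k].reduce(function(a, b) { return a + b; }, 0);
});
// result → { a: 10, b: 7 }
```

In the real call this corresponds to db.collection.mapReduce(map, reduce, { query: { status: "complete" }, out: ... }).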
What is the difference between the $match operator used inside the aggregate function and a regular find in MongoDB?
Why doesn't the find function allow renaming the field names like the aggregate function?
e.g. in aggregate we can pass the following $project stage:
{ "$project" : { "OrderNumber" : "$PurchaseOrder.OrderNumber" , "ShipDate" : "$PurchaseOrder.ShipDate"}}
Whereas find does not allow this.
Why doesn't the aggregate output return a DBCursor or a List? Also, why can't we get a count of the documents that are returned?
Thank you.
Why doesn't the aggregate output return a DBCursor or a List?
The aggregation framework was created to solve problems that would otherwise require map-reduce.
This framework is commonly used to compute data that takes the full collection as input and produces a few documents as output.
What is the difference between the $match operator used inside the aggregate function and a regular find in MongoDB?
One of the differences, as you stated, is the return type: find operations return a DBCursor.
Other differences:
The aggregation result must be under 16 MB. If you are using shards, the full data must be collected in a single place after the first $group or $sort.
$match's main purpose is to filter the documents entering the pipeline, but it has other benefits too, such as improving aggregation performance when placed early, where it can take advantage of indexes.
and also why can't we get a count of the documents that are returned?
You can. Just count the number of elements in the resulting array, or add the following stage to the end of the pipeline:
{$group: {_id: null, count: {$sum: 1}}}
Why doesn't the find function allow renaming the field names like the aggregate function?
MongoDB is young and features are still coming. Maybe in a future version we'll be able to do that. Renaming fields is more critical in aggregation than in find.
EDIT (2014/02/26):
MongoDB 2.6 aggregation operations will return a cursor.
EDIT (2014/04/09):
MongoDB 2.6 was released with the predicted aggregation changes.
I investigated a few things about the aggregate and find calls:
I ran a descending sort on a collection of 160k documents and limited the output to a few documents.
1. The aggregate command is slower than the find command.
2. If you actually access the data, e.g. via ToList(), the aggregate command is faster than find.
3. If you look at the total time (point 1 + point 2), the two commands seem to be equal.
Maybe aggregate calls ToList() automatically and does not have to call it again; if you don't call ToList() afterwards, the find() call is much faster:
7 ms vs. 50 ms (5 documents)
I have a large MongoDB database consisting of millions of records. I want to retrieve all values of a variable that has an index associated with it. Is there a method to retrieve all values of this indexed variable that is faster than iterating over all records?
When you run a query, MongoDB uses its own query planner, which chooses the optimal indexes for the query.
To see which index the planner will use, append .explain() to the query.
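A minimal sketch (collection and field names are assumptions): run the query through explain() and check that the winning plan scans the index rather than the whole collection.

```javascript
// Hypothetical query on an indexed field.
var query = { userId: "u123" };
// In the mongo shell:
//   db.records.find(query).explain("executionStats");
// The index is being used when the winning plan contains an IXSCAN stage
// instead of a COLLSCAN.
```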
I need to apply a set of filters (queries) to a collection. By default, MongoDB applies an AND operator to all queries submitted to the find function. Instead of one big AND, I need to apply each query sequentially (one by one): run the first query and get a set of documents, run the second query on the result of the first, and so on.
Is this possible?
db.list.find({..q1..}).find({..q2..}).find({..q3..});
Instead of:
db.list.find({..q1..}, {..q2..}, {..q3..});
Why do I need this?
Because the second query needs to apply an aggregate function to the result of the first query, instead of applying the aggregate to the whole collection.
Yes, this is possible in MongoDB; you can write nested queries as the requirement demands. I have created nested MongoDB queries in my own application as well. If you are familiar with SQL syntax, compare this with SQL's IN subquery:
select cname from table where cid in (select .....)
In the same way, you can create nested MongoDB queries across different collections.
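MongoDB does not run the inner query server-side inside the outer one; the usual pattern is to run the inner query first and feed its result into the outer query with $in. A plain-JavaScript simulation with invented data (in the shell this would be a distinct()/find() pair):

```javascript
// Invented sample "collections".
var orders = [
  { customerId: 1, total: 150 },
  { customerId: 2, total: 50 },
  { customerId: 3, total: 200 }
];
var customers = [
  { _id: 1, name: "Ann" },
  { _id: 2, name: "Bob" },
  { _id: 3, name: "Cid" }
];

// Inner query: ids of customers with an order over 100
// (shell: db.orders.distinct("customerId", { total: { $gt: 100 } }))
var ids = orders.filter(function(o) { return o.total > 100; })
                .map(function(o) { return o.customerId; });

// Outer query: the equivalent of db.customers.find({ _id: { $in: ids } })
var result = customers.filter(function(c) { return ids.indexOf(c._id) >= 0; });
// result → [{ _id: 1, name: "Ann" }, { _id: 3, name: "Cid" }]
```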