I need to perform some aggregation on one existing table and then use aggregated table to perform the map reduce.
The aggregation table is sort of a temporary used so that it can be used in map reduce. Record set in temporary table reaches around 8M.
What can be the way to avoid the temporary table?
One way could be to write find() query inside map() function and emit the aggregated result(initially being stored on aggregation table).
However, I am not able to implement this.
Is there a way! Please help.
You can use the "query" parameter on MongoDB MapReduce. With this parameter the data sent to map function is filtered before processing.
More info on MapReduce documentation
Related
I have to generate a dynamic report.
I have a complex query which returns result using db.collection.find() and has millions of records in it.
Now I want to perform aggregate operation on this result.
I tried inserting into a collection and than executing aggregate function on it using below:
db.users.find().forEach( function(myDoc) { db.usersDummy.insert(myDoc); } );
But this does not seems to be feasible to temporarily insert data and then perform aggregate operation on it.
Is there any way mongoDB supports Temporary tables or perform aggregate operation directly on find result?
As suggested by JohnnyHK i am using $match in aggregation to run the huge collection instead of creating a dummy collection
So closing this question
I am using Hadoop to apply map reduce in my MongoDB database.
I can able to execute the sample in this link.
Right now I can able to get only key, value pair in output collection after map reduce job was executed. I wonder if it is possible to save multiple columns in a map reduce output collection?
or embedded document in value column?
thanks.
Yes - use BSONWritable as your reducer output class, and create a BSONWritable object with as many columns as you need.
See example here:
https://github.com/mongodb/mongo-hadoop/blob/master/examples/treasury_yield/src/main/java/com/mongodb/hadoop/examples/treasury/TreasuryYieldReducer.java
I have a mongodb query, and I want to add a computed field. The computed field is based on where or not the item is in the results of another query. So my query returns the columns a,b,c,d, and then column e should be based on whether or not the current row would be matched by another query.
Is there an efficient way to do this in mongo? I'm not really sure how to do this one...
There is no way currently to execute a function as you describe within the database when returning a document via standard functions such as find. It's been requested by the community, but the general request is to operate only on a single document.
There are calculated fields using $project in the aggregation framework. But, they only operate on the current document in the pipeline. So, they can't summarize other queries.
You'll need to likely build your e value as part of your data access layer.
I have a large MongoDB database consisting of millions of records. I want to retrieve all values for a variable which has an index associated with it. Is there a method to retrieve all values for this indexed variable which is faster than iterating over all records?
When you writing query in mongo it use own query analyser which will use optimal indexes for the query.
For considering which index will use by the analyser add .explain() at the end of the query.
I need to apply a set of filters (queries) to a collection. By default, the MongoDB applies AND operator to all queries submitted to find function. Instead of whole AND I need to apply each query sequentially (one by one). That is, I need to run the first-query and get a set of documents, run the second-query to result of first-query, and so on.
Is this Possible?
db.list.find({..q1..}).find({..q2..}).find({..q3..});
Instead Of:
db.list.find({..q1..}, {..q2..}, {..q3..});
Why do I need this?
Bcoz, the second-query needs to apply an aggregate function to result of first-query, instead of applying the aggregate to whole collection.
Yes this is possible in MongoDB. You can write nested queries as per the requirement.Even in my application I created nested MongoDb queries.If you are familiar with SQL syntax then compare this with in of sql syntax:
select cname from table where cid in (select .....)
In the same way you can create nested MongoDB queries on different collections also.