How to execute map reduce on a map reduce result in MongoDB - mongodb

I want to know whether I can execute a map reduce on the result of a previous map reduce function, like a pipeline, without writing it to a collection. Thanks all. My English is bad, I hope you understand my question :(

Chaining of map reduce is not supported at this time without storing intermediary data in some kind of collection.
Also, map reduce in MongoDB is not very efficient, and MongoDB recommends exporting the data and running map reduce in a dedicated framework such as Hadoop if you have to.
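To make the "intermediate collection" idea concrete, here is a minimal in-memory sketch in Python. It is not MongoDB code: the `map_reduce` helper, the `orders` data and the field names are all made up for illustration, and plain lists stand in for collections. It only shows that the second pass reads the stored output of the first.

```python
from collections import defaultdict

def map_reduce(docs, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn and reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Mirror the {_id, value} shape of MongoDB map-reduce output documents.
    return [{"_id": key, "value": reduce_fn(key, values)}
            for key, values in groups.items()]

# Hypothetical input: one document per order.
orders = [
    {"customer": "a", "amount": 10},
    {"customer": "a", "amount": 5},
    {"customer": "b", "amount": 7},
    {"customer": "c", "amount": 15},
]

# Pass 1: total amount per customer. This list plays the role of the
# intermediate collection that the first map-reduce writes to.
totals = map_reduce(
    orders,
    lambda doc: [(doc["customer"], doc["amount"])],
    lambda key, values: sum(values),
)

# Pass 2: runs on the stored output of pass 1 and counts how many
# customers share each total.
histogram = map_reduce(
    totals,
    lambda doc: [(doc["value"], 1)],
    lambda key, values: sum(values),
)
```

In MongoDB itself the `totals` step would be written to an output collection, and the second map-reduce would be run against that collection.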

Yes, but this could cost you a lot of performance: you have to store the first result in a new collection, then run the next map-reduce on that output collection. See this for more information.
However, you can still pipe query results through the aggregation pipeline, see this. So consider converting your map-reduce to an aggregation.
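As a rough idea of what "converting to aggregation" could look like, here is a sketch of a pipeline in which a second `$group` stage consumes the output of the first, with no intermediate collection. The collection name `orders` and the fields `customer` and `amount` are assumptions for the example, not something from the question. The pipeline is only built as a Python list here; running it needs PyMongo and a server.

```python
def totals_histogram_pipeline():
    """Return a pipeline whose second $group reads the first $group's output."""
    return [
        # Stage 1: total amount per customer (what a first map-reduce might compute).
        {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
        # Stage 2: count how many customers share each total.
        {"$group": {"_id": "$total", "customers": {"$sum": 1}}},
        # Stage 3: order the result by total.
        {"$sort": {"_id": 1}},
    ]

pipeline = totals_histogram_pipeline()

# With PyMongo and a running server, this would be passed as:
#   results = list(db.orders.aggregate(pipeline))
```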

Related

Saving common aggregations in MongoDB

Say I have a Mongo aggregation I know that I will use frequently, for example, finding the average of a dataset.
Essentially, I want someone to make an API for the database such that someone could type db.collection.average() in the mongo shell and get the result of that function, so that someone without much knowledge of the aggregation framework would easily be able to get the average (or result of any complicated aggregation function I create). What is the best way to achieve this?
As of MongoDB 3.4, you can create views that wrap a defined aggregation pipeline. Sounds like a perfect fit for your use case.
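A sketch of how such a view might be defined, written as the `create` database command built in Python. The names `readings`, `readings_average` and `value` are hypothetical. The function only builds the command document, so it can be inspected without a server.

```python
def average_view_command(view_name, source, field):
    """Build a `create` command for a view that averages `field` over `source`."""
    return {
        "create": view_name,   # name of the view to create
        "viewOn": source,      # collection the view reads from
        "pipeline": [
            # _id: None puts every document into a single group.
            {"$group": {"_id": None, "average": {"$avg": "$" + field}}},
        ],
    }

command = average_view_command("readings_average", "readings", "value")

# With PyMongo and a running server (3.4 or later):
#   db.command(command)
#   db.readings_average.find_one()
```

Someone without aggregation knowledge could then query the view like an ordinary read-only collection.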

MongoDB aggregation vs. simple query performance?

I am re-asking this question because I thought it should be in a separate thread from this one: in-mongodb-know-index-of-array-element-matched-with-in-operator.
I am using MongoDB and have been writing all of my queries as simple queries (find, update, etc.) with no aggregations. Then I read many SO posts, for example mongodb-aggregation-match-vs-find-speed. I wondered why I should spend computation time on my server, since the more I compute there, the higher its load becomes, so I tried aggregations and thought I was heading in the right direction. But on my previous question, andreas-limoli told me not to use aggregations because they are slow, and to use simple queries and compute on the server instead. Now I am in a dilemma about which to use. I have worked with MongoDB for a year, but I know nothing about its performance as data size grows, so I don't know which one to pick.
One more thing I could not find anywhere: if aggregation is slower, is that because of $lookup or not? $lookup is the main reason I considered aggregation, because otherwise I have to execute many queries serially and then compute on the server, which looks very poor compared to aggregation.
I also read about the 100MB restriction in MongoDB aggregation when passing data from one pipeline stage to the next. How do people handle that case efficiently? And if they turn on disk usage, which slows everything down, how do they handle that?
I also took a sample collection of 30,000 documents and ran an aggregation with $match and an equivalent find query. The aggregation was a little faster: it took 180 ms, whereas find took 220 ms.
Please help me out, it would be really helpful.
Aggregation pipelines are costly queries. They can hurt performance as your data grows, because of the CPU and memory they use. If you can achieve the same result with a find query, go for it, because aggregation gets costlier as the data grows.
The aggregation framework in MongoDB is similar to join operations in SQL. Aggregation pipelines are generally resource-intensive operations, so if simple queries satisfy your needs, use those in the first place.
However, if it is absolutely necessary, for example when you need to fetch data from multiple collections, you can use aggregation pipelines.
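For the multiple-collections case, a pipeline along these lines is one possible shape. The collections `orders` and `customers` and the fields `status` and `customer_id` are made up for the example. Putting `$match` first is a common way to keep the amount of data flowing into `$lookup` small. The pipeline is only constructed here, not executed.

```python
def orders_with_customer_pipeline(status):
    """Build a pipeline that filters orders, then joins each one to its customer."""
    return [
        # Filter first so that later stages see fewer documents.
        {"$match": {"status": status}},
        # Join against the (hypothetical) customers collection.
        {"$lookup": {
            "from": "customers",
            "localField": "customer_id",
            "foreignField": "_id",
            "as": "customer",
        }},
        # $lookup yields an array; unwind it to one customer per order.
        {"$unwind": "$customer"},
    ]

pipeline = orders_with_customer_pipeline("shipped")

# With PyMongo and a running server; allowDiskUse lets stages that exceed
# the memory limit spill to disk, at some cost in speed:
#   db.orders.aggregate(pipeline, allowDiskUse=True)
```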

Why aggregate+sort is faster than find+sort in mongo?

I'm using mongoose in my project. When the number of documents in my collection becomes bigger, the method of find+sort becomes slower. So I use aggregate+$sort instead. I just wonder why?
Without seeing your data and your query it is difficult to answer why aggregate+sort is faster than find+sort.
But below are some things that hold for find and aggregate:
Well-indexed data (indexes that suit your query) will always yield faster results for your find query.
The execution time of an aggregate query grows with the number of pipeline operations you use.
With an aggregation pipeline you can create new fields such as sum, avg and so on, which is not possible in a find.
See this thread for more info:
MongoDB {aggregation $match} vs {find} speed

Run map reduce functions as a job in mongodb

Is it possible to run map reduce functions as a job in MongoDB?
If I update the collection with some data, the map reduce functions should run automatically as a job and produce the result in the output collection with the latest data.
Can we achieve this in MongoDB?
No. You would need to schedule these outside MongoDB.
What you are asking for sounds like it may be better suited to being a View within Couchbase.
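Scheduling outside MongoDB can be as simple as a small loop in the application, or a cron entry that starts a script. The sketch below assumes a hypothetical `refresh_summary` function that stands in for re-running the map reduce with a driver such as PyMongo. The short interval and fixed number of runs are only there to keep the example quick.

```python
import time

def run_periodically(job, interval_seconds, runs):
    """Call job() `runs` times, sleeping interval_seconds between calls."""
    results = []
    for i in range(runs):
        results.append(job())
        if i < runs - 1:
            time.sleep(interval_seconds)
    return results

def refresh_summary():
    # Placeholder: a real job would re-run the map reduce (or an
    # aggregation) here and write to the output collection.
    return "refreshed"

outcomes = run_periodically(refresh_summary, interval_seconds=0.01, runs=3)
```

A real deployment would more likely use cron or a job scheduler than a sleeping loop, but the division of labour is the same: the trigger lives outside the database.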

When do I need map reduce for database queries?

In CouchDB you always have to use map reduce to query results.
In MongoDB you can use their query methods for retrieving data, but they also let you do map-reduce.
I wonder, when do I actually need map-reduce?
Are those query methods different from map-reduce or are they just wrappers for map-reduce functions?
MapReduce is needed for aggregations in MongoDB. The normal queries follow a very different (and much faster) code path and they should always be used for real-time operations. MapReduce is definitely not intended for real-time, it's more for batch jobs.
Technically, you could write all your queries using MapReduce, but that would be both painful and slow.
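To illustrate the "painful" part, here is the same simple filter written twice in plain Python: once in query style and once in map-reduce style. This is an in-memory analogy with made-up data, not MongoDB code, and it says nothing about speed.

```python
users = [
    {"name": "ann", "age": 31},
    {"name": "bob", "age": 17},
    {"name": "cy", "age": 45},
]

# Query style: state the condition directly
# (analogous to find({"age": {"$gte": 18}})).
adults_by_query = [u["name"] for u in users if u["age"] >= 18]

# Map-reduce style: emit matching names under a single key...
def map_fn(user):
    if user["age"] >= 18:
        yield ("adults", [user["name"]])

# ...then merge the emitted lists for that key into one list.
def reduce_fn(key, values):
    merged = []
    for names in values:
        merged.extend(names)
    return merged

emitted = {}
for user in users:
    for key, value in map_fn(user):
        emitted.setdefault(key, []).append(value)

adults_by_map_reduce = {key: reduce_fn(key, values)
                        for key, values in emitted.items()}
```

Both variants select the same users, but the map-reduce version needs two functions and a grouping step to express a one-line condition.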