First, the background. I used to have a collection of logs and used map/reduce to generate various reports. Most of these reports were based on data from within a single day, so I always had a condition d: SOME_DATE. When the logs collection grew extremely big, inserting became extremely slow (slower than the app we were monitoring was generating logs), even after dropping lots of indexes. So we decided to keep each day's data in a separate collection - logs_YYYY-mm-dd - that way the indexes are smaller, and we don't even need an index on the date. This is great, since most reports (and thus most map/reduces) run on daily data. However, we have one report that needs to cover multiple days.
And now the question. Is there a way to run a map/reduce (or more precisely, the map) over multiple collections as if it were only one?
A reduce function may be called once, with a key and all corresponding values (but only if there are multiple values for the key - it won't be called at all if there's only 1 value for the key).
It may also be called multiple times, each time with a key and only a subset of the corresponding values, and the previous reduce results for that key. This scenario is called a re-reduce. In order to support re-reduces, your reduce function should be idempotent.
There are two key features of an idempotent reduce function:
The return value of the reduce function should be in the same format as the values it takes in. So, if your reduce function accepts an array of strings, the function should return a string. If it accepts objects with several properties, it should return an object containing those same properties. This ensures that the function doesn't break when it is called with the result of a previous reduce.
Don't make assumptions based on the number of values it takes in. It isn't guaranteed that the values parameter contains all the values for the given key. So using values.length in calculations is very risky and should be avoided.
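Both rules can be illustrated with a plain-JavaScript sketch (the { count: N } value shape and the sample data are made up for illustration):

```javascript
// A reduce function that is safe to re-reduce: it accepts objects of the
// form { count: N } and returns an object of the same form, and it never
// relies on values.length.
function reduceCounts(key, values) {
  var total = 0;
  values.forEach(function (v) {
    total += v.count; // NOT total += 1 -- some values may already be reduced
  });
  return { count: total };
}

// Simulating a re-reduce: reducing in two batches, then reducing the
// partial results, must give the same answer as reducing everything once.
var all = [{ count: 1 }, { count: 1 }, { count: 1 }, { count: 1 }];
var once = reduceCounts("k", all);
var twice = reduceCounts("k", [
  reduceCounts("k", all.slice(0, 2)),
  reduceCounts("k", all.slice(2))
]);
console.log(once.count, twice.count); // 4 4
```

Because the return value has the same shape as each input value, feeding previous reduce results back into the function changes nothing.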
Update: The two steps below aren't required (or even possible, I haven't checked) on the more recent MongoDB releases. It can now handle these steps for you, if you specify an output collection in the map-reduce options:
{ out: { reduce: "tempResult" } }
If your reduce function is idempotent, you shouldn't have any problems map-reducing multiple collections. Just re-reduce the results of each collection:
Step 1
Run the map-reduce on each required collection and save the results in a single, temporary collection. You can store the results using a finalize function:
finalize = function (key, value) {
    db.tempResult.save({ _id: key, value: value });
}
db.someCollection.mapReduce(map, reduce, { finalize: finalize })
db.anotherCollection.mapReduce(map, reduce, { finalize: finalize })
Step 2
Run another map-reduce on the temporary collection, using the same reduce function. The map function is a simple function that selects the keys and values from the temporary collection:
map = function () {
    emit(this._id, this.value);
}
db.tempResult.mapReduce(map, reduce)
This second map-reduce is basically a re-reduce and should give you the results you need.
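The two steps can be sketched in plain JavaScript (the word-count style map/reduce, the collection contents, and the helper mapReduceSim are all made up for illustration; each "collection" is just an array here):

```javascript
// Minimal in-memory simulation of mapReduce, mirroring MongoDB's rule that
// reduce is skipped when a key has only a single mapped value.
function mapReduceSim(docs, mapFn, reduceFn) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var out = [];
  for (var key in groups) {
    var vals = groups[key];
    out.push({ _id: key, value: vals.length === 1 ? vals[0] : reduceFn(key, vals) });
  }
  return out;
}

var map = function (doc, emit) { emit(doc.user, { count: 1 }); };
var reduce = function (key, values) {
  var total = 0;
  values.forEach(function (v) { total += v.count; });
  return { count: total };
};

// Step 1: map-reduce each daily collection into one temporary collection.
var day1 = [{ user: "a" }, { user: "a" }, { user: "b" }];
var day2 = [{ user: "a" }, { user: "b" }];
var tempResult = mapReduceSim(day1, map, reduce).concat(mapReduceSim(day2, map, reduce));

// Step 2: re-reduce the temporary results with the same reduce function.
var finalOut = mapReduceSim(tempResult,
  function (doc, emit) { emit(doc._id, doc.value); },
  reduce);
console.log(JSON.stringify(finalOut)); // a -> { count: 3 }, b -> { count: 2 }
```

The second pass works only because reduce's output shape matches its input shape - exactly the idempotency requirement described above.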
I used the map-reduce method; here is an example.
var mapemployee = function () {
    emit(this.jobid, { Name: this.Name, Designation: null });
};
var mapdesignation = function () {
    emit(this.jobid, { Name: null, Designation: this.Designation });
};
var reduceF = function (key, values) {
    var outs = { Name: null, Designation: null };
    values.forEach(function (v) {
        if (outs.Name == null) {
            outs.Name = v.Name;
        }
        if (outs.Designation == null) {
            outs.Designation = v.Designation;
        }
    });
    return outs;
};
result = db.employee.mapReduce(mapemployee, reduceF, { out: { reduce: 'output' } });
result = db.designation.mapReduce(mapdesignation, reduceF, { out: { reduce: 'output' } });
Reference: http://www.itgo.me/a/x3559868501286872152/mongodb-join-two-collections
So I'm new to MongoDB and map-reduce in general, and I came across this "quirk" (or at least what seems like a quirk to me).
Say I have objects in my collection like so:
{'key':5, 'value':5}
{'key':5, 'value':4}
{'key':5, 'value':1}
{'key':4, 'value':6}
{'key':4, 'value':4}
{'key':3, 'value':0}
My map function simply emits the key and the value
My reduce function simply adds the values AND before returning them adds 1 (I did this to check to see if the reduce function is even called)
My results follow:
{'_id': 3, 'value': 0}
{'_id':4, 'value': 11.0}
{'_id':5, 'value': 11.0}
As you can see, for the keys 4 & 5 I get the expected answer of 11 BUT for the key 3 (with only one entry in the collection with that key) I get the unexpected 0!
Is this natural behavior of mapreduce in general? For MongoDB? For pymongo (which I am using)?
The reduce function combines documents with the same key into one document. If the map function emits a single document for a particular key (as is the case with key 3), the reduce function will not be called.
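This behavior can be simulated in plain JavaScript with the data above (the grouping loop stands in for what MongoDB does internally):

```javascript
// Simulating MongoDB's behavior on the data above: reduce runs only for
// keys with more than one mapped value, so key 3 keeps its raw value.
var docs = [
  { key: 5, value: 5 }, { key: 5, value: 4 }, { key: 5, value: 1 },
  { key: 4, value: 6 }, { key: 4, value: 4 },
  { key: 3, value: 0 }
];
var groups = {};
docs.forEach(function (d) {
  (groups[d.key] = groups[d.key] || []).push(d.value);
});
var reduce = function (key, values) {
  var sum = 0;
  values.forEach(function (v) { sum += v; });
  return sum + 1; // the asker's "+1" probe, to reveal when reduce ran
};
var results = {};
for (var k in groups) {
  var vals = groups[k];
  results[k] = vals.length === 1 ? vals[0] : reduce(k, vals);
}
console.log(results); // { '3': 0, '4': 11, '5': 11 }
```

Keys 4 and 5 go through reduce (and pick up the +1), while key 3's single value is passed straight through - exactly the output the question reports.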
I realize this is an older question, but I came to it and felt like I still didn't understand why this behavior exists and how to build map/reduce functionality so it's a non-issue.
The reason MongoDB doesn't call the reduce function if there is a single instance of a key is because it isn't necessary (I hope this will make more sense in a moment). The following are requirements for reduce functions:
The reduce function must return an object whose type must be identical to the type of the value emitted by the map function.
The order of the elements in the values array should not affect the output of the reduce function.
The reduce function must be idempotent.
The first requirement is very important, and it seems to be commonly overlooked: I've seen a number of people doing their mapping in the reduce function and then handling the single-key case in the finalize function. That, however, is the wrong way to address the issue.
Think about it like this: if there's only a single instance of a key, a simple optimization is to skip the reducer entirely (there's nothing to reduce). Single-value keys are still included in the output, but the intent of the reducer is to build an aggregate result from the documents that share a key. If the mapper and reducer output the same type, you should be blissfully unaware of this optimization when looking at the object structure of your map/reduce output. You shouldn't have to use a finalize function to correct the structure of objects that didn't run through the reducer.
In short, do your mapping in your map function, and reduce the values that share a key into a single, aggregate result in your reduce function.
Solution:
Add a new field in the map function: single: 0.
In reduce, change this field to single: 1.
In finalize, check this field and take the required actions.
$map = new MongoCode("function() {
    var value = {
        time: this.time,
        email_id: this.email_id,
        single: 0
    };
    emit(this.email, value);
}");
$reduce = new MongoCode("function(k, vals) {
    // do whatever aggregation is needed here
    return {
        time: vals[0].time,
        email_id: vals[0].email_id,
        single: 1
    };
}");
$finalize = new MongoCode("function(key, reducedVal) {
    if (reducedVal.single == 0) {
        reducedVal.time = 11111;
    }
    return reducedVal;
}");
"MongoDB will not call the reduce function for a key that has only a single value. The values argument is an array whose elements are the value objects that are “mapped” to the key."
http://docs.mongodb.org/manual/reference/command/mapReduce/#mapreduce-reduce-cmd
Is this natural behavior of mapreduce in general?
Yes.
I tried to pass the collection to be updated as a scope variable - no dice.
I tried to invoke db.getCollection from the finalize body - no dice, I get this:
db assertion failure, assertion: 'invoke failed: JS Error: TypeError: db has no properties nofile_b:18', assertionCode: 9004
I guess it means that db is undefined within a finalize method. So, is it possible?
EDIT
Here is my finalize method:
function(key, value) {
    function flatten(value, collector) {
        var items = value;
        if (!(value instanceof Array)) {
            if (!value.items) {
                collector.push(value);
                return;
            }
            items = value.items;
        }
        for (var i = 0; i < items.length && collector.length < max_group_size; ++i) {
            flatten(items[i], collector);
        }
    }
    var collector = [];
    flatten(value, collector);
    return collector;
}
I would like to replace collector.push(value) with insert into some collection.
It is not possible to modify another collection from inside a Map/Reduce/Finalize function.
Here is a link to a question from a user with a similar question. The answer, unfortunately, is "no".
How to change the structure of MongoDB's map-reduce results?
Part of the reason for this is that map-reduce is designed to work in a sharded environment. The computations are distributed among the different shards, and the results are then aggregated. If each function running on each shard were allowed to modify collections, each shard could end up with different data.
If you would like a separate collection to be modified as a result of a Map Reduce operation, the best strategy is to run the Map Reduce operation, get the results, and then have your application update the separate collection.
If you would like the results of multiple Map Reduce operations to be merged, it is possible to do this via an incremental Map Reduce. The documentation on this may be found here: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-IncrementalMapreduce
In SQL it is possible to add fields that are not in the table model and to compute them during the query. Can MongoDB do the same, in the model and/or in the query?
For instance, is there "a way" to store a document in the example collection:
db.example.save(
{
    "name": "randomValue",
    "random": function() { return Math.random(); }
})
Where db.example.find() would result in an "evaluated" document like:
{"name":"randomValue", "random":0.9879878, "_id" : { "$oid" : "4ef1d1…" }}
(the function being evaluated would be replaced by its returned value)
If this is only feasible by specifying the function in the query, how would I do so?
The short answer is: no, it is not possible.
Such functionality belongs in the 'presentation' layer. When you display data in the UI, you usually prepare a UI model; during that step you can call any function and extend the model in the client language.
Or, if you need some extra value (one that should be evaluated by a function) in the document, you can call the function before saving the document. For example, a shell script that saves a random value produced by Math.random() in the random field:
random = Math.random();
db.example.save(
{
"name":"randomValue",
"random": random
})
Hope this is helpful.
There is no current way to dynamically evaluate or mix-in values to a recordset.
As you say, you can store functions, but you have to evaluate them by iterating through each document and calling doc.random = db.eval('doc.random'), which would replace the original function with a value - probably not what you want.
You can also store functions in db.system.js and call them to return a modified dataset - but running db.eval does have some drawbacks; map-reduce may be a better fit.
Check out: http://www.mongodb.org/display/DOCS/Server-side+Code+Execution
Take a look at MongoDB.Dynamic:
http://mongodbdynamic.codeplex.com/documentation
I am new to map/reduce and I'm trying to figure out a way to collect the following data using map/reduce instead of doing it in my (slow) application logic:
I have a collection 'projects' with a 1:n relation to a collection 'tasks'. Now I'd like to receive an array of results that gives me the project names, where the first is the project with the most tasks and the last is the project with the fewest tasks.
Or, even better, an array of hashes that also tells me how many tasks each project has (assuming the project names are unique):
[project_1: 23, project_2: 42, project_3: 82]
For map I tried something like:
map = function () {
    emit(this.project_id, { count: 1 });
}
and reduce:
reduce = function (key, values) {
    var sum = 0;
    values.forEach(function(doc){ sum += 1; });
    return { count: sum };
}
I fired this against my tasks collection:
var mr = db.tasks.mapReduce(map, reduce, { out: "results" });
But I get puzzling results when querying:
db[mr.result].find();
I am using Mongoid on Rails and am completely lost with it. Can someone point me into the right direction?
Thx in advance.
Felix
Looks generally right, but I spot at least one problem: The summation step in the reduce function should be
values.forEach(function(doc){ sum += doc.count ; });
because the function may be reducing values that are themselves the product of a prior reduction step, and that therefore have count values > 1.
That's a common oversight, mentioned here: http://www.mongodb.org/display/DOCS/Troubleshooting+MapReduce
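The difference can be demonstrated with a small simulation (the batch split mimics what a shard or a re-reduce would do; the data is made up):

```javascript
// Why `sum += 1` breaks under re-reduce: counting array entries only works
// if every value is a raw { count: 1 } emitted by map. Once MongoDB
// re-reduces, some values are prior reduce outputs with count > 1.
var buggyReduce = function (key, values) {
  var sum = 0;
  values.forEach(function (doc) { sum += 1; });
  return { count: sum };
};
var fixedReduce = function (key, values) {
  var sum = 0;
  values.forEach(function (doc) { sum += doc.count; });
  return { count: sum };
};

// Five tasks for one project, reduced in two batches (as a shard might do).
var emitted = [{count:1},{count:1},{count:1},{count:1},{count:1}];
var batches = [emitted.slice(0, 3), emitted.slice(3)];

function rereduce(reduceFn) {
  var partials = batches.map(function (b) { return reduceFn("p1", b); });
  return reduceFn("p1", partials);
}
console.log(rereduce(buggyReduce).count); // 2  (it just counted the two partials)
console.log(rereduce(fixedReduce).count); // 5  (the correct task count)
```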
Hope that helps!
I am using Mongoose with Node.js. I am using mapReduce to fetch data grouped by a field, so all it gives me as a collection is the key with the grouping field from every row of the database.
I need to fetch all the fields from the database, grouped by one field and sorted on the basis of another. E.g.: I have a database holding details of places and the fare for travelling to those places, plus a few other fields. Now I need to fetch the data so that it is grouped by place and sorted by the fare. MapReduce helps me get that, but I cannot get the other fields.
Is there a way to get all the fields using map reduce, rather than just the two fields as in the above example?
I must admit I'm not sure I understand completely what you're asking.
But maybe one of the following thoughts helps you:
either) when you iterate over your mapReduce results, you could fetch complete documents from mongodb for each result. That would give you access to all fields in each document for the cost of some network traffic.
or) The value that you send into emit(key, value) can be an object. So you could construct a value object that contains all your desired fields. Just be sure to use exactly the same object structure for your reduce method's return value.
I try to illustrate with an (untested) example.
map = function() {
    emit(this.place,
        {
            'field1': this.field1,
            'field2': this.field2,
            'count' : 1
        });
}
reduce = function(key, values) {
    var result = {
        'field1': values[0].field1,
        'field2': values[0].field2,
        'count' : 0
    };
    values.forEach(function(v) {
        result.count += v.count;
    });
    return result;
}
}