Simple MapReduce count in MongoDB not working

I'm trying to generate a statistical mode by using a simple count. The error I'm getting is:
{
    "errmsg" : "exception: reduce -> multiple not supported yet",
    "code" : 10075,
    "ok" : 0
}
Here's my code.
var mapFunction = function() {
    emit(this.mode, 1);
};
var reduceFunction = function(key, value) {
    Array.sum(value)
    return value;
};
db.runCommand(
    {
        mapReduce: 'total_contractor_earnings_MR',
        map: mapFunction,
        reduce: reduceFunction,
        out: { replace: 'mapReduceContractorMode', db: 'large' }
    }
);

Here you compute a sum and do nothing with it:
Array.sum(value)
return value;
What you meant to write was:
return Array.sum(value);
The error occurred because your reduce function returns the value array unchanged, and MongoDB currently doesn't support returning an array from a reduce function.
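A minimal sketch of the corrected reduce function:
var reduceFunction = function(key, values) {
    // Sum the 1s emitted for this key and return the scalar,
    // not the values array itself.
    return Array.sum(values);
};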

Related

Mongoid throw 16052 exception

I want to use Mongoid to implement a query that works like 'GROUP BY', but I caught an exception:
failed with error 16052: "exception: could not create cursor over th_test.messages for query : { sr_flag: /.*541260c5aee1a93f70000001.*/ } sort : { created_at: -1 }"
My code is here:
def messages
  map = %Q{
    function() {
      emit(this.sr_flag, { count: 1 });
    }
  }
  reduce = %Q{
    function(key, values) {
      var result = { count: 0 };
      values.forEach(function(value) {
        result.count += value.count;
      });
      return result;
    }
  }
  result = Message.where(sr_flag: /.*#{self.id}.*/).map_reduce(map, reduce).out(inline: true).to_a
  result
end
Can someone help me explain why? I searched a blog about this. Does Mongoid set the created_at column as the primary key?
I fixed my problem. The reason was that someone had written a default_scope for my Message model, but the column it sorted on was not the key column of the map method. Using the unscoped method makes the program work:
result = Message.unscoped.where(sr_flag: /.*#{self.id}.*/).map_reduce(map, reduce).out(inline: true).to_a
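For context, the failing call was effectively equivalent to running the following in the mongo shell (a sketch reconstructed from the error message; the sort clause injected by the default_scope is what prevents the cursor from being created):
db.messages.mapReduce(
    function() { emit(this.sr_flag, { count: 1 }); },
    function(key, values) {
        var result = { count: 0 };
        values.forEach(function(value) { result.count += value.count; });
        return result;
    },
    {
        out: { inline: 1 },
        query: { sr_flag: /.*541260c5aee1a93f70000001.*/ },
        sort: { created_at: -1 } // injected by the default_scope; unscoped drops it
    }
);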

Mongo DB Map Reduce skipping records

I am trying to skip records in a mongo map reduce job using the command below.
db.<collectionName>.mapReduce(
    "function () { counter++; if (counter > <numberOfRecordsToBeSkipped>) { emit(this.fieldName, 1); } }",
    "function (key, values) { return Array.sum(values); }",
    { "out" : <CollectionName>, "scope" : "{var counter:0}", "limit" : 0 }
);
I keep getting the following error.
uncaught exception: map reduce failed: {
    "errmsg" : "exception: map invoke failed: JS Error: ReferenceError: counter is not defined nofile_b:0",
    "code" : 9014,
    "ok" : 0
}
Could anyone please help me with that? I understand that the scope attribute can be used to define global variables for use in the map/reduce functions.
The syntax you have for the scope parameter is wrong; it should be:
scope: { counter: 0 }
You'll also encounter the problem that the scope document is intended to be read-only. It would not work in a sharded setup if the document passed to scope were to change only on one shard server. Further, there's no guarantee that the order of documents processed by a map-reduce will be the same from execution to execution. So, it would produce inconsistent results.
Here's a simple test with a document that has a userid field:
{
    userid: 1234
}
Code:
map = function () { counter++; if (counter > 1){ emit(this.userid, 1); } }
reduce = function (key, values) { return Array.sum(values); }
In use:
db.tw.mapReduce(map, reduce, { out: "tw2", scope: { counter: 0 } })
The code executes as expected. By presetting counter to a value, I controlled how many documents were processed in the map phase. I confirmed the output matched expectations.
If I change the map code:
map = function () { counter++; if (counter > numberToSkip) { emit(this.userid, 1); } }
and call the mapReduce function without any other changes, I'll see an errmsg with:
ReferenceError: numberToSkip is not defined near 'ounter > numberToSkip){ emit(thi' "
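The fix is to put the new variable into the scope document as well (a minimal sketch, reusing the test collection above, and with the caveat already noted that mutating scope makes results order-dependent):
// numberToSkip must be declared in scope, just like counter
db.tw.mapReduce(map, reduce, { out: "tw2", scope: { counter: 0, numberToSkip: 1 } })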

mongodb get list of fields in all collections by using map reduce job

I have a list of collection names in a mongo database. I need a script that gets all field names in each collection, passing the collection name as an argument to a map reduce job.
This is what I have so far:
mr = db.runCommand({
    "mapreduce" : "collectionname",
    "map" : function() { for (var key in this) { emit(key, null); } },
    "reduce" : function(key, stuff) { return null; },
    "out" : "collectionname" + "_keys"
})
This command gets the list of fields in one collection, but it only works on that single collection. I need to loop it to get all fields in each collection in the database. Thanks a lot.
The for loop you are looking for is:
var allCollections = db.getCollectionNames();
for (var i = 0; i < allCollections.length; ++i) {
    var collectioname = allCollections[i];
    // now do your map reduce for one collection here
    // there are the merge or reduce options for the output collection
    // in case you want to gather everything in one collection instead of:
    // collectioname + '_keys'
}
Save this in a script.js.
Then run it:
mongo myDb script.js
Or write it in one line and execute it in the mongo shell:
var allCollections = db.getCollectionNames(); for (var i = 0; i < allCollections.length; ++i) { var collectioname = allCollections[i]; if (collectioname === 'system.indexes') continue; db.runCommand({ "mapreduce" : collectioname, "map" : function() { for (var key in this) { emit(key, null); } }, "reduce" : function(key, stuff) { return null; }, "out": collectioname + "_keys" }) }
But please go thoroughly over the mapReduce page and run ALL the examples there; that should make it clear how to use the emit function to get proper results at the end. You currently emit null for every key. The emit value and the return value of the reduce are the places where you should use some data and aggregate the values you want (for example the collection name, which would be a constant passed through); a sketch of that idea follows.
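A minimal sketch of that suggestion, assuming a hypothetical combined output collection named all_keys (the collection name is passed through the scope document so each field name is tagged with where it was seen):
var allCollections = db.getCollectionNames();
for (var i = 0; i < allCollections.length; ++i) {
    var collectioname = allCollections[i];
    if (collectioname === 'system.indexes') continue;
    db.runCommand({
        "mapreduce" : collectioname,
        // collName resolves from the scope document below
        "map" : function() { for (var key in this) { emit(key, collName); } },
        // all values for a key within one collection carry the same name
        "reduce" : function(key, values) { return values[0]; },
        "scope" : { collName: collectioname },
        // merge gathers everything in one collection; a field name present
        // in several collections keeps the last collection's tag
        "out" : { merge: "all_keys" }
    });
}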

MongoDB MapReduce: Not working as expected for more than 1000 records

I wrote a mapreduce function where the records are emitted in the following format
{userid:<xyz>, {event:adduser, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<abc>, {event:adduser, count:1}}
where userid is the key and the remaining are the value for that key.
After the MapReduce function, I want to get the result in the following format:
{userid:<xyz>,{events: [{adduser:1},{login:2}], allEventCount:3}}
To achieve this I wrote the following reduce function. I know this can be achieved by a group by, both in the aggregation framework and in mapreduce, but we require similar functionality for a more complex scenario, so I am taking this approach.
var reducefn = function(key, values) {
    var result = { allEventCount: 0, events: [] };
    values.forEach(function(value) {
        var notfound = true;
        for (var n = 0; n < result.events.length; n++) {
            eventObj = result.events[n];
            for (ev in eventObj) {
                if (ev == value.event) {
                    result.events[n][ev] += value.allEventCount;
                    notfound = false;
                    break;
                }
            }
        }
        if (notfound == true) {
            var newEvent = {};
            newEvent[value.event] = 1;
            result.events.push(newEvent);
        }
        result.allEventCount += value.allEventCount;
    });
    return result;
}
This runs perfectly when I run it on 1000 records, but when there are 3k or 10k records the result I get is something like this:
{ "_id" : {...}, "value" :{"allEventCount" :30, "events" :[ { "undefined" : 1},
{"adduser" : 1 }, {"remove" : 3 }, {"training" : 1 }, {"adminlogin" : 1 },
{"downgrade" : 2 } ]} }
I am not able to understand where this undefined came from, and the sum of the individual event counts is less than allEventCount. All the docs in the collection have a non-empty event field, so there should be no chance of undefined.
Mongo DB version -- 2.2.1
Environment -- Local machine, no sharding.
In the reduce function, why should the operation result.events[n][ev] += value.allEventCount; fail when the similar operation result.allEventCount += value.allEventCount; works?
The corrected reduce function, as suggested by johnyHK:
var reducefn = function(key, values) {
    var result = { totEvents: 0, event: [] };
    values.forEach(function(value) {
        value.event.forEach(function(eventElem) {
            var notfound = true;
            for (var n = 0; n < result.event.length; n++) {
                eventObj = result.event[n];
                for (ev in eventObj) {
                    for (evv in eventElem) {
                        if (ev == evv) {
                            result.event[n][ev] += eventElem[evv];
                            notfound = false;
                            break;
                        }
                    }
                }
            }
            if (notfound == true) {
                result.event.push(eventElem);
            }
        });
        result.totEvents += value.totEvents;
    });
    return result;
}
The shape of the object you emit from your map function must be the same as the shape of the object returned from your reduce function, because the result of a reduce can be fed back into reduce when processing large numbers of docs (as in this case).
So you need to change your emit to emit docs like this:
{userid:<xyz>, {events:[{adduser: 1}], allEventCount:1}}
{userid:<xyz>, {events:[{login: 1}], allEventCount:1}}
and then update your reduce function accordingly.
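A minimal sketch of a map function emitting that shape (assuming the userid and event fields from the original documents):
var mapfn = function() {
    // Build {<eventName>: 1} so the emitted value already has the
    // same shape as the object returned by reduce.
    var ev = {};
    ev[this.event] = 1;
    emit(this.userid, { events: [ev], allEventCount: 1 });
};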

Group By (Aggregate Map Reduce Functions) in MongoDB using Scala (Casbah/Rogue)

Here's a specific query I'm having trouble with. I'm using Lift-mongo-records so that I can use Rogue. I'm happy to use Rogue-specific syntax, or whatever works.
While there are good examples of using JavaScript strings via the Java driver (noted below), I'd like to know what the best practices might be.
Imagine here that there is a table like
comments {
    _id
    topic
    title
    text
    created
}
The desired output is a list of topics and their count, for example
cats (24)
dogs (12)
mice (5)
So a user can see a list of distinct topics, grouped and ordered by count. Here's some pseudo-SQL:
SELECT [DISTINCT] topic, count(topic) as topic_count
FROM comments
GROUP BY topic
ORDER BY topic_count DESC
LIMIT 10
OFFSET 10
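For reference, that pseudo-SQL maps onto the mongo shell roughly like this (a sketch; topic_counts is just a hypothetical output collection name):
db.comments.mapReduce(
    function() { emit(this.topic, 1); },
    function(key, values) { return Array.sum(values); },
    { out: "topic_counts" }
);
// ORDER BY topic_count DESC LIMIT 10 OFFSET 10
db.topic_counts.find().sort({ value: -1 }).skip(10).limit(10);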
One approach is using some DBObject DSL like
val cursor = coll.group( MongoDBObject(
    "key" -> MongoDBObject( "topic" -> true ),
    "initial" -> MongoDBObject( "count" -> 0 ),
    "reduce" -> "function( obj, prev ) { prev.count += obj.c; }"
    "out" -> "topic_list_result"
))
[...].sort( MongoDBObject( "created" -> -1 )).skip( offset ).limit( limit );
Variations of the above do not compile.
I could just ask "what am I doing wrong?" but I thought I could make my confusion more acute:
can I chain the results directly or do I need "out"?
what kind of output can I expect - I mean, do I iterate over a cursor, or the "out" param?
is "cond" required?
should I be using count() or distinct()?
some examples contain a "map" param...
A recent post I found, which covers the Java driver, implies I should use strings instead of a DSL:
http://blog.evilmonkeylabs.com/2011/02/28/MongoDB-1_8-MR-Java/
Would this be the preferred method in either casbah or Rogue?
Update: 9/23
This fails in Scala/Casbah (it compiles, but produces the error {MapReduceError 'None'}):
val map = "function (){ emit({ this.topic }, { count: 1 }); }"
val reduce = "function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; }"
val out = coll.mapReduce( map , reduce , MapReduceInlineOutput )
ConfiggyObject.log.debug( out.toString() )
I settled on the above after seeing
https://github.com/mongodb/casbah/blob/master/casbah-core/src/test/scala/MapReduceSpec.scala
Guesses:
I am misunderstanding the toString method and what the out.object is?
missing finalize?
missing output specification?
https://jira.mongodb.org/browse/SCALA-43 ?
This works as desired from the command line:
map = function() {
    emit({ topic: this.topic }, { count: 1 });
}
reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return { count: count }; };
db.tweets.mapReduce( map, reduce, { out: "results" } );
db.results.ensureIndex( { count: 1 } );
db.results.find().sort( { count: 1 } );
Update
The issue has now been filed as a bug at Mongo:
https://jira.mongodb.org/browse/SCALA-55
The following worked for me:
val coll = MongoConnection()("comments")
val reduce = """function(obj, prev) { prev.csum += 1; }"""
val res = coll.group( MongoDBObject( "topic" -> true ),
    MongoDBObject(), MongoDBObject( "csum" -> 0 ), reduce )
res was an ArrayBuffer full of coll.T which can be handled in the usual ways.
Appears to be a bug - somewhere. For now, I have a less-than-ideal workaround using eval() (slower, less safe):
db.eval( "map = function (){ emit( { topic: this.topic } , { count: 1 }); } ; ");
db.eval( "reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; }; ");
db.eval( " db.tweets.mapReduce( map, reduce, { out: \"tweetresults\" } ); ");
db.eval( " db.tweetresults.ensureIndex( {count : 1}); ");
Then I query the output table normally via casbah.
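For reference, the shell equivalent of that final query (a sketch; mapReduce stores the reduced document under a value field, so the sort key is value.count rather than count):
db.tweetresults.find().sort({ "value.count" : -1 });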