Mongo DB Map Reduce skipping records - mongodb

I am trying to skip records from mongo map reduce utility using the below command.
db.<collectionName>.mapReduce(
"function () { counter++; if (counter > <numberOfRecordsToBeSkipped>){ emit(this.fieldName, 1); } }",
"function (key, values) { return Array.sum(values); }" ,
{"out" : <CollectionName>,"scope" : "{var counter:0}" ,
"limit" : 0}
);
I keep getting the following error.
uncaught exception: map reduce failed:{
"errmsg" : "exception: map invoke failed: JS Error: ReferenceError: counter is not defined nofile_b:0",
"code" : 9014,
"ok" : 0
}
Could anyone please help me on that? I understand that scope attribute can be used to define global variables to be used in map/reduce functions.

While the syntax you have is wrong for the scope parameter:
scope: { counter: 0 }
You'll also encounter the problem that the scope document is intended to be read-only. It would not work in a sharded setup if the document passed to scope were to change only on one shard server. Further, there's no guarantee that the order of documents processed by a map-reduce will be the same from execution to execution. So, it would produce inconsistent results.
Here's a simple test with a document that has a userid field:
{
userid: 1234
}
Code:
map = function () { counter++; if (counter > 1){ emit(this.userid, 1); } }
reduce = function (key, values) { return Array.sum(values); }
In use:
db.tw.mapReduce(map, reduce, { out: "tw2", scope: { counter: 0 } })
The code executes as expected. By presetting counter to a value, I controlled how many documents were processed in the map phase. I confirmed the output matched expectations.
If I change the map code:
map = function () { counter++; if (counter > numberToSkip) { emit(this.userid, 1); } }
And try to call the mapReduce function without any changes, I'll see an errmsg with:
ReferenceError: numberToSkip is not defined near 'ounter > numberToSkip){ emit(thi' "

Related

Mongoid throw 16052 exception

I want to use mongoid to implement the query funcation likes 'GROUP BY',but i caught an exception:
failed with error 16052: "exception: could not create cursor over th_test.messages for query : { sr_flag: /.*541260c5aee1a93f70000001.*/ } sort : { created_at: -1 }"
My code is here:
def messages
map = %Q{
function() {
emit(this.sr_flag, { count: 1 });
}
}
reduce = %Q{
function(key, values) {
var result = { count: 0 };
values.forEach(function(value) {
result.count += value.count;
});
return result;
}
}
result = Message.where(sr_flag: /.*#{self.id}.*/).map_reduce(map, reduce).out(inline: true).to_a
result
end
Can someone help me explain why? I had searched a blog.Does the mongoid set the created_at column as primary key?
I had fixed my problem.The reason was someone writed a default_scope for my Message model,but the column sorted on was not the key column of the map method.Just Using unscoped method to make program work.
result = Message.unscoped.where(sr_flag: /.*#{self.id}.*/).map_reduce(map, reduce).out(inline: true).to_a

Simple MapReduce count in MongoDB not working

I'm trying to generate a statistical mode by using a simple count. The error I'm getting is
{
"errmsg" : "exception: reduce -> multiple not supported yet",
"code" : 10075,
"ok" : 0
}
Here's my code.
var mapFunction = function(){
emit(this.mode, 1);
};
var reduceFunction = function(key, value){
Array.sum(value)
return value;
};
db.runCommand(
{
mapReduce : 'total_contractor_earnings_MR',
map: mapFunction,
reduce: reduceFunction,
out: { replace: 'mapReduceContractorMode', db: 'large'}
}
);
Here you count a sum and do nothing with it.
Array.sum(value)
return value;
The thing you ment to write was:
return Array.sum(value);
The error occured because mongodb currently doesn't support returning an array from a reduce function

MongoDB MapReduce: Not working as expected for more than 1000 records

I wrote a mapreduce function where the records are emitted in the following format
{userid:<xyz>, {event:adduser, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<abc>, {event:adduser, count:1}}
where userid is the key and the remaining are the value for that key.
After the MapReduce function, I want to get the result in following format
{userid:<xyz>,{events: [{adduser:1},{login:2}], allEventCount:3}}
To acheive this I wrote the following reduce function
I know this can be achieved by group by.. both in aggregation framework and mapreduce, but we require a similar functionality for a complex scenario. So, I am taking this approach.
var reducefn = function(key,values){
var result = {allEventCount:0, events:[]};
values.forEach(function(value){
var notfound=true;
for(var n = 0; n < result.events.length; n++){
eventObj = result.events[n];
for(ev in eventObj){
if(ev==value.event){
result.events[n][ev] += value.allEventCount;
notfound=false;
break;
}
}
}
if(notfound==true){
var newEvent={}
newEvent[value.event]=1;
result.events.push(newEvent);
}
result.allEventCount += value.allEventCount;
});
return result;
}
This runs perfectly, when I run for 1000 records, when there are 3k or 10k records, the result I get is something like this
{ "_id" : {...}, "value" :{"allEventCount" :30, "events" :[ { "undefined" : 1},
{"adduser" : 1 }, {"remove" : 3 }, {"training" : 1 }, {"adminlogin" : 1 },
{"downgrade" : 2 } ]} }
Not able to understand where this undefined came from and also the sum of the individual events is less than allEventCount. All the docs in the collection has non-empty field event so there is no chance of undefined.
Mongo DB version -- 2.2.1
Environment -- Local machine, no sharding.
In the reduce function, why should this operation fail result.events[n][ev] += value.allEventCount; when the similar operation result.allEventCount += value.allEventCount; passes?
The corrected answer as suggested by johnyHK
Reduce function:
var reducefn = function(key,values){
var result = {totEvents:0, event:[]};
values.forEach(function(value){
value.event.forEach(function(eventElem){
var notfound=true;
for(var n = 0; n < result.event.length; n++){
eventObj = result.event[n];
for(ev in eventObj){
for(evv in eventElem){
if(ev==evv){
result.event[n][ev] += eventElem[evv];
notfound=false;
break;
}
}}
}
if(notfound==true){
result.event.push(eventElem);
}
});
result.totEvents += value.totEvents;
});
return result;
}
The shape of the object you emit from your map function must be the same as the object returned from your reduce function, as the results of a reduce can get fed back into reduce when processing large numbers of docs (like in this case).
So you need to change your emit to emit docs like this:
{userid:<xyz>, {events:[{adduser: 1}], allEventCount:1}}
{userid:<xyz>, {events:[{login: 1}], allEventCount:1}}
and then update your reduce function accordingly.

How to access this._id in map function in MongoDB MapReduce?

I'm doing a MapReduce in Mongo to generate a reverse index of tokens for some documents. I am having trouble accessing document's _id in the map function.
Example document:
{
"_id" : ObjectId("4ea42a2c6fe22bf01f000d2d"),
"attributes" : {
"name" : "JCDR 50W38C",
"upi-tokens" : [
"50w38c",
"jcdr"
]
},
"sku" : "143669259486830515"
}
(The field ttributes['upi-tokens'] is a list of text tokens I want to create reverse index for.)
Map function (source of the problem):
m = function () {
this.attributes['upi-tokens'].forEach(
function (token) { emit(token, {ids: [ this._id ]} ); }
); }
Reduce function:
r = function (key, values) {
var results = new Array;
for (v in values) {
results = results.concat(v.ids);
}
return {ids:results};
}
MapReduce call:
db.offers.mapReduce(m, r, { out: "outcollection" } )
PROBLEM Resulting collection has null values everywhere where I'd expect an id instead of actual ObjectID strings.
Possible reason:
I was expecting the following 2 functions to be equivalent, but they aren't.
m1 = function (d) { print(d['_id']); }
m2 = function () { print(this['_id']); }
Now I run:
db.offers.find().forEach(m1)
db.offers.find().forEach(m2)
The difference is that m2 prints undefined for each document while m1 prints the ids as desired. I have no clue why.
Questions:
How do I get the _id of the current object in the map function for use in MapReduce? this._id or this['_id'] doesn't work.
Why exactly aren't m1 and m2 equivalent?
Got it to work... I made quite simple JS mistakes:
inner forEach() in the map function seems to overwrite 'this' object; this is no longer the main document (which has an _id) but the iterated object inside the loop)...
...or it was simply because in JS the for..in loop only returns the keys, not values, i.e.
for (v in values) {
now requires
values[v]
to access the actual array value. Duh...
The way I circumvented mistake #1 is by using for..in loop instead of ...forEach() loop in the map function:
m = function () {
for (t in this.attributes['upi-tokens']) {
var token = this.attributes['upi-tokens'][t];
emit (token, { ids: [ this._id ] });
}
}
That way "this" refers to what it needs to.
Could also do:
that = this;
this.attributes['upi-tokens'].forEach( function (d) {
...
that._id...
...
}
probably would work just fine.
Hope this helps someone.

Group By (Aggregate Map Reduce Functions) in MongoDB using Scala (Casbah/Rogue)

Here's a specific query I'm having trouble with. I'm using Lift-mongo-
records so that i can use Rogue. I'm happy to use Rogue specific
syntax , or whatever works.
While there are good examples for using javascript strings via java noted below, I'd like to know what the best practices might be.
Imagine here that there is a table like
comments {
_id
topic
title
text
created
}
The desired output is a list of topics and their count, for example
cats (24)
dogs (12)
mice (5)
So a user can see an list, ordered by count, of a distinct/group by
Here's some psuedo SQL:
SELECT [DISTINCT] topic, count(topic) as topic_count
FROM comments
GROUP BY topic
ORDER BY topic_count DESC
LIMIT 10
OFFSET 10
One approach is using some DBObject DSL like
val cursor = coll.group( MongoDBObject(
"key" -> MongoDBObject( "topic" -> true ) ,
//
"initial" -> MongoDBObject( "count" -> 0 ) ,
"reduce" -> "function( obj , prev) { prev.count += obj.c; }"
"out" -> "topic_list_result"
))
[...].sort( MongoDBObject( "created" ->
-1 )).skip( offset ).limit( limit );
Variations of the above do not compile.
I could just ask "what am I doing wrong" but I thought I could make my
confusion more acute:
can I chain the results directly or do I need "out"?
what kind of output can I expect - I mean, do I iterate over a
cursor, or the "out" param
is "cond" required?
should I be using count() or distinct()
some examples contain a "map" param...
A recent post I found which covers the java driver implies I should
use strings instead of a DSL :
http://blog.evilmonkeylabs.com/2011/02/28/MongoDB-1_8-MR-Java/
Would this be the preferred method in either casbah or Rogue?
Update: 9/23
This fails in Scala/Casbah (compiles but produces error {MapReduceError 'None'} )
val map = "function (){ emit({ this.topic }, { count: 1 }); }"
val reduce = "function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; }"
val out = coll.mapReduce( map , reduce , MapReduceInlineOutput )
ConfiggyObject.log.debug( out.toString() )
I settled on the above after seeing
https://github.com/mongodb/casbah/blob/master/casbah-core/src/test/scala/MapReduceSpec.scala
Guesses:
I am misunderstanding the toString method and what the out.object is?
missing finalize?
missing output specification?
https://jira.mongodb.org/browse/SCALA-43 ?
This works as desired from command line:
map = function (){
emit({ this.topic }, { count: 1 });
}
reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; };
db.tweets.mapReduce( map, reduce, { out: "results" } ); //
db.results.ensureIndex( {count : 1});
db.results.find().sort( {count : 1});
Update
The issue has not been filed as a bug at Mongo.
https://jira.mongodb.org/browse/SCALA-55
The following worked for me:
val coll = MongoConnection()("comments")
val reduce = """function(obj,prev) { prev.csum += 1; }"""
val res = coll.group( MongoDBObject("topic"->true),
MongoDBObject(), MongoDBObject( "csum" -> 0 ), reduce)
res was an ArrayBuffer full of coll.T which can be handled in the usual ways.
Appears to be a bug - somewhere.
For now, I have a less-than-ideal workaround working now, using eval() (slower, less safe) ...
db.eval( "map = function (){ emit( { topic: this.topic } , { count: 1 }); } ; ");
db.eval( "reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; }; ");
db.eval( " db.tweets.mapReduce( map, reduce, { out: \"tweetresults\" } ); ");
db.eval( " db.tweetresults.ensureIndex( {count : 1}); ");
Then I query the output table normally via casbah.