MongoDB MapReduce producing different results for each document - mongodb

This is a follow-up to this question, where I tried to solve this problem with the aggregation framework. Unfortunately, I have to wait before I can update this particular MongoDB installation to a version that includes the aggregation framework, so I have had to use MapReduce for this fairly simple pivot operation.
I have input data in the format below, with multiple daily dumps:
{
"_id" : "daily_dump_2013-05-23",
"authors_who_sold_books" : [
{
"id" : "Charles Dickens",
"original_stock" : 253,
"customers" : [
{
"date_bought" : 1368627290,
"customer_id" : 9715923
}
]
},
{
"id" : "JRR Tolkien",
"original_stock" : 24,
"customers" : [
{
"date_bought" : 1368540890,
"customer_id" : 9872345
},
{
"date_bought" : 1368537290,
"customer_id" : 9163893
}
]
}
]
}
I'm after output in the following format, which aggregates all instances of each (unique) author across all daily dumps:
{
"_id" : "Charles Dickens",
"original_stock" : 253,
"customers" : [
{
"date_bought" : 1368627290,
"customer_id" : 9715923
},
{
"date_bought" : 1368622358,
"customer_id" : 9876234
},
etc...
]
}
I have written this map function...
function map() {
for (var i in this.authors_who_sold_books)
{
author = this.authors_who_sold_books[i];
emit(author.id, {customers: author.customers, original_stock: author.original_stock, num_sold: 1});
}
}
...and this reduce function.
function reduce(key, values) {
sum = 0
for (i in values)
{
sum += values[i].customers.length
}
return {num_sold : sum};
}
However, this gives me the following output:
{
"_id" : "Charles Dickens",
"value" : {
"customers" : [
{
"date_bought" : 1368627290,
"customer_id" : 9715923
},
{
"date_bought" : 1368622358,
"customer_id" : 9876234
},
],
"original_stock" : 253,
"num_sold" : 1
}
}
{ "_id" : "JRR Tolkien", "value" : { "num_sold" : 3 } }
{
"_id" : "JK Rowling",
"value" : {
"customers" : [
{
"date_bought" : 1368627290,
"customer_id" : 9715923
},
{
"date_bought" : 1368622358,
"customer_id" : 9876234
},
],
"original_stock" : 183,
"num_sold" : 1
}
}
{ "_id" : "John Grisham", "value" : { "num_sold" : 2 } }
The even indexed documents have the customers and original_stock listed, but an incorrect sum of num_sold.
The odd indexed documents only have the num_sold listed, but it is the correct number.
Could anyone tell me what it is I'm missing, please?

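Your problem is that the value returned by the reduce function must have the same format as the value emitted by the map function (see requirements for the reduce function for an explanation). MongoDB may call reduce more than once for the same key, feeding earlier results back in, and it does not call reduce at all for a key with a single emitted value. That is why some authors show the raw emitted value with num_sold: 1, while others show only num_sold.
You need to change the code to something like the following to fix the problem: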
function map() {
for (var i in this.authors_who_sold_books)
{
var author = this.authors_who_sold_books[i];
emit(author.id, {customers: author.customers, original_stock: author.original_stock, num_sold: author.customers.length});
}
}
function reduce(key, values) {
var result = {customers:[] , num_sold:0, original_stock: (values.length ? values[0].original_stock : 0)};
for (var i in values)
{
result.num_sold += values[i].num_sold;
result.customers = result.customers.concat(values[i].customers);
}
return result;
}
I hope that helps.
Note: the map function now emits num_sold: author.customers.length instead of 1, so each emitted value already carries the number of customers in that dump. I think that's what you want.
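To see why the shapes have to match, the re-reduce behaviour can be simulated in plain JavaScript outside MongoDB. This is only a minimal sketch: the values A, B and C are made up, and MongoDB's own batching is not reproduced, only the fact that a reduce result can be fed back into reduce.

```javascript
// Corrected reduce from the answer above: returns the same shape that map emits.
function reduce(key, values) {
    var result = {
        customers: [],
        num_sold: 0,
        original_stock: values.length ? values[0].original_stock : 0
    };
    for (var i = 0; i < values.length; i++) {
        result.num_sold += values[i].num_sold;
        result.customers = result.customers.concat(values[i].customers);
    }
    return result;
}

// Hypothetical emitted values for one author across three daily dumps.
var A = { customers: [{ customer_id: 1 }], original_stock: 24, num_sold: 1 };
var B = { customers: [{ customer_id: 2 }, { customer_id: 3 }], original_stock: 24, num_sold: 2 };
var C = { customers: [{ customer_id: 4 }], original_stock: 24, num_sold: 1 };

// Reducing everything at once and reducing in two passes should agree.
var allAtOnce = reduce("JRR Tolkien", [A, B, C]);
var inTwoPasses = reduce("JRR Tolkien", [reduce("JRR Tolkien", [A, B]), C]);

console.log(allAtOnce.num_sold, inTwoPasses.num_sold);
console.log(allAtOnce.customers.length, inTwoPasses.customers.length);
```

Both approaches report 4 sold and 4 customers. With the reduce from the question, the second pass would fail because the first pass returns an object without a customers array.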

Related

Mongo query to return distinct count, large documents

I need to get a count of distinct 'transactions'. The problem I'm having is that .distinct() comes back with an error because the documents are too large.
I'm not familiar with aggregation either.
I need to group the count by 'agencyID'; as you can see below, there are 2 different agencyIDs.
For example, I need to count the transactions where the agencyID is 01721487, and so on.
db.myCollection.distinct("bookings.transactions").length
This doesn't work because I need to group by agencyID, and if there are too many results I get an error saying the result is too large.
{
"_id" : ObjectId("5624a610a6e6b53b158b4744"),
"agencyID" : "01721487",
"paxID" : "-530189664",
"bookings" : [
{
"bookingID" : "24232",
"transactions" : [
{
"tranID" : "001",
"invoices" : [
{
"invNum" : "1312",
"type" : "r",
"inv_date" : "20150723",
"inv_time" : "0953",
"inv_val" : -300
}
],
"tranType" : "Fee",
"tranDate" : "20150723",
"tranTime" : "0952",
"opCode" : "admin",
"udf_1" : "j s"
}
],
"acctID" : "acct11",
"agt_id" : "xy"
}
],
"title" : "",
"firstname" : "",
"surname" : "f bar"
}
I've also tried this but it didn't work for me.
Thank you for the sample data. This is something you could play with:
db.kieron.aggregate([{
$unwind : "$bookings"
}, {
$match : {
"bookings.transactions" : {
$exists : true,
$not : {
$size : 0
}
}
}
}, {
$group : {
_id : "$agencyID",
count : {
$sum : {
$size : "$bookings.transactions"
}
}
}
}
])
Because transactions is an array nested inside the bookings array, we need to unwind bookings first; then we can take the size of each inner transactions array and sum it per agencyID.
Happy reporting!
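For readers new to aggregation, the three stages can be mirrored in plain JavaScript over an in-memory array. This is a rough sketch of what the pipeline computes, not how MongoDB executes it; the sample documents and the name countTransactionsByAgency are made up for illustration.

```javascript
// Hypothetical sample documents shaped like the one in the question.
var docs = [
    { agencyID: "01721487", bookings: [
        { bookingID: "24232", transactions: [{ tranID: "001" }, { tranID: "002" }] },
        { bookingID: "24233", transactions: [] }
    ] },
    { agencyID: "01721487", bookings: [
        { bookingID: "24234", transactions: [{ tranID: "003" }] }
    ] },
    { agencyID: "09999999", bookings: [
        { bookingID: "30001", transactions: [{ tranID: "001" }] }
    ] }
];

function countTransactionsByAgency(docs) {
    var counts = {};
    docs.forEach(function (doc) {
        // $unwind: visit each element of the bookings array separately.
        (doc.bookings || []).forEach(function (booking) {
            var transactions = booking.transactions;
            // $match: skip bookings whose transactions array is missing or empty.
            if (!Array.isArray(transactions) || transactions.length === 0) {
                return;
            }
            // $group with $sum of $size: add the array length to the agency total.
            counts[doc.agencyID] = (counts[doc.agencyID] || 0) + transactions.length;
        });
    });
    return counts;
}

console.log(countTransactionsByAgency(docs));
```

Note that this, like the pipeline, counts every transaction entry per agency; it does not deduplicate them.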

MongoDB update all subelements from subarray [duplicate]

This question already has answers here:
How to Update Multiple Array Elements in mongodb
I have a collection with a following schema:
{
"_id" : ObjectId("52dfba46daf02aa4630cf529"),
"hotelVenue" : {
"rooms" : [
{
"clientId" : "ROOM_1",
"roomName" : "Executive"
},
{
"clientId" : "ROOM_2",
"roomName" : "Premium"
}
]
}
},
{
"_id" : ObjectId("52dfc2f9daf02aa2632bc8af"),
"hotelVenue" : {
"rooms" : [
{
"clientId" : "ROOM_1",
"roomName" : "Studio Room"
},
{
"clientId" : "ROOM_2",
"roomName" : "Soho Suite"
},
{
"clientId" : "ROOM_3",
"roomName" : "Luxury Suite"
}
]
}
}
I need to generate a unique id for every element of the sub-array in all the records.
That is, in the example there is a rooms array, and each room needs a unique id, generated with ObjectId().
The output should look something like the below, with roomId being the generated field.
{
"_id" : ObjectId("52dfba46daf02aa4630cf529"),
"hotelVenue" : {
"rooms" : [
{
"clientId" : "ROOM_1",
"roomName" : "Executive",
"roomId" : "56f8cb3f0c658b4bc26172342"
},
{
"clientId" : "ROOM_2",
"roomName" : "Premium",
"roomId" : "56f8cb3f0c658b4bc26176d4"
}
]
}
}
I have written this script, but it gives the following error: Error: Line 9: Unexpected token +
db.venues.find().forEach(function(data)
{
data.hotelVenue.rooms.forEach(function(roomItem)
{
db.venues.update({_id:data._id,'data.hotelVenue.rooms.clientId' : roomItem.clientId},
{
$set:
{
'hotelVenue.rooms.'+roomItem.clientId+'.roomId' : ObjectId()
}
});
});
})
EDIT: This should do it in a more efficient manner by only saving each document once;
db.venues.
find({"hotelVenue.rooms": { $elemMatch: {roomId: {$exists: 0}}}}).
forEach(function(doc) {
doc.hotelVenue.rooms.forEach(function(room) {
if(!room.roomId) {
room.roomId = ObjectId();
}
});
db.venues.save(doc);
});
The find selects only the documents that still need updating. After that, it's just a matter of updating the document as needed and calling save.
Of course, backups are in order before running potentially destructive queries from random people on the Internet against your production data set.
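The Unexpected token + error in the question arises because a string concatenation cannot be used directly as a key in an object literal. If a targeted $set is preferred over saving the whole document, the update document can be built with bracket assignment, using the array index in the path rather than the clientId. A minimal sketch in plain JavaScript follows; buildRoomIdUpdate and makeId are hypothetical names, and in the mongo shell makeId would be ObjectId.

```javascript
// Build a $set document that adds roomId to every room lacking one.
// makeId is a stand-in for ObjectId so the sketch runs outside the shell.
function buildRoomIdUpdate(rooms, makeId) {
    var set = {};
    rooms.forEach(function (room, index) {
        if (!room.roomId) {
            // Keys built at run time need bracket assignment; array elements
            // are addressed by position, e.g. "hotelVenue.rooms.0.roomId".
            set["hotelVenue.rooms." + index + ".roomId"] = makeId();
        }
    });
    return { $set: set };
}

// Example with a counter-based id generator (hypothetical, for illustration only).
var counter = 0;
var update = buildRoomIdUpdate(
    [
        { clientId: "ROOM_1", roomName: "Executive" },
        { clientId: "ROOM_2", roomName: "Premium", roomId: "existing" }
    ],
    function () { counter += 1; return "id-" + counter; }
);

console.log(JSON.stringify(update));
```

In the shell, the result could be passed as the second argument to db.venues.update({_id: doc._id}, update). If no room needs an id, the $set is empty and the update should be skipped, since an empty $set is rejected by some server versions.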

Mongodb map reduce trivial query

I have the map function below:
var mapFunction = function() {
if(this.url.match(/http:\/\/test.com\/category\/.*?\/checkout/)) {
var key=this.em;
var value = {
url : 'checkout',
count : 1,
account_id:this.accId
};
emit(key, value);
}
if(this.url.match(/http:\/\/test.com\/landing/)) {
var key=this.em;
var value = {
url : 'landing',
count : 1,
account_id:this.accId
};
emit(key, value);
}
}
Then I have defined reduce something like below:
var reduceFunction = function (keys, values) {
var reducedValue = {count_checkout:0, count_landing:0};
for (var idx = 0; idx < values.length; idx++) {
if(values[idx].url=='checkout'){
reducedValue.count_checkout++;
}
else {
reducedValue.count_landing++;
}
}
return reducedValue;
}
Now, let's say I have only 1 record:
{
"_id" : ObjectId("516a7cff6dad5949ddf3f7b6"),
"ip" : "1.2.3.4",
"accId" : 123,
"em" : "testing#test.com",
"pgLdTs" : ISODate("2013-04-11T18:30:00Z"),
"url" : "http://test.com/category/prr/checkout",
"domain" : "www.test.com",
"pgUdTs" : ISODate("2013-04-14T09:55:11.682Z"),
"title" : "Test",
"ua" : "Mozilla",
"res" : "1024*768",
"rfr" : "www.google.com"
}
Now if I fire my map reduce like below:
db.test_views.mapReduce(mapFunction,reduceFunction,{out:{inline:1}})
Then I get the result below:
{
"_id" : "testing#test.com",
"value" : {
"url" : "checkout",
"count" : 1,
"account_id" : 123
}
}
So, it's basically returning the map output. Now, if I go and add another document for this email id, the collection finally becomes something like below.
{
"_id" : ObjectId("516a7cff6dad5949ddf3f7b6"),
"ip" : "1.2.3.4",
"accId" : 123,
"em" : "testing#test.com",
"pgLdTs" : ISODate("2013-04-11T18:30:00Z"),
"url" : "http://test.com/category/prr/checkout",
"domain" : "www.test.com",
"pgUdTs" : ISODate("2013-04-14T09:55:11.682Z"),
"title" : "Test",
"ua" : "Mozilla",
"res" : "1024*768",
"rfr" : "www.google.com"
}
{
"_id" : ObjectId("516a7e1b6dad5949ddf3f7b7"),
"ip" : "1.2.3.4",
"accId" : 123,
"em" : "testing#test.com",
"pgLdTs" : ISODate("2013-04-11T18:30:00Z"),
"url" : "http://test.com/category/prr/checkout",
"domain" : "www.test.com",
"pgUdTs" : ISODate("2013-04-14T09:59:55.326Z"),
"title" : "Test",
"ua" : "Mozilla",
"res" : "1024*768",
"rfr" : "www.google.com"
}
Then, when I fire the map reduce again, it gives me proper results:
{
"_id" : "testing#test.com",
"value" : {
"count_checkout" : 2,
"count_landing" : 0
}
}
Can anyone please help me out in understanding why it returns me a map for single document and doesn't do the counting in reduce.
Thanks for help.
-Lalit
Can anyone please help me out in understanding why it returns me a map for single document and doesn't do the counting in reduce.
The Reduce step combines the values emitted for the same key into a single result. If a key has only one value emitted by your Map function, that value is already "reduced" and reduce() will not be called for it.
This is the expected behaviour of the MapReduce algorithm.
The reduce function should return the same type of value objects as the map function emits.
As you've experienced, when there's a single value associated with a key, the reduce function will not be called at all.
From the MongoDB MapReduce Documentation:
Requirements for the reduce Function:
...
the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operations is true:
reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
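One way to satisfy this requirement is to have map emit values that are already in the shape reduce returns. The sketch below runs in plain JavaScript with a tiny in-memory stand-in for mapReduce; runMapReduce is a hypothetical helper, emit is passed in as a parameter instead of being a global as in MongoDB, and the account_id field is left out for brevity.

```javascript
// Tiny in-memory stand-in for mapReduce (an illustration, not MongoDB's implementation).
function runMapReduce(docs, mapFn, reduceFn) {
    var groups = {};
    function emit(key, value) {
        (groups[key] = groups[key] || []).push(value);
    }
    docs.forEach(function (doc) { mapFn.call(doc, emit); });
    return Object.keys(groups).map(function (key) {
        var values = groups[key];
        // With a single value there is nothing to combine, so reduce is skipped.
        var value = values.length === 1 ? values[0] : reduceFn(key, values);
        return { _id: key, value: value };
    });
}

// Map emits values already in the shape that reduce returns.
function mapFunction(emit) {
    var isCheckout = /http:\/\/test\.com\/category\/.*?\/checkout/.test(this.url);
    var isLanding = /http:\/\/test\.com\/landing/.test(this.url);
    if (isCheckout || isLanding) {
        emit(this.em, {
            count_checkout: isCheckout ? 1 : 0,
            count_landing: isLanding ? 1 : 0
        });
    }
}

// Reduce sums the counters, so its output can safely be reduced again.
function reduceFunction(key, values) {
    var reduced = { count_checkout: 0, count_landing: 0 };
    values.forEach(function (v) {
        reduced.count_checkout += v.count_checkout;
        reduced.count_landing += v.count_landing;
    });
    return reduced;
}

var one = [{ em: "testing#test.com", url: "http://test.com/category/prr/checkout" }];
var two = one.concat([{ em: "testing#test.com", url: "http://test.com/landing" }]);

console.log(JSON.stringify(runMapReduce(one, mapFunction, reduceFunction)));
console.log(JSON.stringify(runMapReduce(two, mapFunction, reduceFunction)));
```

With this arrangement the single-document case already yields count_checkout and count_landing, because the value that passes through untouched has the final shape.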

MongoDB aggregation to return nested groups with values as keys?

My documents look like this:
{
"_id" : "Tvq579754r",
"Status" : "passed",
"Title" : "up08c",
"ProjectID" : "Tvq5p",
"Version" : "1.0.0",
"Platform" : "platform_x",
"METRIC_A" : 11114.85,
"METRIC_B" : 68.9,
"METRIC_C" : 65.35,
},
{
"_id" : "Tvq579755r",
"Status" : "passed",
"Title" : "up09c",
"ProjectID" : "Tvq5p",
"Version" : "1.0.0",
"Platform" : "platform_x",
"METRIC_A" : 21114.85,
"METRIC_B" : 168.9,
"METRIC_C" : 165.35,
},
{
"_id" : "Tvq579756r",
"Status" : "passed",
"Title" : "up09c",
"ProjectID" : "Tvq5p",
"Version" : "1.0.0",
"Platform" : "platform_x",
"METRIC_A" : 31114.85,
"METRIC_B" : 268.9,
"METRIC_C" : 265.35,
}
Now I have no problem grouping and getting $avg and $sum of my METRIC_ fields by grouping by ProjectID, Version, Platform and Title, but what I'd like to do within the aggregation framework (if possible) is to return an object that uses the grouped values as keys, such as:
{
<Project ID> : {
<Version> : {
<Platform> : {
<Title> : {
"METRIC_A": <sum of METRIC_A>,
"METRIC_B": <sum of METRIC_B>,
"METRIC_C": <sum of METRIC_C>,
}
}
}
}
}
Or, in context of my example:
{
'Tvq5p' : {
'1.0.0' : {
'platform_x' : {
'up08c' : {
"METRIC_A": 11114.85,
"METRIC_B": 68.9,
"METRIC_C": 65.35,
},
'up09c' : {
"METRIC_A": 52229.7,
"METRIC_B": 437.8,
"METRIC_C": 430.7,
}
}
}
}
}
I am currently doing it once the query results are received by the consuming service, which isn't terribly slow or anything, but I just thought it would be nice to come that way right out of Mongo. Is this even possible?
Thanks.
In MongoDB there is the group operation.
db.records.group( {
// Group on the field names from the sample documents.
key: { ProjectID: 1, Version: 1, Platform: 1, Title: 1 },
reduce: function(cur, result) {
result.METRIC_A += cur.METRIC_A;
result.METRIC_B += cur.METRIC_B;
result.METRIC_C += cur.METRIC_C;
},
initial: { METRIC_A: 0, METRIC_B: 0, METRIC_C: 0 }
} )
If that doesn't work I'd recommend a Map Reduce.
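Either way, the grouped results come back as a flat list, so the nesting by value still has to happen afterwards. A minimal sketch of that step in plain JavaScript is below; nestGroups is a hypothetical helper, and the row shape (a compound _id plus summed metrics) is an assumption about what the grouping returns.

```javascript
// Fold flat grouped rows into an object keyed by ProjectID > Version > Platform > Title.
function nestGroups(rows) {
    var levels = ["ProjectID", "Version", "Platform", "Title"];
    var root = {};
    rows.forEach(function (row) {
        var node = root;
        levels.forEach(function (level, depth) {
            var key = row._id[level];
            if (depth === levels.length - 1) {
                // Leaf: store the metrics for this title.
                node[key] = {
                    METRIC_A: row.METRIC_A,
                    METRIC_B: row.METRIC_B,
                    METRIC_C: row.METRIC_C
                };
            } else {
                // Intermediate level: create the nested object on first use.
                node[key] = node[key] || {};
                node = node[key];
            }
        });
    });
    return root;
}

// Rows using the summed values from the question's example.
var nested = nestGroups([
    { _id: { ProjectID: "Tvq5p", Version: "1.0.0", Platform: "platform_x", Title: "up08c" },
      METRIC_A: 11114.85, METRIC_B: 68.9, METRIC_C: 65.35 },
    { _id: { ProjectID: "Tvq5p", Version: "1.0.0", Platform: "platform_x", Title: "up09c" },
      METRIC_A: 52229.7, METRIC_B: 437.8, METRIC_C: 430.7 }
]);

console.log(JSON.stringify(nested, null, 2));
```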

Mongodb Map/Reduce - Multiple Group By

I am trying to run a map/reduce function in mongodb where I group by 3 different fields contained in objects in my collection. I can get the map/reduce function to run, but all the emitted fields run together in the output collection. I'm not sure whether this is normal, but it means the output takes more work to clean up for analysis. Is there a way to separate them, then use mongoexport?
Let me show you what I mean:
The fields I am trying to group by are the day, user ID (or uid) and destination.
I run these functions:
map = function() {
day = (this.created_at.getFullYear() + "-" + (this.created_at.getMonth()+1) + "-" + this.created_at.getDate());
emit({day: day, uid: this.uid, destination: this.destination}, {count:1});
}
/* Reduce Function */
reduce = function(key, values) {
var count = 0;
values.forEach(function(v) {
count += v['count'];
}
);
return {count: count};
}
/* Output Function */
db.events.mapReduce(map, reduce, {query: {destination: {$ne:null}}, out: "TMP"});
The output looks like this:
{ "_id" : { "day" : "2012-4-9", "uid" : "1234456", "destination" : "Home" }, "value" : { "count" : 1 } }
{ "_id" : { "day" : "2012-4-9", "uid" : "2345678", "destination" : "Home" }, "value" : { "count" : 1 } }
{ "_id" : { "day" : "2012-4-9", "uid" : "3456789", "destination" : "Login" }, "value" : { "count" : 1 } }
{ "_id" : { "day" : "2012-4-9", "uid" : "4567890", "destination" : "Contact" }, "value" : { "count" : 1 } }
{ "_id" : { "day" : "2012-4-9", "uid" : "5678901", "destination" : "Help" }, "value" : { "count" : 1 } }
When I attempt to use mongoexport, I can not separate day, uid, or destination by columns because the map combines the fields together.
What I would like to have would look like this:
{ { "day" : "2012-4-9" }, { "uid" : "1234456" }, { "destination" : "Home"}, { "count" : 1 } }
Is this even possible?
As an aside - I was able to make the output work by applying sed to the file and cleaning up the CSV. More work, but it worked. It would be ideal if I could get it out of mongodb in the correct format.
MapReduce only returns documents of the form {_id: some_id, value: some_value}, so the grouped fields always end up nested under _id and the count under value.
See: How to change the structure of MongoDB's map-reduce results?
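If the results are post-processed in JavaScript, flattening each document is a small step. The sketch below is plain JavaScript with made-up sample results; flattenResult and toCsv are hypothetical helper names, and the CSV writer does no quoting or escaping.

```javascript
// Flatten one map-reduce result document into a single-level row.
function flattenResult(doc) {
    return {
        day: doc._id.day,
        uid: doc._id.uid,
        destination: doc._id.destination,
        count: doc.value.count
    };
}

// Render rows as CSV lines (no quoting or escaping; adequate for the sample values only).
function toCsv(rows) {
    var header = ["day", "uid", "destination", "count"];
    var lines = rows.map(function (row) {
        return header.map(function (name) { return row[name]; }).join(",");
    });
    return [header.join(",")].concat(lines).join("\n");
}

var results = [
    { _id: { day: "2012-4-9", uid: "1234456", destination: "Home" }, value: { count: 1 } },
    { _id: { day: "2012-4-9", uid: "3456789", destination: "Login" }, value: { count: 1 } }
];

console.log(toCsv(results.map(flattenResult)));
```

In the shell, the same flattenResult function could be applied to each document of the TMP collection and the rows written to a second collection, which would then export with one column per field.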