MongoDB MapReduce for grouping up to n per category using Mongoid

I have a weird problem with MongoDB (2.0.2) map reduce.
So, the story goes like this:
I have an Ad model (see the model source extract below) and I need to group up to n ads per category, so that I get a nicely ordered listing I can later use to do more interesting things.
# encoding: utf-8
class Ad
include Mongoid::Document
cache
include Mongoid::Timestamps
field :title
field :slug, :unique => true
def self.aggregate_latest_active_per_category
map = "function () {
emit( this.category, { id: this._id });
}"
reduce = "function ( key, value ) {
return { ads: value };
}"
self.collection.map_reduce(map, reduce, { :out => "categories"} )
end
All fun and games up until now.
What I expect is to get a result in a form which resembles (mongo shell for db.categories.findOne() ):
{
"_id" : "category_name",
"value" : {
"ads" : [
{
"id" : ObjectId("4f2970e9e815f825a30014ab")
},
{
"id" : ObjectId("4f2970e9e815f825a30014b0")
},
{
"id" : ObjectId("4f2970e9e815f825a30014b6")
},
{
"id" : ObjectId("4f2970e9e815f825a30014b8")
},
{
"id" : ObjectId("4f2970e9e815f825a30014bd")
},
{
"id" : ObjectId("4f2970e9e815f825a30014c1")
},
{
"id" : ObjectId("4f2970e9e815f825a30014ca")
},
// ... and it goes on and on
]
}
}
Actually, it would be even better if value could contain only the array, but MongoDB complains that it does not support that yet. A finalize function can take care of that later, so that is not the problem I want to ask about.
Now, back to the problem. What actually happens when I run the map reduce is that it spits out something like:
{
"_id" : "category_name",
"value" : {
"ads" : [
{
"ads" : [
{
"ads" : [
{
"ads" : [
{
"ads" : [
{
"id" : ObjectId("4f2970d8e815f825a3000011")
},
{
"id" : ObjectId("4f2970d8e815f825a3000017")
},
{
"id" : ObjectId("4f2970d8e815f825a3000019")
},
{
"id" : ObjectId("4f2970d8e815f825a3000022")
},
// ... on and on and on
... and while I could probably work out a way to use this it just doesn't look like something I should get.
So, my questions (finally) are:
Am I doing something wrong and what is it?
Is there something wrong with MongoDB map reduce (I mean besides all the usual things when compared to Hadoop)?

Yes, you're doing it wrong. The inputs and outputs of map and reduce should be uniform: the value returned by reduce must have the same shape as the values emitted by map, because they are meant to be executed in parallel and reduce might be run over partially reduced results. Try these functions:
var map = function() {
emit(this.category, {ads: [this._id]});
};
var reduce = function(key, values) {
var result = {ads: []};
values.forEach(function(v) {
v.ads.forEach(function(a) {
result.ads.push(a)
});
});
return result;
}
This should produce documents like:
{_id: category, value: {ads: [ObjectId("4f2970d8e815f825a3000011"),
ObjectId("4f2970d8e815f825a3000019"),
...]}}
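The re-reduce requirement can be checked outside the database. The following is a minimal sketch in plain JavaScript, not MongoDB's actual execution path: reduceAds repeats the reduce function above under a different name, the category name "cars" is made up, and strings stand in for ObjectId values so it runs without the mongo shell. It shows that reducing everything in one pass gives the same result as reducing two partial batches and then reducing those results again.

```javascript
// Same logic as the reduce function above, usable outside the mongo shell.
function reduceAds(key, values) {
  var result = { ads: [] };
  values.forEach(function (v) {
    v.ads.forEach(function (a) {
      result.ads.push(a);
    });
  });
  return result;
}

// Four values shaped like the ones map emits for a single category.
// The strings are placeholders for ObjectId values (assumption).
var emitted = [{ ads: ["a1"] }, { ads: ["a2"] }, { ads: ["a3"] }, { ads: ["a4"] }];

// One pass over all emitted values.
var onePass = reduceAds("cars", emitted);

// Two partial passes, followed by a re-reduce of the partial results.
var partial1 = reduceAds("cars", emitted.slice(0, 2));
var partial2 = reduceAds("cars", emitted.slice(2));
var reReduced = reduceAds("cars", [partial1, partial2]);

console.log(JSON.stringify(onePass) === JSON.stringify(reReduced)); // true
```

With the reduce function from the question, the second call would wrap the partial results in another ads array, which is where the nesting comes from.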

Related

MongoDB - change values from one ENUM type to another

I have MongoDB entries which looks like this:
{
"_id" : ObjectId("57288862e4b05f37bc6ab91b"),
"_class" : "mydomain.ScheduleAbsenceContainer",
"containerStart" : ISODate("2016-04-06T07:30:00Z"),
"containerEnd" : ISODate("2016-04-06T10:00:00Z"),
"scheduleIntervalContainerAbsenceType" : "SCHOOL",
"scheduleIntervalContainers" : [
{
"_id" : null,
"marker" : 6,
"containerType" : "SCHOOL",
}
]
}
and I want to change every scheduleIntervalContainerAbsenceType from SCHOOL to SPARE_TIME, and also every containerType from SCHOOL to SPARE_TIME.
Is there a simple way to do this?
The code below does what you want. It updates all the documents that have the value "SCHOOL" for the "scheduleIntervalContainerAbsenceType" key.
db.collection_name.find({"scheduleIntervalContainerAbsenceType" : "SCHOOL"})
.forEach(function (doc) {
doc.scheduleIntervalContainers.forEach(function (sch) {
if (sch.containerType === "SCHOOL") {
sch.containerType="SPARE_TIME";
}
});
doc.scheduleIntervalContainerAbsenceType="SPARE_TIME";
db.collection_name.save(doc);
});
If you want to update all the documents without checking the "scheduleIntervalContainerAbsenceType" value (still setting it to "SPARE_TIME"), change your query like this:
db.collection_name.find({})
.forEach(function (doc) {
doc.scheduleIntervalContainers.forEach(function (sch) {
if (sch.containerType === "SCHOOL") {
sch.containerType="SPARE_TIME";
}
});
doc.scheduleIntervalContainerAbsenceType="SPARE_TIME";
db.collection_name.save(doc);
});
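The per-document logic in both loops can also be pulled out into a small function. This is only a sketch: retypeContainer is a hypothetical helper name, and unlike the second loop above it changes the top-level field only when it currently equals the old value. It also tolerates documents that have no scheduleIntervalContainers array, which the loops above assume is always present.

```javascript
// Hypothetical helper: rewrites one document in memory and
// returns true if any field was changed.
function retypeContainer(doc, from, to) {
  var changed = false;
  if (doc.scheduleIntervalContainerAbsenceType === from) {
    doc.scheduleIntervalContainerAbsenceType = to;
    changed = true;
  }
  // Fall back to an empty array when the embedded array is missing.
  (doc.scheduleIntervalContainers || []).forEach(function (sch) {
    if (sch.containerType === from) {
      sch.containerType = to;
      changed = true;
    }
  });
  return changed;
}

// Sample document trimmed from the question.
var sample = {
  scheduleIntervalContainerAbsenceType: "SCHOOL",
  scheduleIntervalContainers: [{ _id: null, marker: 6, containerType: "SCHOOL" }]
};
console.log(retypeContainer(sample, "SCHOOL", "SPARE_TIME")); // true
console.log(sample.scheduleIntervalContainers[0].containerType); // SPARE_TIME
```

Inside the forEach callback of the shell loops, the return value could be used to call save only for documents that actually changed.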

Find records with field in a nested document when parent fields are not known

With a collection with documents like below, I need to find the documents where a particular field - eg. lev3_field2 (in document below) is present.
I tried the following, but this doesn't return any results, though the field lev3_field2 is present in some documents.
db.getCollection('some_collection').find({"lev3_field2": { $exists: true, $ne: null } })
{
"_id" : ObjectId("5884de15bebf420cf8bb2857"),
"lev1_field1" : "139521721",
"lev1_field2" : "276183",
"lev1_field3" : {
"lev2_field1" : "4",
"lev2_field2" : {
"lev3_field1" : "1",
"lev3_field2" : {
"lev4_field1" : "1",
"lev4_field2" : "1"
},
"lev3_field3" : "5"
},
"lev2_field3" : {
"lev3_field3" : "0",
"lev3_field4" : "0"
}
}
}
update1: this is an example; in the real document the parent fields of the field to look for are not known. So instead of lev3_field2, I would be looking for `levM_fieldN`.
update2: Speed is not a primary concern for me, so somewhat slower options are fine. The primary goal is to find documents matching the criteria discussed; once a document is found and the schema is understood, the query can be rewritten for performance by including the parent keys.
To search for a key in a nested document you need to iterate over the document's fields recursively. You can do this in JavaScript with the help of the $where operator in MongoDB.
The query below checks whether a key name exists in a document or any of its subdocuments.
I have checked this with the example you have given, and it works fine.
db.getCollection('test').find({ $where: function () {
var search_key = "lev3_field2";
// Returns true if search_key is a field name anywhere in the document.
function check_key(document) {
return Object.keys(document).some(function(key) {
if ( key == search_key ) {
return true;
}
// typeof null is "object", so exclude null before recursing
var value = document[key];
return value !== null && typeof(value) == "object" && check_key(value);
});
}
return check_key(this);
}}
);
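The recursive check can be tried on its own before wrapping it in $where. Below is a minimal standalone sketch in plain JavaScript; hasKey is a hypothetical name, and the sample document is a shortened version of the one in the question with one null field added to exercise the null guard.

```javascript
// Returns true if searchKey appears as a field name at any depth of value.
function hasKey(value, searchKey) {
  // typeof null is "object", so null must be ruled out before Object.keys.
  if (value === null || typeof value !== "object") {
    return false;
  }
  return Object.keys(value).some(function (key) {
    return key === searchKey || hasKey(value[key], searchKey);
  });
}

// Shortened sample document; the null field is an added assumption.
var doc = {
  lev1_field1: "139521721",
  lev1_field3: {
    lev2_field1: "4",
    lev2_field2: {
      lev3_field1: "1",
      lev3_field2: { lev4_field1: "1" },
      lev3_field3: null
    }
  }
};

console.log(hasKey(doc, "lev3_field2")); // true
console.log(hasKey(doc, "lev9_field9")); // false
```

Arrays are handled by the same recursion, because Object.keys on an array returns its indexes.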
There is no built-in function to iterate over document keys in MongoDB, but you can achieve this with MapReduce. The main advantage is that all the code is executed directly in the MongoDB database, and not in the JS client, so there is no network overhead; hence it should be faster than client-side JS.
Here is the script:
var found;
// save a function in MongoDB to iterate over documents key and check for
// key name. Need to be done only once
db.system.js.save({
_id: 'findObjectByLabel',
value: function(obj, prop) {
Object.keys(obj).forEach(function(key) {
if (key === prop) {
found = true
}
if (!found && typeof obj[key] === 'object') {
findObjectByLabel(obj[key], prop)
}
})
}
})
// run the map reduce function
db.ex.mapReduce(
function() {
found = false;
var key = this._id
findObjectByLabel(this, 'lev3_field2')
value = found;
if (found) {
// if the document contains the key we are looking for,
// emit {_id: ..., value: true }
emit(key, value)
}
},
function(key, values) {
return values
}, {
'query': {},
'out': {inline:1}
}
)
This outputs the following (run on 4 sample documents, only one of which contains 'lev3_field2'):
{
"results" : [
{
"_id" : ObjectId("5884de15bebf420cf8bb2857"),
"value" : true
}
],
"timeMillis" : 18,
"counts" : {
"input" : 4,
"emit" : 1,
"reduce" : 0,
"output" : 1
},
"ok" : 1
}
To run the script, copy it to a file named "script.js", for example, and then run it from your shell:
mongo databaseName < script.js
It's because you're trying to see if a nested field exists. This is the query you want:
db.some_collection.find({"lev1_field3.lev2_field2.lev3_field2": { $exists: true, $ne: null } })

Retrieving value of an embedded object in mongo

Followup Question
Thanks #4J41 for your spot on resolution. Along the same lines, I'd also like to validate one other thing.
I have a mongo document that contains an array of strings, and I need to convert this array of strings into an array of objects, each containing a key-value pair. Below is my current approach to it.
Mongo Record:
Same mongo record in my initial question below.
Current Query:
templateAttributes.find({platform:"V1"}).map(function(c){
//instantiate a new array
var optionsArray = [];
for (var i=0;i< c['available']['Community']['attributes']['type']['values'].length; i++){
optionsArray[i] = {}; // creates a new object
optionsArray[i].label = c['available']['Community']['attributes']['type']['values'][i];
optionsArray[i].value = c['available']['Community']['attributes']['type']['values'][i];
}
return optionsArray;
})[0];
Result:
[{label:"well-known", value:"well-known"},
{label:"simple", value:"simple"},
{label:"complex", value:"complex"}]
Is my approach efficient enough, or is there a way to optimize the above query to get the same desired result?
Initial Question
I have a mongo document like below:
{
"_id" : ObjectId("57e3720836e36f63695a2ef2"),
"platform" : "A1",
"available" : {
"Community" : {
"attributes" : {
"type" : {
"values" : [
"well-known",
"simple",
"complex"
],
"defaultValue" : "well-known"
},
[......]
}
I'm trying to query the DB and retrieve only the value of defaultValue field.
I tried:
db.templateAttributes.find(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
)
as well as
db.templateAttributes.findOne(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
)
But they both seem to retrieve the entire object hierarchy like below:
{
"_id" : ObjectId("57e3720836e36f63695a2ef2"),
"available" : {
"Community" : {
"attributes" : {
"type" : {
"defaultValue" : "well-known"
}
}
}
}
}
The only way I could get it to work was with find and a map function, but that seems a bit convoluted.
Does anyone have a simpler way to get this result?
db.templateAttributes.find(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
).map(function(c){
return c['available']['Community']['attributes']['type']['defaultValue']
})[0]
Output
well-known
You could try the following.
Using find:
db.templateAttributes.find({ platform: "A1" }, { "available.Community.attributes.type.defaultValue": 1 }).toArray()[0]['available']['Community']['attributes']['type']['defaultValue']
Using findOne:
db.templateAttributes.findOne({ platform: "A1" }, { "available.Community.attributes.type.defaultValue": 1 })['available']['Community']['attributes']['type']['defaultValue']
Using aggregation:
db.templateAttributes.aggregate([
{"$match":{platform:"A1"}},
{"$project": {_id:0, default:"$available.Community.attributes.type.defaultValue"}}
]).toArray()[0].default
Output:
well-known
Edit: Answering the updated question: Please use aggregation here.
db.templateAttributes.aggregate([
{"$match":{platform:"A1"}}, {"$unwind": "$available.Community.attributes.type.values"},
{$group: {"_id": null, "val":{"$push":{label:"$available.Community.attributes.type.values",
value:"$available.Community.attributes.type.values"}}}}
]).toArray()[0].val
Output:
[
{
"label" : "well-known",
"value" : "well-known"
},
{
"label" : "simple",
"value" : "simple"
},
{
"label" : "complex",
"value" : "complex"
}
]
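For comparison, the same label/value conversion can be done on the client with Array.prototype.map once the document has been fetched. This is a sketch in plain JavaScript rather than a replacement for the aggregation above; toOptions is a hypothetical helper name and the document literal is copied from the question.

```javascript
// Turns an array of strings into an array of label/value pairs.
function toOptions(values) {
  return values.map(function (v) {
    return { label: v, value: v };
  });
}

// Document shaped like the one in the question (other fields omitted).
var doc = {
  platform: "A1",
  available: {
    Community: {
      attributes: {
        type: {
          values: ["well-known", "simple", "complex"],
          defaultValue: "well-known"
        }
      }
    }
  }
};

var options = toOptions(doc.available.Community.attributes.type.values);
console.log(JSON.stringify(options));
```

In the mongo shell, doc would come from findOne instead of a literal. The trade-off is that the whole values array travels to the client, which is fine for short option lists.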

MongoDB: Update a field of an item in array with matching another field of that item

I have a data structure like this:
We have some centers. A center has some switches. A switch has some ports.
{
"_id" : ObjectId("561ad881755a021904c00fb5"),
"Name" : "center1",
"Switches" : [
{
"Ports" : [
{
"PortNumber" : 2,
"Status" : "Empty"
},
{
"PortNumber" : 5,
"Status" : "Used"
},
{
"PortNumber" : 7,
"Status" : "Used"
}
]
}
]
}
All I want is to write an update query that changes the Status of the port whose PortNumber is 5 to "Empty".
I can update it when I know the array index of the port (here array index is 1) with this query:
db.colection.update(
// query
{
_id: ObjectId("561ad881755a021904c00fb5")
},
// update
{
$set : { "Switches.0.Ports.1.Status" : "Empty" }
}
);
But I don't know the array index of that Port.
Thanks for help.
You would normally do this using the positional operator $, as described in the answer to this question:
Update field in exact element array in MongoDB
Unfortunately, right now the positional operator only supports one array level deep of matching.
There is a JIRA ticket for the sort of behavior that you want: https://jira.mongodb.org/browse/SERVER-831
In case you can make Switches into an object instead, you could do something like this:
db.colection.update(
{
_id: ObjectId("561ad881755a021904c00fb5"),
"Switch.Ports.PortNumber": 5
},
{
$set: {
"Switch.Ports.$.Status": "Empty"
}
}
)
Since you don't know the array index of the port, I would suggest building the $set keys dynamically, i.e. first finding the indexes of the matching objects and then updating accordingly. Consider using MapReduce for this.
Currently this does not seem to be possible with the aggregation framework; there is an unresolved open JIRA issue linked to it. However, a workaround is possible with MapReduce. MapReduce uses JavaScript as its query language, which tends to be considerably slower than the aggregation framework, so it should not be used for real-time data analysis.
A MapReduce operation consists of two steps: the map step (which applies an operation to every document in the collection; the operation can either do nothing or emit an object with a key and projected values) and the reduce step (which takes the list of values emitted for a key and reduces it to a single element).
For the map step, you ideally want to emit, for every document in the collection, the index of each element in the Switches and Ports arrays, together with another field that contains the $set keys.
Your reduce step would be a function that does nothing, simply defined as var reduce = function() {}; it is never actually called here, because every emitted key is unique.
The MapReduce operation then writes a separate collection, switches, that contains the emitted Switches array object along with a field holding the $set keys. This collection can be refreshed periodically by re-running the MapReduce operation on the original collection.
Altogether, this MapReduce method would look like:
var map = function(){
for(var i = 0; i < this.Switches.length; i++){
for(var j = 0; j < this.Switches[i].Ports.length; j++){
emit(
{
"_id": this._id,
"switch_index": i,
"port_index": j
},
{
"index": j,
"Switches": this.Switches[i],
"Port": this.Switches[i].Ports[j],
"update": {
"PortNumber": "Switches." + i.toString() + ".Ports." + j.toString() + ".PortNumber",
"Status": "Switches." + i.toString() + ".Ports." + j.toString() + ".Status"
}
}
);
}
}
};
var reduce = function(){};
db.centers.mapReduce(
map,
reduce,
{
"out": {
"replace": "switches"
}
}
);
Querying the output collection switches from the MapReduce operation will typically give you a result like this:
db.switches.findOne()
Sample Output:
{
"_id" : {
"_id" : ObjectId("561ad881755a021904c00fb5"),
"switch_index" : 0,
"port_index" : 1
},
"value" : {
"index" : 1,
"Switches" : {
"Ports" : [
{
"PortNumber" : 2,
"Status" : "Empty"
},
{
"PortNumber" : 5,
"Status" : "Used"
},
{
"PortNumber" : 7,
"Status" : "Used"
}
]
},
"Port" : {
"PortNumber" : 5,
"Status" : "Used"
},
"update" : {
"PortNumber" : "Switches.0.Ports.1.PortNumber",
"Status" : "Switches.0.Ports.1.Status"
}
}
}
You can then use the cursor from the db.switches.find() method to iterate over and update your collection accordingly:
var newStatus = "Empty";
var cur = db.switches.find({ "value.Port.PortNumber": 5 });
// Iterate through results and update using the update query object set dynamically by using the array-index syntax.
while (cur.hasNext()) {
var doc = cur.next();
var update = { "$set": {} };
// set the update query object
update["$set"][doc.value.update.Status] = newStatus;
db.centers.update(
{
"_id": doc._id._id,
"Switches.Ports.PortNumber": 5
},
update
);
};
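If the center document can be read first, the array indexes can also be found on the client without an intermediate collection. The sketch below is a swapped-in alternative, not the MapReduce approach described above; buildStatusUpdate is a hypothetical helper name and the document literal is copied from the question without its _id.

```javascript
// Hypothetical helper: builds a $set document that targets every port
// whose PortNumber matches, using explicit array indexes.
function buildStatusUpdate(center, portNumber, newStatus) {
  var set = {};
  center.Switches.forEach(function (sw, i) {
    sw.Ports.forEach(function (port, j) {
      if (port.PortNumber === portNumber) {
        set["Switches." + i + ".Ports." + j + ".Status"] = newStatus;
      }
    });
  });
  // Return null when nothing matches, so the caller can skip the update.
  return Object.keys(set).length > 0 ? { $set: set } : null;
}

var center = {
  Name: "center1",
  Switches: [{
    Ports: [
      { PortNumber: 2, Status: "Empty" },
      { PortNumber: 5, Status: "Used" },
      { PortNumber: 7, Status: "Used" }
    ]
  }]
};

console.log(JSON.stringify(buildStatusUpdate(center, 5, "Empty")));
// {"$set":{"Switches.0.Ports.1.Status":"Empty"}}
```

The returned object can be passed as the second argument to update. Because this is a read followed by a write, the indexes may be stale if the arrays change in between, so keeping the PortNumber condition in the update filter, as in the loop above, remains a reasonable safeguard.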

Help with map reduce in MongoDB

I'm struggling to get a firm grasp of how map reduce works and when to use it. I'm getting some random results that just aren't making sense, but maybe my understanding of map reduce is wrong?
Here's an example of what I am doing.
I have a collection of over 15000 UK towns with the following structure:
{
"_id" : ObjectId("4e234105e138231a7f000004"),
"county" : "Powys",
"name" : "Abbey-Cwmhir",
"location" : {
"latitude" : 52.3298355191946,
"longitude" : -3.39230306446552
}
}
Each county has many towns, and I would like to get a new collection with the following structure for each county;
{
"_id" : "Powys",
"towns" : [
{
"name" : "Abbey-Cwmhir",
"loc" : [52.3298355191946, -3.39230306446552]
},
//.. etc.
]
}
So, I guess map reduce is an ideal candidate for this, right? If it is, what would be the correct map and reduce functions?
As a starting point you could use something like this:
Map function
function() {
emit( this.county,{
towns: [
{
name: this.name,
loc: this.location
}
]
} );
}
Reduce function
function(key, values) {
var result = { towns: [] };
values.forEach(
function( townsgroup ) {
townsgroup.towns.forEach(
function( town ) {
result.towns.push( town );
});
});
return result;
}
Thank you dcrosta for the correction.
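The map function above emits loc as the original location subdocument, while the question asks for a [latitude, longitude] array. A minimal sketch of that variation follows, written as plain JavaScript so it can be run outside the database: mapTown stands in for map and returns what emit would receive, reduceTowns repeats the reduce logic, and the second town is made up for illustration. Note that mapReduce output documents have the form { _id, value }, so the towns array ends up under value rather than at the top level.

```javascript
// Stand-in for map: returns the key/value pair that emit() would receive.
function mapTown(town) {
  return {
    key: town.county,
    value: {
      towns: [{
        name: town.name,
        // Convert the location subdocument into a [latitude, longitude] array.
        loc: [town.location.latitude, town.location.longitude]
      }]
    }
  };
}

// Same logic as the reduce function above.
function reduceTowns(key, values) {
  var result = { towns: [] };
  values.forEach(function (group) {
    group.towns.forEach(function (town) {
      result.towns.push(town);
    });
  });
  return result;
}

var towns = [
  { county: "Powys", name: "Abbey-Cwmhir",
    location: { latitude: 52.3298355191946, longitude: -3.39230306446552 } },
  // Made-up second entry, for illustration only.
  { county: "Powys", name: "Example Town",
    location: { latitude: 52.0, longitude: -3.0 } }
];

var emittedValues = towns.map(function (t) { return mapTown(t).value; });
var powys = reduceTowns("Powys", emittedValues);
console.log(JSON.stringify(powys));
```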