Problem
I have a document with an _id and a collection of Answers. I am trying to write a map-reduce function to sum the total score of the answers for each id.
Document
/* 0 */
{
    "_id" : ObjectId("527b6ba88d251d58a18f3f0a"),
    "Answers" : [{
        "Score" : 2
    }, {
        "Score" : 0
    }, {
        "Score" : 2
    }, {
        "Score" : 2
    }]
}
Here is the map-reduce I thought would be correct after reading the documentation.
Map
function() {
    this.Answers.forEach(function(val)
    {
        emit(this._id, val.Score);
    });
}
I also tried this:
function() {
    for (var i = 0; i < this.Answers; i++)
    {
        emit(this._id, this.Answers[i].Score);
    }
}
Reduce
function(key, values)
{
    return Array.sum(values);
}
I am getting no information back from this, but it does appear to be processing; it takes 2-5 seconds to return. I guess I am not understanding something about map-reduce.
Also I am using MongoVUE to access MongoDB.
EDIT
I just ran my map-reduce through the console and got this output:
{
    "results" : [ ],
    "timeMillis" : 2506,
    "counts" : {
        "input" : 1655,
        "emit" : 0,
        "reduce" : 0,
        "output" : 0
    },
    "ok" : 1
}
So I guess it's my map function that's incorrect, as nothing was emitted.
EDIT 2
Updated the document with output from MongoVUE.
In JavaScript loops, using the array's length property lets you iterate over exactly the number of items in the array, so you can change your second attempt to:
function() {
    for (var i = 0; i < this.Answers.length; i++)
    {
        emit(this._id, this.Answers[i].Score);
    }
}
It should also be noted that your reduce function can run multiple times per key; specifically, it can re-run for every 101 rows. Technically this shouldn't matter here, since you are summing the array values and the previous reduce result is passed back as an array element into the next reduce call, so it will still work just fine; it is good to keep in mind, though.
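To see why a summing reduce is safe under this re-reduce behavior, here is a quick sketch of the equivalence, using hypothetical values (Array.sum is a mongo shell helper):

// Reducing all values in one pass...
var reduce = function(key, values) { return Array.sum(values); };
reduce("someId", [2, 0, 2, 2]); // 6

// ...gives the same result as re-reducing partial results,
// which is what MongoDB may do for keys with many emitted values:
reduce("someId", [reduce("someId", [2, 0]), reduce("someId", [2, 2])]); // 6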
I think the 'this' variable is not what you expect inside the .forEach() callback in your map method. Try this instead:
function() {
    var row = this;
    this.Answers.forEach(function(val)
    {
        emit(row._id, val.Score);
    });
}
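For completeness, a minimal way to wire this up from the shell (a sketch; it assumes your collection is named coll, and inline output keeps the results in memory rather than writing a collection):

db.coll.mapReduce(
    // map: emit one (id, score) pair per answer
    function() {
        var row = this;
        this.Answers.forEach(function(val) {
            emit(row._id, val.Score);
        });
    },
    // reduce: sum the scores per id
    function(key, values) {
        return Array.sum(values);
    },
    { out: { inline: 1 } }
);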
Related
I have 400,000 documents in my MongoDB collection. Every document has a count (number). I want to read these documents and add up all the numbers to get a total. I fetch the collection data using Node.js and Mongoose, then calculate the total with a for loop; it takes around two minutes, and I want it to take one second. Is there any way to speed this process up? I found that map-reduce can speed it up. What is the most efficient way to do so?
I query my MongoDB model like this:
exports.getDownloads = function(processPD, processDW, responseMDW) {
    DailyDowloadsModel.find({}, function(err, foundData) {
        var select;
        if (err) {
            log.error(clientIP + " - DB Connection downloads failed - error");
            res.status(500).send();
        }
        else {
            if (foundData.length == 0) {
                var responseObject = null;
                if (select && select == 'count') {
                    responseObject = {count: 0};
                }
            } else {
                var responseObject = foundData;
                if (select && select == "count") {
                    responseObject = {count: foundData.length};
                }
                processPD(processDW, responseObject, responseMDW);
            }
        }
    });
}
Sample document:
{
    "_id" : ObjectId("5719ef37264f87331a3d0c54"),
    "refunds" : "0",
    "downloads" : "6",
    "country" : "CA",
    "date" : "2013-09-06",
    "product_id" : "20600001319328",
    "__v" : 0
}
I want to calculate total downloads.
You have two options to calculate the total.
OPTION 1: aggregation framework
Performing such actions with the aggregation framework will be much faster than sending all documents to the client and doing the math there.
Note: your downloads field is a string; it should be a number, since $sum ignores non-numeric values.
db.collection.aggregate([
    {$group: {_id: null, total: {$sum: "$downloads"}}}
])
On my machine (a MacBook Pro), it returns the total in under half a second, running the test on 400,000 documents.
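If migrating the field isn't possible and you are on MongoDB 4.0 or newer, a variation of the same pipeline can convert on the fly (a sketch using the $toInt operator; $toInt requires 4.0+):

db.collection.aggregate([
    // $toInt converts the string field before summing;
    // without a conversion, $sum skips the string values and returns 0.
    {$group: {_id: null, total: {$sum: {$toInt: "$downloads"}}}}
])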
OPTION 2: map reduce
It is highly recommended to update your document structure to store downloads as a number. However, if that is not an option for whatever reason, your best bet is the map-reduce functionality offered by MongoDB.
var map = function() {
    emit(1, parseInt(this.downloads));
};
var reduce = function(key, values) {
    var reducedValue = Array.sum(values);
    return reducedValue;
};
db.collection.mapReduce(map, reduce, {
    out: { "inline" : 1 }
});
Map-reduce is slower than the aggregation framework, as you can see, but it is much faster than your original approach. It emits output like this:
{
    "results" : [
        {
            "_id" : NumberInt(1),
            "value" : NumberInt(2400000)
        }
    ],
    "timeMillis" : NumberInt(4112),
    "counts" : {
        "input" : NumberInt(400000),
        "emit" : NumberInt(400000),
        "reduce" : NumberInt(4000),
        "output" : NumberInt(1)
    },
    "ok" : NumberInt(1)
}
As you can see, it took roughly 4 seconds to complete the operation.
Use MongoDB aggregation:
db.DailyDowloadsModel.aggregate([{$group:{_id:null, totalDownloads:{$sum:"$downloads"}}}]);
But before that, index the downloads field with this command:
db.DailyDowloadsModel.createIndex( { downloads: 1 });
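If you do decide to store downloads as a number, a one-off migration along these lines should work (a sketch for the mongo shell, assuming every downloads value parses cleanly as an integer):

// BSON type 2 is "string"; on MongoDB 3.2+ you could write { $type: "string" } instead.
db.DailyDowloadsModel.find({ downloads: { $type: 2 } }).forEach(function(doc) {
    // Rewrite each string value as a number, in place.
    db.DailyDowloadsModel.update(
        { _id: doc._id },
        { $set: { downloads: parseInt(doc.downloads, 10) } }
    );
});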
I have a data structure like this:
We have some centers. A center has some switches. A switch has some ports.
{
    "_id" : ObjectId("561ad881755a021904c00fb5"),
    "Name" : "center1",
    "Switches" : [
        {
            "Ports" : [
                {
                    "PortNumber" : 2,
                    "Status" : "Empty"
                },
                {
                    "PortNumber" : 5,
                    "Status" : "Used"
                },
                {
                    "PortNumber" : 7,
                    "Status" : "Used"
                }
            ]
        }
    ]
}
All I want is to write an update query to change the Status of the port whose PortNumber is 5 to "Empty".
I can update it when I know the array index of the port (here the array index is 1) with this query:
db.collection.update(
    // query
    {
        _id: ObjectId("561ad881755a021904c00fb5")
    },
    // update
    {
        $set : { "Switches.0.Ports.1.Status" : "Empty" }
    }
);
But I don't know the array index of that Port.
Thanks for the help.
You would normally do this using the positional operator $, as described in the answer to this question:
Update field in exact element array in MongoDB
Unfortunately, the positional operator currently only supports matching one array level deep.
There is a JIRA ticket for the sort of behavior that you want: https://jira.mongodb.org/browse/SERVER-831
If you can make Switches an object instead of an array, you could do something like this:
db.collection.update(
    {
        _id: ObjectId("561ad881755a021904c00fb5"),
        "Switch.Ports.PortNumber": 5
    },
    {
        $set: {
            "Switch.Ports.$.Status": "Empty"
        }
    }
)
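For reference, if you are on MongoDB 3.6 or newer, the arrayFilters option together with the filtered positional operator $[<identifier>] covers this case even with Switches kept as an array (a sketch, assuming 3.6+):

db.collection.update(
    { _id: ObjectId("561ad881755a021904c00fb5") },
    // $[] walks every switch; $[p] matches only ports where p.PortNumber is 5.
    { $set: { "Switches.$[].Ports.$[p].Status": "Empty" } },
    { arrayFilters: [ { "p.PortNumber": 5 } ] }
)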
Since you don't know the array index of the Port, I would suggest you dynamically create the $set conditions on the fly, i.e. build something that gives you the indexes of the objects so you can modify them accordingly; for that, consider using MapReduce.
Currently this seems to be impossible with the aggregation framework; there is an unresolved open JIRA issue linked to it. However, a workaround is possible with MapReduce. The basic idea of MapReduce is that it uses JavaScript as its query language, but this tends to be rather slower than the aggregation framework and should not be used for real-time data analysis.
In your MapReduce operation, you need to define a couple of steps: the map step (which applies an operation to every document in the collection; the operation can either do nothing or emit some object with keys and projected values) and the reduce step (which takes the list of emitted values and reduces it to a single element).
For the map step, you ideally want to emit, for every document in the collection, the indexes of each Switches and Ports array entry, together with another key that contains the $set keys.
Your reduce step would be a function that does nothing, simply defined as var reduce = function() {};.
The final step in your MapReduce operation then creates a separate collection switches that contains the emitted Switches array objects along with a field holding the $set conditions. This collection can be refreshed periodically by re-running the MapReduce operation on the original collection.
Altogether, this MapReduce method would look like:
var map = function() {
    for (var i = 0; i < this.Switches.length; i++) {
        for (var j = 0; j < this.Switches[i].Ports.length; j++) {
            emit(
                {
                    "_id": this._id,
                    "switch_index": i,
                    "port_index": j
                },
                {
                    "index": j,
                    "Switches": this.Switches[i],
                    "Port": this.Switches[i].Ports[j],
                    "update": {
                        "PortNumber": "Switches." + i.toString() + ".Ports." + j.toString() + ".PortNumber",
                        "Status": "Switches." + i.toString() + ".Ports." + j.toString() + ".Status"
                    }
                }
            );
        }
    }
};
var reduce = function(){};
db.centers.mapReduce(
    map,
    reduce,
    {
        "out": {
            "replace": "switches"
        }
    }
);
Querying the output collection switches from the MapReduce operation will typically give you a result like this:
db.switches.findOne()
Sample Output:
{
    "_id" : {
        "_id" : ObjectId("561ad881755a021904c00fb5"),
        "switch_index" : 0,
        "port_index" : 1
    },
    "value" : {
        "index" : 1,
        "Switches" : {
            "Ports" : [
                {
                    "PortNumber" : 2,
                    "Status" : "Empty"
                },
                {
                    "PortNumber" : 5,
                    "Status" : "Used"
                },
                {
                    "PortNumber" : 7,
                    "Status" : "Used"
                }
            ]
        },
        "Port" : {
            "PortNumber" : 5,
            "Status" : "Used"
        },
        "update" : {
            "PortNumber" : "Switches.0.Ports.1.PortNumber",
            "Status" : "Switches.0.Ports.1.Status"
        }
    }
}
You can then iterate the cursor from the db.switches.find() method and update your original collection accordingly:
var newStatus = "Empty";
var cur = db.switches.find({ "value.Port.PortNumber": 5 });
// Iterate through the results and update, using the update query object
// that was built dynamically with the array-index syntax.
while (cur.hasNext()) {
    var doc = cur.next();
    var update = { "$set": {} };
    // set the update query object
    update["$set"][doc.value.update.Status] = newStatus;
    db.centers.update(
        {
            "_id": doc._id._id,
            "Switches.Ports.PortNumber": 5
        },
        update
    );
}
I use the following map/reduce setup to collect some data into an array:
map: function() { emit(this.key, [this.item]); },
reduce: function(key, values) {
    var items = [];
    // each emitted value is an array, so concatenate it onto the accumulator
    values.forEach(function(value) { items = items.concat(value); });
    return items;
},
out: {reduce: "result_collection"}
I want to improve the code and detect whether the resulting collection has been changed during the re-reduce stage (when mongo invokes reduce with the current content of "result_collection").
In other words, how can I know whether any of the documents emitted by the map contain an "item" that does not yet exist in "result_collection" (under the same key, of course)?
This information could help at further processing stages, e.g. querying "result_collection" to get the documents that were updated during the map/reduce stage.
If you must do this, use a finalize function to adjust the value after all reduction is finished. You'll have to add more logic to the reduce function to handle the modified output.
I'll show you an example with the simple map-reduce defined by the following map and reduce functions:
var map = function() { emit(this.k, this.v) }
var reduce = function(key, values) { return Array.sum(values) }
On documents that look like { "k" : 0, "v" : 1 }, the map-reduce defined by the above functions produces result documents that look like { "_id" : 0, "value" : 17 }. Define a finalize function to modify the final document:
var finalize = function (key, reducedValue) { return { "m" : true, "v" : reducedValue } }
Now modify reduce to handle an element of values that might be an object of the above form:
var reduce2 = function(key, values) {
    var sum = 0;
    for (var i = 0; i < values.length; i++) {
        if (typeof values[i] == "object") { sum += values[i].v; }
        else { sum += values[i]; }
    }
    return sum;
}
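Wired together, the full run would look something like this (a sketch; coll and the results output collection are hypothetical names):

db.coll.mapReduce(
    map,
    reduce2,
    {
        finalize: finalize,        // wraps each final value as { m: true, v: ... }
        out: { reduce: "results" } // re-reduces into the existing output collection
    }
);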
The output looks like:
{ "_id" : 0, "value" : { "m" : true, "v" : 14 } }
{ "_id" : 1, "value" : { "m" : true, "v" : 34 } }
{ "_id" : 2, "value" : { "m" : true, "v" : 8 } }
so you can tell what's been modified by value.m. Your further processing can set value.m to false, so after each map-reduce you can see what hasn't been processed yet.
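A sketch of that post-processing step, once you have handled the modified documents:

// Mark every processed result so the next map-reduce pass can tell old from new.
db.result_collection.update(
    { "value.m": true },
    { $set: { "value.m": false } },
    { multi: true }
);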
I'm attempting to create my own todo list using JavaScript, Python, and MongoDB. I'm getting stuck on how to handle the task ordering.
My current idea is to have an order field in each task document, and when the order changes on the client, I would grab the task list from the db and reorder each task individually/sequentially. This seems awkward because large todo lists would mean a large number of queries. Is there a way to update a field in multiple documents sequentially?
I'm also looking for advice as to whether this is the best approach. I want to maintain the todo list order, but maybe I'm going about it the wrong way.
{
    "_id" : ObjectId("50a658f2cace55034c68ce95"),
    "order" : 1,
    "title" : "task1",
    "complete" : 0
}
{
    "_id" : ObjectId("50a658fecace55034c68ce96"),
    "order" : 2,
    "title" : "task2",
    "complete" : 1
}
{
    "_id" : ObjectId("50a65907cace55034c68ce97"),
    "order" : 3,
    "title" : "task3",
    "complete" : 1
}
{
    "_id" : ObjectId("50a65911cace55034c68ce98"),
    "order" : 4,
    "title" : "task4",
    "complete" : 0
}
{
    "_id" : ObjectId("50a65919cace55034c68ce99"),
    "order" : 5,
    "title" : "task5",
    "complete" : 0
}
Mongo is very fast with queries; you should not be as concerned about performance as you would be with a full-featured relational database. If you want to be prudent, just create a todo list of 1k items and try it out; it should be pretty much instant.
for (var i = 0; i < orderedListOfIds.length; i++)
{
    db.collection.update({ '_id': orderedListOfIds[i] }, { $set: { order: i } })
}
Then read them back in order:
db.collection.find( { } ).sort( { order: 1 } )
Yes, mongo allows for updating multiple documents. Just use a modifier operation and multi=True. For example, this increments order by one for all documents with order greater than five:
todos.update({'order':{'$gt':5}}, {'$inc':{'order':1}}, multi=True)
As for the best way, it's usually better to use a "natural" ordering (by name, date, priority, etc.) rather than create an artificial field just for that.
I'm doing something similar. I added an ind field to my list items. Here's how I move a list item to a new location:
moveItem: function (sourceIndex, targetIndex) {
    var id = Items.findOne({ind: sourceIndex})._id;
    var movinUp = targetIndex > sourceIndex;
    var shift = movinUp ? -1 : 1;
    var lowerIndex = Math.min(sourceIndex, targetIndex);
    lowerIndex += movinUp ? 1 : 0;
    var upperIndex = Math.max(sourceIndex, targetIndex);
    upperIndex -= movinUp ? 0 : 1;
    console.log("Shifting items from " + lowerIndex + " to " + upperIndex + " by " + shift + ".");
    Items.update({ind: {$gte: lowerIndex, $lte: upperIndex}}, {$inc: {ind: shift}}, {multi: true});
    Items.update(id, {$set: {ind: targetIndex}});
}
If you're using native ES6 promises in Mongoose (mongoose.Promise = global.Promise), you can do the following to batch the updates:
function batchUpdate(req, res, next) {
    let ids = req.body.ids
    let items = []
    for (let i = 0; i < ids.length; i++)
        items.push(db.collection.findOneAndUpdate({ _id: ids[i] }, { $set: { order: i } }))
    Promise.all(items)
        .then(() => res.status(200).send())
        .catch(next)
}
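Note that this issues one round trip per id. If you're on Mongoose 4.9+ (MongoDB 3.2+), a single bulkWrite call batches everything into one round trip (a sketch; Item is a hypothetical model name):

function batchUpdate(req, res, next) {
    // Build one updateOne operation per id, preserving the requested order.
    const ops = req.body.ids.map((id, i) => ({
        updateOne: { filter: { _id: id }, update: { $set: { order: i } } }
    }));
    Item.bulkWrite(ops)
        .then(() => res.status(200).send())
        .catch(next);
}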
If I have a collection with thousands of elements, is there a way I can easily find which elements are taking up the most space (in terms of MB)?
There's no built-in query for this; you have to iterate the collection, gather the size of each document, and sort afterwards. Here's how it'd work:
var cursor = db.coll.find();
var doc_size = {};
cursor.forEach(function (x) {
    var size = Object.bsonsize(x);
    doc_size[x._id] = size;
});
At this point you'll have a hashmap with document ids as keys and their sizes as values.
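From there, a plain client-side sort over that map gives you the heaviest documents (a sketch):

// Turn the id -> size map into an array and order it by size, descending.
var pairs = Object.keys(doc_size).map(function (id) {
    return { _id: id, size: doc_size[id] };
});
pairs.sort(function (a, b) { return b.size - a.size; });
printjson(pairs.slice(0, 5)); // the five largest documents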
Note that with this approach you will be fetching the entire collection over the wire. An alternative is to use MapReduce and do this server-side (inside mongo):
> function mapper() {emit(this._id, Object.bsonsize(this));}
> function reducer(obj, size_in_b) { return { id : obj, size : size_in_b}; }
>
> var results = db.coll.mapReduce(mapper, reducer, {out : {inline : 1 }}).results
> results.sort(function(r1, r2) { return r2.value - r1.value; })
inline: 1 tells mongo not to create a temporary collection for the results; everything will be kept in RAM.
And a sample output from one of my collections:
[
    {
        "_id" : ObjectId("4ce9339942a812be22560634"),
        "value" : 1156115
    },
    {
        "_id" : ObjectId("4ce9340442a812be24560634"),
        "value" : 913413
    },
    {
        "_id" : ObjectId("4ce9340642a812be26560634"),
        "value" : 866833
    },
    {
        "_id" : ObjectId("4ce9340842a812be28560634"),
        "value" : 483614
    },
    ...
    {
        "_id" : ObjectId("4ce9340742a812be27560634"),
        "value" : 61268
    }
]
Figured this out! I did it in two steps using Object.bsonsize():
db.myCollection.find().forEach(function(myObject) {
    db.objectSizes.save({object_id: myObject._id, size: Object.bsonsize(myObject)});
});
db.objectSizes.find().sort({size: -1}).limit(5).pretty();