Map Reduce Mongo DB: Sum of ODD and EVEN numbers with elements - mongodb

I am trying to process a number series ( a collection ) and get the sum of the odd and even numbers separately, along with the elements that went into each sum.
The numberseries document structure is as follows:
{
_id: <Autogenerated>,
number: <any number, it can repeat. Even if it repeats, it should be added each time. >
}
The output is something like below( not exact but in general )
{
..
{
"odd":<result>, elements:{n1,n3,n5}
},
{
"even":<result>, elements:{n2,n4,n6}
}
..
}
Map Function:
mapf = function(){
var value = { sum : 0, elements :[] };
value.sum = this.number;
value.elements.push(this.number);
print(tojson(value));
if( this.number % 2 != 0 ){
emit( "odd", value );
}
if( this.number % 2 == 0 ){
emit( "even", value );
}
}
Reduce Values argument:
Values is an array of JSON emitted from map:
[{
"sum": 1,
"elements": [1]
}, {
"sum": 3,
"elements": [3]
} ... ]
Reduce Function:
reducef = function(key, values){
var result = { sum : 0 , elements:[] };
print("K " + key +"Values array " + tojson(values) );
for(var i = 0; i<values.length;i++ ){
v = values[i];
print("Key "+key+"V.JSON"+tojson(v)+" V.SUM -> "+v.sum);
result.sum += v.sum;
result.elements.push(v.elements[0]);
print(tojson(result));
}
return result;
}
I am getting the sum correctly, but the elements array is not populated properly. It contains only some of the elements considered in the calculation.
UPDATE
As per the answer given by Neil, I further verified my code. I found that my code, without any modification, works for a small dataset but does not work for a large one.
Below are the points I verified as suggested; as far as I can tell, my code is correct.
print("K " + key +"Values array " + tojson(values) );
Above line in reduce function results in following values object printed.
[{
"sum": 1,
"elements": [1]
}, {
"sum": 3,
"elements": [3]
}, {
"sum": 5,
"elements": [5]
}, {
"sum": 7,
"elements": [7]
}, {
"sum": 9,
"elements": [9]
}, {
"sum": 11,
"elements": [11]
}, {
"sum": 13,
"elements": [13]
}, {
"sum": 15,
"elements": [15]
}, {
"sum": 17,
"elements": [17]
}, {
"sum": 19,
"elements": [19]
}]
Hence the line to push elements to array in final results result.elements.push(v.elements[0]); should be correct.
In map function, before emitting, I am modifying value.sum as follows
value.sum = this.number;
This ensures that sum is not zero and numbers are properly getting added due to this.
When I test this code with 20 records, 40 records, 100 records, it works perfectly.
When I test this code with 20000 records, the sum value is correct but the element array
does not contain 10000 elements each( Odd and even numbers are equally distributed in collection ) .
In the latter case, I get the message below:
query not recording (too large)

Okay, there is a clear reason and you do appear to have read some of the documentation and at least applied this rule:
"the type of the return object must be identical to the type of the value emitted by the map function ..."
And by that this means that both the map function and the reduce function essentially have the same output, which you did:
{ sum : 0, elements :[] };
But there was a piece of documentation that has not been understood:
"MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key."
So where the whole thing goes wrong is that you have assumed that since your "map" function only emits one element, that then there will be only one element in the "elements" array. A careful re-read of the above says that this is not true. And in fact the output from "reduce" will very likely be fed back into the "reduce" function again. This is indeed how mapReduce deals with a large number of values for the "values" array.
To fix it, change this in the "reduce" function:
result.elements.push(v.elements[0]);
To this:
v.elements.forEach(function(element) {
result.elements.push(element);
});
And in that way, when the "reduce" function returns a result that has summed up a few "elements" already and pushed them to the list, then that "input" will be processed correctly and merged with any other "values" that come in with it.
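The difference can be seen without a MongoDB server. The following is a minimal stand-alone sketch in plain JavaScript; the `reduceInBatches` helper is hypothetical and only imitates an engine that feeds a partial result back into "reduce", and the batch size of 2 is an arbitrary assumption rather than anything MongoDB documents.

```javascript
// Reducer as written in the question: keeps only the first element of each value.
function buggyReduce(key, values) {
  var result = { sum: 0, elements: [] };
  values.forEach(function (v) {
    result.sum += v.sum;
    result.elements.push(v.elements[0]);
  });
  return result;
}

// Reducer with the fix: merges every element of each value.
function fixedReduce(key, values) {
  var result = { sum: 0, elements: [] };
  values.forEach(function (v) {
    result.sum += v.sum;
    v.elements.forEach(function (element) {
      result.elements.push(element);
    });
  });
  return result;
}

// Hypothetical helper: reduce in small batches, passing the previous
// output back in as one of the input values for the next call.
function reduceInBatches(reduceFn, key, values, batchSize) {
  var partial = null;
  for (var i = 0; i < values.length; i += batchSize) {
    var batch = values.slice(i, i + batchSize);
    if (partial !== null) {
      batch.unshift(partial);
    }
    partial = reduceFn(key, batch);
  }
  return partial;
}

// Values shaped like the ones emitted by the map function.
var emitted = [1, 3, 5, 7, 9, 11].map(function (n) {
  return { sum: n, elements: [n] };
});

var buggy = reduceInBatches(buggyReduce, "odd", emitted, 2);
var fixed = reduceInBatches(fixedReduce, "odd", emitted, 2);

console.log(JSON.stringify(buggy)); // {"sum":36,"elements":[1,9,11]}
console.log(JSON.stringify(fixed)); // {"sum":36,"elements":[1,3,5,7,9,11]}
```

Both reducers agree on the sum, which matches the symptom in the question: the total is right, but the buggy version drops elements each time a partial result is reduced again.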
BTW, I think you actually meant this in your mapper:
var value = { sum : 1, elements :[] };
Otherwise this code down here would just be summing 0's:
result.sum += v.sum;
But aggregate does this better
All of that said the following aggregation framework statement does the same thing but better and faster with an implementation in native code:
db.collection.aggregate([
{ "$project": {
"type": { "$cond": [
{ "$eq": [ { "$mod": [ "$number", 2 ] }, 0 ] },
"even",
"odd"
]},
"number": 1
}},
{ "$group": {
"_id": "$type",
"sum": { "$sum": 1 },
"elements": { "$push": "$number" }
}}
])
And also note that in both cases you are not really "summing the elements", but rather "counting" them. So if you want the sum then the mapReduce part becomes:
//result.sum += v.sum;
v.elements.forEach(function(element) {
result.sum += element;
result.elements.push(element);
});
And the aggregate part becomes:
{ "$group": {
"_id": "$type",
"sum": { "$sum": "$number" },
"elements": { "$push": "$number" }
}}
Which truly sums the "odd" or "even" numbers as found in your collection.
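For readers who want to check the expected output by hand, here is a rough plain-JavaScript analogue of what the $project and $group stages compute. It is only an illustration: `sumByParity` is a made-up name, and the input array stands in for the "number" field of each document.

```javascript
// Illustrative analogue of the pipeline; not how MongoDB executes it.
function sumByParity(numbers) {
  var groups = {};
  numbers.forEach(function (n) {
    // Same test as the $cond / $mod expression in the $project stage.
    var type = n % 2 === 0 ? "even" : "odd";
    if (!groups[type]) {
      groups[type] = { _id: type, sum: 0, elements: [] };
    }
    groups[type].sum += n;         // { "$sum": "$number" }
    groups[type].elements.push(n); // { "$push": "$number" }
  });
  return groups;
}

// Repeated numbers are added each time, as the question requires.
var grouped = sumByParity([1, 2, 3, 4, 5, 2]);
console.log(JSON.stringify(grouped.odd));  // {"_id":"odd","sum":9,"elements":[1,3,5]}
console.log(JSON.stringify(grouped.even)); // {"_id":"even","sum":8,"elements":[2,4,2]}
```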

Related

Ensuring exactly N items with value X remain in an array with mongodb

Assuming we have a document in my MongoDB collection like the following:
{
"_id": "coffee",
"orders": [ "espresso", "cappuccino", "espresso", ... ],
}
How do I use a single update statement that ensures there are exactly say 2 espressos in this document, without knowing how many there are to begin with?
I know that using 2 consecutive statements I can do
db.test.update(
{ _id: "coffee" },
{ "$pull": { "orders": "espresso" } }
);
followed by
db.test.update(
{ "_id": "coffee" },
{ "$push": { "orders": { "$each": ["espresso", "espresso"] } } }
);
But when combining both into a single statement, MongoDB balks with an error 40, claiming Updating the path 'orders' would create a conflict at 'orders' (understandable enough - how does MongoDB know what to do first?).
So, how can I do the above in a single statement? Please note that since I'll be using the above in the context of a larger unordered bulk operation, combining the above in an ordered bulk operation won't work.
Thanks for your help!

Searching with Precedence on Array Order

My gut feeling is that the answer is no, but is it possible to perform a search in Mongodb comparing the similarity of arrays where order is important?
E.g.
I have three documents like so
{'_id':1, "my_list": ["A",2,6,8,34,90]},
{'_id':2, "my_list": ["A","F",2,6,19,8,90,55]},
{'_id':3, "my_list": [90,34,8,6,3,"A"]}
1 and 2 are the most similar, 3 is wildly different irrespective of the fact it contains all of the same values as 1.
Ideally I would do a search similar to {"my_list" : ["A",2,6,8,34,90] } and the results would be document 1 and 2.
It's almost like a regex search with wild cards. I know I can do this in python easily enough, but speed is important and I'm dealing with 1.3 million documents.
Any "comparison" or "selection" is actually more or less subjective to the actual logic applied. But as a general principle you could always consider the product of the matched indices from the array to test against and the array present in the document. For example:
var sample = ["A",2,6,8,34,90];
db.getCollection('source').aggregate([
{ "$match": { "my_list": { "$in": sample } } },
{ "$addFields": {
"score": {
"$add": [
{ "$cond": {
"if": {
"$eq": [
{ "$size": { "$setIntersection": [ "$my_list", sample ] }},
{ "$size": { "$literal": sample } }
]
},
"then": 100,
"else": 0
}},
{ "$sum": {
"$map": {
"input": "$my_list",
"as": "ml",
"in": {
"$multiply": [
{ "$indexOfArray": [
{ "$reverseArray": "$my_list" },
"$$ml"
]},
{ "$indexOfArray": [
{ "$reverseArray": { "$literal": sample } },
"$$ml"
]}
]
}
}
}}
]
}
}},
{ "$sort": { "score": -1 } }
])
Would return the documents in order like this:
/* 1 */
{
"_id" : 1.0,
"my_list" : [ "A", 2, 6, 8, 34, 90],
"score" : 155.0
}
/* 2 */
{
"_id" : 2.0,
"my_list" : ["A", "F", 2, 6, 19, 8, 90, 55],
"score" : 62.0
}
/* 3 */
{
"_id" : 3.0,
"my_list" : [ 90, 34, 8, 6, 3, "A"],
"score" : 15.0
}
The key is that, once $reverseArray is applied, $indexOfArray returns larger values for elements nearer the start of the original array, because the index is now counted from "last to first". This gives a larger "weight" to matches at the beginning of the array than to those towards the end.
Of course you should take into account that the second document does in fact contain "most" of the matches, and because it has more array entries it places a "larger" weight on its initial matches than the first document does.
From the above "A" scores more in the second document than in the first because the array is longer even though both matched "A" in the first position. However there is also some effect that "F" is a mismatch and therefore has a greater negative effect than it would if it was later in the array. Same applies to "A" in the last document, where at the end of the array the match has little bearing on the overall weight.
The counter to this in consideration is to add some logic to consider the "exact match" case, such as here the $size comparison from the $setIntersection of the sample and the current array. This would adjust the scores to ensure that something that matched all provided elements actually scored higher than a document with less positional matches, but more elements overall.
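To make the arithmetic behind the three scores concrete, here is a small plain-JavaScript sketch that mirrors the pipeline expression outside the database. The `score` function is a hypothetical helper written for this illustration; it assumes JavaScript's `indexOf` behaves like $indexOfArray for these values (first match, -1 when absent).

```javascript
// Hypothetical re-implementation of the "score" expression for checking by hand.
function score(myList, sample) {
  var reversedList = myList.slice().reverse();
  var reversedSample = sample.slice().reverse();

  // $setIntersection: distinct values that appear in both arrays.
  var common = sample.filter(function (value, index) {
    return sample.indexOf(value) === index && myList.indexOf(value) !== -1;
  });
  // The $cond branch: 100 when every sample element is present, else 0.
  var exactBonus = common.length === sample.length ? 100 : 0;

  // $sum over $map: multiply the two reversed indexes for each element.
  // An element missing from the sample contributes a negative product.
  var positional = myList.reduce(function (total, value) {
    return total + reversedList.indexOf(value) * reversedSample.indexOf(value);
  }, 0);

  return exactBonus + positional;
}

var sample = ["A", 2, 6, 8, 34, 90];
var scores = [
  score(["A", 2, 6, 8, 34, 90], sample),
  score(["A", "F", 2, 6, 19, 8, 90, 55], sample),
  score([90, 34, 8, 6, 3, "A"], sample)
];
console.log(scores); // [ 155, 62, 15 ]
```

For the second document, "A" contributes 7 * 5 = 35 while the mismatched "F" contributes 6 * -1 = -6, which is the penalty for an early mismatch described above.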
With a "score" in place you can then filter out results ( i.e $limit ) or whatever other logic you can apply in order to only return the actual results wanted. But the first step is calculating a "score" to work from.
So it's all generally subjective to what logic actually means a "nearest match", but the $reverseArray and $indexOfArray operations are generally key to putting "more weight" on the earlier index matches rather than the last.
Overall you are looking for "calculation" of logic. The aggregation framework has some of the available operators, but which ones actually apply are up to your end implementation. I'm just showing something that "logically works" to put more weight on "earlier matches" in an array comparison rather than "latter matches", and of course the "most weight" where the arrays are actually the same.
NOTE: Similar logic could be achieved using the includeArrayIndex option of $unwind for earlier versions of MongoDB without the main operators used above. However the process does require usage of $unwind to deconstruct arrays in the first place, and the performance hit this would incur would probably negate the effectiveness of the operation.

How to convert this map reduce in aggregate framework?

I have already written this map/reduce/finalize function using MongoDB.
This is how I need MongoDB to perform the aggregation:
db.house_results.mapReduce(function(){
emit(this.house_name.toLowerCase(),this);
},function(key,values){
var house = {name:key,address:"",description:"",photo:[],lat:0,lng:0,rooms:[]};
values.forEach(function(house_val) {
/*Address*/
if(house.address=="")
house.address = house_val.house_address;
/*Photo*/
if(!house_val.photo in house.photo)
house.photo.push(house_val.house_photo);
/*Description*/
if(house.description=="")
house.description = house_val.house_description;
/*LAT - LNG*/
if(house.lat==0 || house.lng==0){
var house_position = house_val.house_position;
if(house_position && house_position.lat && house_position.lng){
house.lat = house_position.lat;
house.lng = house_position.lng;
}
}
if(house.lat==0 || house.lng==0){
if(house_val.house_lat && house_val.house_lng){
house.lat = house_val.house_lat;
house.lng = house_val.house_lng;
}
}
if(house_val.rooms)
house.rooms.push(house_val.rooms);
});
return house;
},
{
out : "map_reduce_house_test",
finalize:function(key,house_val){
if(house_val.address==undefined){ // JUST ONE RESULT IN MAP FUNCTION -> REDUCE FUNCTION IS IGNORED -> FINALIZE IS SOLUTION
var house = {name:key,address:"",description:"",photo:[],lat:0,lng:0,rooms:[]};
/*Address*/
if(house.address=="")
house.address = house_val.house_address;
/*Photo*/
if(!house_val.photo in house.photo)
house.photo.push(house_val.house_photo);
/*Description*/
if(house.description=="")
house.description = house_val.house_description;
/*LAT - LNG*/
if(house.lat==0 || house.lng==0){
var house_position = house_val.house_position;
if(house_position && house_position.lat && house_position.lng){
house.lat = house_position.lat;
house.lng = house_position.lng;
}
}
if(house.lat==0 || house.lng==0){
if(house_val.house_lat && house_val.house_lng){
house.lat = house_val.house_lat;
house.lng = house_val.house_lng;
}
}
if(house_val.rooms)
house.rooms.push(house_val.rooms);
return house;
}else
return house_val;
}
}
);
Is there a way to simplify these functions, and/or is it better to do the same thing with MongoDB's aggregation framework?
Which would be the fastest and simplest method?
Thanks!
There isn't really much going on in this mapReduce other than taking the first values from various fields for the common grouping key and otherwise pushing some other values onto arrays.
Therefore everything is very much the same for aggregation:
db.house_results.aggregate([
{ "$group": {
"_id": { "$toLower": "$house_name" },
"name": { "$first": { "$toLower": "$house_name" } },
"photo": { "$push": "$house_photo" },
"address": { "$first": "$house_address" },
"description": { "$first": "$house_description" },
"lat": {
"$max": {
"$cond": [
{ "$gt": [ "$house_lat", "$house_position.lat" ] },
"$house_lat",
"$house_position.lat"
]
}
},
"lng": {
"$max": {
"$cond": [
{ "$gt": [ "$house_lng", "$house_position.lng" ] },
"$house_lng",
"$house_position.lng"
]
}
},
"rooms": { "$push": "$house_rooms" }
}}
])
The only real difference there is the conditional handling of the "lat" and "lng" output using primarily the $cond operator.
Noting that "_id" and "name" have the same thing in them, but that is what the map reduce is doing.
Take a good look at the aggregation operators for reference, but really your data should look like this rather than its present form, which appears to be a de-normalized dump from somewhere.
Also for reference, it probably isn't affecting you in this case, but this is the wrong way to write a mapReduce. The output from the "map" function is different to that from the "reduce" function, notably the arrays.
Even though these will only have one element in them, they "should" be emitted as an array element from the "map" function as well and treated as if they were already an array element by the "reduce" function.
This is because with larger "grouping", not all matching key values are sent into the reduce function at once, and the reducer can be called to combine other values emitted by "map" with previously reduced output. That is how large data is handled, and with arrays you run the risk of output like this, with the unexpected embedding of an array within an array:
[ [4,5,6], 7, 8, 9 ]
But this is covered in the documentation if you read it carefully.
At any rate, the aggregation pipeline ( one stage ) will perform much faster than the present operation. But really, change your data as soon as possible.
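As a sketch of how that nested output can arise, the plain-JavaScript snippet below calls a reducer twice by hand, the second time with the first result among its inputs. Both reducers are hypothetical simplifications that keep only a "rooms" field, and the numeric values are placeholders chosen to reproduce the array shown above.

```javascript
// Pushes each incoming value as-is, so a previously reduced array is pushed whole.
function naiveReduce(key, values) {
  var house = { rooms: [] };
  values.forEach(function (v) {
    house.rooms.push(v.rooms);
  });
  return house;
}

// Assumes "map" already emits one-element arrays, and merges arrays on every call.
function consistentReduce(key, values) {
  var house = { rooms: [] };
  values.forEach(function (v) {
    house.rooms = house.rooms.concat(v.rooms);
  });
  return house;
}

// First call sees three mapped values; the second call also receives the first result.
var naiveFirst = naiveReduce("h", [{ rooms: 4 }, { rooms: 5 }, { rooms: 6 }]);
var naiveSecond = naiveReduce("h", [naiveFirst, { rooms: 7 }, { rooms: 8 }, { rooms: 9 }]);

var safeFirst = consistentReduce("h", [{ rooms: [4] }, { rooms: [5] }, { rooms: [6] }]);
var safeSecond = consistentReduce("h", [safeFirst, { rooms: [7] }, { rooms: [8] }, { rooms: [9] }]);

console.log(JSON.stringify(naiveSecond.rooms)); // [[4,5,6],7,8,9]
console.log(JSON.stringify(safeSecond.rooms));  // [4,5,6,7,8,9]
```

Keeping the emitted value and the reduced value in the same shape is what lets the second reducer treat mapped and previously reduced inputs alike.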

Conditional $inc in a nested MongoDB array

My database looks like this:
{
_id: 1,
values: [ 1, 2, 3, 4, 5 ]
},
{
_id: 2,
values: [ 2, 4, 6, 8, 10 ]
}, ...
I'd like to update every value in every document's nested array ("values") that meets some criterion. For instance, I'd like to increment every value that's >= 4 by one, which ought to yield:
{
_id: 1,
values: [ 1, 2, 3, 5, 6 ]
},
{
_id: 2,
values: [ 2, 5, 7, 9, 11 ]
}, ...
I'm used to working with SQL, where the nested array would be a separate table connected by a unique ID. I'm a little lost in this new NoSQL world.
Thank you kindly,
This sort of update is not really possible using nested arrays. The reason is given in the positional $ operator documentation, which states that you can only match the first array element for a given condition in the query.
So a statement like this:
db.collection.update(
{ "values": { "$gte": 4 } },
{ "$inc": { "values.$": 1 } }
)
Will not work in the sense that only the "first" array element that was matched would be incremented. So on your first document you would get this:
{ "_id" : 1, "values" : [ 1, 2, 3, 5, 5 ] }
In order to update the values as you are suggesting you would need to iterate the documents and the array elements to produce the result:
db.collection.find({ "values": { "$gte": 4 } }).forEach(function(doc) {
for ( var i=0; i < doc.values.length; i++ ) {
if ( doc.values[i] >= 4 ) {
doc.values[i]++;
}
}
db.collection.update(
{ "_id": doc._id },
{ "$set": { "values": doc.values } }
);
})
Or whatever code equivalent of that basic concept.
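The array handling itself can be tried outside the shell. The sketch below, in plain JavaScript with hypothetical helper names, contrasts what a "first match only" update does to an array with what the loop above does, using the arrays from the question.

```javascript
// Imitates the effect of the positional $ operator: only the first match changes.
function incrementFirstMatch(values, min) {
  var result = values.slice();
  for (var i = 0; i < result.length; i++) {
    if (result[i] >= min) {
      result[i] += 1;
      break;
    }
  }
  return result;
}

// Imitates the client-side loop: every matching element changes.
function incrementAllMatches(values, min) {
  return values.map(function (value) {
    return value >= min ? value + 1 : value;
  });
}

var firstOnly = incrementFirstMatch([1, 2, 3, 4, 5], 4);
var allDocOne = incrementAllMatches([1, 2, 3, 4, 5], 4);
var allDocTwo = incrementAllMatches([2, 4, 6, 8, 10], 4);

console.log(firstOnly); // [ 1, 2, 3, 5, 5 ]
console.log(allDocOne); // [ 1, 2, 3, 5, 6 ]
console.log(allDocTwo); // [ 2, 5, 7, 9, 11 ]
```

On MongoDB 3.6 or later, the filtered positional operator `$[<identifier>]` together with the `arrayFilters` update option can apply this kind of change to every matching element in a single update, which may remove the need for the client-side loop.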
Generally speaking, this sort of update does not lend itself well to a structure that contains elements in an array. If that is really your need, then the elements are better off listed within a separate collection.
Then again, the presentation of this question is more of a "hypothetical" situation without understanding your actual use case for performing this sort of update. So if you described what you actually need to do and how your data really looks in another question, then that might get a more meaningful response in terms of the best approach for you to use.

MongoDB update all fields of array error

I'm trying to set items.qty to 0 for every item in a document found by an _id query.
db.warehouses.update(
// query
{
_id:ObjectId('5322f07e139cdd7e31178b78')
},
// update
{
$set:{"items.$.qty":0}
},
// options
{
"multi" : true, // update all matching documents (false updates only one)
"upsert" : true // insert a new document, if no existing document match the query
}
);
Return:
Cannot apply the positional operator without a corresponding query field containing an array.
This is the document in which I want to set all items.qty to 0:
{
"_id": { "$oid" : "5322f07e139cdd7e31178b78" },
"items": [
{
"_id": { "$oid" : "531ed4cae604d3d30df8e2ca" },
"brand": "BJFE",
"color": "GDRNCCD",
"hand": 1,
"model": 0,
"price": 500,
"qty": 0,
"type": 0
},
{
"brand": "BJFE",
"color": "GDRNCCD",
"hand": 1,
"id": "23",
"model": 0,
"price": 500,
"qty": 4,
"type": 0
},
{
"brand": "BJFE",
"color": "GDRNCCD",
"hand": 1,
"id": "3344",
"model": 0,
"price": 500,
"qty": 6,
"type": 0
}
],
"name": "a"
}
EDIT
The detail missing from the question was that the required field to update was actually in a sub-document. This changes the answer considerably:
This is a constraint of what you can possibly do with updating array elements. And this is clearly explained in the documentation. Mostly in this paragraph:
The positional $ operator acts as a placeholder for the first element that matches the query document
So here is the thing. Trying to update all of the array elements in a single statement like this will not work. In order to do this you must do the following.
db.warehouses.find({ "items.qty": { "$gt": 0 } }).forEach(function(doc) {
doc.items.forEach(function(item) {
item.qty = 0;
});
db.warehouses.update({ "_id": doc._id }, doc );
})
Which is basically the way to update every array element.
The multi setting in .update() means across multiple "documents". It cannot be applied to multiple elements of an array. So presently the best option is to replace the whole thing. Or in this case we may just as well replace the whole document since we need to do that anyway.
For real bulk data, use db.eval(). But please read the documentation first:
db.eval(function() {
db.warehouses.find({ "items.qty": { "$gt": 0 } }).forEach(function(doc) {
doc.items.forEach(function(item) {
item.qty = 0;
});
db.warehouses.update({ "_id": doc._id }, doc );
});
})
Updating all the elements in an array across the whole collection is not simple.
Original
Pretty much exactly what the error says. In order to use a positional operator you need to match something first. As in:
db.warehouses.update(
// query
{
_id:ObjectId('5322f07e139cdd7e31178b78'),
"items.qty": { "$gt": 0 }
},
// update
{
$set:{"items.$.qty":0}
},
// options
{
"multi" : true,
"upsert" : true
}
);
So where the match condition finds the position of an item with a qty greater than 0, that index is passed to the positional operator.
P.S.: When multi is true, the update applies to every document that matches the query. Leave it false, which is the default, if you only mean one.
You can use the $ positional operator only when you specify an array in the first argument (i.e., the query part used to identify the document you want to update).
The positional $ operator identifies an element in an array field to update without explicitly specifying the position of the element in the array.