MongoDB map/reduce counts

The output from MongoDB's map/reduce includes something like 'counts': {'input': I, 'emit': E, 'output': O}. I thought I clearly understood what those mean, until I hit a weird case that I can't explain.
According to my understanding, counts.input is the number of documents that match the condition (as specified in the query option). If so, how is it possible that the following two queries return different results?
db.mycollection.find({MY_CONDITION}).count()
db.mycollection.mapReduce(SOME_MAP, SOME_REDUCE, {'query': {MY_CONDITION}}).counts.input
I thought the two should always give the same result, independent of the map and reduce functions, as long as the same condition is used.

The map/reduce pattern is like GROUP BY in SQL: it groups several results into one row, so you can't expect the same number of results.
The output count of mapReduce() (counts.output) is the number of results after the map and reduce functions have run.
For example, say you have 2 documents:
{'id':3,'num':5}
{'id':4,'num':5}
Then you apply this map function:
function(){
emit(this.num, 1);
}
After the map function runs, you get 2 (key, value) pairs:
{5, 1}
{5, 1}
Then you apply your reduce function:
function(k, vals) {
var sum = 0;
for (var i = 0; i < vals.length; i++) sum += vals[i];
return sum;
}
Now only 1 result is returned, for the key 5:
{_id: 5, value: 2}
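To make the three counters concrete, here is a toy in-memory imitation of the map, group and reduce flow in plain JavaScript (runnable in Node.js). It is a sketch under stated assumptions, not MongoDB's implementation: toyMapReduce is a hypothetical helper, and its map function receives emit as an argument instead of calling the server's global emit(). In this model counts.input is the number of documents handed to map, counts.emit the number of emit() calls, and counts.output the number of distinct keys left after reduce.

```javascript
// Toy, in-memory imitation of mapReduce's counters (illustration only;
// toyMapReduce is a hypothetical helper, not MongoDB's implementation).
// The map function receives emit() as an argument and the document as `this`.
function toyMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> array of emitted values
  let emitCount = 0;
  const emit = (key, value) => {
    emitCount++;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: every input document is handed to the map function once.
  for (const doc of docs) mapFn.call(doc, emit);
  // Reduce phase: one result per distinct key. Like MongoDB, reduce is not
  // called for a key that has only a single emitted value.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value: value });
  }
  return {
    results: results,
    counts: { input: docs.length, emit: emitCount, output: results.length },
  };
}

// The two documents and the map/reduce functions from the example above.
const exampleDocs = [{ id: 3, num: 5 }, { id: 4, num: 5 }];
const exampleRun = toyMapReduce(
  exampleDocs,
  function (emit) { emit(this.num, 1); },
  function (k, vals) {
    let sum = 0;
    for (let i = 0; i < vals.length; i++) sum += vals[i];
    return sum;
  }
);
// exampleRun.counts  -> { input: 2, emit: 2, output: 1 }
// exampleRun.results -> [ { _id: 5, value: 2 } ]
```

For the two example documents the sketch reports input 2, emit 2 and output 1, so the grouping shows up in counts.output rather than in counts.input.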

Is your server in a steady state between the two calls?

Related

Selecting data from MongoDB where K of N criterias are met

I have documents with four fields: A, B, C, D. Now I need to find documents where at least three fields match. For example:
Query: A=a, B=b, C=c, D=d
Returned documents:
a,b,c,d (four of four met)
a,b,c (three of four met)
a,b,d (another three of four met)
a,c,d (another three of four met)
b,c,d (another three of four met)
So far I created something like:
`(A=a AND B=b AND C=c)
OR (A=a AND B=b AND D=d)
OR (A=a AND C=c AND D=d)
OR (B=b AND C=c AND D=d)`
But this is ugly and error prone.
Is there a better way to achieve it? Also, query performance matters.
I'm using Spring Data but I believe it does not matter. My current code:
Criteria c = new Criteria();
Criteria ca = Criteria.where("A").is(doc.getA());
Criteria cb = Criteria.where("B").is(doc.getB());
Criteria cc = Criteria.where("C").is(doc.getC());
Criteria cd = Criteria.where("D").is(doc.getD());
c.orOperator(
new Criteria().andOperator(ca,cb,cc),
new Criteria().andOperator(ca,cb,cd),
new Criteria().andOperator(ca,cc,cd),
new Criteria().andOperator(cb,cc,cd)
);
Query query = new Query(c);
return operations.find(query, Document.class, "documents");
Currently in MongoDB we cannot do this directly, since there is no functionality supporting permutations/combinations on the query parameters.
But we can simplify the query by breaking the condition into parts.
Use the aggregation pipeline:
$match the records with (A=a AND B=b). This gives the records in which two of the conditions match. (Our objective is to find the records that match 3 out of 4 or 4 out of 4 of the given conditions.)
Next in the pipeline, use an OR condition (C=c OR D=d) to narrow these down to records that match at least three conditions. Note that this only covers the combinations that include both A=a and B=b; the a,c,d and b,c,d cases would need their own branches.
Hope it helps!
The way you have it, you have to spell out every combination in your query. You can use the aggregation framework to do this without enumerating all the combinations, and it is generic enough to work with any K. The downside is that I think you need MongoDB 3.2+, and Spring Data doesn't support these operations yet: $filter and $concatArrays.
But you can do it pretty easily with the Java driver.
[
{
$project:{
totalMatched:{
$size:{
$filter:{
input:{
$concatArrays:[ [{$eq:["$A","a"]}], [{$eq:["$B","b"]}], [{$eq:["$C","c"]}], [{$eq:["$D","d"]}] ]
},
as:"matched",
cond:{
$eq:["$$matched",true]
}
}
}
}
}
},
{
$match:{
totalMatched:{ $gte:3 }
}
}
]
All you are doing is building one comparison result (true or false) per field and concatenating them into a single array. Then you select the subset of elements that are true (each field can have any condition you want) and finally take the size of that array for each document.
Now all you need to do is $match the documents whose size is greater than or equal to the K you want.
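The counting idea behind that pipeline can be tried without a database. The following plain JavaScript sketch (runnable in Node.js; countMatches and atLeastKOf are hypothetical helpers, and the sample documents are made up) checks each field against its own required value, keeps the matches and compares their number with K, which is the same idea as the $filter plus $size stages. It runs on the client, so it only illustrates the logic; on real data the filtering belongs on the server.

```javascript
// Client-side sketch of "at least K of N fields match" (hypothetical helpers).
// criteria maps field name -> required value, e.g. { A: "a", B: "b", ... }.
function countMatches(doc, criteria) {
  // One boolean per field, one entry per condition.
  const flags = Object.keys(criteria).map((field) => doc[field] === criteria[field]);
  // Keep only the true entries and take the length, like $filter + $size.
  return flags.filter(Boolean).length;
}

function atLeastKOf(docs, criteria, k) {
  return docs.filter((doc) => countMatches(doc, criteria) >= k);
}

const criteria = { A: "a", B: "b", C: "c", D: "d" };
const sample = [
  { A: "a", B: "b", C: "c", D: "d" }, // 4 of 4
  { A: "a", B: "b", C: "c", D: "x" }, // 3 of 4
  { A: "x", B: "b", C: "c", D: "d" }, // 3 of 4
  { A: "a", B: "x", C: "x", D: "d" }, // 2 of 4
];
const hits = atLeastKOf(sample, criteria, 3); // the first three documents
```

Changing K or adding fields to criteria needs no new query branches, which is the point of counting instead of enumerating combinations.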

Strange object value instead of float by using mapReduce in mongodb with Doctrine

I use a MongoDB query to calculate the total price for every item.
My query looks like this:
$queryBuilder = new Query\Builder($this, $documentName);
$queryBuilder->field('created')->gte($startDate);
$queryBuilder->field('is_test_value')->notEqual(true);
..........
$queryBuilder->map('function() {emit(this.item, this.price)}');
$queryBuilder->reduce('function(item, valuesPrices) {
return {sum: Array.sum(valuesPrices)}
}');
And this works, no problem. But I found that in some cases (approximately 20 out of 200 results) I get a strange result in the sum field: instead of the sum I see something like
[object Object]444444444444444
(4 is the price of the item.)
I tried replacing the reduce body with this:
var sum = 0;
for (var i = 0; i < valuesPrices.length; i++) {
sum += parseFloat(valuesPrices[i]);
}
return {sum: sum}
In that case I see a NaN value.
I suspected that some data in the price field was inserted incorrectly (not as a float, but as a string, object, etc.). I executed my query from the mongo CLI, and all the price values are integers.
It's not "strange" at all. You "broke the rules" and now you are paying for it.
"MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key."
The primary rule of mapReduce (as cited) is that you must return exactly the same structure from the "reducer" as you emit from the "mapper". This is because the "reducer" can actually run several times for the same "key". This is how mapReduce processes large lists.
You fix this by simply returning a single value, just like you did in the emit:
return Array.sum(valuesPrices);
And then there will not be a problem. Adding an object key to the result makes the data inconsistent, so you get garbage like the string above when the "reduced" result gets fed back into the "reducer" again.
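The failure mode can be reproduced in plain JavaScript (runnable in Node.js). This is a hedged simulation, not MongoDB's internals: sumArray is a stand-in assumed to behave like the shell's Array.sum (start from the first element and add the rest with +), and reduceTwice is a hypothetical helper that reduces a first batch and then feeds that partial result back in with the remaining values, as the quoted documentation says can happen.

```javascript
// Stand-in for the shell's Array.sum, assumed to start from the first element
// and add the rest with "+" (so an object in the array turns it into a string).
function sumArray(values) {
  if (values.length === 0) return null;
  let total = values[0];
  for (let i = 1; i < values.length; i++) total += values[i];
  return total;
}

// Broken: returns an object, but may later receive that object as a value.
function badReduce(key, values) {
  return { sum: sumArray(values) };
}

// Fixed: returns a plain number, the same type the map function emits.
function goodReduce(key, values) {
  return sumArray(values);
}

// Hypothetical re-reduce simulation: reduce the first batch, then reduce
// again with that partial result plus the remaining emitted values.
function reduceTwice(reduceFn, key, values, firstBatch) {
  const partial = reduceFn(key, values.slice(0, firstBatch));
  return reduceFn(key, [partial, ...values.slice(firstBatch)]);
}

const prices = [4, 4, 4, 4, 4];
const fixed = reduceTwice(goodReduce, "item", prices, 2);  // 20
const broken = reduceTwice(badReduce, "item", prices, 2);  // { sum: "[object Object]444" }
```

With five prices of 4, the fixed reducer yields 20 whether or not re-reduce happens, while the object-returning reducer produces the same "[object Object]444..." pattern as in the question. The NaN variant has the same cause: parseFloat applied to a {sum: ...} object returns NaN.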

Number of items in the aggregation with MongoDB 2.6

My query looks like this:
var x = db.collection.aggregate(...);
I want to know the number of items in the result set. The documentation says that this function returns a cursor. However, it has far fewer methods/fields than the cursor returned by db.collection.find().
for (var k in x) print(k);
Produces
_firstBatch
_cursor
hasNext
next
objsLeftInBatch
help
toArray
forEach
map
itcount
shellPrint
pretty
No count() method! Why is this cursor different from the one returned by find()? itcount() returns some type of count, but the documentation says "for testing only".
Using a group stage in my aggregation ({$group:{_id:null,cnt:{$sum:1}}}), I can get the count like this:
var cnt = x.hasNext() ? x.next().cnt : 0;
Is there a more straightforward way to get this count, as with db.collection.find(...).count()?
Barno's answer is correct to point out that itcount() is a perfectly good method for counting the number of results of the aggregation. I just wanted to make a few more points and clear up some other points of confusion:
No count() method! Why is this cursor different from the one returned by find()?
The trick with the count() method is that it counts the number of results of find() on the server side. itcount(), as you can see in the code, iterates over the cursor, retrieving the results from the server, and counts them. The "it" is for "iterate". There is currently (as of MongoDB 2.6) no way to just get the count of results from an aggregation pipeline without returning the cursor of results.
Using a group stage in my aggregation ({$group:{_id:null,cnt:{$sum:1}}}), I can get the count
Yes. This is a reasonable way to get the count of results and should be more performant than itcount() since it does the work on the server and does not need to send the results to the client. If the point of the aggregation within your application is just to produce the number of results, I would suggest using the $group stage to get the count. In the shell and for testing purposes, itcount() works fine.
Where have you read that itcount() is "for testing only"?
If in the mongo shell I do
var p = db.collection.aggregate(...);
printjson(p.help)
I receive
function () {
// This is the same as the "Cursor Methods" section of DBQuery.help().
print("\nCursor methods");
print("\t.toArray() - iterates through docs and returns an array of the results")
print("\t.forEach( func )")
print("\t.map( func )")
print("\t.hasNext()")
print("\t.next()")
print("\t.objsLeftInBatch() - returns count of docs left in current batch (when exhausted, a new getMore will be issued)")
print("\t.itcount() - iterates through documents and counts them")
print("\t.pretty() - pretty print each document, possibly over multiple lines")
}
If I do
printjson(p)
I find that
"itcount" : function (){
var num = 0;
while ( this.hasNext() ){
num++;
this.next();
}
return num;
}
This function
while ( this.hasNext() ){
num++;
this.next();
}
It is very similar to the line var cnt = x.hasNext() ? x.next().cnt : 0; and this while loop is perfectly fine for counting.
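To see the difference between the two counting styles, here is a plain JavaScript sketch with a stand-in cursor (runnable in Node.js). makeArrayCursor and countFromGroupResult are hypothetical helpers that only mimic hasNext() and next(); they are not the real shell cursor. itcount walks every result, while the $group variant reads a single summary document.

```javascript
// Minimal stand-in for a cursor: only hasNext() and next() are provided.
// makeArrayCursor is a hypothetical helper, not part of any MongoDB API.
function makeArrayCursor(docs) {
  let i = 0;
  return {
    hasNext: () => i < docs.length,
    next: () => docs[i++],
  };
}

// Same loop as the shell's itcount(): consume every document and count it.
function itcount(cursor) {
  let num = 0;
  while (cursor.hasNext()) {
    num++;
    cursor.next();
  }
  return num;
}

// After a {$group: {_id: null, cnt: {$sum: 1}}} stage there is at most one
// result document; an empty result means nothing matched.
function countFromGroupResult(cursor) {
  return cursor.hasNext() ? cursor.next().cnt : 0;
}

const viaIteration = itcount(makeArrayCursor([{ a: 1 }, { a: 2 }, { a: 3 }]));
const viaGroup = countFromGroupResult(makeArrayCursor([{ _id: null, cnt: 3 }]));
```

Both give the same number, but itcount has to pull every document to the client first, whereas after the $group stage at most one document comes back.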

search in limited number of record MongoDB

I want to search in the first 1000 records of my collection, whose name is CityDB. I used the following code:
db.CityDB.find({'index.2':"London"}).limit(1000)
but it does not work: it returns the first 1000 matches, but I want to search only within the first 1000 records, not all records. Could you please help me?
Thanks,
Amir
Note that there is no guarantee that your documents are returned in any particular order by a query as long as you don't sort explicitly. Documents in a new collection are usually returned in insertion order, but various things can cause that order to change unexpectedly, so don't rely on it. By the way: auto-generated _ids start with a timestamp, so when you sort by _id, the documents are returned roughly in creation order.
Now about your actual question. When you want to limit the documents first and then perform a filter operation on this limited set, you can use the aggregation pipeline. It allows you to use the $limit stage first and then the $match stage on the remaining documents.
db.CityDB.aggregate(
// { $sort: { _id: 1 } }, // <- uncomment when you want the first 1000 by creation-time
{ $limit: 1000 },
{ $match: { 'index.2':"London" } }
)
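The difference between find(...).limit(1000) and the $limit-then-$match pipeline can be shown with plain arrays. This is an in-memory sketch (runnable in Node.js), not a MongoDB API: the cities data is invented, index is assumed to be an array so that 'index.2' means its third element, and the array order stands in for whatever order the server returns.

```javascript
// In-memory illustration of stage order (not a MongoDB API).
// The array order stands in for the order the server returns documents in.
function matchThenLimit(docs, predicate, n) {
  // Like find(filter).limit(n): filter everything, then keep the first n hits.
  return docs.filter(predicate).slice(0, n);
}

function limitThenMatch(docs, predicate, n) {
  // Like [{ $limit: n }, { $match: filter }]: keep the first n docs, then filter.
  return docs.slice(0, n).filter(predicate);
}

// Invented sample data; index is assumed to be an array, so 'index.2' is index[2].
const cities = [
  { _id: 1, index: ["x", "y", "Paris"] },
  { _id: 2, index: ["x", "y", "London"] },
  { _id: 3, index: ["x", "y", "Rome"] },
  { _id: 4, index: ["x", "y", "London"] },
];
const isLondon = (doc) => doc.index[2] === "London";

const fromFind = matchThenLimit(cities, isLondon, 2);     // documents 2 and 4
const fromPipeline = limitThenMatch(cities, isLondon, 2); // document 2 only
```

With n = 2, match-then-limit returns documents 2 and 4, while limit-then-match only looks at documents 1 and 2 and returns document 2.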
I can think of two ways to achieve this:
1) Keep a global counter, and every time you insert data into your collection, add a field count = currentCounter and increase currentCounter by 1. When you need to search within your first k elements (assuming the counter starts at 0), query it this way:
db.CityDB.find({
'index.2':"London",
count : {
'$lt' : k
}
})
This is not atomic and might sometimes give you more than k elements on a heavily loaded system (but it can use an index on count).
Here is another approach, which works nicely in the shell:
2) Create your dummy data:
var k = 100;
for (var i = 1; i <= k; i++) {
db.a.insert({
_id : i,
z: Math.floor(1 + Math.random() * 10)
})
}
var output = [];
Now find the documents among the first k records where z == 3:
k = 10;
db.a.find().sort({$natural : 1}).limit(k).forEach(function(el){
if (el.z == 3){
output.push(el)
}
})
As you can see, output contains the correct elements:
output
I think it is pretty straightforward to modify my example for your needs.
P.S. Also take a look at the aggregation framework; there might be a way to achieve what you need with it.

Mongoid aggregate methods on embedded docs?

How can I run aggregate, min, max, sum and friends on embedded docs?
For example:
Get the average cost of ALL events that a district has, where they are pretty deeply embedded.
District.schools.all.events.all.costs.avg(:value)
Obviously doesn't work.
District.avg('schools.events.costs.value')
Neither does that.
It gives this error message:
Mongo::OperationFailure: Database command 'group' failed: (errmsg: 'exception: reduce invoke failed: JS Error: TypeError: obj.schools has no properties reduce setup:1'; code: '9010'; ok: '0.0').
So is it possible or do I need to write my own map/reduce functions?
Yes, MapReduce would work. You could also use a cursor to process a query result, for example:
// Assumes the districts are stored in the "districts" collection (Mongoid's
// default for District) with embedded schools -> events -> costs -> value.
var min = Infinity;
var max = -Infinity;
var sum = 0;
var count = 0;
db.districts.find({}).forEach(function(d) {
  (d.schools || []).forEach(function(s) {
    (s.events || []).forEach(function(e) {
      (e.costs || []).forEach(function(c) {
        if (c.value < min) min = c.value;
        if (c.value > max) max = c.value;
        sum += c.value;
        ++count;
      });
    });
  });
});
You now have the min and max, and you can calculate the average as sum / count.
MongoDB does not have the ability to calculate aggregate functions in its query language directly. Actually, that statement is not entirely true, since there is the count() function to count the number of results returned by a query, and there is the group() function. But the group function is a lot like a MapReduce, and cannot be used on sharded databases. If you are interested in the group function, see: http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Group
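If you take the MapReduce route, the reducer rule discussed earlier on this page applies: reduce must return the same shape that map emits. Below is a plain JavaScript sketch of that (runnable in Node.js), assuming the schools/events/costs/value layout from the question. mapDistrict, reduceStats and the sample district are hypothetical, and mapDistrict takes the document and an emit callback as arguments instead of using this and the server's global emit().

```javascript
// Hypothetical map step: emit one stats object per embedded cost value.
// Assumes the schools -> events -> costs -> value layout from the question.
function mapDistrict(district, emit) {
  for (const school of district.schools || []) {
    for (const ev of school.events || []) {
      for (const cost of ev.costs || []) {
        const v = cost.value;
        emit(district._id, { sum: v, count: 1, min: v, max: v });
      }
    }
  }
}

// Reduce returns the same {sum, count, min, max} shape it receives,
// so feeding its output back in (re-reduce) is safe.
function reduceStats(key, values) {
  const out = { sum: 0, count: 0, min: Infinity, max: -Infinity };
  for (const v of values) {
    out.sum += v.sum;
    out.count += v.count;
    out.min = Math.min(out.min, v.min);
    out.max = Math.max(out.max, v.max);
  }
  return out;
}

// Made-up sample district with three cost values: 10, 30 and 20.
const district = {
  _id: "d1",
  schools: [
    { events: [{ costs: [{ value: 10 }, { value: 30 }] }] },
    { events: [{ costs: [{ value: 20 }] }] },
  ],
};
const emitted = [];
mapDistrict(district, (key, value) => emitted.push(value));

const onePass = reduceStats("d1", emitted);
const partial = reduceStats("d1", emitted.slice(0, 1));
const twoPass = reduceStats("d1", [partial, ...emitted.slice(1)]);
const avg = onePass.sum / onePass.count;
```

Because every value carries sum, count, min and max, reducing in one pass or in two passes gives the same result, and the average is sum / count (20 for the sample).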