Find aggregation on arbitrary number of different keys - mongodb

I have a collection in MongoDB which looks like this:
collection:
doc1
{ field1 : {
field1_1 : 'val1',
field1_2 : 'val2',
field1_3 : 'val3',
...
field1_N : 'valN' } }
doc2
{ field1 : {
field1_1 : 'val1',
field1_2 : 'val2',
field1_3 : 'val3',
...
field1_N : 'valN' } }
I want to compute aggregates (sum, avg, min, max) over val1, val2, val3 ... valN. Is there any way to do this with Mongo's aggregation feature? The keys are always different, and the aggregation should cover all the values under field1.
Edited:
The final output should look like,
doc1
{ field1 : {
sum: sumOf(val1, val2... valN),
avg: avgOf(val1, val2... valN)
... } }
doc2
{ field1 : {
sum: sumOf(val1, val2... valN),
avg: avgOf(val1, val2... valN)
... } }

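You can try Map-Reduce instead of aggregate for this requirement. Map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.
var mapFunction = function() {
  // Emit one partial aggregate per key of field1, whatever the key is called.
  for (var key in this.field1) {
    var v = parseInt(this.field1[key], 10);
    emit(this._id, {sum: v, count: 1, min: v, max: v});
  }
};
// reduce may be called again on its own output, so it returns the same shape that map emits.
var reduceFunction = function(key, values) {
  var out = {sum: 0, count: 0, min: Infinity, max: -Infinity};
  values.forEach(function(v) {
    out.sum += v.sum;
    out.count += v.count;
    out.min = Math.min(out.min, v.min);
    out.max = Math.max(out.max, v.max);
  });
  return out;
};
// The average is derived once, after all reducing is done.
var finalizeFunction = function(key, r) { r.avg = r.sum / r.count; return r; };
db.getCollection('collectionName').mapReduce(mapFunction, reduceFunction, {out: {inline: 1}, finalize: finalizeFunction});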
As far as I know, if your document structure were like this:
field1 : [
{field1 : 'val1'},
{field1 : 'val2'},
{field1 : 'val3'},
...
{field1 : 'valN'}]
then you could solve this easily using aggregate. For your current structure, mapReduce may be the better fit.
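One caveat with the map-reduce route: MongoDB may call reduce more than once for the same key, feeding it its own earlier output, so reduce should return the same shape that map emits. Here is a minimal plain-JavaScript sketch of that idea, with no MongoDB connection needed. The names mapDoc, reducePartials and finalize, and the sample document, are made up for this illustration.

```javascript
// Plain-JavaScript sketch; mapDoc, reducePartials and finalize are hypothetical names.
function mapDoc(doc) {
  // One partial aggregate per key of field1, whatever the keys are called.
  return Object.keys(doc.field1).map(function (key) {
    var v = parseInt(doc.field1[key], 10);
    return { sum: v, count: 1, min: v, max: v };
  });
}

function reducePartials(partials) {
  // Returns the same shape it receives, so reducing its own output is safe.
  return partials.reduce(function (acc, p) {
    return {
      sum: acc.sum + p.sum,
      count: acc.count + p.count,
      min: Math.min(acc.min, p.min),
      max: Math.max(acc.max, p.max)
    };
  });
}

function finalize(r) {
  // The average is only computed at the very end.
  return { sum: r.sum, avg: r.sum / r.count, min: r.min, max: r.max };
}

// Made-up sample document with string values, as in the question.
var doc = { _id: 1, field1: { a: "4", b: "8", c: "6", d: "2" } };
var partials = mapDoc(doc);

// Reduce everything at once, or reduce two batches and then reduce the results.
var direct = finalize(reducePartials(partials));
var batched = finalize(reducePartials([
  reducePartials(partials.slice(0, 2)),
  reducePartials(partials.slice(2))
]));
console.log(direct, batched);
```

Both paths give the same answer, which would not hold if reduce returned an average directly.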

I'm not a MongoDB ninja by any means, but I have a solution for your question. It's not the most efficient one, but if you are royally stuck, you can use it.
The code below creates a projection of the average of the fields that you want to aggregate on. The downside is that you have to specify each field by name, rather than iterating through all fields and aggregating over them. You can definitely do the iteration in JS; I've briefly come across it in the past, but it would take a bit more tweaking than the code I've provided below.
Also, I've assumed that field1.field1_1 holds an integer and not a string value.
db.random.aggregate(
[
{
$project:
{
_id: "$field1",
avgAmount: { $avg: ["$field1.field1_1", "$field1.field1_2", "$field1.field1_3"]}
}
}]
)
Best of luck anyway #1love.
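For the "iterate with JS" idea mentioned above, a rough sketch is to build the list of field paths from a sample document and reuse it for each accumulator. This assumes the sample's keys are the ones you want to aggregate (it won't help if every document has different keys), and that your server accepts a list of expressions for $sum/$avg/$min/$max inside $project, which as far as I know needs MongoDB 3.2 or later. sampleDoc is a made-up stand-in for something you'd fetch with findOne().

```javascript
// Sketch only: sampleDoc stands in for a document fetched with findOne().
var sampleDoc = { field1: { field1_1: 10, field1_2: 20, field1_3: 30 } };

// Turn every key under field1 into a "$field1.<key>" field path.
var paths = Object.keys(sampleDoc.field1).map(function (key) {
  return "$field1." + key;
});

// The same list can feed several accumulators in one $project stage.
var pipeline = [
  {
    $project: {
      sum: { $sum: paths },
      avg: { $avg: paths },
      min: { $min: paths },
      max: { $max: paths }
    }
  }
];

console.log(JSON.stringify(pipeline, null, 2));
```

In the shell you would then pass pipeline to db.random.aggregate().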

You can try $sum for sums and $multiply for multiplication. You can find all the aggregation operators available in MongoDB starting from the following link:
https://docs.mongodb.com/manual/reference/operator/aggregation/sum/
Hope this helps.

Related

Replace array values inside a document

I have a collection which looks something like this:
{
paymentType: [1,2]
}
Using the aggregation framework, I'm looking for a way to replace the values with certain strings, for example 1 = A, 2 = B, so that the final result looks like this:
{
paymentType: ['A','B']
}
I'm also using mongodb 2.4.
Please help,
Thanks!
The aggregation framework will never change the collection it acts on. The aggregation pipeline produces new documents (which are usually temporary and just piped out to the client). The simplest way to do this is to grab each doc, replace the values, and then update the field. Here's some code that has all the basic ideas:
> db.stuff.find({}, { "_id" : 1, "paymentType" : 1 }).forEach(function(doc) {
var new_paymentType = []
doc.paymentType.forEach(function(val) {
if (val === 1) new_paymentType.push("A")
if (val === 2) new_paymentType.push("B")
})
db.stuff.update({ "_id" : doc._id }, { "$set" : { "paymentType" : new_paymentType } })
})
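If the list of codes grows, the chain of if statements can be swapped for a lookup table. A small plain-JavaScript sketch of just the replacement step follows; labels and relabel are names invented for this example. Note one deliberate difference: values without a mapping are kept unchanged here, whereas the loop above drops them.

```javascript
// Hypothetical lookup table: numeric code -> label.
var labels = { 1: "A", 2: "B" };

function relabel(values) {
  return values.map(function (val) {
    // Keep values that have no mapping instead of silently dropping them.
    return Object.prototype.hasOwnProperty.call(labels, val) ? labels[val] : val;
  });
}

console.log(relabel([1, 2]));
```

The returned array is what you would pass to $set in the update call.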

In Mongo, how do I only display documents with the highest value for a key that they share?

Say I have the following four documents in a collection called "Store":
{ item: 'chair', modelNum: 1154, votes: 75 }
{ item: 'chair', modelNum: 1152, votes: 16 }
{ item: 'table', modelNum: 1017, votes: 24 }
{ item: 'table', modelNum: 1097, votes: 52 }
I would like to find only the documents with the highest number of votes for each item type.
For this simple example the result would be modelNum: 1154 and modelNum: 1097, showing me the most popular model of chair and table based on the vote scores submitted by customers.
What is the best way to write this query and sort the results by votes in descending order? I'm developing with Meteor, but I don't think that should have an impact.
Store.find({????}).sort({votes: -1});
You can use $first or $last aggregation operators to achieve what you want. These operators are only useful when $group follows $sort. An example using $first:
db.collection.aggregate([
// Sort by "item" ASC, "votes" DESC
{"$sort" : {item : 1, votes : -1}},
// Group by "item" and pick the first "modelNum" (which will have the highest votes)
{"$group" : {_id : "$item", modelNum : {"$first" : "$modelNum"}}}
])
Here's the output:
{
"result" : [
{
"_id" : "table",
"modelNum" : 1097
},
{
"_id" : "chair",
"modelNum" : 1154
}
],
"ok" : 1
}
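To see why $sort followed by $first picks the winner of each group, here is the same logic in plain JavaScript, run against the four documents from the question. topPerItem is a made-up helper name, and unlike the pipeline above it keeps the whole document rather than only modelNum.

```javascript
var store = [
  { item: "chair", modelNum: 1154, votes: 75 },
  { item: "chair", modelNum: 1152, votes: 16 },
  { item: "table", modelNum: 1017, votes: 24 },
  { item: "table", modelNum: 1097, votes: 52 }
];

function topPerItem(docs) {
  // Same order as the $sort stage: item ascending, votes descending.
  var sorted = docs.slice().sort(function (a, b) {
    if (a.item !== b.item) return a.item < b.item ? -1 : 1;
    return b.votes - a.votes;
  });
  // Same idea as $first: keep only the first document seen for each item.
  var seen = {};
  return sorted.filter(function (doc) {
    if (seen[doc.item]) return false;
    seen[doc.item] = true;
    return true;
  });
}

// Order the winners by votes, highest first, as the question asks.
var top = topPerItem(store).sort(function (a, b) {
  return b.votes - a.votes;
});
console.log(top);
```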
If you are looking to do this in Meteor and on the client I would just use an each loop and basic find. Minimongo keeps the data in memory so I don't think additional find calls are expensive.
like this:
Template.itemsList.helpers({
items: function(){
var itemNames = Store.find({}, {fields: {item: 1}}).map(
function( item ) { return item.item; }
);
var itemsMostVotes = _.uniq( itemNames ).map(
function( item ) {
return Store.findOne({item: item}, {sort: {votes: -1}});
}
);
return itemsMostVotes;
}
});
I have switched to findOne, so this returns an array of objects rather than a cursor as find would. If you really want a cursor, you could query minimongo with the _ids from itemsMostVotes.
You could also use the underscore groupBy and sortBy functions to do this.
You would need to use the aggregation framework.
So
db.Store.aggregate(
{$group:{_id:"$item", "maxVotes": {$max:"$votes"}}}
);

How to get total record fields from mongodb by using $group?

I want to get records with all their fields using $group in MongoDB,
i.e. the MongoDB equivalent of: SELECT * FROM users GROUP BY state.
Can anyone help me?
AFAIK there is no way to return whole objects from a group query. You can use the $addToSet operator to collect field values into arrays, as in the example below. You can add every field you need this way; the response contains those arrays, and you read the data back out of them.
db.users.aggregate({$group : {_id : "$state", id : {$addToSet : "$_id"}, field1 : {$addToSet : "$field1"}}});
You can do it with MapReduce instead:
db.runCommand({
mapreduce: 'tests',
map: function() {
return emit(this.state, { docs: [this] });
},
reduce: function(key, vals) {
var res = vals.pop();
vals.forEach(function(val) {
[].push.apply(res.docs, val.docs);
});
return res;
},
finalize: function(key, reducedValue) {
return reducedValue.docs;
},
out: { inline: 1 }
})
I'm using a finalize function in my example because MapReduce does not support returning an array from reduce.
But, regardless of the method, you should try to avoid such queries in production. They are fine for rare requests, like analytics, db migrations or daily scripting, but not for frequent ones.
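On MongoDB 2.6 or later there is also the $$ROOT system variable, which refers to the whole input document, so $group can push complete documents into an array. Below is a sketch of such a pipeline plus a plain-JavaScript emulation of what it would produce. groupByState and the sample users (including the name field) are invented for the illustration.

```javascript
// Pipeline sketch: collect every full document per state (needs MongoDB 2.6+).
var pipeline = [
  { $group: { _id: "$state", docs: { $push: "$$ROOT" } } }
];

// Plain-JavaScript emulation of that grouping, for illustration only.
function groupByState(users) {
  var groups = {};
  users.forEach(function (u) {
    (groups[u.state] = groups[u.state] || []).push(u);
  });
  return Object.keys(groups).map(function (state) {
    return { _id: state, docs: groups[state] };
  });
}

// Made-up sample data.
var users = [
  { name: "ann", state: "NY" },
  { name: "bob", state: "CA" },
  { name: "cat", state: "NY" }
];
var grouped = groupByState(users);
console.log(JSON.stringify(grouped));
```

In the shell the pipeline would be passed to db.users.aggregate(). The same caution about heavy queries applies.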

MongoDB: Iterate over collection by key?

How can I iterate over all documents matching each value of a specified key in a MongoDB collection?
E.g. for a collection containing:
{ _id: ObjectId, keyA: 1 },
{ _id: ObjectId, keyA: 2 },
{ _id: ObjectId, keyA: 2 },
...with an index of { keyA: 1 }, how can I run an operation on all documents where keyA:1, then keyA:2, and so on?
Specifically, I want to run a count() of the documents for each keyA value. So for this collection, the equivalent of find({keyA:1}).count(), find({keyA:2}).count(), etc.
UPDATE: whether or not the keys are indexed is irrelevant in terms of how they're iterated, so edited title and description to make Q/A easier to reference in the future.
A simpler approach to get the grouped count of unique values for keyA would be to use the new Aggregation Framework in MongoDB 2.2:
eg:
db.coll.aggregate(
{ $group : {
_id: "$keyA",
count: { $sum : 1 }
}}
)
... returns a result set where each _id is a unique value for keyA, with the count of how many times that value appears:
{
"result" : [
{
"_id" : 2,
"count" : 2
},
{
"_id" : 1,
"count" : 1
}
],
"ok" : 1
}
I am not sure I follow, but is this what you are looking for?
db.mycollection.find({ keyA: 1 }).count()
This will count all documents where keyA is 1.
If that does not answer the question, do you think you could be a little more specific?
Do you mean to do an aggregation over all unique values of keyA?
It may be implemented with multiple queries:
var i=0;
var f=[];
while(i!=db.col.count()){
var k=db.col.findOne({keyA:{$not:{$in:f}}}).keyA;
i+=db.col.find({keyA:k}).count();
f.push(k);
}
This code collects the unique values of the keyA field of the documents in col into the array f, which is the result of the operation. Unfortunately, while it runs you have to block any operations that would change the col collection.
UPDATE:
All of this can be done much more easily using distinct:
db.col.distinct("keyA")
Thanks to @Aleksey for pointing me to db.collection.distinct.
Looks like this does it:
db.ships.distinct("keyA").forEach(function(v){
db.ships.find({keyA:v}).count();
});
Of course calling count() within a loop doesn't do much; in my case I was looking for key-values with more than one document, so I did this:
db.ships.distinct("keyA").forEach(function(v){
print(db.ships.find({keyA:v}).count() > 1);
});
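For the "key values with more than one document" case, a single aggregation can replace the distinct-plus-count loop: group by keyA, then keep the groups whose count is above 1. Here is a sketch of that pipeline together with a plain-JavaScript emulation. repeatedKeys is a made-up name, and the emulation assumes numeric keyA values.

```javascript
// Pipeline sketch: count per keyA, then keep values that occur more than once.
var pipeline = [
  { $group: { _id: "$keyA", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
];

// Plain-JavaScript emulation (assumes keyA holds numbers).
function repeatedKeys(docs) {
  var counts = {};
  docs.forEach(function (d) {
    counts[d.keyA] = (counts[d.keyA] || 0) + 1;
  });
  return Object.keys(counts)
    .filter(function (k) { return counts[k] > 1; })
    .map(Number);
}

var ships = [{ keyA: 1 }, { keyA: 2 }, { keyA: 2 }];
console.log(repeatedKeys(ships));
```

In the shell the pipeline would go to db.ships.aggregate().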

MongoDB, MapReduce and sorting

I might be a bit in over my head on this as I'm still learning the ins and outs of MongoDB, but here goes.
Right now I'm working on a tool to search/filter through a dataset, sort it by an arbitrary datapoint (eg. popularity) and then group it by an id. The only way I see I can do this is through Mongo's MapReduce functionality.
I can't use .group() because I'm working with more than 10,000 keys and I also need to be able to sort the dataset.
My MapReduce code is working just fine, except for one thing: sorting. Sorting just doesn't want to work at all.
db.runCommand({
'mapreduce': 'products',
'map': function() {
emit({
product_id: this.product_id,
popularity: this.popularity
}, 1);
},
'reduce': function(key, values) {
var sum = 0;
values.forEach(function(v) {
sum += v;
});
return sum;
},
'query': {category_id: 20},
'out': {inline: 1},
'sort': {popularity: -1}
});
I already have a descending index on the popularity datapoint, so a missing index is definitely not the cause:
{
"v" : 1,
"key" : { "popularity" : -1 },
"ns" : "app.products",
"name" : "popularity_-1"
}
I just cannot figure out why it doesn't want to sort.
Because of the way this feature is going to work, I can't output the result set to another collection instead of inlining it and then run a .find().sort({popularity: -1}) on that.
First of all, Mongo map/reduce is not designed to be used as a query tool (as it is in CouchDB); it is designed for running background tasks. I use it at work to analyze traffic data.
What you are doing wrong, however, is applying sort() to your input. That is useless, because when the map() stage is done the intermediate documents are sorted by their keys. Because your key is a document, it is sorted by product_id, then popularity.
This is how I generated my dataset
function generate_dummy_data() {
for (i=2; i < 1000000; i++) {
db.foobar.save({
_id: i,
category_id: parseInt(Math.random() * 30),
popularity: parseInt(Math.random() * 50)
})
}
}
And this is my map/reduce task:
var data = db.runCommand({
'mapreduce': 'foobar',
'map': function() {
emit({
sorting: this.popularity * -1,
product_id: this._id,
popularity: this.popularity,
}, 1);
},
'reduce': function(key, values) {
var sum = 0;
values.forEach(function(v) {
sum += v;
});
return sum;
},
'query': {category_id: 20},
'out': {inline: 1},
});
And this is the end result (too long to paste here):
http://cesarodas.com/results.txt
This works because now we're sorting by sorting, product_id, popularity. You can play with the sorting however you like; just remember that the final sorting is by key, regardless of how your input is sorted.
Anyway, as I said before, you should avoid doing queries with Map/Reduce; it was designed for background processing. If I were you, I would design my data in such a way that I could access it with simple queries. There is always a trade-off; in this case, more complex inserts/updates in exchange for simple queries (that's how I see MongoDB).
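The key-ordering trick can be tried out without a database. The sketch below sorts emitted keys field by field, in the order the fields appear, which is my assumption about how the composite key ends up ordered here. compareKeys and the sample products are invented for the example.

```javascript
// Made-up sample documents.
var products = [
  { _id: 1, popularity: 10 },
  { _id: 2, popularity: 40 },
  { _id: 3, popularity: 40 },
  { _id: 4, popularity: 25 }
];

// Same key shape as the map function: the negated popularity comes first.
var keys = products.map(function (p) {
  return { sorting: p.popularity * -1, product_id: p._id, popularity: p.popularity };
});

// Assumption: keys compare field by field, in field order.
function compareKeys(a, b) {
  return (a.sorting - b.sorting) ||
         (a.product_id - b.product_id) ||
         (a.popularity - b.popularity);
}

keys.sort(compareKeys);
console.log(keys.map(function (k) { return k.popularity; }));
```

Because the negated value leads the key, ascending key order comes out as descending popularity.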
As noted in discussion on the original question:
Map/Reduce with inline output currently cannot use an explicit sort key (see SERVER-3973). Possible workarounds include relying on the emitted key order (see @crodas's answer); outputting to a collection and querying that collection with a sort order; or sorting the results in your application using something like usort().
OP's preference is for inline results rather than creating/deleting temporary collections.
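For the "sort in your application" workaround, here is a rough JavaScript sketch. It assumes the inline response carries a results array of { _id, value } entries, with _id being the emitted key from the original map function; the sample response is made up.

```javascript
// Made-up inline map/reduce response for illustration.
var response = {
  results: [
    { _id: { product_id: 7, popularity: 3 }, value: 1 },
    { _id: { product_id: 9, popularity: 41 }, value: 1 },
    { _id: { product_id: 12, popularity: 17 }, value: 1 }
  ],
  ok: 1
};

// Sort a copy by the popularity stored in the emitted key, highest first.
var sorted = response.results.slice().sort(function (a, b) {
  return b._id.popularity - a._id.popularity;
});

console.log(sorted.map(function (r) { return r._id.product_id; }));
```

This keeps the results inline and needs no temporary collection, at the cost of sorting in application memory.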
The Aggregation Framework in MongoDB 2.2 (currently a production release candidate) would provide a suitable solution.
Here's an example of a similar query to the original Map/Reduce, but instead using the Aggregation Framework:
db.products.aggregate(
{ $match: { category_id: 20 }},
{ $group : {
_id : "$product_id",
'popularity' : { $sum : "$popularity" },
}},
{ $sort: { 'popularity': -1 }}
)
.. and sample output:
{
"result" : [
{
"_id" : 50,
"popularity" : 139
},
{
"_id" : 150,
"popularity" : 99
},
{
"_id" : 123,
"popularity" : 55
}
],
"ok" : 1
}