Mongodb: aggregate array of integers for each array position without map/reduce

I am new to MongoDB. I am wondering if it is possible to aggregate each "column" in an array. Assume we have the documents below:
db.test.insert([{"player": "A", "year": 2010, "value": [1, 1 ]},
{"player": "A", "year": 2011, "value": [2, 1 ]},
{"player": "A", "year": 2012, "value": [2, 1 ]},
{"player": "B", "year": 2010, "value": [1, 2 ]}])
The expected result looks like this:
[
{
"player": "A",
"sum_result": [
5,
3
]
},
{
"player": "B",
"sum_result": [
1,
2
]
}
]
Is it possible to do that without using Map/Reduce? The link below shows a map/reduce solution, but I am looking for an alternative way to achieve the same goal. Thanks!
Mongodb: aggregate array of integers for each array position

Most problems can be solved with the aggregation framework only if the elements involved can be reached by a path name.
In this example, the elements of the value array cannot be reached by the paths value.0 or value.1.
To solve such problems you therefore need map-reduce, which provides more flexibility.
Unfortunately, you cannot get the desired output using aggregation in this case unless you change your schema. One option is to make the value field an array of documents, so that every element can be reached by a path name such as value.c1 (denoting the values in column 1):
db.t.insert([
{"player":"A","year":2010,"value":[ {"c1":1}, {"c2":1}]},
{"player":"A","year":2011,"value":[ {"c1":2}, {"c2":1}]},
{"player":"A","year":2012,"value":[ {"c1":2}, {"c2":1}]},
{"player":"B","year":2010,"value":[ {"c1":1}, {"c2":2}]}
])
Then you could aggregate it as below:
db.t.aggregate([
{$unwind:"$value"},
{$group:{"_id":"$player",
"c1":{$sum:"$value.c1"},
"c2":{$sum:"$value.c2"}}},
{$group:{"_id":"$_id",
"c1":{$push:{"c1":"$c1"}},
"c2":{$push:{"c2":"$c2"}}}},
{$project:{"value":{$setUnion:["$c1","$c2"]}}}
])
Output:
{ "_id" : "A", "value" : [ { "c1" : 5 }, { "c2" : 3 } ] }
{ "_id" : "B", "value" : [ { "c2" : 2 }, { "c1" : 1 } ] }
Since the solution involves the $unwind operation, which is costly, I suggest you measure the performance of both solutions against your data set and choose the one that best suits your application.
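When comparing the two approaches, it can help to have a client-side reference for the expected per-position sums. The following is a minimal sketch in plain Python with no MongoDB calls; `sum_by_position` is a hypothetical helper name, and the documents are the ones from the question.

```python
from collections import defaultdict

# Sample documents from the question (original schema: "value" is a plain array).
docs = [
    {"player": "A", "year": 2010, "value": [1, 1]},
    {"player": "A", "year": 2011, "value": [2, 1]},
    {"player": "A", "year": 2012, "value": [2, 1]},
    {"player": "B", "year": 2010, "value": [1, 2]},
]


def sum_by_position(documents):
    """Sum the `value` arrays position by position for each player."""
    totals = defaultdict(list)
    for doc in documents:
        current = totals[doc["player"]]
        for index, number in enumerate(doc["value"]):
            if index < len(current):
                current[index] += number
            else:
                # First time this position is seen for the player.
                current.append(number)
    return [
        {"player": player, "sum_result": sums}
        for player, sums in sorted(totals.items())
    ]


result = sum_by_position(docs)
print(result)
```

This only serves as a correctness check for small data sets; it does not replace a server-side aggregation.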

Thanks for your reply. Changing the schema could be the best way to solve this problem. As you pointed out, the $unwind operation might be costly. I eventually decided to use the solution below.
Revised document:
db.test.insert([{"player": "A", "year": 2010, "values": {"v1":1, "v2":1 }},
{"player": "A", "year": 2011, "values": {"v1":2, "v2":1 }},
{"player": "A", "year": 2012, "values": {"v1":2, "v2":1 }},
{"player": "B", "year": 2010, "values": {"v1":1, "v2":2 }}])
Query statement:
db.test.aggregate([
{$group:{"_id":"$player",
"v1":{$sum:"$values.v1"},
"v2":{$sum:"$values.v2"}}},
{$project:{"values":{"v1":"$v1","v2":"$v2"}}}
])
Output:
{
"result": [{
"_id": "B",
"values": {
"v1": 1,
"v2": 2
}
}, {
"_id": "A",
"values": {
"v1": 5,
"v2": 3
}
}],
"ok": 1
}
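If the number of keys under values grows, the two stages above could be generated rather than typed by hand. This is only a sketch: `build_sum_pipeline` and its parameters are hypothetical names, and the resulting list is what one would pass to an aggregate call in a driver such as pymongo.

```python
def build_sum_pipeline(keys, group_field="player", source="values"):
    """Build the $group/$project stages that sum each key under `source`."""
    group = {"_id": f"${group_field}"}
    for key in keys:
        # One accumulator per key, e.g. "v1": {"$sum": "$values.v1"}.
        group[key] = {"$sum": f"${source}.{key}"}
    # Re-nest the summed fields under the original field name.
    project = {source: {key: f"${key}" for key in keys}}
    return [{"$group": group}, {"$project": project}]


pipeline = build_sum_pipeline(["v1", "v2"])
print(pipeline)
```

For the keys v1 and v2 this produces the same two stages as the query statement above.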

Related

Encoding issue when performing aggregation to find average by group in pymongo

I have a dataset of items inside an orders collection:
> db.orders.find()
{"_id" : ObjectId("1a5"), "date": ISODate("2021-06-07T00:00:00Z"), "category": "A", "total": 150},
{"_id" : ObjectId("1a6"), "date": ISODate("2021-06-07T00:00:00Z"), "category": "B", "total": 175},
{"_id" : ObjectId("1a7"), "date": ISODate("2021-06-07T00:00:00Z"), "category": "A", "total": 200},
and I want to find the average value of total by category in pymongo. The category is sometimes not present but total is always present.
I tried this in pymongo
pipeline = [
{"$group": { "_id": "$category", "average": {"$avg": {"$total"}}}}
]
db.command('aggregate', 'orders', pipeline=pipeline, explain=True)
and I got an error
InvalidDocument: cannot encode object: {'$total'}, of type: <class 'set'>
Performing a similar query in the mongo shell directly works fine though:
> db.orders.aggregate([{$group: {_id: "$category", average: {$avg: "$total"}}}])
{ "_id" : "A", "average" : 175 }
{ "_id" : "B", "average" : 175 }
I checked using Compass that the values of total are stored as int32 so I don't understand the error message and how to fix it. I tried searching around as well but most of what I found about this error were about inserting entries into the db instead of aggregating them. Am I missing something? Thank you.

You have extra braces around "$total"; try:
pipeline = [
{"$group": { "_id": "$category", "average": {"$avg": "$total"}}}
]
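The reason for the error is visible in plain Python, without MongoDB: braces around a single bare value create a set, not a dict, and a set has no BSON encoding. A small sketch (the variable names are illustrative only):

```python
# Braces around one bare value build a Python set, which cannot be BSON-encoded.
wrong = {"$avg": {"$total"}}
# The field path should be passed as a plain string.
right = {"$avg": "$total"}

print(type(wrong["$avg"]).__name__)  # set
print(type(right["$avg"]).__name__)  # str
```

In the mongo shell, `{$avg: "$total"}` has no such ambiguity, which is why the same query works there.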

MongoDB document structure: Map vs. Array for element-wise aggregations

We want to store ratings of a metric (say sales, profit) for some category (say city) in MongoDB. Example rating scale: [RED, YELLOW, GREEN]; the length will be fixed. We are considering the following two document structures:
Structure 1: Ratings as an array
{
"_id": 1,
"city": "X",
"metrics": ["sales", "profit"],
"ratings" : {
"sales" : [1, 2, 3], // frequency of RED, YELLOW, GREEN ratings, fixed length array
"profit": [4, 5, 6],
},
}
{
"_id": 2,
"city": "X",
"metrics": ["sales", "profit"],
"ratings" : {
"sales" : [1, 2, 3], // frequency of RED, YELLOW, GREEN ratings, fixed length array
"profit": [4, 5, 6],
},
}
Structure 2: Ratings as a map
{
"_id": 1,
"city": "X",
"metrics": ["sales", "profit"],
"ratings" : {
"sales" : { // map will always have "RED", "YELLOW", "GREEN" keys
"RED": 1,
"YELLOW": 2,
"GREEN": 3
},
"profit" : {
"RED":4,
"YELLOW": 5,
"GREEN": 6
},
},
}
{
"_id": 2,
"city": "X",
"metrics": ["sales", "profit"],
"ratings" : {
"sales" : { // map will always have "RED", "YELLOW", "GREEN" keys
"RED": 1,
"YELLOW": 2,
"GREEN": 3
},
"profit" : {
"RED":4,
"YELLOW": 5,
"GREEN": 6
},
},
}
Our use case:
aggregate ratings grouped by city and metric
we do not intend to index on the "ratings" field
So for structure 1, to aggregate ratings, I need element-wise aggregations and it seems it will likely involve unwind steps or maybe map-reduce and the resulting document would look something like this:
{
"city": "X",
"sales": [2, 4, 6],
"profit": [8, 10, 12]
}
For structure 2, I think aggregation would be relatively straightforward using the aggregation pipeline, ex (aggregating just sales):
db.getCollection('Collection').aggregate([
{
$group: {
"_id": {"city": "$city" },
"sales_RED": {$sum: "$ratings.sales.RED"},
"sales_YELLOW": {$sum: "$ratings.sales.YELLOW"},
"sales_GREEN": {$sum: "$ratings.sales.GREEN"}
}
},
{
$project: {"_id": 0, "city": "$_id.city", "sales": ["$sales_RED", "$sales_YELLOW", "$sales_GREEN"]}
}
])
Would give the following result:
{
"city": "X",
"sales": [2, 4, 6]
}
Query:
I am leaning towards the second structure, mainly because I am not clear on how to achieve element-wise array aggregation in MongoDB. From what I have seen, it will probably involve unwinding. The second document structure will have a larger document size because of the repeated field names for the ratings, but the aggregation itself is simple. Based on our use case, how would the two structures compare in terms of computational efficiency, and am I missing any points worth considering?

I was able to achieve the aggregation with the array structure using $arrayElemAt. (However, this still involves having to specify aggregations for individual array elements, which is the same as the case for document structure 2)
db.getCollection('Collection').aggregate([
{
$group: {
"_id": {"city": "$city" },
"sales_RED": {$sum: { $arrayElemAt: [ "$ratings.sales", 0] }},
"sales_YELLOW": {$sum: { $arrayElemAt: [ "$ratings.sales", 1] }},
"sales_GREEN": {$sum: { $arrayElemAt: [ "$ratings.sales", 2] }},
}
},
{
$project: {"_id": 0, "city": "$_id.city", "sales": ["$sales_RED", "$sales_YELLOW", "$sales_GREEN"]}
}
])
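Because one accumulator is needed per array position, the stages above lend themselves to being generated. The sketch below builds the same $group and $project stages as Python dicts; `build_elementwise_sum` and its parameters are hypothetical names, and the label order is assumed to match the array order (RED, YELLOW, GREEN).

```python
def build_elementwise_sum(field, labels, group_by="city"):
    """Build $group/$project stages summing each position of an array field."""
    name = field.split(".")[-1]  # "ratings.sales" -> "sales"
    group = {"_id": {group_by: f"${group_by}"}}
    for index, label in enumerate(labels):
        # One accumulator per array position, picked with $arrayElemAt.
        group[f"{name}_{label}"] = {
            "$sum": {"$arrayElemAt": [f"${field}", index]}
        }
    project = {
        "_id": 0,
        group_by: f"$_id.{group_by}",
        # Reassemble the per-position sums into one array.
        name: [f"${name}_{label}" for label in labels],
    }
    return [{"$group": group}, {"$project": project}]


pipeline = build_elementwise_sum("ratings.sales", ["RED", "YELLOW", "GREEN"])
print(pipeline)
```

The same helper could be called again with "ratings.profit" to cover the second metric.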

How to sort by a foreign field, the foreign field not using alphabetical/numerical order? [duplicate]

Following this question, which @NeilLunn has graciously answered, here is my problem in more detail.
This is the set of documents; some have a user_id and some don't. The user_id represents the user who created the document:
{ "user_id" : 11, "content" : "black", "date": somedate }
{ "user_id" : 6, "content" : "blue", "date": somedate }
{ "user_id" : 3, "content" : "red", "date": somedate }
{ "user_id" : 4, "content" : "black", "date": somedate }
{ "user_id" : 4, "content" : "blue", "date": somedate }
{ "user_id" : 90, "content" : "red", "date": somedate }
{ "user_id" : 7, "content" : "orange", "date": somedate }
{ "content" : "orange", "date": somedate }
{ "content" : "red", "date": somedate }
...
{ "user_id" : 4, "content" : "orange", "date": somedate }
{ "user_id" : 1, "content" : "orange", "date": somedate }
{ "content" : "red", "date": somedate }
{ "user_id" : 90, "content" : "purple", "date": somedate }
The front end pulls pages of 10 items each; I do that with limit and skip, and it works very well.
If there is a logged-in user, I would like to show that user the documents they may find more interesting first, based on the users they have interacted with.
The list of users the current user may find interesting is sorted by score and is stored outside of mongo. The first element is the most important user, whose documents I would like to show first, and the last user on the list is the least important.
The list is a simple array which looks like this:
[4,7,90,1].
The system which created this user score is not located within mongo, but I can copy the data if that will help. I can also change the array to include a score number.
What I would like accomplish is the following:
Get the documents sorted by the importance of the user_id from the list, so that documents from user_id 4 show up first, documents from user_id 7 second, and so on. When there are no users left on the list, I would like to show the rest of the documents. Like this:
all documents with user_id:4
all documents with user_id:7
all documents with user_id:90
all documents with user_id:1
all the rest of the documents
How should I accomplish this? Am I asking too much from mongo?

Given the array [4,7,90,1] what you want in your query is this:
db.collection.aggregate([
{ "$project": {
"user_id": 1,
"content": 1,
"date": 1,
"weight": { "$or": [
{ "$eq": ["$user_id", 4] },
{ "$eq": ["$user_id", 7] },
{ "$eq": ["$user_id", 90] },
{ "$eq": ["$user_id", 1] }
]}
}},
{ "$sort": { "weight": -1, "date": -1 } }
])
For every item in that $or condition, the user_id field is tested against the supplied value, and $eq returns true or false. $or is therefore true for any document whose user_id is in the list, and sorting on weight in descending order places those documents first.
In your code, for each item in the array you build one $eq condition, collect the conditions in an array, and plug that array in as the value of $or.
I probably should have left the $cond operator out of the previous code so this part would have been clearer.
Here's some code for the Ruby Brain:
userList = [4, 7, 90, 1]
orCond = []
userList.each do |userId|
orCond.push({ '$eq' => [ '$user_id', userId ] })
end
pipeline = [
{ '$project' => {
'user_id' => 1,
'content' => 1,
'date' => 1,
'weight' => { '$or' => orCond }
}},
{ '$sort' => { 'weight' => -1, 'date' => -1 } }
]
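The same construction can be sketched in Python, the language used in the pymongo question earlier on this page. The variable names are illustrative, and the resulting list would be passed to an aggregate call in a driver such as pymongo.

```python
# The externally maintained list of interesting users, most important first.
user_list = [4, 7, 90, 1]

# One $eq test per user id; "$user_id" is the field path in each document.
or_cond = [{"$eq": ["$user_id", user_id]} for user_id in user_list]

pipeline = [
    {"$project": {
        "user_id": 1,
        "content": 1,
        "date": 1,
        # True when the document's user_id is in the list.
        "weight": {"$or": or_cond},
    }},
    {"$sort": {"weight": -1, "date": -1}},
]
print(pipeline)
```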
If you want individual weights (assuming key/value pairs of user and weight), you need to nest $cond operators:
db.collection.aggregate([
{ "$project": {
"user_id": 1,
"content": 1,
"date": 1,
"weight": { "$cond": [
{ "$eq": ["$user_id", 4] },
10,
{ "$cond": [
{ "$eq": ["$user_id", 7] },
9,
{ "$cond": [
{ "$eq": ["$user_id", 90] },
7,
{ "$cond": [
{ "$eq": ["$user_id", 1] },
8,
0
]}
]}
]}
]}
}},
{ "$sort": { "weight": -1, "date": -1 } }
])
Note that each weight is just a return value, so the conditions do not need to be listed in order of weight. You can also generate this structure in code.
For generating this structure see here:
https://stackoverflow.com/a/22213246/2313887
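As a rough illustration of generating the nested structure, the sketch below folds a mapping of user id to weight into nested $cond expressions. It is not the code from the linked answer; `build_weight_expr` is a hypothetical helper, and the weights are the ones used in the pipeline above.

```python
def build_weight_expr(weights, default=0):
    """Fold user_id -> weight pairs into nested $cond expressions."""
    expr = default
    # Build from the innermost condition outwards, so the first entry
    # of `weights` ends up as the outermost $cond.
    for user_id, weight in reversed(list(weights.items())):
        expr = {"$cond": [{"$eq": ["$user_id", user_id]}, weight, expr]}
    return expr


weights = {4: 10, 7: 9, 90: 7, 1: 8}
weight_expr = build_weight_expr(weights)

pipeline = [
    {"$project": {"user_id": 1, "content": 1, "date": 1, "weight": weight_expr}},
    {"$sort": {"weight": -1, "date": -1}},
]
print(pipeline)
```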

Since MongoDB version 3.2 we can use $filter, which makes this much easier to maintain when there are more than 4 scores (the $addFields stages below require 3.4 and the $set stages require 4.2):
db.collection.aggregate([
{
$addFields: {
weight: [
{key: 4, score: 10}, {key: 8, score: 9}, {key: 90, score: 8}, {key: 1, score: 7}
]
}
},
{
$addFields: {
weight: {
$filter: {
input: "$weight",
as: "item",
cond: {$eq: ["$$item.key", "$user_id"]}
}
}
}
},
{
$set: {
weight: {
$cond: [{$eq: [{$size: "$weight"}, 1]}, {$arrayElemAt: ["$weight", 0]}, {score: 1}]
}
}
},
{$set: {weight: "$weight.score"}},
{$sort: {weight: -1, date: -1}}
])
See how it works on the playground example
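To see what ordering the lookup table produces, the matching logic can be emulated client-side. This is a plain-Python sketch, not a MongoDB call: `weight_for` is a hypothetical helper that mirrors the $filter and $cond stages, the sample documents are a subset of those in the question, and the secondary sort on date is omitted.

```python
# Same lookup table as in the pipeline above.
score_table = [
    {"key": 4, "score": 10}, {"key": 8, "score": 9},
    {"key": 90, "score": 8}, {"key": 1, "score": 7},
]


def weight_for(doc, table, default=1):
    """Return the score of the single matching entry, else the default."""
    matches = [item for item in table if item["key"] == doc.get("user_id")]
    return matches[0]["score"] if len(matches) == 1 else default


docs = [
    {"user_id": 11, "content": "black"},
    {"user_id": 4, "content": "blue"},
    {"content": "red"},
    {"user_id": 90, "content": "purple"},
]

# Highest weight first; documents with equal weight keep their input order.
ranked = sorted(docs, key=lambda d: weight_for(d, score_table), reverse=True)
print([d.get("user_id") for d in ranked])
```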

MongoDB aggregate from array field [duplicate]

This question already has answers here:
Group by specific element of array with mongo aggregation framework
I have the following collection:
{
"_id" : ObjectId("52e7aa3ed3d55b9b01e23f34"),
"time" : mytime,
"type_instance" : "",
"values" : [0.23, 0.08, 0.06],
"types" : ["type0", "type1", "type2"]
}
I want to group by time to get the average of the values per index. The desired result would be something like:
{
"time" : mytime,
"values" : [avg 0, avg 1, avg 2],
"types" : ["type0", "type1", "type2"]
}
I tried to aggregate
collection.aggregate([
{ "$match": {'time': {"$gte": start}
}
}
,{ "$project": {
"time":"$time",
"values": "$values"
}
}
,{
"$group": {"_id": "$time", "avg": {avg:"$values[0]"}}
}
,{
"$sort": {"time": 1}
}
], function(err, data) {});
Of course this doesn't work; I can't use "$values[0]".
Is there a way to do this?

I think the problem could be with your document structure, because you want to link the values in the values field indirectly to the ones in the types field. Maybe something like this would be more convenient:
{
"_id": ObjectId("52e7aa3ed3d55b9b01e23f34"),
"time" : mytime,
"type_instance" : "",
"whatever":[{
"type": 0,
"value": 0.23
},{
"type": 1,
"value": 0.08
},{
"type": 2,
"value": 0.06
}]
}
This way you could group by time and type (or index as I think you referred to it) after unwinding the whatever field:
collection.aggregate([
{$unwind: "$whatever"},
{$match: {"time": ...}},
{$group:{
_id: {time: "$time", type: "$whatever.type"},
avg: {$avg: "$whatever.value"}
}}
])
This way you will get N documents per time group, where N is the number of types (subdocuments) in the whatever field.
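If one array of averages per time is needed, as in the desired result, the N documents can be regrouped on the client. The sketch below assumes each result document has an `_id` with `time` and `type` fields plus an `avg` field; `regroup_by_time` is a hypothetical helper and the sample rows are made up from the values in the question.

```python
from collections import defaultdict


def regroup_by_time(rows):
    """Collect per-(time, type) averages into one ordered array per time."""
    per_time = defaultdict(dict)
    for row in rows:
        per_time[row["_id"]["time"]][row["_id"]["type"]] = row["avg"]
    return [
        # Order the averages by type so position i corresponds to type i.
        {"time": time, "values": [by_type[t] for t in sorted(by_type)]}
        for time, by_type in sorted(per_time.items())
    ]


# Illustrative rows in the shape assumed above (arrival order is arbitrary).
rows = [
    {"_id": {"time": 1, "type": 1}, "avg": 0.08},
    {"_id": {"time": 1, "type": 0}, "avg": 0.23},
    {"_id": {"time": 1, "type": 2}, "avg": 0.06},
]
regrouped = regroup_by_time(rows)
print(regrouped)
```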

Count occurrences in nested mongodb document and keeping group

I have these documents:
[
{
"question": 1,
"answer": "Foo"
},
{
"question": 1,
"answer": "Foo"
},
{
"question": 1,
"answer": "Bar"
},
{
"question": 2,
"answer": "Foo"
},
{
"question": 2,
"answer": "Foobar"
}
]
And in my backend (PHP) I need to get the distribution of answers, something like:
Question 1:
"Foo": 2/3
"Bar": 1/3
Question 2:
"Foo": 1/2
"Foobar": 1/2
For now I just want to run a mongo query in order to achieve this result:
[
{
"question": 1,
"answers": {
"Foo": 2,
"Bar": 1
}
},
{
"question": 2,
"answers": {
"Foo": 1,
"Foobar": 1
}
}
]
Here is what I came up with:
db.getCollection('testAggregate').aggregate([{
$group: {
'_id': '$question',
'answers': {'$push': '$answer'},
}
}
]);
It returns:
{
"_id" : 2.0,
"answers" : [
"Foo",
"Foobar"
]
},{
"_id" : 1.0,
"answers" : [
"Foo",
"Foo",
"Bar"
]
}
Now I need to do a $group operation on the answers field in order to count the occurrences, but I need to keep the grouping by question, and I do not know how to do it. Could someone give me a hand?

You can use the aggregation below.
Group by both question and answer to get the count for each combination, then group by question to collect each answer with its count.
db.getCollection('testAggregate').aggregate([
{"$group":{
"_id":{"question":"$question","answer":"$answer"},
"count":{"$sum":1}
}},
{"$group":{
"_id":"$_id.question",
"answers":{"$push":{"answer":"$_id.answer","count":"$count"}}
}}
]);
You can use the code below to get the format you want in 3.4.
Rename the pushed keys to k and v, then use $addFields with $arrayToObject to transform the array into named key/value pairs.
db.getCollection('testAggregate').aggregate([
{"$group":{
"_id":{"question":"$question","answer":"$answer"},
"count":{"$sum":1}
}},
{"$group":{
"_id":"$_id.question",
"answers":{"$push":{"k":"$_id.answer","v":"$count"}}
}},
{"$addFields":{"answers":{"$arrayToObject":"$answers"}}}
]);
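The fractions asked for in the question (2/3, 1/3, and so on) still have to be computed in the backend from these counts. The question uses PHP; the sketch below shows the arithmetic in Python for illustration only. `answer_shares` is a hypothetical helper, and the input is the shape produced by the last pipeline above.

```python
from fractions import Fraction


def answer_shares(result_docs):
    """Turn per-answer counts into exact fractions of each question's total."""
    shares = {}
    for doc in result_docs:
        total = sum(doc["answers"].values())
        shares[doc["_id"]] = {
            answer: Fraction(count, total)
            for answer, count in doc["answers"].items()
        }
    return shares


# Result documents in the shape returned by the $arrayToObject pipeline.
result_docs = [
    {"_id": 1, "answers": {"Foo": 2, "Bar": 1}},
    {"_id": 2, "answers": {"Foo": 1, "Foobar": 1}},
]
shares = answer_shares(result_docs)
print(shares)
```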