Project multiple documents from a field's value using aggregation framework - mongodb

Consider this document
{
"_id" : ObjectId("56d06614070b7f2b117b23db"),
"name" : "joe",
"value" : 3,
}
I need to get the following aggregation result :
[{
"name" : "joe",
},
{
"name" : "joe",
},
{
"name" : "joe",
}]
Do you have any idea how to do that with the aggregation framework?

You can use $range:
db.collection.aggregate([
{ "$project": {
"_id": 0,
"name": {
"$map": {
"input": { "$range": [0,"$value"] },
"in": "$name"
}
}
}},
{ "$unwind": "$name" }
])
That essentially supplies [0,1,2] as an input array to $map, from which you can emit the value of "name" once per element. So the "value" field in the document is being used as the upper bound or "length" of the array to produce.
$range requires MongoDB 3.4 or later. If your MongoDB does not support it, there is not really any other option than to transform the results on the cursor in client code instead.
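As a rough illustration of the cursor-side fallback, here is a minimal sketch in plain JavaScript. It works on an in-memory array standing in for the cursor results; the helper name repeatByValue is made up for this example, and treating a missing or non-integer "value" as zero repetitions is an assumption rather than anything the question specifies.

```javascript
// Hypothetical helper: expand each document into `value` copies of { name }.
function repeatByValue(docs) {
  const out = [];
  for (const doc of docs) {
    // Assumption: missing, negative or non-integer counts produce no output.
    const n = Number.isInteger(doc.value) && doc.value > 0 ? doc.value : 0;
    for (let i = 0; i < n; i++) {
      out.push({ name: doc.name });
    }
  }
  return out;
}

// In-memory stand-in for the documents a cursor would return.
const sampleDocs = [{ name: "joe", value: 3 }];
const repeated = repeatByValue(sampleDocs);
console.log(repeated);
```

In the shell the same loop body could run inside a forEach over the find() cursor.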

Related

MongoDB sorting by date as type String

Can someone help me with the query for sorting an array by date in ascending order?
I have tried the below query but the sorting is not happening as expected,
db.getCollection(xyz).aggregate([{
$match: {
"_id":{$in:[{"a" : "NA","b" : "HXYZ","c" : "12345","d" : "AA"}]}
}
},{
$sort: {'bal.date': 1}
},
{ $project: {
balances: { $slice: ["$bal",2]}
}
}
])
My collection:
/* 1 */
{
"_id" : {
"a" : "NA",
"b" : "HXYZ",
"c" : "12345",
"d" : "AA"
},
"bal" : [
{
"type" : "E",
"date" : "2015-08-02"
},
{
"type" : "E",
"date" : "2015-08-01"
},
{
"type" : "E",
"date" : "2015-07-07"
}
]
}
Please help me understand what the problem is in the above query.
Thanks in advance.
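You are mixing up what the $match and $sort stages do. A $sort on "bal.date" orders whole documents in the pipeline; it does not reorder the elements inside the "bal" array.
To sort the array itself, $unwind it, $sort the unwound documents, and $group them back together: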
db.collection.aggregate([
{ "$match": {
"_id": {
"$eq": {
"a": "NA",
"b": "HXYZ",
"c": "12345",
"d": "AA"
}
}
}},
{ "$unwind": "$bal" },
{ "$sort": { "bal.date": 1 }},
{ "$group": {
"_id": "$_id",
"bal": {
"$push": "$bal"
}
}}
])
From the looks of it, you're saving the date as a String, so sort() compares the values as strings rather than as dates. Zero-padded "YYYY-MM-DD" strings like the ones shown do happen to sort chronologically, but most other string formats will not give you the order you're expecting.
You can either run a script that converts the bal.date field to a Date() value, after which sort() works as expected, or do the converting and sorting on the server side.
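To illustrate the "convert, then sort" idea on the client side, here is a small plain JavaScript sketch using the "bal" array from the question. The helper name sortByDateAscending is hypothetical, and the code assumes every date string is in a format that the Date constructor parses reliably (ISO "YYYY-MM-DD" here).

```javascript
// Sample array taken from the question's document.
const bal = [
  { type: "E", date: "2015-08-02" },
  { type: "E", date: "2015-08-01" },
  { type: "E", date: "2015-07-07" },
];

// Hypothetical helper: return a copy of the array ordered by ascending date.
function sortByDateAscending(entries) {
  return entries
    // Convert the string to a real Date so the comparison is chronological.
    .map((entry) => ({ ...entry, date: new Date(entry.date) }))
    .sort((a, b) => a.date.getTime() - b.date.getTime());
}

const sortedBal = sortByDateAscending(bal);
const sortedDays = sortedBal.map((entry) => entry.date.toISOString().slice(0, 10));
console.log(sortedDays);
```

The original array is left untouched, since map() builds new objects before sorting.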

Sum array number values across multiple documents

I have Mongo documents that each hold an ordered array of numbers (one value per day), and I want to sum the values at each array position across multiple documents, grouped by a field outside of the array.
{"_id" : "1",
"group" : "A",
"value_list" : [1,2,3,4,5,6,7]
},
{"_id" : "2",
"group" : "B",
"value_list" : [10,20,30,40,50,60,70]
},
{"_id" : "3",
"group" : "A",
"value_list" : [1,2,3,4,5,6,7]
},
{"_id" : "4",
"group" : "B",
"value_list" : [10,20,30,40,50,60,70]
}
So the results I'm after is listed below.
There are two group A documents above and at position 1 of the value_list array, both documents have the value of 1. so 1+1=2. Position 2 the value is 2 in both documents so 2+2=4, etc.
There are two group B documents above and at position 1 of the value_list array, both documents have the value of 10. so 10+10=20. Position 2 the value is 20 in both documents so 20+20=40, etc.
{"_id" : "30",
"group" : "A",
"value_list" : [2,4,6,8,10,12,14]
},
{"_id" : "30",
"group" : "B",
"value_list" : [20,40,60,80,100,120,140]
}
How would I do this using Mongo Script? Thanks, Matt
Certainly the most "scalable" way is to use the includeArrayIndex option of $unwind in order to track the positions and then $sum the "unwound" combinations, before adding back into array format:
db.getCollection('test').aggregate([
{ "$unwind": { "path": "$value_list", "includeArrayIndex": "index" } },
{ "$group": {
"_id": {
"group": "$group",
"index": "$index"
},
"value_list": { "$sum": "$value_list" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": "$_id.group",
"value_list": { "$push": "$value_list" }
}},
{ "$sort": { "_id": 1 } }
])
Note you need to $sort after the first $group in order to maintain the array positions.
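If it helps to see the idea outside of the pipeline, the following plain JavaScript sketch mimics the same steps on the question's sample data: "unwind" with the array index recorded, then sum per (group, index). It is only an in-memory illustration, not how the server executes the pipeline. Writing each sum into its index slot plays the role of the $sort that keeps the positions in order.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "1", group: "A", value_list: [1, 2, 3, 4, 5, 6, 7] },
  { _id: "2", group: "B", value_list: [10, 20, 30, 40, 50, 60, 70] },
  { _id: "3", group: "A", value_list: [1, 2, 3, 4, 5, 6, 7] },
  { _id: "4", group: "B", value_list: [10, 20, 30, 40, 50, 60, 70] },
];

// Step 1: "unwind" every array element, keeping its position.
const unwound = docs.flatMap((d) =>
  d.value_list.map((value, index) => ({ group: d.group, index, value }))
);

// Step 2: accumulate a running sum per group and per position.
const byGroup = new Map();
for (const { group, index, value } of unwound) {
  if (!byGroup.has(group)) byGroup.set(group, []);
  const sums = byGroup.get(group);
  sums[index] = (sums[index] || 0) + value;
}

// Step 3: emit one result per group, ordered by group name.
const positionalSums = [...byGroup.entries()]
  .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
  .map(([group, value_list]) => ({ _id: group, value_list }));
console.log(positionalSums);
```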
If you can get away with it, you could also feed all of the arrays into $reduce:
db.getCollection('test').aggregate([
{ "$group": {
"_id": "$group",
"value_list": { "$push": "$value_list" }
}},
{ "$addFields": {
"value_list": {
"$reduce": {
"input": "$value_list",
"initialValue": [],
"in": {
"$map": {
"input": {
"$zip": {
"inputs": ["$$this", "$$value"],
"useLongestLength": true,
}
},
"in": { "$sum": "$$this"}
}
}
}
}
}},
{ "$sort": { "_id": 1 } }
])
Essentially you create an "array of arrays" using the initial $push, which you process with $reduce. The $zip does a "pairwise" assignment per element, which are then added together at each position during $map using $sum.
While a bit more efficient, this is not really practical for large data, since pushing every grouped array into a single array could exceed the 16MB BSON document limit before you get to "reduce" it.
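The pairwise "zip and add" step can be sketched in plain JavaScript as follows. The function name sumByPosition is made up for the example; padding the shorter array with zero is my stand-in for the combination of useLongestLength and $sum ignoring missing values.

```javascript
// Hypothetical helper: add up an "array of arrays" position by position.
function sumByPosition(arrays) {
  return arrays.reduce((acc, current) => {
    // Like $zip with useLongestLength: walk to the end of the longer array.
    const length = Math.max(acc.length, current.length);
    const result = [];
    for (let i = 0; i < length; i++) {
      // A missing slot contributes 0 to the sum.
      result.push((acc[i] || 0) + (current[i] || 0));
    }
    return result;
  }, []); // start from an empty array, like "initialValue": []
}

const groupA = sumByPosition([
  [1, 2, 3, 4, 5, 6, 7],
  [1, 2, 3, 4, 5, 6, 7],
]);
console.log(groupA);
```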
Either method produces the same result:
/* 1 */
{
"_id" : "A",
"value_list" : [
2.0,
4.0,
6.0,
8.0,
10.0,
12.0,
14.0
]
}
/* 2 */
{
"_id" : "B",
"value_list" : [
20.0,
40.0,
60.0,
80.0,
100.0,
120.0,
140.0
]
}

How to match and group array elements with the max value in aggregation

I need help getting the array element with the maximum value of a field (level) from each document, then counting the total occurrences grouped by the array element field "bssid".
Consider the following data
/* 1 */
{
"_id" : "18:59:36:0c:94:a3",
"timestamp" : "1460012567",
"apdata" : [{
"bssid" : "f4:b7:e2:56:e4:20",
"ssid" : "Test Network2",
"level" : -55
}, {
"bssid" : "b8:a3:86:67:03:56",
"ssid" : "Test Network1",
"level" : -76
}]
}
/* 2 */
{
"_id" : "d0:b3:3f:b9:42:38",
"timestamp" : "1460013345",
"apdata" : [{
"bssid" : "f4:b7:e2:56:e4:20",
"ssid" : "Test Network2",
"level" : -65
}, {
"bssid" : "b8:a3:86:67:03:56",
"ssid" : "Test Network1",
"level" : -46
}]
}
/* 3 */
{
"_id" : "d0:b3:3f:b9:42:41",
"timestamp" : "1460013145",
"apdata" : [{
"bssid" : "f4:b7:e2:56:e4:20",
"ssid" : "Test Network2",
"level" : -65
}, {
"bssid" : "b8:a3:86:67:03:56",
"ssid" : "Test Network1",
"level" : -46
}]
}
The output required is
{
"bssid" : "f4:b7:e2:56:e4:20",
"ssid" : "Test Network2",
"count" : 1
}, {
"bssid" : "b8:a3:86:67:03:56",
"ssid" : "Test Network1",
"count" : 2
}
Which is the count of times each bssid had the maximum value within the array of each document over the whole collection.
If you have MongoDB 3.2 available then you can do something like this:
db.sample.aggregate([
{ "$project": {
"apdata": {
"$arrayElemAt": [
{ "$filter": {
"input": "$apdata",
"as": "el",
"cond": {
"$eq": [
"$$el.level",
{ "$max": {
"$map": {
"input": "$apdata",
"as": "data",
"in": "$$data.level"
}
}}
]
}
}},
0
]
}
}},
{ "$group": {
"_id": "$apdata.bssid",
"ssid": { "$first": "$apdata.ssid" },
"count": { "$sum": 1 }
}}
])
For at least MongoDB 2.6 you need to do this:
db.sample.aggregate([
{ "$unwind": "$apdata" },
{ "$group": {
"_id": "$_id",
"apdata": { "$push": "$apdata" },
"max": { "$max": "$apdata.level" }
}},
{ "$unwind": "$apdata" },
{ "$redact": {
"$cond": {
"if": { "$eq": [ "$apdata.level", "$max" ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$group": {
"_id": "$apdata.bssid",
"ssid": { "$first": "$apdata.ssid" },
"count": { "$sum": 1 }
}}
])
And for MongoDB 2.4 or 2.2 like this:
db.sample.aggregate([
{ "$unwind": "$apdata" },
{ "$group": {
"_id": "$_id",
"apdata": { "$push": "$apdata" },
"max": { "$max": "$apdata.level" }
}},
{ "$unwind": "$apdata" },
{ "$project": {
"apdata": 1,
"isMax": { "$eq": [ "$apdata.level", "$max" ] }
}},
{ "$match": { "isMax": true } },
{ "$group": {
"_id": "$apdata.bssid",
"ssid": { "$first": "$apdata.ssid" },
"count": { "$sum": 1 }
}}
])
In all cases $max is used to get the "maximum" value of the array in each document "first", then you can use that to "filter" the array content prior to using it in a $group. The approaches only vary with the version:
MongoDB 3.2: Allows the $max to work directly on an "array" of values. So the $map is used to just get the "level" values and find out what that "max" actually is.
Then the $filter can be used to just return the array element which matches that "max" value, and finally $arrayElemAt is used to return that single element (at index zero of the filtered array) as a plain document.
The whole process can be done in $group "only" if you basically repeat that whole statement for both the _id and in order to get the $first "ssid" value, but it's a bit easier to write in a $project separately to demonstrate.
MongoDB 2.6: This lacks the fancier operators and most notably the ability of $max to work "directly" on an array. The notable thing is the need to $unwind the array first and then actually $group just on the original document, solely in order to get that "max" value.
Then the process really needs you to $unwind again since you will be grouping on the element from the array later, and then use $redact to filter the content. This is a "logical" form of $match where you can directly compare the "level" against the computed "max" from the earlier stage. So the element that is not the "max" is removed.
MongoDB 2.4: Is again basically the same logic, except instead of $redact you actually need the physical $project in order to put a field in the document to use in filtering with $match.
All versions have the same final $group, where you supply the path to "apdata.bssid" for the grouping key and the $first result on that grouping boundary for the "ssid" and a simple $sum to count the occurrences of the grouping key in the results.
Everything returns just as follows:
{ "_id" : "f4:b7:e2:56:e4:20", "ssid" : "Test Network2", "count" : 1 }
{ "_id" : "b8:a3:86:67:03:56", "ssid" : "Test Network1", "count" : 2 }
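For anyone who wants to check the logic by hand, here is a plain JavaScript sketch of the same two steps (pick the max-level element per document, then count by bssid) run against the sample data. It is an in-memory illustration only. It assumes every "apdata" array is non-empty and, on a tie, keeps the first element with the highest level.

```javascript
// Sample documents from the question (timestamps omitted, they are not used).
const sample = [
  { _id: "18:59:36:0c:94:a3", apdata: [
    { bssid: "f4:b7:e2:56:e4:20", ssid: "Test Network2", level: -55 },
    { bssid: "b8:a3:86:67:03:56", ssid: "Test Network1", level: -76 },
  ] },
  { _id: "d0:b3:3f:b9:42:38", apdata: [
    { bssid: "f4:b7:e2:56:e4:20", ssid: "Test Network2", level: -65 },
    { bssid: "b8:a3:86:67:03:56", ssid: "Test Network1", level: -46 },
  ] },
  { _id: "d0:b3:3f:b9:42:41", apdata: [
    { bssid: "f4:b7:e2:56:e4:20", ssid: "Test Network2", level: -65 },
    { bssid: "b8:a3:86:67:03:56", ssid: "Test Network1", level: -46 },
  ] },
];

const counts = new Map();
for (const doc of sample) {
  // Element with the highest level in this document (first one wins on ties).
  const best = doc.apdata.reduce((max, el) => (el.level > max.level ? el : max));
  // Group by bssid, keeping the first ssid seen and a running count.
  const entry = counts.get(best.bssid) || { _id: best.bssid, ssid: best.ssid, count: 0 };
  entry.count += 1;
  counts.set(best.bssid, entry);
}

const maxCounts = [...counts.values()];
console.log(maxCounts);
```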
Actually the most "efficient" form for MongoDB 3.2 would be as follows:
db.sample.aggregate([
{ "$group": {
"_id": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": "$apdata",
"as": "el",
"cond": {
"$eq": [
"$$el.level",
{ "$max": {
"$map": {
"input": "$apdata",
"as": "data",
"in": "$$data.level"
}
}}
]
}
}
},
"as": "apdata",
"in": {
"bssid": "$$apdata.bssid",
"ssid": "$$apdata.ssid"
}
}},
0
]
},
"count": { "$sum": 1 }
}}
])
With a slightly different form due to the compound _id, but it is a single $group stage only, without repetition of the whole process to find the array element data for the "max" value:
{
"_id" : {
"bssid" : "b8:a3:86:67:03:56",
"ssid" : "Test Network1"
},
"count" : 2
}
{
"_id" : {
"bssid" : "f4:b7:e2:56:e4:20",
"ssid" : "Test Network2"
},
"count" : 1
}

mongodb get filtered count

I don't have much experience writing MongoDB queries, but I wanted to ask how I could query a collection to get something like this:
{
Total: 1000,
Filtered: 459,
DocumentArray: []
}
Of course doing that in one query, so I do not need to do something like this:
db.collection.find();
db.collection.find().count();
db.collection.count();
Well you could do something along these lines:
Considering documents like this:
{ "_id" : ObjectId("531251829df82824bdb53578"), "name" : "a", "type" : "b" }
{ "_id" : ObjectId("531251899df82824bdb53579"), "name" : "a", "type" : "c" }
{ "_id" : ObjectId("5312518e9df82824bdb5357a"), "type" : "c", "name" : "b" }
And an aggregate pipeline like this:
db.collection.aggregate([
{ "$group": {
"_id": null,
"totalCount": { "$sum": 1 },
"docs": { "$push": {
"name": "$name",
"type": "$type"
}},
}},
{ "$unwind": "$docs" },
{ "$match": { "docs.name": "a" } },
{ "$group": {
"_id": null,
"totalCount": { "$first": "$totalCount" },
"filteredCount": { "$sum": 1 },
"docs": { "$push": "$docs" }
}}
])
But I would not recommend it. It will certainly blow up on any "real" collection due to exceeding the maximum BSON document size, and I doubt it would perform very well. But that is how it can be done, even if the utility is purely academic.
Just do what you are doing if you need the information. That is the "right way" to do it.
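If the goal is just the combined shape from the question, it can be assembled in application code from the separate results. The sketch below shows that in plain JavaScript with an in-memory array standing in for the collection. The helper name summarize is hypothetical, and against a real collection the total, the filtered count and the documents would come from the separate calls the question already lists.

```javascript
// Hypothetical helper: build { Total, Filtered, DocumentArray } from an array.
function summarize(docs, predicate) {
  const matching = docs.filter(predicate);
  return {
    Total: docs.length,        // stands in for the collection count
    Filtered: matching.length, // stands in for the filtered count
    DocumentArray: matching,   // stands in for the filtered find() results
  };
}

// Sample documents from the answer above (ObjectIds omitted).
const collectionDocs = [
  { name: "a", type: "b" },
  { name: "a", type: "c" },
  { name: "b", type: "c" },
];

const summary = summarize(collectionDocs, (doc) => doc.name === "a");
console.log(summary);
```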

Mongodb Aggregation grouping with leave the field

After applying the aggregation
db.grades.aggregate([
{$match: {'type': 'homework'}},
{$sort: {'student_id':1, 'score':1}}
])
got the result:
{
"result" : [
{
"_id" : ObjectId("50906d7fa3c412bb040eb579"),
"student_id" : 0,
"type" : "homework",
"score" : 14.8504576811645
},
{
"_id" : ObjectId("50906d7fa3c412bb040eb57a"),
"student_id" : 0,
"type" : "homework",
"score" : 63.98402553675503
},
...
How can I modify the request so that only the document with the minimum score per student is kept, and the result still includes the _id field? For example, something like this:
{
"_id" : ObjectId("50906d7fa3c412bb040eb579"),
"score" : 14.8504576811645
}
Thanks.
Is this a homework question from the education site? I can't remember, but this is fairly trivial.
db.grades.aggregate([
{ "$match": { type: 'homework' } },
{ "$sort": {student_id: 1, score: 1} },
{ "$group": {
"_id": "$student_id",
"doc": { "$first": "$_id"},
"score": { "$first": "$score"}
}},
{ "$sort": { "_id": 1 } },
{ "$project": {
"_id": "$doc",
"score": 1
}}
])
All this does is use $first to get the first result when grouping by student_id. By first it means exactly that, so this is only useful after sorting and is different from $min which would take the smallest value from the grouped results.
So if you got part of the way there, not only do you keep the first score, but you also do the same operation on the _id value as well.
The additional sort is only there so the results don't trip you up, because they are likely to appear in the reverse order of student_id. Finally there is just a small use of $project to get the document form that you want.
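To make the "sort, then take the first per group" behaviour concrete, here is a plain JavaScript sketch of the same steps on an in-memory array. The two student 0 documents come from the question (ObjectIds written as plain strings); the student 1 documents and their ids are made up purely for illustration.

```javascript
const grades = [
  { _id: "50906d7fa3c412bb040eb57a", student_id: 0, type: "homework", score: 63.98402553675503 },
  { _id: "50906d7fa3c412bb040eb579", student_id: 0, type: "homework", score: 14.8504576811645 },
  // Made-up documents, not from the original data set.
  { _id: "made-up-id-1", student_id: 1, type: "homework", score: 50 },
  { _id: "made-up-id-2", student_id: 1, type: "exam", score: 10 },
];

// $match + $sort: keep homework only, order by student_id and then score.
const sortedGrades = grades
  .filter((g) => g.type === "homework")
  .sort((a, b) => a.student_id - b.student_id || a.score - b.score);

// $group with $first: the first document seen per student is the lowest score.
const seenStudents = new Set();
const lowestPerStudent = [];
for (const g of sortedGrades) {
  if (!seenStudents.has(g.student_id)) {
    seenStudents.add(g.student_id);
    // $project: keep the original _id alongside the score.
    lowestPerStudent.push({ _id: g._id, score: g.score });
  }
}
console.log(lowestPerStudent);
```

Taking the first element only gives the minimum because the input was sorted by score beforehand.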