MongoDB aggregate nested array correctly - mongodb

OK I am very new to Mongo, and I am already stuck.
Db has the following structure (much simplified for sure):
{
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"score" : "2",
"value" : "AA",
},
{
"score" : "2",
"value" : "AA",
},
{
"score" : "4",
"value" : "BB",
},
{
"score" : "3",
"value" : "CC",
}
]
},
{
"_id" : ObjectId("57fdfbc12dc30a46507044ef"),
"keyterms" : [
...
There are several objects. Each object has an array "keyterms", whose entries each have a score and a value. There are some duplicates (not really, since in the real DB the keyterms entries have many more fields, but with respect to value and score they are duplicates).
Now I need a query which
selects one object by id,
groups its keyterms by value,
counts the duplicates,
and sorts them by score.
So I want to get something like this as the result:
// for Object 57fdfbc12dc30a46507044ec
"keyterms": [
{
"score" : "4",
"value" : "BB",
"count" : 1
},
{
"score" : "3",
"value" : "CC",
"count" : 1
},
{
"score" : "2",
"value" : "AA",
"count" : 2
}
]
In SQL I would have written something like this
select
score, value, count(*) as count
from
all_keywords_table_or_some_join
group by
value
order by
score
But, sadly enough, it's not SQL.
In Mongo I managed to write this:
db.getCollection('tests').aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$_id",
'keyterms': {$push: "$keyterms"}
}},
{$project: {
'keyterms.score': 1,
'keyterms.value': 1
}}
])
But there is something missing: the grouping of the keyterms by their value. I cannot get rid of the feeling that this is the wrong approach altogether. How can I select the keyterms array, continue with just that, and use an aggregate function only on it? That would be easy.
BTW I read this
(Mongo aggregate nested array)
but I can't figure it out for my example unfortunately...

You'd want an aggregation pipeline where, after you $unwind the array, you group the flattened documents by the array's value and score keys, count each group using the $sum accumulator operator, and retain the main document's _id with the $first operator.
The following pipeline stages should then sort the grouped documents by score and group them by the retained _id, so as to preserve the original schema and recreate the keyterms array using the $push operator.
The following demonstration attempts to explain the above aggregation operation:
db.tests.aggregate([
{ "$match": { "_id": ObjectId("57fdfbc12dc30a46507044ec") } },
{ "$unwind": "$keyterms" },
{
"$group": {
"_id": {
"value": "$keyterms.value",
"score": "$keyterms.score"
},
"doc_id": { "$first": "$_id" },
"count": { "$sum": 1 }
}
},
{ "$sort": {"_id.score": -1 } },
{
"$group": {
"_id": "$doc_id",
"keyterms": {
"$push": {
"value": "$_id.value",
"score": "$_id.score",
"count": "$count"
}
}
}
}
])
Sample Output
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"value" : "BB",
"score" : "4",
"count" : 1
},
{
"value" : "CC",
"score" : "3",
"count" : 1
},
{
"value" : "AA",
"score" : "2",
"count" : 2
}
]
}
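If it helps to see what the two $group stages are doing, here is a rough plain-JavaScript sketch of the same logic on an in-memory copy of the question's document. It does not talk to MongoDB at all; countKeyterms and the bucket layout are names I made up for illustration, and the _id is a plain string rather than an ObjectId.

```javascript
// Plain-JavaScript sketch (no MongoDB needed) of what the pipeline computes.
// The data mirrors the question's document; countKeyterms is a hypothetical helper.
const doc = {
  _id: "57fdfbc12dc30a46507044ec",
  keyterms: [
    { score: "2", value: "AA" },
    { score: "2", value: "AA" },
    { score: "4", value: "BB" },
    { score: "3", value: "CC" },
  ],
};

function countKeyterms(document) {
  // $unwind + first $group: one bucket per (value, score) pair, counting members.
  const buckets = new Map();
  for (const term of document.keyterms) {
    const key = JSON.stringify([term.value, term.score]);
    const bucket =
      buckets.get(key) || { value: term.value, score: term.score, count: 0 };
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  // $sort by score, descending. Scores are strings in the sample data,
  // so they are compared as strings here as well.
  const sorted = [...buckets.values()].sort((a, b) =>
    a.score < b.score ? 1 : a.score > b.score ? -1 : 0
  );
  // Second $group: rebuild a single document holding the keyterms array.
  return { _id: document._id, keyterms: sorted };
}

const result = countKeyterms(doc);
console.log(JSON.stringify(result, null, 2));
```

Note that the scores are compared as strings, which is fine for single digits but would put "10" before "2"; storing them as numbers avoids that.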

Meanwhile, I solved it myself:
db.getCollection('tests').aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$keyterms.value",
'keyterms': {$push: "$keyterms"},
'escore': {$first: "$keyterms.score"},
'evalue': {$first: "$keyterms.value"}
}},
{$limit: 15},
{$project: {
"score": "$escore",
"value": "$evalue",
"count": {$size: "$keyterms"}
}}
])

Related

MongoDB sorting by date as type String

Can someone help me with a query for sorting an array by date in ascending order?
I have tried the query below, but the sorting is not happening as expected:
db.getCollection(xyz).aggregate([{
$match: {
"_id":{$in:[{"a" : "NA","b" : "HXYZ","c" : "12345","d" : "AA"}]}
}
},{
$sort: {'bal.date': 1}
},
{ $project: {
balances: { $slice: ["$bal",2]}
}
}
])
My collection:
/* 1 */
{
"_id" : {
"a" : "NA",
"b" : "HXYZ",
"c" : "12345",
"d" : "AA"
},
"bal" : [
{
"type" : "E",
"date" : "2015-08-02"
},
{
"type" : "E",
"date" : "2015-08-01"
},
{
"type" : "E",
"date" : "2015-07-07"
}
]
}
Please help me understand what the problem is with the above query.
Thanks in advance.
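You are applying $sort directly after $match, but $sort orders whole documents, not the elements inside the bal array. Unwind the array first, sort, then regroup.
The correct arrangement of the aggregation pipeline stages is: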
db.collection.aggregate([
{ "$match": {
"_id": {
"$eq": {
"a": "NA",
"b": "HXYZ",
"c": "12345",
"d": "AA"
}
}
}},
{ "$unwind": "$bal" },
{ "$sort": { "bal.date": 1 }},
{ "$group": {
"_id": "$_id",
"bal": {
"$push": "$bal"
}
}}
])
From the looks of it, you're saving the date as a String, so $sort will compare the dates as strings. For zero-padded yyyy-mm-dd values like the ones shown, string order happens to match chronological order, but for most other formats it will not give you the order you're expecting.
You can either run a script that converts the bal.date field to Date() format, after which the sort will work automatically, or you can do the converting and sorting server side.
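To illustrate the string-versus-date point, here is a small plain-JavaScript sketch using the bal entries from the question. It runs without MongoDB; the dayFirst array is hypothetical data I added to show a format where string order goes wrong.

```javascript
// Plain-JavaScript sketch: sorting date strings versus real Date values.
const bal = [
  { type: "E", date: "2015-08-02" },
  { type: "E", date: "2015-08-01" },
  { type: "E", date: "2015-07-07" },
];

// Sorting the raw strings. This is chronologically correct here only because
// the format is zero-padded year-month-day.
const byString = [...bal].sort((a, b) =>
  a.date < b.date ? -1 : a.date > b.date ? 1 : 0
);

// Converting to Date first, then sorting numerically on the timestamps.
const byDate = bal
  .map((entry) => ({ ...entry, date: new Date(entry.date) }))
  .sort((a, b) => a.date - b.date);

// Hypothetical day-month-year strings: string order and date order disagree.
// 1 September ends up before 2 August.
const dayFirst = ["02-08-2015", "01-09-2015"].sort();

console.log(byString.map((e) => e.date));
console.log(byDate.map((e) => e.date.toISOString().slice(0, 10)));
console.log(dayFirst);
```

If the stored format is not under your control, converting to real dates is the safer of the two options.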

Return specific field in aggregate

I am trying to aggregate the following data:
{
"_id" : ObjectId("527a6b7c24a8874c078b9d10"),
"Name" : "FirstName",
"Link" : "www.mylink.com/123",
"year" : 2013
}
{
"_id" : ObjectId("527a6b7c24a8874c078b9d11"),
"Name" : "FirstName",
"Link" : "www.mylink.com/124",
"year" : 2013
}
{
"_id" : ObjectId("527a6b7c24a8874c078b9d12"),
"Name" : "SecondName",
"Link" : "www.mylink.com/125",
"year" : 2013
}
I want to count the number of occurrences of the Name field, but I also want to return the corresponding Link field in the output of the aggregate query. Right now I am doing it like this (which does not return the Link field in the output):
db.coll.aggregate([
{ "$match": { "Year": 2013 } },
{ "$group": {
"_id": {
"Name": "$Name"
},
"count": { "$sum": 1 }
}},
{ "$project": {
"_id": "$_id",
"count": 1
}},
{ $sort: {
count: 1
} }
])
The above returns only Name field and count. But how can I also return the corresponding Link field (could be several) in the output of aggregate query?
Best Regards
db.coll.aggregate([
{ "$match": { "year": 2013 } },
{ "$group": {"_id": "$Name", "Link": {$push: "$Link"}, "count": { "$sum": 1 }}},
{ "$project": {"Name": "$_id", _id: 0, "Link": 1, "count": 1}},
{ $sort: {count: 1} }
])
Results in:
{ "Link" : [ "www.mylink.com/125" ], "count" : 1, "Name" : "SecondName" }
{ "Link" : [ "www.mylink.com/123", "www.mylink.com/124" ], "count" : 2, "Name" : "FirstName" }
OK, so the $match was correct except for a typo: 'Year' --> 'year'.
The $group could be simplified a little bit. I removed an extra set of brackets so that you get _id: 'FirstName' instead of _id: { 'Name': 'FirstName' }, since we can rename _id to 'Name' in the $project stage anyway.
You needed to add $push or $addToSet to maintain the $Link value in your grouping. $addToSet will allow for unique values in the array only, while $push will add all values, so use whichever at your discretion.
$project and $sort are straightforward, rename and include/exclude whichever fields you would like.
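In case the difference between $push and $addToSet is not obvious, here is a rough plain-JavaScript sketch of the grouping. It runs without MongoDB. The sample data is adapted from the question, with one link deliberately repeated so the two modes differ, and groupLinks is a made-up helper name.

```javascript
// Plain-JavaScript sketch of $group with $push versus $addToSet.
// Sample data is hypothetical: the first link is repeated on purpose.
const docs = [
  { Name: "FirstName", Link: "www.mylink.com/123", year: 2013 },
  { Name: "FirstName", Link: "www.mylink.com/123", year: 2013 },
  { Name: "SecondName", Link: "www.mylink.com/125", year: 2013 },
];

function groupLinks(documents, unique) {
  const groups = new Map();
  for (const d of documents) {
    if (d.year !== 2013) continue; // $match
    const g = groups.get(d.Name) || { Name: d.Name, Link: [], count: 0 };
    // unique === true behaves like $addToSet (skip values already present);
    // unique === false behaves like $push (append unconditionally).
    if (!unique || !g.Link.includes(d.Link)) g.Link.push(d.Link);
    g.count += 1; // { $sum: 1 }
    groups.set(d.Name, g);
  }
  // $sort by count, ascending
  return [...groups.values()].sort((a, b) => a.count - b.count);
}

const pushed = groupLinks(docs, false);
const asSet = groupLinks(docs, true);
console.log(JSON.stringify(pushed));
console.log(JSON.stringify(asSet));
```

The count is the same either way; only the Link array changes. Keep in mind that in MongoDB the order of elements produced by $addToSet is not guaranteed, unlike in this sketch.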

MongoDB: count the repetitive time of array element with MapReduce

Say every document in a collection has a string array. How could I count how many times each element of the array occurs across the whole collection? Right now I can find all the distinct elements, but the MapReduce function is a little tricky and I haven't fully understood it.
Doc A
{
_id:
name:
actors: ["a", "b", "c"]
}
Doc B
{
_id:
name:
actors: ["a", "d"]
}
Doc C
{
_id:
name:
actors: ["a", "c", "f"]
}
I want to get a statistical result like a:3 b:1 c:2 d:1 f:1.
An alternative route that you could take is the aggregation framework. Considering the above collection as an example
Populate test collection:
db.collection.insert([
{ "_id" : 1, "name" : "ABC1", "actors": ["a", "b", "c"] },
{ "_id" : 2, "name" : "ABC2", "actors" : ["a", "d"] },
{ "_id" : 3, "name" : "XYZ1", "actors" : ["a", "c", "f"] }
])
Using MongoDB 3.4.4 or newer:
db.collection.aggregate([
{ "$unwind" : "$actors" },
{ "$group": { "_id": "$actors", "count": { "$sum": 1} } },
{ "$group": {
"_id": null,
"counts": {
"$push": {
"k": "$_id",
"v": "$count"
}
}
} },
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$counts" }
} }
])
Output
{
a: 3,
b: 1,
c: 2,
d: 1,
f: 1
}
Using MongoDB 3.2 and below:
The following aggregation pipeline operation uses the $unwind stage to output a document for each element in the actors array, and the $group stage to group the documents by the value in the actors array and then count the number of documents in each group (which gives the number of occurrences of each array element) by way of the $sum operator:
db.collection.aggregate([
{ "$unwind" : "$actors" },
{ "$group": { "_id": "$actors", "count": { "$sum": 1} } }
])
The operation returns the following results, which are a close match to your expectations but don't give you the counts as key/value pairs:
/* 0 */
{
"result" : [
{
"_id" : "f",
"count" : 1
},
{
"_id" : "d",
"count" : 1
},
{
"_id" : "c",
"count" : 2
},
{
"_id" : "b",
"count" : 1
},
{
"_id" : "a",
"count" : 3
}
],
"ok" : 1
}
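For comparison, the same counting can be sketched in plain JavaScript on an in-memory copy of the test collection (no MongoDB involved). The pairs array plays the role of the k/v documents, and Object.fromEntries stands in for $arrayToObject; the variable names are just illustrative.

```javascript
// Plain-JavaScript sketch of $unwind + $group + $arrayToObject.
const docs = [
  { _id: 1, name: "ABC1", actors: ["a", "b", "c"] },
  { _id: 2, name: "ABC2", actors: ["a", "d"] },
  { _id: 3, name: "XYZ1", actors: ["a", "c", "f"] },
];

// $unwind + $group: count how often each actor appears across all documents.
const tally = new Map();
for (const doc of docs) {
  for (const actor of doc.actors) {
    tally.set(actor, (tally.get(actor) || 0) + 1);
  }
}

// Equivalent of the { k, v } pairs pushed into "counts".
const pairs = [...tally.entries()];

// Equivalent of $arrayToObject: turn the pairs into a single object.
const counts = Object.fromEntries(pairs);
console.log(counts);
```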

Mongo aggregate nested array

I have a mongo collection with following structure
{
"userId" : ObjectId("XXX"),
"itemId" : ObjectId("YYY"),
"resourceId" : 1,
"_id" : ObjectId("528455229486ca3606004ec9"),
"parameter" : [
{
"name" : "name1",
"value" : 150,
"_id" : ObjectId("528455359486ca3606004eed")
},
{
"name" : "name2",
"value" : 0,
"_id" : ObjectId("528455359486ca3606004eec")
},
{
"name" : "name3",
"value" : 2,
"_id" : ObjectId("528455359486ca3606004eeb")
}
]
}
There can be multiple documents with the same 'userId' and different 'itemId' values, but the parameter array will have the same names in all of them.
What I am trying to accomplish is to return the aggregated parameters "name1", "name2" and "name3" for each unique "userId", disregarding the 'itemId'. So the final results would look like this for each user:
{
"userId" : ObjectId("use1ID"),
"name1" : (aggregatedValue),
"name2" : (aggregatedValue),
"name3" : (aggregatedVAlue)
},
{
"userId" : ObjectId("use2ID"),
"name1" : (aggregatedValue),
"name2" : (aggregatedValue),
"name3" : (aggregatedVAlue)
}
Is it possible to accomplish this using the aggregation methods of MongoDB? Could you please help me build the proper query?
The simplest form of this is to keep things keyed by the "parameter" "name":
db.collection.aggregate([
// Unwind the array
{ "$unwind": "$parameter"},
// Group on the "userId" and "name" and $sum "value"
{ "$group": {
"_id": {
"userId": "$userId",
"name": "$parameter.name"
},
"value": { "$sum": "$parameter.value" }
}},
// Put things into an array for "nice" processing
{ "$group": {
"_id": "$_id.userId",
"values": { "$push": {
"name": "$_id.name",
"value": "$value"
}}
}}
])
If you really need to have the "values" of the names as field names, you can do the following. But since you are "projecting" the fields/properties, you must specify them all in your code. You cannot be "dynamic" anymore; you are coding or generating each one:
db.collection.aggregate([
// Unwind the array
{ "$unwind": "$parameter"},
// Group on the "_id" and "name" and $sum "value"
{ "$group": {
"_id": {
"userId": "$userId",
"name": "$parameter.name"
},
"value": { "$sum": "$parameter.value"}
}},
// Project out discrete "field" names with $cond
{ "$project": {
"name1": { "$cond": [
{ "$eq": [ "$_id.name", "name1" ] },
"$value",
0
]},
"name2": { "$cond": [
{ "$eq": [ "$_id.name", "name2" ] },
"$value",
0
]},
"name3": { "$cond": [
{ "$eq": [ "$_id.name", "name3" ] },
"$value",
0
]},
}},
// The $cond put "0" values in there. So clean up with $group and $sum
{ "$group": {
_id: "$_id.userId",
"name1": { "$sum": "$name1" },
"name2": { "$sum": "$name2" },
"name3": { "$sum": "$name3" }
}}
])
So while the extra steps give you the result that you want (well, with a final $project to change the _id to userId), to my mind the short version is workable enough unless you really do need the other form. Consider the output from the short version as well:
{
"_id" : ObjectId("53245016ea402b31d77b0372"),
"values" : [
{
"name" : "name3",
"value" : 2
},
{
"name" : "name2",
"value" : 0
},
{
"name" : "name1",
"value" : 150
}
]
}
So that would be what I would use, personally. But your choice.
Not sure if I got your question, but if the name field can contain only "name1", "name2", "name3", or at least you are only interested in these values, one possible query could be this one:
db.aggTest.aggregate([
{$unwind:"$parameter"},
{$project: {"userId":1, "parameter.name":1,
"name1" : {"$cond": [{$eq : ["$parameter.name", "name1"]}, "$parameter.value", 0]},
"name2" : {"$cond": [{$eq : ["$parameter.name", "name2"]}, "$parameter.value", 0]},
"name3" : {"$cond": [{$eq : ["$parameter.name", "name3"]}, "$parameter.value", 0]}}},
{$group : {_id : {userId:"$userId"},
name1 : {$sum:"$name1"},
name2 : {$sum:"$name2"},
name3 : {$sum:"$name3"}}}
])
It first unwinds the parameter array, then separates the name1, name2 and name3 values into different columns using a simple conditional expression. After that we can easily aggregate over the new columns.
Hope it helps!
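Here is a rough plain-JavaScript sketch of the $cond trick, run on made-up in-memory data rather than a real collection. The user and item ids, the values, and the pivot helper are all hypothetical; only the name1/name2/name3 layout comes from the question.

```javascript
// Plain-JavaScript sketch of pivoting parameter names into columns.
// All ids and values below are hypothetical sample data.
const docs = [
  { userId: "user1", itemId: "itemA", parameter: [
    { name: "name1", value: 150 }, { name: "name2", value: 0 }, { name: "name3", value: 2 } ] },
  { userId: "user1", itemId: "itemB", parameter: [
    { name: "name1", value: 50 }, { name: "name2", value: 1 }, { name: "name3", value: 3 } ] },
  { userId: "user2", itemId: "itemA", parameter: [
    { name: "name1", value: 7 }, { name: "name2", value: 0 }, { name: "name3", value: 0 } ] },
];

// The column names have to be listed explicitly, just as in the $project stage.
const names = ["name1", "name2", "name3"];

function pivot(documents) {
  const totals = new Map();
  for (const d of documents) {
    for (const p of d.parameter) { // $unwind
      const row =
        totals.get(d.userId) || { userId: d.userId, name1: 0, name2: 0, name3: 0 };
      // $cond: add the value to the matching column and 0 to the others,
      // then $sum them per userId.
      for (const n of names) row[n] += p.name === n ? p.value : 0;
      totals.set(d.userId, row);
    }
  }
  return [...totals.values()];
}

const perUser = pivot(docs);
console.log(JSON.stringify(perUser));
```

The itemId never enters the grouping key, which is what makes the totals per user rather than per item.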

Mongodb Aggregation grouping with leave the field

After applying the aggregation
db.grades.aggregate([
{$match: {'type': 'homework'}},
{$sort: {'student_id':1, 'score':1}}
])
got the result:
{
"result" : [
{
"_id" : ObjectId("50906d7fa3c412bb040eb579"),
"student_id" : 0,
"type" : "homework",
"score" : 14.8504576811645
},
{
"_id" : ObjectId("50906d7fa3c412bb040eb57a"),
"student_id" : 0,
"type" : "homework",
"score" : 63.98402553675503
},
...
How can I modify the request to keep only the document with the minimum score for each student, with the result retaining the _id field? For example, like this:
{
"_id" : ObjectId("50906d7fa3c412bb040eb579"),
"score" : 14.8504576811645
}
Thanks.
Is this a homework question from the education site? I can't remember, but this is fairly trivial.
db.grades.aggregate([
{ "$match": { type: 'homework' } },
{ "$sort": {student_id: 1, score: 1} },
{ "$group": {
"_id": "$student_id",
"doc": { "$first": "$_id"},
"score": { "$first": "$score"}
}},
{ "$sort": { "_id": 1 } },
{ "$project": {
"_id": "$doc",
"score": 1
}}
])
All this does is use $first to get the first result when grouping by student_id. By first it means exactly that, so this is only useful after sorting and is different from $min which would take the smallest value from the grouped results.
So if you got part of the way there, not only do you keep the first score, but you also do the same operation on the _id value as well.
The additional sort is only there so the results don't trip you up, because they are likely to appear in the reverse order of student_id. Finally there is just a small use of $project to get the document form that you want.
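To make the "sort, then take $first" idea concrete, here is a plain-JavaScript sketch on a handful of made-up grade documents (no MongoDB needed). The ids and scores are hypothetical, and lowestHomework is just an illustrative name.

```javascript
// Plain-JavaScript sketch of $match + $sort + $group with $first.
// The documents below are hypothetical sample data.
const grades = [
  { _id: "g1", student_id: 0, type: "homework", score: 63.98 },
  { _id: "g2", student_id: 0, type: "homework", score: 14.85 },
  { _id: "g3", student_id: 1, type: "homework", score: 44.51 },
  { _id: "g4", student_id: 1, type: "homework", score: 21.33 },
  { _id: "g5", student_id: 1, type: "exam", score: 5.0 },
];

function lowestHomework(docs) {
  // $match on type, then $sort by student_id and score (both ascending).
  const sorted = docs
    .filter((d) => d.type === "homework")
    .sort((a, b) => a.student_id - b.student_id || a.score - b.score);
  // $group by student_id with $first: keep only the first document seen per
  // student, which after the sort is the one with the lowest score.
  const first = new Map();
  for (const d of sorted) {
    if (!first.has(d.student_id)) first.set(d.student_id, { _id: d._id, score: d.score });
  }
  // Map iteration follows insertion order, so results are already by student_id.
  return [...first.values()];
}

const lowest = lowestHomework(grades);
console.log(JSON.stringify(lowest));
```

Taking the first document after sorting keeps the _id and the score from the same document, which taking a minimum over scores alone would not give you.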