MongoDB: count the occurrences of array elements with MapReduce - mongodb

Say every document in a collection has a string array. How can I count how many times each element of the array occurs across the whole collection? Right now I can find all the distinct elements, but the MapReduce function is a little tricky and I haven't fully understood it.
Doc A
{
_id:
name:
actors: ["a", "b", "c"]
}
Doc B
{
_id:
name:
actors: ["a", "d"]
}
Doc C
{
_id:
name:
actors: ["a", "c", "f"]
}
I want to get a statistical result like a:3 b:1 c:2 d:1 f:1.

An alternative route that you could take is the aggregation framework. Consider the above collection as an example.
Populate test collection:
db.collection.insert([
{ "_id" : 1, "name" : "ABC1", "actors": ["a", "b", "c"] },
{ "_id" : 2, "name" : "ABC2", "actors" : ["a", "d"] },
{ "_id" : 3, "name" : "XYZ1", "actors" : ["a", "c", "f"] }
])
Using MongoDB 3.4.4 or newer:
db.collection.aggregate([
{ "$unwind" : "$actors" },
{ "$group": { "_id": "$actors", "count": { "$sum": 1} } },
{ "$group": {
"_id": null,
"counts": {
"$push": {
"k": "$_id",
"v": "$count"
}
}
} },
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$counts" }
} }
])
Output
{
a: 3,
b: 1,
c: 2,
d: 1,
f: 1
}
Using MongoDB 3.2 and below:
The following aggregation pipeline uses the $unwind stage to output a document for each element in the actors array, and the $group stage to group those documents by the value in the actors array. It then counts the number of documents in each group (which gives the number of occurrences of each array element) by way of the $sum operator:
db.collection.aggregate([
{ "$unwind" : "$actors" },
{ "$group": { "_id": "$actors", "count": { "$sum": 1} } }
])
The operation returns the following results, which are a close match to your expectations but won't give you the counts as key/value pairs:
/* 0 */
{
"result" : [
{
"_id" : "f",
"count" : 1
},
{
"_id" : "d",
"count" : 1
},
{
"_id" : "c",
"count" : 2
},
{
"_id" : "b",
"count" : 1
},
{
"_id" : "a",
"count" : 3
}
],
"ok" : 1
}
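Since the question asks about MapReduce specifically, here is a minimal sketch of what the map and reduce functions could look like. It is plain JavaScript that emulates the two phases in memory, so it runs without a server. The names mapFn, reduceFn, emitted and actorCounts, and the local emit helper, are assumptions made for this illustration and are not part of the answer above.

```javascript
// In-memory emulation of the map and reduce phases (no MongoDB needed).
// The documents mirror the test collection above.
const docs = [
  { _id: 1, name: "ABC1", actors: ["a", "b", "c"] },
  { _id: 2, name: "ABC2", actors: ["a", "d"] },
  { _id: 3, name: "XYZ1", actors: ["a", "c", "f"] }
];

// Stand-in for the emit() that the server provides to map functions (assumption).
const emitted = new Map(); // key -> array of emitted values
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Map: called once per document with `this` bound to the document.
function mapFn() {
  this.actors.forEach(function (actor) {
    emit(actor, 1);
  });
}

// Reduce: called per key with the values emitted for that key.
// Summing keeps the result correct even if reduce runs on partial results.
function reduceFn(key, values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0);
}

docs.forEach(function (doc) { mapFn.call(doc); });

const actorCounts = {};
emitted.forEach(function (values, key) {
  actorCounts[key] = reduceFn(key, values);
});

console.log(actorCounts); // { a: 3, b: 1, c: 2, d: 1, f: 1 }
```

In the mongo shell, emit is provided for you, so the same two functions could be passed as db.collection.mapReduce(mapFn, reduceFn, { out: { inline: 1 } }). Keep in mind that mapReduce is deprecated as of MongoDB 5.0, so the aggregation pipeline is usually the better route.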

Related

Find percent in mongo

I have a collection with 2 docs like below.
{
_id:1,
Score: 30,
Class:A,
School:X
}
{
Score:40,
Class:A,
School:Y
}
I need help writing a query to find each score as a share of the total, like below:
{
School:X,
Percent:30/70
}
{
School:Y
Percent:40/70
}
This input:
var r =
[
{"school":"X", "class":"A", "score": 30}
,{"school":"Y", "class":"A", "score": 40}
,{"school":"Z", "class":"A", "score": 20}
,{"school":"Y", "class":"B", "score": 50}
,{"school":"Z", "class":"B", "score": 17}
];
run through this pipeline:
db.foo.aggregate([
// Use $group to gather up the class and save the inputs via $push
{$group: {_id: "$class", tot: {$sum: "$score"}, items: {$push: {score:"$score",school:"$school"}}} }
// Now we have total by class, so just "reproject" that array and do some nice
// formatting as requested:
,{$project: {
items: {$map: { // overwrite input array $items; this is OK
input: "$items",
as: "z",
in: {
school: "$$z.school",
pct: {$concat: [ {$toString: "$$z.score"}, "/", {$toString:"$tot"} ]}
}
}}
}}
]);
produces this output, where _id is the Class:
{
"_id" : "A",
"items" : [
{"school" : "X", "pct" : "30/90"},
{"school" : "Y", "pct" : "40/90"},
{"school" : "Z", "pct" : "20/90"}
]
}
{
"_id" : "B",
"items" : [
{"school" : "Y", "pct" : "50/67"},
{"school" : "Z", "pct" : "17/67"}
]
}
From here you can $unwind if you wish.
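To double-check the expected output, here is a rough sketch that emulates the two stages ($group, then $project with $map) in plain JavaScript on the same input. The variable names (scores, byClass, pctByClass) and the extra numeric ratio field are assumptions added for illustration; the pipeline above only produces the "score/total" string.

```javascript
// Same input as the `r` array above.
const scores = [
  { school: "X", class: "A", score: 30 },
  { school: "Y", class: "A", score: 40 },
  { school: "Z", class: "A", score: 20 },
  { school: "Y", class: "B", score: 50 },
  { school: "Z", class: "B", score: 17 }
];

// Emulates $group: total per class, with the inputs saved in `items`.
const byClass = new Map();
for (const { school, class: cls, score } of scores) {
  if (!byClass.has(cls)) byClass.set(cls, { _id: cls, tot: 0, items: [] });
  const group = byClass.get(cls);
  group.tot += score;
  group.items.push({ score, school });
}

// Emulates $project + $map: format each item against its class total.
const pctByClass = [...byClass.values()].map((group) => ({
  _id: group._id,
  items: group.items.map((z) => ({
    school: z.school,
    pct: `${z.score}/${group.tot}`,
    ratio: z.score / group.tot // numeric share, not in the original pipeline
  }))
}));

console.log(JSON.stringify(pctByClass, null, 2));
```

If a numeric percentage is wanted inside the pipeline itself instead of a string, $divide could take the place of $concat in the $map expression.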

MongoDB aggregation and projection

Can someone help me with a query that sorts an array by date in ascending order and also displays the cCode? I am able to sort the array and project it, but I am unable to project the cCode along with the bal array:
db.collection.aggregate([
{ "$match": {
"_id": {
"$eq": {
"a": "NA",
"b": "HXYZ",
"c": "12345",
"d": "AA"
}
}
}},
{ "$unwind": "$bal" },
{ "$sort": { "bal.date": 1 }},
{ "$group": {"_id": "$_id",
"bal": {"$push": "$bal"}}},
{ $project: {
bal: { $slice: ["$bal",2]} ,"cCode": 1}
}
])
My collection:
/* 1 */
{
"_id" : {
"a" : "NA",
"b" : "HXYZ",
"c" : "12345",
"d" : "AA"
},
"cCode" : "HHH",
"bal" : [
{
"type" : "E",
"date" : "2015-08-02"
},
{
"type" : "E",
"date" : "2015-08-01"
},
{
"type" : "E",
"date" : "2015-07-07"
}
]
}
Please help me find the problem with the above query. Thanks in advance.
Your cCode field vanishes at the $group stage, because $group only outputs _id and the accumulator fields you define. To get that field back in the pipeline, you need to use the $first accumulator. Something like this:
db.collection.aggregate([
{ "$match": {
"_id": { "$eq": { "a": "NA", "b": "HXYZ", "c": "12345", "d": "AA" }}
}},
{ "$unwind": "$bal" },
{ "$sort": { "bal.date": 1 }},
{ "$group": {
"_id": "$_id",
"bal": { "$push": "$bal" },
"cCode": { "$first": "$cCode" }
}},
{ "$project": { "bal": { "$slice": ["$bal", 2] } ,"cCode": 1 }}
])
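To make the effect of each stage concrete, here is a small in-memory sketch of the same steps in plain JavaScript, run against the sample document. It assumes $match selects exactly one document, so the grouping step is simplified to a single group; the variable names are made up for illustration.

```javascript
// Sample document from the question.
const doc = {
  _id: { a: "NA", b: "HXYZ", c: "12345", d: "AA" },
  cCode: "HHH",
  bal: [
    { type: "E", date: "2015-08-02" },
    { type: "E", date: "2015-08-01" },
    { type: "E", date: "2015-07-07" }
  ]
};

// $unwind: one document per bal entry; every copy still carries cCode.
const unwound = doc.bal.map((b) => ({ _id: doc._id, cCode: doc.cCode, bal: b }));

// $sort on bal.date ascending (plain string comparison, as the dates are strings).
unwound.sort((x, y) => (x.bal.date < y.bal.date ? -1 : x.bal.date > y.bal.date ? 1 : 0));

// $group: only _id and the accumulators survive, so cCode must be listed.
const grouped = {
  _id: unwound[0]._id,
  bal: unwound.map((u) => u.bal), // like $push
  cCode: unwound[0].cCode         // like $first
};

// $project with $slice: keep the first two entries and cCode.
const projected = { _id: grouped._id, bal: grouped.bal.slice(0, 2), cCode: grouped.cCode };

console.log(JSON.stringify(projected, null, 2));
```

Leaving the cCode line out of grouped mirrors the original problem: the later projection would then have nothing to show for that field.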

MongoDB sorting by date as type String

Can someone help me with the query for sorting an array by date in ascending order?
I have tried the query below, but the sorting does not happen as expected:
db.getCollection(xyz).aggregate([{
$match: {
"_id":{$in:[{"a" : "NA","b" : "HXYZ","c" : "12345","d" : "AA"}]}
}
},{
$sort: {'bal.date': 1}
},
{ $project: {
balances: { $slice: ["$bal",2]}
}
}
])
My collection:
/* 1 */
{
"_id" : {
"a" : "NA",
"b" : "HXYZ",
"c" : "12345",
"d" : "AA"
},
"bal" : [
{
"type" : "E",
"date" : "2015-08-02"
},
{
"type" : "E",
"date" : "2015-08-01"
},
{
"type" : "E",
"date" : "2015-07-07"
}
]
}
Please help me find the problem with the above query.
Thanks in advance.
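You are mixing up the $match and $sort stages: $sort orders whole documents, not the elements inside an array, so the array has to be unwound first.
The correct syntax for the aggregation pipeline stages is: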
db.collection.aggregate([
{ "$match": {
"_id": {
"$eq": {
"a": "NA",
"b": "HXYZ",
"c": "12345",
"d": "AA"
}
}
}},
{ "$unwind": "$bal" },
{ "$sort": { "bal.date": 1 }},
{ "$group": {
"_id": "$_id",
"bal": {
"$push": "$bal"
}
}}
])
From the looks of it, you're saving the date as a String, so sort() compares the values as strings. Zero-padded YYYY-MM-DD strings like the ones shown happen to sort in chronological order, but any other string format will not give you the order you're expecting.
You can either run a script that converts the bal.date field to a Date, after which sort() will work as expected, or you can do the conversion and sorting server side.
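As a rough sketch of the client-side conversion, the snippet below turns the date strings into Date objects and sorts on them. It is plain JavaScript working on an in-memory copy of the bal array; the DD/MM/YYYY values are a hypothetical format included only to show where string sorting goes wrong, and all variable names are assumptions.

```javascript
const bal = [
  { type: "E", date: "2015-08-02" },
  { type: "E", date: "2015-08-01" },
  { type: "E", date: "2015-07-07" }
];

// Hypothetical day-first strings for the same three days.
const dayFirst = ["02/08/2015", "01/08/2015", "07/07/2015"];
const dayFirstSorted = dayFirst.slice().sort();
// 7 July ends up last here, although it is the earliest date.
console.log(dayFirstSorted); // [ '01/08/2015', '02/08/2015', '07/07/2015' ]

// Convert each string to a Date (midnight UTC) and sort numerically.
const converted = bal.map((b) => ({
  type: b.type,
  date: new Date(b.date + "T00:00:00Z")
}));
converted.sort((x, y) => x.date.getTime() - y.date.getTime());

const sortedDays = converted.map((b) => b.date.toISOString().slice(0, 10));
console.log(sortedDays); // [ '2015-07-07', '2015-08-01', '2015-08-02' ]
```

To make the change permanent, the converted array would have to be written back to each document with an update; that step is left out here because it depends on how the collection is accessed.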

Sum of All Messages Sent

Mongodb Version 2.6.9
I'm attempting to count the total number of times a company has been involved in a messaging interaction. I'm able to get one side of the interaction using the aggregate $group, but I've come up empty on looking at the two fields together and aggregating them for each unique company ID.
The sender_id and receiver_id refer to the same set of company IDs.
{ "_id" : a, "sender_id" : 1, "receiver_id" : 2, payload: "data" }
{ "_id" : b, "sender_id" : 2, "receiver_id" : 5, payload: "data" }
{ "_id" : c, "sender_id" : 2, "receiver_id" : 4, payload: "data" }
{ "_id" : d, "sender_id" : 3, "receiver_id" : 2, payload: "data" }
{ "_id" : e, "sender_id" : 4, "receiver_id" : 1, payload: "data" }
Using the above data structure, I am attempting to produce a result set similar to
{ "_id" : 1, count: 2}
{ "_id" : 2, count: 4}
{ "_id" : 3, count: 1}
{ "_id" : 4, count: 2}
{ "_id" : 5, count: 1}
where for example Company 2 was involved in messages a, b, c, d.
Your options are limited in the 2.6 pipeline. You can try the pipeline below:
$group with $push to create single-value arrays for both sender_id and receiver_id.
$project with $setUnion to merge the ids into a single array.
$unwind and $group to count the occurrences.
db.collection.aggregate({
"$group": {
"_id": "$_id",
"sender_id": {
"$push": "$sender_id"
},
"receiver_id": {
"$push": "$receiver_id"
}
}
}, {
"$project": {
"id": {
"$setUnion": ["$sender_id", "$receiver_id"]
}
}
}, {
"$unwind": "$id"
}, {
"$group": {
"_id": "$id",
"count": {
"$sum": 1
}
}
})
On newer versions (MongoDB 3.2 or later) you can use the pipeline below, which uses [] to create the array directly in $project.
db.collection.aggregate({
$project: {
id: ["$sender_id", "$receiver_id"]
}
}, {
$unwind: "$id"
}, {
$group: {
_id: "$id",
count: {
$sum: 1
}
}
})
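To sanity-check the expected counts, here is a minimal in-memory emulation of the merge, unwind and count steps in plain JavaScript. The _id values are written as strings because the question shows them as bare placeholders, and the names messages, counts and companyCounts are assumptions for this sketch.

```javascript
// Sample messages from the question.
const messages = [
  { _id: "a", sender_id: 1, receiver_id: 2, payload: "data" },
  { _id: "b", sender_id: 2, receiver_id: 5, payload: "data" },
  { _id: "c", sender_id: 2, receiver_id: 4, payload: "data" },
  { _id: "d", sender_id: 3, receiver_id: 2, payload: "data" },
  { _id: "e", sender_id: 4, receiver_id: 1, payload: "data" }
];

const counts = new Map(); // company id -> number of messages it took part in
for (const message of messages) {
  // A Set mirrors $setUnion: a company messaging itself is counted once.
  const ids = new Set([message.sender_id, message.receiver_id]);
  // Looping over the ids plays the role of $unwind followed by $group/$sum.
  for (const id of ids) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }
}

const companyCounts = [...counts.entries()]
  .sort((x, y) => x[0] - y[0])
  .map(([id, count]) => ({ _id: id, count }));

console.log(companyCounts);
```

One difference between the two pipelines is worth keeping in mind: $setUnion removes duplicates, while the [] form keeps both entries, so a message whose sender and receiver are the same company would be counted twice with the second pipeline.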

MongoDB aggregate nested array correctly

OK I am very new to Mongo, and I am already stuck.
Db has the following structure (much simplified for sure):
{
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"score" : "2",
"value" : "AA",
},
{
"score" : "2",
"value" : "AA",
},
{
"score" : "4",
"value" : "BB",
},
{
"score" : "3",
"value" : "CC",
}
]
},
{
"_id" : ObjectId("57fdfbc12dc30a46507044ef"),
"keyterms" : [
...
There are some objects. Each object has an array "keyterms". Each entry of this array has a score and a value. There are some duplicates, though (not really, since in the real db the entries have many more fields, but with respect to value and score they are duplicates).
Now I need a query, which
selects one object by id
groups its keyterms by value
and counts the duplicates
sorts them by score
So I want to have something like that as result
// for Object 57fdfbc12dc30a46507044ec
"keyterms": [
{
"score" : "4",
"value" : "BB",
"count" : 1
},
{
"score" : "3",
"value" : "CC",
"count" : 1
},
{
"score" : "2",
"value" : "AA",
"count" : 2
}
]
In SQL I would have written something like this
select
score, value, count(*) as count
from
all_keywords_table_or_some_join
group by
value
order by
score
But, sadly enough, it's not SQL.
In Mongo I managed to write this:
db.getCollection('tests').aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$_id",
'keyterms': {$push: "$keyterms"}
}},
{$project: {
'keyterms.score': 1,
'keyterms.value': 1
}}
])
But there is something missing: the grouping of the keywords by their value. I can't get rid of the feeling that this is the wrong approach altogether. How can I select the keywords array and continue with that, using an aggregate function only on it? That would be easy.
BTW I read this
(Mongo aggregate nested array)
but I can't figure it out for my example unfortunately...
You'd want an aggregation pipeline where, after you $unwind the array, you group the flattened documents by the array's value and score keys, aggregate the counts using the $sum accumulator operator, and retain the main document's _id with the $first operator.
After sorting by score, a final $group stage should then group the documents from the previous stage by the document's _id so as to preserve the original schema, recreating the keyterms array using the $push operator.
The following demonstration shows the above aggregation operation:
db.tests.aggregate([
{ "$match": { "_id": ObjectId("57fdfbc12dc30a46507044ec") } },
{ "$unwind": "$keyterms" },
{
"$group": {
"_id": {
"value": "$keyterms.value",
"score": "$keyterms.score"
},
"doc_id": { "$first": "$_id" },
"count": { "$sum": 1 }
}
},
{ "$sort": {"_id.score": -1 } },
{
"$group": {
"_id": "$doc_id",
"keyterms": {
"$push": {
"value": "$_id.value",
"score": "$_id.score",
"count": "$count"
}
}
}
}
])
Sample Output
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"value" : "BB",
"score" : "4",
"count" : 1
},
{
"value" : "CC",
"score" : "3",
"count" : 1
},
{
"value" : "AA",
"score" : "2",
"count" : 2
}
]
}
Meanwhile, I solved it myself:
db.getCollection('tests').aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$keyterms.value",
'keyterms': {$push: "$keyterms"},
'escore': {$first: "$keyterms.score"},
'evalue': {$first: "$keyterms.value"}
}},
{$limit: 15},
{$project: {
"score": "$escore",
"value": "$evalue",
"count": {$size: "$keyterms"}
}}
])
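For comparison, the same group-count-sort logic can be sketched in plain JavaScript on the sample keyterms. This is only an in-memory illustration with made-up variable names. It groups by value and score together, and converts the score to a number before sorting, which is an assumption about what is wanted: the scores in the sample are stored as strings, and as strings "10" would sort before "2".

```javascript
// keyterms of object 57fdfbc12dc30a46507044ec from the question.
const keyterms = [
  { score: "2", value: "AA" },
  { score: "2", value: "AA" },
  { score: "4", value: "BB" },
  { score: "3", value: "CC" }
];

// Group by (value, score) and count the duplicates.
const groups = new Map();
for (const term of keyterms) {
  const key = JSON.stringify([term.value, term.score]);
  if (!groups.has(key)) {
    groups.set(key, { score: term.score, value: term.value, count: 0 });
  }
  groups.get(key).count += 1;
}

// Sort by score, highest first, comparing numerically rather than as strings.
const groupedTerms = [...groups.values()].sort(
  (x, y) => Number(y.score) - Number(x.score)
);

console.log(groupedTerms);
```

With the sample data this yields BB (count 1), CC (count 1) and AA (count 2), in that order, which matches the result asked for in the question.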