MongoDB count documents for each array element

I have documents like these:
{
code : "X1",
elements : ["A", "B", "C", "D"]
},
{
code : "X2",
elements : ["C", "D"]
},
{
code : "X3",
elements : ["A"]
}
...
I would like to know the number of documents present for each type of value in the "elements" array.
For example:
"A" : 2
"B" : 1
"C" : 2
"D" : 2
Is it possible with a single query?

You can $unwind your array to get a single document per element and then run $group to count the elements:
db.collection.aggregate([
{
$unwind: "$elements"
},
{
$group: {
_id: "$elements",
count: { $sum: 1 }
}
}
])
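To make the two stages concrete, here is a rough plain-JavaScript sketch of what $unwind followed by $group computes on the sample documents. It runs without MongoDB; the function name countByElement is made up for illustration and is not part of any MongoDB API.

```javascript
// Plain-JavaScript sketch of what $unwind + $group compute (no MongoDB needed).
function countByElement(docs) {
  const counts = {};
  for (const doc of docs) {
    // $unwind: visit each array element separately (documents without the array are skipped)
    for (const el of doc.elements ?? []) {
      // $group with { $sum: 1 }: one increment per unwound element
      counts[el] = (counts[el] ?? 0) + 1;
    }
  }
  return counts;
}

const docs = [
  { code: "X1", elements: ["A", "B", "C", "D"] },
  { code: "X2", elements: ["C", "D"] },
  { code: "X3", elements: ["A"] },
];

console.log(countByElement(docs)); // { A: 2, B: 1, C: 2, D: 2 }
```

Note that this counts occurrences, so it equals the number of documents only if no array contains the same value twice. The same holds for the pipeline.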
EDIT: you can add a second $group stage, followed by $replaceRoot with $arrayToObject, to return the ids as keys and the counts as values:
db.collection.aggregate([
{
$unwind: "$elements"
},
{
$group: {
_id: "$elements",
count: { $sum: 1 }
}
},
{
$group: {
_id: null,
counts: { $push: { k: "$_id", v: "$count" } }
}
},
{
$replaceRoot: {
newRoot: { $arrayToObject: "$counts" }
}
}
])
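If the last two stages look opaque, this small plain-JavaScript sketch shows the shape change they perform. The grouped object below is a hand-written stand-in for the document the second $group would emit for the sample data, and Object.fromEntries is only an analogue of $arrayToObject, not the operator itself.

```javascript
// Stand-in for the single document produced by the second $group stage.
const grouped = {
  _id: null,
  counts: [
    { k: "A", v: 2 },
    { k: "B", v: 1 },
    { k: "C", v: 2 },
    { k: "D", v: 2 },
  ],
};

// $arrayToObject turns { k, v } pairs into one object;
// Object.fromEntries does the same for [key, value] pairs in plain JS.
const asObject = Object.fromEntries(grouped.counts.map(({ k, v }) => [k, v]));

console.log(asObject); // { A: 2, B: 1, C: 2, D: 2 }
```

In the real pipeline the order of the pairs is not guaranteed unless you add a $sort before the second $group.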

Related

MongoDB group by values and count the number of occurrences

I am trying to count how many times a particular value occurs in a collection.
{
_id:1,
field1: value,
field2: A,
}
{
_id:2,
field1: value,
field2: A,
}
{
_id:3,
field1: value,
field2: C,
}
{
_id:4,
field1: value,
field2: B,
}
What I want is to count how many times A, B and C each occur and return the counts.
The output I want:
{
A: 2,
B: 1,
C: 1,
}
You can use $facet in an aggregation pipeline like this:
$facet creates three sub-pipelines ("three ways"), each of which filters the documents by the desired value (A, B or C).
Then a $project stage takes the $size of each array of matched documents.
db.collection.aggregate([
{
"$facet": {
"first": [
{
"$match": {
"field2": "A"
}
}
],
"second": [
{
"$match": {
"field2": "B"
}
}
],
"third": [
{
"$match": {
"field2": "C"
}
}
]
}
},
{
"$project": {
"A": {
"$size": "$first"
},
"B": {
"$size": "$second"
},
"C": {
"$size": "$third"
}
}
}
])
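Since the pipeline above hard-codes one facet per value, you may want to generate it. The following is a minimal sketch, assuming every value is a string that is also usable as a field name (no "." and no leading "$"), because it doubles as the facet name. buildFacetCountPipeline is a hypothetical helper, not a MongoDB function.

```javascript
// Hypothetical helper: builds the $facet + $project pipeline for any list of values.
function buildFacetCountPipeline(field, values) {
  const facets = {};
  const sizes = {};
  for (const value of values) {
    // One sub-pipeline per value, filtering on the given field.
    facets[value] = [{ $match: { [field]: value } }];
    // The count is the size of the array that sub-pipeline returns.
    sizes[value] = { $size: `$${value}` };
  }
  return [{ $facet: facets }, { $project: sizes }];
}

const facetPipeline = buildFacetCountPipeline("field2", ["A", "B", "C"]);
console.log(JSON.stringify(facetPipeline, null, 2));
// Pass it on as: db.collection.aggregate(facetPipeline)
```

Keep in mind that each facet collects the full matching documents into an array before $size is applied, so this approach suits small result sets better than large ones.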
This is a typical use case for the $group stage of the aggregation pipeline. You can do it like this:
$group - to group all the documents by field2
$sum - to count the number of documents for each value of field2
db.collection.aggregate([
{
"$group": {
"_id": "$field2",
"count": {
"$sum": 1
}
}
}
])
Use the $arrayToObject operator and a final $replaceWith stage (available from MongoDB 4.2) to get the desired result. You would need to run the following aggregation pipeline:
db.collection.aggregate([
{ $group: {
_id: { $toUpper: '$field2' },
count: { $sum: 1 }
} },
{ $group: {
_id: null,
counts: {
$push: { k: '$_id', v: '$count' }
}
} },
{ $replaceWith: { $arrayToObject: '$counts' } }
])

MongoDB aggregation filter based on max value

Suppose I have a document structure where one of the fields, X, is an array of objects as shown below.
"X" : [
{
"A" : "abc",
"B" : 123
},
{
"A" : "wer",
"B" : 124
},
{
"A" : "fgh",
"B" : 124
}
]
How can I project only the object where field B has the highest value? If the maximum value is shared by several objects, I just want to return one of them (it does not matter which one). In this case the result could look like:
"X" : [
{
"A" : "wer",
"B" : 124
}
]
What about this one:
db.collection.aggregate([
{
$set: {
X: {
$filter: {
input: "$X",
cond: { $eq: ["$$this.B", { $max: "$X.B" }] }
}
}
}
},
{ $set: { X: { $arrayElemAt: ["$X", 0] } } }
])
You can use $reduce:
db.collection.aggregate([
{
"$project": {
"X": {
$reduce: {
input: "$X",
initialValue: {},
in: {
$cond: [ { "$gt": [ "$$this.B", "$$value.B" ]}, // Condition Check
"$$this", // If condition true ($$this - Current Object)
"$$value" // If condition false $$value - Previous Returned Object
]
}
}
}
}
}
])
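For readers more familiar with Array.prototype.reduce, here is a rough plain-JavaScript analogue of that $reduce expression. One difference is deliberate: the seed is null rather than {}, because in JavaScript a comparison such as 123 > undefined is false, whereas (as far as I know) MongoDB sorts a missing field below any number. maxByB is just an illustrative name.

```javascript
// Plain-JavaScript analogue of the $reduce stage: keep the element with the largest B.
function maxByB(items) {
  return items.reduce(
    // A strict ">" keeps the first element seen when several share the maximum.
    (best, current) => (best === null || current.B > best.B ? current : best),
    null // seed; an empty array therefore yields null
  );
}

const X = [
  { A: "abc", B: 123 },
  { A: "wer", B: 124 },
  { A: "fgh", B: 124 },
];

console.log(maxByB(X)); // { A: 'wer', B: 124 }
```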
Updated answer:
Another option that results in the full object being returned at the end:
[
{$unwind: {
path: "$X"
}},
{$sort: {
"X.B": -1
}},
{$group: {
_id: { _id: "$_id"},
X: {
$first: "$X"
}
}}]
Original answer:
You can use the $max operator (https://docs.mongodb.com/manual/reference/operator/aggregation/max/).
[{$project: {
X: {$max: "$X.B"}
}}]

MongoDB count number of non-missing fields

I'm using the following code to calculate average and standard deviation of a field named "b" in my collection.
db.ctg.aggregate(
[
{
$group:
{
_id: "b",
avg: { $avg: "$b" },
stdev: { $stdDevPop: "$b" }
}
}
]
)
The result is:
{ "_id" : "b", "avg" : 878.4397930385701, "stdev" : 893.8744489449962 }
I need to add the number of non-missing values of "b" to my result so it looks like this:
{ "_id" : "b", "avg" : 878.4397930385701, "stdev" : 893.8744489449962, "nonmissing": 2126 }
How can I do this in the query above?
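The results of $avg and $stdDevPop do not change when you remove the documents where b does not exist, because both accumulators ignore documents where the field is missing or non-numeric. So you can filter those documents out first and count what remains. Note that { $exists: true } also matches documents where b is present but null or non-numeric, so the count equals the number of values used by $avg only if every existing b is numeric.
Query: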
db.ctg.aggregate([
{ $match: { b: { $exists: true } } },
{
$group:
{
_id: "b",
avg: { $avg: "$b" },
stdev: { $stdDevPop: "$b" },
nonMissing: { $sum: 1 }
}
}
])
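To check the numbers by hand, here is a small plain-JavaScript sketch of the same three statistics. It assumes "non-missing" means "holds a numeric value", which is what $avg and $stdDevPop actually use; the function name summarize and the sample data are made up.

```javascript
// Computes avg, population standard deviation and the count of numeric values.
function summarize(docs, field) {
  // Keep only numeric values, mirroring how $avg and $stdDevPop skip everything else.
  const values = docs.map((doc) => doc[field]).filter((v) => typeof v === "number");

  const n = values.length;
  if (n === 0) return { avg: null, stdev: null, nonMissing: 0 };

  const avg = values.reduce((sum, v) => sum + v, 0) / n;
  // Population standard deviation: divide by n, not n - 1.
  const variance = values.reduce((sum, v) => sum + (v - avg) ** 2, 0) / n;
  return { avg, stdev: Math.sqrt(variance), nonMissing: n };
}

const sample = [
  { b: 2 }, { b: 4 }, { b: 4 }, { b: 4 },
  { b: 5 }, { b: 5 }, { b: 7 }, { b: 9 },
  { a: 1 },     // b missing
  { b: null },  // b present but null
  { b: "n/a" }, // b present but non-numeric
];

console.log(summarize(sample, "b")); // { avg: 5, stdev: 2, nonMissing: 8 }
```

If your collection contains null or non-numeric values for b, matching on { b: { $type: "number" } } instead of $exists should give the same count as this sketch.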

MongoDB - aggregating with nested objects, and changeable keys

I have a document which describes counts of different things observed by a camera within a 15 minute period. It looks like this:
{
"_id" : ObjectId("5b1a709a83552d002516ac19"),
"start" : ISODate("2018-06-08T11:45:00.000Z"),
"end" : ISODate("2018-06-08T12:00:00.000Z"),
"recording" : ObjectId("5b1a654683552d002516ac16"),
"data" : {
"counts" : {
"5b434d05da1f0e00252566be" : 12,
"5b434d05da1f0e00252566cc" : 4,
"5b434d05da1f0e00252566ca" : 1
}
}
}
The keys inside the data.counts object change with each document and refer to additional data that is fetched at a later date. There can be any number of keys inside data.counts (but usually about 20).
I am trying to aggregate all these 15 minute documents up to daily aggregated documents.
I have this query at the moment to do that:
db.getCollection("segments").aggregate([
{$match:{
"recording": ObjectId("5bf7f68ad8293a00261dd83f")
}},
{$project:{
"start": 1,
"recording": 1,
"data": 1
}},
{$group:{
_id: { $dateToString: { format: "%Y-%m-%d", date: "$start" } },
"segments": { $push: "$$ROOT" }
}},
{$sort: {_id: -1}},
]);
This does the grouping and returns all the segments in an array.
I want to also aggregate the information inside data.counts, so that I get the sum of values for all keys that are the same within the daily group.
This would save me from having another service loop through each 15 minute segment summing values with the same keys. E.g. the query would return something like this:
{
"_id" : "2019-02-27",
"counts" : {
"5b434d05da1f0e00252566be" : 351,
"5b434d05da1f0e00252566cc" : 194,
"5b434d05da1f0e00252566ca" : 111
... any other keys that were found within a day
}
}
How might I amend the query I already have, or use a different query?
Thanks!
You could use the $facet pipeline stage to create two sub-pipelines: one for segments and another for counts. The outputs of these sub-pipelines can be joined by using $zip to pair them up and $map to merge each 2-element array produced by $zip. Note this will only work correctly if the sub-pipelines output sorted arrays of the same size, which is why we group and sort by start_date in each sub-pipeline.
Here's the query:
db.getCollection("segments").aggregate([{
$match: {
recording: ObjectId("5b1a654683552d002516ac16")
}
}, {
$project: {
start: 1,
recording: 1,
data: 1,
start_date: { $dateToString: { format: "%Y-%m-%d", date: "$start" }}
}
}, {
$facet: {
segments_pipeline: [{
$group: {
_id: "$start_date",
segments: {
$push: {
start: "$start",
recording: "$recording",
data: "$data"
}
}
}
}, {
$sort: {
_id: -1
}
}],
counts_pipeline: [{
$project: {
start_date: "$start_date",
count: { $objectToArray: "$data.counts" }
}
}, {
$unwind: "$count"
}, {
$group: {
_id: {
start_date: "$start_date",
count_id: "$count.k"
},
count_sum: { $sum: "$count.v" }
}
}, {
$group: {
_id: "$_id.start_date",
counts: {
$push: {
$arrayToObject: [[{
k: "$_id.count_id",
v: "$count_sum"
}]]
}
}
}
}, {
$project: {
counts: { $mergeObjects: "$counts" }
}
}, {
$sort: {
_id: -1
}
}]
}
}, {
$project: {
result: {
$map: {
input: { $zip: { inputs: ["$segments_pipeline", "$counts_pipeline"] }},
in: { $mergeObjects: "$$this" }
}
}
}
}, {
$unwind: "$result"
}, {
$replaceRoot: {
newRoot: "$result"
}
}])
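As a sanity check for the counts part, this is roughly what the per-day summing amounts to in plain JavaScript. It is a sketch that runs without MongoDB; dailyCounts and the short keys k1, k2, k3 are invented for the example and stand in for the real ObjectId-like keys.

```javascript
// Sums data.counts per UTC day, whatever keys each segment happens to contain.
function dailyCounts(segments) {
  const byDay = {};
  for (const seg of segments) {
    // Same key as $dateToString with "%Y-%m-%d" (UTC by default).
    const day = seg.start.toISOString().slice(0, 10);
    const totals = (byDay[day] = byDay[day] || {});
    // Object.entries plays the role of $objectToArray + $unwind.
    for (const [key, value] of Object.entries(seg.data?.counts ?? {})) {
      totals[key] = (totals[key] ?? 0) + value;
    }
  }
  // Newest day first, like { $sort: { _id: -1 } }.
  return Object.entries(byDay)
    .sort(([a], [b]) => (a < b ? 1 : a > b ? -1 : 0))
    .map(([day, counts]) => ({ _id: day, counts }));
}

const segments = [
  { start: new Date("2018-06-08T11:45:00.000Z"), data: { counts: { k1: 12, k2: 4 } } },
  { start: new Date("2018-06-08T12:00:00.000Z"), data: { counts: { k1: 3, k3: 1 } } },
  { start: new Date("2018-06-09T00:00:00.000Z"), data: { counts: { k2: 5 } } },
];

console.log(JSON.stringify(dailyCounts(segments)));
// [{"_id":"2018-06-09","counts":{"k2":5}},{"_id":"2018-06-08","counts":{"k1":15,"k2":4,"k3":1}}]
```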

Intersection of several arrays

I have some documents with an array property, items.
I want to get the intersection between n documents.
db.things.insert({name:"A", items:[1,2,3,4,5]})
db.things.insert({name:"B", items:[2,4,6,8]})
db.things.insert({name:"C", items:[1,2]})
db.things.insert({name:"D", items:[5,6]})
db.things.insert({name:"E", items:[9,10]})
db.things.insert({name:"F", items:[1,5]})
Data:
{ "_id" : ObjectId("57974a0d356baff265710a1c"), "name" : "A", "items" : [ 1, 2, 3, 4, 5 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1d"), "name" : "B", "items" : [ 2, 4, 6, 8 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1e"), "name" : "C", "items" : [ 1, 2 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1f"), "name" : "D", "items" : [ 5, 6 ] },
{ "_id" : ObjectId("57974a0d356baff265710a20"), "name" : "E", "items" : [ 9, 10 ] },
{ "_id" : ObjectId("57974a1a356baff265710a21"), "name" : "F", "items" : [ 1, 5 ] }
For example:
things.name.A intersect things.name.C intersect things.name.F:
[ 1, 2, 3, 4, 5 ] intersect [ 1, 2 ] intersect [ 1, 5 ]
must be: [1]
I think it's doable using $setIntersection, but I can't find the way.
I can do it with two documents, but how do I do it with more?
db.things.aggregate({$match:{"name":{$in:["A", "F"]}}},
{$group:{_id:null, "setA":{$first:"$items"}, "setF":{$last:"$items"} } },
{
"$project": {
"set1": 1,
"set2": 1,
"commonToBoth": { "$setIntersection": [ "$setA", "$setF" ] },
"_id": 0
}
}
)
{ "commonToBoth" : [ 5, 1 ] }
A solution that does not depend on the number of input documents could look like this:
db.things.aggregate(
{
$match: {
"name": {
$in: ["A", "F"]
}
}
},
{
$group: {
_id: "$items",
count: {
$sum: 1
}
}
},
{
$group: {
_id: null,
totalCount: {
$sum: "$count"
},
items: {
$push: "$_id"
}
}
},
{
$unwind: {
path: "$items"
}
},
{
$unwind: {
path: "$items"
}
},
{
$group: {
_id: "$items",
totalCount: {
$first: "$totalCount"
},
count: {
$sum: 1
}
}
},
{
$project: {
_id: 1,
presentInAllDocs: {
$eq: ["$totalCount", "$count"]
}
}
},
{
$match: {
presentInAllDocs: true
}
},
{
$group: {
_id: null,
items: {
$push: "$_id"
}
}
}
)
This outputs:
{
"_id" : null,
"items" : [
5,
1
]
}
Of course you can add a last $project stage to bring the result into the desired shape.
Explanation
The basic idea behind this is that when we count the number of documents and we count the number of occurrences of each item, then the items with a count equal to the total document count appeared in each document and are therefore in the intersection result.
This idea rests on one important assumption: your items arrays contain no duplicates (i.e. they are sets). If this assumption is wrong, you would have to insert an additional stage at the beginning of the pipeline to turn the arrays into sets.
One could also build this pipeline in a different and probably shorter way, but I tried to keep the resource usage as low as possible and therefore added stages that are possibly unnecessary from a functional point of view. For example, the second stage groups by the items array, on the assumption that there are far fewer distinct arrays than documents, so the rest of the pipeline has to work with only a fraction of the initial document count. Functionally, we just need the total count of documents, so we could skip that stage and use a single $group stage that counts all documents and pushes them into an array for later use. That, of course, is a big hit for memory consumption, as we would then hold an array of all matching documents.
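The counting idea is easy to try outside MongoDB. Here is a minimal plain-JavaScript sketch of it; intersectByCounting is a made-up name, and each array is passed through a Set first so the "no duplicates" assumption holds.

```javascript
// Intersection by counting: an item belongs to the result if it occurs in every array.
function intersectByCounting(arrays) {
  const occurrences = new Map();
  for (const arr of arrays) {
    // De-duplicate first so each array contributes at most 1 per item.
    for (const item of new Set(arr)) {
      occurrences.set(item, (occurrences.get(item) ?? 0) + 1);
    }
  }
  // Keep the items whose count equals the number of arrays.
  return [...occurrences]
    .filter(([, count]) => count === arrays.length)
    .map(([item]) => item);
}

console.log(intersectByCounting([[1, 2, 3, 4, 5], [1, 2], [1, 5]])); // [ 1 ]
console.log(intersectByCounting([[1, 2, 3, 4, 5], [1, 5]]));         // [ 1, 5 ]
```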
If you are using MongoDB 3.2 or later, you could use $arrayElemAt to specify each argument of $setIntersection:
db.things.aggregate([{
$match: {
"name": {
$in: ["A", "B", "C"]
}
}
}, {
$group: {
_id: 0,
elements: {
$push: "$items"
}
}
}, {
$project: {
intersect: {
$setIntersection: [{
"$arrayElemAt": ["$elements", 0]
}, {
"$arrayElemAt": ["$elements", 1]
}, {
"$arrayElemAt": ["$elements", 2]
}]
},
}
}]);
You would have to dynamically add the required number of objects, one per index, such as:
{
"$arrayElemAt": ["$elements", <index>]
}
The number of these objects should match the number of names in your $in list (here ["A", "B", "C"]).
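Building those objects by hand gets tedious, so here is a minimal sketch of generating the pipeline from the list of names. It assumes each name matches exactly one document; buildIntersectionPipeline is a hypothetical helper, not part of MongoDB.

```javascript
// Hypothetical helper: one { $arrayElemAt: ["$elements", i] } per requested name.
function buildIntersectionPipeline(names) {
  const sets = names.map((_, i) => ({ $arrayElemAt: ["$elements", i] }));
  return [
    { $match: { name: { $in: names } } },
    { $group: { _id: 0, elements: { $push: "$items" } } },
    { $project: { intersect: { $setIntersection: sets } } },
  ];
}

const intersectionPipeline = buildIntersectionPipeline(["A", "C", "F"]);
console.log(JSON.stringify(intersectionPipeline[2], null, 2));
// Pass it on as: db.things.aggregate(intersectionPipeline)
```

If fewer documents match than names were given, the extra $arrayElemAt expressions will not resolve to arrays, and as far as I can tell the intersection then comes back as null, so it is worth checking the match count first.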
If you want to deal with duplicates (some names are present multiple times), group all your items by name, $unwind twice, and use $addToSet to merge all the arrays for a given name before executing the previous aggregation:
db.things.aggregate([{
$match: {
"name": {
$in: ["A", "B", "C"]
}
}
}, {
$group: {
_id: "$name",
"items": {
"$push": "$items"
}
}
}, {
"$unwind": "$items"
}, {
"$unwind": "$items"
}, {
$group: {
_id: "$_id",
items: {
$addToSet: "$items"
}
}
}, {
$group: {
_id: 0,
elements: {
$push: "$items"
}
}
}, {
$project: {
intersect: {
$setIntersection: [{
"$arrayElemAt": ["$elements", 0]
}, {
"$arrayElemAt": ["$elements", 1]
}, {
"$arrayElemAt": ["$elements", 2]
}]
},
}
}]);
It isn't a clean solution, but it works.