MongoDB - Average field values over documents

I have a collection with this schema:
{
  "fields": {
    "field1": [
      {"name": "abc", "value": 2},
      {"name": "bcd", "value": 4},
      {"name": "cde", "value": 6}
    ],
    "field2": [
      {"name": "dec", "value": 3},
      {"name": "das", "value": 8},
      {"name": "pam", "value": 10}
    ]
  }
},
{
  "fields": {
    "field1": [
      {"name": "abc", "value": 7},
      {"name": "cde", "value": 12}
    ],
    "field2": [
      {"name": "dec", "value": 3},
      {"name": "das", "value": 8},
      {"name": "pam", "value": 10}
    ]
  }
}
What I'm trying to obtain is, for example, the average value of each member of 'field1', counting its value as 0 in any document where that member is missing (like 'bcd').
So in this example I should get:
{
  '_id': 'abc',
  'avg': 4.5
},
{
  '_id': 'bcd',
  'avg': 2
},
{
  '_id': 'cde',
  'avg': 9
}
I wrote this aggregation query but I'm pretty sure there is something wrong with it:
db.statuses.aggregate([
  {
    $unwind: '$fields.field1'
  },
  {
    $group: {
      _id: '$fields.field1.name',
      avg: {
        $avg: '$fields.field1.value'
      }
    }
  },
  {
    $sort: {
      avg: -1
    }
  }
])
I think I should add a step before the average calculation in which I have to build an array of all values for each name (0 if the name does not exist in a document), and then evaluate the average on these arrays. Am I right?
How could I do this?
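For reference, here is one way to get that result (a sketch, not tested against a live server, assuming every document has a fields.field1 array): averaging with missing members counted as 0 is the same as dividing each name's sum by the total number of documents, and a $facet can compute both in one pass:

```javascript
db.statuses.aggregate([
  {
    $facet: {
      // total number of documents, counted before any $unwind
      total: [{ $count: 'n' }],
      // per-name sum of values
      sums: [
        { $unwind: '$fields.field1' },
        {
          $group: {
            _id: '$fields.field1.name',
            sum: { $sum: '$fields.field1.value' }
          }
        }
      ]
    }
  },
  { $unwind: '$sums' },
  {
    // missing names contribute 0, so avg = sum / total documents
    $project: {
      _id: '$sums._id',
      avg: { $divide: ['$sums.sum', { $arrayElemAt: ['$total.n', 0] }] }
    }
  },
  { $sort: { avg: -1 } }
])
```

On the two sample documents above this gives 'abc' an avg of 4.5, 'bcd' 2, and 'cde' 9, matching the expected output.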

Related

How do I summarize tags by category in mongodb

I have a collection that is shaped like this:
[
  {
    _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4567"),
    tags: {
      language: [ 'en' ],
      industries: [ 'agency', 'travel' ],
      countries: [ 'ca', 'us' ],
      regions: [ 'north-america' ]
    }
  },
  {
    _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4568"),
    tags: {
      language: [ 'en', 'fr' ],
      industries: [ 'travel' ],
      countries: [ 'ca' ]
    }
  },
  {
    _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4569"),
    tags: {
      language: [ 'en' ],
      industries: [ 'agency', 'travel' ],
      countries: [ 'ca', 'us' ],
      regions: [ 'south-america' ]
    }
  }
]
and I would like to generate this as a result...
{
  //* count of all documents
  "count": 3,
  //* count of all documents that contain any slug within the given category
  "countWithCategorySlug": {
    "language": 3,
    "industries": 3,
    "countries": 3,
    "regions": 2
  },
  //* per category: count of documents that contain that slug in the given category
  "language": {
    "en": 3,
    "fr": 1
  },
  "industries": {
    "agency": 2,
    "travel": 3
  },
  "countries": {
    "ca": 3,
    "us": 2
  },
  "regions": {
    "north-america": 1,
    "south-america": 1
  }
}
I'm super stuck, so any help would be appreciated. :)
The number of categories is unknown. I have a code solution that queries the list of distinct categories and slugs and then generates a $group stage for each one, but the resulting query is excessively big and there must be a better way. The problem is that I have absolutely no idea how to optimize it.
Query
The first part, before the $facet, separates the tags and produces one document per value, like:
[
  {
    "type": "language",
    "value": "en",
    "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
  },
  {
    "type": "industries",
    "value": "agency",
    "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
  },
  {
    "type": "industries",
    "value": "travel",
    "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
  },
  {
    "type": "countries",
    "value": "ca",
    "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
  }
]
Then a $facet with three fields counts the documents, and after that, transformations put the data on keys to match the expected output.
Playmongo
aggregate(
[{"$set": {"tags": {"$objectToArray": "$tags"}}},
{"$set":
{"tags":
{"$map":
{"input": "$tags",
"in": {"type": "$$this.k", "value": "$$this.v", "_id": "$_id"}}}}},
{"$unwind": "$tags"},
{"$replaceRoot": {"newRoot": "$tags"}},
{"$unwind": "$value"},
{"$facet":
{"count":
[{"$group": {"_id": null, "count": {"$addToSet": "$_id"}}},
{"$set": {"count": {"$size": "$count"}}}],
"category":
[{"$group": {"_id": "$type", "count": {"$addToSet": "$_id"}}},
{"$set": {"count": {"$size": "$count"}}}],
"values":
[{"$group":
{"_id": "$value",
"type": {"$first": "$type"},
"values": {"$addToSet": "$_id"}}},
{"$set": {"values": {"$size": "$values"}}},
{"$group":
{"_id": "$type",
"values":
{"$push":
{"type": "$type", "value": "$_id", "count": "$values"}}}}]}},
{"$set":
{"count":
{"$getField":
{"field": "count", "input": {"$arrayElemAt": ["$count", 0]}}},
"category":
{"$arrayToObject":
[{"$map":
{"input": "$category",
"in": {"k": "$$this._id", "v": "$$this.count"}}}]},
"values":
{"$arrayToObject":
[{"$map":
{"input": "$values",
"in":
{"k": "$$this._id",
"v":
{"$arrayToObject":
[{"$map":
{"input": "$$this.values",
"in": {"k": "$$this.value", "v": "$$this.count"}}}]}}}}]}}}])
Outputs
[{
"count": 3,
"category": {
"countries": 3,
"industries": 3,
"regions": 2,
"language": 3
},
"values": {
"regions": {
"south-america": 1,
"north-america": 1
},
"countries": {
"us": 2,
"ca": 3
},
"language": {
"fr": 1,
"en": 3
},
"industries": {
"agency": 2,
"travel": 3
}
}
}]

Grouping multiple documents with nested array of objects in MongoDB

I have documents with this structure:
x = {
"scalar": 1,
"array": [
{"key": 1, "value": 2},
{"key": 2, "value": 3},
],
"array2": [
{"key": 1, "value": 2},
{"key": 2, "value": 3},
],
}
and
y = {
"scalar": 2,
"array": [
{"key": 1, "value": 3},
{"key": 3, "value": 0},
],
"array2": [
{"key": 1, "value": 3},
{"key": 3, "value": 0},
],
}
The end results I'm trying to find is this
{
"scalar": 3, # SUM of scalar
"array": [
{"key": 1, "value": 5}, # SUM by key = 1
{"key": 2, "value": 3},
{"key": 3, "value": 0},
],
"array2": [
{"key": 1, "value": 5}, # SUM by key = 1
{"key": 2, "value": 3},
{"key": 3, "value": 0},
],
}
I've tried using a double $unwind followed by a $push, and I'm thinking of using $reduce to get the final result.
Query
One way to do it is with $facet: you want three groupings, and $facet can break the pipeline into three separate parts so the unwinds don't mix. I think this is the simplest way to do it.
Test code here
db.collection.aggregate([
{
"$facet": {
"scalar": [
{
"$project": {
"scalar": 1
}
},
{
"$group": {
"_id": null,
"sum": {
"$sum": "$scalar"
}
}
},
{
"$unset": [
"_id"
]
}
],
"array": [
{
"$project": {
"array": 1
}
},
{
"$unwind": {
"path": "$array"
}
},
{
"$group": {
"_id": "$array.key",
"sum": {
"$sum": "$array.value"
}
}
},
{
"$project": {
"_id": 0,
"key": "$_id",
"value": "$sum"
}
}
],
"array2": [
{
"$project": {
"array2": 1
}
},
{
"$unwind": {
"path": "$array2"
}
},
{
"$group": {
"_id": "$array2.key",
"sum": {
"$sum": "$array2.value"
}
}
},
{
"$project": {
"_id": 0,
"key": "$_id",
"value": "$sum"
}
}
]
}
},
{
"$set": {
"scalar": {
"$arrayElemAt": [
"$scalar.sum",
0
]
}
}
}
])
The other alternative is to unwind both arrays, but then the unwinds and groups get mixed together, which I think complicates things.
Also, I don't think $reduce can be used for grouping in MongoDB, because we can't construct dynamic paths.
Say we group with a $reduce and the accumulator holds this data (key=key, value=value):
{"1": 5, "2": 3}
When we then see {"key": 1, "value": 5}, how can we check whether the accumulator already contains 1 as a key? We can't construct dynamic paths like $$this.1. The only way is to convert the object to an array and back again, which would be very slow.
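For completeness, a third route avoids both the extra unwinds and the dynamic-path problem (an untested sketch; shown for array only, array2 is handled identically): one $group collects everything, $reduce flattens the collected array of arrays, and $setUnion + $filter + $sum re-group the entries by key inside the single document.

```javascript
db.collection.aggregate([
  // collect everything into a single document
  {
    $group: {
      _id: null,
      scalar: { $sum: "$scalar" },
      array: { $push: "$array" }          // becomes an array of arrays
    }
  },
  // flatten the array of arrays
  {
    $set: {
      array: {
        $reduce: {
          input: "$array",
          initialValue: [],
          in: { $concatArrays: ["$$value", "$$this"] }
        }
      }
    }
  },
  // re-group by key: one entry per distinct key, summing its values
  {
    $set: {
      array: {
        $map: {
          input: { $setUnion: ["$array.key"] },   // distinct keys
          as: "k",
          in: {
            key: "$$k",
            value: {
              $sum: {
                $map: {
                  input: {
                    $filter: {
                      input: "$array",
                      cond: { $eq: ["$$this.key", "$$k"] }
                    }
                  },
                  in: "$$this.value"
                }
              }
            }
          }
        }
      }
    }
  },
  { $unset: "_id" }
])
```

Note that $setUnion does not guarantee key order, so add a sort afterwards if order matters.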

MongoDB groupby - pull comma separated values in a field

I want to group the collection and pull comma-separated values.
In the example below, I want to group by "type" and pull the $sum of "total", plus all unique values of "value" in a single comma-separated field.
collection:
[
{
"type": "1",
"value": "value1",
"total": 10
},
{
"type": "1",
"value": "value3",
"total": 20
},
{
"type": "1",
"value": "value3",
"total": 30
},
{
"type": "2",
"value": "value1",
"total": 10
},
{
"type": "2",
"value": "value2",
"total": 20
}
]
The output that I am expecting:
[
{
"type": "1",
"value": "value1,value3",
"total": 60
},
{
"type": "2",
"value": "value1,value2",
"total": 30
}
]
Please help with the approach or code.
This can be achieved with just the $group and $project aggregation stages:
https://mongoplayground.net/p/230nt_AMFIm
db.collection.aggregate([
  {
    $group: {
      _id: "$type",
      value: {
        $addToSet: "$value"
      },
      total: {
        $sum: "$total"
      }
    }
  },
  {
    $project: {
      _id: 0,
      type: "$_id",
      value: "$value",
      total: "$total"
    }
  }
])
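One caveat, since the question asked for a comma-separated field: $addToSet yields an array such as ["value1", "value3"], not the string "value1,value3". If the string form is required, the array can be joined in the $project, for example with $reduce and $concat (a sketch; note $addToSet does not guarantee element order):

```javascript
db.collection.aggregate([
  {
    $group: {
      _id: "$type",
      value: { $addToSet: "$value" },
      total: { $sum: "$total" }
    }
  },
  {
    $project: {
      _id: 0,
      type: "$_id",
      // join the unique values with commas
      value: {
        $reduce: {
          input: "$value",
          initialValue: "",
          in: {
            $cond: [
              { $eq: ["$$value", ""] },   // first element: no leading comma
              "$$this",
              { $concat: ["$$value", ",", "$$this"] }
            ]
          }
        }
      },
      total: "$total"
    }
  }
])
```

On MongoDB 5.2+ you could apply $sortArray to the set first if a stable order is needed.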

mongodb: Query documents with average greater than a number

How do I find the documents with an average score greater than 5? The collection looks like this:
Collection restaurants:
{"grades": [{"grade": "A", "score": 2}, {"grade": "A", "score": 6}], "name": "Morris Park Bake Shop", "restaurant_id": "30075445"}
{"grades": [{"grade": "A", "score": 8}, {"grade": "B", "score": 23}], "name": "Wendy'S", "restaurant_id": "30112340"}
{"grades": [{"grade": "A", "score": 2}, {"grade": "A", "score": 11}], "name": "Dj Reynolds Pub And Restaurant", "restaurant_id": "30191841"}
I tried
db.restaurants.find({average_socre: {$gt: 5}}, {average_socre:{$avg: "$grades.score"}})
but it doesn't work.
Use $expr, which lets you evaluate aggregation expressions inside find():
db.collection.find({
  $expr: {
    $gt: [
      { "$avg": "$grades.score" },
      5
    ]
  }
},
{
  average_score: {
    $avg: "$grades.score"
  }
})
Live

MongoDB group data melt

Say I have a small dataset:
[
{"A": 0, "B": 0, "X": 100, "Y": 100},
{"A": 1, "B": 0, "X": 50, "Y": 55},
{"A": 0, "B": 1, "X": 25, "Y": 30},
{"A": 1, "B": 1, "X": 1, "Y": 6}
]
I also have a pipeline where the final stage is a group:
[
{
"$group": {
"_id": {
"classification1": {
"$eq": ["$A", 1]
},
"classification2": {
"$eq": ["$B", 1]
}
},
"countX": {"$sum": "$X"},
"countY": {"$sum": "$Y"}
}
}
]
The output of this pipeline:
[
{"_id": {"classification1": false, "classification2": false}, "countX": 100, "countY": 100},
{"_id": {"classification1": true, "classification2": false}, "countX": 50, "countY": 55},
{"_id": {"classification1": false, "classification2": true}, "countX": 25, "countY": 30},
{"_id": {"classification1": true, "classification2": true}, "countX": 1, "countY": 6}
]
What pipeline steps would I need to reach a melted format like this?
[
{"name": "classification1", "countX": 51, "countY": 61},
{"name": "classification2", "countX": 26, "countY": 36}
]
Note that this transformation counts document 1 from the previous stage zero times and counts document 4 twice (for document 1 both conditions are false, and for document 4 both are true).
I have written a JavaScript function for this, but JavaScript functions cannot be invoked from the pipeline (aggregation pipelines must be serializable). Unfortunately, that means I have to unload the data from the DB, run the script on it, and then load the transformed data back in as a temporary collection to finish the rest of the pipeline after this stage.
Any assistance is greatly appreciated.
I did some reading on facets. Somewhat verbose, but this query provides melted data in the proper format:
[
{
"$group": {
"_id": {
"classification1": {
"$eq": ["$A", 1]
},
"classification2": {
"$eq": ["$B", 1]
}
},
"countX": {"$sum": "$X"},
"countY": {"$sum": "$Y"}
}
},
{
"$facet": {
"classification1": [
{"$match": {"_id.classification1": true}},
{"$group": {"_id": null, "X": {"$sum": "$countX"}, "Y": {"$sum": "$countY"}}},
{"$addFields": {"name": "classification1"}}
],
"classification2": [
{"$match": {"_id.classification2": true}},
{"$group": {"_id": null, "X": {"$sum": "$countX"}, "Y": {"$sum": "$countY"}}},
{"$addFields": {"name": "classification2"}}
]
}
},
{
"$project": {"combine": {"$setUnion": ["$classification1", "$classification2"]}}
},
{
"$unwind": "$combine"
},
{
"$replaceRoot": {"newRoot": "$combine"}
},
{
"$project": {"_id": 0}
}
]