MongoDB multiple levels embedded array query - mongodb

I have a document like this:
{
_id: 1,
data: [
{
_id: 2,
rows: [
{
myFormat: [1,2,3,4]
},
{
myFormat: [1,1,1,1]
}
]
},
{
_id: 3,
rows: [
{
myFormat: [1,2,7,8]
},
{
myFormat: [1,1,1,1]
}
]
}
]
},
I want to get distinct myFormat values as a complete array.
For example: I need the result as: [1,2,3,4], [1,1,1,1], [1,2,7,8]
How can I write mongoDB query for this?
Thanks for the help.

Please try this, if every object in rows has only one field myFormat :
db.getCollection('yourCollection').distinct('data.rows')
Ref : mongoDB Distinct Values for a field
Or if you need it in an array & also objects in rows have multiple other fields, try this :
db.yourCollection.aggregate([{$project :{'data.rows.myFormat':1}},{ $unwind: '$data' }, { $unwind: '$data.rows' },
{ $group: { _id: '$data.rows.myFormat' } },
{ $group: { _id: '', distinctValues: { $push: '$_id' } } },
{ $project: { distinctValues: 1, _id: 0 } }])
Or else:
db.yourCollection.aggregate([{ $project: { values: '$data.rows.myFormat' } }, { $unwind: '$values' }, { $unwind: '$values' },
{ $group: { _id: '', distinctValues: { $addToSet: '$values' } } }, { $project: { distinctValues: 1, _id: 0 } }])
Above aggregation queries would get what you wanted, but those can be tedious on large datasets, try to run those and check if there is any slowness, if you're using for one-time then if needed you can consider using {allowDiskUse: true} & irrespective of one-time or not you need to check on whether to use preserveNullAndEmptyArrays:true or not.
Ref : allowDiskUse , $unwind preserveNullAndEmptyArrays

Related

MongoDB - aggregating with nested objects, and changeable keys

I have a document which describes counts of different things observed by a camera within a 15 minute period. It looks like this:
{
"_id" : ObjectId("5b1a709a83552d002516ac19"),
"start" : ISODate("2018-06-08T11:45:00.000Z"),
"end" : ISODate("2018-06-08T12:00:00.000Z"),
"recording" : ObjectId("5b1a654683552d002516ac16"),
"data" : {
"counts" : {
"5b434d05da1f0e00252566be" : 12,
"5b434d05da1f0e00252566cc" : 4,
"5b434d05da1f0e00252566ca" : 1
}
}
}
The keys inside the data.counts object change with each document and refer to additional data that is fetched at a later date. There are unlimited number of keys inside data.counts (but usually about 20)
I am trying to aggregate all these 15 minute documents up to daily aggregated documents.
I have this query at the moment to do that:
db.getCollection("segments").aggregate([
{$match:{
"recording": ObjectId("5bf7f68ad8293a00261dd83f")
}},
{$project:{
"start": 1,
"recording": 1,
"data": 1
}},
{$group:{
_id: { $dateToString: { format: "%Y-%m-%d", date: "$start" } },
"segments": { $push: "$$ROOT" }
}},
{$sort: {_id: -1}},
]);
This does the grouping and returns all the segments in an array.
I want to also aggregate the information inside data.counts, so that I get the sum of values for all keys that are the same within the daily group.
This would save me from having another service loop through each 15 minute segment summing values with the same keys. E.g. the query would return something like this:
{
"_id" : "2019-02-27",
"counts" : {
"5b434d05da1f0e00252566be" : 351,
"5b434d05da1f0e00252566cc" : 194,
"5b434d05da1f0e00252566ca" : 111
... any other keys that were found within a day
}
}
How might I amend the query I already have, or use a different query?
Thanks!
You could use the $facet pipeline stage to create two sub-pipelines; one for segments and another for counts. These sub-pipelines can be joined by using $zip to stitch them together and $map to merge each 2-element array produced from zip. Note this will only work correctly if the sub-pipelines output sorted arrays of the same size, which is why we group and sort by start_date in each sub-pipeline.
Here's the query:
db.getCollection("segments").aggregate([{
$match: {
recording: ObjectId("5b1a654683552d002516ac16")
}
}, {
$project: {
start: 1,
recording: 1,
data: 1,
start_date: { $dateToString: { format: "%Y-%m-%d", date: "$start" }}
}
}, {
$facet: {
segments_pipeline: [{
$group: {
_id: "$start_date",
segments: {
$push: {
start: "$start",
recording: "$recording",
data: "$data"
}
}
}
}, {
$sort: {
_id: -1
}
}],
counts_pipeline: [{
$project: {
start_date: "$start_date",
count: { $objectToArray: "$data.counts" }
}
}, {
$unwind: "$count"
}, {
$group: {
_id: {
start_date: "$start_date",
count_id: "$count.k"
},
count_sum: { $sum: "$count.v" }
}
}, {
$group: {
_id: "$_id.start_date",
counts: {
$push: {
$arrayToObject: [[{
k: "$_id.count_id",
v: "$count_sum"
}]]
}
}
}
}, {
$project: {
counts: { $mergeObjects: "$counts" }
}
}, {
$sort: {
_id: -1
}
}]
}
}, {
$project: {
result: {
$map: {
input: { $zip: { inputs: ["$segments_pipeline", "$counts_pipeline"] }},
in: { $mergeObjects: "$$this" }
}
}
}
}, {
$unwind: "$result"
}, {
$replaceRoot: {
newRoot: "$result"
}
}])
Try it out here: Mongoplayground.

Mongo aggregation pipeline, finding out the total number of entries in an array per user

I have a collection, lets call it 'user'. In this collection there is a property entries, which holds a variably sized array of strings,
I want to find out the total number of these strings across my collection.
db.users.find()
> [{ entries: [] }, { entries: ['entry1','entry2']}, {entries: ['entry1']}]
So far I have have made many attempts here are some of my closest.
db.users.aggregate([
{ $project:
{ numberOfEntries:
{ $size: "$entries" } }
},
{ $group:
{_id: { total_entries: { $sum: "$entries"}
}
}
}
])
What this gives me is a list of the users with the total number of entries, now what I want is each of the total_entries figures added up to get my total. Any ideas of what I am doing wrong. Or if there is a better way to start this?
A possible solution could be:
db.users.aggregate([{
$group: {
_id: 'some text here',
count: {$sum: {$size: '$entries'}}
}
}]);
This will give you the total count of all entries across all users and look like
[
{
_id: 'some text here',
count: 3
}
]
I would use $unwind in the case that you want individual entry counts.
That would look like
db.users.aggregate([
{ $unwind: '$entries' },
{$group: {
_id: '$entries',
count: {$sum: 1}
}
])
and this will give you something along the lines of:
[
{
_id: 'entry1',
count: 2
},
{
_id: 'entry2',
count: 1
}
]
In case you want the overall distinct nbr of entries:
> db.users.aggregate([
{ $unwind: "$entries" },
{ $group: { _id: "$entries" } },
{ $count: "total" }
])
{ "total" : 2 }
In case you want the overall nbr of entries:
> db.users.aggregate( [ { $unwind: "$entries" }, { $count: "total" } ] )
{ "total" : 3 }
This makes use of the "unwind" operator which flattens elements of an array from records:
> db.users.aggregate( [ { $unwind: "$entries" } ] )
{ "_id" : ObjectId("5a81a7a1318e1cfc10250430"), "entries" : "entry1" }
{ "_id" : ObjectId("5a81a7a1318e1cfc10250430"), "entries" : "entry2" }
{ "_id" : ObjectId("5a81a7a1318e1cfc10250431"), "entries" : "entry1" }
You were in the right direction though you just needed to specify an _id value of null in the $group stage to calculate accumulated values for all the input documents as a whole i.e.
db.users.aggregate([
{
"$project": {
"numberOfEntries": {
"$size": {
"$ifNull": ["$entries", []]
}
}
}
},
{
"$group": {
"_id": null, /* _id of null to get the accumulated values for all the docs */
"totalEntries": { "$sum": "$numberOfEntries" }
}
}
])
Or with just a single pipeline as:
db.users.aggregate([
{
"$group": {
"_id": null, /* _id of null to get the accumulated values for all the docs */
"totalEntries": {
"$sum": {
"$size": {
"$ifNull": ["$entries", []]
}
}
}
}
}
])

query length of nested lists

I want to query following entries that have lists of lists and get their lengths:
db.collection.aggregate([
{ $unwind: '$molecules' },
{ $group: {
_id: '$scene',
molecules: { $push: { $size: '$molecules.data' } },
} },
])
gives
{ _id: 7674, molecules: [ 2, 1 ] }

mongodb aggregation query for field value length's sum

Say, I have following documents:
{name: 'A', fav_fruits: ['apple', 'mango', 'orange'], 'type':'test'}
{name: 'B', fav_fruits: ['apple', 'orange'], 'type':'test'}
{name: 'C', fav_fruits: ['cherry'], 'type':'test'}
I am trying to query to find the total count of fav_fruits field on overall documents returned by :
cursor = db.collection.find({'type': 'test'})
I am expecting output like:
cursor.count() = 3 // Getting
Without much idea of aggregate, can mongodb aggregation framework help me achieve this in any way:
1. sum up the lengths of all 'fav_fruits' field: 6
and/or
2. unique 'fav_fruit' field values = ['apple', 'mango', 'orange', 'cherry']
You need to $project your document after the $match stage and use the $size operator which return the number of items in each array. Then in the $group stage you use the $sum accumulator operator to return the total count.
db.collection.aggregate([
{ "$match": { "type": "test" } },
{ "$project": { "count": { "$size": "$fav_fruits" } } },
{ "$group": { "_id": null, "total": { "$sum": "$count" } } }
])
Which returns:
{ "_id" : null, "total" : 6 }
To get unique fav_fruits simply use .distinct()
> db.collection.distinct("fav_fruits", { "type": "test" } )
[ "apple", "mango", "orange", "cherry" ]
Do this to get just the number of fruits in the fav_fruits array:
db.fruits.aggregate([
{ $match: { type: 'test' } },
{ $unwind: "$fav_fruits" },
{ $group: { _id: "$type", count: { $sum: 1 } } }
]);
This will return the total number of fruits.
But if you want to get the array of unique fav_fruits along with the total number of elements in the fav_fruits field of each document, do this:
db.fruits.aggregate([
{ $match: { type: 'test' } },
{ $unwind: "$fav_fruits" },
{ $group: { _id: "$type", count: { $sum: 1 }, fav_fruits: { $addToSet: "$fav_fruits" } } }
])
You can try this. It may helpful to you.
db.collection.aggregate([{ $match : { type: "test" } }, {$group : { _id : null, count:{$sum:1} } }])

Sum in nested document MongoDB

I'm trying to sum some values in an array of documents, with no luck.
This is the Document
db.Cuentas.find().pretty()
{
"Agno": "2013",
"Egresos": [
{
"Fecha": "28-01-2013",
"Monto": 150000,
"Detalle": "Pago Nokia Lumia a #josellop"
},
{
"Fecha": "29-01-2013",
"Monto": 4000,
"Detalle": "Cine, Pelicula fome"
}
],
"Ingresos": [],
"Mes": "Enero",
"Monto": 450000,
"Usuario": "MarioCares"
"_id": ObjectId(....)
}
So, i need the sum of all the "Monto" in "Egresos" for the "Usuario": "MarioCares". In this example 154000
Using aggregation i use this:
db.Cuentas.aggregate(
[
{ $match: {"Usuario": "MarioCares"} },
{ $group:
{
_id: null,
"suma": { $sum: "$Egresos.Monto" }
}
}
]
)
But i always get
{ "result" : [{ "_id" : null, "suma" : 0 }], "ok" : 1 }
What am i doing wrong ?
P.D. already see this and this
As Sammaye indicated, you need to $unwind the Egresos array to duplicate the matched doc per array element so you can $sum over each element:
db.Cuentas.aggregate([
{$match: {"Usuario": "MarioCares"} },
{$unwind: '$Egresos'},
{$group: {
_id: null,
"suma": {$sum: "$Egresos.Monto" }
}}
])
You can do also by this way. don't need to group just project your fields.
db.Cuentas.aggregate([
{ $match: { "Usuario": "MarioCares" } },
{
$project: {
'MontoSum': { $sum: "$Egresos.Monto" }
}
}
])
Since mongoDB version 3.4 you can use $reduce to sum array items:
db.collection.aggregate([
{
$match: {Usuario: "MarioCares"}
},
{
$project: {
suma: {
$reduce: {
input: "$Egresos",
initialValue: 0,
in: {$add: ["$$value", "$$this.Monto"]}
}
}
}
}
])
Playground example