"Structured" grouping query in MongoDB - mongodb

I have the following items collection :
[{
"_id": 1,
"manufactureId": 1,
"itemTypeId": "Type1"
},
{
"_id": 2,
"manufactureId": 1,
"itemTypeId": "Type2"
},
{
"_id": 3,
"manufactureId": 2,
"itemTypeId": "Type1"
}]
I would like to create a query that returns the number of items of each item type that each manufacturer has, in the following structure (or something similar):
[
{
_id:1, //this would be the manufactureId
itemsCount:{
"Type1":1, //Type1 items count
"Type2":1 //...
}
},
{
_id:2,
itemsCount:{
"Type1":1
}
}
]
I have tried to use the aggregation framework, but I couldn't figure out whether there is a way to create a "structured" group-by query with it.
I can easily achieve the desired result by post-processing the result of this simple aggregation query:
db.items.aggregate([{$group:{_id:{itemTypeId:"$itemTypeId",manufactureId:"$manufactureId"},count:{$sum:1}}}])
but if possible I prefer not to post-process the result.

Data stays data
I would rather use this query which, I believe, will give you the closest data structure to what you want, without post-processing.
Query
db.items.aggregate(
{
$group:
{
_id:
{
itemTypeId: "$itemTypeId",
manufactureId: "$manufactureId"
},
count:
{
$sum: 1
}
},
},
{
$group:
{
_id: "$_id.manufactureId",
itemCounts:
{
"$push":
{
itemTypeId: "$_id.itemTypeId",
count: "$count"
}
}
}
})
Output
{
"_id" : 1,
"itemCounts" : [
{
"itemTypeId" : "Type1",
"count" : 1
},
{
"itemTypeId" : "Type2",
"count" : 1
}
]
},
{
"_id" : 2,
"itemCounts" : [
{
"itemTypeId" : "Type1",
"count" : 1
}
]
}
Data transformed to object fields
This is an approach that I wouldn't advise in general. It is harder to manage in your application, because the field names will differ between objects and you won't know in advance which fields to expect. This is a crucial point if you use a strongly typed language, where automatic data binding to your domain objects becomes impossible.
Anyway, with the operators used in this answer, the only way to get the exact data structure you want is to apply post-processing.
Query
db.items.aggregate(
{
$group:
{
_id:
{
itemTypeId: "$itemTypeId",
manufactureId: "$manufactureId"
},
count:
{
$sum: 1
}
},
},
{
$group:
{
_id: "$_id.manufactureId",
itemCounts:
{
"$push":
{
itemTypeId: "$_id.itemTypeId",
count: "$count"
}
}
}
}).forEach(function(doc) {
var obj = {
_id: doc._id,
itemCounts: {}
};
doc.itemCounts.forEach(function(typeCount) {
obj.itemCounts[typeCount.itemTypeId] = typeCount.count;
});
printjson(obj);
})
Output
{ "_id" : 1, "itemCounts" : { "Type1" : 1, "Type2" : 1 } }
{ "_id" : 2, "itemCounts" : { "Type1" : 1 } }
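On MongoDB 3.4.4 or later, the same pivot can probably be done on the server with the $arrayToObject operator, which appears in another answer further down this page. The sketch below is only an illustration under that version assumption; it also assumes that every itemTypeId is a string, since $arrayToObject needs string keys. The variable name pipeline is just a placeholder.

```javascript
// Sketch only: assumes MongoDB 3.4.4+ (for $arrayToObject) and string itemTypeId values.
const pipeline = [
  // Count items per (manufacturer, item type) pair.
  {
    $group: {
      _id: { manufactureId: "$manufactureId", itemTypeId: "$itemTypeId" },
      count: { $sum: 1 }
    }
  },
  // Collect the per-type counts of each manufacturer as { k, v } pairs.
  {
    $group: {
      _id: "$_id.manufactureId",
      itemCounts: { $push: { k: "$_id.itemTypeId", v: "$count" } }
    }
  },
  // Turn the { k, v } pairs into object fields.
  { $project: { itemCounts: { $arrayToObject: "$itemCounts" } } }
];

// In the shell: db.items.aggregate(pipeline)
```

The caveat about inconsistent field names still applies to the documents this produces.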

Related

Mongodb aggregation count by nested object key

db.artists.insertMany([
{ "_id" : 1, "achievements" : {"third_record":true, "second_record": true} },
{ "_id" : 3, "achievements" : {"sixth_record":true, "second_record": true} },
{ "_id" : 2, "achievements" : {"first_record":true, "fifth_record": true} },
{ "_id" : 4, "achievements" : {"first_record":true, "second_record": true} },
])
I would like to count how many first_record, second_record, etc achievements have been obtained, I don't know beforehand the names of the achievements. I just want it to count all the achievements matched in the first stage. How do I use aggregation to count this? I saw another question suggest using unwind but that seems to be for arrays only and not objects?
Maybe this:
db.collection.aggregate([
{
$project: {
as: {
$objectToArray: "$achievements"
}
}
},
{
$unwind: "$as"
},
{
$group: {
_id: "$as.k",
number: {
$sum: {
"$cond": [
{
$eq: [
"$as.v",
true
]
},
1,
0
]
}
}
}
}
])
Idea
convert the object to an array of key/value pairs
unwind the array to get one document per pair
group by key name, adding 1 for true and 0 for false.
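To see what those three steps compute, here is a plain JavaScript sketch that mirrors them on the sample documents from the question, without touching the database. The helper name countAchievements is made up for this illustration and is not a MongoDB API.

```javascript
// Sample documents copied from the question.
const artists = [
  { _id: 1, achievements: { third_record: true, second_record: true } },
  { _id: 3, achievements: { sixth_record: true, second_record: true } },
  { _id: 2, achievements: { first_record: true, fifth_record: true } },
  { _id: 4, achievements: { first_record: true, second_record: true } }
];

// Hypothetical helper, not a MongoDB API: counts achievements set to true.
function countAchievements(docs) {
  const counts = {};
  for (const doc of docs) {
    // Object.entries plays the role of $objectToArray followed by $unwind.
    for (const [name, value] of Object.entries(doc.achievements || {})) {
      // Like the $cond inside $group: add 1 for true, 0 for anything else.
      counts[name] = (counts[name] || 0) + (value === true ? 1 : 0);
    }
  }
  return counts;
}

const achievementCounts = countAchievements(artists);
console.log(achievementCounts);
```

The achievement names are discovered from the data, so nothing has to be known beforehand.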

How to delete Duplicate objects inside array in multiple documents in mongodb?

I am trying to delete the duplicate objects inside an array across multiple documents in MongoDB.
I have tried many ways but have not been able to fix it.
Document Structure:-
{
"_id" : ObjectId("5a544fe234602415114601d3"),
"GstDetails" : [
{
"_id" : ObjectId("5e4837374d62f4c95163908e"),
"StateId" : "1",
"GstIn" : "33ABFFM1655H1ZF",
"StateDesc" : "TAMIL NADU",
"CityDesc" : "CHENNAI"
},
{
"_id" : ObjectId("5e4837484d62f4c9516395e8"),
"StateId" : "1",
"GstIn" : "33ABFFM1655H1ZF",
"StateDesc" : "TAMIL NADU",
"CityDesc" : "CHENNAI"
}
]
}
There are many more documents like this.
I tried:-
db.Supplier.find({ "GstDetails": { $size: 2 } }).limit(1).forEach(function (doc) {
var stateId;
doc.GstDetails.forEach(function (data) {
if (data.StateId == stateId) {
pull doc.GstDetails[0];
} else {
stateId = data.StateId
}
print(JSON.stringify(doc));
});
db.Supplier.save(doc)
});
Check if aggregation below meets your requirements:
db.Supplier.aggregate([
{
$unwind: "$GstDetails"
},
{
$group: {
_id: {
_id: "$_id",
StateId: "$GstDetails.StateId"
},
GstDetails: {
$push: "$GstDetails"
}
}
},
{
$addFields: {
GstDetails: {
$slice: [
"$GstDetails",
1
]
}
}
},
{
$unwind: "$GstDetails"
},
{
$group: {
_id: "$_id._id",
GstDetails: {
$push: "$GstDetails"
}
}
}
])
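Note: This is a read-only query. If the result is OK, add the operator below as the last stage. Once you execute it, it will update your documents and no rollback is available. Be aware that $out replaces the whole collection with the pipeline output, which here contains only _id and GstDetails: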
{$out: "Supplier"}
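If replacing the whole collection is too drastic, the deduplication can also be done document by document in client code. The following is a minimal sketch, assuming that "duplicate" means "same StateId" (as in the aggregation above) and that the first entry should be kept. The helper name dedupeByStateId is hypothetical, and the shell usage in the trailing comment is untested.

```javascript
// Hypothetical helper: keeps the first entry seen for each StateId.
function dedupeByStateId(details) {
  const seen = new Set();
  return details.filter(function (entry) {
    if (seen.has(entry.StateId)) return false;
    seen.add(entry.StateId);
    return true;
  });
}

// Sample data shaped like the question's document (ObjectIds written as plain strings).
const gstDetails = [
  { _id: "5e4837374d62f4c95163908e", StateId: "1", GstIn: "33ABFFM1655H1ZF" },
  { _id: "5e4837484d62f4c9516395e8", StateId: "1", GstIn: "33ABFFM1655H1ZF" }
];
const deduped = dedupeByStateId(gstDetails);
console.log(deduped.length);

// Possible shell usage (untested sketch):
// db.Supplier.find({ "GstDetails.1": { $exists: true } }).forEach(function (doc) {
//   db.Supplier.updateOne(
//     { _id: doc._id },
//     { $set: { GstDetails: dedupeByStateId(doc.GstDetails) } }
//   );
// });
```

This touches only the GstDetails field, so any other fields on the documents are left as they are.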

MongoDB two groups Aggregate

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.
I would like to transform that :
{
"_id" : ObjectId("5836b919885383034437d4a7"),
"Identificador" : "G-3474",
"Miembros" : [
{
"_id" : ObjectId("5836b916885383034437d238"),
"Nombre" : "Pilar",
"Email" : "pcarrillocasa#gmail.es",
"Edad" : 24,
"País" : "España",
"Tipo" : "Usuario individual",
"Apellidos" : "Carrillo Casa",
"Teléfono" : 637567234,
"Ciudad" : "Santander",
"Identificador" : "U-3486",
"Información_creación" : {
"Fecha_creación" : {
"Mes" : 4,
"Día" : 22,
"Año" : 2016
},
"Hora_creación" : {
"Hora" : 15,
"Minutos" : 34,
"Segundos" : 20
}
}
}
]
}
into that
{
"Nombre_Grupo" : "Amigo invisible",
"Ciudades" : [
{
"Ciudad" : "Madrid",
"Miembros": 30
},
{
"Ciudad" : "Almería",
"Miembros": 10
},
{
"Ciudad" : "Badajoz",
"Miembros": 20
}
]
}
with MongoDB.
I tried this:
db.Grupos_usuarios.aggregate([
{ $group: { _id: "$Nombre_Grupo",total: { $sum: "$amount" } },
$group: { _id: "$Ciudad",total: { $sum: "$amount" } } }
])
but I could not get what I needed.
Could somebody help me understand what I am doing wrong?
The following aggregation gets the output you are looking for.
The $unwind stage deconstructs the Miembros array so that each element becomes its own document. These documents are then grouped by Miembros.Ciudad to count the members in each Ciudad. The second $group stage pivots the data, pushing all the Ciudades from the previous grouping into one array. The final $project formats the output.
db.test.aggregate( [
{
$unwind: "$Miembros"
},
{
$group: {
_id: "$Miembros.Ciudad",
total: { $sum: 1 }
}
},
{
$group: {
_id: "Amigo invisible",
Ciudades: { $push: { Ciudad: "$_id", Miembros: "$total"} }
}
},
{
$project: {
Nombre_Grupo: "$_id",
Ciudades: 1,
_id: 0
}
}
] )
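The pipeline above hard-codes the group name and counts members across the whole collection. If each group document carries its own name, a variant along these lines might keep the groups apart. This is only a sketch: the field Nombre_Grupo is an assumption taken from the desired output, since the sample document shows only Identificador, and the variable name pipeline is a placeholder.

```javascript
// Sketch only: "Nombre_Grupo" is assumed to exist on every group document.
const pipeline = [
  { $unwind: "$Miembros" },
  // First group: one document per (group name, city) pair with its member count.
  {
    $group: {
      _id: { grupo: "$Nombre_Grupo", ciudad: "$Miembros.Ciudad" },
      total: { $sum: 1 }
    }
  },
  // Second group: gather the cities of each group into one array.
  {
    $group: {
      _id: "$_id.grupo",
      Ciudades: { $push: { Ciudad: "$_id.ciudad", Miembros: "$total" } }
    }
  },
  // Rename _id to Nombre_Grupo for the final shape.
  { $project: { _id: 0, Nombre_Grupo: "$_id", Ciudades: 1 } }
];

// In the shell: db.test.aggregate(pipeline)
```

This is the general "two groups" pattern: group on a compound key first, then regroup on one part of that key.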

Using mongodb $lookup on a single collection

I have a collection with documents like this
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"bsk" : {
"bskItems" : [
{
"id" : 4,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "reblochon"
}
},
{
"id" : 5,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "Pinot Noir"
}
},
{
"id" : 13,
"bskItemLineType" : "PromotionItem",
"promotionApplied" : {
"bskIds" : [
4,
5
]
}
},
{
"id" : 8,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "Food"
}
},
{
"id" : 10,
"bskItemLineType" : "SubTotalItem"
},
{
"id" : 12,
"bskItemLineType" : "TenderItem"
},
{
"id" : 14,
"bskItemLineType" : "ChangeDue"
}
]
}
}
I want an output where I can see the "promotionApplied" entries and the descriptions of the items they applied to. For the document above, the promotion was applied to "bsk.bskItems.id" 4 and 5, so I would like the output to be:
{
"_id": xxxxx,
"promotionAppliedto" : "reblochon"
},
{
"_id": xxxxx,
"promotionAppliedto" : "Pinot Noir"
}
the query below:
db.getCollection('DSTest').aggregate([
{$project:{"bsk.bskItems.product.description":1,"bsk.bskItems.id":1}},
{$unwind: "$bsk.bskItems"},
])
gets me the descriptions
db.getCollection('DSTest').aggregate([
{$project:{"bsk.bskItems.promotionApplied.bskIds":1}},
{$unwind: "$bsk.bskItems"},
{$unwind:"$bsk.bskItems.promotionApplied.bskIds"},
])
gets me the promotions applied. I was hoping to be able to use $lookup to join the two based on _id and bsk.bskItems.promotionApplied.bskIds and _id and bsk.bskItems.id, but I can't figure out how.
I don't know if you solved your problem or if this is still relevant, but I worked out an answer to your question:
db.DSTest.aggregate([
{
$unwind: "$bsk.bskItems"
},
{
$project: {
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] },
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
},
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] },
}
},
{
$unwind: "$bsk.bskItems.promotionApplied.bskIds"
},
{
$project: {
baItId: 1,
proAppliedId:
{
$cond: { if: { $eq: [ "$bsk.bskItems.promotionApplied.bskIds", 0 ] }, then: "$baItId", else: "$bsk.bskItems.promotionApplied.bskIds" }
},
product: 1
}
},
{
$group: {
_id: { proAppliedId: "$proAppliedId", docId: "$_id"},
product: { $push: { "p": "$product" } },
groupCount: { $sum: 1 }
}
},
{
$unwind: "$product"
},
{
$match: {
"product.p": {$ne: ""}, "groupCount": { $gt: 1}
}
},
{
$project: {
_id: "$_id.docId",
"promotionAppliedto": "$product.p"
}
}
])
With the dummy document you gave this is the result I get:
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"promotionAppliedto" : "reblochon"
}
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"promotionAppliedto" : "Pinot Noir"
}
But my advice is to put some thought into your database structure next time. You had apples and pears, so we had to make an Asian pear in order to get to this result. You can also see from the number of aggregation stages that it was not an easy job. It could have been much easier if you had separated the array elements that contain the field product from the ones that contain the field promotionApplied.
To break it down and explain what is happening step by step:
{
$unwind: "$bsk.bskItems"
}
By unwinding we flatten the array: each element of bsk.bskItems becomes its own document. We need this in order to access the fields inside the array and operate on them.
{
$project: {
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] },
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
},
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] },
}
}
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] }
With this line we just make sure that every document gets a basket item id. In your case they all do; I just added it to be safe. If some document doesn't have a value for that field, we set it to 0 (you can set it to -1 or whatever you want).
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
}
Here we make sure that the field bsk.bskItems.promotionApplied.bskIds is always an array. Since not all documents have this field, we add it to those that lack it with the placeholder value [0]; otherwise we would be comparing oranges with apples.
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] }
As said before, we have to make all our documents look alike, so we also add a product field to the ones that don't have $bsk.bskItems.product.description. For those we set it to an empty string.
Now all our documents have the same structure and we can start with the actual sorting out.
{
$unwind: "$bsk.bskItems.promotionApplied.bskIds"
}
Since we want to access the ids inside $bsk.bskItems.promotionApplied.bskIds we have to unwind this array as well.
{
$project: {
baItId: 1,
proAppliedId:
{
$cond: { if: { $eq: [ "$bsk.bskItems.promotionApplied.bskIds", 0 ] }, then: "$baItId", else: "$bsk.bskItems.promotionApplied.bskIds" }
},
product: 1
}
}
baItId: 1 and product: 1 are just passed on. proAppliedId will contain the value from bsk.bskItems.promotionApplied.bskIds. If that value is the placeholder 0, it gets the same id as the field $baItId; otherwise it keeps its own value.
{
$group: {
_id: { proAppliedId: "$proAppliedId", docId: "$_id"},
product: { $push: { "p": "$product" } },
groupCount: { $sum: 1 }
}
}
Now finally we can group our documents by the $proAppliedId that we created in the previous pipeline stage.
We also push the product values into an array. So for a promoted item there will now be an array that contains two entries:
one with the value that we are looking for, and one with an empty string, because we set that in an earlier stage with "product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] }
We also create a new field called groupCount to count the documents that were grouped together.
The $unwind and $match stages that follow in the full pipeline then keep only the non-empty descriptions from groups with more than one document, that is, the items that a promotion points at.
{ $project: {
_id: "$_id.docId",
"promotionAppliedto": "$product.p" } }
In the final $project we just shape the final document the way we want it to look.
I hope you understand now why thinking about where and how we save things matters.
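As a cross-check of what the pipeline is after, the same lookup can be written in a few lines of plain JavaScript over one document. This is only a sketch for comparison and not a replacement for the aggregation. The helper name promotedDescriptions is made up, and the sample document is trimmed to the fields that matter.

```javascript
// Sample document trimmed from the question (only the fields used here).
const basketDoc = {
  _id: "5773ac6a486f811694711875",
  bsk: {
    bskItems: [
      { id: 4, bskItemLineType: "SaleItem", product: { description: "reblochon" } },
      { id: 5, bskItemLineType: "SaleItem", product: { description: "Pinot Noir" } },
      { id: 13, bskItemLineType: "PromotionItem", promotionApplied: { bskIds: [4, 5] } },
      { id: 8, bskItemLineType: "SaleItem", product: { description: "Food" } },
      { id: 10, bskItemLineType: "SubTotalItem" }
    ]
  }
};

// Hypothetical helper: lists the descriptions of items referenced by a promotion.
function promotedDescriptions(doc) {
  const items = doc.bsk.bskItems;
  // Index item descriptions by id (items without a product are skipped).
  const descriptionById = new Map();
  for (const item of items) {
    if (item.product) descriptionById.set(item.id, item.product.description);
  }
  const result = [];
  for (const item of items) {
    // Items without promotionApplied contribute no ids.
    const ids = (item.promotionApplied && item.promotionApplied.bskIds) || [];
    for (const id of ids) {
      if (descriptionById.has(id)) {
        result.push({ _id: doc._id, promotionAppliedto: descriptionById.get(id) });
      }
    }
  }
  return result;
}

const promoted = promotedDescriptions(basketDoc);
console.log(promoted);
```

Unlike the pipeline, this version would list an item twice if two promotions referenced it, so treat it as an illustration of the join rather than an exact equivalent.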
When using a document database, it would be better to store the promotion metadata instead of only the id.
For example:
"promotionApplied" : [{
bskId : 4,
name : "name",
otherData : "otherData"
}, {
bskId : 5,
name : "name5",
otherData : "otherData5"
}
]

count array occurrences across all documents with mongo

I'm trying to pull data from a collection of documents which looks like this:
[
{
name: 'john',
sex: 'male',
hobbies: ['football', 'tennis', 'swimming']
},
{
name: 'betty',
sex: 'female',
hobbies: ['football', 'tennis']
},
{
name: 'frank',
sex: 'male',
hobbies: ['football', 'tennis']
}
]
I am trying to use the aggregation framework to present the data, split by sex, counting the most common hobbies. The results should look something like this:
{ _id: 'male',
total: 2,
hobbies: {
football: 2,
tennis: 2,
swimming: 1
}
},
{ _id: 'female',
total: 1,
hobbies: {
football: 1,
tennis: 1
}
}
So far I can get the total for each sex, but I'm not sure how I could use $unwind to get the totals for the hobbies array.
My code so far:
collection.aggregate([
{
$group: {
_id: '$sex',
total: { $sum: 1 }
}
}
])
Personally I am not a big fan of transforming "data" into the names of keys in a result. The aggregation framework's principles tend to agree, as this sort of operation is not supported either.
So the personal preference is to maintain "data" as "data" and accept that the processed output is actually better and more logical to a consistent object design:
db.people.aggregate([
{ "$group": {
"_id": "$sex",
"hobbies": { "$push": "$hobbies" },
"total": { "$sum": 1 }
}},
{ "$unwind": "$hobbies" },
{ "$unwind": "$hobbies" },
{ "$group": {
"_id": {
"sex": "$_id",
"hobby": "$hobbies"
},
"total": { "$first": "$total" },
"hobbyCount": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.sex",
"total": { "$first": "$total" },
"hobbies": {
"$push": { "name": "$_id.hobby", "count": "$hobbyCount" }
}
}}
])
Which produces a result like this:
[
{
"_id" : "female",
"total" : 1,
"hobbies" : [
{
"name" : "tennis",
"count" : 1
},
{
"name" : "football",
"count" : 1
}
]
},
{
"_id" : "male",
"total" : 2,
"hobbies" : [
{
"name" : "swimming",
"count" : 1
},
{
"name" : "tennis",
"count" : 2
},
{
"name" : "football",
"count" : 2
}
]
}
]
So the initial $group does the count per "sex" and stacks up the hobbies into an array of arrays. Then to de-normalize you $unwind twice to get singular items, $group to get the totals per hobby under each sex and finally regroup an array for each sex alone.
It's the same data, it has a consistent and organic structure that is easy to process, and MongoDB and the aggregation framework were quite happy to produce this output.
If you really must convert your data to names of keys (and I still recommend you do not, as it is not a good pattern to follow in design), then doing such a transformation from the final state is fairly trivial for client code. As a basic JavaScript example suitable for the shell:
var out = db.people.aggregate([
{ "$group": {
"_id": "$sex",
"hobbies": { "$push": "$hobbies" },
"total": { "$sum": 1 }
}},
{ "$unwind": "$hobbies" },
{ "$unwind": "$hobbies" },
{ "$group": {
"_id": {
"sex": "$_id",
"hobby": "$hobbies"
},
"total": { "$first": "$total" },
"hobbyCount": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.sex",
"total": { "$first": "$total" },
"hobbies": {
"$push": { "name": "$_id.hobby", "count": "$hobbyCount" }
}
}}
]).toArray();
out.forEach(function(doc) {
var obj = {};
doc.hobbies.sort(function(a,b) { return b.count - a.count });
doc.hobbies.forEach(function(hobby) {
obj[hobby.name] = hobby.count;
});
doc.hobbies = obj;
printjson(doc);
});
You are then basically processing each cursor result into the desired output form, which is not something that really needs to happen on the server anyway:
{
"_id" : "female",
"total" : 1,
"hobbies" : {
"tennis" : 1,
"football" : 1
}
}
{
"_id" : "male",
"total" : 2,
"hobbies" : {
"tennis" : 2,
"football" : 2,
"swimming" : 1
}
}
It should also be fairly trivial to implement that sort of manipulation in stream processing of the cursor result to transform as required, as it is basically just the same logic.
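As a rough illustration of that idea, the per-document conversion can be pulled out into a small function and applied to each result as it comes off the cursor. The helper name hobbiesToObject is hypothetical, and the input is one of the aggregation results shown earlier.

```javascript
// Hypothetical helper: converts the "hobbies" array of { name, count }
// into an object keyed by hobby name, most frequent first.
function hobbiesToObject(doc) {
  // Copy before sorting so the input document is not modified.
  const sorted = doc.hobbies.slice().sort(function (a, b) {
    return b.count - a.count; // numeric comparator, descending
  });
  const hobbies = {};
  sorted.forEach(function (hobby) {
    hobbies[hobby.name] = hobby.count;
  });
  return Object.assign({}, doc, { hobbies: hobbies });
}

// One aggregation result, copied from the output shown earlier.
const male = {
  _id: "male",
  total: 2,
  hobbies: [
    { name: "swimming", count: 1 },
    { name: "tennis", count: 2 },
    { name: "football", count: 2 }
  ]
};
const converted = hobbiesToObject(male);
console.log(converted);

// Possible shell usage: db.people.aggregate([...]).map(hobbiesToObject)
```

Returning a new object rather than mutating the input keeps the function safe to reuse in a map over the cursor.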
On the other hand, you can always implement all the manipulation on the server using mapReduce instead:
db.people.mapReduce(
function() {
emit(
this.sex,
{
"total": 1,
"hobbies": this.hobbies.map(function(key) {
return { "name": key, "count": 1 };
})
}
);
},
function(key,values) {
var obj = {},
reduced = {
"total": 0,
"hobbies": []
};
values.forEach(function(value) {
reduced.total += value.total;
value.hobbies.forEach(function(hobby) {
if ( !obj.hasOwnProperty(hobby.name) )
obj[hobby.name] = 0;
obj[hobby.name] += hobby.count;
});
});
reduced.hobbies = Object.keys(obj).map(function(key) {
return { "name": key, "count": obj[key] };
}).sort(function(a,b) {
return b.count - a.count;
});
return reduced;
},
{
"out": { "inline": 1 },
"finalize": function(key,value) {
var obj = {};
value.hobbies.forEach(function(hobby) {
obj[hobby.name] = hobby.count;
});
value.hobbies = obj;
return value;
}
}
)
mapReduce has its own distinct style of output, but the same principles are used in accumulation and manipulation, though it is not likely to be as efficient as the aggregation framework:
"results" : [
{
"_id" : "female",
"value" : {
"total" : 1,
"hobbies" : {
"football" : 1,
"tennis" : 1
}
}
},
{
"_id" : "male",
"value" : {
"total" : 2,
"hobbies" : {
"football" : 2,
"tennis" : 2,
"swimming" : 1
}
}
}
]
At the end of the day, I still say that the first form of processing is the most efficient and provides to my mind the most natural and consistent working of the data output, without even attempting to convert the data points into the names of keys. It's probably best to consider following that pattern, but if you really must, then there are ways to manipulate results into a desired form in various approaches to processing.
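Since MongoDB version 3.4 you can use $reduce to avoid the first grouping by sex, which means holding the entire collection in two documents. You can also avoid the need for client code by using $arrayToObject. (The $set stage used below is the 4.2 alias of $addFields, and $$REMOVE needs 3.6, so adjust accordingly on older servers.)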
db.collection.aggregate([
{$unwind: "$hobbies"},
{
$group: {
_id: {sex: "$sex", hobbies: "$hobbies"},
count: {$sum: 1},
totalIds: {$addToSet: "$_id"}
}
},
{
$group: {
_id: "$_id.sex",
hobbies: {$push: {k: "$_id.hobbies", v: "$count"}},
totalIds: {$push: "$totalIds"}
}
},
{
$set: {
hobbies: {$arrayToObject: "$hobbies"},
totalIds: {
$reduce: {
input: "$totalIds",
initialValue: [],
in: {$concatArrays: ["$$value", "$$this"]}}
}
}
},
{
$set: {
count: {$size: {$setIntersection: "$totalIds"}},
totalIds: "$$REMOVE"
}
}
])
This works if every document has a unique _id, such as an ObjectId.
Otherwise, you can start with $unwind and $group, or since MongoDB version 4.4 you can add an ObjectId with a stage:
{
$set: {
o: {
$function: {
"body": "function (x) {x._id=new ObjectId(); return x}",
"args": [{_id: 1}],
"lang": "js"
}
}
}
},
Since MongoDB version 5.0 you can calculate the total using $setWindowFields:
db.collection.aggregate([
{
$setWindowFields: {
partitionBy: "$sex",
output: {totalCount: {$count: {}}}
}
},
{$unwind: "$hobbies"},
{
$group: {
_id: {sex: "$sex", hobbies: "$hobbies"},
count: {$sum: 1},
totalCount: {$first: "$totalCount"}
}
},
{
$group: {
_id: "$_id.sex",
hobbies: {$push: {k: "$_id.hobbies", v: "$count"}},
total: {$first: "$totalCount"}
}
},
{$set: {hobbies: {$arrayToObject: "$hobbies"}}}
])