Mongo remove duplicates in array of objects based on field - mongodb

I'm new to Mongo and have found lots of examples of removing dupes from arrays of strings using the aggregation framework, but I'm wondering whether it is possible to remove dupes from an array of objects based on a field in the object. E.g.:
{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
{
"id" : 1,
"val" : "x"
},
{
"id" : 1,
"val" : "x"
},
{
"id" : 2,
"val" : "y"
},
{
"id" : 1,
"val" : "xxxxxx"
}
]
}
Here I'd like to remove dupes based on the id field. So would end up with
{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
{
"id" : 1,
"val" : "x"
},
{
"id" : 2,
"val" : "y"
}
]
}
Picking the first/any object with given id works. Just want to end up with one per id. Is this doable in aggregation framework? Or even outside aggregation framework, just looking for a clean way to do this. Need to do this type of thing across many documents in collection, which seems like a good use case for aggregation framework, but as I mentioned, newbie here...thanks.

You can get the desired result in two ways.
Classic
Flatten, remove duplicates (pick the first occurrence), then group back by document.
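db.collection.aggregate([
  {
    $unwind: "$values"
  },
  {
    // Group per document AND per values.id, so that elements from
    // different documents sharing an id are not merged together
    $group: {
      _id: { doc: "$_id", id: "$values.id" },
      values: { $first: "$values" },
      name: { $first: "$name" }
    }
  },
  {
    // Rebuild one document per original _id
    $group: {
      _id: "$_id.doc",
      name: { $first: "$name" },
      values: { $push: "$values" }
    }
  }
])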
MongoPlayground
Modern
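We need to use the $reduce operator.
Equivalent logic in plain JavaScript:
function dedupe(values) {
  var tmp = [];
  for (var value of values) {
    // keep the element only if no element with the same id was kept already
    if (!tmp.some(function (kept) { return kept.id === value.id; })) {
      tmp.push(value);
    }
  }
  return tmp;
}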
db.collection.aggregate([
{
$addFields: {
values: {
$reduce: {
input: "$values",
initialValue: [],
in: {
$concatArrays: [
"$$value",
{
$cond: [
{
$in: [
"$$this.id",
"$$value.id"
]
},
[],
[
"$$this"
]
]
}
]
}
}
}
}
}
])
MongoPlayground

You can use $reduce. Try the query below:
db.collection.aggregate([
{
$addFields: {
values: {
$reduce: {
input: "$values",
initialValue: [],
in: {
$cond: [
{ $in: ["$$this.id", "$$value.id"] }, /** If this 'id' is already in the holding array, keep the array as is; otherwise append the new object to it */
"$$value",
{ $concatArrays: ["$$value", ["$$this"]] }
]
}
}
}
}
}
]);
Test : MongoDB-Playground
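Since the question mentions doing this across many documents, one option for persisting the result is an update that takes an aggregation pipeline (available from MongoDB 4.2). The sketch below only builds the stage as a plain object; dedupeStage is a hypothetical helper name, not a MongoDB API, and the commented-out updateMany call assumes the collection name used above.

```javascript
// Hypothetical helper (not part of MongoDB): builds a $set stage that keeps
// the first element of `arrayField` for each distinct value of `keyField`.
function dedupeStage(arrayField, keyField) {
  return {
    $set: {
      [arrayField]: {
        $reduce: {
          input: "$" + arrayField,
          initialValue: [],
          in: {
            $cond: [
              // already holding an element with this key?
              { $in: ["$$this." + keyField, "$$value." + keyField] },
              "$$value",
              { $concatArrays: ["$$value", ["$$this"]] }
            ]
          }
        }
      }
    }
  };
}

// In the mongo shell (MongoDB 4.2 or later):
// db.collection.updateMany({}, [dedupeStage("values", "id")]);
```

Building the stage in a function keeps the array and key names in one place, which may help if the same cleanup is needed for other fields.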

Related

How do I fetch only the first element from the array?

How do I fetch only the first element from the "topicsName" array?
Data I have input:
{
"_id" : ObjectId("606b7046a0ccf72222c00c2f"),
"groupId" : ObjectId("5f06cca74e51ba15f5167b86"),
"insertedAt" : "2021-04-05T20:17:10.144521Z",
"isActive" : true,
"staffId" : [
"606b6c34a0ccf72222c5a4df",
"606b6c48a0ccf722228aa035"
],
"subjectName" : "Maths",
"teamId" : ObjectId("6069a6a9a0ccf704e7f4b537"),
"updatedAt" : "2022-04-29T07:57:31.072067Z",
"syllabus" : [
{
"chapterId" : "626b9b94ae6cd2092024f3ee",
"chapterName" : "chap1",
"topicsName" : [
{
"topicId" : "626b9b94ae6cd2092024f3ef",
"topicName" : "1.1"
},
{
"topicId" : "626b9b94ae6cd2092024f3f0",
"topicName" : "1.2"
}
]
},
{
"chapterId" : "626b9b94ae6cd2092024f3f1",
"chapterName" : "chap2",
"topicsName" : [
{
"topicId" : "626b9b94ae6cd2092024f3f2",
"topicName" : "2.1"
},
{
"topicId" : "626b9b94ae6cd2092024f3f3",
"topicName" : "2.2"
}
]
}
]
}
The query I used to try to fetch the element with "topicId" : "626b9b94ae6cd2092024f3ef" from the "topicsName" array:
db.subject_staff_database
.find(
{ _id: ObjectId("606b7046a0ccf72222c00c2f") },
{
syllabus: {
$elemMatch: {
chapterId: "626b9b94ae6cd2092024f3f1",
topicsName: { $elemMatch: { topicId: "626b9b94ae6cd2092024f3f2" } },
},
},
}
)
.pretty();
I was trying to fetch only the first element from the "topicsName" array, but it fetched both the elements in that array.
You can do the following in an aggregation pipeline:
$match to locate the documents that contain the given topicId
$reduce to flatten the topicsName arrays of all syllabus entries into one array
$filter to keep only the expected element
db.collection.aggregate([
{
$match: {
"syllabus.topicsName.topicId": "626b9b94ae6cd2092024f3ef"
}
},
{
"$project": {
result: {
"$reduce": {
"input": "$syllabus.topicsName",
"initialValue": [],
"in": {
"$concatArrays": [
"$$value",
"$$this"
]
}
}
}
}
},
{
"$project": {
result: {
"$filter": {
"input": "$result",
"as": "r",
"cond": {
$eq: [
"$$r.topicId",
"626b9b94ae6cd2092024f3ef"
]
}
}
}
}
}
])
Here is the Mongo playground for your reference.
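To make the flatten-then-filter idea concrete, here is the same logic in plain JavaScript on a trimmed copy of the sample document. This is only an illustration of what the $reduce and $filter stages compute, not something that runs inside MongoDB; the variable names are made up for the sketch.

```javascript
// Trimmed copy of the sample document: only the fields the pipeline touches.
const doc = {
  syllabus: [
    {
      chapterId: "626b9b94ae6cd2092024f3ee",
      topicsName: [
        { topicId: "626b9b94ae6cd2092024f3ef", topicName: "1.1" },
        { topicId: "626b9b94ae6cd2092024f3f0", topicName: "1.2" }
      ]
    },
    {
      chapterId: "626b9b94ae6cd2092024f3f1",
      topicsName: [
        { topicId: "626b9b94ae6cd2092024f3f2", topicName: "2.1" },
        { topicId: "626b9b94ae6cd2092024f3f3", topicName: "2.2" }
      ]
    }
  ]
};

// $reduce + $concatArrays: flatten the per-chapter topicsName arrays into one array
const flattened = doc.syllabus.reduce(
  (acc, chapter) => acc.concat(chapter.topicsName),
  []
);

// $filter: keep only the element with the wanted topicId
const result = flattened.filter(
  (topic) => topic.topicId === "626b9b94ae6cd2092024f3ef"
);

console.log(result);
```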
Welcome Ganesh Sowdepalli,
You are asking not just to "fetch only the first element from the array", but to fetch only the matching element of a nested array inside an object that is itself an array item.
Edit: (according to @ray's comment)
One way to do it is using an aggregation pipeline:
db.subject_staff_database.aggregate([
{
$match: {"_id": ObjectId("606b7046a0ccf72222c00c2f")}
},
{
$project: {
syllabus: {
$filter: {
input: "$syllabus",
as: "item",
cond: {$eq: ["$$item.chapterId", "626b9b94ae6cd2092024f3f1"]}
}
}
}
},
{
$unwind: "$syllabus"
},
{
$project: {
"syllabus.topicsName": {
$filter: {
input: "$syllabus.topicsName",
as: "item",
cond: {$eq: ["$$item.topicId", "626b9b94ae6cd2092024f3f2"]}
}
},
"syllabus.chapterId": 1,
"syllabus.chapterName": 1,
_id: 0
}
}
])
As you can see in this playground example.
If you want the actual first element rather than a match by id, see my first understanding of your question.
The aggregation pipeline allows us to perform several operations on the results.
Since syllabus is an array that may contain more than one matching chapterId, we need to $filter it for the items we want.

Count and apply condition to slice the mongodb array document

My document structure looks like this:
{
"_id" : ObjectId("5aeeda07f3a664c55e830a08"),
"profileId" : ObjectId("5ad84c8c0e71892058b6a543"),
"list" : [
{
"content" : "answered your post",
"createdBy" : ObjectId("5ad84c8c0e71892058b6a540")
},
{
"content" : "answered your post",
"createdBy" : ObjectId("5ad84c8c0e71892058b6a540")
},
{
"content" : "answered your post",
"createdBy" : ObjectId("5ad84c8c0e71892058b6a540")
},
],
}
I want to count the elements of the list field and apply a condition before slicing:
if the list has 10 or fewer elements, return all of them; otherwise return only 10 elements.
P.S. I used this query, but it returns null.
db.getCollection('post').aggregate([
{
$match:{
profileId:ObjectId("5ada84c8c0e718s9258b6a543")}
},
{$project:{notifs:{$size:"$list"}}},
{$project:{notifications:
{$cond:[
{$gte:["$notifs",10]},
{$slice:["$list",10]},
{$slice:["$list","$notifs"]}
]}
}}
])
Your first $project stage effectively wipes out all result fields but the one(s) that it explicitly projects (only notifs in your case). That's why the second $project stage cannot $slice the list field anymore (it has been removed by the first $project stage).
Also, I think your $cond/$slice combination can be more elegantly expressed using the $min operator. So there's at least the following two fixes for your problem:
Using $addFields:
db.getCollection('post').aggregate([
{ $match: { profileId: ObjectId("5ad84c8c0e71892058b6a543") } },
{ $addFields: { notifs: { $size: "$list" } } },
{ $project: {
notifications: {
$slice: [ "$list", { $min: [ "$notifs", 10 ] } ]
}
}}
])
Using a calculation inside the $project stage. This avoids a stage, so it should be preferable.
db.getCollection('post').aggregate([
{ $match: { profileId: ObjectId("5ad84c8c0e71892058b6a543") } },
{ $project: {
notifications: {
$slice: [ "$list", { $min: [ { $size: "$list" }, 10 ] } ]
}
}}
])
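As a possible further simplification: $slice with a positive count returns at most that many elements, so a list shorter than 10 comes back whole without the $min/$size guard. The sketch below builds such a pipeline as a plain array; firstTenPipeline is a hypothetical helper name, and the profileId is passed in so that the snippet does not depend on the shell's ObjectId.

```javascript
// Hypothetical helper: builds the pipeline for a given profileId value.
function firstTenPipeline(profileId) {
  return [
    { $match: { profileId: profileId } },
    // $slice with a positive n returns up to the first n elements,
    // so shorter lists are returned unchanged
    { $project: { notifications: { $slice: ["$list", 10] } } }
  ];
}

// In the mongo shell:
// db.getCollection('post').aggregate(
//   firstTenPipeline(ObjectId("5ad84c8c0e71892058b6a543"))
// )
```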

MongoDb - Pop array element based on if condition

I am trying to update my mongo database which has following structure.
{
"_id" : ObjectId("5a64d076bfd103df081967ae"),
"values" : [
{
"date" : "2018-01-22",
"Price" : "1289.4075"
},
{
"date" : "2018-01-22",
"Price" : "1289.4075"
},
{
"date" : "2015-05-18",
"Price" : 1289.41
}
],
"Code" : 123456,
"schemeStatus" : "Inactive"
}
I want to compare the date values of the first two array elements, i.e. values[0].date and values[1].date. If they match, I want to delete values[0] so that there is only one entry with that date.
You can use an aggregation pipeline with $out as the last stage to update your collection. Note that $out replaces the entire target collection with the pipeline output.
db.collection.aggregate([
{
$addFields: {
sameDate: {
$let: {
vars: {
fst: { $arrayElemAt: [ "$values", 0 ] },
snd: { $arrayElemAt: [ "$values", 1 ] }
},
in: { $cond: { if: { $eq: [ "$$fst.date", "$$snd.date" ] }, then: 1, else: 0 } }
}
}
}
},
{
$project: {
_id: 1,
values : { $cond: { if: { $eq: [ "$sameDate", 0 ] }, then: "$values", else: { $slice: [ "$values", 1, { $size: "$values" } ] } } },
Code: 1,
schemeStatus: 1
}
},
{ $out: "collection" }
])
Some more important operators used here:
$cond to handle if-else logic
$let to define some helper variables
$arrayElemAt to get first and second element
$slice to pop first element
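For reference, the per-document logic of the pipeline can be written in plain JavaScript as below. This is only a sketch of the rule being applied; dropFirstIfSameDate is a made-up name, and the explicit length check is an added guard for arrays with fewer than two elements, which the pipeline above does not handle separately.

```javascript
// Returns the array without its first element when the first two
// elements share the same date; otherwise returns the array unchanged.
function dropFirstIfSameDate(values) {
  if (values.length >= 2 && values[0].date === values[1].date) {
    return values.slice(1); // same effect as "popping" the first element
  }
  return values;
}

const sampleValues = [
  { date: "2018-01-22", Price: "1289.4075" },
  { date: "2018-01-22", Price: "1289.4075" },
  { date: "2015-05-18", Price: 1289.41 }
];

console.log(dropFirstIfSameDate(sampleValues));
```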

How to write a custom function to split a document into multiple documents of same Id

I am trying to split a document which has the following fields of string type:
{
"_id" : "17121",
"firstName": "Jello",
"lastName" : "New",
"bio" :"He is a nice person."
}
I want to split the above document into three new documents, for example:
{
"_id": "17121-1",
"firstName": "Jello"
}
{
"_id": "17121-2",
"lastName": "New"
}
{
"_id": "17121-3",
"bio": "He is a nice person."
}
Can anyone suggest how to proceed?
db.coll1.find().forEach(function(obj){
// I want to extract every single field. How do I iterate over the fields of this BSON object (obj) to collect each one?
});
Any suggestion for doing this with the aggregation pipeline in MongoDB is also welcome.
You can use the aggregation query below.
It converts each document's fields into an array of key-value documents, then uses $unwind while keeping the index, and finally $replaceRoot to produce the desired output.
$objectToArray to produce an array (keyvalarr) of key-value pairs, one per field (k is the field name, v is the field value).
$match to remove the key-value document for _id.
$arrayToObject to turn each remaining key-value pair back into a named field while adding the new _id key-value pair.
db.coll.aggregate([
{
"$project": {
"keyvalarr": {
"$objectToArray": "$$ROOT"
}
}
},
{
"$unwind": {
"path": "$keyvalarr",
"includeArrayIndex": "index"
}
},
{
"$match": {
"keyvalarr.k": {
"$ne": "_id"
}
}
},
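{
  "$replaceRoot": {
    "newRoot": {
      // A literal array passed to $arrayToObject must be wrapped in an
      // extra pair of brackets, otherwise it is read as two arguments
      "$arrayToObject": [
        [
          {
            "k": "_id",
            "v": {
              "$concat": [
                { "$substr": [ "$_id", 0, -1 ] },
                "-",
                { "$substr": [ "$index", 0, -1 ] }
              ]
            }
          },
          "$keyvalarr"
        ]
      ]
    }
  }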
}
])
Anu, here are two options you can use.
The first option is pretty straightforward, but it requires you to hardcode the _id suffixes yourself.
db.users.aggregate([
{
$project: {
pairs : [
{ firstName: '$firstName', _id : { $concat : [ { $substr : [ '$_id', 0, 50 ] }, '-1' ] } },
{ lastName: '$lastName', _id : { $concat : [ { $substr : [ '$_id', 0, 50 ] }, '-2' ] } },
{ bio: '$bio', _id : { $concat : [ { $substr : [ '$_id', 0, 50 ] }, '-3' ] } }
]
}
},
{
$unwind : '$pairs'
},
{
$replaceRoot: { newRoot: '$pairs' }
}
])
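The second option does a bit more work and is somewhat trickier, but it is probably easier to extend if you ever need to add another field. Note that $indexOfArray is zero-based, so the generated suffixes run from -0 to -2 rather than from -1 to -3.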
db.users.aggregate([
{
$project: {
pairs : [
{ firstName: '$firstName' },
{ lastName: '$lastName' },
{ bio: '$bio' }
]
}
},
{
$addFields: {
pairsReference : '$pairs'
}
},
{
$unwind: '$pairs'
},
{
$addFields: {
'pairs._id' : { $concat: [ { $substr : [ '$_id', 0, 50 ] }, '-', { $substr: [ { $indexOfArray : [ '$pairsReference', '$pairs' ] }, 0, 2 ] } ] }
}
},
{
$replaceRoot: { newRoot: '$pairs' }
}
])
You can redirect the results of both queries into another collection by using the $out stage.
UPD:
The only reason you would get the error is that one of the _ids is not a string.
Replace the first parameter of $concat ($_id) with the following expression:
{ $substr : [ '$_id', 0, 50 ] }
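For the forEach approach from the question, the fields of each document can be iterated with Object.keys. The sketch below is one way to do it in plain JavaScript; splitDocument is a hypothetical helper name, and the target collection in the commented-out shell call is an assumption as well.

```javascript
// Splits one document into one document per non-_id field.
// The new _id is the original _id plus a 1-based position suffix.
function splitDocument(obj) {
  return Object.keys(obj)
    .filter((key) => key !== "_id")
    .map((key, i) => ({
      _id: String(obj._id) + "-" + (i + 1),
      [key]: obj[key]
    }));
}

// In the mongo shell ("target" is a hypothetical collection name):
// db.coll1.find().forEach(function (obj) {
//   db.target.insertMany(splitDocument(obj));
// });
```

Converting _id with String() mirrors the $substr trick above: it keeps the concatenation working when _id is not already a string.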

Using mongodb $lookup on a single collection

I have a collection with documents like this
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"bsk" : {
"bskItems" : [
{
"id" : 4,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "reblochon"
}
},
{
"id" : 5,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "Pinot Noir"
}
},
{
"id" : 13,
"bskItemLineType" : "PromotionItem",
"promotionApplied" : {
"bskIds" : [
4,
5
]
}
},
{
"id" : 8,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "Food"
}
},
{
"id" : 10,
"bskItemLineType" : "SubTotalItem"
},
{
"id" : 12,
"bskItemLineType" : "TenderItem"
},
{
"id" : 14,
"bskItemLineType" : "ChangeDue"
}
]
}
}
I want an output where I can see the "promotionApplied" entries and the descriptions of the items they were applied to. For the document above, the promotion was applied to "bsk.bskItems.id" 4 and 5, so I would like the output to be:
{
"_id": xxxxx,
"promotionAppliedto" : "reblochon"
},
{
"_id": xxxxx,
"promotionAppliedto" : "Pinot Noir"
}
the query below:
db.getCollection('DSTest').aggregate([
{$project:{"bsk.bskItems.product.description":1,"bsk.bskItems.id":1}},
{$unwind: "$bsk.bskItems"},
])
gets me the descriptions
db.getCollection('DSTest').aggregate([
{$project:{"bsk.bskItems.promotionApplied.bskIds":1}},
{$unwind: "$bsk.bskItems"},
{$unwind:"$bsk.bskItems.promotionApplied.bskIds"},
])
gets me the promotions applied. I was hoping to be able to use $lookup to join the two based on _id and bsk.bskItems.promotionApplied.bskIds and _id and bsk.bskItems.id, but I can't figure out how.
I don't know if you solved your problem or if this is relevant anymore but I figured out your question:
db.DSTest.aggregate([
{
$unwind: "$bsk.bskItems"
},
{
$project: {
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] },
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
},
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] },
}
},
{
$unwind: "$bsk.bskItems.promotionApplied.bskIds"
},
{
$project: {
baItId: 1,
proAppliedId:
{
$cond: { if: { $eq: [ "$bsk.bskItems.promotionApplied.bskIds", 0 ] }, then: "$baItId", else: "$bsk.bskItems.promotionApplied.bskIds" }
},
product: 1
}
},
{
$group: {
_id: { proAppliedId: "$proAppliedId", docId: "$_id"},
product: { $push: { "p": "$product" } },
groupCount: { $sum: 1 }
}
},
{
$unwind: "$product"
},
{
$match: {
"product.p": {$ne: ""}, "groupCount": { $gt: 1}
}
},
{
$project: {
_id: "$_id.docId",
"promotionAppliedto": "$product.p"
}
}
])
With the dummy document you gave this is the result I get:
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"promotionAppliedto" : "reblochon"
}
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"promotionAppliedto" : "Pinot Noir"
}
But my advice is to put some thought into your database structure next time. You had apples and pears, so we had to make an Asian pear in order to get to this result. Also, from the number of aggregation stages you can see it was not an easy job. It could have been much easier if you had separated the array items that contain the field product from the ones that contain the field promotionApplied.
To break it down and explain what is happening step by step:
{
$unwind: "$bsk.bskItems"
}
By unwinding we are flattening our array. We need this in order to access the fields inside the array and do operations on them. More about $unwind
{
$project: {
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] },
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
},
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] },
}
}
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] }
With this line we just make sure that every document gets a basket item id. In your case they all do; I just added it to be safe. If a document doesn't have a value for that field, we set it to 0 (you can set it to -1 or whatever you want).
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
}
Here we make sure that every document has an array in the field bsk.bskItems.promotionApplied.bskIds. Since not all documents have this field, we add a default of [0] to those that lack it; otherwise we would be comparing oranges with apples.
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] }
As said before, we have to make all our documents look alike, so we also add the product description field to the ones that don't have it, setting it to an empty string.
Now all our documents have the same structure and we can start with the actual sorting out.
{
$unwind: "$bsk.bskItems.promotionApplied.bskIds"
}
Since we want to access the ids inside $bsk.bskItems.promotionApplied.bskIds we have to unwind this array as well.
{
$project: {
baItId: 1,
proAppliedId:
{
$cond: { if: { $eq: [ "$bsk.bskItems.promotionApplied.bskIds", 0 ] }, then: "$baItId", else: "$bsk.bskItems.promotionApplied.bskIds" }
},
product: 1
}
}
baItId: 1 and product: 1 are just passed on. proAppliedId will contain our bsk.bskItems.promotionApplied.bskIds value. If it is 0, it gets the same id as the field $baItId; otherwise it keeps its own id.
{
$group: {
_id: { proAppliedId: "$proAppliedId", docId: "$_id"},
product: { $push: { "p": "$product" } },
groupCount: { $sum: 1 }
}
}
Now we can finally group our documents by the proAppliedId that we created in the previous pipeline stage (together with the original document _id).
We also push the product values into an array, so a group that a promotion applies to now holds two entries:
one with the value we are looking for and one with an empty string, which comes from the earlier stage "product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] }
We also create a new field called groupCount to count the documents that were grouped together. The $unwind on product and the $match that follow in the full pipeline then drop the empty-string entries and every group with a groupCount of 1, i.e. items that no promotion points to.
{ $project: {
_id: "$_id.docId",
"promotionAppliedto": "$product.p" } }
In the final $project we just build the final document the way we want it to look.
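The overall effect of the stages walked through above can be checked against a plain JavaScript version of the same idea: collect the ids referenced by promotions, then emit the descriptions of the items with those ids. This is only a sketch for a single document; promotedDescriptions is a made-up name, and the sample uses a string _id and a trimmed item list for brevity.

```javascript
function promotedDescriptions(doc) {
  const items = doc.bsk.bskItems;
  // Collect every id referenced by a promotionApplied.bskIds array
  const promotedIds = items.reduce(
    (ids, item) =>
      ids.concat(item.promotionApplied ? item.promotionApplied.bskIds : []),
    []
  );
  // Keep the items those ids point at and emit one result per description
  return items
    .filter((item) => item.product && promotedIds.includes(item.id))
    .map((item) => ({
      _id: doc._id,
      promotionAppliedto: item.product.description
    }));
}

const sample = {
  _id: "5773ac6a486f811694711875",
  bsk: {
    bskItems: [
      { id: 4, bskItemLineType: "SaleItem", product: { description: "reblochon" } },
      { id: 5, bskItemLineType: "SaleItem", product: { description: "Pinot Noir" } },
      { id: 13, bskItemLineType: "PromotionItem", promotionApplied: { bskIds: [4, 5] } },
      { id: 8, bskItemLineType: "SaleItem", product: { description: "Food" } },
      { id: 10, bskItemLineType: "SubTotalItem" }
    ]
  }
};

console.log(promotedDescriptions(sample));
```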
I hope you now understand why it matters to think about where and how we store things.
With a document database, it would be better to store the promotion metadata instead of only the id.
Please see the example below:
"promotionApplied" : [{
bskId : 4,
name : "name",
otherData : "otherData"
}, {
bskId : 5,
name : "name5",
otherData : "otherData5"
}
]