How to delete duplicate objects inside an array in multiple documents in MongoDB?

I am trying to delete duplicate objects inside an array across multiple documents in MongoDB.
I have tried many approaches but have not been able to get it working.
Document structure:
{
"_id" : ObjectId("5a544fe234602415114601d3"),
"GstDetails" : [
{
"_id" : ObjectId("5e4837374d62f4c95163908e"),
"StateId" : "1",
"GstIn" : "33ABFFM1655H1ZF",
"StateDesc" : "TAMIL NADU",
"CityDesc" : "CHENNAI"
},
{
"_id" : ObjectId("5e4837484d62f4c9516395e8"),
"StateId" : "1",
"GstIn" : "33ABFFM1655H1ZF",
"StateDesc" : "TAMIL NADU",
"CityDesc" : "CHENNAI"
}
]
}
There are many more documents like this.
I tried:
db.Supplier.find({ "GstDetails": { $size: 2 } }).limit(1).forEach(function (doc) {
var stateId;
doc.GstDetails.forEach(function (data) {
if (data.StateId == stateId) {
pull doc.GstDetails[0];
} else {
stateId = data.StateId
}
print(JSON.stringify(doc));
});
db.Supplier.save(doc)
});

Check if aggregation below meets your requirements:
db.Supplier.aggregate([
{
$unwind: "$GstDetails"
},
{
$group: {
_id: {
_id: "$_id",
StateId: "$GstDetails.StateId"
},
GstDetails: {
$push: "$GstDetails"
}
}
},
{
$addFields: {
GstDetails: {
$slice: [
"$GstDetails",
1
]
}
}
},
{
$unwind: "$GstDetails"
},
{
$group: {
_id: "$_id._id",
GstDetails: {
$push: "$GstDetails"
}
}
}
])
Note: This is a read-only query. If the result is OK, add the operator below as the last stage. Be aware that it replaces the whole collection with the pipeline output, so any fields other than _id and GstDetails are dropped, and no rollback is available:
{$out: "Supplier"}
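The "keep one entry per StateId" logic of the pipeline above can be sketched in plain JavaScript, which may help when checking the expected result before running anything against the database. This is only an in-memory illustration: dedupeByKey is a hypothetical helper (not a MongoDB API), ObjectId values are written as plain strings, and the sample is trimmed to a few fields. Unlike $group, which gives no ordering guarantee, this sketch always keeps the first occurrence.

```javascript
// Hypothetical helper (not part of MongoDB): keep the first array
// element seen for each distinct value of `key`.
function dedupeByKey(items, key) {
  const seen = new Set();
  return items.filter((item) => {
    if (seen.has(item[key])) {
      return false; // a later duplicate: drop it
    }
    seen.add(item[key]);
    return true;
  });
}

// Sample shaped like the question's document (ObjectIds shown as strings).
const supplier = {
  _id: "5a544fe234602415114601d3",
  GstDetails: [
    { _id: "5e4837374d62f4c95163908e", StateId: "1", GstIn: "33ABFFM1655H1ZF" },
    { _id: "5e4837484d62f4c9516395e8", StateId: "1", GstIn: "33ABFFM1655H1ZF" },
  ],
};

const cleaned = dedupeByKey(supplier.GstDetails, "StateId");
console.log(JSON.stringify(cleaned));
```

With the sample above, only the first GstDetails entry should remain.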

Related

How to combine results in a Mongo aggregation query

I'm new to aggregation queries in Mongo and have been really struggling to produce the output I want. I have the following aggregation query:
db.events.aggregate([
{ $match: { requestState: "APPROVED" } },
{ $unwind: { path: "$payload.description" } },
{ $group: { _id: { instr: "$payload.description", bu: "$createdByUser", count: { $sum: 1 } } } }
]);
that returns the following results:
{ "_id" : { "instr" : "ABC-123", "bu" : "BU2", "count" : 1 } }
{ "_id" : { "instr" : "ABC-123", "bu" : "BU1", "count" : 1 } }
{ "_id" : { "instr" : "DEF-456", "bu" : "BU1", "count" : 1 } }
How can I amend the aggregation query so that only 2 documents are returned instead of 3, with the two "ABC-123" results combined into a single result that has a new array of counts holding the "bu" and "count" fields? For example:
{ "_id" : { "instr" : "ABC-123", "counts": [ { "bu" : "BU1", "count" : 1 }, { "bu" : "BU2", "count" : 1 } ] } }
Many thanks
You can add another stage that groups only by _id.instr, followed by a $project stage that reshapes the result into your desired output:
db.events.aggregate([
{
$match: { requestState: "APPROVED" }
},
{
$unwind: { path: "$payload.description" }
},
{
$group: {
// count is an accumulator outside _id; inside _id, { $sum: 1 } always evaluates to 1
_id: { instr: "$payload.description", bu: "$createdByUser" },
count: { $sum: 1 }
}
},
{
$group: {
_id: { instr: "$_id.instr" },
counts: { $push: { bu: "$_id.bu", count: "$count" } }
}
},
{
$project: {
_id: { instr: "$_id.instr", counts: "$counts" }
}
}
]);
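To see what the two grouping steps do, here is a minimal in-memory sketch in plain JavaScript. It assumes the events have already been matched and unwound, so each input object carries only an instr (the payload.description value) and a bu (the createdByUser value); combineCounts is a hypothetical name used for illustration, not a MongoDB function.

```javascript
// Hypothetical in-memory version of the two grouping steps.
function combineCounts(events) {
  // Step 1: count events per (instr, bu) pair.
  const pairCounts = new Map();
  for (const { instr, bu } of events) {
    const key = JSON.stringify([instr, bu]);
    pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
  }
  // Step 2: collect the per-bu counts under each instr.
  const byInstr = new Map();
  for (const [key, count] of pairCounts) {
    const [instr, bu] = JSON.parse(key);
    if (!byInstr.has(instr)) {
      byInstr.set(instr, []);
    }
    byInstr.get(instr).push({ bu, count });
  }
  // Final reshape, mirroring the $project stage.
  return [...byInstr].map(([instr, counts]) => ({ _id: { instr, counts } }));
}

const events = [
  { instr: "ABC-123", bu: "BU2" },
  { instr: "ABC-123", bu: "BU1" },
  { instr: "DEF-456", bu: "BU1" },
];
const combined = combineCounts(events);
console.log(JSON.stringify(combined, null, 2));
```

For the three sample events this yields two results, one per instr, each with its own counts array.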

Fetch distinct values from Mongo DB nested array and output to a single array

Given below is my data in MongoDB. I want to fetch all the unique ids from the articles field, which is nested under the jnlc_subjects array. The result should contain only the articles array with distinct ObjectIds.
Mongo Data
{
"_id" : ObjectId("5c9216f1a21a4a31e0c7fa56"),
"jnlc_journal_category" : "Biology",
"jnlc_subjects" : [
{
"subject" : "Conservation Biology",
"views" : "123",
"articles" : [
ObjectId("5c4e93d0135edb6812200d5f"),
ObjectId("5c4e9365135edb6a12200d60"),
ObjectId("5c4e93a8135edb6912200d61")
]
},
{
"subject" : "Micro Biology",
"views" : "20",
"articles" : [
ObjectId("5c4e9365135edb6a12200d60"),
ObjectId("5c4e93d0135edb6812200d5f"),
ObjectId("5c76323fbaaccf5e0bae7600"),
ObjectId("5ca33ce19d677bf780fc4995")
]
},
{
"subject" : "Marine Biology",
"views" : "8",
"articles" : [
ObjectId("5c4e93d0135edb6812200d5f")
]
}
]
}
Required result
I want to get the output in the following format:
articles : [
ObjectId("5c4e9365135edb6a12200d60"),
ObjectId("5c4e93a8135edb6912200d61"),
ObjectId("5c76323fbaaccf5e0bae7600"),
ObjectId("5ca33ce19d677bf780fc4995"),
ObjectId("5c4e93d0135edb6812200d5f")
]
Try as below:
db.collection.aggregate([
{
$unwind: "$jnlc_subjects"
},
{
$unwind: "$jnlc_subjects.articles"
},
{ $group: {_id: null, uniqueValues: { $addToSet: "$jnlc_subjects.articles"}} }
])
Result:
{
"_id" : null,
"uniqueValues" : [
ObjectId("5ca33ce19d677bf780fc4995"),
ObjectId("5c4e9365135edb6a12200d60"),
ObjectId("5c4e93a8135edb6912200d61"),
ObjectId("5c4e93d0135edb6812200d5f"),
ObjectId("5c76323fbaaccf5e0bae7600")
]
}
Try with this
db.collection.aggregate([
{
$unwind:{
path:"$jnlc_subjects",
preserveNullAndEmptyArrays:true
}
},
{
$unwind:{
path:"$jnlc_subjects.articles",
preserveNullAndEmptyArrays:true
}
},
{
$group:{
_id:"$_id",
articles:{
$addToSet:"$jnlc_subjects.articles"
}
}
}
])
If you don't want to $group by _id, you can use null instead of $_id.
Based on the description in the question above, try executing the following aggregate operation:
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$match: {
"_id": ObjectId("5c9216f1a21a4a31e0c7fa56")
}
},
// Stage 2
{
$unwind: {
path: "$jnlc_subjects",
}
},
// Stage 3
{
$unwind: {
path: "$jnlc_subjects.articles"
}
},
// Stage 4
{
$group: {
_id: null,
articles: {
$addToSet: '$jnlc_subjects.articles'
}
}
},
// Stage 5
{
$project: {
articles: 1,
_id: 0
}
},
]
);
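All three answers rely on the same idea: flatten both array levels, then collect the values into a set. A rough plain-JavaScript equivalent is sketched below. distinctArticles is a hypothetical helper, and the ids are written as strings because a JavaScript Set compares objects by reference; real ObjectId instances would need to be converted to strings first.

```javascript
// Hypothetical helper: flatten jnlc_subjects[].articles[] and keep
// each id once, mirroring two $unwind stages followed by $addToSet.
function distinctArticles(doc) {
  const unique = new Set();
  for (const subject of doc.jnlc_subjects || []) {
    for (const article of subject.articles || []) {
      unique.add(article);
    }
  }
  return { articles: [...unique] };
}

// Trimmed sample based on the question's document (ids as strings).
const journal = {
  jnlc_journal_category: "Biology",
  jnlc_subjects: [
    {
      subject: "Conservation Biology",
      articles: ["5c4e93d0135edb6812200d5f", "5c4e9365135edb6a12200d60", "5c4e93a8135edb6912200d61"],
    },
    {
      subject: "Micro Biology",
      articles: ["5c4e9365135edb6a12200d60", "5c4e93d0135edb6812200d5f", "5c76323fbaaccf5e0bae7600", "5ca33ce19d677bf780fc4995"],
    },
    { subject: "Marine Biology", articles: ["5c4e93d0135edb6812200d5f"] },
  ],
};

const distinct = distinctArticles(journal);
console.log(distinct.articles.length);
```

The sample contains eight article references, of which five are distinct.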

Can't find duplicate values for array part in mongodb

db.school.find({ "merchant" : "cc8c0421-e7fc-464d-9e1d-37e168b216c3" })
This is an example document from the school collection returned by that query:
{
"_id" : ObjectId("57fafasf2323232323232f57682cd42"),
"status" : "wait",
"merchant" : "cc8c0421-e7fc-464d-9e1d-37e168b216c3",
"isValid" : false,
"fields" : { "schoolid" : {
"value" : "2323232",
"detail" : {
"revisedBy" : "teacher",
"revisionDate" : ISODate("2015-06-24T09:22:44.288+0000")
},
"history" : [
]
}}
}
I want to see which documents have a duplicate schoolid, so I do this:
db.school.aggregate([
{$match:{ "merchant" : "cc8c0421-e7fc-464d-9e1d-37e168b216c3"
{ $group: {
_id: { fields.schoolid.value: "$fields.schoolid.value" },
count: { $sum: 1 }
} },
{ $match: {
count: { $gte: 2 }
} },
{ $sort : { count : -1} },
{ $limit : 10 }
]);
But it gives an error: a lot of errors for a lot of lines.
I tried it like this:
_id: { "fields.schoolid.value": "$fields.schoolid.value" },
or
_id: { 'fields.schoolid.value': "$'fields.schoolid.value'" },
but that did not work. How can I use it?
The $match stage in your query is never closed, and the dotted key inside the $group _id is not a valid field name, so the pipeline can't run. Your query should be:
db.school.aggregate([
{ $match: { "merchant" : "cc8c0421-e7fc-464d-9e1d-37e168b216c3"}},
{ $group: {
_id: { value: "$fields.schoolid.value" },
count: { $sum: 1 }
} },
{ $match: {
count: { $gte: 2 }
} },
{ $sort : { count : -1} },
{ $limit : 10 }
]);
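Also note that fields.schoolid.value is not a valid field name inside _id. Unquoted, it is a JavaScript syntax error, and even when enclosed in "", a key containing "." is rejected there, so use a name without dots (such as value above).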
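As a cross-check, the $group, $match, $sort and $limit stages can be mimicked in plain JavaScript. The sketch below assumes the documents have already been filtered by merchant and loaded into an array; findDuplicates and the sample values are hypothetical and only illustrate the counting logic.

```javascript
// Hypothetical helper: count documents per fields.schoolid.value,
// keep values seen at least twice, sort by count descending, cap at `limit`.
function findDuplicates(docs, limit) {
  const counts = new Map();
  for (const doc of docs) {
    const value = doc.fields.schoolid.value;
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= 2)
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([value, count]) => ({ _id: { value }, count }));
}

// Made-up sample data in the shape of the question's document.
const schools = ["2323232", "111", "999", "2323232", "999", "999"].map(
  (value) => ({ fields: { schoolid: { value } } })
);

const duplicates = findDuplicates(schools, 10);
console.log(JSON.stringify(duplicates));
```

Here "999" appears three times and "2323232" twice, so those two values are reported and "111" is not.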

Merge duplicates and remove the oldest

I have a collection where there are some duplicate documents. For example:
First document:
{
"_id" : ObjectId("56f3d7cc1de31cb20c08ae6b"),
"AddedDate" : ISODate("2016-05-01T00:00:00.000Z"),
"Place": "THISPLACE",
"PresentInDB" : [
{
"InDB" : ISODate("2016-05-01T00:00:00.000Z")
}
],
"Checked" : [],
"Link": "http://www.mylink.com/first/84358"
}
Second document:
{
"_id" : ObjectId("577740526c1e542904725238"),
"AddedDate" : ISODate("2016-05-02T00:00:00.000Z"),
"Place": "THISPLACE",
"PresentInDB" : [
{
"InDB" : ISODate("2016-05-02T00:00:00.000Z")
},
{
"InDB" : ISODate("2016-05-03T00:00:00.000Z")
}
],
"Checked" : [
{
"Done" : ISODate("2016-05-02T00:00:00.000Z")
},
],
"Link": "http://www.mylink.com/second/84358"
}
The Link field contains the same sequence of numbers in both documents: 84358.
So I would like to achieve these steps:
1. Loop over each document in the collection.
2. Match the number sequence in the Link field of each document (i.e. 84358 above). If several documents in the collection have that sequence in the Link field, and the Place field also matches in those documents:
   a. Merge the PresentInDB and Checked fields by adding the array values from the newest document (by date in the AddedDate field) to the oldest document.
   b. Remove the newest document.
How could I achieve such a query?
MongoDB release 3.3.6 introduced the $split operator for dealing with strings in the aggregation framework (Jira). Before this release, you could only solve this with a map/reduce solution.
After MongoDB 3.3.6 release: Aggregation framework solution
db.duplicatedCollection.aggregate(
[
{
$project: {
_id : 1,
AddedDate : 1,
Place : 1,
PresentInDB : 1,
Checked : 1,
Link : 1,
sequenceNumber: { $arrayElemAt: [ {$split: ["$Link", "/"]}, -1 ]},
}
},
{
$sort: { AddedDate: 1 }
},
{
$group: {
_id : {
sequenceNumber : "$sequenceNumber",
Place : "$Place"
},
id : { $first: "$_id"},
AddedDate: { $first: "$AddedDate" },
Place : { $first: "$Place" },
PresentInDB: {
$push: '$PresentInDB'
},
Checked: {
$push: '$Checked'
},
Link: { $first: "$Link"}
}
},
{
$unwind: "$PresentInDB"
},
{
$unwind: {
path : "$PresentInDB",
preserveNullAndEmptyArrays: true
}
},
{
$unwind: "$Checked"
},
{
$unwind: {
path : "$Checked",
preserveNullAndEmptyArrays: true
}
},
{
$group: {
_id : "$id",
AddedDate: { $first: "$AddedDate" },
Place : { $first: "$Place" },
PresentInDB : {
$addToSet: '$PresentInDB'
},
Checked : {
$addToSet: '$Checked'
},
Link: { $first: "$Link"}
}
},
{
$out: "duplicatedCollection"
}
]
);
Before MongoDB 3.3.6 release: Map/Reduce solution
Map Function:
var mapFunction = function() {
var linkArray = this.Link.split("/");
var sequenceNumber = linkArray[linkArray.length - 1];
var keyDoc = {
place : this.Place,
sequenceNumber: sequenceNumber,
};
emit(keyDoc, this);
};
Reduce Function:
var reduceFunction = function(key, values) {
var reducedDoc = {};
reducedDoc._id = values[0]._id;
reducedDoc.AddedDate = values[0].AddedDate;
reducedDoc.Link = values[0].Link;
reducedDoc.PresentInDB = [];
reducedDoc.Checked = [];
var presentInDbMillisArray = [];
var checkedMillisArray = [];
values.forEach(function(doc) {
if (reducedDoc.AddedDate < doc.AddedDate) {
reducedDoc._id = doc._id;
reducedDoc.AddedDate = doc.AddedDate;
reducedDoc.Link = doc.Link;
}
// PresentInDB field merge
doc.PresentInDB.forEach(function(presentInDBElem) {
var millis = presentInDBElem.InDB.getTime();
if (!Array.contains(presentInDbMillisArray, millis)) {
reducedDoc.PresentInDB.push(presentInDBElem);
presentInDbMillisArray.push(millis);
}
});
// same here with Checked field
doc.Checked.forEach(function(checkedElem) {
var millis = checkedElem.Done.getTime();
if (!Array.contains(checkedMillisArray, millis)) {
reducedDoc.Checked.push(checkedElem);
checkedMillisArray.push(millis);
}
});
});
return reducedDoc;
};
Map/Reduce:
db.duplicatedCollection.mapReduce(
mapFunction,
reduceFunction,
{
"out": "duplicatedCollection"
}
);
Unwrap the value from the map/reduce returned documents:
db.duplicatedCollection.find(
{
value : {
$exists: true
}
}
).forEach(function(doc) {
db.duplicatedCollection.insert(doc.value);
db.duplicatedCollection.remove({_id : doc._id});
});
You can use a single aggregation query to do that:
db.device.aggregate([{
"$unwind": "$PresentInDB"
}, {
"$match": {
"Link": /84358/
}
}, {
"$sort": {
"AddedDate": 1
}
}, {
"$group": {
_id: 0,
PresentInDB: {
$addToSet: '$PresentInDB'
},
AddedDate: {
$first: "$AddedDate"
},
id: {
$first: "$_id"
},
Link: {
$first: "$Link"
}
}
}, {
$out: "documents"
}])
$unwind your array to work on it
$match your id (here containing 84358)
$sort by ascending date
$group with :
a $addToSet to merge all your PresentInDB into one single array without duplicates
a $first for each field to keep. Keeping the first means you keep the oldest one, since we previously sorted by ascending date
$out will save the results to a new collection called documents here
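The merge requested in the question (group by the trailing number in Link plus Place, keep the oldest document, and fold in the array values of the newer ones) can also be sketched in plain JavaScript. This is an in-memory illustration under stated assumptions: mergeDuplicates and addUnique are hypothetical helpers, ObjectIds are shown as strings, and array entries are treated as duplicates when their dates are equal.

```javascript
// Hypothetical helper: append entries from `source` to `target`
// unless an entry with the same date in `field` is already present.
function addUnique(target, source, field) {
  for (const entry of source) {
    const time = entry[field].getTime();
    if (!target.some((existing) => existing[field].getTime() === time)) {
      target.push(entry);
    }
  }
}

// Hypothetical helper: one merged document per (sequence number, Place).
function mergeDuplicates(docs) {
  const groups = new Map();
  // Sort oldest first so the first document seen in each group is the one kept.
  const sorted = [...docs].sort((a, b) => a.AddedDate - b.AddedDate);
  for (const doc of sorted) {
    const sequence = doc.Link.split("/").pop(); // e.g. "84358"
    const key = JSON.stringify([sequence, doc.Place]);
    if (!groups.has(key)) {
      groups.set(key, { ...doc, PresentInDB: [], Checked: [] });
    }
    const kept = groups.get(key);
    addUnique(kept.PresentInDB, doc.PresentInDB, "InDB");
    addUnique(kept.Checked, doc.Checked, "Done");
  }
  return [...groups.values()];
}

const day = (d) => new Date("2016-05-0" + d + "T00:00:00.000Z");
const merged = mergeDuplicates([
  { _id: "577740526c1e542904725238", AddedDate: day(2), Place: "THISPLACE",
    PresentInDB: [{ InDB: day(2) }, { InDB: day(3) }], Checked: [{ Done: day(2) }],
    Link: "http://www.mylink.com/second/84358" },
  { _id: "56f3d7cc1de31cb20c08ae6b", AddedDate: day(1), Place: "THISPLACE",
    PresentInDB: [{ InDB: day(1) }], Checked: [],
    Link: "http://www.mylink.com/first/84358" },
]);
console.log(JSON.stringify(merged, null, 2));
```

With the two sample documents, the result is a single document that keeps the older _id and Link and carries three PresentInDB entries and one Checked entry.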

Using mongodb $lookup on a single collection

I have a collection with documents like this
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"bsk" : {
"bskItems" : [
{
"id" : 4,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "reblochon"
}
},
{
"id" : 5,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "Pinot Noir"
}
},
{
"id" : 13,
"bskItemLineType" : "PromotionItem",
"promotionApplied" : {
"bskIds" : [
4,
5
]
}
},
{
"id" : 8,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "Food"
}
},
{
"id" : 10,
"bskItemLineType" : "SubTotalItem"
},
{
"id" : 12,
"bskItemLineType" : "TenderItem"
},
{
"id" : 14,
"bskItemLineType" : "ChangeDue"
}
]
}
}
I want an output where I can see the "promotionApplied" entries and the descriptions of the items they applied to. For the document above, the promotion was applied to the items with "bsk.bskItems.id" 4 and 5, so I would like the output to be:
{
"_id": xxxxx,
"promotionAppliedto": "reblochon"
},
{
"_id": xxxxx,
"promotionAppliedto": "Pinot Noir"
}
The query below:
db.getCollection('DSTest').aggregate([
{$project:{"bsk.bskItems.product.description":1,"bsk.bskItems.id":1}},
{$unwind: "$bsk.bskItems"},
])
gets me the descriptions, and this one:
db.getCollection('DSTest').aggregate([
{$project:{"bsk.bskItems.promotionApplied.bskIds":1}},
{$unwind: "$bsk.bskItems"},
{$unwind:"$bsk.bskItems.promotionApplied.bskIds"},
])
gets me the promotions applied. I was hoping to be able to use $lookup to join the two based on _id and bsk.bskItems.promotionApplied.bskIds and _id and bsk.bskItems.id, but I can't figure out how.
I don't know if you solved your problem or if this is relevant anymore but I figured out your question:
db.DSTest.aggregate([
{
$unwind: "$bsk.bskItems"
},
{
$project: {
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] },
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
},
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] },
}
},
{
$unwind: "$bsk.bskItems.promotionApplied.bskIds"
},
{
$project: {
baItId: 1,
proAppliedId:
{
$cond: { if: { $eq: [ "$bsk.bskItems.promotionApplied.bskIds", 0 ] }, then: "$baItId", else: "$bsk.bskItems.promotionApplied.bskIds" }
},
product: 1
}
},
{
$group: {
_id: { proAppliedId: "$proAppliedId", docId: "$_id"},
product: { $push: { "p": "$product" } },
groupCount: { $sum: 1 }
}
},
{
$unwind: "$product"
},
{
$match: {
"product.p": {$ne: ""}, "groupCount": { $gt: 1}
}
},
{
$project: {
_id: "$_id.docId",
"promotionAppliedto": "$product.p"
}
}
])
With the dummy document you gave this is the result I get:
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"promotionAppliedto" : "reblochon"
}
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"promotionAppliedto" : "Pinot Noir"
}
But my advice is to put some thought into your database structure next time. You had apples and pears, so we had to make an Asian pear in order to get to this result. Also, from the number of aggregation stages you can see it was not an easy job. It could have been much easier if you had separated the array elements that contain the field product from the ones that contain the field promotionApplied.
To break it down and explain what is happening step by step:
{
$unwind: "$bsk.bskItems"
}
By unwinding we flatten our array. We need this in order to access the fields inside the array and do operations on them.
{
$project: {
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] },
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
},
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] },
}
}
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] }
With this line we just make sure that every document gets a basket item id. In your case they all do; I just added it to make sure. If some document didn't have a value for that field, we set it to 0 (you can set it to -1 or whatever you want).
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
}
Here we are creating an array for the field "$bsk.bskItems.promotionApplied.bskIds". Since not all documents have this field, we have to add it to all of them; otherwise we are comparing oranges with apples.
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] }
As said before, we have to make all our documents look alike, so we also add $bsk.bskItems.product.description to the ones that don't have this field. For those that don't have the field, we set it to an empty string.
Now all our documents have the same structure and we can start with the actual sorting out.
{
$unwind: "$bsk.bskItems.promotionApplied.bskIds"
}
Since we want to access the ids inside $bsk.bskItems.promotionApplied.bskIds we have to unwind this array as well.
{
$project: {
baItId: 1,
proAppliedId:
{
$cond: { if: { $eq: [ "$bsk.bskItems.promotionApplied.bskIds", 0 ] }, then: "$baItId", else: "$bsk.bskItems.promotionApplied.bskIds" }
},
product: 1
}
}
baItId: 1 and product: 1 are just being passed on. The proAppliedId will contain our bsk.bskItems.promotionApplied.bskIds. If they are 0, then they get the same id as the field $baItId; otherwise they keep their id.
{
$group: {
_id: { proAppliedId: "$proAppliedId", docId: "$_id"},
product: { $push: { "p": "$product" } },
groupCount: { $sum: 1 }
}
}
Now finally we can group our documents by the $proAppliedId that we created in the previous pipeline stage.
We also push the product values into an array, so there will now be arrays that contain two entries:
one with the value that we are looking for and one with an empty string, because we set that in an earlier stage with "product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] }
We also create a new field called groupCount to count the documents that were grouped together.
{ $project: {
_id: "$_id.docId",
"promotionAppliedto": "$product.p" } }
In the final $project we just build the final document the way we want it to look.
Hope you understand now why thinking about where and how we save things matters.
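For comparison, the same matching of promotion ids to product descriptions can be sketched in plain JavaScript on a single document. This is only an in-memory illustration of what the pipeline computes, not a replacement for it; promotedDescriptions is a hypothetical helper and the sample is a trimmed copy of the question's document with the ObjectId written as a string.

```javascript
// Hypothetical helper: for every id listed in a promotion's bskIds,
// look up the description of the basket item with that id.
function promotedDescriptions(doc) {
  const items = doc.bsk.bskItems;
  // Index product descriptions by basket item id.
  const descriptionById = new Map();
  for (const item of items) {
    if (item.product) {
      descriptionById.set(item.id, item.product.description);
    }
  }
  const results = [];
  for (const item of items) {
    const ids = item.promotionApplied ? item.promotionApplied.bskIds : [];
    for (const id of ids) {
      if (descriptionById.has(id)) {
        results.push({ _id: doc._id, promotionAppliedto: descriptionById.get(id) });
      }
    }
  }
  return results;
}

const basket = {
  _id: "5773ac6a486f811694711875",
  bsk: {
    bskItems: [
      { id: 4, bskItemLineType: "SaleItem", product: { description: "reblochon" } },
      { id: 5, bskItemLineType: "SaleItem", product: { description: "Pinot Noir" } },
      { id: 13, bskItemLineType: "PromotionItem", promotionApplied: { bskIds: [4, 5] } },
      { id: 8, bskItemLineType: "SaleItem", product: { description: "Food" } },
      { id: 10, bskItemLineType: "SubTotalItem" },
    ],
  },
};

const promoted = promotedDescriptions(basket);
console.log(JSON.stringify(promoted));
```

For the sample basket this produces one result for "reblochon" and one for "Pinot Noir", while "Food" is left out because no promotion references id 8.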
With a document database, it would be better to store the promotion metadata instead of only the id.
Please see the example below:
"promotionApplied" : [{
bskId : 4,
name : "name",
otherData : "otherData"
}, {
bskId : 5,
name : "name5",
otherData : "otherData5"
}
]