MongoDB - Unwind array using aggregation and remove duplicates


I am unwinding an array with the MongoDB aggregation framework. The array contains duplicates, and I need to ignore them in a later $group stage.
How can I achieve that?

You can use $addToSet to do this:
db.users.aggregate([
  { $unwind: '$data' },
  { $group: { _id: '$_id', data: { $addToSet: '$data' } } }
]);
It's hard to give a more specific answer without seeing your actual query.

You have to use $addToSet, but first you have to group by _id; otherwise you'll get one element per item in the list, duplicates included.
Imagine a collection posts with documents like this:
{
  body: "Lorem Ipsum...",
  tags: ["stuff", "lorem", "lorem"],
  author: "Enrique Coslado"
}
Imagine you want to find the most common tag per author, counting each tag at most once per post. You'd write an aggregate query like this:
db.posts.aggregate([
  {$project: {
    author: "$author",
    tags: "$tags",
    post_id: "$_id"
  }},
  {$unwind: "$tags"},
  {$group: {
    _id: "$post_id",
    author: {$first: "$author"},
    tags: {$addToSet: "$tags"}
  }},
  {$unwind: "$tags"},
  {$group: {
    _id: {
      author: "$author",
      tags: "$tags"
    },
    count: {$sum: 1}
  }}
])
That way you'll get documents like this:
{
  _id: {
    author: "Enrique Coslado",
    tags: "lorem"
  },
  count: 1
}
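To see why the per-post deduplication step matters, here is a minimal plain-JavaScript sketch (no MongoDB required) that mimics the counting on the sample post. The helper name countTags and the in-memory array are illustrative assumptions, not part of any MongoDB API.

```javascript
// In-memory stand-in for the `posts` collection from the example above.
const posts = [
  { _id: 1, body: "Lorem Ipsum...", tags: ["stuff", "lorem", "lorem"], author: "Enrique Coslado" },
];

// Hypothetical helper: counts (author, tag) pairs.
// dedupePerPost = true  mimics $unwind -> $group/$addToSet -> $unwind -> $group.
// dedupePerPost = false mimics a single $unwind followed directly by the counting $group.
function countTags(docs, dedupePerPost) {
  const counts = new Map();
  for (const doc of docs) {
    const tags = dedupePerPost ? [...new Set(doc.tags)] : doc.tags;
    for (const tag of tags) {
      const key = `${doc.author}|${tag}`;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

const naive = countTags(posts, false);
const deduped = countTags(posts, true);
console.log(naive.get("Enrique Coslado|lorem"));   // 2: the duplicate tag is counted twice
console.log(deduped.get("Enrique Coslado|lorem")); // 1: matches the aggregation output above
```

Without the intermediate grouping, the repeated "lorem" in a single post would inflate that author's count.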

The previous answers are correct, but the $unwind -> $group -> $unwind sequence can be simplified.
You can use $addFields with $reduce to add an array that already contains only unique entries, and then $unwind only once.
Example document:
{
  body: "Lorem Ipsum...",
  tags: [{title: 'test1'}, {title: 'test2'}, {title: 'test1'}],
  author: "First Last name"
}
Query:
db.posts.aggregate([
  {$addFields: {
    "uniqueTag": {
      $reduce: {
        input: "$tags",
        initialValue: [],
        in: {$setUnion: ["$$value", ["$$this.title"]]}
      }
    }
  }},
  {$unwind: "$uniqueTag"},
  {$group: {
    _id: {
      author: "$author",
      tags: "$uniqueTag"
    },
    count: {$sum: 1}
  }}
])
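As a rough illustration of what the $reduce with $setUnion step computes, here is a plain-JavaScript analogue run against the example document. It is only a sketch of the folding logic: the variable names are illustrative, and MongoDB itself does not promise any particular element order for set-operator results.

```javascript
// The example document from above, held in memory.
const post = {
  body: "Lorem Ipsum...",
  tags: [{ title: "test1" }, { title: "test2" }, { title: "test1" }],
  author: "First Last name",
};

// Fold over the tags: `value` plays the role of $$value (the accumulated array),
// `current` plays the role of $$this (the element being visited).
const uniqueTag = post.tags.reduce(
  (value, current) => (value.includes(current.title) ? value : [...value, current.title]),
  [] // corresponds to initialValue: []
);

console.log(uniqueTag); // [ 'test1', 'test2' ]
```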

Related

MongoDB - duplicate documents removal

Context: I have a MongoDB database with some duplicated documents.
Problem: I want to remove all duplicated documents. (For each duplicated document, I only want to save one, which can be arbitrarily chosen.)
Minimal illustrative example:
The documents all have the following fields (there are also other fields, but those are of no relevance here):
{
"_id": {"$oid":"..."},
"name": "string",
"user": {"$oid":"..."},
}
Duplicated documents: A document is considered duplicated if there are two or more documents with the same "name" and "user" (i.e. the document id is of no relevance here).
How can I remove the duplicated documents?

EDIT:
Since MongoDB version 4.2, one option is to use $group and $merge in order to move all unique documents to a new collection:
db.collection.aggregate([
  {
    $group: {
      _id: {name: "$name", user: "$user"},
      doc: {$first: "$$ROOT"}
    }
  },
  {$replaceRoot: {newRoot: "$doc"}},
  {$merge: {into: "newCollection"}}
])
For older versions, you can do the same using $out.
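Another option is to get a list of the _id values of all documents to remove, and then remove them with a second query:
removeList = db.collection.aggregate([
  // Keep the first document of each (name, user) group and collect every _id in the group
  {
    $group: {
      _id: {name: "$name", user: "$user"},
      doc: {$first: "$$ROOT"},
      remove: {$push: "$_id"}
    }
  },
  // Drop the kept document's _id from the removal candidates
  {
    $set: {
      remove: {
        $filter: {
          input: "$remove",
          cond: {$ne: ["$$this", "$doc._id"]}
        }
      }
    }
  },
  // Merge the per-group arrays into one flat array
  {$group: {_id: 0, remove: {$push: "$remove"}}},
  {
    $set: {
      _id: "$$REMOVE",
      remove: {
        $reduce: {
          input: "$remove",
          initialValue: [],
          in: {$concatArrays: ["$$value", "$$this"]}
        }
      }
    }
  }
]).toArray()[0].remove // the pipeline yields one document; this assumes the collection is not empty
db.collection.deleteMany({_id: {$in: removeList}})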
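The selection logic of that pipeline (keep the first document per name and user pair, mark the rest for removal) can be sketched in plain JavaScript. The sample documents and the helper name idsToRemove below are made up for illustration; real _id and user values would be ObjectIds rather than strings.

```javascript
// Hypothetical sample data: a1, a2 and a4 share the same (name, user) pair.
const docs = [
  { _id: "a1", name: "x", user: "u1" },
  { _id: "a2", name: "x", user: "u1" },
  { _id: "a3", name: "y", user: "u1" },
  { _id: "a4", name: "x", user: "u1" },
];

// Returns the _id of every document whose (name, user) pair was already seen,
// so the first document of each group is the one that survives.
function idsToRemove(documents) {
  const seen = new Set();
  const remove = [];
  for (const d of documents) {
    const key = JSON.stringify([d.name, String(d.user)]);
    if (seen.has(key)) {
      remove.push(d._id);
    } else {
      seen.add(key);
    }
  }
  return remove;
}

console.log(idsToRemove(docs)); // [ 'a2', 'a4' ]
```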

mongodb - How can I group documents with duplicates and sort their numbers to an array without duplicates?

Here is my sample data:
[
{ name: "bob", number: 20 },
{ name: "bob", number: 10 },
{ name: "kol", number: 20 },
{ name: "bob", number: 20 },
{ name: "kol", number: 10 },
{ name: "kol", number: 10 }
]
I want to get one document per name with a sorted array of unique numbers.
Expected output:
[
{
name:"bob",
number: [10,20] // with sort, not 20,10
},
{
name:"kol",
number: [10,20]
}
]
I tried something like this, but it does not work as I want: the array of numbers is not sorted.
user.aggregate([
  {
    $group: {
      _id: '$name',
      name: { $first: '$name' },
      number: { $addToSet: '$number' }
    }
  },
  { $sort: { name: 1 } }
])

You just need to $sort before you $group by the name, and use $push instead of $addToSet so that all numbers are kept in sorted order. To reduce the pushed numbers to a set, use $setIntersection, which removes duplicates:
db.collection.aggregate([
  {$sort: {number: 1}},
  {
    $group: {
      _id: "$name",
      number: {$push: "$number"}
    }
  },
  {
    $project: {
      name: "$_id",
      number: {$setIntersection: ["$number"]},
      _id: 0
    }
  },
  {$sort: {name: 1}}
])
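The MongoDB documentation for $setIntersection describes the order of elements in its output as unspecified, so the sorted result may not be guaranteed. One alternative sketch, assuming MongoDB 5.2 or newer (where $sortArray is available), sorts each array explicitly after deduplicating it. The variable name sortedUniquePipeline is only illustrative.

```javascript
// Assumption: MongoDB 5.2+ for $sortArray; the collection is the one from the question.
const sortedUniquePipeline = [
  // Collect the distinct numbers per name (order not guaranteed at this point).
  { $group: { _id: "$name", number: { $addToSet: "$number" } } },
  // Rename _id to name and sort each array of scalars in ascending order.
  {
    $project: {
      _id: 0,
      name: "$_id",
      number: { $sortArray: { input: "$number", sortBy: 1 } },
    },
  },
  // Order the resulting documents by name.
  { $sort: { name: 1 } },
];

// In mongosh this would be run as: db.collection.aggregate(sortedUniquePipeline)
```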

$lookup from nested array without overwriting array

My collection looks like:
collectionName: {
  _id: ObjectId("..."),
  thing1: 'whatever',
  thing2: 100,
  arrayOfThings: [{
    item1: 'something',
    item2: 200,
    other_id: ObjectId("...")
  }]
}
Essentially I want to be able to find this entry by its _id, then for each of the items in the arrayOfThings I want to add an "other" field which is the entry in my other collection with the _id in "other_id".
Resulting in:
collectionName: {
  _id: ObjectId("..."),
  thing1: 'whatever',
  thing2: 100,
  arrayOfThings: [{
    item1: 'something',
    item2: 200,
    other_id: ObjectId("..."),
    other: {
      otherField1: 'random data',
      otherField2: 3000
    }
  }]
}
Everything I've tried either overwrites the entire arrayOfThings array with the array that is returned from the other collection or returns several objects, each with only one entry in the arrayOfThings array by doing something like this:
aggregate([
{ $match: { _id: req.params._id }},
{ $unwind: "$arrayOfThings" },
{ $lookup: { from: "otherCollection", localField: "arrayOfThings.other_id", foreignField: "_id", as: "arrayOfThings.other" }},
]);
Any help is appreciated, thanks.

The $match, $unwind and $lookup stages remain the same.
Add $addFields to take the first element of the lookup result.
Then $group by _id to reconstruct arrayOfThings, and recover the other fields using $first.
db.col1.aggregate([
  // $match, $unwind and $lookup stages omitted (unchanged from the question)
  {
    $addFields: {
      "arrayOfThings.other": { $arrayElemAt: ["$arrayOfThings.other", 0] }
    }
  },
  {
    $group: {
      _id: "$_id",
      arrayOfThings: { $push: "$arrayOfThings" },
      thing1: { $first: "$thing1" },
      thing2: { $first: "$thing2" }
    }
  }
])
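Putting the stages from the question and this answer together, the complete pipeline might look like the sketch below. The helper name buildLookupPipeline is hypothetical, and the id passed in is assumed to already have the same type as the stored _id.

```javascript
// Hypothetical helper: returns the full pipeline for one parent document.
// Assumption: `id` already has the same type as the stored _id values.
function buildLookupPipeline(id) {
  return [
    { $match: { _id: id } },
    // One output document per element of arrayOfThings.
    { $unwind: "$arrayOfThings" },
    // $lookup always produces an array, here stored inside the unwound element.
    {
      $lookup: {
        from: "otherCollection",
        localField: "arrayOfThings.other_id",
        foreignField: "_id",
        as: "arrayOfThings.other",
      },
    },
    // Replace that array with its first (and expected only) match.
    { $addFields: { "arrayOfThings.other": { $arrayElemAt: ["$arrayOfThings.other", 0] } } },
    // Reassemble the original document shape.
    {
      $group: {
        _id: "$_id",
        arrayOfThings: { $push: "$arrayOfThings" },
        thing1: { $first: "$thing1" },
        thing2: { $first: "$thing2" },
      },
    },
  ];
}

// In mongosh this would be run as: db.col1.aggregate(buildLookupPipeline(someId))
```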

Pushing several elements into aggregated list

How can I correctly push several fields into my aggregated list?
db.getCollection('rty').aggregate(
{ $match: {'id': 110451}},
{ $unwind: '$matches'},
{ $match: {'matches.majority.uuid': {'$exists': true}}},
{ $group: {_id: '$id', list: {$push: {'$matches.majority.uuid' , 'matches.majority.confidence'}}}})
When I push only uuid it works, but how can I push two fields here?

Refer to the documentation for $push in aggregation. Pass $push a document whose values are field paths, each prefixed with $:
db.getCollection('rty').aggregate([
  { $match: {'id': 110451}},
  { $unwind: '$matches'},
  { $match: {'matches.majority.uuid': {'$exists': true}}},
  { $group: {
      _id: '$id',
      list: {
        $push: {
          uid: '$matches.majority.uuid',
          conf: '$matches.majority.confidence'
        }
      }
  }}
]);

Get original document field as part of aggregate result

I want to get all of the document fields in my aggregate results, but as soon as I use $group they are gone. Using $project lets me re-add whatever fields I defined in $group, but I have had no luck getting the other fields:
var doc = {
  _id: '123',
  name: 'Bob',
  comments: [],
  attendances: [{
    answer: 'yes'
  }, {
    answer: 'no'
  }]
}
aggregate({
  $unwind: '$attendances'
}, {
  $match: {
    "attendances.answer": { $ne: "no" }
  }
}, {
  $group: {
    _id: '$_id',
    attendances: { $sum: 1 },
    comments: { $sum: { $size: { $ifNull: [ "$comments", [] ] }}}
  }
}, {
  $project: {
    comments: 1
  }
})
This results in:
[{
_id: 5317b771b6504bd4a32395be,
comments: 12
},{
_id: 53349213cb41af00009a94d0,
comments: 0
}]
How do I get 'name' in there? I have tried adding to $group as:
name: '$name'
as well as in $project:
name: 1
But neither works.

You can't project fields that are removed during the $group operation.
Since you are grouping by the original document _id and there will only be one name value, you can preserve the name field using $first:
db.sample.aggregate(
  { $group: {
    _id: '$_id',
    comments: { $sum: { $size: { $ifNull: [ "$comments", [] ] }}},
    name: { $first: "$name" }
  }}
)
Example output would be:
{ "_id" : "123", "comments" : 0, "name" : "Bob" }
If you are grouping by criteria where there could be multiple values to preserve, you should either $push to an array in the $group or use $addToSet if you only want unique names.
Projecting all the fields
If you are using MongoDB 2.6 or later and want to get all of the original document fields (not just name) without listing them individually, you can use the aggregation variable $$ROOT in place of a specific field name.
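A minimal sketch of that idea, reusing the grouping from the earlier example: the whole original document is kept under a field named doc, which is an arbitrary name chosen here for illustration.

```javascript
// `doc` is a hypothetical field name; any name not otherwise used in the $group works.
const rootPipeline = [
  {
    $group: {
      _id: "$_id",
      comments: { $sum: { $size: { $ifNull: ["$comments", []] } } },
      // $$ROOT refers to the full document entering this stage.
      doc: { $first: "$$ROOT" },
    },
  },
];

// In the shell this would be run as: db.sample.aggregate(rootPipeline)
```

Each result then carries the computed comments count next to the full original document under doc, so fields such as name remain reachable as doc.name.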