MongoDB: Find duplicate docs where a field has lowest values

I have a collection with duplicate names, like this:
{name: "a", otherField: 1, _id: "id1"},
{name: "a", otherField: 2, _id: "id2"},
{name: "a", otherField: 3, _id: "id3"},
{name: "b", otherField: 1, _id: "id4"},
{name: "b", otherField: 2, _id: "id5"}
For each name, I want the _id of every document except the one with the highest otherField. The result should look like:
{"name": "a", _id: "id1"},
{"name": "a", _id: "id2"},
{"name": "b", _id: "id4"}
The documents with the highest otherField for "a" and "b" are "id3" and "id5", so I want every _id other than those.
How can I achieve this with a MongoDB query?
Thank you

You can try the query below:
db.collection.aggregate([
  /** Group all docs by name, push each doc into the data array, and find the max value of otherField */
  {
    $group: {
      _id: "$name",
      data: { $push: "$$ROOT" },
      maxOtherField: { $max: "$otherField" }
    }
  },
  /** Rebuild the data array without the doc that has the max otherField value */
  {
    $addFields: {
      data: {
        $filter: {
          input: "$data",
          cond: { $ne: ["$$this.otherField", "$maxOtherField"] }
        }
      }
    }
  },
  /** Unwind the data array */
  { $unwind: "$data" },
  /** Make each element of data the new root document */
  { $replaceRoot: { newRoot: "$data" } }
])
Note: Sorting the docs on otherField is a tempting alternative, but it is not preferable on huge datasets.
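To check what the pipeline should return without a database, the same group, filter and flatten steps can be mimicked in plain JavaScript. This is only an in-memory sketch: allButMaxPerName is a hypothetical helper name, not a MongoDB API, and the sample array is copied from the question.

```javascript
// Sample documents copied from the question.
const docs = [
  { name: "a", otherField: 1, _id: "id1" },
  { name: "a", otherField: 2, _id: "id2" },
  { name: "a", otherField: 3, _id: "id3" },
  { name: "b", otherField: 1, _id: "id4" },
  { name: "b", otherField: 2, _id: "id5" },
];

// Hypothetical helper, not a MongoDB API: mirrors $group + $filter + $unwind.
function allButMaxPerName(input) {
  // Like $group with $max: remember the largest otherField per name.
  const maxByName = new Map();
  for (const doc of input) {
    const current = maxByName.get(doc.name);
    if (current === undefined || doc.otherField > current) {
      maxByName.set(doc.name, doc.otherField);
    }
  }
  // Like $filter with $ne: drop every doc that holds its group's maximum.
  return input.filter((doc) => doc.otherField !== maxByName.get(doc.name));
}

const remainingIds = allButMaxPerName(docs).map((doc) => doc._id);
console.log(remainingIds); // [ 'id1', 'id2', 'id4' ]
```

Two consequences of comparing against the maximum are worth keeping in mind: a name that occurs only once produces no output, and if several docs tie for the maximum, all of them are dropped.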

Related

Is there a way to project max value in a range then finding documents within a new range starting at this max value in just one aggregate?

Given the following data in a Mongo collection:
{
  _id: "1",
  dateA: ISODate("2021-12-31T00:00:00.000Z"),
  dateB: ISODate("2022-01-11T00:00:00.000Z")
},
{
  _id: "2",
  dateA: ISODate("2022-01-02T00:00:00.000Z"),
  dateB: ISODate("2022-01-08T00:00:00.000Z")
},
{
  _id: "3",
  dateA: ISODate("2022-01-03T00:00:00.000Z"),
  dateB: ISODate("2022-01-05T00:00:00.000Z")
},
{
  _id: "4",
  dateA: ISODate("2022-01-09T00:00:00.000Z"),
  dateB: null
},
{
  _id: "5",
  dateA: ISODate("2022-01-11T00:00:00.000Z"),
  dateB: ISODate("2022-01-11T00:00:00.000Z")
},
{
  _id: "6",
  dateA: ISODate("2022-01-12T00:00:00.000Z"),
  dateB: null
}
And given the range below:
ISODate("2022-01-01T00:00:00.000Z") .. ISODate("2022-01-10T00:00:00.000Z")
I want to find all documents with dateA within the given range, then narrow the range so that it starts at the max dateB value of those documents, and finally fetch the documents in the narrowed range that have no dateB.
In summary:
I'll start with the range
ISODate("2022-01-01T00:00:00.000Z") .. ISODate("2022-01-10T00:00:00.000Z")
Then change it to the range
ISODate("2022-01-08T00:00:00.000Z") .. ISODate("2022-01-10T00:00:00.000Z")
Then filter with
dateB: null
Finally, the result would be the document with
_id: "4"
Is there a way to find the document with _id: "4" in just one aggregate?
I know how to do it programmatically using 2 queries, but the main goal is to have just one request to the database.

You can use $max to find the maxDateB first. Then perform a self $lookup to apply the $match and find doc _id: "4".
db.collection.aggregate([
  {
    $match: {
      dateA: {
        $gte: ISODate("2022-01-01"),
        $lt: ISODate("2022-01-10")
      }
    }
  },
  {
    $group: {
      _id: null,
      maxDateB: { $max: "$dateB" }
    }
  },
  {
    $lookup: {
      from: "collection",
      let: {
        start: "$maxDateB",
        end: ISODate("2022-01-10")
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $gte: ["$dateA", "$$start"] },
                { $lt: ["$dateA", "$$end"] },
                { $eq: ["$dateB", null] }
              ]
            }
          }
        }
      ],
      as: "result"
    }
  },
  { $unwind: "$result" },
  { $replaceRoot: { newRoot: "$result" } }
])

Assuming the initially matched dateA range is not huge, here is an alternative approach that uses $push and $filter and avoids the cost of a $lookup stage:
db.foo.aggregate([
  {$match: {dateA: {$gte: new ISODate("2022-01-01"), $lt: new ISODate("2022-01-10")} }},
  // Kill two birds with one stone here: get the max dateB AND prep
  // an array to filter later. The items array will be as large
  // as the match above, but the output of this stage is a single doc:
  {$group: {_id: null,
    maxDateB: {$max: "$dateB"},
    items: {$push: "$$ROOT"}
  }},
  {$project: {X: {$filter: {
    input: "$items",
    cond: {$and: [
      // Each element of 'items' is passed as $$this, so use
      // dot notation to get at individual fields. Note that
      // all peer fields of 'items', like 'maxDateB', are
      // in scope here and addressable using '$':
      {$gt: ["$$this.dateA", "$maxDateB"]},
      {$eq: ["$$this.dateB", null]}
    ]}
  }}
  }}
]);
This yields a single-document result (I added an extra doc with _id 41 to test the null equality against more than one doc):
{
  "_id" : null,
  "X" : [
    {
      "_id" : "4",
      "dateA" : ISODate("2022-01-09T00:00:00Z"),
      "dateB" : null
    },
    {
      "_id" : "41",
      "dateA" : ISODate("2022-01-09T00:00:00Z"),
      "dateB" : null
    }
  ]
}
It is possible to $unwind and $replaceRoot after this, but there is little need to do so.
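The match, max and filter steps of this approach can be traced by hand with a small in-memory sketch in plain JavaScript. It assumes the six sample documents from the question, written with full ISO timestamps; openDocsAfterMaxDateB and the d shorthand are hypothetical names used only for illustration.

```javascript
// Shorthand for building dates; null stays null.
const d = (iso) => (iso === null ? null : new Date(iso));

// Sample documents from the question.
const events = [
  { _id: "1", dateA: d("2021-12-31T00:00:00Z"), dateB: d("2022-01-11T00:00:00Z") },
  { _id: "2", dateA: d("2022-01-02T00:00:00Z"), dateB: d("2022-01-08T00:00:00Z") },
  { _id: "3", dateA: d("2022-01-03T00:00:00Z"), dateB: d("2022-01-05T00:00:00Z") },
  { _id: "4", dateA: d("2022-01-09T00:00:00Z"), dateB: null },
  { _id: "5", dateA: d("2022-01-11T00:00:00Z"), dateB: d("2022-01-11T00:00:00Z") },
  { _id: "6", dateA: d("2022-01-12T00:00:00Z"), dateB: null },
];

// Hypothetical helper, not a MongoDB API.
function openDocsAfterMaxDateB(input, start, end) {
  // Like $match: keep docs with dateA in [start, end).
  const items = input.filter((doc) => doc.dateA >= start && doc.dateA < end);
  // Like $group with $max: null dateB values are ignored.
  const times = items
    .filter((doc) => doc.dateB !== null)
    .map((doc) => doc.dateB.getTime());
  // With no dateB at all this is -Infinity, so every dateA passes the test below.
  const maxDateB = Math.max(...times);
  // Like $filter: dateA after the max dateB, and dateB is null.
  return items.filter(
    (doc) => doc.dateA.getTime() > maxDateB && doc.dateB === null
  );
}

const openIds = openDocsAfterMaxDateB(
  events,
  d("2022-01-01T00:00:00Z"),
  d("2022-01-10T00:00:00Z")
).map((doc) => doc._id);
console.log(openIds); // [ '4' ]
```

Documents 2, 3 and 4 pass the initial range, the largest dateB among them is 2022-01-08, and only document 4 has a later dateA together with a null dateB.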

Add a total to aggregated sub-totals in MongoDB

Let's say I have documents in my MongoDB collection that look like this:
{ name: "X", ...}
{ name: "Y", ...}
{ name: "X", ...}
{ name: "X", ...}
I can create a pipeline view using aggregation that shows sub-totals, i.e.
$group: {
  _id: '$name',
  count: {
    $sum: 1
  }
}
which results in:
{ _id: "X", count: 3 },
{ _id: "Y", count: 1 }
but how do I add a total to this view, i.e.
{ _id: "X", count: 3 },
{ _id: "Y", count: 1 },
{ _id: "ALL", count: 4 }

Query1:
group to count
union with the same collection, using a pipeline that adds the total count as one extra document
coll1.aggregate([
  {"$group": {"_id": "$name", "count": {"$sum": 1}}},
  {"$unionWith": {
    "coll": "coll1",
    "pipeline": [{"$group": {"_id": "ALL", "count": {"$sum": 1}}}]
  }}
])
Query2:
without $unionWith, for MongoDB < 4.4
group and count
group by null to collect the documents and the total count
append the extra document with the total count to the docs array
unwind and replace root to restore the structure
coll1.aggregate([
  {"$group": {"_id": "$name", "count": {"$sum": 1}}},
  {"$group": {
    "_id": null,
    "docs": {"$push": "$$ROOT"},
    "total": {"$sum": "$count"}
  }},
  {"$project": {
    "docs": {"$concatArrays": ["$docs", [{"_id": "ALL", "count": "$total"}]]}
  }},
  {"$unwind": "$docs"},
  {"$replaceRoot": {"newRoot": "$docs"}}
])
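The shape of Query2 (count per name, sum the counts, append one extra row) can be sketched in plain JavaScript as follows. This is an in-memory illustration only; countsWithTotal is a hypothetical helper name, and the four sample documents are the ones from the question with their other fields left out.

```javascript
// Sample documents from the question, reduced to the name field.
const people = [{ name: "X" }, { name: "Y" }, { name: "X" }, { name: "X" }];

// Hypothetical helper, not a MongoDB API.
function countsWithTotal(input) {
  // Like the first $group: one counter per name.
  const counts = new Map();
  for (const doc of input) {
    counts.set(doc.name, (counts.get(doc.name) ?? 0) + 1);
  }
  const rows = [...counts].map(([name, count]) => ({ _id: name, count }));
  // Like the second $group: add up the sub-totals.
  const total = rows.reduce((sum, row) => sum + row.count, 0);
  // Like $concatArrays: append the extra "ALL" row.
  return [...rows, { _id: "ALL", count: total }];
}

const report = countsWithTotal(people);
console.log(report);
```

The sketch keeps the rows in first-seen order because a Map preserves insertion order. MongoDB does not promise any particular order for $group output, so a $sort stage would be needed where the order matters.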

Try this one (the $count accumulator requires MongoDB 5.0 or later):
db.collection.aggregate([
  {
    $group: {
      _id: "$name",
      count: { $count: {} }
    }
  },
  {
    $unionWith: {
      coll: "collection",
      pipeline: [
        {
          $group: {
            _id: "ALL",
            count: { $count: {} }
          }
        }
      ]
    }
  }
])

How to make a lookup of the "Author" field of various arrays in mongodb?

I want to do a lookup on the "author" field of every object in the "reviews", "followers" and "watching" arrays. The problem is that the other arrays come back with a value repeated as many times as there are documents in the "reviews" array, and I don't know why.
.unwind({ path: '$reviews', preserveNullAndEmptyArrays: true })
.lookup({
  from: 'users',
  let: { userId: '$reviews.author' },
  pipeline: [
    { $match: { $expr: { $eq: ['$_id', '$$userId'] } } },
    {
      $project: {
        name: 1,
        username: 1,
        photo: 1,
        rank: 1,
        'premium.status': 1,
        online: 1,
      },
    },
  ],
  as: 'reviews.author',
})
.unwind({ path: '$followers', preserveNullAndEmptyArrays: true })
.lookup({
  from: 'users',
  let: { userId: '$followers.author' },
  pipeline: [
    { $match: { $expr: { $eq: ['$_id', '$$userId'] } } },
    {
      $project: {
        name: 1,
        username: 1,
        photo: 1,
        rank: 1,
        'premium.status': 1,
        online: 1,
      },
    },
  ],
  as: 'followers.author',
})
.unwind({ path: '$watching', preserveNullAndEmptyArrays: true })
.lookup({
  from: 'users',
  let: { userId: '$watching.author' },
  pipeline: [
    { $match: { $expr: { $eq: ['$_id', '$$userId'] } } },
    {
      $project: {
        name: 1,
        username: 1,
        photo: 1,
        rank: 1,
        'premium.status': 1,
        online: 1,
      },
    },
  ],
  as: 'watching.author',
})
.group({
  _id: '$_id',
  data: {
    $first: '$$ROOT',
  },
  reviews: {
    $push: '$reviews',
  },
  followers: {
    $push: '$followers',
  },
  watching: {
    $push: '$watching',
  },
})
This is the result when "Reviews" has documents:
The "followers" and "watching" arrays are empty in the database, but they show up like this, with the value repeated once per document in "reviews". I don't know what is happening.
And when all arrays are empty, this happens:
It still shows that, and I don't know how to fix it.
In summary, "reviews", "watching", and "followers" each have an "author" field, and I want to do a lookup on the author field of all three, but I run into these problems. Any help is appreciated.
Example: This is how it looks in the database:
Thank you very much in advance.

The $unwind stage creates a new document for every element in the array you are unwinding. Each new document will contain a copy of every other field in the document.
If the original document looks like
{
  _id: "unique",
  Array1: ["A", "B", "C"],
  Array2: ["D", "E", "F"]
}
After unwinding "Array1", there will be 3 documents in the pipeline:
[
  {
    _id: "unique",
    Array1: "A",
    Array2: ["D", "E", "F"]
  },
  {
    _id: "unique",
    Array1: "B",
    Array2: ["D", "E", "F"]
  },
  {
    _id: "unique",
    Array1: "C",
    Array2: ["D", "E", "F"]
  }
]
Unwinding a second array (Array2 here) then expands the array in each of those documents, so you end up with the Cartesian product of the arrays.
In your case, the original document has 2 reviews but no followers and no watching entries, so the pipeline starts with one document, similar to:
[
  {
    _id: "ObjectId",
    data: "other data",
    reviews: [{content: "review1", author: "ObjectId"},
              {content: "review2", author: "ObjectId"}]
  }
]
After the first unwind, you have 2 documents:
[
  {
    _id: "ObjectId",
    data: "other data",
    reviews: {content: "review1", author: "ObjectId"}
  },
  {
    _id: "ObjectId",
    data: "other data",
    reviews: {content: "review2", author: "ObjectId"}
  }
]
The first lookup replaces the author ID with the matching user data in each document, then the second unwind is applied to each document.
Since the followers array is empty, preserveNullAndEmptyArrays: true keeps the document, and the lookup then writes an empty array to followers.author. The third unwind and lookup do the same for watching.
Just before the $group stage, the pipeline contains 2 documents with the arrays:
[
  {
    _id: "ObjectId",
    data: "other data",
    reviews: {content: "review1", author: "ObjectId"},
    followers: {author: []},
    watching: {author: []}
  },
  {
    _id: "ObjectId",
    data: "other data",
    reviews: {content: "review2", author: "ObjectId"},
    followers: {author: []},
    watching: {author: []}
  }
]
Since both documents have the same _id they are grouped together, with the final result containing the first document as "data".
For the arrays, as each document is encountered, the value of the corresponding field is pushed onto the array, resulting in each array having 2 values.
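The walk-through above can be replayed in plain JavaScript to see where the repeated values come from. This is a simplified in-memory sketch: unwind and fakeLookup are hypothetical stand-ins for $unwind (with preserveNullAndEmptyArrays: true) and $lookup, the users collection is not modelled, and the field names follow the pipeline in the question.

```javascript
// Stand-in for $unwind with preserveNullAndEmptyArrays: true (simplified).
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (!Array.isArray(value) || value.length === 0) {
      // Empty or missing array: keep the document, drop the field.
      const { [field]: _dropped, ...rest } = doc;
      return [rest];
    }
    // One output document per array element.
    return value.map((item) => ({ ...doc, [field]: item }));
  });
}

// Stand-in for $lookup with as: '<field>.author'. No users are modelled,
// so a missing author simply yields an empty array.
function fakeLookup(docs, field) {
  return docs.map((doc) => {
    const sub = doc[field] ?? {};
    const author = sub.author === undefined ? [] : [{ _id: sub.author }];
    return { ...doc, [field]: { ...sub, author } };
  });
}

const original = {
  _id: "p1",
  reviews: [
    { content: "review1", author: "u1" },
    { content: "review2", author: "u2" },
  ],
  followers: [],
  watching: [],
};

let stage = [original];
for (const field of ["reviews", "followers", "watching"]) {
  stage = fakeLookup(unwind(stage, field), field);
}

// Like $group with $push: every document contributes one entry per array.
const grouped = {
  reviews: stage.map((doc) => doc.reviews),
  followers: stage.map((doc) => doc.followers),
  watching: stage.map((doc) => doc.watching),
};
console.log(JSON.stringify(grouped.followers)); // [{"author":[]},{"author":[]}]
```

Because two documents reach the grouping step, followers and watching each receive two placeholder entries even though both arrays were empty in the source document.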

Duplicate elements in a mongo db collection

Is there a quick, efficient way to duplicate the documents in a MongoDB collection based on a property? In the example below, I am trying to duplicate the documents based on jobId.
I am using Spring Boot, so an example using the Spring Boot API would be even more helpful.
Original Collection
{ _id: 1, jobId: 1, product: "A"},
{ _id: 2, jobId: 1, product: "B"},
{ _id: 3, jobId: 1, product: "C"},
After duplication
{ _id: 1, jobId: 1, product: "A"},
{ _id: 2, jobId: 1, product: "B"},
{ _id: 3, jobId: 1, product: "C"},
{ _id: 4, jobId: 2, product: "A"},
{ _id: 5, jobId: 2, product: "B"},
{ _id: 6, jobId: 2, product: "C"},

You can use the following aggregation:
db.col.aggregate([
  {
    $group: {
      _id: null,
      values: { $push: "$$ROOT" }
    }
  },
  {
    $addFields: {
      size: { $size: "$values" },
      range: { $range: [ 0, 3 ] }
    }
  },
  {
    $unwind: "$range"
  },
  {
    $unwind: "$values"
  },
  {
    $project: {
      _id: { $add: [ "$values._id", { $multiply: [ "$range", "$size" ] } ] },
      jobId: { $add: [ "$values.jobId", "$range" ] },
      product: "$values.product",
    }
  },
  {
    $sort: {
      _id: 1
    }
  },
  {
    $out: "outCollection"
  }
])
The algorithm is quite simple: we iterate over two sets:
the first is made up of all the items from your source collection (that's why I'm grouping by null)
the second is defined artificially by the $range operator; it determines how many copies of the collection we produce (3 in this example)
The double unwind generates as many documents as we need. The formula for each new _id is: _id = _id + range * size. The last step just redirects the aggregation output to a collection.
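The _id formula is easy to verify with a small in-memory sketch in plain JavaScript. It assumes the three sample documents from the question and two copies, which corresponds to $range: [0, 2]; replicate is a hypothetical helper name, not a MongoDB API.

```javascript
// Sample documents from the question.
const source = [
  { _id: 1, jobId: 1, product: "A" },
  { _id: 2, jobId: 1, product: "B" },
  { _id: 3, jobId: 1, product: "C" },
];

// Hypothetical helper: mirrors the double $unwind followed by $project.
function replicate(values, copies) {
  const size = values.length;
  const out = [];
  for (let range = 0; range < copies; range++) {
    for (const doc of values) {
      out.push({
        _id: doc._id + range * size, // _id = _id + range * size
        jobId: doc.jobId + range,
        product: doc.product,
      });
    }
  }
  // Like $sort: { _id: 1 }.
  return out.sort((a, b) => a._id - b._id);
}

const duplicated = replicate(source, 2);
console.log(duplicated.map((doc) => `${doc._id}:${doc.jobId}:${doc.product}`));
// [ '1:1:A', '2:1:B', '3:1:C', '4:2:A', '5:2:B', '6:2:C' ]
```

The formula only produces unique ids when the source _id values are numeric and run from 1 to size without gaps, as they do in the question; other id schemes would need a different way of generating new ids.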

Get original document field as part of aggregate result

I want to get all of the document fields in my aggregate results, but as soon as I use $group they are gone. Using $project lets me re-add whatever fields I defined in $group, but I have had no luck getting the other fields:
var doc = {
  _id: '123',
  name: 'Bob',
  comments: [],
  attendances: [{
    answer: 'yes'
  }, {
    answer: 'no'
  }]
}
aggregate({
  $unwind: '$attendances'
}, {
  $match: {
    "attendances.answer": { $ne: "no" }
  }
}, {
  $group: {
    _id: '$_id',
    attendances: { $sum: 1 },
    comments: { $sum: { $size: { $ifNull: [ "$comments", [] ] }}}
  }
}, {
  $project: {
    comments: 1
  }
})
This results in:
[{
_id: 5317b771b6504bd4a32395be,
comments: 12
},{
_id: 53349213cb41af00009a94d0,
comments: 0
}]
How do I get 'name' in there? I have tried adding to $group as:
name: '$name'
as well as in $project:
name: 1
But neither works.

You can't project fields that are removed during the $group operation.
Since you are grouping by the original document _id and there will only be one name value, you can preserve the name field using $first:
db.sample.aggregate([
  { $group: {
    _id: '$_id',
    comments: { $sum: { $size: { $ifNull: [ "$comments", [] ] }}},
    name: { $first: "$name" }
  }}
])
Example output would be:
{ "_id" : "123", "comments" : 0, "name" : "Bob" }
If you are grouping by criteria where there could be multiple values to preserve, you should either $push to an array in the $group or use $addToSet if you only want unique names.
Projecting all the fields
If you are using MongoDB 2.6 or newer and want to get all of the original document fields (not just name) without listing them individually, you can use the aggregation variable $$ROOT in place of a specific field name.
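As a rough sketch of that idea, the pipeline from the question could keep the whole incoming document under one key by combining $first with $$ROOT. The field name doc is a hypothetical choice, and the pipeline is written as a plain JavaScript array so it can be inspected before it is passed to aggregate.

```javascript
// Pipeline sketch: keep the whole source document next to the grouped count.
const pipeline = [
  { $unwind: "$attendances" },
  { $match: { "attendances.answer": { $ne: "no" } } },
  {
    $group: {
      _id: "$_id",
      attendances: { $sum: 1 },
      // "doc" is a hypothetical field name; $$ROOT is the document entering $group.
      doc: { $first: "$$ROOT" },
    },
  },
];

// In the shell this would be run as: db.sample.aggregate(pipeline)
console.log(JSON.stringify(pipeline[2], null, 2));
```

Because $group runs after $unwind here, doc.attendances holds a single unwound element rather than the original array, while the other fields such as name and comments are carried over unchanged.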
