Find duplicates in array per document in MongoDB

Let's say that I have a document with this structure:
{
  _id: ObjectId('444455'),
  name: 'test',
  email: 'email',
  points: {
    spendable: 23,
    history: [
      {
        comment: 'Points earned by transaction #1234',
        points: 1
      },
      {
        comment: 'Points earned by transaction #456',
        points: 3
      },
      {
        comment: 'Points earned by transaction #456',
        points: 3
      }
    ]
  }
}
The problem is that some documents contain duplicate objects in the points.history array.
Is there an easy way to find all of those duplicates with a query?
I already tried the query from "Find duplicate records in MongoDB",
but that shows the total count of every duplicated entry across all documents. I need an overview of the duplicates per document, like this:
{
  _id: ObjectId('444455'), // _id of the document, not of the array item itself
  duplicates: [
    {
      comment: 'Points earned by transaction #456'
    }
  ]
}, {
  _id: ObjectId('444456'), // _id of the document, not of the array item itself
  duplicates: [
    {
      comment: 'Points earned by transaction #66234'
    },
    {
      comment: 'Points earned by transaction #7989'
    }
  ]
}
How can I achieve that?

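Try the aggregation pipeline below:
collectionName.aggregate([
  // One output document per history entry
  {
    $unwind: "$points.history"
  },
  // Count identical entries (same comment and points) within each document
  {
    $group: {
      _id: {
        id: "$_id",
        comment: "$points.history.comment",
        points: "$points.history.points"
      },
      sum: {
        $sum: 1
      }
    }
  },
  // Keep only entries that occur more than once
  {
    $match: {
      sum: {
        $gt: 1
      }
    }
  },
  // Regroup by document so each result lists its duplicated comments
  {
    $group: {
      _id: "$_id.id",
      duplicates: {
        $push: {
          comment: "$_id.comment"
        }
      }
    }
  }
])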
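The logic can be checked without a database. The following is a minimal plain-JavaScript sketch of the steps needed for the per-document output the question asks for (count identical history entries, keep those that occur more than once, collect them per document). The function name findDuplicatesPerDocument and the sample data are hypothetical, and plain strings stand in for ObjectId values.

```javascript
// Hypothetical in-memory mirror of the aggregation logic; not a MongoDB API.
function findDuplicatesPerDocument(docs) {
  const results = [];
  for (const doc of docs) {
    // Count identical history entries (same comment and points) in this document.
    const counts = new Map();
    for (const entry of doc.points.history) {
      const key = JSON.stringify([entry.comment, entry.points]);
      const seen = counts.get(key);
      if (seen) {
        seen.count += 1;
      } else {
        counts.set(key, { comment: entry.comment, count: 1 });
      }
    }
    // Keep only entries that occur more than once.
    const duplicates = [...counts.values()]
      .filter((c) => c.count > 1)
      .map((c) => ({ comment: c.comment }));
    // Documents without duplicates are left out of the result.
    if (duplicates.length > 0) {
      results.push({ _id: doc._id, duplicates });
    }
  }
  return results;
}

// Made-up sample data shaped like the document in the question.
const sampleDocs = [
  {
    _id: '444455',
    points: {
      history: [
        { comment: 'Points earned by transaction #1234', points: 1 },
        { comment: 'Points earned by transaction #456', points: 3 },
        { comment: 'Points earned by transaction #456', points: 3 }
      ]
    }
  },
  {
    _id: '444457',
    points: {
      history: [{ comment: 'Points earned by transaction #1', points: 2 }]
    }
  }
];

console.log(JSON.stringify(findDuplicatesPerDocument(sampleDocs), null, 2));
```

Only the first sample document has a repeated entry, so it is the only one reported.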

Related

Mongoose - filter matched documents and assign the resultant length to a field

I have this collection (some irrelevant fields were omitted for brevity):
clients: {
  userId: ObjectId,
  clientSalesValue: Number,
  currentDebt: Number,
}
I also have this query, which matches all the clients for a specific user, calculates the sum of all debts and the sum of all sales, and puts each result in its own field:
await clientsCollection.aggregate([
  {
    $match: { userId: new ObjectId(userId) }
  },
  {
    $group: {
      _id: null,
      totalSalesValue: { $sum: '$clientSalesValue' },
      totalDebts: { $sum: '$currentDebt' },
    }
  },
  {
    $unset: ['_id']
  }
]).exec();
This works as expected: it returns an array with a single object. Now I also need to include in that object a field with the number of debtors, that is, the number of clients that have currentDebt > 0. How can I do that in the same query? Is it possible?
P.S.: I cannot modify the $match condition; it always needs to return all the clients for the corresponding user.
To include a count of how many matching documents have a positive currentDebt, you can use the $sum and $cond operators like so:
await clientsCollection.aggregate([
  {
    $match: { userId: new ObjectId(userId) }
  },
  {
    $group: {
      _id: null,
      totalSalesValue: { $sum: '$clientSalesValue' },
      totalDebts: { $sum: '$currentDebt' },
      numDebtors: {
        $sum: {
          $cond: [{ $gt: ['$currentDebt', 0] }, 1, 0]
        }
      },
    }
  },
  {
    $unset: ['_id']
  }
]).exec();
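To see what $sum combined with $cond computes, here is a minimal plain-JavaScript sketch of the same $group stage applied to an in-memory array. summarizeClients and the sample values are made up for illustration.

```javascript
// Hypothetical in-memory equivalent of the $group stage; not a MongoDB API.
function summarizeClients(clients) {
  return clients.reduce(
    (acc, c) => ({
      totalSalesValue: acc.totalSalesValue + c.clientSalesValue,
      totalDebts: acc.totalDebts + c.currentDebt,
      // Mirrors $cond: add 1 when currentDebt > 0, otherwise add 0.
      numDebtors: acc.numDebtors + (c.currentDebt > 0 ? 1 : 0)
    }),
    { totalSalesValue: 0, totalDebts: 0, numDebtors: 0 }
  );
}

// Made-up sample clients for one user.
const sampleClients = [
  { clientSalesValue: 100, currentDebt: 0 },
  { clientSalesValue: 250, currentDebt: 40 },
  { clientSalesValue: 75, currentDebt: 10 }
];

console.log(summarizeClients(sampleClients));
// { totalSalesValue: 425, totalDebts: 50, numDebtors: 2 }
```

Every client contributes to the two totals, but only clients with a positive currentDebt add to numDebtors, so the $match stage does not need to change.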

Categorize data by using MongoDB aggregation

The payload is an Excel sheet with four columns: Date, status, amount, and orderId. The data needs to be categorized by month, and within each month the orders need to be categorized by status.
Umbrella Status:
INTRANSIT - ‘intransit’, ‘at hub’, ‘out for delivery’
RTO - ‘RTO Intransit’, ‘RTO Delivered’
PROCESSING - ‘processing’
For example, the response should look like this:
May:
1. INTRANSIT
2. RTO
3. PROCESSING
June:
1. INTRANSIT
2. RTO
3. PROCESSING
You can use the different aggregation operators provided by MongoDB, for example $group, $facet, $match, $unwind, $bucket, $project, $lookup, etc.
I tried it with this:
const pipeline = [{
  $facet: {
    "INTRANSIT": [
      { $match: { Status: { $in: ['INTRANSIT', 'AT HUB', 'OUT FOR DELIVERY'] } } },
      { $group: { _id: "$Date", numberofbookings: { $sum: 1 } } }
    ],
    "RTO": [
      { $match: { Status: { $in: ['RTO INTRANSIT', 'RTO DELIVERED'] } } },
      { $group: { _id: "$Date", numberofbookings: { $sum: 1 } } }
    ],
    "PROCESSING": [
      { $match: { Status: { $in: ['PROCESSING'] } } },
      {
        $group: {
          // $month replaces the invalid date.getMonth("$Date");
          // this assumes Date is stored as a BSON date
          _id: { $month: "$Date" },
          numberofbookings: { $sum: 1 }
        }
      }
    ]
  }
}];
const aggCursor = coll.aggregate(pipeline);
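One possible way to get a per-month breakdown is to map each status to its umbrella status first and then group twice: once by month and umbrella status, and once by month alone. The following is a sketch under stated assumptions, not a solution tested against this dataset. It assumes that Date is stored as a BSON date (so that $month applies) and that Status values are uppercase as in the attempt above. The names monthlyPipeline, umbrella, and statuses, and the OTHER fallback, are made up for illustration.

```javascript
// Sketch of an aggregation pipeline; field names other than Date and Status
// are hypothetical.
const monthlyPipeline = [
  // Map each raw status to its umbrella status.
  {
    $addFields: {
      umbrella: {
        $switch: {
          branches: [
            {
              case: { $in: ["$Status", ["INTRANSIT", "AT HUB", "OUT FOR DELIVERY"]] },
              then: "INTRANSIT"
            },
            {
              case: { $in: ["$Status", ["RTO INTRANSIT", "RTO DELIVERED"]] },
              then: "RTO"
            },
            {
              case: { $eq: ["$Status", "PROCESSING"] },
              then: "PROCESSING"
            }
          ],
          // Assumption: anything unrecognized is reported under OTHER.
          default: "OTHER"
        }
      }
    }
  },
  // Count orders per (month, umbrella status) pair.
  {
    $group: {
      _id: { month: { $month: "$Date" }, status: "$umbrella" },
      numberofbookings: { $sum: 1 }
    }
  },
  // Collect the per-status counts under each month.
  {
    $group: {
      _id: "$_id.month",
      statuses: {
        $push: { status: "$_id.status", numberofbookings: "$numberofbookings" }
      }
    }
  },
  // Order the months chronologically.
  { $sort: { _id: 1 } }
];

console.log(JSON.stringify(monthlyPipeline, null, 2));
```

The pipeline would be passed to coll.aggregate(monthlyPipeline) in the same way as the attempt above. Note that $month yields month numbers (1 to 12) rather than month names.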

MongoDB aggregation: How to get the index of a document in a collection sorted by a document property

Assume I have a collection with millions of documents. Below is a sample of what the documents look like:
[
  { _id:"1a1", points:[2,3,5,6] },
  { _id:"1a2", points:[2,6] },
  { _id:"1a3", points:[3,5,6] },
  { _id:"1b1", points:[1,5,6] },
  { _id:"1c1", points:[5,6] },
  // ... more documents
]
I want to query a document by _id and return a document that looks like this:
{
  _id:"1a1",
  totalPoints: 16,
  rank: 29
}
I know I can query the whole collection, sort it in descending order, then find the index of the document I want by _id and add one to get its rank. But I have concerns about this approach.
If there are millions of documents, isn't this overkill? Querying a whole collection just to get one document? Is there a way to achieve this without querying the whole collection, or does the whole collection have to be involved because of the ranking?
I cannot store them ranked because the points keep changing. The actual code is more complex, but the takeaway is that I cannot store them ranked.
Total points is the sum of the values in the points array. The rank is determined by sorting all documents by total points in descending order: the first document has rank 1, and so on.
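An aggregation pipeline like the following can get the result you want, but how it performs on a collection of millions of documents remains to be seen. Note that it pushes every document into a single array, so on a large collection it may run into the 16 MB BSON document size limit.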
db.collection.aggregate(
  [
    {
      $group: {
        _id: null,
        docs: {
          $push: { _id: '$_id', totalPoints: { $sum: '$points' } }
        }
      }
    },
    {
      $unwind: '$docs'
    },
    {
      $replaceWith: '$docs'
    },
    {
      $sort: { totalPoints: -1 }
    },
    {
      $group: {
        _id: null,
        docs: { $push: '$$ROOT' }
      }
    },
    {
      $set: {
        docs: {
          $map: {
            input: {
              $filter: {
                input: '$docs',
                as: 'x',
                cond: { $eq: ['$$x._id', '1a3'] }
              }
            },
            as: 'xx',
            in: {
              _id: '$$xx._id',
              totalPoints: '$$xx.totalPoints',
              rank: {
                $add: [{ $indexOfArray: ['$docs._id', '1a3'] }, 1]
              }
            }
          }
        }
      }
    },
    {
      $unwind: '$docs'
    },
    {
      $replaceWith: '$docs'
    }
  ])
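An alternative idea that avoids collecting every document into one array is to compute the target document's total first and then count how many documents have a strictly higher total; the rank is that count plus one. The plain-JavaScript sketch below only illustrates that idea on an in-memory array. rankById is a hypothetical helper, and applying the same idea in MongoDB would presumably require the total to be stored or computed per document, which is an assumption beyond the original question.

```javascript
// Hypothetical helper illustrating "rank = number of higher totals + 1".
function rankById(docs, id) {
  const total = (d) => d.points.reduce((sum, p) => sum + p, 0);
  const target = docs.find((d) => d._id === id);
  if (!target) {
    return null; // no document with that _id
  }
  const totalPoints = total(target);
  // Count documents whose total is strictly higher than the target's.
  const higher = docs.filter((d) => total(d) > totalPoints).length;
  return { _id: id, totalPoints, rank: higher + 1 };
}

// Sample documents from the question.
const rankDocs = [
  { _id: '1a1', points: [2, 3, 5, 6] },
  { _id: '1a2', points: [2, 6] },
  { _id: '1a3', points: [3, 5, 6] },
  { _id: '1b1', points: [1, 5, 6] },
  { _id: '1c1', points: [5, 6] }
];

console.log(rankById(rankDocs, '1a3'));
// { _id: '1a3', totalPoints: 14, rank: 2 }
```

With this definition, documents with equal totals share the same rank, which differs slightly from taking the position in a sorted array.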

MongoDB sum with match

I have a collection with the following data structure:
{
  _id: ObjectId,
  text: 'This contains some text',
  type: 'one',
  category: {
    name: 'Testing',
    slug: 'test'
  },
  state: 'active'
}
What I'm ultimately trying to do is get a list of categories and counts. I'm using the following:
const query = [
  {
    $match: {
      state: 'active'
    }
  },
  {
    $project: {
      _id: 0,
      categories: 1
    }
  },
  {
    $unwind: '$categories'
  },
  {
    $group: {
      _id: { category: '$categories.name', slug: '$categories.slug' },
      count: { $sum: 1 }
    }
  }
]
This returns all categories (that are active) and the total counts for documents matching each category.
The problem is that I want to introduce two additional $match conditions that should still return all the unique categories but only affect the counts. For example, I'm trying to add a text search (the text field is indexed) and also a match on type.
I can't do this at the top of the pipeline because it would then only return categories that match, rather than only affecting the $sum. So basically it would be like being able to add a $match within the $group only for the $sum. I haven't been able to find a solution for this, and any help would be greatly appreciated. Thank you!
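You can use $cond inside your $group stage. Because type is a top-level field in the document shown, reference it as "$type" and keep it in the earlier $project stage (type: 1):
{
  $group: {
    _id: { category: '$categories.name', slug: '$categories.slug' },
    count: { $sum: { $cond: [ { $eq: [ "$type", "one" ] }, 1, 0 ] } }
  }
}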
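The effect of $cond inside $sum is that every category still produces a group, while only matching documents add to its count. Here is a minimal plain-JavaScript sketch of that behavior. countByCategory and the sample documents are hypothetical, and each document is assumed to carry a categories array, as the pipeline in the question implies.

```javascript
// Hypothetical in-memory illustration of a conditional count per group.
function countByCategory(docs, type) {
  const groups = new Map();
  for (const doc of docs) {
    for (const cat of doc.categories) {
      const key = JSON.stringify([cat.name, cat.slug]);
      const group =
        groups.get(key) || { category: cat.name, slug: cat.slug, count: 0 };
      // Always register the category; only matching documents add to the count.
      group.count += doc.type === type ? 1 : 0;
      groups.set(key, group);
    }
  }
  return [...groups.values()];
}

// Made-up sample documents.
const categoryDocs = [
  { type: 'one', categories: [{ name: 'Testing', slug: 'test' }] },
  {
    type: 'two',
    categories: [
      { name: 'Testing', slug: 'test' },
      { name: 'News', slug: 'news' }
    ]
  },
  { type: 'one', categories: [{ name: 'Testing', slug: 'test' }] }
];

console.log(countByCategory(categoryDocs, 'one'));
// [ { category: 'Testing', slug: 'test', count: 2 },
//   { category: 'News', slug: 'news', count: 0 } ]
```

The News category appears with a count of 0, which is what moving the condition out of $match and into the $sum is meant to achieve.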

MongoDB aggregate: match and group documents, with additional check

So I have 2 models: Question and Answer.
Answer has: questionId, userId, answer (String).
I need an aggregation pipeline that will:
1. match all answers by questionId
2. see if the current user already voted (is their id in the matched documents)
3. group answers and count them
I implemented 1 and 3 like this:
const q = ObjectId('5d6e52a68558b63fb9302efd');
const user = ObjectId('5d0b3f7daceeb50c477b49e0');
Answer.aggregate([
  { $match: { questionId: q } },
  { $group: { _id: '$answer', count: { $sum: 1 } } },
])
I am missing a step between those 2 pipeline stages, where I would iterate through the matched documents and check whether userId matches user.
I would like to get an object like this:
{
  didIVote: true,
  result: [ { _id: 'YES', count: 5 }, { _id: 'NO', count: 2 } ]
}
Or maybe even like this:
[
  { _id: 'YES', count: 5, didIVote: true },
  { _id: 'NO', count: 2, didIVote: false },
]
In the $group stage, create an array with the users that voted for each answer.
Then add an additional $project stage to check whether the user is in that array.
const q = ObjectId('5d6e52a68558b63fb9302efd');
const user = ObjectId('5d0b3f7daceeb50c477b49e0');
Answer.aggregate([
  { $match: { questionId: q } },
  {
    $group: {
      _id: '$answer',
      count: { $sum: 1 },
      voted: { $addToSet: "$userId" }
    }
  },
  {
    $project: {
      count: 1,
      didIVote: { $in: [ user, "$voted" ] },
    }
  }
]);
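To make the two stages concrete, here is a minimal plain-JavaScript sketch of the same idea on an in-memory array: collect the distinct voters per answer, then check membership. tallyVotes, the short user ids, and the sample answers are hypothetical; real documents would hold ObjectId values.

```javascript
// Hypothetical in-memory illustration of $group with $addToSet, then $in.
function tallyVotes(answers, userId) {
  const groups = new Map();
  for (const a of answers) {
    const group =
      groups.get(a.answer) || { _id: a.answer, count: 0, voted: new Set() };
    group.count += 1;
    group.voted.add(a.userId); // a Set keeps distinct voters, like $addToSet
    groups.set(a.answer, group);
  }
  // Like the $project stage: keep the count and test whether the user voted.
  return [...groups.values()].map((g) => ({
    _id: g._id,
    count: g.count,
    didIVote: g.voted.has(userId)
  }));
}

// Made-up answers to a single question.
const sampleAnswers = [
  { userId: 'u1', answer: 'YES' },
  { userId: 'u2', answer: 'YES' },
  { userId: 'u3', answer: 'NO' }
];

const tally = tallyVotes(sampleAnswers, 'u2');
console.log(tally);
// [ { _id: 'YES', count: 2, didIVote: true },
//   { _id: 'NO', count: 1, didIVote: false } ]

// A single flag for the whole question, as in the first desired shape.
console.log(tally.some((r) => r.didIVote)); // true
```

This produces the second shape from the question; the first shape can be derived from it by checking whether any group has didIVote set.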