MongoDB aggregation group by similar string

I'm starting to learn aggregations in MongoDB, and in my project I found a lot of brands in my collection with very similar names, like 'BrandA' and 'BrandA tech'. Is there a way to group them at the end of my aggregation?
I have 2 collections in my database:
The first one is for brands:
{
_id: ObjectId(),
name: String
}
The second one is for products:
{
_id: ObjectId(),
name: String,
brand: ObjectId() // referring to _id of brands
}
Now let's say I have the following brands:
{_id: ObjectId('5a9fd2b8045b020013de2a47'), name: 'brand1'},
{_id: ObjectId('5a9fcf94d28420245451a39c'), name: 'brand2'},
{_id: ObjectId('5a9fcf94d28420245451a39a'), name: 'brand1 sub1'},
{_id: ObjectId('5a9fe8bf045b020013de2a6d'), name: 'sub2 brand2'}
And the following products:
{_id: ObjectId(''), name: 'item1', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item2', brand: ObjectId('5a9fcf94d28420245451a39c')},
{_id: ObjectId(''), name: 'item3', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item4', brand: ObjectId('5a9fcf94d28420245451a39a')},
{_id: ObjectId(''), name: 'item5', brand: ObjectId('5a9fe8bf045b020013de2a6d')},
{_id: ObjectId(''), name: 'item6', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item7', brand: ObjectId('5a9fcf94d28420245451a39c')},
{_id: ObjectId(''), name: 'item8', brand: ObjectId('5a9fcf94d28420245451a39a')}
The query I have now:
db.getCollection('products').aggregate([
{$group: {
_id: '$brand',
amount: { $sum: 1 },
}},
{
$sort: { 'amount': -1 }
},{$lookup: {
from: 'brands',
localField: '_id',
foreignField: '_id',
as: 'lookup'
}},
{$unwind: {path: '$lookup'}},
{$project: {
_id: '$_id',
brandName: '$lookup.name',
amount: '$amount'
}}
]);
Result:
{_id: ObjectId('5a9fd2b8045b020013de2a47'), brandName: 'brand1', amount: 3}
{_id: ObjectId('5a9fcf94d28420245451a39c'), brandName: 'brand2', amount: 2}
{_id: ObjectId('5a9fcf94d28420245451a39a'), brandName: 'brand1 sub1', amount: 2}
{_id: ObjectId('5a9fe8bf045b020013de2a6d'), brandName: 'sub2 brand2', amount: 1}
Result I want:
{_id: ObjectId(null), brandName: 'brand1', amount: 5},
{_id: ObjectId(null), brandName: 'brand2', amount: 3}
Is it possible to group the result I have now by finding similar strings in brandName? For example, grouping 'brand1' with 'brand1 sub1', or 'brand2' with 'sub2 brand2'?

I think you could do what you want by using $split and $unwind.
$split transforms the brand name into an array of words, and $unwind creates one document per word in that array.
Then you can apply the pipeline you already prepared to count the occurrences.
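A rough sketch of how that could look, assuming MongoDB 3.4+ (for $split) and the products/brands collections from the question. The `word` field is a name I made up for illustration. Note that every word gets its own group, so with the sample data 'sub1' and 'sub2' would show up next to 'brand1' (5) and 'brand2' (3) and would still need to be filtered out somehow.

```javascript
// In mongosh: db.getCollection('products').aggregate(pipeline)
const pipeline = [
  // Count products per brand id, as in the original query.
  { $group: { _id: '$brand', amount: { $sum: 1 } } },
  // Attach the brand document to get its name.
  { $lookup: { from: 'brands', localField: '_id', foreignField: '_id', as: 'lookup' } },
  { $unwind: '$lookup' },
  // Break the brand name into words: 'brand1 sub1' -> ['brand1', 'sub1'].
  { $project: { amount: 1, word: { $split: ['$lookup.name', ' '] } } },
  // One document per word ('word' is a hypothetical field name).
  { $unwind: '$word' },
  // Add up the per-brand counts for every word.
  { $group: { _id: '$word', amount: { $sum: '$amount' } } },
  { $sort: { amount: -1 } },
  { $project: { _id: 0, brandName: '$_id', amount: 1 } }
];
```

Grouping on single words is only an approximation of "similar"; it works for the sample names because the shared part is always a whole word.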

A change in the model could easily achieve this: store the items in an array on the brand document.
Then you get the count directly from the array's length, and the query should be faster because no $lookup is needed.
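A minimal sketch of that idea, assuming a remodelled `brands` collection where each brand embeds its items in a `products` array (that field name and the document shape are my assumptions, not something from the question). On its own this only counts per brand; merging 'brand1' with 'brand1 sub1' would still need a grouping step.

```javascript
// Hypothetical remodelled brand document: products embedded as an array.
const brand = {
  name: 'brand1',
  products: [{ name: 'item1' }, { name: 'item3' }, { name: 'item6' }]
};

// In mongosh: db.getCollection('brands').aggregate(pipeline)
const pipeline = [
  {
    $project: {
      _id: 0,
      brandName: '$name',
      // $size fails on a missing field, so fall back to an empty array.
      amount: { $size: { $ifNull: ['$products', []] } }
    }
  },
  { $sort: { amount: -1 } }
];
```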

Related

MongoDB Aggregate functions convert object array to string array

I have some documents in a collection. Every document has a challenge_id field. I want to map this array of documents to an array of strings. The final array should contain the challenge id from each document.
input:
[
{
_id: ObjectId("62c3e31931e7df585c39e4e1"),
activity_id: ObjectId("62c3e31931e7df585c39e4df"),
challenge_id: ObjectId("62bd543c3a3937000958f2dd"),
status: "active",
createdAt: ISODate("2022-07-05T07:07:05.823Z"),
updatedAt: ISODate("2022-07-05T07:07:05.823Z")
},
{
_id: ObjectId("62c3e33f299750585cc70b23"),
activity_id: ObjectId("62c3e33e299750585cc70b21"),
challenge_id: ObjectId("62bd543c3a3937000958f2dd"),
status: "active",
createdAt: ISODate("2022-07-05T07:07:43.612Z"),
updatedAt: ISODate("2022-07-05T07:07:43.612Z")
},
{
_id: ObjectId("62c3e359341e86585c65c714"),
activity_id: ObjectId("62c3e359341e86585c65c712"),
challenge_id: ObjectId("62bd543c3a3937000958f2dd"),
status: "active",
createdAt: ISODate("2022-07-05T07:08:09.409Z"),
updatedAt: ISODate("2022-07-05T07:08:09.409Z")
}
]
The output should look like:
['62bd543c3a3937000958f2dd', '62bd543c3a3937000958f2dd', '62bd543c3a3937000958f2dd']
Is it possible to do this with an aggregate function? How?
You can use $group like this:
db.collection.aggregate([
{$group: {_id: 0, res: {$push: {$toString: "$challenge_id"}}}},
{$project: {res: 1, _id: 0}}
])
See how it works on the playground example
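One thing to keep in mind: the pipeline returns a single document of the form {res: [...]}, not a bare array, so the array still has to be pulled out on the client. A small sketch of that step, assuming MongoDB 4.0+ (for $toString); `challengeIds` is a helper name I made up.

```javascript
// In mongosh: challengeIds(db.collection.aggregate(pipeline).toArray())
const pipeline = [
  { $group: { _id: 0, res: { $push: { $toString: '$challenge_id' } } } },
  { $project: { res: 1, _id: 0 } }
];

// Hypothetical helper: unwrap the single result document.
// An empty collection produces no documents at all, so return [] in that case.
function challengeIds(results) {
  return results.length > 0 ? results[0].res : [];
}
```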

Count of unique items in mongodb documents with Array of Strings

I'm having a problem that seems like it can be solved by some aggregation samples I've seen, but I've not come up with an answer yet.
Basically I have documents like so:
{
date: '2015-01-14 00:00:00.000Z',
attendees: ['john', 'jane', 'james', 'joanne'],
groupName: '31'
}
And I need to find the unique attendees for a groupName and their attendance count. So for example, with the data:
{
date: '2015-01-13 00:00:00.000Z',
attendees: ['john', 'jane', 'james', 'joanne'],
groupName: '31'
},
{
date: '2015-01-14 00:00:00.000Z',
attendees: ['james', 'joanne'],
groupName: '31'
},
{
date: '2015-01-15 00:00:00.000Z',
attendees: ['joanne'],
groupName: '31'
}
I'd like to get something like:
[{
name: 'joanne',
count: 3
}, {
name: 'john',
count: 1
}, {
name: 'james',
count: 2
}]
I can't seem to find an aggregation to get this type of result. Any help is appreciated.
You can do this:
db.collection.aggregate([
{$unwind: '$attendees'},
{$group: {_id: '$attendees', count: {$sum: 1}}},
{$project: {_id:0, name: '$_id', count: '$count'}}
])
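Since the question asks for the counts within one groupName, a variation with a $match stage in front might be closer to what is needed. This is only a sketch; the `groupName` constant and the descending sort are my additions. For the three sample documents it should give joanne 3, james 2, and john and jane 1 each.

```javascript
// In mongosh: db.collection.aggregate(pipeline)
const groupName = '31'; // the group to report on (value taken from the sample data)

const pipeline = [
  // Keep only the documents of the requested group.
  { $match: { groupName: groupName } },
  // One document per attendee.
  { $unwind: '$attendees' },
  // Count how often each attendee appears.
  { $group: { _id: '$attendees', count: { $sum: 1 } } },
  // Most frequent attendees first.
  { $sort: { count: -1 } },
  { $project: { _id: 0, name: '$_id', count: 1 } }
];
```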

Mongodb aggregation: get field value using accumulator of another field on the same document

Say I want to do an aggregation that gets me a set of users where each user is the most recent document inserted with their name. For example, if I did:
db.users.insert({name: 'Bob', _id: 1})
db.users.insert({name: 'Jim', _id: 2})
db.users.insert({name: 'Bob', _id: 3})
I would want to get back
[
{name: 'Jim', _id: 2},
{name: 'Bob', _id: 3}
]
This is simple enough - just group by name and get id as {$max: '$_id'}. However, say I introduce a phoneNumber field, and I want to also retrieve the phoneNumber of 'Jim' and 'Bob'. How can I retrieve this field so that it matches up with the proper id? For example:
db.users.insert({name: 'Bob', phone: '555-1234', _id: 1})
db.users.insert({name: 'Jim', phone: '555-5678', _id: 2})
db.users.insert({name: 'Bob', phone: '555-9101', _id: 3})
I would want to get back
[
{name: 'Jim', phone: '555-5678', _id: 2},
{name: 'Bob', phone: '555-9101', _id: 3}
]
Solved it:
If you add a {$sort: {_id: -1}} stage before the $group, you can then use $first in the $group to get the other fields from the same (most recent) document.
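For reference, a sketch of what that pipeline might look like for the `users` collection above. The intermediate field name `id` is just a placeholder I picked; the last stage renames things back to the shape asked for.

```javascript
// In mongosh: db.users.aggregate(pipeline)
const pipeline = [
  // Newest documents first (highest _id first).
  { $sort: { _id: -1 } },
  // Per name, $first picks values from the first document seen, i.e. the newest one.
  { $group: { _id: '$name', id: { $first: '$_id' }, phone: { $first: '$phone' } } },
  // Restore the original field names: {name, phone, _id}.
  { $project: { _id: '$id', name: '$_id', phone: 1 } }
];
```

Because both accumulators read from the same sorted input, `id` and `phone` always come from the same document.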

Mongodb aggregation taking average

In MongoDB, I have a collection with the following data:
[{
_id: ObjectId('....'),
data: [{
type: 'internal',
name: 'abc',
value: 60
}, {
type: 'internal',
name: 'def',
value: 20
}, {
type: 'external',
name: 'def',
value: 20
}]
}, {
_id: ObjectId('....'),
data: [{
type: 'internal',
name: 'abc',
value: 30
}, {
type: 'internal',
name: 'def',
value: 40
}, {
type: 'external',
name: 'def',
value: 10
}]
}]
Now, if I want to group by type and take the average of the value field, I can do this:
db.testcollection.aggregate([
{$unwind: '$data'},
{$group: {_id: '$data.type', avg: {$avg: '$data.value'}}}
]);
But if the values of the same type within one document have to be treated as a single value (the sum of both) before the average is calculated, what will the query be?
In my example for type internal it should be:
((60+20)+(30+40))/2
rather than
(60+20+30+40)/4
You can do it in two $group stages. The first computes the sum per type within each document, and the second computes the average of those sums across documents.
db.testcollection.aggregate([
{$unwind: "$data"},
{$group: {_id: {_id:"$_id", type:"$data.type"}, sum:{"$sum": "$data.value"}}},
{$group:{_id:"$_id.type", avg:{"$avg":"$sum"}}}
]);
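To double-check the arithmetic, here is a plain JavaScript sketch (no MongoDB involved) that mirrors the two $group stages on the sample data from the question. The numeric `_id` values are placeholders, since the real ObjectIds are elided in the question.

```javascript
// Sample documents from the question (placeholder ids).
const docs = [
  { _id: 1, data: [
    { type: 'internal', name: 'abc', value: 60 },
    { type: 'internal', name: 'def', value: 20 },
    { type: 'external', name: 'def', value: 20 }
  ] },
  { _id: 2, data: [
    { type: 'internal', name: 'abc', value: 30 },
    { type: 'internal', name: 'def', value: 40 },
    { type: 'external', name: 'def', value: 10 }
  ] }
];

// Equivalent of the first $group: sum of values per (document, type).
const sums = new Map();
for (const doc of docs) {
  for (const item of doc.data) {
    const key = `${doc._id}|${item.type}`;
    sums.set(key, (sums.get(key) || 0) + item.value);
  }
}

// Equivalent of the second $group: average of those sums per type.
const totals = {};
for (const [key, sum] of sums) {
  const type = key.split('|')[1];
  totals[type] = totals[type] || { sum: 0, n: 0 };
  totals[type].sum += sum;
  totals[type].n += 1;
}
const averages = {};
for (const type of Object.keys(totals)) {
  averages[type] = totals[type].sum / totals[type].n;
}

console.log(averages); // { internal: 75, external: 15 }
```

So 'internal' comes out as ((60+20)+(30+40))/2 = 75, which is the value asked for, rather than the 37.5 that a single $group would give.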

MongoDB aggregation for distinct values

I have 2 collections:
user
{
_id: 'user_id1',
username: 'user1',
}
{
_id: 'user_id2',
username: 'user2',
}
{
_id: 'user_id3',
username: 'user3',
}
inbox
{
_id: 'inbox_id1',
from: {_id: 'user_id1', username: 'user1'},
to: {_id: 'user_id2', username: 'user2'},
text: 'Hello there',
timestamp: new Date(),
}
{
_id: 'inbox_id2',
from: {_id: 'user_id1', username: 'user1'},
to: {_id: 'user_id2', username: 'user2'},
text: 'Trying again...',
timestamp: new Date(),
}
{
_id: 'inbox_id3',
from: {_id: 'user_id3', username: 'user3'},
to: {_id: 'user_id2', username: 'user2'},
text: 'You there?',
timestamp: new Date(),
}
Whenever a user goes into his inbox, I would like to show him the thread list, which should include the latest message from each user. So basically I would like to get distinct documents (based on the from._id field), and for each of them only the latest document (based on the timestamp field).
So my results for user2 should include only 2 documents (inbox_id2 and inbox_id3).
I know I need to use aggregation for it, but I'm not sure how exactly.
I was able to solve it based on a similar question: MongoDB : Aggregation framework : Get last dated document per grouping ID
My solution looks like this:
db.inbox.aggregate([
{$match : {'to.username': 'user2'}},
{'$sort': {'from._id': 1, 'timestamp': -1}},
{'$group': {
'_id': '$from._id',
'timestamp': {'$first': '$timestamp'},
'text': {'$first': '$text'},
'from': {'$first': '$from.username'},
}},
]);
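If the whole message document is needed instead of a hand-picked list of fields, a variation with $$ROOT might be handy. This is a sketch assuming MongoDB 3.4+ (for $replaceRoot); the `latest` field name is my own.

```javascript
// In mongosh: db.inbox.aggregate(pipeline)
const pipeline = [
  { $match: { 'to.username': 'user2' } },
  // Per sender, newest message first.
  { $sort: { 'from._id': 1, timestamp: -1 } },
  // $$ROOT is the whole input document, so 'latest' holds the newest message per sender.
  { $group: { _id: '$from._id', latest: { $first: '$$ROOT' } } },
  // Promote the stored message back to the top level.
  { $replaceRoot: { newRoot: '$latest' } }
];
```

With the sample inbox this should leave inbox_id2 and inbox_id3 for user2, each with all of its original fields.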