Group values by substring in MongoDB

I have these documents in my collection:
{_id: "aaaaaaaa", email: "mail1#orange.fr"},
{_id: "bbbbbbbb", email: "mail2#orange.fr"},
{_id: "cccccccc", email: "mail3#orange.fr"},
{_id: "dddddddd", email: "mail4#gmail.com"},
{_id: "eeeeeeee", email: "mail5#gmail.com"},
{_id: "ffffffff", email: "mail6#yahoo.com"}
And I would like this result:
{
result: [
{domain: "orange.fr", count: 3},
{domain: "gmail.com", count: 2},
{domain: "yahoo.com", count: 1},
]
}
I'm not sure whether this can be done with the aggregation framework and the $regex operator.

Aggregation Framework
I don't believe that, with the present document structure, you can achieve the desired result using the aggregation framework. If you stored the domain name in a separate field, it would be trivial:
db.items.aggregate(
{
$group:
{
_id: "$emailDomain",
count: { $sum: 1 }
},
}
)
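A minimal sketch of how the domain could be stored at write time, assuming documents are prepared in application code before they are inserted. The helper name `withEmailDomain` and its `separator` parameter are hypothetical; `emailDomain` is the field name used in the pipeline above, and "#" is the separator that appears in the sample documents.

```javascript
// Hypothetical helper: returns a copy of `doc` with an `emailDomain` field
// holding everything after the last separator in `doc.email`.
function withEmailDomain(doc, separator = "#") {
  if (typeof doc.email !== "string" || !doc.email.includes(separator)) {
    return { ...doc }; // nothing to extract; return an unchanged copy
  }
  const parts = doc.email.split(separator);
  return { ...doc, emailDomain: parts[parts.length - 1] };
}

const prepared = withEmailDomain({ _id: "aaaaaaaa", email: "mail1#orange.fr" });
console.log(prepared.emailDomain); // orange.fr
```

Each prepared document could then be inserted as usual, after which the $group stage above works unchanged.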
Map-Reduce
It's possible to implement what you want using a simple map-reduce aggregation. Naturally, the performance will not be good on large collections.
Query
db.emails.mapReduce(
function() {
if (this.email) {
var parts = this.email.split('#');
emit(parts[parts.length - 1], 1);
}
},
function(key, values) {
return Array.sum(values);
},
{
out: { inline: 1 }
}
)
Output
[
{
"_id" : "gmail.com",
"value" : 2
},
{
"_id" : "yahoo.com",
"value" : 1
},
{
"_id" : "orange.fr",
"value" : 3
}
]

Aggregation Framework
From MongoDB 3.4 (released Nov 29, 2016) onwards, the aggregation framework has string operators such as $indexOfBytes and $strLenBytes that make this possible:
[
{
$project: {
domain: {
$substr: ["$email", {
$indexOfBytes: ["$email", "#"]
}, {
$strLenBytes: "$email"
}]
}
}
},
{
$group: {
_id: '$domain',
count: {
$sum: 1
}
}
},
{
$sort: {
'count': -1
}
},
{
$group: {
_id: null,
result: {
$push: {
'domain': "$_id",
'count': '$count'
}
}
}
}
]
Results
{
_id: null,
result: [
{domain: "#orange.fr", count: 3},
{domain: "#gmail.com", count: 2},
{domain: "#yahoo.com", count: 1},
]
}
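The $substr call above starts at the index of the separator, so every domain keeps its leading "#". One way to drop it, sketched here under the assumption that every email contains the separator, is to use $split (also available from 3.4) and keep the last element. The names `domainPipeline` and `countByDomain` are illustrative only; `countByDomain` is a plain JavaScript mirror of the pipeline, so the expected shape can be checked without a server.

```javascript
// Pipeline variant: split on the separator and keep the last piece.
const domainPipeline = [
  { $project: { domain: { $arrayElemAt: [{ $split: ["$email", "#"] }, -1] } } },
  { $group: { _id: "$domain", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $group: { _id: null, result: { $push: { domain: "$_id", count: "$count" } } } }
];

// In-memory mirror of the same steps (no MongoDB required).
function countByDomain(docs, separator = "#") {
  const counts = new Map();
  for (const doc of docs) {
    const parts = doc.email.split(separator);
    const domain = parts[parts.length - 1];
    counts.set(domain, (counts.get(domain) || 0) + 1);
  }
  const result = [...counts].map(([domain, count]) => ({ domain, count }));
  result.sort((a, b) => b.count - a.count); // highest count first
  return { result };
}

const sample = [
  { _id: "aaaaaaaa", email: "mail1#orange.fr" },
  { _id: "bbbbbbbb", email: "mail2#orange.fr" },
  { _id: "cccccccc", email: "mail3#orange.fr" },
  { _id: "dddddddd", email: "mail4#gmail.com" },
  { _id: "eeeeeeee", email: "mail5#gmail.com" },
  { _id: "ffffffff", email: "mail6#yahoo.com" }
];
console.log(JSON.stringify(countByDomain(sample)));
// {"result":[{"domain":"orange.fr","count":3},{"domain":"gmail.com","count":2},{"domain":"yahoo.com","count":1}]}
```

In the shell, the pipeline would be run as db.emails.aggregate(domainPipeline).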

Related

Projection and group on nested object mongodb aggregation query

How to get the nested object in projection and group in mongodb aggregate query.
[
{
city: "Mumbai",
meta: {
luggage: 2,
scanLuggage: 1,
upiLuggage: 1
},
cash: 10
},
{
city: "Mumbai",
meta: {
luggage: 4,
scanLuggage: 3,
upiLuggage: 1
},
cash: 24
},
]
I want to $match the above on the basis of city, and return the sum of each luggage type.
My code is as follows, but $project is not working:
City.aggregate([
{
$match: { city: 'Mumbai' }
},
{
$project: {
city: 1,
mata.luggage: 1,
meta.scanLuggage: 1,
meta.upiLuggage: 1
}
},
{
$group: {
id: city,
luggage: {$sum: '$meta.luggage'},
scanLuggage: {$sum: '$meta.scanLuggage'},
upiLuggage: {$sum: '$meta.upiLuggage'}
}
}
])
But the $project stage throws an error. I want my output to look like this:
{
city: 'Mumbai',
luggage: 6,
scanLuggage: 4,
upiLuggage: 2
}
You need to put nested field names in quotes when using them as keys in $project, and the grouping key must be named _id and reference the field as "$city".
db.collection.aggregate([
{
$match: {
city: "Mumbai"
}
},
{
$project: {
city: 1,
"meta.luggage": 1,
"meta.scanLuggage": 1,
"meta.upiLuggage": 1
}
},
{
$group: {
_id: "$city",
luggage: {
$sum: "$meta.luggage"
},
scanLuggage: {
$sum: "$meta.scanLuggage"
},
upiLuggage: {
$sum: "$meta.upiLuggage"
}
}
}
])
This is the playground link.
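The quoting rule is a JavaScript syntax requirement rather than something specific to MongoDB: a dotted name is not a valid unquoted object key. The sketch below checks this by asking the JavaScript parser, then mirrors the $match and $group steps in memory for the sample documents. The names `parsesAsExpression` and `sumLuggageByCity` are hypothetical helpers written for this illustration.

```javascript
// Returns true when `source` parses as a JavaScript expression.
function parsesAsExpression(source) {
  try {
    new Function("return (" + source + ");"); // parsed only, never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parsesAsExpression("{ city: 1, meta.luggage: 1 }"));   // false
console.log(parsesAsExpression('{ city: 1, "meta.luggage": 1 }')); // true

// In-memory mirror of $match on city followed by $group with $sum.
function sumLuggageByCity(docs, city) {
  const totals = { city, luggage: 0, scanLuggage: 0, upiLuggage: 0 };
  for (const doc of docs.filter((d) => d.city === city)) {
    totals.luggage += doc.meta.luggage;
    totals.scanLuggage += doc.meta.scanLuggage;
    totals.upiLuggage += doc.meta.upiLuggage;
  }
  return totals;
}

const trips = [
  { city: "Mumbai", meta: { luggage: 2, scanLuggage: 1, upiLuggage: 1 }, cash: 10 },
  { city: "Mumbai", meta: { luggage: 4, scanLuggage: 3, upiLuggage: 1 }, cash: 24 }
];
console.log(JSON.stringify(sumLuggageByCity(trips, "Mumbai")));
// {"city":"Mumbai","luggage":6,"scanLuggage":4,"upiLuggage":2}
```

Note that the aggregation itself returns the city under _id; a final $project stage would be needed to rename it to city.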

MongoDB- arrays from aggregation result

I have the following MongoDB query:
db.my_collection.aggregate([
{
$group: {"_id":"$day", count: { $sum: "$myValue" }
}}])
It returns the following result:
{
"_id" : ISODate("2020-02-10T00:00:00.000+01:00"),
"count" : 10
},
{
"_id" : ISODate("2020-02-01T00:00:00.000+01:00"),
"count" : 2
}
Is it possible to make two arrays from this result as below?
{
"days": [ISODate("2020-02-10T00:00:00.000+01:00"), ISODate("2020-02-01T00:00:00.000+01:00")],
"values": [10, 2]
}
Yes, just add another $group stage:
db.my_collection.aggregate([
{
$group: {
"_id": "$day", count: {$sum: "$myValue"}
}
},
{
$group: {
"_id": null,
days: {$push: "$_id"},
values: {$push: "$count"}
}
}
])
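The second $group stage collects every document into a single group and pushes both fields in the same pass, so days[i] and values[i] always come from the same input document. A plain JavaScript mirror of that stage, with the hypothetical name `toParallelArrays`, shows the idea. Since $group does not guarantee the order of its output, a $sort stage between the two groups would be needed if the order of the arrays matters.

```javascript
// In-memory mirror of { $group: { _id: null, days: {$push}, values: {$push} } }.
function toParallelArrays(rows) {
  return rows.reduce(
    (acc, row) => {
      acc.days.push(row._id);     // like days: {$push: "$_id"}
      acc.values.push(row.count); // like values: {$push: "$count"}
      return acc;
    },
    { _id: null, days: [], values: [] }
  );
}

const grouped = [
  { _id: "2020-02-10T00:00:00.000+01:00", count: 10 },
  { _id: "2020-02-01T00:00:00.000+01:00", count: 2 }
];
console.log(JSON.stringify(toParallelArrays(grouped).values)); // [10,2]
```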

Date range filter in mongodb Groupby aggregation query

I want to filter each group aggregation by a different date range: dayMonthStatus by $currentDate - 1, monthStatus by the current month number, and weekStatus by the current week number.
Sample json:
{
"createdAt" : "2019-10-02T04:55:13.472Z",
"Day-month" : "2-10",
"Month" : NumberInt(10),
"Year" : NumberInt(2019),
"Week" : NumberInt(39)
}
I have tried the $cond operator but get only blank values or the error "errmsg" : "An object representing an expression must have exactly one field". Below is the group-by aggregation code to which I want to apply the filter.
db.collection.aggregate([
// current aggregation stages,
{
$facet: {
"dayMonthStatus": [
{ $group: { _id: { status: "$Ctrans.status", "dayMonth": "$Day-month" }, count: { $sum: 1 } } }
],
"monthStatus": [
{ $group: { _id: { status: "$Ctrans.status", "month": "$Month" }, count: { $sum: 1 } } }
],
"yearStatus": [
{ $group: { _id: { status: "$Ctrans.status", "year": "$Year" }, count: { $sum: 1 } } }
],
"weekStatus": [
{ $group: { _id: { status: "$Ctrans.status", "week": "$Week" }, count: { $sum: 1 } } }
]
}
}
])
I have also tried adding $match after the $group, in the format below:
"dayMonthStatus": [
{ $group: { _id: { status: "$Customer-transaction.status", "dayMonth": "$Day-month" }, count: { $sum: 1 },
} },{ $match: {"dayMonth": { '$gte': "1-10", '$lt': "3-10"}}}
]
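Two things stand in the way of the attempt above: after $group the value lives at _id.dayMonth rather than dayMonth, and strings such as "1-10" and "3-10" are compared character by character, not as dates. One possible approach, sketched here and not tested against the asker's data, is to put a $match stage ahead of the $group inside each $facet sub-pipeline. The helper names `dayMonthOf` and `buildStatusFacet` are hypothetical, dates are handled in UTC as an assumption, and the current week is passed in by the caller because the question does not say which week-numbering convention the stored Week field follows.

```javascript
// Formats a Date as the "Day-month" string used in the sample, e.g. "2-10" (UTC).
function dayMonthOf(date) {
  return date.getUTCDate() + "-" + (date.getUTCMonth() + 1);
}

// Builds a $facet stage whose sub-pipelines each filter before grouping.
function buildStatusFacet(now, currentWeek) {
  const yesterday = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  const groupBy = (key, field) => ({
    $group: {
      _id: { status: "$Ctrans.status", [key]: "$" + field },
      count: { $sum: 1 }
    }
  });
  return {
    $facet: {
      dayMonthStatus: [
        { $match: { "Day-month": dayMonthOf(yesterday) } },
        groupBy("dayMonth", "Day-month")
      ],
      monthStatus: [
        { $match: { Month: now.getUTCMonth() + 1 } },
        groupBy("month", "Month")
      ],
      yearStatus: [groupBy("year", "Year")],
      weekStatus: [
        { $match: { Week: currentWeek } },
        groupBy("week", "Week")
      ]
    }
  };
}

const stage = buildStatusFacet(new Date(Date.UTC(2019, 9, 3)), 39);
console.log(JSON.stringify(stage.$facet.dayMonthStatus[0]));
// {"$match":{"Day-month":"2-10"}}
```

The returned object would replace the $facet stage in the pipeline above. Matching on Month or Week alone ignores the year, so a Year condition may need to be added to those $match stages if the collection spans several years.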

MongoDB use two unwind on aggregate for getting the value of repetition count

I have a dataset like this:
{
"_id" : ObjectId("5bacc9295af10e2764648baa"),
"slug" : ["Maruti", "Honda"],
"page" : "Ford"
},
{
"_id" : ObjectId("5bacc9295af10e2764648bab"),
"slug" : ["Maruti", "Honda", "Tata"],
"page" : "Hyundai"
},
{
"_id" : ObjectId("5bacc9295af10e2764648bac"),
"slug" : ["Maruti"],
"page" : "Ford"
},
{
"_id" : ObjectId("5bacc9295af10e2764648bad"),
"slug" : ["Ford", "Hyundai"],
"page" : "Tata"
}
If I want to get the repetition count of page, I run an aggregate query like this:
MyCollectionName.aggregate([
{ $unwind: { path: "$page" } },
{ $group: { _id: "$page", count: { $sum: 1 } } },
{
$project: {
_id: 0,
vehiclename: "$_id",
count: { $multiply: ["$count", 1] }
}
},
{ $sort: { count: -1 } }
])
.then(data => {
console.log(data)
// The result looks like this, which is fine:
// [
// { vehiclename : 'Ford', count: 2},
// { vehiclename : 'Hyundai', count: 1},
// { vehiclename : 'Tata', count: 1}
// ]
})
.catch(e => {
console.log(e)
})
Similarly, for slug my query looks like this:
MyCollectionName.aggregate([
{ $unwind: { path: "$slug" } },
{ $group: { _id: "$slug", count: { $sum: 1 } } },
{
$project: {
_id: 0,
vehiclename: "$_id",
count: { $multiply: ["$count", 1] }
}
},
{ $sort: { count: -1 } }
])
.then(data => {
console.log(data)
// The result looks like this, which is fine:
// [
// { vehiclename : 'Maruti', count: 3},
// { vehiclename : 'Honda', count: 2},
// { vehiclename : 'Tata', count: 1},
// { vehiclename : 'Ford', count: 1},
// { vehiclename : 'Hyundai', count: 1}
// ]
})
.catch(e => {
console.log(e)
})
Now I want to do this in a single query instead of separate queries.
I am a bit confused about how to use $unwind on both fields and then combine the two sets of counts in a single query.
Desired output will be like this:
[
{ vehiclename : 'Maruti', count: 3},
{ vehiclename : 'Ford', count: 3},
{ vehiclename : 'Honda', count: 2},
{ vehiclename : 'Tata', count: 2},
{ vehiclename : 'Hyundai', count: 2}
]
Any help is really appreciated.
I found a solution. Please let me know if I am doing something wrong.
MyCollectionName.aggregate([
{
$facet: {
groupByPage: [
{ $unwind: "$page" },
{
$group: {
_id: "$page",
count: { $sum: 1 }
}
}
],
groupBySlug: [
{ $unwind: "$slug" },
{
$group: {
_id: "$slug",
count: { $sum: 1 }
}
}
]
}
},
{
$project: {
pages: {
$concatArrays: ["$groupByPage", "$groupBySlug"]
}
}
},
{ $unwind: "$pages" },
{
$group: {
_id: "$pages._id",
count: { $sum: "$pages.count" }
}
},
{ $sort: { count: -1 } }
])
.then(data => {
console.log(data)
})
.catch(e => {
console.log(e)
})
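A shorter variant, sketched under the assumption that slug is always an array and page is always a single value, concatenates both into one array per document before unwinding, which avoids $facet altogether. The names `combinedPipeline` and `countNames` are illustrative; `countNames` is a plain JavaScript mirror of the pipeline that can be run without a server to check the combined counts for the sample data.

```javascript
// Pipeline variant: merge slug and page into one array, then count.
const combinedPipeline = [
  { $project: { names: { $concatArrays: ["$slug", ["$page"]] } } },
  { $unwind: "$names" },
  { $group: { _id: "$names", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// In-memory mirror: count every name that appears in slug or page.
function countNames(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const name of doc.slug.concat([doc.page])) {
      counts[name] = (counts[name] || 0) + 1;
    }
  }
  return counts;
}

const vehicles = [
  { slug: ["Maruti", "Honda"], page: "Ford" },
  { slug: ["Maruti", "Honda", "Tata"], page: "Hyundai" },
  { slug: ["Maruti"], page: "Ford" },
  { slug: ["Ford", "Hyundai"], page: "Tata" }
];
console.log(JSON.stringify(countNames(vehicles)));
// {"Maruti":3,"Honda":2,"Ford":3,"Tata":2,"Hyundai":2}
```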

count array occurrences across all documents with mongo

I'm trying to pull data from a collection of documents which looks like:
[
{
name: 'john',
sex: 'male',
hobbies: ['football', 'tennis', 'swimming']
},
{
name: 'betty',
sex: 'female',
hobbies: ['football', 'tennis']
},
{
name: 'frank',
sex: 'male',
hobbies: ['football', 'tennis']
}
]
I am trying to use the aggregation framework to present the data split by sex, counting the most common hobbies. The results should look something like this:
{ _id: 'male',
total: 2,
hobbies: {
football: 2,
tennis: 2,
swimming: 1
}
},
{ _id: 'female',
total: 1,
hobbies: {
football: 1,
tennis: 1
}
}
So far I can get the total for each sex, but I'm not sure how I could use $unwind to get the totals for the hobbies array.
My code so far:
collection.aggregate([
{
$group: {
_id: '$sex',
total: { $sum: 1 }
}
}
])
Personally, I am not a big fan of transforming "data" into the names of keys in a result. The aggregation framework's principles tend to agree, as this sort of operation is not supported either.
So my personal preference is to keep "data" as "data" and accept that the processed output is actually better and more consistent with a logical object design:
db.people.aggregate([
{ "$group": {
"_id": "$sex",
"hobbies": { "$push": "$hobbies" },
"total": { "$sum": 1 }
}},
{ "$unwind": "$hobbies" },
{ "$unwind": "$hobbies" },
{ "$group": {
"_id": {
"sex": "$_id",
"hobby": "$hobbies"
},
"total": { "$first": "$total" },
"hobbyCount": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.sex",
"total": { "$first": "$total" },
"hobbies": {
"$push": { "name": "$_id.hobby", "count": "$hobbyCount" }
}
}}
])
Which produces a result like this:
[
{
"_id" : "female",
"total" : 1,
"hobbies" : [
{
"name" : "tennis",
"count" : 1
},
{
"name" : "football",
"count" : 1
}
]
},
{
"_id" : "male",
"total" : 2,
"hobbies" : [
{
"name" : "swimming",
"count" : 1
},
{
"name" : "tennis",
"count" : 2
},
{
"name" : "football",
"count" : 2
}
]
}
]
So the initial $group does the count per "sex" and stacks up the hobbies into an array of arrays. Then to de-normalize you $unwind twice to get singular items, $group to get the totals per hobby under each sex and finally regroup an array for each sex alone.
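The stage sequence described above can be mirrored in plain JavaScript to see what each step contributes. This is only an illustration of the data flow, not how the server executes it; the function name `hobbyCountsBySex` and the sample variable are hypothetical.

```javascript
function hobbyCountsBySex(people) {
  // First $group: count people per sex and stack their hobby arrays.
  const bySex = new Map();
  for (const person of people) {
    const group = bySex.get(person.sex) || { total: 0, hobbies: [] };
    group.total += 1;
    group.hobbies.push(person.hobbies); // array of arrays, like $push
    bySex.set(person.sex, group);
  }
  // Two $unwind stages flatten the nested arrays; the later $group stages
  // count each hobby and rebuild one array per sex.
  return [...bySex].map(([sex, group]) => {
    const counts = new Map();
    for (const hobby of group.hobbies.flat()) {
      counts.set(hobby, (counts.get(hobby) || 0) + 1);
    }
    return {
      _id: sex,
      total: group.total,
      hobbies: [...counts].map(([name, count]) => ({ name, count }))
    };
  });
}

const peopleSample = [
  { name: "john", sex: "male", hobbies: ["football", "tennis", "swimming"] },
  { name: "betty", sex: "female", hobbies: ["football", "tennis"] },
  { name: "frank", sex: "male", hobbies: ["football", "tennis"] }
];
console.log(JSON.stringify(hobbyCountsBySex(peopleSample)[0]));
// {"_id":"male","total":2,"hobbies":[{"name":"football","count":2},{"name":"tennis","count":2},{"name":"swimming","count":1}]}
```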
It's the same data, it has a consistent and organic structure that is easy to process, and MongoDB and the aggregation framework were quite happy to produce this output.
If you really must convert your data to names of keys (and I still recommend that you do not, as it is not a good design pattern to follow), then doing such a transformation from the final state is fairly trivial in client code. As a basic JavaScript example suitable for the shell:
var out = db.people.aggregate([
{ "$group": {
"_id": "$sex",
"hobbies": { "$push": "$hobbies" },
"total": { "$sum": 1 }
}},
{ "$unwind": "$hobbies" },
{ "$unwind": "$hobbies" },
{ "$group": {
"_id": {
"sex": "$_id",
"hobby": "$hobbies"
},
"total": { "$first": "$total" },
"hobbyCount": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.sex",
"total": { "$first": "$total" },
"hobbies": {
"$push": { "name": "$_id.hobby", "count": "$hobbyCount" }
}
}}
]).toArray();
out.forEach(function(doc) {
var obj = {};
doc.hobbies.sort(function(a,b) { return b.count - a.count }); // comparator must return a number; highest count first
doc.hobbies.forEach(function(hobby) {
obj[hobby.name] = hobby.count;
});
doc.hobbies = obj;
printjson(doc);
});
You are then simply processing each cursor result into the desired output form, which is not something that really needs to happen on the server anyway:
{
"_id" : "female",
"total" : 1,
"hobbies" : {
"tennis" : 1,
"football" : 1
}
}
{
"_id" : "male",
"total" : 2,
"hobbies" : {
"tennis" : 2,
"football" : 2,
"swimming" : 1
}
}
It should also be fairly trivial to apply the same manipulation while streaming the cursor result, transforming each document as required, since the logic is basically the same.
On the other hand, you can always implement all the manipulation on the server using mapReduce instead:
db.people.mapReduce(
function() {
emit(
this.sex,
{
"total": 1,
"hobbies": this.hobbies.map(function(key) {
return { "name": key, "count": 1 };
})
}
);
},
function(key,values) {
var obj = {},
reduced = {
"total": 0,
"hobbies": []
};
values.forEach(function(value) {
reduced.total += value.total;
value.hobbies.forEach(function(hobby) {
if ( !obj.hasOwnProperty(hobby.name) )
obj[hobby.name] = 0;
obj[hobby.name] += hobby.count;
});
});
reduced.hobbies = Object.keys(obj).map(function(key) {
return { "name": key, "count": obj[key] };
}).sort(function(a,b) {
return b.count - a.count; // comparator must return a number; highest count first
});
return reduced;
},
{
"out": { "inline": 1 },
"finalize": function(key,value) {
var obj = {};
value.hobbies.forEach(function(hobby) {
obj[hobby.name] = hobby.count;
});
value.hobbies = obj;
return value;
}
}
)
mapReduce has its own distinct style of output, but the same principles of accumulation and manipulation apply, although it is likely not as efficient as the aggregation framework:
"results" : [
{
"_id" : "female",
"value" : {
"total" : 1,
"hobbies" : {
"football" : 1,
"tennis" : 1
}
}
},
{
"_id" : "male",
"value" : {
"total" : 2,
"hobbies" : {
"football" : 2,
"tennis" : 2,
"swimming" : 1
}
}
}
]
At the end of the day, I still say that the first form of processing is the most efficient and provides to my mind the most natural and consistent working of the data output, without even attempting to convert the data points into the names of keys. It's probably best to consider following that pattern, but if you really must, then there are ways to manipulate results into a desired form in various approaches to processing.
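Since MongoDB version 3.4 you can use $reduce to avoid the first grouping by sex, which means holding the entire collection in two documents. You can also avoid the need for client code by using $arrayToObject (available from 3.4.4). Note that the $set stage used below is an alias for $addFields that was added in 4.2: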
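db.collection.aggregate([
// Assumed missing stage: without it, "$_id.hobbies" is a whole array and $arrayToObject cannot use it as a key
{$unwind: "$hobbies"},
{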
$group: {
_id: {sex: "$sex", hobbies: "$hobbies"},
count: {$sum: 1},
totalIds: {$addToSet: "$_id"}
}
},
{
$group: {
_id: "$_id.sex",
hobbies: {$push: {k: "$_id.hobbies", v: "$count"}},
totalIds: {$push: "$totalIds"}
}
},
{
$set: {
hobbies: {$arrayToObject: "$hobbies"},
totalIds: {
$reduce: {
input: "$totalIds",
initialValue: [],
in: {$concatArrays: ["$$value", "$$this"]}}
}
}
},
{
$set: {
count: {$size: {$setIntersection: "$totalIds"}},
totalIds: "$$REMOVE"
}
}
])
Which works if you have an ObjectId.
Playground example 3.4
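The last two stages of that pipeline are the least obvious part: $reduce with $concatArrays flattens the array of id arrays, $setIntersection applied to a single array removes duplicates, and $size counts what is left. A plain JavaScript equivalent, using the hypothetical name `countDistinctIds` and strings in place of ObjectIds as a simplifying assumption:

```javascript
// Mirrors $reduce/$concatArrays followed by $size of $setIntersection.
function countDistinctIds(totalIds) {
  // Flatten the array of arrays, like $reduce with $concatArrays.
  const flattened = totalIds.reduce((value, current) => value.concat(current), []);
  // A Set drops duplicates, like $setIntersection on one array; size counts them.
  return new Set(flattened).size;
}

// One inner array per hobby, each listing the people who have that hobby.
console.log(countDistinctIds([["john", "frank"], ["john", "frank"], ["john"]])); // 2
```

This is why the approach depends on each document having a unique _id: the number of distinct ids per sex is the number of people.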
Otherwise, you can start with $unwind and $group, or, since MongoDB version 4.4, you can add an ObjectId with a stage like this:
{
$set: {
o: {
$function: {
"body": "function (x) {x._id=new ObjectId(); return x}",
"args": [{_id: 1}],
"lang": "js"
}
}
}
},
Playground example creating _id
Since MongoDB version 5.0 you can calculate the total using $setWindowFields:
db.collection.aggregate([
{
$setWindowFields: {
partitionBy: "$sex",
output: {totalCount: {$count: {}}}
}
},
{$unwind: "$hobbies"},
{
$group: {
_id: {sex: "$sex", hobbies: "$hobbies"},
count: {$sum: 1},
totalCount: {$first: "$totalCount"}
}
},
{
$group: {
_id: "$_id.sex",
hobbies: {$push: {k: "$_id.hobbies", v: "$count"}},
total: {$first: "$totalCount"}
}
},
{$set: {hobbies: {$arrayToObject: "$hobbies"}}}
])
Playground example 5.0
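To make the role of $setWindowFields concrete, here is a plain JavaScript mirror of the pipeline above: the window stage attaches the size of each partition to every document before $unwind multiplies the rows, so the total is not inflated by the number of hobbies. The function names `withPartitionCount` and `summarizeBySex` are hypothetical and only illustrate the data flow.

```javascript
// Mirror of $setWindowFields with partitionBy and a $count output:
// every document gets the size of its partition attached.
function withPartitionCount(docs, keyOf) {
  const sizes = new Map();
  for (const doc of docs) {
    sizes.set(keyOf(doc), (sizes.get(keyOf(doc)) || 0) + 1);
  }
  return docs.map((doc) => ({ ...doc, totalCount: sizes.get(keyOf(doc)) }));
}

// Mirror of the remaining stages: $unwind, both $group stages, $arrayToObject.
function summarizeBySex(people) {
  const summary = new Map();
  for (const person of withPartitionCount(people, (p) => p.sex)) {
    const entry = summary.get(person.sex) ||
      { _id: person.sex, total: person.totalCount, pairs: new Map() };
    for (const hobby of person.hobbies) {
      entry.pairs.set(hobby, (entry.pairs.get(hobby) || 0) + 1);
    }
    summary.set(person.sex, entry);
  }
  return [...summary.values()].map(({ _id, total, pairs }) => ({
    _id,
    total,
    hobbies: Object.fromEntries(pairs) // same role as $arrayToObject
  }));
}

const members = [
  { name: "john", sex: "male", hobbies: ["football", "tennis", "swimming"] },
  { name: "betty", sex: "female", hobbies: ["football", "tennis"] },
  { name: "frank", sex: "male", hobbies: ["football", "tennis"] }
];
console.log(JSON.stringify(summarizeBySex(members)[0]));
// {"_id":"male","total":2,"hobbies":{"football":2,"tennis":2,"swimming":1}}
```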