Aggregation with MongoDB

We save each player's stats for every match in MongoDB, for example:
{idPlayer: 27, idTeam: 6, matchId: 1, score: 90},
{idPlayer:38, idTeam: 9, matchId:1, score: 6},
{idPlayer:5, idTeam:8, matchId:2, score: 20}
We want to know how many matches each team has played, with a result like:
{idTeam, sumMatches}
{idTeam: 8, sumMatches: 6}
{idTeam: 9, sumMatches: 4}
We are trying with aggregations, but we can't get this result.
Any idea how to approach this?

This should do it: group by team, collect the distinct match IDs with $addToSet, then count them with $size:
db.collection.aggregate([
  {
    $group: {
      _id: "$idTeam",
      matches: {
        $addToSet: "$matchId"
      }
    }
  },
  {
    $project: {
      _id: 0,
      idTeam: "$_id",
      sumMatches: {
        $size: "$matches"
      }
    }
  }
])
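To make the result easier to reason about, here is a plain-JavaScript sketch (no database involved) of what the $group/$addToSet/$size stages compute. A Set per team plays the role of $addToSet, so a team with several players in the same match counts that match once. The function name countMatchesPerTeam and the last two sample documents are hypothetical, added only to show the de-duplication.

```javascript
// Mirrors $group { _id: "$idTeam", matches: { $addToSet: "$matchId" } }
// followed by $project { idTeam: "$_id", sumMatches: { $size: "$matches" } }.
function countMatchesPerTeam(stats) {
  const matchesByTeam = new Map(); // idTeam -> Set of distinct matchIds
  for (const { idTeam, matchId } of stats) {
    if (!matchesByTeam.has(idTeam)) matchesByTeam.set(idTeam, new Set());
    matchesByTeam.get(idTeam).add(matchId); // like $addToSet: duplicates ignored
  }
  return [...matchesByTeam].map(([idTeam, matches]) => ({
    idTeam,
    sumMatches: matches.size, // like $size
  }));
}

// The question's sample stats plus two hypothetical rows for team 8:
// a second player in match 2 and a new match 3.
const stats = [
  { idPlayer: 27, idTeam: 6, matchId: 1, score: 90 },
  { idPlayer: 38, idTeam: 9, matchId: 1, score: 6 },
  { idPlayer: 5, idTeam: 8, matchId: 2, score: 20 },
  { idPlayer: 12, idTeam: 8, matchId: 2, score: 14 },
  { idPlayer: 5, idTeam: 8, matchId: 3, score: 3 },
];

console.log(countMatchesPerTeam(stats));
```

Team 8 gets sumMatches: 2 even though it has three rows. Unlike this sketch, $group does not guarantee any output order, so add a $sort stage if the order matters.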

Related

MongoDB - Get rank of the document based on frequency

[
{_id: 1, query: 'A', createdAt: 1660610671 },
{_id: 2, query: 'A', createdAt: 1660610672 },
{_id: 3, query: 'A', createdAt: 1660610673 },
{_id: 4, query: 'A', createdAt: 1660610674 },
{_id: 5, query: 'B', createdAt: 1660610675 },
{_id: 6, query: 'C', createdAt: 1660610676 },
{_id: 7, query: 'C', createdAt: 1660610677 },
{_id: 8, query: 'C', createdAt: 1660610678 },
{_id: 9, query: 'D', createdAt: 1660610680 },
{_id: 10, query: 'D', createdAt: 1660610681 },
]
I have the above collection. I want to rank the query values by how often they occur within a specific time period.
Conceptually, something like this:
Queries.getRank({ key: 'query', createdAt: {$gte: startUnix, $lt: endUnix } })
I expect a result like this:
[
{rank: 1, query: 'A', frequency: 4},
{rank: 2, query: 'C', frequency: 3},
{rank: 3, query: 'D', frequency: 2},
{rank: 4, query: 'B', frequency: 1}
]
Is there a way to achieve it? Thanks.
$match - Filter the documents to the createdAt range (if needed).
$group - Group by query and count the documents with $sum: 1 as frequency.
$project - Reshape the output documents.
$setWindowFields - Use $rank to rank the documents by frequency in descending order (requires MongoDB 5.0 or later). Both $rank and $denseRank give tied documents the same rank; $rank then skips ranks (1, 2, 2, 4), while $denseRank does not (1, 2, 2, 3).
db.collection.aggregate([
  // Optional $match stage to restrict the period:
  // { $match: { createdAt: { $gte: startUnix, $lt: endUnix } } },
  {
    $group: {
      _id: "$query",
      frequency: {
        $sum: 1
      }
    }
  },
  {
    $project: {
      _id: 0,
      query: "$_id",
      frequency: "$frequency"
    }
  },
  {
    $setWindowFields: {
      partitionBy: null,
      sortBy: {
        frequency: -1
      },
      output: {
        rank: {
          $rank: {}
        }
      }
    }
  }
])
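The difference between $rank and $denseRank only shows up with ties, so here is a plain-JavaScript sketch of the whole pipeline (filter, count, sort, rank) run on the question's data plus one hypothetical extra "D" document (_id: 11) that makes "C" and "D" tie. The function name rankQueries is illustrative, not a MongoDB API, and the sketch keeps tied values in input order, whereas MongoDB gives no ordering guarantee between ties.

```javascript
// Plain-JavaScript sketch of the pipeline (no database): the optional
// createdAt filter ($match), a count per query value ($group + $sum: 1),
// a descending sort on frequency, and ranks assigned the way $rank and
// $denseRank assign them.
function rankQueries(docs, startUnix, endUnix) {
  const freq = new Map(); // query -> frequency
  for (const { query, createdAt } of docs) {
    if (createdAt < startUnix || createdAt >= endUnix) continue; // $gte / $lt
    freq.set(query, (freq.get(query) || 0) + 1);
  }
  const sorted = [...freq]
    .map(([query, frequency]) => ({ query, frequency }))
    .sort((a, b) => b.frequency - a.frequency); // sortBy: { frequency: -1 }

  let rank = 0;
  let denseRank = 0;
  sorted.forEach((doc, i) => {
    const tied = i > 0 && doc.frequency === sorted[i - 1].frequency;
    if (!tied) {
      rank = i + 1;   // $rank: ties share a rank, then a gap follows
      denseRank += 1; // $denseRank: ties share a rank, no gap
    }
    doc.rank = rank;
    doc.denseRank = denseRank;
  });
  return sorted;
}

// The question's documents plus a hypothetical _id: 11 so C and D tie at 3.
const docs = [
  { _id: 1, query: "A", createdAt: 1660610671 },
  { _id: 2, query: "A", createdAt: 1660610672 },
  { _id: 3, query: "A", createdAt: 1660610673 },
  { _id: 4, query: "A", createdAt: 1660610674 },
  { _id: 5, query: "B", createdAt: 1660610675 },
  { _id: 6, query: "C", createdAt: 1660610676 },
  { _id: 7, query: "C", createdAt: 1660610677 },
  { _id: 8, query: "C", createdAt: 1660610678 },
  { _id: 9, query: "D", createdAt: 1660610680 },
  { _id: 10, query: "D", createdAt: 1660610681 },
  { _id: 11, query: "D", createdAt: 1660610682 },
];

console.log(rankQueries(docs, 1660610671, 1660610700));
```

With the tie, B gets rank 4 under $rank but denseRank 3 under $denseRank.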
Alternatively, without $setWindowFields, you can write the following aggregation pipeline:
db.collection.aggregate([
  {
    "$group": {
      "_id": "$query",
      "frequency": {
        "$sum": 1
      }
    }
  },
  {
    "$project": {
      "query": "$_id",
      "frequency": 1,
      "_id": 0
    }
  },
  {
    "$sort": {
      frequency: -1
    }
  },
  {
    "$group": {
      "_id": null,
      "array": {
        "$push": "$$ROOT"
      }
    }
  },
  {
    "$unwind": {
      path: "$array",
      "includeArrayIndex": "rank"
    }
  },
  {
    "$project": {
      _id: 0,
      rank: {
        "$add": [
          "$rank",
          1
        ]
      },
      frequency: "$array.frequency",
      query: "$array.query"
    }
  }
]);
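This pipeline first computes the frequency of each query value, sorts the results by frequency in descending order, pushes them all into a single array, and derives each rank from the array index (plus 1, since includeArrayIndex is zero-based). Because the rank is just a position, tied frequencies get consecutive, distinct ranks. It does not need $setWindowFields, so it also works on versions earlier than MongoDB 5.0.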
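The index-based ranking can be sketched in plain JavaScript as follows: sort the grouped results, then use each element's position as the rank, which is what $unwind with includeArrayIndex followed by $add: ["$rank", 1] does. The function name rankByArrayIndex is hypothetical, and the input array stands in for the output of the $group/$project stages on the question's data.

```javascript
// Mirrors: $sort { frequency: -1 } -> $group { _id: null, array: { $push } }
// -> $unwind with includeArrayIndex -> rank = index + 1.
function rankByArrayIndex(frequencies) {
  return [...frequencies]
    .sort((a, b) => b.frequency - a.frequency)
    .map((doc, index) => ({
      rank: index + 1, // includeArrayIndex is zero-based
      frequency: doc.frequency,
      query: doc.query,
    }));
}

// Output of the $group/$project stages for the question's documents.
const grouped = [
  { query: "A", frequency: 4 },
  { query: "B", frequency: 1 },
  { query: "C", frequency: 3 },
  { query: "D", frequency: 2 },
];

console.log(rankByArrayIndex(grouped));
```

This reproduces the expected ranks A, C, D, B. With two equal frequencies, though, the two documents would get ranks 1 and 2 rather than sharing a rank as with $rank.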

Mongo db aggregation - $push and $slice top results

I have the following documents in my db:
{uid: 1, score: 10}
{uid: 2, score: 11}
{uid: 3, score: 1}
{uid: 4, score: 6}
{uid: 5, score: 2}
{uid: 6, score: 3}
{uid: 7, score: 8}
{uid: 8, score: 10}
I want to split them into buckets by score, i.e.:
score      | uids        | bucket name in aggregation
[0, 4)     | 3, 5, 6     | 0
[4, 7)     | 4           | 4
[7, inf)   | 1, 2, 7, 8  | 7
For this, I created the following aggregation, which works fine:
db.scores.aggregate(
  [
    {
      $bucket:
      {
        groupBy: "$score",
        boundaries: [0, 4, 7],
        default: 7,
        output:
        {
          "total": {$sum: 1},
          "top_frustrated":
          {
            $push: {
              "uid": "$uid", "score": "$score"
            }
          },
        },
      }
    },
  ]
)
However, I would like to return only the top 3 documents of every bucket while still including the total count of documents. Buckets 0 and 4 should stay the same, but bucket 7 should return only uids 2, 1 and 8 (uid 7 has the lowest score). The output of bucket "7" should look like:
{ "total" : 4, "top_scores" :
[
{"uid" : 2, "score" : 11},
{"uid" : 1, "score" : 10},
{"uid" : 8, "score" : 10},
]
}
I tried using $addFields with $sortArray and $slice, but it either doesn't work or returns errors.
I could of course use $project, but I was wondering if there is a more efficient way.
I am using Amazon DocumentDB.
You can use the $topN accumulator, instead of $push, like this:
db.collection.aggregate([
  {
    "$bucket": {
      "groupBy": "$score",
      "boundaries": [0, 4, 7],
      "default": 7,
      "output": {
        "total": {
          "$sum": 1
        },
        "top_frustrated": {
          "$topN": {
            "n": 3,
            "sortBy": {
              "score": -1
            },
            "output": {
              "uid": "$uid",
              "score": "$score"
            }
          }
        }
      }
    }
  }
])
The only catch is that $topN is available only in MongoDB 5.2 and later.
For older versions, sort by score first, push every document into its bucket, and then keep the first 3 with $slice:
db.collection.aggregate([
  {
    "$sort": {
      score: -1
    }
  },
  {
    $bucket: {
      groupBy: "$score",
      boundaries: [0, 4, 7],
      default: 7,
      output: {
        "total": {
          $sum: 1
        },
        "top_frustrated": {
          $push: {
            "uid": "$uid",
            "score": "$score"
          }
        }
      }
    }
  },
  {
    "$project": {
      total: 1,
      top_frustrated: {
        "$slice": [
          "$top_frustrated",
          3
        ]
      }
    }
  }
])
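Both pipelines compute the same thing, which the following plain-JavaScript sketch spells out on the question's eight documents: assign each score to the bucket whose lower boundary it falls under (or to the default bucket 7), count every document, and keep only the 3 highest scores. The helper name bucketTopN is hypothetical; it also sorts the buckets by _id so its output is deterministic. The tie between uids 1 and 8 (both 10) is kept in input order by JavaScript's stable sort, while MongoDB does not guarantee an order between equal scores.

```javascript
// Mirrors: $sort { score: -1 } -> $bucket (boundaries, default, total: $sum 1,
// top_frustrated: $push) -> $project with $slice [..., n].
function bucketTopN(docs, boundaries, defaultId, n) {
  const sorted = [...docs].sort((a, b) => b.score - a.score); // $sort
  const buckets = new Map();
  for (const { uid, score } of sorted) {
    // Bucket _id is the lower bound of [boundaries[i], boundaries[i + 1]).
    let id = defaultId;
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (score >= boundaries[i] && score < boundaries[i + 1]) {
        id = boundaries[i];
        break;
      }
    }
    if (!buckets.has(id)) buckets.set(id, { _id: id, total: 0, top_frustrated: [] });
    const bucket = buckets.get(id);
    bucket.total += 1;                          // total: { $sum: 1 }
    bucket.top_frustrated.push({ uid, score }); // $push keeps the sorted order
  }
  return [...buckets.values()]
    .sort((a, b) => a._id - b._id)
    .map((b) => ({ ...b, top_frustrated: b.top_frustrated.slice(0, n) })); // $slice
}

const scores = [
  { uid: 1, score: 10 }, { uid: 2, score: 11 }, { uid: 3, score: 1 },
  { uid: 4, score: 6 },  { uid: 5, score: 2 },  { uid: 6, score: 3 },
  { uid: 7, score: 8 },  { uid: 8, score: 10 },
];

console.log(JSON.stringify(bucketTopN(scores, [0, 4, 7], 7, 3), null, 2));
```

Bucket 7 reports total: 4 but lists only uids 2, 1 and 8, as the question asks.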

Group by date in MongoDB while splitting each date's data into two categories

I am working on a problem similar to this one, and the answer posted there helped me figure out how to approach mine.
My additional question is: how can I modify my query to get output in the following format?
{"label": "2020-08-21", "value": 5, "A": 2, "Others": 3}
(for example: {"label": "2020-08-21", "value": 7, "India": 3, "Others": 4})
Thank you in advance!! :)
In this case you can simply group by day and count the "A" documents conditionally:
db.collection.aggregate([
  {
    $project: {
      createdAt: {$dateToString: {format: "%Y-%m-%d", date: "$timestamp"}},
      recipe: 1
    }
  },
  {
    $group: {
      _id: "$createdAt",
      A: {$sum: {$cond: [{$eq: ["$recipe", "A"]}, 1, 0]}},
      value: {$sum: 1}
    }
  },
  {
    $project: {
      _id: 0, label: "$_id", value: 1, A: 1, Others: {$subtract: ["$value", "$A"]}
    }
  }
])
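The same computation in plain JavaScript: format each timestamp as a UTC day, count every document per day as value, count recipe "A" separately, and derive Others as the difference. The function name countByDay and the sample documents are hypothetical; the field names timestamp and recipe follow the answer's pipeline. The sketch uses toISOString() because $dateToString formats in UTC unless a timezone is given.

```javascript
// Mirrors: $project { createdAt: $dateToString "%Y-%m-%d" } -> $group by day
// counting all docs (value) and recipe "A" docs (A) -> Others = value - A.
function countByDay(docs) {
  const days = new Map();
  for (const { timestamp, recipe } of docs) {
    const label = timestamp.toISOString().slice(0, 10); // UTC "YYYY-MM-DD"
    if (!days.has(label)) days.set(label, { label, value: 0, A: 0 });
    const day = days.get(label);
    day.value += 1;                 // value: { $sum: 1 }
    if (recipe === "A") day.A += 1; // A: { $sum: { $cond: [...] } }
  }
  return [...days.values()].map((d) => ({ ...d, Others: d.value - d.A }));
}

// Hypothetical sample documents.
const docs = [
  { timestamp: new Date("2020-08-21T09:00:00Z"), recipe: "A" },
  { timestamp: new Date("2020-08-21T10:30:00Z"), recipe: "B" },
  { timestamp: new Date("2020-08-21T12:00:00Z"), recipe: "A" },
  { timestamp: new Date("2020-08-21T15:45:00Z"), recipe: "C" },
  { timestamp: new Date("2020-08-21T18:20:00Z"), recipe: "B" },
  { timestamp: new Date("2020-08-22T08:00:00Z"), recipe: "C" },
];

console.log(countByDay(docs));
```

For 2020-08-21 this yields value 5, A 2, Others 3, the shape requested in the question.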

It doesn't update and doesn't show any errors

I have a structure that looks like this:
{
  _id: 10,
  line_items: [
    {
      _id: 2,
      name: "name",
      quantity: 2,
    },
    {
      _id: 3,
      name: "name2",
      quantity: 1,
    }
  ],
  sub_total: 100
}
And I'm trying to run this update:
query={_id: 10, 'line_items.$._id': 2}
db.orders.update(query, {$push: {$inc: {'line_items.$.quantity': 1}}, $inc: {sub_total: 32}})
But it doesn't do anything and doesn't show any errors. What's wrong?
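There are several issues with your attempt:
the query path 'line_items.$._id' is not valid: the positional $ operator belongs in the update document, not in the query. Match the array element with 'line_items._id': 2, or with $elemMatch as below ($elemMatch is only required when several conditions must hold on the same element). An update whose query matches no document simply reports zero matches instead of raising an error, which is most likely why nothing happened.
your $push is incorrect: you can simply put both increments into a single $inc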
Here is a working solution:
db.collection.update(
  {
    _id: 10,
    line_items: {
      $elemMatch: {
        _id: 2
      }
    }
  },
  {
    $inc: {
      "line_items.$.quantity": 1,
      sub_total: 32
    }
  }
)
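To illustrate what the positional operator does here, the following plain-JavaScript sketch applies the same change to an in-memory copy of the document: "line_items.$" stands for the first array element matched by the query, and a filter that matches nothing leaves the document untouched without any error. The helper applyPositionalInc is hypothetical and only models the semantics; it is not part of any MongoDB driver.

```javascript
// Hypothetical in-memory model of:
//   update({ _id: 10, "line_items._id": 2 },
//          { $inc: { "line_items.$.quantity": 1, sub_total: 32 } })
// The positional $ refers to the FIRST array element matched by the query.
function applyPositionalInc(doc, docId, itemId, qtyDelta, subTotalDelta) {
  if (doc._id !== docId) return false; // filter on _id fails
  const index = doc.line_items.findIndex((item) => item._id === itemId);
  if (index === -1) return false; // no element matched: nothing updated, no error
  doc.line_items[index].quantity += qtyDelta; // "line_items.$.quantity"
  doc.sub_total += subTotalDelta;             // sub_total
  return true;
}

const order = {
  _id: 10,
  line_items: [
    { _id: 2, name: "name", quantity: 2 },
    { _id: 3, name: "name2", quantity: 1 },
  ],
  sub_total: 100,
};

console.log(applyPositionalInc(order, 10, 2, 1, 32)); // true
console.log(order.line_items[0].quantity, order.sub_total); // 3 132
```

Calling it again with an item _id that does not exist returns false and changes nothing, which is analogous to the silent no-op in the question.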

Duplicate elements in a MongoDB collection

Is there a quick, efficient way to duplicate documents in a MongoDB collection based on a property? In the example below, I am trying to duplicate the documents based on jobId.
I am using Spring Boot, so an example using the Spring Boot API would be even more helpful.
Original Collection
{ _id: 1, jobId: 1, product: "A"},
{ _id: 2, jobId: 1, product: "B"},
{ _id: 3, jobId: 1, product: "C"},
After duplication
{ _id: 1, jobId: 1, product: "A"},
{ _id: 2, jobId: 1, product: "B"},
{ _id: 3, jobId: 1, product: "C"},
{ _id: 4, jobId: 2, product: "A"},
{ _id: 5, jobId: 2, product: "B"},
{ _id: 6, jobId: 2, product: "C"},
You can use the following aggregation:
db.col.aggregate([
  {
    $group: {
      _id: null,
      values: { $push: "$$ROOT" }
    }
  },
  {
    $addFields: {
      size: { $size: "$values" },
      range: { $range: [ 0, 3 ] }
    }
  },
  {
    $unwind: "$range"
  },
  {
    $unwind: "$values"
  },
  {
    $project: {
      _id: { $add: [ "$values._id", { $multiply: [ "$range", "$size" ] } ] },
      jobId: { $add: [ "$values.jobId", "$range" ] },
      product: "$values.product"
    }
  },
  {
    $sort: {
      _id: 1
    }
  },
  {
    $out: "outCollection"
  }
])
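The algorithm is simple: it iterates over the Cartesian product of two sets:
the first set is every document in your source collection (that's why I'm grouping by null, which puts them all into one array)
the second set is defined artificially by the $range operator and determines how many times the collection is multiplied, including the originals ($range: [0, 3] produces jobIds 1, 2 and 3; use $range: [0, 2] for the single copy shown in the question)
The double $unwind generates one document per (copy, original) pair. Each new _id is then computed as _id + range * size (this assumes the _id values are consecutive integers starting at 1, as in the example), and jobId as jobId + range. The last stage writes the result to outCollection with $out. Note that grouping the whole collection into one document is subject to the 16 MB BSON document size limit, so this approach only suits small collections.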
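The Cartesian-product idea can be sketched in plain JavaScript: an outer loop over the copy index (the $range values) and an inner loop over the documents (the $unwind of the pushed array), applying the same _id and jobId formulas. The function name duplicateByJob is hypothetical, and like the pipeline it assumes integer _id values numbered 1..size.

```javascript
// Mirrors: $group all docs into one array, $range [0, copies], double $unwind
// (copy index x document), then _id = _id + range * size and
// jobId = jobId + range, followed by $sort { _id: 1 }.
function duplicateByJob(docs, copies) {
  const size = docs.length; // $size: "$values"
  const out = [];
  for (let range = 0; range < copies; range++) { // $range: [0, copies]
    for (const doc of docs) {                    // $unwind: "$values"
      out.push({
        _id: doc._id + range * size,
        jobId: doc.jobId + range,
        product: doc.product,
      });
    }
  }
  return out.sort((a, b) => a._id - b._id); // $sort { _id: 1 }
}

const jobs = [
  { _id: 1, jobId: 1, product: "A" },
  { _id: 2, jobId: 1, product: "B" },
  { _id: 3, jobId: 1, product: "C" },
];

// copies = 2 corresponds to $range: [0, 2]: the originals plus one duplicate.
console.log(duplicateByJob(jobs, 2));
```

With copies = 2 this reproduces the "After duplication" listing from the question; with copies = 3 (the answer's $range: [0, 3]) it yields nine documents with jobIds 1, 2 and 3.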