I'll try to keep this concise with the input, the current result and the desired/expected result. I need to find the minimum and maximum number of rows/records between consecutive occurrences of the same "winCode" in the ordered data. That makes me want to group by "winCode" first, which works perfectly, but I can't come up with anything that shows how many records it took for the same "winCode" to appear the last time, or the minimum and maximum. See the desired output for details. Below is the paste from: https://mongoplayground.net/p/bCzTO8ZLxNi
Input/collection
[
{
code: "1",
results: {
winCode: 3
}
},
{
code: "10",
results: {
winCode: 3
}
},
{
code: "8",
results: {
winCode: 2
}
},
{
code: "5",
results: {
winCode: 5
}
},
{
code: "5",
results: {
winCode: 4
}
},
{
code: "6",
results: {
winCode: 4
}
},
{
code: "7",
results: {
winCode: 5
}
},
{
code: "3",
results: {
winCode: 3
}
},
{
code: "9",
results: {
winCode: 2
}
},
{
code: "2",
results: {
winCode: 2
}
}
]
Current query
db.collection.aggregate([
{
$sort: {
code: -1
}
},
{
$group: {
_id: "$results.winCode",
count: {
$sum: 1
},
lastTimeOccurredCode: {
$first: "$code" // Any way to get it to display a count from the start to this point on how many records it went through to get the $first result?
},
}
},
{
$sort: {
_id: -1
}
},
])
Current output
[
{
"_id": 5,
"count": 2,
"lastTimeOccurredCode": "5"
},
{
"_id": 4,
"count": 2,
"lastTimeOccurredCode": "5"
},
{
"_id": 3,
"count": 3,
"lastTimeOccurredCode": "1"
},
{
"_id": 2,
"count": 3,
"lastTimeOccurredCode": "2"
}
]
Desired output
[
{
"_id": 5,
"count": 2,
"lastTimeOccurredRecordsCount": 4,
"minRecordsBetween": 3,
"maxRecordsBetween": 3
},
{
"_id": 4,
"count": 2,
"lastTimeOccurredRecordsCount": 5,
"minRecordsBetween": 1,
"maxRecordsBetween": 1
},
{
"_id": 3,
"count": 3,
"lastTimeOccurredRecordsCount": 1,
"minRecordsBetween": 1,
"maxRecordsBetween": 6
},
{
"_id": 2,
"count": 3,
"lastTimeOccurredRecordsCount": 3,
"minRecordsBetween": 1,
"maxRecordsBetween": 6
}
]
I tried adding an $accumulator function, but it needs the result of $first, which isn't available within the same $group stage. I feel like I'm missing something here.
You can use $setWindowFields to define an index, and $reduce to find the differences between the indexes. If you want the index to follow {$sort: {code: -1}}, keep the $setWindowFields sortBy as in this example and remove the redundant {$sort: {code: -1}} step. If you want the index to follow another sort order, only update the $setWindowFields sortBy.
Use $setWindowFields to define an index
$sort according to what you need (if it differs from the previous sort)
$group by $results.winCode and keep all the index data
Calculate the diff
Format
db.collection.aggregate([
{$setWindowFields: {
sortBy: {code: -1},
output: {index: {$sum: 1, window: {documents: ["unbounded", "current"]}}}
}},
{$sort: {code: -1}},
{$group: {
_id: "$results.winCode",
count: {$sum: 1},
lastTimeOccurredCode: {$first: "$code"},
index: {$push: "$index"}
}},
{$project: {
count: 1,
lastTimeOccurredCode: 1,
diff: {
$reduce: {
input: {$range: [1, {$size: "$index"}]},
initialValue: [],
in: {$concatArrays: [
"$$value",
[{$subtract: [
{$arrayElemAt: ["$index", "$$this"]},
{$arrayElemAt: ["$index", {$subtract: ["$$this", 1]}]}
]}]
]
}
}
}
}},
{$set: {
minRecordsBetween: {$min: "$diff"},
maxRecordsBetween: {$max: "$diff"},
diff: "$$REMOVE"
}},
{$sort: {_id: -1}}
])
See how it works on the playground example
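To sanity-check the numbers without a server, the index-and-diff idea can be replayed in plain JavaScript. This is a minimal sketch, not part of the answer: it assumes the rows are processed in the order they appear in the sample collection, which is the order the desired output appears to use. Keep in mind that code is a string, so {code: -1} sorts lexicographically ("9", "8", "7", "6", "5", "5", "3", "2", "10", "1") and would give different indexes. The helper name gapStats is hypothetical.

```javascript
// Sample collection from the question, in insertion order.
const docs = [
  { code: "1", results: { winCode: 3 } },
  { code: "10", results: { winCode: 3 } },
  { code: "8", results: { winCode: 2 } },
  { code: "5", results: { winCode: 5 } },
  { code: "5", results: { winCode: 4 } },
  { code: "6", results: { winCode: 4 } },
  { code: "7", results: { winCode: 5 } },
  { code: "3", results: { winCode: 3 } },
  { code: "9", results: { winCode: 2 } },
  { code: "2", results: { winCode: 2 } },
];

// Hypothetical helper mirroring the pipeline: index rows, group, diff.
function gapStats(rows) {
  // 1-based running index per row (like $sum: 1 over ["unbounded", "current"]),
  // collected per winCode (like $group + $push: "$index").
  const indexesByWinCode = new Map();
  rows.forEach((row, i) => {
    const key = row.results.winCode;
    if (!indexesByWinCode.has(key)) indexesByWinCode.set(key, []);
    indexesByWinCode.get(key).push(i + 1);
  });

  const out = [];
  for (const [winCode, index] of indexesByWinCode) {
    // Differences between consecutive indexes (like the $reduce over $range).
    const diff = [];
    for (let k = 1; k < index.length; k++) diff.push(index[k] - index[k - 1]);
    out.push({
      _id: winCode,
      count: index.length,
      lastTimeOccurredRecordsCount: index[0],
      // A winCode that occurs only once has no gaps; null is a placeholder here.
      minRecordsBetween: diff.length ? Math.min(...diff) : null,
      maxRecordsBetween: diff.length ? Math.max(...diff) : null,
    });
  }
  // Same ordering as {$sort: {_id: -1}}.
  return out.sort((a, b) => b._id - a._id);
}

console.log(gapStats(docs));
```

Reordering docs before calling gapStats is the in-memory counterpart of changing the $setWindowFields sortBy.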
I have a claim type:
type TClaim = {
insuredId: number,
treatmentInfo: { amount: number }[]
}
and a list of claims:
[
{
insuredId: 1,
treatmentInfo: [{amount: 1}, {amount: 2}]
},
{
insuredId: 1,
treatmentInfo: [{amount: 3}, {amount: 4}]
},
{
insuredId: 2,
treatmentInfo: [{amount: 1}, {amount: 2}]
}
]
I want to get the result like:
[{insuredId: 1, numberOfClaims: 2, amount: 10},{insuredId: 2, numberOfClaims: 1, amount: 3}]
I'm using the $facet operator in a MongoDB aggregation, with one sub-pipeline for counting numberOfClaims and one for calculating the amount for each insuredId. But I can't combine them to get the result I want.
db.collection.aggregate([
{ $facet: {
totalClaims: [ { $group: { _id: '$insuredId', totalClaims: { $count: {} } } } ],
amount: [
{ $unwind: { path: '$treatmentInfo' } },
{ $group: { _id: '$insuredId', amount: { $sum: '$treatmentInfo.amount' } } }
]
} }
])
Is there a reason why you want to use $facet? Just curious.
You just need to first add a new field that sums up all the amounts in the array, and then $group by insuredId. The query is pretty much self-explanatory.
db.collection.aggregate([
{
"$addFields": {
"totalAmount": {
"$sum": "$treatmentInfo.amount"
}
}
},
{
"$group": {
"_id": "$insuredId",
"numberOfClaims": {
"$sum": 1
},
"amount": {
"$sum": "$totalAmount"
}
}
}
])
Result:
[
{
"_id": 1,
"amount": 10,
"numberOfClaims": 2
},
{
"_id": 2,
"amount": 3,
"numberOfClaims": 1
}
]
MongoDB Playground
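The same two steps can be replayed in plain JavaScript to confirm the expected result. A minimal sketch; the helper name summarizeClaims is hypothetical. Summing inside each claim first means every claim contributes exactly one row to the group, which is why neither $unwind nor $facet is needed.

```javascript
// Claims from the question.
const claims = [
  { insuredId: 1, treatmentInfo: [{ amount: 1 }, { amount: 2 }] },
  { insuredId: 1, treatmentInfo: [{ amount: 3 }, { amount: 4 }] },
  { insuredId: 2, treatmentInfo: [{ amount: 1 }, { amount: 2 }] },
];

// Hypothetical in-memory equivalent of $addFields + $group.
function summarizeClaims(list) {
  const groups = new Map();
  for (const claim of list) {
    // $addFields: totalAmount = sum of treatmentInfo[].amount for this claim.
    const totalAmount = claim.treatmentInfo.reduce((sum, t) => sum + t.amount, 0);
    // $group by insuredId: count the claims and add up their totals.
    if (!groups.has(claim.insuredId)) {
      groups.set(claim.insuredId, { _id: claim.insuredId, numberOfClaims: 0, amount: 0 });
    }
    const group = groups.get(claim.insuredId);
    group.numberOfClaims += 1;
    group.amount += totalAmount;
  }
  return [...groups.values()];
}

console.log(summarizeClaims(claims));
```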
I have 4 products. I want to know the count of product-4 for users who have product-1 or product-2.
Sample data:
[
{
"user_id": 1,
"product_type": "product-1"
},
{
"user_id": 1,
"product_type": "product-4"
},
{
"user_id": 1,
"product_type": "product-4"
},
{
"user_id": 2,
"product_type": "product-1"
}
]
user 1 has two product-4 and one product-1, so its count is 2
user 2 has only product-1 and no product-4, so it contributes nothing (count 0)
This is what I tried:
db.collection.aggregate([
{
$match: {
product_type: {
$in: [
"product-1",
"product-2",
],
},
},
},
{
$group: {
_id: "$user_id",
},
},
{
$match: {
user_id: { $in: "$_id"}, // I want to use $group's result in here
product_type: "product-4",
},
}
]);
Expected results are:
[
{
"_id": 1,
"count": 2
},
{
"_id": 2,
"count": 0
}
]
Note:
I don't have a backend; I have to do this using MongoDB only.
Does this answer your question?
db.collection.aggregate([
{$group: {_id: "$user_id", data: {$push: "$product_type"}}},
{$match: {$expr: {$or: [
{$in: ["product-1", "$data"]},
{$in: ["product-2", "$data"]}
]}}},
{$project: {
count: {
$size: {
$filter: {
input: "$data",
cond: {$eq: ["$$this", "product-4"]}
}
}
}
}}
])
See how it works on the playground example
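A plain-JavaScript replay of the three stages, handy for checking the expected counts. A minimal sketch; countProduct4 is a hypothetical name, and it assumes the whole sample fits in memory.

```javascript
const rows = [
  { user_id: 1, product_type: "product-1" },
  { user_id: 1, product_type: "product-4" },
  { user_id: 1, product_type: "product-4" },
  { user_id: 2, product_type: "product-1" },
];

// Hypothetical helper mirroring $group -> $match -> $project.
function countProduct4(list) {
  // $group: collect every product_type per user_id.
  const byUser = new Map();
  for (const { user_id, product_type } of list) {
    if (!byUser.has(user_id)) byUser.set(user_id, []);
    byUser.get(user_id).push(product_type);
  }
  const out = [];
  for (const [userId, data] of byUser) {
    // $match: keep only users who own product-1 or product-2.
    if (!data.includes("product-1") && !data.includes("product-2")) continue;
    // $project: count = size of data filtered down to product-4.
    out.push({ _id: userId, count: data.filter((p) => p === "product-4").length });
  }
  return out;
}

console.log(countProduct4(rows));
```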
I have the following documents:
_id: "Team 1"
count: 1200
_id: "Team 2"
count: 1170
_id: "Team 3"
count: 1006
_id: "Team 4"
count: 932
_id: "Team 5"
count: 931
_id: "Team 6"
count: 899
_id: "Team 7"
count: 895
The list is already sorted; I just need to project it as an array of the top 5 by count, with the rest summed as 'others'. If possible, I'd also like to add the percentage of the full count that each element makes up, like this:
[
{"name":"Team 1", "count":1200, "percent":25},
{"name":"Team 2", "count":1170, "percent":15},
{"name":"Team 3", "count":1006, "percent":10},
{"name":"Team 4", "count":932, "percent":5},
{"name":"Team 5", "count":931, "percent":5},
{"name":"Other", "count":1794, "percent":40}
]
Query
$setWindowFields to sort and add the sort rank to each document
group by null with 3 accumulators:
push the first 5 documents unchanged
sum the count of the rest (rank > 5)
total sum
$map to divide the counts by the total sum for the top 5 documents, to also get the percentage
also add the percentage for the rest of the documents
$unwind and replace the root with those documents, which have count and percentage
Playmongo (put the mouse at the end of each stage to see the stage in and out)
db.collection.aggregate(
[{"$setWindowFields":
{"output": {"rank": {"$rank": {}}}, "sortBy": {"count": -1}}},
{"$group":
{"_id": null,
"top5":
{"$push": {"$cond": [{"$lte": ["$rank", 5]}, "$$ROOT", "$$REMOVE"]}},
"other": {"$sum": {"$cond": [{"$lte": ["$rank", 5]}, 0, "$count"]}},
"all": {"$sum": "$count"}}},
{"$project":
{"_id": 0,
"docs":
{"$concatArrays":
[{"$map":
{"input": "$top5",
"in":
{"name": "$$this._id",
"count": "$$this.count",
"percentage":
{"$multiply": [{"$divide": ["$$this.count", "$all"]}, 100]}}}},
[{"name": "other",
"count": "$other",
"percentage":
{"$multiply": [{"$divide": ["$other", "$all"]}, 100]}}]]}}},
{"$unwind": "$docs"}, {"$replaceRoot": {"newRoot": "$docs"}}])
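One detail worth knowing about this query: $rank gives tied documents the same rank and then skips ahead, so if several teams tie around 5th place, top5 can contain more than 5 documents. The sketch below replays the $setWindowFields/$group/$project logic in plain JavaScript to show that; withRank and topFiveWithOther are hypothetical helpers, and percentages are left unrounded as in the pipeline.

```javascript
const teams = [
  { _id: "Team 1", count: 1200 },
  { _id: "Team 2", count: 1170 },
  { _id: "Team 3", count: 1006 },
  { _id: "Team 4", count: 932 },
  { _id: "Team 5", count: 931 },
  { _id: "Team 6", count: 899 },
  { _id: "Team 7", count: 895 },
];

// Hypothetical helper mimicking $rank with sortBy {count: -1}:
// ties share a rank, and the next distinct value skips ahead (1, 2, 2, 4, ...).
function withRank(docs) {
  const sorted = [...docs].sort((a, b) => b.count - a.count);
  return sorted.map((doc, i) => {
    let firstPos = i;
    while (firstPos > 0 && sorted[firstPos - 1].count === doc.count) firstPos--;
    return { ...doc, rank: firstPos + 1 };
  });
}

// Mirrors the $group + $project stages: rank <= 5 kept, the rest summed as "other".
function topFiveWithOther(docs) {
  const ranked = withRank(docs);
  const all = ranked.reduce((sum, d) => sum + d.count, 0);
  const top5 = ranked.filter((d) => d.rank <= 5);
  const other = ranked.filter((d) => d.rank > 5).reduce((sum, d) => sum + d.count, 0);
  return [
    ...top5.map((d) => ({ name: d._id, count: d.count, percentage: (d.count / all) * 100 })),
    { name: "other", count: other, percentage: (other / all) * 100 },
  ];
}

console.log(topFiveWithOther(teams));
```

If exactly 5 entries are required even with ties, $documentNumber (which numbers documents consecutively, breaking ties arbitrarily) could be used in place of $rank.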
Another way to do it, using $facet, since $setWindowFields only works with MongoDB 5.0 or later:
mongoPlayground
db.collection.aggregate([
{ $sort: { count: -1 } },
{
"$facet": {
others: [
{ "$skip": 5 },
{
"$group": {
"_id": "others",
"count": { "$sum": "$count" }
}
}
],
top5: [ { "$limit": 5 } ]
}
},
{
"$project": { result: { "$concatArrays": [ "$others", "$top5" ] } }
},
{
"$addFields": { totalCount: { "$sum": "$result.count" } }
},
{ $unwind: "$result" },
{
$project: {
_id: "$result._id",
count: "$result.count",
percent: {
$round: [
{ "$multiply": [ { $divide: [ "$result.count", "$totalCount" ] }, 100 ] },
0
]
}
}
}
])
If you have MongoDB 5.0 or higher, you can use $setWindowFields as in Takis's nice answer. Otherwise, you can $group, $slice and $reduce your way to the answer:
$sort to have the highest count on top, then $group to put all the documents in one array called all and $sum up the counts.
$slice the all array to keep only the top N.
$reduce the top N to sum them up.
Add the others to the top N array, with count sum - sum(topN).
$unwind and format.
db.collection.aggregate([
{$sort: {count: -1}},
{$group: {_id: null, all: {$push: "$$ROOT"}, sum: {$sum: "$count"}}},
{$project: {_id: null, sum: 1, res: {$slice: ["$all", 5]}}},
{$project: {sum: 1, res: 1, topN: {
$reduce: {
input: "$res",
initialValue: 0,
in: {$add: ["$$value", "$$this.count"]}
}
}
}
},
{
$project: {_id: 0, sum: 1, res: {
$concatArrays: [
[{_id: "other", count: {$subtract: ["$sum", "$topN"]}}],
"$res"
]
}
}
},
{$unwind: "$res"},
{$project: {_id: "$res._id", count: "$res.count",
percent: { $round: [{$multiply:
[{$divide: ["$res.count", "$sum"]}, 100]}, 0]
}
}
}
])
Playground example
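For the sample data, the $slice/$reduce approach can be replayed in plain JavaScript to see the concrete output. A minimal sketch; topNWithOther is hypothetical. Math.round rounds halves up, while MongoDB's $round rounds exact halves to even, so the two could differ on a value ending in exactly .5 (none do here).

```javascript
const teams = [
  { _id: "Team 1", count: 1200 },
  { _id: "Team 2", count: 1170 },
  { _id: "Team 3", count: 1006 },
  { _id: "Team 4", count: 932 },
  { _id: "Team 5", count: 931 },
  { _id: "Team 6", count: 899 },
  { _id: "Team 7", count: 895 },
];

// Hypothetical helper mirroring the $sort/$group/$slice/$reduce pipeline.
function topNWithOther(docs, n) {
  // $sort {count: -1} + $group: one array plus the grand total.
  const all = [...docs].sort((a, b) => b.count - a.count);
  const sum = all.reduce((s, d) => s + d.count, 0);
  // $slice the top N and $reduce their counts.
  const res = all.slice(0, n);
  const topN = res.reduce((s, d) => s + d.count, 0);
  // $concatArrays with the "other" bucket, then the rounded percentage per row.
  return [{ _id: "other", count: sum - topN }, ...res].map((d) => ({
    _id: d._id,
    count: d.count,
    percent: Math.round((d.count / sum) * 100),
  }));
}

console.log(topNWithOther(teams, 5));
```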
I'm trying to get a list of current holders at specific times from a collection. My collection looks like this:
[
{
"time": 1,
"holdings": [
{ "owner": "A", "tokens": 2 },
{ "owner": "B", "tokens": 1 }
]
},
{
"time": 2,
"holdings": [
{ "owner": "B", "tokens": 2 }
]
},
{
"time": 3,
"holdings": [
{ "owner": "A", "tokens": 3 },
{ "owner": "B", "tokens": 1 },
{ "owner": "C", "tokens": 1 }
]
},
{
"time": 4,
"holdings": [
{ "owner": "C", "tokens": 0 }
]
}
]
tokens shows an owner's current holdings, but only if those holdings changed compared to the previous document. I would like to change the collection so that holdings always includes the full current holdings at any point in time.
At time: 1, the holdings are: A: 2, B: 1.
At time: 2, the holdings are: A: 2, B: 2. The collection does not include A's holdings, however, because they haven't changed. So what I'd like to get is:
[
{
"time": 1,
"holdings": [
{ "owner": "A", "tokens": 2 },
{ "owner": "B", "tokens": 1 }
]
},
{
"time": 2,
"holdings": [
{ "owner": "A", "tokens": 2 }, // merged from prev doc.
{ "owner": "B", "tokens": 2 }
]
},
{
"time": 3,
"holdings": [
{ "owner": "A", "tokens": 3 },
{ "owner": "B", "tokens": 1 },
{ "owner": "C", "tokens": 1 }
]
},
{
"time": 4,
"holdings": [
{ "owner": "A", "tokens": 3 }, // merged from prev
{ "owner": "B", "tokens": 1 }, // merged from prev
{ "owner": "C", "tokens": 0 }
]
}
]
From what I understand, $mergeObjects does that, but I don't understand how to merge all previous documents, in order, up to the current document for each document. So I think I'm looking for a way to combine $setWindowFields with $mergeObjects.
This is a nice challenge.
So far, I got this complicated solution:
Get all of the timestamps across all of the documents. This is the purpose of the first 4 steps; $setWindowFields is used to accumulate this data.
$group by owner and calculate the missing timestamps as wantedTimes (next 5 steps).
$set the missing timestamps with tokens: null, to be filled with actual data, and $unwind to separate them (next 3 steps).
Use $setWindowFields to find the last known tokens for each owner at each timestamp.
Fill in this last known state for documents with unknown tokens (2 steps).
$group and format the answer:
db.collection.aggregate([
{
$setWindowFields: {
sortBy: {time: 1},
output: {
allTimes: {$addToSet: "$time", window: {documents: ["unbounded", "current"]}
}
}
}
},
{
$setWindowFields: {
sortBy: {time: -1},
output: {
allTimes: {$addToSet: "$allTimes", window: {documents: ["unbounded", "current"]}
}
}
}
},
{
$set: {
allTimes: {
$reduce: {
input: "$allTimes",
initialValue: [],
in: {"$concatArrays": ["$$value", "$$this"]}
}
}
}
},
{$set: {allTimes: {$setIntersection: "$allTimes"}}},
{$unwind: "$holdings"},
{$sort: {time: 1}},
{$group: { _id: "$holdings.owner",
tokens: {$push: {tokens: "$holdings.tokens", time: "$time"}},
times: {$push: "$time"}, firstTime: {$first: "$time"},
allTimes: {$first: "$allTimes"}}
},
{
$addFields: {
wantedTimes: {
$filter: {
input: "$allTimes",
as: "item",
cond: {$gte: ["$$item", "$firstTime"]}
}
}
}
},
{
$project: {
tokens: 1,
wantedTimes: {$setDifference: ["$wantedTimes", "$times"]}
}
},
{
$set: {
data: {
$map: {
input: "$wantedTimes",
as: "item",
in: {time: "$$item", tokens: null}
}
}
}
},
{$project: {tokens: {"$concatArrays": ["$tokens", "$data"]}}},
{$unwind: "$tokens"},
{
$setWindowFields: {
partitionBy: "$_id",
sortBy: {"tokens.time": 1},
output: {
lastTokens: {
$push: "$tokens.tokens",
window: {documents: ["unbounded", "current"]}
}
}
}
},
{
$set: {
lastTokens: {
$filter: {
input: "$lastTokens",
as: "item",
cond: {$ne: ["$$item", null]}
}
}
}
},
{
$set: {
"tokens.tokens": {$ifNull: ["$tokens.tokens", {$last: "$lastTokens"}]}
}
},
{
$group: {
_id: "$tokens.time",
holdings: {$push: {owner: "$_id", tokens: "$tokens.tokens" }}
}
},
{$project: {time: "$_id", holdings: 1, _id: 0}},
{$sort: {time: 1}}
])
Playground example
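The carry-forward the question asks for, a running merge of each document's changes into the previous state, is easy to express outside the database. A minimal plain-JavaScript sketch, assuming the whole collection fits in memory; fillHoldings is a hypothetical name, and this is not the pipeline above, just a way to check its expected output.

```javascript
const snapshots = [
  { time: 1, holdings: [{ owner: "A", tokens: 2 }, { owner: "B", tokens: 1 }] },
  { time: 2, holdings: [{ owner: "B", tokens: 2 }] },
  { time: 3, holdings: [{ owner: "A", tokens: 3 }, { owner: "B", tokens: 1 }, { owner: "C", tokens: 1 }] },
  { time: 4, holdings: [{ owner: "C", tokens: 0 }] },
];

// Hypothetical helper: walk the documents in time order and merge each
// document's changes into a running owner -> tokens state.
function fillHoldings(docs) {
  // A Map keeps owners in the order they were first seen,
  // and updating an existing key does not move it.
  const state = new Map();
  return [...docs]
    .sort((a, b) => a.time - b.time)
    .map((doc) => {
      for (const { owner, tokens } of doc.holdings) state.set(owner, tokens);
      // Snapshot the full state at this point in time.
      return {
        time: doc.time,
        holdings: [...state].map(([owner, tokens]) => ({ owner, tokens })),
      };
    });
}

console.log(JSON.stringify(fillHoldings(snapshots), null, 2));
```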
From a performance perspective, I recommend splitting it into 2 calls: the first is a quick query just to get the maximum time value in the collection.
Once you have that value the pipeline can be much leaner:
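// findOne() returns a document, not a cursor, so sort with find() + limit(1) instead
const [maxItem] = await db.collection.find({}).sort({ time: -1 }).limit(1).toArray();
db.collection.aggregate([
{
$sort: { time: 1 } // the $push in the $group below relies on documents arriving in time order
},
{
$unwind: "$holdings"
},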
{
$group: {
_id: "$holdings.owner",
times: {
$push: {
time: "$time",
tokens: "$holdings.tokens"
}
},
minTime: {
$min: "$time"
}
}
},
{
$addFields: {
times: {
$reduce: {
input: {
$range: [
"$minTime",
maxItem.time + 1 // $range excludes its end value, so add 1 to include the max time
]
},
initialValue: {
values: [],
lastIndex: 0
},
in: {
values: {
"$concatArrays": [
"$$value.values",
[
{
$cond: [
{
$in: [
"$$this",
"$times.time"
]
},
{
"$arrayElemAt": [
"$times",
"$$value.lastIndex"
]
},
{
"$mergeObjects": [
{
tokens: 0
},
{
"$arrayElemAt": [
"$times",
{
$subtract: [
"$$value.lastIndex",
1
]
}
]
},
{
time: "$$this"
}
]
}
]
}
]
]
},
lastIndex: {
$cond: [
{
$in: [
"$$this",
"$times.time"
]
},
{
$sum: [
"$$value.lastIndex",
1
]
},
"$$value.lastIndex"
]
}
}
}
}
}
},
{
$unwind: "$times.values"
},
{
$group: {
_id: "$times.values.time",
holdings: {
$push: {
owner: "$_id",
tokens: "$times.values.tokens"
}
}
}
},
{
$project: {
_id: 0,
time: "$_id",
holdings: 1
}
},
{
$sort: {
time: 1
}
}
])
This is still quite a heavy query, as it requires running $unwind and $group over the entire collection; given the requirements, there is no way around that. If the collection is too big for this approach, I recommend iterating owner by owner, or time by time, and doing separate updates accordingly.
Mongo Playground
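The per-owner $reduce above can be easier to follow when replayed in plain JavaScript. A minimal sketch; fillByRange is hypothetical, and like the pipeline it assumes integer times with a step of 1 (a time value that never appears in the collection would still get a row).

```javascript
const snapshots = [
  { time: 1, holdings: [{ owner: "A", tokens: 2 }, { owner: "B", tokens: 1 }] },
  { time: 2, holdings: [{ owner: "B", tokens: 2 }] },
  { time: 3, holdings: [{ owner: "A", tokens: 3 }, { owner: "B", tokens: 1 }, { owner: "C", tokens: 1 }] },
  { time: 4, holdings: [{ owner: "C", tokens: 0 }] },
];

// Hypothetical helper mirroring the second answer's pipeline.
function fillByRange(docs) {
  // Stands in for the separate max-time query.
  const maxTime = Math.max(...docs.map((d) => d.time));

  // $unwind + $group by owner, pushing {time, tokens} in time order.
  const byOwner = new Map();
  for (const doc of [...docs].sort((a, b) => a.time - b.time)) {
    for (const { owner, tokens } of doc.holdings) {
      if (!byOwner.has(owner)) byOwner.set(owner, []);
      byOwner.get(owner).push({ time: doc.time, tokens });
    }
  }

  // $reduce over $range [minTime, maxTime + 1) with a running lastIndex,
  // followed by $group on time.
  const byTime = new Map();
  for (const [owner, times] of byOwner) {
    let lastIndex = 0;
    for (let t = times[0].time; t <= maxTime; t++) {
      let tokens;
      if (times.some((x) => x.time === t)) {
        // The $in branch: a value was recorded at t, take it and advance.
        tokens = times[lastIndex].tokens;
        lastIndex += 1;
      } else {
        // No change recorded at t: carry the previous entry's tokens forward.
        tokens = times[lastIndex - 1].tokens;
      }
      if (!byTime.has(t)) byTime.set(t, []);
      byTime.get(t).push({ owner, tokens });
    }
  }

  // $project + $sort by time.
  return [...byTime]
    .sort((a, b) => a[0] - b[0])
    .map(([time, holdings]) => ({ time, holdings }));
}

console.log(JSON.stringify(fillByRange(snapshots), null, 2));
```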
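If you don't care about performance at all and want it in a single query, you can still use the same pipeline, but you first have to extract the max time in the collection, which requires an initial $group stage. In the rest of the pipeline, replace maxItem.time + 1 with {$add: ["$maxTime", 1]} and keep maxTime through the owner $group (for example with maxTime: {$first: "$maxTime"}). Like so: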
db.collection.aggregate([
{
$group: {
_id: null,
maxTime: {
$max: "$time"
},
roots: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$roots"
},
{
$replaceRoot: {
newRoot: {
"$mergeObjects": [
"$roots",
{
maxTime: "$maxTime"
}
]
}
}
},
... same pipeline ...
])
I have a MongoDB database with a collection of companies that look like this (it's just a sample; the actual collection is much larger):
[
{
"_id": 100,
"name": "Test Name 1",
"level": "1"
},
{
"_id": 101,
"name": "Test Name 2",
"level": "1"
},
{
"_id": 102,
"name": "Test Name 3",
"level": "2"
}
]
Here, "level" can only range from 0 to 5.
I'm trying to write an aggregate query with $group and $project that counts how many companies there are at each level, but according to the API specification I need to follow, the result must be formatted like this, in a single object:
{
"metrics": {
"companies": {
"total": <integer>,
"level1": <integer>,
"level2": <integer>,
"level3": <integer>,
"level4": <integer>,
"level5": <integer>
}
}
}
The closest I could get to this was using $group and $project like this:
Companies.aggregate([{
$group: {
_id: {
level: "$level"
},
count: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
level: "$_id.level",
total: "$count"
}
}
])
Which gives the following result:
[
{
"level": 3,
"total": 108
},
{
"level": 5,
"total": 172
},
{
"level": 2,
"total": 624
},
{
"level": 4,
"total": 98
},
{
"level": 1,
"total": 137
},
{
"level": 0,
"total": 94
}
]
However, this result is an array and I need to put the data for each level in a single object with new keys "level1", "level2", etc, according to the specification.
I believe I need another $group stage, but I couldn't figure out how to do it.
Any ideas?
I'm not sure I understand, but I suppose you just need to map it, like this:
> var aux = new Object;
> db.Companies.aggregate([
{
$group: {
_id: {
level: "$level"
},
count: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
level: "$_id.level",
total: "$count"
}
}
]).forEach(function(a){aux["level"+a.level] = a.total;});
> printjson(aux);
{ "level2" : 1, "level1" : 2 }
There may be a better solution, but this one works:
db.companies.aggregate([{
$group:{_id:{level: "$level"}, count: {$sum: 1}}},
{$group:{"_id": 0, levels: {$push: {_id:"$_id.level", count: "$count"}}, total: {$sum: "$count"}}},
{$unwind: "$levels"},
{$sort: {"levels._id": 1}},
{$group:{_id: 0, levels: {$push: {levels:"$levels.count"}}, "total": {$avg:"$total"}}},
{$project: {total: "$total", level1: {$arrayElemAt: ["$levels",0]}, level2: {$arrayElemAt: ["$levels", 1]}, level3: {$arrayElemAt: ["$levels",2]}, level4: {$arrayElemAt: ["$levels",3]},level5: {$arrayElemAt: ["$levels",4]} }},
{$project: {_id: 0, metrics: {companies: {total: "$total", level1: "$level1.levels", level2: "$level2.levels", level3: "$level3.levels",level4: "$level4.levels", level5: "$level5.levels"}}}}
])
Returned result:
{ "metrics" :
{ "companies" :
{ "total" : 7,
"level1" : 1,
"level2" : 2,
"level3" : 2,
"level4" : 1,
"level5" : 1
} } }
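The second answer picks counts by array position, so it assumes levels 1 to 5 are all present and that no level 0 exists; with the question's data, which includes level 0, level1 would receive level 0's count. Keying by level name avoids that. Below is a minimal plain-JavaScript sketch that completes the client-side approach from the first answer, using the grouped counts shown in the question; toMetrics is hypothetical, and treating a missing level as 0 and counting level 0 only in total are assumptions, since the specification doesn't say.

```javascript
// Grouped counts as returned by the question's $group + $project pipeline.
const levelCounts = [
  { level: 3, total: 108 },
  { level: 5, total: 172 },
  { level: 2, total: 624 },
  { level: 4, total: 98 },
  { level: 1, total: 137 },
  { level: 0, total: 94 },
];

// Hypothetical helper: build the specification's object by key, not by position.
// Assumptions: level1..level5 default to 0 when a level has no companies,
// and level 0 counts toward total but gets no key of its own.
function toMetrics(rows) {
  const companies = { total: 0, level1: 0, level2: 0, level3: 0, level4: 0, level5: 0 };
  for (const { level, total } of rows) {
    companies.total += total;
    // Works whether level is stored as a number (3) or a string ("3").
    const key = "level" + level;
    if (key in companies) companies[key] = total;
  }
  return { metrics: { companies } };
}

console.log(JSON.stringify(toMetrics(levelCounts)));
```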