Finding top N entries from the Array - mongodb

My collection is structured like this:
{
"_id": 1,
"Trips": [
{
"EndID": 5,
"Tripcount": 12
},
{
"EndID": 6,
"Tripcount": 19
}
]
},
{
"_id": 2,
"Trips": [
{
"EndID": 4,
"Tripcount": 12
},
{
"EndID": 5,
"Tripcount": 19
}
]
}, ...
As can be seen, every document has a Trips array. What I want to find is the top N Tripcounts across all the Trips arrays combined, over every document in the collection. Is that possible?
I already have the following, but it only takes the single greatest Tripcount from each Trips array and then outputs 50 of them. So if the top 2 trips sit in the same Trips array, this query drops the second one:
var group = db.eplat1.aggregate([
{ "$unwind": "$Trips"},
{ "$sort": {
"Trips.Tripcount": -1
}
},
{ "$limit": 50 },
{ "$group": {
"_id": 1,
"Trips": {
"$push": {
"Start": "$_id",
"Trips": "$Trips"
}
}
}}
], {allowDiskUse: true})
Note that I believe this problem differs from this one, because only one document is given there.

Basically, you first sort the elements within each array ($unwind, $sort, $group), then $sort the documents by their largest value and $limit the results.
Finally, you $slice each array to keep the "top N" elements in each document.
db.eplat1.aggregate([
{ "$unwind": "$Trips" },
{ "$sort": { "_id": 1, "Trips.Tripcount": -1 } },
{ "$group": {
"_id": "$_id",
"Trips": { "$push": "$Trips" },
"maxTrip": { "$max": "$Trips.Tripcount" }
}},
{ "$sort": { "maxTrip": -1 } },
{ "$limit": 50 },
{ "$addFields": { "Trips": { "$slice": [ "$Trips", 0 , 2 ] } } }
])
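To make the goal of the question concrete, here is a minimal plain-JavaScript sketch of what "top N across all documents" means on in-memory data. It does not talk to MongoDB; `topTrips` and the `docs` array are hypothetical names that only mirror the sample documents from the question.

```javascript
// Hypothetical in-memory copy of the two sample documents from the question.
const docs = [
  { _id: 1, Trips: [{ EndID: 5, Tripcount: 12 }, { EndID: 6, Tripcount: 19 }] },
  { _id: 2, Trips: [{ EndID: 4, Tripcount: 12 }, { EndID: 5, Tripcount: 19 }] },
];

// Flatten every Trips array (the role of $unwind), sort all trips by
// Tripcount in descending order (the role of $sort), and keep the first
// n entries (the role of $limit).
function topTrips(documents, n) {
  return documents
    .flatMap((doc) => doc.Trips.map((trip) => ({ Start: doc._id, ...trip })))
    .sort((a, b) => b.Tripcount - a.Tripcount)
    .slice(0, n);
}

console.log(topTrips(docs, 3));
```

Each returned entry keeps the `_id` of its source document under `Start`, so two trips from the same array can both appear in the result.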

Related

summing count result in two group

My collection's data are something like this :
[
{
ANumberAreaCode: "+98",
BNumberAreaCode: "+1",
AccountingTime: 1629754886,
Length: 123
},
{
ANumberAreaCode: "+44",
BNumberAreaCode: "+98",
AccountingTime: 1629754786,
Length: 123
},
{
ANumberAreaCode: "+98",
BNumberAreaCode: "+96",
AccountingTime: 1629754886,
Length: 998
}
]
I want to group by country code and count the results, summing the occurrences of each country code across both ANumberAreaCode and BNumberAreaCode.
These are my sample $group stages:
{ "$group": {
"_id": {
"ANumberAreaCode": "$ANumberAreaCode",
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": {
"BNumberAreaCode": "$BNumberAreaCode",
},
"count": { "$sum": 1 }
}},
Now, how can I sum the counts from the two stages above for the countries they have in common?
I'm looking for a query that gives me this result:
+98 : 3
+44 : 1
+1 : 1
+96 : 1
You can use this aggregation pipeline:
$facet runs both groups, by A and by B. This creates two arrays: groupA and groupB.
$concatArrays, inside a $project stage, concatenates the two outputs.
$unwind deconstructs the combined array.
$group then groups by the values again, using $sum to get the total.
db.collection.aggregate([
{
"$facet": {
"groupA": [
{
"$group": {
"_id": "$ANumberAreaCode",
"total": {
"$sum": 1
}
}
}
],
"groupB": [
{
"$group": {
"_id": "$BNumberAreaCode",
"total": {
"$sum": 1
}
}
}
]
}
},
{
"$project": {
"result": {
"$concatArrays": [
"$groupA",
"$groupB"
]
}
}
},
{
"$unwind": "$result"
},
{
"$group": {
"_id": "$result._id",
"total": {
"$sum": "$result.total"
}
}
}
])
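As a sanity check on the expected totals, the same counting can be sketched in plain JavaScript without MongoDB. This is only an illustration of the logic; `countAreaCodes` and `calls` are hypothetical names, and the records are the three sample documents from the question.

```javascript
// The three sample records from the question.
const calls = [
  { ANumberAreaCode: "+98", BNumberAreaCode: "+1", AccountingTime: 1629754886, Length: 123 },
  { ANumberAreaCode: "+44", BNumberAreaCode: "+98", AccountingTime: 1629754786, Length: 123 },
  { ANumberAreaCode: "+98", BNumberAreaCode: "+96", AccountingTime: 1629754886, Length: 998 },
];

// Count every appearance of an area code in either field, which is what the
// two $group branches plus the final $group add up to.
function countAreaCodes(records) {
  const totals = new Map();
  for (const record of records) {
    for (const code of [record.ANumberAreaCode, record.BNumberAreaCode]) {
      totals.set(code, (totals.get(code) || 0) + 1);
    }
  }
  return Object.fromEntries(totals);
}

console.log(countAreaCodes(calls));
```

For the sample data this yields 3 for +98 and 1 for each of +44, +1, and +96, matching the result the question asks for.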

MongoDB - Group by number, and then match by max of all groups

In my MongoDB, I would like to group my data by a number (machine_quality), and then compare this number with the maximum value of ALL machine_quality, not just the maximum value within every single group.
My non-working query:
db.records.aggregate([
{
'$group': {
'_id': '$machine_quality',
'total': {'$sum': 1}
}
},
{
'$match': {
'_id': {
'$gte': {
'$subtract': [{'$max': '$_id'}, 3]
}
}
}
}
])
Question:
The {'$max': '$_id'} part of the query only refers to each group separately, and will therefore always be equal to the group's _id. However, I would like to compare against the maximum _id across ALL groups. Is there a convenient way to do that?
Any thoughts appreciated.
One way to do this is to use $facet, which lets you run two "parallel" groups in one pipeline: the first groups by null to find the global max, and the second is your own group.
Query (after the $facet, you can $unwind your groups):
db.collection.aggregate([
{
"$facet": {
"global_max": [
{
"$group": {
"_id": null,
"m": {
"$max": "$machine_quality"
}
}
},
{
"$project": {
"_id": 0
}
}
],
"groups": [
{
"$group": {
"_id": "$machine_quality",
"names": {
"$push": "$name"
}
}
},
{
"$addFields": {
"machine_quality": "$_id"
}
},
{
"$project": {
"_id": 0
}
}
]
}
},
{
"$project": {
"global_max": {
"$let": {
"vars": {
"v": {
"$arrayElemAt": [
"$global_max",
0
]
}
},
"in": "$$v.m"
}
},
"groups": 1
}
}
])
This is subject to the limitation of $facet: its output document cannot exceed the 16MB document size limit.
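The comparison the question asks for (keep only groups whose machine_quality is within 3 of the global maximum) can be sketched in plain JavaScript as follows. This is an in-memory illustration, not a pipeline: `groupsNearGlobalMax` and the sample `records` are hypothetical. In the pipeline itself, this would roughly correspond to a $unwind on groups followed by a $match that uses $expr to compare groups.machine_quality with global_max minus 3.

```javascript
// Hypothetical sample records; only machine_quality matters here.
const records = [
  { name: "a", machine_quality: 10 },
  { name: "b", machine_quality: 10 },
  { name: "c", machine_quality: 9 },
  { name: "d", machine_quality: 7 },
  { name: "e", machine_quality: 6 },
  { name: "f", machine_quality: 2 },
];

// Count documents per machine_quality, find the maximum across ALL groups,
// then keep the groups whose key is at least (global max - margin).
function groupsNearGlobalMax(docs, margin) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.machine_quality, (totals.get(doc.machine_quality) || 0) + 1);
  }
  const globalMax = Math.max(...totals.keys());
  return [...totals]
    .filter(([quality]) => quality >= globalMax - margin)
    .map(([quality, total]) => ({ _id: quality, total }));
}

console.log(groupsNearGlobalMax(records, 3));
```

With the sample data the global maximum is 10, so the groups for 10, 9, and 7 are kept and the groups for 6 and 2 are dropped.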

Single array of objects sort and slice not working

I have a single entry on a collection like this:
{
"_id" : ObjectId("60c6f7a5ef86bd1a5402e928"),
"cid" : 1,
"array1" : [
{ "type": "car", value: 20 },
{ "type": "bike", value: 50 },
{ "type": "bus", value: 5 },
{ "type": "cycle", value: 100 },
...... 9000 more entry something like this
],
"array2" : [
{ "type": "laptop", value: 200 },
{ "type": "desktop", value: 15 },
{ "type": "tablet", value: 55 },
{ "type": "mobile", value: 90 },
...... 9000 more entry something like this
]
}
Now I want to sort and slice the data for pagination.
For that I wrote a query that works for the slice case but not for the sort case.
This is my query, which works for the slice case:
let val = await SomeCollectionName.findOne(
{ cid: 1 },
{ _id: 1 , array1: { $slice: [0, 10] } } // returns 10 entries: initially 0 to 10, then the next call uses $slice: [10, 10]
).exec();
if (val) {
//console.log('Got the value')
}
console.log(error)
This is my query when I add sort alongside slice:
let val = await SomeCollectionName.findOne(
{ cid: 1 },
{ _id: 1 , array1: { $sort: { value: -1 }, $slice: [0, 10] } }
).exec();
if (val) {
//console.log('Got the value')
}
console.log(error)
Can anyone point out where I'm going wrong, or suggest an efficient way of getting the data?
UPDATE
I got an answer to the question above and am now looking for the same implementation for two arrays.
Everything else is the same. Earlier I was dealing with one array; this time I have to deal with two.
I'm just curious to know how this happens:
I wrote the aggregation query, and the result for one array is fine, but the other returns the same data throughout the array.
This is my query, following the suggestion for handling a single array with sort and slice:
db.collection.aggregate([
{
"$match": {
"cid": 1
}
},
{
$unwind: "$array1"
},
{
$unwind: "$array2"
},
{
"$sort": {
"array1.value": -1,
"array2.value": -1,
}
},
{
$skip: 0
},
{
$limit: 3
},
{
$group:{
"_id":"$_id",
"array1":{$push:"$array1"},
"array2":{$push:"$array2"}
}
}
])
The issue is that findOne() does not support $sort in its projection parameter.
You can instead use aggregation to achieve the expected result:
db.collection.aggregate([
{
"$match": {
"cid": 1
}
},
{
$unwind: "$array1"
},
{
"$sort": {
"array1.value": -1
}
},
{
$skip: 0
},
{
$limit: 3
},
{
$group: {
"_id": "$_id",
"array1": {
$push: {
"type": "$array1.type",
"value": "$array1.value"
}
},
"array2": {
"$first": "$array2"
}
},
},
{
$unwind: "$array2"
},
{
"$sort": {
"array2.value": -1
}
},
{
$skip: 0
},
{
$limit: 3
},
{
$group: {
"_id": "$_id",
"array2": {
$push: {
"type": "$array2.type",
"value": "$array2.value"
}
},
"array1": {
"$first": "$array1"
}
},
}
])
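As an alternative sketch, assuming MongoDB 5.2 or later (where the $sortArray expression is available), each array can be sorted and sliced in place inside a single $project, without unwinding either array. The helper name `pagedArraysStage` is hypothetical; it only builds the stage object and does not run it.

```javascript
// Hypothetical helper: builds one $project stage that sorts each named array
// by `value` in descending order and keeps `limit` elements starting at
// position `skip`. Assumes MongoDB 5.2+ for $sortArray.
function pagedArraysStage(fields, skip, limit) {
  const projection = { _id: 1 };
  for (const field of fields) {
    projection[field] = {
      $slice: [
        { $sortArray: { input: "$" + field, sortBy: { value: -1 } } },
        skip,
        limit,
      ],
    };
  }
  return { $project: projection };
}

// First page of 3 entries for both arrays of the document with cid 1.
const pipeline = [
  { $match: { cid: 1 } },
  pagedArraysStage(["array1", "array2"], 0, 3),
];

console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array could then be passed to db.collection.aggregate(); for the next page, call the helper with skip 3 and limit 3.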

MongoDB Get average of group considering rank of document

I have documents arriving in this order:
{
"_id": "abcde1",
"value" : 300
},
{
"_id": "abcde2",
"value" : 200
},
{
"_id": "abcde3",
"value" : 400
},
{
"_id": "abcde4",
"value" : 500
},
{
"_id": "abcde5",
"value" : 600
}
That is, I want the average of "value" over the first 2, the first 4, and all 5 documents, in a single query:
{
"value_2" : 250, // Average of first 2 documents
"value_4" : 350, // Average of first four documents
"value_5" : 400 // Average of all 5 documents
}
Is it possible to group documents based on their rank?
I can get the 3 results in 3 separate queries. Is it possible in a single query?
You could try running the following pipeline:
db.collection.aggregate([
// previous pipeline here
{
"$group": {
"_id": null,
"values": { "$push": "$value" }
}
},
{ "$unwind": { "path": "$values", "includeArrayIndex": "rank" } },
{
"$group": {
"_id": null,
"value_2_sum": {
"$sum": {
"$cond": [
{ "$lt": ["$rank", 2] },
"$values",
0
]
}
},
"value_2_count": {
"$sum": {
"$cond": [
{ "$lt": ["$rank", 2] },
1,
0
]
}
},
"value_4_sum": {
"$sum": {
"$cond": [
{ "$lt": ["$rank", 4] },
"$values",
0
]
}
},
"value_4_count": {
"$sum": {
"$cond": [
{ "$lt": ["$rank", 4] },
1,
0
]
}
},
"value_5": { "$avg": "$values" }
}
},
{
"$project": {
"value_2" : { "$divide": ["$value_2_sum", "$value_2_count"] }, // Average of first 2 documents
"value_4" : { "$divide": ["$value_4_sum", "$value_4_count"] }, // Average of first four documents
"value_5" : 1
}
}
])
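The rank-based idea in the pipeline above (sum and count a value only while its rank is below N, then divide) can be checked on the sample values with a small plain-JavaScript sketch. `prefixAverages` is a hypothetical helper that runs in memory, not against MongoDB.

```javascript
// The "value" fields of the five sample documents, in their original order.
const values = [300, 200, 400, 500, 600];

// For each requested size n, accumulate a sum and a count over the elements
// whose rank (array index) is below n, then divide: the same conditional
// $sum / $divide logic the pipeline uses.
function prefixAverages(orderedValues, sizes) {
  const result = {};
  for (const n of sizes) {
    let sum = 0;
    let count = 0;
    orderedValues.forEach((value, rank) => {
      if (rank < n) {
        sum += value;
        count += 1;
      }
    });
    result["value_" + n] = sum / count;
  }
  return result;
}

console.log(prefixAverages(values, [2, 4, 5]));
```

For the sample data this gives 250, 350, and 400, the three averages requested in the question.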
You could use a $facet aggregation stage:
// { _id: "abcde1", value: 300 }
// { _id: "abcde2", value: 200 }
// { _id: "abcde3", value: 400 }
// { _id: "abcde4", value: 500 }
// { _id: "abcde5", value: 600 }
db.collection.aggregate([
{ $facet: {
value_2: [ { $limit: 2 }, { $group: { _id: null, value_2: { $avg: "$value" } } } ],
value_4: [ { $limit: 4 }, { $group: { _id: null, value_4: { $avg: "$value" } } } ],
value_5: [ { $limit: 5 }, { $group: { _id: null, value_5: { $avg: "$value" } } } ]
}},
// {
// value_2: [ { _id: null, value_2: 250 } ],
// value_4: [ { _id: null, value_4: 350 } ],
// value_5: [ { _id: null, value_5: 400 } ]
// }
{ $set: {
value_2: { $first: "$value_2.value_2" },
value_4: { $first: "$value_4.value_4" },
value_5: { $first: "$value_5.value_5" }
}}
])
// { "value_2" : 250, "value_4" : 350, "value_5" : 400 }
The $facet stage allows us to run multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its results are stored as an array of documents.
Each field is thus produced by its own aggregation pipeline whose first stage is a simple $limit, followed by a $group stage that'll produce the $avg (average) of all considered documents.
The second part of the pipeline (the $set stage) is just there to clean-up the $facet output to the format you wished for.
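If the prefix sizes vary, the two stages can be generated rather than written by hand. The following is a minimal sketch; `prefixAverageStages` is a hypothetical helper that only builds the stage objects in the same shape as the pipeline above, and it assumes the averaged field is called value.

```javascript
// Hypothetical helper: for each size n, builds a $facet sub-pipeline that
// limits the input to n documents and averages "value", plus a $set stage
// that unwraps each single-element facet result into a plain number.
function prefixAverageStages(sizes) {
  const facet = {};
  const cleanup = {};
  for (const n of sizes) {
    const key = "value_" + n;
    facet[key] = [
      { $limit: n },
      { $group: { _id: null, [key]: { $avg: "$value" } } },
    ];
    cleanup[key] = { $first: "$" + key + "." + key };
  }
  return [{ $facet: facet }, { $set: cleanup }];
}

console.log(JSON.stringify(prefixAverageStages([2, 4, 5]), null, 2));
```

Calling it with [2, 4, 5] reproduces the value_2, value_4, and value_5 fields used in the answer.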

How to find set intersection of sets between the documents in a single collection in MongoDB?

The collection below, named "coll", is stored in MongoDB.
{
{"_id":1, "set":[1,2,3,4,5]},
{"_id":2, "set":[0,2,6,4,5]},
{"_id":3, "set":[1,2,5,10,22]}
}
How can I find the intersection of the set elements of the documents with _id 1 and 3 in the collection above?
Use the aggregation framework to get the desired result. The aggregation set operator that would do the magic is $setIntersection.
The following aggregation pipeline achieves what you are after:
db.test.aggregate([
{
"$match": {
"_id": { "$in": [1, 3] }
}
},
{
"$group": {
"_id": 0,
"set1": { "$first": "$set" },
"set2": { "$last": "$set" }
}
},
{
"$project": {
"set1": 1,
"set2": 1,
"commonToBoth": { "$setIntersection": [ "$set1", "$set2" ] },
"_id": 0
}
}
])
Output:
/* 0 */
{
"result" : [
{
"set1" : [1,2,3,4,5],
"set2" : [1,2,5,10,22],
"commonToBoth" : [1,2,5]
}
],
"ok" : 1
}
UPDATE
To intersect three or more documents, you'd need the $reduce operator to fold $setIntersection over the arrays. This lets you intersect any number of arrays, so instead of only intersecting the two arrays from docs 1 and 3, the same pipeline applies to multiple arrays as well.
Consider running the following aggregate operation:
db.test.aggregate([
{ "$match": { "_id": { "$in": [1, 3] } } },
{
"$group": {
"_id": 0,
"sets": { "$push": "$set" },
"initialSet": { "$first": "$set" }
}
},
{
"$project": {
"commonSets": {
"$reduce": {
"input": "$sets",
"initialValue": "$initialSet",
"in": { "$setIntersection": ["$$value", "$$this"] }
}
}
}
}
])
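The fold that $reduce performs with $setIntersection can be illustrated in plain JavaScript on the sample collection. This is an in-memory sketch only; `intersectSets` is a hypothetical name, and unlike $setIntersection it keeps the element order of the first array and does not remove duplicates.

```javascript
// The three sample documents from the question.
const coll = [
  { _id: 1, set: [1, 2, 3, 4, 5] },
  { _id: 2, set: [0, 2, 6, 4, 5] },
  { _id: 3, set: [1, 2, 5, 10, 22] },
];

// Select the requested documents (the role of $match), collect their arrays
// (the role of $push), and fold them pairwise, keeping only the elements
// that are present in every array (the role of $reduce + $setIntersection).
function intersectSets(documents, ids) {
  const sets = documents
    .filter((doc) => ids.includes(doc._id))
    .map((doc) => doc.set);
  if (sets.length === 0) return [];
  return sets.reduce((common, current) =>
    common.filter((element) => current.includes(element))
  );
}

console.log(intersectSets(coll, [1, 3]));
console.log(intersectSets(coll, [1, 2, 3]));
```

For ids 1 and 3 the result contains 1, 2, and 5, which agrees with the commonToBoth field in the output shown earlier; for all three documents only 2 and 5 remain.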