MongoDB grouping based on intervals

I would like to group my data based on number intervals in measurements. Can I do this with the aggregation framework, or with some map-reduce function?
I would like to group by color and whether the size is larger or smaller than 5. I would also want to add e.g. "medium" for sizes between 3 and 5.
I can group by size and color, but then each different size will have its own object.
I know this can be done by checking each different object's size by db.collection.find(), and then adding them according to my specifications, but that would be very slow.
Example:
Objects:
{
color: "red",
size: 2
}
{
color: "red",
size: 4
}
{
color: "blue",
size: 2
}
{
color: "blue",
size: 1
}
{
color: "blue",
size: 7
}
Output:
{
_id: {
color: "red",
size: "small"
}
total size: 6
}
{
_id: {
color: "red",
size: "large"
}
total size: 0
}
{
_id: {
color: "blue",
size: "small"
}
total size: 3
}
{
_id: {
color: "blue",
size: "large"
}
total size: 7
}

This is easy using $cond:
db.collection.aggregate([
{ "$group": {
"_id": {
"color": "$color",
"size": {
"$cond": [
{ "$lt": [ "$size", 3 ] },
"small",
{ "$cond": [
{ "$lt": [ "$size", 6 ] },
"medium",
"large"
]}
]
}
},
"total_size": { "$sum": "$size" }
}}
])
So just conditionally select the value in the grouping key based on the current value in the document, and sum the sizes within each group.
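To check the bucket boundaries against the sample documents without a database, the nested $cond can be mirrored in plain JavaScript. This is only a sketch: bucketLabel and groupTotals are hypothetical helper names, and the thresholds (3 and 6) are copied from the pipeline above.

```javascript
// Sample documents from the question.
const docs = [
  { color: "red", size: 2 },
  { color: "red", size: 4 },
  { color: "blue", size: 2 },
  { color: "blue", size: 1 },
  { color: "blue", size: 7 },
];

// Mirrors the nested $cond: < 3 is "small", < 6 is "medium", else "large".
// (Hypothetical helper, not part of any MongoDB API.)
function bucketLabel(size) {
  if (size < 3) return "small";
  if (size < 6) return "medium";
  return "large";
}

// Mirrors the $group stage: key on (color, bucket) and sum the sizes.
function groupTotals(documents) {
  const totals = new Map();
  for (const { color, size } of documents) {
    const key = `${color}/${bucketLabel(size)}`;
    totals.set(key, (totals.get(key) || 0) + size);
  }
  return Object.fromEntries(totals);
}

console.log(groupTotals(docs));
```

Note that buckets with no documents (for example red/large) do not appear at all. $group behaves the same way: it only emits groups that contain at least one input document, so the zero-total entry in the question's expected output would have to be filled in by the client.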


Comparing 2 fields from $project in a mongoDB pipeline

In a previous post I created a mongodb query projecting the number of elements matching a condition in an array. Now I need to filter this number of elements depending on another field.
This is my db :
db={
"fridges": [
{
_id: 1,
items: [
{
itemId: 1,
name: "beer"
},
{
itemId: 2,
name: "chicken"
}
],
brand: "Bosch",
size: 195,
cooler: true,
color: "grey",
nbMax: 2
},
{
_id: 2,
items: [
{
itemId: 1,
name: "beer"
},
{
itemId: 2,
name: "chicken"
},
{
itemId: 3,
name: "lettuce"
}
],
brand: "Electrolux",
size: 200,
cooler: true,
color: "white",
nbMax: 2
},
]
}
This is my query :
db.fridges.aggregate([
{
$match: {
$and: [
{
"brand": {
$in: [
"Bosch",
"Electrolux"
]
}
},
{
"color": {
$in: [
"grey",
"white"
]
}
}
]
}
},
{
$project: {
"itemsNumber": {
$size: {
"$filter": {
"input": "$items",
"as": "item",
"cond": {
$in: [
"$$item.name",
[
"beer",
"lettuce"
]
]
}
}
}
},
brand: 1,
cooler: 1,
color: 1,
nbMax: 1
}
}
])
Which gives me this :
[
{
"_id": 1,
"brand": "Bosch",
"color": "grey",
"cooler": true,
"itemsNumber": 1,
"nbMax": 2
},
{
"_id": 2,
"brand": "Electrolux",
"color": "white",
"cooler": true,
"itemsNumber": 2,
"nbMax": 2
}
]
What I expect is to keep only the results having an itemsNumber different from nbMax. In this instance, the second fridge with _id: 2 would not match the condition and should not be returned. How can I modify my query to get this:
[
{
"_id": 1,
"brand": "Bosch",
"color": "grey",
"cooler": true,
"itemsNumber": 1,
"nbMax": 2
}
]
You can add a $match stage with an $expr condition at the end of your pipeline.
$ne checks that the two fields are not equal:
{
$match: {
$expr: { $ne: ["$nbMax", "$itemsNumber"] }
}
}
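As a rough sanity check, the $project and the final $match can be mirrored in plain JavaScript on the sample data. This is a sketch only: countWantedItems is a hypothetical helper, and fields that do not take part in the comparison are left out of the sample documents.

```javascript
// Sample data from the question (only the fields used by the check).
const fridges = [
  {
    _id: 1,
    items: [
      { itemId: 1, name: "beer" },
      { itemId: 2, name: "chicken" },
    ],
    nbMax: 2,
  },
  {
    _id: 2,
    items: [
      { itemId: 1, name: "beer" },
      { itemId: 2, name: "chicken" },
      { itemId: 3, name: "lettuce" },
    ],
    nbMax: 2,
  },
];

const wanted = ["beer", "lettuce"];

// Mirrors $size + $filter from the $project stage (hypothetical helper).
function countWantedItems(fridge) {
  return fridge.items.filter((item) => wanted.includes(item.name)).length;
}

// Mirrors { $match: { $expr: { $ne: ["$nbMax", "$itemsNumber"] } } }.
const kept = fridges
  .map((f) => ({ _id: f._id, itemsNumber: countWantedItems(f), nbMax: f.nbMax }))
  .filter((f) => f.nbMax !== f.itemsNumber);

console.log(kept);
```

Only the fridge with _id 1 survives, because its itemsNumber (1) differs from nbMax (2). The stage has to come after the $project, since itemsNumber does not exist before that point.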

MongoDB - Get the first matching document multiple times - Which is better?

Imagine a collection looks like this:
{
created_at: Date,
color: string, // 'blue' | 'red' | 'yellow'
}
I have thousands or millions of documents in a collection with random created_at date-times and different colors.
Now I want to have the latest created document for color 'blue' AND the latest created document for color 'red'.
Which approach is best? (Feel free to suggest another one)
A) Multiple queries in parallel
const latest = await Promise.all([
collection.findOne({ color: 'blue' }, { sort: { created_at: -1 } }),
collection.findOne({ color: 'red' }, { sort: { created_at: -1 } }),
]);
B) One aggregate query - group
const latest = await collection.aggregate([
{ $match: { color: { $in: ['blue', 'red'] } } },
{ $sort: { created_at: -1 } },
{
$group: {
_id: "$color",
latest: { $first: "$$ROOT" }
}
},
]).toArray();
C) One aggregate query - facet
const latest = await collection.aggregate([
{ $match: { color: { $in: ['blue', 'red'] } } },
{ $sort: { created_at: -1 } },
{
$facet: {
blue: [
{ $match: { color: 'blue' } },
{ $limit: 1 },
],
red: [
{ $match: { color: 'red' } },
{ $limit: 1 },
],
}
},
]).toArray();
Which one is the best for performance? What if I want to do this for more than 2 colors?
Sidequestion: I actually wonder about this a lot for $facet operations. Since $facet cannot use indices, it seems better to just do multiple queries in parallel in some situations. If you have lots of documents per group (in this case 'color'), using an index seems useful. So I guess option C is not very good. For option B, I wonder if MongoDB first needs to get all documents that match the generic condition, sort all of them, group all of them, to then only take the first...
Thanks in advance!
None of your solutions can use an index (which I guess is because you haven't created one).
Create the following index:
{ color: 1, created_at: -1 }
Then you can run explain plan on your queries and see how they behave.
I "think" the first one would be fastest.
It seems option A is the best, at least in the situation I tested.
I did a performance test as follows:
I created a collection with 3,000,000 documents with 2 fields:
created_at: random Date between 2018-01-01T00:00:00.000Z and 2030-01-01T00:00:00.000Z
color: random string, 31 different options
I then ran a query to get the latest created document for each of 17 different colors.
Results
Approach | No index | { color: 1, created_at: -1 } | { created_at: -1, color: 1 }
A1) One FindOne (shell) | 70 ms | 60 ms | 60 ms
A2) FindOnes in parallel (JS) | 21944 ms | 543 ms | 572 ms
B) Aggregate - Group - First | QueryExceededMemoryLimitNoDiskUseAllowed | 9120 ms | 10200 ms
C) Aggregate - Facet | QueryExceededMemoryLimitNoDiskUseAllowed | 3860 ms | 4770 ms
Option A1 is the fastest, but that's just for one color (I couldn't do it in parallel via the shell). So the real winner is option A2! Also note that A2 is done via a locally run JS script that connects to an external DB. It is still way faster than the other options, which are run directly in the shell.
I checked the impact of doing 5 findOnes in parallel instead of 17:
A3) FindOnes in parallel (JS) | 6656 ms | 334 ms | 339 ms
This is significantly faster. I suspect that if you need way more than 17 cases, other options might be better. It isn't relevant for my needs, so I didn't test this.
For completeness, the exact queries:
A1)
db.getCollection('colors').findOne({ color: 'Ivory' }, { sort: { created_at: -1 } })
A2)
const latest = await Promise.all(colors.map((c) => mongodb.collection('colors').findOne({ color: c }, { sort: { created_at: -1 } })));
B)
db.getCollection('colors').aggregate([
{ $match: { color: { $in: [
'Ivory',
'Teal',
'Silver',
'Purple',
'Navy blue',
'Pea green',
'Gray',
'Orange',
'Maroon',
'Charcoal',
'Aquamarine',
'Coral',
'Fuchsia',
'Wheat',
'Lime',
'Crimson',
'Khaki'
] } } },
{ $sort: { created_at: -1 } },
{
$group: {
_id: "$color",
latest: { $first: "$$ROOT" }
}
},
]);
C)
db.getCollection('colors').aggregate([
{ $match: { color: { $in: [
'Ivory',
'Teal',
'Silver',
'Purple',
'Navy blue',
'Pea green',
'Gray',
'Orange',
'Maroon',
'Charcoal',
'Aquamarine',
'Coral',
'Fuchsia',
'Wheat',
'Lime',
'Crimson',
'Khaki'
] } } },
{ $sort: { created_at: -1 } },
{
$facet: {
Ivory: [
{ $match: { color: 'Ivory' } },
{ $limit: 1 },
],
Teal: [
{ $match: { color: 'Teal' } },
{ $limit: 1 },
],
Silver: [
{ $match: { color: 'Silver' } },
{ $limit: 1 },
],
Purple: [
{ $match: { color: 'Purple' } },
{ $limit: 1 },
],
Navyblue: [
{ $match: { color: 'Navy blue' } },
{ $limit: 1 },
],
Peagreen: [
{ $match: { color: 'Pea green' } },
{ $limit: 1 },
],
Gray: [
{ $match: { color: 'Gray' } },
{ $limit: 1 },
],
Orange: [
{ $match: { color: 'Orange' } },
{ $limit: 1 },
],
Maroon: [
{ $match: { color: 'Maroon' } },
{ $limit: 1 },
],
Charcoal: [
{ $match: { color: 'Charcoal' } },
{ $limit: 1 },
],
Aquamarine: [
{ $match: { color: 'Aquamarine' } },
{ $limit: 1 },
],
Coral: [
{ $match: { color: 'Coral' } },
{ $limit: 1 },
],
Fuchsia: [
{ $match: { color: 'Fuchsia' } },
{ $limit: 1 },
],
Wheat: [
{ $match: { color: 'Wheat' } },
{ $limit: 1 },
],
Lime: [
{ $match: { color: 'Lime' } },
{ $limit: 1 },
],
Crimson: [
{ $match: { color: 'Crimson' } },
{ $limit: 1 },
],
Khaki: [
{ $match: { color: 'Khaki' } },
{ $limit: 1 },
],
}
},
]);
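The $facet stage in C) repeats the same two sub-stages for every color, so for a longer list it could be generated instead of typed out. A minimal sketch, assuming the facet names are the color names with spaces removed (as in the query above); buildFacetPipeline is a hypothetical helper name.

```javascript
// Hypothetical helper: builds the option C pipeline for any list of colors.
function buildFacetPipeline(colors) {
  const facets = {};
  for (const color of colors) {
    // Facet names mirror the query above: the color with spaces removed.
    const name = color.replace(/\s+/g, "");
    facets[name] = [{ $match: { color } }, { $limit: 1 }];
  }
  return [
    { $match: { color: { $in: colors } } },
    { $sort: { created_at: -1 } },
    { $facet: facets },
  ];
}

const pipeline = buildFacetPipeline(["Ivory", "Navy blue", "Pea green"]);
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array can be passed to aggregate() in place of the literal pipeline. It only saves typing; the query itself is unchanged, so the timings above would still apply.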

How to get distinct combinations of two fields from a collection when one of the fields is in an array of subdocuments

From a collection consisting of documents representing products similar to the following:
[
{
code: "0WE3A5CMY",
name: "lorem",
category: "voluptas",
variants: [
{
color: "PapayaWhip",
stock: 17,
barcode: 4937310396997
},
{
color: "RoyalBlue",
stock: 13,
barcode: 9787252504890
},
{
color: "DodgerBlue",
stock: 110,
barcode: 97194456959791
}
]
},
{
code: "0WE3A5CMX",
name: "ipsum",
category: "temporibus",
variants: [
{
color: "RoyalBlue",
stock: 113,
barcode: 23425202111840
},
{
color: "DodgerBlue",
stock: 10,
barcode: 2342520211841
}
]
},
{
code: "0WE3A5CMZ",
name: "dolor",
category: "temporibus",
variants: [
{
color: "MaroonRed",
stock: 17,
barcode: 3376911253701
},
{
color: "RoyalBlue",
stock: 12,
barcode: 3376911253702
},
{
color: "DodgerBlue",
stock: 4,
barcode: 3376911253703
}
]
}
]
I would like to retrieve distinct combinations of variants.color and category. So the result should be:
[
{
category: 'voluptas',
color: 'PapayaWhip',
},
{
category: 'voluptas',
color: 'RoyalBlue',
},
{
category: 'voluptas',
color: 'DodgerBlue',
},
{
category: 'temporibus',
color: 'RoyalBlue',
},
{
category: 'temporibus',
color: 'DodgerBlue',
}
]
Based on some cursory research I think I will have to use an aggregate, but I've never worked with those and could use some help. I've tried the solution at "How to efficiently perform "distinct" with multiple keys?"
I've also tried the method mentioned by jcarter in the comments, but it doesn't solve my problem. If I do:
db.products.aggregate([
{
$group: {
_id: {
"category": "$category",
"color": "$variants.color"
}
}
}
])
I get the result:
[
{
"_id": {
"category": "temporibus",
"color": [
"MaroonRed",
"RoyalBlue",
"DodgerBlue"
]
}
},
{
"_id": {
"category": "temporibus",
"color": [
"RoyalBlue",
"DodgerBlue"
]
}
},
{
"_id": {
"category": "voluptas",
"color": [
"PapayaWhip",
"RoyalBlue",
"DodgerBlue"
]
}
}
]
Which isn't what I need.
Since variants is an array, you need to $unwind it and then $group on the two fields to get unique documents per category + variants.color combination.
The $group stage produces results like this:
[
{
"_id": {
"category": "voluptas",
"color": "DodgerBlue"
}
},
{
"_id": {
"category": "voluptas",
"color": "PapayaWhip"
}
}
]
Then, using a $replaceRoot stage, you can promote the _id object to the root of each document to get the desired result.
Query:
db.collection.aggregate([
{
$unwind: "$variants"
},
{
$group: { _id: { "category": "$category", "color": "$variants.color" } }
},
{
$replaceRoot: { newRoot: "$_id" }
}
])
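The effect of the three stages can be traced in plain JavaScript on the sample products. This is a sketch under assumptions: distinctCombos is a hypothetical helper, and the sample documents are trimmed to the two fields that matter.

```javascript
// Sample products from the question, reduced to category and variant colors.
const products = [
  {
    category: "voluptas",
    variants: [{ color: "PapayaWhip" }, { color: "RoyalBlue" }, { color: "DodgerBlue" }],
  },
  {
    category: "temporibus",
    variants: [{ color: "RoyalBlue" }, { color: "DodgerBlue" }],
  },
  {
    category: "temporibus",
    variants: [{ color: "MaroonRed" }, { color: "RoyalBlue" }, { color: "DodgerBlue" }],
  },
];

// Hypothetical helper mirroring $unwind -> $group -> $replaceRoot.
function distinctCombos(docs) {
  const seen = new Map();
  for (const doc of docs) {
    // $unwind: visit each variant as if it were its own document.
    for (const variant of doc.variants) {
      // $group: the (category, color) pair is the grouping key.
      const key = JSON.stringify([doc.category, variant.color]);
      if (!seen.has(key)) {
        // $replaceRoot: the key fields become the whole output document.
        seen.set(key, { category: doc.category, color: variant.color });
      }
    }
  }
  return [...seen.values()];
}

console.log(distinctCombos(products));
```

On this data the result has six combinations, including temporibus / MaroonRed, which the expected output in the question leaves out. The order of documents coming out of $group is not guaranteed, so add a $sort stage if the order matters.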

Can I avoid using the same $match criteria twice when using $unwind?

Take the following data as an example:
{
_id: 1,
item: "abc",
stock: [
{ size: "S", color: "red", quantity: 25 },
{ size: "S", color: "blue", quantity: 10 },
{ size: "M", color: "blue", quantity: 50 }
]
}
{
_id: 2,
item: "def",
stock: [
{ size: "S", color: "blue", quantity: 20 },
{ size: "M", color: "blue", quantity: 5 },
{ size: "M", color: "black", quantity: 10 },
{ size: "L", color: "red", quantity: 2 }
]
}
{
_id: 3,
item: "ijk",
stock: [
{ size: "M", color: "blue", quantity: 15 },
{ size: "L", color: "blue", quantity: 100 },
{ size: "L", color: "red", quantity: 25 }
]
}
Say I want to keep only the stock entries that match the criterion size = 'L'. I already have a multikey index on the stock.size field.
In the aggregation pipeline, if I use the following two operations:
[{$unwind: {path: "$stock"}},
{$match: {"stock.size": "L"}}]
I will get the desired results, but when the db gets very large, the $unwind step will have to scan the whole collection, without utilizing the existing index, which is very inefficient.
If I reverse the order of the $unwind and $match operations, the $match will utilize the index to apply an early filtering, but the final result will not be as desired: it will fetch the extra stocks that are not of size L, but have sibling L-sized stocks that belong to the same item.
Would I have to use the same $match operation twice, i.e. both before and after the $unwind, to both utilize the index and return the correct results?
Yes, you can use the $match stage twice in the aggregation pipeline. Only the first $match can use the index; the second one runs on the unwound documents in memory and cannot use it.
[
{ "$match": { "stock.size": "L" }},
{ "$unwind": { "path": "$stock" }},
{ "$match": { "stock.size": "L" }}
]
If you want to avoid the second $match, use the $filter aggregation operator instead:
[
{ "$match": { "stock.size": "L" } },
{ "$addFields": {
"stock": {
"$filter": {
"input": "$stock",
"as": "st",
"cond": { "$eq": ["$$st.size", "L"] }
}
}
}}
]
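To see what the $match plus $filter combination returns, here is a plain JavaScript mirror of the two stages on the sample data. It is a sketch for illustration only; matched and filtered are hypothetical variable names, and no index is involved in this in-memory version.

```javascript
// Sample documents from the question.
const items = [
  {
    _id: 1,
    item: "abc",
    stock: [
      { size: "S", color: "red", quantity: 25 },
      { size: "S", color: "blue", quantity: 10 },
      { size: "M", color: "blue", quantity: 50 },
    ],
  },
  {
    _id: 2,
    item: "def",
    stock: [
      { size: "S", color: "blue", quantity: 20 },
      { size: "M", color: "blue", quantity: 5 },
      { size: "M", color: "black", quantity: 10 },
      { size: "L", color: "red", quantity: 2 },
    ],
  },
  {
    _id: 3,
    item: "ijk",
    stock: [
      { size: "M", color: "blue", quantity: 15 },
      { size: "L", color: "blue", quantity: 100 },
      { size: "L", color: "red", quantity: 25 },
    ],
  },
];

// Mirrors { $match: { "stock.size": "L" } }: keep documents with at least one L entry.
const matched = items.filter((doc) => doc.stock.some((s) => s.size === "L"));

// Mirrors $addFields + $filter: keep only the L entries inside each stock array.
const filtered = matched.map((doc) => ({
  ...doc,
  stock: doc.stock.filter((s) => s.size === "L"),
}));

console.log(JSON.stringify(filtered, null, 2));
```

Unlike the $unwind version, this keeps one document per item with a trimmed stock array. If one document per stock entry is needed, an $unwind can still follow the $addFields stage, and by then it only has to expand the L-sized entries.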

MongoDB Query - query on values of any key in a sub-object: $match combined with $elemMatch

How can I filter on all userIDs that have color blue and size 50 in the same element of the list? Only user 1347 should be output.
{
"userId": "12347",
"settings": [
{ name: "SettingA", color: "blue", size: 10 },
{ name: "SettingB", color: "blue", size: 20 },
{ name: "SettingC", color: "green", size: 50 }
],
}
{
"userId": "1347",
"settings": [
{ name: "SettingA", color: "blue", size: 10 },
{ name: "SettingB", color: "blue", size: 50 },
{ name: "SettingC", color: "green", size: 20 }
]
}
If this can be done with $elemMatch, how can I include it in the following query, assuming the following two conditions need to be met by the same list element: { "rounds.round_values.decision" : "Fold"},
{ "rounds.round_values.gameStage" : "PreFlop"}
I tried this query but it doesn't yield any results. I've read that this is because $elemMatch doesn't work in projections. But how can I tell $filter to only return objects that meet the $elemMatch conditions?
db.games.aggregate([
{ $match: { $and: [
{ Template: "PPStrategy4016" },
{ FinalOutcome: "Lost" }]
}},
{ $elemMatch: {
{ "rounds.round_values.decision" : "Fold"},
{ "rounds.round_values.gameStage" : "PreFlop"}
} },
{
$group: {
_id: null,
total: {
$sum: "$FinalFundsChange"
}
}
} ] )
Based on the given documents, the query would look something like this:
db.games.aggregate([
{ $unwind: "$settings" },
{ $match: { "settings.color": "blue", "settings.size": 50 } },
{ $group: { _id: null, total: { $sum: "$settings.size" } } }
])
If you have difficulties transforming it to your own domain, please supply some example documents from your domain.
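For the user documents in the question, the "same element" requirement is what $elemMatch expresses, and it can be used inside a $match stage. The sketch below shows such a stage and mirrors its semantics in plain JavaScript, next to the looser dot-notation match. The variable names are hypothetical, and it assumes one result per user is wanted.

```javascript
// Sample documents from the question.
const users = [
  {
    userId: "12347",
    settings: [
      { name: "SettingA", color: "blue", size: 10 },
      { name: "SettingB", color: "blue", size: 20 },
      { name: "SettingC", color: "green", size: 50 },
    ],
  },
  {
    userId: "1347",
    settings: [
      { name: "SettingA", color: "blue", size: 10 },
      { name: "SettingB", color: "blue", size: 50 },
      { name: "SettingC", color: "green", size: 20 },
    ],
  },
];

// Stage that could be used in the pipeline: both conditions on one array element.
const matchStage = {
  $match: { settings: { $elemMatch: { color: "blue", size: 50 } } },
};

// Mirror of $elemMatch: a single element must satisfy both conditions.
const sameElement = users.filter((u) =>
  u.settings.some((s) => s.color === "blue" && s.size === 50)
);

// Mirror of { "settings.color": "blue", "settings.size": 50 } without $elemMatch:
// the two conditions may be satisfied by different elements.
const anyElement = users.filter(
  (u) =>
    u.settings.some((s) => s.color === "blue") &&
    u.settings.some((s) => s.size === 50)
);

console.log(sameElement.map((u) => u.userId)); // [ '1347' ]
console.log(anyElement.map((u) => u.userId)); // [ '12347', '1347' ]
```

The second result shows why plain dot notation is not enough here: user 12347 has a blue setting and a size 50 setting, but not in the same element.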