How to use $match after $group in MongoDB aggregation - MongoDB

I have 4 products. I want to know the count of product-4 for users who have product-1 or product-2.
Sample data:
[
{
"user_id": 1,
"product_type": "product-1"
},
{
"user_id": 1,
"product_type": "product-4"
},
{
"user_id": 1,
"product_type": "product-4"
},
{
"user_id": 2,
"product_type": "product-1"
}
]
user-1 has two product-4 and one product-1 (so its count is 2)
user-2 has only product-1 and no product-4 (so its count is 0)
This is how I tried:
db.collection.aggregate([
{
$match: {
product_type: {
$in: [
"product-1​",
"product-2",
],
},
},
},
{
$group: {
_id: "$user_id",
},
},
{
$match: {
user_id: { $in: "$_id"}, // I want to use $group's result in here
product_type: "product-4",
},
}
]);
Expected results are:
[
{
"_id": 1,
"count": 2
},
{
"_id": 2,
"count": 0
}
]
Note:
I don't have a backend; I have to do this using MongoDB only.

Does this answer your question?
db.collection.aggregate([
{$group: {_id: "$user_id", data: {$push: "$product_type"}}},
{$match: {$expr: {$or: [
{$in: ["product-1", "$data"]},
{$in: ["product-2", "$data"]}
]}}},
{$project: {
count: {
$size: {
$filter: {
input: "$data",
cond: {$eq: ["$$this", "product-4"]}
}
}
}
}}
])
See how it works on the playground example
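For reference, running this on the sample data above should produce exactly the expected result (group output order is not guaranteed): user 1 passes the $match with data ["product-1", "product-4", "product-4"] and gets count: 2, while user 2 passes with data ["product-1"] and gets count: 0.
[
{ "_id": 1, "count": 2 },
{ "_id": 2, "count": 0 }
]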

Mongoose aggregate: how to create fields dynamically from the user request

Please, someone help me! I can't find a solution in the documentation or in other topics.
I'm using MongoDB aggregation in a Mongoose/Nest.js project to return document data with some formatting and filtering. The structure of my Mongo document looks like:
{
_id: '1',
outputs: [
{
fileName: 'fileName1',
data: [
{
columnName1: 3,
columnName2: 4,
........
columnName30: 5
},
{
columnName1: 1,
columnName2: 2,
........
columnName30: 3
},
...........
]
},
{
fileName: 'fileName1',
data: [
{
columnName1: 3,
columnName2: 4,
........
columnName30: 5
},
{
columnName1: 1,
columnName2: 2,
........
columnName30: 3
},
...........
]
}
........
]
}
I've already done some formatting, but now I need to include in the response only the fields requested by the user (columnNamesToChoose), and to filter their values by the gte and lte bounds on mainColumnName. Inside $project I was going to use mapping like this, but it doesn't work. Could you please help me fix this part of the code?
...columnNamesToChoose.map((columnName) => ({ [columnName]: {
$map: {
input: {
$filter: {
input: '$outputs.data',
as: 'item',
cond: {
$and: [
{ $gte: [`$$item.${mainColumnName}`, gte] },
{ $lte: [`$$item.${mainColumnName}`, lte] },
],
},
},
},
as: 'file',
in: `$$file.${columnName}`,
},
} })),
This is the full code of the aggregation:
mainColumnName = 'column1' (from the body of the user request)
columnNamesToChoose = ['column2', 'column5'] (from the body of the user request)
myModel.aggregate([
{
$match: { _id: Number(id) },
},
{ $unwind: '$outputs' },
{
$match: { 'outputs.fileName': fileName },
},
{
$project: {
_id: '$_id',
fileName: '$outputs.fileName',
[mainColumnName]: {
$map: {
input: {
$filter: {
input: '$outputs.data',
as: 'item',
cond: {
$and: [
{ $gte: [`$$item.${mainColumnName}`, gte] },
{ $lte: [`$$item.${mainColumnName}`, lte] },
],
},
},
},
as: 'file',
in: `$$file.${mainColumnName}`,
},
},
},
},
])
My result:
{
"0": {
"column2": [
4,
2,
1,
5
]
},
"1": {
"column5": [
1,
8,
9,
0
]
},
"_id": 1,
"fileName": "somefilename.txt",
"column1": [
3,
1,
2,
20
],
}
Expected result:
{
"_id": 1,
"fileName": "somefilename.txt",
"column1": [
3,
1,
2,
20
],
"column2": [
4,
2,
1,
5
],
"column5": [
1,
8,
9,
0
],
}
One option is to first $reduce and then $unwind, $match, and $group, where the $group stage is built dynamically in the code (with a for-loop) according to the input:
db.collection.aggregate([
{$match: {_id: id}},
{$project: {
outputs: {
$reduce: {
input: "$outputs",
initialValue: [],
in: {
$concatArrays: [
"$$value",
{$cond: [
{$eq: ["$$this.fileName", fileName]},
"$$this.data",
[]
]
}
]
}
}
}
}
},
{$unwind: "$outputs"},
{$match: {"outputs.columnName1": {$gte: gte, $lte: lte}}},
{$group: {
_id: 0,
column1: {$push: "$outputs.columnName1"},
column2: {$push: "$outputs.columnName2"},
column5: {$push: "$outputs.columnName5"}
}},
{$set: {fileName: fileName}}
])
See how it works on the playground example
In JS it will look something like:
const matchStage = {$match: {}};
matchStage.$match[`outputs.${mainColumnName}`] = {$gte: gte, $lte: lte};
const groupStage = {$group: {_id: 0}};
for (const col of columnNamesToChoose) {
  groupStage.$group[col] = {$push: `$outputs.${col}`}; // no literal quotes inside the template string, or Mongo pushes a plain string instead of the field value
}
const aggregation = [
{$match: {_id: id}},
{$project: {
outputs: {$reduce: {
input: "$outputs",
initialValue: [],
in: {$concatArrays: [
"$$value",
{$cond: [
{$eq: ["$$this.fileName", fileName]},
"$$this.data",
[]
]}
]}
}}
}},
{$unwind: "$outputs"},
matchStage,
groupStage,
{$set: {fileName: fileName}}
];
const res = await myModel.aggregate(aggregation)
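For example, with mainColumnName = 'column1' and columnNamesToChoose = ['column2', 'column5'] as above, the built stages should evaluate to:
{$match: {"outputs.column1": {$gte: gte, $lte: lte}}}
{$group: {_id: 0, column2: {$push: "$outputs.column2"}, column5: {$push: "$outputs.column5"}}}
Note that if column1 itself should also appear in the response, as in the expected result, mainColumnName presumably needs to be pushed as well (e.g. by including it in columnNamesToChoose).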

How to get or return only the matching object from a nested array using mongoose

{
"_id": "6339f99ee18b2481a04b4fe8",
"userId": "60a8a51cf2229813a45d2238",
"array1": [
{
"someId1": "6339f99ee18b2481a04b4fe9",
"customIndex": 2,
"array2": [
{
"someId2": "6339f99ee18b2481a04b4fea",
"startDate": 2022-10-10T19:56:26.000+00:00,
"endDate": 2022-10-12T19:56:26.000+00:00,
}
]
},
{
"someId1": "6345ca40112b743fd8172be0",
"customIndex": 4,
"array2": [
{
"someId2": "6345ca40112b743fd8172be1",
"startDate": 2022-10-10T19:56:26.000+00:00,
"endDate": 2022-10-27T19:56:26.000+00:00,
}
]
}
]
}
I have the above structure in MongoDB and want to get only the object from array1 that matches the condition endDate > 2022-10-17.
Here's what I tried:
result = await Collection.find({
  userId: { '$in': userIdList },
  'array1.array2.endDate': { '$gte': new Date('2022-10-17') } // an unquoted 2022-10-17 would be evaluated as arithmetic, not a date
})
But the above returns both objects from array1, even though the endDate for one of them is less than 2022-10-17.
How can I get the response like below? Also, am I using the right Mongoose calls for what I am trying to achieve?
Expected response that I am trying to achieve:
{
"_id": "6339f99ee18b2481a04b4fe8",
"userId": "60a8a51cf2229813a45d2238",
"array1": [
{
"someId1": "6345ca40112b743fd8172be0",
"customIndex": 4,
"array2": [
{
"someId2": "6345ca40112b743fd8172be1",
"startDate": 2022-10-10T19:56:26.000+00:00,
"endDate": 2022-10-27T19:56:26.000+00:00,
}
]
}
]
}
If array1 can contain several such items, and array2 can also contain several such items, one option is to use $reduce with $filter and $mergeObjects:
db.collection.aggregate([
{$match: {userId: {'$in': userIdList}}},
{$project: {
userId: 1,
array1: {
$reduce: {
input: "$array1",
initialValue: [],
in: {$concatArrays: [
"$$value",
[{$mergeObjects: [
"$$this",
{array2: {
$filter: {
input: "$$this.array2",
as: "innerItem",
cond: {$gte: [
"$$innerItem.endDate",
{$dateFromParts: {year: 2022, month: 10, day: 17}}
]}
}
}}
]}]
]}
}
}
}},
{$project: {
userId: 1,
array1: {$filter: {
input: "$array1",
cond: {$gt: [{$size: "$$this.array2"}, 0]}
}}
}}
])
See how it works on the playground example
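Since the question uses Mongoose, the same pipeline can be passed to the model's aggregate call. A minimal sketch, assuming the Collection model from the question and a cutoff date supplied from application code instead of $dateFromParts:
const cutoff = new Date('2022-10-17');
const result = await Collection.aggregate([
  { $match: { userId: { $in: userIdList } } },
  // ... same two $project stages as above, with the inner condition
  // written as: cond: { $gte: ['$$innerItem.endDate', cutoff] }
]);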

Get current state from snapshot documents - mongoDB

I'm trying to get a list of current holders at specific times from a collection. My collection looks like this:
[
{
"time": 1,
"holdings": [
{ "owner": "A", "tokens": 2 },
{ "owner": "B", "tokens": 1 }
]
},
{
"time": 2,
"holdings": [
{ "owner": "B", "tokens": 2 }
]
},
{
"time": 3,
"holdings": [
{ "owner": "A", "tokens": 3 },
{ "owner": "B", "tokens": 1 },
{ "owner": "C", "tokens": 1 }
]
},
{
"time": 4,
"holdings": [
{ "owner": "C", "tokens": 0 }
]
}
]
tokens shows an owner's current holdings only when they have changed since the previous document. I would like to change the collection so that holdings always includes the full current holdings at any point in time.
At time: 1, the holdings are: A: 2, B: 1.
At time: 2, the holdings are: A: 2, B: 2. The collection does not include A's holdings, however, because they haven't changed. So what I'd like to get is:
[
{
"time": 1,
"holdings": [
{ "owner": "A", "tokens": 2 },
{ "owner": "B", "tokens": 1 }
]
},
{
"time": 2,
"holdings": [
{ "owner": "A", "tokens": 2 }, // merged from prev doc.
{ "owner": "B", "tokens": 2 }
]
},
{
"time": 3,
"holdings": [
{ "owner": "A", "tokens": 3 },
{ "owner": "B", "tokens": 1 },
{ "owner": "C", "tokens": 1 }
]
},
{
"time": 4,
"holdings": [
{ "owner": "A", "tokens": 3 }, // merged from prev
{ "owner": "B", "tokens": 1 }, // merged from prev
{ "owner": "C", "tokens": 0 }
]
}
]
From what I understand, $mergeObjects does that, but I don't understand how I can merge all previous docs, in order, up to the current doc for each doc. So I think I'm looking for a way to combine $setWindowFields with $mergeObjects.
This is a nice challenge.
So far, I got this complicated solution:
1. Get all the timestamps that appear in any document - this is the purpose of the first 4 steps. $setWindowFields is used to accumulate this data.
2. $group by owner and calculate the missing timestamps as wantedTimes - next 5 steps.
3. $set the missing timestamps with tokens: null, to be filled with actual data, and $unwind to separate - next 3 steps.
4. Use $setWindowFields to find the last known token for each owner at each timestamp.
5. Fill in this last known state for documents with an unknown token - 2 steps.
6. $group and format the answer:
db.collection.aggregate([
{
$setWindowFields: {
sortBy: {time: 1},
output: {
allTimes: {$addToSet: "$time", window: {documents: ["unbounded", "current"]}
}
}
}
},
{
$setWindowFields: {
sortBy: {time: -1},
output: {
allTimes: {$addToSet: "$allTimes", window: {documents: ["unbounded", "current"]}
}
}
}
},
{
$set: {
allTimes: {
$reduce: {
input: "$allTimes",
initialValue: [],
in: {"$concatArrays": ["$$value", "$$this"]}
}
}
}
},
{$set: {allTimes: {$setIntersection: "$allTimes"}}},
{$unwind: "$holdings"},
{$sort: {time: 1}},
{$group: { _id: "$holdings.owner",
tokens: {$push: {tokens: "$holdings.tokens", time: "$time"}},
times: {$push: "$time"}, firstTime: {$first: "$time"},
allTimes: {$first: "$allTimes"}}
},
{
$addFields: {
wantedTimes: {
$filter: {
input: "$allTimes",
as: "item",
cond: {$gte: ["$$item", "$firstTime"]}
}
}
}
},
{
$project: {
tokens: 1,
wantedTimes: {$setDifference: ["$wantedTimes", "$times"]}
}
},
{
$set: {
data: {
$map: {
input: "$wantedTimes",
as: "item",
in: {time: "$$item", tokens: null}
}
}
}
},
{$project: {tokens: {"$concatArrays": ["$tokens", "$data"]}}},
{$unwind: "$tokens"},
{
$setWindowFields: {
partitionBy: "$_id",
sortBy: {"tokens.time": 1},
output: {
lastTokens: {
$push: "$tokens.tokens",
window: {documents: ["unbounded", "current"]}
}
}
}
},
{
$set: {
lastTokens: {
$filter: {
input: "$lastTokens",
as: "item",
cond: {$ne: ["$$item", null]}
}
}
}
},
{
$set: {
"tokens.tokens": {$ifNull: ["$tokens.tokens", {$last: "$lastTokens"}]}
}
},
{
$group: {
_id: "$tokens.time",
holdings: {$push: {owner: "$_id", tokens: "$tokens.tokens" }}
}
},
{$project: {time: "$_id", holdings: 1, _id: 0}},
{$sort: {time: 1}}
])
Playground example
From a performance perspective, I recommend you split this into 2 calls: the first is a quick findOne just to get the maximum time value in the collection.
Once you have that value, the pipeline can be much leaner:
const maxItem = await db.collection.findOne({}, { sort: { time: -1 } }); // findOne takes sort as an option; it returns a document, not a cursor with a .sort() method
db.collection.aggregate([
{
$unwind: "$holdings"
},
{
$group: {
_id: "$holdings.owner",
times: {
$push: {
time: "$time",
tokens: "$holdings.tokens"
}
},
minTime: {
$min: "$time"
}
}
},
{
$addFields: {
times: {
$reduce: {
input: {
$range: [
"$minTime",
maxItem.time + 1 // this is max time
]
},
initialValue: {
values: [],
lastIndex: 0
},
in: {
values: {
"$concatArrays": [
"$$value.values",
[
{
$cond: [
{
$in: [
"$$this",
"$times.time"
]
},
{
"$arrayElemAt": [
"$times",
"$$value.lastIndex"
]
},
{
"$mergeObjects": [
{
tokens: 0
},
{
"$arrayElemAt": [
"$times",
{
$subtract: [
"$$value.lastIndex",
1
]
}
]
},
{
time: "$$this"
}
]
}
]
}
]
]
},
lastIndex: {
$cond: [
{
$in: [
"$$this",
"$times.time"
]
},
{
$sum: [
"$$value.lastIndex",
1
]
},
"$$value.lastIndex"
]
}
}
}
}
}
},
{
$unwind: "$times.values"
},
{
$group: {
_id: "$times.values.time",
holdings: {
$push: {
owner: "$_id",
tokens: "$times.values.tokens"
}
}
}
},
{
$project: {
_id: 0,
time: "$_id",
holdings: 1
}
},
{
$sort: {
time: 1
}
}
])
This is still quite a heavy query, as it requires $unwind-ing and $group-ing the entire collection; however, there is no way around this given the requirements. If the collection is too big for this approach, I recommend iterating owner by owner, or time by time, and doing separate updates accordingly.
Mongo Playground
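A rough sketch of that owner-by-owner fallback (the driver calls and variable names here are illustrative, not part of the pipeline above):
const owners = await db.collection.distinct("holdings.owner");
for (const owner of owners) {
  // process one owner at a time to keep the $unwind/$group working set small
  const perOwner = await db.collection.aggregate([
    { $unwind: "$holdings" },
    { $match: { "holdings.owner": owner } },
    // ... same accumulation stages as above ...
  ]).toArray();
  // ... apply the corresponding updates for this owner ...
}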
If you don't care about performance at all and want it in a single query, you can still use the same pipeline, but you will have to extract the max time within the pipeline itself; this requires adding an initial $group stage, like so:
db.collection.aggregate([
{
$group: {
_id: null,
maxTime: {
$max: "$time"
},
roots: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$roots"
},
{
$replaceRoot: {
newRoot: {
"$mergeObjects": [
"$roots",
{
maxTime: "$maxTime"
}
]
}
}
},
... same pipeline ...
])
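Note that in this single-query variant, the maxItem.time + 1 bound inside $range can no longer come from a separate findOne call; it presumably becomes an expression on the merged-in maxTime field instead, e.g.:
$range: ["$minTime", {$add: ["$maxTime", 1]}]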

MongoDB - Aggregate get specific objects in an array

How can I get only the objects in the sales array matching the 2021-10-14 date?
My aggregate query currently returns all objects of the sales array if at least one matches.
Dataset Documents
{
"name": "#0",
"sales": [{
"date": "2021-10-14",
"price": 3.69,
},{
"date": "2021-10-15",
"price": 2.79,
}]
},
{
"name": "#1",
"sales": [{
"date": "2021-10-14",
"price": 1.5,
}]
}
Aggregate
{
$match: {
sales: {
$elemMatch: {
date: '2021-10-14',
},
},
},
},
{
$group: {
_id: 0,
data: {
$push: '$sales',
},
},
},
{
$project: {
data: {
$reduce: {
input: '$data',
initialValue: [],
in: {
$setUnion: ['$$value', '$$this'],
},
},
},
},
}
Result
{"date": "2021-10-14","price": 3.69},
{"date": "2021-10-15","price": 2.79},
{"date": "2021-10-14","price": 1.5}
Result Expected
{"date": "2021-10-14","price": 3.69},
{"date": "2021-10-14","price": 1.5}
You actually need to use a $replaceRoot or $replaceWith pipeline stage, which takes an expression that gives you the resulting document: filter the sales array with $filter and take the first matching element with $arrayElemAt (or $first):
[
{ $match: { 'sales.date': '2021-10-14' } },
{ $replaceWith: {
$arrayElemAt: [
{
$filter: {
input: '$sales',
cond: { $eq: ['$$this.date', '2021-10-14'] }
}
},
0
]
} }
]
OR
[
{ $match: { 'sales.date': '2021-10-14' } },
{ $replaceRoot: {
newRoot: {
$arrayElemAt: [
{
$filter: {
input: '$sales',
cond: { $eq: ['$$this.date', '2021-10-14'] }
}
},
0
]
}
} }
]
Mongo Playground
In the $project stage, you need the $filter operator, with the $reduce operator as its input, to filter the documents:
{
$project: {
data: {
$filter: {
input: {
$reduce: {
input: "$data",
initialValue: [],
in: {
$setUnion: [
"$$value",
"$$this"
],
}
}
},
cond: {
$eq: [
"$$this.date",
"2021-10-14"
]
}
}
}
}
}
Sample Mongo Playground
How about using $unwind:
.aggregate([
{$match: { sales: {$elemMatch: {date: '2021-10-14'} } }},
{$unwind: '$sales'},
{$match: {'sales.date': '2021-10-14'}},
{$project: {date: '$sales.date', price: '$sales.price', _id: 0}}
])
This will separate the sales into different documents, each containing only one sale, and allow you to match conditions easily.
See: https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/
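For illustration, after the $unwind and the second $match, the sample dataset should be reduced to one matching sale per document, which the final $project then reshapes into the expected result:
{ "name": "#0", "sales": { "date": "2021-10-14", "price": 3.69 } }
{ "name": "#1", "sales": { "date": "2021-10-14", "price": 1.5 } }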

Mongodb - aggregation $push if conditional

I am trying to aggregate a batch of documents. There are two fields in the documents I would like to $push. However, let's say they are the "_id" and "A" fields; I only want to $push "_id" and "A" if "A" is $gt 0.
I tried two approaches.
First one.
db.collection.aggregate([{
  "$group": {
    "_id": null, // $group requires an _id (elided in the original post)
    "field": {
      "$push": {
        "$cond": [
          {"$gt": ["$A", 0]},
          {"id": "$_id", "A": "$A"},
          null
        ]
      }
    },
    "secondField": {"$push": "$B"}
  }
}])
But this will push a null value to "field" and I don't want it.
Second one.
db.collection.aggregate([{
  "$group": {
    "field": {
      "$cond": [
        {"$gt": ["$A", 0]},
        {"$push": {"id": "$_id", "A": "$A"}},
        null
      ]
    },
    "secondField": {"$push": "$B"}
  }
}])
The second one simply doesn't work...
Is there a way to skip the $push in the else case?
ADDED:
Expected documents:
{
"_id":objectid(1),
"A":2,
"B":"One"
},
{
"_id":objectid(2),
"A":3,
"B":"Two"
},
{
"_id":objectid(3),
"B":"Three"
}
Expected Output:
{
"field":[
{
"A":"2",
"_id":objectid(1)
},
{
"A":"3",
"_id":objectid(2)
},
],
"secondField":["One", "Two", "Three"]
}
You can use "$$REMOVE":
This system variable was added in version 3.6 (mongodb docs)
db.collection.aggregate([{
  $group: {
    _id: null, // $group requires an _id
    field: {
      $push: {
        $cond: [
          { $gt: ["$A", 0] },
          { id: "$_id", A: "$A" },
          "$$REMOVE"
        ]
      }
    },
    secondField: { $push: "$B" }
  }
}])
In this way you don't have to filter nulls.
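On the sample documents from the question, this should yield the expected shape directly, with the third document (which has no "A" field, so $gt evaluates to false) contributing nothing to field:
{
"field": [
{ "id": objectid(1), "A": 2 },
{ "id": objectid(2), "A": 3 }
],
"secondField": ["One", "Two", "Three"]
}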
This is my answer to the question, after reading the post suggested by @Veeram:
db.collection.aggregate([{
  "$group": {
    "_id": null, // $group requires an _id
    "field": {
      "$push": {
        "$cond": [
          {"$gt": ["$A", 0]},
          {"id": "$_id", "A": "$A"},
          null
        ]
      }
    },
    "secondField": {"$push": "$B"}
  }
},
{
  "$project": {
    "field": {"$setDifference": ["$field", [null]]}, // strip the nulls pushed by the else branch
    "secondField": "$secondField"
  }
}])
One more option is to use the $filter operator:
db.collection.aggregate([
{
$group : {
_id: null,
field: { $push: { id: "$_id", A : "$A"}},
secondField:{ $push: "$B" }
}
},
{
$project: {
field: {
$filter: {
input: "$field",
as: "item",
cond: { $gt: [ "$$item.A", 0 ] }
}
},
secondField: "$secondField"
}
}])
In the first step you combine the arrays, and in the second step you filter them.
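The snippet below, a further variation on the $$REMOVE approach, combines it with $ifNull so that array entries with a missing tasks.id are skipped entirely, and the optional nested assignee field is only emitted when present: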
$group: {
_id: '$_id',
tasks: {
$addToSet: {
$cond: {
if: {
$eq: [
{
$ifNull: ['$tasks.id', ''],
},
'',
],
},
then: '$$REMOVE',
else: {
id: '$tasks.id',
description: '$tasks.description',
assignee: {
$cond: {
if: {
$eq: [
{
$ifNull: ['$tasks.assignee._id', ''],
},
'',
],
},
then: '$$REMOVE', // `undefined` is not a valid aggregation expression value; $$REMOVE omits the assignee field
else: {
id: '$tasks.assignee._id',
name: '$tasks.assignee.name',
thumbnail: '$tasks.assignee.thumbnail',
status: '$tasks.assignee.status',
},
},
},
},
},
},
},
}