How to construct a mongo query that aggregates on two fields?

TL;DR: How do I query Mongo in a way that aggregates a collection in two different ways?
I'm learning to query MongoDB. Suppose that I have a collection that contains the results of users submitting a questionnaire and then reviewers signing off that they have reviewed it.
A record in my collection looks like this:
{
_id: 0,
Questions:
[
{
label: "fname",
response: "Sir"
},
{
label: "lname",
response: "Robin"
},
{
label: "What is your name?",
response: "Sir Robin of Camelot"
},
{
label: "What is your quest?",
response: "To seek the holy grail"
},
{
label: "What is the capital of Asyria?",
response: "I don't know that."
}
],
Signatures:
[
"Lancelot",
"Arthur"
]
}
I am creating a report that will display a summary of each record.
For each record, I need to display the following:
The first name
The last name
The number of signatures.
I am able to write an aggregate query that gets the first and last name.
I am also able to write an aggregate query that gets the number of signatures.
However, I am stuck when I try to write an aggregate query that gets both.
// In order to query the first and last name, I use this query:
[
{ $unwind: "$Questions" },
{ $match: { "Questions.label": { $in: ["fname", "lname"] } } },
{ $project: { "Questions": 1 } },
{ $group: { _id: "$_id", Questions: { $push: "$Questions" } } }
]
// In order to query the number of signatures, I use this query:
[
{ $project: { "SignatureCount": { $size: "$Signatures" } } }
]
Everything works with these two queries, but I want to write a single query that returns the data all together.
So, for example, if the example record above were the only record in my collection, I would want the query to return this:
{
_id: 0,
Questions:
[
{
label: "fname",
response: "Sir"
},
{
label: "lname",
response: "Robin"
}
],
SignatureCount: 2
}

With MongoDB 3.4 or later, you can get both in a single $project stage.
Use $filter to filter the Questions array.
[
{"$match":{"Questions.label":{"$in":["fname", "lname"]}}},
{"$project":{
"Questions":{
"$filter":{
"input":"$Questions",
"as":"q",
"cond":{"$in":["$$q.label",["fname","lname"]]}
}
},
"SignatureCount":{"$size":"$Signatures"}
}}
]
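To see what the $project stage above computes, here is a minimal plain-JavaScript sketch that mirrors $filter and $size on the sample record from the question. It does not talk to MongoDB; the helper name summarize is hypothetical, and it assumes the field is stored as lowercase label, as in the sample record.

```javascript
// Sample record from the question (Questions trimmed to three entries).
const record = {
  _id: 0,
  Questions: [
    { label: "fname", response: "Sir" },
    { label: "lname", response: "Robin" },
    { label: "What is your quest?", response: "To seek the holy grail" },
  ],
  Signatures: ["Lancelot", "Arthur"],
};

// Hypothetical helper: plain-JS equivalent of the $project stage.
function summarize(doc, wantedLabels) {
  return {
    _id: doc._id,
    // $filter keeps only the array elements for which the condition holds.
    Questions: doc.Questions.filter((q) => wantedLabels.includes(q.label)),
    // $size returns the number of elements in the array.
    SignatureCount: doc.Signatures.length,
  };
}

const summary = summarize(record, ["fname", "lname"]);
console.log(JSON.stringify(summary, null, 2));
```

Both fields are derived from the same input document in one step, which is why no $unwind or $group is needed.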

Related

After aggregation, how to check whether two fields are equal inside a document in MongoDB

{
id: 1,
name: "sree",
userId: "001",
paymentData: {
user_Id: "001",
amount: 200
}
},
{
id: 1,
name: "sree",
userId: "001",
paymentData: {
user_Id: "002",
amount: 200
}
}
I got this result after $unwind in an aggregation. Is there any way to check whether user_Id is equal to userId?
Are you looking to only retrieve the results when they are equal (meaning you want to filter out documents where the values are not the same) or are you looking to add a field indicating whether the two are equal?
In either case, you append subsequent stage(s) to the aggregation pipeline to achieve your desired result. If you want to filter the documents, the new stage may be:
{
$match: {
$expr: {
$eq: [
"$userId",
"$paymentData.user_Id"
]
}
}
}
See how it works in this playground example.
If instead you want to add a field that compares the two values, then this stage may be what you are looking for:
{
$addFields: {
isEqual: {
$eq: [
"$userId",
"$paymentData.user_Id"
]
}
}
}
See how it works in this playground example.
You could also combine the two as in:
{
$addFields: {
isEqual: {
$eq: [
"$userId",
"$paymentData.user_Id"
]
}
}
},
{
$match: {
isEqual: true
}
}
Playground demonstration here
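The difference between filtering and flagging can be illustrated with a plain-JavaScript sketch on the two sample documents. It only mirrors the semantics of the stages rather than calling MongoDB, and the variable names are illustrative.

```javascript
const docs = [
  { id: 1, name: "sree", userId: "001", paymentData: { user_Id: "001", amount: 200 } },
  { id: 1, name: "sree", userId: "001", paymentData: { user_Id: "002", amount: 200 } },
];

// Like $addFields with $eq: every document is kept and gains a boolean flag.
const flagged = docs.map((d) => ({
  ...d,
  isEqual: d.userId === d.paymentData.user_Id,
}));

// Like $match with $expr/$eq: only documents where the two fields agree survive.
const matched = docs.filter((d) => d.userId === d.paymentData.user_Id);

console.log(flagged.map((d) => d.isEqual)); // [ true, false ]
console.log(matched.length); // 1
```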

How to write a query in mongo db to find elements in which the nested array does not have an element with the given id?

Given the following array of data:
[
{
"_id":"62d9553f2f7d45d51e9e6850",
"title":"lol",
"body":"lol",
"readIds":[
{
"id":"62c56b2cffe059fdfdd19230",
"createdAt":1658417024700
}
],
"mode":"custom",
"createdAt":"2022-07-21T13:31:43.633Z",
"updatedAt":"2022-07-21T15:23:44.701Z"
},
{
"_id":"62d954e2ea35817546d42702",
"title":"tesssst",
"body":"tesssst",
"readIds":[
],
"mode":"custom",
"createdAt":"2022-07-21T13:30:10.759Z",
"updatedAt":"2022-07-21T13:30:10.759Z"
},
]
I need to write a query that filters this array and returns only those elements whose readIds does not contain an object with id equal to '62c56b2cffe059fdfdd19230'.
My attempts were like this:
.find({
readIds: {
$elemMatch: {
id: {
$not: {
$eq: id,
},
},
},
},
})
I'm not sure I understood your question correctly, but doesn't the following work?
db.getCollection('stuff').find({"readIds.id": {$ne: "62c56b2cffe059fdfdd19230"}})
You can test it here: Mongo Playground
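Why the two queries behave differently: $elemMatch with $not/$eq asks whether at least one element has a different id, which can never be true for an empty array, while $ne on the dotted path asks whether no element has that id. A plain-JavaScript sketch of both conditions on trimmed versions of the sample documents (no MongoDB involved, variable names are illustrative):

```javascript
const targetId = "62c56b2cffe059fdfdd19230";
const docs = [
  { title: "lol", readIds: [{ id: targetId, createdAt: 1658417024700 }] },
  { title: "tesssst", readIds: [] },
];

// Mirrors { readIds: { $elemMatch: { id: { $not: { $eq: targetId } } } } }:
// at least one element must exist and have a different id.
const viaElemMatch = docs.filter((d) => d.readIds.some((r) => r.id !== targetId));

// Mirrors { "readIds.id": { $ne: targetId } }:
// no element may have the id, so empty arrays pass as well.
const viaNe = docs.filter((d) => !d.readIds.some((r) => r.id === targetId));

console.log(viaElemMatch.map((d) => d.title)); // []
console.log(viaNe.map((d) => d.title)); // [ 'tesssst' ]
```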

Optimize indexes in MongoDB

I have an Order collection with records looking like this:
{
"_id": ObjectId,
"status": String Enum,
"products": [{
"sku": String UUID,
...
}, ...],
...
},
My goal is to find which products users buy together. Given a SKU, I would like to browse past orders and find, for orders that contain more than one product and of course the product with the looked-up SKU, which other products were bought along with it.
So I created an aggregation pipeline that works:
[
// exclude cancelled orders
{
'$match': {
'status': {
'$nin': [
'CANCELLED', 'CHECK_OUT'
]
}
}
},
// add a fields with product size and just the products sku
{
'$addFields': {
'size': {
'$size': '$products'
},
'skus': '$products.sku'
}
},
// limit to orders with 2 products or more including the looked up SKU
{
'$match': {
'size': {
'$gte': 2
},
'skus': {
'$elemMatch': {
'$eq': '3516215049767'
}
}
}
},
// group by skus
{
'$unwind': {
'path': '$skus'
}
}, {
'$group': {
'_id': '$skus',
'count': {
'$sum': 1
}
}
},
// sort by count, exclude the looked up sku, limit to 4 results
{
'$sort': {
'count': -1
}
}, {
'$match': {
'_id': {
'$ne': '3516215049767'
}
}
}, {
'$limit': 4
}
]
Although this works, the collection contains more than 10K docs and I have an alert on my MongoDB instance telling me that the Scanned Objects / Returned ratio has gone above 1000.
So my question is: how can my query be improved, and what indexes can I add to improve this?
db.Orders.stats();
{
size: 14329835,
count: 10571,
avgObjSize: 1355,
storageSize: 4952064,
freeStorageSize: 307200,
capped: false,
nindexes: 2,
indexBuilds: [],
totalIndexSize: 466944,
totalSize: 5419008,
indexSizes: { _id_: 299008, status_1__created_at_1: 167936 },
scaleFactor: 1,
ok: 1,
operationTime: Timestamp({ t: 1635415716, i: 1 })
}
Let's start by rewriting the query a little to make it more efficient.
Currently you match all the orders with a certain status and only then start manipulating the data, which means every single stage works on a larger data set than needed.
What we can do is move all the conditions into the first stage. This is made possible by Mongo's dot notation, like so:
{
'$match': {
'status': {
'$nin': [
'CANCELLED', 'CHECK_OUT',
],
},
'products.sku': '3516215049767', // mongo allows you to do this using the dot notation.
'products.1': { $exists: true }, // this requires the array to have at least two elements.
},
},
This achieves two things:
We start the pipeline with only the relevant results, so there is no need to calculate the $size of the array for many irrelevant documents. This alone should boost your performance considerably.
We can now create a compound index that supports this specific query. Before, we couldn't do that, as index usage is limited to the first stage and that only included the status field. (As an aside, Mongo does optimize pipelines, but in this specific case no optimization was possible due to the use of $addFields.)
The index that I recommend building is:
{ status: 1, "products.sku": 1 }
This will give your pipeline the best possible match to start from.
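As a rough illustration of what the rewritten pipeline computes, here is a plain-JavaScript sketch. It assumes the $addFields stage is dropped, so the later stages would unwind products and group on products.sku instead of the removed skus field (that adaptation is an assumption, it is not spelled out above). The function name coPurchased and the sample orders are hypothetical, and nothing here calls MongoDB.

```javascript
// Hypothetical sample orders; SKUs "A" and "B" are made up for illustration.
const orders = [
  { status: "PAID", products: [{ sku: "3516215049767" }, { sku: "A" }, { sku: "B" }] },
  { status: "PAID", products: [{ sku: "3516215049767" }, { sku: "A" }] },
  { status: "CANCELLED", products: [{ sku: "3516215049767" }, { sku: "B" }] },
  { status: "PAID", products: [{ sku: "3516215049767" }] },
  { status: "PAID", products: [{ sku: "A" }, { sku: "B" }] },
];

function coPurchased(allOrders, sku, limit = 4) {
  const excluded = ["CANCELLED", "CHECK_OUT"];
  const counts = new Map();
  allOrders
    // Single $match: status filter, contains the SKU, at least two products.
    .filter(
      (o) =>
        !excluded.includes(o.status) &&
        o.products.some((p) => p.sku === sku) &&
        o.products.length >= 2 // same idea as 'products.1': { $exists: true }
    )
    // $unwind + $group: count each SKU across the remaining orders.
    .forEach((o) =>
      o.products.forEach((p) => counts.set(p.sku, (counts.get(p.sku) || 0) + 1))
    );
  return [...counts]
    .filter(([id]) => id !== sku) // $match: drop the looked-up SKU
    .sort((a, b) => b[1] - a[1]) // $sort: count descending
    .slice(0, limit) // $limit
    .map(([id, count]) => ({ _id: id, count }));
}

const result = coPurchased(orders, "3516215049767");
console.log(result);
```

Only the first two sample orders survive the initial filter, which is the point of moving every condition into the first stage: the later steps touch far fewer documents.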

How to query mongodb to fetch results based on values nested parameters?

I am working with MongoDB for the first time.
I have a collection in which each document is roughly of the following form:
{
"name":[
{
"value":"abc",
"created_on":"2020-02-06 06:11:21.340611+00:00"
},
{
"value":"xyz",
"created_on":"2020-02-07 06:11:21.340611+00:00"
}
],
"score":[
{
"value":12,
"created_on":"2020-02-06 06:11:21.340611+00:00"
},
{
"value":13,
"created_on":"2020-02-07 06:11:21.340611+00:00"
}
]
}
How would I form a query that gets the most recently updated value of each field in a given document? I went through Query Embedded Documents, but I wasn't able to figure out how to do it.
My expected output is:
{
"name": "xyz",
"score": "13"
}
If you always push new values onto the name and score arrays, you can try the query below. It takes the last element of each array, since new values are generally appended as the last element:
db.collection.aggregate([
{ $addFields: { name: { $arrayElemAt: ['$name', -1] }, score: { $arrayElemAt: ['$score', -1] } } },
{ $addFields: { name: '$name.value', score: '$score.value' } }])
Test : MongoDB-Playground
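For reference, this is what the two $addFields stages do, sketched in plain JavaScript on the sample document. The helper name lastValue is hypothetical, and the sketch shares the same assumption as the query: the newest entry is always the last element of each array.

```javascript
const doc = {
  name: [
    { value: "abc", created_on: "2020-02-06 06:11:21.340611+00:00" },
    { value: "xyz", created_on: "2020-02-07 06:11:21.340611+00:00" },
  ],
  score: [
    { value: 12, created_on: "2020-02-06 06:11:21.340611+00:00" },
    { value: 13, created_on: "2020-02-07 06:11:21.340611+00:00" },
  ],
};

// Mirrors { $arrayElemAt: [<array>, -1] } followed by reading its value field.
const lastValue = (arr) => (arr.length ? arr[arr.length - 1].value : undefined);

const latest = { name: lastValue(doc.name), score: lastValue(doc.score) };
console.log(latest); // { name: 'xyz', score: 13 }
```

Note that score comes back as the number 13 rather than the string "13" shown in the expected output, because the stored value is numeric.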

Aggregate and reduce a nested array based upon an ObjectId

I have an Event document structured like so and I'm trying to query against the employeeResponses array to gather all responses (which may or may not exist) for a single employee:
[
{
...
eventDate: 2019-10-08T03:30:15.000+00:00,
employeeResponses: [
{
_id:"5d978d372f263f41cc624727",
response: "Available to work.",
notes: ""
},
...etc
]
}
];
My current mongoose aggregation is:
const eventResponses = await Event.aggregate([
{
// find all events for a selected month
$match: {
eventDate: {
$gte: startOfMonth,
$lte: endOfMonth,
},
},
},
{
// unwind the employeeResponses array
$unwind: {
path: "$employeeResponses",
preserveNullAndEmptyArrays: true,
},
},
{
$group: {
_id: null,
responses: {
$push: {
// if a response id matches the employee's id, then
// include their response; otherwise, it's a "No response."
$cond: [
{ $eq: ["$employeeResponses._id", existingMember._id] },
"$employeeResponses.response",
"No response.",
],
},
},
},
},
{ $project: { _id: 0, responses: 1 } },
]);
As you'll no doubt notice, the query above won't work after more than 1 employee records a response because it treats each individual response as a T/F condition, instead of all of the responses within the employeeResponses array as a single T/F condition.
As a result, I had to remove all the stages after the initial $match and do a manual reduce:
const responses = eventResponses.reduce((acc, { employeeResponses }) => {
const foundResponse = employeeResponses.find(response => response._id.equals(existingMember._id));
return [...acc, foundResponse ? foundResponse.response : "No response."];
}, []);
I was wondering if it's possible to achieve the same result as the reduce above, perhaps using Mongo's $reduce operator? Or to refactor the aggregation query above so that it treats all responses within employeeResponses as a single T/F condition?
The ultimate goal of this aggregation is to extract the employee's previously recorded responses and/or lack of a response from each Event found within the current month and place them into a single array:
["I want to work.", "Available to work.", "Not available to work.", "No response.", "No response." ...etc]
You can use $filter with $map to reshape your data and filter by _id. Then you can keep using $push with $ifNull to provide a default value if the array is empty:
db.collection.aggregate([
{
$addFields: {
employeeResponses: {
$map: {
input: {
$filter: {
input: "$employeeResponses",
cond: {
$eq: [ "$$this._id", "5d978d372f263f41cc624727"]
}
}
},
in: "$$this.response"
}
}
}
},
{
$group: {
_id: null,
responses: { $push: { $ifNull: [ { $arrayElemAt: [ "$employeeResponses", 0 ] }, "No response." ] } }
}
}
])
Mongo Playground
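To combine this with the month filter from the question, the stages could be assembled roughly as follows. This is only a sketch: the helper name buildResponsesPipeline is hypothetical, and it assumes memberId has the same type as the stored _id values (as far as I know, Mongoose does not cast values inside aggregation pipelines, so a string will not match an ObjectId).

```javascript
// Hypothetical helper: returns a pipeline to pass to Event.aggregate(...).
function buildResponsesPipeline(startOfMonth, endOfMonth, memberId) {
  return [
    // Restrict to events in the selected month (taken from the question).
    { $match: { eventDate: { $gte: startOfMonth, $lte: endOfMonth } } },
    {
      // Keep only this employee's entries and reduce them to response strings.
      $addFields: {
        employeeResponses: {
          $map: {
            input: {
              $filter: {
                input: "$employeeResponses",
                cond: { $eq: ["$$this._id", memberId] },
              },
            },
            in: "$$this.response",
          },
        },
      },
    },
    {
      // One entry per event: the first matching response or the default text.
      $group: {
        _id: null,
        responses: {
          $push: {
            $ifNull: [{ $arrayElemAt: ["$employeeResponses", 0] }, "No response."],
          },
        },
      },
    },
    { $project: { _id: 0, responses: 1 } },
  ];
}

const pipeline = buildResponsesPipeline(
  new Date("2019-10-01T00:00:00Z"),
  new Date("2019-10-31T23:59:59Z"),
  "5d978d372f263f41cc624727"
);
console.log(JSON.stringify(pipeline, null, 2));
```

In the question's code this would be called as Event.aggregate(buildResponsesPipeline(startOfMonth, endOfMonth, existingMember._id)).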