Is there a way to write a nested query in mongoDB? - mongodb

I an new to mongoDB and I am trying to achieve below SQL query equivalent in mongoDB
SELECT ROUND((SELECT COUNT() FROM INFODOCS WHERE ML_PRIORITY = HIGH AND PROCESSOR_ID = userid)
/ (SELECT COUNT() FROM INFODOCS WHERE PROCESSOR_ID = userid) * 100)
AS EFFORTS FROM DUMMY;
EFFORTS = Total High Priority Infodocs / Total Infodocs for a given Processor
I tried to write an aggregation pipeline using $match, $group, $count but the issue is once I get an output for one subquery i did not find anyway how can i compute another subquery and finally use the outputs of both subquery to determine the final result.

The mongo-y way would not to execute two different queries to get the 2 different counts, but to do a sum it dynamically with one query.
You can achieve this in many different ways, here is an example how to use $cond while $grouping to do a conditional sum.
db.collection.aggregate([
{
$match: {
PROCESSOR_ID: "1"
},
},
{
$group: {
_id: null,
totalCount: {
$sum: 1
},
priorityHighCount: {
$sum: {
$cond: [
{
$eq: [
"$ML_PRIORITY",
"HIGH"
]
},
1,
0
]
}
}
}
},
{
$project: {
EFFORTS: {
$round: {
"$multiply": [
{
$divide: [
"$priorityHighCount",
"$totalCount"
]
},
100
]
}
}
}
}
])

Related

MongoDB - count by field, and sort by count

I am new to MongoDB, and new to making more than super basic queries and i didn't succeed to create a query that does as follows:
I have such collection, each document represents one "use" of a benefit (e.g first row states the benefit "123" was used once):
[
{
"id" : "1111",
"benefit_id":"123"
},
{
"id":"2222",
"benefit_id":"456"
},
{
"id":"3333",
"benefit_id":"456"
},
{
"id":"4444",
"benefit_id":"789"
}
]
I need to create q query that output an array. at the top is the most top used benefit and how many times is was used.
for the above example the query should output:
[
{
"benefit_id":"456",
"cnt":2
},
{
"benefit_id":"123",
"cnt": 1
},
{
"benefit_id":"789",
"cnt":1
}
]
I have tried to work with the documentation and with $sortByCount but with no success.
$group
$group by benefit_id and get count using $sum
$sort by count descending order
db.collection.aggregate([
{
$group: {
_id: "$benefit_id",
count: { $sum: 1 }
}
},
{ $sort: { count: -1 } }
])
Playground
$sortByCount
Same operation using $sortByCount operator
db.collection.aggregate([
{ $sortByCount: "$benefit_id" }
])
Playground

Is it possible to list in mongodb the list of elements whose value is less than 10% of another field?

I basically have a database where I record motorcycles and their mileage.
{
"motorcycle":"A",
"current_km":4600,
"review_km":5000
},
{
"motorcycle":"B",
"current_km":4000,
"review_km":5000
},
{
"motorcycle":"C",
"current_km":4900,
"review_km":5000
},
{
"motorcycle":"D",
"current_km":3000,
"review_km":5000
}
I have a field called current_km that determines your current mileage and I have another field called review_km, which consists of specifying the mileage in which your review should be done, as long as your current mileage (current_km) is greater than 10% of Mileage review (review_km).
So I would like to list the elements where:
current_km is greater than:
(review_km - ( review_km * 0.10))
for example:
current_km = 4600;
review_km = 5000;
result = 5000 - (5000 * 0.10);
4600 (current_km)> = 4500 (result) // in this case it is showed
In my database it would show the results of motorcycles A and C
how can I do it? I don't know if it is possible to do it in mongodb directly.
Need to use aggregation with $subtract and $multiply,
$addFields add new fields, we are generating result field, equation (review_km - ( review_km * 0.10)) using $subtract and $multiply
$match equation in $expr if current_km >= result if its correct then returns document
db.collection.aggregate([
{
$addFields: {
result: {
$subtract: [
"$review_km",
{
$multiply: [
"$review_km",
0.10
]
}
]
}
}
},
{
$match: {
$expr: {
$gte: [
"$current_km",
"$result"
]
}
}
}
])
Working Playground: https://mongoplayground.net/p/s2qenvuzLKF
Shorter version
If you don't want result field in response then combined condition in $match and $addFields is no longer needed
db.collection.aggregate([
{
$match: {
$expr: {
$gte: [
"$current_km",
{
$subtract: [
"$review_km",
{
$multiply: [
"$review_km",
0.10
]
}
]
}
]
}
}
}
])
Working Playground: https://mongoplayground.net/p/fii__3tTika

Aggregation is very slow

I have a collection with a structure similar to this.
{
"_id" : ObjectId("59d7cd63dc2c91e740afcdb"),
"dateJoined": ISODate("2014-12-28T16:37:17.984Z"),
"activatedMonth": 5,
"enrollments" : [
{ "month":-10, "enrolled":'00'},
{ "month":-9, "enrolled":'00'},
{ "month":-8, "enrolled":'01'},
//other months
{ "month":8, "enrolled":'11'},
{ "month":9, "enrolled":'11'},
{ "month":10, "enrolled":'00'}
]
}
month in enrollments sub document is a relative month from dateJoined.
activatedMonth is a month of activation relative to dateJoined. So, this will be different for each document.
I am using Mongodb aggregation framework to process queries like "Find all documents that are enrolled from 10 months before dateJoined activating to 25 months after dateJoined activating".
"enrolled" values 01, 10, 11 are considered enrolled and 00 is considered not enrolled. For a document to be considered to to be enrolled, it should be enrolled for every month in the range.
I am applying all the filters that I can apply in the match stage but this can be empty in most cases. In projection phase I am trying to find out the all the document with at least one not-enrolled month. if the size is zero, then the document is enrolled.
Below is the query that I am using. It takes 3 to 4 seconds to finish. It is more or less same time with or with out the group phase. My data is relatively smaller in size ( 0.9GB) and total number of documents are 41K and sub document count is approx. 13 million.
I need to reduce the processing time. I tried creating an index on enrollments.month and enrollment.enrolled and is of no use and I think it is because of the fact that project stage cant use indexes. Am I right?
Are there are any other things that I can do to the query or the collection structure to improve performance?
let startMonth = -10;
let endMonth = 25;
mongoose.connection.db.collection("collection").aggregate([
{
$match: filters
},
{
$project: {
_id: 0,
enrollments: {
$size: {
$filter: {
input: "$enrollment",
as: "enrollment",
cond: {
$and: [
{
$gte: [
'$$enrollment.month',
{
$add: [
startMonth,
"$activatedMonth"
]
}
]
},
{
$lte: [
'$$enrollment.month',
{
$add: [
startMonth,
"$activatedMonth"
]
}
]
},
{
$eq: [
'$$enrollment.enroll',
'00'
]
}
]
}
}
}
}
}
},
{
$match: {
enrollments: {
$eq: 0
}
}
},
{
$group: {
_id: null,
enrolled: {
$sum: 1
}
}
}
]).toArray(function(err,
result){
//some calculations
}
});
Also, I definitely need the group stage as I will group the counts based on different field. I have omitted this for simplicity.
Edit:
I have missed a key details in the initial post. Updated the question with the actual use case why I need projection with a calculation.
Edit 2:
I converted this to just a count query to see how it performs (based on comments to this question by Niel Lunn.
My query:
mongoose.connection.db.collection("collection")
.find({
"enrollment": {
"$not": {
"$elemMatch": { "month": { "$gte": startMonth, "$lte": endMonth }, "enrolled": "00" }
}
}
})
.count(function(e,count){
console.log(count);
});
This query is taking 1.6 seconds. I tried with following indexes separately:
1. { 'enrollment.month':1 }
2. { 'enrollment.month':1 }, { 'enrollment.enrolled':1 } -- two seperate indexes
3. { 'enrollment.month':1, 'enrollment.enrolled':1} - just one index on both fields.
Winning query plan is not using keys in any of these cases, it does a COLLSCAN always. What am I missing here?

Mongo $subtract date doesn't work in aggregation $match block

I am creating a mongo aggregation query which use a $subtract operator in my $match block. As explained in these codes below.
This query doesn't work:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: {
$subtract: [new Date(), 24 * 60 * 60 * 1000]
}
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
However, this second query work:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: new Date(new Date() - 24 * 60 * 60 * 1000)
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
I need to use $subtract on my $match block so I can't use the last query.
As of mongodb 3.6 you can use $subtract in the $match stage via the $expr. Here's the docs: https://docs.mongodb.com/manual/reference/operator/query/expr/
I was able to get a query like what you're describing via this $expr and a new system variable in mongodb 4.2 called $$NOW. Here is my query, which gives me orders that have been created within the last 4 hours:
[
{ $match:
{ $expr:
{ $gt: [
"$_created_at",
{ $subtract: [ "$$NOW", 4 * 60 * 60 * 1000] } ]
}
}
}
]
Well you cannot do that and you are not meant to do so either. Another valid thing is that you say to "need" to do this but in reality you really do not.
Pretty much all of the general aggregation operators outside of the pipeline operators are really only valid within a $project or a $group pipeline stage. Mostly within $project but certainly not in others.
A $match pipeline is really the same as a general "query" operation, so the only things valid in there are the query operators.
As for the case for your "need", any "value" that is submitted within an aggregation pipeline and particularly within a $match needs to be evaluated outside of the actual pipeline before the BSON representation is sent to the server.
The only exception is the notation that defines variables in the document, particularly "fieldnames" such a "$fieldname" and then only really in $project or $group. So that means something that "refers" to an existing value of a document, and that is something that cannot be done within any type of "query" document expression.
If you need to work with the value of another field in the document then you work it out with $project first, as in:
db.collection.aggregate([
{ "$project": {
"fieldMath": { "$subtract": [ "$fieldOne", "$fieldTwo" ] }
}},
{ "$match": { "fieldMath": { "$gt": 2 } }}
])
For any other purpose you really want to evaluate the value "outside" the pipeline.
The above answers the question you asked, but this answers the question you didn't ask.
Your pipeline doesn't make any sense since grouping on the "timestamp" alone would be unlikely to group anything since the values are of millisecond accuracy and there is likely not to be more than just a few at best for very active systems.
It appears like you are looking for the math to group by "day", which you can do like this:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]
},
"total": { "$sum": "$total" }
}}
])
That "rounds" your timestamp value to a single day and has a much better chance of "aggregating" something than you would otherwise have.
Or you can use the "date aggregation operators" to do much the same thing with a composite key.
So if you want to "query" then it evaluates externally. If you want to work on a value "within the document" then you must do so in either a $project or $group pipeline stage.
The $subtract operator is a projection-operator. It is only available during a $project step. So your options are:
(not recommended) Add a $project-step before your $match-step to convert the timestamp field of all documents for the following match-step. I would not recommend you to do this because this operation needs to be performed on every single document on your database and prevents the database from using an index on the timestamp field, so it could cost you a lot of performance.
(recommended) Generate the Date you want to match against in the shell / in your application. Generate a new Date() object, store it in a variable, subtract 24 hours from it and perform your 2nd query using that variable.

Best usage for MongoDB Aggregate request

I would like to highlight a list of _id documents (with a limit) ranked in descending order (via their timestamp) based on a list of ObjectId.
Corresponding to this:
db.collection.aggregate( [ { $match: { _id: { $in: [ObjectId("X"), ObjectId("Y") ] } } }, { $sort: { timestamp: -1 } }, { $group: { _id: "$_id" } }, { $skip: 0 }, { $limit: 100 } ] )
Knowing that the list from the loop may contain way more than 1000 ObjectId (in $in array), do you think my solution is viable? Is not there a faster and less resource intensive way?
Best Regards.