I am creating a mongo aggregation query which use a $subtract operator in my $match block. As explained in these codes below.
This query doesn't work:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: {
$subtract: [new Date(), 24 * 60 * 60 * 1000]
}
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
However, this second query work:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: new Date(new Date() - 24 * 60 * 60 * 1000)
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
I need to use $subtract on my $match block so I can't use the last query.
As of mongodb 3.6 you can use $subtract in the $match stage via the $expr. Here's the docs: https://docs.mongodb.com/manual/reference/operator/query/expr/
I was able to get a query like what you're describing via this $expr and a new system variable in mongodb 4.2 called $$NOW. Here is my query, which gives me orders that have been created within the last 4 hours:
[
{ $match:
{ $expr:
{ $gt: [
"$_created_at",
{ $subtract: [ "$$NOW", 4 * 60 * 60 * 1000] } ]
}
}
}
]
Well you cannot do that and you are not meant to do so either. Another valid thing is that you say to "need" to do this but in reality you really do not.
Pretty much all of the general aggregation operators outside of the pipeline operators are really only valid within a $project or a $group pipeline stage. Mostly within $project but certainly not in others.
A $match pipeline is really the same as a general "query" operation, so the only things valid in there are the query operators.
As for the case for your "need", any "value" that is submitted within an aggregation pipeline and particularly within a $match needs to be evaluated outside of the actual pipeline before the BSON representation is sent to the server.
The only exception is the notation that defines variables in the document, particularly "fieldnames" such a "$fieldname" and then only really in $project or $group. So that means something that "refers" to an existing value of a document, and that is something that cannot be done within any type of "query" document expression.
If you need to work with the value of another field in the document then you work it out with $project first, as in:
db.collection.aggregate([
{ "$project": {
"fieldMath": { "$subtract": [ "$fieldOne", "$fieldTwo" ] }
}},
{ "$match": { "fieldMath": { "$gt": 2 } }}
])
For any other purpose you really want to evaluate the value "outside" the pipeline.
The above answers the question you asked, but this answers the question you didn't ask.
Your pipeline doesn't make any sense since grouping on the "timestamp" alone would be unlikely to group anything since the values are of millisecond accuracy and there is likely not to be more than just a few at best for very active systems.
It appears like you are looking for the math to group by "day", which you can do like this:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]
},
"total": { "$sum": "$total" }
}}
])
That "rounds" your timestamp value to a single day and has a much better chance of "aggregating" something than you would otherwise have.
Or you can use the "date aggregation operators" to do much the same thing with a composite key.
So if you want to "query" then it evaluates externally. If you want to work on a value "within the document" then you must do so in either a $project or $group pipeline stage.
The $subtract operator is a projection-operator. It is only available during a $project step. So your options are:
(not recommended) Add a $project-step before your $match-step to convert the timestamp field of all documents for the following match-step. I would not recommend you to do this because this operation needs to be performed on every single document on your database and prevents the database from using an index on the timestamp field, so it could cost you a lot of performance.
(recommended) Generate the Date you want to match against in the shell / in your application. Generate a new Date() object, store it in a variable, subtract 24 hours from it and perform your 2nd query using that variable.
Related
I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format displayed above, you can split the id into two parts (using the dot character) and use aggregation to find the max element per each first array (numeric) element.
That way you can do it in a one shot, instead of iterating per each group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
As #danh mentioned in the comment, the best way you can do is probably adding an auxiliary field to indicate the grouping. You may further index the auxiliary field to boost the performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
Here is the Mongo playground for your reference.
I am trying to do this calculation but I am having some trouble. Anyone have any ideas how I can do calculation inside my aggregation? I wanna do something like this:
const test = await Test.aggregate([
{
$sort: {
$divide: [
'value',
Math.pow(1.1, new Date() - 'date'),
],
},
},
]);
For example here, I wanna do 1.1^number of days has passed. The Test schema has a "value" of type Float and a date of type date.
You have two issues here:
You're trying to use javascript functions within the pipeline, while this is possible by using $function it is not recommended, especially if you can execute the same logic using Mongodb operators.
$sort stage has this following structure:
{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }
As you can tell it's not being followed in your example as you're trying to use an expression.
So how can we solve these?
Well you can use $pow instead of Math.pow, $$NOW instead of new Date() and $subtract instead of the - javascript operator.
You will also need to add a "sortField" to sort by to match the $sort stage structure, all of this would look like this:
db.collection.aggregate([
{
"$addFields": {
"sortField": {
$divide: [
"$value",
{
$pow: [
1.1,
{
$subtract: [
"$$NOW",
"$date"
]
}
]
}
]
}
}
},
{
$sort: {
sortField: 1
}
}
])
Mongo Playground
Mind you subtracting dates will give you result in miliseconds, you will have to divide it by the required number ( 60 * 1000 * 60 * 24 for a day ) to get the right mesaurement.
I'm facing an issue where the aggregate function is performing very slowly where it takes about 30 seconds to gather all my data. Assume 1 of the record in this structure:
{
"_id":{
"$oid":"5909a5cefece40f172895a6b"
},
"Record":1,
"Link":"https://www.google.com",
"Location":["loc1", "loc2", "loc3"],
"Organization":["org1", "org2", "org3"],
"Date":2017,
"PeoplePPL":["ppl1", "ppl2", "ppl3"]
}
And the aggregate query as follows:
db.testdata_4.aggregate([{
"$unwind": "$PeoplePPL"
},{
"$unwind": "$Location"
},{
"$match": {
Date: {
$gte: lowerBoundYear,
$lte: upperBoundYear
}
}
},{
"$group": {
"_id": {
"People": "$PeoplePPL",
"Date": "$Date"
},
Links: {
$addToSet: "$Link"
},
Locations: {
$addToSet: "$Location"
}
}
},{
"$group": {
"_id": "$_id.People",
Record: {
$push: {
"Country": "$Locations",
"Year": "$_id.Date",
"Links": "$Links"
}
}
}
}]).toArray()
There are a total of 154 records in the "testdata_4" collection, and upon aggregation, there will be 5571 records returned with the query time of 28 seconds. I have performed the ensureIndex() on "Locations" and "Date". Is this supposed to be normal as the number of records returned increases?
If it isn't normal, may I know if there's a workaround to decrease my query time to 5 seconds at max instead of having it at 28 seconds or more?
It's very likely that the index on Date isn't being used.
The $match and $sort operators can take advantage of indexes when they are being used at the beginning of the pipeline. In this case, the filters are applied after several $unwind stages, which mean it likely isn't be used.
Suggestions:
Move the $match stage to the beginning of the pipeline
The "Location", "Date" and "Link" fields aren't arrays, so it isn't immediately clear why there are $unwind aggregation stages on these fields. You may want to remove these.
I have this query for the MongoDB aggregation framework. I cannot figure out why I can't get this query to run. I checked the documentation and am still perplexed. Can anyone let me know what is wrong.
db.acquisitions.aggregate([
{ $match: {"acquired_year":{$gte:1999} } },
{ $group: {_id:"$acquired_year", "total_acquisition_amount(BBn)": { $divide :[ {$sum:"$acquistion_price"}, 1000000000 ] } }},
{ $sort : {"acquired_year" : -1} }
])
Read the $group manual page, which also lists all valid "accumulators", which means the operators that must be the first argument to any field property referenced after the _id.
This should then lead you to work out that if you want to $divide on a summed total, you need to place that operation in a separate aggregation pipeline stage with $project:
db.acquisitions.aggregate([
{ "$match": { "acquired_year":{ "$gte": 1999 } }},
{ "$group": {
"_id":"$acquired_year",
"total_acquisition_amount(BBn)": { "$sum": "$acquistion_price" }
}},
{ "$project": {
"total_acquisition_amount(BBn)": {
"$divide": [ "$totatotal_acquisition_amount(BBn)", 1000000000 ]
}
}},
{ "$sort": { "_id": -1 }}
])
The only way you can otherwise use math and other operators is "within" an accumulator like $sum, which does not apply in this case since the division must occur "after" the total has been determined.
Also, as a result of $group, the "acquired_year" field is no longer part of the document emitted, but instead this is the _id value, so you apply the sort on that instead.
How can I query a specific month in mongodb, not date range, I need month to make a list of customer birthday for current month.
In SQL will be something like that:
SELECT * FROM customer WHERE MONTH(bday)='09'
Now I need to translate that in mongodb.
Note: My dates are already saved in MongoDate type, I used this thinking that will be easy to work before but now I can't find easily how to do this simple thing.
With MongoDB 3.6 and newer, you can use the $expr operator in your find() query. This allows you to build query expressions that compare fields from the same document in a $match stage.
db.customer.find({ "$expr": { "$eq": [{ "$month": "$bday" }, 9] } })
For other MongoDB versions, consider running an aggregation pipeline that uses the $redact operator as it allows you to incorporate with a single pipeline, a functionality with $project to create a field that represents the month of a date field and $match to filter the documents
which match the given condition of the month being September.
In the above, $redact uses $cond tenary operator as means to provide the conditional expression that will create the system variable which does the redaction. The logical expression in $cond will check
for an equality of a date operator field with a given value, if that matches then $redact will return the documents using the $$KEEP system variable and discards otherwise using $$PRUNE.
Running the following pipeline should give you the desired result:
db.customer.aggregate([
{ "$match": { "bday": { "$exists": true } } },
{
"$redact": {
"$cond": [
{ "$eq": [{ "$month": "$bday" }, 9] },
"$$KEEP",
"$$PRUNE"
]
}
}
])
This is similar to a $project +$match combo but you'd need to then select all the rest of the fields that go into the pipeline:
db.customer.aggregate([
{ "$match": { "bday": { "$exists": true } } },
{
"$project": {
"month": { "$month": "$bday" },
"bday": 1,
"field1": 1,
"field2": 1,
.....
}
},
{ "$match": { "month": 9 } }
])
With another alternative, albeit slow query, using the find() method with $where as:
db.customer.find({ "$where": "this.bday.getMonth() === 8" })
You can do that using aggregate with the $month projection operator:
db.customer.aggregate([
{$project: {name: 1, month: {$month: '$bday'}}},
{$match: {month: 9}}
]);
First, you need to check whether the data type is in ISODate.
IF not you can change the data type as the following example.
db.collectionName.find().forEach(function(each_object_from_collection){each_object_from_collection.your_date_field=new ISODate(each_object_from_collection.your_date_field);db.collectionName.save(each_object_from_collection);})
Now you can find it in two ways
db.collectionName.find({ $expr: {
$eq: [{ $year: "$your_date_field" }, 2017]
}});
Or by aggregation
db.collectionName.aggregate([{$project: {field1_you_need_in_result: 1,field12_you_need_in_result: 1,your_year_variable: {$year: '$your_date_field'}, your_month_variable: {$month: '$your_date_field'}}},{$match: {your_year_variable:2017, your_month_variable: 3}}]);
Yes you can fetch this result within date like this ,
db.collection.find({
$expr: {
$and: [
{
"$eq": [
{
"$month": "$date"
},
3
]
},
{
"$eq": [
{
"$year": "$date"
},
2020
]
}
]
}
})
If you're concerned about efficiency, you may want to store the month data in a separate field within each document.