How can I add a computed field using an arbitrary function with Mongo aggregate framework?

I'm using MongoDB's Aggregate Framework. I have an existing field in each of my documents:
time: Date
And I wish to create a new field, timeBlock, based on a simple function:
var dateToTimeBlock = function(dateString, timeBlock){
return new Date(dateString).valueOf() / timeBlock
}
I understand that $group can add fields, but the functions used to calculate those fields seem to be built into Mongo, e.g. $avg, $add, etc. Is it possible to generate a computed value based on an arbitrary function?

You can compute fields using the aggregation framework. If you want to use native JavaScript functions (like valueOf on a Date object), you will have to use Map-Reduce.
Although the aggregation framework is not as flexible as Map-Reduce, in most cases it will be significantly faster. If performance is critical, I would precalculate those values and use a simple query instead.
If you want to use aggregation, you can simplify it by converting the Date into an integer, or by adding a new integer field to the document. You can then do your calculations easily with $divide.
db.coll.aggregate([
{ $project : { dateToTime: { $divide : [ "$timeInt", timeBlock ] }}}
]);
Or if timeBlock is a field in the same document you can do it like this:
db.coll.aggregate([
{ $project : { dateToTime: { $divide : [ "$timeInt", "$timeBlock" ] }}}
]);
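On MongoDB 4.0 or later the extra integer field may not be needed, because the $toLong operator converts a Date to milliseconds since the epoch inside the pipeline. The following is a minimal sketch under that assumption; the helper name buildTimeBlockPipeline, the field name time (taken from the question) and the one-hour block size are illustrative and not part of the original answer.

```javascript
// Hypothetical helper: builds a $project stage that mirrors
// new Date(time).valueOf() / timeBlock from the question.
function buildTimeBlockPipeline(timeBlockMs) {
  return [
    {
      $project: {
        time: 1,
        // $toLong turns a Date into milliseconds since the Unix epoch (MongoDB 4.0+).
        timeBlock: { $divide: [{ $toLong: "$time" }, timeBlockMs] }
      }
    }
  ];
}

// Plain-JavaScript version of the same arithmetic, handy for checking results locally.
function dateToTimeBlock(dateString, timeBlockMs) {
  return new Date(dateString).valueOf() / timeBlockMs;
}

const ONE_HOUR_MS = 60 * 60 * 1000; // assumed block size
const timeBlockPipeline = buildTimeBlockPipeline(ONE_HOUR_MS);

// In the shell this would be passed as: db.coll.aggregate(timeBlockPipeline)
console.log(JSON.stringify(timeBlockPipeline));
console.log(dateToTimeBlock("1970-01-01T02:00:00Z", ONE_HOUR_MS)); // 2
```

Wrap the $divide in $floor if you want whole block numbers rather than fractional ones.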

Related

Difference between aggregation and operator

I have been reading up on some mongodb documentation and ran across some confusing terminology, namely how to differentiate when a symbol will be used as an aggregate function or as an operator.
For example, the $size function either calculates the number of items in an array or checks whether the number of elements in an array equals a given number. Is there any way to know which of the two it will do in a given place? Through trial and error I discovered that $size throws an error in the $match stage unless a number is passed to it, but is there some rule or guideline so I can know beforehand what it will do?
db.collection.aggregate([
{
$project: {
key: 1,
number: {
$size: "$key"
}
}
},
{
$match: {
key: {
$size: 1
}
}
}
])

To query data in MongoDB you can use the find method or the aggregate method, and there are operators you can use with each.
Query operators are used with the find method. Some of them can also be used in the $match stage of the aggregate method (details later in the post).
Aggregation pipeline operators are used within the aggregate method's stages. Some of them can also be used with the find method (details later in the post).
You will notice that there are common operator names, for example $eq, $gte, $or, $type, $size, etc., but their usage and/or functionality can differ. The $eq operator has the same function but a different usage syntax, and the $type operator has different functionality (and usage syntax).
And some of these operators can be used with both methods.
Some Usage Scenarios:
Let's consider a users collection and some queries:
{ "_id" : 1, "age" : 21, "firstname" : "John" }
{ "_id" : 2, "age" : 18, "firstname" : "John" }
{ "_id" : 3, "age" : "39", "firstname" : "Johnson" }
The query:
db.users.find( { firstname: { $eq: "John"}, age: { $gt: 20 } } )
This query's filter is the same as { firstname: "John", age: { $gt: 20 } }. It uses the query operators $eq and $gt. The same query can be written in the aggregate method's $match stage:
db.users.aggregate([
{ $match: { firstname: "John", age: { $gt: 20 } } },
])
The operators used in this case are the same query operators. The query comparison operators can be used with the aggregate method's $match and $lookup stages.
Another Scenario:
db.users.aggregate([
{ $project: { ageGreaterThan20: { $gt: [ "$age", 20 ] } } },
])
This is the aggregation comparison operator $gt. Note that it is used within an aggregation query, but in the $project stage rather than in $match.
Just as you can use query operators in an aggregation query, you can also use aggregation operators within the find method, but they must be wrapped in the "special" $expr operator. For example:
db.users.find( { $expr: { $gt: [ "$age", 20 ] } } )
The advantage of $expr is that it makes a number of aggregation operators available within find queries. For example, using $strLenCP:
db.users.find( { $expr: { $gt: [ { $strLenCP: "$firstname" }, 4 ] } } )
You can also use the $expr within the aggregation, within the $match or $lookup stages:
db.users.aggregate([
{ $match: { $expr: { $gt: [ "$age", 20 ] } } },
])
Finally:
I have been reading up on some mongodb documentation and ran across
some confusing terminology, namely how to differentiate when a symbol
will be used as an aggregate function or as an operator. ... but is
there some rule/guideline so I can know what it will do beforehand?
Reading helps, practicing helps more, and experience helps the most. You use a specific operator in a specific scenario or use case; to achieve some functionality, you pick the appropriate method and operator(s).
Reference: Operators
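To make the two meanings of $size from the question concrete, here is a small sketch in plain JavaScript. The two functions are local stand-ins written for illustration, not MongoDB APIs: the first mimics the query operator form { key: { $size: 1 } }, the second the aggregation expression form { $size: "$key" }.

```javascript
// Query operator form, { field: { $size: n } }:
// a yes/no test that the array has exactly n elements.
function matchesQuerySize(doc, field, n) {
  return Array.isArray(doc[field]) && doc[field].length === n;
}

// Aggregation expression form, { $size: "$field" }:
// returns the length itself and fails when the value is not an array.
function evalExprSize(doc, field) {
  if (!Array.isArray(doc[field])) {
    throw new TypeError("The argument to $size must resolve to an array");
  }
  return doc[field].length;
}

const sampleDoc = { key: ["a", "b"] };

console.log(matchesQuerySize(sampleDoc, "key", 1)); // false
console.log(matchesQuerySize(sampleDoc, "key", 2)); // true
console.log(evalExprSize(sampleDoc, "key"));        // 2
```

The position decides the meaning: directly under a field name in a find filter or $match stage it is the query operator, while inside $project, $addFields or $expr it is the aggregation expression.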

As per my understanding, I'm pointing out a few things:
Difference between aggregation and operator
You have certain basic functions for CRUD operations on MongoDB, such as .find(), .insert(), .delete() or .update() (there are a few others, like .count() and .distinct(), but those are the primary ones).
Versus
Aggregation is a whole framework, heavily used for complex reads; only two aggregation stages are capable of writes: $out and $merge.
Operators:
There are different types of operators:
Query and Projection Operators: These operators are crucial and are used to filter docs and to transform the fields of docs in the response. They are usually used in the filter and project parts of .find(filter, project) or .update(filter, update, project/options), etc. They are also used in the $match stage of an aggregation pipeline ($match is similar to the filter in .find()). Ex.: $and, $or, $in & more.
Update Operators: As the name suggests, these operators help to update documents in the collection. They are usually used in the update part of .update() or .findOneAndUpdate(), etc. Ex.: $set, $unset, $inc & more.
Aggregation:
It is called the aggregation framework for a reason: you can do a lot of things with data using aggregation.
Aggregation has the aggregation pipeline, which has the syntax .aggregate([]). The pipeline is an array of stages, and each stage performs a certain operation on the data flowing through it. Ex.: $match, $project, $group, etc.
Since each document is independent, most aggregation stages operate independently on each doc flowing through them.
Aggregation Pipeline Operators:
These operators are generally used to achieve what you're looking for; certain operators can't be used in conjunction with other operators or in certain stages.
For instance, in the $match stage you would mostly use query operators rather than aggregation operators, because aggregation operators can't be used directly in $match, in contrast with the $project or $addFields stages.
Example with Stages & Operators:
Let's say you have to make a smoothie:
Out of a bunch of groceries you would filter the needed fruits (docs in a fruits collection) by matching them against what you want to blend, using the $match stage ($and to match fruits/veggies, $lt to keep only fruits that haven't expired, and $size to limit the number of fruits needed).
You would peel off the skin or chop the fruits into pieces to keep just the useful parts of the fruits (docs), using $project.
You would group all of the fruits into a blender using $group and add ingredients like cream & sugar using $addFields. You're cautious about the sugar quantity, so you'll use operators like $size to check the size, $multiply the number of nutrients based on conditions ($cond), and $divide sugar by nutrients to work out the nutrition value.
You'll iterate on adding ice pieces again and again using $map, and use $filter to remove uncrushed ice pieces.
Uhh, you always forget to add the veggies (a different collection that needs to be merged in): based on the fruits that have already been blended, you use $lookup to look up matching veggies (docs) and either blend again with $group or just put them on top.
Finally, you either drink it from the blender (just return the docs, no more stages) or pour it into a glass using the $out or $merge stage (remember, as I said, these are the only two stages that can write to a collection).
Of course every stage is optional. You can eat the fruits as they are instead of blending (get all docs and all their fields), but you wouldn't do that, for many reasons such as performance and unnecessary data flowing through the network, unless it's a pre-prepared product (like a small configuration collection with only a few docs, where every doc is needed and can easily be retrieved in one DB call).
Note:
Usually you can't use aggregation operators in the filter part, i.e. .find(filter), or in the $match stage unless you use $expr.
Starting with MongoDB v4.2 you can run an aggregation pipeline in an update (update-with-an-aggregation-pipeline), where you can take advantage of aggregation stages/operators in the update part.
While searching MongoDB's documentation you will often find the same operator in multiple places. If you search for $size, say, you will find multiple references: as a query operator, as a projection operator, or as an aggregation operator. So, depending on where you want to use it, refer to the relevant documentation, because although the name is the same and the behavior may look similar, the functionality and the syntax can differ.
You can always learn MongoDB for free at MongoDB University.

Mongodb shell aggregate query comparing dates not returning results

I can't spot what the issue is here. This returns nothing:
db.mycoll.aggregate([
{$project:{CreatedAt:"$CreatedAt",Now:"$$NOW",DateFloor:{$add:["$$NOW",-24*60*60000]}}},
{$match:{CreatedAt:{$gte:"$DateFloor"}}}
])
But this returns results when DateFloor is replaced with an actual value:
db.mycoll.aggregate([
{$project:{CreatedAt:"$CreatedAt",Now:"$$NOW",DateFloor:{$add:["$$NOW",-24*60*60000]}}},
{$match:{CreatedAt:{$gte: ISODate("2020-04-28T23:17:56.547Z")}}}
])

The issue with your query is here:
{$match:{CreatedAt:{$gte:"$DateFloor"}}}
This checks for documents whose CreatedAt value is greater than or equal to the literal string "$DateFloor". $match does not treat "$DateFloor" as another field in the same document; it treats it as a string value. To compare two fields of the same document you need to use $gte with $expr (which lets you use aggregation expressions within the query language).
{
$match: {
$expr: {
$gte: ["$CreatedAt", "$DateFloor"]
}
}
}
You might wonder what I mean by aggregation expressions and why $gte needs to be wrapped inside $expr. In your original query, $gte is the query comparison operator; in the query above, $gte is the aggregation pipeline operator. Both perform the same comparison, but only the aggregation form can compare two fields of the same document.
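Putting the pieces together, the whole corrected pipeline could look like the sketch below. It keeps the field names from the question; the constant DAY_MS and the local check function isOnOrAfterFloor are illustrative additions, and $$NOW assumes MongoDB 4.2 or later.

```javascript
const DAY_MS = 24 * 60 * 60000; // one day in milliseconds

const dateFloorPipeline = [
  // Compute the cutoff as a field on every document.
  { $project: { CreatedAt: 1, DateFloor: { $add: ["$$NOW", -DAY_MS] } } },
  // $expr makes "$CreatedAt" and "$DateFloor" field references, not string literals.
  { $match: { $expr: { $gte: ["$CreatedAt", "$DateFloor"] } } }
];

// Local stand-in (not a MongoDB API) for the comparison the $match stage performs.
function isOnOrAfterFloor(createdAt, now) {
  return createdAt.getTime() >= now.getTime() - DAY_MS;
}

// In the shell this would be passed as: db.mycoll.aggregate(dateFloorPipeline)
const now = new Date("2020-04-29T23:17:56.547Z");
console.log(isOnOrAfterFloor(new Date("2020-04-29T00:00:00Z"), now)); // true
console.log(isOnOrAfterFloor(new Date("2020-04-27T00:00:00Z"), now)); // false
```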

How to extract the creation date of _id and add it as a new field using the aggregation framework?

I have been trying for a while to extract the insertion date of a MongoDB document and add it as a new field to the same document.
I'm trying to do it with the aggregation framework in the mongo shell, without good results.
Here is my query
db.getCollection('my_collection').aggregate(
[
{
$match: {
MY QUERY CRITERIA
}
},
{
$addFields: { "insertTime": "$_id.getTimestamp()" }
}
]
)
I am trying to extract the insertion time from _id using the getTimestamp() function, but there must be something about the aggregation framework syntax that I am missing, because the query does not do what I want.
This works perfectly:
ObjectId("5c34f746ccb26800019edd53").getTimestamp()
ISODate("2019-01-08T19:17:26Z")
But this does not work at all:
"$_id.getTimestamp()"
What am I missing?
Thanks in advance
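No answer to this question was captured in this thread. One approach that should work on MongoDB 4.0 or later is the $toDate operator, which converts an ObjectId to the date embedded in it. The sketch below is an illustration under that assumption, not a confirmed answer from the thread; objectIdHexToDate is a hypothetical helper that only shows where the date comes from.

```javascript
// Inside a pipeline, a "$..." string is only ever a field path, so
// "$_id.getTimestamp()" is looked up as a field name and never executed.
const insertTimePipeline = [
  // { $match: { /* your query criteria */ } },
  { $addFields: { insertTime: { $toDate: "$_id" } } }
];

// Hypothetical helper: the first 4 bytes (8 hex characters) of an ObjectId
// hold the creation time in seconds since the Unix epoch.
function objectIdHexToDate(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// In the shell: db.getCollection('my_collection').aggregate(insertTimePipeline)
console.log(objectIdHexToDate("5c34f746ccb26800019edd53").toISOString());
// 2019-01-08T19:17:26.000Z
```

The printed value matches the getTimestamp() result shown in the question.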

Mongodb query to select less than and equal based on custom comparator

I am using a MongoDB database and I need to run a less-than-or-equal filter based on a custom comparator. More details follow.
The "profile" collection has a "level" field stored as a string:
{"name":"Test1", "level":"intermediate"}
The possible values of level, in ascending order of weight, are:
novice
intermediate
experienced
advance
I want to write a query like the one below so that it returns all profile documents whose level is less than or equal to "experienced" (i.e. it includes results for "novice", "intermediate" and "experienced"):
db.profile.find( { level: { $lte: "experienced" } } )
I understand that I need to provide a custom comparator. But how can I do that?

You can't use custom comparators in a MongoDB Query. The ones available are: $eq, $gt, $gte, $lt, $lte, $ne, $in, $nin.
You can, however, use $in to get what you want:
db.profile.find( { level: { $in: [ "experienced", "intermediate", "novice" ] } } );
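If the cutoff level varies, the $in list can be generated from the ordering instead of being typed by hand. A minimal sketch, assuming the four levels are ranked in the order given in the question; the names LEVELS and levelsUpTo are made up for this example.

```javascript
// Levels in ascending order, as listed in the question.
const LEVELS = ["novice", "intermediate", "experienced", "advance"];

// Hypothetical helper: returns every level ranked at or below maxLevel.
function levelsUpTo(maxLevel) {
  const idx = LEVELS.indexOf(maxLevel);
  if (idx === -1) {
    throw new Error("Unknown level: " + maxLevel);
  }
  return LEVELS.slice(0, idx + 1);
}

const levelFilter = { level: { $in: levelsUpTo("experienced") } };

// In the shell: db.profile.find(levelFilter)
console.log(JSON.stringify(levelFilter));
// {"level":{"$in":["novice","intermediate","experienced"]}}
```

Storing a numeric weight next to the string would let a plain $lte do the same job, at the cost of changing the documents.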

How to add a field to a document which contains the result of the comparison of two other fields

I would like to speed up a query on my MongoDB which uses $where to compare two fields in the document, which seems to be really slow.
My query looks like this:
db.mycollection.find({ $where : "this.lastCheckDate < this.modificationDate" })
What I would like to do is add a field to my document, e.g. isCheckDateLowerThenModDate, on which I could execute a probably much faster query:
db.mycollection.find({"isCheckDateLowerThenModDate":true})
I am quite new to MongoDB and have no idea how to do this. I would appreciate it if someone could give me some hints or examples on:
How to initialize such a field on an existing collection
How to maintain this field. Which means how to update this field when lastCheckDate or modificationDate changes.
Thanks in advance for your help!

You are thinking along the right lines!
1. How to initialize such a field on an existing collection.
The simplest way is to load each document (from your language), calculate this field, update and save.
Or you could perform the update via the mongo shell:
db.mycollection.find().forEach(function(doc) {
if(doc.lastCheckDate < doc.modificationDate)
{
doc.isCheckDateLowerThenModDate = true;
}
else
{
doc.isCheckDateLowerThenModDate = false;
}
db.mycollection.save(doc);
});
2. How to maintain this field, i.e. how to update it when lastCheckDate or modificationDate changes.
You have to do it yourself from your client code. Write a wrapper for the update and save operations and recalculate this value there each time. To be absolutely sure that this update works, write unit tests.
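Such a wrapper could be as small as the sketch below. It is only an illustration of the idea: buildUpdate is a hypothetical function that takes the stored document and the changed values and returns a $set document that always carries the recomputed flag.

```javascript
// Hypothetical wrapper: `current` is the stored document, `changes` holds the
// new field values. The returned object can be used as the update argument.
function buildUpdate(current, changes) {
  // Merge without mutating either input, so the flag reflects the final state.
  const merged = Object.assign({}, current, changes);
  const flag = merged.lastCheckDate < merged.modificationDate;
  return {
    $set: Object.assign({}, changes, { isCheckDateLowerThenModDate: flag })
  };
}

const stored = {
  lastCheckDate: new Date("2020-01-02T00:00:00Z"),
  modificationDate: new Date("2020-01-01T00:00:00Z")
};

// The document is modified after the last check, so the flag becomes true.
const afterEdit = buildUpdate(stored, { modificationDate: new Date("2020-01-03T00:00:00Z") });
console.log(afterEdit.$set.isCheckDateLowerThenModDate); // true

// The document is checked again, so the flag becomes false.
const afterCheck = buildUpdate(stored, { lastCheckDate: new Date("2020-01-05T00:00:00Z") });
console.log(afterCheck.$set.isCheckDateLowerThenModDate); // false
```

Because the function is pure, the unit tests recommended above need no database connection.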

The $where clause is slow because it is evaluating each document using the JavaScript interpreter.
There are a few alternatives:
1) Assuming your use case is "look for records that need updating", take advantage of a sparse index:
add a boolean field like needsChecking and $set this whenever the modificationDate is updated
in your "check" procedure, find the documents that have this field set (should be fast due to the sparse index)
db.mycollection.find({'needsChecking':true});
after you've done whatever check is needed, $unset the needsChecking field.
2) A new (and faster) feature in MongoDB 2.2 is the Aggregation Framework.
Here is an example of adding an "isUpdated" field based on the date comparison, and then filtering the matching documents:
db.mycollection.aggregate([
{ $project: {
_id: 1,
name: 1,
type: 1,
modificationDate: 1,
lastCheckDate: 1,
isUpdated: { $gt:["$modificationDate","$lastCheckDate"] }
}},
{ $match : {
isUpdated : true
}}
])
Some current caveats of using the Aggregation Framework are:
you have to specify fields to include aside from _id
the result is limited to the current maximum BSON document size (16 MB in MongoDB 2.2)
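On MongoDB 3.6 or later, the same two-field comparison can also be written directly in a find filter with $expr, which avoids the JavaScript interpreter that $where relies on. A minimal sketch under that version assumption; needsCheck is a local stand-in for illustration, not a MongoDB API.

```javascript
// Filter that compares two fields of the same document (MongoDB 3.6+).
const staleFilter = { $expr: { $lt: ["$lastCheckDate", "$modificationDate"] } };

// Local stand-in for the comparison the filter expresses.
function needsCheck(doc) {
  return doc.lastCheckDate < doc.modificationDate;
}

// In the shell: db.mycollection.find(staleFilter)
console.log(needsCheck({
  lastCheckDate: new Date("2020-01-01T00:00:00Z"),
  modificationDate: new Date("2020-01-02T00:00:00Z")
})); // true
```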
