Within a MongoDB $match, how to test for field MATCHING rather than field EQUALLING

Can anyone tell me how to add a $match stage to an aggregation pipeline that filters for entries where a field MATCHES a query (and may contain other data too), rather than limiting results to entries where the field EQUALS the query?
The query specification...
var query = {hello:"world"};
...can be used to retrieve the following documents using the find() operation of MongoDB's native Node driver, where the query 'map' is interpreted as a match...
{hello:"world"}
{hello:"world", extra:"data"}
...like...
collection.find(query);
The same query map can also be interpreted as a match when used with $elemMatch to retrieve documents with matching entries contained in arrays like these documents...
{
greetings:[
{hello:"world"},
]
}
{
greetings:[
{hello:"world", extra:"data"},
]
}
{
greetings:[
{hello:"world"},
{aloha:"mars"},
]
}
...using an invocation like [PIPELINE1] ...
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
]).toArray()
However, trying to get a list of the matching greetings with unwind [PIPELINE2] ...
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
]).toArray()
...produces all the array entries inside the documents with any matching entries, including the entries which don't match (simplified result)...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world", extra:"data"}},
{greetings:{hello:"world"}},
{greetings:{aloha:"mars"}},
]
I have been trying to add a second match stage, but I was surprised to find that it limited results only to those where the greetings field EQUALS the query, rather than where it MATCHES the query [PIPELINE3].
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
{$match:{greetings:query}},
]).toArray()
Unfortunately PIPELINE3 produces only the following entries, excluding the matching hello world entry with the extra:"data", since that entry is not strictly 'equal' to the query (simplified result)...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
]
...where what I need as the result is rather...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
{greetings:{"hello":"world","extra":"data"}}
]
How can I add a second $match stage to PIPELINE2 that filters for entries where the greetings field MATCHES the query (and may contain other data too), rather than limiting results to entries where the greetings field EQUALS the query?

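What you're seeing in the results is correct. When the value in a $match condition is an embedded document, as in {greetings:{hello:"world"}}, MongoDB requires the whole embedded document to be equal, including its exact set of fields and their order. To match on a single field inside the embedded document, use dot notation instead: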
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
{$match:{"greetings.hello":"world"}},
]).toArray()
With this, you should get the following output:
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
{greetings:{"hello":"world","extra":"data"}}
]
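If the query object isn't hard-coded, the dot-notation filter for the last $match can be derived from it. Below is a minimal sketch, assuming a flat query whose values are scalars like {hello:"world"}; the helper name toDotNotation is hypothetical and not part of the driver.

```javascript
// Hypothetical helper: prefix each top-level key of a flat query object with
// the name of the unwound field, so {hello:"world"} becomes {"greetings.hello":"world"}.
// Assumes the query's values are scalars (strings, numbers, booleans).
function toDotNotation(prefix, query) {
  const filter = {};
  for (const [key, value] of Object.entries(query)) {
    filter[`${prefix}.${key}`] = value;
  }
  return filter;
}

const query = { hello: "world" };

const pipeline = [
  { $match: { greetings: { $elemMatch: query } } },
  { $unwind: "$greetings" },
  // Field-by-field match instead of whole-document equality
  { $match: toDotNotation("greetings", query) },
];

console.log(JSON.stringify(pipeline[2])); // {"$match":{"greetings.hello":"world"}}
```

The resulting pipeline is passed to collection.aggregate(pipeline).toArray() as before. If a query value were itself a plain embedded document, prefixing it this way would bring back whole-document equality for that value, so nested queries would need to be flattened recursively.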
Whenever you're building an aggregation pipeline and want it to yield the documents you expect, start with just the first stage, then add the subsequent stages one at a time and check the output after each one.
The output of your $unwind stage would be:
[{
greetings:{hello:"world"}
},
{
greetings:{hello:"world", extra:"data"}
},
{
greetings:{hello:"world"}
},
{
greetings:{aloha:"mars"}
}]
Now if we include the third stage that you used, it matches only the documents whose greetings value is exactly {hello:"world"}. Only two documents in the pipeline have that exact value, so you would only get:
{ "greetings" : { "hello" : "world" } }
{ "greetings" : { "hello" : "world" } }
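The difference between the two filters can be modelled in plain JavaScript on the four unwound documents. This is a simplified illustration, assuming flat, string-valued subdocuments, not how the server evaluates queries:

```javascript
// The four documents produced by the $unwind stage (_id omitted).
const unwound = [
  { greetings: { hello: "world" } },
  { greetings: { hello: "world", extra: "data" } },
  { greetings: { hello: "world" } },
  { greetings: { aloha: "mars" } },
];

// Models {$match: {greetings: {hello: "world"}}}: the embedded document must be
// exactly equal (same fields, same values, same order). Comparing JSON strings is
// adequate for these flat, string-valued documents.
function exactMatch(doc, value) {
  return JSON.stringify(doc.greetings) === JSON.stringify(value);
}

// Models {$match: {"greetings.hello": "world"}}: only the named field is compared,
// so extra fields in the embedded document don't matter.
function fieldMatch(doc, field, value) {
  return doc.greetings !== null && typeof doc.greetings === "object" && doc.greetings[field] === value;
}

const exact = unwound.filter((d) => exactMatch(d, { hello: "world" }));
const byField = unwound.filter((d) => fieldMatch(d, "hello", "world"));

console.log(exact.length); // 2
console.log(byField.length); // 3
```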

Related

MongoDB aggregate $count

I would like to count the number of documents returned by an aggregation.
I'm sure my initial aggregation works, because I use it later in my program. To do so, I created a pipeline variable (here called pipelineTest; it's quite long, which is why I don't include its lines here, but ask me if you want to see it in detail).
To count the number of documents returned, I push this stage onto my pipeline:
{$count: "totalCount"}
Now I would like to get (or log) the totalCount value. What should I do?
Here is the aggregation:
pipelineTest.push({$count: "totalCount"});
cursorTest = collection.aggregate(pipelineTest, options)
console.log(cursorTest.?)
Thanks for your help. I have read a lot of documentation about aggregation and I still don't understand how to read the result of an aggregation...
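collection.aggregate() returns a cursor, not the documents themselves. Assuming you're using async/await syntax, convert the cursor to an array with toArray(), await it, then take the first element of that array and access totalCount.
pipelineTest.push({$count: "totalCount"});
const results = await collection.aggregate(pipelineTest, options).toArray();
// $count outputs no document at all when nothing reaches it, so guard against an empty array
const totalCount = results.length > 0 ? results[0].totalCount : 0;
console.log(totalCount);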
Aggregation
db.mycollection.aggregate([
{
$count: "totalCount"
}
])
Result
[ { totalCount: 3 } ]
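The $count stage is documented as equivalent to a $group on _id: null with $sum: 1, followed by a $project that removes _id. The plain-JavaScript sketch below models that sequence to show where the [ { totalCount: 3 } ] shape comes from; groupCount is a hypothetical name, and the model ignores everything except counting.

```javascript
// Hypothetical in-memory model of:
//   { $group: { _id: null, totalCount: { $sum: 1 } } }, { $project: { _id: 0 } }
function groupCount(docs, field) {
  const groups = new Map();
  for (const _doc of docs) {
    // Every document falls into the single group whose _id is null.
    const group = groups.get(null) ?? { _id: null, [field]: 0 };
    group[field] += 1; // $sum: 1
    groups.set(null, group);
  }
  // $group only emits groups it has seen; $project: { _id: 0 } then drops _id.
  return [...groups.values()].map(({ _id, ...rest }) => rest);
}

console.log(groupCount([{}, {}, {}], "totalCount")); // [ { totalCount: 3 } ]
console.log(groupCount([], "totalCount")); // []
```

Because no group exists when nothing reaches the stage, an empty input gives an empty array rather than { totalCount: 0 }.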
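Applied to your code
Read the single document that $count produces from the cursor:
pipelineTest.push({$count: "totalCount"});
const result = await collection.aggregate(pipelineTest, options).next();
console.log(result ? result.totalCount : 0); // next() resolves to null when the cursor is empty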

What is the difference between MongoDB find and aggregate in the queries below?

select records using aggregate:
db.getCollection('stock_records').aggregate(
[
{
"$project": {
"info.created_date": 1,
"info.store_id": 1,
"info.store_name": 1,
"_id": 1
}
},
{
"$match": {
"$and": [
{
"info.store_id": "563dcf3465512285781608802a"
},
{
"info.created_date": {
$gt: ISODate("2021-07-18T21:07:42.313+00:00")
}
}
]
}
}
])
select records using find:
db.getCollection('stock_records').find(
{
'info.store_id':'563dcf3465512285781608802a',
'info.created_date':{ $gt:ISODate('2021-07-18T21:07:42.313+00:00')}
})
What is the difference between these queries, and which is better for selecting by id and date condition?
I think your question should be rephrased to "what's the difference between find and aggregate".
Before I dive into that, I will say that both commands are similar and will generally perform the same at scale. The specific difference in your example is that you did not add a projection to your find query, so it will return the full documents.
Regarding which is better: generally speaking, unless you need a specific aggregation operator, it's best to use find instead, because it performs better.
Now, why is the aggregation framework's performance "worse"? It's simple: it just does "more".
For any pipeline stage, aggregation needs to fetch the BSON for each document and convert it to an internal object for processing in the pipeline; at the end of the pipeline the results are converted back to BSON and sent to the client.
This has a significant overhead, especially for large queries, compared to a find, where the BSON is just sent back to the client.
Because of this, if you could execute your aggregation as a find query, you should.
Aggregation is slower than find.
In your example, Aggregation
In the first stage, you are passing all the documents through $project.
For example, if your collection has 1000 documents, all 1000 documents come out of this stage, each with the projected fields. This will impact the performance of your query. (MongoDB's pipeline optimizer can move a $match that only uses existing fields ahead of a preceding $project, but writing $match first makes the intent explicit instead of relying on that.)
Then in the second stage, you are filtering the documents that match the query filter.
For example, out of the 1000 documents from stage 1, you select only a few.
In your example, find
First, you are filtering the documents that match the query filter.
For example, if your collection has 1000 documents, only the documents that match the query condition are returned.
Here you did not specify the fields to return in the documents that match the query filter, so the returned documents will have all fields.
You can use a projection in find instead of using aggregation:
db.getCollection('stock_records').find(
{
'info.store_id': '563dcf3465512285781608802a',
'info.created_date': {
$gt: ISODate('2021-07-18T21:07:42.313+00:00')
}
},
{
"info.created_date": 1,
"info.store_id": 1,
"info.store_name": 1,
"_id": 1
}
)
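If you do want to keep using aggregate(), putting $match first and $project second states the intent directly: filter, then shape what is left. A sketch for the Node.js driver, where a JavaScript Date takes the place of the shell's ISODate(); the function name buildStockPipeline is hypothetical:

```javascript
// Hypothetical helper: $match first, so the filter runs before any reshaping
// (and can use an index on info.store_id / info.created_date, if one exists).
function buildStockPipeline(storeId, createdAfter) {
  return [
    {
      $match: {
        "info.store_id": storeId,
        "info.created_date": { $gt: createdAfter },
      },
    },
    {
      $project: {
        "info.created_date": 1,
        "info.store_id": 1,
        "info.store_name": 1,
        _id: 1,
      },
    },
  ];
}

const pipeline = buildStockPipeline(
  "563dcf3465512285781608802a",
  new Date("2021-07-18T21:07:42.313Z")
);

console.log(Object.keys(pipeline[0])[0]); // $match
// With the Node.js driver: await db.collection("stock_records").aggregate(pipeline).toArray();
```

The explicit $and from the original pipeline isn't needed: conditions on different fields in one filter object are already combined with AND.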

Difference between aggregation and operator

I have been reading up on some mongodb documentation and ran across some confusing terminology, namely how to differentiate when a symbol will be used as an aggregate function or as an operator.
For example, $size either calculates the number of items in an array or checks whether the number of elements in an array equals a number. Is there any way to know which one it will do at a given time? Through trial and error I discovered that $size throws an error in the $match step unless a number is passed to it, but is there some rule or guideline so I can know what it will do beforehand?
db.collection.aggregate([
{
$project: {
key: 1,
number: {
$size: "$key"
}
}
},
{
$match: {
key: {
$size: 1
}
}
}
])
For querying data in MongoDB you can use the find method or the aggregate method. There are operators which you can use with these methods.
These query operators are used with the find method. Some of these can also be used with the $match stage of the aggregate method (details later in the post).
These aggregation pipeline operators are used within the aggregate's stages. Some of these can also be used with the find method (details later in the post).
You will notice that there are common operator names; for example, $eq, $gte, $or, $type, $size, etc. But their usage and/or functionality can be different. The $eq operator has the same function but a different usage syntax, and the $type operator has different functionality (and usage syntax).
And, some of these operators can be used with both the methods.
Some Usage Scenarios:
Let's consider a users collection and some queries:
{ "_id" : 1, "age" : 21, "firstname" : "John" }
{ "_id" : 2, "age" : 18, "firstname" : "John" }
{ "_id" : 3, "age" : "39", "firstname" : "Johnson" }
The query:
db.users.find( { firstname: { $eq: "John"}, age: { $gt: 20 } } )
This query's filter is the same as { firstname: "John", age: { $gt: 20 } }. It uses the query operators $eq and $gt. The same query can be written in the aggregate method's $match stage:
db.users.aggregate([
{ $match: { firstname: "John", age: { $gt: 20 } } },
])
The operators used in this case are the same query operators. The query comparison operators can be used with the aggregate method's $match and $lookup stages.
Another Scenario:
db.users.aggregate([
{ $project: { ageGreaterThan20: { $gt: [ "$age", 20 ] } } },
])
This uses the aggregation comparison operator $gt. Note that it is used in the aggregation query, but within the $project stage.
Just as you can use query operators in an aggregation query, you can also use aggregation operators in the find method. But they must be used inside the "special" $expr operator. For example:
db.users.find( { $expr: { $gt: [ "$age", 20 ] } } )
The advantage of $expr is that a number of aggregation operators can then be used within find queries. For example, using $strLenCP:
db.users.find( { $expr: { $gt: [ { $strLenCP: "$firstname" }, 4 ] } } )
You can also use $expr in an aggregation, within the $match or $lookup stages:
db.users.aggregate([
{ $match: { $expr: { $gt: [ "$age", 20 ] } } },
])
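The sample data includes one document whose age is the string "39", and that is where the two $gt operators visibly differ. The query operator $gt only compares values of the same type (type bracketing), so { age: { $gt: 20 } } skips the string. The aggregation $gt compares across types using the BSON comparison order, in which strings sort above numbers, so { $gt: [ "$age", 20 ] } is true for "39". The sketch below models only the number/string case in plain JavaScript; it is a simplification, not the server's comparison logic.

```javascript
const users = [
  { _id: 1, age: 21, firstname: "John" },
  { _id: 2, age: 18, firstname: "John" },
  { _id: 3, age: "39", firstname: "Johnson" },
];

// Query operator { age: { $gt: 20 } }: only values of the same type as the
// operand are compared (type bracketing), so a string never matches a number.
function queryGt(value, operand) {
  return typeof value === typeof operand && value > operand;
}

// Rank in the BSON comparison order for the two types modelled here:
// numbers sort before strings.
function bsonTypeRank(v) {
  return typeof v === "number" ? 1 : 2;
}

// Aggregation operator { $gt: [ value, operand ] }: compare the type rank first,
// then the values themselves when the types match.
function exprGt(value, operand) {
  const a = bsonTypeRank(value);
  const b = bsonTypeRank(operand);
  return a !== b ? a > b : value > operand;
}

const viaQuery = users.filter((u) => queryGt(u.age, 20)).map((u) => u._id);
const viaExpr = users.filter((u) => exprGt(u.age, 20)).map((u) => u._id);

console.log(viaQuery); // [ 1 ]
console.log(viaExpr); // [ 1, 3 ]
```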
Finally:
I have been reading up on some mongodb documentation and ran across
some confusing terminology, namely how to differentiate when a symbol
will be used as an aggregate function or as an operator. ... but is
there some rule/guideline so I can know what it will do beforehand?
Reading helps, practicing helps better and experience helps the best. You use a specific operator in a specific scenario or use case. To achieve some functionality you use an appropriate method and operator(s).
Reference: Operators
As per my understanding, here are a few things to point out:
Difference between aggregation and operator
You'll have certain basic functions for CRUD operations on MongoDB, like .find(), .insert(), .deleteOne() or .update() (there are a few others, like .count() and .distinct(), but those are the primary ones)
Versus
aggregation is a whole framework, heavily used for complex reads; only two aggregation stages are capable of writes: $out and $merge.
Operators :
There are different types of operators :
Query and Projection Operators : These operators are crucial; they are used to filter docs and to transform fields of docs in the response. They are usually used in the filter and projection parts of .find(filter, projection) or .update(filter, update, options) etc. They are also used in the $match stage of an aggregation pipeline ($match is similar to the filter in .find()). Ex. :- $and, $or, $in & more.
Update Operators : As the name suggests, these operators help update documents in the collection. They are usually used in the update part of .update() or .findOneAndUpdate() etc. Ex. :- $set, $unset, $inc & more.
Aggregation :
When it comes to aggregation, MongoDB calls it the aggregation framework, and for good reason: you can do a lot of things with data using aggregation.
Aggregation has the aggregation pipeline, with the syntax .aggregate([]). The pipeline is an array of stages, and each stage performs a certain operation on the data flowing through it. Ex. :- $match, $project, $group etc.
As we know, each document is independent on its own, so most aggregation stages operate independently on each doc flowing through them.
Aggregation pipeline Operators :
These operators are generally what you use inside stages to achieve what you're looking for. Certain operators can't be used in conjunction with other operators or in certain stages.
Let's say in the $match stage you would mostly use query operators, not aggregation operators, since aggregation operators can't be used directly in $match, in contrast to the $project or $addFields stages.
Example with Stages & Operators :
Let's say you got to make Smoothie :
Out of a bunch of groceries, you would filter the needed fruits (docs in a fruits collection) by matching them against what you want to blend using the $match stage ($and to match fruits/veggies, $lt to keep only fruits that haven't expired and $size to limit the number of fruits needed).
You would peel off the skin or chop them into pieces to keep just the useful parts of the fruits (docs) using $project.
You would group all of the fruits into a blender using $group and add ingredients like cream and sugar using $addFields. You're cautious about the sugar quantity, so you'll use operators like $size to check size, $multiply the number of nutrients based on conditions ($cond), and $divide sugar by nutrients to compute the nutrition value.
You'll iterate on adding ice pieces again and again using $map, and use $filter to remove uncrushed ice pieces.
Uhh, you always forget to add veggies (a different collection that needs to be merged in): based on the fruits that already got blended, you use $lookup to look up matching veggies (docs) and either blend them in again with $group or just put them on top.
Finally, you either drink it from the blender (just return the docs, no more stages) or pour it into a glass using the $out or $merge stage (remember, as I said, these are the only two stages that can write to a collection).
Of course every stage is optional: you can eat the fruits as they are instead of blending them (get all docs and all their fields), but you wouldn't do that (for many reasons, like performance and unnecessary data flowing over the network) unless it's a pre-prepared product (like a small configuration collection with limited data and few docs, where every doc is needed and can easily be retrieved in one DB call).
Note :
Usually you can't use aggregation operators in the filter part, i.e. .find(filter), or in the $match stage unless you use $expr.
Starting in MongoDB v4.2, you can run an update with an aggregation pipeline, which lets you take advantage of aggregation stages/operators in the update part.
When you search MongoDB's documentation you will often find the same operator in multiple places. For example, if you search for $size you will find one reference as a query operator and another as an aggregation operator. So depending on where you want to use it, refer to the matching documentation for usage: though the name is the same and it does almost the same thing, the functionality and the syntax may differ.
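For the $size example from the question, here are the three forms side by side, written as plain pipeline-stage objects (the field names key and number come from the question; the $gte condition in the last stage is just an illustrative assumption):

```javascript
// Query operator form: inside a filter, $size takes a number and matches
// documents whose key array has exactly that many elements.
const matchQueryOp = { $match: { key: { $size: 1 } } };

// Aggregation operator form: $size takes an expression that resolves to an array
// and returns its length as a value (stored here in the field number).
const projectAggOp = { $project: { key: 1, number: { $size: "$key" } } };

// Aggregation operator inside $match: only allowed through $expr.
// Here it keeps documents whose key array has at least one element.
const matchViaExpr = { $match: { $expr: { $gte: [{ $size: "$key" }, 1] } } };

console.log(JSON.stringify([projectAggOp, matchQueryOp]));
console.log(JSON.stringify(matchViaExpr));
```

Any of these can be passed to db.collection.aggregate([...]). The query form of $size only accepts an exact number, not a range, which is one reason to reach for the $expr form. The aggregation form errors if the field is missing or not an array; writing { $size: { $ifNull: [ "$key", [] ] } } is one way to guard against missing or null fields.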
You can always learn MongoDB for free at MongoDB University.

How to fetch value from inner object in mongo query result

I'm new to mongodb.
I have this object in my collection:
{
"_id" : ObjectId("5b549be38d9f1c00160117d3"),
"name" : "Name of object",
"a" : {
"b" : {
"c" : 100
}
}
}
What interests me is the 100 value, I want to fetch it from the object.
When I query the collection like this:
db.getCollection('myCollection').find({}, {'name':1, 'a.b.c':1})
I only get the same object with the inner objects.
Is there a way to query it so that I will get a result like this:
{"Name": "Name of object", "c":100}
You can get this result with a MongoDB aggregate query. In the $project stage of the aggregation you can reshape the fields as required.
Please try this query; it should give you that result:
db.myCollection.aggregate([
{
$project: {
"name": "$name",
"c": "$a.b.c",
_id: 0
}
}
])
As suggested by @Mayuri, you can achieve this by using .aggregate(). Here is the difference between $project in aggregation and projection in .find():
First, $project is much more powerful and accepts many more features that help transform fields. Projection in .find() is more straightforward and, at least before MongoDB 4.4 (which added support for aggregation expressions in find() projections), accepts only a few things. The value of a field in a projection can be one of these:
1 or true to include the field in the return documents.
0 or false to exclude the field.
Projection Operators : $, $elemMatch, $slice, $meta
Actual Issue :
In your query, 'a.b.c':1 in the projection means you're including the c field in the output. Since c is nested inside b and a, the output structure doesn't change and you get the c value under the same structure. But with aggregation, { $project: { "c": "$a.b.c" } } assigns the value of "$a.b.c" to a top-level field named c. How? In aggregation, prefixing a field path with $ refers to that field's value, which is what makes the assignment possible.
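The "$a.b.c" field path in $project walks down the embedded documents a and b and returns the value of c. If you'd rather keep the plain find() and reshape on the client, the hypothetical helper getPath below mimics that lookup in JavaScript. It only follows embedded documents; MongoDB field paths also traverse arrays, which this sketch does not model.

```javascript
// Hypothetical client-side equivalent of an aggregation field path like "$a.b.c":
// follow each dot-separated key through nested objects; undefined if a level is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (current, key) => (current !== null && typeof current === "object" ? current[key] : undefined),
    doc
  );
}

const doc = {
  _id: "5b549be38d9f1c00160117d3", // an ObjectId in the real collection
  name: "Name of object",
  a: { b: { c: 100 } },
};

// Same shape as the $project output above: name and c, without _id.
const reshaped = { name: doc.name, c: getPath(doc, "a.b.c") };
console.log(reshaped); // { name: 'Name of object', c: 100 }
```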

Aggregation pipeline and indexes

From http://docs.mongodb.org/manual/core/indexes/#multikey-indexes, it is possible to create an index on an array field using a multikey index. http://docs.mongodb.org/manual/applications/aggregation/#pipeline-operators-and-indexes lists some ways an index can be used in the aggregation framework. However, there may be times when I need to perform an $unwind on an array field in order to perform a $group. My question is: can multikey indexes (or any index on such an array field) still be used once the field has been operated on in the middle of the pipeline?
Generally, only pipeline operators that can be flattened to a normal query ($match, $limit, $sort, and $skip) are able to use the indexes on a collection, and only when they occur at the beginning of the pipeline. This is one of the reasons the $geoNear operator added in 2.4 has to be at the start of the pipeline.
Once you mutate the documents with $project, $group, or $unwind the index is no longer valid/usable.
If you have an index on an array field you can still use it before the $unwind to speed up the selection of documents to pipeline and then further refine the selected documents with a second $match.
Consider documents like:
{ tags: [ 'cat', 'bird', 'blue' ] }
With an index on tags.
If you only wanted to group the tags starting with b then you could perform an aggregation like:
{ pipeline: [
{ $match : { tags : /^b/ } },
{ $unwind : '$tags' },
{ $match : { tags : /^b/ } },
/* the rest */
] }
The first $match does the coarse-grained match using the index on tags.
The second match after the $unwind won't be able to use the index (the document above is now 3 documents) but can evaluate each of those documents to filter out the extra documents that get created (to remove { tags : 'cat' } from the example).
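A plain-JavaScript model of what those three stages do to the example document, to show why the second $match is still needed after the $unwind. It models the data flow only and says nothing about index use:

```javascript
const docs = [{ _id: 1, tags: ["cat", "bird", "blue"] }];
const startsWithB = /^b/;

// First $match: on an array field, a document matches if ANY element matches.
const stage1 = docs.filter((d) => d.tags.some((t) => startsWithB.test(t)));

// $unwind: one output document per array element, each with a single tags value.
const stage2 = stage1.flatMap((d) => d.tags.map((t) => ({ _id: d._id, tags: t })));

// Second $match: tags is now a single string, so non-matching elements are dropped.
const stage3 = stage2.filter((d) => startsWithB.test(d.tags));

console.log(stage2.length); // 3
console.log(stage3.map((d) => d.tags)); // [ 'bird', 'blue' ]
```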
HTH - Rob.
Hmm, @Rob does give the right answer, but I can see how it could lead you down the wrong path a little:
If you have an index on an array field you can still use it before and after the $unwind to speed up the selection of documents to pipeline and then further refine the selected documents.
Basically the example he gives:
{ pipeline: [
{ $match : { tags : /^b/ } },
{ $unwind : '$tags' },
{ $match : { tags : /^b/ } },
/* the rest */
] }
Will not use a multikey index past $unwind. It will be able to use the index to find all ROOT documents that have a tag starting with b; however, once it has done the $unwind, it will not be able to use an index to filter out the subdocuments in the second $match.
The $match will only work on an index before the mutation.
So basically, once you have mutated the documents and loaded them into the pipeline, it currently becomes almost impossible to use an index.