MongoDB: Why $literal required ? And where it can be used? - mongodb

I have gone through the MongoDB $literal operator in the Aggregation framework, but I don't understand where it could be used. More importantly, why is it required?
Example from official MongoDB documentation,
db.records.aggregate( [
{ $project: { costsOneDollar: { $eq: [ "$price", { $literal: "$1" } ] } } }
])
Instead of the above example using $literal, why can't I write it as below?
db.records.aggregate( [
{ $project: { costsOneDollar: { $eq: [ "$price", "$1" ] } } }
] )
Please also provide another example that shows the best (or most effective) usage of $literal.

For your basic case I think the documentation is fairly self explanatory:
In expression, the dollar sign $ evaluates to a field path; i.e. provides access to the field. For example, the $eq expression $eq: [ "$price", "$1" ] performs an equality check between the value in the field named price and the value in the field named 1 in the document.
Since $ is reserved for evaluating field path values within the document, "$1" is actually treated as a reference to a "field" named 1 within the document. So the actual comparison would likely be between the field named "price" and the field named "1", and since there is no field named "1" it would be treated as null, so the result is false for every document.
On the other hand where the field "price" actually has a value equal to "$1", then the usage of $literal allows that "value" ( and not the field path reference ) to be considered. Hence "literal".
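To make the distinction concrete, here is a toy evaluator in plain JavaScript. It only models the rule described above and is not how MongoDB is implemented; the names evaluate and doc are made up for illustration.

```javascript
// Toy model of how an aggregation expression is resolved.
// Not MongoDB code: `evaluate` is a hypothetical helper for illustration.
function evaluate(expr, doc) {
  // { $literal: value } hands back the value untouched, with no parsing.
  if (expr !== null && typeof expr === "object" && "$literal" in expr) {
    return expr.$literal;
  }
  // A string starting with "$" is read as a field path into the document.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // undefined when the field is missing
  }
  // Anything else is already a plain value.
  return expr;
}

const doc = { price: "$1" };

// "$1" is read as the field named "1", which does not exist in this document.
const withoutLiteral = evaluate("$price", doc) === evaluate("$1", doc);
// With $literal the string "$1" is compared as a value.
const withLiteral =
  evaluate("$price", doc) === evaluate({ $literal: "$1" }, doc);

console.log(withoutLiteral, withLiteral); // false true
```

The second comparison only succeeds because the wrapper stops the "$" from being interpreted.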
The operator has actually been around for some time (since MongoDB 2.2) but under the guise of $const, which though not documented is still the basic operator, and $literal is really just an "alias" for that.
The main usage is, and always has been, wherever an expression is required to produce some "specific value" as instructed within the pipeline. Take this simple statement:
{ "$project": { "myField": "one" } }
For any number of reasons you might want to do that, and basically return a "literal" value in such a statement. But if you tried, it would result in an error, as the value does not resolve to either a "field path" or a boolean condition for field selection, as is required here. So if you instead use:
{ "$project": { "myField": { "$literal": "one" } } }
Then you have "myField" with a value of "one" just like you asked for.
Other usages are more historic, such as:
{ "$project": { "array": { "$literal": ["A","B","C" ] } } },
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"trans": { "$push": {
"$cond": [
{ "$eq": [ "$array", "A" ] },
"$fieldA",
{ "$cond": [
{ "$eq": [ "$array", "B" ] },
"$fieldB",
"$fieldC"
]}
]
}}
}}
Which might more modernly be replaced with something like:
{ "$project": {
"trans": {
"$map": {
"input": ["A","B","C"],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$fieldA",
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
"$fieldB",
"$fieldC"
]}
]
}
}
}
}}
Both are constructs to move selected fields into an array based on position. The difference is that when the array is assigned directly to a field (as with "array" in the first listing) the $literal is necessary, but as the "input" argument to $map the plain array notation is just fine.
So the general cases are:
Where something reserved such as $ is needed as the value to match
Where there is a specific value to inject as a field assignment, and not as an argument to another operator expression.

The $1 example you give would try and compare the price field with the 1 field. By specifying the $literal operator, you're telling MongoDB that it is the exact string "$1". The same might be true if you wanted to use a MongoDB function name as a field name in your code, or even using a query snippet as a field value.

Related

MongoDb - checking existence of param in array during $addFields

given this fantasy dataset (sorry in advance but I couldn't manage to format it properly):
I need to build a MongoDB pipeline where a new field is created, and in this new field the "k" parameter should be set to an empty string when "k" is missing or empty in the source.
Here is my attempt:
...
{
"$addFields": {
"colors_field": {
"r": "$colors.r",
"g": "$colors.g",
"b": "$colors.b",
"k": {
"$cond": {
"if": {
"$or": [
{"$eq": [ "$colors.k", "" ]},
{"$eq": [ "$colors.k", null ]},
{"$colors.k": { "$exists": false}}
]
},
"then": "",
"else": "$colors.k"
}
}
}
}
}
I tried with the $exists but I can't make it work because of the way I call the value to check: either I put "$colors.k" and it returns a "MongoError: Unrecognized expression" or I don't and it'll return a "MongoError: FieldPath field names may not contain '.'".
I also tried to check for the length of that field but it'll crash if said field is missing.
The problem with your query is that you are using a query operator, $exists, inside a pipeline expression (only the $match stage allows this).
There is no "$exists" operator in aggregation, but the same check can be done using $type and "missing". Here you want "missing or null", so $ifNull is the right way to do it (it does exactly this; the name is misleading, since it really means "if missing or null").
Query (3 ways to do it)
1. $type and "missing": the right way to go if you only wanted the existence check. Here you also want null, so the query becomes bigger.
2. $ifNull: this is the shortest way.
3. $cond: missing and null both evaluate as false, so this also works, BUT be careful with this, because if the value of k was false, it would also become "".
In your case the short and safe solution is number 2.
Test code here
db.collection.aggregate([
{
"$set": {
"k-cond": {
"$cond": [
{
"$or": [
{
"$eq": [
"$colors.k",
null
]
},
{
"$eq": [
{
"$type": "$colors.k"
},
"missing"
]
}
]
},
"",
"$colors.k"
]
},
"k-ifnull": {
"$ifNull": [
"$colors.k",
""
]
},
"k-if": {
"$cond": [
"$colors.k",
"$colors.k",
""
]
}
}
}
])
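The difference between the three variants is easiest to see on a handful of inputs. The following is a plain JavaScript model of them, not MongoDB code: undefined stands in for a missing field, and mongoTruthy is a hypothetical helper based on the documented rule that false, null, 0 and undefined evaluate as false in a boolean expression.

```javascript
// Model of how $cond decides truthiness: false, null, 0 and undefined are
// false, other values count as true. `mongoTruthy` is a made-up helper name.
function mongoTruthy(v) {
  return !(v === false || v === null || v === undefined || v === 0);
}

// "k-cond": explicit check for null or missing.
const kCond = (k) => (k === null || k === undefined ? "" : k);
// "k-ifnull": $ifNull replaces only null or missing values.
const kIfNull = (k) => k ?? "";
// "k-if": bare $cond on the value itself, so false and 0 are replaced too.
const kIf = (k) => (mongoTruthy(k) ? k : "");

// One row per input: [k-cond, k-ifnull, k-if]
for (const k of [undefined, null, "cyan", false, 0]) {
  console.log(JSON.stringify([kCond(k), kIfNull(k), kIf(k)]));
}
```

Only the third variant rewrites false and 0, which is the pitfall noted for option 3.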

Find Index of first Matching Element $gte with $indexOfArray

MongoDB has $indexOfArray to let you find the element's array index, for example:
$indexOfArray: ["$article.date", ISODate("2019-03-29")]
Is it possible to use comparison operators with $indexOfArray together, like:
$indexOfArray: ["$article.date", {$gte: ISODate("2019-03-29")}]
No, it's not possible with $indexOfArray, as that only looks for an equality match against the expression given as the second argument.
Instead you can make a construct like this:
db.data.insertOne({
"_id" : ObjectId("5ca01e301a97dd8b468b3f55"),
"array" : [
ISODate("2018-03-01T00:00:00Z"),
ISODate("2018-03-02T00:00:00Z"),
ISODate("2018-03-03T00:00:00Z")
]
})
db.data.aggregate([
{ "$addFields": {
"matchedIndex": {
"$let": {
"vars": {
"matched": {
"$arrayElemAt": [
{ "$filter": {
"input": {
"$zip": {
"inputs": [ "$array", { "$range": [ 0, { "$size": "$array" } ] }]
}
},
"cond": { "$gte": [ { "$arrayElemAt": ["$$this", 0] }, new Date("2018-03-02") ] }
}},
0
]
}
},
"in": {
"$arrayElemAt": [{ "$ifNull": [ "$$matched", [0,-1] ] },1]
}
}
}
}}
])
Which would return for the $gte of Date("2018-03-02"):
{
"_id" : ObjectId("5ca01e301a97dd8b468b3f55"),
"array" : [
ISODate("2018-03-01T00:00:00Z"),
ISODate("2018-03-02T00:00:00Z"),
ISODate("2018-03-03T00:00:00Z")
],
"matchedIndex" : 1
}
Or -1 where the condition was not met in order to be consistent with $indexOfArray.
The basic premise is using $zip in order to "pair" with the array index positions which get generated from $range and $size of the array. This can be fed to a $filter condition which will return ALL matching elements to the supplied condition. Here it is the first element of the "pair" ( being the original array content ) via $arrayElemAt matching the specified condition using $gte
{ "$gte": [ { "$arrayElemAt": ["$$this", 0] }, new Date("2018-03-02") ] }
The $filter will return either ALL elements after ( in the case of $gte ) or an empty array where nothing was found. Consistent with $indexOfArray you only want the first match, which is done with another wrapping $arrayElemAt on the output for the 0 position.
Since the result could be an omitted value (which is what happens with $arrayElemAt: [[], 0]), you use $ifNull to test the result and pass back a two-element array with -1 as the second element in the case where the output was not defined. In either case that "paired" array has the second element (index 1) extracted again via $arrayElemAt in order to get the first matched index of the condition.
Of course since you want to refer to that whole expression, it just reads a little cleaner in the end within a $let, but that is optional as you can "inline" with the $ifNull if wanted.
So it is possible, it's just a little more involved than placing a range expression inside of $indexOfArray.
Note that any expression which actually returns a single value for equality match is just fine. But since operators like $gte return a boolean, then that would not be equal to any value in the array, and thus the sort of processing with $filter and then extraction is what you require.
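The same sequence of steps can be sketched in plain JavaScript, which may make the pipeline expression easier to follow. This is only a model of the logic, with firstIndexWhere as a made-up name; it does not run inside MongoDB.

```javascript
// Model of the expression above: $zip + $range, $filter, $arrayElemAt, $ifNull.
function firstIndexWhere(array, predicate) {
  // $zip of the array with $range(0, $size): pairs of [value, index]
  const pairs = array.map((value, index) => [value, index]);
  // $filter on the first element of each pair
  const matches = pairs.filter(([value]) => predicate(value));
  // $arrayElemAt 0, with the $ifNull fallback of [0, -1] when nothing matched
  const first = matches.length > 0 ? matches[0] : [0, -1];
  // $arrayElemAt 1: the index half of the pair
  return first[1];
}

const dates = [
  new Date("2018-03-01T00:00:00Z"),
  new Date("2018-03-02T00:00:00Z"),
  new Date("2018-03-03T00:00:00Z"),
];

const found = firstIndexWhere(dates, (d) => d >= new Date("2018-03-02"));
const notFound = firstIndexWhere(dates, (d) => d >= new Date("2019-01-01"));
console.log(found, notFound); // 1 -1
```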

Query by field value, not value in field array

The following snippet shows three queries:
find all the documents
find the documents containing a field a containing either the string "x" or an array containing the string "x"
find the documents containing a field a containing an array containing the string "x"
I was not able to find the documents containing a field a containing the string "x", not inside an array.
> db.stuff.find({},{_id:0})
{ "a" : "x" }
{ "a" : [ "x" ] }
> db.stuff.find({a:"x"},{_id:0})
{ "a" : "x" }
{ "a" : [ "x" ] }
> db.stuff.find({a:{$elemMatch:{$eq:"x"}}},{_id:0})
{ "a" : [ "x" ] }
>
MongoDB basically does not care if the data at a "given path" is actually in an array or not. If you want to make the distinction, then you need to "tell it that":
db.stuff.find({ "a": "x", "$where": "return !Array.isArray(this.a)" })
This is what $where adds to the bargain, where you can supply a condition that explicitly asks "is this an array" via Array.isArray() in JavaScript evaluation. And the JavaScript NOT ! assertion reverses the logic.
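As a rough model of what is going on, the sketch below mimics the equality match in plain JavaScript and then applies the same Array.isArray() test. The helper name matchesEquality and the sample data are assumptions for illustration, not part of MongoDB.

```javascript
// Model of a query equality match: a scalar matches directly, and an array
// matches when it contains the value. `matchesEquality` is a made-up name.
function matchesEquality(fieldValue, wanted) {
  return Array.isArray(fieldValue)
    ? fieldValue.includes(wanted)
    : fieldValue === wanted;
}

const stuff = [{ a: "x" }, { a: ["x"] }, { a: "y" }];

// Equivalent in spirit to find({ a: "x" }): both shapes match.
const plain = stuff.filter((d) => matchesEquality(d.a, "x"));
// Adding the "not an array" condition leaves only the scalar document.
const scalarOnly = plain.filter((d) => !Array.isArray(d.a));

console.log(plain.length, scalarOnly.length); // 2 1
```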
An alternate approach is to add the $exists check:
db.stuff.find({ "a": "x", "a.0": { "$exists": false } })
Which also essentially asks "is this an array" by looking for the first element index. So the "reverse" false case means "this is not an array".
Or even as you note you can use $elemMatch to select only the array, but "negate" that using $not:
db.stuff.find({ "a": { "$not": { "$elemMatch": { "$eq": "x" } } } })
Though probably "not" the best of options since that also "negates index usage", which the other examples all strive to avoid by at least including "one" positive condition for a match. So it's for the best to include the "implicit AND" by combining arguments:
db.stuff.find({
"a": { "$eq": "x", "$not": { "$elemMatch": { "$eq": "x" } } }
})
Or for "aggregation" which does not support $where, you can test using the $isArray aggregation operator should your MongoDB version ( 3.2 or greater ) support it:
db.stuff.aggregate([
{ "$match": { "a": "x" } },
{ "$redact": {
"$cond": {
"if": { "$not": { "$isArray": "$a" } },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Noting that it is good practice to supply "regular" query conditions as well where possible, and in all cases.
Also noting that querying the BSON $type does not typically work in this case, since the "contents" of the array itself are in fact a "string", which is what the $type operator is going to consider, and thus not report that such an array is in fact an array.

Mongo Sort by Count of Matches in Array

Let's say my test data is
db.multiArr.insert({"ID" : "fruit1","Keys" : ["apple", "orange", "banana"]})
db.multiArr.insert({"ID" : "fruit2","Keys" : ["apple", "carrot", "banana"]})
To get an individual fruit like carrot I do
db.multiArr.find({'Keys':{$in:['carrot']}})
When I do an $or query for carrot and banana, I see both records, fruit1 and then fruit2
db.multiArr.find({ $or: [{'Keys':{$in:['carrot']}}, {'Keys':{$in:['banana']}}]})
Result of the output should be fruit2 and then fruit1, because fruit2 has both carrot and banana
To actually answer this first, you need to "calculate" the number of matches to the given condition in order to "sort" the results to return with the preference to the most matches on top.
For this you need the aggregation framework, which is what you use for "calculation" and "manipulation" of data in MongoDB:
db.multiArr.aggregate([
{ "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
{ "$project": {
"ID": 1,
"Keys": 1,
"order": {
"$size": {
"$setIntersection": [ ["carrot", "banana"], "$Keys" ]
}
}
}},
{ "$sort": { "order": -1 } }
])
On a MongoDB version older than 3, you can use the longer form:
db.multiArr.aggregate([
{ "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
{ "$unwind": "$Keys" },
{ "$group": {
"_id": "$_id",
"ID": { "$first": "$ID" },
"Keys": { "$push": "$Keys" },
"order": {
"$sum": {
"$cond": [
{ "$or": [
{ "$eq": [ "$Keys", "carrot" ] },
{ "$eq": [ "$Keys", "banana" ] }
]},
1,
0
]
}
}
}},
{ "$sort": { "order": -1 } }
])
In either case the function here is to first match the possible documents to the conditions by providing a "list" of arguments with $in. Once the results are obtained you want to "count" the number of matching elements in the array to the "list" of possible values provided.
In the modern form the $setIntersection operator compares the two "lists" returning a new array that only contains the "unique" matching members. Since we want to know how many matches that was, we simply return the $size of that list.
In older versions, you pull apart the document array with $unwind in order to perform operations on it since older versions lacked the newer operators that worked with arrays without alteration. The process then looks at each value individually and if either expression in $or matches the possible values then the $cond ternary returns a value of 1 to the $sum accumulator, otherwise 0. The net result is the same "count of matches" as shown for the modern version.
The final thing is simply to $sort the results based on the "count of matches" that was returned so the most matches are on "top". This is "descending order" and therefore you supply the -1 to indicate that.
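For readers who find the pipeline hard to picture, the same "match, count, sort" logic can be modelled in plain JavaScript. This is a sketch only: sortByMatchCount is a hypothetical name and the code works on in-memory objects rather than a collection.

```javascript
// Model of the pipeline: $match with $in, $size of $setIntersection, $sort.
function sortByMatchCount(docs, wanted) {
  const wantedSet = new Set(wanted);
  return docs
    // $match: keep documents where any key is in the wanted list
    .filter((doc) => doc.Keys.some((k) => wantedSet.has(k)))
    // $project: count the unique keys shared with the wanted list
    .map((doc) => ({
      ...doc,
      order: new Set(doc.Keys.filter((k) => wantedSet.has(k))).size,
    }))
    // $sort: most matches first
    .sort((a, b) => b.order - a.order);
}

const docs = [
  { ID: "fruit1", Keys: ["apple", "orange", "banana"] },
  { ID: "fruit2", Keys: ["apple", "carrot", "banana"] },
];

const sorted = sortByMatchCount(docs, ["carrot", "banana"]);
console.log(sorted.map((d) => d.ID + ":" + d.order).join(" ")); // fruit2:2 fruit1:1
```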
Addendum concerning $in and arrays
You are misunderstanding a couple of things about MongoDB queries for starters. The $in operator is actually intended for a "list" of arguments like this:
{ "Keys": { "$in": [ "carrot", "banana" ] } }
Which is essentially the shorthand way of saying "Match either 'carrot' or 'banana' in the property 'Keys'". And could even be written in long form like this:
{ "$or": [{ "Keys": "carrot" }, { "Keys": "banana" }] }
Which should really lead you to the point that for a "singular" match condition, you simply supply the value to match against the property:
{ "Keys": "carrot" }
So that should cover the misconception that you use $in to match a property that is an array within a document. Rather the "reverse" case is the intended usage where instead you supply a "list of arguments" to match a given property, be that property an array or just a single value.
The MongoDB query engine makes no distinction between a single value or an array of values in an equality or similar operation.

Return only matched sub-document elements within a nested array

The main collection is retailer, which contains an array for stores. Each store contains an array of offers (you can buy in this store). This offers array has an array of sizes. (See example below)
Now I try to find all offers, which are available in the size L.
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"XS",
"S",
"M"
]
},
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
I've tried this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect output like this:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
But the output of my query also contains the non-matching offer with sizes XS, S and M.
How can I force MongoDB to return only the offers which matched my query?
Greetings and thanks.
So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained so that the elements returned only match the condition of the query.
The real answer is of course that unless you are really saving a lot of bandwidth by filtering out such detail then you should not even try, or at least beyond the first positional match.
MongoDB has a positional $ operator which will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the "outer" most array element.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
)
In this case, it means the "stores" array position only. So if there were multiple "stores" entries, then only "one" of the elements that contained your matched condition would be returned. But that does nothing for the inner array of "offers", and as such every "offer" within the matched "stores" element would still be returned.
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$.offers.$': 1 }
)
The only tool MongoDB actually has for this level of manipulation is the aggregation framework. But the analysis should show you why you "probably" should not do this, and instead just filter the array in code.
Here is how you can achieve this, per version.
First with MongoDB 3.2.x, using the $filter operation:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [ ["L"], "$$offer.size" ]
}
}
}
}
}
},
"as": "store",
"cond": { "$ne": [ "$$store.offers", [] ]}
}
}
}}
])
Then with MongoDB 2.6.x and above with $map and $setDifference:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$setDifference": [
{ "$map": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$setDifference": [
{ "$map": {
"input": "$$store.offers",
"as": "offer",
"in": {
"$cond": {
"if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
"then": "$$offer",
"else": false
}
}
}},
[false]
]
}
}
}
},
"as": "store",
"in": {
"$cond": {
"if": { "$ne": [ "$$store.offers", [] ] },
"then": "$$store",
"else": false
}
}
}},
[false]
]
}
}}
])
And finally in any version above MongoDB 2.2.x where the aggregation framework was introduced.
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$unwind": "$stores" },
{ "$unwind": "$stores.offers" },
{ "$match": { "stores.offers.size": "L" } },
{ "$group": {
"_id": {
"_id": "$_id",
"storeId": "$stores._id",
},
"offers": { "$push": "$stores.offers" }
}},
{ "$group": {
"_id": "$_id._id",
"stores": {
"$push": {
"_id": "$_id.storeId",
"offers": "$offers"
}
}
}}
])
Let's break down the explanations.
MongoDB 3.2.x and greater
So generally speaking, $filter is the way to go here since it is designed with this purpose in mind. Since there are multiple levels of the array, you need to apply this at each level. So first you are diving into each "offers" within "stores" to examine and $filter that content.
The simple comparison here is "Does the "size" array contain the element I am looking for". In this logical context, the short thing to do is use the $setIsSubset operation to compare an array ("set") of ["L"] to the target array. Where that condition is true ( it contains "L" ) then the array element for "offers" is retained and returned in the result.
In the higher level $filter, you are then looking to see if the result from that previous $filter returned an empty array [] for "offers". If it is not empty, then the element is returned or otherwise it is removed.
MongoDB 2.6.x
This is very similar to the modern process except that since there is no $filter in this version you can use $map to inspect each element and then use $setDifference to filter out any elements that were returned as false.
So $map is going to return the whole array, but the $cond operation just decides whether to return the element or instead a false value. In the comparison of $setDifference to a single element "set" of [false] all false elements in the returned array would be removed.
In all other ways, the logic is the same as above.
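The $map plus $setDifference trick can be modelled in plain JavaScript as follows. This is a sketch under assumptions: filterViaSetDifference is a made-up name, and duplicates are detected with JSON.stringify purely for illustration. It also reflects the fact that set operators treat arrays as sets, so exact duplicate elements are dropped and MongoDB does not guarantee the output order.

```javascript
// Model of { $setDifference: [ { $map: ... $cond ... else false }, [false] ] }.
function filterViaSetDifference(items, keep) {
  // $map with $cond: each element becomes itself or the placeholder false
  const mapped = items.map((item) => (keep(item) ? item : false));
  // $setDifference against [false]: drop the placeholders and, because the
  // result is a set, drop exact duplicates as well
  const seen = new Set();
  const result = [];
  for (const item of mapped) {
    const key = JSON.stringify(item);
    if (item === false || seen.has(key)) continue;
    seen.add(key);
    result.push(item);
  }
  return result;
}

const offers = [
  { _id: 1, size: ["XS", "S", "M"] },
  { _id: 2, size: ["S", "L", "XL"] },
];

const withL = filterViaSetDifference(offers, (o) => o.size.includes("L"));
console.log(JSON.stringify(withL));
```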
MongoDB 2.2.x and up
So below MongoDB 2.6 the only tool for working with arrays is $unwind, and you should not use the aggregation framework "just" for this purpose.
The process indeed appears simple, by simply "taking apart" each array, filtering out the things you don't need then putting it back together. The main care is in the "two" $group stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id values at all levels, so these just need to be included at every level of grouping.
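A plain JavaScript model of the "take apart, filter, put back together" process is sketched below. It is illustrative only: filterOffersByUnwind is a hypothetical name, and returning null for a document with no surviving rows stands in for the document simply not appearing in the aggregation output.

```javascript
// Model of $unwind (twice), $match, then the two $group stages.
function filterOffersByUnwind(doc, size) {
  // $unwind "stores" then "stores.offers": one row per (store, offer) pair
  const rows = doc.stores.flatMap((store) =>
    store.offers.map((offer) => ({ storeId: store._id, offer }))
  );
  // $match on the unwound offer
  const kept = rows.filter((row) => row.offer.size.includes(size));
  // First $group: rebuild the inner "offers" array per store
  const byStore = new Map();
  for (const { storeId, offer } of kept) {
    if (!byStore.has(storeId)) byStore.set(storeId, []);
    byStore.get(storeId).push(offer);
  }
  // Second $group: rebuild the outer "stores" array per document
  const stores = [...byStore].map(([_id, offers]) => ({ _id, offers }));
  return stores.length > 0 ? { _id: doc._id, stores } : null;
}

const retailer = {
  _id: "r1",
  stores: [
    {
      _id: "s1",
      offers: [
        { _id: "o1", size: ["XS", "S", "M"] },
        { _id: "o2", size: ["S", "L", "XL"] },
      ],
    },
  ],
};

const regrouped = filterOffersByUnwind(retailer, "L");
console.log(JSON.stringify(regrouped));
```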
But the problem is that $unwind is very costly. Though it does still have a purpose, its main intended usage is not this sort of filtering per document. In fact, in modern releases its only usage should be when an element of the array(s) needs to become part of the "grouping key" itself.
Conclusion
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costly if implemented incorrectly.
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find().
In general though, those listings still have an amount of complexity to them, and indeed unless you are really drastically reducing the content returned by such filtering in a way that makes a significant improvement in bandwidth used between the server and client, then you are better off filtering the result of the initial query and basic projection.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
).forEach(function(doc) {
// Technically this is only "one" store. So omit the projection
// if you wanted more than "one" match
doc.stores = doc.stores.filter(function(store) {
store.offers = store.offers.filter(function(offer) {
return offer.size.indexOf("L") != -1;
});
return store.offers.length != 0;
});
printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated, the only "real" difference would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
But unless you are doing this in a modern release with only $match and $project, then the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
In all cases, you get the same result:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size" : [
"S",
"L",
"XL"
]
}
]
}
]
}
As your array is embedded inside another array, we cannot use $elemMatch here; instead you can use the aggregation framework to get your results:
db.retailers.aggregate([
{$match:{"stores.offers.size": 'L'}}, //just precondition can be skipped
{$unwind:"$stores"},
{$unwind:"$stores.offers"},
{$match:{"stores.offers.size": 'L'}},
{$group:{
_id:{id:"$_id", "storesId":"$stores._id"},
"offers":{$push:"$stores.offers"}
}},
{$group:{
_id:"$_id.id",
stores:{$push:{_id:"$_id.storesId","offers":"$offers"}}
}}
]).pretty()
What this query does is unwind the arrays (twice), then match on size, and then reshape the document back to its previous form. You can remove the $group steps and see how it prints.
Have fun!
It also works without aggregate (this requires MongoDB 4.4 or later, where find() projections accept aggregation expressions).
Here is the solution link: https://mongoplayground.net/p/Q5lxPvGK03A
db.collection.find({
"stores.offers.size": "L"
},
{
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [
[
"L"
],
"$$offer.size"
]
}
}
}
}
}
},
"as": "store",
"cond": {
"$ne": [
"$$store.offers",
[]
]
}
}
}
})