Query by field value, not value in field array - mongodb

The following snippet shows three queries:
find all the documents
find the documents containing a field a containing either the string "x" or an array containing the string "x"
find the documents containing a field a containing an array containing the string "x"
I was not able to find only the documents where field a contains the string "x" directly, not inside an array.
> db.stuff.find({},{_id:0})
{ "a" : "x" }
{ "a" : [ "x" ] }
> db.stuff.find({a:"x"},{_id:0})
{ "a" : "x" }
{ "a" : [ "x" ] }
> db.stuff.find({a:{$elemMatch:{$eq:"x"}}},{_id:0})
{ "a" : [ "x" ] }
>

MongoDB basically does not care if the data at a "given path" is actually in an array or not. If you want to make the distinction, then you need to "tell it that":
db.stuff.find({ "a": "x", "$where": "return !Array.isArray(this.a)" })
This is what $where adds to the bargain, where you can supply a condition that explicitly asks "is this an array" via Array.isArray() in JavaScript evaluation. And the JavaScript NOT ! assertion reverses the logic.
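As a rough illustration of the rule described above, the following plain JavaScript sketch models how an equality condition treats a scalar and an array alike, and how the Array.isArray() test narrows the match. The function names are hypothetical, and this is only a model of the behaviour, not MongoDB's implementation.

```javascript
// Hypothetical model of { a: "x" }: matches when the field equals the value,
// or when the field is an array that contains the value.
function matchesEquality(doc, field, value) {
  var current = doc[field];
  if (Array.isArray(current)) {
    return current.indexOf(value) !== -1;
  }
  return current === value;
}

// Same condition plus the $where test: reject documents where the field is an array.
function matchesScalarOnly(doc, field, value) {
  return matchesEquality(doc, field, value) && !Array.isArray(doc[field]);
}

var stuff = [{ a: "x" }, { a: ["x"] }];
console.log(stuff.filter(function (d) { return matchesEquality(d, "a", "x"); }).length);   // 2
console.log(stuff.filter(function (d) { return matchesScalarOnly(d, "a", "x"); }).length); // 1
```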
An alternate approach is to add the $exists check:
db.stuff.find({ "a": "x", "a.0": { "$exists": false } })
Which also essentially asks "is this an array" by looking for the first element index. So the "reverse" false case means "this is not an array".
Or even as you note you can use $elemMatch to select only the array, but "negate" that using $not:
db.stuff.find({ "a": { "$not": { "$elemMatch": { "$eq": "x" } } } })
Though probably "not" the best of options since that also "negates index usage", which the other examples all strive to avoid by at least including "one" positive condition for a match. So it's for the best to include the "implicit AND" by combining arguments:
db.stuff.find({
"a": { "$eq": "x", "$not": { "$elemMatch": { "$eq": "x" } } }
})
Or for "aggregation" which does not support $where, you can test using the $isArray aggregation operator should your MongoDB version ( 3.2 or greater ) support it:
db.stuff.aggregate([
{ "$match": { "a": "x" } },
{ "$redact": {
"$cond": {
"if": { "$not": { "$isArray": "$a" } },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Note that it is good practice to also supply "regular" query conditions wherever possible, as the $match stage does here.
Also noting that querying the BSON $type does not typically work in this case, since the "contents" of the array itself are in fact a "string", which is what the $type operator is going to consider, and thus not report that such an array is in fact an array.

Related

Mongo Sort by Count of Matches in Array

Let's say my test data is
db.multiArr.insert({"ID" : "fruit1","Keys" : ["apple", "orange", "banana"]})
db.multiArr.insert({"ID" : "fruit2","Keys" : ["apple", "carrot", "banana"]})
to get individual fruit like carrot i do
db.multiArr.find({'Keys':{$in:['carrot']}})
when i do an or query for carrot and banana, i see both the records fruit1 and then fruit2
db.multiArr.find({ $or: [{'Keys':{$in:['carrot']}}, {'Keys':{$in:['banana']}}]})
Result of the output should be fruit2 and then fruit1, because fruit2 has both carrot and banana
To actually answer this first, you need to "calculate" the number of matches to the given condition in order to "sort" the results to return with the preference to the most matches on top.
For this you need the aggregation framework, which is what you use for "calculation" and "manipulation" of data in MongoDB:
db.multiArr.aggregate([
{ "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
{ "$project": {
"ID": 1,
"Keys": 1,
"order": {
"$size": {
"$setIntersection": [ ["carrot", "banana"], "$Keys" ]
}
}
}},
{ "$sort": { "order": -1 } }
])
On a MongoDB version older than 3, you can use the longer form:
db.multiArr.aggregate([
{ "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
{ "$unwind": "$Keys" },
{ "$group": {
"_id": "$_id",
"ID": { "$first": "$ID" },
"Keys": { "$push": "$Keys" },
"order": {
"$sum": {
"$cond": [
{ "$or": [
{ "$eq": [ "$Keys", "carrot" ] },
{ "$eq": [ "$Keys", "banana" ] }
]},
1,
0
]
}
}
}},
{ "$sort": { "order": -1 } }
])
In either case the function here is to first match the possible documents to the conditions by providing a "list" of arguments with $in. Once the results are obtained you want to "count" the number of matching elements in the array to the "list" of possible values provided.
In the modern form the $setIntersection operator compares the two "lists" returning a new array that only contains the "unique" matching members. Since we want to know how many matches that was, we simply return the $size of that list.
In older versions, you pull apart the document array with $unwind in order to perform operations on it since older versions lacked the newer operators that worked with arrays without alteration. The process then looks at each value individually and if either expression in $or matches the possible values then the $cond ternary returns a value of 1 to the $sum accumulator, otherwise 0. The net result is the same "count of matches" as shown for the modern version.
The final thing is simply to $sort the results based on the "count of matches" that was returned so the most matches is on "top". This is "descending order" and therefore you supply the -1 to indicate that.
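To make the "count of matches" idea concrete, here is a small plain JavaScript sketch that mimics the match, count and sort steps on the sample data from the question. The helper names are hypothetical, and the code only models what the pipeline computes; it does not talk to MongoDB.

```javascript
// Hypothetical model of $size over $setIntersection: count the distinct wanted
// values that appear in the Keys array.
function countMatches(keys, wanted) {
  var seen = {};
  keys.forEach(function (k) {
    if (wanted.indexOf(k) !== -1) { seen[k] = true; }
  });
  return Object.keys(seen).length;
}

function sortByMatches(docs, wanted) {
  return docs
    // like $match with $in: keep documents with at least one wanted value
    .filter(function (d) { return countMatches(d.Keys, wanted) > 0; })
    // like $project: carry the fields and add the computed "order"
    .map(function (d) { return { ID: d.ID, Keys: d.Keys, order: countMatches(d.Keys, wanted) }; })
    // like $sort: { order: -1 }
    .sort(function (x, y) { return y.order - x.order; });
}

var docs = [
  { ID: "fruit1", Keys: ["apple", "orange", "banana"] },
  { ID: "fruit2", Keys: ["apple", "carrot", "banana"] }
];
var result = sortByMatches(docs, ["carrot", "banana"]);
console.log(result.map(function (d) { return d.ID + ":" + d.order; }).join(", ")); // fruit2:2, fruit1:1
```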
Addendum concerning $in and arrays
You are misunderstanding a couple of things about MongoDB queries for starters. The $in operator is actually intended for a "list" of arguments like this:
{ "Keys": { "$in": [ "carrot", "banana" ] } }
Which is essentially the shorthand way of saying "Match either 'carrot' or 'banana' in the property 'Keys'". And could even be written in long form like this:
{ "$or": [{ "Keys": "carrot" }, { "Keys": "banana" }] }
Which really should lead you to the point that if it were a "singular" match condition, you would simply supply the value to match to the property:
{ "Keys": "carrot" }
So that should cover the misconception that you use $in to match a property that is an array within a document. Rather the "reverse" case is the intended usage where instead you supply a "list of arguments" to match a given property, be that property an array or just a single value.
The MongoDB query engine makes no distinction between a single value or an array of values in an equality or similar operation.
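The equivalence between $in and the long $or form can be sketched in plain JavaScript as below. The three function names are hypothetical, and the code is only a simplified model of the matching semantics described above, not the query engine itself.

```javascript
// Hypothetical model of a single equality condition on a property that may
// hold either a single value or an array of values.
function fieldEquals(doc, field, value) {
  var current = doc[field];
  return Array.isArray(current) ? current.indexOf(value) !== -1 : current === value;
}

// Model of { field: { $in: list } }: any value in the list may match.
function matchesIn(doc, field, list) {
  return list.some(function (v) { return fieldEquals(doc, field, v); });
}

// Model of { $or: [ cond1, cond2, ... ] } where each condition is { field: value }.
function matchesOr(doc, conditions) {
  return conditions.some(function (cond) {
    return Object.keys(cond).every(function (field) {
      return fieldEquals(doc, field, cond[field]);
    });
  });
}

var fruit1 = { ID: "fruit1", Keys: ["apple", "orange", "banana"] };
console.log(matchesIn(fruit1, "Keys", ["carrot", "banana"]));             // true
console.log(matchesOr(fruit1, [{ Keys: "carrot" }, { Keys: "banana" }])); // true
console.log(matchesIn(fruit1, "ID", ["fruit1", "fruit2"]));               // true
```

The last call shows the same list of arguments applied to a property that holds a single value rather than an array.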

Pull array within array

Similar to Find document with array that contains a specific value, but i'm trying to pull it.
db.getCollection('users').find({'favorites':{$elemMatch:{0:5719}}}, {"favorites.$": 1})
returns this:
{
"_id" : "FfEj5chmviLdqWh52",
"favorites" : [
[
5719,
"2016-03-21T17:46:01.441Z",
"a"
]
]
}
even after this returned 1:
Meteor.users.update(this.userId, {$pull: {'favorites':{$elemMatch:{0:movieid}}}})
It doesn't work because $pull is trying to remove a matching element from the "favorites" array. What you want to do is remove from the "array inside the array" of favorites.
For this you need a positional match to point to the nth inner element, then a very careful $pull expression to actually remove that element:
Meteor.users.update(
{ "favorites": { "$elemMatch": { "$elemMatch": { "$eq": 5719 } } } },
{ "$pull": { "favorites.$": 5719 } }
)
The "double" $elemMatch with the $eq operator is a bit more expressive than { 0: 5719 } since it is not "locked" into the first position only and is actually looking at the matching value. But you can write it that way if you must, or if you "really mean" to match that value in the first position only.
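Note that the "index" returned from the match in the positional $ argument is actually that of the "outer" array. So to pull from the inner array, the "favorites.$" path addresses the matched outer element and $pull then removes the value from the array at that position.
Of course if there is only ever one nested array element within, then you might as well just write: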
{ "$pull": { "favorites.0": 5719 } }
Using the direct "first index" position, since you know the inner array will always be there.
In either case, your object updates correctly:
{
"_id" : "FfEj5chmviLdqWh52",
"favorites" : [
[
"2016-03-21T17:46:01.441Z",
"a"
]
]
}
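What the positional update does to the document can be modelled in plain JavaScript roughly as follows. The function name is hypothetical, and this only imitates the effect of the query plus $pull on an in-memory object.

```javascript
// Hypothetical model: find the first outer element whose inner array contains
// the value (the positional $), then remove the value from that inner array.
function pullFromFirstMatchingInner(doc, value) {
  var index = -1;
  for (var i = 0; i < doc.favorites.length; i++) {
    if (Array.isArray(doc.favorites[i]) && doc.favorites[i].indexOf(value) !== -1) {
      index = i;
      break;
    }
  }
  if (index === -1) { return false; } // the query did not match, nothing is updated
  doc.favorites[index] = doc.favorites[index].filter(function (v) { return v !== value; });
  return true;
}

var user = { _id: "FfEj5chmviLdqWh52", favorites: [[5719, "2016-03-21T17:46:01.441Z", "a"]] };
pullFromFirstMatchingInner(user, 5719);
console.log(JSON.stringify(user.favorites)); // [["2016-03-21T17:46:01.441Z","a"]]
```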
If you are trying to $pull the entire array entry from favorites, then the $elemMatch just needs to be dialed back one element:
Meteor.users.update(
{ "_id": this.userId },
{ "$pull": { "favorites": { "$elemMatch": { "$eq": 5719 } } } }
)
Or even:
Meteor.users.update(
{ "_id": this.userId },
{ "$pull": { "favorites": { "$elemMatch": { "0": 5719 } } } }
)
Noting that:
{ "_id": this.userId },
Is the long form that we generally use as a "query" selector, and especially when we want criteria "other than" the _id of the document. MiniMongo statements require at "least" the _id of the document though.
The rest of the statement has one "less" $elemMatch because the $pull already applies to the array.
That removes the whole matched element from the outer array:
{
"_id" : "FfEj5chmviLdqWh52",
"favorites" : []
}
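For comparison with the inner removal, removing the whole entry can be modelled like this in plain JavaScript. The function name is hypothetical and the second favorite in the sample is made up for illustration.

```javascript
// Hypothetical model of { $pull: { favorites: { $elemMatch: { $eq: value } } } }:
// drop every outer element whose inner array contains the value.
function pullOuterElements(doc, value) {
  doc.favorites = doc.favorites.filter(function (inner) {
    return !(Array.isArray(inner) && inner.indexOf(value) !== -1);
  });
  return doc;
}

var user = {
  _id: "FfEj5chmviLdqWh52",
  favorites: [
    [5719, "2016-03-21T17:46:01.441Z", "a"],
    [42, "2016-03-22T09:00:00.000Z", "b"] // made-up second entry
  ]
};
pullOuterElements(user, 5719);
console.log(user.favorites.length); // 1
```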
This is the first code i found that actually works:
Meteor.users.update(Meteor.userId(), {$pull: {favorites: {$in: [i]}}})
Apparently $in does partial matching. It seems safer than the working code from this answer:
Meteor.users.update(
{ "_id": this.userId },
{ "$pull": { "favorites": { "$elemMatch": { "$eq": i } } } }
)

MongoDB: Why $literal required ? And where it can be used?

I have gone through MongoDB $literal in Aggregation framework, but I don't understand where it could be used ? more importantly, why it is required ?
Example from official MongoDB documentation,
db.records.aggregate( [
{ $project: { costsOneDollar: { $eq: [ "$price", { $literal: "$1" } ] } } }
])
Instead of the above example using $literal, why can't I use as below ?
db.records.aggregate( [
{ $project: { costsOneDollar: { $eq: [ "$price", "$1" ] } } }
] )
Also provide some other example which shows the best(or effective) usage of $literal.
For your basic case I think the documentation is fairly self explanatory:
In expression, the dollar sign $ evaluates to a field path; i.e. provides access to the field. For example, the $eq expression $eq: [ "$price", "$1" ] performs an equality check between the value in the field named price and the value in the field named 1 in the document.
So since $ is reserved for evaluating field path values within the document, "$1" would actually be treated as a reference to a "field" named 1 within the document. The comparison would therefore be between the field named "price" and the field named "1", and since there is no such field it would be treated as null, so the result would be false for every document.
On the other hand where the field "price" actually has a value equal to "$1", then the usage of $literal allows that "value" ( and not the field path reference ) to be considered. Hence "literal".
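The difference can be pictured with a deliberately simplified plain JavaScript model of how an operand is resolved. The function names are hypothetical, and treating a missing field as null is an assumption that follows the explanation above, not a description of the server's internals.

```javascript
// Hypothetical, simplified model of resolving one operand of an expression.
function resolveOperand(operand, doc) {
  if (operand !== null && typeof operand === "object" && "$literal" in operand) {
    return operand.$literal; // taken as-is, never parsed as a field path
  }
  if (typeof operand === "string" && operand.charAt(0) === "$") {
    var field = operand.slice(1); // "$price" refers to the field "price"
    return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : null;
  }
  return operand; // plain constant
}

// Model of { $eq: [ left, right ] }
function evalEq(operands, doc) {
  return resolveOperand(operands[0], doc) === resolveOperand(operands[1], doc);
}

var record = { price: "$1" };
console.log(evalEq(["$price", "$1"], record));               // false
console.log(evalEq(["$price", { $literal: "$1" }], record)); // true
```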
The operator has actually been around for some time (since MongoDB 2.2) under the guise of $const, which though not documented is still the basic operator, and $literal is really just an "alias" for that.
The usage mainly is and always has been to use where an expression is required to have some "specific value" as instructed within the pipeline. Take this simple statement:
{ "$project": { "myField": "one" } }
So for any number of reasons you might want to do that, and basically return a "literal" value in such a statement. But if you tried, it would result in an error, as it does not resolve to either a "field path" or a boolean condition for field selection, as is required here. So if you instead use:
{ "$project": { "myField": { "$literal": "one" } } }
Then you have "myField" with a value of "one" just like you asked for.
Other usages are more historic, such as:
{ "$project": { "array": { "$literal": ["A","B","C" ] } } },
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"trans": { "$push": {
"$cond": [
{ "$eq": [ "$array", "A" ] },
"$fieldA",
{ "$cond": [
{ "$eq": [ "$array", "B" ] },
"$fieldB",
"$fieldC"
]}
]
}}
}}
Which might more modernly be replaced with something like:
{ "$project": {
"trans": {
"$map": {
"input": ["A","B","C"],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$fieldA",
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
"$fieldB",
"$fieldC"
]}
]
}
}
}
}}
As a construct to move selected fields into an array based on position, with the difference being that as "array" and a field assignment the $literal is necessary, but as the "input" argument the plain array notation is just fine.
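In plain JavaScript terms, both forms compute something like the following. The function name and the sample field values are hypothetical and only illustrate the "fields into an array by position" idea.

```javascript
// Hypothetical model of the $map / $cond projection: for each label pick
// the corresponding field of the document.
function toTransArray(doc) {
  return ["A", "B", "C"].map(function (el) {
    if (el === "A") { return doc.fieldA; }
    if (el === "B") { return doc.fieldB; }
    return doc.fieldC;
  });
}

console.log(JSON.stringify(toTransArray({ fieldA: 1, fieldB: 2, fieldC: 3 }))); // [1,2,3]
```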
So the general cases are:
Where something reserved such as $ is needed as the value to match
Where there is a specific value to inject as a field assignment, and not as an argument to another operator expression.
The $1 example you give would try and compare the price field with the 1 field. By specifying the $literal operator, you're telling MongoDB that it is the exact string "$1". The same might be true if you wanted to use a MongoDB function name as a field name in your code, or even using a query snippet as a field value.

how to match the last value of array in mongo db? [duplicate]

I have a sample document like shown below
{
"_id" : "docID",
"ARRAY" : [
{
"k" : "value",
"T" : "20:15:35",
"I" : "Hai"
},
{
"K" : "some value",
"T" : "20:16:35",
"I" : "Hello"
},
{
"K" : "some other value",
"T" : "20:15:35",
"I" : "Update"
}
]
}
I am trying to update the last element in the "ARRAY" based on field "ARRAY.T"(which is only field i know at the point of update), but what my problem is first element in the array matches the query and its ARRAY.I field is updated.
Query used to update:
db.collection.update( { _id: "docID","ARRAY.T" : "20:15:35"},
{ $set: { "ARRAY.$.I": "Updated value" }
})
Actually I don't know the index of the array element to update, so I have to use ARRAY.T in the query. Is there any way to tell MongoDB to update the first element matching the query counting from the end of the array?
I understand what you are saying in that you want to match the last element in this case or in fact process the match in reverse order. There is no way to modify this and the index stored in the positional $ operator will always be the "first" match.
But you can change your approach to this, as the default behavior of $push is to "append" to the end of the array. But MongoDB 2.6 introduced a $position modifier so you can in fact always "pre-pend" to the array meaning your "oldest" item is at the end.
Take this for example:
db.artest.update(
{ "array": { "$in": [5] } },
{ "$push": { "array": { "$each": [5], "$position": 0 } }},
{ "upsert": true }
)
db.artest.update(
{ "array": { "$in": [5] } },
{ "$push": { "array": { "$each": [6], "$position": 0 } }},
{ "upsert": true }
)
This results in a document that is the "reverse" of the normal $push behavior:
{ "_id" : ObjectId("53eaf4517d0dc314962c93f4"), "array" : [ 6, 5 ] }
Alternately you could apply the $sort modifier when updating your documents in order to "order" the elements so they were reversed. But that may not be the best option if duplicate values are stored.
So look into storing your arrays in "reverse" if you intend to match the "newest" items "first". Currently that is your only way of getting your "match from last" behavior.
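Why the reversed storage helps can be seen in a small plain JavaScript model. Both function names are hypothetical. The first imitates $push with $position: 0 and the second imitates the "first match" index that the positional $ operator holds. The sample entries reuse the T and I values from the question.

```javascript
// Hypothetical model of $push with $each and $position: 0 (prepend).
function pushAtFront(array, values) {
  return values.concat(array);
}

// Hypothetical model of the positional $ operator: index of the first match.
function firstMatchIndex(array, predicate) {
  for (var i = 0; i < array.length; i++) {
    if (predicate(array[i])) { return i; }
  }
  return -1;
}

var entries = [];
entries = pushAtFront(entries, [{ T: "20:15:35", I: "Hai" }]);
entries = pushAtFront(entries, [{ T: "20:16:35", I: "Hello" }]);
entries = pushAtFront(entries, [{ T: "20:15:35", I: "Update" }]);

// The newest entry is now first, so the "first match" is the newest match.
var idx = firstMatchIndex(entries, function (e) { return e.T === "20:15:35"; });
console.log(idx, entries[idx].I); // 0 Update
```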

Mongo. Narrowing down results of nested array

If I have a document like this:
{
"name" : "Foo",
"words" :
[
"lorem",
"ipsum",
"dolor",
"sit",
"amet",
...
]
}
Let's say this words array is pretty big. Now I need a query that would fetch that document:
db.docs.find({'name':'Foo'}) - that will get whole document
but what I want, instead of fetching the entire words array (cause it's too big) I would like to retrieve only elements that meet some criteria. Let's say I want to see only words that start with "a" or have a length of at least 3 characters.
You know maybe something like this:
// this won't work!
db.docs.find({
"$where":"(this.words.map(function(e){ if (e.length >=3) { return e } }))"
})
EDIT
You cannot filter array contents using find. You can only match that the array contains the condition. So in order to filter the contents of the array you need to make use of aggregate:
db.docs.aggregate([
// Still makes sense to match the documents that meet the condition
{ "$match": {
"name": "Foo",
"words": { "$regex": "^[A-Za-z0-9_]{4,}" }
}},
// Unwind the array to "de-normalize"
{ "$unwind": "$words" },
// Actually "filter" the array elements
{ "$match": { "words": { "$regex": "^[A-Za-z0-9_]{4,}" } } },
// Group back the document with the "filtered" array
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"words": { "$push": "$words" }
}}
])
That makes use of a regular expression condition that will match at least 4 characters from the start of the string. The ^ anchor is quite important here as it allows an index to be used, which is much more optimal than whatever else you can do.
The result returned will look like this:
{
"result" : [
{
"_id" : ObjectId("5341f0476cbcc02b995092ac"),
"name" : "Foo",
"words" : [
"lorem",
"ipsum",
"dolor"
]
}
],
"ok" : 1
}
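The effect of the $unwind, $match and $group stages on a single document can be sketched in plain JavaScript as below. The function name and the sample words are hypothetical, and the sketch ignores the first $match stage, which in the real pipeline also drops documents with no matching word at all.

```javascript
// Hypothetical model of $unwind + $match + $group: keep only the array
// elements that satisfy the regular expression.
function filterWords(doc, pattern) {
  return {
    _id: doc._id,
    name: doc.name,
    words: doc.words.filter(function (w) { return pattern.test(w); })
  };
}

var sample = { _id: 1, name: "Foo", words: ["lorem", "sit", "dolor", "ab"] };
console.log(JSON.stringify(filterWords(sample, /^[A-Za-z0-9_]{4,}/).words)); // ["lorem","dolor"]
```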
You can also throw a lot of arbitrary JavaScript at mapReduce and test the length of elements in the array, but that will take considerably longer to execute.
--
The terms are quite simple, you simply add the additional operator to the query document like so:
db.docs.find({ "name": "Foo", "$where": "(this.words.length > 3)" })
You really should not be using the $where operator unless absolutely necessary, and even then you really should think about what you are doing. Heed the warnings that are given in that document.
As stated in the manual page for $size, probably the best way to deal with detecting array length for a given range (rather than exact) is to create a "counter" field in your document that is updated as elements are added/removed from the array. This makes a very simple and efficient query:
db.docs.find({ "name": "Foo", "counter": { "$gt": 3 } })
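Keeping such a counter in step with the array could look roughly like the sketch below. It assumes a collection object exposing updateOne(filter, update), as the Node.js driver and recent shells do. The addWord helper is hypothetical, and the "counter" field name simply follows the query above. A stand-in object is used here so the snippet runs without a server.

```javascript
// Sketch: append a word and increment the counter in the same update,
// so the two never drift apart. `collection` is an assumption (see above).
function addWord(collection, name, word) {
  return collection.updateOne(
    { name: name },
    { $push: { words: word }, $inc: { counter: 1 } }
  );
}

// Stand-in collection that just records the call, for demonstration only.
var calls = [];
var fakeCollection = {
  updateOne: function (filter, update) {
    calls.push({ filter: filter, update: update });
    return { matchedCount: 1 };
  }
};

addWord(fakeCollection, "Foo", "consectetur");
console.log(JSON.stringify(calls[0].update)); // {"$push":{"words":"consectetur"},"$inc":{"counter":1}}
```

Removals would need the matching decrement, taking care that $pull removes every occurrence of a value.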
Of course from MongoDB versions 2.6 and upwards you can also do this:
db.docs.aggregate([
{ "$project": {
"name": 1,
"words": 1,
"count": { "$size": "$words" }
}},
{ "$match": {
"count": { "$gt": 3 }
}}
])
Either of those forms is going to perform a lot better than using something that is going to remove the use of an index and then invoke the JavaScript interpreter over each resulting document. Or even just use the $size operator for an exact size of the array.