MongoDB query for null using dot syntax

I'm having trouble querying MongoDB for null values using dot notation.
Some things in a db:
db.things.insertMany([
{ a: [{ value: 1 }] },
{ a: [{ value: null }] },
{ a: [{ value: 2 }] }
]);
I want to find all of the documents which have the first element in the 'a' array having a null value.
Queries:
db.getCollection('things').count({ "a.0.value": 1 }) => 1 (as expected)
db.getCollection('things').count({ "a.0.value": null }) => 3 (I would expect 1 here also)
I'm at a bit of a loss as to why the second query returns all the documents. It only seems to behave this way for array-indexed paths, which makes it even stranger (e.g. db.getCollection('things').count({ "a": null }) => 0, as expected).
The only thing I can think of is that it's basically cancelling out the whole condition when the value is null, but I don't know how to get around this.
MongoDB v3.4.10

You can use $expr to apply aggregation operators inside a query: take the first element with $arrayElemAt and compare it to null with $eq.
db.collection.find({ "$expr": { "$eq": [{ "$arrayElemAt": ["$a.value", 0] }, null] } })
For MongoDB versions prior to 3.6:
db.collection.aggregate([
{ "$addFields": {
"match": { "$arrayElemAt": ["$a.value", 0] }
}},
{ "$match": { "match": null }}
])
If you still want to use the dot syntax, then you have to use the $type operator to match null values (BSON type 10 is null):
db.collection.find({ "a.0.value": { "$type": 10 } })
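To sanity-check which documents any of these queries should return, the intended predicate can be written out in plain JavaScript. This is only a client-side sketch over the sample documents from the question, not a MongoDB API; the helper name firstValueIsNull is made up for illustration.

```javascript
// Sample documents from the question.
const things = [
  { a: [{ value: 1 }] },
  { a: [{ value: null }] },
  { a: [{ value: 2 }] }
];

// Hypothetical helper: true only when the first element of `a` exists
// and its `value` field is explicitly null (not merely missing).
function firstValueIsNull(doc) {
  if (!Array.isArray(doc.a) || doc.a.length === 0) return false;
  const first = doc.a[0];
  return first !== null && typeof first === "object" && first.value === null;
}

const matchedThings = things.filter(firstValueIsNull);
console.log(matchedThings.length); // 1
```

Only the second sample document passes, which is the count the question expected.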

Related

MongoDB - How to match a single property value

Suppose a collection contains the following 3 documents:
[
{ "_id": 1, "prop": 1 },
{ "_id": 2, "prop": 4 },
{ "_id": 3, "prop": [1, 2, 3] }
]
The query { $match: { prop: 1 } } returns 2 documents, namely 1 and 3. I would have expected it to only return 1.
Is this behaviour documented somewhere or is it a bug?
How could one formulate the query to mean strict equality (as opposed to equality or array-contains)?
I think that MongoDB will always try to match against both scalars and arrays, unless you explicitly rule out the latter:
{ $match : { prop : { $eq : 1, $not: { $type : 'array' } } } }
It doesn't seem to be explicitly documented, but it's implied in the documentation because the syntax for querying scalars for a particular value is the same as the syntax for querying arrays.
I believe the query returns the document with _id: 3 because of how MongoDB queries an array for an element (see "Query an Array for an Element" in the documentation).
The document with _id: 3 satisfies the query because one element of its array matches.
To force a strict equality match, I would suggest using an aggregation operator in your query via $expr, which compares the field value as a whole, including its type.
db.collection.aggregate([
{
$match: {
$expr: {
$eq: [
"$prop",
1
]
}
}
}
])
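The difference between the two matching styles can be sketched in plain JavaScript. This is a simplified model of the behaviour described above, not MongoDB's actual matching code, and the function names are made up for illustration.

```javascript
// Sample documents from the question.
const propDocs = [
  { _id: 1, prop: 1 },
  { _id: 2, prop: 4 },
  { _id: 3, prop: [1, 2, 3] }
];

// Rough model of { prop: 1 }: matches the scalar itself or any array element.
function matchesEqualityOrContains(value, target) {
  return Array.isArray(value) ? value.includes(target) : value === target;
}

// Rough model of $expr with $eq: the field is compared as a whole,
// so an array is never equal to a scalar.
function matchesStrict(value, target) {
  return !Array.isArray(value) && value === target;
}

const looseIds = propDocs
  .filter(doc => matchesEqualityOrContains(doc.prop, 1))
  .map(doc => doc._id);
const strictIds = propDocs
  .filter(doc => matchesStrict(doc.prop, 1))
  .map(doc => doc._id);

console.log(looseIds);  // [ 1, 3 ]
console.log(strictIds); // [ 1 ]
```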

Ignoring NULL values within an aggregate operation in MongoDB

I have the following MongoDB aggregate operation which is working fine but it also seems to be returning NULL values.
How can I ignore NULL values against projectIP field?
db.inventory.aggregate(
[
{ $match: {projectIP: { $exists:true }}},
{ $project: {projectIP: "$projectIP",_id : 0}},
{ $group: {_id: "$projectIP"}},
{ $sort: {projectIP: 1}}
]
)
It seems some of the documents contain a null value for that key. Use this
{ $match: { projectIP: { $exists:true, $ne: null }}}
as the first stage of your pipeline, replacing the existing $match.
Alternatively, you can substitute a value (0 or anything else) for the null values.
Here is how you do it:
projectIP: { $ifNull: [ "$projectIP", 0.0 ] }
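As a rough client-side illustration of what the filtered pipeline is meant to produce (distinct, non-null projectIP values), here is a plain-JavaScript sketch. The sample data and the helper name distinctProjectIPs are invented for this example.

```javascript
// Invented sample data: one null value and one document without the field.
const inventory = [
  { projectIP: "10.0.0.2" },
  { projectIP: null },
  { projectIP: "10.0.0.1" },
  { name: "no ip field" },
  { projectIP: "10.0.0.2" }
];

// Hypothetical helper: collects distinct projectIP values,
// skipping documents where the field is null or missing.
function distinctProjectIPs(docs) {
  const seen = new Set();
  for (const doc of docs) {
    // `!= null` is false for both null and undefined (missing field).
    if (doc.projectIP != null) seen.add(doc.projectIP);
  }
  return Array.from(seen).sort();
}

const projectIPs = distinctProjectIPs(inventory);
console.log(projectIPs); // [ '10.0.0.1', '10.0.0.2' ]
```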

Find empty documents in a database

I have queried an API which is quite inconsistent and therefore does not return objects for all numerical indexes (but for most of them). So that I can still use .count() on the numerical index, I've been inserting empty documents with db.collection.insert({})
My question now is: how would I find and count these objects?
Something like db.collection.count({}) won't work obviously.
Thanks for any idea!
Use the $where operator. The JavaScript expression matches only documents containing a single key (that single key being the document's "_id" key):
db.collection.find({ "$where": "return Object.keys(this).length == 1" }).count()
For MongoDB 3.4.4 and newer, consider running the following aggregation pipeline, which uses $objectToArray to count those empty documents:
db.collection.aggregate([
{ "$project": {
"hashmaps": { "$objectToArray": "$$ROOT" }
} },
{ "$project": {
"keys": "$hashmaps.k"
} },
{ "$group": {
"_id": null,
"count": { "$sum": {
"$cond": [
{
"$eq":[
{
"$ifNull": [
{ "$arrayElemAt": ["$keys", 1] },
0
]
},
0
]
},
1,
0
]
} }
} }
]);
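Both approaches rest on the same test: a document counts as empty when "_id" is its only key. A minimal plain-JavaScript sketch of that test, using invented sample documents and a made-up helper name, looks like this:

```javascript
// Invented sample documents; two of them carry nothing but an _id.
const storedDocs = [
  { _id: 1 },
  { _id: 2, name: "x" },
  { _id: 3 },
  { _id: 4, value: null }
];

// Hypothetical helper: a document is "empty" when _id is its only key.
function isEmptyDocument(doc) {
  const keys = Object.keys(doc);
  return keys.length === 1 && keys[0] === "_id";
}

const emptyCount = storedDocs.filter(isEmptyDocument).length;
console.log(emptyCount); // 2
```

Note that in this sketch a document with a field explicitly set to null (such as _id 4) is not counted as empty, since it has two keys.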

Finding documents based on the minimum value in an array

My document structure is something like:
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 12,
},
{
source: 'b',
value: 10,
},
...
]
},
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 24,
},
{
source: 'b',
value: 36,
},
...
]
}
The values of the various sources in options will be updated frequently (every few minutes or hours).
Assume the size of the options array doesn't change, i.e. no extra elements are added to the list.
My queries are of the following type:
- find all documents where the min_value of all the options falls within some range.
I could first do an unwind on options (and then take the min) and then run comparison queries, but I am new to mongo and not sure how performance
is affected by the unwind operation. The number of documents of this type would be a few million.
Or does anyone have any suggestions for changing the document structure that could simplify this query? (Apart from creating separate documents per source, which would involve a lot of data duplication.)
Thanks!
Using $unwind is indeed quite expensive, most notably so with larger arrays, but there is a cost in all cases of usage. There are a couple of ways to avoid needing $unwind here without real structural changes.
Pure Aggregation
In the basic case, as of the MongoDB 3.2.x release series the $min operator can work directly on an array of values in a "projection" sense, in addition to its standard grouping accumulator role. This means that with the help of the related $map operator for processing elements of an array, you can get the minimal value without using $unwind:
db.collection.aggregate([
// Still makes sense to use an index to select only possible documents
{ "$match": {
"options": {
"$elemMatch": {
"value": { "$gte": minValue, "$lt": maxValue }
}
}
}},
// Provides a logical filter to remove non-matching documents
{ "$redact": {
"$cond": {
"if": {
"$let": {
"vars": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
},
"in": { "$and": [
{ "$gte": [ "$$min_value", minValue ] },
{ "$lt": [ "$$min_value", maxValue ] }
]}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
// Optionally return the min_value as a field
{ "$project": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
}}
])
The basic step is to get the "minimum" value from the array, which is done inside $let because we want to use the result "twice" in the logical conditions and avoid repeating ourselves. To do that, first extract the "value" data from the "options" array using $map.
The output of $map is an array with just those values, so this is supplied as the argument to $min, which then returns the minimum value for that array.
Using $redact is sort of like a $match pipeline stage with the difference that rather than needing a field to be "present" in the document being examined, you instead just form a logical condition with calculations.
In this case the condition is $and where "both" the logical forms of $gte and $lt return true against the calculated value ( from $let as "$$min_value" ).
The $redact stage then uses the special variables $$KEEP, to keep the document when the condition is true, or $$PRUNE, to remove the document from the results when it is false.
It's all very much like doing $project and then $match to actually project the value into the document before filtering in another stage, but all done in one stage. Of course you might actually want to $project the resulting field in what you return, but it generally cuts the workload if you remove non-matched documents "first" using $redact instead.
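The logic of the $map, $min and $redact combination can be followed more easily in plain JavaScript. This is only a client-side sketch of the same computation, with invented sample data and made-up helper names; it is not how the server evaluates the pipeline.

```javascript
// Invented sample data in the shape used by the question.
const products = [
  { _id: 1, options: [{ source: "a", value: 12 }, { source: "b", value: 10 }] },
  { _id: 2, options: [{ source: "a", value: 24 }, { source: "b", value: 36 }] },
  { _id: 3, options: [{ source: "a", value: 5 }, { source: "b", value: 15 }] }
];

// Equivalent of $map (extract each value) followed by $min.
function minOptionValue(doc) {
  return Math.min(...doc.options.map(option => option.value));
}

// Equivalent of the $redact condition: keep documents whose minimum
// lies in the half-open range [minValue, maxValue).
function findByMinValue(docs, minValue, maxValue) {
  return docs.filter(doc => {
    const min = minOptionValue(doc);
    return min >= minValue && min < maxValue;
  });
}

const inRange = findByMinValue(products, 10, 20).map(doc => doc._id);
console.log(inRange); // [ 1 ]
```

Document 3 shows why the initial $elemMatch is not enough on its own: one of its values (15) is in range, so it would pass that first stage, but its minimum (5) is not, so the second check removes it.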
Updating Documents
Of course I think the best option is to actually keep the "min_value" field in the document rather than work it out at run-time. So this is a very simple thing to do when adding to or altering array items during update.
For this there is the $min "update" operator. Use it when appending with $push:
db.collection.update(
{ "_id": id },
{
"$push": { "options": { "source": "a", "value": 9 } },
"$min": { "min_value": 9 }
}
)
Or when updating a value of an element:
db.collection.update(
{ "_id": id, "options.source": "a" },
{
"$set": { "options.$.value": 9 },
"$min": { "min_value": 9 }
}
)
If the current "min_value" in the document is greater than the argument to $min, or the key does not yet exist, then the given value will be written. If the argument is greater than the current value, the existing value stays in place since it is already the smaller value.
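The rule just described can be modelled in a few lines of plain JavaScript. This is a sketch of the described behaviour for numeric values only, with a made-up function name, not the server's implementation.

```javascript
// Hypothetical model of the $min update operator for numeric values:
// write the candidate if the field is missing or the candidate is smaller.
function applyMin(doc, field, candidate) {
  if (!(field in doc) || candidate < doc[field]) {
    doc[field] = candidate;
  }
  return doc;
}

const lowered = applyMin({ min_value: 10 }, "min_value", 9);
const unchanged = applyMin({ min_value: 5 }, "min_value", 9);
const created = applyMin({}, "min_value", 9);

console.log(lowered.min_value);   // 9
console.log(unchanged.min_value); // 5
console.log(created.min_value);   // 9
```

Because the stored value only ever decreases in this model, raising the value of the element that currently holds the minimum would need a recomputation rather than $min.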
You can even set all your existing data with a simple "bulk" operations update:
var ops = [];
db.collection.find({ "min_value": { "$exists": false } }).forEach(function(doc) {
// Queue operations
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$min": {
"min_value": Math.min.apply(
null,
doc.options.map(function(option) {
return option.value
})
)
}
}
}
});
// Write once in 1000 documents
if ( ops.length == 1000 ) {
db.collection.bulkWrite(ops);
ops = [];
}
});
// Clear any remaining operations
if ( ops.length > 0 )
db.collection.bulkWrite(ops);
Then with a field in place, it is just a simple range selection:
db.collection.find({
"min_value": {
"$gte": minValue, "$lt": maxValue
}
})
So it really should be in your best interests to keep a field ( or fields if you regularly need different conditions ) in the document since that provides the most efficient query.
Of course, the new functions of aggregation $min along with $map also make this viable to use without a field, if you prefer more dynamic conditions.

Check last element in array matches a condition

I have an array of numbers in my mongodb documents and need to check if the last number in that array meets my conditions.
My documents are stored like this:
{
name: String,
data: {
dates: Array,
numbers: Array
}
}
and I need to check if the last number in numbers "lies between" two other numbers.
Any suggestions on how to do this would be appreciated.
Right now the most efficient way you have of doing this is using the JavaScript evaluation of $where, as you can simply find the value of the last array element and test it programmatically.
With sample documents:
{ "a": [1,2,3] },
{ "a": [1,2,4] },
{ "a": [1,2,5] }
And to query:
db.collection.find(function() { var a = this.a.pop(); return ( a > 2 ) && ( a < 5 ) })
Or simply presented with $where as a string for evaluation:
Model.find(
{
"$where": "var a = this.a.pop(); return ( a > 2 ) && ( a < 5 )"
},
function(err,results) {
// handling here
}
);
Which is a really simple way to do this and does not have the "overhead" of something like $unwind in the aggregation framework, which was created to "denormalize" and process arrays. That is not really efficient here.
In the "future" however, it will be. As is currently available in development releases, there is a $slice operator for the aggregation framework. This operator will allow easy access to the "last" array element for testing.
Since the aggregation framework operators run in "native code" and are not interpreted JavaScript, a single pipeline stage then becomes more efficient than the JavaScript form, even though the listing looks longer:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$anyElementTrue": {
"$map": {
"input": { "$slice": ["$a",-1] },
"as": "el",
"in":{
"$and": [
{ "$gt": [ "$$el", 2 ] },
{ "$lt": [ "$$el", 5 ] }
]
}
}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
The $redact operator that already exists is used to "logically filter" with a comparison expression here. Based on the true/false match conditions it either "keeps" or "prunes" the document from the results respectively.
The $slice operator in its aggregation framework form will still ultimately return an array, albeit a single-element array in this case. This is why $map is used to "transform" each element into a true/false condition, and the $anyElementTrue operator reduces the "array" to the single response expected by $cond.
So when that is released, it will be the most efficient way to do this. But until then, stick with the JavaScript, as it is presently the fastest way to do this evaluation.
Both query forms return just the first two documents of the sample here:
{ "a": [1,2,3] },
{ "a": [1,2,4] }
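For reference, the same check can be sketched in plain JavaScript without mutating the array (pop() removes the last element from the in-memory copy, whereas indexing does not). The helper name lastIsBetween is made up for illustration.

```javascript
// Sample documents from the answer above.
const samples = [
  { a: [1, 2, 3] },
  { a: [1, 2, 4] },
  { a: [1, 2, 5] }
];

// Hypothetical helper: true when the last element of `a` lies strictly
// between `low` and `high`. Reads the element by index instead of pop().
function lastIsBetween(doc, low, high) {
  if (!Array.isArray(doc.a) || doc.a.length === 0) return false;
  const last = doc.a[doc.a.length - 1];
  return last > low && last < high;
}

const hits = samples.filter(doc => lastIsBetween(doc, 2, 5));
console.log(hits.map(doc => doc.a)); // [ [ 1, 2, 3 ], [ 1, 2, 4 ] ]
```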
MongoDB aggregation may be a feasible way, assuming the name field in your documents is unique.
Suppose you have the following sample document:
{
name: "allen",
data: {
dates: ["2015-08-08"],
numbers: [20, 21, 22, 23]
}
}
The following code does the check. Since the db.collection.aggregate() method returns a cursor, we can use the cursor's hasNext() to decide whether the last number lies between the two given numbers.
var result = db.last_one.aggregate(
[
{
// deconstruct the array field numbers
$unwind: "$data.numbers"
},
{
$group: {
_id: "$name",
// lastNumber is 23 in this case
lastNumber: { $last: "$data.numbers" }
}
},
{
$match: {
lastNumber: { $gt: num1, $lt: num2 }
}
}
]
).hasNext()
if (result) print("matched"); else print("not matched")
For example, if num1 is 22, num2 is 24, the result is matched; if num1 is 21, num2 is 22, the result is not matched.
But actually, grouping on name is not a good idea. It is much better if each document has a unique ObjectId, so that we can group on that _id.