Counting entries of subdocument in MongoDB documents - mongodb

I have a document structure like so
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : {
"r152f47f1daf-0-2" : "c",
"r152f48413c1-0-2" : "c",
"r152f4851bf7-0-1" : "c"
}
}
My task is to find all documents with the following conditions:
The "_id" needs to start with "5:"
The number of revisions needs to be strictly greater than 3
The first part is easy, I have solved it with
db.nodes.find( {'_id': /^5:/} )
But I am struggling with the second part, where I am supposed to use $gt.
Since I am new to MongoDB, I first looked at $size, but _revisions is not an array, it is a subdocument, right?
I also looked at $unwind and then counting the results, but that does not make sense either, since my result needs to be the documents that match the above two conditions.
Any pointers highly appreciated.

You can do this with the $where operator:
db.nodes.find(function() {
return (/^5:/.test(this._id) && Object.keys(this._revisions).length > 3 );
})
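To see what the function passed to find() actually checks, the same predicate can be run on plain objects outside the database. This is only a sketch: the docs array and the matches helper are made up for illustration, and in the shell the document would be available as this rather than as a parameter.

```javascript
// Hypothetical sample documents; only their shape follows the question.
const docs = [
  { _id: "5:/a", _revisions: { r1: "c", r2: "c", r3: "c", r4: "c" } },
  { _id: "5:/b", _revisions: { r1: "c", r2: "c", r3: "c" } },
  { _id: "3:/c", _revisions: { r1: "c", r2: "c", r3: "c", r4: "c" } },
  { _id: "5:/d" } // no _revisions field at all
];

// Same test as the $where function, with the document passed explicitly.
// The `|| {}` guard is an addition: Object.keys(undefined) would throw.
function matches(doc) {
  return /^5:/.test(doc._id) && Object.keys(doc._revisions || {}).length > 3;
}

const matchedIds = docs.filter(matches).map(d => d._id);
console.log(matchedIds);
```

Only the first document passes both conditions. The guard matters in practice, since a document without _revisions would otherwise make the function throw.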
The problem with this, as mentioned in the documentation, is that:
$where evaluates JavaScript and cannot take advantage of indexes. Therefore, query performance improves when you express your query using the standard MongoDB operators (e.g., $gt, $in).
You should definitely consider changing the _revisions field to an array of sub-documents like this:
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : [
{
"rev": "r152f47f1daf-0-2",
"value": "c"
},
{
"rev": "r152f48413c1-0-2",
"value": "c"
},
{
"rev": "r152f4851bf7-0-1",
"value": "c"
}
]
}
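Then use the $exists operator. The path "_revisions.3" refers to the fourth array element, which exists only when the array holds more than 3 revisions: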
db.nodes.find({ "_id": /^5:/, "_revisions.3": { "$exists": true } } )
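As a rough illustration of the suggested restructuring, the subdocument can be turned into an array of { rev, value } entries with a few lines of JavaScript. The helper name revisionsToArray is hypothetical, and the sketch works on a plain object rather than on a live collection.

```javascript
// Hypothetical helper: turn { revId: value, ... } into [{ rev, value }, ...].
function revisionsToArray(revisions) {
  return Object.entries(revisions).map(([rev, value]) => ({ rev, value }));
}

// Document shaped like the one in the question.
const original = {
  _id: "3:/content/somepath/test.txt",
  _revisions: {
    "r152f47f1daf-0-2": "c",
    "r152f48413c1-0-2": "c",
    "r152f4851bf7-0-1": "c"
  }
};

const converted = { ...original, _revisions: revisionsToArray(original._revisions) };

// Mirrors the idea behind "_revisions.3": index 3 is present only when
// the array has more than 3 elements.
const hasMoreThanThree = converted._revisions[3] !== undefined;
console.log(converted._revisions.length, hasMoreThanThree);
```

With three revisions there is no element at index 3, so this document would not be returned by the $exists query.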

Related

pymongo db query with multiple conditions- $and $exists

An example document looks like this
{
"_id":ObjectId("562e7c594c12942f08fe4192"),
"Type": "f",
"runTime": ISODate("2016-12-21T13:34:00.000+0000"),
"data" : {
"PRICES SPOT" : [
{
"value" : 29.64,
"timeStamp" : ISODate("2016-12-21T23:00:00.000+0000")
},
{
"value" : 29.24,
"timeStamp" : ISODate("2016-12-22T00:00:00.000+0000")
},
{
"value" : 29.81,
"timeStamp" : ISODate("2016-12-22T01:00:00.000+0000")
},
{
"value" : 30.2,
"timeStamp" : ISODate("2016-12-22T02:00:00.000+0000")
},
{
"value" : 29.55,
"timeStamp" : ISODate("2016-12-22T03:00:00.000+0000")
}
]
}
}
My MongoDB collection holds documents of different Type values. I'd like to get a cursor for all documents of Type "f" within a time range, but only those in which PRICES SPOT actually exists. Some documents in the database broke the code I had previously (which did not check whether PRICES SPOT existed).
I saw in the documentation that I can use $and and $exists. However, I am having trouble setting it up because of the range and the nesting. I am using PyMongo as my Python driver and also noticed here that I have to wrap $and and $exists in quotes.
My code
def grab_forecast_cursor(self, model_dt_from, model_dt_till):
# create cursor with all items that actually exist
cursor = self._collection.find(
{
"$and":[
{'Type': 'f', 'runTime': {"$gte": model_dt_from, "$lte": model_dt_till}
['data']['PRICES SPOT': "$exists": true]}
]})
return cursor
This results in a KeyError: it cannot find data. A sample document without PRICES SPOT looks exactly like the one I posted at the beginning, just without that field.
In short: can someone help me set up a query that gives me a cursor over all documents of a certain type that actually contain the expected nested contents?
Update
I added a comma after model_dt_till and now have a syntax error.
def grab_forecast_cursor(self, model_dt_from, model_dt_till):
# create cursor with all items that actually exist
cursor = self._collection.find(
{
"$and":[
{'Type': 'f', 'runTime': {"$gte": model_dt_from, "$lte": model_dt_till},
['data']['PRICES SPOT': "$exists": true]}
]})
return cursor
You're trying to use Python syntax to denote the path into a data structure, but the database wants its own syntax for the key, which is "dot notation":
cursor = self._collection.find({
"Type": "f",
"runTime": { "$gte": model_dt_from, "$lte": model_dt_till },
"data.PRICES SPOT.0": { "$exists": True }
})
You also don't need to write $and like that, as all MongoDB query conditions are already AND expressions. Part of your statement was already relying on that anyway, so make it consistent.
Also, the check for a "non-empty" array is 'data.PRICES SPOT.0', with the added bonus that not only do you know the field "exists", but also that it has at least one item to process.
Python and JavaScript are almost identical in terms of object/dict construction, so you really should be able to just follow the general documentation and the many samples here that are predominantly JavaScript.
I personally even try to notate answers here with valid JSON, so they can be picked up and "parsed" by users of any language. But here, Python is just identical to what you could enter into the mongo shell. Except for True, of course.
See "Dot Notation" for an overview of the syntax, with more information at Query on Embedded / Nested Documents.
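Why "data.PRICES SPOT.0" doubles as a non-empty check can be shown with a simplified path lookup in plain JavaScript. This is a sketch under assumptions, not how the server resolves paths: getPath is a hypothetical helper, and real dot-notation matching also fans out over arrays of subdocuments, which this does not.

```javascript
// Simplified lookup: split the path on dots and walk the object.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value !== null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

// Hypothetical documents covering the cases from the question.
const withPrices = { Type: "f", data: { "PRICES SPOT": [{ value: 29.64 }] } };
const emptyPrices = { Type: "f", data: { "PRICES SPOT": [] } };
const noPrices = { Type: "f", data: {} };
const noData = { Type: "f" };

const path = "data.PRICES SPOT.0";
const results = [withPrices, emptyPrices, noPrices, noData].map(
  d => getPath(d, path) !== undefined
);
console.log(results);
```

Only the first document has something at index 0. An empty array, a missing PRICES SPOT and a missing data field all fail the same check.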

Mongodb weird behaviour of $exists

I don't understand the behaviour of the $exists operator.
I have two simple documents in the collection 'users':
/* 1 */
{
"_id" : ObjectId("59788c2f6be212c210c73233"),
"user" : "google"
}
/* 2 */
{
"_id" : ObjectId("597899a80915995e50528a99"),
"user" : "werty",
"extra" : "very important"
}
I want to retrieve documents which contain the field "extra" and the value is not equal to 'unimportant':
The query:
db.getCollection('users').find(
{"extra":{$exists:true},"extra": {$ne:"unimportant"}}
)
returns both documents.
Also the query
db.getCollection('users').find(
{"extra":{$exists:false},"extra": {$ne:"unimportant"}}
)
returns both documents.
It seems that $exists (when used with another condition on the same field) works like an 'OR'.
What am I doing wrong? Any help appreciated.
I used MongoDB 3.2.6 and 3.4.9.
I have seen "Mongo $exists query does not return correct documents",
but I don't have sparse indexes.
Per MongoDB documentation (https://docs.mongodb.com/manual/reference/operator/query/and/):
Using an explicit AND with the $and operator is necessary when the same field or operator has to be specified in multiple expressions.
Therefore, to enforce both clauses, you should use the $and operator as follows:
db.getCollection('users').find({ $and : [ { "extra": { $exists : true } }, { "extra" : { $ne : "unimportant" } } ] });
The way you constructed your query is wrong; it has nothing to do with how $exists works. In the shell, a query object that repeats the key "extra" keeps only the last value, so only the $ne condition is sent, and $ne also matches documents where the field is missing. Because you are checking two conditions, you need a query that does a logical AND of both.
The correct syntax for a query that retrieves documents which contain the field "extra" and whose value is not equal to 'unimportant' is:
db.getCollection('users').find(
{
"extra": {
"$exists": true,
"$ne": "unimportant"
}
}
)
or using the $and operator as:
db.getCollection('users').find(
{
"$and": [
{ "extra": { "$exists": true } },
{ "extra": { "$ne": "unimportant" } }
]
}
)
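The effect of repeating a key can be checked without a database, since the query is an ordinary JavaScript object literal before it is ever sent. The variable names below are only for illustration.

```javascript
// The same key twice: the later value silently replaces the earlier one.
const duplicated = { extra: { $exists: true }, extra: { $ne: "unimportant" } };
console.log(JSON.stringify(duplicated)); // {"extra":{"$ne":"unimportant"}}

// Both operators under a single key survive.
const merged = { extra: { $exists: true, $ne: "unimportant" } };
console.log(JSON.stringify(merged)); // {"extra":{"$exists":true,"$ne":"unimportant"}}
```

This is why swapping $exists between true and false in the original query made no difference: that condition never reached the server.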

Is there a way to prevent mongo queries "branching" on arrays?

If I have the following documents:
{a: {x:1}} // without array
{a: [{x:1}]} // with array
Is there a way to query for {'a.x':1} that will return the first one but not the second one? That is, I want the document where a.x is 1 and a is not an array.
Please note that a future version of MongoDB is expected to incorporate the $isArray aggregation expression. In the meantime...
...the following code will do the trick, as the $elemMatch operator matches only documents having an array field:
> db.test.find({"a.x": 1, "a": {$not: {$elemMatch: {x:1}}}})
Given that dataset:
> db.test.find({},{_id:0})
{ "a" : { "x" : 1 } }
{ "a" : [ { "x" : 1 } ] }
{ "a" : [ { "x" : 0 }, { "x" : 1 } ] }
It will return:
> db.test.find({"a.x": 1, "a": {$not: {$elemMatch: {x:1}}}}, {_id:0})
{ "a" : { "x" : 1 } }
Please note this should be considered a short-term solution. The MongoDB team took great care to ensure that [{x:1}] and {x:1} behave the same (see dot-notation or $type for arrays). So you should assume that at some point in the future, $elemMatch might be updated (see JIRA issue SERVER-6050). In the meantime, it may be worth fixing your data model so that it is no longer necessary to distinguish between an array containing one subdocument and a bare subdocument.
You can do this by adding a second term that ensures a has no elements. That second term will always be true when a is a plain subdoc, and always false when a is an array (as otherwise the first term wouldn't have matched).
db.test.find({'a.x': 1, 'a.0': {$exists: false}})
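The reasoning behind the two terms can be traced with a rough stand-in written in plain JavaScript. It is not the server's matching logic, only an approximation for the three sample documents, and matchesPlainSubdoc is a hypothetical name.

```javascript
// Rough stand-in for {'a.x': 1, 'a.0': {$exists: false}}.
function matchesPlainSubdoc(doc) {
  const a = doc.a;
  // First term: 'a.x' is 1, looking inside array elements as well.
  const candidates = Array.isArray(a) ? a : [a];
  const firstTerm = candidates.some(
    e => e !== null && typeof e === "object" && e.x === 1
  );
  // Second term: there is nothing at key/index "0" of a.
  const secondTerm = !(a !== null && typeof a === "object" && "0" in a);
  return firstTerm && secondTerm;
}

const dataset = [
  { a: { x: 1 } },
  { a: [{ x: 1 }] },
  { a: [{ x: 0 }, { x: 1 }] }
];

const kept = dataset.filter(matchesPlainSubdoc);
console.log(JSON.stringify(kept));
```

Any array that satisfies the first term must have at least one element, so it has an index 0 and fails the second term. Only the plain subdocument remains.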

MongoDb: querying against a collection's own fields

I've done some research and it seems that it's possible to query (i.e. compare) two fields in the same collection using the aggregation framework. It's also possible with the $where operator, but I want to avoid a low-performance JavaScript solution.
Here's an example document:
{
"_id" : ObjectId("541ba14d2208236d06ff1e57"),
"a" : "foo",
"d" : {
"e" : "foo"
}
}
{
"_id" : ObjectId("541ba14d2208236d06ff1e58"),
"a" : "foo",
"d" : {
"e" : "bar"
}
}
I'd like to pick the documents where 'a' != 'd.e'. I've attempted the following without success:
db.test.aggregate([{$match: {$ne: ['$a', '$d.e']}}]);
As you said, the query can be done with JavaScript by issuing a $where condition in your query:
db.test.find(function() { return this.a != this.d.e } )
This is the short form of a $where query.
While you can do other manipulation in the aggregation framework, it does not change the basic nature of the query in that you cannot place a query condition that compares the values of two fields. This is why $match alone cannot do this because it follows the same rules.
What you "can" do is $project another field value that matches the same logical conditions that you want to enforce. Depending on your actual implementation this may or may not be better for performance:
db.test.aggregate([
{ "$project": {
"a": 1,
"d": 1,
"notEqual": { "$ne": [ "$a", "$d.e" ] }
}},
{ "$match": { "notEqual": true } }
])
That probably is not going to make a lot of sense on its own unless some other filtering is done in the overall process though. But the general comparison is done with a comparison operator to return a true/false result that can then be filtered.
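The project-then-match idea maps onto an ordinary map followed by a filter. The sketch below runs on an in-memory array that mirrors the two example documents, with the ObjectId values replaced by plain numbers; it only illustrates the shape of the two stages, not how the server executes them.

```javascript
// In-memory copies of the example documents (ids simplified).
const docs = [
  { _id: 1, a: "foo", d: { e: "foo" } },
  { _id: 2, a: "foo", d: { e: "bar" } }
];

// "$project" step: keep a and d, and compute the comparison as a new field.
const projected = docs.map(doc => ({
  _id: doc._id,
  a: doc.a,
  d: doc.d,
  notEqual: doc.a !== doc.d.e
}));

// "$match" step: an ordinary filter on the computed field.
const matched = projected.filter(doc => doc.notEqual === true);
console.log(JSON.stringify(matched));
```

Every document goes through the first step before anything is filtered, which is why the answer suggests storing the flag on the document if the check is needed regularly.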
So the best thing to do if you can is to actually maintain the result of this in a similar way by a field that is present on your document. Then you have a basic query condition to look for that value rather than the comparison. This is if you need to regularly do these kinds of checks.
But for "ad-hoc" purposes, you either stick with the JavaScript evaluation or use the "projection" form in aggregation queries (where you cannot use a $where clause) in order to do the field-level comparison.

Mongodb query with fields in the same documents

I have the following json:
{
"a1": {"a": "b"},
"a2": {"a": "c"}
}
How can I request all documents where a1 and a2 are not equal in the same document?
You could use $where:
db.myCollection.find( { $where: "this.a1.a != this.a2.a" } )
However, be aware that this won't be very fast, because it has to spin up the JavaScript engine, iterate over each and every document and check the condition for each.
If you need to run this query on large collections, or very often, it's best to introduce a denormalized flag, like areEqual. Still, such low-selectivity fields don't yield good index performance, because the candidate set is still large.
Update
Using the new $expr operator, available as of MongoDB 3.6, you can use aggregation expressions in a find query like this:
db.myCollection.find({$expr: {$ne: ["$a1.a", "$a2.a"] } });
Although this comment solves the problem, I think a better match for this use case would be the $addFields operator, available as of version 3.4, instead of $project.
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$addFields": {
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
]);
To avoid JavaScript use the aggregation framework:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aCmp": {"$cmp":["$a1.a","$a2.a"]}
}
},
{"$match":{"aCmp":0}}
])
On our development server the equivalent JavaScript query takes 7x longer to complete.
Update (10 May 2017)
I just realized my answer didn't answer the question, which wanted values that are not equal (sometimes I'm really slow). This will work for that:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
])
$ne could be used in place of $eq if the match condition was changed to true but I find using $eq with false to be more intuitive.
A $where condition is evaluated as JavaScript, so
{"a": "b"} == {"a": "b"}
would be false, because JavaScript compares objects by reference.
So to compare the two you would have to compare the inner values: a1.a == a2.a
To do this in MongoDB you would use the $where operator
db.myCollection.find({$where: "this.a1.a != this.a2.a"});
This assumes that each embedded document will have a property "a". If that isn't the case things get more complicated.
Starting in Mongo 4.4, for those that want to compare sub-documents and not only primitive values (since {"a": "b"} == {"a": "b"} is false), we can use the new $function aggregation operator that allows applying a custom javascript function:
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 1, "y" : 2 } }
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
db.collection.aggregate(
{ $match:
{ $expr:
{ $function: {
body: function(a1, a2) { return JSON.stringify(a1) != JSON.stringify(a2); },
args: ["$a1", "$a2"],
lang: "js"
}}
}
}
)
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
$function takes 3 parameters:
body, which is the function to apply, whose parameters are the two fields to compare.
args, which contains the fields from the record that the body function takes as parameters. In our case, both "$a1" and "$a2".
lang, which is the language in which the body function is written. Only js is currently available.
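The comparison inside body can be tried on its own in any JavaScript runtime. The sketch below also shows a caveat that is easy to miss: JSON.stringify is sensitive to key order, so two sub-documents with the same fields in a different order are reported as different. The variable names are illustrative only.

```javascript
const a1 = { x: 1, y: 2 };
const a2 = { x: 1, y: 2 };

// == on objects compares references, not contents.
const sameReference = a1 == a2;

// Comparing the serialized form compares contents.
const sameJson = JSON.stringify(a1) === JSON.stringify(a2);

// Same fields, different key order: the serialized strings differ.
const reordered = { y: 2, x: 1 };
const reorderedSameJson = JSON.stringify(a1) === JSON.stringify(reordered);

console.log(sameReference, sameJson, reorderedSameJson); // false true false
```

If key order can vary between the two sub-documents, the body function would need a comparison that does not depend on it.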
Thanks all for solving my problem. Concerning the answers that use aggregate(), one thing that confused me at first is that $eq (or $in, or lots of other operators) has a different meaning depending on where it is used. In a find(), or in the $match phase of an aggregation, $eq takes a single value and selects matching documents:
db.items.aggregate([{$match: {_id: {$eq: ObjectId("5be5feb45da16064c88e23d4")}}}])
However, in the $project phase of an aggregation, $eq takes an array of 2 expressions and produces a new field whose value is true or false:
db.items.aggregate([{$project: {new_field: {$eq: ["$_id", "$foreignID"]}}}])
In passing, here's the query I used in my project to find all items whose list of linked items (due to a bug) linked to themselves:
db.items.aggregate([{$project: {idIn: {$in: ["$_id","$header.links"]}, "header.links": 1}}, {$match: {idIn: true}}])