pymongo db query with multiple conditions- $and $exists - mongodb

An example document looks like this
{
"_id":ObjectId("562e7c594c12942f08fe4192"),
"Type": "f",
"runTime": ISODate("2016-12-21T13:34:00.000+0000"),
"data" : {
"PRICES SPOT" : [
{
"value" : 29.64,
"timeStamp" : ISODate("2016-12-21T23:00:00.000+0000")
},
{
"value" : 29.24,
"timeStamp" : ISODate("2016-12-22T00:00:00.000+0000")
},
{
"value" : 29.81,
"timeStamp" : ISODate("2016-12-22T01:00:00.000+0000")
},
{
"value" : 30.2,
"timeStamp" : ISODate("2016-12-22T02:00:00.000+0000")
},
{
"value" : 29.55,
"timeStamp" : ISODate("2016-12-22T03:00:00.000+0000")
}
]
}
}
My MongoDB collection holds several types of documents. I'd like to get a cursor for all documents of Type "f" within a time range, but only those whose price data actually exists. Some documents in the database broke my previous code, which did not check whether PRICES SPOT existed.
I saw in the documentation that I can use $and and $exists. However, I am having trouble setting the query up because of the range and the nesting. I am using PyMongo as my Python driver, and I also noticed that I have to wrap $and and $exists in quotes.
My code
def grab_forecast_cursor(self, model_dt_from, model_dt_till):
    # create cursor with all items that actually exist
    cursor = self._collection.find(
        {
            "$and":[
                {'Type': 'f', 'runTime': {"$gte": model_dt_from, "$lte": model_dt_till}
                ['data']['PRICES SPOT': "$exists": true]}
            ]})
    return cursor
This results in a KeyError: it cannot find data. A sample document that has no PRICES SPOT looks exactly like the one I posted at the beginning, just without that array.
In short: can someone help me set up a query that returns a cursor over all the documents of a certain type that actually have the expected contents nested inside?
Update
I added a comma after model_dt_till and now get a syntax error.
def grab_forecast_cursor(self, model_dt_from, model_dt_till):
    # create cursor with all items that actually exist
    cursor = self._collection.find(
        {
            "$and":[
                {'Type': 'f', 'runTime': {"$gte": model_dt_from, "$lte": model_dt_till},
                ['data']['PRICES SPOT': "$exists": true]}
            ]})
    return cursor

You're trying to use Python syntax to denote the path into a data structure, but the database wants its own syntax for the key, using "dot notation":
cursor = self._collection.find({
"Type": "f",
"runTime": { "$gte": model_dt_from, "$lte": model_dt_till },
"data.PRICES SPOT.0": { "$exists": True }
})
You also don't need to write $and like that, as all MongoDB query conditions are already AND expressions. Part of your statement was actually doing that anyway, so make it consistent.
Also, the check for a non-empty array is 'data.PRICES SPOT.0', with the added bonus that you know not only that the field exists but also that it has at least one item to process.
Python and JavaScript are almost identical in terms of object/dict construction, so you should be able to follow the general documentation and the many samples here that are predominantly JavaScript.
I personally try to notate answers here with valid JSON, so they can be picked up and parsed by users of any language. Here, Python is just identical to what you could enter into the mongo shell, except for True of course.
See "Dot Notation" for an overview of the syntax, with more information at "Query on Embedded / Nested Documents".
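As a minimal sketch of the same filter assembled in Python: the helper name build_forecast_filter and the sample dates are assumptions made for illustration, and the resulting dict is what would be handed to the collection's find() call.

```python
from datetime import datetime


def build_forecast_filter(dt_from, dt_till):
    """Build a filter for Type "f" documents inside a runTime range
    whose data.PRICES SPOT array holds at least one element."""
    return {
        "Type": "f",
        "runTime": {"$gte": dt_from, "$lte": dt_till},
        # Dot notation: element 0 of the nested array must exist.
        "data.PRICES SPOT.0": {"$exists": True},
    }


# Hypothetical date range, only for illustration.
query = build_forecast_filter(datetime(2016, 12, 21), datetime(2016, 12, 22))
print(sorted(query))
```

All three conditions sit at the top level of one dict, so they are combined with an implicit AND and no "$and" key is needed.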

Related

Mongo pull multiple elements inside an array of object

I'm trying to pull one or multiple objects from an array and I noticed something odd.
Let's take the following document.
{
"_id" : UUID("f7e80c8e-6b4a-4741-95a3-2567cccf9e5f"),
"createdAt" : ISODate("2021-07-19T17:07:28.499Z"),
"description" : null,
"externalLinks" : [
{
"id" : "ZV8xMjM0NQ==",
"type" : "event"
},
{
"id" : "cF8xMjM0NQ==",
"type" : "planning"
}
],
"updatedAt" : ISODate("2021-07-19T17:07:28.499Z")
}
I wrote a basic query to pull one element of externalLinks which looks like
db.getCollection('Collection').update(
{
_id: {
$in: [UUID("f7e80c8e-6b4a-4741-95a3-2567cccf9e5f")]
}
}, {
$pull: {
externalLinks: {
"type": "planning",
"id": "cF8xMjM0NQ=="
}
}
})
And it's working fine. But it gets trickier when I want to pull multiple elements from externalLinks, which is why I'm using the $in operator.
And here is the strange behaviour:
db.getCollection('Collection').update(
{
_id: {
$in: [UUID("f7e80c8e-6b4a-4741-95a3-2567cccf9e5f")]
}
}, {
$pull: {
externalLinks: {
$in: [{
"type": "planning",
"id": "cF8xMjM0NQ=="
}]
}
}
})
And this query doesn't work. The solution is to swap the order of the two fields inside the externalLinks element
and do something like:
$in: [{
"id": "cF8xMjM0NQ==",
"type": "planning"
}]
I tried multiple things, like $elemMatch and $positioning, but it should be possible to pull multiple externalLinks.
I also tried the $and operator, without success.
I could easily iterate over the externalLinks to update them, but that would be too easy.
And it nags at me to settle for that solution.
Any help would be appreciated, thank you!
Document fields have an order, and MongoDB compares documents based on the order of their fields, so which field you put first matters.
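To make the ordering point concrete, here is a small Python sketch. Plain dicts stand in for the stored sub-documents, and the helper same_fields_same_order is a hypothetical name; it only handles flat documents and is meant to mimic an order-sensitive comparison, not to reproduce the server's implementation.

```python
def same_fields_same_order(a, b):
    """Order-sensitive comparison of two flat documents:
    same fields, same values, and the same field order."""
    return list(a.items()) == list(b.items())


stored = {"id": "cF8xMjM0NQ==", "type": "planning"}
swapped = {"type": "planning", "id": "cF8xMjM0NQ=="}

# Python dict equality ignores key order...
print(stored == swapped)
# ...while an order-sensitive comparison treats the two as different.
print(same_fields_same_order(stored, swapped))
```

This is why the $in list only matches when its fields are written in the same order as the stored element.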
Since MongoDB 4.2 we can also do updates with an aggregation pipeline. These can sometimes be longer, but they are much more powerful and feel more like programming
(less declarative, less pattern matching).
This doesn't mean that you need a pipeline update in your case, but consider this approach as well.
Query
pipeline update
filter the array and keep only the members that do not match the condition
db.collection.update(
{_id: {$in: ["f7e80c8e-6b4a-4741-95a3-2567cccf9e5f"]}},
[{"$set":
{"externalLinks":
{"$filter":
{"input": "$externalLinks",
"cond":
{"$not":
[{"$and":
[{"$eq": ["$$this.id", "ZV8xMjM0NQ=="]},
{"$eq": ["$$this.type", "event"]}]}]}}}}}])
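The filtering idea in the pipeline above can be sketched in plain Python. This is only an illustration of the logic on an in-memory list; the function name drop_matching_links is an assumption and nothing here talks to the database.

```python
def drop_matching_links(links, link_id, link_type):
    """Keep every link except those whose id AND type both match."""
    return [
        link
        for link in links
        if not (link.get("id") == link_id and link.get("type") == link_type)
    ]


external_links = [
    {"id": "ZV8xMjM0NQ==", "type": "event"},
    {"id": "cF8xMjM0NQ==", "type": "planning"},
]

remaining = drop_matching_links(external_links, "ZV8xMjM0NQ==", "event")
print(remaining)
```

Because each field is compared by name, the result does not depend on the order in which the fields were stored.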

Need for using $and

In MongoDB, I have this following code:
db.products.find({name: "Postcard", status: "Available"})
But isn't that the same as using $and? If not, what is the difference?
Another example...
Where the status equals "Available" and either qty is greater than ($gt) 100 or item starts with the characters "Po":
db.products.find( {status:"Available", $or:[{qty:{$gt:100 }},{item:/^Po/}]})
So seems as if there is no need of using $and in these two examples. So why or when would $and be used?
In both your examples it is superfluous to use $and, because separating match conditions on different fields with ',' accomplishes just the same.
One case where you do need it is when you have to specify multiple conditions on the same field. Here is an example (straight from the MongoDB tutorial videos).
db.movieDetails.find({"$and": [{"metacritic": {"$ne": null}},
{"metacritic": {"$exists": true}}]})
The explanation provided was that the keys in a JSON document must be unique. So if the above query were specified without $and, apparently only the last "metacritic" value would be used.
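The same effect is easy to see from Python, where a dict literal with a repeated key silently keeps only the last value. The variable names below are made up for illustration, and None is the Python spelling of null.

```python
# Two conditions on the same key: the second entry overwrites the first.
without_and = {
    "metacritic": {"$ne": None},
    "metacritic": {"$exists": True},
}

# Wrapping each condition in its own dict keeps both of them.
with_and = {
    "$and": [
        {"metacritic": {"$ne": None}},
        {"metacritic": {"$exists": True}},
    ]
}

print(without_and)           # {'metacritic': {'$exists': True}}
print(len(with_and["$and"]))  # 2
```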
The MongoDB documentation lists another example with a similar explanation. Notice the $or operator being specified twice.
db.inventory.find( {
$and : [
{ $or : [ { price : 0.99 }, { price : 1.99 } ] },
{ $or : [ { sale : true }, { qty : { $lt : 20 } } ] }
]
} )

Converting older Mongo database references to DBRefs

I'm in the process of updating some legacy software that is still running on Mongo 2.4. The first step is to upgrade to the latest 2.6 then go from there.
Running the db.upgradeCheckAllDBs(); gives us the DollarPrefixedFieldName: $id is not valid for storage. errors and indeed we have some older records with legacy $id, $ref fields. We have a number of collections that look something like this:
{
"_id" : "1",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
},
{
"_id" : "2",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "3",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "4",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
}
I want to script this to convert the older {"$id" : "42", "$ref" : "someRef"} objects to DBRef("someRef", "42") objects but leave the existing DBRef objects untouched. Unfortunately, I haven't been able to differentiate between the two types of objects.
Using typeof and $type simply say they are objects.
Both have $id and $ref fields.
In our Groovy console, when you pull back one of the old ones and one of the new ones, getClass() returns DBRef for both.
We have about 80k records with this legacy format out of millions of total records. I'd hate to have to brute force it and modify every record whether it needs it or not.
This script will do what I need it to do but the find() will basically return all the records in the collection.
var cursor = db.someCollection.find({"someRef.$id" : {$exists: true}});
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update({"_id": rec._id}, {$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}});
}
Is there another way that I am missing that can be used to find only the offending records?
Update
As described in the accepted answer the order matters which made all the difference. The script we went with that corrected our data:
var cursor = db.someCollection.find(
{
$where: "function() { return this.someRef != null && Object.keys(this.someRef)[0] == '$id'; }"
}
);
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update(
{"_id": rec._id},
{$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}}
);
}
We did have a collection with a larger number of records that needed to be corrected where the connection timed out. We just ran the script again and it got through the remaining records.
There's probably a better way to do this. I would be interested in hearing about a better approach. For now, this problem is solved.
DBRef is a client-side thing. http://docs.mongodb.org/manual/reference/database-references/#dbrefs states it pretty clearly:
The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
The drivers rely on the consistent order of fields in BSON to recognise a DBRef, so you can do the same:
db.someCollection.find({ $expr: {
$let: {
vars: {firstKey: { $arrayElemAt: [ { $objectToArray: "$someRef" }, 0] } },
in: { $eq: [{ $substr: [ "$$firstKey.k", 1, 2 ] } , "id"]}
}
} } )
This will return the objects where the order of the fields doesn't match the driver's expectation.
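The first-key check can be sketched in Python as well. This assumes the sub-documents are available as plain dicts that preserve field order; a driver may instead decode them into DBRef objects, in which case this check would not apply. The function name has_legacy_ref_order is hypothetical.

```python
def has_legacy_ref_order(doc, field="someRef"):
    """Return True when the reference sub-document starts with "$id"
    instead of the "$ref" that a DBRef is expected to start with."""
    ref = doc.get(field)
    if not isinstance(ref, dict) or not ref:
        return False
    first_key = next(iter(ref))  # dicts preserve insertion order
    return first_key == "$id"


legacy = {"_id": "1", "someRef": {"$id": "42", "$ref": "someRef"}}
proper = {"_id": "2", "someRef": {"$ref": "someRef", "$id": "42"}}

print(has_legacy_ref_order(legacy))
print(has_legacy_ref_order(proper))
```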

Counting entries of subdocument in MongoDB documents

I have a document structure like so
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : {
"r152f47f1daf-0-2" : "c",
"r152f48413c1-0-2" : "c",
"r152f4851bf7-0-1" : "c"
}
}
My task is to find all documents with the following conditions:
The "_id" needs to start with "5:"
The number of revisions needs to be strictly greater than 3
The first part is easy, I have solved it with
db.nodes.find( {'_id': /^5:/} )
But I am struggling with the second part, where I am supposed to use $gt.
Since I am new to MongoDB, I first looked at $size, but _revisions is not an array; it is a subdocument, right?
I also looked at $unwind and then counting the results, but that does not make sense either, since my result needs to be the documents that match the two conditions above.
Any pointers are highly appreciated.
Using the $where operator.
db.nodes.find(function() {
return (/^5:/.test(this._id) && Object.keys(this._revisions).length > 3 );
})
The problem with this, as mentioned in the documentation, is that:
$where evaluates JavaScript and cannot take advantage of indexes. Therefore, query performance improves when you express your query using the standard MongoDB operators (e.g., $gt, $in).
You should definitely consider changing the _revisions field to an array of sub-documents like this:
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : [
{
"rev": "r152f47f1daf-0-2",
"value": "c"
},
{
"rev": "r152f48413c1-0-2",
"value": "c"
},
{
"rev": "r152f4851bf7-0-1",
"value": "c"
}
]
}
And use the $exists operator.
db.nodes.find({ "_id": /^5:/, "_revisions.3": { "$exists": true } } )
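For intuition, the two conditions can be written as a plain Python predicate over an in-memory document. This is only a sketch of the logic, not a query; the function name and the sample documents are made up for illustration. An element at index 3 exists exactly when there are more than three entries.

```python
def more_than_three_revisions(doc):
    """True when _id starts with "5:" and _revisions has more than 3 entries.
    len() works for both the sub-document (dict) and the array (list) layout."""
    revisions = doc.get("_revisions", {})
    return doc["_id"].startswith("5:") and len(revisions) > 3


# Hypothetical sample documents.
as_subdocument = {
    "_id": "5:/content/a.txt",
    "_revisions": {"r1": "c", "r2": "c", "r3": "c", "r4": "c"},
}
as_array = {
    "_id": "5:/content/b.txt",
    "_revisions": [{"rev": "r1", "value": "c"}] * 3,
}

print(more_than_three_revisions(as_subdocument))
print(more_than_three_revisions(as_array))
```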

How to access embedded documents in MongoTemplate when the key is an empty string?

{
"_id" : ObjectId("550add7ee0b4b54a3e7ad53c"),
"day" : "14-03-2015",
"node" : "2G",
"nodeName" : "BLR_SGSN",
"" : {
"A" : 905.84,
"B" : 261.34,
"C" : 2103.94,
"D" : 39.67
}
}
I have this as my data in mongo.
How do I get the values of A, B, C and D?
You cannot query on this as the sub-document fields cannot be selected.
This can only be a result of a programming error doing something like this ( and probably trying to compute a key name in the process ):
db.collection.insert({
"": {
"A": 1,
"B": 2,
"C": 3
}
})
So you cannot get to sub-elements by standard query ways like:
db.collection.find({ ".A": 905.84 })
You can fix this by updating the affected documents in the collection to give them a proper key name, but of course this is an iterative process. I'm not sure how to fix this other than with JavaScript from the shell, due to the naming problem, but:
db.collection.find({ "": { "$exists": true } }).forEach(function(doc) {
if ( doc.hasOwnProperty("") ) {
doc.newprop = doc[""];
delete doc[""];
db.collection.update({ "_id": doc._id }, doc );
}
})
Then at least you can access things by the new "newprop" key ( or whatever you call it ):
db.collection.find({ "newprop.A": 905.84 })
And the same sort of thing will work in other drivers.
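As a rough illustration of the same rename step in Python, operating on a plain dict rather than on the collection: the function name rename_empty_key is an assumption, and writing the fixed document back is left to whichever driver is in use.

```python
def rename_empty_key(doc, new_key="newprop"):
    """Return a copy of doc with the "" key moved to new_key.
    Documents without a "" key are returned unchanged."""
    if "" not in doc:
        return doc
    fixed = dict(doc)
    fixed[new_key] = fixed.pop("")
    return fixed


broken = {"node": "2G", "": {"A": 905.84, "B": 261.34}}
fixed = rename_empty_key(broken)
print(fixed["newprop"]["A"])
```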
My advice here is "go and fix this" and find the code that caused this key name to be blank in the first place.
There should be a bug report submitted to the MongoDB core project, as none of the drivers handle this well. I thought I could even use $rename here, but you can't.
So blank "" keys are a problem that needs to be fixed.