Why is one of the (array) field/value pairs in MongoDB's find_and_modify() filter parameter being ignored?

I am using pymongo to call MongoDB 3.2 (running on Azure Cosmos DB). My app needs to, in an atomic step, search for an array element in a document and if found, modify 1-2 of its fields.
For the filter argument, I am specifying 3 conditions (see fd in the code below). The first call seems to work OK (it modifies the first element in the array, which happens to match fd), but the second call modifies the second array element, which doesn't match the node_id value specified in the filter. My code:
fd = {'_id': 'job7569', 'active_runs.node_id': 'node0', 'active_runs.status': 'unstarted'}
ud = {'active_runs.$.run_name': 'run8100.1', 'active_runs.$.status': 'started'}
result = self.mongo.mongo_db["__jobs__"].find_and_modify(
    fd, update={"$set": ud}, fields={"active_runs.$": 1}, new=True)
active_runs after 2nd call:
[ {'node_id': 'node0', 'run_index': 0, 'run_name': 'run9999.43', 'status': 'started'},
{'node_id': 'node1', 'run_index': 1, 'run_name': 'run9999.44', 'status': 'started'},
{'node_id': 'node2', 'run_index': 2, 'run_name': None, 'status': 'unstarted'},
...]
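A note on the likely cause: when array conditions are given as separate dot-notation fields, each condition can be satisfied by a different array element, so the positional $ operator may not point at the element you expect. To require that a single element matches all the array conditions at once, the filter can use $elemMatch. A minimal sketch, reusing the names from the question (whether Cosmos DB's MongoDB 3.2 emulation honors this is untested):

# Sketch: $elemMatch requires node_id and status to match on the SAME
# array element; names are taken from the question above.
fd = {
    '_id': 'job7569',
    'active_runs': {'$elemMatch': {'node_id': 'node0',
                                   'status': 'unstarted'}},
}
ud = {'active_runs.$.run_name': 'run8100.1', 'active_runs.$.status': 'started'}

# The positional $ now refers to the first element matched by $elemMatch.
result = self.mongo.mongo_db["__jobs__"].find_and_modify(
    fd, update={"$set": ud}, fields={"active_runs.$": 1}, new=True)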

Related

PyMongo gives error when using dot notation in field name for sort method

I am trying to get the maximum value of a field across a collection. The field's value is an array, and I actually need the maximum of the array's first element. For example, the collection is similar to this:
[
  {
    ...,
    "<field>": [10, 20],
    ...
  },
  {
    ...,
    "<field>": [13, 23],
    ...
  },
  {
    ...,
    "<field>": [19, 31],
    ...
  }
]
So from the above documents, I would need to get the maximum of the arrays' first elements; in this case, it would be 19.
To do this, I first sort the documents by the first element of the field array and then get the first document (using limit). I am able to do this using Node.js but cannot get it working with PyMongo.
It works using the Node.js MongoDB API like:
const max = (
  await collection
    .find()
    .sort({ "<field>.0": -1 })
    .limit(1)
    .toArray()
)[0];
However, if I try to do a similar thing using PyMongo:
max = list(collection.find().sort("<field>.0", -1).limit(1))[0]
I get the error:
KeyError: '<field>.0'
I am using PyMongo version 3.12.0. How can I resolve this?
In PyMongo, the sort option is a list of tuples, where each tuple holds a key name and a sort order.
You can pass multiple tuples in this list, since MongoDB supports sorting on multiple keys.
col.find({}).sort([('<key1>', <sort-order>), ('<key2>', <sort-order>)])
In your scenario, replace your find command as follows:
max = list(collection.find().sort([("<field>.0", -1)]).limit(1))[0]
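As a self-contained check of the list-of-tuples form, here is a small sketch; the database, collection, and field names are invented, and the documents mirror the samples from the question:

from pymongo import MongoClient, DESCENDING

client = MongoClient()  # assumes a local mongod
col = client.test.samples
col.drop()
col.insert_many([{"field": [10, 20]},
                 {"field": [13, 23]},
                 {"field": [19, 31]}])

# Sort by the first array element, descending, and take the top document.
top = list(col.find().sort([("field.0", DESCENDING)]).limit(1))[0]
print(top["field"][0])  # -> 19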

MongoDB, retrieve specific field in array of objects

In my collection I have an array of objects. I'd like to fetch only a subset of those objects' fields, but I can't figure out how to do this.
Here are a few things I tried:
db.collections.find({}, {
  fields: {
    'myField': 1,                 // works
    'myArray': 1,                 // works
    'myArray.$': 1,               // doesn't work
    'myArray.$.myNestedField': 1, // doesn't work
    'myArray.0.myNestedField': 1, // doesn't work
  }
});
Use 'myArray.myNestedField': 1 for projecting nested fields from the array.
I'll briefly explain all the variants you tried.
'myField': 1 -- projects a field's value.
'myArray': 1 -- projects the array as a whole (scalar, embedded, or sub-document).
The two variants below work only when the query itself includes a condition on the array field; the positional operator ($) then projects only the first element matching that query:
'myArray.$': 1
'myArray.$.myNestedField': 1
This one is not a valid projection operation:
'myArray.0.myNestedField': 1
More here on how to query & project documents
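For illustration, a small PyMongo sketch (all names invented) contrasting the nested-field projection with the positional one:

from pymongo import MongoClient

client = MongoClient()
col = client.test.things
col.drop()
col.insert_one({
    "myField": "x",
    "myArray": [{"myNestedField": 1, "other": "a"},
                {"myNestedField": 2, "other": "b"}],
})

# Project the nested field for every array element.
print(col.find_one({}, {"myArray.myNestedField": 1}))
# -> {'_id': ..., 'myArray': [{'myNestedField': 1}, {'myNestedField': 2}]}

# Positional projection: requires a condition on the array in the query;
# returns only the first element matching that condition.
print(col.find_one({"myArray.myNestedField": 2}, {"myArray.$": 1}))
# -> {'_id': ..., 'myArray': [{'myNestedField': 2, 'other': 'b'}]}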

How to get items n to m in MongoDB

I'm trying to create an Android app which pulls the first 1-10 documents in the MongoDB collection and shows those items in a list; then, when the list reaches the end, I want to pull documents 11-20 from the collection, and so on.
def get_all_tips(from_item, max_items):
    db = client.MongoTip
    tips_list = db.tips.find().sort([['_id', -1]]).limit(max_items).skip(from_item)
    if tips_list.count() > 0:
        from bson import json_util
        return json_util.dumps(tips_list, default=json_util.default), response_ok
    else:
        return "Please move along nothing to see here", response_invalid
But the above code does not work the way I intended; rather, it returns max_items documents starting from from_item.
Example: calling get_all_tips(3,4)
It returns:
Document 3, Document 4, Document 5, Document 6
I'm expecting:
Document 3, Document 4
In your code you are specifying two parameters.
from_item: which is the starting index of the documents to return
max_items: number of items to return
Therefore calling get_all_tips(3,4) will return 4 documents starting from document 3, which is exactly what's happening.
Proposed fixes:
If you want it to return documents 3 and 4, call get_all_tips(3,2) instead, which means: return a maximum of two documents starting from document 3.
If you'd rather specify the start and end indexes in your function, I recommend the following changes:
def get_all_tips(from_item, to_item):
    if to_item < from_item:
        return "to_item must be greater than from_item", bad_request
    db = client.MongoTip
    # +1 makes to_item inclusive, so get_all_tips(3, 4) returns documents 3 and 4
    tips_list = db.tips.find().sort([['_id', -1]]).limit(to_item - from_item + 1).skip(from_item)
That being said, I'd like to point out that the MongoDB documentation does not recommend using skip for pagination of large collections:
MongoDb 3.2 cursor.skip
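For completeness, here is a sketch of the range-based pattern the documentation suggests instead of skip; it assumes the same client and tips collection as the code above, and pages by descending _id:

# Page through tips without skip: remember the last _id of the previous
# page and query for documents that come after it in sort order.
def get_tips_page(last_id=None, page_size=10):
    db = client.MongoTip
    query = {} if last_id is None else {'_id': {'$lt': last_id}}
    page = list(db.tips.find(query).sort([('_id', -1)]).limit(page_size))
    next_cursor = page[-1]['_id'] if page else None
    return page, next_cursor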

Motor Index not created on empty Collection

I have the following code to setup my database:
self.con = motor.MotorClient(host, port)
self.Db = self.con.DB
self.Col = self.Db.Col
self.Col.create_index("c")
self.Col.create_index("h")
When I run index_information() I only see index information for the _id field. However, if I move the create_index() calls to after some entries are inserted, index_information() shows the new indexes. Does this mean I have to wait until I have entries in the collection before I can create indexes? Is there another way to do this, since I start with an empty collection?
You can create an index on an empty, or non-existent, MongoDB collection, and the index appears in index_information:
>>> from tornado import ioloop, gen
>>> import motor
>>>
>>> con = motor.MotorClient()
>>> db = con.test
>>> col = db.collection
>>>
>>>
>>> @gen.coroutine
... def coro():
... yield db.drop_collection("collection")
... yield col.create_index("c")
... yield col.create_index("h")
... print((yield col.index_information()))
...
>>> ioloop.IOLoop.current().run_sync(coro)
{u'c_1': {u'key': [(u'c', 1)], u'v': 1}, u'_id_': {u'key': [(u'_id', 1)], u'v': 1}, u'h_1': {u'key': [(u'h', 1)], u'v': 1}}
Since I don't see any "yield" statements in your example code, or any callbacks, I suspect you're not using Motor correctly. Motor is asynchronous; in order to wait for any Motor method that talks to the database server to complete you must either pass a callback to the method, or yield the Future the method returns.
For more information consult the tutorial:
http://motor.readthedocs.org/en/stable/tutorial.html#inserting-a-document
The discussion of calling asynchronous methods with Motor (and this applies to all Tornado libraries, not just Motor) begins at the "inserting a document" section.
You can easily create an index in MongoDB (even on an empty collection) by passing a
field name and a direction.
field_name: any field on which you want to create the index.
direction: any one of these values: 1, -1, "2dsphere", "text", or "hashed".
Refer to the MotorCollection docs for details.
The snippets below create indexes using the Motor library in Python; as with any Motor call, they return futures that must be yielded (or awaited) inside a coroutine, as shown in the sketch after the snippets:
db.collection_name.create_index([("field_name", 1)] # To create ascending index
db.collection_name.create_index([("geoloc_field_name", "2dsphere")] # To create geo index
db.collection_name.create_index([("field_name", "text")] # To create text based index
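Tying this back to the first answer, a minimal sketch (database, collection, and field names invented) that yields each call inside a Tornado coroutine so the indexes are actually built:

from tornado import ioloop, gen
import motor

client = motor.MotorClient()
db = client.test

@gen.coroutine
def build_indexes():
    # Each create_index returns a future; yield it so the call completes.
    yield db.collection_name.create_index([("field_name", 1)])
    yield db.collection_name.create_index([("geoloc_field_name", "2dsphere")])
    yield db.collection_name.create_index([("field_name", "text")])
    info = yield db.collection_name.index_information()
    print(info)

ioloop.IOLoop.current().run_sync(build_indexes)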

PyMongo updating array records with calculated fields via cursor

Basically, the output collection of an elaborate aggregation pipeline over a very large dataset is similar to the following:
{
    "_id" : {
        "clienta" : NumberLong(460011766),
        "clientb" : NumberLong(2886729962)
    },
    "states" : [
        ["fixed", "fixed.rotated", "fixed.rotated.off"]
    ],
    "VBPP" : [244, 182, 184, 11, 299],
    "PPF" : 72.4
}
The intuitive, albeit slow, way to replace these fields with calculations over their former values (the length of states and the variance of VBPP) in PyMongo, before converting to arrays, is as follows:
records_list = []
cursor = db.clientAgg.find({}, {'_id': 0,
                                'states': 1,
                                'VBPP': 1,
                                'PPF': 1})
for record in cursor:
    records_list.append(record)
for dicts in records_list:
    dicts['states'] = len(dicts['states'])
    dicts['VBPP'] = np.var(dicts['VBPP'])
I have written various forms of this basic flow to optimize for speed, but bringing 500k dictionaries into memory to modify them, before converting them to arrays to run through a machine-learning estimator, is costly. I have tried various ways to update the records directly via a cursor, with variants of the following, with no success:
cursor = db.clientAgg.find().skip(0).limit(50000)

def iter():
    for item in cursor:
        yield item

l = []
for x in iter():
    x['VBPP'] = np.var(x['VBPP'])
    # Or
    # db.clientAgg.update({'_id': x['_id']}, {'$set': {'x.VBPS': somefunction as above}}, upsert=False, multi=True)
I also unsuccessfully tried using Mongo's usual operators since the variance is as simple as subtracting the mean from each element of the array, squaring the result, then averaging the results.
If I could successfully modify the collection directly then I could utilize something very fast like Monary or IOPro to load data directly from Mongo and into a numpy array without the additional overhead.
Thank you for your time
MongoDB has no way to update a document with values calculated from the document's fields; currently you can only use update to set values to constants that you pass in from your application. So you can set document.x to 2, but you can't set document.x to document.y + document.z or any other calculated value.
See https://jira.mongodb.org/browse/SERVER-11345 and https://jira.mongodb.org/browse/SERVER-458 for possible future features.
In the immediate future, PyMongo will release a bulk API that allows you to send a batch of distinct update operations in a single network round-trip, which will improve your performance.
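As a rough sketch of that batching pattern, using the bulk_write API that later shipped in PyMongo 3 (collection and field names follow the question; the float() cast is needed because PyMongo cannot encode NumPy scalars):

import numpy as np
from pymongo import MongoClient, UpdateOne

db = MongoClient().test  # assumes clientAgg lives in the "test" database
ops = []
for doc in db.clientAgg.find({}, {'states': 1, 'VBPP': 1}):
    ops.append(UpdateOne(
        {'_id': doc['_id']},
        {'$set': {'states': len(doc['states']),
                  'VBPP': float(np.var(doc['VBPP']))}}))
    if len(ops) == 1000:  # flush in batches to bound memory
        db.clientAgg.bulk_write(ops)
        ops = []
if ops:
    db.clientAgg.bulk_write(ops)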
Addendum:
I have two other ideas. First, run some Javascript server-side. E.g., to set all documents' b fields to 2 * a:
db.eval(function() {
    var collection = db.test_collection;
    collection.find().forEach(function(doc) {
        var b = 2 * doc.a;
        collection.update({_id: doc._id}, {$set: {b: b}});
    });
});
The second idea is to use the aggregation framework's $out operator, new in MongoDB 2.5.2, to transform the collection into a second collection that includes the calculated field:
db.test_collection.aggregate({
    $project: {
        a: '$a',
        b: {$multiply: [2, '$a']}
    }
}, {
    $out: 'test_collection2'
});
Note that $project must explicitly include all the fields you want; only _id is included by default.
For a million documents on my machine the former approach took 2.5 minutes, and the latter 9 seconds. So you could use the aggregation framework to copy your data from its source to its destination, with the calculated fields included. Then, if desired, drop the original collection and rename the target collection to the source's name.
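A PyMongo rendering of the same pipeline, including that optional drop-and-rename step (collection names follow the shell example above):

from pymongo import MongoClient

db = MongoClient().test
db.test_collection.aggregate([
    {'$project': {'a': '$a', 'b': {'$multiply': [2, '$a']}}},
    {'$out': 'test_collection2'},
])
# Optionally replace the original collection with the transformed one.
db.test_collection.drop()
db.test_collection2.rename('test_collection')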
My final thought on this, is that MongoDB 2.5.3 and later can stream large result sets from an aggregation pipeline using a cursor. There's no reason Monary can't use that capability, so you might file a feature request there. That would allow you to get documents from a collection in the form you want, via Monary, without having to actually store the calculated fields in MongoDB.