PyMongo gives error when using dot notation in field name for sort method

I am trying to get the maximum value of a field inside a collection. The field's value is an array and I actually need to get the maximum of the first index of the array. For example, the collection is similar to this:
[
{
...,
"<field>": [10, 20],
...
},
{
...,
"<field>": [13, 23],
...
},
{
...,
"<field>": [19, 31],
...
}
]
So from the documents above, I need the maximum of the first element of each array. In this case, it would be 19.
To do this, I sort by the first element of the array in descending order and then take the first document (using limit). This works in Node.js, but I cannot get it working with PyMongo.
It works using the Node.js MongoDB API like:
const max = (
await collection
.find()
.sort({ "<field>.0": -1 })
.limit(1)
.toArray()
)[0];
However, if I try to do a similar thing using PyMongo:
max = list(collection.find().sort("<field>.0", -1).limit(1))[0]
I get the error:
KeyError: '<field>.0'
I am using PyMongo version 3.12.0. How can I resolve this?

In PyMongo, a sort specification is passed as a list of tuples, where each tuple holds two items: the key name and the sort order.
You can pass several tuples in this list, since MongoDB supports sorting by multiple keys.
col.find({}).sort([('<key1>', <sort-order>), ('<key2>', <sort-order>)])
In your scenario, you should replace your find command as follows:
max = list(collection.find().sort([("<field>.0", -1)]).limit(1))[0]
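If only the maximum value itself is needed, an aggregation pipeline is another option. The sketch below builds such a pipeline as plain Python data, assuming MongoDB 3.2 or later for `$arrayElemAt`. The field name `scores` and the helper `max_first_element_pipeline` are placeholders invented for illustration; they are not part of the question.

```python
def max_first_element_pipeline(field):
    """Build a pipeline that returns the largest first element of an array field."""
    return [
        # Pull element 0 out of the array in every document.
        {"$project": {"first": {"$arrayElemAt": ["$" + field, 0]}}},
        # Collapse all documents into one and keep the largest value.
        {"$group": {"_id": None, "max_first": {"$max": "$first"}}},
    ]


pipeline = max_first_element_pipeline("scores")
# With a live PyMongo collection (assumed, not created here):
#   result = next(collection.aggregate(pipeline), None)
#   maximum = result["max_first"] if result else None
```

Unlike sort plus limit, this returns only the number, not the whole document that contains it.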

Related

Mongoose greater than and equal

How can I get integer values greater than or equal to a given value from MongoDB using Mongoose?
Assume the list below:
list = [134,56,89,89,90,200] //Marks field
I want to get the values equal to or greater than 89; the result set must be [89,90,200].
With my query I can get the values greater than 89, but I want 89 to be included as well:
let x = 89;
query.find().where('marks').gt(x);
I don't know which version of MongoDB you are using, but this is what is available in the latest releases.
$gte selects the documents where the value of the field is greater than or equal to (i.e. >=) a specified value
query.find( { marks: { $gte: 89} } )
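The difference between `$gt` and `$gte` can be checked locally with a small emulation. This is only a sketch in plain Python: the `select` helper and the `COMPARATORS` table are made up for illustration, and MongoDB evaluates the real operators on the server.

```python
import operator

# Comparison operators emulated locally; MongoDB evaluates these server-side.
COMPARATORS = {"$gt": operator.gt, "$gte": operator.ge}


def select(values, condition):
    """Return the values satisfying every operator in a {"$op": bound} mapping."""
    return [
        v for v in values
        if all(COMPARATORS[op](v, bound) for op, bound in condition.items())
    ]


marks = [134, 56, 89, 89, 90, 200]
strictly_greater = select(marks, {"$gt": 89})   # 89 itself is excluded
greater_or_equal = select(marks, {"$gte": 89})  # both 89 entries are included
```

Note that 134 also satisfies both conditions, so it appears in either result.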
Simply use
.gte(lowerlimit)
for greater than or equal to and
.lte(upperlimit)
for less than or equal to.
Remember that lowerlimit and upperlimit should be numeric.
Query.prototype.gt method is the same as using "greater than" operator.
In your case, you have to use gte method that returns exactly what you need - results that are greater or equal to given value:
let x = 89;
query.find()
.where('marks')
.gte(x)
.exec();
Also, as another answer mentioned, you can always use the native MongoDB operator $gte; however, Mongoose offers a more intuitive approach with its chaining Query API, as above.
Also, if you're using Mongoose with Promises or async/await, don't forget to add
exec() at the end. This converts the thenable Query into a fully featured Promise, e.g.:
let x = 89;
query.find().where('marks').gte(x).exec()
  .then((results) => { /* rest of handling */ })
  .catch((err) => { /* error handling */ });
The same code using async/await:
async function getResultsGreaterThan(x) {
  const results = await query.find().where('marks').gte(x).exec();
  console.log(results);
  return results;
}
getResultsGreaterThan(89);
More on the Mongoose chainable Query API can be found in the official docs: https://mongoosejs.com/docs/api/query.html

pymongo find().hint('index') does not use index [duplicate]

I'm trying to use the sort feature when querying my mongoDB, but it is failing. The same query works in the MongoDB console but not here. Code is as follows:
import pymongo
from pymongo import Connection
connection = Connection()
db = connection.myDB
print db.posts.count()
for post in db.posts.find({}, {'entities.user_mentions.screen_name':1}).sort({u'entities.user_mentions.screen_name':1}):
    print post
The error I get is as follows:
Traceback (most recent call last):
File "find_ow.py", line 7, in <module>
for post in db.posts.find({}, {'entities.user_mentions.screen_name':1}).sort({'entities.user_mentions.screen_name':1},1):
File "/Library/Python/2.6/site-packages/pymongo-2.0.1-py2.6-macosx-10.6-universal.egg/pymongo/cursor.py", line 430, in sort
File "/Library/Python/2.6/site-packages/pymongo-2.0.1-py2.6-macosx-10.6-universal.egg/pymongo/helpers.py", line 67, in _index_document
TypeError: first item in each key pair must be a string
I found a link elsewhere that says I need to place a 'u' in front of the key if using pymongo, but that didn't work either. Has anyone else gotten this to work, or is this a bug?
.sort(), in PyMongo, takes key and direction as parameters.
So if you want to sort by, let's say, _id, then you should use .sort("_id", 1)
For multiple fields:
.sort([("field1", pymongo.ASCENDING), ("field2", pymongo.DESCENDING)])
You can try this:
db.Account.find().sort("UserName")
db.Account.find().sort("UserName",pymongo.ASCENDING)
db.Account.find().sort("UserName",pymongo.DESCENDING)
This also works:
db.Account.find().sort('UserName', -1)
db.Account.find().sort('UserName', 1)
I'm using this in my code; please comment if I'm doing something wrong here, thanks.
Why does Python use a list of tuples instead of a dict?
In Python versions before 3.7, you cannot rely on a dictionary being iterated in the order you declared its keys.
So, in the mongo shell you could do .sort({'field1':1,'field2':1}) and the interpreter would sort by field1 at the first level and field2 at the second level.
If this syntax were used in Python, there would be a chance of sorting by field2 at the first level. With a list of tuples, there is no such risk.
.sort([("field1",pymongo.ASCENDING), ("field2",pymongo.DESCENDING)])
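To see why the order of the keys matters, the following sketch emulates a multi-key sort on in-memory dictionaries. It is not how MongoDB sorts internally; `sort_documents` and the sample documents are hypothetical, and the emulation relies on Python's stable sort.

```python
from operator import itemgetter


def sort_documents(docs, sort_spec):
    """Sort dicts by a list of (key, direction) pairs; the first pair has priority."""
    result = list(docs)
    # Apply the least significant key first; stable sorting keeps earlier
    # orderings intact among documents that tie on a later, higher-priority key.
    for key, direction in reversed(sort_spec):
        result.sort(key=itemgetter(key), reverse=(direction == -1))
    return result


docs = [
    {"name": "a", "age": 25},
    {"name": "b", "age": 30},
    {"name": "a", "age": 30},
]
by_name_then_age = sort_documents(docs, [("name", 1), ("age", -1)])
by_age_then_name = sort_documents(docs, [("age", -1), ("name", 1)])
```

The two results differ even though both specifications contain the same keys and directions, which is the reason an ordered structure is used.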
Sort by _id descending:
collection.find(filter={"keyword": keyword}, sort=[( "_id", -1 )])
Sort by _id ascending:
collection.find(filter={"keyword": keyword}, sort=[( "_id", 1 )])
DESC & ASC :
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
col = db["customers"]
doc = col.find().sort("name", -1)  # descending
for x in doc:
    print(x)
###################
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
col = db["customers"]
doc = col.find().sort("name", 1)  # ascending
for x in doc:
    print(x)
TL;DR: In my experience, the aggregation pipeline is faster than the conventional .find().sort().
Now for the real explanation. There are two ways to perform sorting operations in MongoDB:
Using .find() and .sort().
Or using the aggregation pipeline.
As suggested by many, .find().sort() is the simplest way to perform the sorting.
.sort([("field1",pymongo.ASCENDING), ("field2",pymongo.DESCENDING)])
However, in my tests this was slower than the aggregation pipeline.
Now for the aggregation pipeline method. The stages of a simple aggregation pipeline intended for sorting are:
$match (optional step)
$sort
NOTE: In my experience, the aggregation pipeline works a bit faster than the .find().sort() method.
Here's an example of the aggregation pipeline.
db.collection_name.aggregate([
    {
        "$match": {
            # your query - optional step
        }
    },
    {
        "$sort": {
            "field_1": pymongo.ASCENDING,
            "field_2": pymongo.DESCENDING,
            ....
        }
    }
])
Try this method yourself, compare the speed, and let me know in the comments.
Edit: Do not forget to pass allowDiskUse=True when sorting large result sets (not only when sorting on multiple fields); otherwise a $sort stage that exceeds its memory limit (100 MB) throws an error.
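A pipeline like this can be assembled as ordinary Python data before it is sent to the server. The sketch below assumes Python 3.7 or later, where `dict` keeps insertion order, so the sort keys stay in the order given. The helper `build_sort_pipeline` and the `status` filter are hypothetical.

```python
def build_sort_pipeline(sort_spec, match=None):
    """Return a pipeline with an optional $match stage followed by a $sort stage."""
    pipeline = []
    if match:
        pipeline.append({"$match": match})
    # dict() keeps the (key, direction) pairs in the order they were given.
    pipeline.append({"$sort": dict(sort_spec)})
    return pipeline


pipeline = build_sort_pipeline(
    [("field_1", 1), ("field_2", -1)],
    match={"status": "active"},
)
# With a live PyMongo collection (assumed, not created here):
#   cursor = collection.aggregate(pipeline, allowDiskUse=True)
```

Timing both approaches on your own data is the only reliable way to decide which one is faster for a given workload.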
.sort([("field1",pymongo.ASCENDING), ("field2",pymongo.DESCENDING)])
Python uses key,direction. You can use the above way.
So in your case you can do this
for post in db.posts.find().sort('entities.user_mentions.screen_name',pymongo.ASCENDING):
    print post
Say you want to sort by the 'created_on' field; then you can do it like this:
.sort('{}'.format('created_on'), 1 if sort_type == 'asc' else -1)
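When the direction arrives as user input, a small helper can translate it explicitly. This is a sketch; `sort_direction` is a hypothetical name, and the constants mirror the values of `pymongo.ASCENDING` (1) and `pymongo.DESCENDING` (-1) so the block runs without PyMongo installed.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def sort_direction(sort_type):
    """Translate 'asc'/'desc' (case-insensitive) into a MongoDB sort direction."""
    normalized = sort_type.strip().lower()
    if normalized in ("asc", "ascending"):
        return ASCENDING
    if normalized in ("desc", "descending"):
        return DESCENDING
    raise ValueError("unknown sort type: %r" % sort_type)


direction = sort_direction("DESC")
# With a live PyMongo collection (assumed, not created here):
#   cursor = collection.find().sort("created_on", direction)
```

Unlike the one-liner above, this rejects unexpected values instead of silently treating them as descending; whether that is preferable depends on where the input comes from.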

How to $add together a subset of elements of an array in mongodb aggregation?

Here is the problem I want to resolve:
each document contains an array of 30 integers
the documents are grouped under a certain condition (not relevant here)
while grouping them, I want to:
add together the 29 last elements of the array (skipping the first one) of each document
sum the previous result among the same group, and return it
Data structure is very difficult to change and I cannot afford a migration + I still need the 30 values for another purpose. Here is what I tried, unsuccessfully:
db.collection.aggregate([
{$match: {... some matching query ...}},
{$project: {total_29_last_values: {$add: ["$my_array.1", "$my_array.2", ..., "$my_array.29"]}}},
{$group: {
... some grouping here ...
my_result: {$sum: "$total_29_last_values"}
}}
])
Theoretically (IMHO) this should work, given the definition of $add in mongodb documentation, but for some reason it fails:
exception: $add only supports numeric or date types, not Array
Maybe there is not support for adding together elements of an array, but this seems strange...
Thanks for your help !
From the docs,
The $add expression has the following syntax:
{ $add: [ <expression1>, <expression2>, ... ] }
The arguments can be any valid expression as long as they resolve to
either all numbers or to numbers and a date.
It clearly states that the $add operator accepts only numbers or dates.
$my_array.1 resolves to an empty array, for example []. (You can always match on a particular index, such as {$match:{"a.0":1}}, but you cannot derive the value at a particular index of an array this way. For that you need to use the $ or the $slice operators. This is currently an unresolved issue: JIRA1, JIRA2)
And the $add expression becomes $add:[[],[],[],..].
$add does not take an array as input and hence you get the error stating that it does not support Array as input.
What you need to do is:
Match the documents.
Unwind the my_array field.
Group together based on the _id of each document to get the sum
of all the elements in the array skipping the first element.
Project the summed field for each grouped document.
Again group the documents based on the condition to get the sum.
Stage operators:
db.collection.aggregate([
  {$match:{}}, // condition
  {$unwind:"$my_array"},
  {$group:{"_id":"$_id",
           "first_element":{$first:"$my_array"},
           "sum_of_all":{$sum:"$my_array"}}},
  {$project:{"_id":"$_id",
             "sum_of_29":{$subtract:["$sum_of_all","$first_element"]}}},
  {$group:{"_id":" ", // whatever condition
           "my_result":{$sum:"$sum_of_29"}}}
])
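To check the pipeline's output on a small sample, the same arithmetic can be reproduced in plain Python. This is a sketch only: `sum_without_first`, the `category` grouping key, and the short sample arrays are assumptions for illustration, and every array is assumed to be non-empty.

```python
from collections import defaultdict


def sum_without_first(docs, group_key):
    """Per group, add up every array element except the first one of each document."""
    totals = defaultdict(int)
    for doc in docs:
        values = doc["my_array"]
        # Same arithmetic as the $group/$project stages: total minus first element.
        totals[doc[group_key]] += sum(values) - values[0]
    return dict(totals)


docs = [
    {"_id": 1, "category": "a", "my_array": [5, 1, 2, 3]},
    {"_id": 2, "category": "a", "my_array": [7, 10, 20]},
    {"_id": 3, "category": "b", "my_array": [9, 4]},
]
result = sum_without_first(docs, "category")
print(result)  # {'a': 36, 'b': 4}
```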

PyMongo updating array records with calculated fields via cursor

Basically the collection output of an elaborate aggregate pipeline for a very large dataset is similar to the following:
{
"_id" : {
"clienta" : NumberLong(460011766),
"clientb" : NumberLong(2886729962)
},
"states" : [
[
"fixed", "fixed.rotated","fixed.rotated.off"
]
],
"VBPP" : [
244,
182,
184,
11,
299,
],
"PPF" : 72.4,
}
The intuitive, albeit slow, way to update these fields to be calculations of their former selves (length and variance of an array) with PyMongo before converting to arrays is as follows:
records_list = []
cursor = db.clientAgg.find({}, {'_id' : 0,
'states' : 1,
'VBPP' : 1,
'PPF': 1})
for record in cursor:
    records_list.append(record)
for dicts in records_list:
    dicts['states'] = len(dicts['states'])
    dicts['VBPP'] = np.var(dicts['VBPP'])
I have written various forms of this basic flow to optimize for speed, but bringing in 500k dictionaries in memory to modify them before converting them to arrays to go through a machine learning estimator is costly. I have tried various ways to update the records directly via a cursor with variants of the following with no success:
cursor = db.clientAgg.find().skip(0).limit(50000)
def iter():
    for item in cursor:
        yield item
l = []
for x in iter():
    x['VBPP'] = np.var(x['VBPP'])
    # Or
    # db.clientAgg.update({'_id':x['_id']},{'$set':{'x.VBPS': somefunction as above }},upsert=False, multi=True)
I also unsuccessfully tried using Mongo's usual operators since the variance is as simple as subtracting the mean from each element of the array, squaring the result, then averaging the results.
If I could successfully modify the collection directly then I could utilize something very fast like Monary or IOPro to load data directly from Mongo and into a numpy array without the additional overhead.
Thank you for your time
MongoDB has no way to update a document with values calculated from the document's fields; currently you can only use update to set values to constants that you pass in from your application. So you can set document.x to 2, but you can't set document.x to document.y + document.z or any other calculated value.
See https://jira.mongodb.org/browse/SERVER-11345 and https://jira.mongodb.org/browse/SERVER-458 for possible future features.
In the immediate future, PyMongo will release a bulk API that allows you to send a batch of distinct update operations in a single network round-trip which will improve your performance.
Addendum:
I have two other ideas. First, run some Javascript server-side. E.g., to set all documents' b fields to 2 * a:
db.eval(function() {
var collection = db.test_collection;
collection.find().forEach(function(doc) {
var b = 2 * doc.a;
collection.update({_id: doc._id}, {$set: {b: b}});
});
});
The second idea is to use the aggregation framework's $out operator, new in MongoDB 2.5.2, to transform the collection into a second collection that includes the calculated field:
db.test_collection.aggregate({
$project: {
a: '$a',
b: {$multiply: [2, '$a']}
}
}, {
$out: 'test_collection2'
});
Note that $project must explicitly include all the fields you want; only _id is included by default.
For a million documents on my machine the former approach took 2.5 minutes, and the latter 9 seconds. So you could use the aggregation framework to copy your data from its source to its destination, with the calculated fields included. Then, if desired, drop the original collection and rename the target collection to the source's name.
My final thought on this is that MongoDB 2.5.3 and later can stream large result sets from an aggregation pipeline using a cursor. There's no reason Monary can't use that capability, so you might file a feature request there. That would allow you to get documents from a collection in the form you want, via Monary, without having to actually store the calculated fields in MongoDB.
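On the client side, the per-record calculation from the question can also be written as a generator, so that records are reduced one at a time instead of being collected in a list first. The sketch below uses plain Python rather than NumPy; `population_variance` and `transform` are hypothetical names, and the variance is the population variance, which is what `np.var` computes by default.

```python
def population_variance(values):
    """Mean of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def transform(records):
    """Yield one reduced record at a time instead of building a full list."""
    for record in records:
        yield {
            "states": len(record["states"]),
            "VBPP": population_variance(record["VBPP"]),
            "PPF": record["PPF"],
        }


# One record shaped like the document in the question; any iterable of
# dictionaries works, including a PyMongo cursor (assumed, not created here).
sample = [{
    "states": [["fixed", "fixed.rotated", "fixed.rotated.off"]],
    "VBPP": [244, 182, 184, 11, 299],
    "PPF": 72.4,
}]
rows = list(transform(sample))
```

This does not reduce the number of documents sent over the network; it only avoids holding all the original dictionaries in memory at once.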

Mongo DB: how to select items with nested array count > 0

The database is near 5GB. I have documents like:
{
_id: ..
user: "a"
hobbies: [{
_id: ..
name: football
},
{
_id: ..
name: beer
}
...
]
}
I want to return users who have more than 0 "hobbies".
I've tried
db.collection.find({"hobbies" : { $gt : 0}}).limit(10)
but it takes all the RAM and returns no result.
How do I perform this select?
And how do I return only: id, name, count?
How do I do it with the official C# driver?
TIA
P.S.
I have also found this advice:
"Add new field to handle category size. It's a usual practice in mongo world."
Is this true?
In this specific case, you can use list indexing to solve your problem:
db.collection.find({"hobbies.0" : {$exists : true}}).limit(10)
This just makes sure a 0th element exists. You can do the same to make sure the list is shorter than n, or between x and y in length, by checking for the existence of elements at the ends of the range.
Have you tried using hobbies.length? I haven't tested this, but I believe this is the right way to query the length of the array in MongoDB:
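The range check described here can be expressed as a small filter builder. This is a sketch in Python: `array_length_filter` is a hypothetical helper that only produces the query document, which would then be passed to `find` on a real collection.

```python
def array_length_filter(field, min_len=None, max_len=None):
    """Build a filter matching arrays whose length lies in [min_len, max_len]."""
    query = {}
    if min_len is not None and min_len > 0:
        # At least min_len elements: index min_len - 1 must exist.
        query["%s.%d" % (field, min_len - 1)] = {"$exists": True}
    if max_len is not None:
        # At most max_len elements: index max_len must not exist.
        query["%s.%d" % (field, max_len)] = {"$exists": False}
    return query


non_empty = array_length_filter("hobbies", min_len=1)
two_to_four = array_length_filter("hobbies", min_len=2, max_len=4)
# With a live PyMongo collection (assumed, not created here):
#   cursor = collection.find(non_empty).limit(10)
```

A filter with only `max_len` also matches documents where the field is missing entirely, so combine it with a lower bound if that matters.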
db.collection.find({$where: '(this.hobbies.length > 0)'})
You can (sort of) check for a range of array lengths with the $size operator using a logical $not:
db.collection.find({array: {$not: {$size: 0}}})
That's somewhat true.
According to the manual
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
$size
The $size operator matches any array with the specified number of
elements. The following example would match the object {a:["foo"]},
since that array has just one element:
db.things.find( { a : { $size: 1 } } );
You cannot use $size to find a range of sizes (for example: arrays
with more than 1 element). If you need to query for a range, create an
extra size field that you increment when you add elements
So you can check for array size 0, but not for things like 'larger than 0'
Earlier questions explain how to handle the array count issue. Although in your case if ZERO really is the only value you want to test for, you could set the array to null when it's empty and set the option to not serialize it, then you can test for the existence of that field. Remember to test for null and to create the array when you want to add a hobby to a user.
For #2, provided you added the count field it's easy to select the fields you want back from the database and include the count field.
If you only need to distinguish users with zero hobbies, and the hobbies key is not set for someone with zero hobbies, use the $exists operator.
Add an index on "hobbies" to improve performance:
db.collection.find( { hobbies : { $exists : true } } );
However, if the person with zero hobbies has empty array, and person with 1 hobby has an array with 1 element, then use this generic solution :
Maintain a variable called "hcount" ( hobby count), and always set it equal to size of hobbies array in any update.
Index on the field "hcount"
Then, you can do a query like :
db.collection.find( { hcount : 0 } ) // people with 0 hobbies
db.collection.find( { hcount : 5 } ) // people with 5 hobbies
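Keeping the counter in step with the array is easiest when both change in the same update. The sketch below builds such an update document in Python, using the `hobbies` and `hcount` names from this answer; the helper `add_hobby_update` and the sample hobby are hypothetical.

```python
def add_hobby_update(hobby):
    """Build an update that appends a hobby and bumps the stored count together."""
    return {
        "$push": {"hobbies": hobby},
        "$inc": {"hcount": 1},
    }


update = add_hobby_update({"name": "chess"})
# With a live PyMongo collection (assumed, not created here):
#   collection.update_one({"user": "a"}, update)
#   users_with_hobbies = collection.find({"hcount": {"$gt": 0}})
```

Because both operators are applied in a single update to one document, the count cannot drift from the array as long as every write goes through an update of this shape.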
3 - From @JohnP's answer, "$size" is also a good operator for this purpose.
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size