Spring data MongoDB condition in projection - spring-data

How can I add a condition to my projection stage during aggregation using Spring Data?
For example, I want to add a new field that is calculated by the formula field1 / field2. In code it would look like:
ProjectionOperation projectionOperation = project("field1", "field2").andExpression("field1 / field2").as("field3");
But if field2 is equal to 0, I'll get an error. To avoid this, it was suggested to use the $cond operator, but I have no idea how it should look in code. Does anybody have any thoughts?
Note: the expression field2 != 0 ? 1 : 0 didn't work (even though SpEL allows such syntax).

I just had the same issue.
!= does not seem to be supported; however, you can use "ifNull" or "cond" in your SpEL.
See here:
how to use spel to represent $cond of mongo
And here for more examples:
https://github.com/spring-projects/spring-data-mongodb/blob/master/spring-data-mongodb/src/test/java/org/springframework/data/mongodb/core/aggregation/SpelExpressionTransformerUnitTests.java

It's a little late but for future viewers of this question, the following code worked for me. It returns 0 as the value of field3 when field2 is 0.
ProjectionOperation projectionOperation = Aggregation.project("field1", "field2")
    .and(AggregationSpELExpression.expressionOf("cond(field2 == 0, 0, field1/field2)"))
    .as("field3");

You can find more examples here: https://www.javatips.net/api/spring-data-mongodb-master/spring-data-mongodb/src/test/java/org/springframework/data/mongodb/core/aggregation/ProjectionOperationUnitTests.java
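For reference, the SpEL expression above should correspond roughly to a raw $project stage built around $cond, $eq and $divide. The following is only a sketch in Python that builds that stage as a plain dict; the helper name safe_divide_projection is hypothetical, and the exact document Spring Data renders may differ.

```python
def safe_divide_projection(numerator, denominator, out_field):
    """Build a $project stage that divides two fields, yielding 0 when the divisor is 0.

    Hypothetical helper: it only builds the stage document, it does not run it.
    """
    return {
        "$project": {
            numerator: 1,
            denominator: 1,
            out_field: {
                "$cond": [
                    {"$eq": [f"${denominator}", 0]},  # if field2 == 0
                    0,                                 # then 0
                    {"$divide": [f"${numerator}", f"${denominator}"]},  # else field1 / field2
                ]
            },
        }
    }


stage = safe_divide_projection("field1", "field2", "field3")
```

The resulting dict could be passed as one element of an aggregation pipeline, or compared against the stage Spring Data logs to check that the SpEL expression was translated as expected.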

Related

Concatenate pymongo Cursor

How do you concatenate multiple pymongo Cursors? If it is not possible, how do you take results from multiple Cursors and create a new one?
Example :
result1 = db[collection].find(query1)
result2 = db[collection].find(query2)
concat_result = result1 + result2 #something like that.
Update :
All answers here seem to assume that the queries have the same format. For example, query1 might get 2 documents between dates, while query2 might sort documents by category and be limited to a count of 5. $or is too homogeneous for what I need. After concatenating those two queries, I need to sort them based on another key.
For further details: a class Printer needs to receive one and only one pymongo.Cursor, and I'm stuck with this.
The easiest way is to use MongoDB's $or operator:
db[collection].find({'$or': [query1, query2]})
Or, if you have to do this in Python, you can use a generator that skips duplicates:
def concat_results(*results):
    ids = set()
    for result in results:
        for v in result:
            # yield each document only once, keyed by its _id
            if v['_id'] not in ids:
                ids.add(v['_id'])
                yield v
concat_result = list(concat_results(result1, result2))
Yes, the wise solution would be to use $or as stated above.
If you want to do it in Python instead, you could:
a = [item for item in db[collection].find(filters, select_fields)]
b = [item for item in db[collection].find(filters, select_fields)]
c = []
# interleave the two result lists; zip stops at the shorter one
for x, y in zip(a, b):
    c += [x, y]
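Since the question also asks to sort the combined results by another key, here is a minimal sketch of that step. It uses plain lists of dicts as stand-ins for cursors (any iterable of documents works the same way), the helper name merge_results is hypothetical, and it returns a list rather than a pymongo.Cursor, so it does not solve the requirement that Printer receives a real Cursor.

```python
from itertools import chain


def merge_results(*results, sort_key=None, reverse=False):
    """Chain several iterables of documents, drop duplicate _ids, optionally sort."""
    seen = set()
    merged = []
    for doc in chain(*results):
        if doc["_id"] not in seen:
            seen.add(doc["_id"])
            merged.append(doc)
    if sort_key is not None:
        merged.sort(key=lambda d: d[sort_key], reverse=reverse)
    return merged


# Stand-ins for db[collection].find(query1) and db[collection].find(query2)
result1 = [{"_id": 1, "score": 5}, {"_id": 2, "score": 1}]
result2 = [{"_id": 2, "score": 1}, {"_id": 3, "score": 3}]
merged = merge_results(result1, result2, sort_key="score")
```

Everything is loaded into memory, so this only makes sense when the combined result set is small.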

MongoDB: What is the fastest / is there a way to get the 200 documents with a closest timestamp to a specified list of 200 timestamps, say using a $in [duplicate]

Let's assume I have a collection with documents with a ratio attribute that is a floating point number.
{'ratio':1.437}
How do I write a query to find the single document with the closest value to a given integer without loading them all into memory using a driver and finding one with the smallest value of abs(x-ratio)?
Interesting problem. I don't know if you can do it in a single query, but you can do it in two:
var x = 1; // given integer
closestBelow = db.test.find({ratio: {$lte: x}}).sort({ratio: -1}).limit(1);
closestAbove = db.test.find({ratio: {$gt: x}}).sort({ratio: 1}).limit(1);
Then you just check which of the two docs has the ratio closest to the target integer.
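The final comparison step could look like the sketch below. The helper name pick_closest is hypothetical; it assumes each candidate is either a document (dict) or None when the corresponding query returned nothing.

```python
def pick_closest(below, above, target, field="ratio"):
    """Return whichever candidate document has `field` closest to `target`.

    Either candidate may be None if its query matched no documents.
    """
    candidates = [doc for doc in (below, above) if doc is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda doc: abs(doc[field] - target))


# With pymongo, the two candidates could be fetched along these lines:
#   below = col.find_one({"ratio": {"$lte": x}}, sort=[("ratio", -1)])
#   above = col.find_one({"ratio": {"$gt": x}}, sort=[("ratio", 1)])
closest = pick_closest({"ratio": 0.9}, {"ratio": 1.437}, 1)
```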
MongoDB 3.2 Update
The 3.2 release adds support for the $abs absolute value aggregation operator which now allows this to be done in a single aggregate query:
var x = 1;
db.test.aggregate([
    // Project a diff field that's the absolute difference along with the original doc.
    {$project: {diff: {$abs: {$subtract: [x, '$ratio']}}, doc: '$$ROOT'}},
    // Order the docs by diff
    {$sort: {diff: 1}},
    // Take the first one
    {$limit: 1}
])
I have another idea, but it is tricky and requires changing your data structure.
You can use a geospatial index, which is supported by MongoDB.
First, change your data to this structure, keeping the second value at 0:
{'ratio':[1.437, 0]}
Then you can use the $near operator to find the closest ratio value. Because the operator returns a list sorted by distance from the value you give, you have to use limit to get only the closest one.
db.places.find( { ratio : { $near : [50,0] } } ).limit(1)
If you don't want to do this, I think you can just use @JohnnyHK's answer :)

Selecting data from MongoDB where K of N criterias are met

I have documents with four fields: A, B, C, D. Now I need to find documents where at least three fields match. For example:
Query: A=a, B=b, C=c, D=d
Returned documents:
a,b,c,d (four of four met)
a,b,c (three of four met)
a,b,d (another three of four met)
a,c,d (another three of four met)
b,c,d (another three of four met)
So far I created something like:
`(A=a AND B=b AND C=c)
OR (A=a AND B=b AND D=d)
OR (A=a AND C=c AND D=d)
OR (B=b AND C=c AND D=d)`
But this is ugly and error prone.
Is there a better way to achieve it? Also, query performance matters.
I'm using Spring Data but I believe it does not matter. My current code:
Criteria c = new Criteria();
Criteria ca = Criteria.where("A").is(doc.getA());
Criteria cb = Criteria.where("B").is(doc.getB());
Criteria cc = Criteria.where("C").is(doc.getC());
Criteria cd = Criteria.where("D").is(doc.getD());
c.orOperator(
new Criteria().andOperator(ca,cb,cc),
new Criteria().andOperator(ca,cb,cd),
new Criteria().andOperator(ca,cc,cd),
new Criteria().andOperator(cb,cc,cd)
);
Query query = new Query(c);
return operations.find(query, Document.class, "documents");
Currently we cannot do this directly in MongoDB, since there is no functionality supporting permutations/combinations of the query parameters.
But we can simplify the query by breaking the condition into parts.
Use an aggregation pipeline:
$project with records (A=a AND B=b) --> This will give the records that match two of the conditions. (Our objective is to find the records that match 3 out of 4 or 4 out of 4 of the given conditions.)
Next in the pipeline, use an OR condition (C=c OR D=d) to find the final set of records, which yields our expected result.
Hope it helps!
The way you have it, you have to do all permutations in your query. You can use the aggregation framework to do this without permuting all combinations, and it is generic enough to work with any K. The downside is that I think you need MongoDB 3.2+, and Spring Data doesn't support these operations yet: $filter and $concatArrays.
But you can do it pretty easily with the Java driver.
[
    {
        $project: {
            totalMatched: {
                $size: {
                    $filter: {
                        input: {
                            $concatArrays: [ ["$A"], ["$B"], ["$C"], ["$D"] ]
                        },
                        as: "attr",
                        cond: {
                            $eq: ["$$attr", "a"]
                        }
                    }
                }
            }
        }
    },
    {
        $match: {
            totalMatched: { $gte: 3 }
        }
    }
]
All you are doing is you are concatenating the values of all the fields you need to check in a single array. Then select a subset of those elements that are equal to the value you are looking for (or any condition you want for that matter) and finally getting the size of that array for each document.
Now all you need to do is to $match the documents that have a size of greater than or equal to what you want.
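When each field has its own expected value (A=a, B=b, and so on), a different way to count matches is to sum one $cond per field. This is a swapped-in technique, not the $filter/$concatArrays approach above, and only a sketch: the helper name k_of_n_pipeline is hypothetical, it merely builds the pipeline as a list of dicts, and it assumes a server that supports $addFields (MongoDB 3.4+).

```python
def k_of_n_pipeline(criteria, k):
    """Build a pipeline keeping documents that satisfy at least k of the criteria.

    criteria maps field names to expected values, e.g. {"A": "a", "B": "b"}.
    """
    matches = [
        # 1 if the field equals the expected value, otherwise 0;
        # $literal keeps string values from being read as field paths
        {"$cond": [{"$eq": [f"${field}", {"$literal": value}]}, 1, 0]}
        for field, value in criteria.items()
    ]
    return [
        {"$addFields": {"totalMatched": {"$add": matches}}},
        {"$match": {"totalMatched": {"$gte": k}}},
    ]


pipeline = k_of_n_pipeline({"A": "a", "B": "b", "C": "c", "D": "d"}, 3)
```

Like the pipeline above, this evaluates an expression on every document, so it is unlikely to benefit from indexes on A, B, C or D.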

pymongo find().hint('index') does not use index [duplicate]

I'm trying to use the sort feature when querying my mongoDB, but it is failing. The same query works in the MongoDB console but not here. Code is as follows:
import pymongo
from pymongo import Connection
connection = Connection()
db = connection.myDB
print db.posts.count()
for post in db.posts.find({}, {'entities.user_mentions.screen_name':1}).sort({u'entities.user_mentions.screen_name':1}):
    print post
The error I get is as follows:
Traceback (most recent call last):
File "find_ow.py", line 7, in <module>
for post in db.posts.find({}, {'entities.user_mentions.screen_name':1}).sort({'entities.user_mentions.screen_name':1},1):
File "/Library/Python/2.6/site-packages/pymongo-2.0.1-py2.6-macosx-10.6-universal.egg/pymongo/cursor.py", line 430, in sort
File "/Library/Python/2.6/site-packages/pymongo-2.0.1-py2.6-macosx-10.6-universal.egg/pymongo/helpers.py", line 67, in _index_document
TypeError: first item in each key pair must be a string
I found a link elsewhere that says I need to place a 'u' in front of the key if using pymongo, but that didn't work either. Has anyone else gotten this to work, or is this a bug?
.sort(), in pymongo, takes key and direction as parameters.
So if you want to sort by, let's say, id then you should .sort("_id", 1)
For multiple fields:
.sort([("field1", pymongo.ASCENDING), ("field2", pymongo.DESCENDING)])
You can try this:
db.Account.find().sort("UserName")
db.Account.find().sort("UserName",pymongo.ASCENDING)
db.Account.find().sort("UserName",pymongo.DESCENDING)
This also works:
db.Account.find().sort('UserName', -1)
db.Account.find().sort('UserName', 1)
I'm using this in my code; please comment if I'm doing something wrong here, thanks.
Why does Python use a list of tuples instead of a dict?
In Python versions before 3.7, you cannot guarantee that a dictionary will be iterated in the order you declared it.
So, in the mongo shell you could do .sort({'field1':1,'field2':1}) and the interpreter would sort by field1 first and by field2 second.
If this syntax were used in Python, there would be a chance of sorting by field2 first. With a list of tuples, there is no such risk.
.sort([("field1",pymongo.ASCENDING), ("field2",pymongo.DESCENDING)])
Sort by _id descending:
collection.find(filter={"keyword": keyword}, sort=[( "_id", -1 )])
Sort by _id ascending:
collection.find(filter={"keyword": keyword}, sort=[( "_id", 1 )])
DESC & ASC:
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
col = db["customers"]
doc = col.find().sort("name", -1)  # descending
for x in doc:
    print(x)
###################
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
col = db["customers"]
doc = col.find().sort("name", 1)  # ascending
for x in doc:
    print(x)
TL;DR: In my experience, the aggregation pipeline is faster than the conventional .find().sort().
Now moving to the real explanation. There are two ways to perform sorting operations in MongoDB:
Using .find() and .sort().
Or using the aggregation pipeline.
As many have suggested, .find().sort() is the simplest way to perform the sorting.
.sort([("field1",pymongo.ASCENDING), ("field2",pymongo.DESCENDING)])
However, I have found it to be slower than the aggregation pipeline.
Now for the aggregation pipeline method. The steps to implement a simple aggregation pipeline intended for sorting are:
$match (optional step)
$sort
NOTE: In my experience, the aggregation pipeline works a bit faster than the .find().sort() method.
Here's an example of the aggregation pipeline.
db.collection_name.aggregate([
    {
        "$match": {
            # your query - optional step
        }
    },
    {
        "$sort": {
            "field_1": pymongo.ASCENDING,
            "field_2": pymongo.DESCENDING,
            # ....
        }
    },
])
Try this method yourself, compare the speed and let me know about this in the comments.
Edit: Do not forget to pass allowDiskUse=True to aggregate() when sorting large result sets; otherwise a $sort stage that exceeds the 100 MB memory limit will throw an error.
.sort([("field1",pymongo.ASCENDING), ("field2",pymongo.DESCENDING)])
PyMongo takes key and direction, so you can use the form above.
In your case you can do this:
for post in db.posts.find().sort('entities.user_mentions.screen_name',pymongo.ASCENDING):
    print post
Say you want to sort by the 'created_on' field; then you can do it like this:
.sort('{}'.format('created_on'), 1 if sort_type == 'asc' else -1)
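Building on the answers above, a small helper can turn a user-supplied sort string into the list of (key, direction) tuples that .sort() expects. This is only a sketch: the name parse_sort and the "-field" prefix convention for descending order are assumptions made for illustration, not part of PyMongo.

```python
# Same integer values as pymongo.ASCENDING and pymongo.DESCENDING
ASCENDING, DESCENDING = 1, -1


def parse_sort(spec):
    """Turn 'field1,-field2' into [('field1', 1), ('field2', -1)].

    A leading '-' marks a descending key (hypothetical convention).
    """
    pairs = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing comma
        if token.startswith("-"):
            pairs.append((token[1:], DESCENDING))
        else:
            pairs.append((token, ASCENDING))
    return pairs


sort_spec = parse_sort("field1, -field2")
# Usage with a real collection would be: db.posts.find().sort(sort_spec)
```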

In MongoDB's pymongo, how do I do a count()?

for post in db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num):
How do I get the count()?
As of pymongo version 3.7.0, count() is deprecated. Use Collection.count_documents instead. Running cursor.count or collection.count will result in the following warning message:
DeprecationWarning: count is deprecated. Use Collection.count_documents instead.
To use count_documents the code can be adjusted as follows
import pymongo
client = pymongo.MongoClient()
col = client[DATABASE][COLLECTION]
find = {"test_set":"abc"}
sort = [("abc",pymongo.DESCENDING)]
skip = 10
limit = 10
doc_count = col.count_documents(find, skip=skip)
results = col.find(find).sort(sort).skip(skip).limit(limit)
for doc in results:
    pass  # process document
Note: the count_documents method is relatively slow compared to the count method. To optimize, you can use collection.estimated_document_count. This method returns an estimated number of documents (as the name suggests) based on collection metadata.
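Applied to the paginated query from the question, the same idea could be wrapped as in the sketch below. The function name fetch_page and its parameters are hypothetical; col is assumed to be a pymongo Collection, and the count here ignores skip and limit so it reflects the total number of matches.

```python
def fetch_page(col, query, sort_key, direction, page, num):
    """Return (total matching documents, documents on the requested page)."""
    # Total number of matches, independent of pagination
    total = col.count_documents(query)
    cursor = (
        col.find(query)
        .sort(sort_key, direction)
        .skip((page - 1) * num)
        .limit(num)
    )
    return total, list(cursor)


# Example call, assuming db is a pymongo Database:
#   total, posts = fetch_page(db.datasets, {"test_set": "abc"}, "abc", -1, page, num)
```

This still issues two round trips to the server (one for the count, one for the page), which is the trade-off discussed in the other answers.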
If you're using pymongo version 3.7.0 or higher, use count_documents instead, as described above.
If you want results_count to ignore your limit():
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count()
for post in results:
If you want the results_count to be capped at your limit(), set applySkipLimit to True:
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count(True)
for post in results:
Not sure why you want the count if you are already passing limit 'num'. Anyway if you want to assert, here is what you should do.
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count(True)
That will match results_count with num
Unfortunately I cannot comment on @Sohaib Farooqi's answer. Quick note: although cursor.count() has been deprecated, it was significantly faster than collection.count_documents() in all of my tests when counting all documents in a collection (i.e. filter={}). Running db.currentOp() reveals that collection.count_documents() uses an aggregation pipeline, while cursor.count() doesn't. This might be the cause.
This thread happens to be 11 years old. However, as of 2022 the 'count()' function has been deprecated. Here is a way I came up with to count documents in MongoDB using Python. Here is a picture of the code snippet. Making an empty list is not needed; I just wanted to be outlandish. Hope this helps :). Code snippet here.
In my case I need the count of matched elements for a given query, and I certainly don't want to run the query twice:
one to get the count, and
two to get the result set.
I know the query result set is not very big and fits in memory, so I can convert it to a list and take the list's length.
This code illustrates the use case:
# pymongo 3.9.0
# limit is the page size, defined elsewhere
offset = 0
is_over = False
while not is_over:
    it = items.find({"some": "/value/"}).skip(offset).limit(limit)
    # list() loads the cursor content into memory
    it = list(it)
    if len(it) < limit:
        is_over = True
    offset += limit
If you want to use a cursor and also want the count, you can try it this way:
# Have 27 items in collection
col = MongoClient(_URI)[DB_NAME][COLLECTION_NAME]
cursor = col.find()
count = cursor.explain().get("executionStats", {}).get("nReturned")
# Output: 27
cursor = col.find().limit(5)
count = cursor.explain().get("executionStats", {}).get("nReturned")
# Output: 5
# Can also use cursor
for item in cursor:
    ...
You can read more about it at https://pymongo.readthedocs.io/en/stable/api/pymongo/cursor.html#pymongo.cursor.Cursor.explain