Concatenate pymongo Cursor - mongodb

How do you concatenate multiple pymongo Cursor? If not it is not possible, how do you take results from multiple Cursor and create a new one?
Example :
result1 = db[collection].find(query1)
result2 = db[collection].find(query2)
concat_result = result1 + result2 #something like that.
Update :
All answers here seems to take into account that the queries are in the same format. For example. query1 might get 2 documents between dates as query2 might sorts documents by categories and may be limited by a count of 5. $or is too homogeneous for what I need. After concatening those two queries, I need to sort them base on another key.
For further details, a class Printer needs to receive a pymongo.Cursor and only one and i'm stuck with this.

The easiest way is to use mongo $or operator like
db[collection].find({'$or': [query1, query2]})
Or if you have got to do this in python you
def concat_results(*results):
ids = set()
for result in results:
for v in result:
if v['_id'] not in ids:
ids.add(v['_id'])
yield v1
concat_result = list(concat_results(result1, result2))

yes the wise solution would be to use the $or as stated above.
if you wanted to do so in a pythonic way then you could:
a = [item for item in db[collection].find({filters},{select_fields})]
b = [item for item in db[collection].find({filters},{select_fields})]
c = []
for x,y in zip(a,b):
c += [x, y]

Related

how to get the values not repeated in the list? (Dart - Flutter)

I have two lists:
a = [1,2,3,4,5]
b = [1,4,5]
I want to get the values from list a that do not exist in list b:
result = [2,3]
There are various ways to achieve that and which one fits your situation depends on your requirements (which are not specified in the question).
Do you want to preserve duplicates in a or not:
a = [1,2,2,3,4,5]
b = [1,4,5]
Should this result in [2,3] or [2,2,3]?
If you do not want to preserve duplicates you can convert your lists to Sets and use the difference method:
result = a.toSet().difference(b.toSet())
Our if you want to preserve the duplicates just filter the list.
result = a.filter((v) => b.contains(v))

Is it possible to return a map of key values using gremlin scala

Currently i have two gremlin queries which will fetch two different values and i am populating in a map.
Scenario : A->B , A->C , A->D
My queries below,
graph.V().has(ID,A).out().label().toList()
Fetch the list of outE labels of A .
Result : List(B,C,D)
graph.traversal().V().has("ID",A).outE("interference").as("x").otherV().has("ID",B).select("x").values("value").headOption()
Given A and B , get the egde property value (A->B)
Return : 10
Is it possible that i can combine both there queries to get a return as Map[(B,10)(C,11)(D,12)]
I am facing some performance issue when i have two queries. Its taking more time
There is probably a better way to do this but I managed to get something with the following traversal:
gremlin> graph.traversal().V().has("ID","A").outE("interference").as("x").otherV().has("ID").label().as("y").select("x").by("value").as("z").select("y", "z").select(values);
==>[B,1]
==>[C,2]
I would wait for more answers though as I suspect there is a better traversal out there.
Below is working in scala
val b = StepLabel[Edge]()
val y = StepLabel[Label]()
val z = StepLabel[Integer]()
graph.traversal().V().has("ID",A).outE("interference").as(b)
.otherV().label().as(y)
.select(b).values("name").as(z)
.select((y,z)).toMap[String,Integer]
This will return Map[String,Int]

Selecting data from MongoDB where K of N criterias are met

I have documents with four fields: A, B, C, D Now I need to find documents where at least three fields matches. For example:
Query: A=a, B=b, C=c, D=d
Returned documents:
a,b,c,d (four of four met)
a,b,c (three of four met)
a,b,d (another three of four met)
a,c,d (another three of four met)
b,c,d (another three of four met)
So far I created something like:
`(A=a AND B=b AND C=c)
OR (A=a AND B=b AND D=d)
OR (A=a AND C=c AND D=d)
OR (B=b AND C=c AND D=d)`
But this is ugly and error prone.
Is there a better way to achieve it? Also, query performance matters.
I'm using Spring Data but I believe it does not matter. My current code:
Criteria c = new Criteria();
Criteria ca = Criteria.where("A").is(doc.getA());
Criteria cb = Criteria.where("B").is(doc.getB());
Criteria cc = Criteria.where("C").is(doc.getC());
Criteria cd = Criteria.where("D").is(doc.getD());
c.orOperator(
new Criteria().andOperator(ca,cb,cc),
new Criteria().andOperator(ca,cb,cd),
new Criteria().andOperator(ca,cc,cd),
new Criteria().andOperator(cb,cc,cd)
);
Query query = new Query(c);
return operations.find(query, Document.class, "documents");
Currently in MongoDB we cannot do this directly, since we dont have any functionality supporting Permutation/Combination on the query parameters.
But we can simplify the query by breaking the condition into parts.
Use Aggregation pipeline
$project with records (A=a AND B=b) --> This will give the records which are having two conditions matching.(Our objective is to find the records which are having matches for 3 out of 4 or 4 out of 4 on the given condition)`
Next in the pipeline use OR condition (C=c OR D=d) to find the final set of records which yields our expected result.
Hope it Helps!
The way you have it you have to do all permutations in your query. You can use the aggregation framework to do this without permuting all combinations. And it is generic enough to do with any K. The downside is I think you need Mongodb 3.2+ and also Spring Data doesn't support these oparations yet: $filter $concatArrays
But you can do it pretty easy with the java driver.
[
{
$project:{
totalMatched:{
$size:{
$filter:{
input:{
$concatArrays:[ ["$A"], ["$B"], ["$C"],["$D"]]
},
as:"attr",
cond:{
$eq:["$$attr","a"]
}
}
}
}
}
},
{
$match:{
totalMatched:{ $gte:3 }
}
}
]
All you are doing is you are concatenating the values of all the fields you need to check in a single array. Then select a subset of those elements that are equal to the value you are looking for (or any condition you want for that matter) and finally getting the size of that array for each document.
Now all you need to do is to $match the documents that have a size of greater than or equal to what you want.

Is it possible to refer to multiple documents in a mongo db query?

Suppose I have a collection containing the following documents:
...
{
event_counter : 3
event_type: 50
event_data: "yaya"
}
{
event_counter : 4
event_type: 100
event_data: "whowho"
}
...
Is it possible to ask for:
for each document, e where e.event_type == 100
get me any document f where
f.event_counter = e.event_counter+1
or equivalently:
find each f, where f.event_counter==e.event_counter+1 && e.event_type==100
I think the best way for you to approach this is on the application side, using multiple queries. You would want to run a query to match all documents with e.event_type = 100, like this one:
db.collection.find({"e.event_type" : 100});
Then, you'll have to write some logic to iterate through the results and run more queries to find documents with the right value of f.event_counter.
I am not sure it's possible to do this using MongoDB's aggregation framework. If it is possible, it will be quite a complicated query.

In MongoDB's pymongo, how do I do a count()?

for post in db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num):
How do I get the count()?
Since pymongo version 3.7.0 and above count() is deprecated. Instead use Collection.count_documents. Running cursor.count or collection.count will result in following warning message:
DeprecationWarning: count is deprecated. Use Collection.count_documents instead.
To use count_documents the code can be adjusted as follows
import pymongo
db = pymongo.MongoClient()
col = db[DATABASE][COLLECTION]
find = {"test_set":"abc"}
sort = [("abc",pymongo.DESCENDING)]
skip = 10
limit = 10
doc_count = col.count_documents(find, skip=skip)
results = col.find(find).sort(sort).skip(skip).limit(limit)
for doc in result:
//Process Document
Note: count_documents method performs relatively slow as compared to count method. In order to optimize you can use collection.estimated_document_count. This method will return estimated number of docs(as the name suggested) based on collection metadata.
If you're using pymongo version 3.7.0 or higher, see this answer instead.
If you want results_count to ignore your limit():
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count()
for post in results:
If you want the results_count to be capped at your limit(), set applySkipLimit to True:
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count(True)
for post in results:
Not sure why you want the count if you are already passing limit 'num'. Anyway if you want to assert, here is what you should do.
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count(True)
That will match results_count with num
Cannot comment unfortuantely on #Sohaib Farooqi's answer... Quick note: although, cursor.count() has been deprecated it is significantly faster, than collection.count_documents() in all of my tests, when counting all documents in a collection (ie. filter={}). Running db.currentOp() reveals that collection.count_documents() uses an aggregation pipeline, while cursor.count() doesn't. This might be a cause.
This thread happens to be 11 years old. However, in 2022 the 'count()' function has been deprecated. Here is a way I came up with to count documents in MongoDB using Python. Here is a picture of the code snippet. Making a empty list is not needed I just wanted to be outlandish. Hope this helps :). Code snippet here.
The thing in my case relies in the count of matched elements for a given query, and surely not to repeat this query twice:
one to get the count, and
two to get the result set.
no way
I know the query result set is not quite big and fits in memory, therefore, I can convert it to a list, and get the list length.
This code illustrates the use case:
# pymongo 3.9.0
while not is_over:
it = items.find({"some": "/value/"}).skip(offset).size(limit)
# List will load the cursor content into memory
it = list(it)
if len(it) < size:
is_over = True
offset += size
If you want to use cursor and also want count, you can try this way
# Have 27 items in collection
db = MongoClient(_URI)[DB_NAME][COLLECTION_NAME]
cursor = db.find()
count = db.find().explain().get("executionStats", {}).get("nReturned")
# Output: 27
cursor = db.find().limit(5)
count = db.find().explain().get("executionStats", {}).get("nReturned")
# Output: 5
# Can also use cursor
for item in cursor:
...
You can read more about it from https://pymongo.readthedocs.io/en/stable/api/pymongo/cursor.html#pymongo.cursor.Cursor.explain