I want to order my results based on their proximity to MULTIPLE points in a 2D space.
I assume this would be done by querying against the first point and then re-querying/checking those results against the second point?
Maybe the code below explains what I am trying to achieve a bit better?
db.runCommand({
geoNear:"places",
near:[ [52.5243, 13.4063], [48.1448, 11.5580] ]
})
Solution: In case anyone is interested, this is how I achieved this (thanks to the answer below):
a = Trip.geo_near([52.5243, 13.4063], :max_distance => 40, :unit => :mi).uniq
b = Trip.geo_near([48.1448, 11.5580], :max_distance => 40, :unit => :mi).uniq
#results = a & b
MongoDB has a whole section in its documentation on geospatial indexing: http://www.mongodb.org/display/DOCS/Geospatial+Indexing
I think what you're looking for is a bounding box query. This is directly from their code examples.
box = [[40.73083, -73.99756], [40.741404, -73.988135]]
db.places.find({"loc" : {"$within" : {"$box" : box}}})
What do you intend the query above to return? Places that are near one OR the other location? In that case, you should run two queries, then union the results in your application code.
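If the goal is to rank places by closeness to several points at once, one option is to merge the per-point result sets in application code and sort them by a combined distance. The following is a minimal Python sketch, not a MongoDB feature. It assumes each document is a plain dict with an `_id` and a `loc` pair, uses plain Euclidean distance in coordinate units, and the helper names `union_by_id` and `rank_by_total_distance` are hypothetical.

```python
import math

def union_by_id(*result_sets):
    """Merge several result lists, keeping the first copy of each _id."""
    merged = {}
    for results in result_sets:
        for doc in results:
            merged.setdefault(doc["_id"], doc)
    return list(merged.values())

def rank_by_total_distance(docs, points):
    """Sort documents by the sum of their distances to every point."""
    def total(doc):
        return sum(math.dist(doc["loc"], p) for p in points)
    return sorted(docs, key=total)

# Hypothetical results of two separate "near" queries.
near_first = [{"_id": 1, "loc": [52.5, 13.4]}, {"_id": 2, "loc": [50.0, 12.0]}]
near_second = [{"_id": 2, "loc": [50.0, 12.0]}, {"_id": 3, "loc": [48.1, 11.6]}]

points = [[52.5243, 13.4063], [48.1448, 11.5580]]
ranked = rank_by_total_distance(union_by_id(near_first, near_second), points)
print([doc["_id"] for doc in ranked])
```

Summing the distances is only one possible scoring rule; taking the minimum or maximum distance instead would give "near any point" or "near all points" rankings.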
Related
How do you concatenate multiple pymongo Cursors? If it is not possible, how do you take the results from multiple Cursors and create a new one?
Example :
result1 = db[collection].find(query1)
result2 = db[collection].find(query2)
concat_result = result1 + result2 #something like that.
Update :
All the answers here seem to assume that the queries have the same format. For example, query1 might get 2 documents between two dates, while query2 might sort documents by category and be limited to a count of 5. $or is too homogeneous for what I need. After concatenating the results of those two queries, I need to sort them based on another key.
For further details: a class Printer needs to receive a pymongo.Cursor, and only one, and I'm stuck with this.
The easiest way is to use the Mongo $or operator, like this:
db[collection].find({'$or': [query1, query2]})
Or, if you have to do this in Python, you can use a generator that skips documents it has already seen:
def concat_results(*results):
    # Yield each document once, identified by its _id.
    ids = set()
    for result in results:
        for v in result:
            if v['_id'] not in ids:
                ids.add(v['_id'])
                yield v
concat_result = list(concat_results(result1, result2))
Yes, the wise solution would be to use $or as stated above.
If you want to do it in a Pythonic way, you could:
a = [item for item in db[collection].find({filters},{select_fields})]
b = [item for item in db[collection].find({filters},{select_fields})]
c = []
# zip stops at the shorter list, so extra items in the longer one are dropped
for x,y in zip(a,b):
    c += [x, y]
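Since the update in the question asks for the combined results to be sorted on another key, here is a minimal sketch of one way to do that in Python. It assumes the results fit in memory and that every document has the sort key; `merge_and_sort` and the `rank` field are hypothetical names, and the lists stand in for real cursors. Note that it returns a list, not a pymongo.Cursor.

```python
from itertools import chain
from operator import itemgetter

def merge_and_sort(sort_key, *cursors, reverse=False):
    """Chain iterables of documents, drop repeated _id values, sort the rest."""
    seen = set()
    unique = []
    for doc in chain(*cursors):
        if doc["_id"] not in seen:
            seen.add(doc["_id"])
            unique.append(doc)
    return sorted(unique, key=itemgetter(sort_key), reverse=reverse)

# Stand-ins for db[collection].find(query1) and db[collection].find(query2).
result1 = [{"_id": 1, "rank": 3}, {"_id": 2, "rank": 1}]
result2 = [{"_id": 2, "rank": 1}, {"_id": 3, "rank": 2}]

merged = merge_and_sort("rank", result1, result2)
print([doc["_id"] for doc in merged])
```

Because cursors are iterable, real `find()` results can be passed in place of the lists, each with its own sort and limit already applied.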
Currently I have two Gremlin queries which fetch two different values, and I am populating a map with them.
Scenario : A->B , A->C , A->D
My queries are below:
graph.V().has(ID,A).out().label().toList()
Fetch the list of outE labels of A .
Result : List(B,C,D)
graph.traversal().V().has("ID",A).outE("interference").as("x").otherV().has("ID",B).select("x").values("value").headOption()
Given A and B, get the edge property value (A->B)
Return : 10
Is it possible to combine both of these queries to get a return value like Map[(B,10)(C,11)(D,12)]?
I am facing performance issues with two separate queries; they take more time.
There is probably a better way to do this but I managed to get something with the following traversal:
gremlin> graph.traversal().V().has("ID","A").outE("interference").as("x").otherV().has("ID").label().as("y").select("x").by("value").as("z").select("y", "z").select(values);
==>[B,1]
==>[C,2]
I would wait for more answers though as I suspect there is a better traversal out there.
The following works in Scala:
val b = StepLabel[Edge]()
val y = StepLabel[Label]()
val z = StepLabel[Integer]()
graph.traversal().V().has("ID",A).outE("interference").as(b)
.otherV().label().as(y)
.select(b).values("name").as(z)
.select((y,z)).toMap[String,Integer]
This will return Map[String,Int]
Suppose I have a collection containing the following documents:
...
{
event_counter : 3
event_type: 50
event_data: "yaya"
}
{
event_counter : 4
event_type: 100
event_data: "whowho"
}
...
Is it possible to ask for:
for each document, e where e.event_type == 100
get me any document f where
f.event_counter = e.event_counter+1
or equivalently:
find each f, where f.event_counter==e.event_counter+1 && e.event_type==100
I think the best way for you to approach this is on the application side, using multiple queries. You would first run a query to match all documents whose event_type is 100, like this one:
db.collection.find({"event_type" : 100});
Then you'll have to write some logic to iterate through the results and run further queries to find the documents whose event_counter is one higher than each match.
I am not sure it's possible to do this using MongoDB's aggregation framework. If it is possible, it will be quite a complicated query.
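The application-side logic can be kept to two queries by collecting all the wanted counters first and sending them in a single `$in` filter. The sketch below is only an illustration: `followup_filter` is a hypothetical helper, the third sample document is made up, and plain lists stand in for the collection so the logic can be followed without a server.

```python
def followup_filter(type_100_docs):
    """Build a filter matching documents whose counter follows a type-100 event."""
    next_counters = sorted({doc["event_counter"] + 1 for doc in type_100_docs})
    return {"event_counter": {"$in": next_counters}}

# Stand-in for the collection; the last document is a made-up example.
events = [
    {"event_counter": 3, "event_type": 50, "event_data": "yaya"},
    {"event_counter": 4, "event_type": 100, "event_data": "whowho"},
    {"event_counter": 5, "event_type": 7, "event_data": "next"},
]

# First query: db.collection.find({"event_type": 100})
matches = [e for e in events if e["event_type"] == 100]
query = followup_filter(matches)
print(query)

# In-memory equivalent of the second query: db.collection.find(query)
wanted = set(query["event_counter"]["$in"])
followers = [e for e in events if e["event_counter"] in wanted]
print(followers)
```

With a real collection, the dict returned by `followup_filter` would be passed straight to `find()`.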
After reading about MongoDB and geospatial indexing, I was amazed that it does not support compound keys that do not start with the 2d index.
I don't know if I would gain anything from it, but right now the MSSQL solution is just as slow/fast.
SELECT TOP 30 * FROM Villages WHERE SID = 10 ORDER BY (math to calc radius from the center point)
This works, but it is slow because it is not smart enough to use an index, so it has to calculate the radius for all villages with that SID.
So in Mongo I wanted to create an index like {sid: 1, loc: "2d"} so I could filter out a lot from the start.
I'm not sure there are any solutions for this. I thought about creating a collection for each sid, since they don't share any information. But what are the disadvantages of this? Or is this how people do it?
Update
The maps are flat: 800, 800 to -800, -800, and villages are placed from the center of the map outwards. There are about 300 different maps which are not related, so they could be in different collections, but I'm not sure about the overhead.
If more information is needed, please let me know.
What I have tried
> var res = db.Villages.find({sid: 464})
> db.Villages.find({loc: {$near: [50, 50]}, sid: {$in: res}})
error: { "$err" : "invalid query", "code" : 12580 }
>
Also tried this
db.Villages.find({loc: {$near: [50, 50]}, sid: {$in: db.Villages.find({sid: 464}, {sid: 1})}})
error: { "$err" : "invalid query", "code" : 12580 }
I'm not really sure what I'm doing wrong, but it's probably something about the syntax. Confused here.
As you stated already, MongoDB cannot accept the location as a secondary key in a geo index; the 2d field has to come first in the index. So you are out of luck when it comes to changing the index layout.
But there is a workaround: instead of a compound index that starts with sid, you can create one index on sid and one compound index on loc and sid
db.your_collection.ensureIndex({sid : 1})
db.your_collection.ensureIndex({loc : '2d',sid:1})
or two separate indexes on sid and loc
db.your_collection.ensureIndex({sid : 1})
db.your_collection.ensureIndex({loc : '2d'})
(I am not sure which of the above is more efficient; you can try it yourself)
Then you can make two different queries to get the results filtered by sid first and by location next, something like this:
res = db.your_collection.find({sid:10})
//get all the ids from the res (res_ids)
//and query by location using the ids
db.your_collection.find({loc:{ $near : [50,50] } ,sid : {$in : res_ids}})
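To make the intended result explicit, here is a small in-memory Python sketch of "filter by sid first, then take the closest N to a point". It is not a replacement for the index-backed query, only a reference for checking what the query should return. It assumes flat coordinates as described in the question and plain Euclidean distance; `nearest_in_map` and the sample villages are hypothetical.

```python
import heapq
import math

def nearest_in_map(villages, sid, center, limit=30):
    """Return up to `limit` villages of one map, closest to `center` first."""
    same_map = (v for v in villages if v["sid"] == sid)
    return heapq.nsmallest(limit, same_map, key=lambda v: math.dist(v["loc"], center))

villages = [
    {"name": "a", "sid": 464, "loc": [51, 50]},
    {"name": "b", "sid": 464, "loc": [-700, 300]},
    {"name": "c", "sid": 10, "loc": [50, 50]},
    {"name": "d", "sid": 464, "loc": [40, 60]},
]
print([v["name"] for v in nearest_in_map(villages, 464, [50, 50], limit=2)])
```

`heapq.nsmallest` avoids sorting the whole map when only the top few villages are needed.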
for post in db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num):
How do I get the count()?
Since pymongo version 3.7.0, count() is deprecated. Use Collection.count_documents instead. Running cursor.count or collection.count will result in the following warning message:
DeprecationWarning: count is deprecated. Use Collection.count_documents instead.
To use count_documents, the code can be adjusted as follows:
import pymongo
db = pymongo.MongoClient()
col = db[DATABASE][COLLECTION]
find = {"test_set": "abc"}
sort = [("abc", pymongo.DESCENDING)]
skip = 10
limit = 10
# Count the matching documents that remain after skipping the first `skip`
doc_count = col.count_documents(find, skip=skip)
results = col.find(find).sort(sort).skip(skip).limit(limit)
for doc in results:
    pass  # Process document
Note: count_documents is relatively slow compared to count. To optimize, you can use collection.estimated_document_count, which returns an estimated number of documents (as the name suggests) based on collection metadata. It does not accept a filter, so it only helps when counting the whole collection.
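For the paginated loop in the question, the count and the page can be fetched together in one helper. This is a minimal sketch under assumptions: `fetch_page` is a hypothetical name, `collection` is expected to behave like a pymongo Collection (only `count_documents` and `find` are used), and the constant is defined locally so the snippet does not need pymongo imported.

```python
DESCENDING = -1  # same value as pymongo.DESCENDING

def fetch_page(collection, query, sort_key, page, num):
    """Return (total_matches, documents_on_page) for a 1-based page number."""
    # Total number of matches, ignoring skip and limit.
    total = collection.count_documents(query)
    cursor = (
        collection.find(query)
        .sort(sort_key, DESCENDING)
        .skip((page - 1) * num)
        .limit(num)
    )
    return total, list(cursor)
```

With the question's names this would be called as `total, posts = fetch_page(db.datasets, {"test_set": "abc"}, "abc", page, num)`. It still issues two commands to the server, one for the count and one for the page.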
If you're using pymongo version 3.7.0 or higher, see this answer instead.
If you want results_count to ignore your limit():
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count()
for post in results:
If you want the results_count to be capped at your limit(), set applySkipLimit to True:
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count(True)
for post in results:
Not sure why you want the count if you are already passing limit 'num'. Anyway if you want to assert, here is what you should do.
results = db.datasets.find({"test_set":"abc"}).sort("abc",pymongo.DESCENDING).skip((page-1)*num).limit(num)
results_count = results.count(True)
That will match results_count with num
Unfortunately I cannot comment on @Sohaib Farooqi's answer. Quick note: although cursor.count() has been deprecated, it is significantly faster than collection.count_documents() in all of my tests when counting all documents in a collection (i.e. filter={}). Running db.currentOp() reveals that collection.count_documents() uses an aggregation pipeline, while cursor.count() doesn't. This might be the cause.
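To illustrate what "uses an aggregation pipeline" means here, the sketch below builds a pipeline of roughly the shape a filtered count translates to: a `$match` stage, optional `$skip` and `$limit` stages, and a `$group` stage that sums 1 per document. This is an approximation for illustration, not a copy of the driver's code, and `count_pipeline` is a hypothetical helper.

```python
def count_pipeline(query, skip=None, limit=None):
    """Build an aggregation pipeline that counts documents matching `query`."""
    pipeline = [{"$match": query}]
    if skip is not None:
        pipeline.append({"$skip": skip})
    if limit is not None:
        pipeline.append({"$limit": limit})
    # Grouping on a constant _id folds every remaining document into one bucket.
    pipeline.append({"$group": {"_id": None, "n": {"$sum": 1}}})
    return pipeline

print(count_pipeline({"test_set": "abc"}, skip=10))
```

Passing such a pipeline to `collection.aggregate()` yields a single document whose `n` field holds the count, or no document at all when nothing matches.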
This thread happens to be 11 years old, and as of 2022 the count() function has been deprecated. Here is a way I came up with to count documents in MongoDB using Python (the code was posted as a picture). Making an empty list is not needed; I just wanted to be outlandish. Hope this helps :).
In my case I need the count of elements matched by a given query, and I certainly don't want to run the query twice: once to get the count and once to get the result set.
I know the result set is not very big and fits in memory, so I can convert it to a list and take the list's length.
This code illustrates the use case:
# pymongo 3.9.0
while not is_over:
    it = items.find({"some": "/value/"}).skip(offset).limit(limit)
    # list() loads the cursor content into memory
    it = list(it)
    # A page shorter than the limit means there is nothing left to fetch
    if len(it) < limit:
        is_over = True
    offset += limit
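The same loop can be wrapped in a generator so that the paging logic is separate from the query. This is a minimal sketch: `iter_pages` and `fetch` are hypothetical names, and a list slice stands in for `items.find(query).skip(offset).limit(limit)`.

```python
def iter_pages(fetch, limit):
    """Yield non-empty pages until one comes back shorter than `limit`."""
    offset = 0
    while True:
        page = list(fetch(offset, limit))  # loads this page into memory
        if page:
            yield page
        if len(page) < limit:
            break
        offset += limit

# Stand-in for: lambda offset, limit: items.find(query).skip(offset).limit(limit)
data = list(range(7))
fetch = lambda offset, limit: data[offset:offset + limit]

for page in iter_pages(fetch, 3):
    print(page)
```

When the total is an exact multiple of the limit, one extra empty fetch is made before the loop stops; that is the price of not running a separate count.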
If you want to use a cursor and also want the count, you can try it this way:
# Have 27 items in collection
db = MongoClient(_URI)[DB_NAME][COLLECTION_NAME]
cursor = db.find()
count = db.find().explain().get("executionStats", {}).get("nReturned")
# Output: 27
cursor = db.find().limit(5)
# explain() must be called on a query with the same limit to report 5
count = db.find().limit(5).explain().get("executionStats", {}).get("nReturned")
# Output: 5
# Can also use cursor
for item in cursor:
    ...
You can read more about it at https://pymongo.readthedocs.io/en/stable/api/pymongo/cursor.html#pymongo.cursor.Cursor.explain