Calculating $avg value within a given geo polygon - MongoDB

I'm trying to calculate an average value within a given polygon.
Currently I'm using this pipeline:
'aggregation': {
    'pipeline': [
        { "$match": { "location": "$loc" } },
        { "$group": { "_id": 'Average', "AvgField": { "$avg": "$myavgvalue" }, "count": { "$sum": 1 } } }
    ]
}
But it seems the $match is ignoring the geospatial index.
Any idea how I can do this?
best regards
Harald

You need to use the ?aggregate={"$loc": ...} query syntax, so the parser knows it has to invoke the aggregation engine instead of the standard query parser. This example comes straight from the documentation:
$ curl -i http://example.com/posts?aggregate={"$value": 2}
Also, make sure the proper geo index has been added to the collection. Eve won't automatically do that for you, unless you explicitly choose to do so by setting mongo_indexes.
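For intuition, the polygon-containment test a geo query performs can be sketched in plain Python using the classic ray-casting algorithm. This is a client-side illustration only, not Eve or MongoDB code; it assumes a simple, non-self-intersecting polygon on a flat plane, whereas MongoDB's 2dsphere test is spherical-aware:

```python
def point_in_polygon(point, polygon):
    """Return True if (x, y) lies inside the polygon [(x1, y1), ...] (ray casting)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # → True (inside)
print(point_in_polygon((5, 5), square))  # → False (outside)
```

The geo index doesn't change this logic; it just lets the server avoid running the test against every document.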

Related

How to match two different Object ids using MongoDB find() query?

I have entries like the ones below:
[
{
"_id":ObjectId("59ce020caa87df4da0ee2c78"),
"name": "Tom",
"owner_id": ObjectId("59ce020caa87df4da0ee2c78")
},
{
"_id":ObjectId("59ce020caa87df4da0ee2c79"),
"name": "John",
"owner_id": ObjectId("59ce020caa87df4da0ee2c78")
}
]
Now, I need to find the person whose _id is equal to owner_id, using find() in MongoDB.
Note: we can't use $match (aggregation) for certain reasons.
I am using this query:
db.people.find({ $where: "this._id == this.owner_id" })
but it's not returning the expected output. Can anyone help me with this?
Thanks.
Using $expr and $eq you can get the desired values, avoiding the use of $where in a find query (no aggregation necessary). Note that $expr requires MongoDB 3.6 or newer. (The $where version fails because == compares the two ObjectId objects by reference rather than by value.)
db.collection.find({
"$expr": {
"$eq": [
"$_id",
"$owner_id"
]
}
})
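For intuition, here is a plain-Python sketch (not MongoDB code) of what that $expr/$eq filter computes: it compares two fields of the same document, rather than a field against a literal. The ObjectIds are represented as plain strings for simplicity:

```python
# Sketch of find({ "$expr": { "$eq": ["$_id", "$owner_id"] } }):
# keep documents where two of the document's own fields are equal.

people = [
    {"_id": "59ce020caa87df4da0ee2c78", "name": "Tom",
     "owner_id": "59ce020caa87df4da0ee2c78"},
    {"_id": "59ce020caa87df4da0ee2c79", "name": "John",
     "owner_id": "59ce020caa87df4da0ee2c78"},
]

owners = [p for p in people if p["_id"] == p["owner_id"]]
print([p["name"] for p in owners])  # → ['Tom']
```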

MongoDB - How can I use a field's value in the first argument of $centerSphere

I'm trying to get a negative match for $geoWithin, to be used in MongoDB Charts.
All of the required information is in the result of the latest stage of an aggregation I'm constructing in MongoDB Compass; the result of that stage looks like this:
{
"PizzaId": "123",
"info": {
"timestamp": {
"$date": "2021-02-15T05:00:00.000Z"
},
"location": {
"type": "Point",
"coordinates": [33.21883773803711, 33.802675247192383]
},
"dayOfWeek": 2
},
"PizzaLocation": [{
"_id": "456",
"location": {
"type": "Point",
"coordinates": [37.83396911621094, 37.07674026489258]
}
}]
}
I want to add a stage after that: a filter that checks that info.location is not within a 100 km radius of PizzaLocation.0.location:
{
$match: {
"info.location.coordinates": {
$not:
{
$geoWithin: {
$centerSphere: [
"$PizzaLocation.0.location.coordinates",
100 / 6378.1
]
}
}
}
}
}
I get an error: Point must be an array or object
Things I tried:
playing with the field name in $centerSphere: removing the 0 or the $, or using:
{$arrayElemAt: ["$PizzaLocation.location.coordinates",0]}
even used the [lon,lat] format and put
[{$arrayElemAt: [{$arrayElemAt: ["$PizzaLocation.location.coordinates",0]},0]},
{$arrayElemAt: [{$arrayElemAt: ["$PizzaLocation.location.coordinates",0]},1]}]
setting literal coordinates instead of a field name; it worked, but I need to use a field.
creating a view that would hold the centerSphere itself and using a $lookup to fetch it, but MongoDB didn't recognize $geoWithin or $centerSphere in an $addFields aggregation.
Things I verified:
I used $project stage on {$arrayElemAt: ["$PizzaLocation.location.coordinates",0]} , and indeed it showed in the array: [lon,lat]
I used $project stage on
{$arrayElemAt: [{$arrayElemAt: ["$PizzaLocation.location.coordinates",0]},0]}
and
{$arrayElemAt: [{$arrayElemAt: ["$PizzaLocation.location.coordinates",0]},1]}
and indeed it showed a number for each one.
So, how can I use a field's value(s) in the first argument of $centerSphere?
Thank you.
Short answer: you can't do it directly. Let's first understand why not.
From the $match docs:
The query syntax is identical to the read operation query syntax;
This means $match queries use the same syntax as find queries, and unsurprisingly $geoWithin is a query operator.
Unfortunately, query syntax cannot access document values as part of the query. This is also why your query fails: the "coordinates" you pass are being parsed as a literal string.
For example this following query:
{
$match: {
field1: {$eq: "$field2"}
}
}
Matches { field1: "$field2" } but not { field1: 1, field2: 1 }.
Again, this is just the query language parser's behaviour, so there's not much you can do. ($expr is the one exception that can compare fields, but $geoWithin is not an aggregation expression, so it cannot be used inside $expr.)
The alternative is to use the $geoNear stage, but not only is there no easy way to combine it with $not logic, there are additional restrictions, such as it having to be the first stage of the pipeline.
The best I can recommend is to split your query into two parts: first fetch the document you need, then re-query using $geoWithin with the proper coordinates passed in as a literal.
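As a rough illustration of that two-step approach, here is a hypothetical client-side Python sketch: step one reads the center coordinates out of the already-fetched document, step two tests whether the point lies outside the 100 km radius using the haversine great-circle formula, the same math the 100 / 6378.1 radians figure is based on. Coordinates are assumed to be [longitude, latitude] as in the sample document:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two [lon, lat] points."""
    r = 6378.1  # Earth radius in km, matching the 100 / 6378.1 radians above
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Step 1 (conceptually): read PizzaLocation[0].location.coordinates from the fetched doc.
center = [37.83396911621094, 37.07674026489258]
# Step 2: re-query with the literal coordinates, or filter client-side:
info_point = [33.21883773803711, 33.802675247192383]
outside = haversine_km(*center, *info_point) > 100
print(outside)  # → True (the two sample points are several hundred km apart)
```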

MongoDB query for finding number of people with conflicting schedules [duplicate]

I have startTime and endTime for all records like this:
{
startTime : 21345678
endTime : 31345678
}
I am trying to find the number of all conflicts. For example, if there are two records and they overlap, the number of conflicts is 1. If there are three records and two of them overlap, the number of conflicts is 1. If there are three records and all three overlap, the number of conflicts is 3, i.e. [(X1, X2), (X1, X3), (X2, X3)].
As an algorithm, I am thinking of sorting the data by start time, and for each sorted record checking its end time and finding the records with start time less than that end time. This will be O(n²) time. A better approach would be an interval tree, inserting each record into the tree and counting overlaps as they occur. This would be O(n log n) time.
I have not used mongoDB much so what kind of query can I use to achieve something like this?
As you correctly mention, there are different approaches with varying complexity inherent to their execution. This basically covers how they are done; which one you implement depends on what your data and use case are best suited to.
Current Range Match
MongoDB 3.6 $lookup
The most simple approach can be employed using the new syntax of the $lookup operator with MongoDB 3.6 that allows a pipeline to be given as the expression to "self join" to the same collection. This can basically query the collection again for any items where the starttime "or" endtime of the current document falls between the same values of any other document, not including the original of course:
db.getCollection('collection').aggregate([
  { "$lookup": {
    "from": "collection",
    "let": {
      "_id": "$_id",
      "starttime": "$starttime",
      "endtime": "$endtime"
    },
    "pipeline": [
      { "$match": {
        "$expr": {
          "$and": [
            { "$ne": [ "$$_id", "$_id" ] },
            { "$or": [
              { "$and": [
                { "$gte": [ "$$starttime", "$starttime" ] },
                { "$lte": [ "$$starttime", "$endtime" ] }
              ]},
              { "$and": [
                { "$gte": [ "$$endtime", "$starttime" ] },
                { "$lte": [ "$$endtime", "$endtime" ] }
              ]}
            ]}
          ]
        }
      }},
      { "$count": "count" }
    ],
    "as": "overlaps"
  }},
  { "$match": { "overlaps.0": { "$exists": true } } }
])
The single $lookup performs the "join" on the same collection allowing you to keep the "current document" values for the "_id", "starttime" and "endtime" values respectively via the "let" option of the pipeline stage. These will be available as "local variables" using the $$ prefix in subsequent "pipeline" of the expression.
Within this "sub-pipeline" you use the $match pipeline stage and the $expr query operator, which allows you to evaluate aggregation framework logical expressions as part of the query condition. This allows the comparison between values as it selects new documents matching the conditions.
The conditions simply look for the "processed documents" where the "_id" field is not equal to the "current document", $and where either the "starttime" $or "endtime" values of the "current document" fall between the same properties of the "processed document". Note that these, as well as the respective $gte and $lte operators, are the "aggregation comparison operators" and not the "query operator" form, as the result evaluated by $expr must be boolean in context. This is what the aggregation comparison operators actually do, and it's also the only way to pass in values for comparison.
Since we only want the "count" of the matches, the $count pipeline stage is used to do this. The result of the overall $lookup will be a "single element" array where there was a count, or an "empty array" where there was no match to the conditions.
An alternate case would be to "omit" the $count stage and simply allow the matching documents to return. This allows easy identification, but as an "array embedded within the document" you do need to be mindful of the number of "overlaps" that will be returned as whole documents, and that this does not cause a breach of the BSON limit of 16MB. In most cases this should be fine, but where you expect a large number of overlaps for a given document this can be a real concern. So it's really something to be aware of.
The $lookup pipeline stage in this context will "always" return an array in result, even if empty. The name of the output property "merging" into the existing document will be "overlaps" as specified in the "as" property to the $lookup stage.
Following the $lookup, we can then do a simple $match with a regular query expression employing the $exists test for the 0 index value of output array. Where there actually is some content in the array and therefore "overlaps" the condition will be true and the document returned, showing either the count or the documents "overlapping" as per your selection.
Other versions - Queries to "join"
In the alternate case, where your MongoDB lacks this support, you can "join" manually by issuing the same query conditions outlined above for each document examined:
db.getCollection('collection').find().map( d => {
var overlaps = db.getCollection('collection').find({
"_id": { "$ne": d._id },
"$or": [
{ "starttime": { "$gte": d.starttime, "$lte": d.endtime } },
{ "endtime": { "$gte": d.starttime, "$lte": d.endtime } }
]
}).toArray();
return ( overlaps.length !== 0 )
? Object.assign(
d,
{
"overlaps": {
"count": overlaps.length,
"documents": overlaps
}
}
)
: null;
}).filter(e => e != null);
This is essentially the same logic except we actually need to go "back to the database" in order to issue the query to match the overlapping documents. This time it's the "query operators" used to find where the current document values fall between those of the processed document.
Because the results are already returned from the server, there is no BSON limit restriction on adding content to the output. You might have memory restrictions, but that's another issue. Simply put, we return the array rather than the cursor via .toArray(), so we have the matching documents and can simply access the array length to obtain a count. If you don't actually need the documents, then using .count() instead of .find() is far more efficient, since there is no document-fetching overhead.
The output is then simply merged with the existing document. The other important distinction is that, since these are "multiple queries", there is no way of providing the condition that they must "match" something. So this leaves us with results where the count (or array length) is 0, and all we can do at this time is return a null value which we can later .filter() from the result array. Other methods of iterating the cursor employ the same basic principle of "discarding" results where we do not want them. But nothing stops the query being run on the server, and this filtering is "post processing" in some form or other.
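To make the comparison logic concrete, here is a plain-Python sketch of the same one-directional range test the queries above use, run over three hypothetical interval documents:

```python
# Pairwise sketch of the manual "join": for each document, collect the others
# whose range the current document's starttime or endtime falls into
# (the $or of two $and blocks in the queries above).

docs = [
    {"_id": "X1", "starttime": 1, "endtime": 5},
    {"_id": "X2", "starttime": 4, "endtime": 8},
    {"_id": "X3", "starttime": 9, "endtime": 12},
]

result = []
for d in docs:
    hits = [o["_id"] for o in docs
            if o["_id"] != d["_id"]
            and (o["starttime"] <= d["starttime"] <= o["endtime"]
                 or o["starttime"] <= d["endtime"] <= o["endtime"])]
    if hits:
        result.append({"_id": d["_id"], "overlaps": hits})

print(result)  # X1 and X2 overlap on [4, 5]; X3 overlaps nothing
```

Note this mirrors the queries exactly, including their one-directional nature: a range that entirely contains another would only be caught from the contained document's side.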
Reducing Complexity
So the above approaches work with the structure as described, but of course the overall complexity requires that, for each document, you must essentially examine every other document in the collection in order to look for overlaps. Therefore, whilst using $lookup allows for some "efficiency" in reducing transport and response overhead, it still suffers from the same problem: you are essentially comparing each document to everything else.
A better solution, "where you can make it fit", is to instead store a "hard value" representative of the interval on each document. For instance, we could "presume" that there are solid "booking" periods of one hour within a day, for a total of 24 booking periods. This "could" be represented something like:
{ "_id": "A", "booking": [ 10, 11, 12 ] }
{ "_id": "B", "booking": [ 12, 13, 14 ] }
{ "_id": "C", "booking": [ 7, 8 ] }
{ "_id": "D", "booking": [ 9, 10, 11 ] }
With data organized like that where there was a set indicator for the interval the complexity is greatly reduced since it's really just a matter of "grouping" on the interval value from the array within the "booking" property:
db.booking.aggregate([
{ "$unwind": "$booking" },
{ "$group": { "_id": "$booking", "docs": { "$push": "$_id" } } },
{ "$match": { "docs.1": { "$exists": true } } }
])
And the output:
{ "_id" : 10, "docs" : [ "A", "D" ] }
{ "_id" : 11, "docs" : [ "A", "D" ] }
{ "_id" : 12, "docs" : [ "A", "B" ] }
That correctly identifies that for the 10 and 11 intervals both "A" and "D" contain the overlap, whilst "B" and "A" overlap on 12. Other intervals and documents matching are excluded via the same $exists test except this time on the 1 index ( or second array element being present ) in order to see that there was "more than one" document in the grouping, hence indicating an overlap.
This simply employs the $unwind aggregation pipeline stage to "deconstruct/denormalize" the array content so we can access the inner values for grouping. This is exactly what happens in the $group stage where the "key" provided is the booking interval id and the $push operator is used to "collect" data about the current document which was found in that group. The $match is as explained earlier.
This can even be expanded for alternate presentation:
db.booking.aggregate([
{ "$unwind": "$booking" },
{ "$group": { "_id": "$booking", "docs": { "$push": "$_id" } } },
{ "$match": { "docs.1": { "$exists": true } } },
{ "$unwind": "$docs" },
{ "$group": {
"_id": "$docs",
"intervals": { "$push": "$_id" }
}}
])
With output:
{ "_id" : "B", "intervals" : [ 12 ] }
{ "_id" : "D", "intervals" : [ 10, 11 ] }
{ "_id" : "A", "intervals" : [ 10, 11, 12 ] }
It's a simplified demonstration, but where your data allows for the sort of analysis required, this is the far more efficient approach. So if you can keep the "granularity" fixed to "set" intervals which can be commonly recorded on each document, then the analysis and reporting can use the latter approach to quickly and efficiently identify such overlaps.
Essentially, this is how you would implement what you mentioned as the "better" approach anyway, with the first approach being a "slight" improvement on what you originally theorized. See which one actually suits your situation, but this should explain the implementation and the differences.
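As a quick sanity check of the interval approach, the $unwind/$group/$match pipeline can be sketched in plain Python over the same sample booking data:

```python
from collections import defaultdict

bookings = [
    {"_id": "A", "booking": [10, 11, 12]},
    {"_id": "B", "booking": [12, 13, 14]},
    {"_id": "C", "booking": [7, 8]},
    {"_id": "D", "booking": [9, 10, 11]},
]

# $unwind + $group: collect document ids per interval slot
slots = defaultdict(list)
for doc in bookings:
    for interval in doc["booking"]:
        slots[interval].append(doc["_id"])

# $match on "docs.1": keep only slots holding more than one document
conflicts = {k: v for k, v in sorted(slots.items()) if len(v) > 1}
print(conflicts)  # → {10: ['A', 'D'], 11: ['A', 'D'], 12: ['A', 'B']}
```

This reproduces the pipeline output shown above: A and D overlap on intervals 10 and 11, A and B on 12.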

Order_by length of listfield in mongoengine

I want to run a query to get all Articles that have more than 6 com entries and then sort them according to the length of the com list.
For this I am doing:
ArticleModel.objects.filter(com__6__exists=True).order_by('-com.length')[:50]
Here com is a ListField, but the ordering does not work. How can I fix it? Thanks
Standard queries cannot do this, as the "sort" needs to be done on a physical field present in the document. The best way to do this is to actually keep a count of your "list" as another field in the document. That also makes your query more efficient, as that "counter" field can be indexed, so the basic query becomes (raw MongoDB syntax):
{ "comLength": { "$gt": 6 } }
If you cannot or do not want to change the document structure then the only way to otherwise sort on the length of your list is to $project it via .aggregate():
ArticleModel._get_collection().aggregate([
{ "$match": { "com.6": { "$exists": true } }},
{ "$project": {
"com": 1,
"otherField": 1,
"comLength": { "$size": "$com" }
}},
{ "$sort": { "comLength": -1 } }
])
And that presumes you have at least MongoDB 2.6 for the use of the $size aggregation operator. If you don't, then you have to $unwind and $group in order to calculate the length of the arrays:
ArticleModel._get_collection().aggregate([
{ "$match": { "com.6": { "$exists": true } }},
{ "$unwind": "$com" },
{ "$group": {
"_id": "$_id",
"otherField": { "$first": "$otherField" },
"com": { "$push": "$com" },
"comLength": { "$sum": 1 }
}},
{ "$sort": { "comLength": -1 } }
])
So if you are going to go down that route, take a good look at the documentation, since you are possibly not used to the raw MongoDB syntax, having been using the query DSL that MongoEngine provides.
Overall, only the aggregation providers in .aggregate() or .mapReduce() can actually "create a field" that is not present within the document. There is also no test for the "current" length available to standard projection or sorting of documents.
Your best option is to add another field and keep it in sync with the actual array length. Failing that, the above shows you the general approach.
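For intuition, the $size-then-$sort pipeline logic above can be sketched in plain Python over hypothetical sample data:

```python
# Sketch of $match {"com.6": {$exists: true}} + $project {comLength: {$size: "$com"}}
# + $sort {comLength: -1}, computed client-side.

articles = [
    {"_id": 1, "com": list(range(8))},
    {"_id": 2, "com": list(range(10))},
    {"_id": 3, "com": list(range(7))},
]

# index 6 existing is equivalent to length > 6
projected = [{**a, "comLength": len(a["com"])} for a in articles
             if len(a["com"]) > 6]
projected.sort(key=lambda a: a["comLength"], reverse=True)
print([a["_id"] for a in projected])  # → [2, 1, 3]
```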
If you're designing the schema and you know this sort of query will be requested a lot, it's recommended to add a "com_length" field to ArticleModel and have it updated automatically on every save by overriding the save() method.
Add this inside your ArticleModel in models.py:
def save(self, *args, **kwargs):
self.com_length = len(self.com)
return super(ArticleModel, self).save(*args, **kwargs)
then, to run the query from the question:
ArticleModel.objects.filter(com__6__exists=True).order_by('-com_length')[:50]

Getting first and last element of array in MongoDB

MongoDB: I'm looking to make one query that returns both the first and last element of an array. I realize that I can do this with multiple queries, but I would really like to do it with one.
Assume a collection "test" where each object has an array "arr" of numbers:
db.test.find({},{arr:{$slice: -1},arr:{$slice: 1}});
This will result in the following:
{ "_id" : ObjectId("xxx"), "arr" : [ 1 ] } <-- 1 is the first element
Is there a way to maybe alias the results? Similar to what the mysql AS keyword would allow in a query?
This is not possible at the moment, but it will be with the Aggregation Framework that is currently in development, if I understand your functional requirement correctly.
You have to wonder about your schema if you have this requirement in the first place though. Are you sure there isn't a more elegant way to get this to work by changing your schema accordingly?
This can be done with the aggregation framework using the $first and $last array operators (available since MongoDB 4.4) as follows:
db.test.aggregate([
{ '$addFields': {
'firstElem': { '$first': '$arr' },
'lastElem': { '$last': '$arr' }
} }
])
or using $slice as follows (note that $addFields requires MongoDB 3.4 or newer):
db.test.aggregate([
{ '$addFields': {
'firstElem': { '$slice': [ '$arr', 1 ] },
'lastElem': { '$slice': [ '$arr', -1 ] }
} }
])
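For what it's worth, the two $slice projections compute the same thing as ordinary list slicing, sketched here in Python per document:

```python
# What the two $slice expressions compute for a single document's array:
arr = [1, 2, 3, 4, 5]
first_elem = arr[:1]   # $slice: [ "$arr", 1 ]  -> first element, still wrapped in a list
last_elem = arr[-1:]   # $slice: [ "$arr", -1 ] -> last element, still wrapped in a list
print(first_elem, last_elem)  # → [1] [5]
```

Note that, like $slice, both results are single-element arrays rather than scalars; the $first/$last version above returns the bare values instead.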