Different semantics in $not $geoWithin with Polygon geometries between MongoDB 2.4 and 2.6 - mongodb

I ran the following experiment comparing how MongoDB 2.4 and MongoDB 2.6 behave with the $geoWithin selector combined with $not on Polygons (i.e. an "outside polygon" query). I'm including the exact versions (three numbers), although I guess the same would happen with other minor versions of 2.4 and 2.6.
Two documents (A and B) are created in a given collection: A with the p field set to coordinates [1, 1] and B without a p field. Next, I create a 2dsphere index on p and query for the area outside a triangle whose vertices are [0, 0], [0, 4] and [4, 0]. Note that A is inside that polygon (so it is not supposed to be returned by this query).
With 2.4.9:
db.x.insert({k: "A", p: [1,1]})
db.x.insert({k: "B"})
db.x.ensureIndex({"p": "2dsphere"})
db.x.find({p: { $not: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ [ [ 0, 0 ], [ 0, 4 ], [ 4, 0 ], [ 0, 0 ] ] ] } } } }})
--> no result
Makes sense: A is not returned (as it is inside the polygon) and B is not returned (given that it doesn't have a p field).
Next, testing with 2.6.1 the same script:
db.x.insert({k: "A", p: [1,1]})
db.x.insert({k: "B"})
db.x.ensureIndex({"p": "2dsphere"})
db.x.find({p: { $not: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ [ [ 0, 0 ], [ 0, 4 ], [ 4, 0 ], [ 0, 0 ] ] ] } } } }})
-> result: B
It seems that the semantics have changed in 2.6, so that when the 2dsphere-indexed field is missing from a document, that document is considered outside any possible polygon.
Changing semantics between versions is OK as long as some mechanism in the new version allows configuring the old behaviour. I thought that mechanism was using { "2dsphereIndexVersion" : 1 } at index creation time (based on what I read here; maybe I misunderstood that information...). However, the result (with 2.6.1 again) is the same:
db.x.insert({k: "A", p: [1,1]})
db.x.insert({k: "B"})
db.x.ensureIndex({"p": "2dsphere"}, { "2dsphereIndexVersion" : 1 })
db.x.find({p: { $not: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ [ [ 0, 0 ], [ 0, 4 ], [ 4, 0 ], [ 0, 0 ] ] ] } } } }})
-> result B
Thus, is there any way of using MongoDB 2.6 with the same semantics as MongoDB 2.4, in the sense that documents without the 2dsphere-indexed field are not returned by "outside polygon" queries?

The query result in 2.6 is right - the query result in 2.4 I think I would call incorrect. Technically, your query asks for documents that do not match the $geoWithin condition. The "k" : "B" document does not match the $geoWithin condition, so it should be returned by the query. You can drop results without the p field using $exists:
db.x.find({
"p" : {
"$exists" : true,
"$not" : { "$geoWithin" : {
"$geometry" : {
"type": "Polygon",
"coordinates" : [ [ [ 0, 0 ], [ 0, 4 ], [ 4, 0 ], [ 0, 0 ] ] ]
}
} } }
})
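To make the difference concrete, here is a minimal plain-JavaScript sketch (no MongoDB involved) of the two behaviours. It uses a planar ray-casting test as a rough stand-in for MongoDB's spherical geometry, and pointInRing and notWithin are hypothetical helper names, not part of any MongoDB API.

```javascript
// Planar ray-casting point-in-polygon test. This is only an approximation:
// a 2dsphere index works with spherical geometry instead.
function pointInRing(point, ring) {
  var x = point[0], y = point[1], inside = false;
  for (var i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    var xi = ring[i][0], yi = ring[i][1];
    var xj = ring[j][0], yj = ring[j][1];
    var crosses = (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Emulates {p: {$not: {$geoWithin: ...}}}.
// requireField = true adds the effect of "$exists": true.
function notWithin(docs, ring, requireField) {
  return docs.filter(function (doc) {
    if (doc.p === undefined) return !requireField;
    return !pointInRing(doc.p, ring);
  });
}

var docs = [{ k: "A", p: [1, 1] }, { k: "B" }];
var triangle = [[0, 0], [0, 4], [4, 0], [0, 0]];
var like26 = notWithin(docs, triangle, false).map(function (d) { return d.k; });
var withExists = notWithin(docs, triangle, true).map(function (d) { return d.k; });
```

Under these assumptions, like26 contains only B (a missing field does not match $geoWithin, so $not keeps it), while withExists is empty, which is the 2.4-style result.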
Also note that 1) your $not query isn't actually using the geo index, as you can check with an explain, and 2) when using a 2dsphere index you should store points as GeoJSON
{
"k" : "A",
"p" : {
"type" : "Point",
"coordinates" : [1,1]
}
}
Technically it's required in MongoDB >= 2.6, and the docs say it should be an error not to use GeoJSON, but it seems to work for us.
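If you have existing documents with legacy coordinate pairs, a small helper along these lines could produce the GeoJSON form before re-inserting them. This is only a sketch; toGeoJSONPoint is a hypothetical name, and it assumes the pair is ordered [longitude, latitude].

```javascript
// Wraps a legacy [longitude, latitude] pair in a GeoJSON Point object.
function toGeoJSONPoint(pair) {
  if (!Array.isArray(pair) || pair.length !== 2) {
    throw new Error("expected a [longitude, latitude] pair");
  }
  return { type: "Point", coordinates: [pair[0], pair[1]] };
}

var converted = { k: "A", p: toGeoJSONPoint([1, 1]) };
```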

Related

Is there a way to query for dict of arrays of array in MongoDB

My mongo db collection contains the structure as :
{
"_id" : ObjectId("5889ce0d2e9bfa938c49208d"),
"filewise_word_freq" : {
"33236365" : [
[
"cluster",
4
],
[
"question",
2
],
[
"differ",
2
],[
"come",
1
]
],
"33204685" : [
[
"node",
6
],
[
"space",
4
],
[
"would",
3
],[
"templat",
1
]
]
},
"file_root" : "socialcast",
"main_cluster_name" : "node",
"most_common_words" : [
[
"node",
16
],
[
"cluster",
7
],
[
"n't",
3
]
]
}
I want to search for the value "node" inside the arrays of arrays stored under each filename (in my case "33236365", "33204685" and so on) in the dict filewise_word_freq.
If the value "node" is present inside any of the arrays under a filename (e.g. 33204685), the query should return that filename (33204685).
I tried an approach from another Stack Overflow answer, but it didn't work for my use case. On top of that, I don't know how to return only the filename rather than the entire document.
db.frequencydist.find({"file_root":'socialcast',"main_cluster_name":"node","filewise_word_freq":{$elemMatch:{$elemMatch:{$elemMatch:{$in:["node"]}}}}}).pretty()
It returned nothing.
Kindly help me.
The data model you have chosen makes this extremely difficult to query, or even to aggregate. I would suggest revising your document model. However, I think you can use $where:
db.collection.find({"file_root": 'socialcast',
"main_cluster_name": "node", $where : "for(var i in this.filewise_word_freq){for(var j in this.filewise_word_freq[i]){if(this.filewise_word_freq[i][j].indexOf('node')>=0){return true}}} return false"})
Yes, this will return the whole document, and you may need to filter the file names out in your application.
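The application-side filtering could look roughly like the sketch below. fileNamesContaining is a hypothetical helper, and it assumes each entry is a [word, count] pair as in the question's document.

```javascript
// Returns the keys of filewise_word_freq whose [word, count] pairs
// include the given word.
function fileNamesContaining(doc, word) {
  var freq = doc.filewise_word_freq || {};
  return Object.keys(freq).filter(function (fileName) {
    return freq[fileName].some(function (pair) {
      return pair[0] === word;
    });
  });
}

// Trimmed-down version of the document from the question.
var sampleDoc = {
  filewise_word_freq: {
    "33236365": [["cluster", 4], ["question", 2], ["differ", 2], ["come", 1]],
    "33204685": [["node", 6], ["space", 4], ["would", 3], ["templat", 1]]
  }
};
var matchingFiles = fileNamesContaining(sampleDoc, "node");
```

Unlike the indexOf check in the $where string, this compares only the word position of each pair, so a numeric count can never be mistaken for a match.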
You might also want to look at map-reduce, though that's not recommended.
Another way is to use stored functions, which run on the MongoDB server and are saved in a special collection.
Still, going back to the data model: do revise it if that's a possibility. Maybe something like:
{
"_id" : ObjectId("5889ce0d2e9bfa938c49208d"),
"filewise_word_freq" : [
{
"fileName":"33236365",
"word_counts" : {
"cluster":4,
"question":2,
"differ":2,
"come":1
}
},
{
"fileName":"33204685",
"word_counts" : {
"node":6,
"space":4,
"would":3,
"template":1
}
}
],
"file_root" : "socialcast",
"main_cluster_name" : "node",
"most_common_words" : [
{
"node":16
},
{
"cluster":7
},
{
"n't":3
}
]
}
It would be a lot easier to run aggregation on these.
For this model, the aggregation would be something like
db.collection.aggregate([
{$unwind : "$filewise_word_freq"},
{$match : {'filewise_word_freq.word_counts.node' : {$gte : 0}}},
{$group :{_id: 1, fileNames : {$addToSet : "$filewise_word_freq.fileName"}}},
{$project :{ _id:0, fileNames:1}}
])
This will give you a single document with a single field, fileNames, listing all the matching file names:
{
fileNames : ["33204685"]
}
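If you decide to migrate, the conversion from the original shape to the suggested one could be sketched as follows. reshape is a hypothetical helper, and it assumes every entry is a [word, count] pair; note that words containing "." or starting with "$" may not be usable as field names, depending on the MongoDB version.

```javascript
// Converts {fileName: [[word, count], ...]} into
// [{fileName: ..., word_counts: {word: count}}, ...].
function reshape(doc) {
  var files = Object.keys(doc.filewise_word_freq).map(function (fileName) {
    var counts = {};
    doc.filewise_word_freq[fileName].forEach(function (pair) {
      counts[pair[0]] = pair[1];
    });
    return { fileName: fileName, word_counts: counts };
  });
  var common = (doc.most_common_words || []).map(function (pair) {
    var entry = {};
    entry[pair[0]] = pair[1];
    return entry;
  });
  return {
    filewise_word_freq: files,
    file_root: doc.file_root,
    main_cluster_name: doc.main_cluster_name,
    most_common_words: common
  };
}

var reshaped = reshape({
  filewise_word_freq: {
    "33236365": [["cluster", 4], ["question", 2]],
    "33204685": [["node", 6], ["space", 4]]
  },
  file_root: "socialcast",
  main_cluster_name: "node",
  most_common_words: [["node", 16], ["cluster", 7]]
});
```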
You can try something like this. This will match the node as part of the query and returns filewise_word_freq.33204685 as part of the projection.
db.collection.find({
"file_root": 'socialcast',
"main_cluster_name": "node",
"filewise_word_freq.33204685": {
$elemMatch: {
$elemMatch: {
$in: ["node"]
}
}
}
}, {
"filewise_word_freq.33204685": 1
}).pretty();

addToSet for an array in an array

Given this sample document:
> db.sample.find().pretty()
{
"_id" : ObjectId("570f76ca4fe66c8ae29f13cd"),
"a" : [
{
"b" : [
1,
2,
3
]
},
{
"b" : [
1,
2,
3,
4
]
},
{
"b" : [
4
]
}
]
}
I'm trying to add the number 4 to the b array of each element in the a array.
I had hoped that
db.sample.update({},{$addToSet:{"a.b":4}})
would do the trick, but this yields the error:
cannot use the part (a of a.b) to traverse the element ({a: [ { b: [ 1.0, 2.0, 3.0 ] }, { b: [ 1.0, 2.0, 3.0, 4.0 ] }, { b: [ 4.0 ] } ]})
Is such an update possible? Obviously I can pull each document to the client side, update it and replace it, but that's really only a last resort.
It looks like until SERVER-1243 Jira is implemented, you'll have to do it one-by-one for each item in the array, e.g.:
db.sample.update({},{$addToSet:{"a.0.b":4}})
db.sample.update({},{$addToSet:{"a.1.b":4}})
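Since $addToSet accepts several field paths in one update document, the per-index paths can also be generated and sent together. The sketch below only builds the update document; buildAddToSet is a hypothetical helper, and it assumes you already know the length of the outer array (for example from a prior read, which can be stale if the array changes concurrently).

```javascript
// Builds {$addToSet: {"a.0.b": value, "a.1.b": value, ...}} for a known
// number of elements in the outer array.
function buildAddToSet(arrayField, innerField, length, value) {
  var fields = {};
  for (var i = 0; i < length; i++) {
    fields[arrayField + "." + i + "." + innerField] = value;
  }
  return { $addToSet: fields };
}

var update = buildAddToSet("a", "b", 3, 4);
// The result could then be passed as the second argument of db.sample.update.
```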
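If you only need to update the first element matched by the query, you could use the positional $ operator. Note that it requires the array field to appear in the query, e.g. to update the first element whose b contains 1:
db.sample.update({"a.b":1},{$addToSet:{"a.$.b":4}})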

Geo-query using a circle as area to match at least one of the points of MultiPoint object in MongoDB

I have the following document in the entities collection at Mongo (a 2dsphere index for location.coords is in place):
> db.entities.find({},{location: 1}).pretty()
{
"_id" : {
"id" : "en3",
"type" : "t",
"servicePath" : "/"
},
"location" : {
"attrName" : "position",
"coords" : {
"type" : "MultiPoint",
"coordinates" : [
[
-3.691944,
40.418889
],
[
4.691944,
45.418889
]
]
}
}
}
As far as I have checked, $geoWithin only matches when the geometry includes all the points of the MultiPoint, e.g:
> db.entities.find({"location.coords": { $geoWithin: { $centerSphere: [ [ -3.691944, 40.418889 ], 0.002118976612776644 ] } } })
// Small circle centered at first point, but without covering the second point: it doesn't match
> db.entities.find({"location.coords": { $geoWithin: { $centerSphere: [ [ -3.691944, 40.418889 ], 2 ] } } })
// Big circle centered at first point covering also the second point: it matches
However, I would like to have a query that matches if at least one point of the MultiPoint matches. I have read about the $geoIntersects operator. I have tried just replacing $geoWithin with $geoIntersects in my query, but it doesn't work:
> db.entities.find({"location.coords": { $geoIntersects: { $centerSphere: [ [ -3.691944, 40.418889 ], 0.002118976612776644 ] } } })
error: {
"$err" : "Can't canonicalize query: BadValue bad geo query",
"code" : 17287
}
Reading the $geoIntersects operator, it seems that it can be only used with polygons or multi-polygons, but it doesn't mention circles. I wonder if I'm missing something, because this "asymmetry" between $geoWithin and $geoIntersects seems to be a bit weird...
Thus, is there any way of doing a geo-query using a circle as area to match at least one of the points of MultiPoint object?
I think I have found the answer in the end. It can be done with the $near operator, in the following way:
db.entities.find({"location.coords": { $near: { $geometry: { type: "Point", "coordinates": [ -3.691944, 40.418889 ] }, $maxDistance: 0.5 } }})
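One thing to watch when moving between the two query forms is the unit: $centerSphere takes its radius in radians, while, as far as I understand the docs, $maxDistance with a GeoJSON point is in meters. A small conversion sketch follows; the Earth radius of 6371000 m is an assumption, and both function names are hypothetical.

```javascript
// Mean Earth radius in meters (an assumed value; other sources use 6378100).
var EARTH_RADIUS_METERS = 6371000;

// Converts a $centerSphere radius (radians) to a distance in meters.
function radiansToMeters(radians) {
  return radians * EARTH_RADIUS_METERS;
}

// Converts a distance in meters to a $centerSphere radius (radians).
function metersToRadians(meters) {
  return meters / EARTH_RADIUS_METERS;
}

// The small circle from the question, expressed in meters.
var smallCircleMeters = radiansToMeters(0.002118976612776644);
```

With this radius, the small circle from the question comes out at roughly 13.5 km, so a $maxDistance of 0.5 would describe a much smaller area (half a meter) if the meters interpretation holds.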

MongoDB can't parse query (2dsphere): $geoWithin:

I have the following objects in my Collection that look like the following:
{
"_id" : ObjectId("527d33a8623f6efd1c997440"),
"location" : {
"geometry" : {
"type" : "Point",
"coordinates" : [
-78.4067,
37.26725
]
},
"type" : "Feature",
"properties" : {
"name" : "Something here"
}
},
"name" : "Name of Object"
}
I have the following index:
{
"location.geometry" : "2dsphere"
}
I can do the following:
db.myCollection.find({'location.geometry':{'$near':{'$geometry':{'type':"Point", 'coordinates': [-78.406700,37.267250]}, '$maxDistance' : 1000 }}})
However, I cannot do the following:
db.myCollection.find( { 'location.geometry': { '$geoWithin':
{ '$geometry' :
{ 'type' : "Polygon",
'coordinates' : [ [ -118.108006, 34.046072], [ -117.978230, 34.041521] , [ -117.987328,33.913645 ]] } }
} } )
As it returns with the error:
error: {
"$err" : "can't parse query (2dsphere): { $geoWithin: { $geometry: { type: \"Polygon\", coordinates: [ [ -118.108006, 34.046072 ], [ -117.97823, 34.041521 ], [ -117.987328, 33.913645 ] ] } } }",
"code" : 16535
}
Am I using geoWithin wrong? Can it not be used on this index?
The polygon that you are providing for the $geoWithin query is incorrect. As per the GeoJSON definition, a Polygon's coordinates are an array of linear rings, and each ring must have the same start and end point.
The correct query is:
db.myCollection.find( { 'location.geometry':
{ '$geoWithin':
{ '$geometry' :
{ 'type' : "Polygon",
'coordinates' : [ [
[ -118.108006, 34.046072],
[ -117.978230, 34.041521],
[ -117.987328,33.913645 ],
[ -118.108006, 34.046072]
] ]
}
}
}
}
);
Notice the updated coordinates array.
To be clear, what is mentioned here in the MongoDB docs about the implicit connection of polygons is NOT incorrect. The docs say that the connection is implicit only when you define the polygon using $polygon. They say nothing about MongoDB making an implicit connection in a GeoJSON polygon provided to the query.
In fact, if for some GeoJSON variable you are saying that its type is polygon and you are not connecting its start with the end, then you have not created a correct GeoJSON polygon in the first place.
There is an error in the MongoDB documentation on $geoWithin queries. While the documentation states that:
The last point specified is always implicitly connected to the first.
You can specify as many points, and therefore sides, as you like.
This is incorrect. The polygon needs to be closed. There is an open ticket about this in MongoDB Jira:
https://jira.mongodb.org/browse/DOCS-2029
So your first and last points need to be equal - you cannot depend on MongoDB to implicitly draw the last line of the polygon.
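If the points come from user input, closing the ring can be done programmatically before building the query. This is a minimal sketch; closeRing and toPolygon are hypothetical helpers, and they assume a non-empty list of [longitude, latitude] pairs.

```javascript
// Returns a copy of the point list with the first point repeated at the end,
// unless the ring is already closed.
function closeRing(points) {
  var first = points[0], last = points[points.length - 1];
  var closed = first[0] === last[0] && first[1] === last[1];
  return closed ? points.slice() : points.concat([first.slice()]);
}

// Wraps a list of points as a GeoJSON Polygon with a single (outer) ring.
function toPolygon(points) {
  return { type: "Polygon", coordinates: [closeRing(points)] };
}

var polygon = toPolygon([
  [-118.108006, 34.046072],
  [-117.978230, 34.041521],
  [-117.987328, 33.913645]
]);
```

The resulting object can be used as the value of $geometry in a $geoWithin query.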

mongodb 2.4.9 $geoWithin query on very simple dataset returning no results. Why?

Here is the output from my mongodb shell of a very simple example of a $geoWithin query. As you can see, I have only a single GeoJSON Polygon in my collection, and each of its coordinates lies within the described $box. Furthermore, the GeoJSON seems valid, as the 2dsphere index was created without error.
> db.Townships.find()
{ "_id" : ObjectId("5310f13c9f3a313af872530c"), "geometry" : { "type" : "Polygon", "coordinates" : [ [ [ -96.74084500000001, 36.99911500000002 ], [ -96.74975600000002, 36.99916100000001 ], [ -96.74953099999998, 36.99916000000002 ], [ -96.74084500000001, 36.99911500000002 ] ] ] }, "type" : "Feature" }
> db.Townships.ensureIndex( { "geometry" : "2dsphere"})
> db.Townships.find( { "geometry" : { $geoWithin : { "$box" : [[-97, 36], [-96, 37]] } } } ).count()
0
Thanks for any advice.
From the documentation:
The $box operator specifies a rectangle for a geospatial $geoWithin query. The query returns documents that are within the bounds of the rectangle, according to their point-based location data. The $box operator returns documents based on grid coordinates and does not query for GeoJSON shapes.
If you insert this document...
db.Townships.insert(
{ "geometry" : [ -96.74084500000001, 36.99911500000002 ],
"type" : "Feature"
})
...your query will find it (but without index support).
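To keep the GeoJSON document and still query a rectangular area, one option is to express the box as a closed GeoJSON Polygon and pass it through $geometry instead of $box. The sketch below only builds the query object; boxToPolygon is a hypothetical helper, I have not verified the query against 2.4.9, and on a 2dsphere index the edges are geodesics, so the area is not exactly the flat rectangle.

```javascript
// Builds a closed GeoJSON Polygon from the two corners used by $box:
// [bottomLeft, upperRight], each as [longitude, latitude].
function boxToPolygon(bottomLeft, upperRight) {
  var minX = bottomLeft[0], minY = bottomLeft[1];
  var maxX = upperRight[0], maxY = upperRight[1];
  return {
    type: "Polygon",
    coordinates: [[
      [minX, minY], [maxX, minY], [maxX, maxY], [minX, maxY], [minX, minY]
    ]]
  };
}

// Query document equivalent in spirit to the $box query from the question.
var boxQuery = {
  geometry: { $geoWithin: { $geometry: boxToPolygon([-97, 36], [-96, 37]) } }
};
```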