How to query MongoDB via Spark for geospatial queries - mongodb

Is there any way to use MongoDB with Spark for geospatial queries? I cannot see how to do that with Stratio.

There are several ways to query geospatial data from Spark. You can use Magellan (https://github.com/harsha2010/magellan) or the Esri spatial framework for Hadoop, which works with Hive (https://github.com/Esri/spatial-framework-for-hadoop).
I've never tried the Mongo library from Stratio, but with the Spark data source API or the Mongo connector, I think you can run geo queries using the Mongo syntax and then convert the results into an RDD or a DataFrame.

You can query MongoDB from Spark SQL using this library.
MongoDB allows applications to do the following types of query on geospatial data: inclusion, intersection, proximity.
You can also combine the geospatial operators with any of the other query operators. Let's look at a concrete example:
Find all the airports in California. For this you need to get the California location (a Polygon) and use the $geoWithin operator in the query. From the shell it looks like this:
use geo
var cal = db.states.findOne( {code : "CA"} );
db.airports.find(
  {
    loc : { $geoWithin : { $geometry : cal.loc } }
  },
  { name : 1 , type : 1, code : 1, _id: 0 }
);
The Result:
{ "name" : "Modesto City - County", "type" : "", "code" : "MOD" }
...
{ "name" : "San Francisco Intl", "type" : "International", "code" : "SFO" }
{ "name" : "San Jose International", "type" : "International", "code" : "SJC" }
If you would like to try other examples, check out this blog post.
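The filter from the shell example can also be built as a plain object before it is handed to a driver or connector. The following is only a sketch in plain JavaScript: the helper name geoWithinFilter and the rectangular sample polygon are made up for illustration and do not come from the original answer.

```javascript
// Hypothetical helper: builds the same filter shape as the shell query above.
function geoWithinFilter(field, geometry) {
  return { [field]: { $geoWithin: { $geometry: geometry } } };
}

// Made-up rectangle standing in for a state's GeoJSON Polygon.
// GeoJSON positions are [longitude, latitude]; the ring is closed by
// repeating the first position at the end.
const samplePolygon = {
  type: "Polygon",
  coordinates: [[[-124, 32], [-114, 32], [-114, 42], [-124, 42], [-124, 32]]]
};

const airportFilter = geoWithinFilter("loc", samplePolygon);
console.log(JSON.stringify(airportFilter));

// In the mongo shell the filter could then be used as (not run here):
// db.airports.find(airportFilter, { name: 1, type: 1, code: 1, _id: 0 })
```

In the real query the geometry would come from the stored state document (cal.loc above) rather than from a hand-written polygon.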

Related

Geo-spatial MongoDB Pizza Restaurant Query. Is the 2dSphere index correct?

I'm stuck on a homework question regarding using the Mongodb shell to query two geo-spatial data collections. Which are as follows:
An example record from the restaurants dataset is:
{
"_id" : ObjectId("55cba2476c522cafdb053ae8"),
"location" : {
"coordinates" : [
-73.9973325,
40.61174889999999
],
"type" : "Point"
},
"name" : "C & C Catering Service"
}
An example record from the zipcodes dataset is:
{
"_id" : "01002",
"city" : "CUSHMAN",
"loc" : [
-72.51565,
42.377017
],
"pop" : 36963,
"state" : "MA"
}
The question is :
For each Pizza restaurant from the above set, find the city that it is located in. You can assume that a restaurant is located in the nearest city to its location.
I have created the variable for finding the pizza restaurants:
var pizza = db.restaurants.find({"name" : /pizza/i});
I created these indexes, but I'm not sure whether I need a compound index.
db.zipcodes.createIndex({"loc" : "2dsphere"});
db.restaurants.createIndex({"location" : "2dsphere"});
Are these indexes correct? If so, what should I use to build the query: an aggregation, $nearSphere, or both?
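One possible direction, sketched in plain JavaScript and assuming the 2dsphere index on zipcodes.loc shown above is in place: for each pizza restaurant, look up the nearest zipcode document with $nearSphere. The helper name nearestCityFilter is hypothetical, and the shell loop in the comments has not been run against the homework data.

```javascript
// Hypothetical helper: turns a restaurant document into a $nearSphere filter
// for the zipcodes collection (field names taken from the sample records).
function nearestCityFilter(restaurant) {
  return { loc: { $nearSphere: { $geometry: restaurant.location } } };
}

// Sample restaurant record copied from the question.
const sampleRestaurant = {
  name: "C & C Catering Service",
  location: { type: "Point", coordinates: [-73.9973325, 40.61174889999999] }
};

const nearFilter = nearestCityFilter(sampleRestaurant);
console.log(JSON.stringify(nearFilter));

// Possible use in the mongo shell (assumption, not run here):
// db.restaurants.find({ name: /pizza/i }).forEach(function (r) {
//   var z = db.zipcodes.find(nearestCityFilter(r)).limit(1).next();
//   print(r.name + " -> " + z.city);
// });
```

$nearSphere returns documents sorted from nearest to farthest, so limiting to one result gives the nearest city. Only the index on zipcodes.loc is needed by this lookup.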

Meteor JS CollectionFS and/or MongoDB Query: How to Query CollectionFS For a Document

This is the code I used to initialise my CollectionFS:
Uploads = new FS.Collection('uploads', {
stores: [new FS.Store.FileSystem('uploads', {path: '~/projectUploads'})]
});
This is my CollectionFS document I've inserted as shown in Robomongo:
{
"_id" : "n3M8geaZXnNkZ7mHP",
"original" : {
"name" : "AguaBendita.jpg",
"updatedAt" : ISODate("2014-02-19T11:05:40.000Z"),
"size" : 73719,
"type" : "image/jpeg"
},
"uploadedAt" : ISODate("2015-04-04T09:24:49.433Z"),
"copies" : {
"uploads" : {
"name" : "AguaBendita.jpg",
"type" : "image/jpeg",
"size" : 73719,
"key" : "uploads-n3M8geaZXnNkZ7mHP-AguaBendita.jpg",
"updatedAt" : ISODate("2015-04-04T09:24:49.000Z"),
"createdAt" : ISODate("2015-04-04T09:24:49.000Z")
}
}
}
I know how to find or findOne the document based on the _id
e.g.
var testingfetchAgua = Uploads.find({_id: 'n3M8geaZXnNkZ7mHP'}).fetch();
or
var testingfetchAgua = Uploads.files.findOne({_id: 'n3M8geaZXnNkZ7mHP'});
However, I don't know how to find or findOne the document based on the "name" key/value inside the "original" key/value.
Is this even possible in CollectionFS?
How does one do that in a CollectionFS query?
How do you do that in a MongoDB query?
Is the query in MongoDB the same in CollectionFS?
It is possible.
You do it with a dot notation like so: var testingfetchAgua = Uploads.find({ 'original.name': 'AguaBendita.jpg' }).fetch();
AFAIK queries are the same, since CollectionFS uses MongoDB.
There is also a related question, and the MongoDB docs are worth consulting when you face such issues.
Edit (to expand a bit on the relation of FS and Mongo):
When you call Uploads.files you actually get the underlying MongoDB Collection.
An FS.Collection provides a collection in which information about files can be stored. It is backed by an underlying normal Mongo.Collection instance. Most collection methods, such as find and insert are available on the FS.Collection instance. If you need to call other collection methods such as _ensureIndex, you can call them directly on the underlying Mongo.Collection instance available through myFSCollection.files.
From the CollectionFS github readme
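To make the dot notation concrete, here is a small plain JavaScript sketch of how a dotted path such as 'original.name' addresses a nested field. The function getByPath is illustrative only; it is not part of CollectionFS or MongoDB, and it ignores the array-matching rules that real MongoDB queries apply.

```javascript
// Illustrative only: walks a dotted path through nested objects, the way a
// query key such as 'original.name' points at a nested field.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Trimmed copy of the document from the question.
const fileDoc = {
  _id: "n3M8geaZXnNkZ7mHP",
  original: { name: "AguaBendita.jpg", size: 73719, type: "image/jpeg" }
};

console.log(getByPath(fileDoc, "original.name"));      // AguaBendita.jpg
console.log(getByPath(fileDoc, "copies.uploads.key")); // undefined (not in this trimmed copy)
```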

MongoDB geospatial query with find()

I have some documents with a "loc" field that looks like this:
mongos> db.foo.findOne()
{
// ... some other stuff
"loc" : {
"type" : "Point",
"coordinates" : [
-83.362342,
26.687779
]
}
}
It's indexed like so:
mongos> db.foo.getIndices()
[
// ... some other stuff
{
"v" : 1,
"key" : {
"loc" : "2dsphere"
},
"name" : "loc_2dsphere",
"ns" : "amitest.foo",
"2dsphereIndexVersion" : 2
},
]
According to the docs I should be able to query it like the following, but I get the error shown:
mongos> db.foo.find(
{loc :
{$near:
{$geometry:
{type: "Point",
coordinates: [-83.362342, 26.687779]
},
$maxDistance: 100
}
}
}
).limit(5)
error: { "$err" : "use geoNear command rather than $near query", "code" : 13501 }
I can successfully query the field using runCommand(), but that's not ideal because I can't combine it with other criteria:
mongos> db.runCommand(
{geoNear : "foo",
near : {type : "Point",
coordinates : [-83.362342, 26.687779]
},
spherical : true,
maxDistance : 10,
limit : 5
}
)
I've got MongoDB 2.6.0, on a hash-sharded collection with 8 shards.
In the geospatial index documentation, it states that: "For sharded collections, queries using $near are not supported. You can instead use either the geoNear command or the $geoNear aggregation stage."
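As a sketch of the $geoNear aggregation route, the pipeline below combines the proximity search with additional criteria through the stage's query option. It is written in plain JavaScript so it can be inspected without a server; the helper name geoNearPipeline, the output field name "dist", and the status criterion are assumptions made for illustration.

```javascript
// Hypothetical helper: builds an aggregation pipeline whose first stage is
// $geoNear. The extra criteria go into the stage's `query` option.
function geoNearPipeline(point, maxDistanceMeters, extraCriteria, limit) {
  return [
    {
      $geoNear: {
        near: point,                    // GeoJSON point to search around
        distanceField: "dist",          // assumed name for the computed distance
        maxDistance: maxDistanceMeters, // meters when `near` is GeoJSON
        query: extraCriteria,           // ordinary match criteria
        spherical: true
      }
    },
    { $limit: limit }
  ];
}

const pipeline = geoNearPipeline(
  { type: "Point", coordinates: [-83.362342, 26.687779] },
  100,
  { status: "active" }, // made-up criterion, not from the question
  5
);
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell (not run here): db.foo.aggregate(pipeline)
```

$geoNear has to be the first stage of the pipeline, which is why the additional criteria are passed through query rather than a preceding $match.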
There is a bug related to this, https://jira.mongodb.org/browse/SERVER-926, which is marked as closed and targets a future release of MongoDB, 1.7.2, which will enable the $near operator to be used with sharded collections. However, there is a big caveat: near queries will be routed to all shards via mongos, which is likely to be very inefficient. This is because sharding on a geo column is not supported in general; see the related and still open bug https://jira.mongodb.org/browse/SERVER-1982.
This, in turn, is related to the fact that MongoDB uses geohashing to convert two-dimensional geographical objects into something that can be indexed using a B-tree and that could, in theory, be used as a shard key. It is a difficult problem to fix: with geohash indexes, objects that are very close together can end up with hash values that are very far apart, so it is hard to design something that is appropriate as a shard key and also supports efficient geospatial queries such as $near. See the limitations section of the Wikipedia geohash article for more on potential issues with geohashing.
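The boundary problem can be seen with the public base-32 geohash described in the Wikipedia article. The sketch below is a plain JavaScript implementation of that standard encoding, not of MongoDB's internal index format, which may differ. It encodes two points a few tens of meters apart on opposite sides of the equator and the prime meridian.

```javascript
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Standard geohash: alternately bisect the longitude and latitude ranges,
// emit one bit per bisection, and pack every 5 bits into a base-32 character.
function geohashEncode(lat, lon, precision) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = "";
  let bits = 0;
  let bitCount = 0;
  let useLon = true; // the first bit always comes from the longitude

  while (hash.length < precision) {
    const range = useLon ? lonRange : latRange;
    const value = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      bits = (bits << 1) | 1;
      range[0] = mid;
    } else {
      bits = bits << 1;
      range[1] = mid;
    }
    useLon = !useLon;
    bitCount += 1;
    if (bitCount === 5) {
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

// Two points roughly 30 m apart, on either side of (0, 0).
const northEast = geohashEncode(0.0001, 0.0001, 8);
const southWest = geohashEncode(-0.0001, -0.0001, 8);
console.log(northEast, southWest); // the first characters are "s" and "7"
```

Because the two hashes already differ in their first character, a range-based split on the hash would place these neighboring points far apart, which is the difficulty described above.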

spring mongo support for geoJSOn for 2dsphere index

Although MongoDB allows a 2dsphere index on legacy coordinates, queries against it must supply points and shapes in GeoJSON format. For example, I inserted the following records into the address collection.
{ "city" : "First", "geo" : [ 13.45, 23.46 ] }
{ "city" : "Second", "geo" : [ 13.45, 20.46 ] }
Then I added a 2dsphere index using the following command, since MongoDB still allows a 2dsphere index on legacy coordinates.
db.address.ensureIndex({"geo":"2dsphere"})
When I then run a $near query using the legacy format, I get an exception.
> db.address.find({"geo":{$near:{"x":13.45,"y":23.45}}})
error: {
"$err" : "can't parse query (2dsphere): { $near: { x: 13.45, y: 23.45 } }",
"code" : 16535
}
But if I run the same query in GeoJSON format, I get results.
> db.address.find({"geo":{$near:{"type":"Point",coordinates:[13.45,23.45]}}})
{ "_id" : ObjectId("537306b4b8ac1f134d9efe89"), "city" : "First", "geo" : [ 13.45, 23.46 ] }
{ "_id" : ObjectId("537306c3b8ac1f134d9efe8a"), "city" : "Second", "geo" : [ 13.45, 20.46 ] }
My question is: the GeoConverters all convert to the legacy format, so they won't work with a 2dsphere index. Are there any converters available for the GeoJSON format? Is there a workaround?
Currently spring-data-mongo doesn't support the new mongo (> 2.4) 2dsphere indexes. There is an open issue on Jira about it:
https://jira.spring.io/browse/DATAMONGO-1113?jql=project%20%3D%20DATAMONGO%20AND%20text%20~%20%22%24geometry%22
In the linked issue you can find a gist with an example of how to create such converters. You can use it, or you can work around this limitation by creating a @Query with the query that you want spring-data-mongo to execute.
Regards.
avaz

Geonames and MongoDB: Exporting Access 2010 into MongoDB to use geospatial index

I have an Access 2010 database with information from Geonames. The table name is cities. For simplicity the columns are "name", "latitude", "longitude", "loc". I created "loc" to combine "longitude" and "latitude" into one field so that I can use MongoDB's geospatial index. I had to save the "loc" column in string format since I am combining two decimal values with a comma in between.
When I export this table to a CSV I skip "longitude" and "latitude" since I have the new combined "loc" field.
The CSV looks like:
"geonameid", "name", "loc"
1, "Sweden", "18.068287,59.336341"
After I run mongoimport into the "cities" collection, the loc field is enclosed in quotation marks:
{ "_id" : ObjectId("50a9ed157db1bcfe22cddd0a"), "geonameid" : 20, "name" : "Sweden", "loc" : "18.068287,59.336341", }
From here I run a command to turn "loc" into an array. I use the following command thanks to another contributor here on Stackoverflow:
db.cities.find( { 'loc' : {$type : 2 } } ).forEach( function (x) { x.loc = new Array(x.loc); db.cities.save(x); });
The result is:
{ "_id" : ObjectId("50a9ed157db1bcfe22cddd0a"), "geonameid" : 20, "name" : "Sweden", "loc" : [ "18.068287,59.336341" ], }
How can I remove the quotation marks from my gps coordinates in the "loc" field? I can't code in java and don't mind waiting for MongoDB to perform this task. If it would work to export the CSV with the "longitude" and "latitude" fields and combine them into a new "loc" field in MongoDB I don't mind that either even if it is not a beautiful solution. I just need to get the "loc" field working with the geospatial index.
The command to turn loc into a usable geospatial array is:
db.cities.find( { 'loc' : {$type : 2 } } ).forEach( function (x) { x.loc = x.loc.split(",").map(parseFloat); db.cities.save(x); });
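The string-to-array step can be checked on its own, outside the database. The following plain JavaScript sketch applies the same split and parseFloat conversion to the sample value from the question; the function name parseLoc is made up for illustration.

```javascript
// Illustrative helper: converts "lng,lat" text into an array of two numbers.
function parseLoc(locString) {
  return locString.split(",").map(function (part) {
    return parseFloat(part.trim()); // trim tolerates spaces around the comma
  });
}

const loc = parseLoc("18.068287,59.336341");
console.log(loc);           // [ 18.068287, 59.336341 ]
console.log(typeof loc[0]); // number
```

The order stays longitude first, latitude second, which is what MongoDB's geospatial indexes expect for coordinate pairs.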
You combined the numbers into the "loc" field, so it is now a string, which is why it is quoted. Separate it back into numbers before importing into MongoDB.
However, the "loc" field needs to be an array to work with a geospatial index, and arrays aren't possible with CSV. You need to find a way to export the data from MS Access to JSON and then import the JSON into MongoDB with the "mongoimport --jsonArray" command.
Exporting Access to JSON - http://www.divconq.com/2010/export-a-microsoft-access-database-to-json/