I'm new to the OSM file format. I have a large OSM file, about 2 GB in size.
What is the best way to read OSM data?
If I need to split the data into several chunks and process them separately, is there a way to partition the file?
If you are referring to planet dumps or extracts, you can use osmosis to split them by bounding box or boundary polygon. Otherwise you can query the database via the Overpass API; then you don't need to split anything up, since you can request just the bounding boxes you want.
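As a rough sketch of the bounding-box route: the Python helper below cuts one large box into a grid of smaller boxes and prints an osmosis call for each tile. The file names and the 2x2 grid are made up for illustration, and the command assumes a .osm.pbf input (for a plain .osm file, --read-xml and --write-xml would be the matching tasks).

```python
from typing import List, Tuple

# (south, west, north, east) in decimal degrees
BBox = Tuple[float, float, float, float]


def split_bbox(bbox: BBox, rows: int, cols: int) -> List[BBox]:
    """Cut a bounding box into rows x cols equally sized tiles."""
    south, west, north, east = bbox
    lat_step = (north - south) / rows
    lon_step = (east - west) / cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                south + r * lat_step,
                west + c * lon_step,
                south + (r + 1) * lat_step,
                west + (c + 1) * lon_step,
            ))
    return tiles


def osmosis_command(src: str, dst: str, bbox: BBox) -> str:
    """Build an osmosis command line that extracts one tile (not executed here)."""
    south, west, north, east = bbox
    return (
        f"osmosis --read-pbf file={src} "
        f"--bounding-box top={north} left={west} bottom={south} right={east} "
        f"--write-pbf file={dst}"
    )


if __name__ == "__main__":
    # Hypothetical input file and extent, for illustration only.
    for i, tile in enumerate(split_bbox((45.0, 5.0, 48.0, 11.0), 2, 2)):
        print(osmosis_command("input.osm.pbf", f"tile_{i}.osm.pbf", tile))
```

Objects that cross a tile edge need some thought; osmosis has options on the bounding-box task for that, so check its usage page before relying on the output.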
Related
How to download OSM building data to get the building levels based on latitude and longitude
The typical way would be to download the full planet data from planet.osm.org, or a regional extract for the region you're interested in, import that into a GIS database using either the osm2pgsql or imposm tool, and then query that database. Both import tools can be configured to import only certain object types (here: buildings) to keep the database small and fast. There is no "just the buildings" extract though.
The other way, if you don't want to set up a database yourself and only have a low query volume, might be to use one of the public Overpass API servers:
https://wiki.openstreetmap.org/wiki/Overpass_API
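For the Overpass route, a query for buildings near a coordinate could look roughly like the sketch below. The 50 m radius and the choice of the overpass-api.de instance are assumptions for illustration; building:levels is only present where a mapper has tagged it, so expect gaps.

```python
import json
import urllib.parse
import urllib.request

# One of the public instances; others are listed on the wiki page above.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_query(lat, lon, radius_m=50):
    """Overpass QL: all building ways within radius_m metres of (lat, lon)."""
    return (
        "[out:json][timeout:25];\n"
        f'way["building"](around:{radius_m},{lat},{lon});\n'
        "out tags center;"
    )


def fetch(query, url=OVERPASS_URL):
    """POST the query and return the decoded JSON response."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=60) as resp:
        return json.load(resp)


def building_levels(response):
    """Map way id -> building:levels for the ways that carry the tag."""
    result = {}
    for element in response.get("elements", []):
        levels = element.get("tags", {}).get("building:levels")
        if levels is not None:
            result[element["id"]] = levels
    return result


if __name__ == "__main__":
    # Example coordinate chosen arbitrarily.
    print(building_levels(fetch(build_query(47.3769, 8.5417))))
```

The tag value comes back as a string, because OSM tags are free text; convert it defensively if you need a number.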
I am trying to filter the polygons in an .osm file by their area.
In this case it's about Swiss lakes. I extracted all the polygons using the "natural=water" filter, but that still leaves me with every pond in Switzerland. So I am trying to add a filter based on the area of the polygons.
How can I do that?
I have already searched for solutions, but was unable to find a good answer.
The best I found was this question, but I don't know where I should execute it or whether it is compatible with OSM data.
Thanks for your answers
One way to solve this issue would be to use Atlas. Here are the steps you would need to follow (with links to other related SO answers inline):
Convert your OSM file to a .osm.pbf using osmosis
Load the .osm.pbf into an Atlas file, using atlas-shell-tools's pbf2atlas subcommand
Write a small Java class that opens the Atlas file, gets all the lakes using a TaggableFilter, and then filters them by area, using Polygon.getSurface().
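If a JVM toolchain is more than you need, the area filter itself is simple enough to sketch in plain Python. To be clear, this is not the Atlas API: it assumes the water polygons have already been extracted as lists of (lat, lon) vertices, and it approximates the area with a local equirectangular projection plus the shoelace formula, which is fine for lake-sized shapes but not for very large ones. The 10,000 m² threshold in the usage example is arbitrary.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius


def ring_area_m2(ring):
    """Approximate area in square metres of a ring of (lat, lon) vertices."""
    mean_lat = math.radians(sum(lat for lat, _ in ring) / len(ring))
    # Project to local metres: x shrinks with cos(latitude), y does not.
    pts = [
        (EARTH_RADIUS_M * math.radians(lon) * math.cos(mean_lat),
         EARTH_RADIUS_M * math.radians(lat))
        for lat, lon in ring
    ]
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1  # shoelace term
    return abs(total) / 2.0


def filter_by_area(polygons, min_area_m2):
    """Keep only the named rings whose area is at least min_area_m2."""
    return {
        name: ring
        for name, ring in polygons.items()
        if ring_area_m2(ring) >= min_area_m2
    }


if __name__ == "__main__":
    # Two made-up squares: roughly 1.1 km and 11 m on a side.
    polygons = {
        "lake": [(46.0, 8.0), (46.0, 8.01), (46.01, 8.01), (46.01, 8.0)],
        "pond": [(46.0, 8.0), (46.0, 8.0001), (46.0001, 8.0001), (46.0001, 8.0)],
    }
    print(list(filter_by_area(polygons, 10_000)))
```

Holes (inner rings of multipolygons) are ignored here; subtract their areas if islands matter for your threshold.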
I am currently taking the great-britain-latest.osm file from https://download.geofabrik.de/europe.html and importing it into PostGIS for use with Nominatim.
However, I would also like to render map tiles using Maperitive, which cannot handle the size of the entire great-britain file. My plan is therefore to split the area into smaller .osm files and process those.
My understanding is that Osmosis can be used to create these smaller files, and that it can do so either from the PostGIS database or directly from the large .osm file. Can anyone advise which approach would be faster?
I want to create a global dataset of wetlands using the OSM database. Since overpass turbo and similar tools have problems with huge datasets, I thought I could use wget to download the planet file and filter it for only the data I'm interested in. The problem is that I don't know much about wget. Is there a way to filter the data from the planet file while downloading and unzipping it?
In general I'm looking for the way to get that data that costs the least time and disk space. Would you have any suggestions?
wget is just a download tool; it can't filter on the fly. You could probably pipe the data to a second tool that filters on the fly, but I don't see any advantage in that. The disadvantage is that you can't verify the file checksum afterwards.
Download the planet and filter it afterwards using osmosis or osmfilter.
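On the checksum point: a small sketch of verifying a download before filtering it. It assumes an MD5 checksum file is published next to the download in the usual "hash  filename" format; the file names in the usage example are placeholders.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so a planet-sized file fits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_line(path, md5_line):
    """Compare a file against one line of an md5sum-style checksum file."""
    expected = md5_line.split()[0].lower()
    return file_md5(path) == expected


if __name__ == "__main__":
    # Placeholder names; substitute the files you actually downloaded.
    with open("planet-latest.osm.pbf.md5") as fh:
        ok = matches_md5_line("planet-latest.osm.pbf", fh.readline())
    print("checksum ok" if ok else "checksum MISMATCH, download again")
```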
I am new to HBase. Here is my problem.
I have a very large HBase table. Here is an example record from the table:
1003:15:Species1:MONTH:01 0.1,02 0.7,03 0.3,04 0.1,05 0.1,06 0,07 0,08 0,09 0.1,10 0.2,11 0.3,12 0.1:LATITUDE 26.664503840000002 29.145674380000003,LONGITUDE -96.27139215 -90.40762858
As you can see, for each species there is a month attribute (12 values), latitude and longitude, etc. There are around 300 unique species and several thousand observations for any one species.
I have written a MapReduce job which does K-means clustering on one particular species. The output of my MapReduce job is:
C1:1003:15:Species1:MONTH:01 0.1,02 0.7,03 0.3,04 0.1,05 0.1,06 0,07 0,08 0,09 0.1,10 0.2,11 0.3,12 0.1:LATITUDE 26.664503840000002 29.145674380000003,LONGITUDE -96.27139215 -90.40762858
The C1 indicates which cluster it belongs to.
Now I want to visualize the output, i.e. plot all the latitudes and longitudes for each cluster on a map. I was thinking of using Mapbox.js and D3.js for the visualization, since the latitudes and longitudes in the data are bounding boxes for a particular region.
If I write the output of my MapReduce job back to HBase, is it possible to retrieve the data using JavaScript on the client side?
I was thinking of either writing the data to MongoDB, which I can query using JS, or writing a program that creates JSON from the HBase table, which I can then visualize. Any suggestions?
You can use the HBase REST API, though security-wise it is probably safer to put your own service in the middle.
You can also use node-hbase from https://github.com/alibaba/node-hbase-client to read the HBase data.
You can also use hbase-rpc-client (https://github.com/falsecz/hbase-rpc-client) to read data from Node.js. This client supports HBase 0.96+.
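For the "create a JSON" option from the question, here is a sketch of turning one clustered record into a GeoJSON feature that Mapbox.js or D3.js can draw. The field layout is inferred from the sample record in the question; the meaning of the second and third fields (1003 and 15) isn't stated, so they are ignored here, and the function names are my own.

```python
import json


def record_to_feature(record):
    """Convert one 'C1:1003:15:Species1:MONTH:...:LATITUDE ...' line to GeoJSON."""
    fields = record.strip().split(":")
    cluster, species = fields[0], fields[3]

    # "01 0.1,02 0.7,..." -> {"01": 0.1, "02": 0.7, ...}
    months = {}
    for pair in fields[5].split(","):
        month, value = pair.split()
        months[month] = float(value)

    # "LATITUDE a b,LONGITUDE c d" -> {"LATITUDE": (a, b), "LONGITUDE": (c, d)}
    extent = {}
    for part in fields[6].split(","):
        name, low, high = part.split()
        extent[name] = (float(low), float(high))
    south, north = sorted(extent["LATITUDE"])
    west, east = sorted(extent["LONGITUDE"])

    # GeoJSON positions are [lon, lat]; the ring is closed and counterclockwise.
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "properties": {"cluster": cluster, "species": species, "months": months},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def records_to_geojson(records):
    """Serialize many records as a single GeoJSON FeatureCollection string."""
    features = [record_to_feature(r) for r in records]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Writing one such file per species keeps the browser payload small, and the cluster property can drive the fill colour on the map.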