Would I be faster using Osmosis to generate smaller .osm files from PostGIS or directly from a larger .osm file? - openstreetmap

I am currently taking the great-britain-latest.osm file from https://download.geofabrik.de/europe.html and importing it into PostGIS for use with Nominatim.
However, I would also like to render map tiles using Maperitive, which cannot handle the size of the entire great-britain file. My intention is therefore to split the area into smaller .osm chunks and process those.
My understanding is that Osmosis can be used to create these smaller files, and that it can do so either from the PostGIS database or from the large .osm file directly. Can anyone advise which approach would be faster?
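For reference, a minimal sketch of cutting a bounding box straight out of the downloaded .osm file with Osmosis (the coordinates and output file name here are placeholders, not values from the question):

osmosis --read-xml file="great-britain-latest.osm" --bounding-box top=51.7 left=-0.6 bottom=51.3 right=0.4 --write-xml file="chunk.osm"

The same --bounding-box task can also be fed from one of Osmosis's database read tasks instead of --read-xml, but only if the database uses a schema Osmosis understands (such as its pgsnapshot schema), which is generally not the schema a Nominatim import produces.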

Related

Mbtiles generation with tippecanoe on a mixed geojson feature file

I have a bunch of GeoJSON files that I want to use to create an .mbtiles file; these GeoJSON files were made by running ogr2ogr on a .osm.pbf file.
However, the files seem to have mixed feature types within them, e.g. the linestring.geojson file contains both waterway and highway features, and the layer this generates is named after the file, not the individual feature types.
How would I go about reformatting the data, or using tippecanoe itself, to separate these features into separate layers so that I can use the result on a tile server?
My only thought on how this could be done would be to split the files myself, but that would be extremely time consuming even on the smallest of .osm.pbf files.
I want to be able to generate the .mbtiles file this way so that I can use it in turn with Mapbox.
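One hedged way to approach this (assuming the features can be told apart by their tags; the file names below are made up): filter each feature type into its own GeoJSON with ogr2ogr, then hand each file to tippecanoe as a named layer.

ogr2ogr -f GeoJSON highways.geojson linestring.geojson -where "highway IS NOT NULL"
ogr2ogr -f GeoJSON waterways.geojson linestring.geojson -where "waterway IS NOT NULL"
tippecanoe -o output.mbtiles -zg -L highways:highways.geojson -L waterways:waterways.geojson

tippecanoe's -L name:file option puts each input file into its own layer, so no manual editing of the GeoJSON should be needed.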

How to use wget to download a large OSM dataset?

I want to create a global dataset of wetlands using the OSM database. As there are problems with huge datasets if I use overpass turbo or similar tools, I thought I could use wget to download the planet file and filter it for only the data I'm interested in. The problem is, I don't know much about wget. So I wanted to know whether there is a way to filter the data from the planet file while downloading and unzipping it.
In general I'm looking for the least time and disk-space consuming way to get to that data. Would you have any suggestions?
wget is just a download tool; it cannot filter on the fly. You could probably pipe the data into a second tool that filters while downloading, but I don't see any advantage in that, and the disadvantage is that you can't verify the file checksum afterwards.
Download the planet and filter it afterwards using osmosis or osmfilter.
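A minimal sketch of that workflow for wetlands (the tag choice and file names are assumptions; osmfilter reads .osm/.o5m rather than .pbf, so the planet file is converted first with osmconvert):

wget https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf
osmconvert planet-latest.osm.pbf -o=planet.o5m
osmfilter planet.o5m --keep="natural=wetland" -o=wetlands.osm

Filtering the .o5m copy is much faster than filtering the XML planet, at the cost of the extra disk space for the intermediate file.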

OSM (OpenStreetMap) file reading and partitioning

I'm new to the OSM file format. I have a big OSM file which is 2 GB in size.
What is the best way to read OSM data?
If I need to separate the data into several chunks and process them, is there a way to do the partitioning?
If you are referring to planet dumps/extracts, you can use osmosis to split on bounding boxes or boundary polygons. Otherwise you can query the database via the Overpass API and don't need to split anything up, since you can request just the bounding boxes you want.
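For the file-based route, a hedged sketch (big.osm and area.poly are placeholder names; the .poly file describes the boundary polygon to cut out):

osmosis --read-xml file="big.osm" --bounding-polygon file="area.poly" --write-xml file="area.osm"

Repeat with different polygons or bounding boxes to produce each chunk; for .pbf input, swap --read-xml for --read-pbf.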

Postgres - Convert bytea PNG column to GeoTiff

I have loaded a massive set of tiles into a Postgres database for use in a tile server. These were all loaded into bytea columns in PNG format.
I have now found out that the tile server code needs them to be in GeoTIFF format.
The command:
gdal_translate -of GTiff -expand rgb -co COMPRESS=DEFLATE -co ZLEVEL=6
works perfectly.
However, a massive amount of data is already loaded on a remote server. So is it possible to do the conversion within the database, instead of retrieving each file and running gdal_translate on it individually? I understand that GDAL is integrated with PostGIS 2.0 through the raster support, which is installed on my server.
If not, any suggestions as to how to do this efficiently?
Is it possible to do this in the database with an appropriate procedural language? I suppose so. It is also worth noting up front that GDAL's PostGIS support goes rather one way: reading from the database is better supported than writing back into it.
To be honest, the approach is likely to be "retrieve the individual record, convert it, and restore it using external image processing, like you are doing." You might get some transaction benefit, but this is likely to be offset by locks.
If you do go this route, you may find PL/Java the most helpful approach, since you can load any image-processing library Java supports and use that.
I am, however, not convinced that this will be better than retrieve/transform/load.
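If you end up doing retrieve/transform/load from a client machine, a rough shell sketch (the tiles table and its id and png columns are hypothetical; reloading the GeoTIFFs is left to whichever client library you use):

# export each PNG from the database, then convert it to GeoTIFF locally
for id in $(psql -At -c "SELECT id FROM tiles"); do
  psql -At -c "SELECT encode(png, 'hex') FROM tiles WHERE id = $id" | xxd -r -p > tile_$id.png
  gdal_translate -of GTiff -expand rgb -co COMPRESS=DEFLATE -co ZLEVEL=6 tile_$id.png tile_$id.tif
done

The hex round trip (encode(..., 'hex') in SQL, xxd -r -p on the client) is just a simple way to get a bytea value out of psql intact.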

How to import a big amount of data from a file into SQLite inside the application (in real time)

I have a big list of words (over 2 million) in a CSV file (about 35 MB).
I wanted to import the CSV file into sqlite3 with an index (primary key).
So I imported it using the sqlite command-line tool. The DB was created and the size of the .sqlite file grew to over 120 MB! (50% of that because of the primary key index.)
And here is the problem: if I add this 120 MB .sqlite file to the resources, even after compression the .ipa file is >60 MB. I'd like it to be less than 30 MB (because of the limitation over E/3G).
Also because of the size I cannot deliver it (the zipped sqlite file) via a web service (45 MB * 1000 downloads = 45 GB! that's my server's half-year limit).
So I thought I could do something like this:
compress the CSV file with the words to a ZIP, so the file is only about 7 MB;
add the ZIP file to the resources;
in the application, unzip the file and import the data from the unzipped CSV file into sqlite.
But I don't know how to do this. I've tried this:
sqlite3_exec(sqlite3_database, ".import mydata.csv mytable", callback, 0, &errMsg);
but it doesn't work. The reason for the failure is that ".import" is part of the command-line shell and not part of the C API.
So I need to know how to import the unzipped CSV file into the SQLite file inside the app (not during development using the command line).
If the words that you are inserting are unique, you could make the text itself the primary key.
If you only want to test whether words exist in a set (say for a spell checker), you could use an alternative data structure such as a bloom filter, which only requires 9.6 bits for each word with 1% false positives.
http://en.wikipedia.org/wiki/Bloom_filter
As FlightOfStairs mentioned, depending on the requirements a bloom filter is one solution. If you need the full data, another option is a trie or radix tree data structure. You would preprocess your data, build one of these data structures, and then store it either in sqlite or in some other external data format.
The simplest solution would be to write a CSV parser using NSScanner and insert the rows into the database one by one. That's actually a fairly easy job; you can find a complete CSV parser here.
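For the in-app import itself, a minimal C sketch against the SQLite C API (the table name mytable, its single word column, and a CSV layout of one word per line are assumptions; error handling is trimmed):

#include <sqlite3.h>
#include <stdio.h>
#include <string.h>

int import_csv(sqlite3 *db, const char *csv_path)
{
    FILE *f = fopen(csv_path, "r");
    if (!f) return -1;

    /* Prepare the INSERT once and reuse it for every row. */
    sqlite3_stmt *stmt = NULL;
    sqlite3_prepare_v2(db, "INSERT INTO mytable (word) VALUES (?1)", -1, &stmt, NULL);

    /* One big transaction instead of one per INSERT makes a huge speed difference. */
    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);

    char line[256];
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the trailing newline */
        if (line[0] == '\0') continue;        /* skip blank lines */
        sqlite3_bind_text(stmt, 1, line, -1, SQLITE_TRANSIENT);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }

    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    sqlite3_finalize(stmt);
    fclose(f);
    return 0;
}

Wrapping the loop in a single transaction and reusing one prepared statement is what keeps an import of this size fast enough to run on the device.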