How to clip a specific area from OSM in PostgreSQL

I uploaded an OSM network to my server. Now I'm working directly in Postgres (using DBeaver), and the file format is OSM.
The imported file covers a lot of area outside of Montreal Island. This creates a lot of extra data and slows processing down; running an inner join, for example, takes about 2 hours.
Also, when I open the OSM file in JOSM, it has a lot of extra information that I don't need.
Is there any query I can run in DBeaver to limit and clip the data to Montreal Island only?
I know I can do this in ArcGIS, but I prefer working on the OSM data directly, mainly because bringing a shapefile into Postgres is tricky and breaks the original file. I would rather work directly on the original file than import it from somewhere else.
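For what it's worth, below is a minimal sketch of one common way to do this, assuming the data was imported with osm2pgsql into the usual planet_osm_* tables with a PostGIS geometry column named way in EPSG:3857 (those names are assumptions, not something stated above), and using a rough bounding box for Montreal Island with purely illustrative coordinates. The SQL string can be pasted straight into DBeaver; the Python wrapper just shows the whole round trip:

```python
import psycopg2  # PostgreSQL driver; the SQL itself runs unchanged in DBeaver

# Rough bounding box for Montreal Island in lon/lat (illustrative values only).
BBOX = (-74.05, 45.35, -73.45, 45.75)  # min_lon, min_lat, max_lon, max_lat

# Keep only features intersecting the box: ST_MakeEnvelope builds the clip
# rectangle in EPSG:4326, ST_Transform matches the assumed osm2pgsql projection.
CLIP_SQL = """
CREATE TABLE planet_osm_line_mtl AS
SELECT *
FROM planet_osm_line
WHERE ST_Intersects(
        way,
        ST_Transform(ST_MakeEnvelope(%s, %s, %s, %s, 4326), 3857)
      );
"""

conn = psycopg2.connect("dbname=osm user=postgres")  # adjust connection details
with conn, conn.cursor() as cur:
    cur.execute(CLIP_SQL, BBOX)
conn.close()
```

For a tighter cut than a rectangle, the same ST_Intersects pattern works with the actual island boundary polygon in place of the envelope, and the smaller clipped table can then be indexed and joined far more quickly.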

Related

Convert OpenStreetMap POI Data to CSV

I am looking to extract some Point of Interest (POI) data from OpenStreetMap in a tabular format. I used the link below, navigated to the relevant country, and downloaded the file:
http://download.geofabrik.de/
What I get is a file with the .osm.pbf extension. However, I think it is possible to download the files in other formats such as .shp.zip and .osm.bz2. Is there some way I can convert this data into a tabular format like a CSV file?
I came across a tool called Osmosis, which can be used to manipulate data in these formats, but I am not sure if it can be used for this purpose:
https://wiki.openstreetmap.org/wiki/Osmosis
I was able to successfully install it on my Windows machine, though.
To be frank, I am not even sure if this gives me what I want.
In essence, what I am looking for is Sri Lankan POI data that contains the following attributes:
https://data.humdata.org/dataset/hotosm_lka_points_of_interest
If converting this file does not give me data in this format, then I am open to other approaches as well. What is the best way to go about acquiring this data?
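One possible route, sketched below under a couple of assumptions: the pyosmium library can read .osm.pbf files directly, so a short script can stream the nodes, keep the ones carrying a POI-style tag (amenity is used here as the filter, and the Geofabrik-style file name is assumed), and write them out as CSV. POIs mapped as ways or relations would need the corresponding callbacks as well.

```python
import csv
import osmium  # pip install osmium (the pyosmium package)

class POIHandler(osmium.SimpleHandler):
    """Collect nodes carrying an 'amenity' tag (assumed here as the POI filter)."""

    def __init__(self, writer):
        super().__init__()
        self.writer = writer

    def node(self, n):
        if 'amenity' in n.tags:
            self.writer.writerow([
                n.id,
                n.location.lon,
                n.location.lat,
                n.tags.get('amenity', ''),
                n.tags.get('name', ''),
            ])

with open('sri_lanka_pois.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['osm_id', 'lon', 'lat', 'amenity', 'name'])
    # File name assumed to follow Geofabrik's naming for the Sri Lanka extract.
    POIHandler(writer).apply_file('sri-lanka-latest.osm.pbf')
```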

Why does OpenStreetMap (OSM) use PostgreSQL databases?

Over the last few months, I've been using the openstreetmap-tile-server on GitHub (link here) to render OSM tiles from a Docker container. The tile server uses a PostgreSQL database to store its data. While researching how to create my own OSM tiles and my own tile server, I've found that a lot of tutorials mention using a PostgreSQL database.
Why is this? Why not use another SQL database such as MySQL instead? What is gained from using PostgreSQL rather than a different SQL database for a dataset such as the OpenStreetMap data?
EDIT: Edited question, to indicate that I'm comparing Postgres to other SQL databases.
Originally, MySQL was actually used for the main internal OSM database that stores the actual OSM data and is queried and modified via the OSM API. For tile rendering and other purposes the internal raw format is never used, though; instead, OSM data exported as compressed XML or in the more compact binary PBF format is imported into a database schema more suitable for further processing.
Typically this is done with either the "imposm" or the "osm2pgsql" tool, with the PostgreSQL/PostGIS combination as the RDBMS of choice, as it provides the most powerful GIS feature set, at least in the free and open-source world.
The main OSM database is an exception, as queries on it only ever retrieve data for a rectangular area, so GIS extensions are not actually needed; storing the coordinates as simple numeric data is sufficient in this case. Eventually it was decided to switch that database to PostgreSQL, too, to reduce the number of different components to maintain in the openstreetmap.org site setup.
In theory you could use another RDBMS with GIS support, too, e.g. the SpatiaLite variant of SQLite or MariaDB/MySQL, but compared to the PostgreSQL/PostGIS setup they have their disadvantages:
SpatiaLite, for example, is only good as long as there is a single thread accessing the data; with concurrent access it doesn't scale well at all.
And MariaDB and MySQL only implement more or less the bare minimum of the OpenGIS SQL specs, and even that only really materialized over the last few years. Feature-wise, both are still at least a decade behind PostGIS.
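To make the feature-gap point concrete, here is a hedged illustration of the kind of query PostGIS makes easy: an index-assisted nearest-neighbour search using the <-> distance operator, run against an assumed osm2pgsql-style schema (the table and column names below are illustrative, not taken from the answer):

```python
import psycopg2  # assumes an osm2pgsql-style import; all names below are illustrative

# Index-assisted nearest-neighbour search with the PostGIS "<->" distance operator:
# the 5 pharmacies closest to a given point, served straight from a GiST index.
KNN_SQL = """
SELECT name,
       ST_AsText(ST_Transform(way, 4326)) AS geom_wkt
FROM planet_osm_point
WHERE amenity = 'pharmacy'
ORDER BY way <-> ST_Transform(ST_SetSRID(ST_MakePoint(%s, %s), 4326), 3857)
LIMIT 5;
"""

conn = psycopg2.connect("dbname=osm user=postgres")
with conn, conn.cursor() as cur:
    cur.execute(KNN_SQL, (-73.57, 45.50))  # lon/lat of an arbitrary query point
    for name, wkt in cur.fetchall():
        print(name, wkt)
conn.close()
```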
Disclaimer: even I, despite working for MariaDB Corp now and having worked for MySQL AB before, for over a decade in total, have always recommended PostGIS over MariaDB or MySQL for GIS applications, unless someone was already bound to MariaDB or MySQL for other reasons.

How to copy Tableau Data Extract logic?

Someone in my org created a Data Extract. There is an issue in one of the worksheets that uses it, and we suspect it's due to a mistake in how the Union was built.
But since it's a Data Extract, I can't see the UI for the data merge. Is there any way to take a current Data Extract and view the logic that creates it?
Download the extract from the server (I'm assuming you're using Tableau Server), then open that extract using Tableau Desktop. You should be able to see the details of it.
Before going too deep into extract details, note that extracts are not intended to be permanent systems of record for data - just an efficient way to work with query results for optimized reporting. So in general, you should always be able to throw away the extract and look at the original source - or recreate the extract on command. But life isn't always perfect so ...
If you use Tableau Desktop to look at your worksheet and check the data source icon at the top of the data pane in the left sidebar, do you see an icon for your data source that looks like two databases, one on top of (shadowing) the other? If so, you can right-click the data source icon and view its properties to see the source database table or file path. You can then even try disabling the extract to view the original source data.
If instead you see a single database icon, you have a "naked" extract where the reference to the original source has been discarded (unless it is stored in the catalog mentioned below).
If your organization purchased the Data Management Add-on for Tableau Server (strongly recommended), then if your data source is published to Tableau Server you can trace its history and origin by exploring the Tableau Catalog. That is especially valuable if the extract was built by a Tableau Prep Flow.
If instead someone built the extract another way, say by writing a custom app using the Tableau Data Extract API, then the answer is to find that program.
One last point: in recent versions of Tableau, extracts are stored in an efficient relational database file format called Hyper. A Hyper extract can either be a single table (say, serializing the results of a query joining multiple tables) or contain multiple tables (say, caching individual tables and deferring the join for later).
That may not be relevant to your question, but could turn out to matter as you reverse engineer how the extract was created.
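If the extract is a .hyper file, one way to at least see whether it holds a single denormalized table or several separate ones is to open it with Tableau's Hyper API. A small sketch follows, with the file name as a placeholder and the API details subject to version differences:

```python
# Sketch: list the tables inside a .hyper extract with Tableau's Hyper API
# (pip install tableauhyperapi). "Downloaded Extract.hyper" is a placeholder.
from tableauhyperapi import Connection, HyperProcess, Telemetry

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint,
                    database="Downloaded Extract.hyper") as conn:
        for schema in conn.catalog.get_schema_names():
            for table in conn.catalog.get_table_names(schema):
                print(table)  # one table = pre-joined; several = join deferred
```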

Using postgres to replace csv files (pandas to load data)

I have been saving files as .csv for over a year now and connecting those files to Tableau Desktop for visualization for some end-users (who use Tableau Reader to view the data).
I think I have settled on migrating to PostgreSQL, and I will be using the pandas to_sql method to fill it up.
I get 9 different files each day and I process each of them (I currently consolidate them into monthly files in .csv.bz2 format) by adding columns, calculations, replacing information, etc.
From those processed files I create two massive CSV files using pd.concat and pd.merge, and Tableau is connected to these. The files are literally overwritten every day when new data is added, which is time-consuming.
Is it okay to still do my file joins and concatenation with pandas and export the output data to Postgres? This will be my first time using a real database, and I am more comfortable with pandas than with learning SQL syntax and creating views or tables. I just want to avoid overwriting the same CSV files over and over (and some other CSV problems I run into).
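To make the workflow in the question concrete, a hand-off like the one described might look roughly like this (connection string, table name and file names are made up for illustration):

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection string is illustrative; adjust user, password, host and database.
engine = create_engine("postgresql://user:password@localhost:5432/reporting")

# Process the day's files with pandas exactly as before...
daily = pd.concat(pd.read_csv(f"daily_file_{i}.csv") for i in range(1, 10))
daily["load_date"] = pd.Timestamp.today().normalize()

# ...then append the result to a table instead of rewriting a giant CSV.
daily.to_sql("daily_data", engine, if_exists="append", index=False)
```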
Don't worry too much about normalization. A properly normalized database will usually be more efficient and easier to handle than a non-normalized one. On the other hand, if you dump non-normalized CSV data into a database, your import functions will be a lot more complicated if you do a proper normalization. I would recommend you take one step at a time: start by just loading the processed CSV files into Postgres. I am pretty sure all processing after that will be a lot easier and quicker than doing it with CSV files (just make sure you set up the right indexes). Once you get used to using the database, you can start doing more of the processing there.
Just remember, one thing a database is really good at is picking out the subset of data you want to work on. Try as much as possible to avoid pulling a huge amount of data out of the database when you only intend to work on a subset of it.
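As a concrete illustration of that last point, with the same made-up names as above, an index on the filter column plus a targeted query keeps only the needed slice flowing back into pandas:

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost:5432/reporting")

# One-time: index the column used for filtering so subset queries stay fast.
with engine.begin() as conn:
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS idx_daily_data_load_date "
        "ON daily_data (load_date)"
    ))

# Pull only the slice needed for the current report, not the whole table.
recent = pd.read_sql(
    "SELECT * FROM daily_data WHERE load_date >= '2020-01-01'", engine
)
print(len(recent))
```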

Importing and exporting KML into and from DashDB for Geospatial Analysis

I am a recent convert to IBM's dashDB, and I am considering proposing to use it at my work. My case would be greatly bolstered if I could show good, easy integration for geospatial analytics data, namely loading and performing SQL filtering on geodata currently in .shp or .kml formats. If it were also possible to export the filtered data as KML, that would be AMAZING.
So, to give a practical example: say I have a .shp file with all the zip codes in the US, and I want to select the shape for the 02138 zip code and send it to the query sender in KML format.
Does anyone have experience with that?
Loading a .kml file directly into dashDB is currently not possible, as answered in another post here. You may use ArcGIS for Desktop to connect to dashDB if you want to do this.
https://gis.stackexchange.com/questions/141194/importing-and-exporting-kml-into-and-from-dashdb-for-geospatial-analysis
On the other hand, uploading a .shp file is natively supported and is demonstrated in the tutorial below. Note that dashDB requires the uploaded file to be compressed in .zip, .tar or .gz format.
https://www-01.ibm.com/support/knowledgecenter/SS6NHC/com.ibm.swg.im.dashdb.doc/tutorial/analyze-geospatial-data.html
Also, ArcGIS for Desktop should let you use its export features as well.
There are two options for loading shapefiles: via the GUI ("Load geospatial data") or via the CLPPlus command IDA LOADGEOSPATIALDATA.
Once the data is loaded, it resides in the internal geometry format (e.g. as ST_Polygon) and can then be accessed in various formats through spatial functions embedded in SQL calls. Unfortunately, KML is not one of the formats dashDB supports as of now; the options are GML, WKT and WKB.
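As a rough sketch only, the zip-code example from the question might then look like the following from Python, using the ibm_db driver. The table and column names are assumptions, and the spatial function spelling (DB2GSE.ST_AsText for WKT output) should be verified against the dashDB documentation for your version:

```python
import ibm_db  # IBM's Python driver for dashDB / Db2; values below are placeholders

conn = ibm_db.connect(
    "DATABASE=BLUDB;HOSTNAME=<host>;PORT=50000;PROTOCOL=TCPIP;UID=<user>;PWD=<pwd>;",
    "", "")

# Assumed table ZIPCODES(ZIP, SHAPE) created by the geospatial load. ST_AsText
# returns the geometry as WKT, one of the output formats listed above; verify
# the function name and schema against your dashDB version.
sql = """
SELECT ZIP, VARCHAR(DB2GSE.ST_AsText(SHAPE), 32000) AS WKT
FROM ZIPCODES
WHERE ZIP = '02138'
"""
stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
print(row["WKT"] if row else "no match for 02138")
ibm_db.close(conn)
```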