Given the OSM data for a state, find its area - postgresql

Apologies if this is a silly question, I'm very new to PostgreSQL/PostGIS.
I have downloaded an .osm file for the entire state of Baden-Württemberg from here - https://download.geofabrik.de/europe/germany/baden-wuerttemberg.html
I have to answer the following questions by writing queries:
What is the area (expressed in m^2) of Baden-Württemberg?
What is by area the smallest city of Baden-Württemberg?
I have imported the .osm file into a PostgreSQL database using the osm2pgsql tool.
osm2pgsql creates multiple tables. In addition to the core tables (planet_osm_nodes, planet_osm_ways and planet_osm_rels) there are several generated tables as well, and I'm not quite sure how to use them to complete my tasks.
It would be really helpful if anybody could provide some insight into how to write the PostGIS queries for this.

I think something like this should give you the cities and their areas, sorted from smallest to largest:
SELECT *, ST_Area(way) AS area
FROM planet_osm_polygon
WHERE boundary = 'administrative'
  AND admin_level = '7'  -- admin_level is stored as text by osm2pgsql
ORDER BY ST_Area(way);
The same approach works for the border of Baden-Württemberg itself; you just need admin_level = '4' instead.
https://postgis.net/docs/ST_Area.html
https://wiki.openstreetmap.org/wiki/DE:Grenze
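One caveat: ST_Area(way) returns areas in the units of the data's projection. As a sketch, assuming you imported with the osm2pgsql default projection (EPSG:3857, Web Mercator), those units are strongly distorted away from the equator; casting to geography gives true square meters:
SELECT name,
       ST_Area(ST_Transform(way, 4326)::geography) AS area_m2  -- true m^2
FROM planet_osm_polygon
WHERE boundary = 'administrative'
  AND admin_level = '4'
ORDER BY area_m2;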

Related

SQL: how to get global overview of database

I am new to SQL and I was wondering if there is any quick way of getting a global "view" of a new database (if, for example, you are starting to use a database you know nothing about and you want to get a general idea of what the whole thing looks like).
In other words, is there a way to:
Maybe get some graphical representation of the database? A sort of diagram that shows the relations between all tables.
Maybe run some sort of query that returns the number of rows and the number of columns (and ideally the column names) of each table in the database?
Apologies if this is a really basic question, I am very new to SQL. I am currently using PostgreSQL and pgAdmin 4. Thanks
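For the second point, a minimal sketch against the pg_catalog system tables; the approximate row counts come from the planner statistics (reltuples), so they are only as fresh as the last ANALYZE:
SELECT c.relname           AS table_name,
       c.reltuples::bigint AS approx_rows,
       count(a.attname)    AS num_columns
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
                   AND a.attnum > 0
                   AND NOT a.attisdropped
WHERE c.relkind = 'r'          -- ordinary tables only
  AND n.nspname = 'public'     -- adjust the schema as needed
GROUP BY c.relname, c.reltuples
ORDER BY c.relname;
The column names themselves are available per table from information_schema.columns.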

How to get all points along a way from (osm)PostGIS?

I have imported OpenStreetMap data into Postgres (with the PostGIS extension) using the tool
osm2pgsql (-s option)
Of course, I have the following tables:
planet_osm_point
planet_osm_ways
....
Within planet_osm_ways I have a column called way, of type geometry(LineString, 4326), with content like the following:
"0102000020E6100000070000005E70BCF1A49F2540D3D226987B134840896764EB749F25403B5DCC858013484040D1860D609F2540C426327381134840CE50DCF1269F2540EF552B137E1348405AAB2CC02D9E2540F978324976134840D66F26A60B9D2540CE8877256E1348403CA81F2FFF9C2540BC1D86FB6D134840"
What is that? How could I get all the points along this way?
Thanks a lot
That's hex-encoded extended well-known binary (EWKB) for a LINESTRING.
There are several ways to get the points along the way. To extract the individual vertices as points, use ST_DumpPoints. Or, to simply output the geometry in a human-readable format (WKT, EWKT, GeoJSON, GML, etc.), see the relevant section of the manual.
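For example, a sketch against the asker's table (the id column name and the LIMIT are illustrative assumptions):
SELECT id,
       dp.path[1]         AS point_index,
       ST_AsText(dp.geom) AS point_wkt
FROM planet_osm_ways,
     LATERAL ST_DumpPoints(way) AS dp
LIMIT 50;
Each output row is one vertex of the line, in order.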

Can I have more than 250 columns in the result of a PostgreSQL query?

Note that the PostgreSQL website mentions a limit of between 250 and 1600 columns per table, depending on column types.
Scenario:
Say I have data in 17 tables, each having around 100 columns, all joinable through primary keys. Would it be okay to select all of these columns in a single select statement? The query would be pretty complex but can be generated programmatically. The reason for doing this is to get denormalised data to populate a web page. Please do not ask why though :)
Quite obviously, if I do create table table1 as (<the complex select statement>), I will hit the limit mentioned on the website. But do plain select queries also face the same restriction?
I could probably find this out by doing the exercise myself, and in the next few days I probably will. However, if someone has an idea about this and the problems I might face with a single query, please share the knowledge.
I can't find definitive documentation to back this up, but I have received the following error using JDBC on PostgreSQL 9.1 before:
org.postgresql.util.PSQLException: ERROR: target lists can have at most 1664 entries
As I say though, I can't find the documentation for that, so it may vary by release.
I've found confirmation: the maximum is 1664. It is one of the metrics exposed in the INFORMATION_SCHEMA.SQL_SIZING table:
SELECT * FROM INFORMATION_SCHEMA.SQL_SIZING
WHERE SIZING_NAME = 'MAXIMUM COLUMNS IN SELECT';
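If you do want to run the exercise yourself, one way is to generate an over-wide statement and execute it; assuming the 1664 limit above, a SELECT with 1665 output columns should fail with the "target lists" error:
SELECT 'SELECT ' || string_agg('1 AS c' || g, ', ') || ';' AS wide_query
FROM generate_series(1, 1665) AS g;
Running the statement returned in wide_query should then raise the error, while the same statement with 1664 columns should succeed.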

Tableau Extract API with multiple tables in a database

I am currently experimenting with the Tableau Extract API to generate some TDEs from the tables I have in a PostgreSQL database. I was able to write code to generate a TDE from a single table, but I would like to do this for multiple joined tables. To be more specific, if I have two tables that are inner-joined on some field, how would I generate the TDE for this?
I can see that if I am working with a small number of tables, I could use SQL queries with JOIN clauses to create one gigantic table, and generate the TDE from that:
SELECT *
INTO new_table_1
FROM table_1
INNER JOIN table_2
    ON table_1.id_1 = table_2.id_2;

SELECT *
INTO new_table_2
FROM new_table_1
INNER JOIN table_3
    ON new_table_1.id_1 = table_3.id_3;
and then generate the TDE from new_table_2.
However, some of my tables have over 40 different fields, so this could get messy.
Is this even a possibility with the current version of the API?
You can read from as many tables or other sources as you want, use a complex query with lots of joins, or create a view and read from that. Usually, creating a view is helpful when you have a complex query joining many tables.
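For instance, a sketch using the hypothetical table names from the question:
CREATE VIEW extract_source AS
SELECT t1.*,
       t2.some_field,   -- hypothetical column names
       t3.other_field
FROM table_1 AS t1
INNER JOIN table_2 AS t2 ON t2.id_2 = t1.id_1
INNER JOIN table_3 AS t3 ON t3.id_3 = t1.id_1;
The extract code then only has to run SELECT * FROM extract_source.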
The data extract API is totally agnostic about how or where you get the data to feed it -- the whole point is to allow you to grab data from unusual sources that don't have pre-built drivers for Tableau.
Since Tableau has a Postgres driver and can read from it directly, you don't need to write a program with the data extract API at all. You can define your extract with Tableau Desktop. If you need to schedule automated refreshes of the extract, you can use Tableau Server or its tabcmd command.
Many thanks for your replies. I am aware that I could use Tableau Desktop to define my extract. In fact, I have done this many times before. I am just trying to create the extracts using the API, because I need to create some calculated fields, which is near impossible to do in Tableau Desktop.
At this point, I am hesitant to use JOINs in the SQL query because the resulting table would be too complicated to comprehend (some of these tables also share field names).
When you say that I could read from multiple tables or sources, does that mean with the Tableau Extract API? At this point, I cannot find anything in the API that accommodates multiple sources. For example, I know that when I use multiple tables in Tableau Desktop, there are icons on the left-hand side that tell me the extract is composed of multiple tables. This just doesn't seem to be happening with the API, which leaves me stranded. Anyway, thank you again for your replies.
Going back to the topic, this is something that I tried a few days ago in my Python code:
import os
import dataextract as tde  # Tableau Extract API (import name is an assumption)

try:
    tdefile = tde.Extract("extract.tde")
except:
    os.remove("extract.tde")
    tdefile = tde.Extract("extract.tde")

tableDef = tde.TableDefinition()
# Read each column in table and set the column data types using tableDef.addColumn
# Some code goes here...

for eachTable in tableNames:
    tableAdd = tdefile.addTable(eachTable, tableDef)
    # Use SQL query to retrieve bunch_of_rows from eachTable
    for some_row in bunch_of_rows:
        # Read each row in table, and set the values in each column position of each row
        # Some code goes here...
        tableAdd.insert(some_row)

some_row.close()
tdefile.close()
When I execute this code, I get an error saying that eachTable has to be called "Extract".
Of course, this code has its flaws, as there is nothing in it that specifies how the tables are joined.
So I am a little thrown off here, because it doesn't seem like I can use multiple tables unless I use JOINs to generate one table that contains everything.

Apply diffs between two SQL Server Tables efficiently

I have a once-a-day ingestion case in which I will be getting a large file via FTP which contains the up-to-date versions of 4 database tables.
For each table, I would like to:
Truncate table in staging database
BCP the FTP'd file into that table
Find the diffs (inserts/updates/deletes) between the staging table and the production table
Make any required inserts/updates/deletes to the production table so it matches the staging table
I'm sure this is a reasonably common problem, but I'm not 100% sure as to the best way to approach it.
Are there any built-in T-SQL features for this kind of problem, or do I just need to do various joins to find the inserted/updated/deleted records and apply them manually? I'm sure I can manage the second way, but any suggestions are greatly appreciated nonetheless (not looking for working code).
Since nobody ever put it as a real answer: the MERGE command mentioned by Mikael Eriksson in the comments is the right way to go; it worked great.
Here's a simple example usage:
MERGE dbo.DimProduct AS Target
USING (SELECT ProductID, ProductName, ProductColor, ProductCategory
       FROM dbo.StagingProduct) AS Source
ON (Target.ProductID = Source.ProductID)
WHEN MATCHED THEN
    UPDATE SET Target.ProductName = Source.ProductName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, ProductName, ProductColor, ProductCategory)
    VALUES (Source.ProductID, Source.ProductName, Source.ProductColor, Source.ProductCategory)
OUTPUT $action, Inserted.*, Deleted.*;
from: http://www.bidn.com/blogs/bretupdegraff/bidn-blog/239/using-the-new-tsql-merge-statement-with-sql-server-2008
which helped me.
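For the full sync described in the question (including the deletes), a sketch extending the same example: the WHEN NOT MATCHED BY SOURCE branch removes production rows that no longer exist in staging:
MERGE dbo.DimProduct AS Target
USING dbo.StagingProduct AS Source
ON (Target.ProductID = Source.ProductID)
WHEN MATCHED THEN
    UPDATE SET Target.ProductName     = Source.ProductName,
               Target.ProductColor    = Source.ProductColor,
               Target.ProductCategory = Source.ProductCategory
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, ProductName, ProductColor, ProductCategory)
    VALUES (Source.ProductID, Source.ProductName, Source.ProductColor, Source.ProductCategory)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, Inserted.ProductID, Deleted.ProductID;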
RedGate's SQL Compare product has automation capabilities.
http://downloads.red-gate.com/HelpPDF/ContinuousIntegrationForDatabasesUsingRedGateSQLTools.pdf
(I am not associated with Redgate. I don't even like their products that much, but it seems to fit the case in this instance.)