Why does OpenStreetMap (OSM) use PostgreSQL databases?

Over the last few months, I've been using the openstreetmap-tile-server on GitHub (link here) to render OSM tiles from a Docker container. The tile server uses a PostgreSQL database to store its data. While researching how to create my own OSM tiles and run my own tile server, I found that a lot of tutorials mention using a PostgreSQL database.
Why is this? Why not use another SQL database such as MySQL instead? What is gained from using PostgreSQL rather than a different SQL database for a dataset such as the OpenStreetMap data?
EDIT: Edited the question to indicate that I'm comparing Postgres to other SQL databases.

Originally, MySQL was used for the main internal OSM database, the one that stores the actual OSM data and is queried and modified via the OSM API. For tile rendering and other purposes the internal raw format is never used, though; instead, OSM data exported as compressed XML or in the more compact binary PBF format is imported into a database schema better suited for further processing.
Typically this is done with either the "imposm" or the "osm2pgsql" tool, with the PostgreSQL/PostGIS combination as the RDBMS of choice, as it provides the most powerful GIS feature set, at least in the free & open source world.
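For illustration, the kind of spatial query a tile renderer runs against the schema produced by osm2pgsql looks roughly like this (a minimal sketch assuming the default planet_osm_* tables, whose "way" geometry column is stored in Web Mercator, EPSG:3857; the bounding-box coordinates are arbitrary example values). It is exactly this combination of spatial operators, functions and GiST indexes that PostGIS provides:

    -- Fetch all roads that touch one tile's bounding box.
    -- planet_osm_line and its "way" column come from osm2pgsql's default style;
    -- the && operator is an indexed bounding-box overlap test, and
    -- ST_MakeEnvelope is a PostGIS function building the tile rectangle.
    SELECT osm_id, highway, name, way
    FROM planet_osm_line
    WHERE highway IS NOT NULL
      AND way && ST_MakeEnvelope(
            1252344.27, 6105178.36,   -- xmin, ymin (metres, example values)
            1252955.89, 6105790.00,   -- xmax, ymax
            3857);                    -- SRID of the tile bounding box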
The main OSM database is an exception, as any queries on it only ever retrieve data for a rectangular area, so GIS extensions are not actually needed; storing the coordinates as simple numeric data is sufficient in this case. Eventually it was decided to switch that to PostgreSQL, too, to reduce the number of different components to maintain in the openstreetmap.org site setup.
In theory you could also use another RDBMS with GIS support, e.g. the SpatiaLite variant of SQLite, or MariaDB/MySQL, but compared to the PostgreSQL/PostGIS setup they have their disadvantages:
SpatiaLite, for example, is only good as long as a single thread is accessing the data; with concurrent access it doesn't scale well at all.
And MariaDB and MySQL only really implement more or less the bare minimum of the OpenGIS SQL specs, and even that only really materialized over the last few years. Feature-wise both are still at least a decade behind PostGIS.
Disclaimer: even I, despite working for MariaDB Corp, and having worked for MySQL AB before, in total for over a decade, have always recommended PostGIS over MariaDB or MySQL for GIS applications unless someone was already bound to MariaDB or MySQL for other reasons.

Related

What's the fastest way to put RDF data (specifically DBPedia dumps) into Postgres?

I'm looking to put RDF data from DBPedia Turtle (.ttl) files into Postgres. I don't really care how the data is modelled in Postgres as long as it is a complete mapping (it would also be nice if there were sensible indexes); I just want to get the data into Postgres so I can transform it with SQL from there.
I tried using this StackOverflow solution that leverages Python and sqlalchemy, but it seems to be much too slow (it would take days, if not more, at the pace I observed on my machine).
I expected there might be some kind of ODBC/JDBC-level tool for this type of connection. I did the same thing with Neo4j in less than an hour using a plugin Neo4j provides.
Thanks to anyone who can help.
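One possible route, sketched here as plain SQL and assuming the Turtle files have first been flattened into tab-separated subject/predicate/object lines by an external converter (the file path and table below are hypothetical): a single bulk COPY is usually far faster than row-by-row inserts through an ORM.

    -- Minimal staging table for triples; column names are illustrative.
    CREATE TABLE triples (
        subject   text NOT NULL,
        predicate text NOT NULL,
        object    text NOT NULL
    );

    -- Server-side bulk load of a pre-converted tab-separated file
    -- (from psql, \copy reads a client-side file instead).
    COPY triples (subject, predicate, object)
    FROM '/data/dbpedia/triples.tsv'
    WITH (FORMAT text);

    -- Build indexes after the load; that is much faster than maintaining
    -- them while the data streams in.
    CREATE INDEX triples_subject_idx   ON triples (subject);
    CREATE INDEX triples_predicate_idx ON triples (predicate);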

Can I use Grafana with a relational database not listed in the supported data source list?

I need to show metrics in real time, but my metrics are stored in a relational database not supported by the data sources listed here: https://grafana.com/docs/grafana/latest/http_api/data_source/
Can I somehow provide the JDBC (or other DB driver) to Grafana?
As @danielle clearly mentioned, "There is no direct support for JDBC or ODBC currently. You could get this data in time series form and into Grafana if you are prepared to do some programming.
The simple json data source is a generic backend that could make JDBC/ODBC calls to MapD and then transform the data into the right form for Grafana."
https://github.com/grafana/grafana/issues/8739#issuecomment-312118425
Though this comment is a bit old, I'm pretty sure there is still no out-of-the-box way to visualize data via JDBC/ODBC yet.
One possible approach can make use of the following:
Grafana can access PostgreSQL
PostgreSQL can transparently display data in other databases as though it were a PostgreSQL table, through Foreign Data Wrappers
Doing it this way, you'd use PostgreSQL to act as a gateway to the data. Depending on the table structure, you might also need to create a view in PG to shape the data to match Grafana's requirements for the PG data source.
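A minimal sketch of that gateway idea, using postgres_fdw purely as an example (for another RDBMS you would install the matching wrapper, e.g. mysql_fdw, oracle_fdw or odbc_fdw, which take different options); the server, credentials, and metrics table below are all hypothetical:

    -- Enable the wrapper (postgres_fdw ships with PostgreSQL).
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- Describe how to reach the external database.
    CREATE SERVER metrics_src
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'metrics-db.example.com', port '5432', dbname 'metricsdb');

    -- Map the local role Grafana connects as to remote credentials.
    CREATE USER MAPPING FOR grafana_reader
        SERVER metrics_src
        OPTIONS (user 'remote_user', password 'remote_password');

    -- Expose one remote table locally (columns are illustrative).
    CREATE FOREIGN TABLE metrics_raw (
        recorded_at  timestamptz,
        metric_name  text,
        metric_value double precision
    ) SERVER metrics_src OPTIONS (schema_name 'public', table_name 'metrics');

    -- Shape the data into the time/value form the Grafana PostgreSQL
    -- data source expects for time-series panels.
    CREATE VIEW metrics_ts AS
    SELECT recorded_at  AS "time",
           metric_value AS value,
           metric_name  AS metric
    FROM metrics_raw;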

PostgreSQL tuning for use as a data warehouse

I am assembling a Business Intelligence solution using the Pentaho software as a BI engine. Within this solution, I had to set up the requirements for a PostgreSQL database server.
The current situation is very simple: since no ETL process is being carried out for data extraction, the PostgreSQL configuration has not been changed much and is practically at its "factory" defaults.
I would like to know which Postgres configuration parameters have to be touched and modified to optimize it as a data warehouse. I have seen a lot of documentation, but it is not clear to me at all, since one document says that certain values have to be modified, while another recommends completely different values.
I would like to know just that: whether there is clearer and more precise documentation on optimizing Postgres 9.6 for use as a Pentaho DW.
Thank you very much
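As a rough starting point, these are the parameters most warehouse-oriented tuning guides touch on Postgres 9.6; the values below are only placeholders for a dedicated machine with ~32 GB of RAM and have to be sized to your own hardware and workload (ALTER SYSTEM writes them to postgresql.auto.conf; a restart or reload is needed depending on the parameter):

    -- Illustrative values only; adjust to your RAM, disks and concurrency.
    ALTER SYSTEM SET shared_buffers = '8GB';              -- often ~25% of RAM
    ALTER SYSTEM SET effective_cache_size = '24GB';       -- planner hint, ~50-75% of RAM
    ALTER SYSTEM SET work_mem = '256MB';                  -- per sort/hash; a DW runs few, big queries
    ALTER SYSTEM SET maintenance_work_mem = '2GB';        -- faster index builds and bulk loads
    ALTER SYSTEM SET max_parallel_workers_per_gather = 4; -- parallel query, new in 9.6
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;  -- spread out checkpoint I/O
    ALTER SYSTEM SET random_page_cost = 1.1;              -- if the data lives on SSDs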

Can Nominatim use OpenMapTiles' database?

If we're using OpenMapTiles, can we point Nominatim to OMT's database, or are the schemas different?
It is taking us quite a long time to process the global OSM dataset for Nominatim, but we could save ourselves some time/storage/etc. if both products could share the same Postgres database.
The geocoder Nominatim uses a different database schema than a tile server. Geocoders and tile servers need slightly different data. Furthermore, for maximum performance, the data has to be pre-processed in different ways. That's why you can't use the same database for both.

Data mining with Postgres in a production environment - is there a better way?

There is a web application which has been running for years, and during its lifetime the application has gathered a lot of user data. The data is stored in a relational DB (Postgres). Not all of this data is needed to run the application (to do the business). However, from time to time business people ask me to provide reports on this data. And this causes some problems:
sometimes these SQL queries are long-running
queries are executed against the production DB (not cool)
it is not easy to deliver reports on a weekly or monthly basis
some of the data is stored in a way which is not suitable for such querying (queries are inefficient)
My idea (note that I am a developer, not a data mining specialist) for improving this whole process of delivering reports is:
create a separate DB which is regularly updated with production data
optimize how the data is stored
create a dashboard to present reports
Question: But is there a better way? Is there another DB which is a better fit for this kind of data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in: classification, clustering, anomaly detection), or is "data mining" for you any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you used the indexing functionality of Postgres well? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent for this - if well designed.
For example, materialized views and triggers can be used to process data into a schema more usable for your reporting.
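A concrete sketch of that idea (the users table and columns are made up): a materialized view pre-computes the expensive aggregation once, and a scheduled job refreshes it periodically instead of re-running the heavy query for every report.

    -- Pre-aggregate signups per day from a hypothetical "users" table.
    CREATE MATERIALIZED VIEW daily_signups AS
    SELECT date_trunc('day', created_at) AS day,
           count(*)                      AS signups
    FROM users
    GROUP BY 1;

    -- Index the aggregate so dashboard queries on it stay fast; a unique
    -- index is also what allows REFRESH ... CONCURRENTLY below.
    CREATE UNIQUE INDEX daily_signups_day_idx ON daily_signups (day);

    -- Run this from cron (or any scheduler); CONCURRENTLY lets reports
    -- keep reading while the view is rebuilt.
    REFRESH MATERIALIZED VIEW CONCURRENTLY daily_signups;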
There are a thousand ways to approach this issue, but I think the path of least resistance for you would be Postgres replication. Check out this Postgres replication tutorial for a quick proof of concept. (There are many hits when you Google for Postgres replication, and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays within the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it creates a second database which effectively becomes your "read-only" DB, updated via the replication process. You would keep the schema the same, but your indexing could be altered and reports/dashboards customized. This is the database you would query. Your main database would be your transactional database, which serves the users, while the replicated database would serve the stakeholders.
This is a wide topic, so please do your due diligence and research it. But it's also something that can work for you and can be turned around quickly.
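If the production server runs PostgreSQL 10 or later, built-in logical replication is one concrete way to realize this; it is a different mechanism than the streaming replication linked above, but unlike a physical standby it allows the reporting copy to carry its own extra indexes. Table names and connection details below are invented:

    -- On the production (publisher) database: publish the tables the reports
    -- need. Requires wal_level = logical on the publisher.
    CREATE PUBLICATION reporting_pub FOR TABLE users, orders;

    -- On the separate reporting (subscriber) database, where matching table
    -- definitions must already exist: subscribe to the publication.
    CREATE SUBSCRIPTION reporting_sub
        CONNECTION 'host=prod-db.example.com dbname=appdb user=replicator password=change-me'
        PUBLICATION reporting_pub;

    -- The reporting copy can then get its own report-friendly indexes.
    CREATE INDEX orders_created_at_idx ON orders (created_at);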
If you really want to try data mining with PostgreSQL, there are some tools which can be used.
The very simple way is KNIME. It is easy to install and has full-featured data mining tools. You can access your data directly from the database, process it, and save it back to the database.
The hardcore way is MADlib. It installs data mining functions written in Python and C directly in Postgres, so you can mine with SQL queries.
Both projects are stable enough to try.
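To give a flavour of what "mining with SQL queries" looks like with MADlib, here is a linear-regression example modelled on the one in the MADlib documentation; the houses table and its columns are illustrative, and the exact function signature should be checked against your MADlib version:

    -- Train a linear regression of price on tax, bath and size from a
    -- hypothetical "houses" table; MADlib writes the fitted model into
    -- the output table named in the second argument.
    SELECT madlib.linregr_train(
        'houses',                    -- source table
        'houses_linregr',            -- output table for the model
        'price',                     -- dependent variable
        'ARRAY[1, tax, bath, size]'  -- independent variables (1 = intercept)
    );

    -- Inspect the fitted coefficients and the goodness of fit.
    SELECT coef, r2 FROM houses_linregr;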
For reporting, we use a non-transactional (read-only) database. We don't care about normalization. If I were you, I would use another database for reporting. I would design the tables following OLAP principles (star schema, snowflake), and use an ETL tool to dump the data periodically (maybe weekly) to the read-only database and start creating reports from there.
Reports are used for decision support, so they don't have to be real-time, and usually don't have to be completely current. In other words, it is acceptable for a report to only cover data up to last week or last month.
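A minimal illustration of that star-schema idea for the read-only reporting database (all names are invented): one narrow fact table keyed to a date dimension, filled periodically by the ETL job, keeps the typical report queries simple and fast.

    -- Date dimension: one row per calendar day, populated by the ETL job.
    CREATE TABLE dim_date (
        date_key date PRIMARY KEY,
        year     int  NOT NULL,
        month    int  NOT NULL,
        day      int  NOT NULL
    );

    -- Fact table: one row per order, referencing the dimension.
    CREATE TABLE fact_orders (
        order_id    bigint PRIMARY KEY,
        date_key    date   NOT NULL REFERENCES dim_date,
        customer_id bigint NOT NULL,
        order_total numeric(12,2) NOT NULL
    );

    -- Typical decision-support report: monthly revenue.
    SELECT d.year, d.month, sum(f.order_total) AS revenue
    FROM fact_orders f
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year, d.month
    ORDER BY d.year, d.month;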