osm2pgsql data converting: lost columns

I've converted OSM data from a *.bz2 file into a PostgreSQL database using osm2pgsql. But after the import, the table planet_osm_roads has no columns such as lanes or maxspeed.
Can someone explain where these columns are? Thanks.

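Add the -k option when running osm2pgsql (the hstore extension must already be enabled in the target database, e.g. with CREATE EXTENSION hstore;):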
osm2pgsql -d geodatabase -k planet.osm.bz2
-k|--hstore Add tags without column to an additional hstore (key/value) column to postgresql tables
Explanation: by default, osm2pgsql imports the data into a fixed database schema, and tags without a corresponding column are ignored. With -k (--hstore), osm2pgsql adds an hstore column named tags to each table and stores there all tags that have no column of their own.
Depending on your needs, you can use -j instead, which makes osm2pgsql store ALL tags in the tags column, including the tags that also have a dedicated database column.
-j|--hstore-all Add all tags to an additional hstore (key/value) column in postgresql tables
After the import, you can extract the maxspeed tags with a query like this:
SELECT osm_id, name, tags -> 'maxspeed' FROM planet_osm_roads;
where tags is the hstore column and -> is the hstore operator that returns the value stored under a key (NULL if the key is absent).
See the PostgreSQL documentation for more information about the hstore type and its operators: http://www.postgresql.org/docs/9.3/static/hstore.html
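If you select the whole tags column instead of single keys, a client without an hstore adapter receives it as text such as "lanes"=>"2", "maxspeed"=>"50". Drivers can convert this for you (psycopg2 has psycopg2.extras.register_hstore); otherwise, a minimal Python sketch of parsing that text into a dict could look like this. parse_hstore is a hypothetical helper, assuming the standard hstore text output (quoted keys and values with backslash escapes, unquoted NULL values).

```python
import re

# One key/value pair of hstore text output: "key"=>"value" or "key"=>NULL.
# Inside quotes, a backslash escapes the next character (e.g. \" or \\).
_PAIR = re.compile(
    r'"((?:[^"\\]|\\.)*)"\s*=>\s*(?:(NULL)|"((?:[^"\\]|\\.)*)")',
    re.DOTALL,
)
_ESCAPE = re.compile(r'\\(.)', re.DOTALL)


def _unescape(s: str) -> str:
    # Replace each backslash-escaped character with the character itself.
    return _ESCAPE.sub(r'\1', s)


def parse_hstore(text: str) -> dict:
    """Parse hstore text output into a dict; unquoted NULL values become None."""
    result = {}
    for m in _PAIR.finditer(text):
        key = _unescape(m.group(1))
        result[key] = None if m.group(2) else _unescape(m.group(3))
    return result


print(parse_hstore('"lanes"=>"2", "maxspeed"=>"50"'))  # {'lanes': '2', 'maxspeed': '50'}
```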

This should really be a comment, but I don't have enough reputation to post one: instead of .bz2, I strongly recommend using .pbf, the "Protocolbuffer Binary Format", because: "It is about half of the size of a gzipped planet and about 30% smaller than a bzipped planet. It is also about 5x faster to write than a gzipped planet and 6x faster to read than a gzipped planet. The format was designed to support future extensibility and flexibility." More info: http://wiki.openstreetmap.org/wiki/PBF_Format

Related

Cassandra Alter Column type from Timestamp to Date

Is there any way to alter a Cassandra column from timestamp to date without losing data? For example, '2021-02-25 20:30:00+0000' to '2021-02-25'.
If not, what is the easiest way to migrate this column (timestamp) to a new column (date)?

It's impossible to change the type of an existing column, so you need to add a new column with the correct data type and migrate the data. The migration can be done with Spark plus the Spark Cassandra Connector: this is probably the most flexible solution, and it can even run on a single machine with Spark in local master mode (the default). The code could look something like this (try it on test data first):
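import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # needs the Spark Cassandra Connector on the classpath
options = {"table": "tbl", "keyspace": "ks"}
# the target column must exist first, e.g. ALTER TABLE ks.tbl ADD new_name date;
spark.read.format("org.apache.spark.sql.cassandra").options(**options).load() \
    .select("pk_col1", "pk_col2", F.col("timestamp_col").cast("date").alias("new_name")) \
    .write.format("org.apache.spark.sql.cassandra").options(**options) \
    .mode("append").save()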
P.S. You can also use DSBulk, for example, but you need enough disk space to unload the data (although you only need the primary key columns plus your timestamp column).
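One caveat with the Spark approach: casting a timestamp to a date is done in Spark's session time zone (spark.sql.session.timeZone), so timestamps close to midnight can land on a different calendar day than their UTC date. Setting spark.sql.session.timeZone to UTC before the migration is one way to get UTC dates. The Python sketch below only illustrates the effect with the standard library (it does not use Spark); date_in_zone is a hypothetical helper and UTC+9 is just an example offset.

```python
from datetime import datetime, timedelta, timezone

# A timestamp as shown in the question; the +0000 offset is part of the value.
ts = datetime.strptime("2021-02-25 20:30:00+0000", "%Y-%m-%d %H:%M:%S%z")


def date_in_zone(value: datetime, tz: timezone) -> str:
    """Calendar date of an aware datetime as seen in the given time zone."""
    return value.astimezone(tz).date().isoformat()


utc_date = date_in_zone(ts, timezone.utc)                     # '2021-02-25'
plus9_date = date_in_zone(ts, timezone(timedelta(hours=9)))   # '2021-02-26'
print(utc_date, plus9_date)
```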

To add to Alex Ott's answer, Cassandra performs validations that prevent changing the data type of a column. The reason is that SSTables (Cassandra data files) are immutable: once they are written to disk, they are never modified, edited, or updated. They can only be compacted into new SSTables.
Some people try to get around this by dropping the column from the table and then adding it back with a new data type. Unlike in a traditional RDBMS, the existing data in the SSTables doesn't get rewritten, so if you try to read the old data you'll get a CorruptSSTableException, because the CQL type of the data on disk won't match the schema.
For this reason, it is no longer possible to drop/recreate columns with the same name (CASSANDRA-14948). If you're interested, I've explained it in a bit more detail in this post -- https://community.datastax.com/questions/8018/. Cheers!

You can use ToDate to convert it in a query (this does not change the stored column). For example, table Email has a column Date with values like 2001-08-29 13:03:35.000000+0000:
SELECT Date, ToDate(Date) AS Convert FROM keyspace.Email;
 date                            | convert
---------------------------------+------------
 2001-08-29 13:03:35.000000+0000 | 2001-08-29

Hive - the correct way to permanently change the date and type in the entire column

I would be grateful if someone could explain, step by step, how to change the date format and the column type from string to date in a table imported via Hive View into HDP 2.6.5.
The data source is the well-known MovieLens 100K dataset ('u.item' file) from:
https://grouplens.org/datasets/movielens/100k/
$ hive --version is: 1.2.1000.2.6.5.0-292
Date format for the column is: '01-Jan-1995'
Data type of column is: 'string'
ACID transactions are 'On'.
Ultimately, I would like to permanently convert the data in the entire column to the format Hive expects for dates, 'yyyy-MM-dd', and then change the column type to 'Date'.
I have looked at over a dozen threads on similar questions. The problem is not displaying the column this way; that can easily be done with just:
SELECT from_unixtime(unix_timestamp(prod_date,'dd-MMM-yyyy'),'yyyy-MM-dd') FROM moviesnames;
The problem is actually storing it this way. Unfortunately, this cannot be done with the following UPDATE, even though ACID transactions are enabled in the Hive config:
UPDATE moviesnames SET prodate = (select to_date(from_unixtime(UNIX_TIMESTAMP(prod_date,'dd-MMM-yyyy'))) from moviesnames);
What's the easiest way to achieve the above using Hive-SQL? By copying and transforming a column or an entire table?

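Try this. The SET expression is evaluated for each row, so no subquery is needed (Hive does not support subqueries in UPDATE ... SET anyway):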
UPDATE moviesnames SET prodate = to_date(from_unixtime(UNIX_TIMESTAMP(prod_date,'dd-MMM-yyyy')));
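To check the pattern mapping outside Hive (for example, on a few values from u.item before loading), the same per-row conversion can be sketched in Python. to_hive_date is a hypothetical helper; Hive's 'dd-MMM-yyyy' corresponds to strptime's '%d-%b-%Y' and 'yyyy-MM-dd' to '%Y-%m-%d'. Unlike the Hive expression, this sketch raises ValueError on malformed values instead of returning NULL.

```python
from datetime import datetime
from typing import Optional


def to_hive_date(value: str) -> Optional[str]:
    """Convert '01-Jan-1995' (dd-MMM-yyyy) to '1995-01-01' (yyyy-MM-dd).

    Empty strings become None (i.e. NULL in the target column).
    %b matches English month abbreviations under the default C locale.
    """
    value = value.strip()
    if not value:
        return None
    return datetime.strptime(value, "%d-%b-%Y").strftime("%Y-%m-%d")


print(to_hive_date("01-Jan-1995"))  # 1995-01-01
```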

Export CSV From Postgres VIA Command Line

Hello Stack Overflowers!
I'm currently exporting a Postgres table as a .csv using a C# application I developed. I can export tables without any problem using the following commands:
set PGPASSWORD=password
psql -U USERNAME Database_Name
\copy (SELECT * FROM table1) TO C:\xyz\exportfile.csv CSV DELIMITER ',' HEADER;
The problem I'm running into is that the .csv is meant to be used with Tableau, and I run into the same issue when importing it into Excel: both Tableau and Excel turn text fields into numbers. This causes issues specifically when joining on serial numbers on the Tableau side.
I know I can change these fields in Tableau/Excel manually, but I am trying to make sure the end users won't need to. I'd like them to just drag and drop the updated .csv extracts of the PostgreSQL data and start using Tableau without problems. They aren't very tech-savvy. I know you can connect Tableau directly to Postgres, but in this particular case I'm not allowed to, due to limitations beyond my control.
I'm using PostgreSQL 12 and Tableau v2019.4.0
EDIT: As requested, here is some example data. Both fields are TEXT in PostgreSQL, but the export doesn't specify that.
Excel Formatting
ASSETNUM,ITEMNUM
1834,8.11234E+12
1835,8.11234E+12
Notepad Formatting
ASSETNUM,ITEMNUM
1834,8112345673294
1835,8112345673295
Note: If you select the specific cell in Excel it shows the full number.

CSV files don't carry any type information, so programs like Excel and Tableau are free to interpret the data however they like.
However, @JorgeCampos's link provides useful information. For example
"=""123""","=""123"""
gets interpreted differently than
123,123
when you load it into Excel.
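If you want to wrap your data this way, the easiest way is to use PostgreSQL's string concatenation in the exported query, e.g.
SELECT '="' || my_column || '"' AS my_column FROM my_table;
When this query is exported with \copy ... CSV, PostgreSQL doubles the embedded quotes and quotes the whole value, so the file contains "=""123""" as above.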
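If the extract is written by your own application instead of psql, the same trick works with any CSV writer that doubles embedded quotes. A minimal Python sketch, assuming the rows are already in memory; excel_text_csv is a hypothetical helper, and only the columns you name (here ITEMNUM from the example data) are wrapped.

```python
import csv
import io


def excel_text_csv(header, rows, text_columns):
    """Return CSV text in which the named columns are written as ="value"."""
    wrap = {header.index(name) for name in text_columns}
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([
            # Inside an Excel formula string, a literal quote is written as "".
            '="' + str(value).replace('"', '""') + '"' if i in wrap else value
            for i, value in enumerate(row)
        ])
    # csv.writer quotes fields that contain quotes and doubles them, so
    # ="8112345673294" ends up in the file as "=""8112345673294""".
    return buf.getvalue()


print(excel_text_csv(["ASSETNUM", "ITEMNUM"], [["1834", "8112345673294"]], ["ITEMNUM"]))
```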

Get all the tables involved in a SELECT query in PostgreSQL using `libpq`

Is it possible to get the names (or OIDs) of all queried tables with libpq? If there is a generic, standard SQL way, I would prefer it.

It's not standard, but the Postgres EXPLAIN command can give you (more than) what you want.
http://www.postgresql.org/docs/9.3/static/sql-explain.html
If you use the JSON output format, the table names are found in the "Relation Name" attributes of the plan nodes: the top-level "Plan" object and the nodes nested in its "Plans" arrays.
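With libpq you would send EXPLAIN (FORMAT JSON) followed by your query through PQexec and read the single text value with PQgetvalue(res, 0, 0); collecting the names is then a recursive walk over the plan tree. A minimal Python sketch of that walk; relation_names is a hypothetical helper and SAMPLE_PLAN is hand-written, abbreviated output, not real server output. Note that "Relation Name" is not schema-qualified; EXPLAIN (VERBOSE, FORMAT JSON) adds a "Schema" key to the same nodes.

```python
import json

# Hand-written, abbreviated EXPLAIN (FORMAT JSON) output for a two-table join
# (real output contains many more keys per node).
SAMPLE_PLAN = """
[{"Plan": {"Node Type": "Hash Join", "Plans": [
    {"Node Type": "Seq Scan", "Relation Name": "orders", "Alias": "o"},
    {"Node Type": "Hash", "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "customers", "Alias": "c"}
    ]}
]}}]
"""


def relation_names(explain_json):
    """Return the set of "Relation Name" values found anywhere in the plan tree."""
    names = set()

    def walk(node):
        if "Relation Name" in node:
            names.add(node["Relation Name"])
        for child in node.get("Plans", []):  # child nodes, including subplans
            walk(child)

    for entry in json.loads(explain_json):  # top level is a JSON array
        walk(entry["Plan"])
    return names


print(sorted(relation_names(SAMPLE_PLAN)))  # ['customers', 'orders']
```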

Export tables to Flat File with some logic

I'm writing scripts to export some tables to flat files every day. I'm looking at the BCP utility, but I'm not sure it has the kind of features I really need.
For example, I need to output the fields out of order. That is, the 15th field in the MSSQL database should be the 2nd field in the flat file, etc.
More importantly, some of the fields need to be altered. For example, if a certain field is null or contains some special values, I need to replace them with codes.
Is BCP the right tool for this? My gut tells me to do this in Perl instead.

You can write a stored procedure and do all the data transformations there, then feed this stored procedure to bcp.
It will most likely be faster than Perl.
SSIS is fast too, and could be an option if the transformations are very complex.

You can use a query to order and format the columns directly with BCP. From the bcp Utility documentation:
"query": Is a Transact-SQL query that returns a result set.
Example:
bcp "SELECT Name FROM AdventureWorks.Sales.Currency" queryout Currency.Name.dat -T -c
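Combining both answers, a daily export script can put the reordering and the code replacement into the query and hand it to bcp queryout. A minimal Python sketch, assuming bcp is on PATH and Windows authentication (-T) is used; the server, database, table, column names, and replacement codes are all hypothetical, and Col7 is assumed to be a text column.

```python
import subprocess
from datetime import date

# Hypothetical object names: Col15 moves to the 2nd position, and NULL or a
# special value in the text column Col7 is replaced by a code in SQL.
QUERY = (
    "SELECT Col1, Col15, "
    "CASE WHEN Col7 IS NULL THEN 'N/A' "
    "WHEN Col7 = 'SPECIAL' THEN 'SPC' ELSE Col7 END "
    "FROM MyDb.dbo.MyTable"
)


def bcp_args(query, out_path, server, delimiter=","):
    """Build the argument list for: bcp "<query>" queryout <file> ..."""
    return [
        "bcp", query, "queryout", out_path,
        "-S", server,      # SQL Server instance to connect to
        "-T",              # trusted (Windows authentication) connection
        "-c",              # character (text) data format
        "-t", delimiter,   # field terminator
    ]


def export_daily(server):
    """Run bcp once, writing to a date-stamped file; raises if bcp fails."""
    out_path = f"export_{date.today():%Y%m%d}.txt"
    subprocess.run(bcp_args(QUERY, out_path, server), check=True)
    return out_path
```

Passing the arguments as a list avoids building a shell command string by hand, so the quotes inside the query need no extra escaping.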
