Data mismatch in AWS DMS Bulk load vs CDC - postgresql

I have a PostGIS database (source) which I have migrated with AWS DMS to an S3 bucket (target) as Parquet files.
There is a column named point of type geometry(Point, 4326) in the source, which DMS converts to a string that looks like this in the target:
- In the full (bulk) load it looks like: "point": "<Point srsName=\"EPSG:4326\"><coordinates>72.836903300000003,19.0823766</coordinates></Point>"
- In CDC it looks like: "point": "0101000020E610000051D77F4262855EC09591C4DCFFB54240"
I am able to get the coordinates back from the full-load string by string parsing, but during CDC the value is a hexadecimal string (hex-encoded WKB) that I do not know how to decode back into coordinates.

Welcome to SO.
In PostgreSQL, in order to display geometries in a format other than (hex-encoded) WKB, you have to state it explicitly in your query. In your case, ST_AsGML:
SELECT ST_AsGML('0101000020E610000051D77F4262855EC09591C4DCFFB54240');
st_asgml
-------------------------------------------------------------------------------------------------------
<gml:Point srsName="EPSG:4326"><gml:coordinates>-122.0841223,37.4218708</gml:coordinates></gml:Point>
(1 row)
Or something like this if you wish to omit the namespace (as your example suggests):
SELECT ST_AsGML('0101000020E610000051D77F4262855EC09591C4DCFFB54240', 15, 0, '', '');
st_asgml
---------------------------------------------------------------------------------------
<Point srsName="EPSG:4326"><coordinates>-122.0841223,37.4218708</coordinates></Point>
(1 row)
See also: Converting geometries in PostGIS
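If you need to decode the CDC value outside the database (for example while post-processing the Parquet files in S3), the same hex string can also be parsed client-side. Below is a minimal Python sketch that decodes a 2D EWKB point by hand; it assumes the value is a point with an embedded SRID, exactly as in your sample, and the function name is just illustrative:

import struct

def decode_ewkb_point(hex_str):
    """Decode a hex-encoded (E)WKB 2D point into (srid, x, y)."""
    raw = bytes.fromhex(hex_str)
    order = "<" if raw[0] == 1 else ">"   # byte 0: 1 = little endian, 0 = big endian
    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    if (geom_type & 0xFF) != 1:           # base geometry type 1 = Point
        raise ValueError("not a point geometry")
    if geom_type & 0x20000000:            # EWKB SRID flag: 4 extra bytes hold the SRID
        (srid,) = struct.unpack_from(order + "I", raw, 5)
        x, y = struct.unpack_from(order + "dd", raw, 9)
    else:
        srid = None
        x, y = struct.unpack_from(order + "dd", raw, 5)
    return srid, x, y

print(decode_ewkb_point("0101000020E610000051D77F4262855EC09591C4DCFFB54240"))
# (4326, -122.0841223, 37.4218708)

If you would rather not decode it by hand, Shapely can also load hex-encoded WKB, e.g. shapely.wkb.loads(value, hex=True).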

Related

PostgreSQL open JSONB column with slashes

How is it possible to parse a JSONB column where the JSON is stored as a string with escaped quotes (slashes)?
"{\"id\":\"c39fe0f5f9b7c89b005bf3491f2a2ce1\",\"token\":\"c39fe0f5f9b7c89b005bf3491f2a2ce1\",\"line_items\":[{\"id\":32968150843480,\"properties\":{},\"quantity\":2,\"variant_id\":32968150843480,\"key\":\"32968150843480:4a6f6b7d19c7aef119af2cd909f429f1\",\"discounted_price\":\"40.00\",\"discounts\":[],\"gift_card\":false,\"grams\":0,\"line_price\":\"80.00\",\"original_line_price\":\"80.00\",\"original_price\":\"40.00\",\"price\":\"40.00\",\"product_id\":4638774493272,\"sku\":\"36457537-mud-yellow-28\",\"taxable\":false,\"title\":\"Knee Length Summer Shorts - Camel / 28\",\"total_discount\":\"0.00\",\"vendor\":\"Other\",\"discounted_price_set\":{\"shop_money\":{\"amount\":\"40.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"40.0\",\"currency_code\":\"USD\"}},\"line_price_set\":{\"shop_money\":{\"amount\":\"80.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"80.0\",\"currency_code\":\"USD\"}},\"original_line_price_set\":{\"shop_money\":{\"amount\":\"80.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"80.0\",\"currency_code\":\"USD\"}},\"price_set\":{\"shop_money\":{\"amount\":\"40.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"40.0\",\"currency_code\":\"USD\"}},\"total_discount_set\":{\"shop_money\":{\"amount\":\"0.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"0.0\",\"currency_code\":\"USD\"}}}],\"note\":null,\"updated_at\":\"2022-03-15T13:24:02.787Z\",\"created_at\":\"2022-03-15T13:23:31.912Z\",\"controller\":\"custom_webhooks\",\"action\":\"store_data\",\"custom_webhook\":{\"id\":\"c39fe0f5f9b7c89b005bf3491f2a2ce1\",\"token\":\"c39fe0f5f9b7c89b005bf3491f2a2ce1\",\"line_items\":[{\"id\":32968150843480,\"properties\":{},\"quantity\":2,\"variant_id\":32968150843480,\"key\":\"32968150843480:4a6f6b7d19c7aef119af2cd909f429f1\",\"discounted_price\":\"40.00\",\"discounts\":[],\"gift_card\":false,\"grams\":0,\"line_price\":\"80.00\",\"original_line_price\":\"80.00\",\"original_price\":\"40.00\",\"price\":\"40.00\",\"product_id\":4638774493272,\"sku\":\"36457537-mud-yellow-28\",\"taxable\":false,\"title\":\"Knee Length Summer Shorts - Camel / 28\",\"total_discount\":\"0.00\",\"vendor\":\"Other\",\"discounted_price_set\":{\"shop_money\":{\"amount\":\"40.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"40.0\",\"currency_code\":\"USD\"}},\"line_price_set\":{\"shop_money\":{\"amount\":\"80.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"80.0\",\"currency_code\":\"USD\"}},\"original_line_price_set\":{\"shop_money\":{\"amount\":\"80.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"80.0\",\"currency_code\":\"USD\"}},\"price_set\":{\"shop_money\":{\"amount\":\"40.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"40.0\",\"currency_code\":\"USD\"}},\"total_discount_set\":{\"shop_money\":{\"amount\":\"0.0\",\"currency_code\":\"USD\"},\"presentment_money\":{\"amount\":\"0.0\",\"currency_code\":\"USD\"}}}],\"note\":null,\"updated_at\":\"2022-03-15T13:24:02.787Z\",\"created_at\":\"2022-03-15T13:23:31.912Z\"}}"
This is what the JSONB column really contains.
I cannot find any example of how to deal with this type of JSONB.
Whatever is inserting your data is screwing it up. It is taking the string representation of a JSON object and stuffing that into a JSON string scalar. You need to fix that or it will just keep happening.
To fix what is already there, you need to extract the real PostgreSQL string out of the JSON string, then cast that to JSONB. Extracting a JSON string can be done unintuitively with #>>'{}', or even less intuitively with ->>0.
select (data#>>'{}')::jsonb from table_name;
Of course you should fix it permanently, not just do it on the fly all the time, which is both slow and confusing.
update table_name set data=(data#>>'{}')::jsonb;
Of course fixing the tool which screws this up in the first place, and fixing the historical data, need to be done in a coordinated fashion or you will have a glorious mess on your hands.
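For illustration, here is a self-contained example of that extraction (the literal is a hypothetical, heavily shortened version of your value):

-- the stored value is a JSON *string* scalar whose content is itself JSON
SELECT ('"{\"id\":\"c39f\",\"quantity\":2}"'::jsonb #>> '{}')::jsonb -> 'quantity';
-- returns 2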
I think you have a wrongly formatted string in the jsonb field. You can try to fix it this way:
select trim(both '"' from replace(data::varchar, '\"', '"'))::jsonb data from tbl;

Azure Data Factory schema mapping not working with SQL sink

I have a simple pipeline that loads data from a csv file to an Azure SQL db.
I have added a data flow where I have ensured the schema matches the SQL table. There is a specific field which contains numbers with leading zeros. The data type in the source projection is set to string. The field is mapped to the SQL sink, where it also shows the string data type. The field in SQL has the nvarchar(50) data type.
Once the pipeline is run, all the leading zeros are lost and the field appears to be treated as decimal:
Original data: 0012345
Inserted data: 12345.0
The CSV data shown in the data preview displays correctly; however, for some reason it loses its formatting during the insert.
Any ideas how I can get it to insert correctly?
I reproduced this in my lab and was able to load the data as expected. Please see the repro details below.
Source file (CSV file):
Sink table (SQL table):
ADF:
Connect the data flow source to the CSV source file. As my file is in text format, all the source columns in the projection are strings.
Source data preview:
Connect sink to Azure SQL database to load the data to the destination table.
Data in Azure SQL database table.
Note: You can also add a derived column before the sink to convert the value to a string, since the sink data type is a string.
Thank you very much for your response.
As per your post, the data flow appears to be working correctly. I have finally discovered an issue with the transformation: I have an Azure Batch service which runs a Python script that does a basic transformation and saves the output to a CSV file.
Interestingly, when I preview the data in the dataflow, it looks as expected. However, the values stored in SQL are not.
For the sake of others having a similar issue: my existing Python script converted a 'float' column directly to string type. Upon conversion it retained one decimal place, and since all of my numbers are integers they ended up with a trailing .0.
The solution was to convert values to integer and then to string:
df['col_name'] = df['col_name'].astype('Int64').astype('str')
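For reference, a minimal self-contained illustration of the difference (the column name is just a placeholder):

import pandas as pd

df = pd.DataFrame({"col_name": [12345.0, 67.0]})

# float -> str keeps the trailing ".0"
print(df["col_name"].astype("str").tolist())                  # ['12345.0', '67.0']

# float -> nullable Int64 -> str drops the decimal part
print(df["col_name"].astype("Int64").astype("str").tolist())  # ['12345', '67']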

Hive - the correct way to permanently change the date format and type of an entire column

I would be grateful if someone could explain, step by step, what the process of changing the date format and the column type from string to date should look like for a table imported via Hive View into HDP 2.6.5.
The data source is the well-known MovieLens 100K Dataset set ('u.item' file) from:
https://grouplens.org/datasets/movielens/100k/
$ hive --version is: 1.2.1000.2.6.5.0-292
Date format for the column is: '01-Jan-1995'
Data type of column is: 'string'
ACID Transactions is 'On'
Ultimately, I would like to permanently convert the data in the entire column to the correct Hive format 'yyyy-MM-dd' and then change the column type to 'Date'.
I have looked at over a dozen threads regarding similar questions before. Of course, the problem is not how to display the column like this; that can easily be done using just:
SELECT from_unixtime(unix_timestamp(prod_date,'dd-MMM-yyyy'),'yyyy-MM-dd') FROM moviesnames;
The problem is to actually write it back this way. Unfortunately, this cannot be done via UPDATE in the following way, despite ACID transactions being enabled in the Hive config.
UPDATE moviesnames SET prodate = (select to_date(from_unixtime(UNIX_TIMESTAMP(prod_date,'dd-MMM-yyyy'))) from moviesnames);
What's the easiest way to achieve the above using Hive-SQL? By copying and transforming a column or an entire table?
Try this:
UPDATE moviesnames SET prodate = to_date(from_unixtime(UNIX_TIMESTAMP(prod_date,'dd-MMM-yyyy')));
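If the in-place UPDATE keeps failing (ACID UPDATEs in Hive 1.2 only work on bucketed, transactional ORC tables), an alternative is to copy and transform the whole table and change the column type in the same step. A rough sketch, not tested on HDP 2.6.5, with the other column names as placeholders:

CREATE TABLE moviesnames_new AS
SELECT
    movie_id,
    title,
    CAST(to_date(from_unixtime(unix_timestamp(prod_date, 'dd-MMM-yyyy'))) AS DATE) AS prod_date
FROM moviesnames;

You can then drop the old table and rename the new one, or INSERT OVERWRITE into a table created up front with the desired schema.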

Copying GeoJSON data from S3 to Redshift

I have spatial data in GeoJSON format. I want to copy this data into Redshift from S3. Can you please help me create the table and copy the data into it? I want to know the COPY command.
Redshift's COPY command currently supports ingestion of geometries from (hexadecimal) WKB/EWKB format only. We currently do not support ingestion from GeoJSON. https://docs.aws.amazon.com/redshift/latest/dg/geospatial-overview.html
Alternatively, you can ingest the data in WKT format as VARCHAR(MAX) and then convert to GEOMETRY using the ST_GeomFromText() function. Using this method the WKT description of a geometry is limited to the 64KB max VARCHAR size.
More info: https://docs.aws.amazon.com/redshift/latest/dg/spatial-limitations.html
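As a rough sketch of that second approach (you would convert each GeoJSON feature's geometry to WKT beforehand, e.g. as a column in a CSV; every table name, column and the S3 path below are made up):

-- staging table that holds the geometry as WKT text
CREATE TABLE places_staging (
    id       INTEGER,
    name     VARCHAR(200),
    geom_wkt VARCHAR(MAX)
);

COPY places_staging
FROM 's3://my-bucket/places.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
CSV;

-- final table with a real GEOMETRY column
CREATE TABLE places (
    id   INTEGER,
    name VARCHAR(200),
    geom GEOMETRY
);

INSERT INTO places
SELECT id, name, ST_GeomFromText(geom_wkt)
FROM places_staging;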

osm2pgsql data converting: lost columns

I've converted OSM data from the *.bz2 format into a PostgreSQL database using osm2pgsql. But after converting, I don't see columns such as lanes and maxspeed in the planet_osm_roads table.
Can someone explain where these columns are? Thanks.
Add the option -k when using osm2pgsql
osm2pgsql -d geodatabase -k planet.osm.bz2
-k|--hstore Add tags without column to an additional hstore (key/value) column to postgresql tables
Explanation: osm2pgsql normally imports the data into a static database schema. Tags without a corresponding column are ignored. By adding the option -k or --hstore, osm2pgsql will add a new hstore column tags to each table and store there all tags that have no dedicated column.
Depending on your needs, you can use -j instead, which makes osm2pgsql save ALL tags in the tags column, i.e. including the tags that also have a database column.
-j|--hstore-all Add all tags to an additional hstore (key/value) column in postgresql tables
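Usage is analogous to the -k example above:
osm2pgsql -d geodatabase -j planet.osm.bz2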
After the import, to extract all maxspeed tags from the database, you can use a query like this (for example):
SELECT osm_id, name, tags -> 'maxspeed' FROM planet_osm_roads;
where tags is the hstore column and -> is a hstore operator.
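For example, to limit the result to roads that actually carry a maxspeed tag, you can add the hstore ? (contains-key) operator:

SELECT osm_id, name, tags -> 'maxspeed' AS maxspeed
FROM planet_osm_roads
WHERE tags ? 'maxspeed';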
See the PostgreSQL documentation for more info about the hstore type and its operators: http://www.postgresql.org/docs/9.3/static/hstore.html
This would better be a comment, but I don't have enough reputation to post one: instead of .bz2, I strongly recommend using .pbf, the "Protocolbuffer Binary Format", because: "It is about half of the size of a gzipped planet and about 30% smaller than a bzipped planet. It is also about 5x faster to write than a gzipped planet and 6x faster to read than a gzipped planet. The format was designed to support future extensibility and flexibility." More info: http://wiki.openstreetmap.org/wiki/PBF_Format