Display DateTime in Hive - pyspark

I have a SQL Server table with a column of type
datetimeoffset(7) => 2022-07-22 09:25:47.2910000 +00:00
which is converted with
convert(varchar(50),col1,126) => 2022-07-22T09:25:47.2910000 +00:00
and in the Hive table the same value is stored as a string:
hive => string => 2022-07-22T09:25:47.291Z
I do not understand the Hive format. What datetime format is Hive using here?
In PySpark, I need to convert the Hive value to the same format that convert(varchar(50),col1,126) produces in SQL Server.
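
The Hive value looks like an ISO 8601 UTC timestamp with millisecond precision and a trailing Z. A minimal sketch of the string conversion in plain Python follows. It assumes every value is UTC, ends in Z, and has a fractional part of 1 to 6 digits. The function name to_style_126 is made up for illustration, and the target shape is the one shown in the question. In PySpark a function like this could be wrapped with pyspark.sql.functions.udf and applied to the string column.

```python
from datetime import datetime, timezone


def to_style_126(value: str) -> str:
    """Turn '2022-07-22T09:25:47.291Z' into '2022-07-22T09:25:47.2910000 +00:00'.

    Assumption: the input is always UTC ("Z" suffix) and always has a
    fractional-seconds part; other shapes raise ValueError.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # %f prints 6 fractional digits; append one "0" to reach the 7 digits
    # of datetimeoffset(7), then the fixed UTC offset from the question.
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.%f") + "0 +00:00"


print(to_style_126("2022-07-22T09:25:47.291Z"))
# 2022-07-22T09:25:47.2910000 +00:00
```

Values without a fractional part, or with a non-UTC offset, would need extra handling that is not covered here.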

Related

Store JSONB PostgreSQL data type column into Athena

I am creating an Athena external table on a CSV that I generated from my PostgreSQL database.
The CSV contains a column that has the jsonb datatype.
If possible, I want to exclude this column from the table created in Athena; otherwise, kindly suggest a way to include this datatype.

Is there a way to convert a varchar column in dd-mm-yyyy format to a date column in yyyy-mm-dd format in postgresql?

I am working on PostgreSQL.
I have a column named curr_date in my table. Its datatype is currently varchar, but the column stores dates in the format dd-mm-yyyy.
Now I want to change its datatype to date, but in order to do that I first have to convert all the values in the column from dd-mm-yyyy format to yyyy-mm-dd format.
Only then can I use the query alter table alter column curr_date type date using curr_date::date;
Is there a way to convert this format? I am open to using a dummy column to make the changes too.
You can do that in a single statement:
ALTER TABLE mytable
ALTER col TYPE date USING to_date(col, 'DD-MM-YYYY');
That will explicitly convert the data from the old format to the new format.
A change like this will cause the table to be rewritten, which can take a while if the table is large. During that time, the table is inaccessible even for SELECT statements.
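
Because the rewrite fails as a whole if any value cannot be converted, it may help to check the existing values first. The sketch below is only a rough pre-check in Python, assuming the values have already been fetched into a list. The helper names find_unparseable and to_iso_date are invented for this example, and Python's strptime does not follow exactly the same rules as PostgreSQL's to_date.

```python
from datetime import datetime


def find_unparseable(values, fmt="%d-%m-%Y"):
    """Return the values that do not parse as dd-mm-yyyy dates."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            bad.append(value)
    return bad


def to_iso_date(value, fmt="%d-%m-%Y"):
    """Convert one dd-mm-yyyy string to yyyy-mm-dd."""
    return datetime.strptime(value, fmt).date().isoformat()


sample = ["16-01-2015", "31-02-2020", "2015-01-16"]
print(find_unparseable(sample))   # ['31-02-2020', '2015-01-16']
print(to_iso_date("16-01-2015"))  # 2015-01-16
```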

Unable to read jsonb columns in Postgres as StructType in Spark

I am trying to create a Spark DataFrame by reading a Postgres table. The Postgres table has some columns of type json and jsonb. Instead of parsing these columns as StructType, Spark reads them as StringType. How can this be fixed?

Field abc has changed type from DATETIME to TIMESTAMP

Goal: Loading parquet file in GCS bucket to BigQuery.
I have a transformed parquet file with a column of datatype "TIMESTAMP" (converted using Apache Spark SQL). The datatype of the column in the target table in BigQuery is "DATETIME". The data is a 'date and time without timezone'.
While loading this data to BigQuery, the following error is thrown:
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException:
Provided Schema does not match Table table_name. Field abc has changed type from DATETIME to TIMESTAMP
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job.reload(Job.java:411)
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job.waitFor(Job.java:248)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.finishedJob$lzycompute$1(BigQueryWriteHelper.scala:153)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.finishedJob$1(BigQueryWriteHelper.scala:153)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.loadDataToBigQuery(BigQueryWriteHelper.scala:155)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.writeDataFrameToBigQuery(BigQueryWriteHelper.scala:90)
... 31 more

Timestamp in postgresql during oracle to postgresql migration

I have a table in Oracle with timestamp data in the format "JAN-16-15 05.10.14.034000000 PM".
When I created the table in PostgreSQL with "col_name timestamp", it shows the data as "2015-01-16 17:10:14.034".
Any suggestions on how I can set the column so that the data format in PostgreSQL is the same as what I have in Oracle?
Timestamps (or dates or numbers) do not have any "format", neither in Postgres nor in Oracle nor in any other relational database.
Any "format" you see is applied by your SQL client when it displays those values.
You need to configure your SQL client to use a different format for timestamp, or use the to_char() function to format the value as you want.
In particular, to get the format you desire, use
SELECT to_char(current_timestamp, 'MON-DD-YY HH.MI.SS.US000 AM');
The output format can be changed in psql by changing the DateStyle parameter, but I would strongly recommend not changing it away from the default ISO format, as that also affects how input is parsed.
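
The same idea, that the stored value is one thing and the displayed text is another, can be shown on the client side. The sketch below is in Python and assumes the default C/English locale for the month abbreviation and AM/PM marker. The function name oracle_style is made up for illustration. It renders one timestamp value in both of the styles from the question.

```python
from datetime import datetime


def oracle_style(ts: datetime) -> str:
    """Format a timestamp like 'JAN-16-15 05.10.14.034000000 PM'."""
    # %b = abbreviated month name, %I = 12-hour clock,
    # %f = 6-digit microseconds, literal "000" pads to 9 digits, %p = AM/PM.
    # Assumption: %b and %p produce English text (default C locale).
    return ts.strftime("%b-%d-%y %I.%M.%S.%f000 %p").upper()


ts = datetime(2015, 1, 16, 17, 10, 14, 34000)
print(ts.isoformat(sep=" ", timespec="milliseconds"))  # 2015-01-16 17:10:14.034
print(oracle_style(ts))  # JAN-16-15 05.10.14.034000000 PM
```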