Debezium date/time field value out of range: 0000-12-30T00:00:00Z - PostgreSQL

We use Debezium to sync data from PostgreSQL to Kafka.
In the source table we have a timestamptz column start_at. When the value is "zero", start_at = '0001-01-01 00:00:00.000000 +00:00', but when we check the data in Kafka it has changed to start_at = '0000-12-30T00:00:00Z'. This leads to an error when we use the JDBC sink connector to write to another Postgres DB:
ERROR: date/time field value out of range: "0000-12-30T00:00:00Z"
Here is my Debezium config for time.precision.mode:
"time.precision.mode": "connect",
"decimal.handling.mode": "double",
Is there any way to make the data in Kafka match the zero timestamp value in Postgres?
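For reference, the two properties above sit inside the full connector registration. A minimal sketch, assuming the PostgreSQL connector class and a recent Debezium release; the connection details and names are placeholders, and only the last two properties come from the post:
{
  "name": "pg-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "<host>",
    "database.port": "5432",
    "database.user": "<user>",
    "database.password": "<password>",
    "database.dbname": "<dbname>",
    "topic.prefix": "<topic-prefix>",
    "table.include.list": "public.<source_table>",
    "time.precision.mode": "connect",
    "decimal.handling.mode": "double"
  }
}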

Related

Debezium or JDBC skip tables without primary key

I have Debezium in a container, capturing all changes to PostgreSQL database records. In addition I have a Confluent JDBC container to write all changes to another database.
The source connector definition allows multiple tables to be captured, so some of the captured tables have primary keys and some do not.
In the sink connector the definition specifies pk.mode like the following:
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.mode": "record_key",
But because some tables in the source database have no primary key, the sink connector throws the following error:
Caused by: org.apache.kafka.connect.errors.ConnectException: PK mode for table 'contex_str_dealer_branch_address' is RECORD_KEY, but record key schema is missing
Normally there should be a few options:
to exclude tables without primary keys from the source
to exclude tables without primary keys from the sink
to use the primary key for tables that have one, and all other columns for tables that do not
Is there any way to skip those tables from any operation?
In Debezium there isn't a parameter to filter out tables without primary keys.
If you know the tables in advance, you can use table.exclude.list in the Debezium source connector.
In the JDBC sink connector you have two options:
"pk.mode": "none", where no fields are used as the primary key, or
"pk.mode": "record_value", where all fields from the value struct are used.

Creating a table from topic with value of String type in KSQLDB

How can one create a table from a topic whose value is of type String?
We have some topics that contain RDF data embedded inside strings; in a sense it is just a string value. Based on the ksqlDB documentation, we need to use VALUE_FORMAT='KAFKA' with WRAP_SINGLE_VALUE=false, given that it is an anonymous value.
CREATE SOURCE TABLE source_table_proxy (
  key VARCHAR PRIMARY KEY,
  value VARCHAR
) WITH (
  KEY_FORMAT = 'KAFKA',
  VALUE_FORMAT = 'KAFKA',
  WRAP_SINGLE_VALUE = false,
  KAFKA_TOPIC = 'topic'
);
This is the topic info:
Key Type: STRING
Value Type: STRING
Partitions: 12
Replication: 1
Weirdly we get the following error:
The 'KAFKA' format only supports a single field. Got: [`VALUE` STRING, `ROWPARTITION` INTEGER, `ROWOFFSET` BIGINT]
Is there any workaround for this issue?
It's unclear why you need the KAFKA format. The JSON format will work for plain (primitive) strings as well; it supports reading and writing top-level primitives, arrays and maps.
For example, given a SQL statement with only a single field in the value schema and the WRAP_SINGLE_VALUE property set to false:
CREATE STREAM x (ID BIGINT) WITH (VALUE_FORMAT='JSON', WRAP_SINGLE_VALUE=false, ...);
And a JSON value of:
10
ksqlDB can deserialize the value into the ID field of the stream.
https://docs.ksqldb.io/en/latest/reference/serialization/#top-level-primitives-arrays-and-maps
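Applied to the table from the question, that would look something like this (an untested sketch; it assumes the raw values in the topic are valid JSON scalars, i.e. quoted strings):
CREATE SOURCE TABLE source_table_proxy (
  key VARCHAR PRIMARY KEY,
  value VARCHAR
) WITH (
  KEY_FORMAT = 'KAFKA',
  VALUE_FORMAT = 'JSON',
  WRAP_SINGLE_VALUE = false,
  KAFKA_TOPIC = 'topic'
);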

Field abc has changed type from DATETIME to TIMESTAMP

Goal: load a Parquet file in a GCS bucket into BigQuery.
I have a transformed Parquet file with a column whose datatype is "TIMESTAMP" (converted using Apache Spark SQL). The datatype of that column in the target table in BigQuery is "DATETIME". The data is a 'date and time without timezone'.
While loading this data to BigQuery, the following error is thrown:
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException:
Provided Schema does not match Table table_name. Field abc has changed type from DATETIME to TIMESTAMP
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job.reload(Job.java:411)
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job.waitFor(Job.java:248)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.finishedJob$lzycompute$1(BigQueryWriteHelper.scala:153)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.finishedJob$1(BigQueryWriteHelper.scala:153)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.loadDataToBigQuery(BigQueryWriteHelper.scala:155)
at com.google.cloud.spark.bigquery.BigQueryWriteHelper.writeDataFrameToBigQuery(BigQueryWriteHelper.scala:90)
... 31 more
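For reference, a minimal sketch of the kind of write that hits this error, with placeholder bucket and table names. Printing the Spark schema first helps confirm that the column really is a Spark TIMESTAMP (which is timezone-aware and which the spark-bigquery connector maps to BigQuery TIMESTAMP, not DATETIME):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder GCS path; the "abc" column shows up as timestamp in the Spark schema.
df = spark.read.parquet("gs://<bucket>/<path>/")
df.printSchema()

# Placeholder project/dataset/table and temporary bucket names.
(df.write
   .format("bigquery")
   .option("table", "<project>.<dataset>.<table_name>")
   .option("temporaryGcsBucket", "<temp-bucket>")
   .mode("append")
   .save())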

Hive Partition Table with Date Datatype via Spark

I have a scenario and would like to get an expert opinion on it.
I have to load a Hive table in partitions from a relational DB via Spark (Python). I cannot create the Hive table up front, as I am not sure how many columns there are in the source and they might change in the future, so I have to fetch the data using select * from tablename.
However, I am sure of the partition column and know that it will not change. This column is of "date" datatype in the source DB.
I am using saveAsTable with the partitionBy option and am able to properly create folders per the partition column. The Hive table is also getting created.
The issue I am facing is that the partition column is of "date" data type, which is not supported in Hive for partition columns. Because of this, I am unable to read the data via Hive or Impala queries, as they report that date is not supported as a partition column.
Please note that I cannot typecast the column at the time of issuing the select statement, as I have to do a select * from tablename, not select a, b, cast(c) as varchar from table.
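One way to sidestep that constraint, sketched below under the assumption that the partition column is called part_dt and the source is read over JDBC (all names and the connection URL are placeholders), is to keep select * on the database side and cast only the known partition column in Spark afterwards:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Fetch everything; no need to enumerate or cast columns in the SQL itself.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:<vendor>://<host>:<port>/<database>")
      .option("dbtable", "tablename")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())

# Cast only the known, stable partition column; every other column passes through untouched.
df = df.withColumn("part_dt", col("part_dt").cast("string"))

(df.write
   .mode("overwrite")
   .partitionBy("part_dt")
   .saveAsTable("<hive_db>.tablename"))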

Timestamp in postgresql during oracle to postgresql migration

I have a table in Oracle with timestamp data in "JAN-16-15 05.10.14.034000000 PM".
When I created the table in postgresql with "col_name timestamp" it is showing data as "2015-01-16 17:10:14.034".
Any suggestions on how I can set up the column so that the data format in Postgres matches what I have in Oracle?
Timestamps (or dates or numbers) do not have any "format". Neither in Postgres, nor in Oracle, nor in any other relational database.
Any "format" you see is applied by the SQL client displaying those values.
You need to configure your SQL client to use a different format for timestamps, or use the to_char() function to format the value as you want.
In particular, to get the format you desire, use
SELECT to_char(current_timestamp, 'MON-DD-YY HH.MI.SS.US000 AM');
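Applied to the value from the question, for example:
SELECT to_char(timestamp '2015-01-16 17:10:14.034', 'MON-DD-YY HH.MI.SS.US000 AM');
-- expected output: JAN-16-15 05.10.14.034000000 PM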
The output format can be changed in psql by changing the DateStyle parameter, but I would strongly recommend not changing it away from the default ISO format, as it also affects how input values are parsed.