Debezium: MySQL LONGTEXT to Debezium data type conversion is not correct - debezium

MySQL schema:
`Info` longtext,
Debezium (Avro) schema for the same field:
{
  "name": "Info",
  "type": [
    "null",
    "string"
  ],
  "default": null
},
When this data is loaded into Redshift it fails: Redshift expects a large data type, i.e. VARCHAR(MAX), but the column is created as VARCHAR(255), since Debezium is not translating LONGTEXT into a larger type.
Please suggest why this is happening.

Please take a look at https://debezium.io/documentation/reference/1.2/connectors/mysql.html#mysql-property-column-propagate-source-type
This will add the type constraint parameters to the schema.
Also, IIUC you are using the Confluent Avro Converter. If so, set enhanced.avro.schema.support and connect.meta.data to true.
You will then need to convert the Debezium constraint parameters into ones supported by the sink side, if such functionality is provided.
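For illustration, the relevant pieces might look like this in .properties form (a sketch only: the database, table, and column names are placeholders, and it assumes the Avro converter is configured per connector rather than in the worker config):
# Sketch: mydb.mytable.Info is a placeholder for your fully-qualified LONGTEXT column.
connector.class=io.debezium.connector.mysql.MySqlConnector
column.propagate.source.type=mydb.mytable.Info
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
value.converter.enhanced.avro.schema.support=true
value.converter.connect.meta.data=true
With this in place the field's schema carries the source column type and length as schema parameters, which the sink side can then map to VARCHAR(MAX) if it supports that.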

Related

debezium date/time field value out of range: 0000-12-30T00:00:00Z

We use Debezium to sync data to another Postgres database.
In the source table we have a timestamptz column start_at. When the value is the "zero" timestamp, start_at = '0001-01-01 00:00:00.000000 +00:00', but when we check the data in Kafka it has changed to start_at = '0000-12-30T00:00:00Z'. This leads to an error when the JDBC sink connector writes to the other Postgres database:
ERROR: date/time field value out of range: "0000-12-30T00:00:00Z"
Here is my Debezium config for time.precision.mode:
"time.precision.mode": "connect",
"decimal.handling.mode": "double",
Is there any way to make the data in Kafka match the zero timestamp value in Postgres?

Debezium or JDBC skip tables without primary key

I have Debezium running in a container, capturing all changes to PostgreSQL database records. In addition, I have a Confluent JDBC sink container writing all the changes to another database.
The source connector is configured to capture multiple tables; some of those tables have primary keys and some do not.
In the sink connector, pk.mode is configured like this:
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.mode": "record_key",
But because some tables in the source database have no primary key, the sink connector throws the following error:
Caused by: org.apache.kafka.connect.errors.ConnectException: PK mode for table 'contex_str_dealer_branch_address' is RECORD_KEY, but record key schema is missing
Normally there should be a few options:
to exclude tables without primary keys from the source
to exclude tables without primary keys from the sink
to use the primary key where a table has one, and all other columns where it does not
Is there any way to skip those tables from any operation?
Debezium does not have a parameter that filters out tables without primary keys.
If you know the tables in advance, you can use table.exclude.list in the Debezium source connector.
In the JDBC sink connector you have two options:
Keys are ignored, as no fields are used as primary key:
"pk.mode": "none"
Or all fields from the value struct are used:
"pk.mode": "record_value"

How do I define the type of a field starting with a #?

I am trying to create a stream from some Kafka messages in JSON format like:
{
"beat": {
"name": "xxxxxxx",
"hostname": "xxxxxxxxxx",
"version": "zzzzz"
},
"log_instance": "forwarder-2",
"type": "prod",
"message": "{ ... json string.... }",
"#timestamp": "2020-06-14T23:31:33.925Z",
"input_type": "log",
"#version": "1"
}
I tried using
CREATE STREAM S (
beat STRUCT<
name VARCHAR,
hostname VARCHAR,
version VARCHAR
>,
log_instance VARCHAR,
type VARCHAR,
message VARCHAR, # for brevity - I also have this with a struct
#timestamp VARCHAR,
input_type VARCHAR,
#version VARCHAR )
WITH (KAFKA_TOPIC='some_topic', VALUE_FORMAT='JSON');
However, I get an error:
Caused by: line 10:5: extraneous input '#' expecting ....
I tried quoting and a preceding underscore, but no luck. I also tried creating an entry in the registry, but I could not create legit Avro this way.
PS. How do I "bind" a topic to a registry schema?
Thanks.
If you're on a recent enough version of ksqlDB then simply quoting the column names with invalid characters should work:
CREATE STREAM S (
beat STRUCT<
name VARCHAR,
hostname VARCHAR,
version VARCHAR
>,
log_instance VARCHAR,
type VARCHAR,
message VARCHAR, -- for brevity - I also have this with a struct
`#timestamp` VARCHAR,
input_type VARCHAR,
`#version` VARCHAR )
WITH (KAFKA_TOPIC='some_topic', VALUE_FORMAT='JSON');
If the above doesn't work, then it's likely you're on an old version of ksqlDB. Upgrading should fix this issue.
PS. How do I "bind" a topic to a registry schema?
ksqlDB will auto-publish the JSON schema to the Schema Registry if you use the JSON_SR format, rather than just JSON. The latter only supports reading the schema from the schema registry.
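For example, a sketch reusing a couple of the columns above (the stream name S_SR is made up):
-- Sketch: with JSON_SR the value schema is published under the subject some_topic-value.
CREATE STREAM S_SR (
  log_instance VARCHAR,
  `#timestamp` VARCHAR,
  `#version` VARCHAR )
WITH (KAFKA_TOPIC='some_topic', VALUE_FORMAT='JSON_SR');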
If you're more asking how you register a schema in the SR for an existing topic... then you're best off looking at the SR docs. Note, ksqlDB only supports the TopicNameStrategy naming strategy. The value schema has the subject {topic-name}-value, e.g. the following registers a schema for the test topic's values:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"Payment\",\"namespace\":\"io.confluent.examples.clients.basicavro\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"}]}"}' http://localhost:8081/subjects/test-value/versions
See the SR tutorial for more info: https://docs.confluent.io/current/schema-registry/schema_registry_tutorial.html
I also tried creating an entry in the registry but I could not create legit avro this way.
Avro does not allow # in its field names. However, it looks like your data is in JSON format, which does allow #. See the curl example above for the registration mechanism; for JSON data you would register a JSON schema (schemaType "JSON") rather than an Avro one.

NiFi: Insert CSV file into Postgres database with date fields

I would like to insert a CSV file into my Postgres database. I use these processors:
GetFile ->
Split (because the files are big) ->
UpdateAttribute (to add avro.schema) ->
ConvertCSVToAvro ->
PutDatabaseRecord
If I use only string/text fields (in my Avro schema and in the Postgres columns), the result is OK.
But when I try to format the date fields, I get an error.
My raw data (CSV) is:
date_export|num_etiquette|key
07/11/2019 01:36:00|BAROMETRExxxxx|BAROMETRE-xxxxx
My Avro schema is:
{
  "type": "record",
  "name": "public.data_scope_gp_temp",
  "fields": [
    {"name": "date_export", "type": {"type": "int", "logicalType": "date"}},
    {"name": "num_etiquette", "type": "string"},
    {"name": "cle_scope", "type": "string"}
  ]
}
My Postgres schema is:
date_export date,
num_etiquette text COLLATE pg_catalog."default",
key text COLLATE pg_catalog."default"
Any ideas? Regards
You don't need UpdateAttribute or ConvertCSVToAvro to use PutDatabaseRecord. You can specify a CSVReader in PutDatabaseRecord, and your CSVReader can supply the Avro schema in the Schema Text property (don't forget to set the Schema Access Strategy to Use 'Schema Text' Property).
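As a rough sketch of those CSVReader settings (the Schema Text is the Avro schema from the question; the Date Format value is an assumption based on the sample row and may need adjusting):
Schema Access Strategy: Use 'Schema Text' Property
Schema Text: the Avro schema above
Value Separator: |
Treat First Line as Header: true
Date Format: dd/MM/yyyy
Note that with a date logical type the time portion of the sample value is dropped; if you need the time, a timestamp logical type and the reader's Timestamp Format property would be the place to look.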

ADF JSON to SQL copy: empty value not inserted as null

I am attempting to copy JSON data to a SQL table and noticed that any empty value is not being inserted as null, even though the column is nullable; an empty string is inserted instead.
I have tried adding the nullValue and treatEmptyAsNull parameters as in the code below, but that made no difference:
"source": {
"type": "BlobSource",
"recursive": true,
"nullValue": "",
"treatEmptyAsNull": true
},
I am expecting a null to be inserted.
Is it standard behavior for ADF copy with a JSON source not to insert empty values as null? Are there other properties I need to add to the JSON?
The value inserted into the SQL database can't be null directly, because your source data is an empty string "", not a null value. The ADF copy activity can't convert an empty string to null automatically for you.
However, you could invoke a stored procedure in the SQL Server dataset. In that SP you can convert "" to null as needed before the columns are inserted into the table. Please follow the detailed steps there, or see the example in my previous answer: Azure Data Factory copy activity failed mapping strings (from csv) to Azure SQL table sink uniqueidentifier field.
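As a rough sketch of that approach (the table type, table, and column names below are placeholders; the copy activity hands the rows to the procedure as a table-valued parameter):
-- Sketch only: dbo.MyTableType, dbo.MyTable, Id and Info are placeholder names.
CREATE PROCEDURE dbo.usp_InsertMyTable
    @rows dbo.MyTableType READONLY
AS
BEGIN
    INSERT INTO dbo.MyTable (Id, Info)
    SELECT Id, NULLIF(Info, '')  -- empty strings are converted to NULL here
    FROM @rows;
END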