Not able to override consumer config in Azure IoT Hub sink connector - apache-kafka

I'm building an Azure IoT Hub sink connector using the Microsoft connector class, and I am using the Avro converter on the connector.
I want the consumer to use KafkaAvroDeserializer to deserialize the Avro data coming from the topic, but I'm unable to override the value.deserializer setting.
I am using consumer.override.value.deserializer, as seen in the logs.
Could anyone please suggest a way out?
My config is below:
"consumer.value.deserializer": "io.confluent.kafka.serializers.KafkaAvroDeserializer"
The value deserializer still shows up as the byte-array deserializer, and I want it to be KafkaAvroDeserializer.
I'm getting an error deserializing the Avro data from the Kafka topic.
{
  "config": {
    "IotHub.ConnectionString": "connectionString",
    "IotHub.MessageDeliveryAcknowledgement": "None",
    "confluent.topic.bootstrap.servers": "server",
    "confluent.topic.replication.factor": "1",
    "connector.class": "com.microsoft.azure.iot.kafka.connect.sink.IotHubSinkConnector",
    "consumer.override.auto.register.schemas": "true",
    "consumer.override.id.compatibility.strict": "false",
    "consumer.override.latest.compatibility.strict": "false",
    "consumer.override.schema.registry.url": "registryUrl",
    "consumer.value.deserializer": "io.confluent.kafka.serializers.KafkaAvroDeserializer",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "name": "TEST1",
    "tasks.max": "1",
    "topics": "testtopicazure3",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.auto.register.schemas": "true",
    "value.converter.schema.registry.url": "registryUrl"
  }
}
Getting this error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!

In Connect, you only set value.converter, not consumer-client deserializers:
value.converter=io.confluent.connect.avro.AvroConverter
And all of your consumer.override. prefixed properties should use the value.converter. prefix instead:
https://docs.confluent.io/kafka-connectors/self-managed/userguide.html#configuring-key-and-value-converters
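For example, a corrected config would carry the Schema Registry settings on the converter and drop the consumer-level deserializer overrides entirely; a minimal sketch, reusing the placeholder values from the question:
{
  "config": {
    "connector.class": "com.microsoft.azure.iot.kafka.connect.sink.IotHubSinkConnector",
    "name": "TEST1",
    "tasks.max": "1",
    "topics": "testtopicazure3",
    "IotHub.ConnectionString": "connectionString",
    "IotHub.MessageDeliveryAcknowledgement": "None",
    "confluent.topic.bootstrap.servers": "server",
    "confluent.topic.replication.factor": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "registryUrl",
    "value.converter.auto.register.schemas": "true"
  }
}
The AvroConverter performs the Schema Registry lookup itself, so no consumer.value.deserializer or consumer.override.* deserializer settings are needed.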

Related

Kafka Connect - From JSON records to Avro files in HDFS

My current setup contains Kafka, HDFS, Kafka Connect, and a Schema Registry all in networked docker containers.
The Kafka topic contains simple JSON data without a Schema:
{
"repo_name": "ironbee/ironbee"
}
The Schema Registry contains a JSON Schema describing the data in the Kafka Topic:
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$id": "http://example.com/example.json",
  "type": "object",
  "title": "Root Schema",
  "required": [
    "repo_name"
  ],
  "properties": {
    "repo_name": {
      "type": "string",
      "default": "",
      "title": "The repo_name Schema",
      "examples": [
        "ironbee/ironbee"
      ]
    }
  }
}
What I am trying to achieve is a connector that reads the JSON data from the topic and dumps it into files in HDFS (Avro or Parquet).
{
"name": "kafka to hdfs",
"connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
"topics": "repo",
"hdfs.url": "hdfs://namenode:9000",
"flush.size": 3,
"confluent.topic.bootstrap.servers": "kafka-1:19092,kafka-2:29092,kafka-3:39092",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"value.converter": "io.confluent.connect.json.JsonSchemaConverter",
"value.converter.schemas.enable": "false",
"value.converter.schema.registry.url": "http://schema-registry:8081"
}
If I try to read the raw JSON value via the StringConverter (no schema used) and dump it into Avro files, it works, resulting in
Key=null Value={my json} tuples,
so there is no usable structure at all.
When I try to use my schema via the JsonSchemaConverter I get the errors
“Converting byte[] to Kafka Connect data failed due to serialization error of topic”
“Unknown magic byte”
I think that there is something wrong with the configuration of my connection, but after a week of trying everything my google-skills have reached their limits.
All the code is available here: https://github.com/SDU-minions/7-Scalable-Systems-Project/tree/dev/Kafka
raw JSON value via the StringConverter (no schema used)
The schemas.enable property only exists on the JsonConverter. Strings don't have schemas, and JSON Schema data always has a schema, so the property doesn't exist on those converters either.
When I try to use my schema via the JsonSchemaConverter I get the errors
Your producer needs to use the Confluent JSONSchema serializer. Otherwise, the data doesn't get sent to Kafka with the "magic byte" referred to in your error.
I personally haven't tried converting JSON schema records to Avro directly in Connect. Usually the pattern is to either produce Avro directly, or convert within ksqlDB, for example to a new Avro topic, which is then consumed by Connect.

KSQLDB Push Queries Fail to Deserialize Data - Schema Lookup Performed with Wrong Schema ID

I'm not certain what I could be missing.
I have set up a Kafka broker server, with a Zookeeper and a distributed Kafka Connect.
For schema management, I have set up an Apicurio Schema Registry instance
I also have KSQLDB set up.
I can confirm the following is working as expected:
My source JDBC connector successfully pushed table data into the topic stss.market.info.public.ice_symbols
Problem:
Inside the KSQLDB server, I have successfully created a table from the topic stss.market.info.public.ice_symbols
Here is the detail of the table created
The problem I'm facing is that a push query against this table returns no data. Deserialization of the data fails due to an unsuccessful lookup of the Avro schema in the Apicurio Registry.
Looking at the Apicurio Registry logs reveals that KSQLDB calls the Apicurio Registry to fetch the deserialization schema using a schema ID of 0 instead of 5, which is the ID of the schema I have registered in the registry.
The KSQLDB server logs also show the 404 HTTP response seen in the Apicurio logs, as shown in the image below.
Expectation:
I expect KSQLDB queries against the table to perform the schema lookup with an ID of 5, not 0. I'm guessing I'm probably missing some configuration.
Here is the image of the schema registered in the Apicurio Registry.
Here is also my source connector configuration. It has the appropriate schema lookup strategy configured, although I don't believe KSQLDB requires this when deserializing its table data. This configuration should only be relevant to the capturing of the table data and its validation and storage in the topic stss.market.info.public.ice_symbols.
{
"name": "new.connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"plugin.name": "pgoutput",
"database.hostname": "172.17.203.10",
"database.port": "6000",
"database.user": "postgres",
"database.password": "123",
"database.dbname": "stss_market_info",
"database.server.name": "stss.market.info",
"table.include.list": "public.ice_symbols",
"message.key.columns": "public.ice_symbols:name",
"snapshot.mode": "always",
"transforms": "unwrap,extractKey",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKey.field": "name",
"value.converter": "io.apicurio.registry.utils.converter.AvroConverter",
"value.converter.apicurio.registry.url": "http://local-server:8080/apis/registry/v2",
"value.converter.apicurio.registry.auto-register": true,
"value.converter.apicurio.registry.find-latest": true,
"value.apicurio.registry.as-confluent": true,
"name": "new.connector",
"value.converter.schema.registry.url": "http://local-server:8080/apis/registry/v2"
}
}
Thanks in advance for any assistance.
You can specify the "VALUE_SCHEMA_ID=5" property in the WITH clause when you create a stream/table.

Kafka Sink how to map fields to db with different topic and table schema name

I am currently setting up the Kafka Sink connector with a topic name waiting-room, while my db schema is called waiting_room. So I am trying to map the topic message to the db schema but I do not see any data entering the database.
So I tried the following scenario:
Since the table schema is waiting_room, I tried adding quote.sql.identifier=ALWAYS, since it quotes the table name and would let the Kafka sink quote it so that it can map to the table, but I did not see quote.sql.identifier=ALWAYS take effect in the Kafka sink. Do both the table schema and the Kafka sink need to be quoted in order to map it, or how can I map to a table schema containing an underscore and have Kafka map to it?
Then, if I change table.name.format=waiting-room and make the db schema gt.namespace."waiting-room", I do not see my Kafka sink get updated; instead my table.name.format stays waiting_room and the connector's status is 404 not found.
Is there a way to map the data and have it enter the db when the topic and db names are different?
Try using the Kafka Connect SMT RegexRouter:
{
  "tasks.max": "1",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "'"$URL"'",
  "topics": "waiting-room",
  "transforms": "route",
  "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.route.regex": "waiting-room",
  "transforms.route.replacement": "gt.namespace.waiting_room",
  "errors.tolerance": "all",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true"
}

Unable to stream data from MySQL to Postgres using Kafka

I am trying Kafka for the first time and have set up a Kafka cluster using AWS MSK. The objective is to stream data from a MySQL server to PostgreSQL.
I used the Debezium MySQL connector for the source and the Confluent JDBC connector for the sink.
MySQL config:
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.server.id": "1",
"tasks.max": "3",
"internal.key.converter.schemas.enable": "false",
"transforms.unwrap.add.source.fields": "ts_ms",
"key.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.value.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
After registering the MySQL connector, its status is "running"; it captures the changes being made to the MySQL table and shows the result in the consumer console in the following format:
{"id":5,"created_at":1594910329000,"userid":"asldnl3r234mvnkk","amount":"B6Eg","wallet_type":"CDW"}
My first issue: in the table, the "amount" column is of type "decimal" and contains a numeric value, so why is it showing as an alphanumeric value in the consumer console?
For PostgreSQL as the target DB, I used the JDBC sink connector with the following config:
"name": "postgres-connector-db08",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"topics": "mysql-cash.kafka_test.test",
"connection.url": "jdbc:postgresql://xxxxxx:5432/test?currentSchema=public",
"connection.user": "xxxxxx",
"connection.password": "xxxxxx",
"insert.mode": "upsert",
"auto.create": "true",
"auto.evolve": "true"
After registering the JDBC connector, when I check its status it gives an error:
{"name":"postgres-connector-db08","connector":{"state":"RUNNING","worker_id":"x.x.x.x:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"x.x.x.x:8083","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:561)
org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)\nCaused by: org.apache.kafka.connect.errors.ConnectException: Sink connector 'postgres-connector-db08' is configured with 'delete.enabled=false' and 'pk.mode=none' and therefore requires records with a non-null Struct value and non-null Struct schema, but found record at (topic='mysql-cash.kafka_test.test',partition=0,offset=0,timestamp=1594909233389) with a HashMap value and null value schema.
io.confluent.connect.jdbc.sink.RecordValidator.lambda$requiresValue$2(RecordValidator.java:83)
io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:82)
io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:66)
io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:74)
org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
... 10 more
"}],"type":"sink"}
Why is this error coming up? Is there something I missed in the sink config?
https://docs.confluent.io/kafka-connect-jdbc/current/sink-connector/index.html#data-mapping
The sink connector requires knowledge of schemas, so you should use a suitable converter e.g. the Avro converter that comes with Schema Registry, or the JSON converter with schemas enabled.
Since the JSON is plain (has no schema) and the connector is configured with "value.converter.schemas.enable": "false" (JSON converter with schemas disabled), Avro converter should be set up with Schema Registry: https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/#applying-schema
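For example, since both connectors already use the JsonConverter, a minimal sketch of the JSON-with-schemas option is to enable schemas on both sides, so the sink receives a Struct with a schema instead of a plain HashMap:
On the Debezium source connector:
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
On the JDBC sink connector:
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
Note that each record then embeds its schema, which makes the messages larger; the Avro converter with Schema Registry avoids that overhead.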
Answer to the first issue ("why is the decimal showing as an alphanumeric value?"):
The conversion of decimals depends on the decimal.handling.mode configuration.
Specifies how the connector should handle values for DECIMAL and NUMERIC columns: precise (the default) represents them precisely using java.math.BigDecimal values represented in change events in a binary form; or double represents them using double values, which may result in a loss of precision but will be far easier to use. string option encodes values as formatted string which is easy to consume but a semantic information about the real type is lost.
https://debezium.io/documentation/reference/0.10/connectors/mysql.html#decimal-values
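For example, a small change to the Debezium source connector config along these lines (a sketch; string keeps the exact value as text, while double may lose precision):
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"decimal.handling.mode": "string"
With string (or double), the amount field arrives as a readable number instead of the Base64-encoded bytes that precise mode produces when serialized as JSON.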
If there's no proper conversion configured for you, you can also create a custom converter.
https://debezium.io/documentation/reference/stable/development/converters.html
If you're lucky, you can find some open-source converters that solve this issue.

ElasticsearchSinkConnector Failed to deserialize data to Avro

I created the simplest Kafka sink connector config, and I'm using Confluent 4.1.0:
{
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "type.name": "test-type",
  "tasks.max": "1",
  "topics": "dialogs",
  "name": "elasticsearch-sink",
  "key.ignore": "true",
  "connection.url": "http://localhost:9200",
  "schema.ignore": "true"
}
and in the topic I save the messages in JSON
{ "topics": "resd"}
But in the result I get an error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
As cricket_007 says, you need to tell Connect to use the JSON converter, if that's the format your data is in. Add this to your connector configuration:
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false"
That error happens because the connector is trying to read Avro messages that were not encoded with the Confluent Schema Registry wire format.
If the topic data is Avro, it needs to use the Schema Registry.
Otherwise, if the topic data is JSON, then you've started the Connect cluster with the AvroConverter on your keys or values in the worker property file, where you need to use the JsonConverter instead.
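Putting this together with the converter settings above, the connector config would look something like this (a sketch that simply merges the original config with the JsonConverter overrides):
{
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "type.name": "test-type",
  "tasks.max": "1",
  "topics": "dialogs",
  "name": "elasticsearch-sink",
  "key.ignore": "true",
  "connection.url": "http://localhost:9200",
  "schema.ignore": "true",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter.schemas.enable": "false",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false"
}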