ElasticsearchSinkConnector Failed to deserialize data to Avro - apache-kafka

I created the simplest Kafka sink connector config, and I'm using Confluent 4.1.0:
{
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "type.name": "test-type",
  "tasks.max": "1",
  "topics": "dialogs",
  "name": "elasticsearch-sink",
  "key.ignore": "true",
  "connection.url": "http://localhost:9200",
  "schema.ignore": "true"
}
and in the topic I save the messages as JSON:
{ "topics": "resd"}
But in the result I get an error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!

As cricket_007 says, you need to tell Connect to use the JSON converter if that's the format your data is in. Add this to your connector configuration:
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false"

That error happens because the connector is trying to read messages that were not encoded with the Confluent Schema Registry Avro format.
If the topic data is Avro, it needs to use the Schema Registry.
Otherwise, if the topic data is JSON, then you've started the Connect cluster with the AvroConverter on your keys or values in the worker property file, where you need to use the JsonConverter instead.
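For reference, a minimal sketch of the equivalent worker property-file entries for the JSON case (the rest of the worker configuration stays as it is):
# worker-level defaults; per-connector key.converter/value.converter settings override these
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false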

Related

Not able to override consumer config in azure iot hub sink connector

I'm making an Azure IoT Hub sink connector using the Microsoft connector class. I am using the Avro converter on the connector.
I want to use KafkaAvroDeserializer on the consumer to deserialize the Avro data coming from the topic, but I'm unable to override the value.deserializer value.
I'm using consumer.override.value.deserializer in the logs.
Could anyone please suggest a way out?
My config is below:
"consumer.value.deserializer": "io.confluent.kafka.serializers.KafkaAvroDeSerializer".
I'm getting the byte-array deserializer and I want it to be KafkaAvroDeserializer.
I am making an Azure IoT Hub sink connector, and I'm getting an error deserializing Avro data from the Kafka topic.
{
  "config": {
    "IotHub.ConnectionString": "connectionString",
    "IotHub.MessageDeliveryAcknowledgement": "None",
    "confluent.topic.bootstrap.servers": "server",
    "confluent.topic.replication.factor": "1",
    "connector.class": "com.microsoft.azure.iot.kafka.connect.sink.IotHubSinkConnector",
    "consumer.override.auto.register.schemas": "true",
    "consumer.override.id.compatibility.strict": "false",
    "consumer.override.latest.compatibility.strict": "false",
    "consumer.override.schema.registry.url": "registryUrl",
    "consumer.value.deserializer": "io.confluent.kafka.serializers.KafkaAvroDeSerializer",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "name": "TEST1",
    "tasks.max": "1",
    "topics": "testtopicazure3",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.auto.register.schemas": "true",
    "value.converter.schema.registry.url": "registryUrl"
  }
}
Getting error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
In Connect, you only set value.converter, not consumer client deserializers:
value.converter=io.confluent.connect.avro.AvroConverter
And all of your consumer.override. prefixes should be value.converter. prefixes instead.
https://docs.confluent.io/kafka-connectors/self-managed/userguide.html#configuring-key-and-value-converters
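For example, a sketch of what the converter-related part of that config could look like after the change (registryUrl is still a placeholder), with all consumer.override.* and consumer.value.deserializer entries removed:
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "registryUrl",
"value.converter.auto.register.schemas": "true"
Connect then uses the AvroConverter (and the Schema Registry) to deserialize the topic data before handing it to the IoT Hub sink.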

Kafka Connect - From JSON records to Avro files in HDFS

My current setup contains Kafka, HDFS, Kafka Connect, and a Schema Registry all in networked docker containers.
The Kafka topic contains simple JSON data without a Schema:
{
"repo_name": "ironbee/ironbee"
}
The Schema Registry contains a JSON Schema describing the data in the Kafka Topic:
{"$schema": "https://json-schema.org/draft/2019-09/schema",
"$id": "http://example.com/example.json",
"type": "object",
"title": "Root Schema",
"required": [
"repo_name"
],
"properties": {
"repo_name": {
"type": "string",
"default": "",
"title": "The repo_name Schema",
"examples": [
"ironbee/ironbee"
]
}
}}
What I am trying to achieve is a connector that reads JSON data from a topic and dumps it into files in HDFS (Avro or Parquet).
{
  "name": "kafka to hdfs",
  "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
  "topics": "repo",
  "hdfs.url": "hdfs://namenode:9000",
  "flush.size": 3,
  "confluent.topic.bootstrap.servers": "kafka-1:19092,kafka-2:29092,kafka-3:39092",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schemas.enable": "false",
  "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
  "value.converter.schemas.enable": "false",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
If I try to read the raw JSON value via the StringConverter (no schema used) and dump it into Avro files, it works, resulting in
Key=null Value={my json} tuples,
so no usable structure at all.
When I try to use my schema via the JsonSchemaConverter I get the errors
“Converting byte[] to Kafka Connect data failed due to serialization error of topic”
“Unknown magic byte”
I think that there is something wrong with the configuration of my connection, but after a week of trying everything my google-skills have reached their limits.
All the code is available here: https://github.com/SDU-minions/7-Scalable-Systems-Project/tree/dev/Kafka
raw JSON value via the StringConverter (no schema used)
The schemas.enable property only exists on the JsonConverter. Strings don't have schemas, and JSON Schema records always have a schema, so the property doesn't exist on those converters either.
When I try to use my schema via the JsonSchemaConverter I get the errors
Your producer needs to use the Confluent JSON Schema serializer. Otherwise, the data doesn't get sent to Kafka with the "magic byte" referred to in your error.
I personally haven't tried converting JSON schema records to Avro directly in Connect. Usually the pattern is to either produce Avro directly, or convert within ksqlDB, for example to a new Avro topic, which is then consumed by Connect.
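If you go the ksqlDB route, a rough sketch of that conversion might look like this (the stream names and the repo_avro topic are made up for illustration):
-- register the existing JSON topic as a stream
CREATE STREAM repos_json (repo_name VARCHAR)
  WITH (KAFKA_TOPIC='repo', VALUE_FORMAT='JSON');
-- continuously re-serialize it into a new Avro topic backed by the Schema Registry
CREATE STREAM repos_avro
  WITH (KAFKA_TOPIC='repo_avro', VALUE_FORMAT='AVRO') AS
  SELECT * FROM repos_json;
The HDFS sink would then consume repo_avro with the AvroConverter instead of the JsonSchemaConverter.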

Kafka Connect FileStreamSink connector does not include the KEY in the output file

Trying a simple file sink connector to extract data from a topic. The generated file does not include the event key, and I am not able to find a setting that enables that. Eventually the goal will be to load the file using a source connector and reproduce the same sample data, so the event KEY is very important.
Thanks
{
  "name": "save-seed-data",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "1",
    "name": "save-seed-data",
    "topics": "FIRM",
    "file": "/tmp/FIRM.txt",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter.schemas.enable": "false"
  }
}
Not sure where you found that the key should be in the output, since the source code only references the value.
You can download and use a message transform to move the key over into the value, though.
https://github.com/jcustenborder/kafka-connect-transform-archive
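For example, a sketch of wiring that SMT into the sink config (the transform class name is taken from that repository, so verify it against the version you install):
"transforms": "archive",
"transforms.archive.type": "com.github.jcustenborder.kafka.connect.archive.Archive"
With that in place, the key is moved into the written value, as described above.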
Also worth mentioning: the FileStream source connector does not parse the data. Each line, likewise, only goes into the value.
Generally, using kafkacat is much more straightforward for dumping/loading data from files.

Unable to stream data from MySQL to Postgres using Kafka

I am trying Kafka for the first time and set up a Kafka cluster using AWS MSK. The objective is to stream data from a MySQL server to PostgreSQL.
I used the Debezium MySQL connector for the source and the Confluent JDBC connector for the sink.
MySQL config:
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.server.id": "1",
"tasks.max": "3",
"internal.key.converter.schemas.enable": "false",
"transforms.unwrap.add.source.fields": "ts_ms",
"key.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.value.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
After registering the MySQL connector, its status is "running", it captures the changes being made in the MySQL table, and it shows the result in the consumer console in the following format:
{"id":5,"created_at":1594910329000,"userid":"asldnl3r234mvnkk","amount":"B6Eg","wallet_type":"CDW"}
My first issue: in the table, the "amount" column is of type "decimal" and contains a numeric value, so why is it shown as an alphanumeric value in the consumer console?
For PostgreSQL as the target DB, I used the JDBC sink connector with the following config:
"name": "postgres-connector-db08",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"topics": "mysql-cash.kafka_test.test",
"connection.url": "jdbc:postgresql://xxxxxx:5432/test?currentSchema=public",
"connection.user": "xxxxxx",
"connection.password": "xxxxxx",
"insert.mode": "upsert",
"auto.create": "true",
"auto.evolve": "true"
After registering the JDBC connector, when I check its status it gives an error:
{"name":"postgres-connector-db08","connector":{"state":"RUNNING","worker_id":"x.x.x.x:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"x.x.x.x:8083","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:561)
org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.ConnectException: Sink connector 'postgres-connector-db08' is configured with 'delete.enabled=false' and 'pk.mode=none' and therefore requires records with a non-null Struct value and non-null Struct schema, but found record at (topic='mysql-cash.kafka_test.test',partition=0,offset=0,timestamp=1594909233389) with a HashMap value and null value schema.
io.confluent.connect.jdbc.sink.RecordValidator.lambda$requiresValue$2(RecordValidator.java:83)
io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:82)
io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:66)
io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:74)
org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:539)
... 10 more
"}],"type":"sink"}
Why is this error coming? Is there something I missed in the sink config?
https://docs.confluent.io/kafka-connect-jdbc/current/sink-connector/index.html#data-mapping
The sink connector requires knowledge of schemas, so you should use a suitable converter e.g. the Avro converter that comes with Schema Registry, or the JSON converter with schemas enabled.
Since the JSON is plain (has no schema) and the connector is configured with "value.converter.schemas.enable": "false" (JSON converter with schemas disabled), Avro converter should be set up with Schema Registry: https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/#applying-schema
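A sketch of the simpler of those two options (JSON converter with schemas enabled), which would need to be applied to both the Debezium source and the JDBC sink so that the schema travels inside each record:
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
The Avro converter option works the same way but additionally needs value.converter.schema.registry.url pointing at a running Schema Registry.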
Answer to the first issue: why is the decimal showing up in an alphanumeric format?
The conversion of decimals depends on the decimal.handling.mode configuration:
Specifies how the connector should handle values for DECIMAL and NUMERIC columns: precise (the default) represents them precisely using java.math.BigDecimal values, represented in change events in binary form; double represents them using double values, which may result in a loss of precision but is far easier to use; the string option encodes values as formatted strings, which is easy to consume but loses the semantic information about the real type.
https://debezium.io/documentation/reference/0.10/connectors/mysql.html#decimal-values
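For example, to get a plain numeric representation in the change events (at the cost of the precise binary encoding), you could add this to the Debezium source config:
"decimal.handling.mode": "double"
Use "string" instead if you prefer the value as a formatted string.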
If there's no conversion mode that suits you, you can also create a custom converter:
https://debezium.io/documentation/reference/stable/development/converters.html
With luck, you can find an open-source converter that already solves this issue.

defining unique name for Kafka file sink connector

I have a service which generates an XML string and sends it to a Kafka topic, which should then produce an XML file.
At the moment I am using the Kafka FileStreamSink connector, which generates the file with a predefined fixed name.
The filename of that XML file should be generated according to the XML content. How can I do so?
Below is my FileStreamSink connector configuration with the predefined filename.
{
  "name": "file_sink_stream_01",
  "config": {
    "connector.class": "FileStreamSink",
    "group.id": "file_sink_stream_connector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "topics": "stream_userid_stream",
    "file": "file.xml"
  }
}
It's not possible to do so with the file sink: the file name is static, and even an SMT wouldn't let you redefine it.
Note: the JSON converter would output JSON, not XML.
If you absolutely need this, you could try Apache NiFi.