confluent kafka avro producer schema error - apache-kafka

I am using the example code from https://github.com/confluentinc/confluent-kafka-python/blob/master/examples/avro_producer.py to load data onto a topic. Only one change I have done and that is I have added "default": null to each field for schema compatibility. It gets loaded fine as I can see the message and schema in http://localhost:9021/. I also am able to see the data coming into the topic if I run the kafka-avro-console-consumer command via cli.
But trying to use redshift sink with configuration as provided in https://docs.confluent.io/current/connect/kafka-connect-aws-redshift/index.html, I get the following error as shown below. However, if I don't add "default": null in the fields, then it goes till the end all fine. Any guidance would be much appreciated.
org.apache.kafka.connect.errors.SchemaBuilderException: Invalid default value
at org.apache.kafka.connect.data.SchemaBuilder.defaultValue(SchemaBuilder.java:131)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1812)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1567)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1687)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1543)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:1226)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:108)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:491)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:491)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:324)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:200)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.DataException: Invalid value: null used for required field: "null", schema type: INT32
at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:220)
at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:213)
at org.apache.kafka.connect.data.SchemaBuilder.defaultValue(SchemaBuilder.java:129)

It's not enough to add "default: null", you need to amend the type to be something like:
type: ["null", "string"], default: null
taking care to add "null" to the type in the first position, ie. not
type: ["string", "null"], default: null
See discussion at:
http://apache-avro.679487.n3.nabble.com/How-to-declare-an-optional-field-tp4025089p4025094.html

Related

Kafka Connect 'ExtractField$Key' SMT results in 'Unknown Field' error

I have a setup of Debezium connector (running on ksqlDB-server) that's streaming values from SQL Server CDC Tables to Kafka Topics. I'm trying to transform the key of my message from JSON to Integer value. The example key I'm receiving looks like this: {"InternalID":11117} and I want to represent it as just a number 11117. According to Kafka Connect documentation this should be fairly easy with ExtractField SMT. However when I'm configuring my connector to use this transform I'm receiving an error Caused by: java.lang.IllegalArgumentException: Unknown field: InternalID.
Connector config:
CREATE SOURCE CONNECTOR properties_sql_connector WITH (
'connector.class'= 'io.debezium.connector.sqlserver.SqlServerConnector',
'database.hostname'= 'propertiessql',
'database.port'= '1433',
'database.user'= 'XXX',
'database.password'= 'XXX',
'database.dbname'= 'Properties',
'database.server.name'= 'properties',
'table.exclude.list'= 'dbo.__EFMigrationsHistory',
'database.history.kafka.bootstrap.servers'= 'kafka:9091',
'database.history.kafka.topic'= 'dbhistory.properties',
'key.converter.schemas.enable'= 'false',
'transforms'= 'unwrap,extractField',
'transforms.unwrap.type'= 'io.debezium.transforms.ExtractNewRecordState',
'transforms.unwrap.delete.handling.mode'= 'none',
'transforms.extractField.type'= 'org.apache.kafka.connect.transforms.ExtractField$Key',
'transforms.extractField.field'= 'InternalID',
'key.converter'= 'org.apache.kafka.connect.json.JsonConverter');
Error details:
--------------------------------------------------------------------------------------------------------------------------------------
0 | FAILED | org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:223)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:149)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:355)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:258)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Unknown field: InternalID
at org.apache.kafka.connect.transforms.ExtractField.apply(ExtractField.java:65)
at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:173)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:207)
... 11 more
Any ideas for why this transform is failing? Am I missing some configuration?
When extractField transform is removed the key of my message looks like the above:{"InternalID":11117}
In order to extract a named field from JSON, you'll need schemas.enable = 'true' for that converter
For any data that's not sourced from Debezium, that'll require the JSON has a schema as part of the event.
Or, if you're using the Schema Registry, switch to a different converter that uses that, and it should work.

Error with Confluent Kafka Azure EventHub Source Connector

I am trying to consume data from an Azure EventHub to my Confluent Cloud Platform Kafka (installed on an AWS EC2).
However, the Confluent Control Center is telling me that my connector failed upon launching it and reading the log output, I see the following error:
[2021-07-15 21:46:16,362] ERROR WorkerSourceTask{id=EventHubsLinuxMetrics-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:184)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.WorkerSourceTask.convertTransformedRecord(WorkerSourceTask.java:290)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:319)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:249)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.DataException: Found null value for non-optional schema
at io.confluent.connect.avro.AvroData.validateSchemaValue(AvroData.java:1207)
at io.confluent.connect.avro.AvroData.fromConnectData(AvroData.java:389)
at io.confluent.connect.avro.AvroData.fromConnectData(AvroData.java:352)
at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:89)
at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:63)
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$1(WorkerSourceTask.java:290)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
The error in questions says: Found null value for non-optional schema.
This is my connector config:
{
"name": "EventhubLinuxMetrics",
"config": {
"confluent.topic.bootstrap.servers": "localhost:9092",
"connector.class": "io.confluent.connect.azure.eventhubs.EventHubsSourceConnector",
"kafka.topic": "linuxMetrics",
"tasks.max": "1",
"azure.eventhubs.sas.keyname": "<censored for security>",
"azure.eventhubs.sas.key": "<censored for security>",
"azure.eventhubs.namespace": "<censored for security>",
"azure.eventhubs.hub.name": "linuxvmmetrics"
}
}
Is there anything wrong with my config? Is localhost:9092 the correct value for the bootstrap.servers field?

Sending from Logastash to Kafka in with Avro

I am trying to send data from logstash into kafka using an avro schema.
My logstash output looks like:
kafka{
codec => avro {
schema_uri => "/tmp/avro/hadoop.avsc"
}
topic_id => "hadoop_log_processed"
}
My schema file looks like:
{"type": "record",
"name": "hadoop_schema",
"fields": [
{"name": "loglevel", "type": "string"},
{"name": "error_msg", "type": "string"},
{"name": "syslog", "type": ["string", "null"]},
{"name": "javaclass", "type": ["string", "null"]}
]
}
Output of kafka-console-consumer:
CElORk+gAURvd24gdG8gdGhlIGxhc3QgbWVyZ2UtcGCzcywgd2l0aCA3IHNlZ21lbnRzIGxlZnQgb2YgdG90YWwgc256ZTogMjI4NDI0NDM5IGJ5dGVzAAxbbWFpbl0APm9yZy5hcGFjaGUuaGFkb29wLm1hcHJlZC5NZXJnZXI=
CElORk9kVGFzayAnYXR0ZW1wdF8xNDQ1JDg3NDkxNDQ1XzAwMDFfbV8wMDAwMDRfMCcgZG9uZS4ADFttYWluXQA6t3JnLmFwYWNoZS5oYWRvb6AubWFwcmVkLlRhc2s=
CElORk9kVGFzayAnYXR0ZW1wdF8xNDQ1JDg3NDkxNDQ1XzAwMDFfbV8wMDAwMDRfMCcgZG9uZS4ADFttYWluXQA6t3JnLmFwYWNoZS5oYWRvb6AubWFwcmVkLlRhc2s=
CElORk9OVGFza0hlYAJ0YmVhdEhhbmRsZXIgdGhyZWFkIGludGVycnVwdGVkAERbVGFza0hlYXJdYmVhdEhhbmRsZXIgUGluZ0NoZWNrZXJdAG5vcmcuYVBhY2hlLmhhZG9vcC5tYXByZWR1Y2UudjIuYXBwLlRhc2tIZWFydGJ3YXRIYW5kbGVy
I am getting also the following error in my connector:
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:488)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:465)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic hadoop_log_processed to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:110)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:488)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
I know that I encoding the data on the logstash site. Do I have to decode the messages during the input in kafka, or can I decode/deserialize the data in the connector config?
Is there a way to disable the encoding on the logstash site? I read about an base64_encoding options, but it seems it hasn't the option.
The problem you have here is that Logstash's Avro codec is not serialising the data into an Avro form that the Confluent Schema Registry Avro deserialiser expects.
Whilst Logstash takes an avsc and encodes the data into a binary form based on that, the Confluent Schema Registry [de]serialiser instead stores & retrieves a schema directly from the registry (not avsc files).
So when you get Failed to deserialize data … SerializationException: Unknown magic byte! this is the Avro deserialiser saying that it doesn't recognise the data as Avro that's been serialised using the Schema Registry serialiser.
I had a quick Google and found this codec that looks like it supports the Schema Registry (and thus Kafka Connect, and any other consumer deserialising Avro data this way).
Alternatively, write your data as JSON into Kafka and use the org.apache.kafka.connect.json.JsonConverter in Kafka Connect to read it from the topic.
Ref:
http://rmoff.dev/ksldn19-kafka-connect
https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/

Kafka Sink Connector fails: Schema not found; error code: 40403

I have a sink connector with the following configuration
{
"name": "sink-test-mariadb-MY_TOPIC",
"config": {
"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max":"10",
"topics":"MY_TOPIC",
"connection.url":"jdbc:mariadb://localhost:3306/myschema?user=myuser&password=mypass",
"auto.create":"false",
"auto.evolve":"true",
"table.name.format":"MY_TABLE",
"pk.mode":"record_value",
"pk.fields":"ID",
"insert.mode":"upsert",
"transforms":"ExtractField",
"transforms.ExtractField.type":"org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractField.field":"data"
}
}
and after a while all the tasks of the connector fail with the following error:
{
"state": "FAILED",
"trace": "org.apache.kafka.connect.errors.DataException: MY_TOPIC
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:95)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 802
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:202)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:229)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:409)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:402)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:119)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:192)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getById(CachedSchemaRegistryClient.java:168)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:121)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaAvroDeserializer.java:194)
at io.confluent.connect.avro.AvroConverter$Deserializer.deserialize(AvroConverter.java:120)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:83)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)",
"id": 0,
"worker_id": "localhost:8083"
}
The connector manages to synchronise the topic with tha database but it suddenly fails without any reason. I am also very sure that the schema is there. Its subject appears in the list returned by calling schema registry API localhost:8081/subjects
[
...
MY_TOPIC-value
...
]
I had the same problem and I realized that the code 40403 doesn't really mean that the schema was not found, it means that the schema does not correspond to the required one. a different code exists in case the schema was not found at all (40401).
So all I did was to change the schema accordingly and it worked for me.
The message on the Kafka topic is serialised with a different version of the schema that the one you have on the Schema Registry. Perhaps it was generated by a tool that wrote the schema to a different Schema Registry, or in a different environment? In order to be able to deserialise it Kafka Connect needs to be able to retrieve the schema ID that is in the magic byte at the beginning of the Kafka message on the topic.
The schema is not present on your Schema Registry, as seen by :
GET /schemas/ids/803
{ "error_code": 40403, "message": "Schema not found" }
You can inspect the ID of the schema that you do have by looking at
curl -s "http://localhost:8081/subjects/MY_TOPIC-value/versions/3/"|jq '.id'

Kafka Connect export multiple event types from same topic

I am trying to use a new feature (https://www.confluent.io/blog/put-several-event-types-kafka-topic/) regarding storing two
different types of events on the same topic. Actually I am using Confluent version 4.1.0 and set these properties below
to make this happen
properties.put(KafkaAvroSerializerConfig.VALUE_SUBJECT_NAME_STRATEGY,TopicRecordNameStrategy.class.getName());
properties.put("value.multi.type", true);
Data are written to topic without issues and can be seen from a Kafka Streams application as Generic Avro Records. Also
on the Kafka Schema registry two new entries are created one for each event hosted on that specific topic.
The problem I am facing is that I cannot export these data from this topic using Kafka Connect. In the simplest case when
I use a File Sink Connector as below
{
"name": "sink-connector",
"config": {
"topics": "source-topic",
"connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
"tasks.max": 1,
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schema.registry.url":"http://kafka-schema-registry:8081",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://kafka-schema-registry:8081",
"value.subject.name.strategy":"io.confluent.kafka.serializers.subject.TopicRecordNameStrategy",
"file": "/tmp/sink-file.txt"
}
}
I get an error from the Connector that seems to be some kind of serialization error based on AvroConverter like
the one shown here
org.apache.kafka.connect.errors.DataException: source-topic
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:95)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 2
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:202)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:229)
at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:296)
at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:284)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getVersionFromRegistry(CachedSchemaRegistryClient.java:125)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getVersion(CachedSchemaRegistryClient.java:236)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:152)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaAvroDeserializer.java:194)
at io.confluent.connect.avro.AvroConverter$Deserializer.deserialize(AvroConverter.java:120)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:83)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Note that schema registry has an Avro schema with id 2 and another with schema id 3 that describe the two events hosted on the
same topic. Same issues arise when using JDBC connector.
So how do I handle this case in order to export data from my Kafka Cluster
to an external system. Am I missing something on my configuration ? Is it possible to have a topic with multiple type of events
and export them through Kafka Connect ?
Found the solution. My code was passing key as String and value as avro. Hive-sink while reading tried to lookup for avro schema of key and was not able to find it.
Adding the property
key.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schema.registry.url=http://localhost:8081
helped resolve the issue.