Kafka sink connector --> Postgres fails with Avro JSON data - apache-kafka

I set up a Kafka JDBC sink connector to send events to PostgreSQL.
I wrote this simple producer that sends JSON data with a schema (Avro) to a topic as follows:
producer.py (kafka-python)
import json
from kafka import KafkaProducer

# Broker address assumed; the serializers send the key as a plain string
# and the value as JSON text.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

biometrics = {
    "heartbeat": self.pulse,        # integer
    "oxygen": self.oxygen,          # integer
    "temprature": self.temprature,  # float
    "time": time,                   # string
}

# This wraps the raw Avro schema text around a JSON payload; it does not
# produce actual Avro-serialized bytes.
avro_value = {
    "schema": open(BASE_DIR + "/biometrics.avsc").read(),
    "payload": biometrics,
}

producer.send("biometrics", key="some_string", value=avro_value)
Value Schema:
{
  "type": "record",
  "name": "biometrics",
  "namespace": "athlete",
  "doc": "athletes biometrics",
  "fields": [
    {
      "name": "heartbeat",
      "type": "int",
      "default": 0
    },
    {
      "name": "oxygen",
      "type": "int",
      "default": 0
    },
    {
      "name": "temprature",
      "type": "float",
      "default": 0.0
    },
    {
      "name": "time",
      "type": "string",
      "default": ""
    }
  ]
}
Connector config (without hosts, passwords, etc.):
{
  "name": "jdbc_sink",
  "connector.class": "io.aiven.connect.jdbc.JdbcSinkConnector",
  "tasks.max": "1",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "topics": "biometrics",
  "insert.mode": "insert",
  "auto.create": "true"
}
But my connector fails hard, with three errors, and I am unable to spot the reason for any of them:
TL;DR log version:
(Error 1) Caused by: org.apache.kafka.connect.errors.DataException: biometrics
(Error 2) Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
(Error 3) Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Full log
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:498)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:475)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:229)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.DataException: biometrics
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:98)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:498)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Could someone help me understand these errors and their underlying cause?

The error is because you need to use the org.apache.kafka.connect.json.JsonConverter class with value.converter.schemas.enable=true in your connector, since plain JSON is what was produced. However, the "schema" field you send contains raw Avro schema text, not the Connect JSON schema representation of the payload that JsonConverter expects, so it might still fail with those changes alone...
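For reference, a sketch of that converter change (only the converter keys are shown; they replace the AvroConverter line in the connector config above):
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"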
If you want to actually send Avro, then use the AvroProducer from the confluent-kafka library, which requires running the Schema Registry.
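A minimal sketch of such a producer, assuming a broker on localhost:9092 and a Schema Registry on localhost:8081 (adjust to your environment; field values are samples):
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

# Load the Avro schema used to serialize record values
value_schema = avro.load("biometrics.avsc")

producer = AvroProducer(
    {
        "bootstrap.servers": "localhost:9092",
        "schema.registry.url": "http://localhost:8081",
    },
    default_value_schema=value_schema,
)

# A plain dict is serialized as real Avro; the schema is registered automatically
producer.produce(topic="biometrics", value={
    "heartbeat": 72,
    "oxygen": 98,
    "temprature": 36.6,
    "time": "2021-06-01T10:00:00Z",
})
producer.flush()
With this, the wire format starts with the magic byte and schema id that AvroConverter looks for, which resolves the "Unknown magic byte!" error.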

Related

Kafka Connect Mongo Sink connector is failing due to schema

I'm taking my first steps in the kafka-connect area and things are not going as expected.
I created a Mongo sink connector (Kafka to Mongo), but it fails to consume the messages with the following error:
Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic test_topic to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:114)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
I don't want to use any schema, but for some reason it fails on deserialization, probably because a schema config is missing.
My configurations:
{
  "name": "kafka-to-mongo",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "topics": "test_topic",
    "connection.uri": "mongodb://mongodb4:27017/test-db",
    "database": "auto-resolve",
    "value.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "collection": "test_collection",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.partial.value.projection.list": "id",
    "document.id.strategy.partial.value.projection.type": "AllowList",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"
  }
}
Example of a message in the topic:
{
  "_id": "62a6e4d88c1e1e0011902616",
  "type": "alert",
  "key": "test444",
  "timestamp": 1655104728,
  "source_system": "api.test",
  "tags": [
    { "type": "aaa", "value": "bbb" },
    { "type": "fff", "value": "rrr" }
  ],
  "ack": "no",
  "main": "nochange",
  "all_ids": []
}
Any ideas? Thanks!

Error when defining Camel S3 source connector

I am trying to define a Camel S3 source connector in our Confluent environment.
This is the configuration I am using:
{
  "name": "CamelAWSS3SourceConnector",
  "config": {
    "connector.class": "org.apache.camel.kafkaconnector.awss3.CamelAwss3SourceConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.camel.kafkaconnector.awss3.converters.S3ObjectConverter",
    "camel.source.maxPollDuration": "10000",
    "topics": "TEST-S3-SOURCE-POC",
    "camel.source.path.bucketNameOrArn": "json-poc",
    "camel.component.aws-s3.region": "us-east-1",
    "tasks.max": "2",
    "camel.source.endpoint.autocloseBody": "true"
  }
}
And this is the error I receive when I try to define the connector:
{
"error_code": 405,
"message": "HTTP 405 Method Not Allowed"
}
Querying the connector's status afterwards shows one failed task:
{
"name": "CamelAWSS3SourceConnector",
"connector": {
"state": "RUNNING",
"worker_id": "confluent-connect-server2:8083"
},
"tasks": [{
"id": 0,
"state": "FAILED",
"worker_id": "confluent-connect-server2",
"trace": "org.apache.kafka.connect.errors.ConnectException: Failed to create and start Camel context
at org.apache.camel.kafkaconnector.CamelSourceTask.start(CamelSourceTask.java:118)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:215)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Option bucketNameOrArn is required when creating endpoint uri with syntax aws-s3://bucketNameOrArn
at org.apache.camel.support.component.EndpointUriFactorySupport.buildPathParameter(EndpointUriFactorySupport.java:53)
at org.apache.camel.component.aws.s3.S3EndpointUriFactory.buildUri(S3EndpointUriFactory.java:103)
at org.apache.camel.kafkaconnector.utils.TaskHelper.buildUrl(TaskHelper.java:68)
at org.apache.camel.kafkaconnector.CamelSourceTask.start(CamelSourceTask.java:98)
... 8 more"
}],
"type": "source"
}
What could be the cause of the above error?
I am told that since the Connect server is an EC2 instance, I don't have to define AWS parameters here. Is that correct?
Thank you
Note: there are two Connect servers, and the error shows up for only one of them in the output.
Just wanted to post an answer here in case someone else runs into this.
The issue was that when using PUT to add or update a connector, the URL of the curl request needs the form http(s)://<serverurl>/connectors/<connectorName>/config. My JSON contained the key "name", and my URL lacked the /connectors/<connectorName>/config path. Initially I created the connector using POST and then tried to use PUT to update (actually, add) the bucketNameOrArn key, which in reality was never getting updated.

Forwarding messages from Kafka to Elasticsearch and Postgresql

I am trying to build an infrastructure in which I need to forward messages from one Kafka topic to both Elasticsearch and PostgreSQL. Everything runs on the same host (the architecture picture is omitted here): Logstash does some anonymization and a few mutates, then sends the document back to Kafka as JSON; Kafka Connect should then forward the message to PostgreSQL and Elasticsearch.
Everything works fine except the connection to PostgreSQL, with which I'm having some trouble.
My config files look as follows:
sink-quickstart-sqlite.properties
name=jdbc-test-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
#table.name.format=${topic}
topics=processed
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
connection.url=jdbc:postgresql://localhost:5432/postgres
connection.user=postgres
connection.password=luka
insert.mode=upsert
pk.mode=kafka
pk.fields=__connect_topic,__connect_partition,__connect_offset
fields.whitelist=ident,auth,response,request,clientip
auto.create=true
auto.evolve=true
confluent-distributed.properties
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
config.storage.topic=connect-configs
config.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.replication.factor=1
offset.flush.interval.ms=10000
plugin.path=/usr/share/java
quickstart-elasticsearch.properties
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
#topics=test-elasticsearch-sink
topics=processed
key.ignore=true
connection.url=http://localhost:9200
type.name=kafka-connect
schema.ignore=true
schemas.enable=false
The confluent-schema-registry service is running.
I'm getting the following error after curl http://localhost:8083/connectors/jdbc-sink/status | jq
{
"name": "jdbc-sink",
"connector": {
"state": "RUNNING",
"worker_id": "192.168.50.37:8083"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "192.168.50.37:8083",
"trace": "org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:488)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:465)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires \"schema\" and \"payload\" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:359)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:488)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
"
}
],
"type": "sink"
}
This is what a message in my "processed" topic looks like (the message in the topic is a one-liner; this is just formatted):
{
  "ROWTIME": 1587134287569,
  "ROWKEY": "null",
  "bytes": "4050",
  "input": {
    "type": "log"
  },
  "clientip": "156.226.170.95",
  "#timestamp": "2020-04-17T14:38:06.346Z",
  "timestamp": "17/Apr/2020:14:37:57 +0000",
  "#version": "1",
  "request": "/lists",
  "ident": "536e605f097a92cb07c2a0a81f809f481c5af00d13305f0094057907792ce65e2a62b8ab8ba036f95a840504c3e2f05a",
  "httpversion": "1.1",
  "auth": "33a7f4a829adfaa60085eca1b84e0ae8f0aa2835d206ac765c22ad440e50d7ae462adafb13306aecfdcd6bd80b52b03f",
  "agent": {
    "ephemeral_id": "053b9c29-9038-4a89-a2b3-a5d8362460fe",
    "version": "7.6.2",
    "id": "50e21169-5aa0-496f-b792-3936e9c8de04",
    "hostname": "HOSTNAME_OF_MY_AWS_INSTANCE",
    "type": "filebeat"
  },
  "log": {
    "offset": 707943,
    "file": {
      "path": "/home/ec2-user/log/apache.log"
    }
  },
  "host": {
    "name": "HOSTNAME_OF_MY_AWS_INSTANCE"
  },
  "verb": "DELETE",
  "ecs": {
    "version": "1.4.0"
  },
  "response": "503"
}
Please let me know if you need some more information.
Your error is here:
DataException: JsonConverter with schemas.enable requires \"schema\" and \"payload\" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
Since this is the JDBC sink, you must provide a schema with your data. If you have the option, I would suggest you use Avro. If not, you must structure your JSON data in the "schema"/"payload" envelope that Kafka Connect requires.
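A sketch of that envelope, using a few of the whitelisted fields from the sample message (field list abbreviated; every record carries its own inline schema):
{
  "schema": {
    "type": "struct",
    "optional": false,
    "fields": [
      { "field": "clientip", "type": "string", "optional": true },
      { "field": "request", "type": "string", "optional": true },
      { "field": "response", "type": "string", "optional": true }
    ]
  },
  "payload": {
    "clientip": "156.226.170.95",
    "request": "/lists",
    "response": "503"
  }
}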
More info: https://www.youtube.com/watch?v=b-3qN_tlYR4&t=981s

S3 sink record field TimeBasedPartitioner not working

I am trying to deploy an S3 sink connector where the S3 partitions need to be based on a field in the data, props.eventTime.
Following is my config:
{
  "name": "test_timeBasedPartitioner",
  "connector.class": "io.confluent.connect.s3.S3SinkConnector",
  "partition.duration.ms": "3600000",
  "s3.region": "us-east-1",
  "topics.dir": "dream11",
  "flush.size": "50000",
  "topics": "test_topic",
  "s3.part.size": "5242880",
  "tasks.max": "5",
  "timezone": "Etc/UTC",
  "locale": "en",
  "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
  "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
  "schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
  "storage.class": "io.confluent.connect.s3.storage.S3Storage",
  "rotate.schedule.interval.ms": "1800000",
  "path.format": "'EventDate'=YYYYMMdd",
  "s3.bucket.name": "test_bucket",
  "partition.duration.ms": "86400000",
  "timestamp.extractor": "RecordField",
  "timestamp.field": "props.eventTime"
}
Following is a sample JSON message present in the Kafka topic:
{
  "eventName": "testEvent",
  "props": {
    "screen_resolution": "1436x720",
    "userId": 0,
    "device_name": "1820",
    "eventTime": "1565792661712"
  }
}
And the exception that I am getting is:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:546)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:302)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Invalid format: "1564561484906" is malformed at "4906"
at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
at io.confluent.connect.storage.partitioner.TimeBasedPartitioner$RecordFieldTimestampExtractor.extract(TimeBasedPartitioner.java:281)
at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:199)
at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:176)
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:195)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:524)
... 10 more
Is there something that I am missing in this configuration?
Any help would be appreciated.
Your field props.eventTime is coming in as microseconds, not milliseconds.
This can be identified from the stack trace and by inspecting the org.joda.time doParseMillis method, which the connector's TimeBasedPartitioner and its payload timestamp extractor, RecordFieldTimestampExtractor, use when the timestamp.field is a STRING:
Caused by: java.lang.IllegalArgumentException: Invalid format: "1564561484906" is malformed at "4906"
at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
at io.confluent.connect.storage.partitioner.TimeBasedPartitioner$RecordFieldTimestampExtractor.extract(TimeBasedPartitioner.java:281)
You could follow one of these solutions:
Write your own TimestampExtractor that supports microseconds. You can check how to write a custom TimestampExtractor here.
Change or transform your source data so that the field arrives in milliseconds instead of microseconds (a sketch follows this list).
Follow up on the open issues where the default TimestampExtractor's flexibility is discussed, and suggest or contribute support for your use case.
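A sketch of the second option, assuming the producer can be changed and that the extractor accepts a numeric epoch-milliseconds field directly (it is the string-parsing branch that fails in the stack trace above):
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "eventName": "testEvent",
    "props": {
        "userId": 0,
        # Epoch milliseconds as a number, not a string; if the source value
        # is in microseconds, divide it by 1000 first.
        "eventTime": int(time.time() * 1000),
    },
}
producer.send("test_topic", value=event)
producer.flush()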

Deserialization exception with data generated by the ksql-datagen utility

I generated a sample stream with the ksql-datagen utility from the following schema:
{
  "type": "record",
  "name": "users",
  "namespace": "com.example",
  "fields": [
    {
      "name": "registertime",
      "type": {
        "type": "long",
        "arg.properties": {
          "range": { "min": 1487715775521, "max": 1519273364600 }
        }
      }
    },
    {
      "name": "userid",
      "type": {
        "type": "string",
        "arg.properties": { "regex": "User_[1-9][0-2]" }
      }
    },
    {
      "name": "regionid",
      "type": {
        "type": "string",
        "arg.properties": { "regex": "Region_[1-9]" }
      }
    },
    {
      "name": "gender",
      "type": {
        "type": "string",
        "arg.properties": {
          "options": ["MALE", "FEMALE", "OTHER"]
        }
      }
    }
  ]
}
But when I checked the registered versions, it still picks the "io.confluent.ksql.avro_schemas" namespace:
curl "http://localhost:8081/subjects/test-user-value/versions/1"
The response (schema string unescaped and formatted here for readability):
{
  "subject": "test-user-value",
  "version": 1,
  "id": 4,
  "schema": {
    "type": "record",
    "name": "KsqlDataSourceSchema",
    "namespace": "io.confluent.ksql.avro_schemas",
    "fields": [
      { "name": "registertime", "type": ["null", "long"], "default": null },
      { "name": "userid", "type": ["null", "string"], "default": null },
      { "name": "regionid", "type": ["null", "string"], "default": null },
      { "name": "gender", "type": ["null", "string"], "default": null }
    ]
  }
}
I got the following error when I tried to consume with the Kafka Streams API:
Exception in thread "PageView-Users-Stream-Join-eg-1dc610a3-c9d9-4c1e-b5eb-910e4bc74826-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Deserialization exception handler is set to fail upon a deserialization error. If you would rather have the streaming pipeline continue after a deserialization error, please set the default.deserialization.exception.handler appropriately.
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:80)
at org.apache.kafka.streams.processor.internals.RecordQueue.maybeUpdateTimestamp(RecordQueue.java:160)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:101)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:124)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:711)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:995)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:833)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:777)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:747)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 4
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class io.confluent.ksql.avro_schemas.KsqlDataSourceSchema specified in writer's schema whilst finding reader's schema for a SpecificRecord.
Answered at https://github.com/confluentinc/schema-registry/issues/980
Datagen always defines the namespace as io.confluent.ksql.avro_schemas, so the namespace in your input schema is ignored. See confluentinc/ksql#1906
There are now other ways to generate test data into Kafka, too.
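One such option (a sketch; it assumes the Confluent kafka-connect-datagen plugin is installed on the Connect worker and a Schema Registry runs at the URL shown) is the DatagenConnector, which accepts the same arg.properties-annotated schemas:
{
  "name": "datagen-users",
  "config": {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "kafka.topic": "test-user",
    "schema.filename": "/path/to/users.avsc",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "max.interval": "1000",
    "iterations": "-1"
  }
}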