Error with Confluent Kafka Azure EventHub Source Connector - apache-kafka

I am trying to consume data from an Azure EventHub into my Confluent Platform Kafka cluster (installed on an AWS EC2 instance).
However, Confluent Control Center tells me that the connector failed upon launch, and when I read the log output I see the following error:
[2021-07-15 21:46:16,362] ERROR WorkerSourceTask{id=EventHubsLinuxMetrics-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:184)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.WorkerSourceTask.convertTransformedRecord(WorkerSourceTask.java:290)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:319)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:249)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.DataException: Found null value for non-optional schema
at io.confluent.connect.avro.AvroData.validateSchemaValue(AvroData.java:1207)
at io.confluent.connect.avro.AvroData.fromConnectData(AvroData.java:389)
at io.confluent.connect.avro.AvroData.fromConnectData(AvroData.java:352)
at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:89)
at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:63)
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$1(WorkerSourceTask.java:290)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
The error in question says: Found null value for non-optional schema.
This is my connector config:
{
  "name": "EventhubLinuxMetrics",
  "config": {
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "connector.class": "io.confluent.connect.azure.eventhubs.EventHubsSourceConnector",
    "kafka.topic": "linuxMetrics",
    "tasks.max": "1",
    "azure.eventhubs.sas.keyname": "<censored for security>",
    "azure.eventhubs.sas.key": "<censored for security>",
    "azure.eventhubs.namespace": "<censored for security>",
    "azure.eventhubs.hub.name": "linuxvmmetrics"
  }
}
Is there anything wrong with my config? Is localhost:9092 the correct value for the bootstrap.servers field?
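For reference, the same failure trace can also be pulled from the Connect worker's REST API; this is a minimal sketch, assuming the worker listens on the default localhost:8083 and the requests package is installed:
import requests

# Connector name taken from the config above
status = requests.get(
    "http://localhost:8083/connectors/EventhubLinuxMetrics/status"
).json()

for task in status.get("tasks", []):
    print(task["id"], task["state"])
    if task["state"] == "FAILED":
        print(task.get("trace", ""))  # same stack trace as in the worker log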

Related

Deserializing JSON data from Kafka stream - Kafka connect to PostgreSQL

I'm streaming a topic with Kafka 2.12-3.0.0 on Ubuntu in standalone mode to PostgreSQL and getting a deserialization error.
I'm using the confluent_kafka pip package to produce the Kafka stream in Python (this part works fine):
{"pmu_id": 2, "time": 1644329854.08, "stream_id": 2, "stat": "ok", "ph_i1_r": 27.682000117654074, "ph_i1_j": -1.546410917622178, "ph_i2_r": 25.055846468243697, "ph_i2_j": 2.6658974347348012, "ph_i3_r": 25.470616978816988, "ph_i3_j": 0.5585993153435624, "ph_v4_r": 3338.6901623241415, "ph_v4_j": -1.6109426103444193, "ph_v5_r": 3149.0595421490525, "ph_v5_j": 2.5863594222073076, "ph_v6_r": 3071.4231229187553, "ph_v6_j": 0.4872377558335442, "ph_7_r": 0.0, "ph_7_j": 0.0, "ph_8_r": 3186.040175515683, "ph_8_j": -1.6065850592620299, "analog": [], "digital": 0, "frequency": 50.014, "rocof": 1}
Configuration for storing in PostgreSQL
In my kafka_2.12-3.0.0/config/connect-standalone.properties I've added the connector and converter plugin paths:
plugin.path=/home/user/kafkaConnectors/confluentinc-kafka-connect-jdbc-10.3.2,/home/user/kafkaConverters/confluentinc-kafka-connect-json-schema-converter-7.0.1
I'm executing with:
bin/connect-standalone.sh config/connect-standalone.properties config/sink-postgres.properties
My full config/sink-postgres.properties:
name=sinkIRIpostgre
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://localhost:5432/pgdb
topics=pmu1
key.converter=io.confluent.connect.json.JsonSchemaConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.schema.registry.url=http://localhost:8081
connection.user=pguser
connection.password=pgpass
auto.create=true
auto.evolve=true
insert.mode=insert
pk.mode=record_key
pk.fields=MESSAGE_KEY
I'm getting this error:
ERROR [sinkIRIpostgre|task-0] WorkerSinkTask{id=sinkIRIpostgre-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:193)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:493)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:473)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error of topic pmu214:
at io.confluent.connect.json.JsonSchemaConverter.toConnectData(JsonSchemaConverter.java:119)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertKey(WorkerSinkTask.java:530)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:493)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing JSON message for id -1
at io.confluent.kafka.serializers.json.AbstractKafkaJsonSchemaDeserializer.deserialize(AbstractKafkaJsonSchemaDeserializer.java:177)
at io.confluent.kafka.serializers.json.AbstractKafkaJsonSchemaDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaJsonSchemaDeserializer.java:235)
at io.confluent.connect.json.JsonSchemaConverter$Deserializer.deserialize(JsonSchemaConverter.java:165)
at io.confluent.connect.json.JsonSchemaConverter.toConnectData(JsonSchemaConverter.java:108)
... 18 more
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.getByteBuffer(AbstractKafkaSchemaSerDe.java:250)
at io.confluent.kafka.serializers.json.AbstractKafkaJsonSchemaDeserializer.deserialize(AbstractKafkaJsonSchemaDeserializer.java:112)
EDIT (Python code)
Here is the Python code used for the Kafka producer:
from confluent_kafka import Producer
..
p = Producer({'bootstrap.servers': self.kafka_bootstrap_servers})
...
record_key = str(uuid.uuid4())
record_value = self.createKafkaJSON(base_message)
p.produce(self.kafka_topic, key=record_key, value=record_value)
p.poll(0)
The function createKafkaJSON returns json.dumps(kafkaDictFinal).encode('utf-8'), where kafkaDictFinal is a Python dictionary.
The producer is called in main with:
KafkaPMUProducer(pdc_id=2, pmu_ip="x.x.x.x", pmu_port=4712, kafka_bootstrap_servers ="localhost:9092", kafka_topic="pmu214").kafka_producer()
If you're writing straight JSON from your Python app then you'll need to use the org.apache.kafka.connect.json.JsonConverter converter, but your messages will need schema and payload attributes.
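For illustration, a value produced in that format might look roughly like this (a sketch with an abbreviated, hypothetical field list; the real schema block would describe every field in the message):
import json

enveloped_value = json.dumps({
    "schema": {
        "type": "struct",
        "optional": False,
        "fields": [
            {"field": "pmu_id", "type": "int32", "optional": False},
            {"field": "frequency", "type": "double", "optional": True}
        ]
    },
    "payload": {"pmu_id": 2, "frequency": 50.014}
}).encode("utf-8")
# p.produce(self.kafka_topic, key=record_key, value=enveloped_value)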
io.confluent.connect.json.JsonSchemaConverter relies on the Schema Registry wire format which includes a "magic byte" (hence the error).
You can learn more in this deep-dive article about serialisation and Kafka Connect, and see how Python can produce JSON data with a schema using SerializingProducer.
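Here is a minimal sketch of the SerializingProducer approach, assuming the Schema Registry is reachable at http://localhost:8081 and using an abbreviated, hypothetical JSON schema:
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
from confluent_kafka.serialization import StringSerializer

# Abbreviated, hypothetical JSON Schema; the real one would describe every field.
schema_str = """
{
  "type": "object",
  "properties": {
    "pmu_id": {"type": "integer"},
    "frequency": {"type": "number"}
  }
}
"""

schema_registry_client = SchemaRegistryClient({"url": "http://localhost:8081"})
json_serializer = JSONSerializer(schema_str, schema_registry_client)

p = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": json_serializer
})

# The serializer registers the schema and frames the value with the Schema Registry
# wire format (magic byte + schema ID), which JsonSchemaConverter can then read.
p.produce("pmu214", key="some-key", value={"pmu_id": 2, "frequency": 50.014})
p.flush()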

Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema

I have a pipeline where I connect the Debezium CDC MySQL connector from Confluent Platform to Confluent Cloud, since the cloud's built-in Debezium MySQL connector is still in preview. The connection is established successfully, and the messages from the topic are consumed by an S3 sink connector. Initially I had the flow in JSON format, but later I wanted it in Avro format, so I changed the key and value converters in the connector config file as shown below.
Debezium connector JSON:
{
  "name": "mysql_deb3",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "host_name",
    "database.port": "3306",
    "database.user": "user_name",
    "database.password": "password",
    "database.server.id": "123456789",
    "database.server.name": "server_name",
    "database.whitelist": "db_name",
    "database.history.kafka.topic": "dbhistory.db_name",
    "include.schema.changes": "true",
    "table.whitelist": "db_name.table_name",
    "tombstones.on.delete": "false",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "cloud_schema_registry_endpoint",
    "value.converter.schema.registry.url": "cloud_schema_registry_endpoint",
    "key.converter.schema.registry.basic.auth.user.info": "schema_registry_api_key:schema_registry_api_secret",
    "value.converter.schema.registry.basic.auth.user.info": "schema_registry_api_key:schema_registry_api_secret",
    "decimal.handling.mode": "double",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "true",
    "transforms.unwrap.delete.handling.mode": "rewrite",
    "database.history.kafka.bootstrap.servers": "confluent_cloud_kafka_server_endpoint:9092",
    "database.history.consumer.security.protocol": "SASL_SSL",
    "database.history.consumer.ssl.endpoint.identification.algorithm": "https",
    "database.history.consumer.sasl.mechanism": "PLAIN",
    "database.history.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"cloud_kafka_api\" password=\"cloud_kafka_api_secret\";",
    "database.history.producer.security.protocol": "SASL_SSL",
    "database.history.producer.ssl.endpoint.identification.algorithm": "https",
    "database.history.producer.sasl.mechanism": "PLAIN",
    "database.history.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"cloud_kafka_api\" password=\"cloud_kafka_api_secret\";"
  }
}
####################################################################
connect-distributed.properties:
bootstrap.servers=confluent_cloud_kafka_server_endpoint:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
ssl.endpoint.identification.algorithm=https
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="cloud_kafka_api" password="cloud_kafka_api_secret";
request.timeout.ms=20000
retry.backoff.ms=500
producer.bootstrap.servers=confluent_cloud_kafka_server_endpoint:9092
producer.ssl.endpoint.identification.algorithm=https
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="cloud_kafka_api" password="cloud_kafka_api_secret";
producer.request.timeout.ms=20000
producer.retry.backoff.ms=500
consumer.bootstrap.servers=confluent_cloud_kafka_server_endpoint:9092
consumer.ssl.endpoint.identification.algorithm=https
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="cloud_kafka_api" password="cloud_kafka_api_secret";
consumer.request.timeout.ms=20000
consumer.retry.backoff.ms=500
offset.flush.interval.ms=10000
group.id=connect-cluster
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
offset.storage.partitions=3
config.storage.topic=connect-configs
config.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.replication.factor=3
schema.registry.url=https://cloud_schema_registry_endpoint
schema.registry.basic.auth.user.info=<schema_registry_api_key>:<schema_registry_api_secret>
#################################################
I start up Kafka Connect with: bin/connect-distributed etc/connect-distributed.properties
Connect starts up fine, but when I try to load the Debezium connector using the curl command, it shows the "Unauthorized" error below, even though the API keys and secrets I provided are correct; I verified them manually with the CLI as well.
Caused by: org.apache.kafka.connect.errors.DataException: staging-development-rds-cluster
at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:78)
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$1(WorkerSourceTask.java:266)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 11 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"SchemaChangeKey","namespace":"io.debezium.connector.mysql","fields":[{"name":"databaseName","type":"string"}],"connect.name":"io.debezium.connector.mysql.SchemaChangeKey"}
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Unauthorized; error code: 401
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:209)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:235)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:326)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:318)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:313)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:119)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:156)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:79)
at io.confluent.connect.avro.AvroConverter$Serializer.serialize(AvroConverter.java:117)
at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:76)
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$1(WorkerSourceTask.java:266)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSourceTask.convertTransformedRecord(WorkerSourceTask.java:266)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:293)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:228)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2020-11-30 05:30:47,389] ERROR WorkerSourceTask{id=mysql_deb3-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:178)
[2020-11-30 05:30:47,389] INFO Stopping down connector (io.debezium.connector.common.BaseSourceTask:187)
[2020-11-30 05:30:47,389] INFO Stopping MySQL connector task (io.debezium.connector.mysql.MySqlConnectorTask:458)
Please guys help me on this.
Thanks in Advance

Is there a way to use the Kafka Schema Registry without the magic byte?

I'm trying to make my applications work with the Confluent Schema Registry, but at this point I'm not in total control of the producers; you can even see them as legacy applications that simply are not bound to the Confluent products.
I was looking at the Confluent documentation, and it seems all messages must include a magic byte and schema ID in the payload:
https://docs.confluent.io/3.2.0/schema-registry/docs/serializer-formatter.html
Otherwise, when I try to consume them I get this error:
[2020-09-25 13:12:09,008] ERROR WorkerSinkTask{id=s3_parquet_connector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:491)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:324)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:200)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic com.obj_pos to Protobuf:
at io.confluent.connect.protobuf.ProtobufConverter.toConnectData(ProtobufConverter.java:123)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:491)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Protobuf message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2020-09-25 13:12:09,010] ERROR WorkerSinkTask{id=s3_parquet_connector-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
My question is whether there is a way to somehow disable this magic byte check, or whether I could create a Kafka stream that would simply prepend these 5 bytes to the original message, so that afterwards I could consume it with a consumer that connects to the Schema Registry.
What is happening is that the producers are out of my control, so I would need to be able to deserialize messages that do not contain those 5 bytes, because they are produced by producers that don't rely on the Confluent serializers/deserializers.
they are produced by producers that don't rely on the confluent serializers
Then the problem isn't the Registry.
You shouldn't be using the Converters written by Confluent to consume the messages, as those are bound to the Registry, and there is no way to skip it.
You would instead use the BlueApron ones (assuming the data is really protobuf), or write your own Converter classes.
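For context, this is roughly the framing check that fails: the Confluent wire format starts with a 0x00 magic byte followed by a 4-byte big-endian schema ID (for Protobuf, a message-index list then precedes the payload). A sketch of the check:
import struct

def parse_confluent_header(raw: bytes):
    # Byte 0 must be the magic byte (0x00); bytes 1-4 are the big-endian schema ID.
    if len(raw) < 5 or raw[0] != 0:
        raise ValueError("Unknown magic byte!")  # what the converter reports for plain records
    (schema_id,) = struct.unpack(">I", raw[1:5])
    return schema_id, raw[5:]  # schema ID plus the remaining serialized bytes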

Confluent Replicator connector fails because the principal cannot be determined from the subject

I am running into an issue trying to replicate from one Confluent Cloud cluster to another.
I have been following the Confluent documentation on how to do this; however, I am running into consistent failures that I have to assume are caused by a mistake in my config.
The config is as follows:
{
  "name": "kafka_topics_replication",
  "config": {
    "name": "kafka_topics_replication",
    "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
    "topic.whitelist": "topics",
    "src.kafka.bootstrap.servers": "source-broker:9092",
    "src.kafka.security.protocol": "SASL_SSL",
    "src.kafka.security.mechanism": "PLAIN",
    "src.kafka.client.id": "src-to-dst-replicator",
    "src.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"src-username\" password=\"src-password\" serviceName=\"Kafka\";",
    "confluent.topic.replication.factor": "3",
    "dest.topic.replication.factor": "3",
    "dest.kafka.bootstrap.servers": "dest-broker.cloud:9092",
    "dest.kafka.security.protocol": "SASL_SSL",
    "dest.kafka.sasl.mechanism": "PLAIN",
    "dest.kafka.client.id": "src-to-dst-replicator",
    "dest.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"dst-username\" password=\"dst-password\" serviceName=\"Kafka\";"
  },
  "tasks": [],
  "type": "source"
}
The connector starts, but keeps logging the following error:
[2020-07-14 14:45:15,568] WARN [kafka_topics_replication|worker] [AdminClient clientId=src-to-dst-replicator] Error connecting to node source-broker:9092 (id: -1 rack: null) (org.apache.kafka.clients.NetworkClient:969)
java.io.IOException: Channel could not be created for socket java.nio.channels.SocketChannel[closed]
at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:348)
at org.apache.kafka.common.network.Selector.registerChannel(Selector.java:329)
at org.apache.kafka.common.network.Selector.connect(Selector.java:256)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:964)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:294)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:1018)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1260)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1203)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.errors.SaslAuthenticationException: Failed to configure SaslClientAuthenticator
at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:228)
at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:338)
... 8 more
Caused by: org.apache.kafka.common.errors.SaslAuthenticationException: Failed to configure SaslClientAuthenticator
Caused by: org.apache.kafka.common.KafkaException: Principal could not be determined from Subject, this may be a transient failure due to Kerberos re-login
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.firstPrincipal(SaslClientAuthenticator.java:616)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.<init>(SaslClientAuthenticator.java:200)
at org.apache.kafka.common.network.SaslChannelBuilder.buildClientAuthenticator(SaslChannelBuilder.java:274)
at org.apache.kafka.common.network.SaslChannelBuilder.lambda$buildChannel$1(SaslChannelBuilder.java:216)
at org.apache.kafka.common.network.KafkaChannel.<init>(KafkaChannel.java:142)
at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:224)
at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:338)
at org.apache.kafka.common.network.Selector.registerChannel(Selector.java:329)
at org.apache.kafka.common.network.Selector.connect(Selector.java:256)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:964)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:294)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:1018)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1260)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1203)
at java.lang.Thread.run(Thread.java:748)
Googling around hasn't really turned up anything that seems applicable to this scenario.
Fingers crossed that someone here can at least point me in the right direction.
Thanks,
-Ryan
Your connector configuration seems to be missing some properties such as:
src.kafka.ssl.endpoint.identification.algorithm=https
dest.kafka.ssl.endpoint.identification.algorithm=https
src.kafka.request.timeout.ms=20000
dest.kafka.request.timeout.ms=20000
src.kafka.retry.backoff.ms=500
dest.kafka.retry.backoff.ms=500
Also, the properties src.kafka.sasl.jaas.config and dest.kafka.sasl.jaas.config seem to be using the wrong format for the value.
Instead of:
org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<CLUSTER_API_KEY>\" password=\"<CLUSTER_API_SECRET>\" serviceName=\"Kafka\";
Use:
"org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<CLUSTER_API_KEY>\" password=\"<CLUSTER_API_SECRET>\";"
You can find more information about this configuration here.

Kafka Sink Connector fails: Schema not found; error code: 40403

I have a sink connector with the following configuration
{
  "name": "sink-test-mariadb-MY_TOPIC",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "10",
    "topics": "MY_TOPIC",
    "connection.url": "jdbc:mariadb://localhost:3306/myschema?user=myuser&password=mypass",
    "auto.create": "false",
    "auto.evolve": "true",
    "table.name.format": "MY_TABLE",
    "pk.mode": "record_value",
    "pk.fields": "ID",
    "insert.mode": "upsert",
    "transforms": "ExtractField",
    "transforms.ExtractField.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
    "transforms.ExtractField.field": "data"
  }
}
and after a while all the tasks of the connector fail with the following error:
{
"state": "FAILED",
"trace": "org.apache.kafka.connect.errors.DataException: MY_TOPIC
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:95)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 802
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:202)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:229)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:409)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:402)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:119)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:192)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getById(CachedSchemaRegistryClient.java:168)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:121)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaAvroDeserializer.java:194)
at io.confluent.connect.avro.AvroConverter$Deserializer.deserialize(AvroConverter.java:120)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:83)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)",
"id": 0,
"worker_id": "localhost:8083"
}
The connector manages to synchronise the topic with the database, but then it suddenly fails for no apparent reason. I am also very sure that the schema is there; its subject appears in the list returned by calling the Schema Registry API at localhost:8081/subjects:
[
...
MY_TOPIC-value
...
]
I had the same problem, and I realized that error code 40403 doesn't really mean that the schema was not found; it means that the schema does not correspond to the required one. A different code exists for the case where the schema was not found at all (40401).
So all I did was to change the schema accordingly and it worked for me.
The message on the Kafka topic is serialised with a different version of the schema than the one you have on the Schema Registry. Perhaps it was generated by a tool that wrote the schema to a different Schema Registry, or in a different environment? In order to deserialise it, Kafka Connect needs to be able to retrieve the schema for the ID embedded (after the magic byte) at the beginning of the Kafka message on the topic.
That schema is not present on your Schema Registry, as seen by:
GET /schemas/ids/803
{ "error_code": 40403, "message": "Schema not found" }
You can inspect the ID of the schema that you do have by looking at:
curl -s "http://localhost:8081/subjects/MY_TOPIC-value/versions/3/"|jq '.id'
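As a cross-check, you can also read one raw record from the topic and print the schema ID it carries; a sketch using the confluent_kafka Python client, assuming the broker is on localhost:9092:
import struct
from confluent_kafka import Consumer

c = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "schema-id-inspector",   # throwaway consumer group
    "auto.offset.reset": "earliest"
})
c.subscribe(["MY_TOPIC"])

msg = c.poll(10.0)
if msg is not None and msg.error() is None and msg.value():
    value = msg.value()
    if value[0] == 0:  # Confluent wire format magic byte
        print("embedded schema id:", struct.unpack(">I", value[1:5])[0])
    else:
        print("record is not in Confluent wire format")
c.close()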