Kafka consumer is failing with Error deserializing Avro message with unknown protocol - apache-kafka

In my use case, i had created JDBC kafka connector, pulled the data from oracle table and sucessfully pushed to kafka topic but when i try to read the messages from this kafka topic i get the deserialization issue as listed below.
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 2
Caused by: java.net.MalformedURLException: unknown protocol: localhost
at java.net.URL.<init>(URL.java:593)
at java.net.URL.<init>(URL.java:483)
at java.net.URL.<init>(URL.java:432)
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:124)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:188)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:330)

The problem is the schema registry URL in the YAML configuration file. Notice unknown protocol error
Change it to this (note I added the http:// protocol to the URL line), and it should work:
schema:
version: latest2
registry:
url: http://localhost:8081

Related

Kafka connect S3 source failing with read-only registry

I am trying to read avro records stored in S3 in order to put them back in a kafka topic using the S3 source provided by confluent.
I already have the topics and the registry setup with the right schemas but when the connect S3 source tries to serialize the my records to the topics I get this error
Caused by: org.apache.kafka.common.errors.SerializationException:
Error registering Avro schema: ... at
io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:121)
at
io.confluent.connect.avro.AvroConverter$Serializer.serialize(AvroConverter.java:143)
at
io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:84)
... 15 more Caused by:
io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException:
Subject com-row-count-value is in read-only mode; error code: 42205
at
io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
at
io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
at
io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:495)
at
io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:486)
at
io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:459)
at
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:214)
at
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:276)
at
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:252)
at
io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:75)
it seems that the connect producer does not try to get the schema_id if it exists but tries to write it but my registry is readonly.
Anyone knows if this is an issue or there are some configuration I am missing ?
If you're sure the correct schema for that subject is already registered by some other means, you can try to set auto.register.schemas to false in the serializer configuration.
See here for more details: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#handling-differences-between-preregistered-and-client-derived-schemas

Getting error while publishing message to kafka topic

I am new to Kafka. I have written a simple JAVA program to generate a message using avro schema. I have generated a specific record. The record is generated successfully. My schema is not yet registered with my local environment. It is currently registered with some other environment.
I am using the apache kafka producer library to publish the message to my local environment kafka topic. Can I publish the message to the local topic or the schema needs to be registered with the local schema registry as well.
Below are the producer properties -
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
properties.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "https://schema-registry.xxxx.service.dev:443");```
Error I am getting while publishing the message -
``` org.apache.kafka.common.errors.SerializationException: Error registering Avro schema:
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: User is denied operation Write on Subject: xxx.avro-value; error code: **40301**
The issue was kafka producer by default tries to register the schema on topic. So, I added the below - properties.put(KafkaAvroSerializerConfig.AUTO_REGISTER_SCHEMAS, false);
and it resolved the issue.

Strange behaviour when using AvroConverter in Confluent Sink Connector

I was running confluent(v5.5.1) s3 sink connector with below config:
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://registryurl",
"value.converter.value.subject.name.strategy":"io.confluent.kafka.serializers.subject.RecordNameStrategy",
......
And got below error in the log like:
DEBUG Sending GET with input null to http://registryurl/schemas/ids/309?fetchMaxId=false
DEBUG Sending POST with input {......} to http://registryurl/MyRecordName?deleted=true
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro value schema version for id 309
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
There are 2 questions that baffles me here:
Why the sink connector sends additional POST request to schema registry given it's just a consumer? And I have successfully received messages when using a standard kafka consumer, which ONLY sends a GET request to the schema registry.
As per this docs and official doc, the schema subject format will be like SubjectNamingStrategy-value or -key. However judging by the log, it does not suffix the request with "-value". I have tried all the 3 strategies and found ONLY the default TopicNameStrategy works as expected.
Appreciated if anyone could shed some light here.
Thanks a lot

Implementing custom AvroConverter for confluent kafka-connect-s3

I am using Confluent's Kafka s3 connect for copying data from apache Kafka to AWS S3.
The problem is that I have Kafka data in AVRO format which is NOT using Confluent Schema Registry’s Avro serializer and I cannot change the Kafka producer. So I need to deserialize existing Avro data from Kafka and then persist the same in parquet format in AWS S3. I tried using confluent's AvroConverter as value converter like this -
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost/api/v1/avro
And i am getting this error -
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic dcp-all to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:110)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:488)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
As far as I understand, "io.confluent.connect.avro.AvroConverter" will only work if the data is written in Kafka using Confluent Schema Registry’s Avro serializer and hence I am getting this error. So my question is Do I need to implement a generic AvroConverter in this case? And if yes, how do I extend the existing source code - https://github.com/confluentinc/kafka-connect-storage-cloud?
Any help here will be appreciated.
You don't need to extend that repo. You just need to implement a Converter (part of Apache Kafka) shade it into a JAR, then place it on your Connect worker's CLASSPATH, like BlueApron did for Protobuf
Or see if this works - https://github.com/farmdawgnation/registryless-avro-converter
NOT using Confluent Schema Registry
Then what registry are you using? Each one that I know of has configurations to interface with the Confluent one

SF_KAFKA_CONNECTOR name is empty or invalid error using Confluent Cloud and Snowflake Kafka Connector

I have a cluster running in Confluent Cloud and am able to Produce and Consume data using other applications. However, when I try to hook up the Snowflake Kafka Connector I receive these errors:
[2019-10-15 22:12:08,979] INFO Creating connector source-snowflake of type com.snowflake.kafka.connector.SnowflakeSinkConnector (org.apache.kafka.connect.runtime.Worker)
[2019-10-15 22:12:08,983] INFO Instantiated connector source-snowflake with version 0.5.1 of type class com.snowflake.kafka.connector.SnowflakeSinkConnector (org.apache.kafka.connect.runtime.Worker)
[2019-10-15 22:12:08,986] INFO
[SF_KAFKA_CONNECTOR] Snowflake Kafka Connector Version: 0.5.1 (com.snowflake.kafka.connector.Utils)
[2019-10-15 22:12:09,029] INFO
[SF_KAFKA_CONNECTOR] SnowflakeSinkConnector:start (com.snowflake.kafka.connector.SnowflakeSinkConnector)
[2019-10-15 22:12:09,030] ERROR
[SF_KAFKA_CONNECTOR] name is empty or invalid. It should match Snowflake object identifier syntax. Please see the documentation. (com.snowflake.kafka.connector.Utils)
[2019-10-15 22:12:09,033] ERROR WorkerConnector{id=source-snowflake} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector)
com.snowflake.kafka.connector.internal.SnowflakeKafkaConnectorException:
[SF_KAFKA_CONNECTOR] Exception: Invalid input connector configuration
[SF_KAFKA_CONNECTOR] Error Code: 0001
[SF_KAFKA_CONNECTOR] Detail: input kafka connector configuration is null, missing required values, or wrong input value
at com.snowflake.kafka.connector.internal.SnowflakeErrors.getException(SnowflakeErrors.java:347)
at com.snowflake.kafka.connector.internal.SnowflakeErrors.getException(SnowflakeErrors.java:306)
at com.snowflake.kafka.connector.Utils.validateConfig(Utils.java:400)
at com.snowflake.kafka.connector.SnowflakeSinkConnector.start(SnowflakeSinkConnector.java:131)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:111)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:136)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:196)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:252)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:1079)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:117)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$15.call(DistributedHerder.java:1095)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$15.call(DistributedHerder.java:1091)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Here is my scrubbed Snowflake config file:
{
"name":"snowsink",
"config":{
"connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
"tasks.max":"8",
"topics":"tp-snow-test",
"buffer.count.records":"100",
"buffer.flush.time":"60",
"buffer.size.bytes":"65536",
"snowflake.url.name":"xxxxxxx.east-us-2.azure.snowflakecomputing.com",
"snowflake.user.name":"svc_cc_strm",
"snowflake.private.key":"<key>",
"snowflake.private.key.passphrase":<password>,
"snowflake.database.name":"testdb",
"snowflake.schema.name":"test1",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
}
}
Any ideas? Thanks.
The name of the connector should be a valid SQL identifier to Snowflake. So many of the kafka topic examples have dashes in them that when I first tried the Snowflake Kafka connector I got this same error.
According to the documentation, a Snowflake pipe is created using the connector_name specified, and pipe names must be valid SQL identifiers.
The connector creates one pipe for each topic partition. The name is:
SNOWFLAKE_KAFKA_CONNECTOR_PIPE_.
Also from the same doc page at "Fields in the Configuration File" for name:
Application name. This must be unique across all Kafka connectors used by the customer. This name name must be a valid Snowflake unquoted identifier.
If the topic has a dash in it then it will need to mapped to a table name that is also a proper SQL identifier in your connector config, otherwise it will try to create the table name as the same as the topic name and fail on the "-" in the name.
You need to change the name of your connector (source-snow) to remove the - from it (so that it matches this validation pattern).
🤷‍♂️
You need to have below entry in your config file , below topics entry.
"topics":"tp-snow-test",
"snowflake.topic2table.map": "tp-snow-test:TestKafkaTable",