Error 40101 when retrieving Avro schema in kafka-avro-console-consumer - apache-kafka

The following error appears when attempting to use Confluent Platform CLI tools to read messages from Kafka.
[2023-01-17 18:00:14,957] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:105)
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 119
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Unauthorized; error code: 40101
    at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:170)
    at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:188)
    at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:330)
    at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:323)
    at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:63)
    at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndID(CachedSchemaRegistryClient.java:118)
    at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:121)
    at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:92)
    at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:120)
    at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:112)
    at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:137)
    at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
    at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
    at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
I am using Kafka 3.2 (both client and server), with a Karapace schema registry by Aiven. I can query the schema registry manually using curl by including the credentials in the URL:
(base) me@my-laptop:~$ curl https://$SCHEMA_REGISTRY_USER:$SCHEMA_REGISTRY_PASSWORD@$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT/subjects
["my-topic-" <redacted>
Or as basic auth in a header:
(base) me@my-laptop:~$ curl -u "$SCHEMA_REGISTRY_USER:$SCHEMA_REGISTRY_PASSWORD" https://$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT/subjects
["my-topic-" <redacted>
The error seems to happen when the credentials are not passed to the schema registry:
(base) me@my-laptop:~$ curl https://$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT/subjects
{"error_code": 40101, "message": "Unauthorized"}
According to the official docs for kafka-avro-console-consumer, I can use URL or USER_INFO as the basic auth credentials source, and either should pass those credentials to the schema registry. This does not work and causes the error above.
kafka-avro-console-consumer \
  --bootstrap-server $KAFKA_HOST:$KAFKA_PORT \
  --consumer.config /home/guido/.tls/kafka/client-tls.properties \
  --property schema.registry.url=https://$SCHEMA_REGISTRY_USER:$SCHEMA_REGISTRY_PASSWORD@$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT \
  --property basic.auth.credentials.source=URL \
  --topic my-topic
I've tried every combination I can think of: URL, USER_INFO, separate credentials, prefixed with schema.registry. and without, but they all lead to the same error. When I use the regular kafka-console-consumer.sh, the same settings work, but then I see the Kafka messages as a raw byte stream rather than the deserialized Avro messages I'm looking for.
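For completeness, the USER_INFO variant I tried looks roughly like this (property names as in the Confluent docs; the schema.registry.-prefixed form of basic.auth.user.info behaves the same for me, and it fails with the same 40101 error):
kafka-avro-console-consumer \
  --bootstrap-server $KAFKA_HOST:$KAFKA_PORT \
  --consumer.config /home/guido/.tls/kafka/client-tls.properties \
  --property schema.registry.url=https://$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT \
  --property basic.auth.credentials.source=USER_INFO \
  --property basic.auth.user.info=$SCHEMA_REGISTRY_USER:$SCHEMA_REGISTRY_PASSWORD \
  --topic my-topic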
EDIT: it appears that java.net.HttpURLConnection is the problem. It strips credentials from the URL, and the version of schema-registry-client packaged with my Confluent Platform does not yet support any other way of supplying Basic Authentication.
import java.net.URL

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class ExampleTest extends AnyFlatSpec with Matchers {
  behavior of "Example"

  it should "work" in {
    // Credentials embedded in the URL, the same way curl accepts them
    val url = "https://username:p4ssw0rd@kafka.example.com:12345"
    val connection = new URL(url).openConnection()
    // This assertion fails: HttpURLConnection drops the userinfo part of the URL,
    // so the server replies 401 Unauthorized and getInputStream throws
    noException shouldBe thrownBy {
      connection.getInputStream
    }
  }
}
The test fails.

Found it. There were three causes for my problem.
1. I had an old version of Confluent Platform installed, namely confluent-platform-2.11. This version did not yet support any schema registry authentication beyond username and password in the URL.
2. I thought I had the latest version already (3.3.x), but that's actually the latest version of Kafka, not the latest version of Confluent Platform.
3. Java's default web request implementation, sun.net.www.protocol.http.HttpURLConnection, does not support credentials in the URL. They are stripped before making the request, even though the URL correctly contains the credentials.
The correct solution was to upgrade to a later version of Confluent Platform.
See https://docs.confluent.io/platform/current/installation/installing_cp/deb-ubuntu.html#configure-cp

Related

Kafka connect MongoDB sink connector using kafka-avro-console-producer

I'm trying to write some documents to MongoDB using the Kafka connect MongoDB connector. I've managed to set up all the components required and start up the connector but when I send the message to Kafka using the kafka-avro-console-producer, Kafka connect is giving me the following error:
org.apache.kafka.connect.errors.DataException: Error: `operationType` field is doc is missing.
I've tried to add this field to the message, but then Kafka Connect asks me to include a documentKey field. It seems like I need to include some extra fields apart from the payload defined in my schema, but I can't find comprehensive documentation. Does anyone have an example of a Kafka message payload (using kafka-avro-console-producer) that goes through a Kafka -> Kafka Connect -> MongoDB pipeline?
Below is an example of one of the messages I'm sending to Kafka (by the way, kafka-avro-console-consumer is able to consume the messages):
./kafka-avro-console-producer --broker-list kafka:9093 --topic sampledata --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"field1","type":"string"}]}'
{"field1": "value1"}
And here is the configuration of the sink connector:
{"name": "mongo-sink",
"config": {
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"value.converter":"io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://schemaregistry:8081",
"connection.uri":"mongodb://cadb:27017",
"database":"cognitive_assistant",
"collection":"topicData",
"topics":"sampledata6",
"change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"
}
}
I've just managed to make the connector work. I deleted the change.data.capture.handler property from the connector configuration and it works now.
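For context, the ChangeStreamHandler apparently expects values shaped like MongoDB change stream events (hence the complaints about operationType and documentKey), which a plain record like the one above is not. A sketch of the working configuration with that property removed, everything else unchanged from the original config:
{
  "name": "mongo-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schemaregistry:8081",
    "connection.uri": "mongodb://cadb:27017",
    "database": "cognitive_assistant",
    "collection": "topicData",
    "topics": "sampledata6"
  }
}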

Kafka consumer using AWS_MSK_IAM ClassCastException error

I have MSK running on AWS and I'd like to consume information using AWS_MSK_IAM authentication.
My MSK is properly configured and I can consume the information using Kafka CLI with the following command:
../bin/kafka-console-consumer.sh --bootstrap-server b-1.kafka.*********.***********.amazonaws.com:9098 --consumer.config client_auth.properties --topic TopicTest --from-beginning
My client_auth.properties has the following information:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
When I try to consume from my Databricks cluster using spark, I receive the following error:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: java.lang.ClassCastException: software.amazon.msk.auth.iam.IAMClientCallbackHandler cannot be cast to kafkashaded.org.apache.kafka.common.security.auth.AuthenticateCallbackHandler
Here is my cluster config:
The libraries I'm using in the cluster:
And the code I'm running on Databricks:
raw = (
    spark
    .readStream
    .format('kafka')
    .option('kafka.bootstrap.servers', 'b-.kafka.*********.***********.amazonaws.com:9098')
    .option('subscribe', 'TopicTest')
    .option('startingOffsets', 'earliest')
    .option('kafka.sasl.mechanism', 'AWS_MSK_IAM')
    .option('kafka.security.protocol', 'SASL_SSL')
    .option('kafka.sasl.jaas.config', 'software.amazon.msk.auth.iam.IAMLoginModule required;')
    .option('kafka.sasl.client.callback.handler.class', 'software.amazon.msk.auth.iam.IAMClientCallbackHandler')
    .load()
)
Though I haven't tested this, based on the comment from Andrew about theoretically being able to relocate the dependency, I dug a bit into the source of aws-msk-iam-auth. They have a compileOnly('org.apache.kafka:kafka-clients:2.4.1') in their build.gradle, so the uber jar doesn't contain this library and it is picked up from whatever Databricks provides (and shades).
They are also relocating all their dependent jars with a prefix. So changing the compileOnly to implementation and rebuilding the uber jar with gradle clean shadowJar should include and relocate the Kafka jars without any conflicts when uploading to Databricks, as sketched below.
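A sketch of that dependency change in the aws-msk-iam-auth build.gradle (the version string is the one quoted above; the exact layout of the upstream build file is an assumption):
dependencies {
    // was: compileOnly 'org.apache.kafka:kafka-clients:2.4.1'
    implementation 'org.apache.kafka:kafka-clients:2.4.1'
}
Then rebuild the shaded artifact with gradle clean shadowJar and upload the resulting jar to the cluster.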
I faced the same issue, so I forked aws-msk-iam-auth in order to make it compatible with Databricks. Just add the jar from the following release to your cluster: https://github.com/Iziwork/aws-msk-iam-auth-for-databricks/releases/tag/v1.1.2-databricks

unable to read avro message via kafka-avro-console-consumer (end goal read it via spark streaming)

(End goal) Before trying out whether I could eventually read Avro data out of the Confluent Platform with Spark Structured Streaming, as described here: Integrating Spark Structured Streaming with the Confluent Schema Registry,
I'd like to verify whether I could use the command below to read the messages:
$ kafka-avro-console-consumer \
> --topic my-topic-produced-using-file-pulse-xml \
> --from-beginning \
> --bootstrap-server localhost:9092 \
> --property schema.registry.url=http://localhost:8081
I receive this "Unknown magic byte" error message:
Processed a total of 1 messages
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Note: the messages can be read like this (using kafka-console-consumer instead of kafka-avro-console-consumer):
kafka-console-consumer \
--bootstrap-server localhost:9092 --group my-group-console \
--from-beginning \
--topic my-topic-produced-using-file-pulse-xml
The messages were produced using Confluent Connect file-pulse (1.5.2) reading an XML file (streamthoughts/kafka-connect-file-pulse).
Please help here:
Did I use kafka-avro-console-consumer wrong?
I tried the "deserializer" property options described here: https://stackoverflow.com/a/57703102/4582240, but they did not help.
I don't feel brave enough to start the Spark streaming job to read the data yet.
For completeness, the file-pulse 1.5.2 properties I used are below (added 11/09/2020).
name=connect-file-pulse-xml
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
topic= my-topic-produced-using-file-pulse-xml
tasks.max=1
# File types
fs.scan.filters=io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.RegexFileListFilter
file.filter.regex.pattern=.*\\.xml$
task.reader.class=io.streamthoughts.kafka.connect.filepulse.reader.XMLFileInputReader
force.array.on.fields=sometagNameInXml
# File scanning
fs.cleanup.policy.class=io.streamthoughts.kafka.connect.filepulse.clean.LogCleanupPolicy
fs.scanner.class=io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker
fs.scan.directory.path=/tmp/kafka-connect/xml/
fs.scan.interval.ms=10000
# Internal Reporting
internal.kafka.reporter.bootstrap.servers=localhost:9092
internal.kafka.reporter.id=connect-file-pulse-xml
internal.kafka.reporter.topic=connect-file-pulse-status
# Track file by name
offset.strategy=name
If you are getting Unknown Magic Byte with the consumer, then the producer didn't use the Confluent AvroSerializer, and might have pushed Avro data that doesn't use the Schema Registry.
Without seeing the Producer code or consuming and inspecting the data in binary format, it is difficult to know which is the case.
You say "The message was produced using confluent connect file-pulse". Did you use value.converter with the AvroConverter class?
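For reference, a minimal sketch of what that would look like in the connector (or worker) properties; the registry URL is assumed to be the same one used with the console consumer above:
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081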

spark-submit --keytab option does not copy the file to executors

In my case I am using Spark (2.1.1) and for the processing I need to connect to Kafka (using kerberos, therefore a keytab).
When submitting the job I can pass the keytab with the --keytab and --principal options. The main drawback is that the keytab will not be sent to the distributed cache (or at least not be made available to the executors), so it fails with:
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
...
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
If I try passing it also in --files it works (version 2.1.0), but in this latest version (2.1.1) it is not allowed because it fails with:
Exception in thread "main" java.lang.IllegalArgumentException: Attempt to add (file:keytab.keytab) multiple times to the distributed cache.
Any tips?
I resolved this issue by making a copy of my keytab file (e.g. the original file is osboo.keytab and its copy is osboo-copy-for-kafka.keytab) and pushing it to HDFS via the --files option.
# Call
spark2-submit --keytab osboo.keytab \
  --principal osboo \
  --files osboo-copy-for-kafka.keytab#osboo-copy-for-kafka.keytab,kafka.jaas#kafka.jaas

# kafka.jaas
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="osboo-copy-for-kafka.keytab"
  principal="osboo@REALM.COM"
  serviceName="kafka";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="osboo-copy-for-kafka.keytab"
  serviceName="zookeeper"
  principal="osboo@REALM.COM";
};
This solution does require keeping the symlinks between the files in mind, but I hope it helps.
The spark-submit --keytab option copies the file under a different name into the local container directory when you submit the app on YARN.
You can find this in launch_container.sh.

pykafka can not connect kafka broker

When I use pykafka to connect to a Kafka cluster via the following code:
from pykafka import KafkaClient
client = KafkaClient(hosts="10.0.0.101:9092")
I get the following exception:
raise Exception('Unable to connect to a broker to fetch metadata.')
Exception: Unable to connect to a broker to fetch metadata.
But when I use the command line, such as:
kafka-console-producer --broker-list 10.0.0.101:9092 --topic userCND
it works fine but just gives me a warning message:
WARN Property topic is not valid (kafka.utils.VerifiableProperties)
What version of Kafka are you using? pykafka currently only supports 0.8.2, not 0.9.0.
You may want to use the REST API instead. Learn more about the REST API here:
http://docs.confluent.io/2.0.0/kafka-rest/docs/index.html
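If the cluster is in fact running an 0.8.2 broker, it can also help to tell pykafka explicitly which protocol version to speak via KafkaClient's broker_version parameter; a minimal sketch (the version value here is an assumption about your cluster):
from pykafka import KafkaClient

# broker_version tells pykafka which wire-protocol dialect to use with the brokers
client = KafkaClient(hosts="10.0.0.101:9092", broker_version="0.8.2")

# Accessing topics forces a metadata fetch, which is where the original error occurred
print(client.topics)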