Kafka consumer using AWS_MSK_IAM ClassCastException error - apache-kafka

I have MSK running on AWS and I'd like to consume data using AWS_MSK_IAM authentication.
My MSK cluster is properly configured and I can consume from it using the Kafka CLI with the following command:
../bin/kafka-console-consumer.sh --bootstrap-server b-1.kafka.*********.***********.amazonaws.com:9098 --consumer.config client_auth.properties --topic TopicTest --from-beginning
My client_auth.properties has the following information:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
When I try to consume from my Databricks cluster using Spark, I receive the following error:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: java.lang.ClassCastException: software.amazon.msk.auth.iam.IAMClientCallbackHandler cannot be cast to kafkashaded.org.apache.kafka.common.security.auth.AuthenticateCallbackHandler
Here are my cluster config and the libraries installed on the cluster (screenshots not reproduced here).
And the code I'm running on Databricks:
raw = (
    spark
    .readStream
    .format('kafka')
    .option('kafka.bootstrap.servers', 'b-.kafka.*********.***********.amazonaws.com:9098')
    .option('subscribe', 'TopicTest')
    .option('startingOffsets', 'earliest')
    .option('kafka.sasl.mechanism', 'AWS_MSK_IAM')
    .option('kafka.security.protocol', 'SASL_SSL')
    .option('kafka.sasl.jaas.config', 'software.amazon.msk.auth.iam.IAMLoginModule required;')
    .option('kafka.sasl.client.callback.handler.class', 'software.amazon.msk.auth.iam.IAMClientCallbackHandler')
    .load()
)

Though I haven't tested this, based on the comment from Andrew about it theoretically being possible to relocate the dependency, I dug a bit into the source of aws-msk-iam-auth. It declares compileOnly('org.apache.kafka:kafka-clients:2.4.1') in its build.gradle, so the uber jar doesn't contain the Kafka client classes; they are picked up from whatever Databricks provides (which is shaded under the kafkashaded prefix).
The project also relocates all of its dependencies with a prefix. So changing compileOnly('org.apache.kafka:kafka-clients:2.4.1') to implementation('org.apache.kafka:kafka-clients:2.4.1') in build.gradle and rebuilding the uber jar with gradle clean shadowJar should include and relocate the Kafka classes without any conflicts when the jar is uploaded to Databricks.

I faced the same issue, so I forked aws-msk-iam-auth to make it compatible with Databricks. Just add the jar from the following release to your cluster: https://github.com/Iziwork/aws-msk-iam-auth-for-databricks/releases/tag/v1.1.2-databricks

Related

Error 40101 when retrieving Avro schema in kafka-avro-console-consumer

The following error appears when attempting to use Confluent Platform CLI tools to read messages from Kafka.
[2023-01-17T18:00:14.960189+0100] [2023-01-17 18:00:14,957] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:105)
[2023-01-17T18:00:14.960210+0100] org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 119
[2023-01-17T18:00:14.960230+0100] Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Unauthorized; error code: 40101
[2023-01-17T18:00:14.960249+0100] at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:170)
[2023-01-17T18:00:14.960272+0100] at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:188)
[2023-01-17T18:00:14.960293+0100] at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:330)
[2023-01-17T18:00:14.960312+0100] at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:323)
[2023-01-17T18:00:14.960332+0100] at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:63)
[2023-01-17T18:00:14.960353+0100] at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndID(CachedSchemaRegistryClient.java:118)
[2023-01-17T18:00:14.960372+0100] at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:121)
[2023-01-17T18:00:14.960391+0100] at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:92)
[2023-01-17T18:00:14.960412+0100] at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:120)
[2023-01-17T18:00:14.960431+0100] at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:112)
[2023-01-17T18:00:14.960449+0100] at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:137)
[2023-01-17T18:00:14.960468+0100] at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
[2023-01-17T18:00:14.960487+0100] at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
[2023-01-17T18:00:14.960506+0100] at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
I am using Kafka 3.2 (both client and server), with a Karapace schema registry by Aiven. I can query the schema registry manually using curl by including the credentials in the URL:
(base) me@my-laptop:~$ curl https://$SCHEMA_REGISTRY_USER:$SCHEMA_REGISTRY_PASSWORD@$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT/subjects
["my-topic-" <redacted>
Or as basic auth in a header:
(base) me@my-laptop:~$ curl -u "$SCHEMA_REGISTRY_USER:$SCHEMA_REGISTRY_PASSWORD" https://$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT/subjects
["my-topic-" <redacted>
The error seems to happen when the credentials are not passed to the schema registry:
(base) me@my-laptop:~$ curl https://$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT/subjects
{"error_code": 40101, "message": "Unauthorized"}
According to official docs for kafka-avro-console-consumer, I can use the authentication source URL or USER_INFO, and it should pass those credentials to the schema registry. This does not work, and causes the above error.
kafka-avro-console-consumer \
--bootstrap-server $KAFKA_HOST:$KAFKA_PORT \
--consumer.config /home/guido/.tls/kafka/client-tls.properties \
--property schema.registry.url=https://$SCHEMA_REGISTRY_USER:$SCHEMA_REGISTRY_PASSWORD@$SCHEMA_REGISTRY_HOST:$SCHEMA_REGISTRY_PORT \
--property basic.auth.credentials.source=URL \
--topic my-topic
I've tried every combination I can think of, with URL, USER_INFO, separate credentials, prefixed with schema.registry and without, but all lead to the same error. When I use the regular kafka-console-consumer.sh the same settings work, but I see the Kafka messages as a byte stream, rather than the deserialized Avro message that I'm looking for.
EDIT: it appears that java.net.HttpURLConnection is the problem. It strips credentials from the URL, and the version of schema-registry-client packaged with Confluent Platform does not yet support any other form of basic authentication.
import java.net.URL
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class ExampleTest extends AnyFlatSpec with Matchers {
  behavior.of("Example")

  it should "work" in {
    val url = "https://username:p4ssw0rd@kafka.example.com:12345"
    val connection = new URL(url).openConnection()
    noException shouldBe thrownBy {
      connection.getInputStream
    }
  }
}
The test fails.
Found it. There were three causes for my problem:
1. I had an old version of Confluent Platform installed, namely confluent-platform-2.11. This version did not yet support any schema registry authentication beyond username and password in the URL.
2. I thought I already had the latest version (3.3.x), but that is actually the latest version of Kafka, not the latest version of Confluent Platform.
3. Java's default web request implementation, sun.net.www.protocol.http.HttpURLConnection, does not support credentials in the URL. They are stripped before the request is made, even though the URL correctly contains them.
The correct solution was to upgrade to a later version of Confluent Platform.
See https://docs.confluent.io/platform/current/installation/installing_cp/deb-ubuntu.html#configure-cp
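The same principle carries over to client code once the schema-registry client is new enough: pass the credentials as a separate property instead of embedding them in the URL. A minimal sketch using the confluent-kafka Python client (the registry host, port, user and password are placeholders; schema id 119 is the one from the error above):
# Sketch: pass schema registry credentials via basic.auth.user.info rather than in the
# URL (which java.net.HttpURLConnection strips). Host, port, user and password are placeholders.
from confluent_kafka.schema_registry import SchemaRegistryClient

sr = SchemaRegistryClient({
    "url": "https://schema-registry.example.com:12345",
    "basic.auth.user.info": "SR_USER:SR_PASSWORD",
})

print(sr.get_subjects())               # list registered subjects
print(sr.get_schema(119).schema_str)   # fetch the schema id from the original error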

Could not find a 'KafkaClient' entry in the JAAS configuration. System property 'java.security.auth.login.config' is not set from Kafka rest proxy

I am trying to use the Kafka REST proxy with an AWS MSK cluster.
MSK Encryption details:
Within the cluster
TLS encryption: Enabled
Between clients and brokers
TLS encryption: Enabled
Plaintext: Not enabled
I have created the topic "TestTopic" on MSK, and then created another EC2 instance in the same VPC as MSK to act as the REST proxy. Here are the details from kafka-rest.properties:
zookeeper.connect=z-3.msk.xxxx.xx.xxxxxx-1.amazonaws.com:2181,z-1.msk.xxxx.xx.xxxxxx-1.amazonaws.com:2181
bootstrap.servers=b-1.msk.xxxx.xx.xxxxxx-1.amazonaws.com:9096,b-2.msk.xxxx.xx.xxxxxx-1.amazonaws.com:9096
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="username" password="password";
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
ssl.truststore.location=/tmp/kafka.client.truststore.jks
I have also created a rest-jaas.properties file with the content below:
KafkaClient {
org.apache.kafka.common.security.scram.ScramLoginModule required
username="username"
password="password";
};
and then set the java.security.auth.login.config using:
export KAFKA_OPTS=-Djava.security.auth.login.config=/home/ec2-user/confluent-6.1.1/rest-jaas.properties
After this I started Kafka rest proxy using:
./kafka-rest-start /home/ec2-user/confluent-6.1.1/etc/kafka-rest/kafka-rest.properties
But when I tried to put an event on TestTopic by calling the service from Postman:
POST: http://IP_of_ec2instance:8082/topics/TestTopic
I am getting a 500 error, but on the EC2 instance I can see the error:
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:441)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:291)
at io.confluent.kafkarest.ProducerPool.buildNoSchemaProducer(ProducerPool.java:120)
at io.confluent.kafkarest.ProducerPool.buildBinaryProducer(ProducerPool.java:106)
at io.confluent.kafkarest.ProducerPool.<init>(ProducerPool.java:71)
at io.confluent.kafkarest.ProducerPool.<init>(ProducerPool.java:60)
at io.confluent.kafkarest.ProducerPool.<init>(ProducerPool.java:53)
at io.confluent.kafkarest.DefaultKafkaRestContext.getProducerPool(DefaultKafkaRestContext.java:54)
... 64 more
Caused by: java.lang.IllegalArgumentException: Could not find a 'KafkaClient' entry in the JAAS configuration. System property 'java.security.auth.login.config' is not set
at org.apache.kafka.common.security.JaasContext.defaultContext(JaasContext.java:141)
at org.apache.kafka.common.security.JaasContext.load(JaasContext.java:106)
at org.apache.kafka.common.security.JaasContext.loadClientContext(JaasContext.java:92)
at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:139)
at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:74)
at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:120)
at org.apache.kafka.clients.producer.KafkaProducer.newSender(KafkaProducer.java:449)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:430)
... 71 more
I can also see that the value of sasl.jaas.config is null in the ProducerConfig values.
Could someone please help me with this? Thanks in advance!
The issue is finally fixed. I am posting the fix here in case it helps someone else:
kafka-rest.properties file should have below text:
zookeeper.connect=z-3.msk.xxxx.xx.xxxxxx-1.amazonaws.com:2181,z-1.msk.xxxx.xx.xxxxxx-1.amazonaws.com:2181
bootstrap.servers=b-1.msk.xxxx.xx.xxxxxx-1.amazonaws.com:9096,b-2.msk.xxxx.xx.xxxxxx-1.amazonaws.com:9096
client.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="username" password="password";
client.security.protocol=SASL_SSL
client.sasl.mechanism=SCRAM-SHA-512
There was no need to create the rest-jaas.properties file, nor to export KAFKA_OPTS.
After these changes, I was able to put messages on the Kafka topic using SCRAM authentication.
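To verify the fix, the Postman call can be reproduced with a short script against the proxy's v2 API. A minimal sketch, assuming the proxy listens on its default port 8082; the EC2 hostname below is a placeholder:
# Sketch: produce a JSON record to TestTopic through the Kafka REST proxy (v2 embedded format).
# "ec2-rest-proxy-host" is a placeholder; 8082 is the REST proxy's default listener port.
import json
import requests

payload = {"records": [{"value": {"greeting": "hello from the REST proxy"}}]}
resp = requests.post(
    "http://ec2-rest-proxy-host:8082/topics/TestTopic",
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    data=json.dumps(payload),
)
print(resp.status_code, resp.json())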

Snowflake Kafka connector config issue

I'm following the steps in this guide: Snowflake Connector for Kafka.
The error message I'm getting is
BadRequestException: Connector config {.....} contains no connector type
I am running the command as
sh kafka_2.12-2.3.0/bin/connect-standalone.sh connect-standalone.properties snowflake_kafka_config.json
my config files are
connect-standalone.properties
bootstrap.servers=localhost:9092
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/Users/kafka_test/kafka
The jar file snowflake-kafka-connector-0.5.1.jar is in the plugin.path directory.
snowflake_kafka_config.json
{
  "name":"Kafka_Test",
  "Config":{
    "connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "tasks.max":"8",
    "topics":"test",
    "snowflake.topic2table.map": "",
    "buffer.count.records":"1",
    "buffer.flush.time":"60",
    "buffer.size.bytes":"65536",
    "snowflake.url.name":"<url>",
    "snowflake.user.name":"<user_name>",
    "snowflake.private.key":"<private_key>",
    "snowflake.private.key.passphrase":"<pass_phrase>",
    "snowflake.database.name":"<db>",
    "snowflake.schema.name":"<schema>",
    "key.converter":"org.apache.kafka.connect.storage.StringConverter",
    "value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
    "value.converter.schema.registry.url":"",
    "value.converter.basic.auth.credentials.source":"",
    "value.converter.basic.auth.user.info":""
  }
}
Kafka is running locally; I have a producer and consumer up and can see the data flowing.
This is the same question I answered over on the Confluent community Slack, but I'll post it here for reference too :-)
The Connect worker log shows that the connector JAR itself is being loaded, so the 'contains no connector type' error is because your config formatting is fubar.
You're running in standalone mode but passing in a JSON file, which won't work. My personal opinion is to always use distributed mode, even if it's just a single node. Check this out if you need a recap on standalone vs distributed: http://rmoff.dev/ksldn19-kafka-connect
If you must use standalone then you need your connector config (snowflake_kafka_config.json) to be a properties file like this:
param1=argument1
param2=argument2
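As a rough illustration, keeping the placeholders from the JSON above (and omitting the empty value.converter.* entries), the standalone equivalent of snowflake_kafka_config.json would look something like:
name=Kafka_Test
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=8
topics=test
buffer.count.records=1
buffer.flush.time=60
buffer.size.bytes=65536
snowflake.url.name=<url>
snowflake.user.name=<user_name>
snowflake.private.key=<private_key>
snowflake.private.key.passphrase=<pass_phrase>
snowflake.database.name=<db>
snowflake.schema.name=<schema>
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeJsonConverter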
You can see valid JSON examples (if you use distributed mode) here: https://github.com/confluentinc/demo-scene/blob/master/kafka-connect-zero-to-hero/demo_zero-to-hero-with-kafka-connect.adoc#stream-data-from-kafka-to-elasticsearch
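If you do switch to a distributed worker, the JSON from the question can be submitted to the worker's REST API instead (note the API expects a lowercase "config" key). A minimal sketch in Python, assuming a worker reachable on localhost at the default REST port 8083:
# Sketch: register the connector with a distributed Kafka Connect worker via its REST API.
# Assumes the worker listens on http://localhost:8083 (the default rest.port) and that the
# JSON file uses the lowercase "config" key the API expects.
import json
import requests

with open("snowflake_kafka_config.json") as f:
    connector = json.load(f)

resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
print(resp.status_code, resp.json())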

spark-submit --keytab option does not copy the file to executors

In my case I am using Spark (2.1.1), and for the processing I need to connect to Kafka (using Kerberos, and therefore a keytab).
When submitting the job I can pass the keytab with the --keytab and --principal options. The main drawback is that the keytab will not be sent to the distributed cache (or at least is not made available to the executors), so it will fail:
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
...
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
If I also pass it via --files it works (in version 2.1.0), but in this latest version (2.1.1) it is not allowed because it fails with:
Exception in thread "main" java.lang.IllegalArgumentException: Attempt to add (file:keytab.keytab) multiple times to the distributed cache.
Any tips?
I resolved this issue by making a copy of my keytab file (e.g. the original file is osboo.keytab and its copy is osboo-copy-for-kafka.keytab) and pushing it to HDFS via the --files option.
# Call
spark2-submit --keytab osboo.keytab \
--principal osboo \
--files osboo-copy-for-kafka.keytab#osboo-copy-for-kafka.keytab,kafka.jaas#kafka.jaas
# kafka.jaas
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="osboo-copy-for-kafka.keytab"
  principal="osboo@REALM.COM"
  serviceName="kafka";
};

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="osboo-copy-for-kafka.keytab"
  serviceName="zookeeper"
  principal="osboo@REALM.COM";
};
Maybe this solution requires less effort than keeping the symlinks between the files in mind, so I hope it helps.
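An alternative to the separate kafka.jaas file, for Kafka clients new enough (0.10.2+) to honor a per-source sasl.jaas.config, is to inline the same JAAS entry in the Structured Streaming options. A hedged sketch: the broker address, topic, and security protocol are placeholders, and the keytab and principal names are the ones from the answer above.
# Sketch: Structured Streaming read that points the Kafka client at the keytab shipped
# via --files. Assumes an existing SparkSession named `spark` and a Kafka client that
# supports the per-client sasl.jaas.config property; broker/topic/protocol are placeholders.
jaas = (
    'com.sun.security.auth.module.Krb5LoginModule required '
    'useKeyTab=true keyTab="osboo-copy-for-kafka.keytab" '
    'principal="osboo@REALM.COM" serviceName="kafka";'
)

df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1.example.com:9093")
    .option("subscribe", "my-topic")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.kerberos.service.name", "kafka")
    .option("kafka.sasl.jaas.config", jaas)
    .load()
)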
The spark-submit --keytab option copies the file under a different name into the local container directory when you submit the application on YARN; you can see this in launch_container.sh.

Getting exception while instantiating KafkaProducer

I am using the IBM Bluemix implementation of the Kafka broker.
I am creating the KafkaProducer with the following properties:
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
bootstrap.servers=xxxx.xxxxxx.xxxxxx.xxxxxx.bluemix.net:xxxx
client.id=messagehub
acks=-1
security.protocol=SASL_SSL
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.truststore.location=xxxxxxxxxxxxxxxxx
ssl.truststore.password=xxxxxxxx
ssl.truststore.type=JKS
ssl.endpoint.identification.algorithm=HTTPS
KafkaProducer<byte[], byte[]> kafkaProducer =
    new KafkaProducer<byte[], byte[]>(props);
With this I got the following exception:
org.apache.kafka.common.KafkaException:
org.apache.kafka.clients.producer.internals.DefaultPartitioner is not
an instance of org.apache.kafka.clients.producer.Partitioner
After reading the following blog:
http://blog.rocana.com/kafkas-defaultpartitioner-and-byte-arrays I added the following line to my property file, even though I was using the new API:
partitioner.class=kafka.producer.ByteArrayPartitioner
Now I am getting this exception:
org.apache.kafka.common.KafkaException: Could not instantiate class
kafka.producer.ByteArrayPartitioner Does it have a public no-argument
constructor?
It looks like ByteArrayPartitioner does not have a default constructor.
Any idea what I am missing here?
Thanks
Madhu
Since I was using the KafkaProducer API, I did not need the
partitioner.class=kafka.producer.ByteArrayPartitioner
property. The issue was that there were two copies of the kafka-clients jar. We had configured our installation so that all library jar files live in an external shared directory, but due to a POM configuration error the WAR file also had a copy of the Kafka client in its lib directory. Once I fixed this, it worked fine.
Madhu