Apache Spark: Getting an InstanceAlreadyExistsException when running the Kafka producer - scala

I have a small app in Scala that creates a Kafka producer and runs with Apache Spark.
When I run the command
spark-submit --master local[2] --deploy-mode client <path to jar file> <app name> <kafka broker> <kafka in queue> <kafka out queue> <interval>
I am getting this warning:
WARN AppInfoParser: Error registering AppInfo mbean
javax.management.InstanceAlreadyExistsException: kafka.producer:type=app-info,id=
The code is not relevant because I am getting this exception when Scala creates the KafkaProducer: val producer = new KafkaProducer[Object, Object]
Does anybody have a solution for this?
Thank you!

When a Kafka Producer is created, it attempts to register an MBean using the client.id as its unique identifier.
There are two likely reasons why you are getting the InstanceAlreadyExistsException warning:
You are attempting to initialize more than one Producer at a time with the same client.id property on the same JVM.
You are not calling close() on an existing Producer before initializing another Producer. Calling close() unregisters the MBean.
If you leave the client.id property blank when initializing the producer, a unique one will be created for you. Giving your producers unique client.id values or allowing them to be auto-generated would resolve this problem.
In the case of Kafka, MBeans can be used for tracking statistics.
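For illustration, here is a minimal Scala sketch of both fixes (the broker address and client.id values are placeholders, not taken from the question): give each producer instance in a JVM its own client.id, or close an existing producer before creating a new one that reuses its client.id.

import java.util.Properties
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.common.serialization.StringSerializer

object UniqueClientIdSketch extends App {
  // Build producer properties; broker address and client ids are placeholders.
  def producerProps(clientId: String): Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)
    // Either set a unique client.id per producer instance on this JVM...
    props.put("client.id", clientId)
    // ...or omit client.id entirely and let the client generate a unique one.
    props
  }

  // Two producers in the same JVM with different client.id values: no MBean clash.
  val producerA = new KafkaProducer[String, String](producerProps("spark-app-producer-1"))
  val producerB = new KafkaProducer[String, String](producerProps("spark-app-producer-2"))

  // To reuse a client.id, close the old producer first so its MBean is unregistered.
  producerA.close()
  val producerC = new KafkaProducer[String, String](producerProps("spark-app-producer-1"))

  producerB.close()
  producerC.close()
}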

Related

Kafka Admin client unregistered causing metadata issues

After migrating our microservice functionality to Spring Cloud Function, we have been facing issues with one of the producer topics.
Event of type: abc and key: xxx_yyy could not be sent to kafka org.springframework.messaging.MessageHandlingException: error occurred in message handler [org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder$ProducerConfigurationMessageHandler#2333d598]; nested exception is org.springframework.kafka.KafkaException: Send failed; nested exception is org.apache.kafka.common.errors.TimeoutException: Topic pc-abc not present in metadata after 60000 ms.
o.s.kafka.support.LoggingProducerListener - Exception thrown when sending a message with key='byte[15]' and payload='byte[256]' to topic pc-abc and partition 6: org.apache.kafka.common.errors.TimeoutException: Topic pc-abc not present in metadata after 60000 ms.
FYI: Topics are already created in our staging/prod environment and are not to be created as the application starts.
My producer config:
spring.cloud.stream.bindings.pc-abc-out-0.content-type=application/json
spring.cloud.stream.bindings.pc-abc-out-0.destination=pc-abc
spring.cloud.stream.bindings.pc-abc-out-0.producer.header-mode=headers
spring.cloud.stream.bindings.pc-abc-out-0.producer.partition-count=5
spring.cloud.stream.bindings.pc-abc-out-0.producer.partitionKeyExpression=payload.key
spring.cloud.stream.kafka.bindings.pc-abc-out-0.producer.sync=true
I am kind of stuck at this point and exhausted. Has anyone else faced this issue?
Spring Cloud version: 2.5.5
Kafka: 2.7.1
The issue is: the producer is configured with partition-count=5, yet Kafka is looking for partition number 6, which obviously does not exist. I have commented out the auto-add-partitions property, but the issue still turns up. Is it stale configuration? How do I force Kafka to pick up the new configuration?
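To illustrate the arithmetic behind that reasoning, here is a hypothetical sketch (not the Spring Cloud Stream binder's actual code): hash-based partition selection against partition-count=5 can only yield indexes 0 through 4, so a send targeting partition 6 suggests the index was computed against a larger partition count, for example the topic's real partition count on the broker.

object PartitionMathSketch extends App {
  // Hypothetical hash-based partition selection: the result is always in [0, partitionCount).
  def selectPartition(key: Any, partitionCount: Int): Int =
    math.abs(key.hashCode % partitionCount)

  val count = 5 // mirrors producer.partition-count=5
  val observed = (0 until 20).map(k => selectPartition(s"key-$k", count)).distinct.sorted
  println(observed) // only indexes 0..4 can appear; partition 6 cannot come from count=5
}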

Kafka consumer using AWS_MSK_IAM ClassCastException error

I have MSK running on AWS and I'd like to consume information using AWS_MSK_IAM authentication.
My MSK is properly configured and I can consume the information using Kafka CLI with the following command:
../bin/kafka-console-consumer.sh --bootstrap-server b-1.kafka.*********.***********.amazonaws.com:9098 --consumer.config client_auth.properties --topic TopicTest --from-beginning
My client_auth.properties has the following information:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
When I try to consume from my Databricks cluster using spark, I receive the following error:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: java.lang.ClassCastException: software.amazon.msk.auth.iam.IAMClientCallbackHandler cannot be cast to kafkashaded.org.apache.kafka.common.security.auth.AuthenticateCallbackHandler
My cluster config and the libraries I'm using in the cluster are shown in the attached screenshots.
And the code I'm running on Databricks:
raw = (
    spark
    .readStream
    .format('kafka')
    .option('kafka.bootstrap.servers', 'b-.kafka.*********.***********.amazonaws.com:9098')
    .option('subscribe', 'TopicTest')
    .option('startingOffsets', 'earliest')
    .option('kafka.sasl.mechanism', 'AWS_MSK_IAM')
    .option('kafka.security.protocol', 'SASL_SSL')
    .option('kafka.sasl.jaas.config', 'software.amazon.msk.auth.iam.IAMLoginModule required;')
    .option('kafka.sasl.client.callback.handler.class', 'software.amazon.msk.auth.iam.IAMClientCallbackHandler')
    .load()
)
Though I haven't tested this, based on Andrew's comment about theoretically being able to relocate the dependency, I dug a bit into the source of aws-msk-iam-auth. It has compileOnly('org.apache.kafka:kafka-clients:2.4.1') in its build.gradle, so the uber jar doesn't contain the kafka-clients library; it is picked up from whatever Databricks provides (and shades).
It also relocates all of its dependent jars with a prefix. So changing compileOnly to implementation and rebuilding the uber jar with gradle clean shadowJar should include and relocate the Kafka jars without any conflicts when the jar is uploaded to Databricks.
I faced the same issue, so I forked aws-msk-iam-auth to make it compatible with Databricks. Just add the jar from the following release to your cluster: https://github.com/Iziwork/aws-msk-iam-auth-for-databricks/releases/tag/v1.1.2-databricks

Unable to run Kafka Console Producer (NoSuchMethodError)

Error while running kafka producer
./kafka-console-producer.sh --broker-list localhost:9092 --topic testing
Exception in thread "main" java.lang.NoSuchMethodError: kafka.utils.CommandLineUtils$.parseKeyValueArgs(Lscala/collection/Iterable;)Ljava/util/Properties;
at kafka.tools.ConsoleProducer$ProducerConfig.<init>(ConsoleProducer.scala:245)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:35)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
This kind of error is usually related to mismatched versions of Kafka jars. If this is the case, resetting your CLASSPATH should do the trick:
export CLASSPATH=""
It looks like you either have conflicting jars on your classpath or mismatched versions of the Kafka broker and the Kafka client.

How can you set the max.message.bytes of a state store changelog topic?

I have a Kafka Streams application with messages up to 10MiB. I want to persist these messages in a state store, but Kafka Streams fails to produce to the internal changelog topic:
2017-11-17 08:36:19,792 ERROR RecordCollectorImpl - task [4_5] Error sending record to topic appid-statestorename-state-store-changelog. No more offsets will be recorded for this task and the exception will eventually be thrown
org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
2017-11-17 08:36:20,583 ERROR StreamThread - stream-thread [StreamThread-1] Failed while executing StreamTask 4_5 due to flush state:
After adding some logging, it looks like the default max.message.bytes setting for an internal topic is 1MiB.
The default max.message.bytes for the cluster is set to 50MiB.
Is it possible to tweak the configuration of internal topics of Kafka Streams applications?
A work-around is to start the streams application, let it create the topics, and afterwards alter the topic config. But this feels like a dirty hack.
./kafka-topics.sh --zookeeper ... \
--alter --topic appid-statestorename-state-store-changelog \
--config max.message.bytes=10485760
Kafka 1.0 allows you to specify custom topic properties for internal topics via StreamsConfig.
You prefix those configs with "topic." and can use any of the configs defined in TopicConfig.
See the original KIP for more details:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-173%3A+Add+prefix+to+StreamsConfig+to+enable+setting+default+internal+topic+configs
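As an illustration, here is a minimal Scala sketch (not code from the answer; the application id and bootstrap servers are placeholders) of setting the prefixed property in the Streams configuration:

import java.util.Properties
import org.apache.kafka.common.config.TopicConfig
import org.apache.kafka.streams.StreamsConfig

object ChangelogMaxMessageBytesSketch {
  def streamsProps(): Properties = {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "appid")          // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092") // placeholder

    // StreamsConfig.topicPrefix prepends "topic.", so this becomes
    // "topic.max.message.bytes" and is applied to the internal topics
    // (changelog/repartition) that Kafka Streams creates.
    props.put(StreamsConfig.topicPrefix(TopicConfig.MAX_MESSAGE_BYTES_CONFIG), "10485760")
    props
  }
  // Pass streamsProps() to new KafkaStreams(topology, streamsProps()) as usual.
}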

Getting exception while instantiating KafkaProducer

I am using the IBM Bluemix implementation of the Kafka broker.
I am creating the KafkaProducer with following properties:
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
bootstrap.servers=xxxx.xxxxxx.xxxxxx.xxxxxx.bluemix.net:xxxx
client.id=messagehub
acks=-1
security.protocol=SASL_SSL
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.truststore.location=xxxxxxxxxxxxxxxxx
ssl.truststore.password=xxxxxxxx
ssl.truststore.type=JKS
ssl.endpoint.identification.algorithm=HTTPS
KafkaProducer<byte[], byte[]> kafkaProducer =
new KafkaProducer<byte[], byte[]>(props);
With this I got following exception:
org.apache.kafka.common.KafkaException:
org.apache.kafka.clients.producer.internals.DefaultPartitioner is not
an instance of org.apache.kafka.clients.producer.Partitioner
After reading the following blog:
http://blog.rocana.com/kafkas-defaultpartitioner-and-byte-arrays I added the following line to my property file, even though I was using the new API:
partitioner.class=kafka.producer.ByteArrayPartitioner
Now I am getting this exception:
org.apache.kafka.common.KafkaException: Could not instantiate class
kafka.producer.ByteArrayPartitioner Does it have a public no-argument
constructor?
It looks like ByteArrayPartitioner does not have a default constructor.
Any idea what I am missing here?
Thanks
Madhu
As I was using the KafkaProducer API, I did not need the
partitioner.class=kafka.producer.ByteArrayPartitioner
property. The issue was that there were two copies of the kafka-clients jar. We have configured our installation so that all library jar files live in an external shared directory, but due to a POM configuration error the war file also had a copy of the kafka client in its lib directory. Once I fixed this, it worked fine.
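For reference, a minimal Scala sketch of the setup that ended up working (the bootstrap server and topic name are placeholders, and the SASL/SSL settings from the question are omitted for brevity): ByteArraySerializer for keys and values, no partitioner.class override, with a single copy of the kafka-clients jar on the classpath.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ByteArrayProducerSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "broker:9093") // placeholder endpoint
  props.put("client.id", "messagehub")
  props.put("acks", "-1")
  props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
  // No partitioner.class entry: the new producer API's DefaultPartitioner handles byte[]
  // keys, as long as only one copy of the kafka-clients jar is on the classpath.

  val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)
  producer.send(new ProducerRecord("test-topic", "key".getBytes, "value".getBytes)) // placeholder topic
  producer.close()
}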
Madhu