Getting exception while instantiating KafkaProducer - apache-kafka

I am using the IBM Bluemix implementation of the Kafka broker.
I am creating the KafkaProducer with the following properties:
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
bootstrap.servers=xxxx.xxxxxx.xxxxxx.xxxxxx.bluemix.net:xxxx
client.id=messagehub
acks=-1
security.protocol=SASL_SSL
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.truststore.location=xxxxxxxxxxxxxxxxx
ssl.truststore.password=xxxxxxxx
ssl.truststore.type=JKS
ssl.endpoint.identification.algorithm=HTTPS
KafkaProducer<byte[], byte[]> kafkaProducer =
        new KafkaProducer<byte[], byte[]>(props);
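For completeness, the props object here is presumably built by loading that property file; a minimal sketch (the file name producer.properties is an assumption):
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class ProducerSetup {
    public static void main(String[] args) throws Exception {
        // Load the property file shown above (file name assumed)
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get("producer.properties"))) {
            props.load(in);
        }
        // Create the byte[] producer exactly as in the question
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        producer.close();
    }
}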
With this I got the following exception:
org.apache.kafka.common.KafkaException:
org.apache.kafka.clients.producer.internals.DefaultPartitioner is not
an instance of org.apache.kafka.clients.producer.Partitioner
After reading the following blog post:
http://blog.rocana.com/kafkas-defaultpartitioner-and-byte-arrays I added the following line to my property file, even though I was using the new API:
partitioner.class=kafka.producer.ByteArrayPartitioner
Now I am getting this exception:
org.apache.kafka.common.KafkaException: Could not instantiate class
kafka.producer.ByteArrayPartitioner Does it have a public no-argument
constructor?
It looks like ByteArrayPartitioner does not have a default constructor.
Any idea what I am missing here?
Thanks
Madhu

As I was using the new KafkaProducer API, I did not need the
partitioner.class=kafka.producer.ByteArrayPartitioner
property. The real issue was that there were two copies of the kafka-clients jar. We have configured our installation so that all library jars live in an external shared directory, but due to a POM configuration error the WAR file also carried its own copy of the Kafka client in its lib directory. Once I removed the duplicate, it worked fine.
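A quick way to spot this kind of duplicate is to print which jar a Kafka class is actually loaded from; a small diagnostic sketch (not part of the original fix):
import org.apache.kafka.clients.producer.Partitioner;

public class WhichJar {
    public static void main(String[] args) {
        // Prints the location (jar path) the Partitioner interface was loaded from,
        // which makes a second copy of kafka-clients on the classpath easy to spot.
        System.out.println(Partitioner.class.getProtectionDomain().getCodeSource().getLocation());
    }
}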
Madhu

Related

Kafka consumer using AWS_MSK_IAM ClassCastException error

I have MSK running on AWS and I'd like to consume messages using AWS_MSK_IAM authentication.
My MSK cluster is properly configured and I can consume messages using the Kafka CLI with the following command:
../bin/kafka-console-consumer.sh --bootstrap-server b-1.kafka.*********.***********.amazonaws.com:9098 --consumer.config client_auth.properties --topic TopicTest --from-beginning
My client_auth.properties has the following information:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
When I try to consume from my Databricks cluster using Spark, I receive the following error:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: java.lang.ClassCastException: software.amazon.msk.auth.iam.IAMClientCallbackHandler cannot be cast to kafkashaded.org.apache.kafka.common.security.auth.AuthenticateCallbackHandler
My cluster config and the libraries I'm using in the cluster are shown in the attached screenshots.
And here is the code I'm running on Databricks:
raw = (
    spark
    .readStream
    .format('kafka')
    .option('kafka.bootstrap.servers', 'b-.kafka.*********.***********.amazonaws.com:9098')
    .option('subscribe', 'TopicTest')
    .option('startingOffsets', 'earliest')
    .option('kafka.sasl.mechanism', 'AWS_MSK_IAM')
    .option('kafka.security.protocol', 'SASL_SSL')
    .option('kafka.sasl.jaas.config', 'software.amazon.msk.auth.iam.IAMLoginModule required;')
    .option('kafka.sasl.client.callback.handler.class', 'software.amazon.msk.auth.iam.IAMClientCallbackHandler')
    .load()
)
Though I haven't tested this, based on Andrew's comment about theoretically being able to relocate the dependency, I dug a bit into the source of aws-msk-iam-auth. It declares compileOnly('org.apache.kafka:kafka-clients:2.4.1') in its build.gradle, so the uber jar doesn't contain kafka-clients; those classes are instead picked up from whatever Databricks provides (which is shaded).
The project also relocates all of its dependencies under a prefix. So changing compileOnly to implementation and rebuilding the uber jar with gradle clean shadowJar should include and relocate the kafka classes, avoiding the conflict when the jar is uploaded to Databricks.
I faced the same issue, so I forked aws-msk-iam-auth to make it compatible with Databricks. Just add the jar from the following release to your cluster: https://github.com/Iziwork/aws-msk-iam-auth-for-databricks/releases/tag/v1.1.2-databricks

Kafka Connect JDBC sink connector issue

I am getting the error below while running the JDBC sink connector:
[2020-01-08 15:05:39,271] ERROR Plugin class loader for connector: 'io.confluent.connect.jdbc.JdbcSinkConnector' was not found. Returning: org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@6f2cfcc2 (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:165)
[2020-01-08 15:05:39,272] INFO Finished creating connector test-sink (org.apache.kafka.connect.runtime.Worker:273)
[2020-01-08 15:05:39,273] ERROR Plugin class loader for connector: 'io.confluent.connect.jdbc.JdbcSinkConnector' was not found. Returning: org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@6f2cfcc2 (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:165)
[2020-01-08 15:05:39,273] INFO SinkConnectorConfig values:
I have set the plugin path properly, as given in the documentation.
I had the same issue and just solved it. The point is that you should not copy the connector jar into the Kafka libs directory. Instead, set the CLASSPATH when running the command, like this:
env CLASSPATH=./* connect-standalone.sh $KAFKA_HOME/config/connect-standalone.properties config/quickstart-couchbase-source.properties
or set plugin.path in the worker .properties file:
plugin.path=/path_to_the_plugin_jar_file
Hope this helps.

Not able to use Kafka's JdbcSourceConnector to read data from Oracle DB to kafka topic

I am trying to write a standalone Java program using the Kafka Connect JDBC API to stream data from an Oracle table to a Kafka topic.
API used: I'm currently trying to use Kafka Connect, the JdbcSourceConnector class to be precise.
Constraint: use the Confluent Java API, not the CLI or the provided shell scripts.
What I did: created an instance of the JdbcSourceConnector class and called its start(Map) method, passing a configuration map that holds the database connection properties, table whitelist, topic prefix, etc.
After starting the thread, I'm unable to read any data from the "topic-prefix-tablename" topic. I am not sure how to pass the Kafka broker details to JdbcSourceConnector. Calling start() on JdbcSourceConnector starts a thread but doesn't seem to do anything.
Is there a simple Java API tutorial page or example code I can refer to? All the examples I see use the CLI or shell scripts.
Any help is appreciated.
Code:
public static void main(String[] args) {
    Map<String, String> jdbcConnectorConfig = new HashMap<String, String>();
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.CONNECTION_URL_CONFIG, "<DATABASE_URL>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.CONNECTION_USER_CONFIG, "<DATABASE_USER>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.CONNECTION_PASSWORD_CONFIG, "<DATABASE_PASSWORD>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.POLL_INTERVAL_MS_CONFIG, "300000");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.BATCH_MAX_ROWS_CONFIG, "10");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.MODE_CONFIG, "timestamp");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.TABLE_WHITELIST_CONFIG, "<TABLE_NAME>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.TIMESTAMP_COLUMN_NAME_CONFIG, "<TABLE_COLUMN_NAME>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.TOPIC_PREFIX_CONFIG, "test-oracle-jdbc-");
    JdbcSourceConnector jdbcSourceConnector = new JdbcSourceConnector();
    jdbcSourceConnector.start(jdbcConnectorConfig);
}
Assuming you are trying to do it in standalone mode:
In your application run configuration, your main class should be "org.apache.kafka.connect.cli.ConnectStandalone" and you need to pass two property files as program arguments.
Your custom JdbcSourceConnector class should also extend the "org.apache.kafka.connect.source.SourceConnector" class.
Main Class: org.apache.kafka.connect.cli.ConnectStandalone
Program Arguments: .\path-to-config\connect-standalone.conf .\path-to-config\connector.properties
The "connect-standalone.conf" file will contain all the Kafka broker details:
// Example connect-standalone.conf
bootstrap.servers=<comma seperated brokers list here>
group.id=some_local_group_id
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=connect.offset
offset.flush.interval.ms=100
offset.flush.timeout.ms=180000
buffer.memory=67108864
batch.size=128000
producer.acks=1
"connector.properties" file will contain all details required to create and start connector
// Example connector.properties
name=some-local-connector-name
connector.class=your-custom-JdbcSourceConnector
tasks.max=3
topic=output-topic
fetchsize=10000
More info here : https://docs.confluent.io/current/connect/devguide.html#connector-example
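If you would rather drive this from your own main method than an IDE run configuration, a minimal sketch (the config file paths are assumptions) is to delegate to the same CLI entry point:
import org.apache.kafka.connect.cli.ConnectStandalone;

public class RunConnectStandalone {
    public static void main(String[] args) throws Exception {
        // Delegates to the standard standalone entry point, passing the two property files:
        // the worker config (broker details, converters, offsets) and the connector config.
        ConnectStandalone.main(new String[] {
                "path-to-config/connect-standalone.conf",  // assumed path
                "path-to-config/connector.properties"      // assumed path
        });
    }
}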

Kafka org.apache.kafka.connect.converters.ByteArrayConverter doesn't work as values for key.converter and value.converter

I'm trying to build a pipeline where I have to move binary data from a Kafka topic to a Kinesis stream without transforming it, so I'm planning to use ByteArrayConverter in the worker properties. But I'm getting the following error, even though I can see the ByteArrayConverter class here in version 0.11.0. I cannot find the same class under 3.2.x :(
Any help would be much appreciated.
key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
value.converter=io.confluent.connect.replicator.util.ByteArrayConverter
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Invalid value io.confluent.connect.replicator.util.ByteArrayConverter for configuration key.converter: Class io.confluent.connect.replicator.util.ByteArrayConverter could not be found.
at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:672)
at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:418)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:55)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:62)
at org.apache.kafka.connect.runtime.WorkerConfig.<init>(WorkerConfig.java:156)
at org.apache.kafka.connect.runtime.distributed.DistributedConfig.<init>(DistributedConfig.java:198)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:65)
org.apache.kafka.connect.converters.ByteArrayConverter was only added in Apache Kafka 0.11 (which is Confluent 3.3). If you are running a Confluent distribution earlier than 3.3, you will need the Confluent Enterprise distribution (not Confluent Open Source) and should use the io.confluent.connect.replicator.util.ByteArrayConverter converter.
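For reference, on Apache Kafka 0.11+ (Confluent 3.3+) the worker properties would point at the Apache class instead, along these lines:
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter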

Apache Spark: Getting an InstanceAlreadyExistsException when running the Kafka producer

I have a small app in Scala that creates a Kafka producer and runs with Apache Spark.
When I run the command
spark-submit --master local[2] --deploy-mode client <path to the jar file> <app name> <kafka broker> <kafka in queue> <kafka out queue> <interval>
I am getting this WARN:
WARN AppInfoParser: Error registering AppInfo mbean
javax.management.InstanceAlreadyExistsException: kafka.producer:type=app-info,id=
The code is not relevant because I am getting this exception when Scala creates the KafkaProducer: val producer = new KafkaProducer[Object, Object](...)
Does anybody have a solution for this?
Thank you!
When a Kafka Producer is created, it attempts to register an MBean using the client.id as its unique identifier.
There are two possible reasons why you are getting the InstanceAlreadyExistsException warning:
You are attempting to initialize more than one Producer at a time with the same client.id property on the same JVM.
You are not calling close() on an existing Producer before initializing another Producer. Calling close() unregisters the MBean.
If you leave the client.id property blank when initializing the producer, a unique one will be created for you. Giving your producers unique client.id values or allowing them to be auto-generated would resolve this problem.
In the case of Kafka, MBeans can be used for tracking statistics.
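As an illustration (a sketch with assumed broker address, topic, and serializers, not the poster's actual code), giving each producer its own client.id and closing it when finished avoids the MBean collision:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UniqueClientIdExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");            // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "my-app-producer-1");              // unique per producer instance

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("out-topic", "key", "value"));             // assumed topic
        producer.close();  // unregisters the producer's MBean, so the client.id can be reused safely
    }
}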