Apache Kafka cluster using MapR Spark Streaming not working

We are facing an issue while connecting to an Apache Kafka cluster using MapR Spark Streaming (1.6.1). The setup details are as follows:
• MapR cluster with Spark 1.6.1 (3 node cluster)
• Apache Kafka cluster v0.8.1.1 (5 node cluster)
We are using the ‘spark-streaming-kafka’ library from MapR, v1.6.1-mapr-1605. We also tried running in local mode with Apache Spark (not MapR Spark), and that works fine.
Below is the stack trace of the error:
Exception in thread "main" org.apache.kafka.common.config.ConfigException: No bootstrap urls given in bootstrap.servers
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:57)
at org.apache.kafka.clients.consumer.KafkaConsumer.initializeConsumer(KafkaConsumer.java:606)
at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1563)
at org.apache.spark.streaming.kafka.v09.KafkaCluster$$anonfun$getPartitions$1$$anonfun$1.apply(KafkaCluster.scala:54)
at org.apache.spark.streaming.kafka.v09.KafkaCluster$$anonfun$getPartitions$1$$anonfun$1.apply(KafkaCluster.scala:54)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.kafka.v09.KafkaCluster$$anonfun$getPartitions$1.apply(KafkaCluster.scala:53)
at org.apache.spark.streaming.kafka.v09.KafkaCluster$$anonfun$getPartitions$1.apply(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.v09.KafkaCluster.withConsumer(KafkaCluster.scala:164)
at org.apache.spark.streaming.kafka.v09.KafkaCluster.getPartitions(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.v09.KafkaUtils$.getFromOffsets(KafkaUtils.scala:421)
at org.apache.spark.streaming.kafka.v09.KafkaUtils$.createDirectStream(KafkaUtils.scala:292)
at org.apache.spark.streaming.kafka.v09.KafkaUtils$.createDirectStream(KafkaUtils.scala:397)
at org.apache.spark.streaming.kafka.v09.KafkaUtils.createDirectStream(KafkaUtils.scala)
at com.cisco.it.log.KafkaDirectStreamin2.main(KafkaDirectStreamin2.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:742)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
PS: we are passing “metadata.broker.list” when creating the connection.
My understanding is that the Spark Streaming application cannot connect to ZooKeeper and therefore cannot obtain the bootstrap URL. Alternatively, it could be a mismatch between the MapR and Kafka jar versions. We took the jar from MapR, but it still does not work.
We can run the job successfully with Apache Spark, but cannot get it working on MapR.
Any help is appreciated.

Your stack trace references org.apache.spark.streaming.kafka.v09, which probably means the job is using an implementation built on the new consumer API. That API became available with Kafka 0.9 and won't work with Kafka 0.8.1.1. You should probably try one of the libraries from MapR's spark-streaming-kafka_2.10 instead.
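The exception message also shows a config-key mismatch: the trace goes through KafkaConsumer, which reads bootstrap.servers, while the question passes only metadata.broker.list. Below is a minimal pre-flight check, assuming the Kafka parameters are held in a plain Map. The class name KafkaParamCheck, its methods, and the broker host names are hypothetical. Note that adding the key would not by itself make a 0.9 consumer work against 0.8.1.1 brokers; the check only makes the mismatch visible earlier.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Spark, Kafka, or MapR.
final class KafkaParamCheck {
    // Key passed in the question.
    static final String OLD_KEY = "metadata.broker.list";
    // Key the new KafkaConsumer complains about in the stack trace.
    static final String NEW_KEY = "bootstrap.servers";

    /** Returns true if bootstrap.servers is present and non-blank. */
    static boolean hasBootstrapServers(Map<String, String> params) {
        String value = params.get(NEW_KEY);
        return value != null && !value.trim().isEmpty();
    }

    /** Describes which broker-list keys the given parameters contain. */
    static String describe(Map<String, String> params) {
        if (hasBootstrapServers(params)) {
            return "bootstrap.servers is set";
        }
        if (params.containsKey(OLD_KEY)) {
            return "only metadata.broker.list is set; KafkaConsumer will reject this";
        }
        return "no broker list configured";
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(OLD_KEY, "broker1:9092,broker2:9092"); // placeholder hosts
        System.out.println(describe(params));
    }
}
```

Running such a check just before createDirectStream would turn the late ConfigException into an explicit message about which key is missing.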

Related

Flink - kafka connector OAUTHBEARER Class loader issue

I am trying to configure Kafka authentication using the SASL mechanism OAUTHBEARER (Flink 1.9.2, kafka-client 2.2.0).
When using Flink with SASL authentication I get the exception below.
Kafka is shaded into a fat jar with the application.
After remote debugging I found that my callback handler is loaded by one ChildFirstClassLoader, while
org.apache.kafka.common.security.auth.AuthenticateCallbackHandler belongs to another ChildFirstClassLoader, so the following instanceof check (in OAuthBearerSaslClientFactory) fails:
if (!(Objects.requireNonNull(callbackHandler) instanceof AuthenticateCallbackHandler))
throw new IllegalArgumentException(String.format(
"Callback handler must be castable to %s: %s",
AuthenticateCallbackHandler.class.getName(), callbackHandler.getClass().getName()));
I have no idea why these two classes have two different classloaders.
Any ideas or workarounds?
Thanks for the help.
Caused by: org.apache.kafka.common.errors.SaslAuthenticationException: Failed to configure SaslClientAuthenticator
Caused by: java.lang.IllegalArgumentException: Callback handler must be castable to org.apache.kafka.common.security.auth.AuthenticateCallbackHandler: org.apache.kafka.common.security.oauthbearer.internals.OAuthBearerSaslClientCallbackHandler
at org.apache.kafka.common.security.oauthbearer.internals.OAuthBearerSaslClient$OAuthBearerSaslClientFactory.createSaslClient(OAuthBearerSaslClient.java:182)
at javax.security.sasl.Sasl.createSaslClient(Sasl.java:420)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.lambda$createSaslClient$0(SaslClientAuthenticator.java:180)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslClient(SaslClientAuthenticator.java:176)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.<init>(SaslClientAuthenticator.java:168)
at org.apache.kafka.common.network.SaslChannelBuilder.buildClientAuthenticator(SaslChannelBuilder.java:254)
at org.apache.kafka.common.network.SaslChannelBuilder.lambda$buildChannel$1(SaslChannelBuilder.java:202)
at org.apache.kafka.common.network.KafkaChannel.<init>(KafkaChannel.java:140)
at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:210)
at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:334)
at org.apache.kafka.common.network.Selector.registerChannel(Selector.java:325)
at org.apache.kafka.common.network.Selector.connect(Selector.java:257)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:920)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:474)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:292)
at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1803)
at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1771)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaPartitionDiscoverer.getAllPartitionsForTopics(KafkaPartitionDiscoverer.java:77)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:131)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:508)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:552)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:416)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
I'm not sure if you've solved this already, but I wrestled with this exact same scenario for quite a while. What ended up working for me was copying the kafka-clients jar into Flink's lib/ directory.
Sorry, I forgot to post the solution, but yes, I solved it the same way, by copying the kafka-clients jar into Flink's lib/ directory.
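One way to confirm this kind of split before moving jars around is to print which classloader defined each class involved. The sketch below uses only the JDK; ClassLoaderProbe and its method names are hypothetical. In a real job you would pass the callback handler's class and AuthenticateCallbackHandler rather than the JDK classes used in the demo.

```java
// Hypothetical diagnostic helper; uses only JDK classes.
final class ClassLoaderProbe {
    /** Returns the class name followed by its loader chain, ending at the bootstrap loader. */
    static String loaderChain(Class<?> c) {
        StringBuilder sb = new StringBuilder(c.getName()).append(" <- ");
        ClassLoader cl = c.getClassLoader();
        if (cl == null) {
            // Classes defined by the bootstrap loader report a null loader.
            return sb.append("bootstrap").toString();
        }
        while (cl != null) {
            sb.append(cl.getClass().getName());
            cl = cl.getParent();
            sb.append(cl != null ? " -> " : " -> bootstrap");
        }
        return sb.toString();
    }

    /** True when both classes were defined by the same loader instance. */
    static boolean sameLoader(Class<?> a, Class<?> b) {
        return a.getClassLoader() == b.getClassLoader();
    }

    public static void main(String[] args) {
        System.out.println(loaderChain(String.class));
        System.out.println(loaderChain(ClassLoaderProbe.class));
        System.out.println(sameLoader(String.class, Integer.class));
    }
}
```

If the interface class seen by the factory and the interface class seen by the handler report different loader instances, the instanceof check cannot succeed, which is consistent with the fix of placing kafka-clients in Flink's lib/ so that a single parent loader defines it.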

Scala Spark raises a Derby-related error every time I call toDF() or createDataFrame()

I am new to Scala and to Spark's Scala API. I recently tried it on my own computer, running Spark locally by setting SparkSession.builder().master("local[*]"). At first I succeeded in reading a text file using spark.sparkContext.textFile(). After getting the corresponding RDD, I tried to convert it to a Spark DataFrame, but failed again and again.
To be specific, I tried two methods, 1) toDF() and 2) spark.createDataFrame(). Both failed with a similar error, shown below.
2018-10-16 21:14:27 ERROR Schema:125 - Failed initialising database.
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1#199549a5, see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.jdbc.InternalDriver.getNewEmbedConnection(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
I examined the error message. The errors seem to be related to Apache Derby, and a connection to some database failed. I do not know what JDBC actually is. I am somewhat familiar with PySpark and have never been asked to configure a JDBC database there, so why does the Scala API need one? What should I do to avoid this error? Why does a DataFrame in the Scala API need JDBC or a database when an RDD doesn't?
For future googlers:
I googled for several hours and still had no idea how to get rid of this error. But the origin of the problem is clear: my SparkSession enables Hive support, which then needs a database to be specified. To solve the problem, we need to disable Hive support. Since I am running Spark on my own Mac, that is acceptable.
So I downloaded the Spark source and built it myself using the command
./make-distribution.sh --name hadoop-2.6_scala-2.11 --tgz -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests
which omits -Phive -Phive-thriftserver.
I tested the self-built Spark: the metastore_db folder is never created, so far so good.
For details, please refer to this post: Prebuilt Spark 2.1.0 creates metastore_db folder and derby.log when launching spark-shell

Spark Streaming Kafka integration in CDH 5.8.3 in yarn-cluster mode

I'm facing a weird issue running a Spark Streaming job that reads from Kafka. I'm on a CDH 5.8.3 distribution: the Spark version is 1.6.0 and the Kafka version is 0.9.0.
My code is very simple:
val kafkaParams = Map[String, String]("bootstrap.servers" -> brokersList, "auto.offset.reset" -> "smallest")
KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, Set(kafkaTopic))
If I run it in yarn-client mode I get no error, whereas in yarn-cluster mode I get an exception. My launch command is:
spark-submit --master yarn-cluster --files /etc/hbase/conf/hbase-site.xml --num-executors 5 --executor-memory 4G --jars (somejars for HBase interaction) --class mypackage.MyClass myJar.jar
But I'm getting this error:
java.lang.ClassCastException: kafka.cluster.Broker cannot be cast to kafka.cluster.BrokerEndPoint
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6$$anonfun$apply$7.apply(KafkaCluster.scala:90)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:90)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:87)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3.apply(KafkaCluster.scala:87)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3.apply(KafkaCluster.scala:86)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2.apply(KafkaCluster.scala:86)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2.apply(KafkaCluster.scala:85)
at scala.util.Either$RightProjection.flatMap(Either.scala:523)
at org.apache.spark.streaming.kafka.KafkaCluster.findLeaders(KafkaCluster.scala:85)
at org.apache.spark.streaming.kafka.KafkaCluster.getLeaderOffsets(KafkaCluster.scala:179)
at org.apache.spark.streaming.kafka.KafkaCluster.getLeaderOffsets(KafkaCluster.scala:161)
at org.apache.spark.streaming.kafka.KafkaCluster.getEarliestLeaderOffsets(KafkaCluster.scala:155)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$5.apply(KafkaUtils.scala:213)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$5.apply(KafkaUtils.scala:211)
at scala.util.Either$RightProjection.flatMap(Either.scala:523)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at myPackage.Ingestion$.createStreamingContext(Ingestion.scala:120)
at myPackage.Ingestion$$anonfun$1.apply(Ingestion.scala:55)
at myPackage.Ingestion$$anonfun$1.apply(Ingestion.scala:55)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:864)
at myPackage.Ingestion$.main(Ingestion.scala:55)
at myPackage.Ingestion.main(Ingestion.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Searching the internet, I ended up thinking it's a version issue, but I can't figure out why it happens, since the jars are the same in both yarn-client and yarn-cluster mode.
Do you have any idea?
Thank you,
Marco
It looks like Spark Streaming 1.6 is compatible with Kafka 0.8 (see the documentation).
I'd guess you're using Kafka client 0.9, which gets picked up from your jar in client mode, but when you switch to cluster mode the default Kafka client (0.8.2.1) is used.
Am I right? If so, can you try removing the Kafka client dependency from your build and using the default one provided by spark-streaming-kafka? (The 0.8 client should work with 0.9 brokers.)
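To see which jar actually supplies a Kafka class in each deploy mode, one option is to log the class's code source from the driver. The sketch below uses only the JDK; JarLocator is a hypothetical name. The lookup by class name is deliberate so that it compiles without Kafka on the classpath; in the job you would pass a name such as kafka.cluster.Broker.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; uses only JDK classes.
final class JarLocator {
    /** Returns the jar or directory a class was loaded from, if the JVM exposes it. */
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "unknown (bootstrap or generated class)";
        }
        return src.getLocation().toString();
    }

    /** Looks the class up by name first, so no compile-time dependency is needed. */
    static String locationOf(String className) {
        try {
            return locationOf(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(locationOf("kafka.cluster.Broker"));
        System.out.println(locationOf(JarLocator.class));
    }
}
```

Logging this once in yarn-client mode and once in yarn-cluster mode should show whether the two runs really resolve the class from the same jar.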
For those who might encounter the same issue: our problem was caused by the Splice Machine installation. Splice Machine requires its jars to be set in the YARN additional jars configuration (and among them is its own spark-assembly).
Now we're trying to find a way to get everything running without uninstalling Splice Machine from our cluster.
Thanks.

subscribe() method throws an error when accessing a Kafka 0.10 cluster with the Kafka 0.9 client

This is our development environment:
1) Kafka cluster - version 0.10
2) Spark cluster - 1.6, which ships the 0.9 Kafka jars
We are trying to produce and consume in Spark cluster mode (via spark-submit).
When running the spark-submit job, Spark picks the 0.9 version of Kafka. These are our observations:
1) Producer – works fine (the 0.9 and 0.10 producer APIs are compatible)
2) Streaming Kafka consumer using KafkaUtils – works fine (the 0.9 and 0.10 APIs seem compatible here as well)
3) Consumer using the subscribe() API – fails with the following message. Can someone help us understand why it is failing?
16/10/24 02:31:08 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)V
java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)V
at com.common.kafka.init(Kafkafunction.java:150)
at com.client.Client.main(Client.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
16/10/24 02:31:08 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)V)
Updating everything to 0.10 solves the problem. The two versions are definitely not compatible for this call: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)V
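The descriptor in the error, subscribe(Ljava/util/Collection;)V, names an overload that takes a java.util.Collection. As far as I recall, the 0.9 client's subscribe takes a java.util.List instead, so code compiled against 0.10 cannot link against the 0.9 jar that Spark puts on the classpath. Below is a minimal sketch of detecting such a mismatch at startup via reflection. MethodProbe is a hypothetical name, and the demo uses JDK classes as stand-ins so that it runs without Kafka.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical diagnostic helper; uses only JDK classes.
final class MethodProbe {
    /** True if the class has a public method (declared or inherited) with exactly these parameter types. */
    static boolean hasPublicMethod(Class<?> c, String name, Class<?>... paramTypes) {
        try {
            c.getMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for KafkaConsumer: ArrayList has addAll(Collection) but no addAll(List).
        System.out.println(hasPublicMethod(ArrayList.class, "addAll", Collection.class));
        System.out.println(hasPublicMethod(ArrayList.class, "addAll", List.class));
    }
}
```

In the job, the equivalent call would pass the KafkaConsumer class loaded at runtime, the name "subscribe", and Collection.class; a false result would indicate that the older client jar is the one on the classpath.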

Consumer group can't rebalance

I am learning Kafka and am having problems with the example code I found here: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
Each time I run the code it throws this exception:
Exception in thread "main" kafka.common.ConsumerRebalanceFailedException: ttt_NB644-1475151991986-76dfa03f can't rebalance after 4 retries
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:670)
at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:977)
at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:264)
at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:85)
at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:97)
at com.glowbyte.kafka.consumertest.ConsumerGroupExample.run(ConsumerGroupExample.java:44)
at com.glowbyte.kafka.consumertest.ConsumerGroupExample.main(ConsumerGroupExample.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
The Kafka version I'm using is 0.10, the latest one.
There is only one topic, with one broker and two partitions, and I'm trying to run the code with 2 threads.
Meanwhile, another, simpler program runs successfully in the same environment, also with 2 threads. So I'd like to understand what causes the exception described above. Thanks.