Flink Kafka connector OAUTHBEARER class loader issue

I am trying to configure Kafka authentication using the SASL mechanism OAUTHBEARER (Flink 1.9.2, kafka-clients 2.2.0).
When running Flink with SASL authentication I get the exception below.
The Kafka client is shaded into a fat jar together with the application.
After remote debugging I found that my callback handler is loaded by one ChildFirstClassLoader while
org.apache.kafka.common.security.auth.AuthenticateCallbackHandler belongs to another ChildFirstClassLoader, so the following instanceof test (in OAuthBearerSaslClientFactory) fails:
if (!(Objects.requireNonNull(callbackHandler) instanceof AuthenticateCallbackHandler))
    throw new IllegalArgumentException(String.format(
            "Callback handler must be castable to %s: %s",
            AuthenticateCallbackHandler.class.getName(), callbackHandler.getClass().getName()));
I have no idea why these two classes have two different class loaders.
Any idea? Any workaround?
Thanks for the help.
Caused by: org.apache.kafka.common.errors.SaslAuthenticationException: Failed to configure SaslClientAuthenticator
Caused by: java.lang.IllegalArgumentException: Callback handler must be castable to org.apache.kafka.common.security.auth.AuthenticateCallbackHandler: org.apache.kafka.common.security.oauthbearer.internals.OAuthBearerSaslClientCallbackHandler
at org.apache.kafka.common.security.oauthbearer.internals.OAuthBearerSaslClient$OAuthBearerSaslClientFactory.createSaslClient(OAuthBearerSaslClient.java:182)
at javax.security.sasl.Sasl.createSaslClient(Sasl.java:420)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.lambda$createSaslClient$0(SaslClientAuthenticator.java:180)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslClient(SaslClientAuthenticator.java:176)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.<init>(SaslClientAuthenticator.java:168)
at org.apache.kafka.common.network.SaslChannelBuilder.buildClientAuthenticator(SaslChannelBuilder.java:254)
at org.apache.kafka.common.network.SaslChannelBuilder.lambda$buildChannel$1(SaslChannelBuilder.java:202)
at org.apache.kafka.common.network.KafkaChannel.<init>(KafkaChannel.java:140)
at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:210)
at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:334)
at org.apache.kafka.common.network.Selector.registerChannel(Selector.java:325)
at org.apache.kafka.common.network.Selector.connect(Selector.java:257)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:920)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:474)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:292)
at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1803)
at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1771)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaPartitionDiscoverer.getAllPartitionsForTopics(KafkaPartitionDiscoverer.java:77)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:131)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:508)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:552)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:416)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
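To illustrate the failure mode (a self-contained sketch, not Flink code): when the same class bytes are loaded by two class loaders that do not delegate to a common parent, the JVM treats the results as two distinct types, so an instanceof check across them is false even though the class names match.

import java.net.URL;
import java.net.URLClassLoader;

public class TwoLoadersDemo {
    public static void main(String[] args) throws Exception {
        // Two loaders over the same classpath, each with a null parent
        // (no delegation), define two distinct Class objects.
        // Assumes a file-based classpath so getCodeSource() is non-null.
        URL[] cp = { TwoLoadersDemo.class.getProtectionDomain().getCodeSource().getLocation() };
        ClassLoader a = new URLClassLoader(cp, null);
        ClassLoader b = new URLClassLoader(cp, null);

        Class<?> fromA = Class.forName("TwoLoadersDemo", true, a);
        Class<?> fromB = Class.forName("TwoLoadersDemo", true, b);

        System.out.println(fromA == fromB); // false
        Object instanceFromB = fromB.getDeclaredConstructor().newInstance();
        System.out.println(fromA.isInstance(instanceFromB)); // false: same name, different type
    }
}

In the Kafka case, the shaded AuthenticateCallbackHandler and the one visible to the SASL client factory play the roles of fromA and fromB.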

I'm not sure if you've solved this already, but I wrestled with this exact same scenario for quite a while. What ended up working for me was copying the kafka-clients jar into Flink's lib/ directory.

Sorry, I forgot to post the solution, but yes, I solved it the same way, by copying the kafka-clients jar into Flink's lib/ directory.

Related

Exception on startup: NoSuchMethodException: org.springframework.kafka.core.KafkaTemplate.<init>()

I tried to update the Spring Kafka version but got an exception.
Spring Kafka version 2.3.4.RELEASE
Spring Boot version 2.2.2.RELEASE
Kafka-clients version 2.3.1
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.kafka.core.KafkaTemplate]: No default constructor found; nested exception is java.lang.NoSuchMethodException: org.springframework.kafka.core.KafkaTemplate.<init>()
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:83)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateBean(AbstractAutowireCapableBeanFactory.java:1312)
... 101 more
Caused by: java.lang.NoSuchMethodException: org.springframework.kafka.core.KafkaTemplate.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:78)
... 102 more
You need to show your code and configuration, and the full stack trace (you should never edit/truncate stack traces here). The error seems quite clear:
Caused by: java.lang.NoSuchMethodException: org.springframework.kafka.core.KafkaTemplate.<init>()
There is no no-arg constructor; it needs a producer factory. We need to see the code and configuration to figure out what is trying to create a template with no producer factory.
Normally, Spring Boot will automatically configure a KafkaTemplate for you.
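For reference, a minimal sketch of wiring the template by hand when not relying on Boot's auto-configuration (the broker address and String payloads are placeholders):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> cfg = new HashMap<>();
        cfg.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        cfg.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        cfg.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(cfg);
    }

    // KafkaTemplate has no no-arg constructor; it must be handed a ProducerFactory.
    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}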
Thank you! The problem was in the tests: I declared the wrong generic type for the KafkaTemplate. I used
KafkaTemplate<String, Bytes> instead of the KafkaTemplate<String, Message> that I use in the application code. So I suppose the test Spring context could not find a matching bean to autowire.
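In code, the mismatch looked roughly like this (Message stands in for my application's payload type):

import org.apache.kafka.common.utils.Bytes;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;

class ProducerTest {
    static class Message {} // placeholder for the real payload type

    @Autowired
    private KafkaTemplate<String, Bytes> wrongTemplate;  // fails: no bean with these generics

    @Autowired
    private KafkaTemplate<String, Message> appTemplate;  // resolves against the app's bean
}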

"Failed to construct kafka consumer" using Apache Drill

I am using the Apache Drill (1.14) JDBC driver in my application, which consumes data from Kafka. The application works just fine for some time, but after a few iterations it fails with the "Too many open files" issue below. I made sure there are no file handle leaks in my code, but I am still not sure why this is happening.
It looks like the issue originates from within the Apache Drill libraries when constructing the Kafka consumer. Can anyone please help me get this problem fixed?
The problem goes away when I restart my Apache drillbit, but very soon it happens again. I checked the file descriptor count on my Unix machine using ulimit -a | wc -l and lsof -a -p <PID> | wc -l before and after the Drill process restart, and the Drill process is clearly consuming a lot of file descriptors. I tried increasing the file descriptor limit on the system, but still no luck.
I followed the Apache Drill storage plugin documentation to configure the Kafka plugin into Apache Drill: https://drill.apache.org/docs/kafka-storage-plugin/
Any help on this issue is highly appreciated. Thanks.
JDBC URL: jdbc:drill:drillbit=localhost:31010;schema=kafka
NOTE: I am pushing down the filters in my query
SELECT * FROM myKafkaTopic WHERE kafkaMsgTimestamp > 1560210931626
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Failed to fetch start/end offsets of the topic myKafkaTopic
Failed to construct kafka consumer
[Error Id: 73f896a7-09d4-425b-8cd5-f269c3a6e69a ]
at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.kafka.KafkaGroupScan.init(KafkaGroupScan.java:198) [drill-storage-kafka-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.kafka.KafkaGroupScan.<init>(KafkaGroupScan.java:98) [drill-storage-kafka-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.kafka.KafkaStoragePlugin.getPhysicalScan(KafkaStoragePlugin.java:83) [drill-storage-kafka-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:111) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:99) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:89) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:69) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.logical.DrillScanRel.<init>(DrillScanRel.java:62) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.logical.DrillScanRule.onMatch(DrillScanRule.java:38) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:212) [calcite-core-1.16.0-drill-r6.jar:1.16.0-drill-r6]
at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:652) [calcite-core-1.16.0-drill-r6.jar:1.16.0-drill-r6]
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:368) [calcite-core-1.16.0-drill-r6.jar:1.16.0-drill-r6]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:429) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:369) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:255) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:318) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:180) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:145) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:83) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:567) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:266) [drill-java-exec-1.14.0.jar:1.14.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:765) ~[kafka-clients-0.11.0.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:633) ~[kafka-clients-0.11.0.1.jar:na]
at org.apache.drill.exec.store.kafka.KafkaGroupScan.init(KafkaGroupScan.java:168) [drill-storage-kafka-1.14.0.jar:1.14.0]
... 23 common frames omitted
Caused by: org.apache.kafka.common.KafkaException: java.io.IOException: Too many open files
at org.apache.kafka.common.network.Selector.<init>(Selector.java:129) ~[kafka-clients-0.11.0.1.jar:na]
at org.apache.kafka.common.network.Selector.<init>(Selector.java:156) ~[kafka-clients-0.11.0.1.jar:na]
at org.apache.kafka.common.network.Selector.<init>(Selector.java:160) ~[kafka-clients-0.11.0.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:701) ~[kafka-clients-0.11.0.1.jar:na]
... 25 common frames omitted
Caused by: java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method) ~[na:1.8.0_181]
at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:130) ~[na:1.8.0_181]
at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:69) ~[na:1.8.0_181]
at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36) ~[na:1.8.0_181]
at java.nio.channels.Selector.open(Selector.java:227) ~[na:1.8.0_181]
at org.apache.kafka.common.network.Selector.<init>(Selector.java:127) ~[kafka-clients-0.11.0.1.jar:na]
This is because the Kafka storage plugin never explicitly closes its connections to Kafka.
It relies on the idle timeout on the broker side (connections.max.idle.ms), which defaults to 10 minutes.
I saw exactly the same issue when running queries for a few minutes, and we reduced the default timeout to verify that this was the case.
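By contrast, a client that manages its own consumer can release the descriptors deterministically. A sketch of the pattern (broker address and topic are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MetadataProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // try-with-resources closes the consumer, freeing its selector/epoll
        // file descriptors immediately instead of leaving them open until the
        // broker-side connections.max.idle.ms timeout expires.
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            System.out.println(consumer.partitionsFor("myKafkaTopic"));
        }
    }
}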

Interrupted while joining ioThread / Error during disposal of stream operator in flink application

I have a Flink-based streaming application which uses Apache Kafka sources and sinks. For some days now I have been getting exceptions at random times during development, and I have no clue where they're coming from.
I am running the app within IntelliJ using the mainRunner class, and I am feeding it messages via Kafka. Sometimes the first message triggers the errors; sometimes it happens only after a few messages.
This is how it looks:
16:31:01.935 ERROR o.a.k.c.producer.KafkaProducer - Interrupted while joining ioThread
java.lang.InterruptedException: null
at java.lang.Object.wait(Native Method) ~[na:1.8.0_51]
at java.lang.Thread.join(Thread.java:1253) [na:1.8.0_51]
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1031) [kafka-clients-0.11.0.2.jar:na]
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1010) [kafka-clients-0.11.0.2.jar:na]
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:989) [kafka-clients-0.11.0.2.jar:na]
at org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaProducer.close(FlinkKafkaProducer.java:168) [flink-connector-kafka-0.11_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.close(FlinkKafkaProducer011.java:662) [flink-connector-kafka-0.11_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43) [flink-core-1.6.1.jar:1.6.1]
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117) [flink-streaming-java_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:477) [flink-streaming-java_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:378) [flink-streaming-java_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) [flink-runtime_2.11-1.6.1.jar:1.6.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
16:31:01.936 ERROR o.a.f.s.runtime.tasks.StreamTask - Error during disposal of stream operator.
org.apache.kafka.common.KafkaException: Failed to close kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1062) ~[kafka-clients-0.11.0.2.jar:na]
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1010) ~[kafka-clients-0.11.0.2.jar:na]
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:989) ~[kafka-clients-0.11.0.2.jar:na]
at org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaProducer.close(FlinkKafkaProducer.java:168) ~[flink-connector-kafka-0.11_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.close(FlinkKafkaProducer011.java:662) ~[flink-connector-kafka-0.11_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43) ~[flink-core-1.6.1.jar:1.6.1]
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117) ~[flink-streaming-java_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:477) [flink-streaming-java_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:378) [flink-streaming-java_2.11-1.6.1.jar:1.6.1]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) [flink-runtime_2.11-1.6.1.jar:1.6.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
Caused by: java.lang.InterruptedException: null
at java.lang.Object.wait(Native Method) ~[na:1.8.0_51]
at java.lang.Thread.join(Thread.java:1253) [na:1.8.0_51]
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1031) ~[kafka-clients-0.11.0.2.jar:na]
... 10 common frames omitted
16:31:01.938 ERROR o.a.k.c.producer.KafkaProducer - Interrupted while joining ioThread
I get around 10-20 of those, and then Flink seems to recover the app: it becomes usable again and I can successfully process messages.
What could possibly cause this? Or how can I analyze this further to track it down?
I am using Flink version 1.6.1 with Scala 2.11 on a Mac, with IntelliJ version 2018.3.2.
I was able to resolve it. It turned out that one of my stream operators (a map function) was throwing an exception because of an invalid array index.
This was not visible in the logs; only when I tore the application down into smaller pieces, step by step, did this exception finally show up in the logs. After fixing the obvious bug in the array access, the above-mentioned exceptions (java.lang.InterruptedException and org.apache.kafka.common.KafkaException) went away.
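For anyone hitting the same trap, a hypothetical sketch of the kind of bug that was masked: an unchecked array access inside a map function fails the task, and the resulting task cancellation interrupts the Kafka producer while Flink disposes the operators, producing the noise above.

import org.apache.flink.api.common.functions.MapFunction;

// Hypothetical operator: throws ArrayIndexOutOfBoundsException for records
// with fewer than three fields, the actual root cause, hidden behind the
// InterruptedException/KafkaException noise during operator disposal.
public class ThirdFieldExtractor implements MapFunction<String, String> {
    @Override
    public String map(String value) {
        String[] parts = value.split(",");
        return parts[2]; // unchecked access
    }
}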

Scala Spark raises an error related to Derby every time when doing toDF() or createDataFrame

I am new to Scala and the Scala API of Spark, and I recently tried Spark's Scala API on my own computer, which means I run Spark locally by setting SparkSession.builder().master("local[*]"). At first I succeeded in reading a text file using spark.sparkContext.textFile(). After getting the corresponding RDD, I tried to convert the RDD to a Spark DataFrame, but failed again and again.
To be specific, I tried two methods: 1) toDF() and 2) spark.createDataFrame(). Both failed, and both gave me a similar error, shown below.
2018-10-16 21:14:27 ERROR Schema:125 - Failed initialising database.
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@199549a5, see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.jdbc.InternalDriver.getNewEmbedConnection(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
I examined the error message; it seems the errors are related to Apache Derby, and some connection to some database is failing. I do not actually know what JDBC is. I am somewhat familiar with PySpark and have never been asked to configure any JDBC database, so why does the Scala API of Spark need it? What should I do to avoid this error? And why does a Spark DataFrame need JDBC or any database when a Spark RDD doesn't?
For future Googlers:
I googled for several hours and still had no idea how to get rid of this error, but the origin of the problem is very clear: my SparkSession enables Hive support, which then needs a database to back the metastore. To solve this, we need to disable Hive support; since I am running Spark on my own Mac, it is fine to do so.
So I downloaded the Spark source and built it myself using the command
./make-distribution.sh --name hadoop-2.6_scala-2.11 --tgz -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests
which omits -Phive and -Phive-thriftserver.
I tested the self-built Spark; the metastore_db folder is never created, and so far so good.
For details, please refer to this post: Prebuilt Spark 2.1.0 creates metastore_db folder and derby.log when launching spark-shell
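If you'd rather not rebuild Spark, another option (a sketch using the Java API for consistency with the rest of this page; the Scala builder calls are identical) is simply to build the session without Hive support, so that no Derby-backed metastore is required:

import org.apache.spark.sql.SparkSession;

public class LocalSparkNoHive {
    public static void main(String[] args) {
        // No enableHiveSupport(); pinning the catalog to "in-memory" makes
        // the intent explicit, so toDF()/createDataFrame() need no metastore_db.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("no-hive-demo")
                .config("spark.sql.catalogImplementation", "in-memory")
                .getOrCreate();

        spark.range(5).toDF("id").show();
        spark.stop();
    }
}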

Apache kafka cluster using MapR Spark streaming not working

We are facing an issue while connecting to an Apache Kafka cluster using MapR Spark Streaming (1.6.1). The setup details are as follows:
• MapR cluster with Spark 1.6.1 (3 node cluster)
• Apache Kafka cluster v0.8.1.1 (5 node cluster)
We are using the ‘spark-streaming-kafka’ library from MapR, v1.6.1-mapr-1605. We also tried running in local mode with Apache Spark (not MapR Spark), and that works very well.
Below is the stack trace of the error:
Exception in thread "main" org.apache.kafka.common.config.ConfigException: No bootstrap urls given in bootstrap.servers
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:57)
at org.apache.kafka.clients.consumer.KafkaConsumer.initializeConsumer(KafkaConsumer.java:606)
at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1563)
at org.apache.spark.streaming.kafka.v09.KafkaCluster$$anonfun$getPartitions$1$$anonfun$1.apply(KafkaCluster.scala:54)
at org.apache.spark.streaming.kafka.v09.KafkaCluster$$anonfun$getPartitions$1$$anonfun$1.apply(KafkaCluster.scala:54)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.streaming.kafka.v09.KafkaCluster$$anonfun$getPartitions$1.apply(KafkaCluster.scala:53)
at org.apache.spark.streaming.kafka.v09.KafkaCluster$$anonfun$getPartitions$1.apply(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.v09.KafkaCluster.withConsumer(KafkaCluster.scala:164)
at org.apache.spark.streaming.kafka.v09.KafkaCluster.getPartitions(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.v09.KafkaUtils$.getFromOffsets(KafkaUtils.scala:421)
at org.apache.spark.streaming.kafka.v09.KafkaUtils$.createDirectStream(KafkaUtils.scala:292)
at org.apache.spark.streaming.kafka.v09.KafkaUtils$.createDirectStream(KafkaUtils.scala:397)
at org.apache.spark.streaming.kafka.v09.KafkaUtils.createDirectStream(KafkaUtils.scala)
at com.cisco.it.log.KafkaDirectStreamin2.main(KafkaDirectStreamin2.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:742)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
PS: we are passing “metadata.broker.list” when creating the connection.
My understanding is that the Spark Streaming application is not able to connect to ZooKeeper and so cannot get the bootstrap URL. Or it could be an issue of not having the correct versions of the MapR and Kafka jars; we took the jars from the MapR side, but it is still not working.
We are able to test with Apache Spark successfully, but we cannot get it working on MapR.
Any help is appreciated.
In your stack trace there are references to org.apache.spark.streaming.kafka.v09, which probably means it is an implementation using the new consumer API that became available with Kafka 0.9 and won't work with Kafka 0.8.1.1. You should probably try one of the libraries from MapR's spark-streaming-kafka_2.10 instead.
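That would also explain the error given the PS above: the old 0.8 direct stream reads metadata.broker.list, while the v09 code path in the trace looks for bootstrap.servers. A sketch of the difference (broker addresses are placeholders):

import java.util.HashMap;
import java.util.Map;

public class KafkaParamsSketch {
    public static void main(String[] args) {
        Map<String, String> kafkaParams = new HashMap<>();
        // Read by the old 0.8 consumer path:
        kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092");
        // Read by the 0.9+ consumer path (the kafka.v09 classes in the trace);
        // without it, ClientUtils.parseAndValidateAddresses throws
        // "No bootstrap urls given in bootstrap.servers".
        kafkaParams.put("bootstrap.servers", "broker1:9092,broker2:9092");
        System.out.println(kafkaParams);
    }
}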