ClassNotFoundException while creating Spark Session - scala

I am trying to create a SparkSession in a unit test using the code below:
val spark = SparkSession.builder.appName("local").master("local").getOrCreate()
While running the tests, I get the following error:
java.lang.ClassNotFoundException: org.apache.hadoop.fs.GlobalStorageStatistics$StorageStatisticsProvider
I have tried to add the dependency but to no avail. Can someone point out the cause and the solution to this issue?

There are two likely causes.
1. You may have incompatible versions of the Spark and Hadoop stacks. For example, HBase 0.9 is incompatible with Spark 2.0. A mismatch like this results in class-not-found or method-not-found exceptions.
2. You may have multiple versions of the same library on the classpath because of transitive dependencies (dependency hell). Inspect the dependency tree (for example with mvn dependency:tree) to make sure this is not the case.
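To tell the two causes apart, it can help to ask the JVM where a class was actually loaded from. The following is a minimal sketch, not part of the original answer: the object name ClasspathCheck and the helper locationOf are hypothetical, and the two Hadoop class names are only examples. As far as I know, GlobalStorageStatistics first shipped with hadoop-common 2.8, so if FileSystem resolves to an older hadoop-common jar while the second lookup fails, a version mix is the likely culprit.

```scala
object ClasspathCheck {
  /** Where a class was loaded from; None if it is not on the classpath. */
  def locationOf(className: String): Option[String] =
    try {
      // initialize = false: only locate the class, do not run its static initializers
      val cls = Class.forName(className, false, getClass.getClassLoader)
      val source = Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
      // JDK classes have no code source, so report a placeholder instead
      Some(source.map(_.toString).getOrElse("<JDK / bootstrap class loader>"))
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => None
    }

  def main(args: Array[String]): Unit = {
    // Example class names only; replace them with the class from your own stack trace
    val names = Seq(
      "org.apache.hadoop.fs.FileSystem",
      "org.apache.hadoop.fs.GlobalStorageStatistics$StorageStatisticsProvider"
    )
    names.foreach { n =>
      println(s"$n -> ${locationOf(n).getOrElse("NOT FOUND")}")
    }
  }
}
```

Run this from the same test classpath that produces the exception; running it from a different module or from a plain REPL may show a different set of jars.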

Related

spark, kafka integration issue: object kafka is not a member of org.apache.spark.streaming

I am receiving an error while building my Spark application (Scala) in the IntelliJ IDE.
It is a simple application which uses a Kafka stream for further processing. I have added all the jars and the IDE does not show any unresolved imports or code statements.
However, when I try to build the artifact, I get two errors stating that
Error:(13, 35) object kafka is not a member of package org.apache.spark.streaming
import org.apache.spark.streaming.kafka.KafkaUtils
Error:(35, 60) not found: value KafkaUtils
val messages: ReceiverInputDStream[(String, String)] = KafkaUtils.createStream(streamingContext,zkQuorum,"myGroup",topics)
I have seen similar questions, but most people report this issue when submitting to Spark. I am one step before that: I am merely building the jar file that will ultimately be submitted to Spark. On top of that, I am using the IntelliJ IDE and am a bit new to Spark and Scala, so I am lost here.
Below is a snapshot of the IntelliJ error:
[Image: IntelliJ Error]
Thanks
Omer
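The Kafka integration is not part of the core spark-streaming artifact; it ships as a separate module. You need to add spark-streaming-kafka-K.version-Sc.version.jar (the build matching your Kafka and Scala versions) as a dependency in your pom.xml, and also place it in your Spark lib directory.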
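For readers who build with sbt instead of Maven, the same dependency could be declared roughly as follows. This is only a sketch: the question does not state its Spark or Scala versions, so the values below (Spark 1.6.2 on Scala 2.10) are assumptions to be replaced with your own. The %% operator appends the Scala binary version to the artifact name, which keeps the Scala suffix consistent across all Spark modules.

```scala
// build.sbt (sketch; the version numbers are assumptions, not taken from the question)
scalaVersion := "2.10.6"

val sparkVersion = "1.6.2"

libraryDependencies ++= Seq(
  // Provided by the cluster at runtime, needed only for compilation
  "org.apache.spark" %% "spark-streaming"       % sparkVersion % "provided",
  // The Kafka integration lives in its own module and must be added explicitly
  "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion
)
```

With Spark 2.x the module is named after the Kafka client generation as well (for example spark-streaming-kafka-0-8), so the artifact name has to be adjusted accordingly.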

Scala Runtime errors calling program on Spark Job Server

I used Spark 1.6.2 and Scala 2.11.8 to compile my project. The generated uber jar with dependencies is placed inside Spark Job Server, which seems to use Scala 2.10.4 (SCALA_VERSION=2.10.4 is specified in the .sh file).
There is no problem starting the server or uploading the context and app jars, but at runtime the following error occurs:
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror
The question "Why do Scala 2.11 and Spark with scallop lead to "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror"?" talks about using Scala 2.10 to compile the sources. Is that true?
Any suggestions please...
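Use Scala 2.10.4 to compile your project. Otherwise you need to build Spark with Scala 2.11 too. Scala 2.10 and 2.11 are not binary compatible, so code compiled against one cannot safely run on the other.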
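A quick way to confirm such a mismatch is to print the Scala version the JVM is actually running and compare its binary version (the first two components) with the one the jar was compiled for. The sketch below is illustrative only; the object ScalaBinaryVersion and its helpers are hypothetical names, and the version strings in main are the ones mentioned in the question.

```scala
object ScalaBinaryVersion {
  /** "2.11.8" -> "2.11": the part of the version that determines binary compatibility. */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  /** True when two Scala versions belong to the same binary-compatible series. */
  def compatible(compiledWith: String, runtime: String): Boolean =
    binaryVersion(compiledWith) == binaryVersion(runtime)

  def main(args: Array[String]): Unit = {
    // Version of the scala-library jar that is on the classpath right now
    val runtime = scala.util.Properties.versionNumberString
    println(s"Running on Scala $runtime")
    println(s"Jar built with 2.11.8 compatible with this runtime: ${compatible("2.11.8", runtime)}")
  }
}
```

Logging scala.util.Properties.versionNumberString at the start of the job shows which Scala library the server really loads, independent of what the build file claims.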

Spark Kafka - Issue while running from Eclipse IDE

I am experimenting with Spark Kafka integration, and I want to test the code from my Eclipse IDE. However, I got the error below:
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:28)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:345)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:403)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:532)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at com.capiot.platform.spark.SparkTelemetryReceiverFromKafkaStream.executeStreamingCalculations(SparkTelemetryReceiverFromKafkaStream.java:248)
at com.capiot.platform.spark.SparkTelemetryReceiverFromKafkaStream.main(SparkTelemetryReceiverFromKafkaStream.java:84)
UPDATE:
The versions that I am using are:
scala - 2.11
spark-streaming-kafka- 1.4.1
spark - 1.4.1
Can anyone help resolve the issue? Thanks in advance.
You have the wrong version of Scala. You need 2.10.x per
https://spark.apache.org/docs/1.4.1/
"For the Scala API, Spark 1.4.1 uses Scala 2.10."
Might be late to help the OP, but when using Kafka streaming with Spark, you need to make sure that you use the right jar file.
For example, in my case I have Scala 2.11 (the default Scala version for Spark 2.0, which I'm using), and the Kafka integration has to match Spark version 2.0.0, so I have to use the artifact spark-streaming-kafka-0-8-assembly_2.11-2.0.0-preview.jar
Notice that both my Scala version and the Spark version appear in the artifact name: 2.11-2.0.0
Hope this helps (someone)

NoSuchMethod exception in Flink when using dataset with custom object array

I have a problem with Flink:
java.lang.NoSuchMethodError: org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(Lorg/apache/flink/api/common/typeinfo/TypeInformation;)Lorg/apache/flink/api/java/typeutils/ObjectArrayTypeInfo;
at LowLevel.FlinkImplementation.FlinkImplementation$$anon$6.<init>(FlinkImplementation.scala:28)
at LowLevel.FlinkImplementation.FlinkImplementation.<init>(FlinkImplementation.scala:28)
at IRLogic.GmqlServer.<init>(GmqlServer.scala:15)
at it.polimi.App$.main(App.scala:20)
at it.polimi.App.main(App.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
The line with the problem is this one:
implicit val regionTypeInformation =
api.scala.createTypeInformation[FlinkDataTypes.FlinkRegionType]
FlinkRegionType contains an Array of custom objects.
I developed the app with the Maven plugin in the IDE and everything works fine there, but when I run it on the Flink version I downloaded from the website I get the error above.
I am using Flink 0.9.
I thought some library might be missing, but I am using Maven to handle all dependencies. Moreover, looking through the code of ObjectArrayTypeInfo.java, it does not seem to be the problem.
A NoSuchMethodError commonly indicates a version mismatch between the libraries a Flink program was compiled with and the system the program is executed on. This is especially likely if the same code works in an IDE setup, where the compile-time and execution libraries are the same.
In that case, check that the version of the Flink dependencies (for example in the Maven POM file) matches the version of the Flink distribution you run the program on.
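One way to verify this on the target system is to probe, via reflection, whether the method named in the stack trace exists in the library version that is actually loaded. This is a generic sketch rather than anything Flink-specific: MethodCheck and hasMethod are hypothetical names, and the example in main uses a JDK class so that it runs without any extra dependencies.

```scala
object MethodCheck {
  /** True if the class is on the classpath and has a public method with this name and parameter list. */
  def hasMethod(className: String, method: String, paramTypes: Class[_]*): Boolean =
    try {
      Class.forName(className).getMethod(method, paramTypes: _*)
      true
    } catch {
      case _: ClassNotFoundException | _: NoSuchMethodException => false
    }

  def main(args: Array[String]): Unit = {
    // JDK example; substitute the class, method and parameter types from your own NoSuchMethodError
    println(hasMethod("java.lang.String", "substring", classOf[Int]))
    println(hasMethod("java.lang.String", "noSuchMethod"))
  }
}
```

Note that getMethod matches on name and parameter types only, while the JVM also compares the return type when linking. A changed return type between versions can therefore still cause a NoSuchMethodError even when this check returns true.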

NoSuchMethodError while running Spark Streaming job on HDP 2.2

I am trying to run a simple streaming job on the HDP 2.2 Sandbox but I am hitting a java.lang.NoSuchMethodError. I am able to run the SparkPi example on this machine without any issue.
These are the versions I am using:
<kafka.version>0.8.2.0</kafka.version>
<twitter4j.version>4.0.2</twitter4j.version>
<spark-version>1.2.1</spark-version>
<scala.version>2.11</scala.version>
Code Snippet -
val sparkConf = new SparkConf().setAppName("TweetSenseKafkaConsumer").setMaster("yarn-cluster");
val ssc = new StreamingContext(sparkConf, Durations.seconds(5));
Error text from Node Manager UI-
Exception in thread "Driver" scala.MatchError: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less; (of class java.lang.NoSuchMethodError)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:432)
15/02/12 15:07:23 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 1
15/02/12 15:07:33 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 2
Job is accepted in YARN but it never goes into RUNNING status.
I suspect it is due to Scala version differences. I tried changing the POM configuration but still could not fix the error.
Thank you for your help in advance.
Earlier I had specified a dependency on spark-streaming_2.10 (Spark compiled with Scala 2.10), but I had not specified a dependency on the Scala compiler itself. It seems Maven automatically pulled in 2.11 (maybe due to some other dependency). While trying to debug this issue, I added a dependency on Scala compiler 2.11. After Paul's comment I changed that Scala dependency version to 2.10, and it is now working.