spark-submit on standalone cluster complain about scala-2.10 jars not exist - scala

I'm new to Spark and downloaded a pre-compiled Spark binaries from Apache (Spark-2.1.0-bin-hadoop2.7)
When submitting my scala (2.11.8) uber jar the cluster throw and error:
java.lang.IllegalStateException: Library directory '/root/spark/assembly/target/scala-2.10/jars' does not exist; make sure Spark is built
I'm not running Scala 2.10 and Spark isn't compiled (as much as I know) with Scala 2.10
Could it be that one of my dependencies is based on Scala 2.10 ?
Any suggestions what can be wrong ?

Note sure what is wrong with the pre-built spark-2.1.0 but I've just downloaded spark 2.2.0 and it is working great.

Try setting SPARK_HOME="location to your spark installation" on your system or IDE

Related

Apache Spark 3 and backward compatibility?

We have several Spark applications running on production developed using Spark 2.4.1 (Scala 2.11.12).
For couple of our new Spark jobs,we are considering utilizing features of DeltaLake.For this we need to use Spark 2.4.2 (or higher).
My questions are:
If we upgrade our Spark cluster to 3.0.0, can our 2.4.1 applications still run on the new cluster (without recompile)?
If we need to recompile our previous Spark jobs with Spark 3, are they source compatible or do they need any migration?
There are some breaking changes in Spark 3.0.0, including source incompatible change and binary incompatible changes. See https://spark.apache.org/releases/spark-release-3-0-0.html. And there are also some source and binary incompatible changes between Scala 2.11 and 2.12, so you may also need to update codes because of Scala version change.
However, only do Delta Lake 0.7.0 and above require Spark 3.0.0. If upgrading to Spark 3.0.0 requires a lot of work, you can use Delta Lake 0.6.x or below. You just need to upgrade Spark to 2.4.2 or above in 2.4.x line. They should be source and binary compatible.
You can cross compile projects Spark 2.4 projects with Scala 2.11 and Scala 2.12. The Scala 2.12 JARs should generally work for Spark 3 applications. There are edge cases when using a Spark 2.4/Scala 2.12 JAR won't work properly on a Spark 3 cluster.
It's best to make a clean migration to Spark 3/Scala 2.12 and cut the cord with Spark 2/Scala 2.11.
Upgrading can be a big pain, especially if your project has a lot of dependencies. For example, suppose your project depends on spark-google-spreadsheets, a project that's not built with Scala 2.12. With this dependency, you won't be able to easily upgrade your project to Scala 2.12. You'll need to either compile spark-google-spreadsheets with Scala 2.12 yourself or drop the dependency. See here for more details on how to migrate to Spark 3.

Scala dependency in Spark/Pyspark

I want to install Spark in a machine and I don't know if I need Scala as a dependency.
I have installed Java (1.8) as the documentation saids. But I don't know If I need any other dependencies
Scala is included in Spark's distribution, so for using Scala in Spark REPL you don't need to install Scala. Here are scala jars included.
If you are setting up IDE for Spark project then you will need Scala as dependency.

Why when Maven Build Works good but adding Spark Jar as external Jars gives a compile error “object Apache is not a member of package org”

On Eclipse, while setting up spark , even after adding external jars to build path to spark-2.4.3-bin-hadoop2.7/jars/<_all.jar>,
Complier complains about '“object apache is not a member of package org''
Yes, Building dependencies via Maven or SBT would fix it. A question is asked
scalac compile yields "object apache is not a member of package org"
But Question over here is , WHY the traditional way is failing like this ?
If we reffer here , Scala/Spark version compatibility We could see a similar issue. The problem is Scala is NOT backward compatible. Hence each Spark module is complied against specific Scala library. But when we run from eclipse, the eclipse Scala environment may not be compatible that particular scala version of which we have the Spark libraries set up.

Scala Runtime errors calling program on Spark Job Server

I used spark 1.6.2 and Scala 11.8 to compile my project. The generated uber jar with dependencies is placed inside Spark Job Server (that seems to use Scala 10.4 (SCALA_VERSION=2.10.4 specified in .sh file)
There is no problem in starting the server, uploading context/ app jars. But at runtime, the following errors occur
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror
Why do Scala 2.11 and Spark with scallop lead to "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror"? talks about using Scala 10 to compile the sources. Is it true?
Any suggestions please...
Use scala 2.10.4 to compile your project. Otherwise you need to compile spark with 11 too.

Spark Kafka - Issue while running from Eclipse IDE

I am experimenting with Spark Kafka integration. And I want to test the code from my eclipse IDE. However, I got below error:
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:28)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:345)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:403)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:532)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at com.capiot.platform.spark.SparkTelemetryReceiverFromKafkaStream.executeStreamingCalculations(SparkTelemetryReceiverFromKafkaStream.java:248)
at com.capiot.platform.spark.SparkTelemetryReceiverFromKafkaStream.main(SparkTelemetryReceiverFromKafkaStream.java:84)
UPDATE:
The versions that I am using are:
scala - 2.11
spark-streaming-kafka- 1.4.1
spark - 1.4.1
Can any one resolve the issue? Thanks in advance.
You have the wrong version of Scala. You need 2.10.x per
https://spark.apache.org/docs/1.4.1/
"For the Scala API, Spark 1.4.1 uses Scala 2.10."
Might be late to help OP, but when using kafka streaming with spark, you need to make sure that you use the right jar file.
For example, in my case, I have scala 2.11 (the minimum required for spark 2.0 which im using), and given that kafka spark requires the version 2.0.0 I have to use the artifact spark-streaming-kafka-0-8-assembly_2.11-2.0.0-preview.jar
Notice my scala version and the artifact version can be seen at 2.11-2.0.0
Hope this helps (someone)
Hope that helps.