NoSuchMethodError while creating Spark session - Scala

I am new to Spark. I am just trying to create a Spark session locally, but I am getting the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.internal.config.package$.SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD()Lorg/apache/spark/internal/config/ConfigEntry;
at org.apache.spark.sql.internal.SQLConf$.<init>(SQLConf.scala:1011)
at org.apache.spark.sql.internal.SQLConf$.<clinit>(SQLConf.scala)
at org.apache.spark.sql.internal.StaticSQLConf$.<init>(StaticSQLConf.scala:31)
at org.apache.spark.sql.internal.StaticSQLConf$.<clinit>(StaticSQLConf.scala)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:938)
A similar error has been posted here: Error while using SparkSession or sqlcontext
I am using the same version for spark-core and spark-sql. Here is my build.sbt:
libraryDependencies += ("org.apache.spark" %% "spark-core" % "2.3.1" % "provided")
libraryDependencies += ("org.apache.spark" %% "spark-sql" % "2.3.1" % "provided")
I am using Scala version 2.11.8.
Can someone explain why I am still getting this error and how to correct it?

If you see a NoSuchMethodError with something like Lorg/… in the log, it's usually due to a Spark version mismatch. Do you have Spark 2.3.1 installed on your system? Make sure that your dependencies match your local or cluster's Spark version.
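For reference, here is a minimal build.sbt sketch that ties both dependencies to a single version value; the 2.3.1 below is an assumption and should be replaced with whatever spark-submit --version reports for your local or cluster installation:
scalaVersion := "2.11.8"
// Assumed to match the installed Spark distribution; adjust to the version
// reported by `spark-submit --version`.
val sparkVersion = "2.3.1"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)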

Spark Cassandra Join ClassCastException

I am trying to join two Cassandra tables with:
t1.join(t2, Seq("some column"), "left")
I am getting the below error message:
Exception in thread "main" java.lang.ClassCastException: scala.Tuple8 cannot be cast to scala.Tuple7
at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.apply(CassandraDirectJoinStrategy.scala:27)
I am using Cassandra v3.11.13 and Spark 3.3.0. These are the code dependencies:
libraryDependencies ++= Seq(
"org.scalatest" %% "scalatest" % "3.2.11" % Test,
"com.github.mrpowers" %% "spark-fast-tests" % "1.0.0" % Test,
"graphframes" % "graphframes" % "0.8.1-spark3.0-s_2.12" % Provided,
"org.rogach" %% "scallop" % "4.1.0" % Provided,
"org.apache.spark" %% "spark-sql" % "3.1.2" % Provided,
"org.apache.spark" %% "spark-graphx" % "3.1.2" % Provided,
"com.datastax.spark" %% "spark-cassandra-connector" % "3.2.0" % Provided)
Your help is greatly appreciated.
The Spark Cassandra Connector does not support Apache Spark 3.3.0 yet, and I suspect that is the reason it's not working, though I haven't done any verification myself.
Support for Spark 3.3.0 has been requested in SPARKC-686, but the amount of work required is significant, so stay tuned.
The latest supported Spark version is 3.2, using spark-cassandra-connector 3.2. Cheers!
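If you want to stay on released artifacts in the meantime, one option is to move everything to the 3.2 line. A rough build.sbt sketch; the exact versions are illustrative, so check the connector's compatibility matrix for the precise pairing:
libraryDependencies ++= Seq(
  // Spark pinned to the 3.2.x line so it matches spark-cassandra-connector 3.2.0
  "org.apache.spark" %% "spark-sql" % "3.2.1" % Provided,
  "org.apache.spark" %% "spark-graphx" % "3.2.1" % Provided,
  "com.datastax.spark" %% "spark-cassandra-connector" % "3.2.0" % Provided)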
This commit adds initial support for Spark 3.3.x, although it is awaiting RCs/publication at the time of this comment, so for the time being you would need to build and package the jars yourself to resolve the above error when using Spark 3.3. This could also be a good opportunity to provide feedback on any subsequent RCs as an active user.
I will update this answer when RCs/stable releases are available, which should resolve the issue for others hitting it. Unfortunately, I don't have enough reputation to add this as a comment to the thread above.

fs.s3a.aws.credentials.provider java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider not found

I'm trying to read data from S3 with Spark, using the following dependencies and configuration:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.2.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.2.1"
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", config.s3AccessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", config.s3SecretKey)
spark.sparkContext.hadoopConfiguration.set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
I'm getting the following error:
java.io.IOException: From option fs.s3a.aws.credentials.provider java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider not found
It was working fine with older versions of Spark and Hadoop. To be exact, I was previously using Spark 2.4.8 and Hadoop 2.8.5.
I was looking forward to using the latest EMR release with Spark 3.2.0 and Hadoop 3.2.1. The issue was caused mainly by Hadoop 3.2.1, so the only option was to use an older EMR release. Spark 2.4.8 and Hadoop 2.10.1 worked for me.
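For completeness, a sketch of the dependency set matching that fallback; the exact artifacts and versions here are illustrative, and the key point is that hadoop-aws and hadoop-client must come from the same Hadoop release line that your Spark build targets:
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8"
// hadoop-aws and hadoop-client kept on the same Hadoop version (assumed 2.10.1 here)
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.10.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.10.1"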

libraryDependencies Spark in build.sbt error (IntelliJ)

I am trying to learn Scala with Spark. I am following a tutorial, but I am getting an error when I try to import the Spark library dependency:
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"
I am getting an error, and I have 3 unknown artifacts.
What could be the problem here?
My code is so simple, it is just a Hello World.
You probably need to add this resolver to your build.sbt:
resolvers += "spark-core" at "https://mvnrepository.com/artifact/org.apache.spark/spark-core"
Please note that this library is supported only for Scala 2.11 and Scala 2.12.
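A minimal build.sbt sketch along those lines (the project name is hypothetical; the key point is that scalaVersion must be a 2.11.x or 2.12.x release so that spark-core 2.4.3 can be resolved):
name := "hello-spark" // hypothetical project name
version := "0.1"
scalaVersion := "2.12.8" // must be 2.11.x or 2.12.x for spark-core 2.4.3
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"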

Runtime error on Scala Spark 2.0 code

I have the following code:
import org.apache.spark.sql.SparkSession
.
.
.
val spark = SparkSession
.builder()
.appName("PTAMachineLearner")
.getOrCreate()
When it executes, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at org.apache.spark.sql.SparkSession$Builder.config(SparkSession.scala:750)
at org.apache.spark.sql.SparkSession$Builder.appName(SparkSession.scala:741)
at com.acme.pta.accuracy.ml.PTAMachineLearnerModel.getDF(PTAMachineLearnerModel.scala:52)
The code compiles and builds just fine. Here are the dependencies:
scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
// Spark dependencies
"org.apache.spark" %% "spark-hive" % "2.1.1",
"org.apache.spark" %% "spark-mllib" % "2.1.1",
// Third-party libraries
"net.sf.jopt-simple" % "jopt-simple" % "5.0.3",
"com.amazonaws" % "aws-java-sdk" % "1.3.11",
"org.apache.logging.log4j" % "log4j-api" % "2.8.2",
"org.apache.logging.log4j" % "log4j-core" % "2.8.2",
"org.apache.logging.log4j" %% "log4j-api-scala" % "2.8.2",
"com.typesafe.play" %% "play-ahc-ws-standalone" % "1.0.0-M9",
"net.liftweb" % "lift-json_2.11" % "3.0.1"
)
I am executing the code like this:
/Users/paulreiners/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
--class "com.acme.pta.accuracy.ml.CreateRandomForestRegressionModel" \
--master local[4] \
target/scala-2.11/acme-pta-accuracy-ocean.jar \
I had this all running with Spark 1.6. I'm trying to upgrade to Spark 2, but am missing something.
The class ArrowAssoc is indeed present in your Scala library (see the Scala doc). But you are getting the error inside the Spark library, so the Spark build you are running is evidently not compatible with Scala 2.11; it was probably compiled against an older Scala version. If you look at the older Scala API doc, ArrowAssoc has changed a lot; for example, it is now implicit, with lots of implicit dependencies. Make sure your Spark and Scala versions are compatible.
I found the problem. I had Scala 2.10.5 installed on my system, so either sbt or spark-submit was picking that up when 2.11.11 was expected.
I had the same issue, but in my case the problem was that I deployed the jar to a Spark 1.x cluster whereas the code was written for Spark 2.x.
So, if you see this error, just check the versions of Spark and Scala used in your code against the respective installed versions.
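One quick way to check what is actually on the runtime classpath is to print the versions from a small Scala snippet (a sketch, assuming you can start a local SparkSession) and compare them with build.sbt and with the cluster you submit to:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("VersionCheck")
  .master("local[*]")
  .getOrCreate()

// Versions actually loaded at runtime
println(s"Spark version: ${spark.version}")
println(s"Scala version: ${scala.util.Properties.versionString}")
spark.stop()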

Exception on Spark test

I built tests for my Scala Spark app, but I get the exception below in IntelliJ while running a test. Other tests, which don't use a SparkContext, run fine. If I run the test in the terminal with "sbt test-only", the tests with a SparkContext work. Do I need to configure IntelliJ specially for tests with a SparkContext?
An exception or error caused a run to abort: org.apache.spark.rdd.ShuffledRDD.<init>(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/Partitioner;)V
java.lang.NoSuchMethodError: org.apache.spark.rdd.ShuffledRDD.<init>(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/Partitioner;)V
at org.apache.spark.graphx.impl.RoutingTableMessageRDDFunctions.copartitionWithVertices(RoutingTablePartition.scala:36)
at org.apache.spark.graphx.VertexRDD$.org$apache$spark$graphx$VertexRDD$$createRoutingTables(VertexRDD.scala:457)
at org.apache.spark.graphx.VertexRDD$.fromEdges(VertexRDD.scala:440)
at org.apache.spark.graphx.impl.GraphImpl$.fromEdgeRDD(GraphImpl.scala:336)
at org.apache.spark.graphx.impl.GraphImpl$.fromEdgePartitions(GraphImpl.scala:282)
at org.apache.spark.graphx.GraphLoader$.edgeListFile(GraphLoader.scala:91)
The most likely problem is that the spark-core version doesn't match.
Check your sbt file to make sure it uses the spark-core version that you actually have installed:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
libraryDependencies += "org.apache.spark" %% "spark-graphx" %"1.1.0"