Exception on spark test - scala

I build tests for my Scala Spark app, but I get the exception below in IntelliJ while running the test. Other tests, which don't use a SparkContext, run fine. If I run the tests from the terminal with "sbt test-only", the tests with a SparkContext work. Do I need to configure IntelliJ in a special way for tests that use a SparkContext?
An exception or error caused a run to abort: org.apache.spark.rdd.ShuffledRDD.<init>(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/Partitioner;)V
java.lang.NoSuchMethodError: org.apache.spark.rdd.ShuffledRDD.<init>(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/Partitioner;)V
at org.apache.spark.graphx.impl.RoutingTableMessageRDDFunctions.copartitionWithVertices(RoutingTablePartition.scala:36)
at org.apache.spark.graphx.VertexRDD$.org$apache$spark$graphx$VertexRDD$$createRoutingTables(VertexRDD.scala:457)
at org.apache.spark.graphx.VertexRDD$.fromEdges(VertexRDD.scala:440)
at org.apache.spark.graphx.impl.GraphImpl$.fromEdgeRDD(GraphImpl.scala:336)
at org.apache.spark.graphx.impl.GraphImpl$.fromEdgePartitions(GraphImpl.scala:282)
at org.apache.spark.graphx.GraphLoader$.edgeListFile(GraphLoader.scala:91)

The most likely problem is that the spark-core version doesn't match the Spark version you have installed.
Check your sbt file to make sure you are using the Spark core version that corresponds to your installation:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.1.0"

Related

NoSuchMethodError while creating spark session

I am new to Spark. I am just trying to create a Spark session locally, but I am getting the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.internal.config.package$.SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD()Lorg/apache/spark/internal/config/ConfigEntry;
at org.apache.spark.sql.internal.SQLConf$.<init>(SQLConf.scala:1011)
at org.apache.spark.sql.internal.SQLConf$.<clinit>(SQLConf.scala)
at org.apache.spark.sql.internal.StaticSQLConf$.<init>(StaticSQLConf.scala:31)
at org.apache.spark.sql.internal.StaticSQLConf$.<clinit>(StaticSQLConf.scala)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:938)
A similar error has been posted here: Error while using SparkSession or sqlcontext
I am using the same version for spark-core and spark-sql. Here is my build.sbt:
libraryDependencies += ("org.apache.spark" %% "spark-core" % "2.3.1" % "provided")
libraryDependencies += ("org.apache.spark" %% "spark-sql" % "2.3.1" % "provided")
I am using Scala version 2.11.8.
Can someone explain why I am still getting this error and how to correct it?
If you see a NoSuchMethodError with something like Lorg/… in the log, it is usually due to a Spark version mismatch. Do you have Spark 2.3.1 installed on your system? Make sure that the dependencies match your local or cluster's Spark version.
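One quick way to confirm or rule out a mismatch is to print the Spark version that is actually on the classpath before building the session and compare it with build.sbt; a small sketch using the standard SPARK_VERSION constant and SparkSession API:
import org.apache.spark.sql.SparkSession

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // version of the spark-core jar that actually ended up on the classpath
    println(s"spark-core on classpath: ${org.apache.spark.SPARK_VERSION}")

    // this is the call that fails when spark-sql and spark-core disagree
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("version-check")
      .getOrCreate()
    println(s"session version: ${spark.version}")
    spark.stop()
  }
}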

sbt task failed when adding Spark dependencies

I'm new to Spark and Scala. I ran into a problem while following a tutorial on setting up Scala/Spark in IntelliJ IDEA. When I add the line libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5" to build.sbt, there is always an error. The messages say "sbt task failed" and "Extracting structure failed". I have tried different Spark versions but get the same result. How can I solve this problem?
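For reference, spark-core 2.4.5 is only published for Scala 2.11 and 2.12, so the declared scalaVersion has to be one of those for sbt to resolve the dependency. A minimal build.sbt along these lines (the project name and patch versions are placeholders, not taken from the question):
name := "spark-sandbox"          // placeholder project name
version := "0.1"
scalaVersion := "2.12.10"        // spark-core 2.4.5 exists only for Scala 2.11 and 2.12

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"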

sryza/spark-timeseries: NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;

I have a Scala project that I build with sbt. It uses the sryza/spark-timeseries library.
I am trying to run the following simple code:
val tsAirPassengers = new DenseVector(Array(
112.0,118.0,132.0,129.0,121.0,135.0,148.0,148.0,136.0,119.0,104.0,118.0,115.0,126.0,
141.0,135.0,125.0,149.0,170.0,170.0,158.0,133.0,114.0,140.0,145.0,150.0,178.0,163.0,
172.0,178.0,199.0,199.0,184.0,162.0,146.0,166.0,171.0,180.0,193.0,181.0,183.0,218.0,
230.0,242.0,209.0,191.0,172.0,194.0,196.0,196.0,236.0,235.0,229.0,243.0,264.0,272.0,
237.0,211.0,180.0,201.0,204.0,188.0,235.0,227.0,234.0,264.0,302.0,293.0,259.0,229.0,
203.0,229.0,242.0,233.0,267.0,269.0,270.0,315.0,364.0,347.0,312.0,274.0,237.0,278.0,
284.0,277.0,317.0,313.0,318.0,374.0,413.0,405.0,355.0,306.0,271.0,306.0,315.0,301.0,
356.0,348.0,355.0,422.0,465.0,467.0,404.0,347.0,305.0,336.0,340.0,318.0,362.0,348.0,
363.0,435.0,491.0,505.0,404.0,359.0,310.0,337.0,360.0,342.0,406.0,396.0,420.0,472.0,
548.0,559.0,463.0,407.0,362.0,405.0,417.0,391.0,419.0,461.0,472.0,535.0,622.0,606.0,
508.0,461.0,390.0,432.0
))
val period = 12
val model = HoltWinters.fitModel(tsAirPassengers, period, "additive", "BOBYQA")
It builds fine, but when I try to run it, I get this error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
at com.cloudera.sparkts.models.HoltWintersModel.convolve(HoltWinters.scala:252)
at com.cloudera.sparkts.models.HoltWintersModel.initHoltWinters(HoltWinters.scala:277)
at com.cloudera.sparkts.models.HoltWintersModel.getHoltWintersComponents(HoltWinters.scala:190)
...
The error occurs on this line:
val model = HoltWinters.fitModel(tsAirPassengers, period, "additive", "BOBYQA")
My build.sbt includes:
name := "acme-project"
version := "0.0.1"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-hive" % "1.6.0",
"net.liftweb" %% "lift-json" % "2.5+",
"com.github.seratch" %% "awscala" % "0.3.+",
"org.apache.spark" % "spark-mllib_2.10" % "1.6.2"
)
I have placed sparkts-0.4.0-SNAPSHOT.jar in the lib folder of my project. (I would have preferred to add a libraryDependency, but spark-ts does not appear to be on Maven Central.)
What is causing this run-time error?
The library requires Scala 2.11, not 2.10, and Spark 2.0, not 1.6.2, as you can see from
<scala.minor.version>2.11</scala.minor.version>
<scala.complete.version>${scala.minor.version}.8</scala.complete.version>
<spark.version>2.0.0</spark.version>
in pom.xml. You can try changing these and seeing if it still compiles, look for an older version of sparkts that is compatible with your Scala and Spark versions, or update your project's Scala and Spark versions (don't forget spark-mllib_2.10 in that case).
Also, if you put the jar into the lib folder, you also have to put its dependencies there (and their dependencies, and so on) or add them to libraryDependencies. A better option is to publish sparkts into your local repository using mvn install (IIRC) and add it to libraryDependencies, which lets sbt resolve its dependencies.
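If you go the route of upgrading the project and publishing sparkts to your local Maven repository, the relevant parts of build.sbt would look roughly like this (a sketch; the com.cloudera.sparkts coordinates are an assumption and should be checked against what mvn install actually publishes):
scalaVersion := "2.11.8"          // matches scala.complete.version in the sparkts pom

resolvers += Resolver.mavenLocal  // lets sbt see artifacts installed by mvn install

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-hive"  % "2.0.0",
  "org.apache.spark" %% "spark-mllib" % "2.0.0",   // %% replaces the hard-coded _2.10 suffix
  "com.cloudera.sparkts" % "sparkts" % "0.4.0-SNAPSHOT"
  // lift-json and awscala stay as in the original build.sbt
)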

Using s3n:// with spark-submit

I've written a Spark app that is to run on a cluster using spark-submit. Here's part of my build.sbt.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided" exclude("asm", "asm")
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1" % "provided"
asm is excluded because I'm using another library that depends on a different version of it. The asm dependency in Spark seems to come from one of Hadoop's dependencies, and I'm not using that functionality.
The problem now is that with this setup, saveAsTextFile("s3n://my-bucket/dir/file") throws java.io.IOException: No FileSystem for scheme: s3n.
Why is this happening? Shouldn't spark-submit provide the Hadoop dependencies?
I've tried a few things:
Leaving out "provided"
Putting hadoop-aws on the classpath, via a jar and spark.executor.extraClassPath and spark.driver.extraClassPath. This requires doing the same for all of its transitive dependencies though, which can be painful.
Neither really works. Is there a better approach?
I'm using the pre-built spark-1.6.1-bin-hadoop2.6.
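One option that avoids the manual classpath juggling is to let sbt (or spark-submit) resolve hadoop-aws, the module that carries the s3n filesystem in the Hadoop 2.6 line; a sketch, with 2.6.0 chosen only to match the pre-built spark-1.6.1-bin-hadoop2.6:
// build.sbt: not marked "provided", so it travels with the application
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.6.0"
Bundling it via an assembly jar, or pulling it in at submit time with --packages org.apache.hadoop:hadoop-aws:2.6.0, brings its transitive dependencies along automatically, which is the part that makes the extraClassPath approach painful.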

JodaTime issues with scala and spark when invoking spark-submit

I am having trouble using JodaTime in a Spark Scala program. I have tried the solutions posted on Stack Overflow in the past, and they don't seem to fix the issue for me.
When I run spark-submit, it comes back with an error like the following:
15/09/04 17:51:57 INFO Remoting: Remoting started; listening on addresses :
[akka.tcp://sparkDriver@100.100.100.81:56672]
Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/DateTimeZone
at com.ttams.xrkqz.GenerateCsv$.main(GenerateCsv.scala:50)
at com.ttams.xrkqz.GenerateCsv.main(GenerateCsv.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
After sbt package, which seems to work fine, I invoke spark-submit like this ...
~/spark/bin/spark-submit --class "com.ttams.xrkqz.GenerateCsv" --master local target/scala-2.10/scala-xrkqz_2.10-1.0.jar
In my build.sbt file, I have
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"
libraryDependencies ++= Seq ("joda-time" % "joda-time" % "2.8.2",
"org.joda" % "joda-convert" % "1.7"
)
I have tried multiple versions of joda-time and joda-convert but am not able to run with spark-submit from the command line. However, it seems to work when I run it within the IDE (Scala IDE).
Let me know if you have any suggestions or ideas.
It seems that you are missing the Joda dependencies on your classpath at runtime. You could do a few things: one is to manually add the joda-time jars to spark-submit with the --jars argument; the other is to use the assembly plugin and build an assembly jar that contains all of your dependencies (you will likely want to mark spark-core as "provided" so it doesn't end up in your assembly).
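For the assembly route, the wiring looks roughly like this (a sketch; the sbt-assembly version is an assumption, and the fat-jar name depends on your assembly settings):
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

// build.sbt: keep Spark out of the fat jar, bundle Joda
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"
libraryDependencies ++= Seq(
  "joda-time" % "joda-time"    % "2.8.2",
  "org.joda"  % "joda-convert" % "1.7"
)

// then build and submit the assembly jar, which now contains org/joda/time/DateTimeZone:
//   sbt assembly
//   ~/spark/bin/spark-submit --class "com.ttams.xrkqz.GenerateCsv" --master local target/scala-2.10/<assembly-jar>.jar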