Error at runtime, sbt compilation passes - scala

I have a piece of code that compiles fine (Scala + Spark 1.6).
I then run it (with Spark 1.6), but it complains that a 1.6 method is not there. What gives?
simple.sbt:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
resolvers += "Typesafe Repo" at "http://repo.typesafe.com/typesafe/releases/"
resolvers += "Conjars" at "http://conjars.org/repo"
resolvers += "cljars" at "https://clojars.org/repo/"
mainClass in Compile := Some("Medtronic.Class")
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.0"
libraryDependencies += "org.elasticsearch" % "elasticsearch" % "1.7.2"
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark" % "2.1.1"
libraryDependencies += "com.github.nscala-time" %% "nscala-time" % "1.8.0"
Compilation:
$ sbt assembly
[info] Loading project definition from /Users/mlieber/projects/spark/test/project
[info] Set current project to Simple Project (in build file:/Users/mlieber/projects/spark/test/)
[info] Updating {file:/Users/mlieber/projects/spark/test/}test...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[warn] Scala version was updated by one of library dependencies:
[warn] * org.scala-lang:scala-library:(2.10.4, 2.10.0) -> 2.10.5
[warn] To force scalaVersion, add the following:
[warn] ivyScala := ivyScala.value map { _.copy(overrideScalaVersion = true) }
[warn] There may be incompatibilities among your library dependencies.
[warn] Here are some of the libraries that were evicted:
[warn] * org.apache.spark:spark-core_2.10:1.4.1 -> 1.6.0
[warn] Run 'evicted' to see detailed eviction warnings
..
[info] Run completed in 257 milliseconds.
[info] Total number of tests run: 0
[info] Suites: completed 0, aborted 0
[info] Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
[info] No tests were executed.
..
[info] Including from cache: spark-core_2.10-1.6.0.jar
..
[info] Including from cache: spark-streaming_2.10-1.6.0.jar
..
[info] Assembly up to date: /Users/mlieber/projects/spark/test/target/scala-2.10/stream_test_1.0.jar
[success] Total time: 98 s, completed Jan 28, 2016 4:05:22 PM
I run with:
./app/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --jars /Users/mlieber/app/elasticsearch-1.7.2/lib/elasticsearch-1.7.2.jar --master local[4] --class "MyClass" ./target/scala-2.10/stream_test_1.0.jar
Runtime error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.streaming.dstream.PairDStreamFunctions.mapWithState(Lorg/apache/spark/streaming/StateSpec;Lscala/reflect/ClassTag;Lscala/reflect/ClassTag;)Lorg/apache/spark/streaming/dstream/MapWithStateDStream;
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
..
16/01/28 18:35:23 INFO SparkContext: Invoking stop() from shutdown hook

Your project is suffering from dependency hell. sbt resolves transitive dependencies by default, and one of your dependencies (elasticsearch-spark) requires a different version of spark-core. From your logs:
[warn] Here are some of the libraries that were evicted:
[warn] * org.apache.spark:spark-core_2.10:1.4.1 -> 1.6.0
It looks like the version required by elasticsearch-spark is not binary compatible with the one used by your project, so you get an error when your project runs.
There is no error at compile time because the code being compiled (that is, your own code) is compatible with the version that was resolved.
Here are some options for solving this:
You can upgrade elasticsearch-spark to version 2.1.2 and see whether it brings in a newer version of spark-core (one that is compatible with your project). Version 2.2.0-rc1 depends on spark-core 1.6.0, and upgrading to it will certainly fix the problem, but keep in mind that you would be using a release-candidate version.
You can downgrade spark-core and spark-streaming to version 1.4.1 (the version used by elasticsearch-spark) and adapt your code where necessary.
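For reference, a minimal sketch of the first option applied to the build above (2.2.0-rc1 is the release candidate mentioned; treat this as a starting point rather than a verified build):
// sketch: pin every Spark artifact to 1.6.0 and move elasticsearch-spark
// to a release that also targets spark-core 1.6.0
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.0"
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark" % "2.2.0-rc1"
After changing the versions, run sbt's evicted task again to confirm that no conflicting spark-core remains in the dependency graph.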

Related

Scala IntelliJ library import errors

I am new to Scala and I am trying to import the following libraries in my build.sbt. When IntelliJ does an auto-update, I get the following error:
Error while importing sbt project:
List([info] welcome to sbt 1.3.13 (Oracle Corporation Java 1.8.0_251)
[info] loading global plugins from C:\Users\diego\.sbt\1.0\plugins
[info] loading project definition from C:\Users\diego\development\Meetup\Stream-Processing\project
[info] loading settings for project stream-processing from build.sbt ...
[info] set current project to Stream-Processing (in build file:/C:/Users/diego/development/Meetup/Stream-Processing/)
[info] sbt server started at local:sbt-server-80d70f9339b81b4d026a
sbt:Stream-Processing>
[info] Defining Global / sbtStructureOptions, Global / sbtStructureOutputFile and 1 others.
[info] The new values will be used by cleanKeepGlobs
[info] Run `last` for details.
[info] Reapplying settings...
[info] set current project to Stream-Processing (in build file:/C:/Users/diego/development/Meetup/Stream-Processing/)
[info] Applying State transformations org.jetbrains.sbt.CreateTasks from C:/Users/diego/.IntelliJIdea2019.3/config/plugins/Scala/repo/org.jetbrains/sbt-structure-extractor/scala_2.12/sbt_1.0/2018.2.1+4-88400d3f/jars/sbt-structure-extractor.jar
[info] Reapplying settings...
[info] set current project to Stream-Processing (in build file:/C:/Users/diego/development/Meetup/Stream-Processing/)
[warn]
[warn] Note: Unresolved dependencies path:
[error] stack trace is suppressed; run 'last update' for the full output
[error] stack trace is suppressed; run 'last ssExtractDependencies' for the full output
[error] (update) sbt.librarymanagement.ResolveException: Error downloading org.apache.kafka:kafka-clients_2.11:2.3.1
[error] Not found
[error] Not found
[error] not found: C:\Users\diego\.ivy2\local\org.apache.kafka\kafka-clients_2.11\2.3.1\ivys\ivy.xml
[error] not found: https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients_2.11/2.3.1/kafka-clients_2.11-2.3.1.pom
[error] (ssExtractDependencies) sbt.librarymanagement.ResolveException: Error downloading org.apache.kafka:kafka-clients_2.11:2.3.1
[error] Not found
[error] Not found
[error] not found: C:\Users\diego\.ivy2\local\org.apache.kafka\kafka-clients_2.11\2.3.1\ivys\ivy.xml
[error] not found: https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients_2.11/2.3.1/kafka-clients_2.11-2.3.1.pom
[error] Total time: 2 s, completed Jun 28, 2020 12:11:24 PM
[info] shutting down sbt server)
This is my build.sbt file:
name := "Stream-Processing"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10_2.12
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.4"
// https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients
libraryDependencies += "org.apache.kafka" %% "kafka-clients" % "2.3.1"
// https://mvnrepository.com/artifact/mysql/mysql-connector-java
libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.18"
// https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.4.1"
I made a Scala project just to make sure Spark works, and my Python project using Kafka works fine as well, so I am sure it's not a Spark/Kafka problem. Any idea why I am getting that error?
Try removing one % before "kafka-clients":
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.3.1"
The semantics of %% in sbt are that it appends the Scala version being used to the artifact name, so the dependency becomes org.apache.kafka:kafka-clients_2.11:2.3.1, exactly as the error message shows. Note the _2.11 suffix.
This is a nice shorthand for Scala libraries, but it can be confusing for beginners when used with Java libraries, which are published without a Scala version suffix.
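To make the expansion concrete, with scalaVersion := "2.11.8" the following lines resolve to the same artifacts (a sketch using the versions from this build):
// %% appends the Scala binary version, so these two lines are equivalent:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.4.4"
// kafka-clients is a Java library with no _2.11 build on Maven Central,
// so it must be declared with a single %:
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.3.1"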

Compile error in Scala project

I'm taking the Coursera Scala course. I downloaded the sample project, but I cannot compile it; I get an error when I run the console command.
build.sbt:
name := course.value + "-" + assignment.value
scalaVersion := "2.12.4"
scalacOptions ++= Seq("-deprecation")
// grading libraries
libraryDependencies += "junit" % "junit" % "4.10" % Test
// for funsets
libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "1.0.4"
resolvers += "Artima Maven Repository" at "http://repo.artima.com/releases"
// include the common dir
commonSourcePackages += "common"
courseId := "bRPXgjY9EeW6RApRXdjJPw"
Error:
> console
[info] Updating root
[info] Resolved root dependencies
[trace] Stack trace suppressed: run last *:coursierResolution for the full output.
[error] (*:coursierResolution) coursier.ResolutionException: Encountered 1 error(s) in dependency resolution:
[error] org.scalatest:scalatest_2.12:2.2.4:
[error] not found:
[error] /Users/joaonobre/.ivy2/local/org.scalatest/scalatest_2.12/2.2.4/ivys/ivy.xml
[error] /Users/joaonobre/.sbt/preloaded/org.scalatest/scalatest_2.12/2.2.4/ivys/ivy.xml
[error] /Users/joaonobre/.sbt/preloaded/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error] https://repo1.maven.org/maven2/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error] http://repo.artima.com/releases/org/scalatest/scalatest_2.12/2.2.4/scalatest_2.12-2.2.4.pom
[error] Total time: 1 s, completed Dec 22, 2017 4:16:17 PM
scala -version
Scala code runner version 2.12.4 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
Any idea how to fix it?
Thanks.
As suggested by @laughedelic, scalatest 2.2.4 was never published for Scala 2.12, so sbt cannot find the corresponding scalatest pom file. You can pick any of the versions mentioned here.
For instance, try changing the scalaTestDependency version to 3.0.5 in the CommonBuild.scala file:
lazy val scalaTestDependency = "org.scalatest" %% "scalatest" % "3.0.5"
Here is the course link (Practice Programming Assignment of Week 1).
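If your copy of the project lets you declare the dependency directly in build.sbt instead, the equivalent line would be (a sketch; the Test scope mirrors how junit is declared above):
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % Test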

NoClassDefFoundError - GenTraversableOnce$class with Akka

I am getting the following error while using Akka. I suspect it could be because of an incompatible Akka version, but I do not know which one to use.
build.sbt
name := "Example"
version := "1.0"
scalaVersion := "2.11.1"
resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
libraryDependencies += "com.typesafe.akka" % "akka-actor_2.10" % "2.2-M1"
Error
>sbt run
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading global plugins from C:\Users\Manu\.sbt\0.13\plugins
[info] Set current project to Example (in build file:/C:/Users/Manu/Documents/manu/programs/scala/from_book/)
[info] Running Upper
[error] (run-main-0) java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at akka.util.Collections$EmptyImmutableSeq$.<init>(Collections.scala:15)
The suffix _2.10 in akka-actor_2.10 indicates the version built for Scala 2.10, which does not match the Scala version declared above (2.11.1).
If you want to use Scala 2.11, the latest version of Akka would be:
libraryDependencies += "com.typesafe.akka" % "akka-actor_2.11" % "2.5.4"
You can see all the versions here: https://mvnrepository.com/artifact/com.typesafe.akka/akka-actor_2.11
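As a side note (a common convention, not part of the answer above): declaring the dependency with %% lets sbt append the project's Scala binary version automatically, so the suffix can never drift out of sync with scalaVersion:
// equivalent to "akka-actor_2.11" when scalaVersion is 2.11.x
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.5.4"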

Why is my build.sbt looking for version 2.11 of hadoop-streaming?

I'm doing a Pluralsight course on Apache Spark, and at one point they ask us to set up a dependency on hadoop-streaming. I've added it to my build.sbt file, but the results I'm getting are unexpected:
Build.sbt
name := "SparkPlayground"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
libraryDependencies += "com.github.scala-incubator.io" %% "scala-io-core" % "0.4.3"
libraryDependencies += "com.github.scala-incubator.io" %% "scala-io-file" % "0.4.3"
libraryDependencies += "org.apache.hadoop" %% "hadoop-streaming" % "2.7.0"
Error message
SBT 'SparkPlayground' project refresh failed
Error while importing SBT project:
...
[info] Resolving org.scala-sbt#task-system;0.13.8 ...
[info] Resolving org.scala-sbt#tasks;0.13.8 ...
[info] Resolving org.scala-sbt#tracking;0.13.8 ...
[info] Resolving org.scala-sbt#cache;0.13.8 ...
[info] Resolving org.scala-sbt#testing;0.13.8 ...
[info] Resolving org.scala-sbt#test-agent;0.13.8 ...
[info] Resolving org.scala-sbt#test-interface;1.0 ...
[info] Resolving org.scala-sbt#main-settings;0.13.8 ...
[info] Resolving org.scala-sbt#apply-macro;0.13.8 ...
[info] Resolving org.scala-sbt#command;0.13.8 ...
[info] Resolving org.scala-sbt#logic;0.13.8 ...
[info] Resolving org.scala-sbt#precompiled-2_8_2;0.13.8 ...
[info] Resolving org.scala-sbt#precompiled-2_9_2;0.13.8 ...
[info] Resolving org.scala-sbt#precompiled-2_9_3;0.13.8 ...
[trace] Stack trace suppressed: run 'last *:update' for the full output.
[trace] Stack trace suppressed: run 'last *:ssExtractDependencies' for the full output.
[error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-streaming_2.11;2.6.0: not found
[error] (*:ssExtractDependencies) sbt.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-streaming_2.11;2.6.0: not found
[error] Total time: 13 s, completed Sep 5, 2016 2:05:47 AM
From the error message it looks like sbt is looking for hadoop-streaming_2.11 for some reason, but I have no idea where this 2.11 comes from. I'm pretty new to Scala and sbt, so I'm guessing I made some dumb typo somewhere.
"If you use groupID %% artifactID % revision rather than groupID % artifactID % revision (the difference is the double %% after the groupID), sbt will add your project’s Scala version to the artifact name."
From SBT manual.
So you should just use a single % here: hadoop-streaming is a Java library, published without a Scala version suffix.
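Concretely, the hadoop-streaming line from the build above becomes:
// a plain Java artifact takes a single %, so sbt looks up
// hadoop-streaming instead of hadoop-streaming_2.11
libraryDependencies += "org.apache.hadoop" % "hadoop-streaming" % "2.7.0"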

Modifying and Building Spark core

I am trying to make a modification to the Apache Spark source code. I created a new method and added it to the RDD.scala file within the Spark source code I downloaded. After making the modification to RDD.scala, I built Spark using
mvn -Dhadoop.version=2.2.0 -DskipTests clean package
I then created a sample Scala Spark application as mentioned here.
I tried using the new function I created, and I got a compilation error when using sbt to create a jar for my application. How exactly do I compile Spark with my modification and attach the modified jar to my project? The file I modified is RDD.scala within the core project. I run sbt package from the root dir of my Spark application project.
Here is the sbt file:
name := "N Spark"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.3.0"
Here is the error:
sbt package
[info] Loading global plugins from /Users/Raggy/.sbt/0.13/plugins
[info] Set current project to Noah Spark (in build file:/Users/r/Downloads/spark-proj/n-spark/)
[info] Updating {file:/Users/r/Downloads/spark-proj/n-spark/}n-spark...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/r/Downloads/spark-proj/n-spark/target/scala-2.11/classes...
[error] /Users/r/Downloads/spark-proj/n-spark/src/main/scala/SimpleApp.scala:11: value reducePrime is not a member of org.apache.spark.rdd.RDD[Int]
[error] logData.reducePrime(_+_);
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 24 s, completed Apr 11, 2015 2:24:03 AM
UPDATE
Here is the updated sbt file
name := "N Spark"
version := "1.0"
scalaVersion := "2.10"
libraryDependencies += "org.apache.spark" % "1.3.0"
I get the following error for this file:
[info] Loading global plugins from /Users/Raggy/.sbt/0.13/plugins
/Users/Raggy/Downloads/spark-proj/noah-spark/simple.sbt:7: error: No implicit for Append.Value[Seq[sbt.ModuleID], sbt.impl.GroupArtifactID] found,
so sbt.impl.GroupArtifactID cannot be appended to Seq[sbt.ModuleID]
libraryDependencies += "org.apache.spark" % "1.3.0"
Delete libraryDependencies from build.sbt and just copy the custom-built Spark jar to the lib directory in your application project.
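A minimal sketch of what the build file is left with, assuming the custom-built Spark assembly jar has been copied into lib/ at the project root (sbt picks up jars in lib/ as unmanaged dependencies; the scalaVersion must match the Scala version your Spark build used, which is 2.10 by default for Spark 1.3):
name := "N Spark"
version := "1.0"
// no libraryDependencies entry for Spark: the custom-built jar in lib/
// is already on the classpath as an unmanaged dependency
scalaVersion := "2.10.4"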