NoClassDefFoundError while using scopt OptionParser with Spark - scala

I am using Apache Spark version 1.2.1 and Scala version 2.10.4. I am trying to get the example MovieLensALS working, but I am running into errors with the scopt library, which the code requires. Any help would be appreciated.
My build.sbt is as follows:
name := "Movie Recommender System"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1"
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.2.1"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.1"
libraryDependencies += "com.github.scopt" %% "scopt" % "3.2.0"
resolvers += Resolver.sonatypeRepo("public")
and the errors I am getting are the following:
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
at MovieLensALS.main(MovieLensALS.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 8 more
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
On running sbt assembly to build the jar, I receive the following errors:
[error] Not a valid command: assembly
[error] Not a valid project ID: assembly
[error] Expected ':' (if selecting a configuration)
[error] Not a valid key: assembly
[error] assembly
[error] ^
Edit: As per Justin Piphony's suggestion, the solution listed on sbt's GitHub page fixed this error. Basically, create a file assembly.sbt in the project/ directory and add the line
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
Note that the plugin version should match the sbt version in use.

You need to package scopt in your jar; sbt doesn't do this by default. To create such a fat jar, you need to use sbt-assembly.
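For reference, a minimal sketch of that setup (the plugin version is an example; pick one compatible with your sbt release):

// project/assembly.sbt -- version is an example, adjust to your sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

// build.sbt -- marking Spark as "provided" keeps it out of the fat jar,
// since spark-submit already ships it; scopt then gets bundled
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
libraryDependencies += "com.github.scopt" %% "scopt" % "3.2.0"

Running sbt assembly then produces a single jar under target/scala-2.10/ that includes scopt and can be passed to spark-submit.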

If you are using Maven to package your Spark project, you need to add the maven-assembly-plugin, which packages the dependencies into your jar:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.5</version>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <!-- this is used for inheritance merges -->
      <phase>package</phase>
      <!-- bind to the packaging phase -->
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
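With that plugin in place, mvn package also produces an [artifactId]-[version]-jar-with-dependencies.jar under target/ that bundles scopt, which can then be submitted as usual (the artifact name below is hypothetical; the main class is the one from the stack trace above):

spark-submit --class MovieLensALS --master local[*] target/movie-recommender-system-1.0-jar-with-dependencies.jar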

Related

Import "google/protobuf/wrappers.proto" was not found scala

I have a .proto file which imports google/protobuf/wrappers.proto.
When I run Scalapbc to generate the corresponding Scala code, it gives an "Import google/protobuf/wrappers.proto was not found" error.
As a workaround I have kept the wrappers.proto file on the file system, inside a directory passed via --proto_path.
But I need a proper fix where I add the relevant dependencies in build.sbt / pom.xml to unpack the jar containing the default proto files (such as wrappers.proto) before calling Scalapbc.
All the required dependencies are provided by the scalapb runtime:
import sbtprotoc.ProtocPlugin.ProtobufConfig
import scalapb.compiler.Version.scalapbVersion
libraryDependencies ++= Seq(
"com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion,
"com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % ProtobufConfig
)
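For these settings to take effect, the sbt-protoc plugin must be on the build classpath (that is where the sbtprotoc.ProtocPlugin import comes from); a minimal project/plugins.sbt sketch, where the version is an assumption to check against the sbt-protoc releases:

// project/plugins.sbt -- version is an example
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")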
I use the AkkaGrpcPlugin for sbt, which seems to handle all the dependencies.
In plugins.sbt I have
addSbtPlugin("com.lightbend.akka.grpc" % "sbt-akka-grpc" % "1.1.1")
In build.sbt I have
enablePlugins(AkkaGrpcPlugin)
It automatically picks up the files in src/main/protobuf for the project and generates the appropriate stub files. I can import standard files, e.g.
import "google/protobuf/timestamp.proto";
For multi-project builds I use something like this:
lazy val allProjects = (project in file("."))
  .aggregate(util, grpc)

lazy val grpc =
  project
    .in(file("grpc"))
    .settings(
      ???
    )
    .enablePlugins(AkkaGrpcPlugin)

lazy val util =
  project
    .in(file("util"))
    .settings(
      ???
    )
    .dependsOn(grpc)
Thanks everyone for your answers. I really appreciate it.
I was able to solve the dependency issue by adding the following to my pom.xml:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <executions>
        <execution>
          <id>unpack</id>
          <phase>prepare-package</phase>
          <goals>
            <goal>unpack</goal>
          </goals>
          <configuration>
            <artifactItems>
              <artifactItem>
                <groupId>com.google.protobuf</groupId>
                <artifactId>protobuf-java</artifactId>
                <version>3.10.0</version>
                <type>jar</type>
                <includes>path/to/Files.whatsoever</includes>
                <outputDirectory>${project.build.directory}/foldername</outputDirectory>
              </artifactItem>
            </artifactItems>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
This unpacks the required proto files into the target folder.
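Scalapbc can then be pointed at the unpacked directory via --proto_path (the paths and the .proto file name below are hypothetical; scalapbc forwards protoc-style flags):

scalapbc --proto_path=target/foldername --proto_path=src/main/protobuf --scala_out=src/main/scala my_service.proto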

Running Scala Maven Project

I am a Scala beginner and was running a starter project with Maven, using IntelliJ as the IDE.
This is the GitHub project I am using: Github project. I compiled the project against OpenJDK 8.
The HelloJava class runs successfully; however, when I try running the HelloScala class I come across the following error:
java -cp scala-maven-example-1.0.0-SNAPSHOT.jar com.jesperdj.example.HelloScala
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Function0
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scala.Function0
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Scala has its own runtime library on top of the JVM, and a compiled Scala .class file references classes from that runtime. So when you run a Scala .class file, you need to append the Scala runtime to the classpath.
If you are running inside IntelliJ IDEA, the Scala plugin does this automatically, but when you run java from the command line you have to do it yourself.
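For example, with the question's jar (the location of scala-library.jar is an assumption; use the one from your Scala distribution or local Maven/Ivy cache):

java -cp /path/to/scala-library.jar:scala-maven-example-1.0.0-SNAPSHOT.jar com.jesperdj.example.HelloScala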
If you are using Maven, then you can add a <plugin>. From Scala Docs -> Scala with Maven -> Creating a Jar:
<build>
  ...
  <plugins>
    ...
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <mainClass>com.your-package.MainClass</mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
After adding this, mvn package will also create [artifactId]-[version]-jar-with-dependencies.jar under target. Note: this will also copy the Scala library into your Jar. This is normal. Be careful that your dependencies use the same version of Scala, or you will quickly end up with a massive Jar.
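That jar can then be run without assembling the classpath by hand (the name below follows the jar-with-dependencies pattern just described, applied to the question's artifact):

java -cp target/scala-maven-example-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.jesperdj.example.HelloScala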
To run your programs in IntelliJ you need to install the Scala SDK along with the JDK.
Once you have the Scala SDK on your IntelliJ classpath you are good to go with Scala coding.
Tick the checkbox 'Include dependencies with "Provided" scope' under 'Run/Debug Configurations'.
(This helps if yours is a Maven project and you added the 'scala-library' dependency with scope 'Provided'.)

sbt resolver for confluent platform

I am unable to add the Confluent repo to my sbt build. I looked at a pom example and found the definition for adding the repo in Maven:
<repositories>
  <repository>
    <id>confluent</id>
    <url>https://packages.confluent.io/maven/</url>
  </repository>
  <!-- further repository entries here -->
</repositories>
and dependencies
<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>2.0.0-cp1</version>
  </dependency>
  <!-- further dependency entries here -->
</dependencies>
I used the following in build.sbt:
resolvers += Resolver.url("confluent", url("http://packages.confluent.io/maven/"))
and declared dependencies as
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"
I still get
::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.kafka#kafka-clients;2.0.0-cp1: not found
[warn] :: org.apache.kafka#kafka_2.12;2.0.0-cp1: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
What is the correct way to do this?
My build.sbt
name := "kafka-Test"
version := "1.0"
scalaVersion := "2.12.3"
resolvers += Resolver.url("confluent", url("https://packages.confluent.io/maven/"))
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"
The problem is in your resolver definition. Resolver.url creates an Ivy-style resolver; for a Maven repository such as Confluent's, declare it with at:
resolvers += "confluent" at "https://packages.confluent.io/maven/"
I just tried this and it works.
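For completeness, the question's build.sbt with only the resolver line changed:

name := "kafka-Test"
version := "1.0"
scalaVersion := "2.12.3"
resolvers += "confluent" at "https://packages.confluent.io/maven/"
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"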

CollectionAccumulator dependency is not resolving in Intellij

I am new to Scala and Spark. I am writing a sample program using CollectionAccumulator, but the dependency for CollectionAccumulator is not resolving in IntelliJ.
val slist : CollectionAccumulator[String] = new CollectionAccumulator()
sc.register(slist,"Myslist")
Above is the piece of code used. When I replace CollectionAccumulator[String] with Accumulator[String], the Accumulator resolves fine.
I have imported the following:
import org.apache.log4j._
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.util._
Dependencies in pom.xml:
<dependencies>
  <!-- Scala and Spark dependencies -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0-cdh5.3.1</version>
  </dependency>
</dependencies>
Please help.
CollectionAccumulator is supported in Spark 2.0+. You are on the Spark 1.2.0 CDH version.
Reference: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.util.CollectionAccumulator
Replace your spark dependency with
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.0.cloudera1</version>
</dependency>
Also make sure that ${scala.version} resolves to Scala 2.11.
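For example, via the pom's properties section (2.11.8 is just an assumed patch release):

<properties>
  <scala.version>2.11.8</scala.version>
</properties>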
CollectionAccumulator is only available from Spark 2.0.0 onward; simply update your Spark version to 2.0+.
example build.sbt
name := "smartad-spark-songplaycount"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.2.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
Example sbt console session with the above build.sbt:
sbt console
scala> import org.apache.spark.util.CollectionAccumulator
import org.apache.spark.util.CollectionAccumulator
scala> val slist : CollectionAccumulator[String] = new CollectionAccumulator()
slist: org.apache.spark.util.CollectionAccumulator[String] = Un-registered Accumulator: CollectionAccumulator
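Once the dependency resolves, the accumulator from the question can be registered with the SparkContext and filled from executors; a minimal sketch (the local-mode SparkContext setup is illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.CollectionAccumulator

val sc = new SparkContext(new SparkConf().setAppName("acc-demo").setMaster("local[*]"))
val slist: CollectionAccumulator[String] = new CollectionAccumulator()
sc.register(slist, "Myslist") // the name shows up in the Spark UI
sc.parallelize(Seq("a", "b", "c")).foreach(slist.add(_)) // adds happen on the executors
println(slist.value) // read back on the driver, e.g. [a, b, c]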

Setting target JVM in SBT

How can I set the target JVM version in SBT?
In Maven (with maven-scala-plugin) it can be done as follows:
<plugin>
  ...
  <configuration>
    <scalaVersion>${scala.version}</scalaVersion>
    <args>
      <arg>-target:jvm-1.5</arg>
    </args>
  </configuration>
</plugin>
You can specify compiler options in the project definition:
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
You have to add (in your build.sbt file):
scalacOptions += "-target:jvm-1.8"
Otherwise it won't work.
As suggested by others in the comments, current sbt versions (1.0, 0.13.15) use the following notation for setting the source and target JVMs:
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
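Putting the two answers together, a minimal build.sbt sketch for targeting Java 8 (both keys are standard sbt settings; javacOptions covers any Java sources, scalacOptions the Scala ones):

// build.sbt
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
scalacOptions += "-target:jvm-1.8"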