Scala - spark-corenlp - java.lang.ClassNotFoundException - scala

I want to run the spark-coreNLP example, but I get a java.lang.ClassNotFoundException when running spark-submit.
Here is the scala code, from the github example, which I put into an object, and defined a SparkContext.
package analyzer

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._
import sqlContext.implicits._

object Sentiment {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Sentiment")
    val sc = new SparkContext(conf)
    val input = Seq(
      (1, "<xml>Stanford University is located in California. It is a great university.</xml>")
    ).toDF("id", "text")
    val output = input
      .select(cleanxml('text).as('doc))
      .select(explode(ssplit('doc)).as('sen))
      .select('sen, tokenize('sen).as('words), ner('sen).as('nerTags), sentiment('sen).as('sentiment))
    output.show(truncate = false)
  }
}
I am using the build.sbt provided by spark-coreNLP; I only changed scalaVersion and sparkVersion to my own.
version := "1.0"
scalaVersion := "2.11.8"
initialize := {
  val _ = initialize.value
  val required = VersionNumber("1.8")
  val current = VersionNumber(sys.props("java.specification.version"))
  assert(VersionNumber.Strict.isCompatible(current, required), s"Java $required required.")
}
sparkVersion := "1.5.2"
// change the value below to change the directory where your zip artifact will be created
spDistDirectory := target.value
sparkComponents += "mllib"
spName := "databricks/spark-corenlp"
licenses := Seq("GPL-3.0" -> url(""))
resolvers += Resolver.mavenLocal
libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
  "" % "protobuf-java" % "2.6.1"
)
Then I created my jar, without issues, by running:
sbt package
Finally, I submit my job to Spark:
spark-submit --class "analyzer.Sentiment" --master local[4] target/scala-2.11/sentimentanalizer_2.11-0.1-SNAPSHOT.jar
But I get the following error:
java.lang.ClassNotFoundException: analyzer.Sentiment
at java.lang.ClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(
at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:641)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
My file Sentiment.scala is correctly located in a package named "analyzer".
$ find .
When I ran the SimpleApp example from the Spark Quick Start, I noticed that MySimpleProject/bin/ contained a SimpleApp.class. MySentimentProject/bin is empty, so I have tried to clean my project (I am using Eclipse for Scala).
I think the problem is that I need to generate Sentiment.class, but I don't know how to do it. It was done automatically with SimpleApp.scala, and when I try to run or build with Eclipse Scala, it crashes.

Maybe you should try adding
scalaSource in Compile := baseDirectory.value / "src"
to your build.sbt, because the sbt documentation says that "the directory that contains the main Scala sources is by default src/main/scala".
Or arrange your source code in this structure:
$ find .


NoClassDefFoundError: scala/Product$class in spark app

I am building a Spark application with a bash script, and I have only the spark-sql and spark-core dependencies in the build.sbt file. Every time I call some RDD methods or convert the data to a case class for Dataset creation, I get this error:
Caused by: java.lang.NoClassDefFoundError: scala/Product$class
I suspect that it is a dependency error. So how should I change my dependencies to fix this?
dependencies list:
import sbt._

object Dependencies {
  lazy val scalaCsv = "com.github.tototoshi" %% "scala-csv" % "1.3.5"
  lazy val sparkSql = "org.apache.spark" %% "spark-sql" % "2.3.3"
  lazy val sparkCore = "org.apache.spark" %% "spark-core" % "2.3.3"
}
build.sbt file:
import Dependencies._

lazy val root = (project in file(".")).
  settings(
    scalaVersion := "2.11.12",
    version := "test",
    name := "project",
    libraryDependencies ++= Seq(scalaCsv, sparkSql, sparkCore),
    mainClass in (Compile, run) := Some("testproject.spark.Main")
  )
I launch the Spark app, with Spark 2.3.3 as my Spark home directory, like this:
$SPARK_HOME/bin/spark-submit \
--class "testproject.spark.Main " \
--master local[*] \
I am not sure what exactly the problem was. However, I recreated the project and moved the source code there, and the error disappeared.
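One thing worth ruling out with this error: scala.Product$class is a helper class that the Scala compiler only generates up to 2.11, so this NoClassDefFoundError typically appears when code or libraries built for Scala 2.11 run against a 2.12 scala-library (or a build mixes the two). The thread does not confirm that this was the cause here. A minimal sketch for printing the Scala version the job actually runs on (the object name ScalaVersionCheck is hypothetical, not part of the original project):

```scala
// Hypothetical diagnostic; run it with the same spark-submit as the failing job.
object ScalaVersionCheck {
  // Full version of the scala-library found on the runtime classpath, e.g. "2.11.12".
  def runtimeVersion: String = scala.util.Properties.versionNumberString

  // Binary version ("2.11", "2.12"): the part that has to match the _2.xx
  // suffix of the Spark artifacts the project was compiled against.
  def binaryVersion(full: String): String = full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    println(s"Scala runtime: $runtimeVersion (binary ${binaryVersion(runtimeVersion)})")
  }
}
```

If the printed binary version differs from the scalaVersion in build.sbt, the build and the Spark installation disagree, and aligning them is the first thing to try.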

SparkSubmit Exception (NoClassDefFoundError), even though SBT compiles and packages successfully

I want to use ShapeLogic Scala combined with Spark. I am using Scala 2.11.8, Spark 2.1.1 and ShapeLogic Scala 0.9.0.
I successfully imported the classes to manage images with Spark. I also successfully compiled and packaged (using SBT) the following application in order to spark-submit it to a cluster.
The application simply opens an image and writes it to a folder:
// imageTest.scala
import org.apache.spark.sql.SparkSession

object imageTestObj {
  def main(args: Array[String]) {
    // Create a Scala Spark Session
    val spark = SparkSession.builder().appName("imageTest").master("local").getOrCreate();
    val inPathStr = "/home/vitrion/IdeaProjects/imageTest";
    val outPathStr = "/home/vitrion/IdeaProjects/imageTest/output";
    val empty = new BufferImage[Byte](0, 0, 0, Array());
    var a = Array.fill(3)(empty);
    for (i <- 0 to 3) {
      val imagePath = inPathStr + "IMG_" + "%01d".format(i + 1);
      a(i) = LoadImage.loadBufferImage(inPathStr);
    }
    val sc = spark.sparkContext;
    val imgRDD = sc.parallelize(a);
    imgRDD.map { outBufferImage =>
      val imageOpt = BufferedImageConverter.bufferImage2AwtBufferedImage(outBufferImage)
      imageOpt match {
        case Some(bufferedImage) => {
          LoadImage.saveAWTBufferedImage(bufferedImage, "png", outPathStr)
          println("Saved " + outPathStr)
        }
        case None => {
          println("Could not convert image")
        }
      }
    }
  }
}
This is my SBT file:
name := "imageTest"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.1.1" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.1" % "provided",
  "org.shapelogicscala" %% "shapelogic" % "0.9.0" % "provided"
)
However, the following error appears. When the package SBT command is executed, it seems like the ShapeLogic Scala dependencies are not included in the application JAR:
[vitrion@mstr scala-2.11]$ pwd
[vitrion@mstr scala-2.11]$ ls
classes imagetest_2.11-0.1.jar resolution-cache
[vitrion@mstr scala-2.11]$ spark-submit --class imageTestObj imagetest_2.11-0.1.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/shapelogic/sc/image/BufferImage
at imageTestObj.main(imageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
at java.lang.ClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
... 10 more
I hope someone can help me solve it.
Thank you very much
This error says everything:
Caused by: java.lang.ClassNotFoundException:
at java.lang.ClassLoader.loadClass(
Add the missing dependency (the ShapeLogic class); that should resolve the issue. Maven and SBT should both give the same error if this dependency is missing.
Since you are working in cluster mode, you can add dependencies directly with --jars on spark-submit.
The dependencies listed in your sbt file are not included by default in the jar you submit to Spark, so with sbt you have to use a plugin to build an uber/fat jar that includes the shapelogicscala classes. See the SO question "How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA?" for how to manage this with sbt.
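As a rough illustration of the fat-jar approach, here is a sketch using the sbt-assembly plugin. The plugin version and the merge strategy are assumptions, not something taken from the question; adjust both to your sbt version and dependencies. Note that the ShapeLogic dependency is no longer marked "provided", since the cluster does not supply it, while Spark itself stays "provided".

```scala
// --- project/plugins.sbt ---
// Plugin version is an assumption; pick one that matches your sbt release.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// --- build.sbt ---
name := "imageTest"
version := "0.1"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark is supplied by spark-submit at runtime, so it stays "provided".
  "org.apache.spark" %% "spark-core" % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.1.1" % "provided",
  // No "provided" here: ShapeLogic has to be bundled into the fat jar.
  "org.shapelogicscala" %% "shapelogic" % "0.9.0"
)

// Resolve duplicate files coming from different jars (a blunt but common choice).
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _ => MergeStrategy.first
}
```

Running sbt assembly instead of sbt package should then produce a jar under target/scala-2.11/ (named imageTest-assembly-0.1.jar by default), and that is the jar to pass to spark-submit.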

Scala - Spark-corenlp - java.lang.NoClassDefFoundError

I want to run the spark-coreNLP example, but I get a java.lang.NoClassDefFoundError when running spark-submit.
Here is the scala code, from the github example, which I put into an object, and defined a SparkContext and SQLContext
package main.scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SQLContext
import com.databricks.spark.corenlp.functions._

object SQLContextSingleton {
  @transient private var instance: SQLContext = _
  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

object Sentiment {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Sentiment")
    val sc = new SparkContext(conf)
    val sqlContext = SQLContextSingleton.getInstance(sc)
    import sqlContext.implicits._
    val input = Seq((1, "<xml>Stanford University is located in California. It is a great university.</xml>")).toDF("id", "text")
    val output = input
      .select(cleanxml('text).as('doc))
      .select(explode(ssplit('doc)).as('sen))
      .select('sen, tokenize('sen).as('words), ner('sen).as('nerTags), sentiment('sen).as('sentiment))
    output.show(truncate = false)
  }
}
And my build.sbt (modified from here)
version := "1.0"
scalaVersion := "2.10.6"
scalaSource in Compile := baseDirectory.value / "src"
initialize := {
  val _ = initialize.value
  val required = VersionNumber("1.8")
  val current = VersionNumber(sys.props("java.specification.version"))
  assert(VersionNumber.Strict.isCompatible(current, required), s"Java $required required.")
}
sparkVersion := "1.5.2"
// change the value below to change the directory where your zip artifact will be created
spDistDirectory := target.value
sparkComponents += "mllib"
// add any sparkPackageDependencies using sparkPackageDependencies.
// e.g. sparkPackageDependencies += "databricks/spark-avro:0.1"
spName := "databricks/spark-corenlp"
licenses := Seq("GPL-3.0" -> url(""))
resolvers += Resolver.mavenLocal
libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
  "" % "protobuf-java" % "2.6.1"
)
I run sbt package without issue, then run Spark with
spark-submit --class "main.scala.Sentiment" --master local[4] target/scala-2.10/sentimentanalizer_2.10-1.0.jar
The program fails after throwing an exception:
Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/simple/Sentence
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:75)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:74)
Things I tried:
I work with Eclipse for Scala, and I made sure to add all the jars from stanford-corenlp as suggested here
I suspect that I need to add something to my command line when submitting the job to Spark, any thoughts?
I was on the right track: my command line was missing something.
spark-submit needs all the stanford-corenlp jars added:
spark-submit \
  --jars $(echo stanford-corenlp/*.jar | tr ' ' ',') \
  --class "main.scala.Sentiment" \
  --master local[4] target/scala-2.10/sentimentanalizer_2.10-1.0.jar
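The --jars option expects a comma-separated list, which is why the shell expansion is piped through tr. For anyone who builds the submit command from Scala rather than from a shell, the same list can be produced with a small helper. This is only a sketch: the object JarList and its method name are hypothetical, and the directory name is the one used in the command above.

```scala
import java.io.File

// Hypothetical helper: builds the comma-separated value that --jars expects.
object JarList {
  def commaSeparatedJars(dir: File): String = {
    // listFiles() returns null when the directory does not exist.
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
    files
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getPath)
      .sorted
      .mkString(",")
  }

  def main(args: Array[String]): Unit = {
    // Defaults to the directory from the command above; pass another one as the first argument.
    val dir = new File(args.headOption.getOrElse("stanford-corenlp"))
    println(commaSeparatedJars(dir))
  }
}
```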

ZeroMQ word count app gives error when you compile in spark 1.2.1

I'm trying to set up a ZeroMQ data stream to Spark. Basically, I took the ZeroMQWordCount.scala app and tried to recompile and run it.
I have ZeroMQ 2.1 and Spark 1.2.1 installed.
Here is my Scala code:
package org.apache.spark.examples.streaming

import akka.zeromq._
import akka.zeromq.Subscribe
import akka.util.ByteString
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.zeromq._
import scala.language.implicitConversions
import org.apache.spark.SparkConf

object ZmqBenchmark {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: ZmqBenchmark <zeroMQurl> <topic>")
      System.exit(1)
    }
    val Seq(url, topic) = args.toSeq
    val sparkConf = new SparkConf().setAppName("ZmqBenchmark")
    // Create the context and set the batch size
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    def bytesToStringIterator(x: Seq[ByteString]) = (x.map(_.utf8String)).iterator
    // For this stream, a zeroMQ publisher should be running.
    val lines = ZeroMQUtils.createStream(ssc, url, Subscribe(topic), bytesToStringIterator _)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
And this is my .sbt file for dependencies:
name := "ZmqBenchmark"
version := "1.0"
scalaVersion := "2.10.4"
resolvers += "Typesafe Repository" at ""
resolvers += "Sonatype (releases)" at ""
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.2.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.2.1"
libraryDependencies += "org.apache.spark" % "spark-streaming-zeromq_2.10" % "1.2.1"
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.2.0"
libraryDependencies += "org.zeromq" %% "zeromq-scala-binding" % "0.0.6"
libraryDependencies += "com.typesafe.akka" % "akka-zeromq_2.10.0-RC5" % "2.1.0-RC6"
libraryDependencies += "org.apache.spark" % "spark-examples_2.10" % "1.1.1"
libraryDependencies += "org.spark-project.zeromq" % "zeromq-scala-binding_2.11" % "0.0.7-spark"
The application compiles without any errors using sbt package. However, when I run the application with spark-submit, I get an error:
zaid@zaid-VirtualBox:~/spark-1.2.1$ ./bin/spark-submit --master local[*] ./zeromqsub/example/target/scala-2.10/zmqbenchmark_2.10-1.0.jar tcp:// hello
15/03/06 10:21:11 WARN Utils: Your hostname, zaid-VirtualBox resolves to a loopback address:; using instead (on interface eth0)
15/03/06 10:21:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/zeromq/ZeroMQUtils$
at ZmqBenchmark$.main(ZmqBenchmark.scala:78)
at ZmqBenchmark.main(ZmqBenchmark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.zeromq.ZeroMQUtils$
at Method)
at java.lang.ClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
... 9 more
Any ideas why this happens? I know the app should work, because when I run the same example using the run-example script and point it to the ZeroMQWordCount app from Spark, it runs without the exception. My guess is that the sbt file is incorrect. What else do I need to have in the sbt file?
You are using ZeroMQUtils.createStream but the line
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.zeromq.ZeroMQUtils
shows that the bytecode for ZeroMQUtils could not be found. When the Spark examples are run, they run against a jar file (like spark-1.2.1/examples/target/scala-2.10/spark-examples-1.2.1-hadoop1.0.4.jar) that includes the ZeroMQUtils class. One solution is to use the --jars flag so that spark-submit can find the bytecode. In your case, this could be something like
spark-submit --jars /opt/spark/spark-1.2.1/examples/target/scala-2.10/spark-examples-1.2.1-hadoop1.0.4.jar --master local[*] ./zeromqsub/example/target/scala-2.10/zmqbenchmark_2.10-1.0.jar tcp:// hello
assuming that you have installed spark-1.2.1 in /opt/spark.

Spark Scala Error: Exception in thread "main" java.lang.ClassNotFoundException

I tried to run a Spark job written in Scala on a YARN cluster, and ran into this error:
[!@#$% spark-1.0.0-bin-hadoop2]$ export HADOOP_CONF_DIR="/etc/hadoop/conf"
[!@#$% spark-1.0.0-bin-hadoop2]$ ./bin/spark-submit --class "SimpleAPP" \
> --master yarn-client \
> test_proj/target/scala-2.10/simple-project_2.10-0.1.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.ClassNotFoundException: SimpleAPP
at Method)
at java.lang.ClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:289)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
And this is my sbt file:
[!@#$% test_proj]$ cat simple.sbt
name := "Simple Project"
version := "0.1"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
// We need to be able to write Avro in Parquet
// libraryDependencies += "com.twitter" % "parquet-avro" % "1.3.2"
resolvers += "Akka Repository" at ""
This is my SimpleApp.scala program; it is the canonical one:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/home/myname/spark-1.0.0-bin-hadoop2/" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
The output of sbt package is as follows:
[!@#$% test_proj]$ sbt package
[info] Set current project to Simple Project (in build file:/home/myname/spark-1.0.0-bin-hadoop2/test_proj/)
[info] Compiling 1 Scala source to /home/myname/spark-1.0.0-bin-hadoop2/test_proj/target/scala-2.10/classes...
[info] Packaging /home/myname/spark-1.0.0-bin-hadoop2/test_proj/target/scala-2.10/simple-project_2.10-0.1.jar ...
[info] Done packaging.
[success] Total time: 12 s, completed Mar 3, 2015 10:57:12 PM
As suggested, I did the following:
jar tf simple-project_2.10-0.1.jar | grep .class
Something like the following shows up:
Verify whether the class in the jar is actually named SimpleAPP.
Do this:
jar tf simple-project_2.10-0.1.jar | grep .class
and check that the class name matches exactly, including case.
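The same check can be done programmatically. The sketch below is a hypothetical helper (the object JarClassCheck is not part of Spark or sbt) that looks for the exact entry a class name maps to, using only the JDK's java.util.jar API. The lookup is case-sensitive, just like class loading.

```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Hypothetical helper: roughly what `jar tf ... | grep .class` lets you check by eye.
object JarClassCheck {
  // "analyzer.Sentiment" -> "analyzer/Sentiment.class"
  def entryName(className: String): String = className.replace('.', '/') + ".class"

  // Exact, case-sensitive lookup of the class entry inside the jar.
  def containsClass(jarPath: String, className: String): Boolean = {
    val jar = new JarFile(jarPath)
    try jar.entries().asScala.exists(_.getName == entryName(className))
    finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // Expects two arguments: the jar path and the fully qualified class name.
    val Array(jarPath, className) = args
    println(s"$className in $jarPath: ${containsClass(jarPath, className)}")
  }
}
```

In this question the submit command passes --class "SimpleAPP" while the source defines object SimpleApp, so a lookup like this would be expected to fail for SimpleAPP and succeed for SimpleApp.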