Spark SQL - PostgreSQL JDBC Classpath Issues

I’m having an issue connecting Spark SQL to a PostgreSQL data source. I’ve downloaded the Postgres JDBC jar and included it in an uber jar using sbt-assembly.
My (failing) source code:
https://gist.github.com/geowa4/a9bc238ca7c372b95267.
I’ve also tried using sqlContext.jdbc(), preceded by classOf[org.postgresql.Driver]. It appears my application can access the Driver class just fine.
Any help would be much appreciated. Thanks.
SimpleApp.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val commits = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:postgresql://192.168.59.103:5432/postgres",
      "dbtable" -> "commits",
      "driver" -> "org.postgresql.Driver"))
    commits.select("message").show(1)
  }
}
simple.sbt:
name := "simple-project"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1" % "provided"
libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41"
output (Edited):
Exception in thread "main" java.lang.ClassNotFoundException: org.postgresql.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:102)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at SimpleApp$.main(SimpleApp.scala:17)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
EDIT: I changed the Scala version to 2.10.5 and the output changed to this. I feel like I'm making progress.

There is a general problem with JDBC, where the primordial classloader must know about the jar. In Spark 1.3 this can be addressed using the SPARK_CLASSPATH option as described here:
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#jdbc-to-other-databases
In Spark 1.4, this should be fixed by #5782.
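For reference, once the Postgres jar is visible to the driver (e.g. via SPARK_CLASSPATH or --driver-class-path as in the linked guide), the Spark 1.4 DataFrameReader equivalent of the load call above looks roughly like this. This is a sketch, not tested against this setup; the connection details are copied from the question:
// Spark 1.4+ style of the same JDBC read
val commits = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://192.168.59.103:5432/postgres")
  .option("dbtable", "commits")
  .option("driver", "org.postgresql.Driver")
  .load()
commits.select("message").show(1)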

1) Copy the JDBC jar file to a location accessible to the driver.
2) Add the jar to the classpath at submit time, as follows:
spark-submit --jars /usr/share/java/postgresql-jdbc.jar --class com.examples.WordCount .. .. ..

Related

No such method running forEach in Scala job on Apache Spark

I'm running a very simple Scala job on Apache Spark 2.4.5, and when I try to iterate over the columns in a DataFrame and print their names I get the following stack trace, corresponding to the line where I call foreach.
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
at SimpleApp$.main(SimpleApp.scala:10)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am running Apache Spark in Docker using this image: bde2020/spark-master:2.4.5-hadoop2.7
I am compiling my app using scalaVersion := "2.12.11"
Full application code is:
import org.apache.spark.sql.{Row, SparkSession}

object SimpleApp {
  def main(args: Array[String]) {
    val file = "/spark/jobs/job1/data/test.json"
    val spark = SparkSession.builder.appName("Simple Application Scala").getOrCreate()
    val testData = spark.read.json(file)
    println("prints fine")
    testData.columns.foreach(x => println(x))
    spark.stop()
  }
}
build.sbt file is
name := "spark-scala"
version := "0.1"
scalaVersion := "2.12.11"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"
I am at a loss. I have checked and checked that I am running the correct versions of things, but I suspect I must have missed something!
After much head banging I discovered the image actually uses Scala 2.11.12, which doesn't match my 2.12 build (and Scala 2.11 is deprecated in Spark 2.4.5)! Obvious in hindsight; all working now.
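For reference, a minimal build.sbt sketch that matches the image's Scala version (the 2.11.12 figure comes from the answer above; marking spark-sql as provided is an assumption, on the grounds that the cluster image ships Spark itself):
name := "spark-scala"
version := "0.1"
// match the Scala version bundled with the bde2020/spark-master:2.4.5-hadoop2.7 image
scalaVersion := "2.11.12"
// "provided" assumes the cluster supplies Spark at runtime
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"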
You are not setting spark-core in your dependencies.

SparkSubmit Exception (NoClassDefFoundError), even though SBT compiles and packages successfully

I want to use ShapeLogic Scala combined with Spark. I am using Scala 2.11.8, Spark 2.1.1 and ShapeLogic Scala 0.9.0.
I successfully imported the classes to manage images with Spark. I also successfully compiled and packaged (using SBT) the following application in order to spark-submit it to a cluster.
The following application simply opens an image and writes it to a folder:
// imageTest.scala
import org.apache.spark.sql.SparkSession
import org.shapelogic.sc.io.LoadImage
import org.shapelogic.sc.image.BufferImage
import org.shapelogic.sc.io.BufferedImageConverter

object imageTestObj {
  def main(args: Array[String]) {
    // Create a Scala Spark Session
    val spark = SparkSession.builder().appName("imageTest").master("local").getOrCreate();
    val inPathStr = "/home/vitrion/IdeaProjects/imageTest";
    val outPathStr = "/home/vitrion/IdeaProjects/imageTest/output";

    val empty = new BufferImage[Byte](0, 0, 0, Array());
    var a = Array.fill(3)(empty);
    for (i <- 0 to 3) {
      val imagePath = inPathStr + "IMG_" + "%01d".format(i + 1);
      a(i) = LoadImage.loadBufferImage(inPathStr);
    }
    val sc = spark.sparkContext;
    val imgRDD = sc.parallelize(a);
    imgRDD.map { outBufferImage =>
      val imageOpt = BufferedImageConverter.bufferImage2AwtBufferedImage(outBufferImage)
      imageOpt match {
        case Some(bufferedImage) => {
          LoadImage.saveAWTBufferedImage(bufferedImage, "png", outPathStr)
          println("Saved " + outPathStr)
        }
        case None => {
          println("Could not convert image")
        }
      }
    }
  }
}
This is my SBT file
name := "imageTest"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.1.1" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.1" % "provided",
  "org.shapelogicscala" %% "shapelogic" % "0.9.0" % "provided"
)
However, the following error appears. It seems that when the sbt package command is executed, the ShapeLogic Scala dependencies are not included in the application JAR:
[vitrion@mstr scala-2.11]$ pwd
/home/vitrion/IdeaProjects/imageTest/target/scala-2.11
[vitrion@mstr scala-2.11]$ ls
classes imagetest_2.11-0.1.jar resolution-cache
[vitrion@mstr scala-2.11]$ spark-submit --class imageTestObj imagetest_2.11-0.1.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/shapelogic/sc/image/BufferImage
at imageTestObj.main(imageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.shapelogic.sc.image.BufferImage
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
I hope someone can help me solve it.
Thank you very much.
This error says everything:
Caused by: java.lang.ClassNotFoundException: org.shapelogic.sc.image.BufferImage
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
Add the missing dependency that provides the ShapeLogic class org.shapelogic.sc.image.BufferImage; that should resolve the issue. Both Maven and SBT will give the same error if you miss this dependency!
Since you are working in cluster mode, you can directly add dependencies using --jars on spark-submit; please follow this post for more details.
Your dependencies listed in sbt files will not be included by default in the jar you submit to Spark, so for sbt you have to use a plugin to build an uber/fat jar that includes the shapelogicscala classes. You can use this SO question, How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA?, to see how to manage this with sbt.
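For example, a sketch of an sbt-assembly setup (the plugin version is an assumption; note also that the shapelogic dependency above is marked "provided", which keeps it out of a fat jar, so that flag would need to be dropped):
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt: drop "provided" so the ShapeLogic classes are bundled
libraryDependencies += "org.shapelogicscala" %% "shapelogic" % "0.9.0"
Running sbt assembly then produces a jar under target/scala-2.11/ that contains the ShapeLogic classes and can be passed to spark-submit as before.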

Error while Executing Scala constructs with Spark 1.5.2 and Scala 2.11.7

I have a simple scala object file with the following content:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object X {
  def main(args: Array[String]) {
    val params = Map[String, String]("abc" -> "22")

    println("Creating Spark Configuration");
    val conf = new SparkConf().setAppName("X")
    val sc = new SparkContext(conf)

    val txtFileLines = sc.textFile("/tmp/x.txt", 2).cache()
    val count = txtFileLines.count()
    println("Count" + count)
  }
}
My build.sbt looks like:
name := "x"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
I then do sbt package to create x.jar under target/scala-2.11/
When I execute the above code as:
spark-submit --class X --master local[2] x.jar
I get the following error:
Creating Spark Configuration
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at Sweeper$.main(Sweeper.scala:14)
at Sweeper.main(Sweeper.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
As you are using Scala 2.11 in your project, you should use the Spark core library built for Scala 2.11.
You can download spark-core_2.11 from here: http://mvnrepository.com/search?q=Spark
Then refer to the spark-core_2.11 jar in your project.
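A minimal build.sbt sketch along those lines, pinning the 2.11 artifact explicitly (equivalent to using %% with a 2.11.x scalaVersion). Note this only helps if the spark-submit you launch was itself built for Scala 2.11; prebuilt Spark 1.x downloads are typically built for Scala 2.10, in which case compiling the application with Scala 2.10 is the simpler fix:
name := "x"
version := "1.0"
scalaVersion := "2.11.7"
// explicit _2.11 Spark artifact; "provided" because spark-submit supplies it at runtime
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.5.2" % "provided"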

java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration

I want to create my first Scala program using the Scala example HBaseTest2.scala provided in Spark 1.4.1. The goal is to connect to HBase and do some basic stuff, such as counting rows or scanning rows. However, when I tried to execute the program, I got an error. It seems that Spark couldn't find the class HBaseConfiguration. Assume we're located at the root path of my project HBaseTest2, /usr/local/Cellar/spark/programs/HBaseTest2. Here are some details of the exception:
./src/main/scala/com/orange/spark/examples/HBaseTest2.scala
package com.orange.spark.examples

import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark._

object HBaseTest2 {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseTest2")
    val sc = new SparkContext(sparkConf)

    val tableName = "personal-cloud-test"

    // please ensure HBASE_CONF_DIR is on classpath of spark driver
    // e.g: set it through spark.driver.extraClassPath property
    // in spark-defaults.conf or through --driver-class-path
    // command line option of spark-submit
    val conf = HBaseConfiguration.create()

    // Other options for configuring scan behavior are available. More information available at
    // http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html
    conf.set(TableInputFormat.INPUT_TABLE, tableName)

    // Initialize hBase table if necessary
    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(tableName)) {
      val tableDesc = new HTableDescriptor(TableName.valueOf(tableName))
      admin.createTable(tableDesc)
    }

    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    println("hbaseRDD.count()")
    println(hBaseRDD.count())

    sc.stop()
    admin.close()
  }
}
./build.sbt
I've added dependencies in this file to ensure all classes called are included in the jar file.
name := "HBaseTest2"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-core" % "1.2.1",
  "org.apache.hbase" % "hbase" % "1.0.1.1",
  "org.apache.hbase" % "hbase-client" % "1.0.1.1",
  "org.apache.hbase" % "hbase-common" % "1.0.1.1",
  "org.apache.hbase" % "hbase-server" % "1.0.1.1"
)
Run application
MacBook-Pro-de-Mincong:spark-1.4.1 minconghuang$ bin/spark-submit \
--class "com.orange.spark.examples.HBaseTest2" \
--master local[4] \
../programs/HBaseTest2/target/scala-2.11/hbasetest2_2.11-1.0.jar
Exception
15/08/18 12:06:17 INFO storage.BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at com.orange.spark.examples.HBaseTest2$.main(HBaseTest2.scala:21)
at com.orange.spark.examples.HBaseTest2.main(HBaseTest2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
15/08/18 12:06:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
The problem might come from the HBase configuration, as mentioned in HBaseTest2.scala line 16:
// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit
But I don't know how to configure it... I've added HBASE_CONF_DIR to the CLASSPATH in my command line. The CLASSPATH is now /usr/local/Cellar/hadoop/hbase-1.0.1.1/conf, but nothing happened... T_T So what should I do to get this fixed? I can add/delete details if needed. Thanks a lot!!
Have you tried
sparkConf.set("spark.driver.extraClassPath", "/usr/local/Cellar/hadoop/hbase-1.0.1.1/conf")
The problem came from the classpath setting, as mentioned in HBaseTest2.scala line 33:
// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit
As I'm using Mac OS X, the setting is different from Linux. When I tried echo $CLASSPATH, it returned empty. It seems that Mac doesn't use the CLASSPATH for the driver job. So I needed to add all jar files through spark.driver.extraClassPath in the spark-defaults.conf file. My colleague did it the same way on Linux. I think there's a more elegant way to handle it, but we didn't find one. Please share if you know the answer. Thanks.
Mac / Linux
add all external jars in conf/spark-defaults.conf
spark.driver.extraClassPath /path/to/a.jar:/path/to/b.jar:/path/to/c.jar
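If it is unclear whether the setting took effect, a quick diagnostic (a sketch, not from the original post) is to print the driver's classpath at the top of main and check that the HBase conf directory and jars actually appear:
// print every classpath entry the driver JVM actually sees
System.getProperty("java.class.path")
  .split(java.io.File.pathSeparator)
  .foreach(println)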

ZeroMQ word count app gives error when you compile in spark 1.2.1

I'm trying to set up a ZeroMQ data stream to Spark. Basically I took the ZeroMQWordCount.scala app and tried to recompile it and run it.
I have ZeroMQ 2.1 installed, and Spark 1.2.1.
Here is my Scala code:
package org.apache.spark.examples.streaming

import akka.actor.ActorSystem
import akka.actor.actorRef2Scala
import akka.zeromq._
import akka.zeromq.Subscribe
import akka.util.ByteString
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.zeromq._
import scala.language.implicitConversions
import org.apache.spark.SparkConf

object ZmqBenchmark {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: ZmqBenchmark <zeroMQurl> <topic>")
      System.exit(1)
    }
    //StreamingExamples.setStreamingLogLevels()
    val Seq(url, topic) = args.toSeq
    val sparkConf = new SparkConf().setAppName("ZmqBenchmark")

    // Create the context and set the batch size
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    def bytesToStringIterator(x: Seq[ByteString]) = (x.map(_.utf8String)).iterator

    // For this stream, a zeroMQ publisher should be running.
    val lines = ZeroMQUtils.createStream(ssc, url, Subscribe(topic), bytesToStringIterator _)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
and this is my .sbt file for dependencies:
name := "ZmqBenchmark"
version := "1.0"
scalaVersion := "2.10.4"
resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
resolvers += "Sonatype (releases)" at "https://oss.sonatype.org/content/repositories/releases/"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.2.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.2.1"
libraryDependencies += "org.apache.spark" % "spark-streaming-zeromq_2.10" % "1.2.1"
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.2.0"
libraryDependencies += "org.zeromq" %% "zeromq-scala-binding" % "0.0.6"
libraryDependencies += "com.typesafe.akka" % "akka-zeromq_2.10.0-RC5" % "2.1.0-RC6"
libraryDependencies += "org.apache.spark" % "spark-examples_2.10" % "1.1.1"
libraryDependencies += "org.spark-project.zeromq" % "zeromq-scala-binding_2.11" % "0.0.7-spark"
The application compiles without any errors using sbt package; however, when I run the application with spark-submit, I get an error:
zaid#zaid-VirtualBox:~/spark-1.2.1$ ./bin/spark-submit --master local[*] ./zeromqsub/example/target/scala-2.10/zmqbenchmark_2.10-1.0.jar tcp://127.0.0.1:5553 hello
15/03/06 10:21:11 WARN Utils: Your hostname, zaid-VirtualBox resolves to a loopback address: 127.0.1.1; using 192.168.220.175 instead (on interface eth0)
15/03/06 10:21:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/zeromq/ZeroMQUtils$
at ZmqBenchmark$.main(ZmqBenchmark.scala:78)
at ZmqBenchmark.main(ZmqBenchmark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.zeromq.ZeroMQUtils$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
Any ideas why this happens? I know the app should work, because when I run the same example using the run-example script and point to the ZeroMQWordCount app from Spark, it runs without the exception. My guess is the sbt file is incorrect; what else do I need to have in the sbt file?
Thanks
You are using ZeroMQUtils.createStream but the line
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.zeromq.ZeroMQUtils
shows that the bytecode for ZeroMQUtils was not located. When the Spark examples are run, they are run against a jar file (like spark-1.2.1/examples/target/scala-2.10/spark-examples-1.2.1-hadoop1.0.4.jar) that includes the ZeroMQUtils class. A solution would be to use the --jars flag so the spark-submit command can find the bytecode. In your case, this could be something like
spark-submit --jars /opt/spark/spark-1.2.1/examples/target/scala-2.10/spark-examples-1.2.1-hadoop1.0.4.jar --master local[*] ./zeromqsub/example/target/scala-2.10/zmqbenchmark_2.10-1.0.jar tcp://127.0.0.1:5553 hello
assuming that you have installed spark-1.2.1 in /opt/spark.
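An alternative sketch, if you would rather not ship the examples jar: bundle the ZeroMQ connector into your own fat jar (e.g. with sbt-assembly), marking only the core Spark artifacts as provided. The versions follow the question's sbt file; the exact dependency set is an assumption:
// build.sbt sketch for a fat jar that bundles the ZeroMQ receiver
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"             % "1.2.1" % "provided",
  "org.apache.spark" %% "spark-streaming"        % "1.2.1" % "provided",
  "org.apache.spark" %% "spark-streaming-zeromq" % "1.2.1"  // bundled into the fat jar
)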