spark ClassNotFoundException error for performing kmeans - scala

I am trying to submit a spark job for using spark KMeans. I am packaging the scala file correctly, but when I want to submit the job I always have the ClassNotFoundException.
Here is my sbt fille:
name:="sparkKmeans"
libraryDependencies+="org.apache.spark" %% "spark-core" % "1.1.1"
and here is my scala class:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors
object sparkKmeans {
def main(args: Array[String]) {
// create Spark context with Spark configuration
val sc = new SparkContext(new SparkConf().setAppName("SparkKmeans"))
//val threshold = args(1).toInt
// Load and parse the data. source is the first argument.
val data = sc.textFile(args(0))
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
// Cluster the data into classes using KMeans. number of itteration is fixed as 100
// and number of clusters is get from the input -second argument
val numClusters = args(1)
val numIterations = 100
val clusters = KMeans.train(parsedData, numClusters, numIterations)
// Evaluate clustering by computing Within Set Sum of Squared Errors
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)
// Save and load model based on thirs argument.
//clusters.save(sc, args(2))
// val sameModel = KMeansModel.load(sc, args(2))
}
}
I have commented last two lines because I saw some places said that spark has problem with serializer. But still has the problem.
and here is the error:
java.lang.ClassNotFoundException: sparkKmeans
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
and Submitting the job using:
./bin/spark-shell --class sparkKmeans ......
If anybody could help me I will be appreciated.

Thanks for the comments.
I have done what you said:
Built.sbt file:
name:="sparkKmeans"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.1",
"org.apache.spark" % "spark-mllib_2.10" % "1.6.1"
)
(I used scala 2.11.8 and Spark 1.6.1 version but still the same error.
Also about the other question:
I am packaging my application using:
sbt
compile
package
and to execute use:
./bin/spark-submit --class sparkKmeans k/kmeans/target/scala-2.10/sparkkmeans_2.10-0.1-SNAPSHOT.jar '/home/meysam/spark-1.6.1/kmeans/pima.csv' 3

Related

Exception in thread "main" java.lang.NoSuchMethodError in ubuntu only

I have the code below:
import java.io.File
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
object RDFBenchVerticalPartionedTables {
def main(args: Array[String]): Unit = {
println("Start of programm .... ")
val conf = new SparkConf().setMaster("local").setAppName("SQLSPARK")
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
val sc = new SparkContext(conf)
sc.setLogLevel("ERROR")
println("Conf and SC declared... ")
val spark = SparkSession
.builder()
.master("local[*]")
.appName("SparkConversionSingleTable")
.getOrCreate()
println("SparkSession declared... ")
println("Before Agrs..... ")
val filePathCSV=args(0)
val filePathAVRO=args(1)
val filePathORC=args(2)
val filePathParquet=args(3)
println("After Agrs..... ")
val csvFiles = new File(filePathCSV).list()
println("After List of Files Agrs..... " + csvFiles.length )
println("Before the foreach ... ")
csvFiles.foreach{verticalTableName=>
println("inside the foreach ... ")
val verticalTableName2=verticalTableName.dropRight(4)
val RDFVerticalTableDF = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(filePathCSV+"/"+verticalTableName).toDF()
RDFVerticalTableDF.write.format("com.databricks.spark.avro").save(filePathAVRO+"/"+verticalTableName2+".avro")
RDFVerticalTableDF.write.parquet(filePathParquet+"/"+verticalTableName2+".parquet")
RDFVerticalTableDF.write.orc(filePathORC+"/"+verticalTableName2+".orc")
println("Vertical Table: '" +verticalTableName2+"' Has been Successfully Converted to AVRO, PARQUET and ORC !")
}
}
}
this class transforms list of csv files in adirectory that is given in a arguments (0) and save different formats (avro,orc and parquet) in three directories given also as args(1) args(2) and args(3).
I tried to submit this job using the spark-submit on windows it works, but while running the same job in ubuntu it fails with this error:
ubuntu#ragab:~$ spark-submit --class RDFBenchVerticalPartionedTables --master local[*] /home/ubuntu/testjar/rdfschemaconversion_2.11-0.1.jar "/data/RDFBench4/VerticalPartionnedTables/VerticalPartitionedTables100" "/data/RDFBench3/ConvertedData/SP2Bench100/AVRO/VerticalTables" "/data/RDFBench3/ConvertedData/SP2Bench100/ORC/VerticalTables" "/data/RDFBench3/ConvertedData/SP2Bench100/Parquet"
19/05/04 18:10:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Start of programm ....
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Conf and SC declared...
SparkSession declared...
Before Agrs.....
After Agrs.....
After List of Files Agrs..... 25
Before the foreach ...
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at RDFBenchVerticalPartionedTables$.main(RDFBenchVerticalPartionedTables.scala:45)
at RDFBenchVerticalPartionedTables.main(RDFBenchVerticalPartionedTables.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
this is my sbt file:
name := "RDFSchemaConversion"
version := "0.1"
scalaVersion := "2.11.12"
mainClass in (Compile, run) := Some("RDFBenchVerticalPartionedTables")
mainClass in (Compile, packageBin) := Some("RDFBenchVerticalPartionedTables")
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
libraryDependencies += "com.databricks" %% "spark-avro" % "4.0.0"
Your Spark distribution on Ubuntu seems to have been compiled with Scala 2.12. It is incompatible with your jar file which is compiled with Scala 2.11.

SparkSubmit Exception (NoClassDefFoundError), even though SBT compiles and packages sucessfully

I want to use ShapeLogic Scala combined with Spark. I am using Scala 2.11.8, Spark 2.1.1 and ShapeLogic Scala 0.9.0.
I sucessfully imported the classes to manage images with Spark. Also, I sucessfully compiled and packed (by using SBT) the following application in order to spark-submitting it to a cluster.
The following application simply opens an image and write it to a folder:
// imageTest.scala
import org.apache.spark.sql.SparkSession
import org.shapelogic.sc.io.LoadImage
import org.shapelogic.sc.image.BufferImage
import org.shapelogic.sc.io.BufferedImageConverter
object imageTestObj {
def main(args: Array[String]) {
// Create a Scala Spark Session
val spark = SparkSession.builder().appName("imageTest").master("local").getOrCreate();
val inPathStr = "/home/vitrion/IdeaProjects/imageTest";
val outPathStr = "/home/vitrion/IdeaProjects/imageTest/output";
val empty = new BufferImage[Byte](0, 0, 0, Array());
var a = Array.fill(3)(empty);
for (i <- 0 to 3) {
val imagePath = inPathStr + "IMG_" + "%01d".format(i + 1);
a(i) = LoadImage.loadBufferImage(inPathStr);
}
val sc = spark.sparkContext;
val imgRDD = sc.parallelize(a);
imgRDD.map { outBufferImage =>
val imageOpt = BufferedImageConverter.bufferImage2AwtBufferedImage(outBufferImage)
imageOpt match {
case Some(bufferedImage) => {
LoadImage.saveAWTBufferedImage(bufferedImage, "png", outPathStr)
println("Saved " + outPathStr)
}
case None => {
println("Could not convert image")
}
}
}
}
}
This is my SBT file
name := "imageTest"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.1.1" % "provided",
"org.apache.spark" % "spark-sql_2.11" % "2.1.1" % "provided",
"org.shapelogicscala" %% "shapelogic" % "0.9.0" % "provided"
)
However, the following error appears. When the package SBT command is executed, it seems like the ShapeLogic Scala dependencies are not included in the application JAR:
[vitrion#mstr scala-2.11]$ pwd
/home/vitrion/IdeaProjects/imageTest/target/scala-2.11
[vitrion#mstr scala-2.11]$ ls
classes imagetest_2.11-0.1.jar resolution-cache
[vitrion#mstr scala-2.11]$ spark-submit --class imageTestObj imagetest_2.11-0.1.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/shapelogic/sc/image/BufferImage
at imageTestObj.main(imageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.shapelogic.sc.image.BufferImage
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
I hope someone can help me to solve it?
Thank you very much
This error says everything:
Caused by: java.lang.ClassNotFoundException: org.shapelogic.sc.image.BufferImage
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
Add this missing dependency (ShapeLogic class) ------> org.shapelogic.sc.image.BufferImage, that should resolve the issue. Maven or SBT both should give the same error, if you miss this dependency!!
Since you are working on the cluster mode, you can directly add dependencies using --jars on spark-submit, please follow this post for more details.
These threads might help you:
Link1
Link2
Your dependencies listed in sbt files will not be included by default in your jar submitted to spark, so for sbt you have to use a plugin to build an uber/fat jar that would include shapelogicscala classes. You can use this link on SO, How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA? to see how you can manage this with sbt.

Error while Executing Scala constructs with Spark 1.5.2 and Scala 2.11.7

I have a simple scala object file with the following content:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object X {
def main(args: Array[String]) {
val params = Map[String, String](
"abc" -> "22",)
println("Creating Spark Configuration");
val conf = new SparkConf().setAppName("X")
val sc = new SparkContext(conf)
val txtFileLines = sc.textFile("/tmp/x.txt", 2).cache()
val count = txtFileLines.count()
println("Count" + count)
}
}
My build.sbt looks like:
name := "x"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
I then do sbt package to create x.jar under target/scala-2.11/
When I execute the above code as:
spark-submit --class X --master local[2] x.jar
I get the following error:
Creating Spark Configuration
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at Sweeper$.main(Sweeper.scala:14)
at Sweeper.main(Sweeper.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
As you are using Scala 2.11 in your project. You should use spark core library build for Scala 2.11.
Can download spark-core_2.11 from here http://mvnrepository.com/search?q=Spark
Refer spark-core_2.11 jar in project.

java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration

I want to create my first scala program using the scala example HBaseTest2.scala, provided in Sparkd 1.4.1. The goal is to connect to HBase and do some basic stuff, such as counting rows or scan rows. However, when I tried to execute the program, I got an error. It seems that Spark couldn't find the class HBaseConfiguration. Assuming the we're located a the root path of my project HBaseTest2 /usr/local/Cellar/spark/programs/HBaseTest2. Here are some details for the exception :
./src/main/scala/com/orange/spark/examples/HBaseTest2.scala
package com.orange.spark.examples
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark._
object HBaseTest2 {
def main(args: Array[String]) {
val sparkConf = new SparkConf().setAppName("HBaseTest2")
val sc = new SparkContext(sparkConf)
val tableName = "personal-cloud-test"
// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit
val conf = HBaseConfiguration.create()
// Other options for configuring scan behavior are available. More information available at
// http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html
conf.set(TableInputFormat.INPUT_TABLE, tableName)
// Initialize hBase table if necessary
val admin = new HBaseAdmin(conf)
if (!admin.isTableAvailable(tableName)) {
val tableDesc = new HTableDescriptor(TableName.valueOf(tableName))
admin.createTable(tableDesc)
}
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
classOf[org.apache.hadoop.hbase.client.Result])
println("hbaseRDD.count()")
println(hBaseRDD.count())
sc.stop()
admin.close()
}
}
./build.sbt
I've added dependencies in this file to ensure all classes called are included in the jar file.
name := "HBaseTest2"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-core" % "1.2.1",
"org.apache.hbase" % "hbase" % "1.0.1.1",
"org.apache.hbase" % "hbase-client" % "1.0.1.1",
"org.apache.hbase" % "hbase-common" % "1.0.1.1",
"org.apache.hbase" % "hbase-server" % "1.0.1.1"
)
Run application
MacBook-Pro-de-Mincong:spark-1.4.1 minconghuang$ bin/spark-submit \
--class "com.orange.spark.examples.HBaseTest2" \
--master local[4] \
../programs/HBaseTest2/target/scala-2.11/hbasetest2_2.11-1.0.jar
Exception
15/08/18 12:06:17 INFO storage.BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at com.orange.spark.examples.HBaseTest2$.main(HBaseTest2.scala:21)
at com.orange.spark.examples.HBaseTest2.main(HBaseTest2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
15/08/18 12:06:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
The problem might come from the HBase configuration as mentioned in HBaseTest2.scala line 16 :
// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit
But I don't know how to configure it... I've added the HBASE_CONF_DIR to CLASSPATH in my command line. The CLASSPATH is now /usr/local/Cellar/hadoop/hbase-1.0.1.1/conf. Nothing happened... T_T So what should I do to get this fixed ? I can add/delete details if needed. Thanks a lot !!
Have you tried
sparkConf.set("spark.driver.extraClassPath", "/usr/local/Cellar/hadoop/hbase-1.0.1.1/conf")
The problem came from class-path-setting as mentioned in HBaseTest2.scala line 33 :
// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit
As I'm using a MAC OS X, setting is different from Linux. When I tried echo $CLASSPATH, it returned empty. It seems that Mac doesn't use the CLASSPATH to do the driver job. So I need to add all jar files through spark.driver.extraClassPath in spark-defaults.conf file. My collegue did the same way in Linux. I think there's a better way to handle it elegantly, but we didn't find out. Please share if you know the answer. Thanks.
Mac / Linux
add all external jars in conf/spark-defaults.conf
spark.driver.extraClassPath /path/to/a.jar:/path/to/b.jar:/path/to/c.jar

Spark SQL - PostgreSQL JDBC Classpath Issues

I’m having an issue connecting Spark SQL to a PostgreSQL data source. I’ve downloaded the Postgres JDBC jar and included it in an uber jar using sbt-assembly.
My (failing) source code:
https://gist.github.com/geowa4/a9bc238ca7c372b95267.
I’ve also tried using sqlContext.jdbc() preceded with classOf[org.postgresql.Driver] as well. It appears the driver can access the Driver just fine.
Any help would be much appreciated. Thanks.
SimpleApp.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
object SimpleApp {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val commits = sqlContext.load("jdbc", Map(
"url" -> "jdbc:postgresql://192.168.59.103:5432/postgres",
"dbtable" -> "commits",
"driver" -> "org.postgresql.Driver"))
commits.select("message").show(1)
}
}
simple.sbt:
name := "simple-project"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1" % "provided"
libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41"
output (Edited):
Exception in thread "main" java.lang.ClassNotFoundException: org.postgresql.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:102)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at SimpleApp$.main(SimpleApp.scala:17)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
EDIT: I changed the Scala version to 2.10.5 and the output changed to this. I feel like I'm making progress.
There is a problem with general problem with JDBC, where the primordial classloader must know about the jar. In Spark 1.3 this can be addressed using the SPARK_CLASSPATH option as described here:
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#jdbc-to-other-databases
In Spark 1.4, this should be fixed by #5782.
1) Copy file into your jar location
2) Add jar in path as follows
spark-submit --jars /usr/share/java/postgresql-jdbc.jar --class com.examples.WordCount .. .. ..