No such method running forEach in Scala job on Apache Spark - scala

I'm running a very simple Scala job on Apache Spark 2.4.5. When I try to iterate over the columns in a DataFrame and print their names, I get the following stack trace, pointing at the line where I call foreach.
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
at SimpleApp$.main(SimpleApp.scala:10)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am running Apache Spark in Docker using this image: bde2020/spark-master:2.4.5-hadoop2.7
I am compiling my app using scalaVersion := "2.12.11"
Full application code is:
import org.apache.spark.sql.{Row, SparkSession}

object SimpleApp {
  def main(args: Array[String]) {
    val file = "/spark/jobs/job1/data/test.json"
    val spark = SparkSession.builder.appName("Simple Application Scala").getOrCreate()
    val testData = spark.read.json(file)
    println("prints fine")
    testData.columns.foreach(x => println(x))
    spark.stop()
  }
}
build.sbt file is
name := "spark-scala"
version := "0.1"
scalaVersion := "2.12.11"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"
I am at a loss. I have checked and rechecked that I am running the correct versions of everything, but I suspect I must have missed something!

After much head banging I discovered that the image actually uses Scala 2.11.12 (the default, though deprecated, Scala version for Spark 2.4.5), while I was compiling against 2.12, so the binaries were incompatible. Obvious in hindsight; all working now.
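For anyone hitting the same thing, the fix is simply to compile against the same Scala binary version the cluster runs. A minimal build.sbt sketch, assuming the Scala 2.11 shipped with this image (the "provided" scope is optional but keeps Spark's own classes out of the application jar):

name := "spark-scala"

version := "0.1"

// must match the Scala version the cluster's Spark was built with (2.11 for this image)
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"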

You are not declaring spark-core in your dependencies.
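For what it's worth, spark-sql already pulls spark-core in transitively, but if you prefer to declare it explicitly, something like this should work:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.5",
  "org.apache.spark" %% "spark-sql"  % "2.4.5"
)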

Related

Scallop Verify() method with scala version `2.10.5` not working

I am using scalaVersion := "2.10.5" and libraryDependencies += "org.rogach" %% "scallop" % "3.1.2".
Getting the following error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
at org.rogach.scallop.DefaultConverters$$anon$2.parse(DefaultConverters.scala:27)
at org.rogach.scallop.ValueConverter$class.parseCached(ValueConverter.scala:21)
at org.rogach.scallop.DefaultConverters$$anon$2.parseCached(DefaultConverters.scala:24)
at org.rogach.scallop.Scallop$$anonfun$verify$17.apply(Scallop.scala:632)
at org.rogach.scallop.Scallop$$anonfun$verify$17.apply(Scallop.scala:630)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.rogach.scallop.Scallop.verify(Scallop.scala:630)
at org.rogach.scallop.ScallopConfBase.verifyBuilder(ScallopConfBase.scala:405)
at org.rogach.scallop.ScallopConfBase.verify(ScallopConfBase.scala:744)
at com.unity3d.ads.conf.OperativeEventConverterConf.<init>(OperativeEventConverterConf.scala:50)
at com.unity3d.ads.analytics.TestClass$.main(TestClass.scala:51)
at com.unity3d.ads.analytics.TestClass.main(TestClass.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The exact same code is working fine with scalaVersion := "2.11.8"
Unfortunately, I have to use 2.10.5 because I am using spark version 1.6.
sample code:
import org.rogach.scallop.{ScallopConf, ScallopOption, Serialization, ValueConverter, singleArgConverter}

class TestClass(args: Seq[String]) extends ScallopConf(args) with Serialization {
  val testInput: ScallopOption[String] =
    opt[String](
      name = "test.input",
      descr = "test",
      required = false,
      default = Option("testPath"))
  verify()
}
Is there any workaround I can use here to make it work with scala 2.10.5?
Answering my own question in case someone else faces a similar issue.
This turned out to be a classpath issue. The root cause:
I was using Spark 2.1 to run code compiled against Spark 1.6. Spark 1.6 uses Scala 2.10, while Spark 2.1 uses Scala 2.11, so the compiled code and the runtime Scala library did not match.
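In other words, the compile-time Scala version has to match the Scala version of the Spark installation that actually runs the job. A build.sbt sketch for a Spark 2.1 / Scala 2.11 cluster (versions are illustrative, not necessarily the exact ones in use here):

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // "provided" because the cluster supplies Spark at runtime
  "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
  "org.rogach"       %% "scallop"    % "3.1.2"
)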

SparkSubmit Exception (NoClassDefFoundError), even though SBT compiles and packages successfully

I want to use ShapeLogic Scala combined with Spark. I am using Scala 2.11.8, Spark 2.1.1 and ShapeLogic Scala 0.9.0.
I successfully imported the classes to manage images with Spark. I also successfully compiled and packaged (using SBT) the following application in order to spark-submit it to a cluster.
The application simply opens an image and writes it to a folder:
// imageTest.scala
import org.apache.spark.sql.SparkSession
import org.shapelogic.sc.io.LoadImage
import org.shapelogic.sc.image.BufferImage
import org.shapelogic.sc.io.BufferedImageConverter

object imageTestObj {
  def main(args: Array[String]) {
    // Create a Spark session
    val spark = SparkSession.builder().appName("imageTest").master("local").getOrCreate()
    val inPathStr = "/home/vitrion/IdeaProjects/imageTest"
    val outPathStr = "/home/vitrion/IdeaProjects/imageTest/output"
    val empty = new BufferImage[Byte](0, 0, 0, Array())
    var a = Array.fill(3)(empty)
    // load three images (0 until 3 keeps the index inside the array bounds)
    for (i <- 0 until 3) {
      val imagePath = inPathStr + "/IMG_" + "%01d".format(i + 1)
      a(i) = LoadImage.loadBufferImage(imagePath)
    }
    val sc = spark.sparkContext
    val imgRDD = sc.parallelize(a)
    imgRDD.map { outBufferImage =>
      val imageOpt = BufferedImageConverter.bufferImage2AwtBufferedImage(outBufferImage)
      imageOpt match {
        case Some(bufferedImage) =>
          LoadImage.saveAWTBufferedImage(bufferedImage, "png", outPathStr)
          println("Saved " + outPathStr)
        case None =>
          println("Could not convert image")
      }
    }
  }
}
This is my SBT file
name := "imageTest"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.1.1" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.1" % "provided",
  "org.shapelogicscala" %% "shapelogic" % "0.9.0" % "provided"
)
However, the following error appears. When the package SBT command is executed, it seems like the ShapeLogic Scala dependencies are not included in the application JAR:
[vitrion@mstr scala-2.11]$ pwd
/home/vitrion/IdeaProjects/imageTest/target/scala-2.11
[vitrion@mstr scala-2.11]$ ls
classes  imagetest_2.11-0.1.jar  resolution-cache
[vitrion@mstr scala-2.11]$ spark-submit --class imageTestObj imagetest_2.11-0.1.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/shapelogic/sc/image/BufferImage
at imageTestObj.main(imageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.shapelogic.sc.image.BufferImage
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
I hope someone can help me solve it.
Thank you very much
This error says everything:
Caused by: java.lang.ClassNotFoundException: org.shapelogic.sc.image.BufferImage
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
The missing class is org.shapelogic.sc.image.BufferImage, i.e. the ShapeLogic dependency is not on the runtime classpath; making it available should resolve the issue. Maven or SBT would give the same error if this dependency is missing at runtime.
Since you are running in cluster mode, you can add the dependency directly with --jars on spark-submit; please follow this post for more details.
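For example (the jar path is illustrative; point --jars at wherever the ShapeLogic jar actually lives on the submitting machine):

spark-submit --class imageTestObj \
  --jars /path/to/shapelogic_2.11-0.9.0.jar \
  imagetest_2.11-0.1.jar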
Dependencies listed in your sbt file are not included by default in the jar you submit to Spark, and here the ShapeLogic dependency is additionally marked "provided", which tells the build it will be supplied at runtime. For sbt you have to use a plugin to build an uber/fat jar that bundles the shapelogicscala classes (and drop "provided" from that dependency). You can use this link on SO, How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA?, to see how to manage this with sbt.
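A minimal sbt-assembly sketch (plugin version illustrative; pick one compatible with your sbt):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: keep Spark "provided", but let ShapeLogic be bundled into the fat jar
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.1.1" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.1" % "provided",
  "org.shapelogicscala" %% "shapelogic" % "0.9.0"
)

Running sbt assembly then produces a fat jar under target/scala-2.11/ that can be passed to spark-submit instead of the plain package jar.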

Could not copy native binaries to temp directory Kinesis KPL

I am trying to build a simple application that will produce messages to Kinesis using the KPL. I am writing this in Scala and am receiving an error message that I can't seem to figure out. My code is as follows:
import java.nio.ByteBuffer
import com.amazonaws.services.kinesis.producer.{KinesisProducer, KinesisProducerConfiguration}

object KinesisStream extends App {
  ProduceToKinesis()

  def ProduceToKinesis(): Unit = {
    val config = new KinesisProducerConfiguration()
    val kinesis = new KinesisProducer(config)
    val data = ByteBuffer.wrap("myData".getBytes("UTF-8"))
    kinesis.addUserRecord("TestStream", "myPartitionKey", data)
  }
}
it fails at
val kinesis = new KinesisProducer(config)
with an error message of:
Exception in thread "main" java.lang.RuntimeException: Could not copy native binaries to temp directory C:\Users\************\AppData\Local\Temp\amazon-kinesis-producer-native-binaries
at com.amazonaws.services.kinesis.producer.KinesisProducer.extractBinaries(KinesisProducer.java:844)
at com.amazonaws.services.kinesis.producer.KinesisProducer.<init>(KinesisProducer.java:242)
at KinesisStream$.ProduceToKinesis(KinesisStream.scala:14)
at KinesisStream$.delayedEndpoint$KinesisStream$1(KinesisStream.scala:9)
at KinesisStream$delayedInit$body.apply(KinesisStream.scala:8)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at KinesisStream$.main(KinesisStream.scala:8)
at KinesisStream.main(KinesisStream.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.NullPointerException
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:462)
at com.amazonaws.services.kinesis.producer.KinesisProducer.extractBinaries(KinesisProducer.java:803)
... 18 more
My build.sbt looks like this:
name := "Kinesis"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "com.amazonaws" % "amazon-kinesis-producer" % "0.12.1"
I know it's been a long time since this issue was posted, but I recently ran into the same issue and couldn't find much on the web. Fortunately I've found the solution, which could be helpful to someone else searching for the fix.
It's a version issue, as mentioned here: https://github.com/awslabs/amazon-kinesis-producer/issues/113#issuecomment-345028662. It worked for me when I downgraded the KPL version to 0.13.1.
Here's the pom dependency that worked for me:
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>amazon-kinesis-producer</artifactId>
  <version>0.13.1</version>
</dependency>
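Since the question uses sbt rather than Maven, the equivalent dependency line there would be:

libraryDependencies += "com.amazonaws" % "amazon-kinesis-producer" % "0.13.1"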

Error while Executing Scala constructs with Spark 1.5.2 and Scala 2.11.7

I have a simple Scala object file with the following content:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object X {
  def main(args: Array[String]) {
    val params = Map[String, String]("abc" -> "22")
    println("Creating Spark Configuration")
    val conf = new SparkConf().setAppName("X")
    val sc = new SparkContext(conf)
    val txtFileLines = sc.textFile("/tmp/x.txt", 2).cache()
    val count = txtFileLines.count()
    println("Count" + count)
  }
}
My build.sbt looks like:
name := "x"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
I then do sbt package to create x.jar under target/scala-2.11/
When I execute the above code as:
spark-submit --class X --master local[2] x.jar
I get the following error:
Creating Spark Configuration
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at Sweeper$.main(Sweeper.scala:14)
at Sweeper.main(Sweeper.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Since you are using Scala 2.11 in your project, you should use the Spark core library built for Scala 2.11.
You can download spark-core_2.11 from http://mvnrepository.com/search?q=Spark and reference the spark-core_2.11 jar in your project.
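Note that sbt's %% operator already appends the Scala binary version, so the dependency line in the question resolves to spark-core_2.11; written out explicitly it is equivalent to:

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.5.2" % "provided"

What is also worth checking is whether the Spark installation that spark-submit uses was itself built against Scala 2.11; the pre-built Spark 1.5.x downloads default to Scala 2.10, and running 2.11-compiled code on them produces exactly this kind of NoSuchMethodError on scala.Predef.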

Spark SQL - PostgreSQL JDBC Classpath Issues

I’m having an issue connecting Spark SQL to a PostgreSQL data source. I’ve downloaded the Postgres JDBC jar and included it in an uber jar using sbt-assembly.
My (failing) source code:
https://gist.github.com/geowa4/a9bc238ca7c372b95267.
I've also tried using sqlContext.jdbc() preceded by classOf[org.postgresql.Driver]. It appears the application itself can access the Driver class just fine.
Any help would be much appreciated. Thanks.
SimpleApp.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val commits = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:postgresql://192.168.59.103:5432/postgres",
      "dbtable" -> "commits",
      "driver" -> "org.postgresql.Driver"))
    commits.select("message").show(1)
  }
}
simple.sbt:
name := "simple-project"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1" % "provided"
libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41"
output (Edited):
Exception in thread "main" java.lang.ClassNotFoundException: org.postgresql.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:102)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at SimpleApp$.main(SimpleApp.scala:17)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
EDIT: I changed the Scala version to 2.10.5 and the output changed to this. I feel like I'm making progress.
There is a general problem with JDBC drivers, where the primordial classloader must know about the jar. In Spark 1.3 this can be addressed using the SPARK_CLASSPATH option as described here:
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#jdbc-to-other-databases
In Spark 1.4, this should be fixed by #5782.
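For example, under Spark 1.3 something like this should put the driver jar on the primordial classpath (the jar path and application jar name are placeholders for your own build output):

SPARK_CLASSPATH=/path/to/postgresql-9.4-1201-jdbc41.jar \
  spark-submit --class SimpleApp your-assembly.jar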
1) Copy the JDBC driver jar to a known location on the machine you submit from.
2) Add the jar to the submit command as follows:
spark-submit --jars /usr/share/java/postgresql-jdbc.jar --class com.examples.WordCount .. .. ..