Spark MLlib error: java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.treeAggregate - scala

I am trying to run the Linear Regression example as a Spark job, and I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.treeAggregate$default$4(Ljava/lang/Object;)I
at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1.apply$mcVI$sp(GradientDescent.scala:189)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.mllib.optimization.GradientDescent$.runMiniBatchSGD(GradientDescent.scala:184)
at org.apache.spark.mllib.optimization.GradientDescent.optimize(GradientDescent.scala:107)
at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:267)
at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:190)
at com.myproject.sample.LinearRegression$.run(LinearRegressionExample.scala:105)
at com.myproject.sample.LinearRegression$$anonfun$main$1.apply(LinearRegressionExample.scala:67)
at com.myproject.sample.LinearRegression$$anonfun$main$1.apply(LinearRegressionExample.scala:66)
at scala.Option.map(Option.scala:145)
at com.myproject.sample.LinearRegression$.main(LinearRegressionExample.scala:66)
at com.myproject.sample.LinearRegression.main(LinearRegressionExample.scala)
Here are my dependencies in my pom:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.10</artifactId>
  <version>1.3.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0</version>
  <exclusions>
    :
    :
</dependency>
Why did I get java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.treeAggregate? Did I use the wrong dependency? Thanks!
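RDD.treeAggregate was only added to spark-core's RDD around the 1.3 release, so spark-mllib 1.3.1 (compiled against it) cannot find it in the spark-core 1.2.0 you are running with; mixed Spark versions on one classpath are the classic trigger for this NoSuchMethodError. A minimal sketch of the fix, assuming you want to stay on 1.3.1: pin both artifacts to a single version via a shared property.
<properties>
  <!-- One version for every Spark artifact; 1.3.1 is an assumption here,
       any single release works as long as core and mllib match. -->
  <spark.version>1.3.1</spark.version>
</properties>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>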

Related

java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder - Scala Spark

I am trying to execute a WordCount example with Scala and Spark, but I am having the following problem:
java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder
at org.apache.hadoop.fs.FsTracer.get(FsTracer.java:42)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3232)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3286)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:478)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
at es.santander.prueba.pruebas.WordCount.count(WordCount.scala:123)
at es.santander.prueba.pruebas.WordCount.main(WordCount.scala:50)
at es.santander.prueba.pruebas.Main$.main(WordCount.scala:27)
at es.santander.prueba.pruebas.Main.main(WordCount.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.core.Tracer$Builder
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 11 more
I have been searching for a fix for this problem, and I added the following jars to my pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>2.4.5</version>
  <type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.htrace/htrace-core -->
<dependency>
  <groupId>org.htrace</groupId>
  <artifactId>htrace-core</artifactId>
  <version>3.0.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.htrace/htrace-hbase -->
<dependency>
  <groupId>org.apache.htrace</groupId>
  <artifactId>htrace-hbase</artifactId>
  <version>4.1.0-incubating</version>
</dependency>
But the problem persists. Am I doing something wrong, or do I need something more?
EDIT: This is where I am having the error:
FileSystem.get(spark.sparkContext.hadoopConfiguration).delete(new Path(outputFile), true)
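The missing class, org.apache.htrace.core.Tracer$Builder, lives in the htrace-core4 artifact (the 4.x line moved to the org.apache.htrace groupId and the org.apache.htrace.core package), which is what the Hadoop 3.x FsTracer in the stack trace expects; the org.htrace 3.0.4 artifact above ships the older org.htrace package and can never satisfy it. A sketch of the dependency that usually resolves this, assuming a Hadoop 3.x client (match the exact version to whatever your Hadoop release was built against):
<!-- Assumption: Hadoop 3.x client on the classpath. -->
<dependency>
  <groupId>org.apache.htrace</groupId>
  <artifactId>htrace-core4</artifactId>
  <version>4.1.0-incubating</version>
</dependency>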

Embedded Kafka & Spark 2.3 version mismatch issue

When I use this dependency:
<dependency>
  <groupId>net.manub</groupId>
  <artifactId>scalatest-embedded-kafka_2.11</artifactId>
  <version>2.0.0</version>
  <scope>test</scope>
</dependency>
With
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.3.0</version>
</dependency>
I run into this error:
Cause: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.reader.SupportsScanUnsafeRow
Trying to figure out which version of 'scalatest-embedded-kafka' will work with Spark 2.3.
Any ideas?
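org.apache.spark.sql.sources.v2.reader.SupportsScanUnsafeRow exists only in the Spark 2.3.x line (the DataSourceV2 API appeared in 2.3 and was reworked in 2.4), so the ClassNotFoundException usually means the spark-sql that actually resolves on the test classpath is not 2.3.x, rather than anything inside scalatest-embedded-kafka itself. A minimal sketch of a guard, assuming Scala 2.11: pin all Spark artifacts to one version through a property, then confirm with mvn dependency:tree that nothing overrides it.
<properties>
  <!-- Single source of truth for the Spark version. -->
  <spark.version>2.3.0</spark.version>
</properties>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>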

java.lang.ClassNotFoundException: Failed to find data source: json

I want to deploy my first Spark application on our new cluster, but I get the following stack trace:
java.lang.ClassNotFoundException: Failed to find data source: json. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:294)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:249)
at de.wlw.mad.company_analyser.util.WebPageReader$.apply(WebPageReader.scala:9)
at de.wlw.mad.company_analyser.Analyse$.sparkCommand(Analyse.scala:54)
at de.wlw.mad.company_analyser.Analyse$.apply(Analyse.scala:171)
at de.wlw.mad.company_analyser.AnalysisWorker$.listen(AnalysisWorker.scala:18)
at de.wlw.mad.bunny.AbstractListener$1.handleDelivery(AbstractListener.java:46)
at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144)
at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: json.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
I have written some tests for my Spark code, and they run successfully. My pom.xml:
<!-- Hadoop -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.7.3</version>
</dependency>
<!-- Scala Lib -->
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.11.8</version>
</dependency>
<!-- Spark Dependencies -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.0.2</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.0.2</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId>
  <version>2.0.2</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.0.2</version>
</dependency>
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <type>jar</type>
  <version>16.0.1</version>
</dependency>
<!-- JUnit -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.12</version>
  <scope>test</scope>
</dependency>
The corresponding code looks like this:
import org.apache.spark.sql.{Dataset, SparkSession}

object WebPageReader {
  def apply(spark: SparkSession, path: String): Dataset[WebPageCase] = {
    import spark.implicits._
    spark.read.json(path).as[WebPageCase]
  }
}
WebPageCase is a Scala case class.
The cluster is running in the cloud. Using bin/spark-submit, I get the same error as when running the job from my local environment. The Spark master and workers are built without Hadoop.
What package am I missing, or what am I doing wrong?
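One common cause of "Failed to find data source" on a cluster, if the job is packaged as a fat jar with the maven-shade plugin, is that merging drops the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister entries that map short names like json to their implementations; Spark then falls back to looking for a literal json.DefaultSource class, exactly as in the Caused by above. A sketch of the shade configuration that preserves those service files (the plugin setup itself is an assumption about how your jar is built):
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <!-- Merges META-INF/services files instead of discarding them,
           keeping DataSourceRegister intact in the fat jar. -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
  </configuration>
</plugin>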

Getting exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;) while using DataFrames

I am receiving a "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)" error while using DataFrames in a Scala app run on Spark. However, if I work only with RDDs and not DataFrames, no such error comes up with the same pom and settings. While going through other posts with the same error, it is mentioned that the Scala version has to be 2.10, as Spark is not compatible with Scala 2.11, and I am using Scala 2.10 with Spark 2.0.0.
Below is the snippet from my pom:
<properties>
  <spark-assembly>/usr/lib/spark/lib/spark-assembly.jar</spark-assembly>
  <encoding>UTF-8</encoding>
  <hadoop.version>2.7.1</hadoop.version>
  <hbase.version>1.1.1</hbase.version>
  <scala.version>2.10.5</scala.version>
  <scala.tools.version>2.10</scala.tools.version>
  <spark.version>2.0.0</spark.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.tools.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.tools.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.tools.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
Error:
16/10/19 02:57:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at com.abc.xyz.Compare$.main(Compare.scala:64)
at com.abc.xyz.Compare.main(Compare.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
16/10/19 02:57:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;)
16/10/19 02:57:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Change the Scala version:
<scala.version>2.11.8</scala.version>
<scala.tools.version>2.11</scala.tools.version>
and add:
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-reflect</artifactId>
  <version>${scala.version}</version>
</dependency>
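After switching versions, it is worth confirming that no 2.10 artifact is still being pulled in transitively, for example with mvn dependency:tree -Dincludes=org.scala-lang; mixing Scala binary versions on one classpath is exactly the kind of mismatch that surfaces as this NoSuchMethodError.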
I also faced this error, and it is purely a version issue.
Your Scala version may be incompatible, or you may be using the correct version while the IntelliJ libraries still have the old one.
Quick fix:
I was using Spark 2.2.0 and Scala 2.10.4, which I then changed to Scala 2.11.8. After that, do the following:
1) Right-click the IntelliJ module.
2) Open Module Settings.
3) Go to Libraries and clear all of them.
4) Rebuild.
Doing the above resolved the issue for me.

NullPointerException in Salat

While making any type of call to Mongo from my Scala application, I get this NullPointerException. Can somebody please help?
I am using Mongo 3.0.1 and my Scala version is 2.9.0. Other dependencies are as follows:
<dependency>
  <groupId>org.mongodb</groupId>
  <artifactId>casbah_2.9.1</artifactId>
  <type>pom</type>
  <version>2.6.0</version>
</dependency>
<dependency>
  <groupId>org.mongodb</groupId>
  <artifactId>mongo-java-driver</artifactId>
  <version>2.11.1</version>
</dependency>
<dependency>
  <groupId>com.novus</groupId>
  <artifactId>salat-core_2.9.1</artifactId>
  <version>1.9.1</version>
</dependency>
<dependency>
  <groupId>com.google.code.morphia</groupId>
  <artifactId>morphia</artifactId>
  <version>0.99</version>
</dependency>
Error:
Caused by: java.lang.NullPointerException
at com.novus.salat.util.GraterPrettyPrinter$$anonfun$safeDefault$2$$anonfun$apply$1.apply(PrettyPrinters.scala:74)
at com.novus.salat.util.GraterPrettyPrinter$$anonfun$safeDefault$2$$anonfun$apply$1.apply(PrettyPrinters.scala:74)
at scala.Option.map(Option.scala:134)
at com.novus.salat.util.GraterPrettyPrinter$$anonfun$safeDefault$2.apply(PrettyPrinters.scala:74)
at com.novus.salat.util.GraterPrettyPrinter$$anonfun$safeDefault$2.apply(PrettyPrinters.scala:74)
at scala.Option.flatMap(Option.scala:147)
at com.novus.salat.util.GraterPrettyPrinter$class.safeDefault(PrettyPrinters.scala:74)
at com.novus.salat.util.ConstructorInputPrettyPrinter$.safeDefault(PrettyPrinters.scala:108)
at com.novus.salat.util.ConstructorInputPrettyPrinter$$anonfun$apply$3.apply(PrettyPrinters.scala:134)
at com.novus.salat.util.ConstructorInputPrettyPrinter$$anonfun$apply$3.apply(PrettyPrinters.scala:128)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38)
at com.novus.salat.util.ConstructorInputPrettyPrinter$.apply(PrettyPrinters.scala:128)
at com.novus.salat.util.ToObjectGlitch.<init>(ToObjectGlitch.scala:44)
at com.novus.salat.ConcreteGrater.feedArgsToConstructor(Grater.scala:294)
at com.novus.salat.ConcreteGrater.asObject(Grater.scala:263)
at com.novus.salat.ConcreteGrater.asObject(Grater.scala:105)
at com.novus.salat.dao.SalatMongoCursorBase$class.next(SalatMongoCursor.scala:47)
at com.novus.salat.dao.SalatMongoCursor.next(SalatMongoCursor.scala:149)
at scala.collection.Iterator$class.foreach(Iterator.scala:652)
at com.novus.salat.dao.SalatMongoCursor.foreach(SalatMongoCursor.scala:149)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:128)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:242)
at com.novus.salat.dao.SalatMongoCursor.toList(SalatMongoCursor.scala:149)
This issue was due to corrupt data in the database. After clearing that, it worked.