java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder - Scala Spark

I am trying to execute a WordCount example with Scala Spark, but I am running into the following problem:
java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder
at org.apache.hadoop.fs.FsTracer.get(FsTracer.java:42)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3232)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3286)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:478)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
at es.santander.prueba.pruebas.WordCount.count(WordCount.scala:123)
at es.santander.prueba.pruebas.WordCount.main(WordCount.scala:50)
at es.santander.prueba.pruebas.Main$.main(WordCount.scala:27)
at es.santander.prueba.pruebas.Main.main(WordCount.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.core.Tracer$Builder
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 11 more
I have been searching for a fix and added the following dependencies to my pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>2.4.5</version>
    <type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.htrace/htrace-core -->
<dependency>
    <groupId>org.htrace</groupId>
    <artifactId>htrace-core</artifactId>
    <version>3.0.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.htrace/htrace-hbase -->
<dependency>
    <groupId>org.apache.htrace</groupId>
    <artifactId>htrace-hbase</artifactId>
    <version>4.1.0-incubating</version>
</dependency>
But the problem persists. Am I doing something wrong, or do I need something more?
EDIT: This is where I am having the error:
FileSystem.get(spark.sparkContext.hadoopConfiguration).delete(new Path(outputFile), true)
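A hedged sketch of a likely fix, not a confirmed one: in Hadoop 3.x, the missing class org.apache.htrace.core.Tracer$Builder ships in the htrace-core4 artifact (note the org.apache.htrace groupId and the relocated org.apache.htrace.core package), not in org.htrace:htrace-core 3.x, and htrace-hbase does not bundle the core classes either:
<!-- hedged sketch: htrace-core4 is the artifact that contains org.apache.htrace.core.Tracer -->
<dependency>
    <groupId>org.apache.htrace</groupId>
    <artifactId>htrace-core4</artifactId>
    <version>4.1.0-incubating</version>
</dependency>
If the dependency is already present at compile time, make sure it also reaches the runtime classpath (for example, via spark-submit --jars).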

Related

Apache Flink: How to sink stream to Google Cloud Storage File System

I was trying to write some data streams into a file in the Google Cloud Storage file system as follows (using Flink 1.8 and Scala 2.11):
data.addSink(new BucketingSink[(String, Int)]("gs://url-try/try.txt"))
But I'm facing the following error:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:638)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
Caused by: java.lang.RuntimeException: Error while creating FileSystem when initializing the state of the BucketingSink.
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:379)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:278)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:403)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1227)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:432)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:376)
... 8 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 'gs' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:179)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:399)
... 11 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/HdfsConfiguration
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:85)
... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.HdfsConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
I have seen several questions about this, and I also did the following:
Environment variables:
- FLINK_CONF_DIR
In flink-conf.yaml:
- fs.hdfs.hadoopconf: src/main/resources/core-site.xml
core-site.xml:
<property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    <description>The FileSystem for gs: (GCS) uris.</description>
</property>
<property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>The AbstractFileSystem for gs: (GCS) uris.</description>
</property>
These are my pom dependencies:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-scala -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-scala_2.11</artifactId>
        <version>1.8.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_2.11</artifactId>
        <version>1.8.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/commons-lang/commons-lang -->
    <dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.6</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector -->
    <dependency>
        <groupId>com.google.cloud.bigdataoss</groupId>
        <artifactId>gcs-connector</artifactId>
        <version>hadoop3-1.9.16</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-hadoop-fs -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-hadoop-fs</artifactId>
        <version>1.8.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-filesystem -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-filesystem_2.11</artifactId>
        <version>1.8.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.2.0</version>
    </dependency>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-storage</artifactId>
        <version>1.35.0</version>
    </dependency>
</dependencies>
Any help?
According to the stack trace you posted, you are having issues writing to a GCS bucket using Flink and Scala.
There is a similar post where the issue is resolved; please check it.
Do not hesitate to come back if you have further questions.
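Beyond that, a hedged observation based only on the pom shown (an assumption, not a confirmed fix): the error bottoms out in ClassNotFoundException: org.apache.hadoop.hdfs.HdfsConfiguration, and the pom mixes the ancient hadoop-core 1.2.1 with hadoop-hdfs 3.2.0 while declaring no hadoop-common at all. A sketch that drops hadoop-core and aligns the Hadoop client artifacts on 3.2.0, the major version the hadoop3 flavor of the gcs-connector also targets:
<!-- hedged sketch: remove hadoop-core 1.2.1 and keep all Hadoop artifacts on one version -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>3.2.0</version>
</dependency>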

ClassNotFoundException: xsbt/WeakLog scala intellij

I'm having trouble running Scala code in IntelliJ on Windows 7.
I have installed IntelliJ with Scala 2.12, and I am creating a simple Maven application with a hello-world example, as below:
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("hello")
  }
}
When I run it in IntelliJ, I get the error below. It would be great if someone could help with this.
Error:scalac: Error: xsbt/WeakLog
java.lang.NoClassDefFoundError: xsbt/WeakLog
at xsbt.CompilerInterface.newCompiler(CompilerInterface.scala:19)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Caused by: java.lang.ClassNotFoundException: xsbt.WeakLog
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 20 more
Below is the pom:
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.12.2</version>
    </dependency>
    <dependency>
        <groupId>com.sksamuel.elastic4s</groupId>
        <artifactId>elastic4s-core_2.12</artifactId>
        <version>5.3.2</version>
    </dependency>
    <dependency>
        <groupId>com.sksamuel.elastic4s</groupId>
        <artifactId>elastic4s-tcp_2.12</artifactId>
        <version>5.3.2</version>
    </dependency>
    <dependency>
        <groupId>com.sksamuel.elastic4s</groupId>
        <artifactId>elastic4s-http_2.12</artifactId>
        <version>5.3.2</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.5</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.5</version>
    </dependency>
</dependencies>

java.lang.ClassNotFoundException: Failed to find data source: json

I want to deploy my first Spark application on our new cluster, but I get the following stack trace:
java.lang.ClassNotFoundException: Failed to find data source: json. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:294)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:249)
at de.wlw.mad.company_analyser.util.WebPageReader$.apply(WebPageReader.scala:9)
at de.wlw.mad.company_analyser.Analyse$.sparkCommand(Analyse.scala:54)
at de.wlw.mad.company_analyser.Analyse$.apply(Analyse.scala:171)
at de.wlw.mad.company_analyser.AnalysisWorker$.listen(AnalysisWorker.scala:18)
at de.wlw.mad.bunny.AbstractListener$1.handleDelivery(AbstractListener.java:46)
at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144)
at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: json.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
I have written some tests for my Spark code, and they run successfully. My pom.xml:
<!-- Hadoop -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
</dependency>
<!-- Scala Lib -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
</dependency>
<!-- Spark Dependencies -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <type>jar</type>
    <version>16.0.1</version>
</dependency>
<!-- JUnit -->
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>
The corresponding code looks like this:
import org.apache.spark.sql.{Dataset, SparkSession}

object WebPageReader {
  def apply(spark: SparkSession, path: String): Dataset[WebPageCase] = {
    import spark.implicits._
    spark.read.json(path).as[WebPageCase]
  }
}
WebPageCase is a Scala case class.
The cluster is running in the cloud. Using bin/spark-submit, I get the same error as when running the job from my local environment. The Spark master and workers are built without Hadoop.
What package am I missing, or what am I doing wrong?
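One common cause worth checking (an assumption here, since the packaging setup is not shown): the error bottoms out in ClassNotFoundException: json.DefaultSource, which means Spark could not resolve the short name "json" through its data-source registry. When the application is packaged as an uber jar, the META-INF/services files that register Spark's built-in data sources can overwrite each other during shading. A sketch of a maven-shade-plugin configuration that merges them instead:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- concatenates META-INF/services entries so the
                         DataSourceRegister files from all jars are merged -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>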

Getting exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;) while using DataFrames

I am receiving a "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)" error while using DataFrames in a Scala app run with Spark. However, if I work using only RDDs and not DataFrames, no such error comes up with the same pom and settings. Also, while going through other posts with the same error, it is mentioned that the Scala version has to be 2.10 because Spark is not compatible with Scala 2.11, and I am using Scala 2.10 with Spark 2.0.0.
Below is the snip from pom:
<properties>
    <spark-assembly>/usr/lib/spark/lib/spark-assembly.jar</spark-assembly>
    <encoding>UTF-8</encoding>
    <hadoop.version>2.7.1</hadoop.version>
    <hbase.version>1.1.1</hbase.version>
    <scala.version>2.10.5</scala.version>
    <scala.tools.version>2.10</scala.tools.version>
    <spark.version>2.0.0</spark.version>
    <phoenix.version>4.7.0-HBase-1.1</phoenix.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>${hbase.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>${hbase.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
Error:
16/10/19 02:57:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at com.abc.xyz.Compare$.main(Compare.scala:64)
at com.abc.xyz.Compare.main(Compare.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
16/10/19 02:57:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;)
16/10/19 02:57:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Change the Scala version to match the one your Spark distribution is built with (the prebuilt Spark 2.0 binaries are compiled against Scala 2.11):
<scala.version>2.11.8</scala.version>
<scala.tools.version>2.11</scala.tools.version>
and add the scala-reflect dependency:
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>${scala.version}</version>
</dependency>
I also faced this error, and it is purely a version issue.
Either your Scala version is not compatible, or you are using the correct version but IntelliJ's libraries still hold the old one.
Quick fix:
I was using Spark 2.2.0 and Scala 2.10.4, which I then changed to Scala 2.11.8. After that, do the following:
1) Right-click the IntelliJ module.
2) Open the module settings.
3) Go to the libraries and clear all of them.
4) Rebuild.
After doing the above, the issue was resolved for me.

Spark MLLIB error: java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.treeAggregate

I am trying to run the linear regression example in a Spark job. I got the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.treeAggregate$default$4(Ljava/lang/Object;)I
at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1.apply$mcVI$sp(GradientDescent.scala:189)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.mllib.optimization.GradientDescent$.runMiniBatchSGD(GradientDescent.scala:184)
at org.apache.spark.mllib.optimization.GradientDescent.optimize(GradientDescent.scala:107)
at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:267)
at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:190)
at com.myproject.sample.LinearRegression$.run(LinearRegressionExample.scala:105)
at com.myproject.sample.LinearRegression$$anonfun$main$1.apply(LinearRegressionExample.scala:67)
at com.myproject.sample.LinearRegression$$anonfun$main$1.apply(LinearRegressionExample.scala:66)
at scala.Option.map(Option.scala:145)
at com.myproject.sample.LinearRegression$.main(LinearRegressionExample.scala:66)
at com.myproject.sample.LinearRegression.main(LinearRegressionExample.scala)
Here are my dependencies in the pom:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
    <exclusions>
        :
        :
    </exclusions>
</dependency>
Why did I get java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.treeAggregate? Did I use the wrong dependency? Thanks!
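A hedged note rather than a confirmed answer: the pom above pins spark-mllib_2.10 to 1.3.1 but spark-core_2.10 to 1.2.0, so MLlib calls an RDD.treeAggregate method that the older core jar does not provide, which is exactly what a NoSuchMethodError at runtime signals. A sketch that aligns both artifacts on one version (1.3.1 assumed here, keeping any exclusions as before):
<!-- hedged sketch: keep spark-core and spark-mllib on the same version -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
</dependency>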