Apache Flink: How to sink stream to Google Cloud Storage File System - scala

I was trying to write a DataStream into a file in the Google Cloud Storage file system as follows (using Flink 1.8 and Scala 2.11):
data.addSink(new BucketingSink[(String, Int)]("gs://url-try/try.txt"))
But I'm facing the following error:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:638)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
Caused by: java.lang.RuntimeException: Error while creating FileSystem when initializing the state of the BucketingSink.
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:379)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:278)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:403)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1227)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:432)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:376)
... 8 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 'gs' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:179)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:399)
... 11 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/HdfsConfiguration
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:85)
... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.HdfsConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
I have seen several questions about this, and I have also tried the following:
Environment variables:
- FLINK_CONF_DIR
flink-conf.yaml:
- fs.hdfs.hadoopconf: src/main/resources/core-site.xml
core-site.xml:
<property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    <description>The FileSystem for gs: (GCS) uris.</description>
</property>
<property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>The AbstractFileSystem for gs: (GCS) uris.</description>
</property>
These are my pom dependencies:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-scala -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-scala_2.11</artifactId>
        <version>1.8.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_2.11</artifactId>
        <version>1.8.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/commons-lang/commons-lang -->
    <dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.6</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector -->
    <dependency>
        <groupId>com.google.cloud.bigdataoss</groupId>
        <artifactId>gcs-connector</artifactId>
        <version>hadoop3-1.9.16</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-hadoop-fs -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-hadoop-fs</artifactId>
        <version>1.8.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-filesystem -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-filesystem_2.11</artifactId>
        <version>1.8.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.2.0</version>
    </dependency>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-storage</artifactId>
        <version>1.35.0</version>
    </dependency>
</dependencies>
Any help?

From the stack trace you posted, I can see that you are having issues writing to a GCS bucket using Flink and Scala.
There is a similar post where this issue was resolved; please check it.
Do not hesitate to come back if you have further questions.
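Beyond that pointer, two things stand out in the question itself. First, fs.hdfs.hadoopconf in flink-conf.yaml expects the directory containing core-site.xml, not the path of the file itself. Second, the pom mixes hadoop-core 1.2.1 with hadoop-hdfs 3.2.0, which is a likely source of the NoClassDefFoundError; aligning everything on a single Hadoop version (including hadoop-common) is the usual first step. As a minimal sketch, the GCS settings can also be handed straight to the sink via BucketingSink's setFSConfig, assuming the gcs-connector jar and a consistent Hadoop version are on the job classpath:
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink
import org.apache.hadoop.conf.Configuration
// Sketch only: hand the GCS file system settings directly to the sink instead
// of relying on flink-conf.yaml being picked up by the local environment.
val hadoopConf = new Configuration()
hadoopConf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
hadoopConf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
val sink = new BucketingSink[(String, Int)]("gs://url-try/try.txt")
sink.setFSConfig(hadoopConf)
data.addSink(sink)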

Related

java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder - Scala Spark

I am trying to execute a WordCount example with Scala Spark, but I am having the following problem:
java.lang.NoClassDefFoundError: org/apache/htrace/core/Tracer$Builder
at org.apache.hadoop.fs.FsTracer.get(FsTracer.java:42)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3232)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3286)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:478)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
at es.santander.prueba.pruebas.WordCount.count(WordCount.scala:123)
at es.santander.prueba.pruebas.WordCount.main(WordCount.scala:50)
at es.santander.prueba.pruebas.Main$.main(WordCount.scala:27)
at es.santander.prueba.pruebas.Main.main(WordCount.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.core.Tracer$Builder
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 11 more
I have been searching for a way to fix this problem, and I added the following jars to my pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>2.4.5</version>
    <type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.htrace/htrace-core -->
<dependency>
    <groupId>org.htrace</groupId>
    <artifactId>htrace-core</artifactId>
    <version>3.0.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.htrace/htrace-hbase -->
<dependency>
    <groupId>org.apache.htrace</groupId>
    <artifactId>htrace-hbase</artifactId>
    <version>4.1.0-incubating</version>
</dependency>
But the problem persists. Am I doing something wrong, or do I need something more?
EDIT: This is where I am having the error:
import org.apache.hadoop.fs.{FileSystem, Path}
FileSystem.get(spark.sparkContext.hadoopConfiguration).delete(new Path(outputFile), true)
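One detail worth checking here: org.apache.htrace.core.Tracer$Builder (the class Hadoop's FsTracer expects) ships in the org.apache.htrace:htrace-core4 artifact, while the org.htrace:htrace-core 3.0.4 jar added above predates the package rename and does not contain that class. A quick, hypothetical diagnostic to confirm whether the class is loadable at runtime, and from which jar:
// Throws ClassNotFoundException if htrace-core4 is missing from the runtime
// classpath; otherwise prints the jar the class was actually loaded from.
val tracerBuilder = Class.forName("org.apache.htrace.core.Tracer$Builder")
println(tracerBuilder.getProtectionDomain.getCodeSource.getLocation)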

Feature file executes with an error: Expected scheme-specific part at index 10:

I am setting up an automation environment in Eclipse. I created a Maven project, installed the Cucumber plugin, and specified all dependencies.
Now, when running the feature file, I am getting an error.
Shall I glue the step definition file the way I did it in IntelliJ? I cleared the path in the Glue option, but it has no effect on the current issue.
Exception in thread "main" java.lang.IllegalArgumentException: Expected scheme-specific part at index 10: classpath:
at io.cucumber.core.model.GluePath.parseAssumeClasspathScheme(GluePath.java:64)
at io.cucumber.core.model.GluePath.parse(GluePath.java:34)
at cucumber.runtime.RuntimeOptions.parse(RuntimeOptions.java:161)
at cucumber.runtime.RuntimeOptions.<init>(RuntimeOptions.java:108)
at cucumber.runtime.RuntimeOptions.<init>(RuntimeOptions.java:101)
at cucumber.runtime.RuntimeOptions.<init>(RuntimeOptions.java:97)
at cucumber.runtime.Runtime$Builder.withArgs(Runtime.java:131)
at cucumber.runtime.Runtime$Builder.withArgs(Runtime.java:127)
at cucumber.api.cli.Main.run(Main.java:22)
at cucumber.api.cli.Main.main(Main.java:8)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 10: classpath:
at java.net.URI$Parser.fail(Unknown Source)
at java.net.URI$Parser.failExpecting(Unknown Source)
at java.net.URI$Parser.parse(Unknown Source)
at java.net.URI.<init>(Unknown Source)
at io.cucumber.core.model.GluePath.parseAssumeClasspathScheme(GluePath.java:62)
... 9 more
Here are all dependencies I added into pom.xml:
<dependencies>
    <!-- https://mvnrepository.com/artifact/junit/junit -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-junit -->
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-junit</artifactId>
        <version>4.2.2</version>
        <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-core -->
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-core</artifactId>
        <version>4.2.4</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-java -->
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-java</artifactId>
        <version>4.2.4</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-jvm -->
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-jvm</artifactId>
        <version>4.2.4</version>
        <type>pom</type>
    </dependency>
    <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-jvm-deps -->
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-jvm-deps</artifactId>
        <version>1.0.6</version>
        <scope>provided</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/net.masterthought/cucumber-reporting -->
    <dependency>
        <groupId>net.masterthought</groupId>
        <artifactId>cucumber-reporting</artifactId>
        <version>4.5.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/info.cukes/gherkin -->
    <dependency>
        <groupId>info.cukes</groupId>
        <artifactId>gherkin</artifactId>
        <version>2.12.2</version>
        <scope>provided</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.mockito/mockito-all -->
    <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-all</artifactId>
        <version>2.0.2-beta</version>
        <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.codehaus.mojo/cobertura-maven-plugin -->
    <dependency>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>cobertura-maven-plugin</artifactId>
        <version>2.7</version>
        <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>3.141.59</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>27.0.1-jre</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.sun/tools -->
    <dependency>
        <groupId>com.sun</groupId>
        <artifactId>tools</artifactId>
        <version>1.7.0.13</version>
        <scope>system</scope>
        <systemPath>C:\Program Files\Java\jdk1.8.0_201\lib\tools.jar</systemPath>
    </dependency>
</dependencies>
I came across the same stack trace using the Cucumber plugin for Eclipse. I had just changed from Cucumber 1.2.5 (where everything worked fine) to 4.2.4. (Note: I also had to change the import for DataTable from cucumber.api.DataTable to io.cucumber.datatable.*.)
The GitHub page for the code where the error is occurring gives some clues:
The glue path can be written as either a package name ({@code com.example.app}), a path ({@code com/example/app}), or a URI ({@code classpath:com/example/app}).
Based on this text, I suspected the "Glue:" field in the Cucumber Feature runner configuration. The default was simply "classpath:". When I changed this to be the package name where the step classes were located, the tests ran.
My best guess is that the old version of Cucumber was somehow able to accept "classpath:" for the Glue value, while the new version does not.
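For reference, the same glue value can also be pinned in code rather than in the runner configuration. A sketch of a JUnit runner class for Cucumber 4.x, where com.example.steps and classpath:features are placeholders for your step-definition package and feature folder:
import cucumber.api.CucumberOptions
import cucumber.api.junit.Cucumber
import org.junit.runner.RunWith

// Hypothetical runner: replace both values with your project's own paths.
@RunWith(classOf[Cucumber])
@CucumberOptions(features = Array("classpath:features"), glue = Array("com.example.steps"))
class RunCucumberTest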

WildFly GWT Error message tomcat container

After deploying my application on WildFly, I see the following messages:
2017-02-15 10:06:51,440 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 178) MSC000001: Failed to start service jboss.undertow.deployment.default-server.default-host./cati: org.jboss.msc.service.StartException in service jboss.undertow.deployment.default-server.default-host./cati: java.lang.NoSuchMethodError: org.apache.tomcat.util.descriptor.DigesterFactory.newDigester(ZZLorg/apache/tomcat/util/digester/RuleSet;Z)Lorg/apache/tomcat/util/digester/Digester;
at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at org.jboss.threads.JBossThread.run(JBossThread.java:320)
Caused by: java.lang.NoSuchMethodError: org.apache.tomcat.util.descriptor.DigesterFactory.newDigester(ZZLorg/apache/tomcat/util/digester/RuleSet;Z)Lorg/apache/tomcat/util/digester/Digester;
at org.apache.tomcat.util.descriptor.tld.TldParser.<init>(TldParser.java:49)
at org.apache.tomcat.util.descriptor.tld.TldParser.<init>(TldParser.java:44)
at org.apache.jasper.servlet.TldScanner.<init>(TldScanner.java:79)
at org.apache.jasper.servlet.JasperInitializer.newTldScanner(JasperInitializer.java:120)
at org.eclipse.jetty.apache.jsp.JettyJasperInitializer.newTldScanner(JettyJasperInitializer.java:115)
at org.apache.jasper.servlet.JasperInitializer.onStartup(JasperInitializer.java:101)
at io.undertow.servlet.core.DeploymentManagerImpl.deploy(DeploymentManagerImpl.java:184)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentService.startContext(UndertowDeploymentService.java:100)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:82)
... 6 more
In my Maven project, I also import the following dependencies:
<!-- GWT -->
<!-- https://mvnrepository.com/artifact/com.google.gwt/gwt-user -->
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-user</artifactId>
    <version>2.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.gwt/gwt-servlet -->
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-servlet</artifactId>
    <version>2.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.gwt/gwt-dev -->
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-dev</artifactId>
    <version>2.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.tomcat</groupId>
    <artifactId>tomcat-util-scan</artifactId>
    <version>8.5.2</version>
</dependency>
As I read on the WildFly project pages, the Tomcat container was removed; that is why I added the following dependency:
<dependency>
    <groupId>org.apache.tomcat</groupId>
    <artifactId>tomcat-util-scan</artifactId>
    <version>8.5.2</version>
</dependency>
Any idea how to bypass this error message?
Of course it is not working; as you can see, if you add the following dependency (gwt-dev) for testing:
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-dev</artifactId>
    <version>2.8.0</version>
</dependency>
you will see that it includes the Jetty server, and this is one of the reasons for the message in my first question.
Be sure not to clutter the server side with compile-time dependencies.
gwt-servlet is needed only at runtime; gwt-dev and gwt-user should be set to provided, which makes them available for compiling to JavaScript but leaves them out of the final WAR file on the application server.
<!-- GWT -->
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-servlet</artifactId>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-user</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-dev</artifactId>
    <scope>provided</scope>
</dependency>
But after all, it looks to me like you should not put any Tomcat libraries into the WildFly server, as it already contains everything needed to run the server part of your application.
Therefore, just stick to the plain servlet API, like so:
<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <scope>provided</scope>
</dependency>
It is provided again, as WildFly has it built-in.
Can you try to exclude the Tomcat dependencies and see if it works for you?

java.lang.ClassNotFoundException: Failed to find data source: json

I want to deploy my first Spark application on our new cluster, but I get the following stack trace:
java.lang.ClassNotFoundException: Failed to find data source: json. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:294)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:249)
at de.wlw.mad.company_analyser.util.WebPageReader$.apply(WebPageReader.scala:9)
at de.wlw.mad.company_analyser.Analyse$.sparkCommand(Analyse.scala:54)
at de.wlw.mad.company_analyser.Analyse$.apply(Analyse.scala:171)
at de.wlw.mad.company_analyser.AnalysisWorker$.listen(AnalysisWorker.scala:18)
at de.wlw.mad.bunny.AbstractListener$1.handleDelivery(AbstractListener.java:46)
at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144)
at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: json.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
I have written some tests for my Spark code, and they run successfully. My pom.xml:
<!-- Hadoop -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
</dependency>
<!-- Scala Lib -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
</dependency>
<!-- Spark Dependencies -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <type>jar</type>
    <version>16.0.1</version>
</dependency>
<!-- JUnit -->
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>
The corresponding code looks like this:
import org.apache.spark.sql.{Dataset, SparkSession}

object WebPageReader {
  def apply(spark: SparkSession, path: String): Dataset[WebPageCase] = {
    import spark.implicits._
    spark.read.json(path).as[WebPageCase]
  }
}
WebPageCase is a Scala case class.
The cluster is running in the cloud. Using bin/spark-submit, I get the same error as when running the job from my local environment. The Spark master and workers are built without Hadoop.
What package am I missing, or what am I doing wrong?
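One classic culprit, assuming the job is packaged as a fat jar: Spark resolves the short name "json" through java.util.ServiceLoader entries in META-INF/services/org.apache.spark.sql.sources.DataSourceRegister, and shading can drop or overwrite that file unless service entries are merged (for maven-shade, via its ServicesResourceTransformer). A small sketch to see which data sources are actually registered on the cluster classpath:
import java.util.ServiceLoader
import org.apache.spark.sql.sources.DataSourceRegister
import scala.collection.JavaConverters._

// Prints the short name of every registered data source; "json" should be
// listed if the services file survived packaging.
ServiceLoader.load(classOf[DataSourceRegister]).asScala
  .foreach(register => println(register.shortName()))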

Getting exception : java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;) while using data frames

I am receiving a "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)" error while using DataFrames in a Scala app and running it with Spark. However, if I work only with RDDs and not DataFrames, no such error comes up with the same pom and settings. Also, while going through other posts with the same error, it is mentioned that the Scala version has to be 2.10, as Spark is not compatible with Scala 2.11; I am using Scala 2.10 with Spark 2.0.0.
Below is the snippet from the pom:
<properties>
    <spark-assembly>/usr/lib/spark/lib/spark-assembly.jar</spark-assembly>
    <encoding>UTF-8</encoding>
    <hadoop.version>2.7.1</hadoop.version>
    <hbase.version>1.1.1</hbase.version>
    <scala.version>2.10.5</scala.version>
    <scala.tools.version>2.10</scala.tools.version>
    <spark.version>2.0.0</spark.version>
    <phoenix.version>4.7.0-HBase-1.1</phoenix.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>${hbase.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>${hbase.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
Error:
16/10/19 02:57:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at com.abc.xyz.Compare$.main(Compare.scala:64)
at com.abc.xyz.Compare.main(Compare.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
16/10/19 02:57:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;)
16/10/19 02:57:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Change the Scala version:
<scala.version>2.11.8</scala.version>
<scala.tools.version>2.11</scala.tools.version>
and add:
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>${scala.version}</version>
</dependency>
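As a quick, hypothetical sanity check for this class of error, printing the Scala version that is actually on the runtime classpath and comparing it with the compile-time version makes any mismatch visible:
// A compile-time/runtime Scala version mismatch is the usual cause of
// NoSuchMethodError on scala.reflect; this prints the version the app loaded.
println(scala.util.Properties.versionString)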
I also faced this error, and it is purely a version issue.
Your Scala version is not compatible, or you may be using the correct version but the IntelliJ libraries have the old version.
Quick fix:
I was using Spark 2.2.0 and Scala 2.10.4, which I then changed to Scala 2.11.8. After that, do the following:
1) Right-click on the IntelliJ module.
2) Open Module Settings.
3) Go to Libraries and clear all of them.
4) Rebuild.
After doing the above, the issue was resolved for me.