java.lang.NoSuchMethodError: org.apache.spark.sql.types.StructType$.apply(Lscala/collection/immutable/Seq;)Lorg/apache/spark/sql/types/StructType; - scala

Im facing this error while doing spark submit. I used scala 2.13.8 with spark dependencies version 3.3.0.. ERROR org.apache.spark.deploy.yarn.Client: Application diagnostics message: User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.types.StructType$.apply(Lscala/collection/immutable/Seq;)Lorg/apache/spark/sql/types/StructType;
I used the following pom.xml file to create jar file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>morningreport</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.13</artifactId>
<version>3.3.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>3.3.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.scala-tools/maven-scala-plugin -->
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<id>scala-compile-first</id>
<goals>
<goal>compile</goal>
</goals>
<configuration>
<includes>
<include>**/*.scala</include>
</includes>
</configuration>
</execution>
<execution>
<id>scala-test-compile</id>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.example.newtest</mainClass>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
</transformers>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>

Related

cannot resolve symbol apache in spark scala maven

I am creating Spark application with scala, and it is Maven Project.
If Possible may Someone can share POM file. My application is only having SPARKSQL.
Do i need to set HADOOP_HOME to the directory containing winutils.exe
as i have not added in the config part of the code.
My POM file looks like:-
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>net.martinprobson.spark</groupId>
<artifactId>spark_example</artifactId>
<version>1.0-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>Spark Batch And Streaming Application</description>
<inceptionYear>2019</inceptionYear>
<properties>
<scala.version>2.11</scala.version>
<scala.full.version>2.11.8</scala.full.version>
<spark.version>2.4.4</spark.version>
<java.version>1.8</java.version>
<jackson.version>2.6.5</jackson.version>
<scala.maven.plugin.version>3.2.2</scala.maven.plugin.version>
<maven.surefire.plugin.version>2.13</maven.surefire.plugin.version>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<!--Scala dependencies-->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.full.version}</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.version}</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.version}</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-yarn -->
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.3.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.7</version>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.1.6</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
<!-- disable surefire -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.12.4</version>
<configuration>
<skipTests>true</skipTests>
</configuration>
</plugin>
<!-- enable scala test -->
<plugin>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.10</artifactId>
<version>2.2.6</version>
<configuration>
<reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
<junitxml>.</junitxml>
<filereports>TestSuite.txt</filereports>
</configuration>
<executions>
<execution>
<id>test</id>
<goals>
<goal>test</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<executions>
<execution>
<id>jar-with-dependencies</id>
<phase>package</phase>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
<configuration>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
My Scala code looks like
package Batch
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions._
Object BatchJob {
def main(args: Array[String]) {
val spark = SparkSession.builder
.master("local")
.appName("Fraud Detector")
.config("spark.driver.memory", "2g")
.enableHiveSupport
.getOrCreate()
import spark.implicits._
val financesDF = spark.read.json("Data/finances-small.json")
}
}
But getting Error as
**Cannot Resolve Symbol Apache
cannot Resolve Symbol Savemode
cannot Resolve Symbol SparkSession**
Is any problem with POM........ Highly Appreciate suggestion.
Kind Regards
you can simply replace your build tag with below. It worked for me.
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.3.2</version>
<executions>
<execution>
<id>scala-compile-first</id>
<phase>process-resources</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>scala-test-compile</id>
<phase>process-test-resources</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<executions>
<execution>
<phase>compile</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>

Maven shade-plugin not copying classes to .jar file

I have downloaded Gatling Maven Example and trying add mvn shade plugin in it as below, the jar get created but it doesn't contains any classes, so it fails during execution
E:\projects\gatling-maven>java -jar target\gatling-maven-plugin-demo-2.2.3.jar
Error: Could not find or load main class Engine
Here is pom.xml I have added
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>io.gatling</groupId>
<artifactId>gatling-maven-plugin-demo</artifactId>
<version>2.2.3</version>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<gatling.version>${project.version}</gatling.version>
<gatling-plugin.version>2.2.1</gatling-plugin.version>
<scala-maven-plugin.version>3.2.2</scala-maven-plugin.version>
</properties>
<dependencies>
<dependency>
<groupId>io.gatling.highcharts</groupId>
<artifactId>gatling-charts-highcharts</artifactId>
<version>${gatling.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>${scala-maven-plugin.version}</version>
</plugin>
<plugin>
<groupId>io.gatling</groupId>
<artifactId>gatling-maven-plugin</artifactId>
<version>${gatling-plugin.version}</version>
<executions>
<execution>
<goals>
<goal>execute</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>Engine</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
and package structure is as
You have to move all your resources and scala files to src\main\resources and src\main\scala. Shade plugin will not include your test resources and scala files. I have also tried shadedTestjar and it also does not work. The other option could be you use either
Maven Dependency plugin and move all dependency manually - Error prone and ugly
Use Assembly plugin - Not suitable
I have tried moving your resources and scala files in src/main and it has worked. Following is working pom content,
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>io.gatling</groupId>
<artifactId>gatling-maven-plugin-demo</artifactId>
<version>2.2.3</version>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<gatling.version>${project.version}</gatling.version>
<gatling-plugin.version>2.2.1</gatling-plugin.version>
<scala-maven-plugin.version>3.2.2</scala-maven-plugin.version>
</properties>
<dependencies>
<dependency>
<groupId>io.gatling.highcharts</groupId>
<artifactId>gatling-charts-highcharts</artifactId>
<version>${gatling.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>${scala-maven-plugin.version}</version>
</plugin>
<plugin>
<groupId>io.gatling</groupId>
<artifactId>gatling-maven-plugin</artifactId>
<version>${gatling-plugin.version}</version>
<executions>
<execution>
<goals>
<goal>execute</goal>
</goals>
<configuration>
<configFolder>src/main/resources</configFolder>
<dataFolder>src/main/resources/data</dataFolder>
<resultsFolder>target/gatling/results</resultsFolder>
<requestBodiesFolder>src/main/resources/request-bodies</requestBodiesFolder>
<simulationsFolder>src/main/scala</simulationsFolder>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>Engine</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Hope it solves your problem.
Try after changing your following plugin configuration,
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>${scala-maven-plugin.version}</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
and
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>Engine</mainClass>
</transformer>
</transformers>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
There are still file not found exception but I believe they are trivial to solve.
First learn about the Maven standard directory structure convention. Your main project source is now put under src/test/ instead of the correct src/main/, so they are treated as to be used in unit test. Therefore it will not be packaged.

Maven deploy to Nexus, No X509TrustManager

I am trying to deploy a deb-package to Nexus with maven. But I get the error
No X509TrustManager implementation available
No stacktrace available, just an Exception: java.security.cert.CertificationException.
So, how do I get a X509TrustManager into maven?
pom.xml looks like this:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>******</groupId>
<artifactId>************</artifactId>
<name>**********</name>
<description>*****************</description>
<properties>
<aspectj.version>1.6.12</aspectj.version>
<maven.compiler.version>1.6</maven.compiler.version>
<maven.compiler.source>1.6</maven.compiler.source>
<maven.compiler.target>1.6</maven.compiler.target>
<maven.wagon.http.ssl.insecure>true</maven.wagon.http.ssl.insecure>
<maven.wagon.http.ssl.allowall>true</maven.wagon.http.ssl.allowall>
<maven.wagon.http.ssl.ignore.validity.dates>true</maven.wagon.http.ssl.ignore.validity.dates>
</properties>
<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<includes>
<include>aop.xml</include>
</includes>
<targetPath>META-INF</targetPath>
</resource>
<resource>
<directory>src/main/resources</directory>
<excludes>
<exclude>aop.xml</exclude>
</excludes>
</resource>
</resources>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>aspectj-maven-plugin</artifactId>
<version>1.5</version>
<dependencies>
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjtools</artifactId>
<version>${aspectj.version}</version>
</dependency>
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjrt</artifactId>
<version>1.5.4</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.6.2</version>
</dependency>
</dependencies>
<configuration>
<complianceLevel>${maven.compiler.target}</complianceLevel>
<source>${maven.compiler.source}</source>
<target>${maven.compiler.target}</target>
<outxml>true</outxml>
<verbose>true</verbose>
<showWeaveInfo>true</showWeaveInfo>
</configuration>
<executions>
<execution>
<goals>
<goal>compile</goal> <!-- use this goal to weave all your main classes -->
<goal>test-compile</goal> <!-- use this goal to weave all your test classes -->
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>${maven.compiler.version}</source>
<target>${maven.compiler.version}</target>
</configuration>
</plugin>
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>
org.codehaus.mojo
</groupId>
<artifactId>
aspectj-maven-plugin
</artifactId>
<versionRange>
[1.3,)
</versionRange>
<goals>
<goal>compile</goal>
<goal>test-compile</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore></ignore>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<artifactId>jdeb</artifactId>
<groupId>org.vafer</groupId>
<version>1.4</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>jdeb</goal>
</goals>
<configuration>
<classifier>all</classifier>
<dataSet>
<data>
<src>${project.build.directory}/${project.build.finalName}.jar</src>
<type>file</type>
<mapper>
<type>perm</type>
<prefix>/opt/tomcat/webapps/jsf/WEB-INF/lib</prefix>
</mapper>
</data>
</dataSet>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>aspectj-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
<distributionManagement>
<repository>
<id>deployment</id>
<name>**************</name>
<url>***********************************</url>
</repository>
</distributionManagement>
<version>1.0</version>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>**********</groupId>
<artifactId>**********</artifactId>
<version>1.2.6</version>
<scope>system</scope>
<systemPath>*****************</systemPath>
</dependency>
<dependency>
<groupId>aspectj</groupId>
<artifactId>aspectjrt</artifactId>
<version>1.5.4</version>
</dependency>
<dependency>
<groupId>javax.faces</groupId>
<artifactId>jsf-api</artifactId>
<version>2.1</version>
</dependency>
</dependencies>
</project>
Thanks, BadJoke
This seems to be an issue with your JDK. E.g. check out this bug.
I would suggest to upgrade to the latest release of your chosen JDK and if that still fails potentially try with the latest Java8 release from Oracle.

Packaging and Running Scala Spark Project with Maven

I am writing an application in Scala that uses Spark. I am packaging the app using Maven and running into problems when constructing an "uber" or "fat" jar.
The problem I am facing is that running the application works fine inside of an IDE or if I provide a non-uber-jar'd version of the dependencies as the java class path, but it does not work if I give the uber jar as the class path, i.e.
java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt
does not work. I get the following error message:
ERROR SparkContext: Error initializing SparkContext.
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at debug.spark_example.Example$.main(Example.scala:9)
at debug.spark_example.Example.main(Example.scala)
I would really appreciate help understanding what I need to add to the pom.xml file and why I need to add it to get this to work.
I have searched online and found the following resources, which I tried (see in the pom), but could not get to work:
1) Spark User Mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Packaging-a-spark-job-using-maven-td5615.html
2) how to package spark scala application
I have a simple example that demonstrates this problem, a simple 1 class project (src/main/scala/debug/spark_example/Example.scala):
package debug.spark_example
import org.apache.spark.{SparkConf, SparkContext}
object Example {
def main(args: Array[String]): Unit = {
val sc = new SparkContext(new SparkConf().setAppName("Test").setMaster("local[2]"))
val lines = sc.textFile(args(0))
val lineLengths = lines.map(s => s.length)
val totalLength = lineLengths.reduce((a, b) => a + b)
lineLengths.foreach(println)
println(totalLength)
}
}
Here is the pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>debug.spark-example</groupId>
<artifactId>spark-example</artifactId>
<version>0.1-SNAPSHOT</version>
<inceptionYear>2015</inceptionYear>
<properties>
<scala.majorVersion>2.11</scala.majorVersion>
<scala.minorVersion>.2</scala.minorVersion>
<spark.version>1.4.1</spark.version>
</properties>
<repositories>
<repository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</pluginRepository>
</pluginRepositories>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.majorVersion}${scala.minorVersion}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.majorVersion}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<configuration>
<downloadSources>true</downloadSources>
<buildcommands>
<buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
</buildcommands>
<additionalProjectnatures>
<projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
</additionalProjectnatures>
<classpathContainers>
<classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
<classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
</classpathContainers>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>attached</goal>
</goals>
</execution>
</executions>
<configuration>
<tarLongFileMode>gnu</tarLongFileMode>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.2</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<minimizeJar>false</minimizeJar>
<createDependencyReducedPom>false</createDependencyReducedPom>
<artifactSet>
<includes>
<!-- Include here the dependencies you want to be packed in your fat jar -->
<include>*:*</include>
</includes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.7</version>
<configuration>
<skipTests>true</skipTests>
</configuration>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
</plugin>
</plugins>
</reporting>
</project>
Many thanks in advance for your help.
It seems that the Spark submit script must be used to run the program.
Rather than:
java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt
Do something like:
<path-to>/spark-1.4.1/bin/spark-submit --class debug.spark_example.Example --master local[2] target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar data.txt
It also seems to work without the shaded jar; with only the jar-with-dependencies. The following pom.xml file worked for me:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>debug.spark-example</groupId>
<artifactId>spark-example</artifactId>
<version>0.1-SNAPSHOT</version>
<inceptionYear>2015</inceptionYear>
<properties>
<scala.majorVersion>2.11</scala.majorVersion>
<scala.minorVersion>.2</scala.minorVersion>
<spark.version>1.4.1</spark.version>
</properties>
<repositories>
<repository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</pluginRepository>
</pluginRepositories>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.majorVersion}${scala.minorVersion}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.majorVersion}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<configuration>
<downloadSources>true</downloadSources>
<buildcommands>
<buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
</buildcommands>
<additionalProjectnatures>
<projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
</additionalProjectnatures>
<classpathContainers>
<classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
<classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
</classpathContainers>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>attached</goal>
</goals>
</execution>
</executions>
<configuration>
<tarLongFileMode>gnu</tarLongFileMode>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.7</version>
<configuration>
<skipTests>true</skipTests>
</configuration>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
</plugin>
</plugins>
</reporting>
</project>
This may have something to do with the order of your maven plugins. You're using both the "maven-assembly-plugin" and "maven-shade-plugin" plugins in your project, both bound to the same phase in the maven lifecycle. When this happens, maven executes the plugins in the order that they appear in the plugins section, so in your case it executes the assembly plugin, then the shade plugin.
Based on the output jar you're trying to run and the shade transformation you have, you probably want the opposite order. However, you may not even need the assembly plugin for your use case. You might be able to use the target/spark-example-0.1-SNAPSHOT-shaded.jar file.
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<!-- SNIP -->
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<!-- SNIP -->
</plugin>
</plugins>
Akka Docs helped me fix the issue. If you are using Shade then you must specify a transformer
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<manifestEntries>
<Main-Class>akka.Main</Main-Class>
</manifestEntries>
</transformer>

maven scalaest plugin encoding

I have tried to use maven-scalaest-plugin and he is working well.
The problem is that the results are not looking good because encoding.
I have tried to use it with eclipse or cmd but got same results.
Image of what i see
the pom is this:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>jp.mwsoft.sample</groupId>
<artifactId>java-scala-test</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<scala-version>2.9.2</scala-version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
<version>${scala-version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.10</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<!--
<artifactId>scalatest_${scala-version}</artifactId>
-->
<artifactId>scalatest_2.9.0</artifactId>
<version>2.0.M5</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/java</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<id>test-compile</id>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.7</version>
<configuration>
<skipTests>true</skipTests>
</configuration>
</plugin>
<plugin>
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
<version>1.0-M2</version>
<configuration>
<argLine></argLine>
</configuration>
<executions>
<execution>
<goals>
<goal>test</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
You can remove colors in outup with
<stdout>W</stdout>
Like:
<plugin>
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
<version>1.0-RC2</version>
<configuration>
<reportsDirectory>${project.build.directory}/scalatest-reports</reportsDirectory>
<junitxml>.</junitxml>
<stdout>W</stdout> <!-- Skip coloring output -->
<encoding>UTF-8</encoding>
<htmlreporters>./html</htmlreporters>
<filereports>WDF TestSuite.txt</filereports>
<membersOnlySuites>test.common</membersOnlySuites>
</configuration>
<executions>
<execution>
<id>test</id>
<goals>
<goal>test</goal>
</goals>
<configuration>
<membersOnlySuites>test.common</membersOnlySuites>
</configuration>
</execution>
</executions>
</plugin>
It's because scalatest outputs special characters to set the text colour and such like. Here is a screenshot from linux. It's probably that Eclipse console doesn't know what the characters are. Here is a screenshot from linux.
http://s24.postimg.org/zeymeowmd/scrimage.png