I am writing an application in Scala that uses Spark. I am packaging the app using Maven and running into problems when constructing an "uber" or "fat" jar.
The application runs fine inside an IDE, or when I put the individual (non-uber) dependency jars on the Java class path, but it fails when I use the uber jar as the class path, i.e.
java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt
does not work. I get the following error message:
ERROR SparkContext: Error initializing SparkContext.
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at debug.spark_example.Example$.main(Example.scala:9)
at debug.spark_example.Example.main(Example.scala)
I would really appreciate help understanding what I need to add to the pom.xml file and why I need to add it to get this to work.
I have searched online and found the following resources, which I tried (see in the pom), but could not get to work:
1) Spark User Mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Packaging-a-spark-job-using-maven-td5615.html
2) how to package spark scala application
I have a simple one-class project that demonstrates the problem (src/main/scala/debug/spark_example/Example.scala):
package debug.spark_example

import org.apache.spark.{SparkConf, SparkContext}

object Example {
  def main(args: Array[String]): Unit = {
    // Run locally with two worker threads
    val sc = new SparkContext(new SparkConf().setAppName("Test").setMaster("local[2]"))
    val lines = sc.textFile(args(0))
    val lineLengths = lines.map(s => s.length)
    val totalLength = lineLengths.reduce((a, b) => a + b)
  }
}
Here is the pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<name>Scala-Tools Maven2 Repository</name>
<name>Scala-Tools Maven2 Repository</name>
<!-- Include here the dependencies you want to be packed in your fat jar -->
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
Many thanks in advance for your help.
It seems that the Spark submit script must be used to run the program.
Rather than:
java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt
Do something like:
<path-to>/spark-1.4.1/bin/spark-submit --class debug.spark_example.Example --master local[2] target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar data.txt
It also seems to work without the shaded jar; with only the jar-with-dependencies. The following pom.xml file worked for me:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<name>Scala-Tools Maven2 Repository</name>
<name>Scala-Tools Maven2 Repository</name>
This may have something to do with the order of your maven plugins. You're using both the "maven-assembly-plugin" and "maven-shade-plugin" plugins in your project, both bound to the same phase in the maven lifecycle. When this happens, maven executes the plugins in the order that they appear in the plugins section, so in your case it executes the assembly plugin, then the shade plugin.
Based on the output jar you're trying to run and the shade transformation you have, you probably want the opposite order. However, you may not even need the assembly plugin for your use case. You might be able to use the target/spark-example-0.1-SNAPSHOT-shaded.jar file.
<!-- SNIP -->
<!-- SNIP -->
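The Akka docs helped me fix the issue. If you are using the Shade plugin, you must specify a transformer that appends all the reference.conf files into one, rather than letting one jar's copy overwrite the others.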
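For reference, a minimal sketch of such a Shade configuration, along the lines of what the Akka documentation describes. The plugin version is an assumption chosen for illustration; the relevant part is the AppendingTransformer entry for reference.conf, which concatenates the reference.conf files from all dependency jars (a plausible reason 'akka.version' goes missing is that only one of those files survives in the uber jar).

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- version is an assumption; use whichever release your build already pins -->
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Append every reference.conf found on the class path into a single file -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, the jar to run is the one produced by the Shade plugin, not the jar-with-dependencies produced by the assembly plugin.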
I am using Scala IDE to create a Maven project with Spark.
1. I created a Maven project, skipping archetype selection, and added the following pom file.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- mixed scala/java compile -->
<!-- for fatjar -->
<!--This plugin's configuration is used to store Eclipse m2e settings
only. It has no influence on the Maven build itself. -->
2. I added the Scala nature via Configure.
3. I added the src/main/scala folder under Properties --> Build path --> Source.
4. I set the Java compiler to 1.8 and the Scala compiler to 2.10.6 with JVM 1.8 in my project properties.
But I am getting the following error on maven clean install:
[image: error output, not included]
Go to this location
and try deleting the hadoop-common-2.6.0-cdh5.8.2.jar file, then run the Maven build again.
Can you try a recent version of the scala-maven-plugin? (It is the successor of the one you used; I'm the author of both.)
(see http://davidb.github.io/scala-maven-plugin/example_java.html)
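As a rough sketch of what switching to the newer plugin looks like for a mixed Scala/Java build, modeled on the linked example page. The plugin version is an assumption, and the scalaVersion value is taken from the compiler setting mentioned in the question; adjust both to your project.

```xml
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <!-- version is an assumption; pick a current release -->
  <version>3.2.2</version>
  <executions>
    <!-- Compile Scala before javac runs, so Java sources can refer to Scala classes -->
    <execution>
      <id>scala-compile-first</id>
      <phase>process-resources</phase>
      <goals>
        <goal>add-source</goal>
        <goal>compile</goal>
      </goals>
    </execution>
    <execution>
      <id>scala-test-compile</id>
      <phase>process-test-resources</phase>
      <goals>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <!-- Scala version from the question's project settings -->
    <scalaVersion>2.10.6</scalaVersion>
  </configuration>
</plugin>
```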
If you compile with JDK 9 or later (regardless of the JVM 1.8 setting in the config), you can run into this kind of issue, so check that the JDK Maven runs on is 1.8.
I am trying to build a scala based jar file in eclipse that uses log4j to make logs. It prints out perfectly in the console but when I try to use log4j.properties file to make it write to a log file, nothing happens.
The project structure is as follows
package scala.n*****.n****
import org.apache.log4j.Logger

object loggerTest extends LogHelper {
  def main(args: Array[String]): Unit = {
    log.info("This is info")
    log.error("This is error")
    log.warn("This is warn")
  }
}

trait LogHelper {
  // Logger named after the concrete class that mixes in this trait
  lazy val log = Logger.getLogger(this.getClass.getName)
}
# Root logger option
log4j.rootLogger=WARN, stdout, file
# Redirect log messages to console
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Redirect log messages to a log file
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<name>Scala-tools Maven2 Repository</name>
<name>Hadoop Releases</name>
<name>Cloudera Repos</name>
<name>Scala-tools Maven2 Repository</name>
<!-- mixed scala/java compile -->
<!--This plugin's configuration is used to store Eclipse m2e settings
only. It has no influence on the Maven build itself. -->
When I run it as a Maven build, it generates a jar file in the "target" folder.
I copy the jar to /home/abc/test
I run the jar with spark-submit using the command
$ spark-submit --class scala.n*****.n*****.loggerTest loggerTest-0.0.1-SNAPSHOT.jar
The console output looks fine, but the application should also write to a file in /home/abc/log, which it does not.
Could someone please help?
Hello, when you deploy your application you should specify the log4j configuration file for both the driver and the executors, as follows:
spark-submit --class MAIN_CLASS --driver-java-options "-Dlog4j.configuration=file:PATH_OF_LOG4J" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:PATH_OF_LOG4J" --master MASTER_IP:PORT JAR_PATH
For more details and a step-by-step solution, see https://blog.knoldus.com/2016/02/23/logging-spark-application-on-standalone-cluster/
I'm new to Scala and need help reading a piece of code. I'm currently looking at ALS.scala from Apache Spark and trying to understand how it works and which classes and objects are involved.
I'm having difficulty with line 166 of that file, because I cannot work out what the SchemaUtils object is.
I have copied the source code into my local project and get an unresolved-symbol error for SchemaUtils. In my pom.xml I set the Spark version to 1.6.1, but I suppose this object is not available in that version (perhaps only in an older one), so the Scala compiler does not recognize it. The error message is:
not found: value SchemaUtils
How can I fix this?
Here is my pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<name>Scala-tools Maven2 Repository</name>
<!-- mixed scala/java compile -->
<!-- for fatjar -->
<!--This plugin's configuration is used to store Eclipse m2e settings
only. It has no influence on the Maven build itself. -->
From the Spark docs
For the Scala API, Spark 1.6.1 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
You can't use 2.11 with Spark (yet), so change your <artifactId>spark-core_2.11</artifactId> and all related dependencies with the Scala version encoded to read _2.10 and try that.
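A minimal sketch of what the changed dependencies might look like. The version comes from the question; including spark-mllib is an assumption on my part, since ALS.scala belongs to MLlib, and your pom may list different Spark modules.

```xml
<dependencies>
  <!-- The _2.10 suffix is the Scala binary version the artifact was built against -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
  </dependency>
  <!-- Assumed: MLlib, because the code in question is ALS.scala -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.6.1</version>
  </dependency>
</dependencies>
```

Whatever suffix you choose, every Scala-versioned artifact in the pom, and the Scala compiler version itself, should agree on it.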
Because of an incompatibility between a Scala 2.9.2 project and Java 8, I need to specify explicitly which JVM my Maven project uses.
Here is the pom.xml I wrote, following the documentation:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<name>${project.artifactId} ${project.version}</name>
<name>Scala snapshots repository</name>
<name>Scala repository</name>
I tried this without success: Maven continues to use my current JVM 8 rather than the one given in maven-compiler-plugin: <executable>/home/reyman/Logiciels/jdk1.7.0_80/bin/javac</executable>
How can I force Maven to use JVM 7 during mvn compile for my mixed Scala/Java project?
This worked for me:
Use something like this:
Modify it as needed.
Here is my complete build; I don't know if it will help you.
I'm pasting it here because it's too long for a comment:
I had the same problem trying to use minify-maven-plugin 1.7 with java6. It needs Java7.
The problem is that Maven needs to run with the proper JVM to use the plugins, so the only solution I found was to export JAVA_HOME before running maven:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_79
mvn jetty:run
If you need to do this frequently, you can add an alias to your .bashrc that sets JAVA_HOME only for that one Maven invocation, so there is nothing to restore afterwards:
alias mvn7='JAVA_HOME=/usr/lib/jvm/jdk1.7.0_79 mvn'
and then run, for example, mvn7 jetty:run
This worked for me, but I had to allow Eclipse (Mars) to adjust plugin configuration. I tested my pom.xml with an empty project and it seems to build without incident.
I am using jenv to manage my Java environment outside of Eclipse which is much easier when you have many Java versions installed.
Here is the pom.xml I am using:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<!--This plugin's configuration is used to store Eclipse m2e settings only. It has no influence on the Maven build itself.-->
<name>${project.artifactId} ${project.version}</name>
<name>Scala snapshots repository</name>
<name>Scala repository</name>
I am trying to get Cobertura to work on a really simple example project with Maven and Scala.
Here is my pom:
EDIT: Meanwhile, I found out that this pom is rather bad. If you're looking for a better example, see the one in the accepted answer.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<name>Scala-Tools Maven2 Repository</name>
<name>Scala-Tools Maven2 Repository</name>
<!-- Scalatest -->
<!-- disable surefire -->
<!-- enable scalatest -->
<filereports>WDF TestSuite.txt</filereports>
<!--enable cobertura-->
I disabled surefire, enabled Scalatest, and the tests do indeed get executed.
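For readers without the full pom: disabling Surefire and enabling Scalatest is usually configured roughly as follows. This is a sketch based on the ScalaTest Maven plugin documentation, not necessarily the exact configuration used here; the plugin versions and the reportsDirectory and junitxml values are assumptions, while the filereports value is the one from the pom above.

```xml
<!-- Disable Surefire so it does not try to run the tests itself -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.7</version>
  <configuration>
    <skipTests>true</skipTests>
  </configuration>
</plugin>
<!-- Let the Scalatest plugin run the tests during the test phase -->
<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <version>1.0</version>
  <configuration>
    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
    <junitxml>.</junitxml>
    <filereports>WDF TestSuite.txt</filereports>
  </configuration>
  <executions>
    <execution>
      <id>test</id>
      <goals>
        <goal>test</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```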
I added the Maven Cobertura plugin to the build and the reporting section of the pom. When I run
mvn clean install cobertura:cobertura -Dcobertura.report.format=xml
I do get a coverage report - which states that nothing is covered, coverage is 0%.
Now I tried various things: I played around with the executions part, I moved stuff from the build to the reporting section and vice versa, I tried different Maven goals. It was all in vain - either no report was created or it stated 0% coverage.
I even tried Scoverage! But with similar results.
So I guess I made some very basic mistake. Can anybody point me to it?
After a lot of trying out, I found the solution. The problem seems to have been the maven-scala-plugin dependency in the build section. But since there are so many things wrong with the pom shown above, I post here the new version, which works. (At least for the really tiny example project. With my real, bigger project, I ran into new problems with it.)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<!-- Scalatest -->
<!-- work-around for https://issues.scala-lang.org/browse/SI-8358 -->
<!-- disable Surefire -->
<!-- enable Scalatest -->
<!--enable SCoverage-->
Hope this helps someone!