How to compile and build Spark examples into a jar? - scala

So I am editing MovieLensALS.scala, and I just want to recompile the examples jar with my modified MovieLensALS.scala.
I used build/mvn -pl :spark-examples_2.10 compile followed by build/mvn -pl :spark-examples_2.10 package, which both finish normally. I have SPARK_PREPEND_CLASSES=1 set.
But when I re-run MovieLensALS using bin/spark-submit --class org.apache.spark.examples.mllib.MovieLensALS examples/target/scala-2.10/spark-examples-1.4.0-hadoop2.4.0.jar --rank 5 --numIterations 20 --lambda 1.0 --kryo data/mllib/sample_movielens_data.txt I get a java.lang.StackOverflowError, even though all I added to MovieLensALS.scala is a println saying that this is the modified file, with no other modifications whatsoever.
My Scala version is 2.11.8 and my Spark version is 1.4.0, and I am following the discussion in this thread to do what I am doing.
Help will be appreciated.

So I ended up figuring it out myself. I compiled using mvn compile -rf :spark-examples_2.10 followed by mvn package -rf :spark-examples_2.10 to generate the .jar file. Note that the jar file produced here is spark-examples-1.4.0-hadoop2.2.0.jar.
The StackOverflowError, on the other hand, was caused by a long RDD lineage. To fix it I could either use checkpointing or reduce numIterations; I did the latter. I followed this for it.
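For reference, a minimal sketch of the checkpointing alternative, assuming the MLlib ALS API of Spark 1.4 (the checkpoint directory and interval below are placeholders I chose, not values from the original run):
import org.apache.spark.mllib.recommendation.{ALS, Rating}
// Checkpointing periodically truncates the RDD lineage that otherwise grows
// with every iteration and can eventually overflow the stack.
sc.setCheckpointDir("/tmp/spark-checkpoints")  // placeholder path
val als = new ALS()
  .setRank(5)
  .setIterations(20)
  .setLambda(1.0)
  .setCheckpointInterval(10)  // checkpoint every 10 iterations
val model = als.run(ratings)  // ratings: RDD[Rating], as built in MovieLensALS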

Related

Building Customized Spark

We are creating a customized version of Spark, since we are changing some lines of code in ALS.scala. We built the customized Spark version using the mvn command:
./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn
However, upon using the customized version of Spark, we run into this error:
Do you have any idea what causes the error and how we might solve the issue?
I am actually using a jar file on the local machine, built using sbt (sbt compile then sbt clean package) and placed here: /Users/user/local/kernel/kernel-0.1.5-SNAPSHOT/lib.
However, in the Hadoop environment the installation is different, so I use Maven to build Spark, and that is where the error comes in. I am thinking that this error might depend on using Maven to build Spark, as there are some reports like this:
https://issues.apache.org/jira/browse/SPARK-2075
or perhaps on how the Spark assembly files are built.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD

Please note that I am a better data miner than programmer.
I am trying to run the examples from the book "Advanced Analytics with Spark" by Sandy Ryza (the code examples can be downloaded from https://github.com/sryza/aas), and I run into the following problem.
When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD".
Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?
When I first tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product", but I solved it by setting the scala-library dependency's scope to compile in Maven.
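For reference, a sketch of what that pom.xml entry looks like (the version is just the one from the setup described below):
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.11.7</version>
  <scope>compile</scope>  <!-- compile scope puts scala-library on the runtime classpath -->
</dependency>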
I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7 and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15, and different versions of Java (1.7), Scala (2.10) and Spark, but with no success.
I am also using Windows 7.
My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
The examples in this book show a --master argument to spark-shell, but you will need to specify arguments as appropriate for your environment. If you don't have Hadoop installed, you need to start spark-shell locally. To execute the samples you can simply pass paths as local file references (file:///) rather than HDFS references (hdfs://).
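For example, a minimal local session might look like this (the data path is a placeholder):
spark-shell --master local[2]
scala> val rawData = sc.textFile("file:///path/to/data.csv")  // local file, not hdfs://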
The author suggests a hybrid development approach:
Keep the frontier of development in the REPL, and, as pieces of code
harden, move them over into a compiled library.
Hence the sample code is treated as a compiled library rather than a standalone application. You can make the compiled JAR available to spark-shell by passing it via the --jars option, while Maven is used for compiling and managing dependencies.
In the book the author describes how the simplesparkproject can be executed:
Use Maven to compile and package the project:
cd simplesparkproject/
mvn package
Then start the spark-shell with the jar as a dependency:
spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md
Then you can access your object within the spark-shell as follows:
val myApp = com.cloudera.datascience.MyApp
However, if you want to execute the sample code as a standalone application and run it within IDEA, you need to modify the pom.xml.
Some of the dependencies are required for compilation but are already available in a Spark runtime environment. Therefore these dependencies are marked with scope provided in the pom.xml:
<!--<scope>provided</scope>-->
If you comment out the provided scope as shown above, you will be able to run the samples within IDEA. But then you can no longer provide this jar as a dependency for the spark-shell.
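In context, such a dependency entry looks roughly like this (the artifact and version are illustrative, not copied from the book's pom.xml):
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.1</version>
  <!--<scope>provided</scope>-->  <!-- commented out so IDEA can run the samples -->
</dependency>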
Note: I am using Maven 3.0.5 and Java 7+. I had problems with the Maven 3.3.X versions and the plugin versions.

Run Spark in standalone mode with Scala 2.11?

I follow the instructions to build Spark with Scala 2.11:
mvn -Dscala-2.11 -DskipTests clean package
Then I launch per instructions:
./sbin/start-master.sh
It fails with two lines in the log file:
Failed to find Spark assembly in /etc/spark-1.2.1/assembly/target/scala-2.10
You need to build Spark before running this program.
Obviously, it's looking for a Scala 2.10 build, but I did a Scala 2.11 build. I tried the obvious -Dscala-2.11 flag, but that didn't change anything. The docs don't mention anything about how to run in standalone mode with Scala 2.11.
Thanks in advance!
Before building, you must run the script:
dev/change-version-to-2.11.sh
which replaces the references to 2.10 with 2.11.
Note that this will not necessarily work as intended with non-GNU sed (e.g. on OS X).
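Putting it together, the full sequence (same flags as in the question) is:
dev/change-version-to-2.11.sh
mvn -Dscala-2.11 -DskipTests clean package
./sbin/start-master.sh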

How to Compile Apache Spark with Scala 2.11.1 using SBT?

I've been trying to compile Apache Spark with Scala 2.11.1 (the latest version at the time). However, each time I try, everything ends up compiled against Scala 2.10.*. I don't understand why.
The official documentation suggests that we use Maven for compilation after switching to 2.11 using the script in the dev/ folder.
What if I wanted to use sbt instead?
You need to enable the scala-2.11 profile:
>sbt -Dscala-2.11=true
sbt> compile

Running Scala^Z3 with Scala 2.10

I installed Scala^Z3 on Mac OS X (Mountain Lion, JDK 7, Scala 2.10, Z3 4.3) successfully (following this: http://lara.epfl.ch/w/ScalaZ3). Everything went fine, except that I cannot run any example from this website (http://lara.epfl.ch/w/jniz3-scala-examples) without getting this nasty error:
java.lang.NoClassDefFoundError: scala/reflect/ClassManifest
at .<init>(<console>:8)
at .<clinit>(<console>)
at .<init>(<console>:7)
...
Caused by: java.lang.ClassNotFoundException: scala.reflect.ClassManifest
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 29 more
I think this happens because of an incompatibility between Scala 2.9.x and 2.10.x in handling reflection, since I was able to run the same set of examples under Scala 2.9.x. My question is: is there any way to work around this and run Scala^Z3 under Scala 2.10?
From looking at the project properties and build file (https://github.com/psuter/ScalaZ3/blob/master/project/build.properties and https://github.com/psuter/ScalaZ3/blob/master/project/build/scalaZ3.scala) I infer that ScalaZ3 is currently provided for Scala 2.9.2 only. There is no cross-version support at the moment.
You might try to get the code and compile it yourself after changing the version to Scala 2.10.0 in the build.properties file.
See this page for instructions on how to compile it: https://github.com/psuter/ScalaZ3.
If you're lucky, the code will compile as is under Scala 2.10. If you're not, there might be some small fixes to do. Cross your fingers.
If you are not in a hurry, you could also bug the Scala^Z3 authors and ask them for a Scala 2.10 version of the library.
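As an example of the kind of small fix to expect (a typical 2.10 migration, not something confirmed for this codebase): Scala 2.10 deprecated scala.reflect.ClassManifest in favor of scala.reflect.ClassTag, so context bounds like the following may need updating:
// Scala 2.9.x style, deprecated in 2.10:
def mkArray[T: ClassManifest](xs: T*): Array[T] = xs.toArray
// Scala 2.10 replacement:
import scala.reflect.ClassTag
def mkArrayNew[T: ClassTag](xs: T*): Array[T] = xs.toArray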
I'm copying the instructions from my response to your issue on GitHub, as it may help someone in the future.
The current status is that the old sbt project does not seem to mix well with Scala 2.10. Here are the instructions for a "manual" compilation of the project, for Linux. This works for me with Z3 4.3 (grabbed from the Z3 git repo) and Scala 2.10. After installing Z3 per the original instructions:
First compile the Java files:
$ mkdir bin
$ find src/main -name '*.java' -exec javac -d bin {} +
Then compile the C files. For this, you need to generate the JNI headers first, then compile the shared library. The options in the commands below are for Linux. To find out where the JNI headers are, I run (new java.io.File(System.getProperty("java.home"))).getParent in a Scala console (and add /include to the result).
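For example, on the machine used in the commands below, this returns the JVM directory whose include subdirectory appears in the gcc flags:
scala> (new java.io.File(System.getProperty("java.home"))).getParent
res0: String = /usr/lib/jvm/java-6-sun-1.6.0.26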
$ javah -classpath bin -d src/c z3.Z3Wrapper
$ gcc -o lib-bin/libscalaz3.so -shared -Wl,-soname,libscalaz3.so \
-I/usr/lib/jvm/java-6-sun-1.6.0.26/include \
-I/usr/lib/jvm/java-6-sun-1.6.0.26/include/linux \
-Iz3/4.3/include -Lz3/4.3/lib \
-g -lc -Wl,--no-as-needed -Wl,--copy-dt-needed -lz3 -fPIC -O2 -fopenmp \
src/c/*.[ch]
Now compile the Scala files:
$ find src/main -name '*.scala' -exec scalac -classpath bin -d bin {} +
You'll get "feature warnings", which is typical when moving to 2.10, and another warning about a non-exhaustive pattern match.
Now let's make a jar file out of everything...
$ cd bin
$ jar cvf scalaz3.jar z3
$ cd ..
$ jar uf bin/scalaz3.jar lib-bin/libscalaz3.so
...and now you should have bin/scalaz3.jar containing everything you need. Let's try it out:
$ export LD_LIBRARY_PATH=z3/4.3/lib
$ scala -cp bin/scalaz3.jar
scala> z3.scala.version
Hope this helps!
This does not directly answer the question, but it might help others trying to build ScalaZ3 with Scala 2.10.
I built ScalaZ3 with Scala 2.10.1 and Z3 4.3.0 on Windows 7. I tested it with the integer constraints example at http://lara.epfl.ch/w/jniz3-scala-examples and it is working fine.
Building Z3
The Z3 4.3.0 download at CodePlex does not include the libZ3.lib file, so I had to download the source and build it on my machine. The build process is quite simple.
Building ScalaZ3
Currently, build.properties has sbt version 0.7.4 and Scala version 2.9.2. This builds fine. (I had to make some minor modifications to the build.scala file: modify z3LibPath(z3VN).absolutePath to z3LibPath(z3VN).absolutePath + "\\libz3.lib" in the gcc task.)
Now if I change the Scala version to 2.10.1 in build.properties, I get an "Error compiling sbt component 'compiler-interface'" error on launching sbt. I have no clue why this happens.
I then changed the sbt version to 0.12.2 and the Scala version to 2.10.1, and started from fresh source. I also added a build.sbt in the project root folder containing scalaVersion := "2.10.1". This is required since in sbt 0.12.2 the build.properties file is supposed to specify only the sbt version. More info about the sbt version differences is at https://github.com/harrah/xsbt/wiki/Migrating-from-SBT-0.7.x-to-0.10.x.
I then get the error Z3Wrapper.java:27: cannot find symbol LibraryChecksum. This happens because the file LibraryChecksum.java, which is supposed to be generated by the build (project\build\build.scala), is not generated. It looks like the package task does not execute the tasks in project\build\build.scala: the tasks compute-checksum, javah, and gcc are not executed. This may be happening because sbt 0.12.2 expects the build.scala file to be directly under the project folder.
I then copied the LibraryChecksum.java generated by the previous build, and the build then goes through. The generated jar file, however, does not contain scalaz3.dll.
I then executed the javah and gcc tasks manually. The commands for these tasks can be copied from the log of a successful build with Scala 2.9.2 (I made appropriate modifications to the commands for Scala 2.10.1). Here too I had to make some changes: I had to explicitly add the full path of scala-library.jar to the classpath of the javah task.
I then added the lib-bin\scalaz3.dll to the jar file using jar uf target\scala-2.10\scalaz3.jar lib-bin/scalaz3.dll.