Scala package with SBT - Can't find ".../immutable/Map" - scala

I've created a simple app that generates page views for later Spark tasks.
I have only one Scala file, and it uses a simple Map.
After creating a package with SBT, I run my class with this command:
java -cp .\target\scala-2.10\pageviewstream_2.10-1.0.0.jar "clickstream.PageViewGenerator"
but I receive this error:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/immutable/Map
What am I doing wrong?
Many thanks in advance
Roberto

To run it correctly you need to add the Scala runtime library to your classpath (note: the classpath separator is ; on Windows and : on Linux/macOS):
java -cp $SCALA_HOME/lib/scala-library.jar;.\target\scala-2.10\pageviewstream_2.10-1.0.0.jar "clickstream.PageViewGenerator"
But you can also run your application as:
scala -classpath .\target\scala-2.10\pageviewstream_2.10-1.0.0.jar "clickstream.PageViewGenerator"
when scala is already on your PATH,
or use sbt directly:
sbt "runMain clickstream.PageViewGenerator"
When clickstream.PageViewGenerator is your only application, it is enough to run:
sbt run
Or, when you are in sbt interactive mode, just type:
> runMain clickstream.PageViewGenerator
or, when it is the only application in your project:
> run
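For reference, a minimal build.sbt that would produce a jar named pageviewstream_2.10-1.0.0.jar might look like the sketch below (the original project's settings are not shown in the question, so this is an assumption):

// build.sbt - hypothetical minimal sketch
name := "pageviewstream"      // artifact becomes pageviewstream_2.10-1.0.0.jar
version := "1.0.0"
scalaVersion := "2.10.4"      // assumed 2.10.x, matching the scala-2.10 output directory

sbt resolves scala-library for compilation, but sbt package does not bundle it into the jar, which is why plain java -cp <your jar> fails with NoClassDefFoundError.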

Related

No such file or class on classpath: com.<name>.<abc>.<classname> when executing scala jar using command line

I am getting the error
No such file or class on classpath: com
when executing the Scala uber jar using the command below.
scala -classpath kafka-scala-1.0-SNAPSHOT.jar com.< name >.< abc >.KafkaAggregateConsumerApp
I am using scala 2.11.12
From the error, it looks like you might have a space after com in your command. Please re-check that.
Also, if you already have an uber jar you can run the jar directly using the java command.
java -cp kafka-scala-1.0-SNAPSHOT.jar com.foo.bar.KafkaAggregateConsumerApp
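As a quick illustration (package and object names are hypothetical, matching the placeholder above), the fully qualified name passed on the command line must exactly match the package declaration and the object that defines main, with no spaces:

// KafkaAggregateConsumerApp.scala - hypothetical sketch
package com.foo.bar

object KafkaAggregateConsumerApp {
  def main(args: Array[String]): Unit = {
    // the real consumer logic would go here
    println("consumer started")
  }
}

With that layout, java -cp kafka-scala-1.0-SNAPSHOT.jar com.foo.bar.KafkaAggregateConsumerApp can locate the class; a space anywhere inside the name splits it into separate arguments and the lookup fails.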

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD

Please note that I am a better data miner than programmer.
I am trying to run the examples from the book "Advanced Analytics with Spark" by Sandy Ryza (the code examples can be downloaded from "https://github.com/sryza/aas"),
and I run into the following problem.
When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD"
Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?
First, when I tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/product", but I solved it by setting the scala-library scope to compile in Maven.
I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7, and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15 and different versions of Java (1.7), Scala (2.10), and Spark, but with no success.
I am also using Windows 7.
My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
The examples in this book show a --master argument to spark-shell, but you will need to specify arguments appropriate for your environment. If you don't have Hadoop installed, you need to start spark-shell locally. To execute the samples you can simply pass paths as local file references (file:///) rather than HDFS references (hdfs://).
The author suggests a hybrid development approach:
Keep the frontier of development in the REPL, and, as pieces of code
harden, move them over into a compiled library.
Hence the sample code is treated as a set of compiled libraries rather than standalone applications. You can make the compiled JAR available to spark-shell by passing it via the --jars option, while Maven is used for compiling and managing dependencies.
In the book the author describes how the simplesparkproject can be executed:
Use Maven to compile and package the project:
cd simplesparkproject/
mvn package
Start spark-shell with the jar as a dependency:
spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md
Then you can access your object within spark-shell as follows:
val myApp = com.cloudera.datascience.MyApp
However, if you want to execute the sample code as a standalone application and run it within IDEA, you need to modify the pom.xml.
Some of the dependencies are required for compilation but are already available in a Spark runtime environment. These dependencies are therefore marked with scope provided in the pom.xml:
<!--<scope>provided</scope>-->
If you remove (or comment out) the provided scope, you will be able to run the samples within IDEA, but then you can no longer supply this jar as a dependency for spark-shell.
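If you do go the standalone route, a minimal driver you could run from IDEA might look like the sketch below (object and file names are hypothetical; it assumes Spark 1.6.x is on the compile classpath, i.e. with the provided scope removed):

// StandaloneExample.scala - hypothetical sketch, not part of the book's code
import org.apache.spark.{SparkConf, SparkContext}

object StandaloneExample {
  def main(args: Array[String]): Unit = {
    // run locally with 2 threads so no cluster or Hadoop installation is needed
    val conf = new SparkConf().setAppName("StandaloneExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // read a local file (file:///) instead of an HDFS path (hdfs://)
    val lines = sc.textFile("file:///path/to/README.md")
    println(s"Line count: ${lines.count()}")
    sc.stop()
  }
}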
Note: I used Maven 3.0.5 and Java 7+. I had problems with the plugin versions under Maven 3.3.x.

Build an executable jar using SBT

I have a simple Scala command line App that I want to package using SBT.
object Transform extends App {
  val source = scala.io.Source.fromFile(args(0))
  ...
}
I can't seem to find anything in the SBT docs or an online example of an SBT configuration/command that would allow me to create a standalone executable jar (java -jar ...) with the appropriate manifest and dependencies included.
I did find sbt-assembly, but it looks to be a plugin for SBT < 0.13.5.
sbt-onejar was created for exactly this use case.
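If you instead go with the sbt-assembly plugin mentioned in the question (it also supports newer SBT versions), a minimal setup might look like this sketch (the plugin version is an assumption):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt
mainClass in assembly := Some("Transform")   // puts Main-Class: Transform into the manifest

Running sbt assembly then produces a fat jar under target/scala-*/ that can be started with java -jar.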

Running Spark sbt project without sbt?

I have a Spark project which I can run from sbt console. However, when I try to run it from the command line, I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext. This is expected, because the Spark libs are listed as provided in the build.sbt.
How do I configure things so that I can run the JAR from the command line, without having to use sbt console?
To run Spark standalone you need to build the Spark assembly.
Run sbt/sbt assembly in the Spark root dir. This will create: assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
Then build your job jar with its dependencies (either with sbt-assembly or the maven-shade-plugin).
You can use the resulting binaries to run your spark job from the command line:
ADD_JARS=job-jar-with-dependencies.jar SPARK_LOCAL_IP=<IP> java -cp spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar:job-jar-with-dependencies.jar com.example.jobs.SparkJob
Note: If you need a different HDFS version, you have to follow additional steps before building the assembly. See "About Hadoop Versions".
Alternatively, using the sbt-assembly plugin we can create a single jar. After doing that you can simply run it with the java -jar command.
For more details, refer to the sbt-assembly documentation.
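Tying this back to the question's build.sbt: a minimal sketch of the provided Spark dependency might look like this (version numbers are assumptions). Dependencies marked provided are available at compile time but are excluded from the assembled jar, which is why plain java cannot find SparkContext unless the Spark assembly is also on the classpath:

// build.sbt - hypothetical sketch
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

With sbt-assembly, the provided dependency is left out of the fat jar; dropping the provided qualifier (or adding the Spark assembly jar to -cp, as shown above) makes the class available at run time.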

Run junit4 test from cmd

I tried to run a JUnit 4 test case from the command line using:
java -cp junit-4.8.1.jar;test\Dijkstra;test\Dijkstra\bin org.junit.runner.JUnitCore Data0PathTest00
but I got the following error:
java.lang.NoClassDefFoundError: graph/shortestgraphpath;
while the test case works without any problems in Eclipse.
Hint: in Eclipse, shortestgraphpath was added to Referenced Libraries.
You need to add the jar file containing shortestgraphpath to the Java classpath.
java -cp junit-4.8.1.jar;test\Dijkstra;test\Dijkstra\bin org.junit.runner.JUnitCore Data0PathTest00
The classpath is the value that you pass to java with -cp, so in your question you only supply JUnit and your compiled classes.
Try updating it with the jar file containing the missing class:
java -cp junit-4.8.1.jar;<path to jar file>;test\Dijkstra;test\Dijkstra\bin org.junit.runner.JUnitCore Data0PathTest00
You might have to add additional jar files as well. I recommend that you take a look at a build tool to help you build and run your Java applications, for example Maven, Gradle, or Buildr.