Running scala code using java -jar <jarfile> - scala

I am trying to run Scala code using java -jar <jarfile> and I am getting the issue below.
ERROR:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataOutputStream
    at com.cargill.finance.cdp.blackline.Ingest.main(Ingest.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataOutputStream
The same code runs fine with spark-submit.
I am trying to write data to an HDFS file.
I have imported the classes below:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FSDataOutputStream
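The write itself looks roughly like this (the output path and the payload here are just placeholders, not the real job):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FSDataOutputStream, Path}

object HdfsWriteSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()                 // picks up core-site.xml / hdfs-site.xml from the classpath
    val fs: FileSystem = FileSystem.get(conf)
    val out: FSDataOutputStream = fs.create(new Path("/tmp/example.txt"))  // placeholder path
    out.writeBytes("hello hdfs\n")
    out.close()
    fs.close()
  }
}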

You need to add all dependencies (including transitive dependencies, i.e. dependencies of dependencies) to the -cp argument. If you just look at the direct dependencies of hadoop-core you'll see why you should never do this manually. Instead, use a build system. If you followed e.g. https://spark.apache.org/docs/latest/quick-start.html, it actually sets up SBT, so you can do sbt run to run the main class (like java -cp <lots of libraries> -jar <jarfile> would). If you didn't, add a build.sbt as described there.
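As a rough sketch, a minimal build.sbt could look like the following (the project name and version numbers are assumptions; use whatever matches your cluster):

name := "ingest"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "2.4.8",
  "org.apache.hadoop"  % "hadoop-client" % "2.7.7"
)

With something like that in place, sbt run resolves hadoop-client and all of its transitive dependencies, which is exactly what a hand-written -cp misses.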

Related

Class org.apache.hadoop.fs.s3a.S3AFileSystem not found on spark-scala-s3 using build.sbt, failing at reading file on S3

Trying to read a file on an S3 bucket from a Spark and Scala program, but it fails at the step that reads the file from S3, with the error below.
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found.
But the same code works with sbt on a Linux machine with the steps below:
project_dir# sbt clean
project_dir# sbt compile
project_dir# sbt run
If a jar is created for the same code, that jar does not work when it is executed using spark-submit:
project_dir# sbt package
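The failing read is essentially something like the following (the bucket, key and SparkSession setup here are assumptions, not the actual program):

import org.apache.spark.sql.SparkSession

object S3ReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-read")
      .master("local[*]")
      .getOrCreate()

    // the s3a:// scheme is what makes Hadoop look up org.apache.hadoop.fs.s3a.S3AFileSystem
    val lines = spark.read.textFile("s3a://some-bucket/some/key.txt")
    lines.show(5)

    spark.stop()
  }
}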

scala play unmanaged jar added but import not working

I'm trying to add Jep to a Scala (2.11.8) Play Framework (2.5.8) project of mine.
As far as I can tell, sbt can see the unmanaged jar:
[play-scala] $ show unmanagedClasspath
[info] List(Attributed(/home/stondo/dev/git/play-dashboard-mongo/lib/jep.cpython-35m-x86_64-linux-gnu.so), Attributed(/home/stondo/dev/git/play-dashboard-mongo/lib/libjep.so), Attributed(/home/stondo/dev/git/play-dashboard-mongo/lib/jep-3.6.0.jar))
but when I run a very simple test it fails:
[error] cannot create an instance for class IntegrationSpec
...
[error] CAUSED BY java.lang.UnsatisfiedLinkError: no jep in java.library.path
...
Let me mention that running scala -cp /path/to/myjar and then importing Jep works:
scala -cp ./lib/jep-3.6.0.jar
scala> import jep.Jep
import jep.Jep
Any ideas about what's going on?
Thanks in advance
It's not a problem of the import not working; it's a problem of not being able to load the native library. Unlike Java libraries, native libraries (jep.cpython-35m-x86_64-linux-gnu.so) must be put in a directory listed either in the PATH environment variable or in the "java.library.path" system property.
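One way to do that for a forked sbt/Play run is something along these lines in build.sbt (a sketch only; it assumes the .so files stay in the project's lib directory):

fork in run  := true   // javaOptions are only honoured by a forked JVM
fork in Test := true

javaOptions ++= Seq(
  s"-Djava.library.path=${baseDirectory.value / "lib"}"
)

Alternatively, exporting the directory in the environment (e.g. LD_LIBRARY_PATH on Linux) before starting sbt achieves the same thing.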

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD

Please note that I am a better data miner than programmer.
I am trying to run examples from the book "Advanced Analytics with Spark" by author Sandy Ryza (the code examples can be downloaded from https://github.com/sryza/aas),
and I run into the following problem.
When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD".
Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?
When I first tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product", but I solved it by setting scala-lib to compile in Maven.
I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7 and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15 and different versions of Java (1.7), Scala (2.10) and Spark, but with no success.
I am also using Windows 7.
My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
The examples in this book will show a --master argument to spark-shell, but you will need to specify arguments as appropriate for your environment. If you don't have Hadoop installed, you need to start the spark-shell locally. To execute the samples you can simply pass paths as local file references (file:///), rather than HDFS references (hdfs://).
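For example, without a Hadoop installation a local session could look roughly like this (the data path is just a placeholder):

spark-shell --master local[*]
scala> val raw = sc.textFile("file:///path/to/some/local/file.csv")
scala> raw.take(5).foreach(println)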
The author suggests a hybrid development approach:
Keep the frontier of development in the REPL, and, as pieces of code
harden, move them over into a compiled library.
Hence the sample code is treated as a compiled library rather than a standalone application. You can make the compiled JAR available to spark-shell by passing it to the --jars option, while Maven is used for compiling and managing dependencies.
In the book the author describes how the simplesparkproject can be executed:
use maven to compile and package the project
cd simplesparkproject/
mvn package
start the spark-shell with the jar dependencies
spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md
Then you can access your object within the spark-shell as follows:
val myApp = com.cloudera.datascience.MyApp
However, if you want to execute the sample code as a standalone application and run it within IDEA, you need to modify the pom.xml.
Some of the dependencies are required for compilation but are already available in a Spark runtime environment. Therefore these dependencies are marked with the provided scope in the pom.xml:
<!--<scope>provided</scope>-->
If you comment out the provided scope (as shown above), you will be able to run the samples within IDEA, but you can no longer provide this jar as a dependency for the spark-shell.
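With the provided scope commented out, a standalone entry point run from IDEA would look roughly like the sketch below (the object name and the input path are made up, not the book's code):

import org.apache.spark.{SparkConf, SparkContext}

object StandaloneSample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("aas-sample").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val lines = sc.textFile("file:///path/to/local/data.csv")
    println(s"line count: ${lines.count()}")

    sc.stop()
  }
}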
Note: I used Maven 3.0.5 and Java 7+. I had problems with the plugin versions on Maven 3.3.X.

Scala package with SBT - Can't find ".../immutable/Map"

I've created a simple app that generates PageViews for later Spark tasks.
I have only one Scala file that uses a simple Map.
When I create a package with SBT, I run my class with the command:
java -cp .\target\scala-2.10\pageviewstream_2.10-1.0.0.jar "clickstream.PageViewGenerator"
but I receive this error:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/immutable/Map
What I am doing wrong?
Many thanks in advance
Roberto
To run it correctly you need to add the Scala runtime library to your classpath:
java -cp $SCALA_HOME/lib/scala-library.jar;.\target\scala-2.10\pageviewstream_2.10-1.0.0.jar "clickstream.PageViewGenerator"
But... you can also run your application as:
scala -classpath .\target\scala-2.10\pageviewstream_2.10-1.0.0.jar "clickstream.PageViewGenerator"
when you already have scala on your PATH,
or use sbt directly as:
sbt "runMain clickstream.PageViewGenerator"
When clickstream.PageViewGenerator is your only application, it is enough to run:
sbt run
or when you are in sbt interactive mode just type:
> runMain clickstream.PageViewGenerator
or, when it is the only application in your project, it is enough to run:
> run
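All of the sbt variants above assume a build.sbt roughly along these lines (reconstructed from the jar name pageviewstream_2.10-1.0.0.jar, not the asker's actual file):

name := "pageviewstream"
version := "1.0.0"
scalaVersion := "2.10.6"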

Scala RHEL jar importing issue

I was trying to run a Scala file using the command:
scala myclass.scala
However, it complains about one of the imported libraries. I included the jar using the -classpath option like this:
scala -class ncscala-time.jar myclass.scala
The error I got is:
myclass.scala:5: error: object github is not a member of package com
import com.github.nscala_time.time.Imports._
Any idea why?