How to run a Scala Spark IntelliJ IDEA project in the Mac terminal?

I am using the IntelliJ IDE for my Spark project written in Scala. The requirement is to run the program from the terminal, passing the CSV filename to the project as an argument.
My project structure is as follows:
Scaladatavalidaton:
  - src
    - main
      - scala
        - ValidationPackage
          - validator.scala
    - test
So what I am looking for is:
command_name Scalafilename_to_be_run CSVfilename_to_be_Validated
I tried using sbt run, but I got a "main class not detected" error even though I have specified the main object.
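For reference, a minimal sketch of what sbt needs in order to detect and run a main class; the package and object names here are taken from the structure above, and the validation body is a placeholder:

// src/main/scala/ValidationPackage/validator.scala (sketch)
package ValidationPackage

object validator {
  // sbt's run task detects this object because of the standard main signature
  def main(args: Array[String]): Unit = {
    val csvFile = args(0)            // CSV filename passed from the terminal
    println(s"Validating $csvFile")  // placeholder for the actual validation logic
  }
}

With that in place, the program can be started from the terminal with, for example, sbt "run myfile.csv", and args(0) receives the CSV filename. If sbt still cannot pick a main class (for instance, because there are several), pointing it at one explicitly in build.sbt with Compile / run / mainClass := Some("ValidationPackage.validator") usually resolves the "main class not detected" error.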

Related

How to set up different Scala versions on the same machine?

I want to follow the book on Scala[1], but it uses Scala 3 and I have Scala 2 installed. I want to use both versions, something along the lines of python2 and python3.
I tried installing Scala 3 on my local machine using the official source, but I could only get it working at the project level, inside a project directory. The sbt prompt does not work like a REPL would, and I can only open a REPL using Scala 2 (I checked the version every time).
How do I open the Scala 3 REPL, given that I cannot uninstall Scala 2?
The sbt prompt does not work like a REPL
If you execute sbt console from within a project directory, it will drop you into a REPL whose version corresponds to the project's scalaVersion. For example, executing sbt console within a project created with sbt new lampepfl/dotty.g8 would start a Scala 3 REPL.
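As a concrete sketch of that workflow (the directory name my-scala3-project is just whatever name you give the template when it prompts you):

sbt new lampepfl/dotty.g8     # create a Scala 3 project from the Dotty template
cd my-scala3-project          # hypothetical name entered at the template prompt
sbt console                   # opens a REPL matching the project's scalaVersion (Scala 3 here)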
but I could only grasp the project-level working directory
For a system-wide installation, first install coursier and then execute cs install scala3-repl. This will install the Scala 3 REPL alongside the Scala 2 one. The Scala 3 REPL can then be started with the scala3-repl command, whilst the Scala 2 REPL still starts with the plain scala command.
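Put together, the commands from this answer look like this (cs is the coursier launcher installed in the previous step):

cs install scala3-repl     # installs the Scala 3 REPL alongside the existing Scala 2 tools
scala3-repl                # starts the Scala 3 REPL
scala                      # still starts the Scala 2 REPL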

Running scala in cmd makes it look like I am missing 'build.sbt'

I'm trying to run Scala from my command line.
I checked my Java installation, went to the Scala website, downloaded and installed it, and updated my environment variables.
So far the only thing different from the guides online is that the folder where sbt is installed does not include a "lib" folder.
I then run the sbt command in my prompt, and I get a message.
It looks like I'm missing a file called build.sbt. What is this, and do I need it?
Edit:
If I press 'continue' at that prompt, I get
sbt:scalaproj>
which looks fine, but if I type some code, like this:
sbt:scalaproj> var a : Int = 12;
Then it returns errors:
[error] Expected ';'
[error] var a : Int = 12
What in the world is going wrong? Can someone point me to a guide for writing Scala in the prompt that is not too old to work?
Let's first understand the terminology. Scala is the language you are writing. sbt is an acronym for Scala Build Tool. Both of them have a REPL.
When you call sbt on the command line, you initiate the sbt REPL. The commands you can run there are the commands that sbt supports; you can find the common commands here. For example, if you run compile, it will compile the project described by the build.sbt located in the directory where you called the sbt command. Scala code WILL NOT work here: Scala expressions are not sbt commands.
In order to run the Scala REPL, you need to type console in the sbt REPL. You can find the Scala REPL documentation here. Within the Scala REPL you can run Scala code.
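A minimal sketch of that flow, reusing the snippet from the question (the prompt names are illustrative):

$ sbt
sbt:scalaproj> console
scala> var a : Int = 12
a: Int = 12

The var a : Int = 12 line fails at the sbt:scalaproj> prompt because sbt tries to parse it as a build command, but it works at the scala> prompt, which is the Scala REPL started by console.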
P.S.
You can find the Scala download page here.

Building a Customized Spark

We are creating a customized version of Spark, since we are changing some lines of code in ALS.scala. We build the customized Spark version using the mvn command:
./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn
However, upon using the customized version of Spark, we run into an error.
Do you have any idea what causes the error and how we might solve it?
I am actually using a jar file on the local machine, built with sbt (sbt compile, then sbt clean package) and placed here: /Users/user/local/kernel/kernel-0.1.5-SNAPSHOT/lib.
However, in the Hadoop environment the installation is different, so I use Maven to build Spark, and that's where the error comes in. I am thinking that this error might depend on using Maven to build Spark, as there are some reports like this:
https://issues.apache.org/jira/browse/SPARK-2075
or perhaps on how the Spark assembly files are built.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD

Please note that I am a better data miner than programmer.
I am trying to run examples from the book "Advanced Analytics with Spark" by Sandy Ryza (the code examples can be downloaded from "https://github.com/sryza/aas"),
and I run into the following problem.
When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD".
Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?
First, when I tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/product", but I solved it by setting the scala-lib dependency scope to compile in Maven.
I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7 and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15, and different versions of Java (1.7), Scala (2.10) and Spark, but with no success.
I am also using Windows 7.
My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
The examples in this book will show a --master argument to spark-shell, but you will need to specify arguments as appropriate for your environment. If you don't have Hadoop installed, you need to start spark-shell locally. To execute the samples, you can simply pass paths with a local file reference (file:///) rather than an HDFS reference (hdfs://).
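For example, inside spark-shell a dataset can be read with a local file reference like this (the paths are illustrative, not taken from the book):

val raw = sc.textFile("file:///home/user/data/records.csv")   // local filesystem, no Hadoop needed
// rather than sc.textFile("hdfs:///user/ds/records.csv")     // HDFS reference requiring a Hadoop setup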
The author suggests a hybrid development approach:
Keep the frontier of development in the REPL, and, as pieces of code
harden, move them over into a compiled library.
Hence the sample code is treated as a set of compiled libraries rather than a standalone application. You can make the compiled JAR available to spark-shell by passing it to the --jars property, while Maven is used for compiling and managing dependencies.
In the book the author describes how the simplesparkproject can be executed:
Use Maven to compile and package the project:
cd simplesparkproject/
mvn package
Start the spark-shell with the jar dependencies:
spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md
Then you can access your object within the spark-shell as follows:
val myApp = com.cloudera.datascience.MyApp
However, if you want to execute the sample code as a standalone application and run it within IDEA, you need to modify the pom.xml.
Some of the dependencies are required for compilation but are already available in a Spark runtime environment. Therefore these dependencies are marked with the scope provided in the pom.xml:
<!--<scope>provided</scope>-->
If you remove the provided scope, you will be able to run the samples within IDEA. But then you can no longer provide this jar as a dependency for the spark-shell.
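For illustration, a dependency entry along those lines might look roughly like this (the artifact and version are placeholders matching the Spark and Scala versions mentioned in the question, not the literal entries from the book's pom.xml):

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>1.6.1</version>
  <!-- provided: supplied by spark-shell/spark-submit at runtime;
       remove or comment this out to run as a standalone app from IDEA -->
  <scope>provided</scope>
</dependency>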
Note: I am using Maven 3.0.5 and Java 7+. I had problems with Maven 3.3.X and the plugin versions.

opencv 3.0.0 java imread_0 undefined

I am trying to develop an application in Scala using the Java OpenCV 3.0.0-beta bindings.
I am getting a runtime error:
java.lang.UnsatisfiedLinkError: java.lang.UnsatisfiedLinkError: org.opencv.imgcodecs.Imgcodecs.imread_1(Ljava/lang/String;)J
While researching the cause, I created the following simple application that exhibits similar behaviour:
import reflect._
import org.opencv.core.Core
import org.opencv.core.Mat
import org.opencv.core.CvType
import org.opencv.imgcodecs.Imgcodecs

object main extends Application {
  // Load the native OpenCV library before any call into the JNI bindings
  System.loadLibrary(Core.NATIVE_LIBRARY_NAME)
  val what = "something.png"
  // imread/imwrite are thin wrappers over native code; they throw
  // UnsatisfiedLinkError if the backing native library is not loaded
  val mat = Imgcodecs.imread(what)
  Imgcodecs.imwrite("something_else.png", mat)
}
The major difference is that, if run with "sbt run", it performs as expected; if the appropriate lines are taken from the above and run in the REPL, the code fails.
I suspect that this issue is related to the original issue, but I have no proof.
If I look at the memory map of the JVM, in both cases I have the expected libs loaded.
If I inspect the code, I find no definition of org.opencv.imgcodecs.Imgcodecs.imread_1.
I am quite lost as to where to go next in diagnosing this issue.
Has anyone come across this issue?
Thanks
I haven't used OpenCV 3.0 yet, as it has major changes and breaks OpenCV 2.4.x code. Are you supplying the library path to sbt run?
Add
javaOptions in run += "-Djava.library.path=lib/opencv/"
to your build.sbt file, or pass it on the command line:
sbt run -Djava.library.path=lib/opencv/
The opencv folder should contain the native library files that get generated along with your jar file.
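A minimal build.sbt sketch along those lines (lib/opencv/ is just the example path from above; note that javaOptions only takes effect for the run task when forking is enabled):

fork in run := true                                        // run in a separate JVM so javaOptions is applied
javaOptions in run += "-Djava.library.path=lib/opencv/"    // directory containing the OpenCV native library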
I have Java bindings for 2.4.9, 2.4.10 and 3.0.0, for Java 7 and 8, in this git repo if you need them:
git@gitlab.com:opencv/java_lib.git