Scala REPL Remove Dependency

I am using the Spark-Shell, which is essentially the Scala REPL with some dependencies and setup procedures added in. The problem with some of the code I run through it is that certain dependencies are already loaded. Is there a way to remove the existing dependencies so I can add the newer ones I want?
I can easily add new ones with the :cp <jar> command, but that does not seem to override the version that is already there.

You could try writing your own Spark-Shell, which would give you fine-grained control over your dependencies. Spark-Shell is essentially sbt console with a few initialCommands in a build.sbt.
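As a rough sketch (the Spark version, artifact names, and settings below are assumptions, not copied from Spark's actual build), a build.sbt like this gives you a REPL where every dependency is yours to change:

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql"  % "2.1.0"
)

// Run automatically when `sbt console` starts, so the REPL comes up with a SparkSession ready.
initialCommands in console := """
  import org.apache.spark.sql.SparkSession
  val spark = SparkSession.builder.master("local[*]").appName("console").getOrCreate()
  import spark.implicits._
"""

Swapping a dependency for a newer version is then just an edit to libraryDependencies followed by a reload, instead of fighting with :cp.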

Related

Compile a Scala/Spark file to a jar file

I'm working on a frequent-itemset-mining project using the FP-Growth algorithm, and I rely on the version implemented in Scala/Spark:
https://github.com/apache/spark/blob/v2.1.0/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala
I need to modify this code and recompile it into a jar file that I can load into spark-shell and whose functions I can call from Spark.
The problem is that spark-shell is an interpreter, and it reports errors in this file. I have tried sbt with Eclipse, but it did not succeed.
What I need is a compiler that can use the latest version of the Scala and spark-shell libraries to compile this file into a jar file.
Got your question now!
All you need to do is add the dependency jars (Scala, Java, etc.) appropriate to the machine where you are going to use your own jar. Then pass the jar to spark-shell, and you can use it as shown below:
spark-shell --jars your_jar.jar
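Once the jar is on the classpath, you can call into it from the shell like any other library. A minimal sketch, assuming your recompiled class is my.fpm.MyFPGrowth and mirrors the stock mllib FPGrowth API (both the package and the class name here are made up for illustration):

import my.fpm.MyFPGrowth  // hypothetical class from your_jar.jar

val transactions = sc.parallelize(Seq(
  Array("a", "b", "c"),
  Array("a", "b"),
  Array("b", "c")
))

// Same call pattern as org.apache.spark.mllib.fpm.FPGrowth, but resolved from your own jar.
val model = new MyFPGrowth().setMinSupport(0.5).run(transactions)
model.freqItemsets.collect().foreach(println)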
Follow these steps:
check out the Spark repository
modify the files you want to modify
build the project
run the ./dev/make-distribution.sh script, which is inside the Spark repository
run Spark Shell from your Spark distribution

How to execute a task/command right after the project is loaded in sbt shell

I would like to execute a task/command every time I enter the sbt shell. Is there any init-task or init-command setting? Is there any other way?
Do you use *.sbt or *.scala files to define your sbt project?
With Scala files it is supposed to be simple. During sbt startup, all the code in the project-definition Scala classes is compiled and executed, so basically what you can do is define the command (function) you want to execute directly inside the class/object where your project is defined.
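A sketch with an old-style project/Build.scala (sbt 0.12/0.13 era; the object name and the println are just placeholders):

import sbt._
import Keys._

object MyBuild extends Build {
  // Top-level code in the build definition runs while sbt loads the project,
  // so any side effect placed here executes on every startup.
  println("project definition loaded")

  lazy val root = Project("root", file("."))
}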
Option 2, based on the docs:
http://www.scala-sbt.org/0.12.2/docs/faq.html#how-can-i-take-action-when-the-project-is-loaded-or-unloaded
Pay attention to the onLoad setting key.
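A hedged sketch of that approach in build.sbt, using sbt 0.13-style syntax and a hypothetical task named hello:

lazy val hello = taskKey[Unit]("Runs every time the project is loaded")

hello := println("hello from sbt startup")

// Push the hello command onto sbt's command queue once the project has loaded.
onLoad in Global := (onLoad in Global).value andThen { state =>
  "hello" :: state
}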

Is Scala installed multiple times if using Scala IDE, Scala on the command line, and SBT?

As far as I understand, I have multiple versions/installs of Scala to be able to access it via Eclipse, bash/OS-X shell, and for SBT:
one version of Scala as supplied with the Scala IDE;
the Scala binaries to be able to run it from within a shell; and,
Scala as part of SBT.
Is my understanding correct? If so, is there any way to run with just the one version/install for all uses?
Is my understanding correct?
No. You don't "install" Scala. You just have multiple copies of the executable jars for scala-compiler, scala-library, etc. The version you have on your PATH is the one that looks installed, but it is nothing more than running a jar file.
To run on a specific version, just add the Scala jars to the classpath of your project. If you are using SBT, you can specify the scalaVersion in your build.sbt and it will add the proper jars to the classpath.
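For example, a minimal build.sbt (the version number is just an illustration):

// sbt resolves the matching scala-compiler/scala-library jars itself and
// puts them on the classpath; nothing has to be installed system-wide.
scalaVersion := "2.11.8"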

Scala dependency on Spark installation

I am just getting started with Spark, so I downloaded the "for Hadoop 1 (HDP1, CDH3)" binaries from here and extracted them on an Ubuntu VM. Without installing Scala, I was able to execute the examples in the Quick Start guide from the Spark interactive shell.
Does Spark come included with Scala? If yes, where are the libraries/binaries?
For running Spark in other modes (distributed), do I need to install Scala on all the nodes?
As a side note, I observed that Spark has some of the best documentation among open source projects.
Does Spark come included with Scala? If yes, where are the libraries/binaries?
The project configuration is placed in the project/ folder. In my case it is:
$ ls project/
build.properties plugins.sbt project SparkBuild.scala target
When you do sbt/sbt assembly, it downloads the appropriate version of Scala along with the other project dependencies. Check out the target/ folder, for example:
$ ls target/
scala-2.9.2 streams
Note that Scala version is 2.9.2 for me.
For running Spark in other modes (distributed), do I need to install Scala on all the nodes?
Yes. You can create a single assembly jar as described in the Spark documentation:
If your code depends on other projects, you will need to ensure they are also present on the slave nodes. A popular approach is to create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark itself as a provided dependency; it need not be bundled since it is already present on the slaves. Once you have an assembled jar, add it to the SparkContext as shown here. It is also possible to submit your dependent jars one-by-one when creating a SparkContext.
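With sbt and the sbt-assembly plugin, that usually looks something like this sketch (the Scala and Spark versions below just mirror the ones mentioned in this thread and are not a recommendation):

scalaVersion := "2.9.3"

libraryDependencies ++= Seq(
  // Spark is "provided": it is already on the cluster, so it is not bundled into the assembly jar.
  "org.apache.spark" % "spark-core_2.9.3" % "0.8.0-incubating" % "provided",
  // An ordinary dependency like this one WILL be bundled into the assembly jar.
  "joda-time" % "joda-time" % "2.3"
)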
Praveen -
I checked the fat master jar just now:
/SPARK_HOME/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar
This jar includes all the Scala binaries plus the Spark binaries.
You are able to run because this file is added to your CLASSPATH when you run spark-shell.
Check here: run spark-shell > http://<machine>:4040 > Environment > Classpath Entries.
If you downloaded a pre-built Spark, then you don't need to have Scala on the nodes; having this file on the CLASSPATH of the nodes is enough.
Note: I deleted the last answer I posted, because it may mislead someone. Sorry :)
You do need Scala to be available on all nodes. However, with the binary distribution via make-distribution.sh, there is no longer a need to install Scala on all nodes. Keep in mind the distinction between installing Scala, which is necessary to run the REPL, and merely packaging Scala as just another jar file.
Also, as mentioned in the file:
# The distribution contains fat (assembly) jars that include the Scala library,
# so it is completely self contained.
# It does not contain source or *.class files.
So Scala does indeed come along for the ride when you use make-distribution.sh.
From Spark 1.1 onwards, there is no SparkBuild.scala.
You have to make your changes in pom.xml and build using Maven.

How to make classes from a Scala jar library of mine accessible in Scala console and Scala scripts?

I just wonder how I can extend the Scala console and "script" runner with my own classes, so that I can actually use my code by talking to it in the Scala language itself. Where should I put my jars so that they can be seamlessly accessed from every Scala instance without ad-hoc configuration?
If you just need to interact with your code, you can add a -classpath option to the command line when starting the REPL:
scala -classpath mycode.jar
If you need to do more than that, start browsing the REPL source. You can download it from GitHub at https://github.com/scala/scala
I use sbt to accomplish this. It can start the REPL with the project classes and dependencies on the classpath by using the "console" action.
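A small sketch of that setup, using the casbah coordinates from the CLASSPATH example in the next answer just as an illustration:

// build.sbt — declare the jar once; `sbt console` then starts a REPL with it on the classpath.
scalaVersion := "2.9.1"

libraryDependencies += "com.mongodb.casbah" % "casbah-core_2.9.1" % "2.1.5-1"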
You can use the CLASSPATH variable directly, e.g.:
CLASSPATH="/Users/opyate/.ivy2/cache/com.mongodb.casbah/casbah-core_2.9.1/jars/casbah-core_2.9.1-2.1.5-1.jar:/Users/opyate/.ivy2/cache/com.mongodb.casbah/casbah-commons_2.9.1/jars/casbah-commons_2.9.1-2.1.5-1.jar" scala