How do I make packages available to the Scala REPL?

I'm trying to get familiar with Scala. I am using macOS.
I've installed Scala using brew install scala, which was hassle-free. Once it completed, I can launch the Scala REPL simply by running scala, and I'm at the scala> prompt.
I now want to import some packages, so I try:
import org.apache.spark.sql.Column
and unsurprisingly it fails with
error: object apache is not a member of package org
This makes sense: how would it know where to get that package from? The thing is, I don't know what I need to do to make that package available. Is there anything I can do from the command line that would allow me to import org.apache.spark.sql.Column?
I have googled around a little but haven't found anything that explains it in a jargon-free way. Complete Scala noob here, so jargon-free responses would be appreciated.

Here are two ways that I'm aware of to start a REPL with dependencies:
1. Use SBT to manage the dependencies, and use its console command to start a REPL with those dependencies.
2. Use the Ammonite REPL.
You could create a separate directory with a build.sbt where you set
scalaVersion := "2.11.12"
and then copy the
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
snippets from Maven Central. Then you can run the REPL with sbt console. Note that this creates project and target subdirectories, so it "leaves traces"; you can't use it quite like the standalone Scala REPL. You could also omit the build.sbt and add the settings by typing them into the SBT shell itself, e.g. set scalaVersion := "2.11.12" followed by set libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0".
Alternatively, you can just use the Ammonite REPL, which was created for exactly that purpose.
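A minimal sketch of the sbt route, assuming sbt is installed and on your PATH: the hypothetical helper make_spark_playground below creates a throwaway directory whose only file is a build.sbt with the two settings from above. The versions are the ones from this answer; adjust them to whatever you copy from Maven Central.

```shell
#!/bin/sh
# Hypothetical helper: create a scratch sbt project whose only purpose is to
# give you a REPL with Spark SQL on the classpath (versions from the answer).
make_spark_playground() {
  [ -n "$1" ] || { echo "usage: make_spark_playground DIR" >&2; return 1; }
  mkdir -p "$1" || return 1
  # Quoted 'EOF' keeps the %% and quotes exactly as written.
  cat > "$1/build.sbt" <<'EOF'
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
EOF
}
```

Usage: make_spark_playground ~/spark-playground && cd ~/spark-playground && sbt console, then type import org.apache.spark.sql.Column at the scala> prompt. Keeping the project in its own directory means the project and target folders that sbt creates stay out of your way.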

You can use the classpath to make the library available: download the jar locally and pass it with -cp as follows (here I'm using the Apache Commons IO library to move files from the Scala prompt):
C0:Desktop pvangala$ scala -cp /Users/pvangala/Downloads/commons-io-2.6/commons-io-2.6.jar
Welcome to Scala 2.12.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161).
Type in expressions for evaluation. Or try :help.
scala> import java.io.File
import java.io.File
scala> val src = new File("/Users/pvangala/Downloads/commons-io-2.6-bin.tar")
src: java.io.File = /Users/pvangala/Downloads/commons-io-2.6-bin.tar
scala> val dst = new File("/Users/pvangala/Desktop")
dst: java.io.File = /Users/pvangala/Desktop
scala> org.apache.commons.io.FileUtils.moveFileToDirectory(src, dst, true)
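If you don't want to hunt for the download link, jars on Maven Central follow a fixed directory layout: the group id with dots turned into slashes, then the artifact id, the version, and artifact-version.jar. The hypothetical helpers below build that URL from a group:artifact:version coordinate and download it with curl. Note that this fetches only that one jar, not its dependencies; that is fine for commons-io, but not for something like Spark.

```shell
#!/bin/sh
# Hypothetical helper: turn "group:artifact:version" (e.g. commons-io:commons-io:2.6)
# into the Maven Central URL of the main jar.
maven_jar_url() {
  coord=$1
  group=${coord%%:*}      # text before the first ':'
  rest=${coord#*:}        # text after the first ':'
  artifact=${rest%%:*}
  version=${rest#*:}
  group_path=$(printf '%s' "$group" | tr '.' '/')
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$artifact" "$version" "$artifact" "$version"
}

# Download that jar into the current directory (requires curl).
fetch_jar() {
  curl -fLO "$(maven_jar_url "$1")"
}
```

Usage: fetch_jar commons-io:commons-io:2.6 && scala -cp commons-io-2.6.jar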

If you want to use Spark, I'd recommend the spark-shell that comes with the Spark installation. I don't use macOS, so I can't help you much with installing Spark there.
For plain Scala I recommend Ammonite (http://ammonite.io/#Ammonite-REPL), which has built-in syntax for pulling in packages/dependencies.

If you want to use Spark, you should use the spark-shell instead of the Scala REPL. It behaves almost the same but includes all the Spark dependencies by default.
You should download the Spark binaries from here.
Then, if you are using Linux, create a SPARK_HOME variable pointing to the extracted folder and add its bin folder to your PATH.
Then you can start it from any console with the command spark-shell.
On Windows I'm not sure, but I think there should be a spark-shell.cmd file or something similar that you can use to start the spark-shell.
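For Linux, the SPARK_HOME and PATH steps above could look roughly like this (I'd expect the same to work on macOS, but that is an assumption). The folder passed to the hypothetical use_spark function is a placeholder for wherever you extracted the Spark download.

```shell
#!/bin/sh
# Hypothetical helper: point SPARK_HOME at the extracted Spark folder and
# put its bin/ directory (which contains spark-shell) first on PATH.
use_spark() {
  [ -d "$1/bin" ] || { echo "no bin/ folder under $1" >&2; return 1; }
  SPARK_HOME=$1
  PATH="$SPARK_HOME/bin:$PATH"
  export SPARK_HOME PATH
}
```

Because this changes environment variables, source the file (for example . ./spark-env.sh, a made-up file name) rather than executing it, then run something like use_spark "$HOME/spark-2.3.0-bin-hadoop2.7" && spark-shell. To make it permanent, put the equivalent export lines in ~/.profile or ~/.bashrc.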

I did the following on Windows, in a batch file:
for /f "tokens=*" %%a in ('java -jar coursier fetch -p "com.lihaoyi::requests:0.2.0" "com.lihaoyi::upickle:0.7.5"') do set SCP=%%a
scala -nc -classpath %SCP% %1 %2 %3
Instead of the two libraries listed here, you can list as many other libraries as you need; they must be available in Maven Central, though. The %1 can be a Scala script (".sc" extension), or you can leave it empty and the REPL will start with the libraries on the classpath.
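A rough equivalent for Linux/macOS shells might look like this. Like the batch file, it assumes the coursier launcher sits in the current directory and that java and scala are on the PATH; unlike the batch file, the libraries are passed as arguments instead of being hard-coded, and scala_with_deps is a hypothetical name.

```shell
#!/bin/sh
# Usage: scala_with_deps com.lihaoyi::requests:0.2.0 com.lihaoyi::upickle:0.7.5
# coursier resolves the libraries (plus their dependencies) and prints them
# as a single classpath string; the REPL is then started with it.
scala_with_deps() {
  [ "$#" -gt 0 ] || { echo "usage: scala_with_deps LIB..." >&2; return 1; }
  classpath=$(java -jar coursier fetch -p "$@") || return 1
  scala -nc -classpath "$classpath"
}
```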

Related

Add multiple classpath entries to scala REPL classpath

:cp seems to only accept a single entry
scala> :cp /usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/home/sboesch/spark-master/lib_managed/jars/*:/home/sboesch/spark-master/lib_managed/bundles/*:
The path '/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/home/sboesch/spark-master/lib_managed/jars/*:/home/sboesch/spark-master/lib_managed/bundles/*:' doesn't seem to exist.
Any thoughts on how to do this when already in the REPL? Yes, I know how to set it up from outside the REPL:
CLASSPATH=/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/home/sboesch/spark-master/lib_managed/jars/*:/home/sboesch/spark-master/lib_managed/bundles/*: scala
EDIT: It seems the intent was not clear. I am working on code in the REPL and then have a new snippet of code that requires a few classpath entries. It is a ONE-OFF affair, so I do not want to add them to build.sbt or to the scala/lib dir, etc. I did not receive any answer that really satisfies this use case, but awarded the best effort anyway.
scala -cp "path1:path2" now seems to work.
scala -version
Picked up _JAVA_OPTIONS: -Xms512m -Xmx4096m -XX:MaxPermSize=1024m -XX:ReservedCodeCacheSize=128m
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=1024m; support was removed in 8.0
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
The help text for :cp says:
:cp <path> add a jar or directory to the classpath
So I'm guessing there's no direct way to do exactly that. I'd use this instead:
:load <path> interpret lines in a file
I confirmed that it works for REPL commands as well as Scala code.
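Following that idea, you could generate such a file once, with one classpath command per jar, and then :load it in the running REPL. The helper below is a hypothetical sketch: the first argument is the REPL command to write (:require for Scala 2.11 and later, :cp for older REPLs, as described in the other answers on this page), followed by the output file and the directories to scan.

```shell
#!/bin/sh
# Hypothetical helper: write one "CMD /path/to/some.jar" line per jar found
# (recursively) under the given directories into OUTFILE.
write_cp_script() {
  [ "$#" -ge 3 ] || { echo "usage: write_cp_script CMD OUTFILE DIR..." >&2; return 1; }
  cmd=$1
  out=$2
  shift 2
  find "$@" -name '*.jar' | sort | while IFS= read -r jar; do
    printf '%s %s\n' "$cmd" "$jar"
  done > "$out"
}
```

Usage: write_cp_script :require /tmp/extra-cp.txt /usr/lib/hadoop /usr/lib/hbase, then inside the REPL run :load /tmp/extra-cp.txt.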
Addendum:
If you use SBT, then all of your project's dependencies are on the classpath of the REPL launched by SBT's console task.
A quick-and-dirty approach: add a link from $SCALA_HOME/lib/ to a folder with additional jar files. Then, from the REPL, you can import the packages of interest.
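If I read the 2.x scala launcher script correctly, it adds each entry of lib/ to the classpath as-is, so a linked folder would become a directory entry (for loose .class files) rather than a list of the jars inside it. Under that assumption, linking each jar individually may be the safer variant. link_jars is a hypothetical helper, and writing into $SCALA_HOME/lib may need sudo.

```shell
#!/bin/sh
# Hypothetical helper: symlink every jar from SRC_DIR into LIB_DIR
# (normally "$SCALA_HOME/lib"). Existing links with the same name are replaced.
link_jars() {
  [ -d "$1" ] && [ -d "$2" ] || { echo "usage: link_jars SRC_DIR LIB_DIR" >&2; return 1; }
  src=$(cd "$1" && pwd) || return 1   # absolute path, so the links resolve from LIB_DIR
  for jar in "$src"/*.jar; do
    [ -e "$jar" ] || continue          # no match: the glob stays literal
    ln -sf "$jar" "$2/"
  done
}
```

Usage: link_jars ~/scala-extra-jars "$SCALA_HOME/lib"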

Scala: How can I install a package system wide for working with in the repl?

In Python, if I install a package with pip install package_name, I can open a Python repl by typing python and simply import the package by its name, regardless of what directory I'm currently in in the filesystem.
Like so
$ python
Python 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>>
and the requests library is imported and I can play with it in the repl.
In Scala, I know how to do this in a project that uses sbt, but for learning purposes, I would like to install a package in such a way so that I can simply type scala at the command line and then import the installed package, without being tied to a specific project.
$ scala
Welcome to Scala version 2.10.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_40).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scalaz._
<console>:7: error: not found: value scalaz
import scalaz._
How can I do this?
Scala is different from Python: code compiled for Scala 2.9.x is not binary compatible with 2.10.x. So globally installed libraries can cause a lot of problems if you work with different Scala versions.
You can use SBT and add the following to $HOME/.sbt/plugins/build.sbt:
libraryDependencies += "org.scalaz" %% "scalaz-core" % "7.0.4"
or
libraryDependencies += "org.scalaz" % "scalaz-core_2.10" % "7.0.4"
and then go to /tmp and start a Scala REPL with SBT:
sbt console
But in the long term it is not a good idea.
The best approach is to install SBT, create a file named build.sbt, and put this in it:
libraryDependencies += "org.scalaz" % "scalaz-core_2.10" % "7.0.4"
scalaVersion := "2.10.2"
initialCommands in console := "import scalaz._, Scalaz._"
Now, in a terminal, change into the folder containing build.sbt and run
sbt console
With this you can experiment in the REPL with scalaz already imported and on the classpath. It is also easy to add more dependencies.
SBT is also convenient because you don't need to install new Scala versions manually; just declare the version in build.sbt.
In addition to S.R.I's answer, I'm using the following solution, a shell script:
/usr/bin/scalaz
#!/bin/sh
scala -cp ~/.ivy2/cache/org.scalaz/scalaz-core_2.10/bundles/scalaz-core_2.10-7.1.0-M3.jar ... other libs
Then just call it from the terminal:
$ scalaz
Welcome to Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_40).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scalaz._; import Scalaz._
import scalaz._
import Scalaz._
scala> Monad[Option].point(1)
res0: Option[Int] = Some(1)
scala>
This is generally not recommended, since most jar libraries are meant to be used within specific projects. Also, unlike in other ecosystems, jar libraries are usually installed via a user-mode dependency management tool like Ivy, Maven or SBT, as you may have observed.
If you really want to do this, you can install jars into Scala's TOOL_CLASSPATH location, which you can find out from the scala.bat or scala shell script bundled with your Scala distribution. Alternatively, you can build your own custom Scala REPL that loads globally installed libraries from some configured location. Either way, it requires futzing with TOOL_CLASSPATH.
P.S.: I currently do not have access to the actual scala.bat file to help you with this, but you can look it up here and here to understand what I mean. Note that these files may not be structured the same way as the ones in the distribution (and may be quite dated). Please check the official distribution for accurate information.
EDIT
I can explain a bit more now that I'm back and looked at the actual scala batch and shell scripts included in official distribution :-)
As I said above, the scala script loads all jar files present in its TOOL_CLASSPATH folder, which is usually ${SCALA_HOME}/lib. It also lets you add to TOOL_CLASSPATH with the promising -toolcp option. Let's see what the script does with it (the batch script is similar; I'll just show the relevant part of the scala shell script):
while [[ $# -gt 0 ]]; do
  case "$1" in
    -D*)
      # pass to scala as well: otherwise we lose it sometimes when we
      # need it, e.g. communicating with a server compiler.
      java_args=("${java_args[@]}" "$1")
      scala_args=("${scala_args[@]}" "$1")
      shift
      ;;
    -J*)
      # as with -D, pass to scala even though it will almost
      # never be used.
      java_args=("${java_args[@]}" "${1:2}")
      scala_args=("${scala_args[@]}" "$1")
      shift
      ;;
    -toolcp)
      TOOL_CLASSPATH="${TOOL_CLASSPATH}${SEP}${2}"
      shift 2
      ;;
    -nobootcp)
      unset usebootcp
      shift
      ;;
    -usebootcp)
      usebootcp="true"
      shift
      ;;
    -debug)
      SCALA_RUNNER_DEBUG=1
      shift
      ;;
    *)
      scala_args=("${scala_args[@]}" "$1")
      shift
      ;;
  esac
done
As you can see, this is sorely limiting: you'd have to specify each jar to be added, in which case you could just use -cp! Can we make it better? Sure, but we'd have to muck around in this toolcp business.
addtoToolCP() {
  for i in $(find "$1" -name "*.jar")
  do
    if [[ -z "$TOOLCP" ]]
    then
      TOOLCP="$i"
    else
      TOOLCP="${TOOLCP}:$i"
    fi
  done
}
So you could check whether our TOOLCP variable is empty and, if it isn't, call scala as scala -toolcp "$TOOLCP". Then you can invoke your shell script as myscalascript <list-of-paths-to-be-added-to-toolcp>. Or you could keep one folder and go on adding new libraries to it. Hope this helps. As others have said, do watch out for binary compatibility issues: these only affect major Scala versions (e.g. 2.10 vs 2.11), while minor versions should be perfectly compatible. Lastly, at the risk of repeating myself to death, use this only if you're sure you want it. :-)
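Put together, the myscalascript idea could look like the sketch below. It is written in POSIX sh rather than the bash of the launcher, and it swaps the accumulation loop for find | sort | paste, which joins the found jars with ':' and also copes with spaces in paths. build_toolcp and myscalascript are hypothetical names.

```shell
#!/bin/sh
# Collect every .jar under the given directories into TOOLCP (':'-separated).
build_toolcp() {
  TOOLCP=$(find "$@" -name '*.jar' | sort | paste -s -d ':' -)
}

# Start scala with the collected jars via -toolcp, or plain scala if none were found.
myscalascript() {
  [ "$#" -gt 0 ] || { echo "usage: myscalascript DIR..." >&2; return 1; }
  build_toolcp "$@"
  if [ -n "$TOOLCP" ]; then
    scala -toolcp "$TOOLCP"
  else
    scala
  fi
}
```

Usage: myscalascript ~/scala-libs, where ~/scala-libs is the one folder you keep adding libraries to.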

How to use third party libraries with Scala REPL?

I've downloaded Algebird and I want to try out few things in the Scala interpreter using this library. How do I achieve this?
Of course, you can use scala -cp whatever and manually manage your dependencies. But that gets quite tedious, especially if you have multiple dependencies.
A more flexible approach is to use sbt to manage your dependencies. Search for the library you want to use on search.maven.org; Algebird, for example, shows up by simply searching for algebird. Then create a build.sbt referring to that library, change into that directory, and run sbt console. It will download all your dependencies and start a Scala console session with all dependencies automatically on the classpath.
Changing things like the scala version or the library version is just a simple change in the build.sbt. To play around you don't need any scala code in your directory. An empty directory with just the build.sbt will do just fine.
Here is a build.sbt for using algebird:
name := "Scala Playground"
version := "1.0"
scalaVersion := "2.10.2"
libraryDependencies += "com.twitter" % "algebird-core" % "0.2.0"
Edit: often when you want to play around with a library, the first thing you have to do is to import the namespace(s) of the library. This can also be automated in the build.sbt by adding the following line:
initialCommands in console += "import com.twitter.algebird._"
Running sbt console will not import libraries declared with a test scope. To use those libraries in the REPL, start the console with
sbt test:consoleQuick
You should be aware, however, that starting the console this way skips compiling your test sources.
Source: http://www.scala-sbt.org/0.13/docs/Howto-Scala.html
You can use scala's -cp switch to put jars on the classpath. Other switches are available too, for example -deprecation and -unchecked for turning on various warnings. Many more can be found with scala -X... and scala -Y.... You can find out more about these switches with scala -help.
This is an answer using Ammonite (as opposed to the Scala REPL) - but it is such a great tool that it is worth mentioning.
You can install it with a one-liner such as:
sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/2.1.2/2.13-2.1.2) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
or using brew on macOS:
brew install ammonite-repl
For Scala 2.10, you need to use an older version, 1.0.3:
sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.0.3/2.10-1.0.3) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
Run Ammonite in your terminal:
amm
// Displays
Loading...
Welcome to the Ammonite Repl 2.1.0 (Scala 2.12.11 Java 1.8.0_242)
Use an $ivy import to import your third-party library:
import $ivy.`com.twitter::algebird-core:0.2.0`
Then you can use your library within the Ammonite-REPL:
import com.twitter.algebird._
import com.twitter.algebird.Operators._
Map(1 -> Max(2)) + Map(1 -> Max(3)) + Map(2 -> Max(4))
...

Scala : trying to get log4j working

Scala newb here (it's my 2nd day of using it). I want to get log4j logging working in my Scala script. The script and the results are below, any ideas as to what's going wrong?
[sean@ibmp2 pybackup]$ cat backup.scala
import org.apache.log4j._
val log = LogFactory.getLog()
log.info("started backup")
[sean@ibmp2 pybackup]$ scala -cp log4j-1.2.16.jar:. backup.scala
/home/sean/projects/personal/pybackup/backup.scala:1: error: value apache is not a member of package org
import org.apache.log4j._
^
one error found
I can reproduce it under Windows: the delimiter for -classpath must be ';' there (not ':'). Are you using Cygwin or some sort of Unix emulator?
But a Scala script works fine without the current directory on the classpath. Try:
$ scala -cp log4j-1.2.16.jar backup.scala
FYI: LogFactory is a class from Apache Commons Logging (org.apache.commons.logging), not from log4j; with log4j 1.x you would use Logger.getLogger instead.
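If the same script has to run both on Unix and from Windows-flavoured shells (Cygwin, MSYS/Git Bash), one option is to pick the separator from uname. This sketch assumes that in the Windows-flavoured case a Windows JVM is the one being launched, which is why ';' is needed there; absolute Cygwin paths would additionally need converting to Windows form with cygpath -w. cp_separator and join_classpath are hypothetical names.

```shell
#!/bin/sh
# Print the classpath separator: ';' under Cygwin/MSYS/MinGW shells
# (assumed to launch a Windows JVM), ':' everywhere else.
cp_separator() {
  case "$(uname -s)" in
    CYGWIN*|MINGW*|MSYS*) printf ';' ;;
    *) printf ':' ;;
  esac
}

# Join the arguments into a single classpath string with that separator.
join_classpath() {
  sep=$(cp_separator)
  result=""
  for entry in "$@"; do
    result="${result:+$result$sep}$entry"
  done
  printf '%s\n' "$result"
}
```

Usage: scala -cp "$(join_classpath log4j-1.2.16.jar .)" backup.scala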
UPDATE
Another possible cause: a broken jar on the classpath, e.g. one corrupted during download. The Scala interpreter only reports that the member of the package is unavailable.
$ echo "qwerty" > example.jar
$ scala -cp example.jar backup.scala
backup.scala:1: error: value apache is not a member of package org
...
So you need to inspect the contents of the jar file:
$ jar -tf log4j-1.2.16.jar
...
org/apache/log4j/Appender.class
...
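Because a jar is just a zip archive, a quick sanity check for the broken-jar case (like the echo "qwerty" > example.jar file above) is to look at the first two bytes, which for a zip file are the letters PK. This is a rough heuristic, not a full integrity check; jar -tf (or unzip -t) remains the real test. looks_like_jar and check_jars are hypothetical names.

```shell
#!/bin/sh
# Heuristic: a real jar (zip) file starts with the bytes "PK".
looks_like_jar() {
  [ -f "$1" ] && [ "$(head -c 2 "$1")" = "PK" ]
}

# Report each argument as ok or broken; return non-zero if any looks broken.
check_jars() {
  status=0
  for jar in "$@"; do
    if looks_like_jar "$jar"; then
      echo "ok: $jar"
    else
      echo "broken or missing: $jar"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_jars log4j-1.2.16.jar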
Did you remember to put log4j.jar in your classpath?
I had a similar issue when I started doing Scala development in Eclipse; doing a clean build solved the problem.
I guess the Scala tools are not mature yet.
Instead of using log4j directly, you might try using Configgy. It's the Scala Way™ to work with log4j, as well as configuration files. It also plays nicely with SBT and Maven.
I asked and answered this question myself; have a look:
Put it under src/main/resources/logback.xml. It will be copied to the right location when SBT is doing the artifact assembly.

Include jar file in Scala interpreter

Is it possible to include a jar file when running the Scala interpreter?
My code is working when I compile from scalac:
scalac script.scala -classpath *.jar
But I would like to be able to include a jar file when running the interpreter.
In Scala 2.8, you can use
scala>:jar JarName.jar
to add a jar to the classpath.
In Scala 2.8.1, it is not :jar but :cp
And in Scala 2.11.7 it is not :cp but :re(quire)
According to the scala executable's help, all scalac options are allowed, so you can run scala -classpath some.jar. I've just tried it and it looks like it works.
Include multiple jars in the Scala REPL (2.10.0-RC2):
scala -classpath my_1st.jar:my_2nd.jar:my_3rd.jar
In my case I am using Scala code runner version 2.9.2, and I had to add quotation marks.
I am using these jar files:
jdom-b10.jar, rome-0.9.jar
and everything goes fine with this:
scala -classpath "*.jar" feedparser.scala
In Scala 2.11.6, use :require from the Scala REPL; this can best be figured out by running :help in the REPL.
For example:
$ scala
Welcome to Scala version 2.11.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45).
Type in expressions to have them evaluated.
Type :help for more information.
scala> :require lift-json_2.11-3.0-M5-1.jar
Added '<path to lift json library>/lift-json/lift-json_2.11-3.0-M5-1.jar' to classpath.
Scala version 2.11.5:
Here is an example of adding all jars in your ivy cache:
scala -cp "/Users/dbysani/.ivy2/cache/org.apache.spark/spark-streaming_2.10/jars/*"
scala> import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext
You can also create a local folder of all the jars that you need to get added and add it in a similar way.
Hope this helps.
An unquoted lib/*.jar is expanded by the shell into a list separated by blanks, not by ":" or ";" as the classpath requires.
Since Java 6, a quoted "lib/*" should work, but sometimes it doesn't (e.g. when the classpath is set somewhere else).
I use a script like:
Windows:
@rem all *.jars in lib subdirectory
@echo off
set clp=.
for %%c in (lib\*.jar) do call :Setclasspath %%c
echo The classpath is %clp%
scala -classpath %clp% script.scala
exit /B %ERRORLEVEL%
:Setclasspath
set clp=%clp%;%~1
exit /B 0
Linux:
#!/bin/bash
# all *.jar files in the lib subdirectory
clp="."
for file in lib/*.jar
do
  clp="$clp:$file"
done
echo "$clp"
scala -classpath "$clp" script.scala