Spark-shell -i path/to/filename alternative - scala

We have:
spark-shell -i path/to/script.scala
to run a Scala script. Is it possible to add something like this to the spark-defaults.conf file so that it always loads the Scala script on start-up of spark-shell and thus does not have to be added to the command line?
I would like to use this to store imports, credentials and user-defined functions that I use regularly, so that I don't have to enter the commands every time I start spark-shell.
Thanks,
Shane

You can go to the Spark bin directory, create a file spark-shell-new.cmd, and paste
spark-shell -i path/to/script.scala
into it. Then run spark-shell-new from cmd just like the default spark-shell.

You can do something like this inside spark-shell:
:load <path_to_script>
Write all the required lines of code in that script.
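For example, the preloaded script might look like this (a sketch; the names and the UDF are purely illustrative, and spark refers to the SparkSession that spark-shell creates for you):

import org.apache.spark.sql.functions._

// credentials you use regularly (illustrative; read from the environment rather than hard-coding)
val dbUser = sys.env.getOrElse("DB_USER", "default_user")
val dbPassword = sys.env.getOrElse("DB_PASSWORD", "")

// a user-defined function registered on every start-up
spark.udf.register("normalize", (s: String) => if (s == null) null else s.trim.toLowerCase)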

Related

Scala: IOException when executing an sh file

I'm trying to execute an sh file from resources.
The file is located at the root of the resources: src/main/resources/hiveCommand.sh
import sys.process._
"./hiveCommand.sh" !!
But I receive an IOException: no such file or directory.
What am I doing wrong?
Scala's Process integration does not know how to handle shell scripts; it can only start programs. To run a shell script you need to start a shell (e.g. bash) and give it the file to run as an argument.
There is a complication however. Since the script is a resource (located at src/main/resources/hiveCommand.sh during compile time), it is located in a jar at run time.
So in short:
First extract the shell script (use getClass.getResourceAsStream("/hiveCommand.sh") to read the resource) and store it on disk.
Then start with something like:
"bash /tmp/hiveCommand.sh".!!

Is it possible to run a Spark Scala script without going inside spark-shell?

The only two ways I know to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or run a Scala script by using :load inside the spark-shell. My question is: is it possible to run a Scala file directly on the command line, without first going inside spark-shell and then issuing :load?
You can simply use stdin redirection with spark-shell:
spark-shell < YourSparkCode.scala
This command starts a spark-shell, interprets your YourSparkCode.scala line by line and quits at the end.
Another option is to use the -I <file> option of the spark-shell command:
spark-shell -I YourSparkCode.scala
The only difference is that the latter command leaves you inside the shell, and you must issue the :quit command to close the session.
Update:
Passing parameters
Since spark-shell does not execute your source as an application but just interprets your source file line by line, you cannot pass any parameters directly as application arguments.
Fortunately, there are plenty of options to achieve the same thing (e.g., externalizing the parameters in another file and reading it at the very beginning of your script).
But I personally find the Spark configuration the most clean and convenient way.
You pass your parameters via the --conf option:
spark-shell --conf spark.myscript.arg1=val1 --conf spark.yourspace.arg2=val2 < YourSparkCode.scala
(please note that the spark. prefix in the property name is mandatory, otherwise Spark will discard your property as invalid)
And read these arguments in your Spark code as below:
val arg1: String = spark.conf.get("spark.myscript.arg1")
val arg2: String = spark.conf.get("spark.myscript.arg2")
It is possible via spark-submit.
https://spark.apache.org/docs/latest/submitting-applications.html
You can even put it in a bash script or create an sbt task
https://www.scala-sbt.org/1.x/docs/Tasks.html
to run your code (a sketch of such a task follows).
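For illustration, a minimal sketch of such an sbt task (sbt 1.x syntax; com.example.Main and the use of the thin jar are assumptions, not something from the question):

// in build.sbt
lazy val sparkSubmit = taskKey[Unit]("Package the project and run it with spark-submit")

sparkSubmit := {
  import scala.sys.process._
  val jar = (Compile / packageBin).value  // thin jar; use sbt-assembly if dependencies must be bundled
  val exit = s"spark-submit --class com.example.Main ${jar.getAbsolutePath}".!
  require(exit == 0, s"spark-submit exited with code $exit")
}

Then sbt sparkSubmit builds the jar and submits it in one step.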

Execute the scala script through spark-shell in silent mode

I need to execute a Scala script through spark-shell in silent mode. When I use spark-shell -i "file.scala", after the execution I end up in the Scala interactive mode, and I don't want to get into it.
I have tried spark-shell -i "file.scala", but I don't know how to execute the script in silent mode.
spark-shell -i "file.scala"
after execution, I get into
scala>
I don't want to get into the scala> mode
Update (October 2019) for a script that terminates
This question is also about running a script that terminates, that is, a "Scala script" run by spark-shell -i script.scala > output.txt that stops by itself (an internal System.exit(0) instruction terminates the script). See this question for a good example.
It also needs a "silent mode": it is expected not to pollute output.txt.
Suppose Spark v2.2+.
PS: there are a lot of cases (typically small tools and module/algorithm tests) where the Spark interpreter can be better than the compiler... Please, "let's compile!" is not an answer here.
spark-shell -i file.scala keeps the interpreter open at the end, so System.exit(0) is required at the end of your script. The most appropriate solution is to place your code in a try {} block and put System.exit(0) in the finally {} section (see the sketch below).
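A minimal sketch of how file.scala could be structured under that approach (the job body is just a placeholder):

try {
  // your actual job goes here; this line is only a placeholder workload
  spark.range(10).count()
} finally {
  // guarantees the shell exits instead of dropping you to the scala> prompt
  System.exit(0)
}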
If logging is required you can use something like this:
spark-shell < file.scala > test.log 2>&1 &
If you have limitations on editing the file and you can't add System.exit(0), use:
echo :quit | spark-shell -i file.scala
Update:
If you want to suppress everything in the output except printlns, you have to turn off logging for spark-shell. A sample of the configs is here. Disabling any kind of logging in $SPARK_HOME/conf/log4j.properties should allow you to see only printlns. But I would not follow this approach with printlns; general logging with log4j should be used instead. You can configure it to obtain the same results as with printlns; it boils down to configuring a pattern. This answer provides an example of a pattern that solves your issue.
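As an illustration of the first option, a sketch of $SPARK_HOME/conf/log4j.properties that keeps only errors on the console (based on Spark's bundled log4j.properties.template; adjust to taste):

# show only errors from Spark itself; your printlns still go to stdout
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n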
The best way is definitely to compile your Scala code to a jar and use spark-submit, but if you're simply looking for a quick iteration loop, you can simply issue a :quit after parsing your Scala code:
echo :quit | spark-shell -i yourfile.scala
Adding onto @rluta's answer: you can place the call to the spark-shell command inside a shell script, say the below:
spark-shell < yourfile.scala
But this would require you to keep each statement on a single line, in case a statement written across multiple lines is not interpreted correctly.
OR
echo :quit | spark-shell -i yourfile.scala
This should work as well.

Load spark scala script into spark shell

I am trying to load a Spark Scala script into spark-shell using the :load command, where the location of the script is passed in a variable. It's not working:
val scriptLoc="/abc/spark"
:load ${scriptLoc}/scriptName.scala
I even tried it like this, which didn't work either:
:load scriptLoc/scriptName.scala
Any help would be appreciated.
You can try
spark-shell -i /path/to/file.scala
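If the script location really has to stay in a variable, a shell-level sketch (bash; the variable name is illustrative):

scriptLoc=/abc/spark
spark-shell -i "$scriptLoc/scriptName.scala"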

Execute external command

I do not know whether this is a Scala or Play! question. I want to execute some external command from my Play application, get the output from the command, and show a report to the user based on the command output. Can anyone help?
For example, when I enter my-command from the shell it shows output like the below, which I want to capture and show on the web:
Id Name IP
====================
1 A x.y.z.a
2 B p.q.r.s
Please do not worry about the format and parsing of the output. Functionally, I am looking for something like PHP's exec. I know about Java's Runtime.getRuntime().exec("command"), but is there any Scala/Play way to serve the purpose?
The method !! of the Scala process package does what you need: it executes the command and captures the text output. For example:
import scala.sys.process._
val cmd = "uname -a" // Your command
val output = cmd.!! // Captures the output
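If you also need the exit code or want stderr separately (for example to build the report in your Play action), a sketch using ProcessLogger (my-command stands for your own command):

import scala.sys.process._

val out = new StringBuilder
val err = new StringBuilder

// run the command, collecting stdout and stderr line by line
val exitCode = "my-command".!(ProcessLogger(
  line => out.append(line).append('\n'),
  line => err.append(line).append('\n')))

val report = out.toString  // parse/format this for your web page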
scala> import scala.sys.process._
scala> Process("cat temp.txt")!
This assumes there is a temp.txt file in your home directory. ! executes the command, printing its output to the console and returning its exit code. See scala.sys.process for more info.
You can use the Process library: for instance
import scala.sys.process.Process
Process("ls").!!
to get the list of files in the current folder as a string. The !! returns the output of the command.