Execute a Scala script through spark-shell in silent mode - scala

I need to execute a Scala script through spark-shell in silent mode. When I use spark-shell -i "file.scala", after the execution I end up in the Scala interactive mode, which I don't want.
I have tried spark-shell -i "file.scala", but I don't know how to run the script in silent mode.
spark-shell -i "file.scala"
after execution, I get into
scala>
I don't want to end up at the scala> prompt.
Update (October 2019) for a script that terminates
This question is also about running a script that terminates by itself, that is, a "Scala script" run by spark-shell -i script.scala > output.txt that stops on its own (an internal System.exit(0) instruction terminates the script). See this question for a good example.
It also needs a "silent mode": it is expected not to pollute output.txt.
Assume Spark v2.2+.
PS: there are many cases (typically small tools and module/algorithm tests) where the Spark interpreter can be better than the compiler... Please, "let's compile!" is not an answer here.

spark-shell -i file.scala keeps the interpreter open at the end, so System.exit(0) is required at the end of your script. The most appropriate solution is to place your code in a try {} block and put System.exit(0) in the finally {} section.
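A minimal sketch of that layout (the job body here is only a placeholder):
try {
  // your actual job goes here; this line is only a placeholder
  spark.range(10).show()
} finally {
  // closes the interpreter that spark-shell -i would otherwise leave open
  System.exit(0)
}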
If logging is required you can use something like this:
spark-shell < file.scala > test.log 2>&1 &
If you cannot edit the file and therefore cannot add System.exit(0), use:
echo :quit | spark-shell -i file.scala
UPD
If you want to suppress everything in the output except printlns, you have to turn off logging for spark-shell. A sample of the configs is here. Disabling any kind of logging in $SPARK_HOME/conf/log4j.properties should allow you to see only the printlns. But I would not follow this approach with printlns: general logging with log4j should be used instead. You can configure it to obtain the same results as with printlns; it boils down to configuring a pattern. This answer provides an example of a pattern that solves your issue.
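For reference, a minimal $SPARK_HOME/conf/log4j.properties along those lines might look like this (a sketch based on Spark's bundled template; adjust the levels to taste):
# Only show ERROR-level log events on the console so that println output stands out
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n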

The best way is definitely to compile your Scala code to a jar and use spark-submit, but if you're simply looking for a quick iteration loop, you can issue a :quit after parsing your Scala code:
echo :quit | spark-shell -i yourfile.scala

Adding onto @rluta's answer: you can place the spark-shell call inside a shell script. Say the below is in a shell script:
spark-shell < yourfile.scala
But this requires each statement to fit on a single line, since the input is interpreted line by line; a statement split across multiple lines may not be parsed as intended.
OR
echo :quit | spark-shell -i yourfile.scala
This should exit the shell once the script finishes running.
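For the first option, a minimal wrapper script might look like this (the file names here are illustrative):
#!/bin/bash
# run_job.sh: run the script non-interactively and capture everything it prints
spark-shell < yourfile.scala > output.log 2>&1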

Related

Is it possible to run a Spark Scala script without going inside spark-shell?

The only two ways I know to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or to run a Scala script by using :load inside the spark-shell. My question is: is it possible to run a Scala file directly on the command line, without first going inside spark-shell and then issuing :load?
You can simply use the stdin redirection with spark-shell:
spark-shell < YourSparkCode.scala
This command starts a spark-shell, interprets your YourSparkCode.scala line by line and quits at the end.
Another option is to use the -I <file> option of the spark-shell command:
spark-shell -I YourSparkCode.scala
The only difference is that the latter command leaves you inside the shell, and you must issue the :quit command to close the session.
[UPD]
Passing parameters
Since spark-shell does not execute your source as an application but just interprets your source file line by line, you cannot pass any parameters directly as application arguments.
Fortunately, there are plenty of options to achieve the same thing (e.g., externalizing the parameters in another file and reading it at the very beginning of your script).
But I personally find the Spark configuration the cleanest and most convenient way.
You pass your parameters via the --conf option:
spark-shell --conf spark.myscript.arg1=val1 --conf spark.yourspace.arg2=val2 < YourSparkCode.scala
(please note that the spark. prefix in your property name is mandatory, otherwise Spark will discard your property as invalid)
And read these arguments in your Spark code as below:
val arg1: String = spark.conf.get("spark.myscript.arg1")
val arg2: String = spark.conf.get("spark.myscript.arg2")
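If an argument is optional, spark.conf.get also accepts a default value as a second parameter, so a missing --conf does not throw an error (the property name below is made up for the example):
val arg3: String = spark.conf.get("spark.myscript.arg3", "defaultValue")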
It is possible via spark-submit.
https://spark.apache.org/docs/latest/submitting-applications.html
You can even put it in a bash script, or create an sbt task
https://www.scala-sbt.org/1.x/docs/Tasks.html
to run your code.
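For completeness, a typical spark-submit invocation of compiled code looks roughly like this (the class, jar and argument names are made up for the example):
spark-submit \
  --class com.example.MyJob \
  --master local[*] \
  target/scala-2.12/myjob_2.12-0.1.0.jar arg1 arg2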

Spark-shell -i path/to/filename alternative

We have:
spark-shell -i path/to/script.scala
to run a Scala script. Is it possible to add something like this to the spark-defaults.conf file so that the Scala script is always loaded on start-up of spark-shell and thus does not have to be added to the command line?
I would like to use this to store imports, credentials and user-defined functions that I use regularly, so that I don't have to enter those commands every time I start spark-shell.
Thanks,
Shane
You can go to the Spark bin directory, create a file spark-shell-new.cmd, and paste
spark-shell -i path/to/script.scala
into it, then run spark-shell-new in cmd just like the default spark-shell.
You can do something like this inside spark-shell:
:load <path_to_script>
Write all the required lines of code in that script.
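Such a script might contain nothing more than the imports and helpers you want in every session, for example (the UDF below is only illustrative):
// init.scala -- load it with :load init.scala or spark-shell -i init.scala
import org.apache.spark.sql.functions._
import spark.implicits._

// a user-defined function you use regularly, registered for use in SQL as well
spark.udf.register("to_upper", (s: String) => if (s == null) null else s.toUpperCase)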

Deal with executing Unix command which produces an endless output

Some Unix commands, such as tail -f or starting a Python web server (e.g. cherrypy), produce endless output, i.e. the only way to stop them is Ctrl-C. I'm working on a Scala application which executes a command like that; my implementation is:
import scala.sys.process._
def exe(command: String): Unit = {
  // "!" runs the command and blocks until the process exits
  command !
}
However, as the command produces an endless output stream, the application hangs until I either terminate it or kill the process started by the command. I also tried adding & at the end of the command in order to run it in the background, but my application still hangs.
Hence, I'm looking for another way to execute a command without hanging my application.
You can use a custom ProcessLogger to deal with output however you wish as soon as it is available.
val proc = Process(command).run(ProcessLogger(line => (), err => println("Uh-oh: " + err)))
You may kill a process with the destroy method.
proc.destroy
If you are waiting to get a certain output before killing it, you can create a custom ProcessLogger that can call destroy on its own process once it has what it needs.
You may prefer to use lines (in 2.10; the name is changing to lineStream in 2.11) instead of run to gather standard output, since that will give you a stream that will block when no new output is available. Then you wrap the whole thing in a Future, read lines from the stream until you have what you need, and then kill the process--this simplifies blocking/waiting.
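Putting the ProcessLogger approach together, a minimal self-contained sketch might look like this (the command and the "MARKER" string are only examples, not part of the original code):
import scala.sys.process._

object EndlessCommandDemo {
  // keep the handle in a field so the logger callback can reach it
  @volatile private var handle: Option[Process] = None

  def main(args: Array[String]): Unit = {
    val logger = ProcessLogger(
      out => if (out.contains("MARKER")) handle.foreach(_.destroy()) else println(out),
      err => Console.err.println("Uh-oh: " + err)
    )

    // run() returns immediately; output is delivered to the logger as it arrives
    handle = Some(Process("tail -f /var/log/syslog").run(logger))

    Thread.sleep(10000)          // the application is free to do other work here
    handle.foreach(_.destroy())  // make sure the child process is gone before exiting
  }
}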
Seq("sh", "-c", "tail -f /var/log/syslog > /dev/null &") !
works for me. I think Randall's answer fails because scala is just executing the commands and can't interpret shell operators like "&". If the command passed to scala is "sh" and the arguments are a complete shell command, we work around this issue. There also seems to be an issue with how scala parses/separates individual arguments, and using a Seq instead of a single String works better for that.
The above is equivalent to the unix command:
sh -c 'tail -f /var/log/syslog > /dev/null &'
If you close the descriptor(s) from which you're reading the process' output, it will get a SIGPIPE and (usually) terminate.
If you just don't want the output, redirect to /dev/null:
command arg arg arg >/dev/null 2>&1
Addendum: This pertains only to Unix-alike systems, not Windows.

Is there a way to enter interactive mode after running a scala script from the command line? (equivalent to Python's `-i` option)

In Python, you can pass an option to the interpreter so that it dumps you into an interactive session once the script finishes executing.
python -i myscript.py
Once in interactive mode, you can then inspect the state and objects in your script. Is there similar functionality with Scala and the REPL?
scala -i myscript.scala should do the trick.
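For example, with a script that only defines a value (the file name is illustrative):
// myscript.scala
val data = List(1, 2, 3).map(_ * 2)
Running scala -i myscript.scala loads the file and then leaves you at the scala> prompt, where data is already defined and can be inspected.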

"scala" command terminates batch scripts

During my work here I ran into a somewhat peculiar problem. It's possible that there is a very simple explanation for this behaviour, but to me it just doesn't make much sense.
Here's the situation:
I wrote a batch file "test.bat" that, right now, looks like this:
echo 1
scala myProgram
echo 2
When I open the command prompt in the corresponding directory and run test.bat, it starts by echoing 1, then runs myProgram (whose output appears in the console, so the Scala program myProgram works properly) - and then stops. 2 does not appear in the console, and the console waits for me to input another command.
Why this behaviour? Is it a malfunction of the console? Or of the scala command? Or is it not a malfunction at all and actually meant to behave that way?
What I was actually trying to do is redirect the output of "scala myProgram" to a file (which works well) and rename this file after the Scala program has terminated, so my batch file originally looked somewhat like this:
scala myProgram > log.txt 2>&1
ren "log.txt" "log2.txt"
And I was confused about the fact that "log2.txt" was never created.
Your answers are greatly appreciated, thank you.
Adding -nc to the scala command worked for me:
$ scala -nc /tmp/2.scala
Hello world
So I guess the issue has something to do with the compilation daemon:
-nc no compilation daemon: do not use the fsc offline compiler
Could you try that?
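Applied to the test.bat from the question, the suggestion would be (assuming the compilation daemon really is the culprit):
echo 1
scala -nc myProgram
echo 2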