I am trying to load a Spark Scala script into the Spark shell using the :load command, where the location of the script is held in a variable. It's not working:
val scriptLoc="/abc/spark"
:load ${scriptLoc}/scriptName.scala
I even tried it like this, which didn't work either:
:load scriptLoc/scriptName.scala
Any help would be appreciated.
You can try
spark-shell -i /path/to/file.scala
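A likely reason the attempts in the question fail is that :load is a REPL command, so its argument is taken literally rather than evaluated as a Scala expression. One workaround is to keep the location in a shell variable and let the shell expand it before spark-shell starts. This is only a sketch: load_from is a hypothetical helper name, and the path and file name are the ones from the question.

```shell
# Directory that holds the script (value taken from the question).
SCRIPT_LOC="/abc/spark"

# Hypothetical helper: start spark-shell and load scriptName.scala
# from the directory given as the first argument.
load_from() {
  spark-shell -i "$1/scriptName.scala"
}
```

Calling load_from "$SCRIPT_LOC" then starts the shell with /abc/spark/scriptName.scala loaded.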
Related
The only two ways I know to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or to run a Scala script by using :load inside the spark-shell. My question is: is it possible to run a Scala file directly on the command line, without first going inside spark-shell and then issuing :load?
You can simply use the stdin redirection with spark-shell:
spark-shell < YourSparkCode.scala
This command starts a spark-shell, interprets YourSparkCode.scala line by line and quits at the end.
Another option is to use the -I <file> option of the spark-shell command:
spark-shell -I YourSparkCode.scala
The only difference is that the latter command leaves you inside the shell, and you must issue the :quit command to close the session.
[UPD]
Passing parameters
Since spark-shell does not execute your source as an application but just interprets your source file line by line, you cannot pass any parameters directly as application arguments.
Fortunately, there are several ways to achieve the same thing (e.g., externalizing the parameters in another file and reading it at the very beginning of your script).
But I personally find the Spark configuration the cleanest and most convenient way.
You pass your parameters via the --conf option:
spark-shell --conf spark.myscript.arg1=val1 --conf spark.myscript.arg2=val2 < YourSparkCode.scala
(Please note that the spark. prefix in your property name is mandatory; otherwise Spark will discard your property as invalid.)
And read these arguments in your Spark code as below:
val arg1: String = spark.conf.get("spark.myscript.arg1")
val arg2: String = spark.conf.get("spark.myscript.arg2")
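If you pass more than a couple of parameters, typing every --conf flag by hand gets tedious. Below is a minimal sketch of a wrapper that turns key=value pairs into --conf flags. The function name run_with_args and the spark.myscript. prefix are assumptions chosen to match the example above, not anything Spark prescribes.

```shell
# Hypothetical wrapper: run_with_args SCRIPT key=value [key=value ...]
# Each key=value pair is forwarded as --conf spark.myscript.<key>=<value>.
run_with_args() {
  script="$1"
  shift
  # Rewrite the positional parameters in place (POSIX sh has no arrays):
  # drop one original pair from the front, append its --conf form at the end.
  for pair in "$@"; do
    shift
    set -- "$@" --conf "spark.myscript.$pair"
  done
  # Feed the script on stdin so spark-shell quits when the script ends.
  spark-shell "$@" < "$script"
}
```

For example, run_with_args YourSparkCode.scala arg1=val1 arg2=val2 builds the same command line as the one shown above.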
It is possible via spark-submit.
https://spark.apache.org/docs/latest/submitting-applications.html
You can even wrap it in a bash script, or create an sbt task to run your code:
https://www.scala-sbt.org/1.x/docs/Tasks.html
We have:
spark-shell -i path/to/script.scala
to run a Scala script. Is it possible to add something like this to the spark-defaults.conf file, so that spark-shell always loads the Scala script on startup and it does not have to be added to the command line?
I would like to use this to store import _, credentials and user-defined functions that I use regularly, so that I don't have to enter the commands every time I start spark-shell.
Thanks,
Shane
You can go to the /bin directory of your Spark installation, create a file named spark-shell-new.cmd and paste this into it:
spark-shell -i path/to/script.scala
Then run spark-shell-new in cmd just like the default spark-shell.
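On Linux or macOS the same idea can be expressed as a shell function, for example in your ~/.bashrc. This is only a sketch: the function name spark_shell_init and the default location $HOME/.spark-init.scala are hypothetical, so adjust them to wherever you keep your imports and functions.

```shell
# Assumption: a hypothetical default location for the init script.
# Set INIT_SCRIPT beforehand to point somewhere else.
INIT_SCRIPT="${INIT_SCRIPT:-$HOME/.spark-init.scala}"

# Start spark-shell with the init script preloaded.
# Any extra arguments are passed through unchanged.
spark_shell_init() {
  spark-shell -i "$INIT_SCRIPT" "$@"
}
```

After reloading your shell profile, spark_shell_init --master local behaves like spark-shell --master local with the script already loaded.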
You can do something like this
:load <path_to_script>
Write all the required lines of code in that script
I need to execute a Scala script through spark-shell in silent mode. When I use spark-shell -i "file.scala", I end up in the Scala interactive mode after the execution. I don't want to end up there.
I have tried executing spark-shell -i "file.scala", but I don't know how to execute the script in silent mode.
spark-shell -i "file.scala"
after execution, I get into
scala>
I don't want to get into the scala> mode
Updating (October 2019) for a script that terminates
This question is also about running a script that terminates, that is, a "scala script" run by spark-shell -i script.scala > output.txt that stops by itself (an internal System.exit(0) instruction terminates the script). See this question with a good example.
It also needs a "silent mode": it is expected not to pollute output.txt.
Assume Spark v2.2+.
PS: there are a lot of cases (typically small tools and module/algorithm tests) where Spark interpreter can be better than compiler... Please, "let's compile!" is not an answer here.
spark-shell -i file.scala keeps the interpreter open at the end, so System.exit(0) is required at the end of your script. The most appropriate solution is to place your code in try {} and put System.exit(0) in the finally {} section.
If logging is required you can use something like this:
spark-shell < file.scala > test.log 2>&1 &
If you are not allowed to edit the file and can't add System.exit(0), use:
echo :quit | spark-shell -i file.scala
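The two ideas above, piping :quit into the shell and redirecting the output to a log file, can be combined in one small helper. This is a sketch only, and run_and_quit is a hypothetical name.

```shell
# Hypothetical helper: run_and_quit SCRIPT LOGFILE
# Loads the script, then feeds :quit on stdin so the REPL exits
# instead of waiting at the scala> prompt.
# Both stdout and stderr are written to the log file.
run_and_quit() {
  echo :quit | spark-shell -i "$1" > "$2" 2>&1
}
```

For example, run_and_quit file.scala test.log leaves everything the session printed in test.log and returns control to the calling script.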
UPD
If you want to suppress everything in the output except printlns, you have to turn off logging for spark-shell. A sample of the configs is here. Disabling all logging in $SPARK_HOME/conf/log4j.properties should leave only the printlns visible. That said, I would not rely on printlns; general logging with log4j should be used instead. You can configure it to obtain the same results as with printlns; it boils down to configuring a pattern. This answer provides an example of a pattern that solves your issue.
The best way is definitely to compile your Scala code to a jar and use spark-submit, but if you're simply looking for a quick iteration loop, you can issue a :quit after parsing your Scala code:
echo :quit | spark-shell -i yourfile.scala
Adding onto @rluta's answer: you can place the call to the spark-shell command inside a shell script. Say, put the line below in a shell script:
spark-shell < yourfile.scala
But this requires you to keep each statement on a single line, because a statement that is split across several lines may not be interpreted correctly.
OR
echo :quit | spark-shell -i yourfile.scala
This should
I am trying to connect to a DB from the Spark shell using scripts in a Scala file.
When connecting, the script takes the password from another location, but the password gets printed in the Spark shell console.
I just want to avoid that.
Code in Scala is as below,
val config=Map("driver"->"drivername","url"->"dburl","user"->"username","password"->"741852963");
When I load this code in the Spark shell, the shell prints the code too. I want just this part not to be printed in the Spark console.
How can I achieve this?
You have several ways to achieve this:
You can wrap your config definition in an object. Spark shell will just output that an object is defined
scala> object ConfigHolder {
| val config=Map("secret"->"value")
| }
defined object ConfigHolder
You can then simply reference your config as ConfigHolder.config instead of config
You can disable/re-enable output printing in the shell with the :silent command
scala> :silent
scala> val config=Map("secret"->"value")
scala> :silent
Of course, none of these actions prevents anyone with access to the Spark shell from reading your credentials; they only prevent casual onlookers from seeing them.
I have PySpark code which writes HQL commands to a .hql file. I thought of using the subprocess library to run the .hql file directly, but when I do so my HQL isn't run, even though the program exits without errors.
I know I can use sqlContext to read each line from the .hql file and run them individually, but I want to run the .hql file from a subprocess command. Isn't this possible?
Note: I use spark-submit to run the .py code.
You can directly submit it in a shell script with spark-sql
$ spark-sql --master yarn-client <..other parameters for executor memory etc..> -i ./script.hql
spark-sql internally invokes spark-submit.
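If the goal is to run the file and exit rather than stay in an interactive session, the -f <filename> option of spark-sql may be a better fit; as I understand it, -i is meant for an initialization file. Below is a sketch of a small wrapper, where run_hql is a hypothetical name and any extra arguments (master, memory settings and so on) are passed through as given.

```shell
# Hypothetical helper: run_hql FILE [extra spark-sql options...]
# Runs the statements in FILE via spark-sql -f.
run_hql() {
  hql_file="$1"
  shift
  spark-sql "$@" -f "$hql_file"
}
```

For example, run_hql ./script.hql --master yarn. A shell script containing this call can in turn be started from a subprocess call in Python.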