I am new to Spark and trying to figure out how to use the Spark shell.
I looked through the documentation on Spark's site, but it doesn't show how to create directories or how to see all my files in the Spark shell. If anyone could help me I would appreciate it.
In this context you can assume that the Spark shell is just a normal Scala REPL, so the same rules apply. You can get a list of the available commands using :help.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:edit <id>|<line>        edit history
:help [command]          print this summary or command-specific help
:history [num]           show the history (optional num is commands to show)
:h? <string>             search the history
:imports [name name ...] show import history, identifying sources of names
:implicits [-v]          show the implicits in scope
:javap <path|class>      disassemble a file or class name
:line <id>|<line>        place line(s) at the end of history
:load <path>             interpret lines in a file
:paste [-raw] [path]     enter paste mode or paste a file
:power                   enable power user mode
:quit                    exit the interpreter
:replay [options]        reset the repl and replay all previous commands
:require <path>          add a jar to the classpath
:reset [options]         reset the repl to its initial state, forgetting all session entries
:save <path>             save replayable session to a file
:sh <command line>       run a shell command (result is implicitly => List[String])
:settings <options>      update compiler options, if possible; see reset
:silent                  disable/enable automatic printing of results
:type [-v] <expr>        display the type of an expression without evaluating it
:kind [-v] <expr>        display the kind of expression's type
:warnings                show the suppressed warnings from the most recent line which had any
As you can see above, you can invoke shell commands using :sh. For example:
scala> :sh mkdir foobar
res0: scala.tools.nsc.interpreter.ProcessResult = `mkdir foobar` (0 lines, exit 0)
scala> :sh touch foobar/foo
res1: scala.tools.nsc.interpreter.ProcessResult = `touch foobar/foo` (0 lines, exit 0)
scala> :sh touch foobar/bar
res2: scala.tools.nsc.interpreter.ProcessResult = `touch foobar/bar` (0 lines, exit 0)
scala> :sh ls foobar
res3: scala.tools.nsc.interpreter.ProcessResult = `ls foobar` (2 lines, exit 0)
scala> res3.lines foreach println
bar
foo
The :q or :quit command is used to exit your Scala REPL.
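Since the Spark shell is a full Scala REPL, you can also shell out programmatically with scala.sys.process instead of :sh, which is handy when you want the output back as an ordinary value. A minimal sketch:

import scala.sys.process._

// .!! runs the command and returns its stdout as a String
// (it throws if the command exits with a nonzero status)
val listing = "ls foobar".!!
listing.split("\n").foreach(println)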
Related
I want to use decline to parse command-line parameters for a Spark application. I use sbt assembly to create a fat jar and use it in spark-submit. Unfortunately, I get the error
java.lang.NoSuchMethodError: cats.kernel.Semigroup$.catsKernelMonoidForList()Lcats/kernel/Monoid; when the parameters are parsed; example below. To reproduce the error you can check out my GitHub repo.
This is my code:
package example

import cats.implicits._
import com.monovore.decline._

object Minimal {

  case class Minimal(input: String, count: Int)

  val configOpts: Opts[Minimal] = (
    Opts.option[String]("input", "the input"),
    Opts.option[Int]("count", "the count")
  ).mapN(Minimal.apply)

  def parseMinimalConfig(args: Array[String]): Either[Help, Minimal] = {
    val command = Command(name = "min-example", header = "my-header")(configOpts)
    command.parse(args)
  }
}
and this is my build.sbt:
name := "example"
version := "0.1"
scalaVersion := "2.12.10"
libraryDependencies ++= Seq("com.monovore" %% "decline" % "2.3.0")
This is how I reproduce the error locally (Spark version 3.1.2):
~/playground/decline-test » ~/apache/spark-3.1.2-bin-hadoop3.2/bin/spark-shell --jars "target/scala-2.12/example-assembly-0.1.jar"
22/08/31 14:36:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://airi:4040
Spark context available as 'sc' (master = local[*], app id = local-1661949407775).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_345)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import example.Minimal._
import example.Minimal._
scala> parseMinimalConfig(Array("x", "x"))
java.lang.NoSuchMethodError: cats.kernel.Semigroup$.catsKernelMonoidForList()Lcats/kernel/Monoid;
at com.monovore.decline.Help$.optionList(Help.scala:74)
at com.monovore.decline.Help$.detail(Help.scala:105)
at com.monovore.decline.Help$.fromCommand(Help.scala:50)
at com.monovore.decline.Parser.<init>(Parser.scala:21)
at com.monovore.decline.Command.parse(opts.scala:20)
at example.Minimal$.parseMinimalConfig(Minimal.scala:19)
... 49 elided
scala> :quit
Interestingly, adding the assembled jar to the Scala classpath does not yield the same error but gives the expected help message. My local Scala version is 2.12.16 and Spark's Scala version is 2.12.10, but I'm unsure whether this could be the cause.
~/playground/decline-test » scala -cp "target/scala-2.12/example-assembly-0.1.jar"
Welcome to Scala 2.12.16-20220611-202836-281c3ee (OpenJDK 64-Bit Server VM, Java 1.8.0_345).
Type in expressions for evaluation. Or try :help.
scala> import example.Minimal._
import example.Minimal._
scala> parseMinimalConfig(Array("x", "x"))
res0: Either[com.monovore.decline.Help,example.Minimal.Minimal] =
Left(Unexpected argument: x
Usage: command --input <string> --count <integer>
our command
Options and flags:
--input <string>
the input
--count <integer>
the count)
scala>
I also tried Scala 2.13 with Spark 3.2.2 and got the same error, although I need to double-check that.
What could I be missing?
Have you tried using shade rules (sbt-assembly's ShadeRule) to avoid getting stuck in dependency hell?
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("org.typelevel.cats.**" -> "repackaged.org.typelevel.cats.#1").inAll,
  ShadeRule.rename("cats.**" -> "repackaged.cats.#1").inAll,
)
As reported in Execute external command, the right way to run an external shell command or script in Scala is:
import scala.sys.process._
val cmd = "ls -l /home" // Your command
val output = cmd.!! // Captures the output
I've noticed this works for some commands but not for others, like "java -version" (especially ones that have a dash "-" before arguments).
Is there a correct way to execute commands like "python --version", or a more complex Python script like "python /path/to/my_script.py -x value -y value"?
It seems to work with dashes:
$ scala
Welcome to Scala 2.13.6 (Eclipse OpenJ9 VM, Java 1.8.0_292).
Type in expressions for evaluation. Or try :help.
scala> import scala.sys.process._
import scala.sys.process._
scala> "java -version".!!
openjdk version "1.8.0_292"
...
scala> "python3 --version".!!
val res1: String =
"Python 3.8.5
"
Inside a program (not the REPL), is it possible to introduce a string variable to represent the shell command to be executed?
import sys.process._
val npath = opath.substring(0,opath.lastIndexOf("/"))
s"rm -rf $npath/*" !
s"mv $tmpName/* $npath/" !
The compiler says:
:103: error: type mismatch;
found : String
required: scala.sys.process.ProcessLogger
s"mv $tmpName/* $npath/" !
^
Note that in the REPL this can be fixed by using
:power
But we're not in the REPL here.
I found a useful workaround that mostly preserves the intended structure: use the
Seq[String].!
syntax. By using spaces as a delimiter we can still write the command out in a kind of WYSIWYG way:
import sys.process._
val npath = opath.substring(0,opath.lastIndexOf("/"))
s"rm -rf $npath/*".split(" ").toSeq.!
s"mv $tmpName/* $npath/".split(" ").toSeq.!
The limitation here is that arguments with embedded spaces would not work; they would require an explicit Seq of each portion of the command.
Here is a slightly nicer form if there is a set of commands to run:
Seq(s"rm -rf $npath/*",s"mv $tmpName/* $npath/").foreach{ cmd=>
println(cmd)
cmd.split(" ").toSeq.!
}
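A note on the original error: with two bare postfix ! calls on consecutive lines, the parser keeps reading past the line break and treats the next string as an argument to the first !, which resolves to the !(log: ProcessLogger) overload, hence the "required: scala.sys.process.ProcessLogger" message. Explicit method-call syntax avoids that. Also, globs like /* are expanded by a shell, not by sys.process, so neither the postfix form nor the split(" ") workaround will actually expand them. A sketch, assuming a POSIX sh is available:

import sys.process._

// Explicit .! sidesteps the postfix-operator parsing ambiguity
// (fine for glob-free commands):
val rc1 = s"ls $npath".!

// Let a real shell do the glob expansion:
val rc2 = Seq("sh", "-c", s"rm -rf $npath/*").!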
As can be seen in the following console session, the same command invoked from Scala produces different results than when run in the terminal.
~> scala
Welcome to Scala 2.12.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_172).
Type in expressions for evaluation. Or try :help.
scala> import sys.process._
import sys.process._
scala> """emacsclient --eval '(+ 4 5)'""".!
*ERROR*: End of file during parsingres0: Int = 1
scala> :quit
~> emacsclient --eval '(+ 4 5)'
9
Has anyone encountered this issue and/or know of a work around?
I thought this may have been a library bug, so opened an issue as well: https://github.com/scala/bug/issues/10897
It seems that Scala's sys.process api doesn't support quoting. The following works: Seq("emacsclient", "--eval", "(+ 4 5)").!.
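The same pattern works if you want the output back rather than just the exit code:

import scala.sys.process._

// No shell is involved, so the Lisp form needs no quoting at all:
val result = Seq("emacsclient", "--eval", "(+ 4 5)").!!.trim  // "9"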
I want to use Scala like Python, so I installed a REPL in Sublime Text (OS is Windows 8).
Every time in the REPL I have to type
scala> :load <my file>
which I find inconvenient.
Also, I can't point
scala> :settings -d <path>
at a Chinese-named directory.
I'm confused about whether I can set the Scala script directory to a non-English path.
Thanks a lot!
If you use sbt then you can define initial commands when you launch the console.
yourproject/build.sbt:
// build.sbt
name := "initial-commands-example"
initialCommands := "import Foo._"
yourproject/script.scala:
// script.scala
object Foo {
  def hello(name: String) = s"hello $name"
  val msg = hello("world")
}
Inside yourproject, run sbt console, and you will have everything in Foo available inside that repl. See sbt initialCommands docs for more information.
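initialCommands is just a String setting, so if you need several statements a triple-quoted block works too; and outside sbt, the plain Scala 2 REPL can preload a file with scala -i script.scala. A sketch of the multi-line form:

// build.sbt
initialCommands := """
  import Foo._
  println(msg)
"""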