Passing arguments to multiple tasks in SBT - scala

In sbt 0.13.9, I want to be able to run a task which takes in arguments from the command line and then passes those arguments on to two other tasks.
My initial attempt was something along the lines of:
lazy val logTask = InputKey[Unit](...)
lazy val runTask = InputKey[Unit](...)
lazy val TestCase = config("testCase") extend Test
runTask in TestCase := Def.inputTaskDyn {
  val args: Seq[String] = spaceDelimited("<arg>").parsed
  runReg(args)
}.evaluated

logTask in TestCase := Def.inputTaskDyn {
  val args: Seq[String] = spaceDelimited("<arg>").parsed
  log(args)
}.evaluated

def runReg(args: Seq[String]) = Def.taskDyn {
  val argString = args.mkString(" ")
  (logTask in TestCase).toTask(argString).value
  (testOnly in TestCase).toTask(s" $argString")
}

def log(args: Seq[String]) {
  (runMain in TestCase).toTask(s" LoggingClass ${args.mkString(" ")}")
}
But then it complains of an Illegal Dynamic Reference for argString in (logTask in TestCase).toTask(argString).value.
I've also tried something like:
runTask in TestCase := {
  val args: Seq[String] = spaceDelimited("<arg>").parsed
  log(args).value
  runReg(args).value
}
which also has an Illegal Dynamic Reference for args.
Is there any way of passing parsed arguments into two tasks and running one after the other?
Thanks for any help.

Instead of assigning args.mkString(" ") to a variable, pass it directly, without binding it to a local val:
(logTask in TestCase).toTask(args.mkString(" ")).value
Update 1:
This kind of issue can also be worked around with lazy initialization in sbt, so try something like this:
lazy val argString = args.mkString(" ")
(logTask in TestCase).toTask(argString).value
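Putting the pieces together, a rough sketch of the combined setup (reusing the keys and helper names from the question; treat it as a starting point rather than a tested drop-in) could look like this:

runTask in TestCase := Def.inputTaskDyn {
  val args: Seq[String] = spaceDelimited("<arg>").parsed
  runReg(args)
}.evaluated

def runReg(args: Seq[String]) = Def.taskDyn {
  // no intermediate val inside the dynamic task: pass the argument string inline
  // (or bind it as a lazy val) so the macro does not flag an Illegal Dynamic Reference
  (logTask in TestCase).toTask(" " + args.mkString(" ")).value
  (testOnly in TestCase).toTask(" " + args.mkString(" "))
}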

Related

How to invoke Spark functions (with arguments) from application.properties (config file)?

So, I have a typesafe config file named application.properties which contains certain values like:
dev.execution.mode = local
dev.input.base.dir = /Users/debaprc/Documents/QualityCheck/Data
dev.schema.lis = asin StringType,subs_activity_date DateType,marketplace_id DecimalType
I have used these values as Strings in my Spark code like:
def main(args: Array[String]): Unit = {
  val props = ConfigFactory.load()
  val envProps = props.getConfig("dev")
  val spark = SparkSession.builder.appName("DataQualityCheckSession")
    .config("spark.master", envProps.getString("execution.mode"))
    .getOrCreate()
Now I have certain functions defined in my spark code (func1, func2, etc...). I want to specify which functions are to be called, along with the respective arguments, in my application.properties file. Something like this:
dev.functions.lis = func1,func2,func2,func3
dev.func1.arg1.lis = arg1,arg2
dev.func2.arg1.lis = arg3,arg4,arg5
dev.func2.arg2.lis = arg6,arg7,arg8
dev.func3.arg1.lis = arg9,arg10,arg11,arg12
Now, once I specify these, what do I do in Spark, to call the functions with the provided arguments? Or do I need to specify the functions and arguments in a different way?
I agree with @cchantep that the approach seems wrong. But if you still want to do something like that, I would decouple the function names in the properties file from the actual functions/methods in your code.
I have tried this and it worked fine:
def function1(args: String): Unit = {
  println(s"func1 args: $args")
}

def function2(args: String): Unit = {
  println(s"func2 args: $args")
}

val functionMapper: Map[String, String => Unit] = Map(
  "func1" -> function1,
  "func2" -> function2
)

val args = "arg1,arg2"
functionMapper("func1")(args)
functionMapper("func2")(args)
Output:
func1 args: arg1,arg2
func2 args: arg1,arg2
Edited: Simpler approach with output example.
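If you do want to drive this from application.properties, a rough sketch (reusing functionMapper from above and the key names from the question; note that the duplicate func2 entry with two argument lists would still need its own naming convention) could look like this:

import com.typesafe.config.ConfigFactory

val envProps = ConfigFactory.load().getConfig("dev")

// e.g. dev.functions.lis = func1,func2
val functionsToRun = envProps.getString("functions.lis").split(",").map(_.trim)

functionsToRun.foreach { name =>
  // e.g. dev.func1.arg1.lis = arg1,arg2
  val funcArgs = envProps.getString(s"$name.arg1.lis")
  functionMapper(name)(funcArgs) // extend functionMapper with the remaining functions
}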

What is Spark execution order with function calls in scala?

I have a spark program as follows:
object A {
  var id_set: Set[String] = _

  def init(argv: Array[String]) = {
    val args = new AArgs(argv)
    id_set = args.ids.split(",").toSet
  }

  def main(argv: Array[String]) {
    init(argv)
    val conf = new SparkConf().setAppName("some.name")
    val rdd1 = getRDD(paras)
    val rdd2 = getRDD(paras)
    //......
  }

  def getRDD(paras) = {
    //function details
    getRDDDtails(paras)
  }

  def getRDDDtails(paras) = {
    //val id_given = id_set
    id_set.foreach(println) //worked normally, not empty
    someRDD.filter { x =>
      val someSet = x.getOrElse(...)
      //id_set.foreach(println) ------wrong, id_set is just an empty set
      (someSet & id_set).size > 0
    }
  }
}

class AArgs(args: Array[String]) extends Serializable {
  //parse args
}
I have a global variable id_set. At first, it is just an empty set. In the main function, I call init, which sets id_set to a non-empty set from args. After that, I call the getRDD function, which calls getRDDDtails. In getRDDDtails, I filter an RDD based on the contents of id_set. However, the result seems to be empty. I tried to print id_set on an executor, and it is just an empty line. So the problem seems to be that id_set is not properly initialized (in the init function). However, when I try to print id_set on the driver (in the first lines of getRDDDtails), it works fine and is not empty.
So I tried adding val id_given = id_set in getRDDDtails and using id_given afterwards. This seems to fix the problem, but I'm totally confused about why this happens. What is the execution order of Spark programs? Why does my solution work?
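For reference, a minimal sketch of the local-copy workaround described above (someRDD and its element type are stand-ins for the details elided in the question):

def getRDDDtails(someRDD: org.apache.spark.rdd.RDD[Set[String]]) = {
  // copy the object field into a local val on the driver; the local value is then
  // serialized with the closure, instead of the field being read again on the executors
  val id_given = id_set
  someRDD.filter { someSet =>
    (someSet & id_given).nonEmpty
  }
}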

How to create a task in SBT that calls a method?

I am new to developing tasks in SBT and I'm trying to figure out how to create a task that integrates with my existing code.
In my code I have a singleton object with a run method that requires several parameters:
object MyObject {
  def run(param1: String, param2: Int, param3: String, ...) = {}
}
My question is: How can I define a Task in SBT that calls my run method specifying all its parameters in my build.sbt file?
I can imagine something like this in build.sbt
lazy val myTask: TaskKey[Seq[File]] = taskKey[Seq[File]]("My task")

lazy val myTaskRun = (sourceManaged, dependencyClasspath in Compile, runner in Compile, streams) map { (dir, cp, r, s) =>
  val param1 = ...
  val param2 = ...
  val param3 = ...
  val paramN = ...
  MyObject.run(param1, param2, param3, ...)
  Seq(file("path"))
}
1. You can use fullRunInputTask for that.
In your build.sbt
lazy val example = InputKey[Unit]("example", "Run something.")
fullRunInputTask( example, Compile, "somefun.CallMe")
Under src/main/scala/somefun/CallMe.scala
package somefun

object CallMe {
  def main(args: Array[String]): Unit = {
    println("Params are: " + args.mkString(", "))
  }
}
To call your task, use example, e.g. "example 1 2 3".
2. You can create your own InputTask
see InputTask in the SBT documentation
Creating a custom InputTask gives you a flexible input parser (with tab-completion suggestions) and lets you link it with other tasks. It simply integrates better into SBT.
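A minimal sketch of such an InputTask for your case (parameter handling and the Seq[File] result are only illustrative) could look like this in build.sbt:

import complete.DefaultParsers._

lazy val myTask = inputKey[Seq[File]]("My task")

myTask := {
  // space-separated arguments, with tab-completion hints
  val args: Seq[String] = spaceDelimited("<param1> <param2> <param3>").parsed
  MyObject.run(args(0), args(1).toInt, args(2)) // adapt to the real parameter list
  Seq(file("path"))
}

Note that for build.sbt to call MyObject directly, the object has to be visible to the build definition itself (e.g. placed under project/); if it lives in your project's own sources, launch it via the runner (as in your myTaskRun draft) or via fullRunInputTask instead.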

Scala script in 2.11

I have found example code for Scala runtime scripting in an answer to Generating a class from string and instantiating it in Scala 2.10; however, the code seems to be obsolete for 2.11 - I cannot find any function corresponding to build.setTypeSignature. Even if it worked, the code seems hard to read and follow.
How can Scala scripts be compiled and executed in Scala 2.11?
Let us assume I want the following:
define several variables (names and values)
compile script
(optional improvement) change variable values
execute script
For simplicity consider following example:
I want to define the following variables (programmatically, from the code, not from the script text):
val a = 1
val s = "String"
I want the following script to be compiled and, on execution, to return the String value "a is 1, s is String":
s"a is $a, s is $s"
What should my functions look like?
def setupVariables() = ???
def compile() = ???
def changeVariables() = ???
def execute() : String = ???
Scala 2.11 adds a JSR-223 scripting engine. It should give you the functionality you are looking for. Just as a reminder, as with all of these sorts of dynamic things, including the example listed in the description above, you will lose type safety. You can see below that the return type is always Object.
Scala REPL Example:
scala> import javax.script.ScriptEngineManager
import javax.script.ScriptEngineManager
scala> val e = new ScriptEngineManager().getEngineByName("scala")
e: javax.script.ScriptEngine = scala.tools.nsc.interpreter.IMain#566776ad
scala> e.put("a", 1)
a: Object = 1
scala> e.put("s", "String")
s: Object = String
scala> e.eval("""s"a is $a, s is $s"""")
res6: Object = a is 1, s is String
An additional example, as an application running under Scala 2.11.6:
import javax.script.ScriptEngineManager

object EvalTest {
  def main(args: Array[String]) {
    val e = new ScriptEngineManager().getEngineByName("scala")
    e.put("a", 1)
    e.put("s", "String")
    println(e.eval("""s"a is $a, s is $s""""))
  }
}
For this application to work, make sure to include the library dependency:
libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value
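Mapped back onto the stub functions from the question, a rough sketch could look like the following (the separate compile step is folded into eval here, which is a simplification of what you asked for):

import javax.script.{ScriptEngine, ScriptEngineManager}

object ScriptHost {
  private val engine: ScriptEngine =
    new ScriptEngineManager().getEngineByName("scala")

  def setupVariables() = {
    engine.put("a", 1)          // bound as Object, as in the REPL session above
    engine.put("s", "String")
  }

  def changeVariables() = engine.put("a", 2)

  def execute(): String =
    engine.eval("""s"a is $a, s is $s"""").toString
}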

SBT 0.13 taskKey macro doesn't work with [Unit]?

lazy val buildDb = taskKey[Unit]("Initializes the database")

buildDb := {
  (compile in Compile).value
  val s: TaskStreams = streams.value
  s.log.info("Building database")
  try {
    ...
  } catch {
    case e: Throwable =>
      sys.error("Failed to initialize the database: " + e.getMessage)
  }
  s.log.info("Finished building database")
}
This produces the following error
C:\work\server\build.sbt:98: error: type mismatch;
found : Unit
required: T
s.log.info("Finished building database")
^
[error] Type error in expression
But if I define it as lazy val buildDb = taskKey[String]("Initializes the database") and then add the string "Happy end!" as the last line of the task, everything seems to work. Am I to blame, or is something wrong with the macro?
The same happened to me. I was able to fix the issue, e.g. by adding an explicit : TaskKey[Unit] annotation to the taskKey definition. Here are my findings for sbt 0.13.5:
The following definition is OK (it seems that it is pure luck that this is OK):
lazy val collectJars = taskKey[Unit]("collects JARs")

collectJars := {
  println("these are my JARs:")
  (externalDependencyClasspath in Runtime).value foreach println
}
The following definition (the same as above without the first println) yields the same error "found: Unit, required: T":
lazy val collectJars = taskKey[Unit]("collects JARs")

collectJars := {
  (externalDependencyClasspath in Runtime).value foreach println
}
My findings are that this is definitely something magical: for example, if I indent the line lazy val collectJars = ... by one space, then it compiles. I would expect (but have not checked) that .sbt and .scala build definitions also behave differently.
However, if you add the type signature, it seems to always compile:
lazy val collectJars: TaskKey[Unit] = taskKey[Unit]("collects JARs")

collectJars := {
  (externalDependencyClasspath in Runtime).value foreach println
}
Last but not least: the issue seems to be specific to TaskKey[Unit]. Unit tasks are not a good idea anyway - in your example, you could at least return Boolean (true for success / false for failure).
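For example, a Boolean-returning variant of the buildDb task from the question (a sketch only) sidesteps the Unit issue:

lazy val buildDb = taskKey[Boolean]("Initializes the database")

buildDb := {
  (compile in Compile).value
  val s: TaskStreams = streams.value
  s.log.info("Building database")
  val ok =
    try {
      // ... actual initialization work goes here ...
      true
    } catch {
      case e: Throwable =>
        s.log.error("Failed to initialize the database: " + e.getMessage)
        false
    }
  s.log.info("Finished building database")
  ok
}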