I am trying to execute sample code on Databricks in Scala. It is an object.
object Main {
  def main(args: Array[String]) {
    val res = for (a <- args) yield a.toUpperCase
    println("Arguments: " + res.toString)
  }
}
When I run it on Databricks, it just says 'object defined main'. I am not sure how to execute it now, or what code I need to run to execute it. Please help.
What you are working with is essentially a Scala REPL, so the main function has no special significance there. That said, you can run your function as follows:
object Main {
  def main(args: Array[String]) {
    val res = for (a <- args) yield a.toUpperCase
    println(res)
    println("Arguments: " + res.toString)
  }
}

Main.main(Array("123", "23123"))
As it is, you can call the Main object's main method directly.
You can call the main method in the Main object as follows:
val args: Array[String] = Array("test1", "test2", "test3")
Main.main(args)
What you have in your main method won't print what you expect, which I assume is the values contained in the res array (calling toString on an Array yields something like [Ljava.lang.String;@6d06d69c rather than its contents). To accomplish that, you would need to change it to something like the following:
object Main {
  def main(args: Array[String]): Unit = {
    val res = for (a <- args) yield a.toUpperCase
    println("Arguments: " + res.mkString(" "))
  }
}
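With that change, calling it from a cell should print the upper-cased arguments on one line (expected output shown as a comment for illustration):

Main.main(Array("test1", "test2", "test3"))
// Arguments: TEST1 TEST2 TEST3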
I am trying to use Monix Observable to read a large file in smaller chunks of bytes, so that loading the file's bytes doesn't use up too much RAM.
However, when I use Observable.fromInputStream, the Array[Byte] chunks don't seem to be fed into MessageDigest's update() function.
Any suggestions on my code?
def SHA256_5(file: File) = {
  val sha256 = MessageDigest.getInstance("SHA-256")
  val in: Observable[Array[Byte]] = {
    Observable.fromInputStream(Task(new FileInputStream(file)))
  }
  in.map(byteArray => sha256.update(byteArray)).completed
  sha256.digest().map("%02x".format(_)).mkString
}

def main(args: Array[String]): Unit = {
  val path = "C:\\Users\\ME\\IdeaProjects\\HELLO\\src\\main\\scala\\TRY.scala"
  println(SHA256_5(new File(path)))
}
in.map(byteArray => sha256.update(byteArray)).completed
returns a Task, which means you have to execute that Task, and only when it finishes will you be able to call
sha256.digest().map("%02x".format(_)).mkString
because Task is used for lazily building asynchronous operations.
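To see the laziness in isolation, here is a minimal sketch (not part of the original code; it assumes Monix's Task and the global Scheduler):

import monix.eval.Task
import monix.execution.Scheduler.Implicits.global

val task = Task { println("hashing would happen here") } // nothing runs yet, only a description is built
task.runToFuture                                         // only now is the effect actually executed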
Try this instead:
def calculateSHA(file: File) = for {
  sha256 <- Task(MessageDigest.getInstance("SHA-256"))
  in = Observable.fromInputStream(Task(new FileInputStream(file)))
  _ <- in.map(byteArray => sha256.update(byteArray)).completed
} yield sha256.digest().map("%02x".format(_)).mkString
def main(args: Array[String]): Unit = {
  val path = "C:\\Users\\ME\\IdeaProjects\\HELLO\\src\\main\\scala\\TRY.scala"
  import scala.concurrent.Await
  import scala.concurrent.duration.Duration
  import monix.execution.Scheduler.Implicits.global
  println(Await.result(calculateSHA(new File(path)).runToFuture, Duration.Inf))
}
for starters, or, if you want to do it using the built-in Monix TaskApp instead of hacks for running an asynchronous computation in a synchronous main:
object Test extends TaskApp {

  def calculateSHA(file: File) = for {
    sha256 <- Task(MessageDigest.getInstance("SHA-256"))
    in = Observable.fromInputStream(Task(new FileInputStream(file)))
    _ <- in.map(byteArray => sha256.update(byteArray)).completed
  } yield sha256.digest().map("%02x".format(_)).mkString

  def run(args: List[String]) = {
    val path = "C:\\Users\\ME\\IdeaProjects\\HELLO\\src\\main\\scala\\TRY.scala"
    for {
      sha <- calculateSHA(new File(path))
      _ = println(sha)
    } yield ExitCode.Success
  }
}
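Note that with TaskApp there is no need for Await or an explicit runToFuture: run returns a Task[ExitCode], and TaskApp executes it for you on its own Scheduler.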
I have written some code in Scala and created a method inside an object. Now I want to call this method from another program, but I'm not getting any result after calling it.
First Program
object helper_class {
  def driver {
    def main(args: Array[String]) {
      val sparksession = SparkSession.builder().appName("app").enableHiveSupport().getOrCreate();
      val filepath: String = args(0)
      val d1 = spark.sql(s"load data inpath '${args(0)}' into table databasename.tablename")
      //some more reusable code
    }
  }
}
second Program
import Packagename.helper_class.driver

object child_program {
  def main(args: Array[String]) {
    driver //I want to call this method from helper_class
  }
}
If I remove def main(args: Array[String]) from the first program, it gives an error near args(0), saying args(0) is not found.
I am planning to pass args(0) via spark-submit.
Can someone please help me with how I should implement this?
The main method is the entry point for running a program, so it can't be wrapped inside another method.
I think what you are trying to do is load some Spark setup code and then use that "driver" to do something else. This is my best guess at how that could work:
object ChildProgram {
  def main(args: Array[String]): Unit = {
    val driver = Helper.driver(args)
    // do something with the driver
  }
}

object Helper {
  def driver(args: Array[String]) = {
    val sparksession = SparkSession.builder().appName("app").enableHiveSupport().getOrCreate()
    val filepath: String = args(0)
    val d1 = sparksession.sql(s"load data inpath '${args(0)}' into table databasename.tablename")
    //some more reusable code
    // I am assuming you are going to return the driver here
  }
}
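If it helps, here is a minimal sketch of what "returning the driver" could look like, so the child program can reuse the session (the return type and the follow-up query are my assumptions, not something the original answer specifies):

import org.apache.spark.sql.SparkSession

object Helper {
  def driver(args: Array[String]): SparkSession = {
    val sparksession = SparkSession.builder().appName("app").enableHiveSupport().getOrCreate()
    sparksession.sql(s"load data inpath '${args(0)}' into table databasename.tablename")
    sparksession // hand the session back so the caller can keep using it
  }
}

object ChildProgram {
  def main(args: Array[String]): Unit = {
    val spark = Helper.driver(args)
    spark.sql("select count(*) from databasename.tablename").show() // illustrative follow-up query
  }
}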
That said, I would highly recommend reading a bit on Scala before attempting to go down that route, because you are likely to face even more obstacles. If I am to recommend a resource, you can try the awesome book "Scala for the Impatient" which should get you up and running very quickly.
I have a Spark program as follows:
object A {
  var id_set: Set[String] = _

  def init(argv: Array[String]) = {
    val args = new AArgs(argv)
    id_set = args.ids.split(",").toSet
  }

  def main(argv: Array[String]) {
    init(argv)
    val conf = new SparkConf().setAppName("some.name")
    val rdd1 = getRDD(paras)
    val rdd2 = getRDD(paras)
    //......
  }

  def getRDD(paras) = {
    //function details
    getRDDDtails(paras)
  }

  def getRDDDtails(paras) = {
    //val id_given = id_set
    id_set.foreach(println) //worked fine here, not empty
    someRDD.filter { x =>
      val someSet = x.getOrElse(...)
      //id_set.foreach(println) ------wrong, id_set is just an empty set
      (someSet & id_set).size > 0
    }
  }
}

class AArgs(args: Array[String]) extends Serializable {
  //parse args
}
I have a global variable id_set. At first, it is just an empty set. In the main function, I call init, which sets id_set to a non-empty set from args. After that, I call the getRDD function, which calls getRDDDtails. In getRDDDtails, I filter an RDD based on the contents of id_set. However, the result seems to be empty. I tried to print id_set in the executor, and it is just an empty line. So the problem seems to be that id_set is not properly initialized (in the init function). However, when I try to print id_set in the driver (in the first lines of getRDDDtails), it works fine and is not empty.
So I have tried to add val id_given = id_set in getRDDDtails and use id_given later. This seems to fix the problem. But I'm totally confused about why this happens. What is the execution order of Spark programs? Why does my solution work?
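For what it's worth, here is a minimal sketch of the local-copy pattern described above (the parameter and RDD element types are my assumptions; the likely reason it works is that the closure Spark ships to the executors captures the value of the local val, whereas referencing id_set inside the closure reads object A's field on the executor JVM, where init was never called):

// intended to live inside object A; assumes import org.apache.spark.rdd.RDD
def getRDDDtails(someRDD: RDD[Set[String]]) = {
  val id_given = id_set             // read the field once, on the driver
  someRDD.filter { someSet =>
    (someSet & id_given).size > 0   // the closure captures id_given's value
  }
}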
I am trying to write a test that will redirect the stdout of a main method, but once I call the main, it seems to start on another thread and I cannot capture the output. Here is the code:
This works:
val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)
System.setOut(ps)
print("123")
Assert.assertEquals("123", baos.toString)
This does not:
val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)
System.setOut(ps)
GameRunner.main(_)
Assert.assertEquals("123", baos.toString)
....
object GameRunner {
  def main(args: Array[String]) {
    print("123")
How can I catch the call to print in my test?
*I have also tried scala.Console.setOut
EDIT
I do notice that running GameRunner.main(_) does not even print anything to the console when I am not redirecting. What is causing this?
print is really Predef.print, which calls Console.print. Even though you call System.setOut, I don't know whether that has an impact on Console.print. Try calling Console.setOut, or try:
Console.withOut(ps)(GameRunner.main(null))
The other possibility is that by calling GameRunner.main(_) you are not executing anything, as it may just be returning the function (args: Array[String]) => GameRunner.main(args). That should be quick to rule out.
Edit: yep:
scala> object A { def main(args: Array[String]) { println("1") } }
defined module A
scala> A.main(null)
1
scala> A.main(_)
res1: Array[String] => Unit = <function1>
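Putting both points together, a version of the failing test that should work might look like this (a sketch; Array.empty[String] and the JUnit Assert mirror the question's setup):

import java.io.{ByteArrayOutputStream, PrintStream}

val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)

// withOut rebinds Console's out stream for the duration of the block,
// and main(Array.empty[String]) actually runs the method instead of eta-expanding it
Console.withOut(ps) {
  GameRunner.main(Array.empty[String])
}

Assert.assertEquals("123", baos.toString)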
I already managed to start another JVM from Java.
See ProcessBuilder - Start another process / JVM - HowTo?
For some reason, I can't manage to do the same in Scala.
Here's my code:
object NewProcTest {
  def main(args: Array[String]) {
    println("Main")
    // val clazz = classOf[O3]
    val clazz = O4.getClass
    Proc.spawn(clazz, true)
    println("fin")
  }
}

object Proc {
  def spawn(clazz: Class[_], redirectStream: Boolean) {
    val separator = System.getProperty("file.separator")
    val classpath = System.getProperty("java.class.path")
    val path = System.getProperty("java.home") +
      separator + "bin" + separator + "java"
    val processBuilder =
      new ProcessBuilder(path, "-cp",
        classpath,
        clazz.getCanonicalName())
    processBuilder.redirectErrorStream(redirectStream)
    val process = processBuilder.start()
    process.waitFor()
    System.out.println("Fin")
  }
}
I've tried defining the main method in an object and in a class, both within the same .scala file and in a separate one.
What am I doing wrong?
The issue seems to be that the class name for an object has a '$' suffix.
If you strip off that suffix, the Java invocation line triggered from ProcessBuilder works.
I've hacked together something below to show a couple of test cases. I'm not yet sure why this is the case, but at least it provides a workaround.
import java.io.{InputStreamReader, BufferedReader}
import System.{getProperty => Prop}

object O3 { def main(args: Array[String]) { println("hello from O3") } }

package package1 {
  object O4 { def main(args: Array[String]) { println("hello from O4") } }
}

object NewProcTest {
  val className1 = O3.getClass().getCanonicalName().dropRight(1)
  val className2 = package1.O4.getClass().getCanonicalName().dropRight(1)

  val sep = Prop("file.separator")
  val classpath = Prop("java.class.path")
  val path = Prop("java.home") + sep + "bin" + sep + "java"

  println("className1 = " + className1)
  println("className2 = " + className2)

  def spawn(className: String, redirectStream: Boolean) {
    val processBuilder = new ProcessBuilder(path, "-cp", classpath, className)
    val pbcmd = processBuilder.command().toString()
    println("processBuilder = " + pbcmd)

    processBuilder.redirectErrorStream(redirectStream)
    val process = processBuilder.start()

    val reader = new BufferedReader(new InputStreamReader(process.getInputStream()))
    println(reader.readLine())
    reader.close()

    process.waitFor()
  }

  def main(args: Array[String]) {
    println("start")
    spawn(className1, false)
    spawn(className2, false)
    println("end")
  }
}
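As for why the '$' is there in the first place (my understanding of how scalac compiles objects, not something verified in the original answer): for object O3 the compiler emits two class files, O3$.class holding the singleton instance and O3.class containing a static main forwarder. O3.getClass therefore reports the module class O3$, which has no static main method the java launcher can find, while the name with the '$' dropped points at the class that does, which is why dropRight(1) makes the spawned process work.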