Redirect stdout in another thread - scala

I am trying to write a test that redirects the stdout of a main method, but once I call the main it seems to start on another thread and I cannot capture the output. Here is the code:
This works:
val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)
System.setOut(ps)
print("123")
Assert.assertEquals("123", baos.toString)
This does not:
val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)
System.setOut(ps)
GameRunner.main(_)
Assert.assertEquals("123", baos.toString)
....
object GameRunner {
  def main(args: Array[String]) {
    print("123")
How can I catch the call to print in my test?
I have also tried scala.Console.setOut.
EDIT
I do notice that running GameRunner.main(_) does not even print anything to the console when I am not redirecting. What is causing this?

print is really Predef.print, which calls Console.print. Even though you call System.setOut, I don't know whether that has any impact on Console.print. Try calling Console.setOut, or try:
Console.withOut(ps)(GameRunner.main(null))
The other possibility is that by calling GameRunner.main(_) you are not executing anything, as it may just be returning the function (args: Array[String]) => GameRunner.main(args). Should be quick to rule that out.
Edit yep:
scala> object A { def main(args: Array[String]) { println("1") } }
defined module A
scala> A.main(null)
1
scala> A.main(_)
res1: Array[String] => Unit = <function1>
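Putting the two points together, a corrected test might look like this (a minimal sketch: it assumes JUnit's Assert as in the question, and passes an empty argument array instead of null):
import java.io.{ByteArrayOutputStream, PrintStream}

val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)
// Redirect scala.Console's out for the duration of the call,
// and actually invoke main instead of just referencing it
Console.withOut(ps) {
  GameRunner.main(Array.empty[String])
}
ps.flush()
Assert.assertEquals("123", baos.toString)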

Related

Print Source[ByteString, NotUsed] values to console

How can I print the values of a source to the console?
val someSource = Source.single(ByteString("SomeValue"))
I want to print the String "SomeValue" from this source. I tried:
someSource.to(Sink.foreach(println)) //This one prints RunnableGraph object
someSource.map(each => {
  val pqr = each.decodeString(ByteString.UTF_8)
  print(pqr)
}) // This one prints res3: someSource.Repr[Unit] = Source(SourceShape(Map.out(169373838)))
How do I print the String that was originally used to create this single-element Source?
From what is written in the question, I think you are probably using the Scala console or a Scala worksheet.
In the Scala console or worksheet, it prints a representation of the value created by the current statement. For example,
scala> val i = 5
val i: Int = 5
scala> val s = "ssfdf"
val s: String = ssfdf
But, what happens when you use something like a println here,
scala> val u = println("dfsd")
dfsd
val u: Unit = ()
It also executes the println and afterwards prints that the value u created by that println is actually Unit.
And that is probably where your confusion is coming from, because your println in Sink.foreach does not seem to be working in this case.
That is because this case is more like following, where you are actually defining a function.
scala> val f1 = (s: String) => println(s)
val f1: String => Unit = $Lambda$1062/0x0000000800689840@1796b2d4
You are not using println here, you are just defining a function (an instance of String => Unit or Function1[String, Unit]) which will use println.
So, console just prints that the value f1 created here is of type String => Unit.
You need to call this function to actually execute that println:
scala> f1.apply("dsfsd")
dsfsd
Similarly, someSource.to(Sink.foreach(println)) will create a value of type RunnableGraph, hence the Scala console will print something like val res0: RunnableGraph....
You now need to run this graph to actually execute it.
But compared to the function example earlier, the execution of the graph happens asynchronously on a thread pool, which means that it might not work in some versions of the Scala console or worksheet (depending on how the thread pool lifecycle is managed). So, if you just do,
scala> val someSource = Source.single(ByteString("SomeValue"))
val someSource: akka.stream.scaladsl.Source[akka.util.ByteString,akka.NotUsed] = Source(SourceShape(single.out(369296388)))
scala> val runnableGraph = someSource.to(Sink.foreach(println))
val runnableGraph: akka.stream.scaladsl.RunnableGraph[akka.NotUsed] = RunnableGraph
scala> runnableGraph.run()
If it works, then you will see the following:
scala> runnableGraph.run()
val res0: akka.NotUsed = NotUsed
ByteString(83, 111, 109, 101, 86, 97, 108, 117, 101)
But chances are that you will just see some errors related to the console failing to complete the graph run for some reason.
You will actually need to materialize the Sink, which will result in a Future[Done] when the graph is run. Then you will have to wait on that Future[Done] using Await.
You will have to put all this into a normal Scala file and execute it as a Scala application.
import akka.Done
import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.Behaviors
import akka.stream.scaladsl.{Keep, Sink, Source}
import akka.util.ByteString
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}

object TestAkkaStream extends App {
  val actorSystem = ActorSystem(Behaviors.empty, "test-stream-system")
  // the materializer is picked up from the implicit (classic) actor system
  implicit val classicActorSystem = actorSystem.classicSystem

  val someSource = Source.single(ByteString("SomeValue"))

  // Keep.right keeps the Sink's materialized value, a Future[Done]
  val runnableGraph = someSource.toMat(Sink.foreach(println))(Keep.right)

  val graphRunDoneFuture: Future[Done] = runnableGraph.run()
  Await.result(graphRunDoneFuture, Duration.Inf)

  // shut down the actor system so the application can exit
  actorSystem.terminate()
}
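One more note: Sink.foreach(println) prints the ByteString's toString (the byte values shown above), not the text inside it. If the goal is to see "SomeValue" itself, a small variation on the example above (same imports and actor system) is to decode each element with utf8String, here using the runForeach shorthand:
// runForeach is shorthand for toMat(Sink.foreach(...))(Keep.right).run();
// utf8String decodes the ByteString back into the original text
val decodedDone: Future[Done] =
  someSource.runForeach(bytes => println(bytes.utf8String))
Await.result(decodedDone, Duration.Inf)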

How to Call Methods in Scala

I have written some code in Scala and created one method inside it. Now I want to call this method from another program, but I'm not getting any result after calling it.
First Program
object helper_class {
  def driver {
    def main(args: Array[String]) {
      val sparksession = SparkSession.builder().appName("app").enableHiveSupport().getOrCreate();
      val filepath: String = args(0)
      val d1 = spark.sql(s"load data inpath '${args(0)}' into table databasename.tablename")
      //some more reusable code
    }
  }
}
Second Program
import Packagename.helper_class.driver

object child_program {
  def main(args: Array[String]) {
    driver //I want to call this method from helper_class
  }
}
If I remove def main(args: Array[String]) from the first program, it gives an error near args(0) saying args(0) is not found.
I am planning to pass args(0) via spark-submit.
Can someone please help me with how I should implement this?
The main method is your entry point to run a program; it can't be wrapped inside another method.
I think what you are trying to do is to load some Spark setup code and then use that "driver" to do something else... This is my best guess at how that could work:
import org.apache.spark.sql.SparkSession

object ChildProgram {
  def main(args: Array[String]): Unit = {
    val driver = Helper.driver(args)
    // do something with the driver
  }
}

object Helper {
  def driver(args: Array[String]) = {
    val sparksession = SparkSession.builder().appName("app").enableHiveSupport().getOrCreate()
    val filepath: String = args(0)
    val d1 = sparksession.sql(s"load data inpath '${args(0)}' into table databasename.tablename")
    //some more reusable code
    // I am assuming you are going to return the driver here
  }
}
That said, I would highly recommend reading a bit on Scala before attempting to go down that route, because you are likely to face even more obstacles. If I am to recommend a resource, you can try the awesome book "Scala for the Impatient" which should get you up and running very quickly.
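For what it's worth, the arguments you pass on the spark-submit command line arrive in ChildProgram.main as args and are simply forwarded to Helper.driver, so you can also exercise the helper directly for a quick check (a sketch; the path below is only a placeholder):
// Hypothetical local invocation; "/tmp/input_data" is just a placeholder path
ChildProgram.main(Array("/tmp/input_data"))

// or call the helper directly with the same arguments
Helper.driver(Array("/tmp/input_data"))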

run object scala in databricks

I am trying to execute sample code on databricks in scala. It is an object.
object Main {
  def main(args: Array[String]) {
    val res = for (a <- args) yield a.toUpperCase
    println("Arguments: " + res.toString)
  }
}
When I run it on Databricks, it says 'object defined main'. I am not sure how to execute it now, or what code to use to execute it. Please help.
What you are working with is a kind of Scala REPL. Basically, the "main" function does not have any special significance there. Having said that, you can run your function as follows:
object Main {
  def main(args: Array[String]) {
    val res = for (a <- args) yield a.toUpperCase
    println(res)
    println("Arguments: " + res.toString)
  }
}
Main.main(Array("123","23123"))
As is, you can call object Main's main method.
You can call the main method in the Main object as follows:
val args: Array[String] = Array("test1", "test2", "test3")
Main.main(args)
What you have in your main method won't print what you expect, which I assume is the values contained in the res array. To accomplish that you would need to change it to something like the following:
object Main {
  def main(args: Array[String]): Unit = {
    val res = for (a <- args) yield a.toUpperCase
    println("Arguments: " + res.mkString(" "))
  }
}
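With that change, calling the method as shown earlier should print the upper-cased arguments joined by spaces, for example:
Main.main(Array("test1", "test2", "test3"))
// prints: Arguments: TEST1 TEST2 TEST3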

What is Spark execution order with function calls in scala?

I have a spark program as follows:
object A {
  var id_set: Set[String] = _

  def init(argv: Array[String]) = {
    val args = new AArgs(argv)
    id_set = args.ids.split(",").toSet
  }

  def main(argv: Array[String]) {
    init(argv)
    val conf = new SparkConf().setAppName("some.name")
    val rdd1 = getRDD(paras)
    val rdd2 = getRDD(paras)
    //......
  }

  def getRDD(paras) = {
    //function details
    getRDDDtails(paras)
  }

  def getRDDDtails(paras) = {
    //val id_given = id_set
    id_set.foreach(println) //worked normally, not empty
    someRDD.filter { x =>
      val someSet = x.getOrElse(...)
      //id_set.foreach(println) ------wrong, id_set is just an empty set
      (someSet & id_set).size > 0
    }
  }
}

class AArgs(args: Array[String]) extends Serializable {
  //parse args
}
I have a global variable id_set. At first, it is just an empty set. In the main function, I call init, which sets id_set to a non-empty set from args. After that, I call the getRDD function, which calls getRDDDtails. In getRDDDtails, I filter an RDD based on the contents of id_set. However, the result seems to be empty. I tried to print id_set on the executor, and it is just an empty line. So, the problem seems to be that id_set is not well initialized (in the init function). However, when I try to print id_set on the driver (in the first lines of getRDDDtails), it works normally and is not empty.
So, I have tried adding val id_given = id_set in getRDDDtails and using id_given later. This seems to fix the problem. But I'm totally confused about why this should happen. What is the execution order of Spark programs? Why does my solution work?
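For what it's worth, a likely explanation is that the filter closure is serialized and shipped to the executors; referencing id_set inside it means referencing the field of object A, and on each executor JVM object A is initialized fresh without init ever running, so the executors see an uninitialized id_set. Copying the field into a local val on the driver, as the question already does, makes the closure capture just that value. A minimal sketch of the pattern, with a hypothetical helper name and RDD type:
// Sketch only: imagine this inside object A, called after init(argv) has run
def filterByIds(rdd: org.apache.spark.rdd.RDD[String]): org.apache.spark.rdd.RDD[String] = {
  val idsLocal = id_set                  // read the field once, on the driver
  rdd.filter(x => idsLocal.contains(x))  // the task captures only idsLocal
}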

The Future is not complete?

object Executor extends App {
  implicit val system = ActorSystem()
  implicit val materializer = ActorMaterializer()
  implicit val ec = system.dispatcher

  import akka.stream.io._
  val file = new File("res/AdviceAnimals.tsv")
  import akka.stream.io.Implicits._

  val foreach: Future[Long] = SynchronousFileSource(file)
    .to(Sink.outputStream(() => System.out))
    .run()

  foreach onComplete { v =>
    println(s"the foreach is ${v.get}") // this will not be printed
  }
}
But if I change Sink.outputStream(() => System.out) to Sink.ignore, the println(s"the foreach is ${v.get}") will print.
Can somebody explain why?
You are not waiting for the stream to complete; instead, your main method (the body of Executor) completes, and once the main method is done the JVM is shut down.
What you want to do is block that thread and not exit the app before the future completes.
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object Executor extends App {
  // ...your stuff with streams...
  val yourFuture: Future[Long] = ???

  // block until the stream's materialized future completes (or times out)
  val result = Await.result(yourFuture, 5.seconds)
  println(s"the foreach is ${result}")

  // stop the actor system (or it will keep the app alive)
  system.terminate()
}
Coincidentally, I created almost the same app for testing/playing with Akka Streams.
Could the imported implicits cause the problem?
This app works fine for me:
object PrintAllInFile extends App {
  val file = new java.io.File("data.txt")

  implicit val system = ActorSystem("test")
  implicit val mat = ActorMaterializer()
  implicit val ec = system.dispatcher

  SynchronousFileSource(file)
    .to(Sink.outputStream(() => System.out))
    .run()
    .onComplete(_ => system.shutdown())
}
Note the stopping of the ActorSystem in the 'onComplete'. Otherwise the app will not exit.