Scala Parallel Print Hanging the console - scala

I am new to the Scala world and I am trying some exercises from a book. I have an example that prints a vector sequentially and in parallel. The former works perfectly and the latter hangs the console.
Code
val v = Vector.range(0, 10)
v.foreach(println)
Code output
0123456789
But if I use the same code and call par before foreach, it freezes the console
val v = Vector.range(0,10)
v.par.foreach(println)
The book I am using says that the output should be something like:
5678901234
But it hangs and the program never finishes.
Can someone explain why?

It would be better to post the whole program that hangs.
I've just tested it with Scala 2.12.8 and JVM 1.8.0_161:
object MainClassss {
  def main(args: Array[String]): Unit = {
    val v = Vector.range(0, 10)
    (0 to 999999).foreach(_ => v.par.foreach(println))
  }
}
The program ran to completion with the expected output and did not hang.
If your program reproduces the issue, take a thread dump with:
$ jstack <PID>
where PID is the process ID.
Alternatively, you can take one with the jvisualvm tool.
If some code is blocking execution, the thread dump will show which threads are stuck and where.
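If attaching jstack is inconvenient, a similar dump can be produced from inside the program. This is only a minimal sketch using the standard java.lang.management API; the object name ThreadDumpSketch and the output format are my own, not something from the book or from jstack itself.

```scala
import java.lang.management.ManagementFactory

object ThreadDumpSketch {
  /** Returns one text block per live thread: name, state, then its stack frames. */
  def dump(): Seq[String] = {
    val bean = ManagementFactory.getThreadMXBean
    // false, false: do not collect locked monitors/synchronizers,
    // since not every JVM supports that
    bean.dumpAllThreads(false, false).toSeq.map { info =>
      val header = info.getThreadName + " state=" + info.getThreadState
      val frames = info.getStackTrace.map(f => "    at " + f).mkString("\n")
      header + "\n" + frames
    }
  }

  def main(args: Array[String]): Unit =
    dump().foreach(println)
}
```

Calling ThreadDumpSketch.dump() from a watchdog thread shortly after the parallel foreach starts should show which threads are BLOCKED or WAITING, and on what line.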

Related

How to execute scala tests programmatically

I'm looking for a way to execute scala tests (implemented in munit, but it could be also ScalaTest) programmatically. I want to perform more or less what sbt test does out-of-the box inside my own scala code, without running sbt (focusing on test discovery and execution and getting back a report).
I have something like this in mind:
object Test extends App {
  val tests = TestDiscovery.discover("package.that.has.tests")
  // map rather than foreach, so that the reports are actually collected
  val reports = tests.map(test => test.execute())
  // do something with the reports, maybe print to console
}
Is there any documentation related to this?
ScalaTest has execute() and run().
To understand the effect of all the arguments, it is also worth looking at the ScalaTest shell.

How to allocate less memory to Scala with IntelliJ

I'm trying to crash my program (run in IntelliJ) with an OutOfMemoryError:
def OOMCrasher(acc: String): String = {
  OOMCrasher(acc + "ADSJKFAKLWJEFLASDAFSDFASDFASERASDFASEASDFASDFASERESFDHFDYJDHJSDGFAERARDSHFDGJGHYTDJKXJCV")
}
OOMCrasher("")
However, it just runs for a very long time. My suspicion is that it simply takes a very long time to fill up all the gigabytes of memory allocated to the JVM with a string. So I'm looking at how to make IntelliJ allocate less memory to the JVM. Here's what I've tried:
In Run Configurations -> VM options:
--scala.driver.memory 1k || --driver.memory 1k
Both of these cause crashes with:
Unrecognized option: --scala.driver.memory
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
I've also tried putting the options in Edit Configurations -> Program Arguments. This causes the program to run for a very long time again, without yielding an OutOfMemoryError.
EDIT:
I will accept any answer that successfully explains how to allocate less memory to the program, since that is the main question.
UPDATE:
Changing the function to:
def OOMCrasher(acc: HisList[String]): HisList[String] = {
  OOMCrasher(acc.add("Hi again!"))
}
OOMCrasher(Cons("Hi again!", Empty))
where HisList is a simple linked-list implementation, and running with -Xmx3m, produced the desired exception.
Reaching an OutOfMemoryError in a purely functional way is harder than it looks, because recursive functions almost always run into a StackOverflowError first.
But there is an approach using mutable state that will guarantee an OutOfMemoryError: doubling a List over and over again. Scala's Lists are not limited by the maximum array size and can therefore grow until there is no memory left.
Here's an example:
def explodeList[A](list: List[A]): Unit = {
  var mlist = list
  while (true) {
    mlist = mlist ++ mlist
  }
}
To answer your actual question, try to fiddle with the JVM option -Xmx___m (e.g. -Xmx256m). This defines the maximum heap size the JVM is allowed to allocate.
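To check that the option was actually picked up (for example that it went into VM options rather than Program Arguments), the program can print the heap limit it sees. A minimal sketch; the object name HeapInfo is just for illustration, and the reported value can be slightly below the -Xmx figure depending on the garbage collector.

```scala
object HeapInfo {
  /** Maximum heap the JVM will attempt to use, in MiB, as reported by the runtime. */
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"Max heap: $maxHeapMiB MiB")
}
```

If the number printed is still in the gigabytes, the -Xmx setting has not reached the JVM that runs your code.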

Spark running time in local changes a lot with "println"

I ran into a tricky problem: a sharp increase in execution time.
I run my Scala code in local Spark; part of it builds an n*n matrix.
On a small dataset, it takes just 5s to finish. The most time-consuming part is building a 2000*2000 matrix. This part is executed within map and only deals with array data structures.
However, just out of curiosity, I added a println inside the matrix-building code to see the number of iterations. Suddenly, the total running time increased to 1min23s.
The final results are the same.
I am new to Spark and have no idea what causes this.
The code is simply:
val x = someRDD.map(buildMatrix)

def buildMatrix(stringVect: Array[String]): Array[Array[Double]] = {
  //var count = 0
  val num = stringVect.length
  var simi_matrix = Array[Array[Double]]()
  for (i <- 0 until num - 1) {
    for (j <- (i + 1) until num) {
      // build the matrix with some computation
      //println(count)
      //count += 1
    }
  }
  simi_matrix
}
TL;DR
This has nothing to do with Spark. Console I/O is synchronized and costly. It will slow down any program on the JVM (Scala/Java/Clojure/...).
println defaults to java.lang.System.out, which is a PrintStream. println delegates to PrintStream#println and therefore enters the synchronized block of that implementation to write to the console. There are two costs:
Getting a synchronized lock
I/O to the console OutputStream
The slowdown observed is to be expected. Just don't use println in hot parts of the code (like a tight loop in this case).
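If you still want to see progress, one option is to keep the counter but print only occasionally. A rough sketch, assuming the same i < j loop shape as in the question; QuietLoop, countPairs and the every parameter are names I made up for illustration.

```scala
object QuietLoop {
  /** Counts the (i, j) pairs with i < j, printing progress only once per `every` iterations. */
  def countPairs(n: Int, every: Long = 1000000L): Long = {
    var count = 0L
    for (i <- 0 until n - 1; j <- (i + 1) until n) {
      count += 1
      // One console write per `every` iterations instead of one per iteration
      if (count % every == 0) println(s"$count iterations so far")
    }
    count
  }

  def main(args: Array[String]): Unit =
    println(s"total iterations: ${countPairs(2000)}")
}
```

For n = 2000 the loop body runs 1,999,000 times, so this writes to the console twice in total instead of nearly two million times.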

How does the “scala.sys.process” from Scala 2.9 work?

I just had a look at the new scala.sys and scala.sys.process packages to see if there is something helpful here. However, I am at a complete loss.
Has anybody got an example on how to actually start a process?
And, which is most interesting for me: Can you detach processes?
A detached process will continue to run when the parent process ends and is one of the weak spots of Ant.
UPDATE:
There seems to be some confusion about what detach means. Here is a real-life example from my current project, once with Z-Shell and once with TakeCommand:
Z-Shell:
if ! ztcp localhost 5554; then
    echo "[ZSH] Start emulator"
    emulator \
        -avd Nexus-One \
        -no-boot-anim \
        1>~/Library/Logs/${PROJECT_NAME}-${0:t:r}.out \
        2>~/Library/Logs/${PROJECT_NAME}-${0:t:r}.err &
    disown
else
    ztcp -c "${REPLY}"
fi;
Take-Command:
IFF %#Connect[localhost 5554] lt 0 THEN
ECHO [TCC] Start emulator
DETACH emulator -avd Nexus-One -no-boot-anim
ENDIFF
In both cases it is fire and forget: the emulator is started and will continue to run even after the script has ended. Of course, having to write the scripts twice is a waste. So I am now looking into Scala for unified process handling without Cygwin or XML syntax.
First import:
import scala.sys.process.Process
then create a ProcessBuilder
val pb = Process("""ipconfig.exe""")
Then you have two options:
run and block until the process exits
val exitCode = pb.!
run the process in background (detached) and get a Process instance
val p = pb.run
Then you can get the exit code from the process (if the process is still running, this blocks until it exits):
val exitCode = p.exitValue
If you want to handle the input and output of the process you can use ProcessIO:
import scala.sys.process.ProcessIO
val pio = new ProcessIO(
  _ => (),
  stdout => scala.io.Source.fromInputStream(stdout).getLines.foreach(println),
  _ => ())
pb.run(pio)
I'm pretty sure detached processes work just fine, considering that you have to explicitly wait for it to exit, and you need to use threads to babysit the stdout and stderr. This is pretty basic, but it's what I've been using:
/** Run a command, collecting the stdout, stderr and exit status */
def run(in: String): (List[String], List[String], Int) = {
  val qb = Process(in)
  var out = List[String]()
  var err = List[String]()
  val exit = qb ! ProcessLogger((s) => out ::= s, (s) => err ::= s)
  (out.reverse, err.reverse, exit)
}
Process was imported from SBT. Here's a thorough guide on how to use the process library as it appears in SBT.
https://github.com/harrah/xsbt/wiki/Process
Has anybody got an example on how to actually start a process?
import sys.process._ // Package object with implicits!
"ls"!
And, which is most interesting for me: Can you detach processes?
"/path/to/script.sh".run()
Most of what you'll do is related to sys.process.ProcessBuilder, the trait. Get to know that.
There are implicits that make usage less verbose, and they are available through the package object sys.process. Import its contents, like shown in the examples. Also, take a look at its scaladoc as well.
The following function allows easy use of here documents:
def #<<< (command: String) (hereDoc: String) = {
  val process = Process (command)
  val io = new ProcessIO (
    in  => {in.write (hereDoc getBytes "UTF-8"); in.close},
    out => {scala.io.Source.fromInputStream(out).getLines.foreach(println)},
    err => {scala.io.Source.fromInputStream(err).getLines.foreach(println)})
  process run io
}
Sadly, I was not able to (did not have the time to) make it an infix operation. The suggested calling convention is therefore:
#<<< ("command") {"""
Here Document data
"""}
It would be cool if anybody could give me a hint on how to make it a more shell-like call:
"command" #<<< """
Here Document data
""" !
Documenting process a little better was second on my list for probably two months. You can infer my list from the fact that I never got to it. Unlike most things I don't do, this is something I said I'd do, so I greatly regret that it remains as undocumented as it was when it arrived. Sword, ready yourself! I fall upon thee!
If I understand the dialog so far, one aspect of the original question is not yet answered:
how to "detach" a spawned process so it continues to run independently of the parent scala script
The primary difficulty is that all of the classes involved in spawning a process must run on the JVM, and they are unavoidably terminated when the JVM exits. However, a workaround is to indirectly achieve the goal by leveraging the shell to do the "detach" on your behalf. The following scala script, which launches the gvim editor, appears to work as desired:
import scala.sys.process._ // needed for the implicit conversion that provides run

val cmd = List(
  "scala",
  "-e",
  """import scala.sys.process._ ; "gvim".run ; System.exit(0);"""
)
val proc = cmd.run
It assumes that scala is in the PATH, and it does (unavoidably) leave a JVM parent process running as well.

Internal scala compilation. Working with interactive.Global

I am trying to retrieve the AST from a Scala source file. I have simplified the code (only the relevant parts) to the following.
trait GetAST {
  val settings = new Settings
  val global = new Global(settings, new ConsoleReporter(settings))
  def getSt = "hello" //global.typedTree(src, true)
}
object Tre extends GetAST {
  def main(args: Array[String]): Unit = {
    println(getSt.getClass)
    println("exiting program")
  }
}
The above code compiles fine and runs fine. But the problem is the program does not exit. The prompt is not displayed after printing "exiting program". I have to use ^c to exit.
Any idea what the problem might be?
I believe Michael is correct, the compiler uses Threads and therefore the JVM doesn't just exit.
The good news is that interactive.Global mixes in interactive.CompilerControl trait whose askShutdown method you can call at the end of your main to let the program exit.
Without knowing what Settings, Global and ConsoleReporter are, nobody can give you an exact answer. I would guess that at least one of them is creating a thread. The JVM waits until all threads are done (or until the only ones still running are daemon threads). See here.
I would bet if you comment out the settings and global lines it will exit as expected.
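To illustrate the thread behaviour in isolation, here is a small sketch with no compiler involved. The names DaemonSketch and startSleeper are made up for this example; whether Global really starts a non-daemon thread is still only a guess.

```scala
object DaemonSketch {
  /** Starts a thread that sleeps until interrupted; `daemon` decides whether it can keep the JVM alive. */
  def startSleeper(daemon: Boolean): Thread = {
    val t = new Thread(new Runnable {
      def run(): Unit =
        try Thread.sleep(Long.MaxValue)
        catch { case _: InterruptedException => () } // exit quietly when interrupted
    })
    t.setDaemon(daemon) // must be called before start()
    t.start()
    t
  }

  def main(args: Array[String]): Unit = {
    // With daemon = true the JVM exits after main returns.
    // With daemon = false it would hang after printing, as in the question.
    startSleeper(daemon = true)
    println("exiting program")
  }
}
```

Changing the argument to daemon = false reproduces the symptom: "exiting program" is printed, but the prompt does not come back until you press ^C.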