scala implicit par makes no progress

This is my first attempt at using Scala's parallel collections. I have a large dataset (which can be stored as any collection) that I would like to process on a multi-core system using a simple map operation, such as val out = data.par.map(foo(_)). The snippet below, based on an example from the Scala docs, gave weird output: the 'serial' versions seem to run with implicit parallelism, while the parallel versions make no progress. Any pointers towards a solution would be very much appreciated.
scala> val list = (1 to 1000000).toList
list: List[Int] = List(1, 2, 3, 4, 5,... // used > 1000% cpu
scala> val out = list.map(_ + 42) // again used > 1000% cpu
out: List[Int] = List(43, 44, 45, 46,
scala> val out = list.par.map(_ + 42) // process stalls, consumes no cpu!
scala> (1 to 10) map println // initially used >400% cpu
1
2
3
4
5
6
7
8
9
10
scala> (1 to 10).par map println // process stalls, consumes no cpu!
I am using Scala 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_275)
Edit: The above code ran in a script but not in the Scala shell. Probably some limitation of the shell itself.

Try with -Yrepl-class-based
$ scala-runners/scala --scala-version 2.12.10 -Yrepl-class-based
Welcome to Scala 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 10.0.2).
Type in expressions for evaluation. Or try :help.
scala> (1 to 10).par map println
6
9
8
1
10
7
2
3
4
5
res0: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (), (), (), (), (), (), (), ())
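The -Yrepl-class-based flag wraps each REPL line in a class instead of an object. With object wrappers, a worker thread that touches a value from a line whose object is still being initialized can block on the JVM's class-initialization lock, deadlocking the parallel map. A minimal sketch of that mechanism (Wrapper is illustrative, not actual REPL-generated code):

```scala
// Sketch of a JVM class-initialization deadlock, the likely mechanism behind
// the stall. Wrapper stands in for a compiled REPL line; it is not real
// REPL-generated code.
object Wrapper {
  val x: Int = {
    // The worker thread reads Wrapper.x, which requires Wrapper to finish
    // initializing, so it blocks on the initialization lock held right here.
    val worker = new Thread(() => println(Wrapper.x))
    worker.start()
    worker.join(1000) // a plain join() with no timeout would hang forever
    42
  }
}

println(Wrapper.x) // completes only because the join above times out
```

Class-based wrappers avoid this because instance initialization does not hold the class-initialization lock, which is why the flag makes the parallel map run.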

Related

Using vim editor to create a Scala script within REPL

The Scala version I am using is 2.12.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121), and the JLine library present is 2.14.3.
This may sound silly, but I am trying to figure out why creating a Scala file with a command-line editor (vi or vim) from inside the Scala REPL throws an error. My error is below. Could you please let me know if there is a specific Scala terminal console I am supposed to use, or am I doing something wrong?
scala> vi test1.scala
<console>:1: error: ';' expected but '.' found.
vi test1.scala
I am able to run vi and vim on my system outside the Scala REPL, but when I am in the REPL I am not able to create a Scala script file and execute it. What could be wrong? Is there any setting that needs to be enabled for this?
For saving your REPL history, use :save file.
There is limited support for using an external editor. The result of the edit is run immediately. After a :reset, only the edited lines are in the session history, so :save will save only those lines.
$ EDITOR=gedit scala
Welcome to Scala 2.12.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111).
Type in expressions for evaluation. Or try :help.
scala> val x = 42
x: Int = 42
scala> println(x)
42
scala> :edit -2
+val x = 17
+println(x)
17
x: Int = 17
scala> :hi 3
1896 val x = 17
1897 println(x)
1898 :hi 3
scala> :reset
Resetting interpreter state.
Forgetting this session history:
val x = 42
println(x)
val x = 17
println(x)
Forgetting all expression results and named terms: $intp, x
scala> :ed 1896+2
+val x = 5
+println(x)
5
x: Int = 5
scala> :save sc.sc
scala> :load sc.sc
Loading sc.sc...
x: Int = 5
5

scala spark partitions: understanding the output

I am running Spark with Scala. What is the meaning of the line I get when I run rawblocks.partitions.length? My linkage folder had 10 files.
What do res1 and Int stand for?
Also, is there a place where I can find official documentation for Spark methods? For example, I want to see the details of textFile.
spark version 1.6.1
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65)
scala> val rawblocks=sc.textFile("linkage")
rawblocks: org.apache.spark.rdd.RDD[String] = linkage MapPartitionsRDD[3] at textFile at <console>:27
scala> rawblocks.partitions.length
res1: Int = 10
The res1 and Int are not special to Spark: res1 is the name the Scala REPL (shell) gives to unnamed values - results are numbered starting from zero, for example:
scala> 10
res0: Int = 10
scala> "hello"
res1: String = hello
This should also give you a clue about Int - it is the inferred type of the value (Scala's Int corresponds to Java's primitive int, boxed as Integer only when an object is required).
Spark API: here is the documentation for the two primary entry points of Spark Core: SparkContext and RDD.
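As a quick aside on the Int point above: Scala's Int compiles to the JVM primitive int and is boxed to java.lang.Integer only when an object reference is needed, which can be checked directly (a small sketch, nothing Spark-specific):

```scala
// Scala's Int is the JVM primitive int; assigning it where an object is
// expected boxes it to java.lang.Integer, much like Java autoboxing.
val n: Int = 10
val boxed: java.lang.Integer = n
println(boxed.getClass.getName) // java.lang.Integer
println(n + 1)                  // plain primitive arithmetic: 11
```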

why doesn't the last computed item in a Stream appear in the REPL?

Lately I've run a few tests on Stream in the REPL, and strangely the last computed item in the stream isn't displayed. An example of what I mean:
val s = Stream.from(1)
// scala.collection.immutable.Stream[Int] = Stream(1, ?)
s(5)
// Int = 6
s
// scala.collection.immutable.Stream[Int] = Stream(1, 2, 3, 4, 5, ?)
Maybe I am missing something, but I would expect the printed form of s to include 6 (i.e. s(5)).
Can anyone explain this?
[scala version 2.11.6 (OpenJDK 64-Bit Server VM, Java 1.7.0_91)]
This was a bug in Scala 2.11.6, which has been fixed in 2.11.7.
See https://issues.scala-lang.org/browse/SI-9219 for more details
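Display aside, the memoization itself was working: indexing into a Stream forces and caches every element up to that index. A small check (Stream is deprecated in favour of LazyList from Scala 2.13 on, but behaves the same here):

```scala
// Indexing a Stream forces and caches the prefix up to that index, so the
// elements are computed once and then reused.
val s = Stream.from(1)
println(s(5))             // 6 - forces elements at indices 0 through 5
println(s.take(6).toList) // List(1, 2, 3, 4, 5, 6), served from the cache
```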

How to clear all variables in Scala REPL

Is there a quick command? I don't want to press Ctrl+D and restart Scala every time I want to clear all variables. reset, clear and clean don't work, and :help doesn't list anything.
You can use :reset
Welcome to Scala version 2.10.0-RC2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_37).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val a = 1
a: Int = 1
scala> val b = 3
b: Int = 3
scala> :reset
Resetting interpreter state.
Forgetting this session history:
val a = 1
val b = 3
Forgetting all expression results and named terms: $intp, a, b
scala>

Can views be used with parallel collections?

The idiom for finding a result within a mapping of a collection goes something like this:
list.view.map(f).find(p)
where list is a List[A], f is an A => B, and p is a B => Boolean.
Is it possible to use view with parallel collections? I ask because I'm getting some very odd results:
Welcome to Scala version 2.9.1.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val f : Int => Int = i => {println(i); i + 10}
f: Int => Int = <function1>
scala> val list = (1 to 10).toList
list: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> list.par.view.map(f).find(_ > 5)
1
res0: Option[Int] = Some(11)
scala> list.par.view.map(f).find(_ > 5)
res1: Option[Int] = None
See "A Generic Parallel Collection Framework", the paper by Martin Odersky et al. that discusses the new parallel collections. Page 8 has a section "Parallel Views" that talks about how view and par can be used together, and how this can give the performance benefits of both views and parallel computation.
As for your specific example, that is definitely a bug. The exists method also breaks, and having it break on one list breaks it for all other lists, so I think it is a problem where operations that may be aborted part way through (find and exists can stop once they have an answer) manage to break the thread pool in some way. It could be related to the bug with exceptions being thrown inside functions passed to parallel collections. If so, it should be fixed in 2.10.
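Until the fix, a sequential Iterator gives the same lazy map-then-find without the broken parallel views (a workaround sketch that trades away the parallelism):

```scala
// Iterator maps lazily, so f is applied only until the predicate succeeds;
// the counter confirms that just one element was computed.
var calls = 0
val f: Int => Int = i => { calls += 1; i + 10 }
val list = (1 to 10).toList

val result = list.iterator.map(f).find(_ > 5)
println(result) // Some(11): f(1) = 11 already satisfies _ > 5
println(calls)  // 1
```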