ParSeq.fill running sequentially?

I am trying to initialize an array in Scala using parallelization. However, when using the ParSeq.fill method, the performance doesn't seem to be any better than sequential initialization (Seq.fill). If I do the same task but initialize the collection with map, it is much faster.
To show my point, I set up the following example:
import scala.collection.parallel.immutable.ParSeq
import scala.util.Random
object Timer {
def apply[A](f: => A): (A, Long) = {
val s = System.nanoTime
val ret = f
(ret, System.nanoTime - s)
}
}
object ParallelBenchmark extends App {
def randomIsPrime: Boolean = {
val n = Random.nextInt(1000000)
(2 until n).exists(i => n % i == 0)
}
val seqSize = 100000
val (_, timeSeq) = Timer { Seq.fill(seqSize)(randomIsPrime) }
println(f"Time Seq:\t\t $timeSeq")
val (_, timeParFill) = Timer { ParSeq.fill(seqSize)(randomIsPrime) }
println(f"Time Par Fill:\t $timeParFill")
val (_, timeParMap) = Timer { (0 until seqSize).par.map(_ => randomIsPrime) }
println(f"Time Par map:\t $timeParMap")
}
And the result is:
Time Seq: 32389215709
Time Par Fill: 32730035599
Time Par map: 17270448112
This clearly shows that the fill method is not running in parallel.

The parallel collections library in Scala can only parallelize existing collections; fill hasn't been implemented yet (and may never be). Your method of using a Range to generate a cheap placeholder collection is probably your best option if you want to see a speed boost.
Here's the underlying method being called by ParSeq.fill, obviously not parallel.
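If you want to keep a fill-like call site, here is a minimal sketch of that workaround (my own helper, not a library method), assuming Scala 2.12-style .par, or Scala 2.13 with the scala-parallel-collections module and scala.collection.parallel.CollectionConverters._ in scope:

// Fill-like helper built on a cheap parallel Range: the Range itself costs
// almost nothing to build, and the map over it (which does the real work)
// runs in parallel.
def parFill[A](n: Int)(elem: => A) =
  (0 until n).par.map(_ => elem)

// Usage with the benchmark above: parFill(seqSize)(randomIsPrime)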

Related

Functional way of interrupting lazy iteration depending on timeout and comparison between previous and next (while, LazyList vs Stream)

Background
I have the following scenario: I want to execute a method of a class from an external library, repeatedly, until a certain timeout condition and a result condition (compared to the previous result) are met. Furthermore, I want to collect the return values, even for the "failed" run (the run with the "failing" result condition that should interrupt further execution).
Thus far I have accomplished this by initializing an empty var result: Result and a var stop: Boolean, and using a while loop that runs while the conditions hold and modifies the outer state. I would like to get rid of this and use a functional approach.
Some context: each run is expected to take from 0 to 60 minutes, and the total iteration time is capped at 60 minutes. Theoretically there's no bound on how many times it executes in this period, but in practice it's generally 2-60 times.
The problem is that the runs take a long time, so I need to stop the execution. My idea is to use some kind of lazy Iterator or Stream coupled with scanLeft and Option.
Code
Boilerplate
This code isn't particularly relevant, but it is used in the approach samples below and provides identical, somewhat random pseudo-runtime results.
import scala.collection.mutable.ListBuffer
import scala.util.Random
val r = Random
r.setSeed(1)
val sleepingTimes: Seq[Int] = (1 to 601)
.map(x => Math.pow(2, x).toInt * r.nextInt(100))
.toList
.filter(_ > 0)
.sorted
val randomRes = r.shuffle((0 to 600).map(x => r.nextInt(10)).toList)
case class Result(val a: Int, val slept: Int)
class Lib() {
def run(i: Int) = {
println(s"running ${i}")
Thread.sleep(sleepingTimes(i))
Result(randomRes(i), sleepingTimes(i))
}
}
case class Baz(i: Int, result: Result)
val lib = new Lib()
val timeout = 10 * 1000
While approach
val iteratorStart = System.currentTimeMillis()
val iterator = for {
i <- (0 to 600).iterator
if System.currentTimeMillis() < iteratorStart + timeout
f = Baz(i, lib.run(i))
} yield f
val iteratorBuffer = ListBuffer[Baz]()
if (iterator.hasNext) iteratorBuffer.append(iterator.next())
var run = true
while (run && iterator.hasNext) {
val next = iterator.next()
run = iteratorBuffer.last.result.a < next.result.a
iteratorBuffer.append(next)
}
Stream approach (Scala 2.12)
Full example
val streamStart = System.currentTimeMillis()
val stream = for {
i <- (0 to 600).toStream
if System.currentTimeMillis() < streamStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = stream.headOption
val tail = if (stream.nonEmpty) stream.tail else stream
val streamVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
LazyList approach (Scala 2.13)
Full example
val lazyListStart = System.currentTimeMillis()
val lazyList = for {
i <- (0 to 600).to(LazyList)
if System.currentTimeMillis() < lazyListStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = lazyList.headOption
val tail = if (lazyList.nonEmpty) lazyList.tail else lazyList
val lazyListVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
Result
Both approaches appear to yield the correct end result:
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)))
and they interrupt execution as desired.
Edit: The desired outcome is to not execute the next iteration but still return the result of the iteration that caused the interruption. Thus the desired result is
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)), Baz(2,Result(2,256)))
and lib.run(i) should only run 3 times.
This is achieved by the while approach as well as the LazyList approach, but not by the Stream approach, which executes lib.run 4 times (bad!).
Question
Is there another stateless approach, which is hopefully more elegant?
Edit
I realized my examples were faulty, not returning the "failing" result as they should, and kept executing beyond the stop condition. I rewrote the code and examples, but I believe the spirit of the question is the same.
I would use something higher-level, like fs2 (or any other high-level streaming library, such as Monix Observables, Akka Streams, or ZIO ZStreams):
def runUntilOrTimeout[F[_]: Concurrent: Timer, A](work: F[A], timeout: FiniteDuration)
(stop: (A, A) => Boolean): Stream[F, A] = {
val interrupt =
Stream.sleep_(timeout)
val run =
Stream
.repeatEval(work)
.zipWithPrevious
.takeThrough {
case (Some(p), c) if stop(p, c) => false
case _ => true
} map {
case (_, c) => c
}
run mergeHaltBoth interrupt
}
You can see it working here.
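For completeness, here is a hypothetical way to wire the question's lib into that stream, assuming cats-effect 2.x IO and fs2 2.x; the Ref-based counter that feeds successive indices into work is my own addition, not part of the answer above:

import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
import cats.effect.{ContextShift, IO, Timer}
import cats.effect.concurrent.Ref

implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
implicit val timer: Timer[IO] = IO.timer(ExecutionContext.global)

// Each evaluation of `work` takes the next index and runs lib.run with it;
// the stop condition mirrors the question's "previous result greater than next".
val program: IO[List[Result]] = for {
  counter <- Ref.of[IO, Int](0)
  work     = counter.modify(i => (i + 1, i)).flatMap(i => IO(lib.run(i)))
  results <- runUntilOrTimeout(work, 10.seconds)((prev, cur) => prev.a > cur.a)
               .compile
               .toList
} yield results

// program.unsafeRunSync() then contains the collected Results, including the
// one that triggered the stop.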

Scala how to decrease execution time

I have one method which generates UUIDs, shown below:
def generate(number : Int): List[String] = {
List.fill(number)(Generators.randomBasedGenerator().generate().toString.replaceAll("-",""))
}
and I call it as below:
for(i <-0 to 100) {
val a = generate(1000000)
println(a)
}
But running the above for loop takes almost 8-9 minutes; is there any way to reduce the execution time?
Note: I added the for loop here for illustration, but in the real situation the generate method will be called thousands of times from other requests at the same time.
The problem is the List. Filling a List with 1,000,000 generated and processed elements is going to take time (and memory) because every one of those elements has to be materialized.
You can generate an infinite number of processed UUID strings instantly if you don't have to materialize them until they are actually needed.
def genUUID :Stream[String] = Stream.continually {
Generators.randomBasedGenerator().generate().toString.filterNot(_ == '-')
}
val next5 = genUUID.take(5) //only the 1st (head) is materialized
next5.length //now all 5 are materialized
You can use Stream or Iterator for the infinite collection, whichever you find most conducive (or least annoying) to your work flow.
Basically, you are not using the fastest implementation. You should use the one where you pass a Random to the constructor: Generators.randomBasedGenerator(new Random(System.currentTimeMillis())). I did the following:
Used Array instead of List (Array is faster)
Removed the string replacement, so we measure the pure performance of generation
Dependency: "com.fasterxml.uuid" % "java-uuid-generator" % "3.1.5"
Result:
Generators.randomBasedGenerator(). Per iteration: 1579.6 ms
Generators.randomBasedGenerator() with passing Random Per iteration: 59.2 ms
Code:
import java.util.{Random, UUID}
import com.fasterxml.uuid.impl.RandomBasedGenerator
import com.fasterxml.uuid.{Generators, NoArgGenerator}
import org.scalatest.{FunSuiteLike, Matchers}
import scala.concurrent.duration.Deadline
class GeneratorTest extends FunSuiteLike
with Matchers {
val nTimes = 10
// Let's use Array instead of List - Array is faster!
// and use pure UUID generators
def generate(uuidGen: NoArgGenerator, number: Int): Seq[UUID] = {
Array.fill(number)(uuidGen.generate())
}
test("Generators.randomBasedGenerator() without passed Random (secure one)") {
// Slow generator
val uuidGen = Generators.randomBasedGenerator()
// Warm up JVM
benchGeneration(uuidGen, 3)
val startTime = Deadline.now
benchGeneration(uuidGen, nTimes)
val endTime = Deadline.now
val perItermTimeMs = (endTime - startTime).toMillis / nTimes.toDouble
println(s"Generators.randomBasedGenerator(). Per iteration: $perItermTimeMs ms")
}
test("Generators.randomBasedGenerator() with passing Random (not secure)") {
// Fast generator
val uuidGen = Generators.randomBasedGenerator(new Random(System.currentTimeMillis()))
// Warm up JVM
benchGeneration(uuidGen, 3)
val startTime = Deadline.now
benchGeneration(uuidGen, nTimes)
val endTime = Deadline.now
val perItermTimeMs = (endTime - startTime).toMillis / nTimes.toDouble
println(s"Generators.randomBasedGenerator() with passing Random Per iteration: $perItermTimeMs ms")
}
private def benchGeneration(uuidGen: RandomBasedGenerator, nTimes: Int) = {
var r: Long = 0
for (i <- 1 to nTimes) {
val a = generate(uuidGen, 1000000)
r += a.length
}
println(r)
}
}
You could use Scala's parallel collections to split the load over multiple cores/threads; a combined sketch follows the snippet below.
You could also avoid creating a new generator every time:
class Generator {
val gen = Generators.randomBasedGenerator()
def generate(number : Int): List[String] = {
List.fill(number)(gen.generate().toString.replaceAll("-",""))
}
}
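Combining both suggestions above into one rough sketch (assuming a Scala version where .par is available on ranges, and relying on java-uuid-generator's generators being safe to share across threads):

import com.fasterxml.uuid.Generators

class ParallelGenerator {
  // One shared generator instead of a new one per call.
  private val gen = Generators.randomBasedGenerator()

  // The parallel range spreads the generation and string processing over the
  // available cores; .seq converts back to an ordinary Seq at the end.
  def generate(number: Int): Seq[String] =
    (0 until number).par
      .map(_ => gen.generate().toString.replaceAll("-", ""))
      .seq
}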

In Apache Spark, how to make an RDD/DataFrame operation lazy?

Assuming that I would like to write a function foo that transforms a DataFrame:
object Foo {
def foo(source: DataFrame): DataFrame = {
...complex iterative algorithm with a stopping condition...
}
}
Since the implementation of foo has many "Actions" (collect, reduce, etc.), calling foo immediately triggers the expensive execution.
This is not a big problem; however, since foo only converts one DataFrame into another, by convention it would be better to allow lazy execution: the implementation of foo should run only if the resulting DataFrame or its derivative(s) are actually used on the driver (through another "Action").
So far, the only way I can see to reliably achieve this is to write the whole implementation into a SparkPlan and superimpose it onto the DataFrame's SparkExecution; this is very error-prone and involves lots of boilerplate code. What is the recommended way to do this?
It is not exactly clear to me what you are trying to achieve, but Scala itself provides at least a few tools which you may find useful:
lazy vals:
val rdd = sc.range(0, 10000)
lazy val count = rdd.count // Nothing is executed here
// count: Long = <lazy>
count // count is evaluated only when it is actually used
// Long = 10000
call-by-name (denoted by => in the function definition):
def foo(first: => Long, second: => Long, takeFirst: Boolean): Long =
if (takeFirst) first else second
val rdd1 = sc.range(0, 10000)
val rdd2 = sc.range(0, 10000)
foo(
{ println("first"); rdd1.count },
{ println("second"); rdd2.count },
true // Only first will be evaluated
)
// first
// Long = 10000
Note: in practice you should create a local lazy binding to make sure that the arguments are not evaluated on every access, as in the sketch below.
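For illustration, a small sketch of that note applied to the call-by-name foo above: binding the by-name parameters to local lazy vals means each underlying Spark job runs at most once, no matter how often the chosen value is used inside the function.

def foo(first: => Long, second: => Long, takeFirst: Boolean): Long = {
  lazy val f = first  // the count job behind `first` runs at most once
  lazy val s = second
  if (takeFirst) f + f else s + s  // reusing f (or s) does not re-trigger the job
}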
infinite lazy collections like Stream
import org.apache.spark.mllib.random.RandomRDDs._
val initial = normalRDD(sc, 1000000L, 10)
// Infinite stream of RDDs and actions and nothing blows :)
val stream: Stream[RDD[Double]] = Stream(initial).append(
stream.map {
case rdd if !rdd.isEmpty =>
val mu = rdd.mean
rdd.filter(_ > mu)
case _ => sc.emptyRDD[Double]
}
)
Some subset of these should be more than enough to implement complex lazy computations.
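As a concrete (hypothetical) example, the lazy-val idea can be applied at the call site of the questioner's foo; fooLazily and its thunk return type are my own sketch, not an existing Spark API:

import org.apache.spark.sql.DataFrame

object Foo {
  def foo(source: DataFrame): DataFrame = ???  // the eager implementation from the question

  // Returns a thunk: foo's actions run on the first invocation only, and the
  // resulting DataFrame is cached by the lazy val afterwards.
  def fooLazily(source: DataFrame): () => DataFrame = {
    lazy val result = foo(source)
    () => result
  }
}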

Scala TypeTags and performance

There are some answers around for equivalent questions about Java, but is Scala reflection (2.11, TypeTags) really slow? There's a long narrative write-up about it at http://docs.scala-lang.org/overviews/reflection/overview.html, but the answer to this question is hard to extract from it.
I see a lot of advice floating around about avoiding reflection, some of it perhaps predating the improvements in 2.11, but if this works well it looks like it could solve the debilitating aspects of the JVM's type erasure for Scala code.
Thanks!
Let's measure it.
I've created a simple class C that has one method. All this method does is sleep for 10 ms.
Let's invoke this method:
via reflection
directly
and see which is faster and by how much.
I've created three tests.
Test 1. Invoke via reflection. Execution time includes all the work necessary to set up reflection:
create the runtimeMirror, reflect the class, obtain the declaration of the method, and, as the last step, execute the method.
Test 2. Do not take this preparation stage into account, as it can be reused.
We measure only the time of invoking the method via reflection.
Test 3. Invoke the method directly.
Results:
Reflection from start: job done in 2561ms got 101 (≈1.5 s total spent on setup)
Invoke method reflection: job done in 1093ms got 101 (< 1 ms setup per execution)
No reflection: job done in 1087ms got 101 (< 1 ms setup per execution)
Conclusion:
The setup phase increases execution time dramatically, but there is no need to perform the setup on each execution (like class initialization, it can be done once). So if you use reflection in the right way (with a separate init stage), it shows reasonable performance and can be used in production.
Source code:
import org.scalatest.FunSpec

class C {
def x = {
Thread.sleep(10)
1
}
}
class XYZTest extends FunSpec {
def withTime[T](procName: String, f: => T): T = {
val start = System.currentTimeMillis()
val r = f
val end = System.currentTimeMillis()
print(s"$procName job done in ${end-start}ms")
r
}
describe("SomeTest") {
it("rebuild each time") {
val s = withTime("Reflection from start : ", (0 to 100). map {x =>
val ru = scala.reflect.runtime.universe
val m = ru.runtimeMirror(getClass.getClassLoader)
val im = m.reflect(new C)
val methodX = ru.typeOf[C].declaration(ru.TermName("x")).asMethod
val mm = im.reflectMethod(methodX)
mm().asInstanceOf[Int]
}).sum
println(s" got $s")
}
it("invoke each time") {
val ru = scala.reflect.runtime.universe
val m = ru.runtimeMirror(getClass.getClassLoader)
val im = m.reflect(new C)
val s = withTime("Invoke method reflection: ", (0 to 100). map {x =>
val methodX = ru.typeOf[C].declaration(ru.TermName("x")).asMethod
val mm = im.reflectMethod(methodX)
mm().asInstanceOf[Int]
}).sum
println(s" got $s")
}
it("invoke directly") {
val c = new C()
val s = withTime("No reflection: ", (0 to 100). map {x =>
c.x
}).sum
println(s" got $s")
}
}
}
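To make the "separate init stage" point concrete, here is a small sketch of my own (same class C, Scala 2.11+ reflection API with decl/TermName) that performs the expensive lookups once and keeps only the cheap mirror call on the hot path:

import scala.reflect.runtime.{universe => ru}

// All of the expensive reflection setup happens once, in the constructor;
// invokeX afterwards only pays for the reflective call itself.
class CachedInvoker(target: C) {
  private val mirror  = ru.runtimeMirror(getClass.getClassLoader)
  private val im      = mirror.reflect(target)
  private val methodX = ru.typeOf[C].decl(ru.TermName("x")).asMethod
  private val mm      = im.reflectMethod(methodX)

  def invokeX(): Int = mm().asInstanceOf[Int]
}

// val invoker = new CachedInvoker(new C)
// invoker.invokeX()  // repeated calls pay only the invocation cost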

Are recursive computations with Apache Spark RDD possible?

I'm developing a chess engine using Scala and Apache Spark (and I need to stress that my sanity is not the topic of this question). My problem is that the Negamax algorithm is recursive in its essence, and when I try the naive approach:
class NegaMaxSparc(@transient val sc: SparkContext) extends Serializable {
val movesOrdering = new Ordering[Tuple2[Move, Double]]() {
override def compare(x: (Move, Double), y: (Move, Double)): Int =
Ordering[Double].compare(x._2, y._2)
}
def negaMaxSparkHelper(game: Game, color: PieceColor, depth: Int, previousMovesPar: RDD[Move]): (Move, Double) = {
val board = game.board
if (depth == 0) {
(null, NegaMax.evaluateDefault(game, color))
} else {
val moves = board.possibleMovesForColor(color)
val movesPar = previousMovesPar.context.parallelize(moves)
val moveMappingFunc = (m: Move) => { negaMaxSparkHelper(new Game(board.boardByMakingMove(m), color.oppositeColor, null), color.oppositeColor, depth - 1, movesPar) }
val movesWithScorePar = movesPar.map(moveMappingFunc)
val move = movesWithScorePar.min()(movesOrdering)
(move._1, -move._2)
}
}
def negaMaxSpark(game: Game, color: PieceColor, depth: Int): (Move, Double) = {
if (depth == 0) {
(null, NegaMax.evaluateDefault(game, color))
} else {
val movesPar = sc.parallelize(new Array[Move](0))
negaMaxSparkHelper(game, color, depth, movesPar)
}
}
}
class NegaMaxSparkBot(val maxDepth: Int, sc: SparkContext) extends Bot {
def nextMove(game: Game): Move = {
val nms = new NegaMaxSparc(sc)
nms.negaMaxSpark(game, game.colorToMove, maxDepth)._1
}
}
I get:
org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
The question is: can this algorithm be implemented recursively using Spark? If not, then what is the proper Spark-way to solve that problem?
Only the driver can launch computations on RDDs. The reason is that even though RDDs "feel" like regular collections of data, behind the scenes they are still distributed collections, so launching operations on them requires coordinating the execution of tasks on all remote slaves, which Spark hides from us most of the time.
So recursing from the slaves, i.e. launching new distributed tasks dynamically from the slaves, is not possible: only the driver can take care of such coordination.
Here's a possible alternative based on a simplification of your problem (if I understand it correctly). The idea is to successively build instances of Moves, each one representing the full sequence of Move from the initial state.
Each instance of Moves is able to transform itself into a set of Moves, each one corresponding to the same sequence of Move plus one possible next Move.
From there the driver just has to successively flatMap the Moves for as deep as we want, and the resulting RDD[Moves] will execute all operations in parallel for us.
The downside of the approach is that all depth levels are kept synchronized, i.e. we have to compute all the moves at level n (the RDD[Moves] for level n) before going to the next one.
The code below is untested and probably has flaws, but hopefully it gives an idea of how to approach the problem.
/* one modification to the board */
case class Move(from: String, to: String)

case class PieceColor(color: String)

/* state of the game */
case class Board() {
  // TODO
  def possibleMovesForColor(color: PieceColor): Seq[Move] =
    Move("here", "there") :: Move("there", "over there") :: Move("there", "here") :: Nil

  // TODO: compute a new instance of board here, based on current + this move
  def update(move: Move): Board = Board()
}

/** Solution, i.e. a sequence of moves */
case class Moves(moves: Seq[Move], game: Board, color: PieceColor) {
  lazy val score = NegaMax.evaluateDefault(game, color)

  /** @return all valid next Moves */
  def nextPossibleMoves: Seq[Moves] =
    game.possibleMovesForColor(color).map { nextMove =>
      copy(moves = nextMove +: moves, game = game.update(nextMove))
    }
}

/** Driver code: negaMax looks for the best next move from a given game state */
def negaMax(sc: SparkContext, game: Board, color: PieceColor, maxDepth: Int): Moves = {
  val initialSolution = Moves(Seq.empty[Move], game, color)
  val allPlays: RDD[Moves] =
    (1 to maxDepth).foldLeft(sc.parallelize(Seq(initialSolution))) { (rdd, _) =>
      rdd.flatMap(_.nextPossibleMoves)
    }
  allPlays.reduce { case (m1, m2) => if (m1.score < m2.score) m1 else m2 }
}
This is a limitation that makes sense in terms of the implementation, but it can be a pain to work with.
You can try pulling the recursion out to the top level, just in the "driver" code that creates and operates on the RDDs. Something like:
def step(rdd: RDD[Move], limit: Int): RDD[Move] =
  if (0 == limit) rdd
  else {
    val newRdd = rdd.flatMap(...)
    step(newRdd, limit - 1)
  }
Alternatively, it's always possible to translate recursion into iteration by managing the "stack" explicitly by hand (although it may result in more cumbersome code), as in the sketch below.
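For example, here is a hypothetical iterative driver loop (reusing the Moves type sketched in the previous answer) that expands one depth level per pass instead of recursing:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// A plain driver-side loop instead of recursion: each pass expands the frontier
// of Moves by one ply via flatMap; only the driver ever touches the RDD API.
def expandIteratively(sc: SparkContext, initial: Moves, maxDepth: Int): RDD[Moves] = {
  var frontier: RDD[Moves] = sc.parallelize(Seq(initial))
  var depth = 0
  while (depth < maxDepth) {
    frontier = frontier.flatMap(_.nextPossibleMoves)
    depth += 1
  }
  frontier
}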