Is there any way to replace a nested for loop with higher-order methods in Scala?

I have a MutableList and want to take the sum of its rows and replace rows with other values based on some criteria. The code below works fine for me, but I want to ask: is there any way to get rid of the nested for loops, since for loops slow down performance? I want to use Scala higher-order methods instead of the nested for loop. I tried the foldLeft() higher-order method to replace a single for loop, but I cannot work out how to replace the nested one.
def func(nVect : Int , nDim : Int) : Unit = {
  var Vector = MutableList.fill(nVect,nDimn)(math.random)
  var V1Res =0.0
  var V2Res =0.0
  var V3Res =0.0
  for(i<- 0 to nVect -1) {
    for (j <- i +1 to nVect -1) {
      var resultant = Vector(i).zip(Vector(j)).map{case (x,y) => x + y}
      V1Res = choice(Vector(i))
      V2Res = choice(Vector(j))
      V3Res = choice(resultant)
      if(V3Res > V1Res){
        Vector(i) = res
      }
      if(V3Res > V2Res){
        Vector(j) = res
      }
    }
  }
}

There are no "for loops" in this code; the for statements are already converted to foreach calls by the compiler, so it is already using higher-order methods. These foreach calls could be written out explicitly, but it would make no difference to the performance.
Making the code compile and then cleaning it up gives this:
def func(nVect: Int, nDim: Int): Unit = {
  val vector = Array.fill(nVect, nDim)(math.random)
  for {
    i <- 0 until nVect
    j <- i + 1 until nVect
  } {
    val res = vector(i).zip(vector(j)).map { case (x, y) => x + y }
    val v1Res = choice(vector(i))
    val v2Res = choice(vector(j))
    val v3Res = choice(res)
    if (v3Res > v1Res) {
      vector(i) = res
    }
    if (v3Res > v2Res) {
      vector(j) = res
    }
  }
}
Note that using a single for comprehension does not make any difference to the result; it just looks better!
At this point it gets difficult to make further improvements. The only parallelism possible is with the inner map call, but vectorising this is almost certainly a better option. If choice is expensive then the results could be cached, but this cache needs to be updated when vector is updated.
If the choice could be done in a second pass after all the cross-sums have been calculated then it would be much more parallelisable, but clearly that would also change the results.
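A minimal sketch of that caching idea (assuming choice has the signature Array[Double] => Double, which the question never shows):

val cachedChoice = Array.tabulate(nVect)(i => choice(vector(i))) // choice per row, computed once
for {
  i <- 0 until nVect
  j <- i + 1 until nVect
} {
  val res = vector(i).zip(vector(j)).map { case (x, y) => x + y }
  val v3Res = choice(res)
  if (v3Res > cachedChoice(i)) {
    vector(i) = res
    cachedChoice(i) = v3Res // keep the cache in sync with the replaced row
  }
  if (v3Res > cachedChoice(j)) {
    vector(j) = res
    cachedChoice(j) = v3Res
  }
}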

Related

How to speed up this Scala solution for an easy Leetcode problem?

I solved the easy LeetCode problem Ransom Note in Scala like this:
def canConstruct(ransomNote: String, magazine: String): Boolean = {
  val magazineChars = magazine.toSeq.groupBy(identity).mapValues(_.size)
  val ransomChars = ransomNote.toSeq.groupBy(identity).mapValues(_.size)
  ransomChars.forall { case (c, num) => magazineChars.getOrElse(c, 0) >= num }
}
This solution is OK but slower than other accepted Scala solutions. Now I wonder how to speed it up. How would you suggest optimizing this solution?
For performance, you should use low-level data structures (primitive types instead of object types, arrays of primitives instead of List or Map, etc.) and low-level syntax (while loops instead of foreach, etc.).
Here is my solution, which beats 90%~100% of submissions (it varies from run to run). You could speed it up further by replacing the foreach and forall with while loops, but that gets tedious. A slightly optimized version of the above solution:
def canConstruct(ransomNote: String, magazine: String): Boolean = {
  if (magazine.length < ransomNote.length) {
    false // if the magazine has fewer letters than the ransom note, then we definitely can't make the note
  } else {
    var i = 0
    val counts = Array.ofDim[Int](26)
    while (i < magazine.length) {
      counts(magazine(i) - 'a') += 1
      if (i < ransomNote.length) counts(ransomNote(i) - 'a') -= 1 // avoid the need for another loop for the ransom note letters
      i += 1
    }
    var c = 0
    while (c < counts.length) {
      if (counts(c) < 0) {
        return false
      }
      c += 1
    }
    true
  }
}
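A quick sanity check against the examples from the problem statement (assuming lowercase input, as the problem guarantees):

canConstruct("a", "b")    // false
canConstruct("aa", "ab")  // false
canConstruct("aa", "aab") // true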

TapeEquilibrium ScalaCheck

I have been trying to write a ScalaCheck property to verify the Codility TapeEquilibrium problem. For those who do not know the problem, see the following link: https://app.codility.com/programmers/lessons/3-time_complexity/tape_equilibrium/.
I have written the following, still incomplete, code.
test("Lesson 3 property"){
val left = Gen.choose(-1000, 1000).sample.get
val right = Gen.choose(-1000, 1000).sample.get
val expectedSum = Math.abs(left - right)
val leftArray = Gen.listOfN(???, left) retryUntil (_.sum == left)
val rightArray = Gen.listOfN(???, right) retryUntil (_.sum == right)
val property = forAll(leftArray, rightArray){ (r: List[Int], l: List[Int]) =>
val array = (r ++ l).toArray
Lesson3.solution3(array) == expectedSum
}
property.check()
}
The idea is as follows. I choose two random numbers (values left and right) and calculate their absolute difference. Then I generate two arrays: each array contains random numbers whose sum is either "left" or "right". By concatenating these arrays, I should be able to verify the property.
My issue is then generating the leftArray and rightArray. This is itself a complex problem and I would have to code a solution for it, so writing this property seems over-complicated.
Is there any way to code this? Is coding this property overkill?
Best.
My issue is then generating the leftArray and rightArray
One way to generate these arrays (or lists) is to provide a generator of non-empty lists whose elements sum to a given number; in other words, something defined by a method like this:
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll
def listOfSumGen(expectedSum: Int): Gen[List[Int]] = ???
That verifies the property:
forAll(Gen.choose(-1000, 1000)) { sum: Int =>
  forAll(listOfSumGen(sum)) { listOfSum: List[Int] =>
    (listOfSum.sum == sum) && listOfSum.nonEmpty
  }
}
Building such a list only constrains one element of the list, so here is one way to generate it:
Generate a random list.
Compute the constrained element as expectedSum minus the sum of that list.
Insert the constrained element at a random index of the list (any permutation of the list would obviously work).
So we get:
def listOfSumGen(expectedSum: Int): Gen[List[Int]] =
  for {
    list <- Gen.listOf(Gen.choose(-1000, 1000))
    constrainedElement = expectedSum - list.sum
    index <- Gen.oneOf(0 to list.length)
  } yield list.patch(index, List(constrainedElement), 0)
Now, with the above generator, leftArray and rightArray can be defined as follows:
val leftArray = listOfSumGen(left)
val rightArray = listOfSumGen(right)
However, I think the overall approach of the described property is incorrect: it builds an array where one specific partition equals the expectedSum, but this doesn't rule out another partition of the array producing a smaller difference.
Here is a counter-example run-through:
val left = Gen.choose(-1000, 1000).sample.get  // --> 4
val right = Gen.choose(-1000, 1000).sample.get // --> 9
val expectedSum = Math.abs(left - right)       // --> |4 - 9| = 5
val leftArray = listOfSumGen(left)   // Let's assume one of its samples is List(3,1) (whose sum equals 4)
val rightArray = listOfSumGen(right) // Let's assume one of its samples is List(2,4,3) (whose sum equals 9)
val property = forAll(leftArray, rightArray) { (l: List[Int], r: List[Int]) =>
  // l = List(3,1)
  // r = List(2,4,3)
  val array = (l ++ r).toArray // --> Array(3,1,2,4,3), which is the array from the example given in the exercise
  Lesson3.solution3(array) == expectedSum
  // According to the example, Lesson3.solution3(array) equals 1, which is different from 5
}
Here is an example of a correct property that essentially applies the definition:
def tapeDifference(index: Int, array: Array[Int]): Int = {
  val (left, right) = array.splitAt(index)
  Math.abs(left.sum - right.sum)
}
forAll(Gen.nonEmptyListOf(Gen.choose(-1000, 1000))) { list: List[Int] =>
  val array = list.toArray
  forAll(Gen.oneOf(array.indices)) { index =>
    Lesson3.solution3(array) <= tapeDifference(index, array)
  }
}
This property definition might collide with the way the actual solution is implemented (which is one of the potential pitfalls of ScalaCheck); however, applying the definition directly like this would be a slow and inefficient solution, so it is better seen as a way to check an optimized, fast implementation against a slow but correct one (see this presentation).
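As a minimal sketch (not from the original answer) of such a slow but correct reference, reusing tapeDifference from above and assuming Lesson3.solution3 returns the minimal difference as an Int:

// Brute-force reference: try every split point and keep the smallest difference.
def bruteForceSolution(array: Array[Int]): Int =
  (1 until array.length).map(i => tapeDifference(i, array)).min

forAll(Gen.nonEmptyListOf(Gen.choose(-1000, 1000)).suchThat(_.length >= 2)) { list: List[Int] =>
  val array = list.toArray
  Lesson3.solution3(array) == bruteForceSolution(array)
}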
Try this with C#:
using System;
using System.Collections.Generic;
using System.Linq;

private static int TapeEquilibrium(int[] A)
{
    var sumA = A.Sum();
    var size = A.Length;
    var take = 0;
    var res = new List<int>();
    for (int i = 1; i < size; i++)
    {
        take = take + A[i - 1];
        var resp = Math.Abs((sumA - take) - take);
        res.Add(resp);
        if (resp == 0) return resp;
    }
    return res.Min();
}

Understanding performance of a tailrec annotated recursive method in scala

Consider the following method, which has been verified to conform to proper tail recursion:
import scala.annotation.tailrec

@tailrec
def getBoundaries(grps: Seq[(BigDecimal, Int)], groupSize: Int, curSum: Int = 0, curOffs: Seq[BigDecimal] = Seq.empty[BigDecimal]): Seq[BigDecimal] = {
  if (grps.isEmpty) curOffs
  else {
    val (id, cnt) = grps.head
    val newSum = curSum + cnt.toInt
    if (newSum % 50 == 0) { println(s"id=$id newsum=$newSum") }
    if (newSum >= groupSize) {
      getBoundaries(grps.tail, groupSize, 0, curOffs :+ id) // r1
    } else {
      getBoundaries(grps.tail, groupSize, newSum, curOffs) // r2
    }
  }
}
This is running very slowly, about 75 loops per second. When I pause and inspect the stack trace (a nice feature of IntelliJ), almost every time the line currently being invoked is the second tail-recursive call, r2. That fact makes me suspicious of the claim that "Scala unwraps the recursive calls into a while loop". If the unwrapping were occurring, why are we seeing so much time in the invocations themselves?
Beyond having a properly structured tail-recursive method, are there other considerations to make a recursive routine perform close to direct iteration?
The performance will depend on the underlying type of the Seq.
If it is a List then the problem is the append (:+), which gets very slow with long lists because it has to scan the whole list to find the end.
One solution is to prepend to the list (+:) each time and then reverse at the end. This can give very significant performance improvements, because adding to the start of a list is very quick.
Other Seq types will have different performance characteristics, but you can convert to a List before the recursive call so that you know how it is going to perform.
Here is sample code:
def getBoundaries(grps: Seq[(BigDecimal, Int)], groupSize: Int): Seq[BigDecimal] = {
  @tailrec
  def loop(grps: List[(BigDecimal, Int)], curSum: Int, curOffs: List[BigDecimal]): List[BigDecimal] =
    if (grps.isEmpty) curOffs
    else {
      val (id, cnt) = grps.head
      val newSum = curSum + cnt.toInt
      if (newSum >= groupSize) {
        loop(grps.tail, 0, id +: curOffs) // r1
      } else {
        loop(grps.tail, newSum, curOffs) // r2
      }
    }

  loop(grps.toList, 0, Nil).reverse
}
This version gives a 10x performance improvement over the original code, using the test data provided by the questioner in their own answer to the question.
The issue is not in the recursion but in the array manipulation. With the following test case it runs at about 200K recursions per second:
type Fgroups = Seq[(BigDecimal, Int)]
test("testGetBoundaries") {
  val N = 200000
  val grps: Fgroups = (N to 1 by -1).flatMap { x => Array.tabulate(x % 20) { x2 => (BigDecimal(x2 * 1e9), 1) } }
  val sgrps = grps.sortWith { case (a, b) =>
    a._1.longValue.compare(b._1.longValue) < 0
  }
  val bb = getBoundaries(sgrps, 100)
  println(bb.take(math.min(50, bb.length)).mkString(","))
  assert(bb.length == 1900)
}
My production data sample has a similar number of entries (an Array with 233K rows) but runs three orders of magnitude more slowly. I am looking into the tail operation and other culprits now.
Update: The following reference from Alvin Alexander indicates that the tail operation should be very fast for immutable collections, but deadly slow for long mutable ones, including Arrays!
https://alvinalexander.com/scala/understanding-performance-scala-collections-classes-methods-cookbook
Wow! I had no idea about the performance implications of using mutable collections in Scala!
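As a rough illustration of why (not from the original post): tail on a List just returns the next cons cell, while tail on an Array-backed sequence copies the remaining elements, so taking repeated tails over n elements costs O(n) in total for a List but O(n^2) for an Array.

val xs: List[Int]  = List.fill(1000000)(1)
val ys: Array[Int] = Array.fill(1000000)(1)

xs.tail // O(1): shares structure with xs, no copying
ys.tail // O(n): allocates a new 999,999-element array and copies into it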
Update: By adding code to convert the Array to an (immutable) Seq, I see the three-orders-of-magnitude performance improvement on the production data sample:
val grps = if (grpsIn.isInstanceOf[mutable.WrappedArray[_]] || grpsIn.isInstanceOf[Array[_]]) {
  Seq(grpsIn: _*)
} else grpsIn
The (now fast ~200K/sec) final code is:
import scala.annotation.tailrec
import scala.collection.mutable

type Fgroups = Seq[(BigDecimal, Int)]
val cntr = new java.util.concurrent.atomic.AtomicInteger

@tailrec
def getBoundaries(grpsIn: Fgroups, groupSize: Int, curSum: Int = 0, curOffs: Seq[BigDecimal] = Seq.empty[BigDecimal]): Seq[BigDecimal] = {
  val grps = if (grpsIn.isInstanceOf[mutable.WrappedArray[_]] || grpsIn.isInstanceOf[Array[_]]) {
    Seq(grpsIn: _*)
  } else grpsIn
  if (grps.isEmpty) curOffs
  else {
    val (id, cnt) = grps.head
    val newSum = curSum + cnt.toInt
    if (cntr.getAndIncrement % 500 == 0) { println(s"[${cntr.get}] id=$id newsum=$newSum") }
    if (newSum >= groupSize) {
      getBoundaries(grps.tail, groupSize, 0, curOffs :+ id)
    } else {
      getBoundaries(grps.tail, groupSize, newSum, curOffs)
    }
  }
}

Replacing while loop with idiomatic functional style

The code below demonstrates two ways of summing random values, where the termination condition is the total reaching 5:
object ob extends App {
  def withoutRecursion = {
    var currentTotal = 0.0
    while (5 > currentTotal) {
      currentTotal = currentTotal + scala.util.Random.nextFloat
      println(currentTotal)
    }
  }

  def withRecursion = {
    var currentTotal = 0.0
    def whileLoop(cond: => Boolean)(block: => Unit): Unit =
      if (cond) {
        block
        whileLoop(cond)(block)
      }
    whileLoop(5 > currentTotal) {
      currentTotal = currentTotal + scala.util.Random.nextFloat
      println(currentTotal)
    }
  }
}
Is the withRecursion method the idiomatic way of replacing a while loop with a functional programming style in this case?
Not really. In your examples you are still using a var to collect the intermediate results. That's not very FP.
One option (of many) is to create an infinite but lazy Stream that repeats the necessary calculation. Then if you want all the intermediate values you takeWhile(), and if you want just the end result you dropWhile().
Example:
Stream.iterate(0F)(_ + util.Random.nextFloat).takeWhile(_ < 5)
This creates a finite Stream with all the values from 0.0 to the largest sum less than 5.
On the other hand ...
Stream.iterate(0F)(_ + util.Random.nextFloat).dropWhile(_ < 5).head
... this will give you the first result greater than 5.
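A side note beyond the original answer: on Scala 2.13 and later, Stream is deprecated in favour of LazyList, and the same approach carries over unchanged:

LazyList.iterate(0F)(_ + util.Random.nextFloat).takeWhile(_ < 5)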

Spark job slow on cluster than standalone

I have this piece of code which works fine in standalone mode but runs slowly on a cluster of 4 slaves (8 cores, 30 GB memory each) on AWS.
For a file of 10000 entries:
Standalone: 257 s
AWS, 4 slaves: 369 s
def tabHash(nb: Int, dim: Int) = {
  var tabHash0 = Array(Array(0.0)).tail
  for (ind <- 0 to nb - 1) {
    var vechash1 = Array(0.0).tail
    for (ind <- 0 to dim - 1) {
      val nG = Random.nextGaussian
      vechash1 = vechash1 :+ nG
    }
    tabHash0 = tabHash0 :+ vechash1
  }
  tabHash0
}

def hashmin3(x: Vector, w: Double, b: Double, tabHash1: Array[Array[Double]]) = {
  var tabHash0 = Array(0.0).tail
  val x1 = x.toArray
  for (ind <- 0 to tabHash1.size - 1) {
    var sum = 0.0
    for (ind2 <- 0 to x1.size - 1) {
      sum = sum + (x1(ind2) * tabHash1(ind)(ind2))
    }
    tabHash0 = tabHash0 :+ (sum + b) / w
  }
  tabHash0
}

def pow2(tab1: Array[Double], tab2: Array[Double]) = {
  var sum = 0.0
  for (ind <- 0 to tab1.size - 1) {
    sum = sum - Math.pow(tab1(ind) - tab2(ind), 2)
  }
  sum
}
val w = ww
val b = Random.nextDouble * w
val tabHash2 = tabHash(nbseg, dim)

var rdd_0 = parsedData.map(x => (x.get_id, (x.get_vector, hashmin3(x.get_vector, w, b, tabHash2)))).cache
var rdd_Yet = rdd_0

for (ind <- 1 to maxIterForYstar) {
  var rdd_dist = rdd_Yet.cartesian(rdd_0).flatMap { case (x, y) => Some((x._1, (y._2._1, pow2(x._2._2, y._2._2)))) } //.coalesce(64)
  var rdd_knn = rdd_dist.topByKey(k)(Ordering[(Double)].on(x => x._2))
  var rdd_bary = rdd_knn.map(x => (x._1, Vectors.dense(bary(x._2, k))))
  rdd_Yet = rdd_bary.map(x => (x._1, (x._2, hashmin3(x._2, w, b, tabHash2))))
}
I tried to broadcast some variables
val w = sc.broadcast(ww.toDouble)
val b = sc.broadcast(Random.nextDouble * ww)
val tabHash2 = sc.broadcast(tabHash(nbseg,dim))
Without any effect.
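One thing worth checking here (an assumption on my part, not something stated in the question): broadcast variables only help if the closures dereference them with .value on the executors, for example:

// Sketch: using the broadcasts inside the map closure via .value
val rdd_0 = parsedData.map(x =>
  (x.get_id, (x.get_vector, hashmin3(x.get_vector, w.value, b.value, tabHash2.value)))).cache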
I know the problem is not the bary function, because I tried another version of this code without hashmin3 that works fine with 4 slaves and worse with 8 slaves, which is a topic for another question.
This is problematic code, especially for distributed and large computations. I can't quickly tell what the root of the problem is, but you should rewrite this code anyway.
Array is a poor choice for general-purpose, shareable data. It is mutable and requires contiguous memory allocation (the latter can be a problem even when you have enough memory). Better to use Vector (or sometimes List). Really, avoid arrays here.
With var vechash1 = Array(0.0).tail you create a collection with one element, then call a function just to get an empty collection back. If it's rare, don't worry about the performance, but it's ugly! Prefer var vechash1: Array[Double] = Array(), or var vechash1: Vector[Double] = Vector(), or var vechash1 = Vector.empty[Double].
For def tabHash(nb:Int, dim:Int) =, always declare the return type of a function when it isn't obvious. The power of Scala is its rich type system, and it's very helpful to have compile-time checks of what you actually get as a result rather than what you imagine you get. This matters when dealing with huge data, because compile-time checks save time and money, and such code is also easier to read later: def tabHash(nb:Int, dim:Int): Vector[Vector[Double]] =
def hashmin3(x: Vector, ...): is this a typo? It doesn't compile without a type parameter.
The first function, more compactly:
def tabHash(nb: Int, dim: Int): Vector[Vector[Double]] = {
  (0 to nb - 1).map { _ =>
    (0 to dim - 1).map(_ => Random.nextGaussian()).toVector
  }.toVector
}
The second function computes ((x * M) + scalar_b) / scalar_w. It may be more efficient to use a library that is specifically optimized for matrix work.
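As a sketch only (assuming the Breeze linear-algebra library, which the original code does not use), hashmin3 could be expressed as a matrix-vector product followed by a scalar shift and scale:

import breeze.linalg.{DenseMatrix, DenseVector}

// tabHash1 as an (nb x dim) matrix, x as a dim-length vector
def hashmin3(x: DenseVector[Double], w: Double, b: Double,
             tabHash1: DenseMatrix[Double]): DenseVector[Double] =
  (tabHash1 * x).map(s => (s + b) / w)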
Third (I suppose there is a mistake here with the sign of the computation, if you are computing a squared error):
def pow2(tab1: Vector[Double], tab2: Vector[Double]): Double =
  -tab1.zip(tab2).map { case (t1, t2) => Math.pow(t1 - t2, 2) }.sum // negated sum of squares, matching the original's sign
var rdd_Yet = rdd_0: the cached RDD is overwritten inside the loop, so the caching is useless.
The last loop is hard to analyse; I think it should be simplified.