Need pointers for optimization of Merge Sort implementation in Scala - scala

I have just started learning Scala, and alongside it I am working through some algorithms. Below is an implementation of merge sort in Scala. I know it isn't very "Scala" in nature, and some might even reckon that I have tried to write Java in Scala. I am not very familiar with Scala; I know some basic syntax and google whenever I need something more. So please give me some pointers on what I can do to make this code more functional and in line with Scala conventions and best practices. Please don't just give correct/optimized code; I would like to do it myself. Any suggestions are welcome!
def mergeSort(list: Array[Int]): Array[Int] = {
  val len = list.length
  if (len == 1) list
  else {
    var x, y = new Array[Int](len / 2)
    val z = new Array[Int](len)
    Array.copy(list, 0, x, 0, len / 2)
    Array.copy(list, len / 2, y, 0, len / 2)
    x = mergeSort(x)
    y = mergeSort(y)
    var i, j = 0
    for (k <- 0 until len) {
      if (j >= y.length || (i < x.length && x(i) < y(j))) {
        z(k) = x(i)
        i = i + 1
      } else {
        z(k) = y(j)
        j = j + 1
      }
    }
    z
  }
}
[EDIT]
This code works fine; for now I have assumed that the input array will always be of even length.
UPDATE
Removed vars x and y
def mergeSort(list: Array[Int]): Array[Int] = {
  val len = list.length
  if (len == 1) list
  else {
    val z = new Array[Int](len)
    val x = mergeSort(list.dropRight(len/2))
    val y = mergeSort(list.drop(len/2))
    var i, j = 0
    for (k <- 0 until len) {
      if (j >= y.length || (i < x.length && x(i) < y(j))) {
        z(k) = x(i)
        i = i + 1
      } else {
        z(k) = y(j)
        j = j + 1
      }
    }
    z
  }
}

Removing the var x, y = ... would be a good start toward a more functional style. Prefer immutable data to mutable data.
HINT: write a method swap that takes two values and returns them ordered according to a predicate.
Also consider removing the for loop (or comprehension).
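For anyone who wants to compare notes after trying it themselves, here is one possible shape of a loop-free version. It is only a sketch: it swaps Array for an immutable List, and the names FunctionalMergeSort and merge are made up for illustration. Using splitAt also means odd lengths and the empty list need no special handling.

```scala
import scala.annotation.tailrec

// Hypothetical illustration, not the asker's code: merge sort on immutable Lists.
object FunctionalMergeSort {
  // Merge two sorted lists. acc holds the merged prefix in reverse order,
  // which keeps the recursion in tail position.
  @tailrec
  def merge(xs: List[Int], ys: List[Int], acc: List[Int] = Nil): List[Int] =
    (xs, ys) match {
      case (Nil, _) => acc.reverse ::: ys
      case (_, Nil) => acc.reverse ::: xs
      case (x :: xt, y :: yt) =>
        if (x <= y) merge(xt, ys, x :: acc)
        else merge(xs, yt, y :: acc)
    }

  def mergeSort(xs: List[Int]): List[Int] =
    if (xs.length <= 1) xs
    else {
      // splitAt handles odd lengths: the right half gets the extra element.
      val (left, right) = xs.splitAt(xs.length / 2)
      merge(mergeSort(left), mergeSort(right))
    }
}
```

There are no vars and no index bookkeeping here; the pattern match takes over the role of the i/j comparisons in the for loop.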

Related

Facing Issues in Recursion of Perfect Number Problem

I've been working on a Scala recursion problem. My usual approach is to first write the program using loops and then convert the loop-based solution into a recursive one.
So I have written the following code to check for a perfect number using loops.
def isPerfect(n: Int): Boolean = {
  var sum = 1
  // Find all divisors and add them
  var i = 2
  while (i * i <= n) {
    if (n % i == 0) {
      if (i * i != n) sum = sum + i + n / i
      else sum = sum + i
    }
    i += 1
  }
  // If sum of divisors is equal to
  // n, then n is a perfect number
  if (sum == n && n != 1) return true
  false
}
Here is my attempt to convert it into a recursive solution. But I'm getting the incorrect result.
def isPerfect(n: Int): Boolean = {
  var sum = 1
  // Find all divisors and add them
  var i = 2
  def loop(i:Int, n:Int): Any ={
    if(n%i == 0) if (i * i != n) return sum + i + n / i
    else
      return loop(i+1, sum+i)
  }
  val sum_ = loop(2, n)
  // If sum of divisors is equal to
  // n, then n is a perfect number
  if (sum_ == n && n != 1) return true
  false
}
Thank you in advance.
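Here is a tail-recursive solution:
import scala.annotation.tailrec

def isPerfectNumber(n: Int): Boolean = {
  // Collect the proper divisors of n, counting d down from n - 1 to 1.
  @tailrec def loop(d: Int, acc: List[Int]): List[Int] = {
    if (d == 1) 1 :: acc
    else if (n % d == 0) loop(d - 1, d :: acc)
    else loop(d - 1, acc)
  }
  // Guard against n <= 1, where loop would start at d <= 0.
  n > 1 && loop(n - 1, Nil).sum == n
}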
As a side-note, functions that have side-effects such as state mutation scoped locally are still considered pure functions as long as the mutation is not visible externally, hence having while loops in such functions might be acceptable.
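The asker's own square-root-bounded loop can also be converted fairly mechanically: each var becomes a parameter of the inner function, and each pass of the while loop becomes one recursive call. The sketch below assumes that approach; the object name PerfectNumber is only there for illustration.

```scala
import scala.annotation.tailrec

// Hypothetical illustration: the while loop from the question, with the
// mutable i and sum turned into parameters of a tail-recursive helper.
object PerfectNumber {
  def isPerfect(n: Int): Boolean = {
    // i is the candidate divisor, sum the total of proper divisors found so far.
    @tailrec
    def loop(i: Int, sum: Int): Int =
      if (i.toLong * i > n) sum                    // loop condition i * i <= n failed
      else if (n % i != 0) loop(i + 1, sum)        // i is not a divisor
      else if (i * i == n) loop(i + 1, sum + i)    // square root: count it once
      else loop(i + 1, sum + i + n / i)            // count i and its partner n / i

    n > 1 && loop(2, 1) == n
  }
}
```

The original attempt went wrong mainly because the recursive call passed sum + i in the position of n and returned as soon as it found the first divisor; carrying the sum as its own accumulator parameter avoids both problems.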

scala: Loop through a file to read 20 bytes at a time and blank out bytes at 3rd position

I have a code snippet in Java that loops through a file byte by byte and blanks out the byte at the 3rd position in every 20 bytes. This is done using a for-each loop.
logic:
for(byte b: raw){
if (pos is 3) b = 32;
if (i > 20) i = 0;
i++
}
Since I am learning Scala, I would like to know if there is a better way of looping byte by byte in Scala.
I have read the file into a byte array in Scala as below:
val result = IOUtils.toByteArray(new FileInputStream (new File(fileDir)))
Thanks.
Here is a diametrically opposite solution to that of Tzach Zohar:
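// Note: on Scala 2.13+, .par requires the scala-parallel-collections module
// and import scala.collection.parallel.CollectionConverters._
def parallel(ba: Array[Byte], blockSize: Int = 2048): Unit = {
  val n = ba.size
  val numJobs = (n + blockSize - 1) / blockSize
  (0 until numJobs).par.foreach { i =>
    val startIdx = i * blockSize
    val endIdx = n min ((i + 1) * blockSize)
    // first index >= startIdx that is congruent to 3 modulo 20
    var j = startIdx + ((3 - startIdx) % 20 + 20) % 20
    while (j < endIdx) {
      ba(j) = 32
      j += 20
    }
  }
}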
You see a lot of mutable variables, scary imperative while-loops, and some strange tricks with modular arithmetic. That's actually not idiomatic Scala at all. But the interesting thing about this solution is that it processes blocks of the byte array in parallel. I've compared the time needed by this solution to your naive solution, using various block sizes:
Naive: 38.196
Parallel( 16): 11.676000
Parallel( 32): 7.260000
Parallel( 64): 4.311000
Parallel( 128): 2.757000
Parallel( 256): 2.473000
Parallel( 512): 2.462000
Parallel(1024): 2.435000
Parallel(2048): 2.444000
Parallel(4096): 2.416000
Parallel(8192): 2.420000
At least in this not very thorough microbenchmark (1000 repetitions on a 10 MB array; times are average milliseconds per repetition), the more-or-less efficiently implemented parallel version outperformed the for-loop in your question by a factor of roughly 15.
The question is now: What do you mean by "better"?
My proposal was slightly faster than your naive approach.
@TzachZohar's functional solution could generalize better should the code be moved to a cluster framework like Apache Spark.
I would usually prefer something closer to @TzachZohar's solution, because it's easier to read.
So, it depends on what you are optimizing for: performance? generality? readability? maintainability? For each notion of "better", you could get a different answer. I've tried to optimize for performance. @TzachZohar optimized for readability and maintainability. That led to two rather different solutions.
Full code of the microbenchmark, just in case someone is interested:
val array = Array.ofDim[Byte](10000000)
def naive(ba: Array[Byte]): Unit = {
  var pos = 0
  for (i <- 0 until ba.size) {
    if (pos == 3) ba(i) = 32
    pos += 1
    if (pos == 20) pos = 0
  }
}
def parallel(ba: Array[Byte], blockSize: Int): Unit = {
  val n = ba.size
  val numJobs = (n + blockSize - 1) / blockSize
  (0 until numJobs).par.foreach { i =>
    val startIdx = i * blockSize
    val endIdx = n min ((i + 1) * blockSize)
    var j = startIdx + ((3 - startIdx) % 20 + 20) % 20
    while (j < endIdx) {
      ba(j) = 32
      j += 20
    }
  }
}
def measureTime[U](repeats: Long)(block: => U): Double = {
  val start = System.currentTimeMillis
  var iteration = 0
  while (iteration < repeats) {
    iteration += 1
    block
  }
  val end = System.currentTimeMillis
  (end - start).toDouble / repeats
}
println("Basic sanity check (did I get the modulo arithmetic right?):")
{
  val testArray = Array.ofDim[Byte](50)
  naive(testArray)
  println(testArray.mkString("[", ",", "]"))
}
{
  for (blockSize <- List(3, 7, 13, 16, 17, 32)) {
    val testArray = Array.ofDim[Byte](50)
    parallel(testArray, blockSize)
    println(testArray.mkString("[", ",", "]"))
  }
}
val Reps = 1000
val naiveTime = measureTime(Reps)(naive(array))
println("Naive: " + naiveTime)
for (blockSize <- List(16,32,64,128,256,512,1024,2048,4096,8192)) {
  val parallelTime = measureTime(Reps)(parallel(array, blockSize))
  println("Parallel(%4d): %f".format(blockSize, parallelTime))
}
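The least obvious line in parallel is the one that computes the starting value of j. Pulled out on its own it looks like the sketch below; the helper name firstIndexInBlock and its parameters are invented for illustration. The double modulo is needed because % in Scala can return a negative remainder when the left operand is negative.

```scala
// Hypothetical helper: isolates the start-offset arithmetic used in parallel.
object BlockOffset {
  // Smallest j >= startIdx such that j % period == target.
  def firstIndexInBlock(startIdx: Int, period: Int = 20, target: Int = 3): Int = {
    // (target - startIdx) % period lies in (-period, period);
    // adding period and reducing again maps it into [0, period).
    val shift = ((target - startIdx) % period + period) % period
    startIdx + shift
  }
}
```

Each block can therefore find its own first target position without knowing anything about the blocks before it, which is what makes the jobs independent.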
Here's one way to do this:
val updated = result.grouped(20).flatMap { arr =>
  if (arr.length > 3) arr(3) = 32 // the last group may be shorter than 4 bytes
  arr
}
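If mutating the groups feels out of place, the same result can be computed from each byte's index. This is a sketch only: blankEvery, period and offset are illustrative names, and position 3 is taken to be zero-based, as in the other answers.

```scala
// Hypothetical illustration: build a new array instead of updating in place.
object BlankThird {
  // Returns a copy of raw in which every byte whose index i satisfies
  // i % period == offset is replaced by a space (byte value 32).
  def blankEvery(raw: Array[Byte], period: Int = 20, offset: Int = 3): Array[Byte] =
    raw.zipWithIndex.map { case (b, i) =>
      if (i % period == offset) 32.toByte else b
    }
}
```

This leaves the input untouched at the cost of allocating an intermediate array of pairs, so for large files the in-place versions will use less memory.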

Merge for mergesort in scala

I'm migrating from Java to Scala and I am trying to come up with the merge procedure for the mergesort algorithm. My solution:
import scala.annotation.tailrec

def merge(src: Array[Int], dst: Array[Int], from: Int,
          mid: Int, until: Int): Unit = {
  /*
   * Iteration of merge:
   * i - index of src[from, mid)
   * j - index of src[mid, until)
   * k - index of dst[from, until)
   */
  @tailrec
  def loop(i: Int, j: Int, k: Int): Unit = {
    if (k >= until) {
      // end of recursive calls
    } else if (i >= mid) {
      // left half exhausted: copy from the right half
      dst(k) = src(j)
      loop(i, j + 1, k + 1)
    } else if (j >= until) {
      // right half exhausted: copy from the left half
      dst(k) = src(i)
      loop(i + 1, j, k + 1)
    } else if (src(i) <= src(j)) {
      dst(k) = src(i)
      loop(i + 1, j, k + 1)
    } else {
      dst(k) = src(j)
      loop(i, j + 1, k + 1)
    }
  }
  loop(from, mid, from)
}
This seems to work, but it seems to me that it is written in quite an "imperative" style
(even though I have used recursion and no mutable variables except for the arrays, for which the side effect is intended). I want something like this:
/*
* this code is not working and at all does the wrong things
*/
for (i <- (from until mid); j <- (mid until until);
k <- (from until until) if <???>) yield dst(k) = src(<???>)
But I can't come up with a proper solution of this kind. Can you please help me?
Consider this:
// buffered is defined on Iterator, so take an iterator over each half first
val left = src.iterator.slice(from, mid).buffered
val right = src.iterator.slice(mid, until).buffered
(from until until) foreach { k =>
  dst(k) = if (!left.hasNext) right.next()
    else if (!right.hasNext || left.head < right.head) left.next()
    else right.next()
}
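To try the idea out, it can be dropped into the signature from the question. This is a sketch under a couple of assumptions: the object name BufferedMerge is invented, and the comparison uses <= (as in the question's loop) so that equal elements are taken from the left half first.

```scala
// Hypothetical wrapper: the buffered-iterator merge with the question's signature.
object BufferedMerge {
  def merge(src: Array[Int], dst: Array[Int], from: Int, mid: Int, until: Int): Unit = {
    // head peeks at the next element of a BufferedIterator without consuming it.
    val left = src.iterator.slice(from, mid).buffered
    val right = src.iterator.slice(mid, until).buffered
    (from until until).foreach { k =>
      dst(k) =
        if (!left.hasNext) right.next()
        else if (!right.hasNext || left.head <= right.head) left.next()
        else right.next()
    }
  }
}
```

For example, with src = Array(1, 4, 7, 2, 3, 9) and a dst of the same length, merge(src, dst, 0, 3, 6) fills dst with the six values in ascending order.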

Best purely functional alternative to a while loop

Is there a better functional idiom as an alternative to the code below? I.e., is there a neater way to get the value j without having to use a var?
var j = i + 1
while (j < idxs.length && idxs(j) == x) j += 1
val j = idxs.drop(i).indexWhere(_ != x) + i
Or, as suggested by @kosii in the comments, use the indexWhere overload that takes an index from where to start searching:
val j = idxs.indexWhere(_ != x, i)
Edit
Since j must equal the length of idxs in case all items following i are equal to x:
val index = idxs.indexWhere(_ != x, i)
val j = if(index < 0) idxs.length else index
// or
val j = if (idxs.drop(i).forall(_ == x)) idxs.length
else idxs.indexWhere(_ != x, i)
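Wrapped up as a function, the first variant might look like the sketch below. The name runEnd is made up, and the search starts at i + 1 to mirror the while loop in the question exactly (the snippets above start at i, which gives the same answer whenever idxs(i) == x).

```scala
// Hypothetical helper: the var-free replacement for the while loop.
object RunEnd {
  // Index of the first element after position i that differs from x,
  // or idxs.length if every remaining element equals x.
  def runEnd(idxs: IndexedSeq[Int], x: Int, i: Int): Int = {
    val index = idxs.indexWhere(_ != x, i + 1)
    if (index < 0) idxs.length else index
  }
}
```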
Maybe with streams, something like:
((i + 1) until idxs.length).toStream.dropWhile(j => idxs(j) == x).headOption.getOrElse(idxs.length)

Scala - Most elegant way of initialising values inside array that's already been declared?

I have a 3d array defined like so:
val 3dArray = new Array[Array[Array[Int]]](512, 8, 8)
In Javascript I would do the following to assign each element to 1:
for (i = 0; i < 512; i++)
{
3dArray[i] = [];
for (j = 0; j < 8; j++)
{
3dArray[i][j] = [];
for (k = 0; k < 8; k++)
{
3dArray[i][j][k] = 1;
}
}
}
What's the most elegant approach to doing the same?
Not sure there's a particularly elegant way to do it, but here's one way (I use suffix s to indicate dimension, i.e. xss is a two-dimensional array).
val xsss = Array.ofDim[Int](512, 8, 8)
for (xss <- xsss; xs <- xss; i <- 0 until 8)
xs(i) = 1
Or, using transform it gets even shorter:
for (xss <- xsss; xs <- xss)
xs transform (_ => 1)
for {
i <- a.indices
j <- a(i).indices
k <- a(i)(j).indices
} a(i)(j)(k) = 1
or
for {
e <- a
ee <- e
i <- ee.indices
} ee(i) = 1
See: http://www.scala-lang.org/api/current/index.html#scala.Array$
You can use Array.fill to initialize an array of 1 to 5 dimensions with a given value, and Array.tabulate to initialize an array of 1 to 5 dimensions from its indices:
scala> Array.fill(2,1,1)(42)
res1: Array[Array[Array[Int]]] = Array(Array(Array(42)), Array(Array(42)))
scala> Array.tabulate(3,2,1){ (x,y,z) => x+y+z }
res2: Array[Array[Array[Int]]] = Array(Array(Array(0), Array(1)), Array(Array(1), Array(2)), Array(Array(2), Array(3)))
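Applied to the dimensions from the question, that would be a one-liner. A minimal sketch follows; the names FillExample and cube are illustrative, since an identifier like 3dArray cannot start with a digit in Scala.

```scala
// Hypothetical illustration: a 512 x 8 x 8 array with every element set to 1.
object FillExample {
  val cube: Array[Array[Array[Int]]] = Array.fill(512, 8, 8)(1)
}
```

This allocates and initializes in one step, so there is no need to declare the array first and loop over it afterwards.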