Breakdown of reduce function - Scala

I currently have:
x.collect()
# Result: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val a = x.reduce((x,y) => x+1)
# a: Int = 6
val b = x.reduce((x,y) => y + 1)
# b: Int = 12
I have tried to follow what has been said here (http://www.python-course.eu/lambda.php) but still don't quite understand what the individual operations are that lead to these answers.
Could anyone please explain the steps being taken here?
Thank you.

The reason is that the function (x, y) => x + 1 is not associative. Spark's reduce requires an associative (and commutative) function; otherwise the result is indeterminate, because partial results from different partitions can be combined in any order.
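To see the non-associativity concretely, here is a small check in plain Scala (no Spark needed): grouping the same three values in two different ways gives two different answers.
val f = (x: Int, y: Int) => x + 1
f(f(1, 2), 3)  // f(2, 3) = 3
f(1, f(2, 3))  // f(1, 3) = 2, so the grouping changes the answer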

You can think of the reduce() method as grabbing two elements from the collection, applying them to a function that results in a new element, and putting that new element back in the collection, replacing the two it grabbed. This is done repeatedly until there is only one element left. In other words, that previous result is going to get re-grabbed until there are no more previous results.
So you can see where (x,y) => x+1 results in a new value (x+1) which would be different from (x,y) => y+1. Cascade that difference through all the iterations and ...
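For illustration, here is what a plain left-to-right fold (reduceLeft, no partitions involved) does with each function; the exact values you saw (6 and 12) additionally depend on how Spark combined the per-partition results:
(1 to 10).reduceLeft((x, y) => x + 1)  // "previous result + 1" at each of the 9 steps: 10
(1 to 10).reduceLeft((x, y) => y + 1)  // "next element + 1" at each step, ending at 10 + 1 = 11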

Related

How to accumulate 2D array elements in Scala?

I have a 2D array:
Array(Array(1,1,0), Array(1,0,1))
and I would like to accumulate values over columns so my final output looks like
Array(Array(1,1,0), Array(2,1,1))
If this were a 1D array, I could simply use scan, but I'm having trouble using scan with a 2D array.
Can anyone help with this issue?
Here's one way to do it:
val t = Array(Array(1,1,0), Array(1,0,1))
val result = t.scanLeft(Array.fill(t(0).length)(0))((x, y) =>
  x.zip(y).map(e => e._1 + e._2)
).drop(1)
//to see the results
result.foreach(e => println(e.toList))
gives:
List(1, 1, 0)
List(2, 1, 1)
The idea is to create an array filled with zeros (using Array.fill) and then scan the 2D array using that as an accumulator. In the end, drop(1) gets rid of the zero-filled array.
EDIT:
In response to the comment, this solution works for a matrix of any size. The zip function takes care of element-wise addition.
EDIT 2 (Step by step explanation):
You already know about scan for a one-dimensional array. The idea is essentially the same.
We initialize the accumulator with zero. In this case, zero means an array of zeros. Array.fill is used to create an array filled with zeros.
Instead of a single addition, we need to add arrays element-wise. This is what the combination of zip and map does. There are plenty of examples online showing how these methods work.
Finally, we drop the initial array of zeros using drop(1). The result is an array of arrays containing the accumulated values.
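For example, the element-wise addition of two rows with zip and map looks like this:
Array(1, 1, 0).zip(Array(1, 0, 1)).map { case (a, b) => a + b }
// Array(2, 1, 1)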
I would solve it as: for a given row r, sum all previous rows.
val accumulatedMatrix =
  for (row <- array.indices)
    yield array.take(row + 1).foldLeft(Array(0, 0, 0)) {
      case (a, b) => Array(a(0) + b(0), a(1) + b(1), a(2) + b(2))
    }
input:
val array = Array(
Array(1,1,0),
Array(1,0,1)
)
output:
1,1,0
2,1,1
Instead of summing all the previous rows repeatedly, you can improve this by memoizing as well.
Pretty much the same approach as for a 1D array:
a.tail.scan(a.head)((acc, value) =>
  Array(acc(0) + value(0), acc(1) + value(1), acc(2) + value(2))
)
For an arbitrary number of columns (as long as all rows have the same length):
a.tail.scan(a.head)((acc, value) =>
  acc.zip(value).map { case (a, b) => a + b }
)

Scala: shuffle a list randomly and repeat it

I want to shuffle a Scala list randomly.
I know I can do this by using scala.util.Random.shuffle.
But by calling this I will always get a newly shuffled list. What I really want is for the shuffle to give me the same output in certain cases. How can I achieve that?
Basically, I want to shuffle a list randomly at first and then repeat that exact shuffle. The first time I want to generate the order randomly, and then, based on some parameter, repeat the same shuffling.
Use setSeed() to seed the generator before shuffling. Then if you want to repeat a shuffle reuse the seed.
For example:
scala> util.Random.setSeed(41L) // some seed chosen for no particular reason
scala> util.Random.shuffle(Seq(1,2,3,4))
res0: Seq[Int] = List(2, 4, 1, 3)
That shuffled: 1st -> 3rd, 2nd -> 1st, 3rd -> 4th, 4th -> 2nd
Now we can repeat that same shuffle pattern.
scala> util.Random.setSeed(41L) // same seed
scala> util.Random.shuffle(Seq(2,4,1,3)) // result of previous shuffle
res1: Seq[Int] = List(4, 3, 2, 1)
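If you would rather not mutate the shared util.Random object, a minimal sketch (the helper name seededShuffle is made up here) is to build a dedicated Random from the seed on each call, which makes the shuffle reproducible by construction:
def seededShuffle[A](xs: Seq[A], seed: Long): Seq[A] =
  new scala.util.Random(seed).shuffle(xs)

seededShuffle(Seq(1, 2, 3, 4), 41L)  // same seed, same permutation, every time
seededShuffle(Seq(1, 2, 3, 4), 41L)  // identical to the call above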
Let a be the seed parameter.
Let b be how many times you want to shuffle.
There are two ways to (more or less) do this.
You can use scala.util.Random.setSeed(a), where 'a' can be any integer. After you finish your b shuffles, set the seed to 'a' again and the following shuffles will come out in the same order as before.
The other way is to shuffle an index list (0 until arr.size) b times, save those shuffles as a nested list or vector, and then map them onto your iterable:
val arr = List("Bob", "Knight", "John")
val randomer = (0 until b).map(_ => scala.util.Random.shuffle(arr.indices.toList))
randomer.map(perm => perm.map(i => arr(i)))
You can reuse the same randomer for any other list you want to shuffle in the same order by mapping it the same way.
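For example (a sketch, assuming the other list has the same length as arr), the saved index permutations can be applied to another list so it is shuffled in exactly the same order:
val other = List("Sword", "Shield", "Bow")
randomer.map(perm => perm.map(i => other(i)))  // same shuffle order as for arr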

Scala function, unexpected behaviour

I have the following Scala code snippet:
(1 to 10).foreach(a => (1 to 100 by 3).toList.count(b => b % a == 0))
which, I would expect to behave like the following:
Create a list of multiples of 3 less than 100
For each item in the list previously generated, count how many multiples of 1, 2, 3... 10 there are
But, when I run the snippet, I get an empty list. What am I doing wrong?
Thanks for your help!
The behavior is totally expected when using foreach.
foreach takes a procedure — a function with a result type Unit — as the right operand. It simply applies the procedure to each List element. The result of the operation is again Unit; no list of results is assembled.
It's typically used for its side effects — something like printing or saving into a database, etc.
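You can see this by assigning the result of foreach to a value in the REPL:
scala> val r = (1 to 10).foreach(a => (1 to 100 by 3).toList.count(b => b % a == 0))
// r: Unit = ()   (the counts are computed but immediately discarded)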
You ought to use map instead:
scala> (1 to 10).map(a => (1 to 100 by 3).toList.count(b => b % a == 0))
// res3: scala.collection.immutable.IndexedSeq[Int] = Vector(34, 17, 0, 9, 7, 0, 5, 4, 0, 4)

Is it possible to update a variable in a foldLeft function based on a condition?

I am trying to write Scala code which gives the maximum sum of a contiguous sub-array of a given array. For example, val arr = Array(-2, -3, 4, -1, -2, 1, 5, -3). In this array I need to get the maximum contiguous sub-array sum, i.e. 4 + (-1) + (-2) + 1 + 5 = 7. I wrote the following code to get this result.
scala> arr.foldLeft(0) { (currsum,newnum) => if((currsum+newnum)<0) 0 else { if(currsum<(currsum+newnum)) (currsum+newnum) else currsum }}
res5: Int = 10
but this deviates from the actual result, as I am unable to update the maximum_so_far value as the counting/summation goes on. Since I have used foldLeft for this, is there any way to update the maximum_so_far variable only when the sum of contiguous sub-array elements is greater than the previous max_sum?
reference link for better understanding of scenario
Well, obviously you have to propagate two values along your input data for this computation, just as you would need to do in the imperative case:
arr.foldLeft((0, 0)) {
  case ((maxSum, curSum), value) => {
    val newSum = Math.max(0, curSum + value)
    (Math.max(maxSum, newSum), newSum)
  }
}._1
Another way would be to compute the intermediate results (lazily if you want) and then select the maximum:
arr.toIterator.scanLeft(0) {
  case (curSum, value) =>
    Math.max(0, curSum + value)
}.max
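As a quick check (values worked out by hand rather than copied from a REPL), both variants produce the expected 7 for the array from the question:
val arr = Array(-2, -3, 4, -1, -2, 1, 5, -3)

arr.foldLeft((0, 0)) {
  case ((maxSum, curSum), value) =>
    val newSum = Math.max(0, curSum + value)
    (Math.max(maxSum, newSum), newSum)
}._1
// 7, i.e. 4 + (-1) + (-2) + 1 + 5

arr.toIterator.scanLeft(0) {
  case (curSum, value) => Math.max(0, curSum + value)
}.max
// also 7: the running sums are 0, 0, 0, 4, 3, 1, 2, 7, 4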

Scala: what is the most appropriate data structure for sorted subsets?

Given a large collection (let's call it 'a') of elements of type T (say, a Vector or List) and an evaluation function 'f' (say, (T) => Double) I would like to derive from 'a' a result collection 'b' that contains the N elements of 'a' that result in the highest value under f. The collection 'a' may contain duplicates. It is not sorted.
Maybe leaving the question of parallelizability (map/reduce etc.) aside for a moment, what would be the appropriate Scala data structure for compiling the result collection 'b'? Thanks for any pointers / ideas.
Notes:
(1) I guess my use case can be most concisely expressed as
val a = Vector( 9,2,6,1,7,5,2,6,9 ) // just an example
val f : (Int)=>Double = (n)=>n // evaluation function
val b = a.sortBy( f ).take( N ) // sort, then clip
except that I do not want to sort the entire set.
(2) one option might be an iteration over 'a' that fills a TreeSet with 'manual' size bounding (reject anything worse than the worst item in the set, don't let the set grow beyond N). However, I would like to retain duplicates present in the original set in the result set, and so this may not work.
(3) if a sorted multi-set is the right data structure, is there a Scala implementation of this? Or a binary-sorted Vector or Array, if the result set is reasonably small?
You can use a priority queue:
def firstK[A](xs: Seq[A], k: Int)(implicit ord: Ordering[A]) = {
  val q = new scala.collection.mutable.PriorityQueue[A]()(ord.reverse)
  val (before, after) = xs.splitAt(k)
  q ++= before
  after.foreach(x => q += ord.max(x, q.dequeue))
  q.dequeueAll
}
We fill the queue with the first k elements and then compare each additional element to the head of the queue, swapping as necessary. This works as expected and retains duplicates:
scala> firstK(Vector(9, 2, 6, 1, 7, 5, 2, 6, 9), 4)
res14: scala.collection.mutable.Buffer[Int] = ArrayBuffer(6, 7, 9, 9)
And it doesn't sort the complete list. I've got an Ordering in this implementation, but adapting it to use an evaluation function would be pretty trivial.
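A sketch of that adaptation (the name firstKBy is made up here): derive the Ordering from the evaluation function with Ordering.by, so the queue keeps the k elements with the highest values under f:
def firstKBy[A](xs: Seq[A], k: Int)(f: A => Double) =
  firstK(xs, k)(Ordering.by(f))

firstKBy(Vector(9, 2, 6, 1, 7, 5, 2, 6, 9), 4)(_.toDouble)
// ArrayBuffer(6, 7, 9, 9), same as above since f is effectively the identity here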