Vector product using map and reduce in Scala

I'm trying to calculate the vector product of two vectors using the map and reduce functions.
Let's see what happens in the REPL of Scala:
First of all I define 2 vectors with same length
scala> val v1 = Array(1,4,5,2)
v1: Array[Int] = Array(1, 4, 5, 2)
scala> val v2 = Array (3,5,1,5)
v2: Array[Int] = Array(3, 5, 1, 5)
Now I create a new array vecZip using the zip function
scala> val vecZip = v1 zip v2
vecZip: Array[(Int, Int)] = Array((1,3), (4,5), (5,1), (2,5))
Now I'd like to apply the reduce method
(to do the product of each tuple) for each element of this array.
I thought this:
val vecToSum = vecZip.map(x=>(List(x).reduce(_*_)))
I want to get a list (vecToSum) to which I can then apply the reduce method to calculate the total result. However, I get this error:
scala> val vecToSum = vecZip.map(x=>(List(x).reduce(_*_)))
<console>:10: error: value * is not a member of (Int, Int)
val vecToSum = vecZip.map(x=>(List(x).reduce(_*_)))
^

You just need to call map and multiply the tuples values with each other, like this:
val vecToSum = vecZip.map(x => x._1 * x._2)
vecZip is an Array of tuples, so x is a tuple of type (Int, Int). Therefore, if you call List(x).reduce(...), you're creating a List whose only value is the tuple, so that's not really what you want.

What your code actually does is wrap each tuple in a single-element list and then try to reduce that list. This cannot work: the list's only element is the tuple itself, so there is nothing to reduce, and the compiler rejects _*_ because * is not defined on (Int, Int).
Instead, you need to map over the elements of vecZip (the tuples) and multiply the two components of each:
vecZip.map { case (x, y) => x * y }

You don't need to reduce here. Reducing an Array[(Int, Int)] would mean performing some associative binary operation on all the tuples in the array. The operation could be applied to the first two tuples, then to that result and the third tuple, then to that result and the fourth tuple, and so on. Because of associativity, it could equally be applied to the first and second tuples and, simultaneously, to the third and fourth, and then to their results, which is nice for parallelization (frameworks such as Spark rely on this heavily).
For example you could sum all first elements and all second elements of each tuple:
val reduced = vecZip.reduce((pair1, pair2) => (pair1._1 + pair2._1, pair1._2 + pair2._2))
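To make the associativity point concrete, here is a small sketch that evaluates the same tuple-wise sum in two different groupings. The object name and the helper add are made up for illustration, and the grouped version only imitates by hand what a parallel runtime might do.

```scala
object ReduceAssociativity {
  val vecZip = Array((1, 3), (4, 5), (5, 1), (2, 5))

  // Hypothetical helper: component-wise sum of two pairs (an associative operation)
  def add(p: (Int, Int), q: (Int, Int)): (Int, Int) = (p._1 + q._1, p._2 + q._2)

  // Sequential grouping: ((t0 + t1) + t2) + t3
  val sequential = vecZip.reduceLeft((p, q) => add(p, q))

  // Pairwise grouping: (t0 + t1) + (t2 + t3)
  val grouped = add(add(vecZip(0), vecZip(1)), add(vecZip(2), vecZip(3)))
}
```

Both values come out as (12, 14), which is why reduce is free to choose either evaluation order.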
What you want however is to simply map each tuple into the product of its elements:
val vecToSum = vecZip.map { case (x, y) => x * y }
Note that I used a pattern-matching anonymous function (see that case over there) in order to destructure the tuple; without it, the code would look like this:
val vecToSum = vecZip.map(tuple => tuple._1 * tuple._2)
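Putting the pieces together, here is a minimal sketch of the whole computation, assuming the goal is the sum of the element-wise products (the dot product), which is what the zip-and-multiply steps in the question seem to be building towards. The object name and the names products and dot are made up for illustration.

```scala
object DotProduct {
  val v1 = Array(1, 4, 5, 2)
  val v2 = Array(3, 5, 1, 5)

  // map step: multiply the two components of each zipped pair
  val products = v1.zip(v2).map { case (x, y) => x * y }

  // reduce step: add up the element-wise products
  val dot = products.reduce(_ + _)
}
```

Here products is Array(3, 20, 5, 10) and dot is 38. Note that reduce throws on an empty collection; products.sum or products.foldLeft(0)(_ + _) would return 0 in that case.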

Related

Scala - create a new list and update particular element from existing list

I am new to Scala and new to OOP too. How can I update a particular element of a list while creating a new list?
val numbers= List(1,2,3,4,5)
val result = numbers.map(_*2)
I need to update only the third element (multiply it by 2). How can I do that using map?
You can use zipWithIndex to map the list into a list of tuples, where each element is accompanied by its index. Then, using map with pattern matching - you single out the third element (index = 2):
val numbers = List(1,2,3,4,5)
val result = numbers.zipWithIndex.map {
case (v, i) if i == 2 => v * 2
case (v, _) => v
}
// result: List[Int] = List(1, 2, 6, 4, 5)
Alternatively - you can use patch, which replaces a sub-sequence with a provided one:
numbers.patch(from = 2, patch = Seq(numbers(2) * 2), replaced = 1)
I think the clearest way of achieving this is by using updated(index: Int, elem: Int). For your example, it could be applied as follows:
val result = numbers.updated(2, numbers(2) * 2)
list.zipWithIndex creates a list of pairs with the original element on the left and its index in the list on the right (indices are 0-based, so the "third element" is at index 2).
val result = numbers.zipWithIndex.map {
  case (n, 2) => n * 2
  case (n, _) => n
}
This creates an intermediate list holding the pairs and then maps over it to do your transformation. A slightly more efficient approach is to use an iterator. Iterators are "lazy", so rather than creating an intermediate container, the iterator generates the pairs one by one and sends them straight to .map:
val result = numbers.iterator.zipWithIndex.map {
  case (n, 2) => n * 2
  case (n, _) => n
}.toList
First of all, Scala supports both functional and object-oriented programming. You can replace any element of a list with the updated method; see the following example for details:
Signature: updated(index, value)
val numbers = List(1, 2, 3, 4, 5)
print(numbers.updated(2, 10))
Here the first argument is the index and the second argument is the new value. The call does not modify numbers; it returns a new list:
List(1, 2, 10, 4, 5)
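As a small sketch of how updated behaves for the original question (doubling the third element), the block below also checks that the source list is left untouched. The helper mapAt is hypothetical, not part of the standard library; it is only one way to avoid the IndexOutOfBoundsException that updated throws for an invalid index.

```scala
object UpdateThird {
  val numbers = List(1, 2, 3, 4, 5)

  // updated returns a new list; numbers itself is unchanged
  val result = numbers.updated(2, numbers(2) * 2)

  // Hypothetical helper: apply f to the element at index if it exists,
  // otherwise return the list as it is
  def mapAt[A](xs: List[A], index: Int)(f: A => A): List[A] =
    if (xs.indices.contains(index)) xs.updated(index, f(xs(index))) else xs
}
```

With these definitions, UpdateThird.mapAt(UpdateThird.numbers, 2)(_ * 2) gives List(1, 2, 6, 4, 5), and an out-of-range index simply returns the original list.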

Filtering RDDs based on value of Key

I have two RDDs that wrap the following arrays:
Array((3,Ken), (5,Jonny), (4,Adam), (3,Ben), (6,Rhonda), (5,Johny))
Array((4,Rudy), (7,Micheal), (5,Peter), (5,Shawn), (5,Aaron), (7,Gilbert))
I need to write code such that if I provide 3 as input, it returns
Array((3,Ken), (3,Ben))
If input is 6, output should be
Array((6,Rhonda))
I tried something like this:
val list3 = list1.union(list2)
list3.reduceByKey(_+_).collect
list3.reduceByKey(6).collect
None of these worked. Can anyone help me out with a solution for this problem?
Given the following, which you would have to define yourself:
// Provide your SparkContext and inputs here
val sc: SparkContext = ???
val array1: Array[(Int, String)] = ???
val array2: Array[(Int, String)] = ???
val n: Int = ???
val rdd1 = sc.parallelize(array1)
val rdd2 = sc.parallelize(array2)
You can use the union and filter to reach your goal
rdd1.union(rdd2).filter(_._1 == n)
Since filtering by key is something that you will probably want to do on several occasions, it makes sense to encapsulate this functionality in its own function.
It would also be useful if this function could work with keys of any type, not only Ints.
You can express this in the old RDD API as follows:
import org.apache.spark.rdd.RDD
def filterByKey[K, V](rdd: RDD[(K, V)], k: K): RDD[(K, V)] =
  rdd.filter(_._1 == k)
You can use it as follows:
val rdd = rdd1.union(rdd2)
val filtered = filterByKey(rdd, n)
Let's look at this method in a little more detail.
This method filters an RDD that contains generic pairs, where the type of the first item is K and the type of the second item is V (for key and value). It also accepts a key of type K that will be used to filter the RDD.
You then use the filter function, which takes a predicate (a function from the element type, in this case the pair (K, V), to a Boolean) and makes sure that the resulting RDD only contains items that satisfy this predicate.
We could also have written the body of the function as:
rdd.filter(pair => pair._1 == k)
or
rdd.filter { case (key, value) => key == k }
but we took advantage of the _ wildcard to express the fact that we want to act on the first (and only) parameter of this anonymous function.
To use it, you first parallelize your arrays into RDDs, call union on them, and then invoke the filterByKey function with the number you want to filter by (as shown in the example).
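The same union-then-filter logic can be tried without a Spark installation. The sketch below is an assumption-laden analogue that uses plain Scala collections instead of RDDs (++ in place of union, and Seq in place of RDD), so it only illustrates the shape of the solution; the object and value names are made up.

```scala
object FilterByKeyLocal {
  val array1 = Array((3, "Ken"), (5, "Jonny"), (4, "Adam"), (3, "Ben"), (6, "Rhonda"), (5, "Johny"))
  val array2 = Array((4, "Rudy"), (7, "Micheal"), (5, "Peter"), (5, "Shawn"), (5, "Aaron"), (7, "Gilbert"))

  // Local stand-in for the RDD version: keep only the pairs whose key equals k
  def filterByKey[K, V](pairs: Seq[(K, V)], k: K): Seq[(K, V)] =
    pairs.filter(_._1 == k)

  // ++ plays the role of union here
  val all = (array1 ++ array2).toSeq

  val threes = filterByKey(all, 3)
  val sixes = filterByKey(all, 6)
}
```

Here threes contains (3,Ken) and (3,Ben), and sixes contains only (6,Rhonda), matching the outputs asked for in the question.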

Calculate sliding durations in scala

I have a list of Tuples and a datum like below
val datum =("R",89)
val dataList = Seq(("R",91),("R",95),("X",96),("S",98))
I want to calculate the duration between elements in the list, starting with the datum, so the result would be
res0:> Seq(("R",7) , ("X",2)) //R - 96-89 , X - 98-96
The things I have tried are not functional:
a) I used sliding on the list with a pattern match and an accumulator to hold the values. This relied on a Boolean flag and a ListBuffer to keep adding values to the list.
b) I used a map function with an accumulator tuple and a pattern match on the tuple: compare the _1 values and, when they change, reset the accumulator and collect the result of the subtraction.
I was wondering whether foldLeft or fold could be used to make it more "functional". In this case we would have .foldLeft(List()) as the initial value and then write a function that takes 2 tuples and compares them manually, possibly with a flag as well.
Any pointers as to how this can be made "functional"?
Here's what you can do.
First, create a recursive function that takes the datum, dataList, and an empty list (which accumulates the final result), and walks through the list applying your logic:
def func(x: (String, Int), y: Seq[(String, Int)], z: List[(String, Int)]): List[(String, Int)] = y match {
  // Same key as the current datum: skip the element.
  // Different key: record the duration and continue with that element as the new datum.
  case a +: b => if (x._1 == a._1) func(x, b, z) else func(a, b, z :+ ((x._1, a._2 - x._2)))
  case _ => z
}
That's all; now just call the function:
val finalTuples = func(datum, dataList, List.empty[(String, Int)])
The result is finalTuples: List[(String, Int)] = List((R,7), (X,2))
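Since the question asked about foldLeft specifically, here is a minimal sketch of the same logic as a fold. It is one possible formulation, not the answerer's code: the fold state is a pair of the current reference element and the durations collected so far, and the object and value names are made up.

```scala
object SlidingDurations {
  val datum = ("R", 89)
  val dataList = Seq(("R", 91), ("R", 95), ("X", 96), ("S", 98))

  // State: (current reference element, durations collected so far)
  val durations: Vector[(String, Int)] =
    dataList.foldLeft((datum, Vector.empty[(String, Int)])) {
      case ((current, acc), next) =>
        if (current._1 == next._1) (current, acc) // same key: keep the reference
        else (next, acc :+ (current._1 -> (next._2 - current._2))) // key changed: record duration
    }._2
}
```

For the data in the question this yields Vector((R,7), (X,2)), with no mutable buffer or flag.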

Functional style in Scala to collect results from a method

I have two lists that I zip; I then go through the zipped result and call a function on each pair. This function returns a sequence of Strings as its response. I now want to collect all the responses, without using some sort of buffer that accumulates the responses on each iteration.
seq1.zip(seq2).foreach((x: (Obj1, Obj1)) => {
  callMethod(x._1, x._2) // This method returns a Seq of String when called
})
What I want to avoid is creating a ListBuffer and appending to it. Any clues on how to do this functionally?
Why not use map() to transform each input into a corresponding output ? Here's map() operating in a simple scenario:
scala> val l = List(1,2,3,4,5)
scala> l.map( x => x*2 )
res60: List[Int] = List(2, 4, 6, 8, 10)
so in your case it would look something like:
seq1.zip(seq2).map((x: (Obj1, Obj1)) => callMethod(x._1, x._2))
Given that your function returns a Seq of Strings, you could use flatMap() to flatten the results into one sequence.
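A minimal sketch of the difference between map and flatMap in this situation follows. Obj1 and callMethod are hypothetical stand-ins for the question's types, which are not shown; the stub simply returns the two names.

```scala
object CollectResponses {
  // Hypothetical stand-ins for the question's Obj1 and callMethod
  final case class Obj1(name: String)
  def callMethod(a: Obj1, b: Obj1): Seq[String] = Seq(a.name, b.name)

  val seq1 = Seq(Obj1("a1"), Obj1("a2"))
  val seq2 = Seq(Obj1("b1"), Obj1("b2"))

  // map keeps one Seq[String] per zipped pair
  val nested: Seq[Seq[String]] =
    seq1.zip(seq2).map { case (a, b) => callMethod(a, b) }

  // flatMap concatenates all responses into a single sequence
  val flat: Seq[String] =
    seq1.zip(seq2).flatMap { case (a, b) => callMethod(a, b) }
}
```

With this stub, nested holds two inner sequences while flat is the single sequence a1, b1, a2, b2, and no buffer is needed in either case.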

How to sum the corresponding values in the List into a Tuple?

I have a list, details, of this type:
case class Detail(point: List[Double], cluster: Int)
val details = List(Detail(List(2.0, 10.0),1), Detail(List(2.0, 5.0),3),
Detail(List(8.0, 4.0),2), Detail(List(5.0, 8.0),2))
I want to reduce this list to a tuple that contains the sum of each corresponding point coordinate where the cluster is 2.
So I filter this List:
details.filter(detail => detail.cluster == 2)
which returns :
List(Detail(List(8.0, 4.0),2), Detail(List(5.0, 8.0),2))
It's the summing of the corresponding values I'm having trouble with. In this example the tuple should contain (8+5, 4+8) = (13, 12)
I'm thinking of flattening the List and then summing each corresponding value, but
List(details).flatten
just returns the same List.
How can I sum the corresponding values in the List into a Tuple?
I could achieve this easily using a for loop and just extract the details I need into a counter, but what is the functional solution?
What do you want to happen if the lists for different Details have different lengths?
Or same length which is different from 2? Tuples are generally only used when you need a fixed in advance number of elements; you won't even be able to write a return type if you need tuples of different lengths.
Assuming that all of them are lists of the same length and you get a list in return, something like this should work (untested):
details.filter(_.cluster == 2).map(_.point).transpose.map(_.sum)
I.e. first get all points as a list of lists, transpose it so you get a list for each "coordinate", and sum each of these lists.
If you do know that each point has two coordinates, this should be reflected in your point type by using (Double, Double) instead of List[Double]; you can then just fold over the list of points, which should be a bit more efficient. Look at the definition of foldLeft and the standard implementation of sum in terms of foldLeft:
def sum(list: List[Int]): Int = list.foldLeft(0)((acc, x) => acc + x)
and it should be easy to do what you want.
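For illustration, here is one way the fold over pairs could look. It assumes the case class is changed to hold a (Double, Double) point, as suggested above; Detail2 is a made-up name for that variant, so as not to clash with the question's Detail.

```scala
object SumCluster {
  // Assumed variant of the question's Detail, with a fixed-size point
  final case class Detail2(point: (Double, Double), cluster: Int)

  val details = List(
    Detail2((2.0, 10.0), 1), Detail2((2.0, 5.0), 3),
    Detail2((8.0, 4.0), 2), Detail2((5.0, 8.0), 2))

  // Keep cluster 2, then add the points coordinate by coordinate
  val total: (Double, Double) =
    details.filter(_.cluster == 2).map(_.point).foldLeft((0.0, 0.0)) {
      case ((accX, accY), (x, y)) => (accX + x, accY + y)
    }
}
```

For the sample data, total is (13.0, 12.0), and the return type is an ordinary pair because the number of coordinates is fixed in advance.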
You can use a single foldLeft with a pattern-matching function, without a separate filter:
details.foldLeft((0.0,0.0))({
case ((accX, accY), Detail(x :: y :: Nil, 2)) => (accX + x, accY + y)
case (acc, _) => acc
})
res1: (Double, Double) = (13.0,12.0)