How can I get the sum of an array of tuples in Scala

I have a simple array of tuples
val arr = Array((1,2), (3,4),(5,6),(7,8),(9,10))
I wish to get the tuple (1+3+5+7+9, 2+4+6+8+10) as the answer.
What is the best way to get the sum as a tuple, similar to summing a regular array? I tried
val res = arr.foldLeft(0,0)(_ + _)
This does not work.
Sorry about not writing the context. I was using it in scalding with algebird. Algebird allows sums of tuples and I assumed this would work. That was my mistake.

There is no such thing as Tuple addition, so that can't work. You would have to operate on each element of the Tuple:
val res = arr.foldLeft((0, 0)){ case (sum, next) => (sum._1 + next._1, sum._2 + next._2) }
res: (Int, Int) = (25,30)

This should work nicely:
arr.foldLeft((0,0)){ case ((a0,b0),(a1,b1)) => (a0+a1,b0+b1) }
Addition isn't defined for tuples.

Use scalaz, which defines a tuple as a semigroup, allowing you to use the append operator |+|
import scalaz._
import Scalaz._
arr.fold((0,0))(_ |+| _)
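For readers without scalaz on the classpath, the idea behind |+| can be sketched in plain Scala. The Semigroup trait, its instances, and the combineAll helper below are hypothetical names made up for illustration; they are not scalaz's actual definitions.

```scala
// A hand-rolled semigroup; these names are illustrative, not scalaz's API.
trait Semigroup[A] {
  def combine(x: A, y: A): A
}

object Semigroup {
  // Ints combine by addition.
  implicit val intSemigroup: Semigroup[Int] = new Semigroup[Int] {
    def combine(x: Int, y: Int): Int = x + y
  }

  // A pair combines element-wise whenever both element types have a semigroup.
  implicit def tuple2Semigroup[A, B](implicit sa: Semigroup[A], sb: Semigroup[B]): Semigroup[(A, B)] =
    new Semigroup[(A, B)] {
      def combine(x: (A, B), y: (A, B)): (A, B) =
        (sa.combine(x._1, y._1), sb.combine(x._2, y._2))
    }
}

// Fold from an explicit zero, combining with whichever semigroup is in scope.
def combineAll[A](xs: Seq[A], zero: A)(implicit s: Semigroup[A]): A =
  xs.foldLeft(zero)((x, y) => s.combine(x, y))

val arr = Array((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))
val total = combineAll(arr.toSeq, (0, 0))
```

The tuple instance is derived from the Int instance, which is roughly why a library that ships such instances lets tuples be "added" without any tuple-specific code at the call site.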

Yet another alternative
val (a, b) = arr.unzip
//> a : Array[Int] = Array(1, 3, 5, 7, 9)
//| b : Array[Int] = Array(2, 4, 6, 8, 10)
(a.sum, b.sum)
//> res0: (Int, Int) = (25,30)

Related

scala map function of map vs. list

Snippet 1:
val l = List(1,2,43,4)
l.map(i => i *2)
Snippet 2:
val s = "dsadadaqer12"
val g = s.groupBy(c=>c)
g.map ( {case (c,s) => (c,s.length)})
In snippet #2, the syntax is different from #1, i.e. curly braces are required -- why?
I thought the following would compile, but it does not:
g.map ( (c,s) => (c,s.length))
Can someone explain why?
Thanks
The difference between the two is that the latter uses pattern matching and the former doesn't.
The syntax g.map({case (c,s) => (c,s.length)}) is just syntax sugar for:
g.map(v => v match { case (c,s) => (c,s.length) })
Which means: we name the input argument of our anonymous function v, and then in the function body we match it to a tuple (c,s). Since this is so useful, Scala provides the shorthand version you used.
Of course - this doesn't really have anything to do with whether you use a Map or a List - consider all the following possibilities:
scala> val l = List(1,2,43,4)
l: List[Int] = List(1, 2, 43, 4)
scala> l.map({ case i => i*2 })
res0: List[Int] = List(2, 4, 86, 8)
scala> val l2 = List((1,2), (3,4))
l2: List[(Int, Int)] = List((1,2), (3,4))
scala> l2.map({ case (i, j) => i*j })
res1: List[Int] = List(2, 12)
scala> val g = Map(1 -> 2, 3 -> 4)
g: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 3 -> 4)
scala> g.map(t => t._1 * t._2)
res2: scala.collection.immutable.Iterable[Int] = List(2, 12)
Both Map and List can use both syntax options, depending mostly on what you actually want to do.
1- g.map{case (c,s) => (c,s.length)}
2- g.map((c,s) => (c,s.length))
The map method pulls a single argument, a 2-tuple, from the g collection. The 1st example compiles because the case statement uses pattern matching to extract the tuple's elements whereas the 2nd example doesn't and it won't compile. For that you'd have to do something like: g.map(t => (t._1, t._2.length))
As for parentheses vs. curly braces: braces have always been required for "partial functions," which is what that case statement is. You can use either braces or parentheses for anonymous functions (i.e. x => ...), although you are required to use braces if the function body is more than a single expression (for example, several statements on separate lines).
I read somewhere that this parens/braces distinction might be relaxed but I don't know if that's going to happen any time soon.
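As a small sketch of the points above, the three forms below all compile in Scala 2 and produce the same result. The value names (viaCase, viaMatch, viaAccessors) are made up for this example. Scala 3 adds parameter untupling, so there g.map((c, s) => (c, s.length)) is accepted as well.

```scala
val s = "dsadadaqer12"
val g: Map[Char, String] = s.groupBy(c => c)

// Pattern-matching anonymous function: destructures each (key, value) pair.
val viaCase = g.map { case (c, chars) => (c, chars.length) }

// The same thing written out as an explicit match on a single argument.
val viaMatch = g.map(kv => kv match { case (c, chars) => (c, chars.length) })

// No pattern matching: take the pair as one argument and use its accessors.
val viaAccessors = g.map(kv => (kv._1, kv._2.length))
```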

Spark RDD tuple transformation

I'm trying to transform an RDD of tuple of Strings of this format :
(("abc","xyz","123","2016-02-26T18:31:56"),"15") TO
(("abc","xyz","123"),"2016-02-26T18:31:56","15")
Basically, I want to separate out the timestamp string as its own tuple element. I tried the following, but it's still not clean or correct.
val result = rdd.map(r => (r._1.toString.split(",").toVector.dropRight(1).toString, r._1.toString.split(",").toList.last.toString, r._2))
However, it results in
(Vector(("abc", "xyz", "123"),"2016-02-26T18:31:56"),"15")
The expected output I'm looking for is
(("abc", "xyz", "123"),"2016-02-26T18:31:56","15")
This way I can access the elements using r._1, r._2 (the timestamp string) and r._3 in a separate map operation.
Any hints/pointers will be greatly appreciated.
Vector.toString will include the String 'Vector' in its result. Instead, use Vector.mkString(",").
Example:
scala> val xs = Vector(1,2,3)
xs: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3)
scala> xs.toString
res25: String = Vector(1, 2, 3)
scala> xs.mkString
res26: String = 123
scala> xs.mkString(",")
res27: String = 1,2,3
However, if you want to be able to access (abc,xyz,123) as a Tuple and not as a string, you could also do the following:
val res = rdd.map {
  case ((a: String, b: String, c: String, ts: String), d: String) => ((a, b, c), ts, d)
}
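The same destructuring can be tried without a cluster. The sketch below uses a plain Seq as a stand-in for the RDD (an assumption made so the example runs without Spark); the names rows, reshaped and timestamps are illustrative only.

```scala
// A plain Seq stands in for the RDD here; this is an assumption so the
// example runs without Spark.
val rows = Seq(
  (("abc", "xyz", "123", "2016-02-26T18:31:56"), "15")
)

// Destructure the nested tuple and rebuild it with the timestamp pulled out.
val reshaped = rows.map { case ((a, b, c, ts), d) => ((a, b, c), ts, d) }

// Each element is now a 3-tuple, so _1, _2 and _3 are all available.
val timestamps = reshaped.map(_._2)
```

Because the elements stay tuples rather than strings, no splitting or toString round trip is involved.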

Scala List Operation

Given a List of Int and a variable X of type Int, what is the best functional way in Scala to retain only the leading values of the list (starting from the beginning) such that their sum is less than or equal to X?
This is pretty close to a one-liner:
def takeWhileLessThan(x: Int)(l: List[Int]): List[Int] =
  l.scan(0)(_ + _).tail.zip(l).takeWhile(_._1 <= x).map(_._2)
Let's break that into smaller pieces.
First you use scan to create a list of cumulative sums. Here's how it works on a small example:
scala> List(1, 2, 3, 4).scan(0)(_ + _)
res0: List[Int] = List(0, 1, 3, 6, 10)
Note that the result includes the initial value, which is why we take the tail in our implementation.
scala> List(1, 2, 3, 4).scan(0)(_ + _).tail
res1: List[Int] = List(1, 3, 6, 10)
Now we zip the entire thing against the original list. Taking our example again, this looks like the following:
scala> List(1, 2, 3, 4).scan(0)(_ + _).tail.zip(List(1, 2, 3, 4))
res2: List[(Int, Int)] = List((1,1), (3,2), (6,3), (10,4))
Now we can use takeWhile to take as many values as we can from this list before the cumulative sum is greater than our target. Let's say our target is 5 in our example:
scala> res2.takeWhile(_._1 <= 5)
res3: List[(Int, Int)] = List((1,1), (3,2))
This is almost what we want—we just need to get rid of the cumulative sums:
scala> res2.takeWhile(_._1 <= 5).map(_._2)
res4: List[Int] = List(1, 2)
And we're done. It's worth noting that this isn't very efficient, since it computes the cumulative sums for the entire list, etc. The implementation could be optimized in various ways, but as it stands it's probably the simplest purely functional way to do this in Scala (and in most cases the performance won't be a problem, anyway).
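One possible optimization, sketched here under the assumption that laziness is all that is wanted, is to run the same pipeline over an Iterator so the cumulative sums are only computed until the limit is passed. The name takeWhileSumAtMost is hypothetical.

```scala
// Hypothetical lazy variant: the iterator pipeline should stop pulling
// elements once the running sum exceeds x.
def takeWhileSumAtMost(x: Int)(l: List[Int]): List[Int] =
  l.iterator
    .scanLeft(0)(_ + _)   // running sums, starting with the initial 0
    .drop(1)              // discard the initial 0
    .zip(l.iterator)      // pair each running sum with its element
    .takeWhile(_._1 <= x) // keep pairs while the sum is within the limit
    .map(_._2)            // drop the sums, keep the elements
    .toList
```

The steps are the same as in the List version; only the evaluation strategy changes.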
In addition to Travis' answer (and for the sake of completeness), you can always implement this type of operation as a foldLeft:
def takeWhileLessThanOrEqualTo(maxSum: Int)(list: Seq[Int]): Seq[Int] = {
  // State: the sum so far, the numbers accumulated so far, and whether we have already gone over maxSum
  val startingState = (0, Seq.empty[Int], false)
  val (_, accumulatedNumbers, _) = list.foldLeft(startingState) {
    case ((sum, accumulator, finished), nextNumber) =>
      if (!finished) {
        if (sum + nextNumber > maxSum) (sum, accumulator, true) // Over the limit: mark as finished
        else (sum + nextNumber, accumulator :+ nextNumber, false) // Still within the limit: keep the number and update the sum
      } else (sum, accumulator, finished) // Already finished: pass the state through unchanged
  }
  accumulatedNumbers
}
This only iterates over the list once, so it should be more efficient, but it is more complicated and takes a bit of reading to understand.
I will go with something like this, which is more functional and should be efficient.
def takeSumLessThan(x: Int, l: List[Int]): List[Int] = (x, l) match {
  case (_, List()) => List()
  case (x, lh :: _) if lh > x => List() // the next element would push the sum past x
  case (x, lh :: lt) => lh :: takeSumLessThan(x - lh, lt)
}
Edit 1 : Adding tail recursion and implicit for shorter call notation
import scala.annotation.tailrec
implicit class MyList(l: List[Int]) {
  def takeSumLessThan(x: Int) = {
    @tailrec
    def f(x: Int, l: List[Int], acc: List[Int]): List[Int] = (x, l) match {
      case (_, List()) => acc
      case (x, lh :: _) if lh > x => acc // the next element would push the sum past x
      case (x, lh :: lt) => f(x - lh, lt, acc ++ List(lh))
    }
    f(x, l, Nil)
  }
}
Now you can use this like
List(1,2,3,4,5,6,7,8).takeSumLessThan(10)
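Appending with acc ++ List(lh) copies the accumulator on every step. A common alternative, sketched here with hypothetical names (takePrefixWithSumAtMost, loop), is to prepend to the accumulator and reverse it once at the end.

```scala
import scala.annotation.tailrec

// Hypothetical helper: keeps the longest prefix whose sum stays <= limit.
def takePrefixWithSumAtMost(limit: Int, xs: List[Int]): List[Int] = {
  @tailrec
  def loop(remaining: Int, rest: List[Int], acc: List[Int]): List[Int] = rest match {
    // Take the head only if it still fits in the remaining budget.
    case head :: tail if head <= remaining => loop(remaining - head, tail, head :: acc)
    // Empty list, or the next element would exceed the limit: stop.
    case _ => acc.reverse
  }
  loop(limit, xs, Nil)
}
```

Prepending to a List is constant time, so the whole traversal stays linear in the number of elements kept.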

What does map in Scala do

Can somebody explain what exactly map on Lists does in Scala?
For example the following line of code:
map(row => row(column))
map transforms a collection by applying a function to each element. Your example is hard to read without more code; a simple example is
scala> val l = List(1,2,3)
scala> l.map( x => x*2 )
res1: List[Int] = List(2, 4, 6)
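One plausible reading of map(row => row(column)), assuming each row is itself a sequence and column is an Int index, is "pick one cell from every row". The data below is invented for illustration.

```scala
// Hypothetical data: each row is a list of cells, and column is an index.
val rows = List(
  List(1, 2, 3),
  List(4, 5, 6),
  List(7, 8, 9)
)
val column = 1

// row(column) is row.apply(column), so this picks one cell from every row.
val cells = rows.map(row => row(column))
```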

Index with Many Indices

Is there a quick Scala idiom to retrieve multiple elements of a traversable using indices?
I am looking for something like
val L = (1 to 4).toList
L(List(1,2)) //doesn't work
I have been using map so far, but wondering if there was a more "scala" way
List(1,2) map {L(_)}
Thanks in advance
Since a List is a Function you can write just
List(1,2) map L
Although, if you're going to be looking things up by index, you should probably use an IndexedSeq like Vector instead of a List.
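A minimal sketch of that suggestion, with made-up sample values: List.apply walks the list from the head on every lookup, whereas Vector offers effectively constant-time indexed access, and a Vector can be passed to map in the same way.

```scala
val data = Vector(10, 20, 30, 40) // hypothetical sample values
val indices = List(1, 2)

// A Vector is also a function from index to element, so it can be passed to map.
val picked = indices.map(data)
```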
You could add an implicit class that adds the functionality:
implicit class RichIndexedSeq[T](seq: IndexedSeq[T]) {
  def apply(i0: Int, i1: Int, is: Int*): Seq[T] = (i0 +: i1 +: is) map seq
}
You can then use the sequence's apply method with one index or multiple indices:
scala> val data = Vector(1,2,3,4,5)
data: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)
scala> data(0)
res0: Int = 1
scala> data(0,2,4)
res1: Seq[Int] = ArrayBuffer(1, 3, 5)
You can do it with a for comprehension but it's no clearer than the code you have using map.
scala> val indices = List(1,2)
indices: List[Int] = List(1, 2)
scala> for (index <- indices) yield L(index)
res0: List[Int] = List(2, 3)
I think the most readable would be to implement your own function takeIndices(indices: List[Int]) that takes a list of indices and returns the values of a given List at those indices. e.g.
L.takeIndices(List(1,2))
List[Int] = List(2,3)
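takeIndices is not part of the standard library, so it would have to be written by hand. One way to sketch it is as an extension method via an implicit class; the names IndexSyntax and TakeIndicesOps are hypothetical.

```scala
object IndexSyntax {
  // Hypothetical extension method; takeIndices is not in the standard library.
  implicit class TakeIndicesOps[A](xs: Seq[A]) {
    // Look up each index in order; an out-of-range index throws
    // IndexOutOfBoundsException, just like a direct xs(i) would.
    def takeIndices(indices: Seq[Int]): Seq[A] = indices.map(xs)
  }
}
import IndexSyntax._

val L = (1 to 4).toList
val picked = L.takeIndices(List(1, 2))
```

This is only a thin wrapper around the map-based version, so the choice between the two is mostly about how the call site reads.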