How to access a value of a Scala tuple - scala

I have a sequence of tuples, each holding a value and its square:
val fields3: Seq[(Int, Int)] = Seq((3, 9), (5, 25))
What I want to know is whether there is a way to refer to a value of the same tuple directly when I create the object, without using a foreach:
val fields3: Seq[(Int, Int)] = Seq((3, 3 * 3 ), (5, 5 * 5))
My idea is something like:
val fields3: Seq[(Int, Int)] = Seq((3, _1 * _1), (5, _1 * _1)) // doesn't compile

You can do something like this:
Seq(2,3,4).map(i => (i, i*i))

You could wrap the tuple in a case class potentially:
case class TupleInt(base: Int) {
  val tuple: (Int, Int) = (base, base * base)
}
Then you could create the sequence like this:
val fields3: Seq[(Int, Int)] = Seq(TupleInt(3), TupleInt(5)).map(_.tuple)
I would prefer the answer @geek94 gave; this is too verbose for what you want to do.
An equally valid way to express this is:
val fields3: Seq[(Int, Int)] = Seq(3, 5).map(i => i -> i*i)
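For the record, i -> i * i is just alternative syntax for the tuple (i, i * i), so both forms produce the sequence from the question:
scala> Seq(3, 5).map(i => i -> i * i)
res0: Seq[(Int, Int)] = List((3,9), (5,25))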

Related

Flatmap scala [String, String, List[String]]

I have this problem: I have an RDD[(String, String, List[String])], and I would like to "flatmap" it to obtain an RDD[(String, String, String)]:
e.g:
val x: RDD[(String, String, List[String])] =
  // contains: ("a", "b", List("ra", "re", "ri"))
I would like to get:
val result: RDD[(String, String, String)] =
  // contains: ("a", "b", "ra"), ("a", "b", "re"), ("a", "b", "ri")
Use flatMap:
val rdd = sc.parallelize(Seq(("a", "b", List("ra", "re", "ri"))))
// rdd: org.apache.spark.rdd.RDD[(String, String, List[String])] = ParallelCollectionRDD[7] at parallelize at <console>:28
rdd.flatMap{ case (x, y, z) => z.map((x, y, _)) }.collect
// res23: Array[(String, String, String)] = Array((a,b,ra), (a,b,re), (a,b,ri))
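If you want to try this pattern without a Spark cluster, the same case-based flatMap works on a plain Scala collection (a minimal sketch, no SparkContext assumed):
val xs = Seq(("a", "b", List("ra", "re", "ri")))
xs.flatMap { case (x, y, z) => z.map((x, y, _)) }
// List((a,b,ra), (a,b,re), (a,b,ri))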
Here is an alternative way of doing it, again using flatMap:
val rdd = sparkContext.parallelize(Seq(("a", "b", List("ra", "re", "ri"))))
rdd.flatMap(t => t._3.map(s => (t._1, t._2, s))).foreach(println)

How to pass a custom function to reduceByKey of an RDD in Scala

My requirement is to find the maximum of each group in an RDD.
I tried the below:
scala> val x = sc.parallelize(Array(Array("A",3), Array("B",5), Array("A",6)))
x: org.apache.spark.rdd.RDD[Array[Any]] = ParallelCollectionRDD[0] at parallelize at <console>:27
scala> x.collect
res0: Array[Array[Any]] = Array(Array(A, 3), Array(B, 5), Array(A, 6))
scala> x.filter(math.max(_,_))
<console>:30: error: wrong number of parameters; expected = 1
x.filter(math.max(_,_))
^
I also tried the below:
Option 1:
scala> x.filter((x: Int, y: Int) => { math.max(x,y)} )
<console>:30: error: type mismatch;
found : (Int, Int) => Int
required: Array[Any] => Boolean
x.filter((x: Int, y: Int) => { math.max(x,y)} )
Option 2:
scala> val myMaxFunc = (x: Int, y: Int) => { math.max(x,y)}
myMaxFunc: (Int, Int) => Int = <function2>
scala> myMaxFunc(56,12)
res10: Int = 56
scala> x.filter(myMaxFunc(_,_) )
<console>:32: error: wrong number of parameters; expected = 1
x.filter(myMaxFunc(_,_) )
How do I get this right?
I can only guess, but you probably want to do:
val rdd = sc.parallelize(Array(("A", 3), ("B", 5), ("A", 6)))
val max = rdd.reduceByKey(math.max)
println(max.collect().toList) // List((B,5), (A,6))
Instead of "How to get this right ?" you should have explained what your expected result is. I think you made a few mistakes:
using filter instead of reduceByKey (why??)
reduceByKey only works on PairRDDs, so you need tuples instead of Array[Any] (which is a bad type anyway); see the sketch after this list
you do not need to write your own wrapper function for math.max, you can just use it as-is
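To make the per-key reduction concrete, here is a rough plain-Scala analogue of what reduceByKey(math.max) computes (a sketch on an ordinary collection, not Spark's API):
Seq(("A", 3), ("B", 5), ("A", 6))
  .groupBy(_._1)
  .map { case (k, vs) => (k, vs.map(_._2).max) }
// Map(B -> 5, A -> 6)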

Apache Spark - Scala - how to FlatMap (k, {v1,v2,v3,...}) to ((k,v1),(k,v2),(k,v3),...)

I got this:
val vector: RDD[(String, Array[String])] = [("a", {v1,v2,..}),("b", {u1,u2,..})]
I want to convert it to:
RDD[(String, String)] = [("a",v1), ("a",v2), ..., ("b",u1), ("b",u2), ...]
Any idea how to do that using flatMap?
This:
vector.flatMap { case (x, arr) => arr.map((x, _)) }
Will give you:
scala> val vector = sc.parallelize(Vector(("a", Array("b", "c")), ("b", Array("d", "f"))))
vector: org.apache.spark.rdd.RDD[(String, Array[String])] =
ParallelCollectionRDD[3] at parallelize at <console>:27
scala> vector.flatMap { case (x, arr) => arr.map((x, _)) }.collect
res4: Array[(String, String)] = Array((a,b), (a,c), (b,d), (b,f))
You definitely need to use flatMap like you mentioned, but in addition you need to use Scala's map as well.
For example:
val idToVectorValue: RDD[(String, String)] = vector.flatMap { case (id, values) => values.map(value => (id, value)) }
Using a single-parameter function:
vector.flatMap(data => data._2.map((data._1, _)))
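All of these rely on the same idea; without a SparkContext it looks like this on a plain collection (a minimal sketch):
val pairs = Vector(("a", Array("v1", "v2")), ("b", Array("u1", "u2")))
pairs.flatMap { case (k, vs) => vs.map((k, _)) }
// Vector((a,v1), (a,v2), (b,u1), (b,u2))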

Use 4 (or N) collections to yield only one value at a time (1xN) (i.e. zipped for tuple4+)

scala> val a = List(1,2)
a: List[Int] = List(1, 2)
scala> val b = List(3,4)
b: List[Int] = List(3, 4)
scala> val c = List(5,6)
c: List[Int] = List(5, 6)
scala> val d = List(7,8)
d: List[Int] = List(7, 8)
scala> (a,b,c).zipped.toList
res6: List[(Int, Int, Int)] = List((1,3,5), (2,4,6))
Now:
scala> (a,b,c,d).zipped.toList
<console>:12: error: value zipped is not a member of (List[Int], List[Int], List[Int], List[Int])
(a,b,c,d).zipped.toList
^
I've searched for this elsewhere, including this one and this one, but found no conclusive answer.
I want to do the following or similar:
for((itemA,itemB,itemC,itemD) <- (something)) yield itemA + itemB + itemC + itemD
Any suggestions?
Short answer:
for (List(w,x,y,z) <- List(a,b,c,d).transpose) yield (w,x,y,z)
// List[(Int, Int, Int, Int)] = List((1,3,5,7), (2,4,6,8))
Why you want them as tuples, I'm not sure, but a slightly more interesting case would be when your lists are of different types, and for example, you want to combine them into a list of objects:
case class Person(name: String, age: Int, height: Double, weight: Double)
val names = List("Alf", "Betty")
val ages = List(22, 33)
val heights = List(111.1, 122.2)
val weights = List(70.1, 80.2)
val persons: List[Person] = ???
Solution 1: using transpose, as above:
for {
  List(name: String, age: Int, height: Double, weight: Double) <-
    List(names, ages, heights, weights).transpose
} yield Person(name, age, height, weight)
Here, we need the type annotations in the List extractor, because transpose gives a List[List[Any]].
Solution 2: using iterators:
val namesIt = names.iterator
val agesIt = ages.iterator
val heightsIt = heights.iterator
val weightsIt = weights.iterator
for { name <- names }
yield Person(namesIt.next, agesIt.next, heightsIt.next, weightsIt.next)
Some people would avoid iterators because they involve mutable state and so are not "functional". But they're easy to understand if you come from the Java world and might be suitable if what you actually have are already iterators (input streams etc).
Shameless plug: product-collections does something similar:
scala> a flatZip b flatZip c flatZip d
res0: org.catch22.collections.immutable.CollSeq4[Int,Int,Int,Int] =
CollSeq((1,3,5,7),
        (2,4,6,8))
scala> res0(0) //first row
res1: Product4[Int,Int,Int,Int] = (1,3,5,7)
scala> res0._1 //first column
res2: Seq[Int] = List(1, 2)
val g = List(a, b, c, d)
val result = (g.map(x => x(0)), g.map(x => x(1)))
// result: (List(1, 3, 5, 7), List(2, 4, 6, 8))
Basically, zipped only assists Tuple2 and Tuple3:
http://www.scala-lang.org/api/current/index.html#scala.runtime.Tuple3Zipped
So if you want a 'Tuple4Zipped', you have to make it yourself (see the sketch below). Good luck.
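One minimal way to "make it" yourself (the name zip4 is mine, not from any library) is to zip twice and reshape the nested pairs:
def zip4[A, B, C, D](as: List[A], bs: List[B], cs: List[C], ds: List[D]): List[(A, B, C, D)] =
  as.zip(bs).zip(cs.zip(ds)).map { case ((a, b), (c, d)) => (a, b, c, d) }
zip4(a, b, c, d) // List((1,3,5,7), (2,4,6,8))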
I found a possible solution, although it's too imperative for my taste:
val a = List(1,2)
val b = List(3,4)
val c = List(5,6)
val d = List(7,8)
val g: List[Tuple4[Int, Int, Int, Int]] =
  a.zipWithIndex.map { case (value, index) => (value, b(index), c(index), d(index)) }
zipWithIndex allows me to go through all the other collections. However, I'm sure there's a better way to do this. Any suggestions?
Previous attempts included:
Ryan LeCompte's zipMany, or transpose.
However, those return a List, not a Tuple4, which is not as convenient to work with, since I can't name the variables.
transpose is already built into the standard library and doesn't require higher-kinded imports, so it's preferable, but not ideal.
I also, incorrectly, tried the following example with Shapeless:
scala> import Traversables._
import Traversables._
scala> import Tuples._
import Tuples._
scala> import scala.language.postfixOps
import scala.language.postfixOps
scala> val a = List(1,2)
a: List[Int] = List(1, 2)
scala> val b = List(3,4)
b: List[Int] = List(3, 4)
scala> val c = List(5,6)
c: List[Int] = List(5, 6)
scala> val d = List(7,8)
d: List[Int] = List(7, 8)
scala> val x = List(a,b,c,d).toHList[Int :: Int :: Int :: Int :: HNil] map tupled
x: Option[(Int, Int, Int, Int)] = None

With Scala, can I unapply a tuple and then run a map over it?

I have some financial data gathered as a List[(Int, Double)], like this:
val snp = List((2001, -13.0), (2002, -23.4))
With this, I wrote a formula that transforms the list, through map, into another list (to demonstrate investment-grade life insurance), where losses below 0 are converted to 0 and gains above 15 are converted to 15, like this:
case class EiulLimits(lower: Double, upper: Double)

def eiul(xs: Seq[(Int, Double)], limits: EiulLimits): Seq[(Int, Double)] = {
  xs.map(item => (item._1,
    if (item._2 < limits.lower) limits.lower
    else if (item._2 > limits.upper) limits.upper
    else item._2))
}
Is there any way to extract the tuple's values inside this, so I don't have to use the clunky _1 and _2 notation?
List((1,2),(3,4)).map { case (a,b) => ... }
The case keyword invokes the pattern matching/unapply logic.
Note the use of curly braces instead of parentheses after map.
And a slower but shorter quick rewrite of your code:
case class EiulLimits(lower: Double, upper: Double) {
  def apply(x: Double) = List(x, lower, upper).sorted.apply(1)
}

def eiul(xs: Seq[(Int, Double)], limits: EiulLimits) = {
  xs.map { case (a, b) => (a, limits(b)) }
}
Usage:
scala> eiul(List((1, 1.), (3, 3.), (4, 4.), (7, 7.), (9, 9.)), EiulLimits(3., 7.))
res7: Seq[(Int, Double)] = List((1,3.0), (3,3.0), (4,4.0), (7,7.0), (9,7.0))
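Note that List(x, lower, upper).sorted.apply(1) picks the middle of the three values, which is the same as clamping x into [lower, upper]; a hypothetical direct version, if you find that clearer:
def clamp(x: Double, lo: Double, hi: Double): Double = math.max(lo, math.min(x, hi))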
scala> val snp = List((2001, -13.0), (2002, -23.4))
snp: List[(Int, Double)] = List((2001,-13.0), (2002,-23.4))
scala> snp.map {case (_, x) => x}
res2: List[Double] = List(-13.0, -23.4)
scala> snp.map {case (x, _) => x}
res3: List[Int] = List(2001, 2002)