Can the map function be applied to a tuple?

I have a tuple of this type:
val data1: (String, scala.collection.mutable.ArrayBuffer[(String, Int)]) =
  ("", scala.collection.mutable.ArrayBuffer(("a", 1), ("b", 1), ("a", 1)))
When I attempt to map using data1.map(m => println(m)), I receive this error:
value map is not a member of (String, scala.collection.mutable.ArrayBuffer[(String, Int)])
Is it possible to use the map function with the tuple accessor syntax ._2?
Syntax like data1.map(m._2 => println(m._2)) does not compile.
With this I'm attempting to apply a map function that sums the counts associated with each letter in the ArrayBuffer, so the example above should map to ((a, 2), (b, 1)).

It's unclear what you want. What output are you expecting?
Do you want to print the second item of data1?
println(data1._2)
Or print each item of the buffer in data1?
data1._2.foreach(m => println(m))
Do you want data1 to be a collection of tuples and to map over that?
import scala.collection.mutable.ArrayBuffer
val data1 = Vector(("", ArrayBuffer(("", 1))), ("", ArrayBuffer(("", 1))))
data1.foreach { case (a,b) => println(b) }
Note that if you're just printing stuff out, you want foreach, not map.
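A minimal illustration of that difference:
// map transforms each element and returns a new collection;
// foreach just runs a side effect and returns Unit.
val xs = List(1, 2, 3)
val doubled = xs.map(_ * 2) // List(2, 4, 6)
xs.foreach(println)         // prints 1, 2, 3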
Based on your edits:
import scala.collection.mutable.ArrayBuffer
val data1 = ("", ArrayBuffer(("a", 1), ("b", 1), ("a", 1)))
val grouped = data1._2.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }
// Map(b -> 1, a -> 2)

You can't use map on tuples; tuples have no such method. Also, a map function is supposed to transform each value and return the result, whereas you just want to print the values, not change them.
To print the ArrayBuffer in your case, try this:
data1._2.foreach(x => println(x))
or just
data1._2.foreach(println)
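As an aside, if you ever do need to iterate over a tuple's elements regardless of their types, tuples inherit productIterator from Product, though it yields the elements as Any:
// prints the String and then the whole ArrayBuffer
data1.productIterator.foreach(println)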

Try
data1.foreach { case (x, y) => println(y) }
(This assumes data1 is a collection of tuples, as in the Vector example above; a plain tuple has no foreach.)

Related

How to convert RDD[Array[String]] to RDD[(Int, HashMap[String, List])]?

I have input data:
time, id, counter, value
00.2, 1, c1, 0.2
00.2, 1, c2, 0.3
00.2, 1, c1, 0.1
and I want, for every id, to create a structure that stores the counters and values. After thinking about vectors and rejecting them, I came to this:
(id, HashMap((counter1, List(values)), (counter2, List(values))))
(1, HashMap((c1, List(0.2, 0.1)), (c2, List(0.3))))
The problem is that I can't build the HashMap inside the map transformation, and additionally I don't know whether I will be able to reduce the lists by counter inside the map.
Does anyone have any idea?
My code is:
val data = inputRdd
  .map(y => (y(1).toInt, mutable.HashMap(y(2) -> List(y(3).toDouble))))
  .reduceByKey(_ ++ _)
Off the top of my head, untested:
import scala.collection.mutable.HashMap

inputRdd
  .map { case Array(t, id, c, v) => (id.toInt, (c, v)) }
  .aggregateByKey(HashMap.empty[String, List[String]])(
    // seqOp: fold one (counter, value) pair into a per-partition map
    { case (m, (c, v)) => { m(c) = v :: m.getOrElse(c, Nil); m } },
    // combOp: merge the maps built on different partitions
    { case (m1, m2) => { for ((k, vs) <- m2) m1(k) = vs ::: m1.getOrElse(k, Nil); m1 } }
  )
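For reference, a sketch of driving that end to end (assuming a SparkContext named sc and the sample rows from the question; the values stay Strings here, and the order within each list depends on partitioning):
val inputRdd = sc.parallelize(Seq(
  Array("00.2", "1", "c1", "0.2"),
  Array("00.2", "1", "c2", "0.3"),
  Array("00.2", "1", "c1", "0.1")
))
// applying the pipeline above and collecting yields something like:
// Array((1, Map(c1 -> List(0.1, 0.2), c2 -> List(0.3))))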
Here's one approach:
val rdd = sc.parallelize(Seq(
  ("00.2", 1, "c1", 0.2),
  ("00.2", 1, "c2", 0.3),
  ("00.2", 1, "c1", 0.1)
))

rdd.
  map { case (t, i, c, v) => (i, (c, v)) }.
  groupByKey.mapValues(
    _.groupBy(_._1).mapValues(_.map(_._2)).map(identity)
  ).
  collect
// res1: Array[(Int, scala.collection.immutable.Map[String,Iterable[Double]])] = Array(
//   (1,Map(c1 -> List(0.2, 0.1), c2 -> List(0.3)))
// )
Note that the final map(identity) is a remedy for the Map#mapValues not serializable problem suggested in this SO answer.
If, as you have mentioned, you have inputRdd as
//inputRdd: org.apache.spark.rdd.RDD[Array[String]] = ParallelCollectionRDD[0] at parallelize at ....
Then a simple groupBy and a foldLeft on the grouped values should do the trick and give you the final desired result:
val resultRdd = inputRdd.groupBy(_(1))
  .mapValues(_.foldLeft(Map.empty[String, List[String]]) { (a, b) =>
    if (a.keySet.contains(b(2)))
      a ++ Map(b(2) -> (a(b(2)) ++ List(b(3))))
    else
      a ++ Map(b(2) -> List(b(3)))
  })
//resultRdd: org.apache.spark.rdd.RDD[(String, scala.collection.immutable.Map[String,List[String]])] = MapPartitionsRDD[3] at mapValues at ...
//(1,Map(c1 -> List(0.2, 0.1), c2 -> List(0.3)))
Changing RDD[(String, scala.collection.immutable.Map[String,List[String]])] to RDD[(Int, HashMap[String,List[String]])] is then just a matter of converting the key and map types, and I hope it will be easy for you to do that.
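A minimal sketch of that final conversion (assuming the resultRdd from above and an immutable HashMap as the target type):
import scala.collection.immutable.HashMap
// convert the String key to Int and rebuild each Map as a HashMap
val typedRdd = resultRdd.map { case (id, m) => (id.toInt, HashMap(m.toSeq: _*)) }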
I hope the answer is helpful

How to add all the values of a map without using recursion or var

I want to add all the values in a map without using var or any mutable structures. I have tried to do something like this but it doesn't work:
val mymap = Map("a" -> 1, "b" -> 2)
val sum_of_alcohol_consumption =
for ((k,v) <- mymap ) yield (sum_of_alcohol_consumption += v)
I have been told that I can use .sum on a list
Please help
Thanks
You can use the .values function of a Map to return an Iterable of its values (all of the Ints) and then call the .sum function on that:
val myMap = Map("a" -> 1, "b" -> 2)
val sum = myMap.values.sum
println(sum) // Outputs: 3
An equivalent answer to the more elegant use of sum is to use a fold operation. sum is implemented in a manner similar to this:
val myMap = Map("a" -> 1, "b" -> 2)
val sumAlcoholConsumption = myMap.values.foldLeft(0)(_ + _)
values returns a sequence of only the values in the map. The first foldLeft argument is the zero value (think of it as the initial value of an accumulator). The second argument is a function that adds the current accumulator to the current element, returning the sum of the two; it is applied to each value in turn. That said, sum is a lot more convenient.
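To make that concrete, here is the fold unrolled by hand for this two-element map:
// myMap.values is Iterable(1, 2), so foldLeft(0)(_ + _) evaluates as
// ((0 + 1) + 2) == 3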
To get only the values of the map, it provides a values function that returns an Iterable; we can apply the sum function to it directly.
scala> val mymap = Map("a" -> 1, "b" -> 2)
mymap: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2)
scala> mymap.values.sum
res7: Int = 3

Could anyone explain this code?

According to this link: https://github.com/amplab/training/blob/ampcamp6/machine-learning/scala/solution/MovieLensALS.scala
I don't understand the point of:
val numUsers = ratings.map(_._2.user).distinct.count
val numMovies = ratings.map(_._2.product).distinct.count
What does _._2.[user|product] mean?
That is accessing the tuple elements. The following example might explain it better:
val xs = List(
  (1, "Foo"),
  (2, "Bar")
)
xs.map(_._1) // => List(1,2)
xs.map(_._2) // => List("Foo", "Bar")
// An equivalent way to write this
xs.map(e => e._1)
xs.map(e => e._2)
// Perhaps a better way is
xs.collect {case (a, b) => a} // => List(1,2)
xs.collect {case (a, b) => b} // => List("Foo", "Bar")
ratings is a collection of tuples: (timestamp % 10, Rating(userId, movieId, rating)). The first underscore in _._2.user refers to the current element being processed by the map function, so here it refers to one of those tuples (a pair of values). For a pair t you can refer to its first and second elements with the shorthand t._1 and t._2, so _._2 selects the second element (the Rating) of the tuple currently being processed, and .user then reads its user field.
val ratings = sc.textFile(movieLensHomeDir + "/ratings.dat").map { line =>
  val fields = line.split("::")
  // format: (timestamp % 10, Rating(userId, movieId, rating))
  (fields(3).toLong % 10, Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble))
}

How to un-nest a spark rdd that has the following type ((String, scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Int]]))

It's a nested map whose contents look like this when I print it to the screen:
(5, Map ( "ABCD" -> Map("3200" -> 3,
"3350.800" -> 4,
"200.300" -> 3)
(1, Map ( "DEF" -> Map("1200" -> 32,
"1320.800" -> 4,
"2100" -> 3)
I need to get something like this:
CaseClass(5, ABCD, 3200, 3)
CaseClass(5, ABCD, 3350.800, 4)
CaseClass(5, ABCD, 200.300, 3)
CaseClass(1, DEF, 1200, 32)
CaseClass(1, DEF, 1320.800, 4)
etc.
Basically a list of case class instances, which I want to map to a case class object so that I can save it to Cassandra.
I have tried flatMapValues, but that un-nests the map only one level. I also used flatMap; that doesn't work either, or I'm making mistakes.
Any suggestions?
Fairly straightforward using a for-comprehension and some pattern matching to destructure things:
val in = List(
  (5, Map("ABCD" -> Map("3200" -> 3, "3350.800" -> 4, "200.300" -> 3))),
  (1, Map("DEF" -> Map("1200" -> 32, "1320.800" -> 4, "2100" -> 3)))
)

case class Thing(a: Int, b: String, c: String, d: Int)

for { (index, m) <- in
      (k, v) <- m
      (innerK, innerV) <- v }
yield Thing(index, k, innerK, innerV)
//> res0: List[maps.maps2.Thing] = List(Thing(5,ABCD,3200,3),
//                                     Thing(5,ABCD,3350.800,4),
//                                     Thing(5,ABCD,200.300,3),
//                                     Thing(1,DEF,1200,32),
//                                     Thing(1,DEF,1320.800,4),
//                                     Thing(1,DEF,2100,3))
So let's pick apart the for-comprehension.
(index, m) <- in
This is the same as
t <- in
(index, m) = t
In the first line t will successively be set to each element of in.
t is therefore a tuple (Int, Map(...))
Pattern matching lets us put that pattern for the tuple on the left-hand side, and the compiler picks the tuple apart, setting index to the Int and m to the Map.
(k, v) <- m
As before, this is equivalent to
u <- m
(k, v) = u
And this time u takes each element of the Map. These again are key/value tuples, so k is set successively to each key and v to the corresponding value.
And v is your inner map, so we do the same thing again with the inner map:
(innerK, innerV) <- v
Now we have everything we need to create the case class. yield just says make a collection of whatever is "yielded" each time through the loop.
yield Thing(index, k, innerK, innerV)
Under the hood, this just translates to a set of maps/flatmaps
The yield is just the value Thing(index, k, innerK, innerV)
We get one of those for each element of v
v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) }
but there's an inner map per element of the outer map
m.flatMap { y => val (k, v) = y; v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) } }
(flatMap because we get a List of Lists if we just did a map and we want to flatten it to just the list of items)
Similarly, we do one of those for every element in the List
in.flatMap { z => val (index, m) = z; m.flatMap { y => val (k, v) = y; v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) } } }
Let's write that in _1, _2 style:
in.flatMap(z => z._2.flatMap(y => y._2.map(x => Thing(z._1, y._1, x._1, x._2))))
which produces exactly the same result. But isn't it clearer as a for-comprehension?
You can do it like this if you prefer collection operations:
case class Record(v1: Int, v2: String, v3: Double, v4: Int)

val data = List(
  (5, Map("ABC" -> Map(3200.0 -> 3, 3350.800 -> 4, 200.300 -> 3))),
  (1, Map("DEF" -> Map(1200.0 -> 32, 1320.800 -> 4, 2100.0 -> 3)))
)
val rdd = sc.parallelize(data)
val result = rdd.flatMap(p => {
  p._2.toList
    .flatMap(q => q._2.toList.map(l => (q._1, l)))
    .map((p._1, _))
}).map(p => Record(p._1, p._2._1, p._2._2._1, p._2._2._2))

println(result.collect.toList)
//List(
// Record(5,ABC,3200.0,3),
// Record(5,ABC,3350.8,4),
// Record(5,ABC,200.3,3),
// Record(1,DEF,1200.0,32),
// Record(1,DEF,1320.8,4),
// Record(1,DEF,2100.0,3)
//)
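A brief note on the .toList calls above: converting each Map to a List before re-mapping keeps the intermediate results as sequences of pairs; mapping a Map directly rebuilds a Map and would collapse entries that end up with duplicate keys. For example:
Map("a" -> 1, "b" -> 1).map { case (k, v) => (v, k) }        // Map(1 -> b)
Map("a" -> 1, "b" -> 1).toList.map { case (k, v) => (v, k) } // List((1,a), (1,b))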

How to convert String Iterator into List of Tuples

How can
val s = Iterator("a|b|2", "a|c|3")
be converted to
List((("a", "b"), 2), (("a", "c"), 3))
This is my current progress:
val v = s.map(m => m.split("|")(0))
How can I parse each String into its constituent parts so it can be converted to a List of tuples?
You can match on the array returned from split:
val v = s.map(_.split('|') match { case Array(a, b, n) => ((a, b), n.toInt) }).toList
// List((("a", "b"), 2), (("a", "c"), 3))
Note the Char argument in split('|'): the String overload split("|") treats its argument as a regular expression, where | is the alternation operator, so it would split between every character.
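If some lines might not have exactly three fields, a hedged variant using collect skips them instead of throwing a MatchError:
// collect keeps only the arrays that match the three-field pattern
val safe = Iterator("a|b|2", "bad-line", "a|c|3")
  .map(_.split('|'))
  .collect { case Array(a, b, n) => ((a, b), n.toInt) }
  .toList
// List((("a", "b"), 2), (("a", "c"), 3))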