Reduce by key in an array of arrays in Scala - scala

I've created an array in Scala
val L = Array((1,Array(one, two, one)), (2,Array(one, three, three)))
And I want to get this result:
Array((1,(one,2), (two,1)), (2,(one,1),(three,2)))
I did
val LL = L.map({case (s, contents) => (s, contents.map(s=>(s,1)))})
and got
LL = Array((1,Array((one,1), (two,1), (one,1))), (2,Array((one,1), (three,1), (three,1))))
and I want to do a reduceByKey to have that result but it doesn't seem to work. Any help?

Try:
L.map { case (key, values) =>
(key, values.groupBy(identity).mapValues(_.size).toArray)
}

Use foldLeft to iterate on each second value of the array item and count the occurrence of strings in each Array.
Scala repl
scala> val arr: Array[(Int, Array[String])] = Array((1, Array("one", "one", "two")))
arr: Array[(Int, Array[String])] = Array((1,Array(one, one, two)))
scala> arr.map { case (k, v) => k -> (v.foldLeft(Map.empty[String, Int])((r, c) => r.get(c).map(x => r.updated(c, x + 1)).getOrElse(r.updated(c, 1))).toArray) }
res16: Array[(Int, Array[(String, Int)])] = Array((1,Array((one,2), (two,1))))

Related

Merging two arrays in Scala

My requirement is that :
arr1 : Array[(String, String)] = Array((bangalore,Kanata), (kannur,Kerala))
arr2 : Array[(String, String)] = Array((001,anup), (002,sithu))
should give me
Array((001,anup,bangalore,Krnata), (002,sithu,kannur,Kerala))
I tried this :
val arr3 = arr2.map(field=>(field,arr1))
but it didn't work
#nicodp's answer addressed your question very nicely. zip and then map will give you the resultant array.
Recall that if one list is larger than the other, its remaining elements are ignored.
My attempt tries to address this:
Consider:
val arr1 = Array(("bangalore","Kanata"), ("kannur","Kerala"))
val arr2 = Array(("001","anup", "ramakrishan"), ("002","sithu", "bhattacharya"))
zip and mapping on tuples will give the result as:
arr1.zip(arr2).map(field => (field._1._1, field._1._2, field._2._1, field._2._2))
Array[(String, String, String, String)] = Array((bangalore,Kanata,001,anup), (kannur,Kerala,002,sithu))
// This ignores the last field of arr2
While mapping, you can convert the tuple in iterator and get a list from it. This will enable you to not keep a track of Tuple2 or Tuple3
arr1.zip(arr2).map{ case(k,v) => List(k.productIterator.toList, v.productIterator.toList).flatten }
// Array[List[Any]] = Array(List(bangalore, Kanata, 001, anup, ramakrishan), List(kannur, Kerala, 002, sithu, bhattacharya))
You can do a zip followed by a map:
scala> val arr1 = Array((1,2),(3,4))
arr1: Array[(Int, Int)] = Array((1,2), (3,4))
scala> val arr2 = Array((5,6),(7,8))
arr2: Array[(Int, Int)] = Array((5,6), (7,8))
scala> arr1.zip(arr2).map(field => (field._1._1, field._1._2, field._2._1, field._2._2))
res1: Array[(Int, Int, Int, Int)] = Array((1,2,5,6), (3,4,7,8))
The map acts as a flatten for tuples, that is, takes things of type ((A, B), (C, D)) and maps them to (A, B, C, D).
What zip does is... meh, let's see its type:
def zip[B](that: GenIterable[B]): List[(A, B)]
So, from there, we can argue that it takes an iterable collection (which can be another list) and returns a list which is the combination of the corresponding elements of both this: List[A] and that: List[B] lists. Recall that if one list is larger than the other, its remaining elements are ignored. You can dig more about list functions in the documentation.
I agree that the cleanes solution is using the zip method from collections
val arr1 = Array(("bangalore","Kanata"), ("kannur","Kerala"))
val arr2 = Array(("001","anup"), ("002","sithu"))
arr1.zip(arr2).foldLeft(List.empty[Any]) {
case (acc, (a, b)) => acc ::: List(a.productIterator.toList ++ b.productIterator.toList)
}

replace string variable in Scala for solution

I have a variable which has the following data structure, I want to replace all "foo" with all "goo" for the String part. Is there any one line neat code to do that? Want to see if any smart solutions to skip to write a loop. :)
var result = List[List[(List[String], Double)]]
regards,
Lin
I am not sure if I got it right but maybe this is what you are looking for?
scala> val a: List[List[(List[String], Double)]] = List(List((List("foo asd", "asd foo"), 2.6)))
scala> a map (_ map { case (k, v) => (k map (_.replaceAll("foo", "goo")), v) })
res1: List[List[(List[String], Double)]] = List(List((List(goo asd, asd goo),2.6)))
Edit
to answer the comment let me first remove spaces and use dots
scala> a.map(_.map { case (k, v) => (k.map(_.replaceAll("foo", "goo")), v) })
and now, expand _.method(param) to x => x.method(param)
scala> a.map(b => b.map { case (k, v) => (k.map(c => c.replaceAll("foo", "goo")), v) })
You have 3 levels of nested lists, and one is inside a tuple, it won't be pretty, you need to map over each of them and extract last one from tuple.
scala> val l = List(List((List("foo","afoo"),3.4),(List("gfoo","cfoo"),5.6)))
l: List[List[(List[String], Double)]] = List(List((List(foo, afoo),3.4), (List(gfoo, cfoo),5.6)))
scala> def replaceFoo(y:List[String]) = y.map(s => s.replace("foo","goo"))
replaceFoo: (y: List[String])List[String]
scala> l.map(x => x.map(y => (replaceFoo(y._1),y._2)))
res0: List[List[(List[String], Double)]] = List(List((List(goo, agoo),3.4), (List(ggoo, cgoo),5.6)))

Spark 1.5.1, Scala 2.10.5: how to expand an RDD[Array[String], Vector]

I am using Spark 1.5.1 with Scala 2.10.5
I have an RDD[Array[String], Vector] for each element of the RDD:
I want to take each String in the Array[String] and combine it
with the Vector to create a tuple (String, Vector), this step will lead to the creation of several tuples from each element of the initial RDD
The goal is to end by building an RDD of tuples: RDD[(String,
Vector)], this RDD contains all the tuples created in the previous step.
Thanks
Consider this :
rdd.flatMap { case (arr, vec) => arr.map( (s) => (s, vec) ) }
(The first flatMap lets you get a RDD[(String, Vector)] as an output as opposed to a map which would get you a RDD[Array[(String, Vector)]])
Have you tried this?
// rdd: RDD[Array[String], Vector] - initial RDD
val new_rdd = rdd
.flatMap {
case (array: Array[String], vec: Vector) => array.map(str => (str, vec))
}
Toy example (I'm running it in spark-shell):
val rdd = sc.parallelize(Array((Array("foo", "bar"), 100), (Array("one", "two"), 200)))
val new_rdd = rdd
.map {
case (array: Array[String], vec: Int) => array.map(str => (str, vec))
}
.flatMap(arr => arr)
new_rdd.collect
res14: Array[(String, Int)] = Array((foo,100), (bar,100), (one,200), (two,200))

Appending to Map with value as a list

I have initialized a mutable Map (not sure if I should use a mutable here, but to start with I use mutable):
val aKey = "aa"
val myMap = scala.collection.mutable.Map[String, List[String]]()
if (myMap.exists(_._1 == aKey))
myMap(aKey) = myMap.get(aKey) :: "test"
But on myMap.get(aKey) I get the following error:
Type Mismatch expected List[String] actual option[String]
I thought the way I did is correct to append to list.
You can append to a mutable map with +=.
scala> myMap += ("placeholders" -> List("foo", "bar", "baz"))
res0: scala.collection.mutable.Map[String,List[String]] = Map(placeholders -> List(foo, bar, baz))
To append a new item to the list for aKey as mentioned in the commments.
myMap.get("placeholders") match {
case Some(xs:List[String]) => myMap.update("placeholders", xs :+ "buzz")
case None => myMap
}
res22: Any = ()
scala> myMap
res23: scala.collection.mutable.Map[String,List[String]] = Map(placeholders -> List(foo, bar, baz, buzz))
If you have mutable Map and inside that map immutable List. This is a simple example on how to do it. The example is also defining using withDefaultValue - so that you always get something back from Map.
var posts: collection.mutable.Map[Int, List[String]] = collection.mutable.Map().
withDefaultValue List.empty[String]
def addTag(postID: Int, tag: String): Unit = posts.get(postID) match {
case Some(xs: List[String]) => posts(postID) = xs :+ tag
case None => posts(postID) = List(tag)
}
posts(42)
// List[String] = List()
addTag(42, "tag-a")
addTag(42, "tag-b")
addTag(42, "tag-c")
posts(42)
// List[String] = List(tag-a, tag-b, tag-c)
Everything is fine. Except for list append operator
To add an element to the list. The operator should be something like
myList = element :: myList
myList = myList :: element // wrong
so your program should be
val aKey = "aa"
var myMap = scala.collection.mutable.Map[String, List[String]]().withDefaultValues(List())
myMap(aKey) = "test" :: myMap(aKey)
Starting Scala 2.13, you could alternatively use Map#updateWith on mutable Maps (or Map#updatedWith on immutable Maps):
map.updateWith("a")({
case Some(list) => Some("test" :: list)
case None => Some(List("test"))
})
def updateWith(key: K)(remappingFunction: (Option[V]) => Option[V]): Option[V]
For instance,
val map = collection.mutable.Map[String, List[String]]()
// map: collection.mutable.Map[String, List[String]] = HashMap()
val key = "key"
// key: String = "key"
if the key doesn't exist:
map.updateWith(key)({ case Some(list) => Some("test" :: list) case None => Some(List("test")) })
// Option[List[String]] = Some(List("test"))
map
// collection.mutable.Map[String, List[String]] = HashMap("key" -> List("test"))
and if the key exists:
map.updateWith(key)({ case Some(list) => Some("test2" :: list) case None => Some(List("test2")) })
// Option[List[String]] = Some(List("test2", "test"))
map
// collection.mutable.Map[String, List[String]] = HashMap("key" -> List("test2", "test"))
I figured the how to do it:
val aKey = "aa"
var myMap = scala.collection.mutable.Map[String, List[String]]()
if (myMap.exists(_._1 == aKey))
myMap.get(aKey).get :+ "foo"
else {
myMap(aKey) = List("zoo")
}
This might not be the scala way of doing but this gives the correct results.
first you shouldn't Mutable Map :). for add one item on List within Map, you can use method get of Map.
val m = Map(1 -> List(2))
val m2 = m.get(1).fold{
// In this case don't exist any List for this key
m + (1 -> List(3))
}{ list =>
// In this case exist any List for this key
m + (1 -> (3 :: list))
}

How to apply function to each tuple of a multi-dimensional array in Scala?

I've got a two dimensional array and I want to apply a function to each value in the array.
Here's what I'm working with:
scala> val array = Array.tabulate(2,2)((x,y) => (0,0))
array: Array[Array[(Int, Int)]] = Array(Array((0,0), (0,0)), Array((0,0), (0,0)))
I'm using foreach to extract the tuples:
scala> array.foreach(i => i.foreach(j => println(i)))
[Lscala.Tuple2;#11d559a
[Lscala.Tuple2;#11d559a
[Lscala.Tuple2;#df11d5
[Lscala.Tuple2;#df11d5
Let's make a simple function:
//Takes two ints and return a Tuple2. Not sure this is the best approach.
scala> def foo(i: Int, j: Int):Tuple2[Int,Int] = (i+1,j+2)
foo: (i: Int,j: Int)(Int, Int)
This runs, but need to apply to array(if mutable) or return new array.
scala> array.foreach(i => i.foreach(j => foo(j._1, j._2)))
Shouldn't be to bad. I'm missing some basics I think...
(UPDATE: removed the for comprehension code which was not correct - it returned a one dimensional array)
def foo(t: (Int, Int)): (Int, Int) = (t._1 + 1, t._2 + 1)
array.map{_.map{foo}}
To apply to a mutable array
val array = ArrayBuffer.tabulate(2,2)((x,y) => (0,0))
for (sub <- array;
(cell, i) <- sub.zipWithIndex)
sub(i) = foo(cell._1, cell._2)
2dArray.map(_.map(((_: Int).+(1) -> (_: Int).+(1)).tupled))
e.g.
scala> List[List[(Int, Int)]](List((1,3))).map(_.map(((_: Int).+(1) -> (_: Int).+(1)).tupled))
res123: List[List[(Int, Int)]] = List(List((2,4)))
Dot notation required to force