Map in Array list - Scala

I am new to Scala/Spark, so could you please help me with this:
I have this:
val list = Array(("a",(1,2,3)), ("b",(1,2)))
I want this output:
(a,1),(a,2),(a,3),(b,1),(b,2)
What can I do to achieve this?

You could use productIterator to iterate over the tuples. The following results in List[(String, Any)] = List((a,1), (a,2), (a,3), (b,1), (b,2)). I'm not sure there is a nice way to recover that the items in all your tuples are Ints rather than Any, though.
val list: Array[(String, Product)] = Array(("a",(1,2,3)), ("b",(1,2)))
(for {
  i <- list.iterator
  n <- i._2.productIterator
} yield {
  (i._1, n)
}).toList

You would only need to do this:
list.flatMap(t => t._2.productIterator.map(n => t._1 -> n))
Personally I wouldn't use dynamic tuples; I would use lists of Ints or something.
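As a sketch of that typed alternative (the names typedList and pairs are mine, not from the question): using List[Int] values instead of tuples keeps the element type statically known, so no Any appears.
// List[Int] values preserve the element type, unlike Product/productIterator.
val typedList = Array(("a", List(1, 2, 3)), ("b", List(1, 2)))

// Pair each key with every value; result type: Array[(String, Int)]
val pairs: Array[(String, Int)] =
  typedList.flatMap { case (k, vs) => vs.map(v => (k, v)) }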

Related

How to make a tuple from a list in Scala?

I have a list, say:
List("aa","1","bb","2","cc","3","dd","4")
How can I make a list of tuples pairing the even and odd positions:
(aa,1),(bb,2),(cc,3),(dd,4)
Hope this helps:
val list = List("aa","1","bb","2","cc","3","dd","4")
val tuple =
  list.grouped(2).map { e =>
    (e.head, e.last)
  }.toList
We should consider the case of oddly sized lists, for example, List("aa","1","bb","2","cc","3","dd"):
Should we return List((aa,1), (bb,2), (cc,3), (dd,dd))?
Should we drop the last element and return List((aa,1), (bb,2), (cc,3))?
Should we indicate the error in some way, perhaps with Option?
Should we crash?
Here is an example of returning Option[List[(String, String)]] to indicate the error case:
import scala.util.Try

def maybeGrouped(list: List[String]): Option[List[(String, String)]] =
  Try(
    list
      .sliding(2, 2)
      .map { case List(a, b) => (a, b) }
      .toList
  ).toOption
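A quick usage check of both cases, assuming the maybeGrouped definition above:
maybeGrouped(List("aa","1","bb","2"))  // Some(List((aa,1), (bb,2)))
maybeGrouped(List("aa","1","bb"))      // None: the lone "bb" fails the match inside Try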

How to convert (key,array(value)) to (key,value) in Spark

I have an RDD like the one below:
val rdd1 = sc.parallelize(Array((1,Array((3,4),(4,5))),(2,Array((4,2),(4,4),(3,9)))))
which is RDD[(Int, Array[(Int, Int)])]. I want to get a result of type RDD[(Int, (Int, Int))] by some operation such as flatMap. In this example, the result should be:
(1,(3,4))
(1,(4,5))
(2,(4,2))
(2,(4,4))
(2,(3,9))
I am quite new to Spark, so what could I do to achieve this?
Thanks a lot.
You can use flatMap in your case like this:
val newRDD: RDD[(Int, (Int, Int))] = rdd1
  .flatMap { case (k, values) => values.map(v => (k, v)) }
Assuming your RDD is rdd1, use the code below to get the data you want:
rdd1.flatMap(x => x._2.map(y => (x._1,y)))
The inner map inside the flatMap reads x._2, which is the array, and takes each element of the array in turn as y. flatMap then flattens the mapped arrays into separate items; x._1 is the key of each resulting pair.
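Put together as a minimal self-contained sketch (assuming a live SparkContext named sc, as in the question; flattened is my own name):
import org.apache.spark.rdd.RDD

val rdd1: RDD[(Int, Array[(Int, Int)])] =
  sc.parallelize(Array((1, Array((3,4), (4,5))), (2, Array((4,2), (4,4), (3,9)))))

// Pair the key with every element of its array, then flatten.
val flattened: RDD[(Int, (Int, Int))] =
  rdd1.flatMap { case (k, values) => values.map(v => (k, v)) }

flattened.collect().foreach(println)
// (1,(3,4)) (1,(4,5)) (2,(4,2)) (2,(4,4)) (2,(3,9))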

split strings in array and convert to map

I am reading a file composed of lines in the format a=b.
Using Source.fromFile("file").getLines I get an Iterator[String]. I figure I need to split the strings into tuples and then form the map from the tuples - that part is easy. What I cannot manage is getting from the Iterator[String] to an Iterator[(String,String)].
How can I do this? I am a beginner to Scala and not experienced with functional programming, so I am receptive to alternatives :)
You can do so by splitting the string and then creating the tuples from the first and second elements using Iterator.map:
val strings = List("a=b", "c=d", "e=f").iterator
val result: Iterator[(String, String)] = strings.map { s =>
  val split = s.split("=")
  (split(0), split(1))
}
If you don't mind the extra iteration and intermediate collection you can make this a little prettier:
val result: Iterator[(String, String)] =
  strings
    .map(_.split("="))
    .map(arr => (arr(0), arr(1)))
You can transform the values returned by an Iterator using the map method:
def map[B](f: (A) ⇒ B): Iterator[B]
Maybe like this?
Source.fromFile("file").getLines.map(_.split("=").map( x => (x.head,x.tail) ) )
You might want to wrap this into Try.
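For instance, a minimal sketch of that wrapping (maybeMap is my own name), using scala.util.Try so a malformed line yields a Failure instead of an uncaught exception:
import scala.io.Source
import scala.util.Try

val maybeMap: Try[Map[String, String]] = Try {
  Source.fromFile("file").getLines()
    .map(_.split("="))
    .map(a => (a(0), a(1)))   // throws on lines without '='; caught by Try
    .toMap
}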
This is my try:
val strings = List("a=b", "c=d", "e=f")
val map = strings.map(_.split("=")).map { case Array(f1, f2) => (f1, f2) }.toMap
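One design note: the case Array(f1, f2) partial function throws a MatchError on any malformed line. A sketch that silently drops such lines instead (safeMap is my own name) uses collect:
val safeMap = strings.map(_.split("=")).collect { case Array(f1, f2) => (f1, f2) }.toMap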

Spark use reduceByKey on nested structure

Currently I have a structure like this:
Array[(Int, Array[(String, Int)])], and I want to use reduceByKey on the Array[(String, Int)], which is inside the Array of tuple. I tried code like
//data is in Array[(Int, Array[(String, Int)])] structure
val result = data.map(l => (l._1, l._2.reduceByKey(_ + _)))
The error says that Array[(String, Int)] does not have a method called reduceByKey, and I understand that this method can only be used on an RDD. So my question is: is there any way to get the "reduceByKey" behaviour, without using exactly that method, on the nested structure?
Thanks guys.
You can simply fold over the inner collection here, as you are now working with a plain Scala collection and not an RDD (assuming you really meant the outer wrapper to be an RDD):
val data = sc.parallelize(List((1, List(("foo", 1), ("foo", 1)))))
data.map(l => (l._1, l._2.foldLeft(List[(String, Int)]()) { (accum, curr) =>
  val accumAsMap = accum.toMap
  accumAsMap.get(curr._1) match {
    case Some(value: Int) => (accumAsMap + (curr._1 -> (value + curr._2))).toList
    case None => curr :: accum
  }
})).collect
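A shorter alternative, as a sketch: group the inner pairs by word and sum the counts, which produces the same per-key totals reduceByKey would:
data.map { case (k, pairs) =>
  (k, pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }.toList)
}.collect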
Ultimately, it seems that you do not understand what an RDD is, so you might want to read some of the docs on them.

Iterate over a tuple

I need to implement a generic method that takes a tuple and returns a Map
Example :
val tuple=((1,2),(("A","B"),("C",3)),4)
I have been trying to break this tuple into a list:
val list=tuple.productIterator.toList
scala> list: List[Any] = List((1,2), ((A,B),(C,3)), 4)
But this returns List[Any].
I am now trying to find out how to iterate over the following tuple, for example:
((1,2),(("A","B"),("C",3)),4)
in order to loop over each element 1,2,"A","B", etc. How could I do this kind of iteration over the tuple?
What about this?
def flatProduct(t: Product): Iterator[Any] = t.productIterator.flatMap {
  case p: Product => flatProduct(p)
  case x => Iterator(x)
}

val tuple = ((1,2),(("A","B"),("C",3)),4)
flatProduct(tuple).mkString(",") // 1,2,A,B,C,3,4
OK, the Any problem remains; at least that's due to the return type of productIterator.
Instead of tuples, use Shapeless data structures like HList. You get generic processing and also don't lose type information.
The only problem is that the documentation isn't very comprehensive.
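A minimal sketch of the idea, assuming the shapeless library is on the classpath; unlike productIterator, the HList type records every element's static type:
import shapeless._

// The inferred type Int :: String :: Double :: HNil keeps each element's type.
val hlist = 1 :: "A" :: 2.5 :: HNil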
// Use foreach here: map on an Iterator is lazy, so nothing would print.
tuple.productIterator.foreach {
  case (a, b) => println(a, b)
  case a => println(a)
}
This works for me; transform is a tuple consisting of DataFrames:
def apply_function(a: DataFrame) = a.write.format("parquet").save("..." + a + ".parquet")
transform.productIterator.map(_.asInstanceOf[DataFrame]).foreach(a => apply_function(a))