I am a newbie to Scala.
I have a Tuple[Int, String]
((1, "alpha"), (2, "beta"), (3, "gamma"), (4, "zeta"), (5, "omega"))
For the above list, I want to print all strings where the corresponding length is 4.
printing length of string of Tuples in Scala
val tuples = List((1, "alpha"), (2, "beta"), (3, "gamma"), (4, "zeta"), (5, "omega"))
println(tuples.map(x => (x._2, x._2.length)))
//List((alpha,5), (beta,4), (gamma,5), (zeta,4), (omega,5))
I want to print all strings where the corresponding length is 4
You can filter first and then print as
val tuples = List((1, "alpha"), (2, "beta"), (3, "gamma"), (4, "zeta"), (5, "omega"))
tuples.filter(_._2.length == 4).foreach(x => println(x._2))
it should print
beta
zeta
You can convert your Tuple to List and then map and filter as you need:
tuple.productIterator.toList
.map{case (a,b) => b.toString}
.filter(_.length==4)
Example:
For the given input:
val tuple = ((1, "alpha"), (2, "beta"), (3, "gamma"), (4, "zeta"), (5, "omega"))
tuple: ((Int, String), (Int, String), (Int, String), (Int, String), (Int, String)) = ((1,alpha),(2,beta),(3,gamma),(4,zeta),(5,omega))
Output:
List[String] = List(beta, zeta)
Let's suppose you have a list of Tuple, and you need all the values with string length equal to 4.
You can do a filter on the list:
val filteredList = list.filter(_._2.length == 4)
And then iterate over each element to print them:
filteredList.foreach(tuple => println(tuple._2))
Here is way to achieve this
scala> val x = ((1, "alpha"), (2, "beta"), (3, "gamma"), (4, "zeta"), (5, "omega"))
x: ((Int, String), (Int, String), (Int, String), (Int, String), (Int, String)) = ((1,alpha),(2,beta),(3,gamma),(4,zeta),(5,omega))
scala> val y = x.productIterator.toList.collect{
case ele : (Int, String) if ele._2.length == 4 => ele._2
}
y: List[String] = List(beta, zeta)
Related
I have two RDDs. One RDD is of type RDD[(String, String, String)] and the second RDD is of type RDD[(String, String, String, String, String)]. Whenever I try to perform operations like union, intersection, etc, I get the error :-
error: type mismatch;
found: org.apache.spark.rdd.RDD[(String, String, String, String,String, String)]
required: org.apache.spark.rdd.RDD[(String, String, String)]
uid.union(uid1).first()
How can I perform the set operations in this case? If set operations are not possible at all, what can I do to get the same result as set operations without having the type mismatch problem?
EDIT:
Here's a sample of the first lines from both the RDDs :
(" p69465323_serv80i"," 7 "," fb_406423006398063"," guest_861067032060185_android"," fb_100000829486587"," fb_100007900293502")
(fb_100007609418328,-795000,r316079113_serv60i)
Several operations require two RDDs to have the same type.
Let's take union for example: union basically concatenates two RDDs. As you can imagine it would be unsound to concatenate the following:
RDD1
(1, 2)
(3, 4)
RDD2
(5, 6, "string1")
(7, 8, "string2")
As you see, RDD2 has one extra column. One thing that you can do, is work on RDD1 to that its schema matches that of RDD2, for example by adding a default value:
RDD1
(1, 2)
(3, 4)
RDD1 (AMENDED)
(1, 2, "default")
(3, 4, "default")
RDD2
(5, 6, "string1")
(7, 8, "string2")
UNION
(1, 2, "default")
(3, 4, "default")
(5, 6, "string1")
(7, 8, "string2")
You can achieve this with the following code:
val sc: SparkContext = ??? // your SparkContext
val rdd1: RDD[(Int, Int)] =
sc.parallelize(Seq((1, 2), (3, 4)))
val rdd2: RDD[(Int, Int, String)] =
sc.parallelize(Seq((5, 6, "string1"), (7, 8, "string2")))
val amended: RDD[(Int, Int, String)] =
rdd1.map(pair => (pair._1, pair._2, "default"))
val union: RDD[(Int, Int, String)] =
amended.union(rdd2)
If you know print the contents
union.foreach(println)
you will get what we ended up having in the above example.
Of course, the exact semantics of how you want the two RDDs to match depend on your problem.
I have 2 lists:
val list_1 = List((1, 11), (2, 12), (3, 13), (4, 14))
val list_2 = List((1, 111), (2, 122), (3, 133), (4, 144), (1, 123), (2, 234))
I want to replace key in the second list as value of first list, resulting in a new list that looks like:
List ((11, 111), (12, 122), (13, 133), (14, 144), (11, 123), (12, 234))
This is my attempt:
object UniqueTest {
def main(args: Array[String]){
val l_1 = List((1, 11), (2, 12), (3, 13), (4, 14))
val l_2 = List((1, 111), (2,122), (3, 133), (4, 144), (1, 123), (2, 234))
val l_3 = l_2.map(x => (f(x._1, l_1), x._2))
print(l_3)
}
def f(i: Int, list: List[(Int, Int)]): Int = {
for(pair <- list){
if(i == pair._1){
return pair._2
}
}
return 0
}
}
This results in:
((11, 111), (12, 122), (13, 133), (14, 144), (11, 123), (12, 234))
Is the program above a good way to do this? Are there built-in functions in Scala to handle this need, or another way to do this manipulation?
The only real over-complication you make is this line:
val l_3 = l_2.map(x => (f(x._1, l_1), x._2))
Your f function uses an imperative style to loop over a list to find a key. Any time you find yourself doing this, it's a good indication what you want is a map. By doing the for loop each time you're exploding the computational complexity: a map will allow you to fetch the corresponding value for a given key in O(1). With a map you first convert your list, which is a key-value pair, to a datastructure explicit about supporting the key-value pair relationship.
Thus, the first thing you should do is build your map. Scala provides a really easy way to do this with toMap:
val map_1 = list_1.toMap
Then it is just a matter of 'mapping':
val result = list_2.map { case (key, value) => map_1.getOrElse(key, 0), value) }
This takes each case in your list_2, matches the first value (key) to a key in your map_1, retrieves that value (or the default 0) and puts as the first value in a key-value tuple.
You can do:
val map = l_1.toMap // transform l_1 to a Map[Int, Int]
// for each (a, b) in l_2, retrieve the new value v of a and return (v, b)
val res = l_2.map { case (a, b) => (map.getOrElse(a, 0), b) }
The most idiomatic way is zipping them together and then transforming according to your needs:
(list_1 zip list_2) map { case ((k1, v1), (k2, v2)) => (v1, v2) }
I have the following data:
val RDDApp = sc.parallelize(List("A", "B", "C"))
val RDDUser = sc.parallelize(List(1, 2, 3))
val RDDInstalled = sc.parallelize(List((1, "A"), (1, "B"), (2, "B"), (2, "C"), (3, "A"))).groupByKey
val RDDCart = RDDUser.cartesian(RDDApp)
I want to map this data so that I have an RDD of tuples with (userId, Boolean if the letter is given for user). I thought I found a solution with this:
val results = RDDCart.map (entry =>
(entry._1, RDDInstalled.lookup(entry._1).contains(entry._2))
)
If I call results.first, I get org.apache.spark.SparkException: SPARK-5063. I see the problem with the Action within the Mapping function but do not know how I can work around it so that I get the same result.
Just join and mapValues:
RDDCart.join(RDDInstalled).mapValues{case (x, xs) => xs.toSeq.contains(x)}
I have an Array[(List(String)), Array[(Int, Int)]] like this
((123, 456, 789), (1, 24))
((89, 284), (2, 6))
((125, 173, 88, 222), (3, 4))
I would like to distribute each element of the first list to the second list, like this
(123, (1, 24))
(456, (1, 24))
(789, (1, 24))
(89, (2, 6))
(284, (2, 6))
(125, (3, 4))
(173, (3, 4))
(88, (3, 4))
(22, (3, 4))
Can anyone help me with this? Thank you very much.
For input data defined as follows:
val data = Array((List("123", "456", "789"), (1, 24)), (List("89", "284"), (2, 6)), (List("125", "173", "88", "222"), (3, 4)))
you can use:
data.flatMap { case (l, ii) => l.map((_, ii)) }
which yields:
Array[(String, (Int, Int))] = Array(("123", (1, 24)), ("456", (1, 24)), ("789", (1, 24)), ("89", (2, 6)), ("284", (2, 6)), ("125", (3, 4)), ("173", (3, 4)), ("88", (3, 4)), ("222", (3, 4)))
which I believe matches what you are looking for.
Based on your example, it seemed to me that you were using a single type.
scala> val xs: List[(List[Int], (Int, Int))] =
| List( ( List(123, 456, 789), (1, 24) ),
| ( List(89, 284), (2,6)),
| ( List(125, 173, 88, 222), (3, 4)) )
xs: List[(List[Int], (Int, Int))] = List((List(123, 456, 789), (1,24)),
(List(89, 284),(2,6)),
(List(125, 173, 88, 222),(3,4)))
Then I wrote this function:
scala> def f[A](xs: List[(List[A], (A, A))]): List[(A, (A, A))] =
| for {
| x <- xs
| head <- x._1
| } yield (head, x._2)
f: [A](xs: List[(List[A], (A, A))])List[(A, (A, A))]
Apply f to xs.
scala> f(xs)
res9: List[(Int, (Int, Int))] = List((123,(1,24)), (456,(1,24)),
(789,(1,24)), (89,(2,6)), (284,(2,6)), (125,(3,4)),
(173,(3,4)), (88,(3,4)), (222,(3,4)))
I have 2 arrays Array[(Int, Int)], and Array[(Int, List[String])],
for examples:
(1, 2) and (1, (123, 456, 789))
(2, 8) and (2, (678, 1000))
(3, 4) and (3, (587, 923, 168, 392))
I would like to merge these two arrays into one Array [(Int, List[String], Int)] like this:
(1, (123, 456, 789), 2)
(2, (678, 1000), 8)
(3, (587, 923, 168, 392), 4)
and would like scala still realize the second element is a List[String],
I tried many ways they can combine these 2 maps or arrays, but cannot realize the second element is a List[String], after merging, it treated the second element as Any or Some and cannot traverse it.
I found the solution:
array1.zip(array2).map {
case ((p1,count), (p2,categoryList)) => (p1,categoryList,count)
}