How to calculate length of string in a tuple in scala - scala

Given a list of tuples, where the 1st element of the tuple is an integer and the second element is a string,
scala> val tuple2 : List[(Int,String)] = List((1,"apple"),(2,"ball"),(3,"cat"),(4,"doll"),(5,"eggs"))
tuple2: List[(Int, String)] = List((1,apple), (2,ball), (3,cat), (4,doll), (5,eggs))
I want to print the numbers where the corresponding string length is 4.
Can this be done in one line ?

you need .collect which is filter+map
given your input,
scala> val input : List[(Int,String)] = List((1,"apple"),(2,"ball"),(3,"cat"),(4,"doll"),(5,"eggs"))
input: List[(Int, String)] = List((1,apple), (2,ball), (3,cat), (4,doll), (5,eggs))
filter those of length 4,
scala> input.collect { case(number, string) if string.length == 4 => number}
res2: List[Int] = List(2, 4, 5)
alternative solution using filter + map,
scala> input.filter { case(number, string) => string.length == 4 }
.map { case (number, string) => number}
res4: List[Int] = List(2, 4, 5)

you filter and print as below
tuple2.filter(_._2.length == 4).foreach(x => println(x._1))
You should have output as
2
4
5

I like #prayagupd answer using collect. But foldLeft is the one of my favourite function in Scala! you can use foldLeft:
scala> val input : List[(Int,String)] = List((1,"apple"),(2,"ball"),(3,"cat"),(4,"doll"),(5,"eggs"))
input: List[(Int, String)] = List((1,apple), (2,ball), (3,cat), (4,doll), (5,eggs))
scala> input.foldLeft(List.empty[Int]){case (acc, (n,str)) => if(str.length ==4) acc :+ n else acc}
res3: List[Int] = List(2, 4, 5)

Using a for comprehension as follows,
for ((i,s) <- tuple2 if s.size == 4) yield i
which for the example above delivers
List(2, 4, 5)
Note we pattern match and extract the elements in each tuple and filter by string size. To print a list consider for instance aList.foreach(println).

This will do:
tuple2.filter(_._2.size==4).map(_._1)
In Scala REPL:
scala> val tuple2 : List[(Int,String)] = List((1,"apple"),(2,"ball"),(3,"cat"),(4,"doll"),(5,"eggs"))
tuple2: List[(Int, String)] = List((1,apple), (2,ball), (3,cat), (4,doll), (5,eggs))
scala> tuple2.filter(_._2.size==4).map(_._1)
res261: List[Int] = List(2, 4, 5)
scala>

Related

Ommiting parenthesis in adding elements to List

I'm trying to add element to a List[String] while omitting annoying parenthesis. I tried this:
object Main extends App {
val l = List("fds")
val xs1: List[String] = l.+:("123") // ok
val xs2: List[String] = l +: "123" // compile-error
}
DEMO
Why is omitting parenthesis causing compile-error? These assignments look the same to me. What is the difference?
It's happening because of right associative methods.
scala> val l = List("abc")
l: List[String] = List(abc)
scala> "efg" +: l
res3: List[String] = List(efg, abc)
Read more here What good are right-associative methods in Scala?
Error case
scala> val l = List(1, 2, 3)
l: List[Int] = List(1, 2, 3)
scala> 4 +: l
res1: List[Int] = List(4, 1, 2, 3)
scala> l +: 1
<console>:13: error: value +: is not a member of Int
l +: 1
^
Because +: is right associative. Method +: is getting invoked on Int instead of list
In order to make it work we can explicitly invoke method on list without the special operator syntax
scala> val l = List(1, 2, 3)
l: List[Int] = List(1, 2, 3)
scala> l.+:(1)
res4: List[Int] = List(1, 1, 2, 3)
Above case works because its normal method invocation.

Error: type mismatch flatMap

I am new to spark programming and scala and i am not able to understand the difference between map and flatMap.
I tried below code as i was expecting both to work but got error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As per my understanding flatmap make Rdd in to collection for String/Int Rdd.
I was thinking that in this case both should work without any error.Please let me know where i am making the mistake.
Thanks
You need to look at how the signatures defined these methods:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]] and you want to produce a RDD[List[Int]] you can flatMap it using identity.
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: scala.collection.immutable.Set[String] = List(11, 21, 41, 51) // concat
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
Map operation return type is U where as flatMap return type is TraversableOnce[U](means collections)
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)

Selecting multiple arbitrary columns from Scala array using map()

I'm new to Scala (and Spark). I'm trying to read in a csv file and extract multiple arbitrary columns from the data. The following function does this, but with hard-coded column indices:
def readCSV(filename: String, sc: SparkContext): RDD[String] = {
val input = sc.textFile(filename).map(line => line.split(","))
val out = input.map(csv => csv(2)+","+csv(4)+","+csv(15))
return out
}
Is there a way to use map with an arbitrary number of column indices passed to the function in an array?
If you have a sequence of indices, you could map over it and return the values :
scala> val m = List(List(1,2,3), List(4,5,6))
m: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6))
scala> val indices = List(0,2)
indices: List[Int] = List(0, 2)
// For each inner sequence, get the relevant values
// indices.map(inner) is the same as indices.map(i => inner(i))
scala> m.map(inner => indices.map(inner))
res1: List[List[Int]] = List(List(1, 3), List(4, 6))
// If you want to join all of them use .mkString
scala> m.map(inner => indices.map(inner).mkString(","))
res2: List[String] = List(1,3, 4,6) // that's actually a List containing 2 String

Scala idiom to find first Some of Option from iterator

I have an iterator of Options, and would like to find the first member that is:
Some
and meets a predicate
What's the best idiomatic way to do this?
Also: If an exception is thrown along the way, I'd like to ignore it and move on to the next member
optionIterator find { case Some(x) if predicate(x) => true case _ => false }
As for ignoring exceptions… Is it the predicate that could throw? 'Cause that's not really wise. Nonetheless…
optionIterator find {
case Some(x) => Try(predicate(x)) getOrElse false
case _ => false
}
Adding a coat of best and idiomatic to the paint job:
scala> val vs = (0 to 10) map { case 3 => None case i => Some(i) }
vs: scala.collection.immutable.IndexedSeq[Option[Int]] = Vector(Some(0), Some(1), Some(2), None, Some(4), Some(5), Some(6), Some(7), Some(8), Some(9), Some(10))
scala> def p(i: Int) = if (i % 2 == 0) i > 5 else ???
p: (i: Int)Boolean
scala> import util._
import util._
scala> val it = vs.iterator
it: Iterator[Option[Int]] = non-empty iterator
scala> it collectFirst { case Some(i) if Try(p(i)) getOrElse false => i }
res2: Option[Int] = Some(6)
Getting the first even number over five that doesn't blow up the test.
Assuming that you can wrap your predicate so that any error returns false:
iterator.flatMap(x => x).find(yourSafePredicate)
flatMap takes a collection of collections (which an iterable of Option is as Option and Either are considered collections with a max size of one) and transforms it into a single collection:
scala> for { x <- 1 to 3; y <- 1 to x } yield x :: y :: Nil
res30: IndexedSeq[List[Int]] = Vector(List(1, 1), List(2, 1), List(2, 2), List(3, 1), List(3, 2), List(3, 3))
scala> res30.flatMap(x => x)
res31: IndexedSeq[Int] = Vector(1, 1, 2, 1, 2, 2, 3, 1, 3, 2, 3, 3)
find returns the first entry in your iterable that matches a predicate as an Option or None if there is no match:
scala> (1 to 10).find(_ > 3)
res0: Option[Int] = Some(4)
scala> (1 to 10).find(_ == 11)
res1: Option[Int] = None
Some sample data
scala> val l = Seq(Some(1),None,Some(-7),Some(8))
l: Seq[Option[Int]] = List(Some(1), None, Some(-7), Some(8))
Using flatMap on a Seq of Options will produce a Seq of defined values, all the None's will be discarded
scala> l.flatMap(a => a)
res0: Seq[Int] = List(1, -7, 8)
Then use find on the sequence - you will get the first value, that satisfies the predicate. Pay attention, that found value is wrapped as Option, cause find should be able to return valid value (None) value in case of "not found" situation.
scala> l.flatMap(a => a).find(_ < 0)
res1: Option[Int] = Some(-7)
As far as I know it is "OK" way for the Scala.
Might be more idiomatic way is to use collect / collectFirst on the Seq ...
scala> l.collectFirst { case a#Some(x) if x < 0 => a }
res2: Option[Some[Int]] = Some(Some(-7))
Pay attention that here we have Some(Some(-7)) because the collectFind should have chance to produce "not found" value, so here 1st Some - from collectFirst, the 2nd Some - from the source elements of Seq of Option's.
You can flatten the Some(Some(-7)) if you need the values in your hand.
scala> l.collectFirst({ case a#Some(x) if x < 0 => a }).flatten
res3: Option[Int] = Some(-7)
If nothing found - you will have the None
scala> l.collectFirst({ case a#Some(x) if x < -10 => a }).flatten
res9: Option[Int] = None

How can I find the index of the maximum value in a List in Scala?

For a Scala List[Int] I can call the method max to find the maximum element value.
How can I find the index of the maximum element?
This is what I am doing now:
val max = list.max
val index = list.indexOf(max)
One way to do this is to zip the list with its indices, find the resulting pair with the largest first element, and return the second element of that pair:
scala> List(0, 43, 1, 34, 10).zipWithIndex.maxBy(_._1)._2
res0: Int = 1
This isn't the most efficient way to solve the problem, but it's idiomatic and clear.
Since Seq is a function in Scala, the following code works:
list.indices.maxBy(list)
even easier to read would be:
val g = List(0, 43, 1, 34, 10)
val g_index=g.indexOf(g.max)
def maxIndex[ T <% Ordered[T] ] (list : List[T]) : Option[Int] = list match {
case Nil => None
case head::tail => Some(
tail.foldLeft((0, head, 1)){
case ((indexOfMaximum, maximum, index), elem) =>
if(elem > maximum) (index, elem, index + 1)
else (indexOfMaximum, maximum, index + 1)
}._1
)
} //> maxIndex: [T](list: List[T])(implicit evidence$2: T => Ordered[T])Option[Int]
maxIndex(Nil) //> res0: Option[Int] = None
maxIndex(List(1,2,3,4,3)) //> res1: Option[Int] = Some(3)
maxIndex(List("a","x","c","d","e")) //> res2: Option[Int] = Some(1)
maxIndex(Nil).getOrElse(-1) //> res3: Int = -1
maxIndex(List(1,2,3,4,3)).getOrElse(-1) //> res4: Int = 3
maxIndex(List(1,2,2,1)).getOrElse(-1) //> res5: Int = 1
In case there are multiple maximums, it returns the first one's index.
Pros:You can use this with multiple types, it goes through the list only once, you can supply a default index instead of getting exception for empty lists.
Cons:Maybe you prefer exceptions :) Not a one-liner.
I think most of the solutions presented here go thru the list twice (or average 1.5 times) -- Once for max and the other for the max position. Perhaps a lot of focus is on what looks pretty?
In order to go thru a non empty list just once, the following can be tried:
list.foldLeft((0, Int.MinValue, -1)) {
case ((i, max, maxloc), v) =>
if (v > max) (i + 1, v, i)
else (i + 1, max, maxloc)}._3
Pimp my library! :)
class AwesomeList(list: List[Int]) {
def getMaxIndex: Int = {
val max = list.max
list.indexOf(max)
}
}
implicit def makeAwesomeList(xs: List[Int]) = new AwesomeList(xs)
//> makeAwesomeList: (xs: List[Int])scalaconsole.scratchie1.AwesomeList
//Now we can do this:
List(4,2,7,1,5,6) getMaxIndex //> res0: Int = 2
//And also this:
val myList = List(4,2,7,1,5,6) //> myList : List[Int] = List(4, 2, 7, 1, 5, 6)
myList getMaxIndex //> res1: Int = 2
//Regular list methods also work
myList filter (_%2==0) //> res2: List[Int] = List(4, 2, 6)
More details about this pattern here: http://www.artima.com/weblogs/viewpost.jsp?thread=179766