Split string with default value in Scala

I have a list of strings as shown below, which lists fruits and the cost associated with each. In case of no value, it is assumed to be 5:
val stringList: List[String] = List("apples 20", "oranges", "pears 10")
Now I want to split the string to get tuples of the fruit and the cost. What is the scala way of doing this?
stringList.map(query => query.split(" "))
is not what I want.
I found this which is similar. What is the correct Scala way of doing this?

You could use a regular expression and pattern matching:
val Pat = """(.+)\s(\d+)""".r // word followed by whitespace followed by number
def extract(in: String): (String, Int) = in match {
case Pat(name, price) => (name, price.toInt)
case _ => (in, 5)
}
val stringList: List[String] = List("apples 20", "oranges", "pears 10")
stringList.map(extract) // List((apples,20), (oranges,5), (pears,10))
You have two capturing groups in the pattern. These will be extracted as strings, so you have to convert explicitly using .toInt.
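One detail worth spelling out: a Scala Regex used as a pattern has to match the entire input, so any string that does not end in a number falls through to the default case. A minimal sketch of that behaviour follows; the object name and the two extra sample strings are made up for illustration.

```scala
object RegexExtract {
  // Same pattern as above: a name, one whitespace character, then digits.
  val Pat = """(.+)\s(\d+)""".r

  def extract(in: String): (String, Int) = in match {
    case Pat(name, price) => (name, price.toInt) // the whole string matched
    case _                => (in, 5)             // no trailing number: default cost
  }

  // The greedy (.+) keeps a multi-word name together.
  val multiWord = extract("red apples 20")

  // Text after the number prevents a full match, so the default applies.
  val trailing = extract("apples 20 kg")
}
```

Here multiWord is ("red apples", 20), while trailing is ("apples 20 kg", 5) because the input does not end in digits.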

You almost have it:
stringList.map(query => query.split(" "))
is what you want; just add another map to it to turn the arrays into tuples:
.map { parts => parts.head -> parts.lift(1).getOrElse("5").toInt }
or this instead, if you prefer:
.collect {
  case Array(a, b) => a -> b.toInt
  case Array(a) => a -> 5
}
(.collect will silently ignore entries that split into fewer than one or more than two elements. You can replace it with .map if you would prefer it to throw an error in such cases. The patterns use Array rather than Seq because split returns an Array.)
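Put together, the split-based approach could look like the sketch below. The object name, the parse helper and its default parameter are illustrative choices, not part of the answer above; malformed entries raise an exception here rather than being dropped.

```scala
object SplitWithDefault {
  val stringList = List("apples 20", "oranges", "pears 10")

  // split returns an Array, so the patterns are written with Array(...).
  def parse(entry: String, default: Int = 5): (String, Int) =
    entry.split(" ") match {
      case Array(name, price) => name -> price.toInt
      case Array(name)        => name -> default
      case other =>
        throw new IllegalArgumentException(
          s"Expected 1 or 2 fields, got ${other.length}: '$entry'")
    }

  // parse(_) rather than parse, so that the default argument is applied.
  val parsed = stringList.map(parse(_))
}
```

With the sample list, parsed is List((apples,20), (oranges,5), (pears,10)).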

Related

Scala Map from Test File

Looking to create a scala map from a test file. A sample of the text file (a few lines of it) can be seen below:
Alabama (9),Democratic:849624,Republican:1441170,Libertarian:25176,Others:7312
Alaska (3),Democratic:153778,Republican:189951,Libertarian:8897,Others:6904
Arizona (11),Democratic:1672143,Republican:1661686,Libertarian:51465,Green:1557,Others:475
I have been given the map buffer as follows:
var mapBuffer: Map[String, List[(String, Int)]] = Map()
Note the party values are separated by a colon.
I am trying to read the file contents and store the data in a map structure where each line of the file is used to construct a map entry with the date as the key, and a list of tuples as the value. The type of the structure should be Map[String, List[(String,Int)]].
Essentially I am just trying to create a map entry from each line of the file, but I can't quite get it right. I tried the code below, but with no luck. I think that 'val lines' should be an array rather than an iterator.
val stream : InputStream = getClass.getResourceAsStream("")
val lines: Iterator[String] = scala.io.Source.fromInputStream(stream).getLines
var map: Map[String, List[(String, Int)]] = lines
.map(_.split(","))
.map(line => (line(0).toString, line(1).toList))
.toMap
This appears to do the job. (Scala 2.13.x)
val stateVotes =
util.Using(io.Source.fromFile("votes.txt")){
val PartyVotes = "([^:]+):(\\d+)".r
_.getLines()
.map(_.split(",").toList)
.toList
.groupMapReduce(_.head)(_.tail.collect{
case PartyVotes(p,v) => (p,v.toInt)})(_ ++ _)
} //file is auto-closed
//stateVotes: Try[Map[String,List[(String, Int)]]] = Success(
// Map(Alabama (9) -> List((Democratic,849624), (Republican,1441170), (Libertarian,25176), (Others,7312))
// , Arizona (11) -> List((Democratic,1672143), (Republican,1661686), (Libertarian,51465), (Green,1557), (Others,475))
// , Alaska (3) -> List((Democratic,153778), (Republican,189951), (Libertarian,8897), (Others,6904))))
In this case the number following the state name is preserved. That can be changed.
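If the number after the state name is not wanted in the key, one option is to strip it with a second regex before grouping. This is only a sketch: the object name, the StateWithCount pattern and the stateName helper are assumptions made for illustration.

```scala
object StateKey {
  // Captures the state name and drops a trailing "(n)" group.
  val StateWithCount = """(.+?)\s*\(\d+\)""".r

  def stateName(raw: String): String = raw match {
    case StateWithCount(name) => name
    case other                => other.trim // no "(n)" suffix: keep the text as is
  }
}
```

For example, StateKey.stateName("Alabama (9)") gives "Alabama", so _.head in the grouping step could be replaced by a call to this helper.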
No, iterator is fine (better than list actually),
you just need to split the values too to create those tuples.
lines
.map(_.split(","))
.map { l =>
  l.head -> l.tail.toList.map(_.split(":"))
    .collect { case Array(a, b) => a -> b.toInt }
}
.toMap
An alternative that looks a little more aesthetic to my eye is converting to a map early and then using mapValues (I personally much prefer short lambdas). The downside is that mapValues is lazy, so you end up having to call .toMap twice to force it in the end:
lines
.map(_.split(","))
.map { case l => l.head -> l.tail.toList }
.toMap
.mapValues(_.map(_.split(":")))
.mapValues(_.collect { case Array(a, b) => a -> b.toInt })
.toMap
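To try the comma-then-colon splitting without a file, the same steps can be run on in-memory lines. In this sketch the object name and the parseLine helper are illustrative; the two sample lines are copied from the question.

```scala
object VotesMap {
  // A real run would read these lines from the file instead.
  val sampleLines = List(
    "Alabama (9),Democratic:849624,Republican:1441170,Libertarian:25176,Others:7312",
    "Alaska (3),Democratic:153778,Republican:189951,Libertarian:8897,Others:6904"
  )

  // First field is the key; every "party:count" field becomes a tuple.
  def parseLine(line: String): (String, List[(String, Int)]) = {
    val fields = line.split(",")
    val votes = fields.tail.toList
      .map(_.split(":"))
      .collect { case Array(party, count) => party -> count.toInt }
    fields.head -> votes
  }

  // An Iterator is enough here because the lines are traversed only once.
  val votesByState: Map[String, List[(String, Int)]] =
    sampleLines.iterator.map(parseLine).toMap
}
```

Fields that do not split into exactly two parts are skipped by collect, which is a design choice; switching to map with an explicit error case would surface malformed input instead.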

Decompose Scala sequence into member values

I'm looking for an elegant way of accessing two items in a Seq at the same time. I've checked earlier in my code that the Seq will have exactly two items. Now I would like to be able to give them names, so they have meaning.
records
.sliding(2) // makes sure we get `Seq` with two items
.map(recs => {
// Something like this...
val (former, latter) = recs
})
Is there an elegant and/or idiomatic way to achieve this in Scala?
I'm not sure if it is any more elegant, but you can also unpick the sequence like this:
val former +: latter +: _ = recs
You can access the elements by their index:
map { recs =>
  val (former, latter) = (recs(0), recs(1))
}
You can use pattern matching to decompose the structure of your list:
val records = List("first", "second")
records match {
case first +: second +: Nil => println(s"1: $first, 2: $second")
case _ => // Won't happen (you can omit this)
}
will output
1: first, 2: second
Each window produced by sliding has the same collection type as records, so it is a List when records is a List. Using a pattern match, you can give names to these elements like this:
map{ case List(former, latter) =>
...
}
Note that since it's a pattern match, you need to use {} instead of ().
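A small runnable sketch of the named-window idea, with made-up sample data and an illustrative object name. It uses collect instead of map because sliding(2) on a one-element collection yields a single shorter window, which a two-element pattern would not match.

```scala
object NamedWindows {
  val records = List(3, 8, 10)

  // Differences between neighbours; each window is bound to two names.
  val deltas = records.sliding(2).collect {
    case List(former, latter) => latter - former
  }.toList

  // A single record produces one window of size 1, which collect skips.
  val tooShort = List(42).sliding(2).collect {
    case List(former, latter) => latter - former
  }.toList
}
```

Here deltas is List(5, 2) and tooShort is empty; with map in place of collect, the second expression would throw a MatchError.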
For records of a known type (for example, Int):
records.sliding (2).map (_ match {
case List (former:Int, latter:Int) => former + latter })
Note that this will pair up elements (0, 1), then (1, 2), (2, 3), and so on. To combine non-overlapping pairs, use sliding (2, 2):
val pairs = records.sliding (2, 2).map (_ match {
case List (former: Int, latter: Int) => former + latter
case List (one: Int) => one
}).toList
Note that you now need an extra case for a single element, in case the size of records is odd.
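The difference between the two stepping modes can be checked on a small list. This sketch uses made-up data and an illustrative object name; it also shows grouped(2), which produces the same non-overlapping chunks as sliding(2, 2).

```scala
object PairingModes {
  val records = List(1, 2, 3, 4, 5)

  // Overlapping windows: (1,2) (2,3) (3,4) (4,5)
  val overlapping = records.sliding(2).map(_.sum).toList

  // Non-overlapping windows: (1,2) (3,4) (5)
  val disjoint = records.sliding(2, 2).map(_.sum).toList

  // grouped(2) yields the same chunks as sliding(2, 2).
  val viaGrouped = records.grouped(2).map(_.sum).toList
}
```

Using _.sum sidesteps the extra case for the odd trailing element, at the cost of no longer naming former and latter.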

foreach loop in scala

In a Scala foreach loop, if I have a list
val a = List("a","b","c","d")
I can print the elements without pattern matching like this
a.foreach(c => println(c))
But, if I have a tuple like this
val v = Vector((1,9), (2,8), (3,7), (4,6), (5,5))
why should I have to use
v.foreach{ case(i,j) => println(i, j) }
1. a pattern matching case
2. { brackets
Please explain what happens when the two foreach loops are executed.
You don't have to; you choose to. The issue is that the Scala 2 compiler doesn't destructure a tuple argument into separate lambda parameters, but you can do:
v.foreach(tup => println(tup._1, tup._2))
But if you want to be able to refer to each element on its own with a fresh variable name, you have to resort to a partial function with pattern matching, which can deconstruct the tuple.
This is what the compiler does when you use case like that:
def main(args: Array[String]): Unit = {
val v: List[(Int, Int)] = scala.collection.immutable.List.apply[(Int, Int)](scala.Tuple2.apply[Int, Int](1, 2), scala.Tuple2.apply[Int, Int](2, 3));
v.foreach[Unit](((x0$1: (Int, Int)) => x0$1 match {
case (_1: Int, _2: Int)(Int, Int)((i @ _), (j @ _)) => scala.Predef.println(scala.Tuple2.apply[Int, Int](i, j))
}))
}
You see that it pattern matches on unnamed x0$1 and puts _1 and _2 inside i and j, respectively.
According to http://alvinalexander.com/scala/iterating-scala-lists-foreach-for-comprehension:
val names = Vector("Bob", "Fred", "Joe", "Julia", "Kim")
for (name <- names)
println(name)
To answer #2: You can only use case in braces. A more complete answer about braces is located here.
With the Vector of tuples it works a bit differently: you use a function literal written with case.
In Scala, braces {} can hold case clauses:
{
case pattern1 => "xxx"
case pattern2 => "yyy"
}
So, in this case, we use it with the foreach loop.
Print all values using the pattern below:
val nums = Vector((1,9), (2,8), (3,7), (4,6), (5,5))
nums.foreach {
case(key, value) => println(s"key: $key, value: $value")
}
You can also look at other loops, such as the for loop, if this is not something you are comfortable with.
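The three styles discussed in this thread can be compared side by side. The sketch below uses map instead of foreach so that the results can be inspected; the object and value names are illustrative.

```scala
object TupleLoops {
  val v = Vector((1, 9), (2, 8), (3, 7))

  // 1. Plain lambda: the whole tuple is a single parameter.
  val viaAccessors = v.map(t => t._1 + t._2)

  // 2. Pattern-matching function literal: braces plus case.
  val viaCase = v.map { case (i, j) => i + j }

  // 3. for comprehension with a tuple pattern on the left of <-.
  val viaFor = for ((i, j) <- v) yield i + j
}
```

All three values are Vector(10, 10, 10); the choice between them is a matter of readability.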

How to unwrap optional tuple of options to tuple of options in Scala?

I have a list of Person and want to retrieve a person by its id
val person = personL.find(_.id.equals(tempId))
After that, I want to get as a tuple the first and last element of a list which is an attribute of Person.
val marks: Option[(Option[String], Option[String])] = person.map { p =>
val marks = p.school.marks
(marks.headOption.map(_.midtermMark), marks.lastOption.map(_.finalMark))
}
This works fine, but now I want to transform the Option[(Option[String], Option[String])] to a simple (Option[String], Option[String]). Is it somehow possible to do this on the fly by using the previous map?
I suppose:
person.map{...}.getOrElse((None, None))
(None, None) is the default value here, in case your Option of a tuple is empty.
You are, probably, looking for fold:
personL
.collectFirst {
case Person(`tempId`, _, .., school) => school.marks
}.fold[(Option[String], Option[String])](None -> None) { marks =>
marks.headOption.map(_.midtermMark) -> marks.lastOption.map(_.finalMark)
}
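A self-contained sketch of the fold approach. The Person, School and Mark case classes below are hypothetical stand-ins, since the question does not show the real definitions, and the firstAndLast helper is likewise illustrative.

```scala
object MarksExample {
  // Hypothetical model; the real classes in the question may differ.
  final case class Mark(midtermMark: String, finalMark: String)
  final case class School(marks: List[Mark])
  final case class Person(id: Int, school: School)

  def firstAndLast(people: List[Person], id: Int): (Option[String], Option[String]) =
    people
      .find(_.id == id)
      // The explicit type keeps (None, None) from being inferred as (None.type, None.type).
      .fold[(Option[String], Option[String])]((None, None)) { p =>
        val marks = p.school.marks
        (marks.headOption.map(_.midtermMark), marks.lastOption.map(_.finalMark))
      }

  val people = List(
    Person(1, School(List(Mark("A", "B"), Mark("C", "D")))),
    Person(2, School(Nil))
  )
}
```

A missing person and a person without marks both come back as (None, None), so callers that need to tell the two apart should keep the outer Option instead.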

Access tuple inside a tuple for anonymous map job in Spark

This post is essentially about how to build joint and marginal histograms from a (String, String) RDD. I posted the code that I eventually used below as the answer.
I have an RDD that contains tuples of type (String, String). Since they aren't unique, I want to see how many times each (String, String) combination occurs, so I use countByValue like so:
val PairCount = Pairs.countByValue().toSeq
which gives me tuples like ((String, String), Long), where the Long is the number of times that the (String, String) tuple appeared.
These Strings can be repeated in different combinations and I essentially want to run word count on this PairCount variable so I tried something like this to start:
PairCount.map(x => (x._1._1, x._2))
But the output this spits out is String1->1, String2->1, String3->1, etc.
How do I output a key-value pair from a map job in this case, where the key is one of the String values from the inner tuple and the value is the Long value from the outer tuple?
Update:
@vitalii gets me almost there. The answer gets me to a Seq[(String, Long)], but what I really need is to turn that into a map so that I can run reduceByKey on it afterwards. When I run
PairCount.flatMap{case((x,y),n) => Seq(x->n)}.toMap
for each unique x I get x->1.
For example, the above line of code generates mom->1 dad->1 even if the tuples out of the flatMap included (mom,30) (dad,59) (mom,2) (dad,14), in which case I would expect toMap to provide mom->30, dad->59, mom->2, dad->14. However, I'm new to Scala, so I might be misinterpreting the functionality.
How can I get the Tuple2 sequence converted to a map so that I can reduce on the map keys?
If I understand the question correctly, you need flatMap:
val pairCountRDD = pairs.countByValue() // RDD[((String, String), Int)]
val res : RDD[(String, Int)] = pairCountRDD.flatMap { case ((s1, s2), n) =>
Seq(s1 -> n, s2 -> n)
}
Update: I didn't quite understand what your final goal is, but here are a few more examples that may help. By the way, the code above is incorrect: I missed the fact that countByValue returns a Map, not an RDD:
val pairs = sc.parallelize(
List(
"mom"-> "dad", "dad" -> "granny", "foo" -> "bar", "foo" -> "baz", "foo" -> "foo"
)
)
// don't use countByValue: if pairs is large you will run out of memory
val pairCountRDD = pairs.map(x => (x, 1)).reduceByKey(_ + _)
val wordCount = pairs.flatMap { case (a,b) => Seq(a -> 1, b ->1)}.reduceByKey(_ + _)
wordCount.take(10)
// count in how many pairs each word occurs, keys and values:
val wordPairCount = pairs.flatMap { case (a,b) =>
if (a == b) {
Seq(a->1)
} else {
Seq(a -> 1, b ->1)
}
}.reduceByKey(_ + _)
wordPairCount.take(10)
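The toMap confusion in the question can be reproduced without Spark: a Map holds one value per key, so converting a sequence with repeated keys keeps only the last value for each. The sketch below uses plain Scala collections (not an RDD), an illustrative object name, and the mom/dad numbers from the question; groupMapReduce requires Scala 2.13 or later.

```scala
object DuplicateKeys {
  val counts = Seq("mom" -> 30L, "dad" -> 59L, "mom" -> 2L, "dad" -> 14L)

  // toMap keeps only the last value seen for each key.
  val lastWins: Map[String, Long] = counts.toMap

  // Summing per key instead, the local analogue of reduceByKey(_ + _).
  val summed: Map[String, Long] = counts.groupMapReduce(_._1)(_._2)(_ + _)
}
```

Here lastWins is Map(mom -> 2, dad -> 14), while summed is Map(mom -> 32, dad -> 73). On an RDD, reduceByKey works directly on the pair RDD, so no conversion to a Map is needed first.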
To get the histograms for the (String, String) RDD, I used this code:
val Hist_X = histogram.map(x => (x._1-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_Y = histogram.map(x => (x._2-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_XY = histogram.map(x => (x-> 1.0)).reduceByKey(_+_)
where histogram was the (String,String) RDD
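For experimenting without a Spark cluster, the same joint and marginal counts can be sketched on a local List. This is an analogue built on plain Scala collections (Scala 2.13+ for groupMapReduce), not the Spark code above; the object name is illustrative and the sample pairs are borrowed from the earlier answer.

```scala
object LocalHistograms {
  val pairs = List(
    "mom" -> "dad", "dad" -> "granny", "foo" -> "bar", "foo" -> "baz", "foo" -> "foo"
  )

  // Marginal histogram over the first component.
  val histX: Map[String, Double] = pairs.groupMapReduce(_._1)(_ => 1.0)(_ + _)

  // Marginal histogram over the second component.
  val histY: Map[String, Double] = pairs.groupMapReduce(_._2)(_ => 1.0)(_ + _)

  // Joint histogram over the whole pair.
  val histXY: Map[(String, String), Double] =
    pairs.groupMapReduce(p => p)(_ => 1.0)(_ + _)
}
```

A quick sanity check is that each of the three histograms sums to the number of pairs.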