Split String in a Tuple - scala

In a function def a(l: List[(Int, String)]): List[(Int, String)] = ??? I want to split each String into its words, in lower case. Commas etc. should be ignored, so I guess I need replaceAll("[^A-Za-z]+", " ").toLowerCase() somewhere? Each word should keep the Int value of the sentence it came from.
Example of how it should work:
val example = List((11, "That is great!"), (12, "Wow, impossible!"))
print(a(example))
Result
List((11, "that"),(11, "is"),(11, "great"),(12, "wow"),(12, "impossible"))

You can use flatMap for that:
val example = List((11, "That is great!"), (12, "Wow, impossible!"))
example.flatMap { case (int, str) =>
  str
    .replaceAll("[^A-Za-z]+", " ")
    .toLowerCase()
    .split(' ')
    .map((int, _))
}
Yields:
res0: List[(Int, String)] = List((11,that), (11,is), (11,great), (12,wow), (12,impossible))

This is strictly equivalent to Yuval's answer, but probably more approachable when starting with Scala:
for {
  (int, str) <- example
  word <- str.replaceAll("[^A-Za-z]+", " ").toLowerCase().split(' ')
} yield (int, word)
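A possible variation on the two answers above, sketched under the assumption that a sentence may start with punctuation. Splitting on a single space after replaceAll leaves an empty first token for input such as "!Wow", so this version splits on runs of non-letters directly and drops empty tokens. The names SplitWords and splitWords are made up for illustration.

```scala
object SplitWords {
  // Hypothetical helper, not part of the original answers.
  def splitWords(l: List[(Int, String)]): List[(Int, String)] =
    l.flatMap { case (id, sentence) =>
      sentence.toLowerCase
        .split("[^a-z]+")     // split on every run of non-letters
        .filter(_.nonEmpty)   // a leading separator yields an empty first token
        .toList
        .map(word => (id, word))
    }
}
```

For the example list from the question this returns the same pairs as the flatMap and for-comprehension versions.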

Related

groupby scala list of string

I am having trouble calculating the sum of elements in Scala that share the same title (my key in this case).
Currently my input can be described as:
val listInput1 =
List(
"itemA,CATA,2,4 ",
"itemA,CATA,3,1 ",
"itemB,CATB,4,5",
"itemB,CATB,4,6"
)
val listInput2 =
List(
"itemA,CATA,2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
The required output for these input lists is:
val listoutput1 =
List(
"itemA,CATA,5,5 ",
"itemB,CATB,8,11"
)
val listoutput2 =
List(
"itemA , CATA, 2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
I wrote the following function:
def sumByTitle(listInput: List[String]): List[String] =
  listInput.map(_.split(",")).groupBy(_(0)).map {
    case (title, features) =>
      "%s,%s,%d,%d".format(
        title,
        features.head.apply(1),
        features.map(_(2).toInt).sum,
        features.map(_(3).toInt).sum)
  }.toList
It doesn't give me the expected result as it changes the order of lines.
How can I fix that?
The ListMap is designed to preserve the order of items inserted to the Map.
import collection.immutable.ListMap
def sumByTitle(listInput: List[String]): List[String] = {
  // (.*,) ends the key at a comma, so multi-digit numbers are not split by the greedy match
  val itemPttrn = raw"(.*,)(\d+),(\d+)\s*".r
  listInput.foldLeft(ListMap.empty[String, (Int,Int)].withDefaultValue((0,0))) {
    case (lm, str) =>
      val itemPttrn(k, a, b) = str //unsafe
      val (x, y) = lm(k)
      lm.updated(k, (a.toInt + x, b.toInt + y))
  }.toList.map { case (k, (a, b)) => s"$k$a,$b" }
}
This is a bit unsafe in that it will throw if the input string doesn't match the regex pattern.
sumByTitle(listInput1)
//res0: List[String] = List(itemA,CATA,5,5, itemB,CATB,8,11)
sumByTitle(listInput2)
//res1: List[String] = List(itemA,CATA,2,4, itemB,CATB,4,5, itemC,CATC,1,2)
You'll note that the trailing space, if there is one, is not preserved.
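The unsafe extractor mentioned above can be avoided by matching the regex inside the fold's case clause, so that lines which do not match are skipped instead of throwing. This is only a sketch: the names SafeSum and sumByTitleSafe are invented here, and silently skipping bad lines is an assumption about the desired behaviour.

```scala
import scala.collection.immutable.ListMap

object SafeSum {
  // Key (up to and including its last comma), then the two numbers to add up.
  private val ItemPattern = raw"(.*,)(\d+),(\d+)\s*".r

  // Hypothetical variant of sumByTitle that ignores malformed lines.
  def sumByTitleSafe(listInput: List[String]): List[String] =
    listInput.foldLeft(ListMap.empty[String, (Int, Int)]) {
      case (lm, ItemPattern(k, a, b)) =>
        val (x, y) = lm.getOrElse(k, (0, 0))
        lm.updated(k, (a.toInt + x, b.toInt + y))
      case (lm, _) => lm // no match: keep the accumulator unchanged
    }.toList.map { case (k, (a, b)) => s"$k$a,$b" }
}
```

A stricter alternative would be to collect the rejected lines and report them rather than dropping them.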
If you are just interested in sorting you can simply return the sorted list:
val listInput1 =
List(
"itemA , CATA, 2,4 ",
"itemA , CATA, 3,1 ",
"itemB,CATB,4,5",
"itemB,CATB,4,6"
)
val listInput2 =
List(
"itemA , CATA, 2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
def sumByTitle(listInput: List[String]): List[String] =
  listInput.map(_.split(",")).groupBy(_(0)).map {
    case (title, features) =>
      "%s,%s,%d,%d".format(
        title,
        features.head.apply(1),
        features.map(_(2).trim.toInt).sum,
        features.map(_(3).trim.toInt).sum)
  }.toList.sorted
println("LIST 1")
sumByTitle(listInput1).foreach(println)
println("LIST 2")
sumByTitle(listInput2).foreach(println)
You can find the code on Scastie for you to play around with.
As a side note, you may be interested in separating the serialization and deserialization from your business logic.
Here you can find another Scastie notebook with a relatively naive approach for a first step towards separating concerns.
def foldByTitle(listInput: List[String]): List[Item] =
  listInput.map(Item.parseItem).foldLeft(List.empty[Item])(sumByTitle)
val sumByTitle: (List[Item], Item) => List[Item] = (acc, curr) =>
  if (acc.exists(_.name == curr.name))
    // merge into the existing entry in place, so first-seen order is kept
    acc.map(i => if (i.name == curr.name) i.copy(num1 = i.num1 + curr.num1, num2 = i.num2 + curr.num2) else i)
  else acc :+ curr // new titles are appended, not prepended
case class Item(name: String, category: String, num1: Int, num2: Int)
object Item {
  def parseItem(serializedItem: String): Item = {
    val itemTokens = serializedItem.split(",").map(_.trim)
    Item(itemTokens.head, itemTokens(1), itemTokens(2).toInt, itemTokens(3).toInt)
  }
}
This way the initial order of the elements is kept.
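The snippet above only covers parsing. A matching serializer could look roughly like this; serializeItem and the enclosing ItemCodec object are hypothetical names, and Item is repeated only so the sketch compiles on its own.

```scala
object ItemCodec {
  // Same shape as the Item case class above, repeated for self-containment.
  final case class Item(name: String, category: String, num1: Int, num2: Int)

  // Hypothetical counterpart to parseItem: back to the comma-separated form.
  def serializeItem(item: Item): String =
    s"${item.name},${item.category},${item.num1},${item.num2}"
}
```

With both directions in place, the summing logic only ever sees Item values and never touches the string format.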

How to let mkString skip null in scala?

scala> Seq("abc", null).mkString(" ")
res0: String = abc null
But I want to get "abc" only.
Is there a Scala way to skip nulls?
scala> val seq = Seq("abc", null, "def")
seq: Seq[String] = List(abc, null, def)
scala> seq.flatMap(Option[String]).mkString(" ")
res0: String = abc def
There's always Seq("abc", null).filter(_ != null).mkString(" ")
Combination of Rex's answer and Eric's first comment:
Seq("abc", null).map(Option(_)).collect{case Some(x) => x}.mkString(" ")
The first map wraps the values resulting in Seq[Option[String]]. collect then essentially does a filter and map, discarding the None values and leaving only the unwrapped Some values.
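The wrap-then-unwrap idea described above can be collapsed into a single flatMap, because Option(x) is None when x is null and flatMap drops the None values. A small sketch; SkipNulls and mkStringSkippingNulls are names invented for this example.

```scala
object SkipNulls {
  // Hypothetical helper: joins only the non-null elements.
  def mkStringSkippingNulls(xs: Seq[String], sep: String): String =
    xs.flatMap(Option(_)).mkString(sep) // Option(null) == None, so nulls vanish
}
```

The filter(_ != null) version does the same job; which one reads better is mostly a matter of taste.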

Scala map and/or groupby functions

I am new to Scala and I am trying to figure out some scala syntax.
So I have a list of strings.
wordList: List[String] = List("this", "is", "a", "test")
I have a function that returns a list of pairs that contains consonants and vowels counts per word:
def countFunction(words: List[String]): List[(String, Int)]
So, for example:
countFunction(List("test")) => List(('Consonants', 3), ('Vowels', 1))
I now want to take a list of words and group them by count signatures:
def mapFunction(words: List[String]): Map[List[(String, Int)], List[String]]
//using wordList from above
mapFunction(wordList) => List(('Consonants', 3), ('Vowels', 1)) -> Seq("this", "test")
List(('Consonants', 1), ('Vowels', 1)) -> Seq("is")
List(('Consonants', 0), ('Vowels', 1)) -> Seq("a")
I'm thinking I need to use GroupBy to do this:
def mapFunction(words: List[String]): Map[List[(String, Int)], List[String]] = {
words.groupBy(F: (A) => K)
}
I've read the Scala API for Map.groupBy and see that F represents the discriminator function and K is the type of the keys you want returned. So I tried this:
words.groupBy(countFunction => List[(String, Int)]
However, scala doesn't like this syntax. I tried looking up some examples for groupBy and nothing seems to help me with my use case. Any ideas?
Based on your description, your count function should take a word instead of a list of words. I would have defined it like this:
def countFunction(word: String): List[(String, Int)]
If you do that you should be able to call words.groupBy(countFunction), which is the same as:
words.groupBy(word => countFunction(word))
If you cannot change the signature of countFunction, then you should be able to call group by like this:
words.groupBy(word => countFunction(List(word)))
You shouldn't put the return type of the function in the call. The compiler can figure this out itself. You should just call it like this:
words.groupBy(countFunction)
If that doesn't work, please post your countFunction implementation.
Update:
I tested it in the REPL and this works (note that my countFunction has a slightly different signature from yours):
scala> def isVowel(c: Char) = "aeiou".contains(c)
isVowel: (c: Char)Boolean
scala> def isConsonant(c: Char) = ! isVowel(c)
isConsonant: (c: Char)Boolean
scala> def countFunction(s: String) = (('Consonants, s count isConsonant), ('Vowels, s count isVowel))
countFunction: (s: String)((Symbol, Int), (Symbol, Int))
scala> List("this", "is", "a", "test").groupBy(countFunction)
res1: scala.collection.immutable.Map[((Symbol, Int), (Symbol, Int)),List[java.lang.String]] = Map((('Consonants,0),('Vowels,1)) -> List(a), (('Consonants,1),('Vowels,1)) -> List(is), (('Consonants,3),('Vowels,1)) -> List(this, test))
You can include the type of the function passed to groupBy, but like I said you don't need it. If you want to pass it in you do it like this:
words.groupBy(countFunction: String => ((Symbol, Int), (Symbol, Int)))
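For completeness, here is a sketch that keeps the Map[List[(String, Int)], List[String]] result type from the question and uses plain strings instead of symbol literals as labels. The vowel test (only a, e, i, o, u, case-insensitive) and the rule that only letters count as consonants are assumptions, and the enclosing object name is made up.

```scala
object GroupByCounts {
  // Assumption: only a, e, i, o, u count as vowels.
  def isVowel(c: Char): Boolean = "aeiou".indexOf(c.toLower) >= 0

  // One word in, its count signature out.
  def countFunction(word: String): List[(String, Int)] =
    List(
      "Consonants" -> word.count(c => c.isLetter && !isVowel(c)),
      "Vowels"     -> word.count(isVowel)
    )

  // groupBy takes the discriminator function itself; no return type is written.
  def mapFunction(words: List[String]): Map[List[(String, Int)], List[String]] =
    words.groupBy(countFunction)
}
```

Calling GroupByCounts.mapFunction(List("this", "is", "a", "test")) groups "this" and "test" under the same key, since both have three consonants and one vowel.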

Scala reverse string

I'm a newbie to scala, I'm just writing a simple function to reverse a given string:
def reverse(s: String) : String
for(i <- s.length - 1 to 0) yield s(i)
The yield gives back a scala.collection.immutable.IndexedSeq[Char], which can't be converted to a String (or is it something else?).
How do I write this function?
Note that there is already a built-in method for this:
scala> val x = "scala is awesome"
x: java.lang.String = scala is awesome
scala> x.reverse
res1: String = emosewa si alacs
But if you want to do that by yourself:
def reverse(s: String) : String =
(for(i <- s.length - 1 to 0 by -1) yield s(i)).mkString
or (sometimes it is better to use until, but probably not in that case)
def reverse(s: String) : String =
(for(i <- s.length until 0 by -1) yield s(i-1)).mkString
Also, note that when counting down (from a larger value to a smaller one) you must specify a negative step, or you will get an empty sequence:
scala> for(i <- x.length until 0) yield i
res2: scala.collection.immutable.IndexedSeq[Int] = Vector()
scala> for(i <- x.length until 0 by -1) yield i
res3: scala.collection.immutable.IndexedSeq[Int] = Vector(16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
Here's a short version
def reverse(s: String) = ("" /: s)((a, x) => x + a)
Edit: or even shorter, we have the fantastically cryptic
def reverse(s: String) = ("" /: s)(_.+:(_))
but I wouldn't really recommend this...
You could also write this using a recursive approach (throwing this one in just for fun)
def reverse(s: String): String = {
  if (s.isEmpty) ""
  else reverse(s.tail) + s.head
}
As indicated by om-nom-nom, pay attention to the by -1 (otherwise you are not really iterating and your result will be empty). The other trick you can use is collection.breakOut.
It can also be provided to the for comprehension like this:
def reverse(s: String): String =
(for(i <- s.length - 1 to 0 by -1) yield s(i))(collection.breakOut)
reverse("foo")
// String = oof
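The benefit of using breakOut is that it avoids creating an intermediate structure, which the mkString solution does create.
Note: breakOut leverages CanBuildFrom and builders, which are part of the foundation of the redesigned collection library introduced in Scala 2.8.0. Both breakOut and CanBuildFrom were removed in the Scala 2.13 collections rewrite, so this variant only compiles on 2.8 through 2.12.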
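Since collection.breakOut is not available from Scala 2.13 onwards, one way to get a similar effect there (no intermediate IndexedSeq[Char]) is to append straight into a StringBuilder. This is a sketch of that idea, not a drop-in equivalent of breakOut; the object name is invented.

```scala
object ReverseBuilder {
  def reverse(s: String): String = {
    val sb = new StringBuilder(s.length) // pre-sized to the input length
    // Count down with an explicit negative step; without it the range is empty.
    for (i <- s.length - 1 to 0 by -1) sb.append(s(i))
    sb.toString
  }
}
```

In everyday code the built-in s.reverse remains the simpler choice.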
All the above answers are correct and here's my take:
scala> val reverseString = (str: String) => str.foldLeft("")((accumulator, nextChar) => nextChar + accumulator)
reverseString: String => java.lang.String = <function1>
scala> reverseString.apply("qwerty")
res0: java.lang.String = ytrewq
def rev(s: String): String = {
  val str = s.toList
  def f(s: List[Char], acc: List[Char]): List[Char] = s match {
    case Nil => acc
    case x :: xs => f(xs, x :: acc)
  }
  f(str, Nil).mkString
}
Here is my version of reversing a string.
scala> val sentence = "apple"
sentence: String = apple
scala> sentence.map(x => x.toString).reduce((x, y) => (y + x))
res9: String = elppa

Scala for-comprehension syntax

In the following code, inside the for comprehension, I can refer to the string and index using a tuple dereference:
val strings = List("a", "b", "c")
for (stringWithIndex <- strings.zipWithIndex) {
// Do something with stringWithIndex._1 (string) and stringWithIndex._2 (index)
}
Is there any way in the Scala syntax to have the stringWithIndex split into the parts (string and index) within the for comprehension header, so that readers of the code do not have to wonder at the values of stringWithIndex._1 and stringWithIndex._2?
I tried the following, but it would not compile:
for (case (string, index) <- strings.zipWithIndex) {
// Do something with string and index
}
You almost got it:
scala> val strings = List("a", "b", "c")
strings: List[java.lang.String] = List(a, b, c)
scala> for ( (string, index) <- strings.zipWithIndex)
| { println("str: "+string + " idx: "+index) }
str: a idx: 0
str: b idx: 1
str: c idx: 2
See, no need for the case keyword.
strings.zipWithIndex.foreach{case(x,y) => println(x,y)}
res:
(a,0)
(b,1)
(c,2)
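The same tuple pattern also works in a for comprehension that yields a value instead of printing. A minimal sketch, with IndexedLabels and label as made-up names:

```scala
object IndexedLabels {
  // The (string, index) pattern sits directly in the generator; no case keyword.
  def label(strings: List[String]): List[String] =
    for ((string, index) <- strings.zipWithIndex) yield s"$index: $string"
}
```

Here the names string and index are bound in the header, so the body never needs ._1 or ._2.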