Scala - Use predicate function to summarize list of strings

I need to write a function to analyze some text files.
For that, I need a function that splits the file's contents into sublists via a predicate. It should only keep the values that come after the first time the predicate evaluates to true, and it should start a new sublist each time the predicate is true again.
For example:
List("ignore","these","words","x","this","is","first","x","this","is","second")
with predicate
x => x.equals("x")
should produce
List(List("this","is","first"),List("this","is","second"))
I've already done the reading of the file into a List[String] and tried to use foldLeft with a case statement to iterate over the List.
words.foldLeft(List[List[String]]()) {
case (Nil, s) => List(List(s))
case (result, "x") => result :+ List()
case (result, s) => result.dropRight(1) :+ (result.last :+ s)
}
There are two problems with this, though, and I can't figure them out:
This does not ignore the words before the first time the predicate evaluates to true
I can't use an arbitrary predicate function
If anyone could tell me what I have to do to fix my problems it would be highly appreciated.

I modified your example a little bit:
def foldWithPredicate[A](predicate: A => Boolean)(l: List[A]) =
l.foldLeft[List[List[A]]](Nil){
case (acc, e) if predicate(e) => acc :+ Nil //if predicate passed add new list at the end
case (Nil, _) => Nil //empty list means we need to ignore elements
case (xs :+ x, e) => xs :+ (x :+ e) //append an element to the last list
}
val l = List("ignore","these","words","x","this","is","first","x","this","is","second")
val predicate: String => Boolean = _.equals("x")
foldWithPredicate(predicate)(l) // List(List(this, is, first), List(this, is, second))
There's one performance problem with your approach: appending to an immutable List is slow, because each append has to copy the whole list (O(n)).
Prepending is O(1), so it is usually faster to prepend elements. All lists then end up in reversed order, of course, but they can be reversed once at the end.
def foldWithPredicate2[A](predicate: A => Boolean)(l: List[A]) =
l.foldLeft[List[List[A]]](Nil){
case (acc, e) if predicate(e) => Nil :: acc
case (Nil, _) => Nil
case (x :: xs, e) => (e :: x) :: xs
}.map(_.reverse).reverse

An alternative approach is to use span to split the items into the next sublist and the rest in a single call. The following code assumes Scala 2.13 for List.unfold:
def splitIntoBlocks[T](items: List[T])(startsNewBlock: T => Boolean): List[List[T]] = {
def splitBlock(items: List[T]): (List[T], List[T]) = items.span(!startsNewBlock(_))
List.unfold(splitBlock(items)._2) {
case blockIndicator :: rest => Some(splitBlock(rest))
case _ => None
}
}
And the usage:
scala> splitIntoBlocks(List(
"ignore", "these", "words",
"x", "this", "is", "first",
"x", "this", "is", "second")
)(_ == "x")
res0: List[List[String]] = List(List(this, is, first), List(this, is, second))
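For Scala versions before 2.13, where List.unfold is not available, the same span-based idea can be sketched with an explicit tail-recursive loop. The name splitIntoBlocksRec is made up for this illustration; it is not part of the answer above.

```scala
import scala.annotation.tailrec

// Hypothetical pre-2.13 variant of splitIntoBlocks: same span-based splitting,
// but driven by an explicit loop instead of List.unfold.
def splitIntoBlocksRec[T](items: List[T])(startsNewBlock: T => Boolean): List[List[T]] = {
  @tailrec
  def loop(remaining: List[T], acc: List[List[T]]): List[List[T]] =
    remaining match {
      // Invariant: a non-empty `remaining` always starts with a block indicator.
      case _ :: rest =>
        val (block, next) = rest.span(!startsNewBlock(_))
        loop(next, block :: acc) // prepend (O(1)) and reverse once at the end
      case Nil => acc.reverse
    }

  // Skip everything before the first block indicator.
  loop(items.dropWhile(!startsNewBlock(_)), Nil)
}
```

As in the unfold version, two indicators in a row produce an empty block.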

Replacing ._1 and .head with pattern matching in Scala

def checkPeq[A,B](list1: List[(A, List[B])])( P: (A,B) => Boolean): List[Boolean] = {
def helper[A,B](list2: List[(A, List[B])], list3: List[B], acc1: Boolean, acc2: List[Boolean])(leq:(A,B) => Boolean): List[Boolean] = {
list2 match {
case h1::t1 => {
list3 match {
case Nil if t1!=Nil => helper(t1, t1.head._2, true, acc1::acc2)(leq)
case Nil => (acc1::acc2).reverse
case h2::t2 if(leq(h1._1, h2)) => helper(list2, t2, acc1, acc2)(leq)
case h2::t2 => helper(list2, t2, false, acc2)(leq)
}
}
}
}
helper(list1, list1.head._2, true, List())(P)
}
val list1 = List((1,List(1,2,3)), (2, List(2,3)), (3, List(3,2)), (4, List(4,5,6,3)))
println(checkPeq(list1)(_<=_))
I have a tail-recursive function that returns List[Boolean], in this case List(true, true, false, false). It works, but I need to do it without ._1/._2 or .head, and preferably without indexes (because I could trivially replace .head in this function with (0)). I need to do it with pattern matching and I don't know how to start. My teacher also hinted that the replacement should be quick to do. I'd appreciate any tips on how to deal with the problem.
One solution is to simply pattern match both the outer A list and the inner B list at the same time, i.e. as part of a single pattern.
def checkPeq[A,B](in: List[(A,List[B])])(pred: (A,B) => Boolean): List[Boolean] = {
@annotation.tailrec
def loop(aLst :List[(A,List[B])], acc :List[Boolean]) :List[Boolean] =
aLst match {
case Nil => acc.reverse //A list done
case (_,Nil) :: aTl => loop(aTl, true::acc) //B list done
case (a,b::bTl) :: aTl => //test a and b
if (pred(a,b)) loop((a,bTl) :: aTl, acc)
else loop(aTl, false::acc)
}
loop(in, List.empty[Boolean])
}
Here are missing pieces that should help you solve the rest of the problem:
Pattern matching a list
val l = List(2,3)
l match {
case Nil => "the list is empty"
case head :: Nil => "the list has one element"
case head :: tail => "the list has a head element and a tail of at least one element"
}
Pattern matching a tuple
val t = (75, "picard")
t match {
case (age, name) => s"$name is $age years old"
}
Pattern matching a list of tuples
val lt = List((75, "picard"))
lt match {
case Nil => "the list is empty"
case (age, name) :: Nil => "the list has one tuple"
case (age, name) :: tail => "the list has a head tuple and a tail of at least one more tuple"
}
Pattern matching a tuple of list of tuples
val lt = List((75, "picard"))
val ct = List((150, "Data"))
(lt, ct) match {
case (Nil, Nil) => "tuple of two empty lists"
case ((age, name) :: Nil, Nil) => "tuple of a list with one tuple and another empty list"
case (Nil, (age, name) :: Nil) => "tuple of an empty list and another list with one tuple"
case ((age, name) :: tail, Nil) => "tuple of a list with a head tuple and a tail of at least one more tuple, and another empty list"
case _ => "and so on"
}
Note how patterns can be composed.
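As a rough illustration of that composition, applied to the List[(A, List[B])] shape from the question, a single pattern can look into the outer list, the tuple, and the inner list at once. The function describe and its messages are invented for this sketch.

```scala
// Hypothetical example: every case destructures the outer list, the head tuple
// and the tuple's inner list in one pattern, with no ._1, ._2 or .head.
def describe(rows: List[(Int, List[Int])]): String = rows match {
  case Nil                                    => "no rows"
  case (key, Nil) :: _                        => s"first row $key has no values"
  case (key, first :: _) :: Nil               => s"single row $key starting with $first"
  case (key, first :: _) :: (nextKey, _) :: _ => s"row $key starts with $first, next row is $nextKey"
}

println(describe(List((1, List(1, 2, 3)), (2, List(2, 3)))))
// row 1 starts with 1, next row is 2
```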

Merging list of tuples in scala based on key

I have a list of tuples that looks like this:
Seq("ptxt"->"how","list"->"you doing","ptxt"->"whats up","ptxt"-> "this ","list"->"is ","list"->"cool")
Based on the keys, I want to merge each ptxt value with all the list values that come directly after it,
e.g.
create a new Seq that looks like this:
Seq("how you doing", "whats up", "this is cool")
You could fold your Seq with foldLeft:
val s = Seq("ptxt"->"how ","list"->"you doing","ptxt"->"whats up","ptxt"-> "this ","list"->"is ","list"->"cool")
val r: Seq[String] = s.foldLeft(List[String]()) {
case (xs, ("ptxt", s)) => s :: xs
case (x :: xs, ("list", s)) => (x + s) :: xs
}.reverse
If you don't care about the order, you can omit reverse.
foldLeft takes two arguments: the first is the initial value, and the second is a function of two arguments, the previous result and the current element of the sequence. The result of each call is then fed into the next call as its first argument.
For example, foldLeft over numbers with addition just computes the sum of all elements, starting from the left.
List(5, 4, 8, 6, 2).foldLeft(0) { (result, i) =>
result + i
} // 25
For our case, we start with an empty list. Then we provide a function that handles two cases using pattern matching.
The case when the key is "ptxt". Here we just prepend the value to the list.
case (xs, ("ptxt", s)) => s :: xs
Case when the key is "list". Here we take the first string from the list (using pattern matching) and then concatenate value to it, after that we put it back with the rest of the list.
case (x :: xs, ("list", s)) => (x + s) :: xs
At the end, since we were prepending elements, we need to reverse our list. Why prepend rather than append? Because appending to an immutable List is O(n) while prepending is O(1), so it's more efficient.
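To see how the accumulator evolves from call to call, scanLeft can be used in place of foldLeft: it applies the same function but keeps every intermediate result. The following is only an illustrative sketch on a shortened input; the catch-all case is an addition here so that the match is total.

```scala
val input = Seq("ptxt" -> "how ", "list" -> "you doing", "ptxt" -> "whats up")

// Same cases as in the foldLeft above, plus a fallback that leaves the
// accumulator unchanged (e.g. a "list" entry before any "ptxt").
val steps: Seq[List[String]] = input.scanLeft(List.empty[String]) {
  case (xs, ("ptxt", v))      => v :: xs
  case (x :: xs, ("list", v)) => (x + v) :: xs
  case (xs, _)                => xs
}

steps.foreach(println)
// List()
// List(how )
// List(how you doing)
// List(whats up, how you doing)
```

The last step is what foldLeft would return, still in reversed order.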
Here is another solution:
val data = Seq("ptxt"->"how","list"->"you doing","ptxt"->"whats", "list" -> "up","ptxt"-> "this ", "list"->"is cool")
First group Keys and Values:
val grouped = data.groupBy(_._1)
.map{case (k, l) => k -> l.map{case (_, v) => v.trim}}
// > Map(list -> List(you doing, up, is cool), ptxt -> List(how, whats, this))
Then zip and concatenate the two values:
grouped("ptxt").zip(grouped("list"))
.map{case (a, b) => s"$a $b"}
// > List(how you doing, whats up, this is cool)
Disclaimer: this only works if the list strictly alternates ptxt, list, ptxt, list, ... - I had to adjust the input data.
If you change Seq for List, you can solve that with a simple tail-recursive function.
(The code uses Scala 2.13, but can be rewritten to use older Scala versions if needed)
def mergeByKey[K](list: List[(K, String)]): List[String] = {
@annotation.tailrec
def loop(remaining: List[(K, String)], acc: Map[K, StringBuilder]): List[String] =
remaining match {
case Nil =>
acc.valuesIterator.map(_.result()).toList
case (key, value) :: tail =>
loop(
remaining = tail,
acc.updatedWith(key) {
case None => Some(new StringBuilder(value))
case Some(oldValue) => Some(oldValue.append(value))
}
)
}
loop(remaining = list, acc = Map.empty)
}
val data = List("ptxt"->"how","list"->"you doing","ptxt"->"whats up","ptxt"-> "this ","list"->"is ","list"->"cool")
mergeByKey(data)
// res: List[String] = List("howwhats upthis ", "you doingis cool")
Or a one-liner using groupMap
(inspired by pme's answer):
data.groupMap(_._1)(_._2).view.mapValues(_.mkString).valuesIterator.toList
Adding another answer since I don't have enough reputation points to comment. This is just an improvement on Krzysztof Atłasik's answer: to handle the case where the Seq starts with a "list" entry, you might want to add another case:
case (xs,("list", s)) if xs.isEmpty=>xs
So the final code could be something like:
val s = Seq("list"->"how ","list"->"you doing","ptxt"->"whats up","ptxt"-> "this ","list"->"is ","list"->"cool")
val r: Seq[String] = s.foldLeft(List[String]()) {
case (xs,("list", s)) if xs.isEmpty=>xs
case (xs, ("ptxt", s)) => s :: xs
case (x :: xs, ("list", s)) => (x + s) :: xs
}.reverse

Compress a Given Text of String in Scala

I have been trying to compress a String. Given a String like this:
AAABBCAADEEFF, I would need to compress it like 3A2B1C2A1D2E2F
I was able to come up with a tail recursive implementation:
@scala.annotation.tailrec
def compress(str: List[Char], current: Seq[Char], acc: Map[Int, String]): String = str match {
case Nil =>
if (current.nonEmpty)
s"${acc.values.mkString("")}${current.length}${current.head}"
else
s"${acc.values.mkString("")}"
case List(x) if current.contains(x) =>
val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length + 1}${current.head}")
compress(List.empty[Char], Seq.empty[Char], newMap)
case x :: xs if current.isEmpty =>
compress(xs, Seq(x), acc)
case x :: xs if !current.contains(x) =>
if (acc.nonEmpty) {
val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length}${current.head}")
compress(xs, Seq(x), newMap)
} else {
compress(xs, Seq(x), acc ++ Map(1 -> s"${current.length}${current.head}"))
}
case x :: xs =>
compress(xs, current :+ x, acc)
}
// Produces 2F3A2B1C2A instead of 3A2B1C2A1D2E2F
compress("AAABBCAADEEFF".toList, Seq.empty[Char], Map.empty[Int, String])
However, it fails for the given case, and I'm not sure which edge case I'm missing. Any help?
What I'm doing is going over the sequence of characters and collecting identical ones into a new sequence: as long as the next character of the input (the first parameter of compress) is found in current (the second parameter), I keep collecting it.
As soon as that is not the case, I empty the current sequence, count the collected elements and push them into the Map. It fails for some edge cases that I'm not able to make out.
I came up with this solution:
def compress(word: List[Char]): List[(Char, Int)] =
word.map((_, 1)).foldRight(Nil: List[(Char, Int)])((e, acc) =>
acc match {
case Nil => List(e)
case ((c, i)::rest) => if (c == e._1) (c, i + 1)::rest else e::acc
})
Basically, it's a map followed by a right fold.
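This version returns the runs as a List[(Char, Int)] rather than the string asked for in the question. A small rendering step, sketched here under the name render (an addition for this example, not part of the answer), would close that gap:

```scala
// compress is repeated from the answer above so the snippet stands alone.
def compress(word: List[Char]): List[(Char, Int)] =
  word.map((_, 1)).foldRight(Nil: List[(Char, Int)])((e, acc) =>
    acc match {
      case Nil              => List(e)
      case (c, i) :: rest   => if (c == e._1) (c, i + 1) :: rest else e :: acc
    })

// Hypothetical helper: turn each (char, count) run into "<count><char>".
def render(runs: List[(Char, Int)]): String =
  runs.map { case (c, n) => s"$n$c" }.mkString

println(render(compress("AAABBCAADEEFF".toList)))
// 3A2B1C2A1D2E2F
```

Because foldRight builds the result from the right, the runs already come out in the original order and no reverse is needed.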
Took inspiration from @nicodp's code:
def encode(word: String): String =
word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
acc match {
case Nil => (e, 1) :: Nil
case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
case xs => (e, 1) :: xs
}
}.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
First, our intermediate result is a List[(Char, Int)]: a list of tuples in which each char is accompanied by its count.
Now we go through the string one char at a time using foldLeft.
We accumulate the result in acc, and e represents the current element.
acc is of type List[(Char, Int)] and e is of type Char.
When we start, we are at the first char of the string and acc is the empty list, so we attach the first tuple, with count one, to the front of acc:
when acc is Nil, do (e, 1) :: Nil (equivalently (e, 1) :: acc, since acc is Nil).
From now on, the front of the list is the node we are interested in.
Moving to the second element, acc has one element: the first char with count one.
We compare the current element with the char at the front of the list.
If it matches, we increment the count and put (element, incrementedCount) at the front of the list in place of the old tuple.
If it does not match, we have a new element, so we attach it with count 1 to the front of the list, and so on.
Finally, we reverse the list and convert the List[(Char, Int)] to the required string representation.
Note: we use the front element of the list, which is accessible in O(1) (constant time), as a buffer, and increase its count whenever the same element is found again.
Scala REPL
scala> :paste
// Entering paste mode (ctrl-D to finish)
def encode(word: String): String =
word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
acc match {
case Nil => (e, 1) :: Nil
case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
case xs => (e, 1) :: xs
}
}.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
// Exiting paste mode, now interpreting.
encode: (word: String)String
scala> encode("AAABBCAADEEFF")
res0: String = 3A2B1C2A1D2E2F
A bit more concise, using backticks around e instead of a guard in the pattern match:
def encode(word: String): String =
word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
acc match {
case Nil => (e, 1) :: Nil
case ((`e`, lastCharCount) :: xs) => (e, lastCharCount + 1) :: xs
case xs => (e, 1) :: xs
}
}.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
Here's another, simpler approach based upon this answer:
class StringCompressinator {
def compress(raw: String): String = {
val split: Array[String] = raw.split("(?<=(.))(?!\\1)", 0) // creates array of the repeated chars as strings
val converted = split.map(group => {
val char = group.charAt(0) // take first char of group string
s"${group.length}${char}" // prefix the char with the group length: "AAA" becomes "3A"
})
converted.mkString("") // converted is again an array; mkString joins it into a single string
}
}
import org.scalatest.FunSuite
class StringCompressinatorTest extends FunSuite {
test("testCompress") {
val compress = (new StringCompressinator).compress(_)
val input = "AAABBCAADEEFF"
assert(compress(input) == "3A2B1C2A1D2E2F")
}
}
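The regular expression does the actual grouping: it matches the empty position after any character that is not followed by the same character, so split cuts the string exactly at the run boundaries. A quick way to check this without a test framework might look like the following sketch:

```scala
// (?<=(.)) captures the character just before the current position;
// (?!\1)   requires that the next character is not that same character.
val groups: Array[String] = "AAABBCAADEEFF".split("(?<=(.))(?!\\1)")

println(groups.mkString(","))
// AAA,BB,C,AA,D,EE,FF
```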
A similar idea with slight differences:
A case class for pattern matching the head, so we don't need an if guard; it also helps with printing the end result by overriding toString
A capital letter for the variable name when pattern matching (either that or backticks, I don't know which I like less :P)
case class Count(c : Char, cnt : Int){
override def toString = s"$cnt$c"
}
def compressor( counts : List[Count], C : Char ) = counts match {
case Count(C, cnt) :: tail => Count(C, cnt + 1) :: tail
case _ => Count(C, 1) :: counts
}
"AAABBCAADEEFF".foldLeft(List[Count]())(compressor).reverse.mkString
//"3A2B1C2A1D2E2F"

Scala - state while looping through a list

Newbie question.
I am looping through a list and need to keep state between the items.
For instance
val l = List("a", "1", "2", "3", "b", "4")
var state: String = ""
l.foreach(o => {
if (toInt(o).isEmpty) state = o else println(state + o.toString)
})
What's the alternative to using var here?
You should keep in mind that it's sometimes (read: when it makes the code more readable and maintainable by others) okay to use mutability when performing some operation that's easily expressed with mutable state as long as that mutable state is confined to as little of your program as possible. Using (e.g.) foldLeft to maintain an accumulator here without using a var doesn't gain you much.
That said, here's one way to go about doing this:
val listOfThings: Seq[Either[Char, Int]] = Seq(Left('a'), Right(11), Right(212), Left('b'), Right(89))
val result = listOfThings.foldLeft(Seq[(Char, Seq[Int])]()) {
case (accumulator, Left(nextChar)) => accumulator :+ (nextChar, Seq.empty)
case (accumulator, Right(nextInt)) =>
val (currentChar, currentSequence) = accumulator.last
accumulator.dropRight(1) :+ (currentChar, currentSequence :+ nextInt)
}
result foreach {
case (char, numbers) => println(numbers.map(num => s"$char-$num").mkString(" "))
}
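For comparison, the "confined mutability" option mentioned at the start of this answer could look roughly like the sketch below, applied to the question's data. The names prefixWithState and toInt are assumptions: the question calls toInt without showing its definition, so a plausible Option-returning version is supplied here.

```scala
import scala.util.Try

// Hypothetical helper; the question uses toInt but does not define it.
def toInt(s: String): Option[Int] = Try(s.toInt).toOption

// The var and the builder are local to this method, so callers only ever
// see an immutable List.
def prefixWithState(items: List[String]): List[String] = {
  var state = ""
  val out = List.newBuilder[String]
  items.foreach { o =>
    if (toInt(o).isEmpty) state = o // non-numeric item becomes the new state
    else out += state + o           // numeric item is emitted with the state
  }
  out.result()
}

println(prefixWithState(List("a", "1", "2", "3", "b", "4")))
// List(a1, a2, a3, b4)
```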
Use foldLeft:
l.foldLeft(""){ (state, o) =>
if(toInt(o).isEmpty) o
else {
println(state + o.toString)
state
}
}
Pass an arg:
scala> def collapse(header: String, vs: List[String]): Unit = vs match {
| case Nil =>
| case h :: t if h.forall(Character.isDigit) => println(s"$header$h") ; collapse(header, t)
| case h :: t => collapse(h, t)
| }
collapse: (header: String, vs: List[String])Unit
scala> collapse("", vs)
a1
a2
a3
b4
As simple as:
val list: List[Int] = List.range(1, 10) // Create list
def updateState(i : Int) : Int = i + 1 // Generate new state, just add one to each position. That will be the state
list.foldRight[List[(Int,Int)]](List())((a, b) => (a, updateState(a)) :: b)
Note that the result is a list of Tuple2: (Element, State), and each state depends on the element of the list.
Hope this helps
There are two major options to pass a state in functional programming when processing collections (I assume you want to get your result as a variable):
Recursion (classic)
import scala.collection.mutable.ListBuffer
val xs = List("a", "11", "212", "b", "89")
@annotation.tailrec
def fold(seq: ListBuffer[(String, ListBuffer[String])],
xs: Seq[String]): ListBuffer[(String, ListBuffer[String])] = {
(seq, xs) match {
case (_, Nil) =>
seq
case (_, c :: tail) if toInt(c).isEmpty =>
fold(seq :+ ((c, ListBuffer[String]())), tail)
case (init :+ ((c, seq)), i :: tail) =>
fold(init :+ ((c, seq :+ i)), tail)
}
}
val result =
fold(ListBuffer[(String, ListBuffer[String])](), xs)
// Get rid of mutable ListBuffer
.toSeq
.map {
case (c, seq) =>
(c, seq.toSeq)
}
//> List((a,List(11, 212)), (b,List(89)))
foldLeft et al.
val xs = List("a", "11", "212", "b", "89")
val result =
xs.foldLeft(
ListBuffer[(String, ListBuffer[String])]()
) {
case (seq, c) if toInt(c).isEmpty =>
seq :+ ((c, ListBuffer[String]()))
case (init :+ ((c, seq)), i) =>
init :+ ((c, seq :+ i))
}
// Get rid of mutable ListBuffer
.toSeq
.map {
case (c, seq) =>
(c, seq.toSeq)
}
//> List((a,List(11, 212)), (b,List(89)))
Which one is better? Unless you want to abort processing in the middle of the collection (as find does, for example), foldLeft is generally preferred: it has slightly less boilerplate, but otherwise the two are very similar.
I'm using ListBuffer here to avoid reversing lists.
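If you would rather avoid ListBuffer, the usual alternative is to prepend to immutable lists and reverse once at the end. A minimal sketch, assuming toInt returns an Option[Int] (it is not defined in the question) and using the made-up name groupByHeader:

```scala
import scala.util.Try

// Hypothetical helper; the question uses toInt but does not define it.
def toInt(s: String): Option[Int] = Try(s.toInt).toOption

def groupByHeader(xs: List[String]): List[(String, List[String])] =
  xs.foldLeft(List.empty[(String, List[String])]) {
    case (acc, c) if toInt(c).isEmpty => (c, Nil) :: acc            // new header
    case ((c, items) :: rest, i)      => (c, i :: items) :: rest    // add to current header
    case (Nil, _)                     => Nil                        // number before any header: ignored
  }.reverse.map { case (c, items) => (c, items.reverse) }

println(groupByHeader(List("a", "11", "212", "b", "89")))
// List((a,List(11, 212)), (b,List(89)))
```

Both the outer list and each inner list are built in reverse, which is why there are two reverse steps.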

n-way `span` on sequences

Given a sequence of elements and a predicate p, I would like to produce a sequence of sequences such that, in each subsequence, either all elements satisfy p or the sequence has length 1. Additionally, calling .flatten on the result should give me back my original sequence (so no re-ordering of elements).
For instance, given:
val l = List(2, 4, -6, 3, 1, 8, 7, 10, 0)
val p = (i : Int) => i % 2 == 0
I would like magic(l,p) to produce:
List(List(2, 4, -6), List(3), List(1), List(8), List(7), List(10, 0))
I know of .span, but that method stops the first time it encounters a value that doesn't satisfy p and just returns a pair.
Below is a candidate implementation. It does what I want, but, well, makes me want to cry. I would love for someone to come up with something slightly more idiomatic.
def magic[T](elems : Seq[T], p : T=>Boolean) : Seq[Seq[T]] = {
val loop = elems.foldLeft[(Boolean,Seq[Seq[T]])]((false,Seq.empty)) { (pr,e) =>
val (lastOK,s) = pr
if(lastOK && p(e)) {
(true, s.init :+ (s.last :+ e))
} else {
(p(e), s :+ Seq(e))
}
}
loop._2
}
(Note that I do not particularly care about preserving the actual type of the Seq.)
I would not use foldLeft. It's just a simple recursion of span with a special rule if the head doesn't match the predicate:
def magic[T](elems: Seq[T], p: T => Boolean): Seq[Seq[T]] =
elems match {
case Seq() => Seq()
case Seq(head, tail @ _*) if !p(head) => Seq(head) +: magic(tail, p)
case xs =>
val (prefix, rest) = xs span p
prefix +: magic(rest, p)
}
You could also do it tail-recursive, but you need to remember to reverse the output if you're prepending (as is sensible):
def magic[T](elems: Seq[T], p: T => Boolean): Seq[Seq[T]] = {
def iter(elems: Seq[T], out: Seq[Seq[T]]) : Seq[Seq[T]] =
elems match {
case Seq() => out.reverse
case Seq(head, tail @ _*) if !p(head) => iter(tail, Seq(head) +: out)
case xs =>
val (prefix, rest) = xs span p
iter(rest, prefix +: out)
}
iter(elems, Seq())
}
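The two requirements from the question (the expected grouping, and flatten giving back the input) can be checked against the tail-recursive version. This is just a quick sanity check on the question's own example, with the definition repeated (plus a @tailrec annotation) so the snippet stands alone.

```scala
import scala.annotation.tailrec

def magic[T](elems: Seq[T], p: T => Boolean): Seq[Seq[T]] = {
  @tailrec
  def iter(elems: Seq[T], out: Seq[Seq[T]]): Seq[Seq[T]] =
    elems match {
      case Seq() => out.reverse
      // head fails the predicate: it becomes a group of its own
      case Seq(head, tail @ _*) if !p(head) => iter(tail, Seq(head) +: out)
      // head passes: take the longest prefix that passes as one group
      case xs =>
        val (prefix, rest) = xs span p
        iter(rest, prefix +: out)
    }
  iter(elems, Seq())
}

val l = List(2, 4, -6, 3, 1, 8, 7, 10, 0)
val p = (i: Int) => i % 2 == 0

println(magic(l, p))
// List(List(2, 4, -6), List(3), List(1), List(8), List(7), List(10, 0))
println(magic(l, p).flatten == l)
// true
```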
For this task you can use takeWhile and drop combined with a little pattern matching and recursion:
def magic[T](elems : Seq[T], p : T=>Boolean) : Seq[Seq[T]] = {
def magic(elems: Seq[T], result: Seq[Seq[T]]): Seq[Seq[T]] = elems.takeWhile(p) match {
// if elems is Nil, we have a result
case Nil if elems.isEmpty => result
// if it's not, but we don't get any values from takeWhile, we take a single elem
case Nil => magic(elems.tail, result :+ Seq(elems.head))
// takeWhile gave us something, so we add it to the result
// and drop as many elements from elems, as takeWhile gave us
case xs => magic(elems.drop(xs.size), result :+ xs)
}
magic(elems, Seq())
}
Another solution using a fold:
def magicFilter[T](seq: Seq[T], p: T => Boolean): Seq[Seq[T]] = {
val (filtered, current) = (seq foldLeft (Seq[Seq[T]](), Seq[T]())) {
case ((filtered, current), element) if p(element) => (filtered, current :+ element)
case ((filtered, current), element) if !current.isEmpty => (filtered :+ current :+ Seq(element), Seq())
case ((filtered, current), element) => (filtered :+ Seq(element), Seq())
}
if (!current.isEmpty) filtered :+ current else filtered
}