Simple functionnal way for grouping successive elements? [duplicate] - scala

I'm trying to 'group' a string into segments, I guess this example would explain it more succintly
scala> val str: String = "aaaabbcddeeeeeeffg"
... (do something)
res0: List("aaaa","bb","c","dd","eeeee","ff","g")
I can thnk of a few ways to do this in an imperative style (with vars and stepping through the string to find groups) but I was wondering if any better functional solution could
be attained? I've been looking through the Scala API but there doesn't seem to be something that fits my needs.
Any help would be appreciated

You can split the string recursively with span:
def s(x : String) : List[String] = if(x.size == 0) Nil else {
val (l,r) = x.span(_ == x(0))
l :: s(r)
}
Tail recursive:
#annotation.tailrec def s(x : String, y : List[String] = Nil) : List[String] = {
if(x.size == 0) y.reverse
else {
val (l,r) = x.span(_ == x(0))
s(r, l :: y)
}
}

Seems that all other answers are very concentrated on collection operations. But pure string + regex solution is much simpler:
str split """(?<=(\w))(?!\1)""" toList
In this regex I use positive lookbehind and negative lookahead for the captured char

def group(s: String): List[String] = s match {
case "" => Nil
case s => s.takeWhile(_==s.head) :: group(s.dropWhile(_==s.head))
}
Edit: Tail recursive version:
def group(s: String, result: List[String] = Nil): List[String] = s match {
case "" => result reverse
case s => group(s.dropWhile(_==s.head), s.takeWhile(_==s.head) :: result)
}
can be used just like the other because the second parameter has a default value and thus doesnt have to be supplied.

Make it one-liner:
scala> val str = "aaaabbcddddeeeeefff"
str: java.lang.String = aaaabbcddddeeeeefff
scala> str.groupBy(identity).map(_._2)
res: scala.collection.immutable.Iterable[String] = List(eeeee, fff, aaaa, bb, c, dddd)
UPDATE:
As #Paul mentioned about the order here is updated version:
scala> str.groupBy(identity).toList.sortBy(_._1).map(_._2)
res: List[String] = List(aaaa, bb, c, dddd, eeeee, fff)

You could use some helper functions like this:
val str = "aaaabbcddddeeeeefff"
def zame(chars:List[Char]) = chars.partition(_==chars.head)
def q(chars:List[Char]):List[List[Char]] = chars match {
case Nil => Nil
case rest =>
val (thesame,others) = zame(rest)
thesame :: q(others)
}
q(str.toList) map (_.mkString)
This should do the trick, right? No doubt it can be cleaned up into one-liners even further

A functional* solution using fold:
def group(s : String) : Seq[String] = {
s.tail.foldLeft(Seq(s.head.toString)) { case (carry, elem) =>
if ( carry.last(0) == elem ) {
carry.init :+ (carry.last + elem)
}
else {
carry :+ elem.toString
}
}
}
There is a lot of cost hidden in all those sequence operations performed on strings (via implicit conversion). I guess the real complexity heavily depends on the kind of Seq strings are converted to.
(*) Afaik all/most operations in the collection library depend in iterators, an imho inherently unfunctional concept. But the code looks functional, at least.

Starting Scala 2.13, List is now provided with the unfold builder which can be combined with String::span:
List.unfold("aaaabbaaacdeeffg") {
case "" => None
case rest => Some(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
or alternatively, coupled with Scala 2.13's Option#unless builder:
List.unfold("aaaabbaaacdeeffg") {
rest => Option.unless(rest.isEmpty)(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
Details:
Unfold (def unfold[A, S](init: S)(f: (S) => Option[(A, S)]): List[A]) is based on an internal state (init) which is initialized in our case with "aaaabbaaacdeeffg".
For each iteration, we span (def span(p: (Char) => Boolean): (String, String)) this internal state in order to find the prefix containing the same symbol and produce a (String, String) tuple which contains the prefix and the rest of the string. span is very fortunate in this context as it produces exactly what unfold expects: a tuple containing the next element of the list and the new internal state.
The unfolding stops when the internal state is "" in which case we produce None as expected by unfold to exit.

Edit: Have to read more carefully. Below is no functional code.
Sometimes, a little mutable state helps:
def group(s : String) = {
var tmp = ""
val b = Seq.newBuilder[String]
s.foreach { c =>
if ( tmp != "" && tmp.head != c ) {
b += tmp
tmp = ""
}
tmp += c
}
b += tmp
b.result
}
Runtime O(n) (if segments have at most constant length) and tmp.+= probably creates the most overhead. Use a string builder instead for strict runtime in O(n).
group("aaaabbcddeeeeeeffg")
> Seq[String] = List(aaaa, bb, c, dd, eeeeee, ff, g)

If you want to use scala API you can use the built in function for that:
str.groupBy(c => c).values
Or if you mind it being sorted and in a list:
str.groupBy(c => c).values.toList.sorted

Related

Conditional concatenation of iterator elements - A Scala idiomatic solution

I have an Iterator of Strings and would like to concatenate each element preceding one that matches a predicate, e.g. for an Iterator of
Iterator("a", "b", "c break", "d break", "e")
and a predicate of
!line.endsWith("break")
I would like to print out
(Group: 0): a-b-c break
(Group: 1): d break
(Group: 2): e
(without needing to hold in memory more than a single group at a time)
I know I can achieve this with an iterator like below, but there has to be a more "Scala" way of writing this, right?
import scala.collection.mutable.ListBuffer
object IteratingAndAccumulating extends App {
class AccumulatingIterator(lines: Iterator[String])extends Iterator[ListBuffer[String]] {
override def hasNext: Boolean = lines.hasNext
override def next(): ListBuffer[String] = getNextLine(lines, new ListBuffer[String])
def getNextLine(lines: Iterator[String], accumulator: ListBuffer[String]): ListBuffer[String] = {
val line = lines.next
accumulator += line
if (line.endsWith("break") || !lines.hasNext) accumulator
else getNextLine(lines, accumulator)
}
}
new AccumulatingIterator(Iterator("a", "b", "c break", "d break", "e"))
.map(_.mkString("-")).zipWithIndex.foreach{
case (conc, i) =>
println(s"(Group: $i): $conc")
}
}
many thanks,
Fil
Here is a simple solution if you don't mind loading the entire contents into memory at once:
val lines: List[List[String]] = it.foldLeft(List(List.empty[String])) {
case (head::tail, x) if predicate(x) => Nil :: (x::head) :: tail
case (head::tail, x) => (x::head ) :: tail
}.dropWhile(_.isEmpty).map(_.reverse).reverse
If you would rather iterate through the strings and groups one-by-one, it gets a little bit more involved:
// first "instrument" the iterator, by "demarcating" group boundaries with None:
val instrumented: Iterator[Option[String]] = it.flatMap {
case x if predicate(x) => Seq(Some(x), None)
case x => Seq(Some(x))
}
// And now, wrap it around into another iterator, constructing groups:
val lines: Iterator[Iterator[String]] = Iterator.continually {
instrumented.takeWhile(_.nonEmpty).flatten
}.takeWhile(_.nonEmpty)

Combining multiple Lists of arbitrary length

I am looking for an approach to join multiple Lists in the following manner:
ListA a b c
ListB 1 2 3 4
ListC + # * § %
..
..
..
Resulting List: a 1 + b 2 # c 3 * 4 § %
In Words: The elements in sequential order, starting at first list combined into the resulting list. An arbitrary amount of input lists could be there varying in length.
I used multiple approaches with variants of zip, sliding iterators but none worked and especially took care of varying list lengths. There has to be an elegant way in scala ;)
val lists = List(ListA, ListB, ListC)
lists.flatMap(_.zipWithIndex).sortBy(_._2).map(_._1)
It's pretty self-explanatory. It just zips each value with its position on its respective list, sorts by index, then pulls the values back out.
Here's how I would do it:
class ListTests extends FunSuite {
test("The three lists from his example") {
val l1 = List("a", "b", "c")
val l2 = List(1, 2, 3, 4)
val l3 = List("+", "#", "*", "§", "%")
// All lists together
val l = List(l1, l2, l3)
// Max length of a list (to pad the shorter ones)
val maxLen = l.map(_.size).max
// Wrap the elements in Option and pad with None
val padded = l.map { list => list.map(Some(_)) ++ Stream.continually(None).take(maxLen - list.size) }
// Transpose
val trans = padded.transpose
// Flatten the lists then flatten the options
val result = trans.flatten.flatten
// Viola
assert(List("a", 1, "+", "b", 2, "#", "c", 3, "*", 4, "§", "%") === result)
}
}
Here's an imperative solution if efficiency is paramount:
def combine[T](xss: List[List[T]]): List[T] = {
val b = List.newBuilder[T]
var its = xss.map(_.iterator)
while (!its.isEmpty) {
its = its.filter(_.hasNext)
its.foreach(b += _.next)
}
b.result
}
You can use padTo, transpose, and flatten to good effect here:
lists.map(_.map(Some(_)).padTo(lists.map(_.length).max, None)).transpose.flatten.flatten
Here's a small recursive solution.
def flatList(lists: List[List[Any]]) = {
def loop(output: List[Any], xss: List[List[Any]]): List[Any] = (xss collect { case x :: xs => x }) match {
case Nil => output
case heads => loop(output ::: heads, xss.collect({ case x :: xs => xs }))
}
loop(List[Any](), lists)
}
And here is a simple streams approach which can cope with an arbitrary sequence of sequences, each of potentially infinite length.
def flatSeqs[A](ssa: Seq[Seq[A]]): Stream[A] = {
def seqs(xss: Seq[Seq[A]]): Stream[Seq[A]] = xss collect { case xs if !xs.isEmpty => xs } match {
case Nil => Stream.empty
case heads => heads #:: seqs(xss collect { case xs if !xs.isEmpty => xs.tail })
}
seqs(ssa).flatten
}
Here's something short but not exceedingly efficient:
def heads[A](xss: List[List[A]]) = xss.map(_.splitAt(1)).unzip
def interleave[A](xss: List[List[A]]) = Iterator.
iterate(heads(xss)){ case (_, tails) => heads(tails) }.
map(_._1.flatten).
takeWhile(! _.isEmpty).
flatten.toList
Here's a recursive solution that's O(n). The accepted solution (using sort) is O(nlog(n)). Some testing I've done suggests the second solution using transpose is also O(nlog(n)) due to the implementation of transpose. The use of reverse below looks suspicious (since it's an O(n) operation itself) but convince yourself that it either can't be called too often or on too-large lists.
def intercalate[T](lists: List[List[T]]) : List[T] = {
def intercalateHelper(newLists: List[List[T]], oldLists: List[List[T]], merged: List[T]): List[T] = {
(newLists, oldLists) match {
case (Nil, Nil) => merged
case (Nil, zss) => intercalateHelper(zss.reverse, Nil, merged)
case (Nil::xss, zss) => intercalateHelper(xss, zss, merged)
case ( (y::ys)::xss, zss) => intercalateHelper(xss, ys::zss, y::merged)
}
}
intercalateHelper(lists, List.empty, List.empty).reverse
}

Scala: Print separators when using output stream

When we need an array of strings to be concatenated, we can use mkString method:
val concatenatedString = listOfString.mkString
However, when we have a very long list of string, getting concatenated string may not be a good choice. In this case, It would be more appropriated to print out to an output stream directly, Writing it to output stream is simple:
listOfString.foreach(outstream.write _)
However, I don't know a neat way to append separators. One thing I tried is looping with an index:
var i = 0
for(str <- listOfString) {
if(i != 0) outstream.write ", "
outstream.write str
i += 1
}
This works, but it is too wordy. Although I can make a function encapsules the code above, I want to know whether Scala API already has a function do the same thing or not.
Thank you.
Here is a function that do what you want in a bit more elegant way:
def commaSeparated(list: List[String]): Unit = list match {
case List() =>
case List(a) => print(a)
case h::t => print(h + ", ")
commaSeparated(t)
}
The recursion avoids mutable variables.
To make it even more functional style, you can pass in the function that you want to use on each item, that is:
def commaSeparated(list: List[String], func: String=>Unit): Unit = list match {
case List() =>
case List(a) => func(a)
case h::t => func(h + ", ")
commaSeparated(t, func)
}
And then call it by:
commaSeparated(mylist, oustream.write _)
I believe what you want is the overloaded definitions of mkString.
Definitions of mkString:
scala> val strList = List("hello", "world", "this", "is", "bob")
strList: List[String] = List(hello, world, this, is, bob)
def mkString: String
scala> strList.mkString
res0: String = helloworldthisisbob
def mkString(sep: String): String
scala> strList.mkString(", ")
res1: String = hello, world, this, is, bob
def mkString(start: String, sep: String, end: String): String
scala> strList.mkString("START", ", ", "END")
res2: String = STARThello, world, this, is, bobEND
EDIT
How about this?
scala> strList.view.map(_ + ", ").foreach(print) // or .iterator.map
hello, world, this, is, bob,
Not good for parallelized code, but otherwise:
val it = listOfString.iterator
it.foreach{x => print(x); if (it.hasNext) print(' ')}
Here's another approach which avoids the var
listOfString.zipWithIndex.foreach{ case (s, i) =>
if (i != 0) outstream write ","
outstream write s }
Self Answer:
I wrote a function encapsulates the code in the original question:
implicit def withSeparator[S >: String](seq: Seq[S]) = new {
def withSeparator(write: S => Any, sep: String = ",") = {
var i = 0
for (str <- seq) {
if (i != 0) write(sep)
write(str)
i += 1
}
seq
}
}
You can use it like this:
listOfString.withSeparator(print _)
The separator can also be assigned:
listOfString.withSeparator(print _, ",\n")
Thank you for everyone answered me. What I wanted to use is a concise and not too slow representation. The implicit function withSeparator looks like the thing I wanted. So I accept my own answer for this question. Thank you again.

How to check if a character is contained in string?

I want to check if the string contains the character. I am writing a hangman code.
For example, here is the word to guess: "scala", but it looks like "_ _ _ _ _" tho the user. Let's assume that user inputs letter 'a', then it must look like "_ _ a _ a".
def checkGuess(){
if (result.contains(user_input)) {
val comp = result.toCharArray
for (i <- comp){
if (user_input != comp(i))
comp(i) = '_'
comp(i)
}
val str = comp.toString
}
}
Is this right?
Thank you in advance.
I don't think this is homework, so I'll probably regret answering if it is...
case class HangmanGame(goal: String, guesses: Set[Char] = Set.empty[Char]) {
override def toString = goal map {c => if (guesses contains c) c else '_'} mkString " "
val isComplete = goal forall { guesses.contains }
def withGuess(c: Char) = copy(guesses = guesses + c)
}
Then
val h = HangmanGame("scala")
h: HangmanGame = _ _ _ _ _
scala> val h1 = h.withGuess('a')
h1: HangmanGame = _ _ a _ a
scala> val h2 = h1.withGuess('l')
h2: HangmanGame = _ _ a l a
scala> val h3 = h2.withGuess('s')
h3: HangmanGame = s _ a l a
scala> val h4 = h3.withGuess('c')
h4: HangmanGame = s c a l a
scala> h4.isComplete
res5: Boolean = true
UPDATE
Okay, so it does look like homework. I guess the genie's out of the bottle now, but unless you get up to speed on Scala very quickly you're going to have a really hard time explaining how it works.
How about:
scala> def checkGuess(str: String, c: Char) = str.replaceAll("[^"+c+"]","_")
checkGuess: (str: String,c: Char)java.lang.String
scala> checkGuess("scala",'a')
res1: java.lang.String = __a_a
scala> def checkGuess2(str: String, C: Char) = str map { case C => C; case _ => '_'}
checkGuess2: (str: String,C: Char)String
scala> checkGuess2("scala",'a')
res2: String = __a_a
Here are some comments about how you wrote this. When using this syntax, def checkGuess() { ... }, the function will not return any value, it will return Unit instead.
This means that you're using it for its side effect only (such as setting some var outside the code block or printing some values). The issue is that you are not setting any value or printing anything inside the function (no printing, no assignment).
What you don't show in your code snippet is where you store the string to guess, the user input and the feedback to print. You can pass the first two as arguments and the last one as a returned value. This make the input and output self contained in the function and does not presume where you render the feedback.
def feedback(target:String, guesses:String): String = {
// target is the string to guess like "scala"
// guesses are the letters that have been provided so far, like "ac"
// last expression should be the feedback to print for instance "_ca_a"
}
Then you can think about the function as transforming each letter in target with _ or with itself depending on whether it is contained in guesses. For this the target map { c => expr } would work pretty well if you figure out how to make expr return c if c is in guesses and '_' otherwise.
Staying as close as possible to the main question ( How to check if a character is contained in string? ) what I did was changing the approach, i.e.:
Inside a for loop, I wanted to do something like some_chr == 't'
and I did the following some_chr.toString == "t" and it worked just fine.

What is Scala way of finding whether all the elements of an Array has same length?

I am new to Scala and but very old to Java and had some understanding working with FP languages like "Haskell".
Here I am wondering how to implement this using Scala. There is a list of elements in an array all of them are strings and I just want to know if there is a way I can do this in Scala in a FP way. Here is my current version which works...
def checkLength(vals: Array[String]): Boolean = {
var len = -1
for(x <- conts){
if(len < 0)
len = x.length()
else{
if (x.length() != len)
return false
else
len = x.length()
}
}
return true;
}
And I am pretty sure there is a better way of doing this in Scala/FP...
list.forall( str => str.size == list(0).size )
Edit: Here's a definition that's as general as possilbe and also allows to check whether a property other than length is the same for all elements:
def allElementsTheSame[T,U](f: T => U)(list: Seq[T]) = {
val first: Option[U] = list.headOption.map( f(_) )
list.forall( f(_) == first.get ) //safe to use get here!
}
type HasSize = { val size: Int }
val checkLength = allElementsTheSame((x: HasSize) => x.size)_
checkLength(Array( "123", "456") )
checkLength(List( List(1,2), List(3,4) ))
Since everyone seems to be so creative, I'll be creative too. :-)
def checkLength(vals: Array[String]): Boolean = vals.map(_.length).removeDuplicates.size <= 1
Mind you, removeDuplicates will likely be named distinct on Scala 2.8.
Tip: Use forall to determine whether all elements in the collection do satisfy a certain predicate (e.g. equality of length).
If you know that your lists are always non-empty, a straight forall works well. If you don't, it's easy to add that in:
list match {
case x :: rest => rest forall (_.size == x.size)
case _ => true
}
Now lists of length zero return true instead of throwing exceptions.
list.groupBy{_.length}.size == 1
You convert the list into a map of groups of equal length strings. If all the strings have the same length, then the map will hold only one such group.
The nice thing with this solution is that you don't need to know anything about the length of the strings, and don't need to comapre them to, say, the first string. It works well on an empty string, in which case it returns false (if that's what you want..)
Here's another approach:
def check(list:List[String]) = list.foldLeft(true)(_ && list.head.length == _.length)
Just my €0.02
def allElementsEval[T, U](f: T => U)(xs: Iterable[T]) =
if (xs.isEmpty) true
else {
val first = f(xs.head)
xs forall { f(_) == first }
}
This works with any Iterable, evaluates f the minimum number of times possible, and while the block can't be curried, the type inferencer can infer the block parameter type.
"allElementsEval" should "return true for an empty Iterable" in {
allElementsEval(List[String]()){ x => x.size } should be (true)
}
it should "eval the function at each item" in {
allElementsEval(List("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(List("aa", "bb", "ccc")) { x => x.size } should be (false)
}
it should "work on Vector and Array as well" in {
allElementsEval(Vector("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(Vector("aa", "bb", "ccc")) { x => x.size } should be (false)
allElementsEval(Array("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(Array("aa", "bb", "ccc")) { x => x.size } should be (false)
}
It's just a shame that head :: tail pattern matching fails so insidiously for Iterables.