Can I count occurrence in 1 loop in this Scala fragment? - scala

I have a simple piece of Scala code. I loop sequentially through a List of Strings, and I want to count the occurrence of each String which I collect as tuples (String, Int) in the list r. The part in the main function should remain (so no groupBy or something). My question is about the update function:
right now I do a find first, and then add a new tuple to r if it doesn't exist. If it does exist, I loop through r and update the counter for the matching String.
Can the update function be modified so that it is more efficient? Can r be updated in a single iteration (adding if it doesn't exist, updating the counter if it does exist)?
Thanks
var r = List[(String, Int)]() // (string, count)
def update(s: String, l: List[(String, Int)]) : List[(String, Int)] = {
if (r.find(a => a._1 == s) == None) {
(s, 1) :: r // add a new item if it does not exist
} else {
for (b <- l) yield {
if (b._1 == s) {
(b._1, b._2 + 1) // update counter if exists
} else {
b // just yield if no match
}
}
}
}
def main(args : Array[String]) : Unit = {
val l = "A" :: "B" :: "A" :: "C" :: "A" :: "B" :: Nil
for (s <- l) r = update(s, r)
r foreach println
}

I suggest you go for the functional style and use the power of Scala's collections:
ss.groupBy(identity).mapValues(_.size)

If you don't want to use groupBy or work with lazy collections (Streams), this could be the way to go:
ss.foldLeft(Map[String, Int]())((m, s) => m + (s -> (m.getOrElse(s, 0) + 1)))

Something like this also works:
val l = List("A","B","A","C","A","B")
l.foldLeft(Map[String,Int]()) {
case (a: Map[String, Int], s: String) => {
a + (s -> (1 + a.getOrElse(s, 0)))
}
}
res3: scala.collection.immutable.Map[String,Int] = Map((A,3), (B,2), (C,1))

Your present solution is horribly slow, mainly because of the choice of r as a List. But your iteration throughout the whole list in case of update can be improved on, at least. I'd write it like this
def update(s: String, l: List[(String, Int)]) : List[(String, Int)] = {
l span (_._1 != s) match {
case (before, (`s`, count) :: after) => before ::: (s, count + 1) :: after
case _ => (s, 1) :: l
}
}
Using span avoids having to search the list twice. Also, we just append after, without having to iterate through it again.

Related

Compress a Given Text of String in Scala

I have been trying to compress a String. Given a String like this:
AAABBCAADEEFF, I would need to compress it like 3A2B1C2A1D2E2F
I was able to come up with a tail recursive implementation:
#scala.annotation.tailrec
def compress(str: List[Char], current: Seq[Char], acc: Map[Int, String]): String = str match {
case Nil =>
if (current.nonEmpty)
s"${acc.values.mkString("")}${current.length}${current.head}"
else
s"${acc.values.mkString("")}"
case List(x) if current.contains(x) =>
val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length + 1}${current.head}")
compress(List.empty[Char], Seq.empty[Char], newMap)
case x :: xs if current.isEmpty =>
compress(xs, Seq(x), acc)
case x :: xs if !current.contains(x) =>
if (acc.nonEmpty) {
val newMap = acc ++ Map(acc.keys.toList.last + 1 -> s"${current.length}${current.head}")
compress(xs, Seq(x), newMap)
} else {
compress(xs, Seq(x), acc ++ Map(1 -> s"${current.length}${current.head}"))
}
case x :: xs =>
compress(xs, current :+ x, acc)
}
// Produces 2F3A2B1C2A instead of 3A2B1C2A1D2E2F
compress("AAABBCAADEEFF".toList, Seq.empty[Char], Map.empty[Int, String])
It fails however for the given case! Not sure what edge scenario I'm missing! Any help?
So what I'm actually doing is, going over the sequence of characters, collecting identical ones into a new Sequence and as long as the new character in the original String input (the first param in the compress method) is found in the current (the second parameter in the compress method), I keep collecting it.
As soon as it is not the case, I empty the current sequence, count and push the collected elements into the Map! It fails for some edge cases that I'm not able to make out!
I came up with this solution:
def compress(word: List[Char]): List[(Char, Int)] =
word.map((_, 1)).foldRight(Nil: List[(Char, Int)])((e, acc) =>
acc match {
case Nil => List(e)
case ((c, i)::rest) => if (c == e._1) (c, i + 1)::rest else e::acc
})
Basically, it's a map followed by a right fold.
Took inspiration from the #nicodp code
def encode(word: String): String =
word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
acc match {
case Nil => (e, 1) :: Nil
case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
case xs => (e, 1) :: xs
}
}.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
First our intermediate result will be List[(Char, Int)]. List of tuples of chars each char will be accompanied by its count.
Now lets start going through the list one char at once using the Great! foldLeft
We will accumulate the result in the acc variable and e represents the current element.
acc is of type List[(Char, Int)] and e is of type Char
Now when we start, we are at first char of the list. Right now the acc is empty list. So, we attach first tuple to the front of the list acc
with count one.
when acc is Nil do (e, 1) :: Nil or (e, 1) :: acc note: acc is Nil
Now front of the list is the node we are interested in.
Lets go to the second element. Now acc has one element which is the first element with count one.
Now, we compare the current element with the front element of the list
if it matches, increment the count and put the (element, incrementedCount) in the front of the list in place of old tuple.
if current element does not match the last element, that means we have
new element. So, we attach new element with count 1 to the front of the list and so on.
then to convert the List[(Char, Int)] to required string representation.
Note: We are using front element of the list which is accessible in O(1) (constant time complexity) has buffer and increasing the count in case same element is found.
Scala REPL
scala> :paste
// Entering paste mode (ctrl-D to finish)
def encode(word: String): String =
word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
acc match {
case Nil => (e, 1) :: Nil
case ((lastChar, lastCharCount) :: xs) if lastChar == e => (lastChar, lastCharCount + 1) :: xs
case xs => (e, 1) :: xs
}
}.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
// Exiting paste mode, now interpreting.
encode: (word: String)String
scala> encode("AAABBCAADEEFF")
res0: String = 3A2B1C2A1D2E2F
Bit more concise with back ticks e instead of guard in pattern matching
def encode(word: String): String =
word.foldLeft(List.empty[(Char, Int)]) { (acc, e) =>
acc match {
case Nil => (e, 1) :: Nil
case ((`e`, lastCharCount) :: xs) => (e, lastCharCount + 1) :: xs
case xs => (e, 1) :: xs
}
}.reverse.map { case (a, num) => s"$num$a" }.foldLeft("")(_ ++ _)
Here's another more simplified approach based upon this answer:
class StringCompressinator {
def compress(raw: String): String = {
val split: Array[String] = raw.split("(?<=(.))(?!\\1)", 0) // creates array of the repeated chars as strings
val converted = split.map(group => {
val char = group.charAt(0) // take first char of group string
s"${group.length}${char}" // use the length as counter and prefix the return string "AAA" becomes "3A"
})
converted.mkString("") // converted is again array, join turn it into a string
}
}
import org.scalatest.FunSuite
class StringCompressinatorTest extends FunSuite {
test("testCompress") {
val compress = (new StringCompressinator).compress(_)
val input = "AAABBCAADEEFF"
assert(compress(input) == "3A2B1C2A1D2E2F")
}
}
Similar idea with slight difference :
Case class for pattern matching the head so we don't need to use if and it also helps on printing end result by overriding toString
Using capital letter for variable name when pattern matching (either that or back ticks, I don't know which I like less :P)
case class Count(c : Char, cnt : Int){
override def toString = s"$cnt$c"
}
def compressor( counts : List[Count], C : Char ) = counts match {
case Count(C, cnt) :: tail => Count(C, cnt + 1) :: tail
case _ => Count(C, 1) :: counts
}
"AAABBCAADEEFF".foldLeft(List[Count]())(compressor).reverse.mkString
//"3A2B1C2A1D2E2F"

Group List elements with a distance less than x

I'm trying to figure out a way to group all the objects in a list depending on an x distance between the elements.
For instance, if distance is 1 then
List(2,3,1,6,10,7,11,12,14)
would give
List(List(1,2,3), List(6,7), List(10,11,12), List(14))
I can only come up with tricky approaches and loops but I guess there must be a cleaner solution.
You may try to sort your list and then use a foldLeft on it. Basically something like that
def sort = {
val l = List(2,3,1,6,10,7,11,12,14)
val dist = 1
l.sorted.foldLeft(List(List.empty[Int]))((list, n) => {
val last = list.head
last match {
case h::q if Math.abs(last.head-n) > dist=> List(n) :: list
case _ => (n :: last ) :: list.tail
}
}
)
}
The result seems to be okay but reversed. Call "reverse" if needed, when needed, on the lists. the code becomes
val l = List(2,3,1,6,10,7,11,12,14)
val dist = 1
val res = l.sorted.foldLeft(List(List.empty[Int]))((list, n) => {
val last = list.head
last match {
case h::q if Math.abs(last.head-n) > dist=> List(n) :: (last.reverse :: list.tail)
case _ => (n :: last ) :: list.tail
}
}
).reverse
The cleanest answer would rely upon a method that probably should be called groupedWhile which would split exactly where a condition was true. If you had this method, then it would just be
def byDist(xs: List[Int], d: Int) = groupedWhile(xs.sorted)((l,r) => r - l <= d)
But we don't have groupedWhile.
So let's make one:
def groupedWhile[A](xs: List[A])(p: (A,A) => Boolean): List[List[A]] = {
val yss = List.newBuilder[List[A]]
val ys = List.newBuilder[A]
(xs.take(1) ::: xs, xs).zipped.foreach{ (l,r) =>
if (!p(l,r)) {
yss += ys.result
ys.clear
}
ys += r
}
ys.result match {
case Nil =>
case zs => yss += zs
}
yss.result.dropWhile(_.isEmpty)
}
Now that you have the generic capability, you can get the specific one easily.

Cartesian product stream scala

I had a simple task to find combination which occurs most often when we drop 4 cubic dices an remove one with least points.
So, the question is: are there any Scala core classes to generate streams of cartesian products in Scala? When not - how to implement it in the most simple and effective way?
Here is the code and comparison with naive implementation in Scala:
object D extends App {
def dropLowest(a: List[Int]) = {
a diff List(a.min)
}
def cartesian(to: Int, times: Int): Stream[List[Int]] = {
def stream(x: List[Int]): Stream[List[Int]] = {
if (hasNext(x)) x #:: stream(next(x)) else Stream(x)
}
def hasNext(x: List[Int]) = x.exists(n => n < to)
def next(x: List[Int]) = {
def add(current: List[Int]): List[Int] = {
if (current.head == to) 1 :: add(current.tail) else current.head + 1 :: current.tail // here is a possible bug when we get maximal value, don't reuse this method
}
add(x.reverse).reverse
}
stream(Range(0, times).map(t => 1).toList)
}
def getResult(list: Stream[List[Int]]) = {
list.map(t => dropLowest(t).sum).groupBy(t => t).map(t => (t._1, t._2.size)).toMap
}
val list1 = cartesian(6, 4)
val list = for (i <- Range(1, 7); j <- Range(1,7); k <- Range(1, 7); l <- Range(1, 7)) yield List(i, j, k, l)
println(getResult(list1))
println(getResult(list.toStream) equals getResult(list1))
}
Thanks in advance
I think you can simplify your code by using flatMap :
val stream = (1 to 6).toStream
def cartesian(times: Int): Stream[Seq[Int]] = {
if (times == 0) {
Stream(Seq())
} else {
stream.flatMap { i => cartesian(times - 1).map(i +: _) }
}
}
Maybe a little bit more efficient (memory-wise) would be using Iterators instead:
val pool = (1 to 6)
def cartesian(times: Int): Iterator[Seq[Int]] = {
if (times == 0) {
Iterator(Seq())
} else {
pool.iterator.flatMap { i => cartesian(times - 1).map(i +: _) }
}
}
or even more concise by replacing the recursive calls by a fold :
def cartesian[A](list: Seq[Seq[A]]): Iterator[Seq[A]] =
list.foldLeft(Iterator(Seq[A]())) {
case (acc, l) => acc.flatMap(i => l.map(_ +: i))
}
and then:
cartesian(Seq.fill(4)(1 to 6)).map(dropLowest).toSeq.groupBy(i => i.sorted).mapValues(_.size).toSeq.sortBy(_._2).foreach(println)
(Note that you cannot use groupBy on Iterators, so Streams or even Lists are the way to go whatever to be; above code still valid since toSeq on an Iterator actually returns a lazy Stream).
If you are considering stats on the sums of dice instead of combinations, you can update the dropLowest fonction :
def dropLowest(l: Seq[Int]) = l.sum - l.min

How to properly use fold left in a scala function

I've a problem with this part of code in scala
object Test12 {
def times(chars: List[Char]): List[(Char, Int)] = {
val sortedChars = chars.sorted
sortedChars.foldLeft (List[(Char, Int)]()) ((l, e) =>
if(l.head._1 == e){
(e, l.head._2 + 1) :: l.tail
} else {
(e, 1) :: l
} )
}
val s = List('a', 'b')
val c = times s
}
The last line give an error :
Missing arguments for method times; follow this method with `_' if you
want to treat it as a partially applied function
But I don't see why, because I've given 2 arguments to the last function - foldLeft.
Thanks in advance for help!
The idea of code is to count how much time each character is present in a given list
The syntax of times is fine, but you need to use parenthesis when calling it, i.e. :
val c = times(s)
But it won't work because you use l.head without checking if l is Nil, and an empty list does not have a head. You can e.g. check with match for that:
def times(chars: List[Char]): List[(Char, Int)] = {
val sortedChars = chars.sorted
sortedChars.foldLeft (List[(Char, Int)]()) ((a,b) => (a,b) match {
case (Nil, e) => (e, 1) :: Nil
case ((e, count) :: l, f) =>
if (e == f) (e, count + 1) :: l
else (f, 1) :: (e, count) :: l
})
}
Although an easier way is to use the higher level collection functions:
def times(chars: List[Char]) = chars.groupBy(c=>c).map(x=>(x._1,x._2.length)).toList
val c = times s
You can't call method without brackets like this. Try times(s) or this times s.

Lazy Cartesian product of several Seqs in Scala

I implemented a simple method to generate Cartesian product on several Seqs like this:
object RichSeq {
implicit def toRichSeq[T](s: Seq[T]) = new RichSeq[T](s)
}
class RichSeq[T](s: Seq[T]) {
import RichSeq._
def cartesian(ss: Seq[Seq[T]]): Seq[Seq[T]] = {
ss.toList match {
case Nil => Seq(s)
case s2 :: Nil => {
for (e <- s) yield s2.map(e2 => Seq(e, e2))
}.flatten
case s2 :: tail => {
for (e <- s) yield s2.cartesian(tail).map(seq => e +: seq)
}.flatten
}
}
}
Obviously, this one is really slow, as it calculates the whole product at once. Did anyone implement a lazy solution for this problem in Scala?
UPD
OK, So I implemented a reeeeally stupid, but working version of an iterator over a Cartesian product. Posting here for future enthusiasts:
object RichSeq {
implicit def toRichSeq[T](s: Seq[T]) = new RichSeq(s)
}
class RichSeq[T](s: Seq[T]) {
def lazyCartesian(ss: Seq[Seq[T]]): Iterator[Seq[T]] = new Iterator[Seq[T]] {
private[this] val seqs = s +: ss
private[this] var indexes = Array.fill(seqs.length)(0)
private[this] val counts = Vector(seqs.map(_.length - 1): _*)
private[this] var current = 0
def next(): Seq[T] = {
val buffer = ArrayBuffer.empty[T]
if (current != 0) {
throw new NoSuchElementException("no more elements to traverse")
}
val newIndexes = ArrayBuffer.empty[Int]
var inside = 0
for ((index, i) <- indexes.zipWithIndex) {
buffer.append(seqs(i)(index))
newIndexes.append(index)
if ((0 to i).forall(ind => newIndexes(ind) == counts(ind))) {
inside = inside + 1
}
}
current = inside
if (current < seqs.length) {
for (i <- (0 to current).reverse) {
if ((0 to i).forall(ind => newIndexes(ind) == counts(ind))) {
newIndexes(i) = 0
} else if (newIndexes(i) < counts(i)) {
newIndexes(i) = newIndexes(i) + 1
}
}
current = 0
indexes = newIndexes.toArray
}
buffer.result()
}
def hasNext: Boolean = current != seqs.length
}
}
Here's my solution to the given problem. Note that the laziness is simply caused by using .view on the "root collection" of the used for comprehension.
scala> def combine[A](xs: Traversable[Traversable[A]]): Seq[Seq[A]] =
| xs.foldLeft(Seq(Seq.empty[A])){
| (x, y) => for (a <- x.view; b <- y) yield a :+ b }
combine: [A](xs: Traversable[Traversable[A]])Seq[Seq[A]]
scala> combine(Set(Set("a","b","c"), Set("1","2"), Set("S","T"))) foreach (println(_))
List(a, 1, S)
List(a, 1, T)
List(a, 2, S)
List(a, 2, T)
List(b, 1, S)
List(b, 1, T)
List(b, 2, S)
List(b, 2, T)
List(c, 1, S)
List(c, 1, T)
List(c, 2, S)
List(c, 2, T)
To obtain this, I started from the function combine defined in https://stackoverflow.com/a/4515071/53974, passing it the function (a, b) => (a, b). However, that didn't quite work directly, since that code expects a function of type (A, A) => A. So I just adapted the code a bit.
These might be a starting point:
Cartesian product of two lists
Expand a Set[Set[String]] into Cartesian Product in Scala
https://stackoverflow.com/questions/6182126/im-learning-scala-would-it-be-possible-to-get-a-little-code-review-and-mentori
What about:
def cartesian[A](list: List[Seq[A]]): Iterator[Seq[A]] = {
if (list.isEmpty) {
Iterator(Seq())
} else {
list.head.iterator.flatMap { i => cartesian(list.tail).map(i +: _) }
}
}
Simple and lazy ;)
def cartesian[A](list: List[List[A]]): List[List[A]] = {
list match {
case Nil => List(List())
case h :: t => h.flatMap(i => cartesian(t).map(i :: _))
}
}
You can look here: https://stackoverflow.com/a/8318364/312172 how to translate a number into an index of all possible values, without generating every element.
This technique can be used to implement a stream.