Create function to group sequence at breakpoints - scala

I'm trying to create a function that takes two sequences and groups the elements of the first based on the "breakpoints" listed in the second. An example:
val ls = ('a' to 'z').map(_.toString)
// IndexedSeq[String] = Vector(a, b, c, d, e, f, g, h, i, j, k, l, m, ...)
val breakpoints = Seq("c", "f", "j")
grouper(ls, breakpoints)
// Seq(Seq("a", "b"), Seq("c", "d", "e"), Seq("f", "g", "h", "i"), Seq("j", ...))
I've tried to do this with recursive calls to takeWhile and dropWhile (as I might in a language like Haskell), but as my current function doesn't use tail recursion, I receive a java.lang.StackOverflowError. Here's the function:
def grouper(strings: Seq[String], breaks: Seq[String]): Seq[Seq[String]] = strings match {
case Nil => Seq()
case s => s.takeWhile(breaks.contains(_)) +: grouper(s.dropWhile(breaks.contains(_)), breaks)
}
Is there a better way to approach this?

You're on the right track. takeWhile and dropWhile can be replaced by span.
def grouper(xs: Seq[String], breaks: Seq[String]): Seq[Seq[String]] = breaks match{
case Nil => Seq(xs)
case h::t => {
val split = xs.span(x => x != h);
split._1 +: grouper(split._2, t);
}
}
scala> grouper(ls, breakpoints)
res5: Seq[Seq[String]] = List(Vector(a, b), Vector(c, d, e), Vector(f, g, h, i),
Vector(j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z))
From the API on span: Note: c span p is equivalent to (but possibly more efficient than) (c takeWhile p, c dropWhile p), provided the evaluation of the predicate p does not cause any side-effects.
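To make the quoted note concrete, here is a small sketch (the values `xs` and `p` are made up for illustration) showing that span returns the same two pieces as takeWhile and dropWhile in a single call:

```scala
// Illustrative values, not taken from the question.
val xs = Vector("a", "b", "c", "d", "e")
val p = (s: String) => s != "c"

// span splits at the first element for which the predicate fails.
val (before, rest) = xs.span(p)

// The same two pieces, computed with two separate calls.
val viaTwoCalls = (xs.takeWhile(p), xs.dropWhile(p))

println(before) // Vector(a, b)
println(rest)   // Vector(c, d, e)
```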

For this kind of problem, I always prefer to write my own tail-recursive function operating on Lists.
import Ordering.Implicits._
def group[A : Ordering](data: List[A], breakPoints: List[A]): List[List[A]] = {
def takeUntil(list: List[A], breakPoint: A): (List[A], List[A]) = {
@annotation.tailrec
def loop(remaining: List[A], acc: List[A]): (List[A], List[A]) =
remaining match {
case x :: xs if (x < breakPoint) =>
loop(remaining = xs, x :: acc)
case _ =>
(acc.reverse, remaining)
}
loop(remaining = list, acc = List.empty)
}
@annotation.tailrec
def loop(remainingElements: List[A], remainingBreakPoints: List[A], acc: List[List[A]]): List[List[A]] =
remainingBreakPoints match {
case breakPoint :: remainingBreakPoints =>
val (group, remaining) = takeUntil(remainingElements, breakPoint)
loop(
remainingElements = remaining,
remainingBreakPoints,
group :: acc
)
case Nil =>
(remainingElements :: acc).reverse
}
loop(
remainingElements = data.sorted,
remainingBreakPoints = breakPoints.sorted,
acc = List.empty
)
}
You can use it like this:
group(data = ('a' to 'z').toList, breakPoints = List('c', 'f', 'j'))
//res: List[List[Char]] = List(
// List('a', 'b'),
// List('c', 'd', 'e'),
// List('f', 'g', 'h', 'i'),
// List('j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z')
// )
This function will always generate a list of length = length(breakPoints) + 1.
If there are no more elements, it will generate empty lists.
(you can edit the code for your concrete requirements)

You can try a slightly different approach:
def grouper(strings: Seq[String], breaks: Seq[String]): Seq[Seq[String]] = {
var i = 0
(for(x <- strings) yield {if (breaks.contains(x)) {i=i+1}; (x,i)})
.groupBy(_._2).map(_._2.map(_._1)).toList
}
grouper(ls,breakpoints).foreach(println(_))
Vector(f, g, h, i)
Vector(c, d, e)
Vector(j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z)
Vector(a, b)
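Note that the groups above come back in no particular order, because groupBy returns a Map. If order matters, one possible sketch is a single foldLeft that starts a new group at every breakpoint. The name grouperOrdered is hypothetical, and it assumes a breakpoint in first position should not produce an empty leading group:

```scala
// Hypothetical order-preserving variant; not part of the answer above.
def grouperOrdered(strings: Seq[String], breaks: Seq[String]): Seq[Seq[String]] = {
  val breakSet = breaks.toSet
  strings.foldLeft(Vector.empty[Vector[String]]) { (groups, s) =>
    if (groups.isEmpty || breakSet(s)) groups :+ Vector(s) // start a new group
    else groups.init :+ (groups.last :+ s)                 // extend the current group
  }
}

val ls = ('a' to 'z').map(_.toString)
val breakpoints = Seq("c", "f", "j")
grouperOrdered(ls, breakpoints).foreach(println)
```

foldLeft is a loop internally, so this does not grow the stack with the input.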

Related

scala collections : map a list and carry some state?

I seem to run into this problem all the time. I want to modify some of the elements in a list, but I need to keep some state as I do it, so map doesn't work.
Here is an example :
scala> val l1 = List("a","b","c","d","e","f","b","c","e","b","a")
l1: List[String] = List(a, b, c, d, e, f, b, c, e, b, a)
I want to change the name of any duplicates. so I want to end up with this:
List(a1, b1, c1, d, e1, f, b2, c2, e2, b3, a2)
Getting the dupes is easy :
scala> val d = l1.diff(l1.distinct).distinct
d: List[String] = List(b, c, e, a)
Now I'm stuck. I made it work by converting d to a HashMap w/ a count, and writing a function to iterate over l1 and update it & the hash before recursing. Which works fine, but looks kinda ugly to me.
But I've always thought there should be a way to do w/ the collection classes.
Here is the rest of my solution which I don't like :
val m = d.map( _ -> 1).toMap
def makeIt(ms: Map[String, Int], ol: Iterator[String], res: List[String]=List[String]()) :List[String] = {
if( !ol.hasNext) return res
val no = ol.next()
val (p,nm) = ms.get(no) match {
case Some(v) => (s"$no$v", ms.updated(no,v+1))
case None => (no,ms)
}
makeIt(nm,ol,res :+ p)
}
makeIt(m,l1.iterator)
Which gives me what I want
res2: List[String] = List(a1, b1, c1, d, e1, f, b2, c2, e2, b3, a2)
I feel like I want "mapWithState" where I can just pass something along. Like Fold-ish. Maybe it exists and I just haven't found it yet?
Thanks
-------UPDATE---------
@Aluan Haddad's comment pointed me in this direction. It destroys order, which is fine for my case. But the "state" is carried by zipWithIndex. I'm looking for a more general case where the state would require some computation at each element. But for this simple case I like it:
l1.groupBy(x=>x).values.flatMap( v =>{
if( v.length <= 1 ) v else {
v.zipWithIndex.map{ case (s,i) => s"$s${i+1}"}
}
})
res7: Iterable[String] = List(e1, e2, f, a1, a2, b1, b2, b3, c1, c2, d)
The tricky part is that the "d" and "f" elements get no modification.
This is what I came up with. It's a bit more concise, code wise, but does involve multiple traversals.
val l1: List[String] = List("a","b","c","d","e","f","b","c","e","b","a")
l1.reverse.tails.foldLeft(List[String]()){
case (res, Nil) => res
case (res, hd::tl) =>
val count = tl.count(_ == hd)
if (count > 0) s"$hd${count+1}" +: res
else if (res.contains(hd+2)) (hd+1) +: res
else hd +: res
}
//res0: List[String] = List(a1, b1, c1, d, e1, f, b2, c2, e2, b3, a2)
By using tails, each element, hd, is able to see the future, tl, and the past, res.
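For readers who have not used tails before, a tiny illustration (the list `l` is made up) of what it produces: every suffix of the list, ending with the empty list. The head of each suffix is the "current" element and its tail is everything that follows it.

```scala
// Illustrative input only.
val l = List("a", "b", "a")

// tails returns an Iterator over all suffixes, longest first.
val suffixes = l.tails.toList

suffixes.foreach(println)
// List(a, b, a)
// List(b, a)
// List(a)
// List()
```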
A simple but slow version
l1.zipWithIndex.map{ case (elem, i) =>
if (l1.count(_ == elem) == 1) {
elem
} else {
val n = {l1.take(i+1).count(_ == elem)}
s"$elem$n"
}
}
The next version is longer, less pretty, and not functional, but should be faster in the unlikely case that you are processing a very long list.
import scala.collection.mutable
def makeUniq(in: Seq[String]): Seq[String] = {
// Count occurrence of each element
val m = mutable.Map.empty[String, Int]
for (elem <- in) {
m.update(elem, m.getOrElseUpdate(elem, 0) + 1)
}
// Remove elements with a single occurrence
val dupes = m.filter(_._2 > 1)
// Apply numbering to duplicate elements
in.reverse.map(e => {
val idx = dupes.get(e) match {
case Some(i) =>
dupes.update(e, i - 1)
i.toString
case _ =>
""
}
s"$e$idx"
}).reverse
}
The code is easier if you wanted to apply a count to every element rather than just the non-unique ones.
def makeUniq(in: Seq[String]): Seq[String] = {
val m = mutable.Map.empty[String, Int]
in.map{ e =>
val i = m.getOrElseUpdate(e, 0) + 1
m.update(e, i)
s"$e$i"
}
}
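As for the "mapWithState" the question asks about: I'm not aware of a method with that name on the standard collections, but a rough sketch of one can be built on foldLeft. The helper name and signature below are assumptions, modelled on the wording of the question; the state here is an immutable Map of counts.

```scala
// Hypothetical helper: map over a list while threading a state value through.
def mapWithState[A, B, S](xs: List[A], init: S)(f: (S, A) => (S, B)): (S, List[B]) = {
  val (finalState, reversed) = xs.foldLeft((init, List.empty[B])) {
    case ((state, acc), x) =>
      val (next, y) = f(state, x)
      (next, y :: acc) // prepend, then reverse once at the end
  }
  (finalState, reversed.reverse)
}

val l1 = List("a", "b", "c", "d", "e", "f", "b", "c", "e", "b", "a")
// Only elements that occur more than once get a number.
val dupes = l1.diff(l1.distinct).toSet

val (_, renamed) = mapWithState(l1, Map.empty[String, Int]) { (seen, s) =>
  if (dupes(s)) {
    val n = seen.getOrElse(s, 0) + 1
    (seen.updated(s, n), s"$s$n")
  } else (seen, s)
}
println(renamed) // List(a1, b1, c1, d, e1, f, b2, c2, e2, b3, a2)
```

This keeps the original order and traverses the list once for the renaming (plus the passes needed to find the duplicates).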

Scala Recursion Over Multiple Lists

I have this function that takes two lists and returns the sum of the two lists.
Example:
def sumOfSums(a: List[Int], b: List[Int]): Int = {
var sum = 0
for(elem <- a) sum += elem
for(elem <- b) sum += elem
sum
}
Simple enough, however now I'm trying to do it recursively and the second list parameter is throwing me off.
What I have so far:
def sumOfSumsRec(a: List[Int], b: List[Int], acc: Int): Int = a match {
case Nil => acc
case h :: t => sumOfSumsRec(t, acc + h)
}
There are 2 problems here:
I'm only matching on the 'a' List
I'm getting an error when I try to do acc + h, and I'm not sure why.
Question: How can I recursively iterate over two lists to get their sum?
Pattern match both lists:
import scala.annotation.tailrec
def recSum(a: List[Int], b: List[Int]): Int = {
@tailrec
def recSumInternal(a: List[Int], b: List[Int], acc: Int): Int = {
(a, b) match {
case (x :: xs, y :: ys) => recSumInternal(xs, ys, x + y + acc)
case (x :: xs, Nil) => recSumInternal(xs, Nil, x + acc)
case (Nil, y :: ys) => recSumInternal(Nil, ys, y + acc)
case _ => acc
}
}
recSumInternal(a, b, 0)
}
Test:
recSum(List(1,2), List(3,4,5))
Yields:
15
Side note:
For any future readers of this post: I assumed this question was asked primarily for educational purposes, hence showing how recursion can work on multiple lists, but this is by no means the idiomatic approach. For any other purpose, by all means:
scala> val a = List(1,2)
a: List[Int] = List(1, 2)
scala> val b = List(3,4,5)
b: List[Int] = List(3, 4, 5)
scala> a.sum + b.sum
res0: Int = 15
Or consider using mechanisms such as foldLeft, foldMap, etc.
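A minimal sketch of the foldLeft option (the name sumOfSumsFold is made up here): fold the first list starting from 0, then use that result as the starting value for folding the second.

```scala
// Hypothetical name; sums both lists without explicit recursion or vars.
def sumOfSumsFold(a: List[Int], b: List[Int]): Int =
  b.foldLeft(a.foldLeft(0)(_ + _))(_ + _)

println(sumOfSumsFold(List(1, 2), List(3, 4, 5))) // 15
```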

Can I use a fold function to implement pack functionality?

I'm working on the following problem:
Pack consecutive duplicates of list elements into sublists. If a list
contains repeated elements they should be placed in separate sublists.
Example:
scala> pack(List('a, 'a, 'a, 'a, 'b, 'c, 'c, 'a, 'a, 'd, 'e, 'e, 'e, 'e))
res0: List[List[Symbol]] = List(List('a, 'a, 'a, 'a), List('b), List('c, 'c), List('a, 'a), List('d), List('e, 'e, 'e, 'e))
I am wondering whether it can be implemented using foldRight. So far I can only make a recursive solution like below work:
def pack(list: List[Char]) = {
def checkNext(a: List[List[Char]], prev: Char, l: List[Char]): List[List[Char]] = l match {
case Nil => a
case h::tail if h == prev => checkNext((h::a.head)::a.tail,h,tail)
case h::tail => checkNext(List(h)::a,h,tail)
}
checkNext(List(List[Char](list.last)), list.last, list.init.reverse)
}
Absolutely! I find it to be very natural to use folds to accumulate a complex result from iterating a sequence. Essentially, it works the same as what you're doing now, except the matching on the list is provided to you by fold, and you just provide the processing for the cases. I'm not sure if you wanted an actual answer, so I'll try to give you a couple hints.
Think of the type of your final result. Now think of what value of that type would be the result of applying your process to an empty sequence. That's your first argument to foldRight/foldLeft.
Now you have to define what to do to extend your accumulator for each item you process. It seems to me you have two cases: either you've encountered a new letter that you haven't seen before or you're adding another instance to an existing list. You can use some fancy matching to detect which case you're in.
Here's how I'd do it:
def pack(list: List[Char]) = list.foldLeft(List.empty[List[Char]]) { case (acc, next) =>
acc.headOption.flatMap(_.headOption) match {
case Some(x) if x == next => (acc.head :+ next) +: acc.tail
case _ => List(next) +: acc
}
}.reverse
I used flatMap to join the two checks: whether there's a list at all yet, and whether a list for the current character exists. I find foldLeft to be more intuitive, and it also has the added benefit of being tail recursive on List.
The result:
scala> pack(List('a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd',
'e', 'e', 'e', 'e'))
res1: List[List[Char]] = List(List(a, a, a, a),
List(b), List(c, c), List(a, a), List(d), List(e, e, e, e))
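One thing to keep in mind with the version above: appending to a List with :+ takes time proportional to the length of the current run. Since every element within a run is equal, prepending gives the same groups. A possible variant, with the hypothetical name packPrepend and a type parameter added for illustration:

```scala
// Hypothetical variant: prepend to the current run instead of appending.
def packPrepend[A](list: List[A]): List[List[A]] =
  list.foldLeft(List.empty[List[A]]) {
    // The most recent run is at the head of the accumulator.
    case ((run @ (x :: _)) :: rest, next) if x == next => (next :: run) :: rest
    // Otherwise start a new run.
    case (acc, next) => List(next) :: acc
  }.reverse

println(packPrepend("aaaabccaadeeee".toList))
```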
Here is my version of fold:
def pack[A](xs: List[A]): List[List[A]] =
xs.foldRight(List[List[A]]()){
case (x, (ys@(y::_)) :: rs) if x == y => (x::ys) :: rs
case (x, ys) => List(x) :: ys
}
However, I prefer the recursive one:
def pack2[A](xs: List[A]): List[List[A]] = xs match {
case Nil => Nil
case x::_ => val (hs, ts) = xs.span(_ == x); hs::pack2(ts)
}
The recursive one is clearer and shorter than the fold version, and in my measurements it is also faster:
scala> def time(n: Int)(call : => Unit): Long = {
| var cnt = 0
| val start = System.currentTimeMillis
| while(cnt < n) {
| cnt += 1
| call
| }
| System.currentTimeMillis - start
| }
time: (n: Int)(call: => Unit)Long
scala> val xs = ("A"*100 + "B"*1000 + "C"*10 + "DEFGH"*1000).toList
xs: List[Char] = List(A, A, A...)
scala> time(10000){ pack(xs) }
res3: Long = 19961
scala> time(10000){ pack2(xs) }
res4: Long = 4382
And with @acjay's version named pack3:
scala> def pack3(list: List[Char]) = list.foldLeft(List.empty[List[Char]]) { case (acc, next) =>
| acc.headOption.flatMap(_.headOption) match {
| case Some(x) if x == next => (acc.head :+ next) +: acc.tail
| case _ => List(next) +: acc
| }
| }.reverse
pack3: (list: List[Char])List[List[Char]]
scala> time(10000){ pack3(xs) }
res5: Long = 420946
scala> pack3(xs) == pack2(xs)
res6: Boolean = true
scala> pack3(xs) == pack(xs)
res7: Boolean = true
Implementation by Martin Odersky
def pack[T](xs: List[T]): List[List[T]] = xs match{
case Nil => Nil
case x :: xs1 =>
val (first, rest) = xs span (y => y == x)
first :: pack(rest)
}

Scala - can a lambda parameter match a tuple?

So say I have some list like
val l = List((1, "blue"), (5, "red"), (2, "green"))
And then I want to filter one of them out, I can do something like
val m = l.filter(item => {
val (n, s) = item // "unpack" the tuple here
n != 2
})
Is there any way I can "unpack" the tuple as the parameter to the lambda directly, instead of having this intermediate item variable?
Something like the following would be ideal, but Eclipse tells me wrong number of parameters; expected=1
val m = l.filter( (n, s) => n != 2 )
Any help would be appreciated - using 2.9.0.1
This is about the closest you can get:
val m = l.filter { case (n, s) => n != 2 }
It's basically pattern matching syntax inside an anonymous PartialFunction. There are also the tupled methods in Function object and traits, but they are just a wrapper around this pattern matching expression.
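For reference, a short sketch of the two tupled variants mentioned above, using the list from the question (the name `keep` is made up). Both turn a two-parameter function into one that takes a single pair:

```scala
val l = List((1, "blue"), (5, "red"), (2, "green"))

// An ordinary two-parameter function; the parameter types come from the annotation.
val keep: (Int, String) => Boolean = (n, s) => n != 2

// Method on Function2.
val m1 = l.filter(keep.tupled)
// Equivalent helper on the Function object.
val m2 = l.filter(Function.tupled(keep))

println(m1) // List((1,blue), (5,red))
```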
Although Kipton has a good answer, you can actually make this shorter:
val l = List((1, "blue"), (5, "red"), (2, "green"))
val m = l.filter(_._1 != 2)
There are a bunch of options:
for (x <- l; (n,s) = x if (n != 2)) yield x
l.collect{ case x @ (n,s) if (n != 2) => x }
l.filter{ case (n,s) => n != 2 }
l.unzip.zipped.map((n,s) => n != 2).zip // Complains that zip is deprecated
val m = l.filter( (n, s) => n != 2 )
... is a type mismatch because that lambda defines a
Function2[Int,String,Boolean] with two parameters instead of a
Function1[(Int,String),Boolean] with one Tuple2[Int,String] as its parameter.
You can convert between them like this:
val m = l.filter( ((n: Int, s: String) => n != 2).tupled )
I've pondered the same, and came to your question today.
I'm not very fond of the partial function approaches (anything having case) since they imply that there could be more entry points for the logic flow. At least to me, they tend to blur the intention of the code. On the other hand, I really do want to go straight to the tuple fields, like you.
Here's a solution I drafted today. It seems to work, but I haven't tried it in production, yet.
object unTuple {
def apply[A, B, X](f: (A, B) => X): (Tuple2[A, B] => X) = {
(t: Tuple2[A, B]) => f(t._1, t._2)
}
def apply[A, B, C, X](f: (A, B, C) => X): (Tuple3[A, B, C] => X) = {
(t: Tuple3[A, B, C]) => f(t._1, t._2, t._3)
}
//...
}
val list = List( ("a",1), ("b",2) )
val list2 = List( ("a",1,true), ("b",2,false) )
list foreach unTuple( (k: String, v: Int) =>
println(k, v)
)
list2 foreach unTuple( (k: String, v: Int, b: Boolean) =>
println(k, v, b)
)
Output:
(a,1)
(b,2)
(a,1,true)
(b,2,false)
Maybe this turns out to be useful. The unTuple object should naturally be put aside in some tool namespace.
Addendum:
Applied to your case:
val m = l.filter( unTuple( (n:Int,color:String) =>
n != 2
))

Expand a Set[Set[String]] into Cartesian Product in Scala

I have the following set of sets. I don't know ahead of time how long it will be.
val sets = Set(Set("a","b","c"), Set("1","2"), Set("S","T"))
I would like to expand it into a cartesian product:
Set("a&1&S", "a&1&T", "a&2&S", ..., "c&2&T")
How would you do that?
I think I figured out how to do that.
def combine(acc:Set[String], set:Set[String]) = for (a <- acc; s <- set) yield {
a + "&" + s
}
val expanded = sets.reduceLeft(combine)
expanded: scala.collection.immutable.Set[java.lang.String] = Set(b&2&T, a&1&S,
a&1&T, b&1&S, b&1&T, c&1&T, a&2&T, c&1&S, c&2&T, a&2&S, c&2&S, b&2&S)
Nice question. Here's one way:
scala> val seqs = Seq(Seq("a","b","c"), Seq("1","2"), Seq("S","T"))
seqs: Seq[Seq[java.lang.String]] = List(List(a, b, c), List(1, 2), List(S, T))
scala> val seqs2 = seqs.map(_.map(Seq(_)))
seqs2: Seq[Seq[Seq[java.lang.String]]] = List(List(List(a), List(b), List(c)), List(List(1), List(2)), List(List(S), List(T)))
scala> val combined = seqs2.reduceLeft((xs, ys) => for {x <- xs; y <- ys} yield x ++ y)
combined: Seq[Seq[java.lang.String]] = List(List(a, 1, S), List(a, 1, T), List(a, 2, S), List(a, 2, T), List(b, 1, S), List(b, 1, T), List(b, 2, S), List(b, 2, T), List(c, 1, S), List(c, 1, T), List(c, 2, S), List(c, 2, T))
scala> combined.map(_.mkString("&"))
res11: Seq[String] = List(a&1&S, a&1&T, a&2&S, a&2&T, b&1&S, b&1&T, b&2&S, b&2&T, c&1&S, c&1&T, c&2&S, c&2&T)
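One caveat: reduceLeft throws on an empty outer collection. If that case can occur, a foldLeft starting from a single empty combination is one way around it. This is only a sketch, and the name cartesian is made up:

```scala
// Hypothetical helper: Cartesian product of any number of sequences.
def cartesian[A](seqs: Seq[Seq[A]]): Seq[Seq[A]] =
  seqs.foldLeft(Seq(Seq.empty[A])) { (acc, next) =>
    // Extend every combination built so far with every element of the next sequence.
    for { prefix <- acc; x <- next } yield prefix :+ x
  }

val seqs = Seq(Seq("a", "b", "c"), Seq("1", "2"), Seq("S", "T"))
val joined = cartesian(seqs).map(_.mkString("&"))
println(joined.size) // 12
```

With no input sequences at all, this returns a single empty combination rather than throwing.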
Came after the battle ;) but here is another one:
sets.reduceLeft((s0,s1)=>s0.flatMap(a=>s1.map(a+"&"+_)))
Expanding on dsg's answer, you can write it more clearly (I think) this way, if you don't mind the curried function:
def combine[A](f: A => A => A)(xs:Iterable[Iterable[A]]) =
xs reduceLeft { (x, y) => x.view flatMap { y map f(_) } }
Another alternative (slightly longer, but much more readable):
def combine[A](f: (A, A) => A)(xs:Iterable[Iterable[A]]) =
xs reduceLeft { (x, y) => for (a <- x.view; b <- y) yield f(a, b) }
Usage:
combine[String](a => b => a + "&" + b)(sets) // curried version
combine[String](_ + "&" + _)(sets) // uncurried version
Expanding on @Patrick's answer.
Now it's more general and lazier:
def combine[A](f:(A, A) => A)(xs:Iterable[Iterable[A]]) =
xs.reduceLeft { (x, y) => x.view.flatMap {a => y.map(f(a, _)) } }
Having it be lazy allows you to save space, since you don't store the exponentially many items in the expanded set; instead, you generate them on the fly. But, if you actually want the full set, you can still get it like so:
val expanded = combine{(x:String, y:String) => x + "&" + y}(sets).toSet