How can I split an iterator into a prefix of leading duplicates and the rest? For instance:
def splitDupes(it: Iterator[Int]): (Iterator[Int], Iterator[Int]) = ???
val (xs, ys) = splitDupes(List(1, 1, 1, 2, 3, 4, 5).iterator)
xs.toList // List(1, 1, 1)
ys.toList // List(2, 3, 4, 5)
val (xs, ys) = splitDupes(List(1, 2, 3, 4, 5).iterator)
xs.toList // List(1)
ys.toList // List(2, 3, 4, 5)
val (xs, ys) = splitDupes(List(1, 1, 1, 1, 1).iterator)
xs.toList // List(1, 1, 1, 1, 1)
ys.toList // List()
val (xs, ys) = splitDupes(List[Int]().iterator)
xs.toList // List()
ys.toList // List()
Can I use it to read a text file in chunks?
You can use the span method to split a collection into the longest prefix that satisfies a predicate and the remaining suffix. For Iterators, span does the right thing: it lazily buffers the prefix elements in case the suffix is iterated before the prefix has been exhausted.
def splitDupes[T](it: Iterator[T]): (Iterator[T], Iterator[T]) = {
if (it.isEmpty) (Iterator.empty, Iterator.empty)
else {
val head = it.next()
val (dupes, rest) = it.span(_ == head)
(Iterator(head) ++ dupes, rest)
}
}
Example:
scala> val (dupes, rest) = splitDupes(Iterator(1,1,1,2,3,2,1))
dupes: Iterator[Int] = <iterator>
rest: Iterator[Int] = <iterator>
scala> (dupes.toList, rest.toList)
res1: (List[Int], List[Int]) = (List(1, 1, 1),List(2, 3, 2, 1))
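On the follow-up about reading in chunks: one way to apply the same idea repeatedly is sketched below. This is only an illustration under assumptions, not part of the answer above. The name consecutiveRuns is made up, and the sketch swaps span for a BufferedIterator (it.buffered), so that each run can be cut off by peeking at the next element. Each run is materialised as a List, while the outer iterator stays lazy.

```scala
// Hypothetical helper: lazily yields each run of consecutive equal elements.
def consecutiveRuns[T](it: Iterator[T]): Iterator[List[T]] = {
  val buffered = it.buffered // lets us peek at the next element via `head`
  new Iterator[List[T]] {
    def hasNext: Boolean = buffered.hasNext
    def next(): List[T] = {
      val first = buffered.next()
      val run = List.newBuilder[T]
      run += first
      // Keep consuming while the upcoming element equals the first of the run.
      while (buffered.hasNext && buffered.head == first) run += buffered.next()
      run.result()
    }
  }
}

val runs = consecutiveRuns(Iterator(1, 1, 1, 2, 3, 3)).toList
// runs == List(List(1, 1, 1), List(2), List(3, 3))
```

Since it only needs an Iterator, it should also accept the line iterator of a text file, although whether "runs of equal lines" is the chunking you want depends on your data.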
What about something like this?
(Note: I decided to return a plain List as the first part, since it will already have been consumed.)
def splitDupes[A](it: Iterator[A]): (List[A], Iterator[A]) = {
it.nextOption() match {
case Some(head) =>
@annotation.tailrec
def loop(count: Int): (List[A], Iterator[A]) =
it.nextOption() match {
case Some(x) if (x == head) =>
loop(count + 1)
case Some(x) =>
List.fill(count)(head) -> Iterator(Iterator.single(x), it).flatten
case None =>
List.fill(count)(head) -> Iterator.empty
}
loop(count = 1)
case None =>
List.empty -> Iterator.empty
}
}
I want to split a list of elements into a list of lists such that neighboring elements in the inner list satisfy a given condition.
A simple condition would be that neighboring elements are equal. Then if the input is List(1,1,1,2,2,3,3,3,3), the output is List(List(1,1,1),List(2,2),List(3,3,3,3)).
Another condition could be that the current element is greater than the previous element. Then if the input is List(1,2,3,1,4,6,5,7,8), the output is List(List(1,2,3), List(1,4,6), List(5,7,8)). It would also be wonderful if the method could act on an Iterator. The signatures of the method would be
def method[A](lst:List[A], cond:(A,A)=>Boolean):List[List[A]]
def method[A](lst:Iterator[A], cond:(A,A)=>Boolean):Iterator[Iterator[A]]
You can use sliding together with span in a recursive function for the desired effect. This quick and dirty version is less efficient, but terser than some of the alternatives:
def method[A](lst: TraversableOnce[A], cond: (A, A) => Boolean): List[List[A]] = {
val iterable = lst.toIterable
iterable.headOption.toList.flatMap { head =>
val (next, rest) = iterable.sliding(2).filter(_.size == 2).span(x => cond(x.head, x.last))
(head :: next.toList.map(_.last)) :: method(rest.map(_.last), cond)
}
}
If you want to lazily execute the code, you can return an Iterator[List[A]] instead of List[List[A]]:
def method[A](lst: TraversableOnce[A], cond: (A, A) => Boolean): Iterator[List[A]] = {
val iterable = lst.toIterable
iterable.headOption.toIterator.flatMap { head =>
val (next, rest) = iterable.sliding(2).filter(_.size == 2).span(x => cond(x.head, x.last))
Iterator(head :: next.toList.map(_.last)) ++ method(rest.map(_.last), cond)
}
}
And you can verify that this is lazy:
val x = (Iterator.range(0, 10) ++ Iterator.range(3, 5) ++ Iterator.range(1, 3)).map(x => { println(x); x })
val iter = method(x, (x: Int, y: Int) => x < y) //Only prints 0-9, and then 3!
iter.take(2).toList //Prints more
iter.toList //Prints the rest
You can make it even lazier by returning an Iterator[Iterator[A]]:
def method[A](lst: TraversableOnce[A], cond: (A, A) => Boolean): Iterator[Iterator[A]] = {
val iterable = lst.toIterable
iterable.headOption.toIterator.flatMap { head =>
val (next, rest) = iterable.sliding(2).filter(_.size == 2).span(x => cond(x.head, x.last))
Iterator(Iterator(head) ++ next.toIterator.map(_.last)) ++ method(rest.map(_.last), cond)
}
}
As a relatively unrelated side note, when you have generic parameters of this form, you're better off using 2 parameter lists:
def method[A](lst: TraversableOnce[A])(cond: (A, A) => Boolean)
When you have 2 parameter lists like this, the type inference can be a little bit smarter:
//No need to specify parameter types on the anonymous function now!
method(List(1, 3, 2, 3, 4, 1, 8, 1))((x, y) => x < y).toList
//You can now even use underscore anonymous function notation!
method(List(1, 4, 2, 3, 4, 1, 8))(_ < _)
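To try the two-parameter-list form in isolation, here is a minimal, strict sketch. The name groupWhile and the fold-based body are my own assumptions for illustration; it is not the sliding/span implementation above and it is not lazy.

```scala
// Hypothetical helper: groups neighbours while cond(previous, current) holds.
def groupWhile[A](xs: Iterable[A])(cond: (A, A) => Boolean): List[List[A]] =
  xs.foldLeft(List.empty[List[A]]) {
    // The current group is kept reversed, so `last` is the most recent element.
    case ((last :: group) :: done, x) if cond(last, x) => (x :: last :: group) :: done
    // Otherwise start a new group with x.
    case (done, x) => List(x) :: done
  }.map(_.reverse).reverse

// A is fixed by the first parameter list, so `_ < _` needs no annotations.
val increasing = groupWhile(List(1, 3, 2, 3, 4, 1, 8, 1))(_ < _)
// increasing == List(List(1, 3), List(2, 3, 4), List(1, 8), List(1))
```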
Here is something close (I believe) to what you are asking for. The only issue with this is that it always produces a List of Lists for the result as opposed to being based on the input type:
val iter = Iterator(1,1,2,2,2,3,3,3)
val list = List(4,5,5,5,5,6,6)
def same(a:Int,b:Int) = a == b
def gt(a:Int, b:Int) = b > a
println(groupByPred(iter, same))
println(groupByPred(list, gt))
def groupByPred[L <: TraversableOnce[T], T](trav:L, cond:(T,T) => Boolean):List[List[T]] = {
val (ret, inner) =
trav.foldLeft((List.empty[List[T]], List.empty[T])){
case ((acc, list), el) if list.isEmpty || cond(list.head, el) => (acc,el :: list)
case ((acc, list), el) => (list.reverse :: acc,el :: List.empty)
}
(inner.reverse :: ret).reverse
}
If you run that code, the output should be the following:
List(List(1, 1), List(2, 2, 2), List(3, 3, 3))
List(List(4, 5), List(5), List(5), List(5, 6), List(6))
Try this.
This puts the head of the list as the first element of the first inner List, then adds each following element to that inner List while the condition holds. When it doesn't hold, it starts a new inner List with the current element as its first element.
Both the inner lists and the outer list are built in reverse order, so reverse each element of the outer List (with map) and then reverse the outer List.
val xs = List(1, 1, 1, 2, 2, 3, 3, 3, 3)
val ys = List(1, 2, 3, 1, 4, 6, 5, 7, 8)
def method[A](lst: List[A], cond: (A, A) => Boolean): List[List[A]] = {
lst.tail.foldLeft(List(List(lst.head))) { (acc, e) =>
if (cond(acc.head.head, e))
(e :: acc.head) :: acc.tail
else List(e) :: acc
}.map(_.reverse).reverse
}
method(xs, { (a: Int, b: Int) => a == b })
//> res0: List[List[Int]] = List(List(1, 1, 1), List(2, 2), List(3, 3, 3, 3))
method(ys, { (a: Int, b: Int) => a < b })
//> res1: List[List[Int]] = List(List(1, 2, 3), List(1, 4, 6), List(5, 7, 8))
Iterator overload
def method[A](iter:Iterator[A], cond: (A, A) => Boolean): List[List[A]] = {
val h = iter.next
iter.foldLeft(List(List(h))) { (acc, e) =>
if (cond(acc.head.head, e))
(e :: acc.head) :: acc.tail
else List(e) :: acc
}.map(_.reverse).reverse
}
method(xs.toIterator, { (a: Int, b: Int) => a == b })
//> res0: List[List[Int]] = List(List(1, 1, 1), List(2, 2), List(3, 3, 3, 3))
method(ys.toIterator, { (a: Int, b: Int) => a < b })
//> res1: List[List[Int]] = List(List(1, 2, 3), List(1, 4, 6), List(5, 7, 8))
More generic version (hat-tip to @cmbaxter for some ideas here) that works with Lists, Iterators and anything that can be traversed once:
def method[A, T <: TraversableOnce[A]](trav: T, cond: (A, A) => Boolean)
: List[List[A]] = {
trav.foldLeft(List(List.empty[A])) { (acc, e) =>
if (acc.head.isEmpty || !cond(acc.head.head, e)) List(e) :: acc
else (e :: acc.head) :: acc.tail
}.map(_.reverse).reverse
}
This is a simple exercise I am solving in Scala: given a list l return a new list, which contains every n-th element of l. If n > l.size return an empty list.
def skip(l: List[Int], n: Int) =
Range(1, l.size/n + 1).map(i => l.take(i * n).last).toList
My solution (see above) seems to work, but I am looking for something simpler. How would you simplify it?
Somewhat simpler:
scala> val l = (1 to 10).toList
l: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
// n == 3
scala> l.drop(2).grouped(3).map(_.head).toList
res0: List[Int] = List(3, 6, 9)
// n > l.length
scala> l.drop(11).grouped(12).map(_.head).toList
res1: List[Int] = List()
(the toList is just there to force the iterator to be evaluated)
Works with infinite lists:
Stream.from(1).drop(2).grouped(3).map(_.head).take(4).toList
res2: List[Int] = List(3, 6, 9, 12)
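Wrapped up as a function, the drop/grouped idea might look like the sketch below. The name everyNth and the require guard are assumptions added for illustration.

```scala
// Hypothetical wrapper around the drop/grouped approach shown above.
def everyNth[A](xs: List[A], n: Int): List[A] = {
  require(n > 0, "n must be positive")
  // Skip the first n - 1 elements, then take the head of each block of n.
  xs.drop(n - 1).grouped(n).map(_.head).toList
}

val thirds = everyNth((1 to 10).toList, 3)
// thirds == List(3, 6, 9)
```

When n is larger than the list, drop leaves nothing to group, so the result is an empty list.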
scala> def skip[A](l:List[A], n:Int) =
l.zipWithIndex.collect {case (e,i) if ((i+1) % n) == 0 => e} // (i+1) because zipWithIndex is 0-based
skip: [A](l: List[A], n: Int)List[A]
scala> val l = (1 to 10).toList
l: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> skip(l,3)
res2: List[Int] = List(3, 6, 9)
scala> skip(l,11)
res3: List[Int] = List()
A bit more readable, and the loop runs only l.length / n times (note that l(step) is itself a linear-time lookup on a List):
def skip(l: List[Int], n: Int) = {
require(n > 0)
for (step <- Range(start = n - 1, end = l.length, step = n))
yield l(step)
}
Fold left approach O(n)
def skip(xs: List[Int], n: Int) = {
xs.foldLeft((List[Int](), n)){ case ((acc, counter), x) =>
if(counter==1)
(x+:acc,n)
else
(acc, counter-1)
}
._1
.reverse
}
scala> skip(List(1,2,3,4,5,6,7,8,9,10), 3)
Tailrec less readable approach O(n)
import scala.annotation.tailrec
def skipTR(xs: List[Int], n: Int) = {
@tailrec
def go(ys: List[Int], acc: List[Int], counter: Int): List[Int] = ys match {
case k::ks=>
if(counter==1)
go(ks, k+:acc , n)
else
go(ks, acc, counter-1)
case Nil => acc
}
go(xs, List(), n).reverse
}
skipTR(List(1,2,3,4,5,6,7,8,9,10), 3)
You could omit toList if you don't mind an iterator:
scala> def skip[A](l:List[A], n:Int) =
l.grouped(n).filter(_.length==n).map(_.last).toList
skip: [A](l: List[A], n: Int)List[A]
scala> skip (l,3)
res6: List[Int] = List(3, 6, 9)
Two approaches based on filtering by index, as follows:
implicit class RichList[A](val list: List[A]) extends AnyVal {
def nthA(n: Int) = n match {
case 0 => List()
case _ => (1 to list.size).filter( _ % n == 0).map { i => list(i-1)}
}
def nthB(n: Int) = n match {
case 0 => List()
case _ => list.zip(Stream.from(1)).filter(_._2 % n == 0).unzip._1
}
}
and so for a given list
val a = ('a' to 'z').toList
we have that
a.nthA(5)
res: List(e, j, o, t, y)
a.nthA(123)
res: List()
a.nthA(0)
res: List()
Update
Using List.tabulate as follows,
implicit class RichList[A](val list: List[A]) extends AnyVal {
def nthC(n: Int) = n match {
case 0 => List()
case n => List.tabulate(list.size) {i =>
if ((i+1) % n == 0) Some(list(i))
else None }.flatten
}
}
In Python I'm able to group consecutive elements with the same key by using itertools.groupby:
>>> items = [(1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4)]
>>> import itertools
>>> list(key for key,it in itertools.groupby(items, lambda tup: tup[0]))
[1, 2, 3, 1]
Scala has groupBy as well, but it produces a different result: a map from each key to all the values found in the iterable with that key (not the consecutive runs with the same key):
scala> val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
items: List[(Int, Int)] = List((1,2), (1,5), (1,3), (2,9), (3,7), (1,5), (1,4))
scala> items.groupBy {case (key, value) => key}
res0: scala.collection.immutable.Map[Int,List[(Int, Int)]] = Map(2 -> List((2,9)), 1 -> List((1,2), (1,5), (1,3), (1,5), (1,4)), 3 -> List((3,7)))
What is the most elegant way of achieving the same as Python's itertools.groupby?
If you just want to throw out sequential duplicates, you can do something like this:
def unchain[A](items: Seq[A]) = if (items.isEmpty) items else {
items.head +: (items zip items.drop(1)).collect{ case (l,r) if r != l => r }
}
That is, just compare the list to a version of itself shifted by one place, and only keep the items which are different. It's easy to add a (same: (A, A) => Boolean) parameter to the method and use !same(l,r) if you want custom behavior for what counts as the same (e.g. do it just by key).
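A sketch of that variant with a custom notion of sameness follows. The name unchainBy and the curried parameter list are assumptions for illustration, not part of the answer's code.

```scala
// Hypothetical variant of unchain that takes a custom `same` predicate.
def unchainBy[A](items: Seq[A])(same: (A, A) => Boolean): Seq[A] =
  if (items.isEmpty) items
  else
    // Pair each element with its successor and keep successors that differ.
    items.head +: items.zip(items.drop(1)).collect { case (l, r) if !same(l, r) => r }

val pairs = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
val keys = unchainBy(pairs)(_._1 == _._1).map(_._1)
// keys == List(1, 2, 3, 1)
```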
If you want to keep the duplicates, you can use Scala's groupBy to get a very compact (but inefficient) solution:
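def groupSequential[A](items: Seq[A])(same: (A, A) => Boolean): Seq[Seq[A]] = {
// Number each run: the index advances whenever two neighbours differ.
val ns = (items zip items.drop(1)).
scanLeft(0){ (n,cc) => if (same(cc._1, cc._2)) n else n+1 }
// Group by run index, restore the original order, then drop the indices.
(ns zip items).groupBy(_._1).toSeq.sortBy(_._1).map(_._2.map(_._2))
}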
Using List.span, like this
def keyMultiSpan(l: List[(Int,Int)]): List[List[(Int,Int)]] = l match {
case Nil => List()
case h :: t =>
val ms = l.span(_._1 == h._1)
ms._1 :: keyMultiSpan(ms._2)
}
Hence let
val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
and so
keyMultiSpan(items).map { _.head._1 }
res: List(1, 2, 3, 1)
Update
A more readable syntax, as suggested by @Paul, an implicit class for possibly neater usage, and type parameterisation for generality,
implicit class RichSpan[A,B](val l: List[(A,B)]) extends AnyVal {
def keyMultiSpan(): List[List[(A,B)]] = l match {
case Nil => List()
case h :: t =>
val (f, r) = l.span(_._1 == h._1)
f :: r.keyMultiSpan()
}
}
Thus, use it as follows,
items.keyMultiSpan.map { _.head._1 }
res: List(1, 2, 3, 1)
Here is a succinct but inefficient solution:
def pythonGroupBy[T, U](items: Seq[T])(f: T => U): List[List[T]] = {
items.foldLeft(List[List[T]]()) {
case (Nil, x) => List(List(x))
case (g :: gs, x) if f(g.head) == f(x) => (x :: g) :: gs
case (gs, x) => List(x) :: gs
}.map(_.reverse).reverse
}
And here is a better one, that only invokes f on each element once:
def pythonGroupBy2[T, U](items: Seq[T])(f: T => U): List[List[T]] = {
if (items.isEmpty)
Nil // no items means no groups, matching pythonGroupBy above
else {
val state = (List(List(items.head)), f(items.head))
items.tail.foldLeft(state) { (state, x) =>
val groupByX = f(x)
state match {
case (g :: gs, groupBy) if groupBy == groupByX => ((x :: g) :: gs, groupBy)
case (gs, _) => (List(x) :: gs, groupByX)
}
}._1.map(_.reverse).reverse
}
}
Both solutions fold over items, building up a list of groups as they go. pythonGroupBy2 also keeps track of the value of f for the current group. At the end, we have to reverse each group and the list of groups in order to get the correct order.
Try:
val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
val res = compress(items.map(_._1))
/** Eliminate consecutive duplicates of list elements **/
def compress[T](l : List[T]) : List[T] = l match {
case head :: next :: tail if (head == next) => compress(next :: tail)
case head :: tail => head :: compress(tail)
case Nil => List()
}
/** Tail recursive version **/
def compress[T](input: List[T]): List[T] = {
def comp(remaining: List[T], l: List[T], last: Any): List[T] = {
remaining match {
case Nil => l
case head :: tail if head == last => comp(tail, l, head)
case head :: tail => comp(tail, head :: l, head)
}
}
comp(input, Nil, Nil).reverse
}
Where compress is the solution of one of the 99 Problems in Scala.
Hmm, I couldn't find anything out of the box, but this will do it:
def groupz[T](list:List[T]):List[T] = {
list match {
case Nil => Nil
case x::Nil => List(x)
case x::xs if (x == xs.head) => groupz(xs)
case x::xs => x::groupz(xs)
}}
//now let's add this functionality to List class
implicit def addPythonicGroupToList[T](list:List[T]) = new {def pythonGroup = groupz(list)}
and now you can do:
val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
items.map(_._1).pythonGroup
res1: List[Int] = List(1, 2, 3, 1)
Here is a simple solution that I used for a problem I stumbled on at work. In this case I didn't care too much about space, so did not worry about efficient iterators. Used an ArrayBuffer to accumulate the results.
(Don't use this with enormous amounts of data.)
Sequential GroupBy
import scala.collection.mutable.ArrayBuffer
object Main {
/** Returns consecutive keys and groups from the iterable. */
def sequentialGroupBy[A, K](items: Seq[A], f: A => K): ArrayBuffer[(K, ArrayBuffer[A])] = {
val result = ArrayBuffer[(K, ArrayBuffer[A])]()
if (items.nonEmpty) {
// Iterate, keeping track of when the key changes value.
var bufKey: K = f(items.head)
var buf: ArrayBuffer[A] = ArrayBuffer()
for (elem <- items) {
val key = f(elem)
if (key == bufKey) {
buf += elem
} else {
val group: (K, ArrayBuffer[A]) = (bufKey, buf)
result += group
bufKey = key
buf = ArrayBuffer(elem)
}
}
// Append last group.
val group: (K, ArrayBuffer[A]) = (bufKey, buf)
result += group
}
result
}
def main(args: Array[String]): Unit = {
println("\nExample 1:")
sequentialGroupBy[Int, Int](
Seq(1, 4, 5, 7, 9, 8, 16),
x => x % 2
).foreach(println)
println("\nExample 2:")
sequentialGroupBy[String, Boolean](
Seq("pi", "nu", "rho", "alpha", "xi"),
x => x.length > 2
).foreach(println)
}
}
Running the above code results in the following:
Example 1:
(1,ArrayBuffer(1))
(0,ArrayBuffer(4))
(1,ArrayBuffer(5, 7, 9))
(0,ArrayBuffer(8, 16))
Example 2:
(false,ArrayBuffer(pi, nu))
(true,ArrayBuffer(rho, alpha))
(false,ArrayBuffer(xi))
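If the input is too large to hold in memory, a lazy variant could produce one group at a time. The sketch below is an assumption-heavy illustration rather than a drop-in replacement: lazyGroupBy is a made-up name, it takes an Iterator instead of a Seq, and it relies on BufferedIterator to peek at the next element.

```scala
// Hypothetical lazy counterpart: yields (key, group) pairs on demand.
def lazyGroupBy[A, K](items: Iterator[A])(f: A => K): Iterator[(K, List[A])] = {
  val buffered = items.buffered
  new Iterator[(K, List[A])] {
    def hasNext: Boolean = buffered.hasNext
    def next(): (K, List[A]) = {
      val key = f(buffered.head) // throws NoSuchElementException when exhausted
      val group = List.newBuilder[A]
      // Note: f may be evaluated more than once for elements at group boundaries.
      while (buffered.hasNext && f(buffered.head) == key) group += buffered.next()
      (key, group.result())
    }
  }
}

val parity = lazyGroupBy(Iterator(1, 4, 5, 7, 9, 8, 16))(_ % 2).toList
// parity == List((1, List(1)), (0, List(4)), (1, List(5, 7, 9)), (0, List(8, 16)))
```

Only the current group is held in memory; earlier groups can be garbage-collected once the caller is done with them.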
Given e.g.:
List(5, 2, 3, 3, 3, 5, 5, 3, 3, 2, 2, 2)
I'd like to get to:
List(List(5), List(2), List(3, 3, 3), List(5, 5), List(3, 3), List(2, 2, 2))
I would assume there is a simple List function that does this, but am unable to find it.
This is the trick that I normally use:
def split[T](list: List[T]) : List[List[T]] = list match {
case Nil => Nil
case h::t => val segment = list takeWhile {h ==}
segment :: split(list drop segment.length)
}
Actually... It's not, I usually abstract over the collection type and optimize with tail recursion as well, but wanted to keep the answer simple.
val xs = List(5, 2, 3, 3, 3, 5, 5, 3, 3, 2, 2, 2)
Here's another way.
(List(xs.take(1)) /: xs.tail)((l,r) =>
if (l.head.head==r) (r :: l.head) :: l.tail else List(r) :: l
).reverseMap(_.reverse)
Damn Rex Kerr, for writing the answer I'd go for. Since there are minor stylistic differences, here's my take:
list.tail.foldLeft(List(list take 1)) {
case (acc @ (lst @ hd :: _) :: tl, el) =>
if (el == hd) (el :: lst) :: tl
else (el :: Nil) :: acc
}
Since the elements are identical, I didn't bother reversing the sublists.
list.foldRight(List[List[Int]]()){
(e, l) => l match {
case (`e` :: xs) :: fs => (e :: e :: xs) :: fs
case _ => List(e) :: l
}
}
Or
list.zip(false :: list.sliding(2).collect{case List(a,b) => a == b}.toList)
.foldLeft(List[List[Int]]())((l,e) => if(e._2) (e._1 :: l.head) :: l.tail
else List(e._1) :: l ).reverse
[Edit]
//find the hidden way
//the beauty must be somewhere
//when we talk scala
def split(l: List[Int]): List[List[Int]] =
l.headOption.map{x => val (h,t)=l.span{x==}; h::split(t)}.getOrElse(Nil)
I have these implementations lying around from working on collections methods. In the end I checked in simpler implementations of inits and tails and left out cluster. Every new method no matter how simple ends up collecting a big tax which is hard to see from the outside. But here's the implementation I didn't use.
import generic._
import scala.reflect.ClassManifest
import mutable.ListBuffer
import annotation.tailrec
import annotation.unchecked.{ uncheckedVariance => uV }
def inits: List[Repr] = repSequence(x => (x, x.init), Nil)
def tails: List[Repr] = repSequence(x => (x, x.tail), Nil)
def cluster[A1 >: A : Equiv]: List[Repr] =
repSequence(x => x.span(y => implicitly[Equiv[A1]].equiv(y, x.head)))
private def repSequence(
f: Traversable[A @uV] => (Traversable[A @uV], Traversable[A @uV]),
extras: Traversable[A @uV]*): List[Repr] = {
def mkRepr(xs: Traversable[A @uV]): Repr = newBuilder ++= xs result
val bb = new ListBuffer[Repr]
@tailrec def loop(xs: Repr): List[Repr] = {
val seq = toCollection(xs)
if (seq.isEmpty)
return (bb ++= (extras map mkRepr)).result
val (hd, tl) = f(seq)
bb += mkRepr(hd)
loop(mkRepr(tl))
}
loop(self.repr)
}
[Edit: I forget other people won't know the internals. This code is written from inside of TraversableLike, so it wouldn't run out of the box.]
Here's a slightly cleaner one:
def groupConsequtive[A](list: List[A]): List[List[A]] = list match {
case head :: tail =>
val (t1, t2) = tail.span(_ == head)
(head :: t1) :: groupConsequtive(t2)
case _ => Nil
}
tail-recursive version
@tailrec
def groupConsequtive[A](list: List[A], acc: List[List[A]] = Nil): List[List[A]] = list match {
case head :: tail =>
val (t1, t2) = tail.span(_ == head)
groupConsequtive(t2, acc :+ (head :: t1))
case _ => acc
}
Here's a tail-recursive solution inspired by @Kevin Wright and @Landei:
@tailrec
def sliceEqual[A](s: Seq[A], acc: Seq[Seq[A]] = Seq()): Seq[Seq[A]] = {
s match {
case fst :: rest =>
val (l, r) = s.span(fst==)
sliceEqual(r, acc :+ l)
case Nil => acc
}
}
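One caveat with the version above: the patterns fst :: rest and Nil only match a List, so a Vector or other non-List Seq would fall through with a MatchError, and acc :+ l appends at the end of the accumulator on every step. A possible variant, sketched here under a made-up name (sliceEqualSeq), matches on headOption and prepends to a List accumulator that is reversed once at the end.

```scala
import scala.annotation.tailrec

// Hypothetical variant that works for any Seq, not just List.
@tailrec
def sliceEqualSeq[A](s: Seq[A], acc: List[Seq[A]] = Nil): List[Seq[A]] =
  s.headOption match {
    case Some(first) =>
      // Split off the leading run of elements equal to the first one.
      val (run, rest) = s.span(_ == first)
      sliceEqualSeq(rest, run :: acc) // prepend now, reverse once at the end
    case None => acc.reverse
  }

val slices = sliceEqualSeq(Vector(5, 2, 3, 3, 3, 5, 5, 3, 3, 2, 2, 2))
// slices == List(Vector(5), Vector(2), Vector(3, 3, 3), Vector(5, 5), Vector(3, 3), Vector(2, 2, 2))
```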
this could be simpler:
val input = List(5, 2, 3, 3, 3, 5, 5, 3, 3, 2, 2, 2)
input groupBy identity values
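A word of caution, offered as an observation rather than a fix: groupBy collects all equal elements into one group regardless of position, so the separate runs of 5, 3 and 2 in the example get merged, and the order of the resulting groups is not specified. A small check, with merged as a made-up name:

```scala
val input = List(5, 2, 3, 3, 3, 5, 5, 3, 3, 2, 2, 2)

// groupBy ignores adjacency: every 5, every 2 and every 3 ends up in one group.
val merged = input.groupBy(identity).values.toList
// three groups, in unspecified order:
// List(5, 5, 5), List(2, 2, 2, 2), List(3, 3, 3, 3, 3)
```

So this gives three groups rather than the six consecutive runs asked for in the question.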