Scala future with unknown loop count

The task is to fetch the list of child comments for a given comment. Getting all the children is easy via a 'parentId' field in the database, but since the list can be long, it's better to paginate the query. We can fetch a specific page of rows like this:
def getChildCommentsByParentId(parentId: Int, offset: Int, rows: Int): Future[Seq[CommentsRow]] = {
  val query = Comments.filter(row => row.iparentid === parentId && row.istatus === statusNormal).drop(offset).take(rows)
  db.run(query.result)
}
To fetch all the rows with several paginated calls at a fixed page size, it's easy to express this iteratively in other programming languages, like this:
$replyList = [];
$page = 0;
do {
    $ret = getChildCommentsByParentId($parentId, $page * $pageSize, $pageSize);
    if (!empty($ret)) {
        $replyList = array_merge($replyList, $ret);
    }
    if (count($ret) < $pageSize) {
        break;
    }
    $page++;
} while (true);
It's natural to use a for comprehension to handle multiple futures in Scala, but I don't know how to handle this case. The key point is that each iteration depends on the previous one (the 'page' parameter), and when the number of rows returned is less than the page size, the final result should contain all the previously fetched rows.
Thanks!

I tried to solve it in a recursive way:
def getChildComments(nextPage: Int, commentId: Int): Future[CommentResult] = {
  val pageSize = 5
  val currentPageFuture = commentModel.getChildCommentsByParentId(commentId, nextPage * pageSize, pageSize).map {
    data => CommentResult(nextPage, data)
  }
  val nextPageFuture = currentPageFuture.flatMap { commentRes =>
    val nextPage = commentRes.nextPage + 1
    if (commentRes.data.size >= pageSize) {
      getChildComments(nextPage, commentId)
    } else {
      Future.successful(CommentResult(nextPage, Seq()))
    }
  }
  for {
    currentPageComments <- currentPageFuture
    nextPageComments <- nextPageFuture
  } yield {
    CommentResult(currentPageComments.nextPage, currentPageComments.data ++ nextPageComments.data)
  }
}
case class CommentResult(nextPage: Int, data: Seq[CommentsRow])
But it still feels ugly. Is there a more elegant way?
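For the record, one slightly cleaner shape is a single recursive helper that threads the accumulated rows through flatMap, so no intermediate CommentResult is needed. This is only a sketch, reusing commentModel.getChildCommentsByParentId from above and assuming an implicit ExecutionContext is in scope:
import scala.concurrent.{ExecutionContext, Future}

def getAllChildComments(commentId: Int, pageSize: Int = 5)
                       (implicit ec: ExecutionContext): Future[Seq[CommentsRow]] = {
  // Fetch one page; a full page means there may be more, a short page means we are done.
  def loop(page: Int, acc: Seq[CommentsRow]): Future[Seq[CommentsRow]] =
    commentModel.getChildCommentsByParentId(commentId, page * pageSize, pageSize).flatMap { rows =>
      if (rows.size < pageSize) Future.successful(acc ++ rows)
      else loop(page + 1, acc ++ rows)
    }
  loop(0, Nil)
}
The recursion happens inside flatMap callbacks, so it does not grow the call stack the way ordinary non-tail recursion would.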

Related

Understanding performance of a tailrec annotated recursive method in scala

Consider the following method, which has been verified to be properly tail recursive:
@tailrec
def getBoundaries(grps: Seq[(BigDecimal, Int)], groupSize: Int, curSum: Int = 0, curOffs: Seq[BigDecimal] = Seq.empty[BigDecimal]): Seq[BigDecimal] = {
  if (grps.isEmpty) curOffs
  else {
    val (id, cnt) = grps.head
    val newSum = curSum + cnt.toInt
    if (newSum % 50 == 0) { println(s"id=$id newsum=$newSum") }
    if (newSum >= groupSize) {
      getBoundaries(grps.tail, groupSize, 0, curOffs :+ id) // r1
    } else {
      getBoundaries(grps.tail, groupSize, newSum, curOffs) // r2
    }
  }
}
This is running very slowly - about 75 loops per second. When I pause and inspect the stack trace (a nice feature of IntelliJ), almost every time the currently executing line is the second tail-recursive call, r2. That fact makes me suspicious of the purported "Scala unwraps the recursive calls into a while loop". If the unwrapping were occurring, why are we seeing so much time in the invocations themselves?
Beyond having a properly structured tail-recursive method, are there other considerations for getting a recursive routine to approach the performance of direct iteration?
The performance will depend on the underlying type of the Seq.
If it is a List, the problem is appending (:+): this gets very slow with long lists because it has to traverse the whole list to find the end.
One solution is to prepend (+:) to the list each time and then reverse at the end. This can give very significant performance improvements, because adding to the start of a list is very quick.
Other Seq types will have different performance characteristics, but you can convert to a List before the recursive call so that you know how it is going to perform.
Here is sample code:
def getBoundaries(grps: Seq[(BigDecimal, Int)], groupSize: Int): Seq[BigDecimal] = {
  @tailrec
  def loop(grps: List[(BigDecimal, Int)], curSum: Int, curOffs: List[BigDecimal]): List[BigDecimal] =
    if (grps.isEmpty) curOffs
    else {
      val (id, cnt) = grps.head
      val newSum = curSum + cnt.toInt
      if (newSum >= groupSize) {
        loop(grps.tail, 0, id +: curOffs) // r1
      } else {
        loop(grps.tail, newSum, curOffs) // r2
      }
    }
  loop(grps.toList, 0, Nil).reverse
}
This version gives 10x performance improvement over the original code using the test data provided by the questioner in his own answer to the question.
The issue is not the recursion but the array manipulation. With the following test case it runs at about 200K recursions per second:
type Fgroups = Seq[(BigDecimal, Int)]
test("testGetBoundaries") {
  val N = 200000
  val grps: Fgroups = (N to 1 by -1).flatMap { x => Array.tabulate(x % 20) { x2 => (BigDecimal(x2 * 1e9), 1) } }
  val sgrps = grps.sortWith { case (a, b) =>
    a._1.longValue.compare(b._1.longValue) < 0
  }
  val bb = getBoundaries(sgrps, 100)
  println(bb.take(math.min(50, bb.length)).mkString(","))
  assert(bb.length == 1900)
}
My production data sample has a similar number of entries (an Array with 233K rows) but runs three orders of magnitude more slowly. I am looking into the tail operation and other culprits now.
Update: The following reference from Alvin Alexander indicates that the tail operation should be very fast for immutable collections - but deadly slow for long mutable ones, including Arrays!
https://alvinalexander.com/scala/understanding-performance-scala-collections-classes-methods-cookbook
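A quick way to see the asymmetry (an illustrative sketch only; absolute timings will vary by machine and Scala version):
// List.tail is O(1): it just returns the stored tail reference.
// Array.tail (via ArrayOps) allocates and copies the remaining n-1 elements: O(n).
val xs = List.range(0, 1000000)
val ys = Array.range(0, 1000000)
def time[A](body: => A): Long = { val t = System.nanoTime; body; System.nanoTime - t }
println(s"List.tail:  ${time(xs.tail)} ns")
println(s"Array.tail: ${time(ys.tail)} ns") // typically orders of magnitude slower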
Wow! I had no idea about the performance implications of using mutable collections in Scala!
Update: By adding code to convert the Array to an (immutable) Seq, I see the three-orders-of-magnitude performance improvement on the production data sample:
val grps = if (grpsIn.isInstanceOf[mutable.WrappedArray[_]] || grpsIn.isInstanceOf[Array[_]]) {
  Seq(grpsIn: _*)
} else grpsIn
The (now fast ~200K/sec) final code is:
type Fgroups = Seq[(BigDecimal, Int)]
val cntr = new java.util.concurrent.atomic.AtomicInteger
@tailrec
def getBoundaries(grpsIn: Fgroups, groupSize: Int, curSum: Int = 0, curOffs: Seq[BigDecimal] = Seq.empty[BigDecimal]): Seq[BigDecimal] = {
  val grps = if (grpsIn.isInstanceOf[mutable.WrappedArray[_]] || grpsIn.isInstanceOf[Array[_]]) {
    Seq(grpsIn: _*)
  } else grpsIn
  if (grps.isEmpty) curOffs
  else {
    val (id, cnt) = grps.head
    val newSum = curSum + cnt.toInt
    if (cntr.getAndIncrement % 500 == 0) { println(s"[${cntr.get}] id=$id newsum=$newSum") }
    if (newSum >= groupSize) {
      getBoundaries(grps.tail, groupSize, 0, curOffs :+ id)
    } else {
      getBoundaries(grps.tail, groupSize, newSum, curOffs)
    }
  }
}

How to aggregateByKey with custom class for frequency distribution?

I am trying to create a frequency distribution.
My data is in the following pattern (ColumnIndex, (Value, countOfValue)) of type (Int, (Any, Long)). For instance, (1, (A, 10)) means for column index 1, there are 10 A's.
My goal is to get the top 100 values for all my indexes, or keys.
Right away I can make it less compute intensive for my workload by doing an initial filter:
val freqNumDist = numRDD.filter(x => x._2._2 > 1)
Now I found an interesting example of a class here, which seems to fit my use case:
class TopNList(val maxSize: Int) extends Serializable {
  val topNCountsForColumnArray = new mutable.ArrayBuffer[(Any, Long)]
  var lowestColumnCountIndex: Int = -1
  var lowestValue = Long.MaxValue

  def add(newValue: Any, newCount: Long): Unit = {
    if (topNCountsForColumnArray.length < maxSize - 1) {
      topNCountsForColumnArray += ((newValue, newCount))
    } else if (topNCountsForColumnArray.length == maxSize) {
      updateLowestValue
    } else {
      if (newCount > lowestValue) {
        topNCountsForColumnArray.insert(lowestColumnCountIndex, (newValue, newCount))
        updateLowestValue
      }
    }
  }

  def updateLowestValue: Unit = {
    var index = 0
    topNCountsForColumnArray.foreach { r =>
      if (r._2 < lowestValue) {
        lowestValue = r._2
        lowestColumnCountIndex = index
      }
      index += 1
    }
  }
}
So now I was thinking of putting together an aggregateByKey that uses this class in order to get my top 100 values. The problem is that I am unsure how to use this class in aggregateByKey to accomplish this goal.
val initFreq: TopNList = new TopNList(100)

def freqSeq(u: TopNList, v: (Double, Long)) = (
  u.add(v._1, v._2)
)

def freqComb(u1: TopNList, u2: TopNList) = (
  u2.topNCountsForColumnArray.foreach(r => u1.add(r._1, r._2))
)

val freqNumDist = numRDD.filter(x => x._2._2 > 1).aggregateByKey(initFreq)(freqSeq, freqComb)
The obvious problem is that nothing is returned by the functions I am using. So I am wondering how to modify this class, or do I need to think about this in a whole new light and just cherry-pick some of the functions out of this class and add them to the functions I am using for the aggregateByKey?
I'm either thinking about classes wrong, or the entire aggregateByKey, or both!
Your projection implementations (freqSeq, freqComb) return Unit, while aggregateByKey expects them to return TopNList.
If you intentionally keep the style of your solution, the relevant implementation should be:
def freqSeq(u: TopNList, v: (Any, Long)): TopNList = {
  u.add(v._1, v._2) // operation gives void result (Unit)
  u                 // this one is of TopNList type
}

def freqComb(u1: TopNList, u2: TopNList): TopNList = {
  u2.topNCountsForColumnArray.foreach(r => u1.add(r._1, r._2))
  u1
}
Just take a look at the signature of aggregateByKey on PairRDDFunctions to see what it expects:
def aggregateByKey[U](zeroValue: U)(seqOp: scala.Function2[U, V, U], combOp: scala.Function2[U, U, U])(implicit evidence$3: scala.reflect.ClassTag[U]): org.apache.spark.rdd.RDD[scala.Tuple2[K, U]] = { /* compiled code */ }
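With those fixed signatures, the call from the question lines up with what aggregateByKey expects. A minimal sketch, assuming numRDD: RDD[(Int, (Any, Long))] as described in the question:
import org.apache.spark.rdd.RDD

// zeroValue is serialized and deserialized once per partition, so each
// partition folds into its own TopNList copy; combOp then merges partitions.
val freqNumDist: RDD[(Int, TopNList)] =
  numRDD
    .filter { case (_, (_, count)) => count > 1 }
    .aggregateByKey(new TopNList(100))(freqSeq, freqComb)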

Infinite loop seems to confuse Scala's type system

Here is an artificial toy example that demonstrates my problem:
import scala.util.Random

def sscce(): Int = {
  val rand = new Random()
  var count = 0
  while (true) { // type mismatch; found: Unit, required: Int
    count += 1
    if (rand.nextInt() == 42) return count
  }
}
How can I help the compiler understand that this method will always return an Int?
I know the above toy example could easily be refactored to get rid of the infinite loop altogether, but I really want to have the infinite loop in my actual code. Trust me on this ;)
Always return an Int:
def sscce(): Int = {
  val rand = new Random()
  var count = 0
  while (true) {
    count += 1
    if (rand.nextInt() == 42) return count
  }
  count // <-- this
}
You can also do:
def foo: Int = {
  ...
  while (true) {
    ... return ...
  }
  throw new IllegalStateException // unreachable
}
This will typecheck because the type of the throw expression is Nothing, which is a subtype of Int.
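A tiny illustration of why that works: Nothing is Scala's bottom type, so a throw expression can sit anywhere an Int (or any other type) is expected:
// Both branches must conform to Int; the throw has type Nothing,
// which is a subtype of every type, so the if expression is an Int.
def parsePositive(s: String): Int = {
  val n = s.toInt
  if (n > 0) n else throw new IllegalArgumentException(s"not positive: $s")
}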
See this question. While loops don't return a value; i.e., they return Unit, which here is the last statement in your function. So the definition says it returns an Int, but it actually returns Unit, thus the type error. @ionut's answer fixes the type error by returning count as the last statement; here is a recursive approach:
def sscce(): Int = {
  val rand = new Random()
  def ssccer(count: Int): Int = {
    if (rand.nextInt == 42) count
    else ssccer(count + 1)
  }
  ssccer(0)
}
Per the SLS, a while loop is executed similarly to:
def whileLoop(cond: => Boolean)(body: => Unit): Unit =
  if (cond) { body; whileLoop(cond)(body) } else {}
i.e., it returns Unit. So the compiler sees the while as the last statement in sscce() and therefore assumes that you're trying to return Unit. I don't think it's smart enough to realize that return count will eventually always return an Int.
The simple solution is to follow the suggestion of @Brian or @IonutGStan, and force it to return count, whether it truly needs it or not.
From a code quality standpoint, it would be good to ditch the while(true) loop and replace it with something more readable. As a nice side effect, it also solves your problem:
def sscce(): Int = {
  val rand = new Random()
  var count = 1
  while (rand.nextInt() != 42) {
    count += 1
  }
  count
}

How can I make this method more Scalalicious

I have a function that calculates the left and right node values for some collection of treeNodes, given a simple node.id, node.parentId association. It's very simple and works well enough... but, well, I am wondering if there is a more idiomatic approach. Specifically, is there a way to track the left/right values without using some externally tracked value, but still keep the tasty recursion?
/*
 * A tree node
 */
case class TreeNode(val id: String, val parentId: String) {
  var left: Int = 0
  var right: Int = 0
}

/*
 * a method to compute the left/right node values
 */
def walktree(node: TreeNode) = {
  /*
   * increment state for the inner function
   */
  var c = 0

  /*
   * A method to set the increment state
   */
  def increment = { c += 1; c } // poo

  /*
   * the tasty inner method
   * treeNodes is a List[TreeNode]
   */
  def walk(node: TreeNode): Unit = {
    node.left = increment
    /*
     * recurse on all direct descendants
     */
    treeNodes filter (_.parentId == node.id) foreach (walk(_))
    node.right = increment
  }
  walk(node)
}
walktree(someRootNode)
Edit -
The list of nodes is taken from a database. Pulling the nodes into a proper tree would take too much time. I am pulling a flat list into memory, and all I have is an association via node ids between parents and children.
Adding left/right node values allows me to get a snapshot of all children (and children's children) with a single SQL query.
The calculation needs to run very quickly in order to maintain data integrity should parent-child associations change (which they do very frequently).
In addition to using the awesome Scala collections, I've also boosted speed by using parallel processing for some pre/post filtering on the tree nodes. I wanted to find a more idiomatic way of tracking the left/right node values. After looking at the answer from @dhg it got even better. Using groupBy instead of a filter makes the algorithm (mostly?) linear instead of quadratic!
val treeNodeMap = treeNodes.groupBy(_.parentId).withDefaultValue(Nil)

def walktree(node: TreeNode) = {
  def walk(node: TreeNode, counter: Int): Int = {
    node.left = counter
    node.right =
      treeNodeMap(node.id)
        .foldLeft(counter + 1) {
          (result, curnode) => walk(curnode, result) + 1
        }
    node.right
  }
  walk(node, 1)
}
Your code appears to be calculating an in-order traversal numbering.
I think what you want, to make your code better, is a fold that carries the current value downward and passes the updated value upward. Note that it might also be worth doing a treeNodes.groupBy(_.parentId) before walktree, to avoid calling treeNodes.filter(...) every time you call walk.
val treeNodes = List(TreeNode("1", "0"), TreeNode("2", "1"), TreeNode("3", "1"))
val treeNodeMap = treeNodes.groupBy(_.parentId).withDefaultValue(Nil)

def walktree2(node: TreeNode) = {
  def walk(node: TreeNode, c: Int): Int = {
    node.left = c
    val newC =
      treeNodeMap(node.id) // get the children without filtering
        .foldLeft(c + 1)((c, child) => walk(child, c) + 1)
    node.right = newC
    newC
  }
  walk(node, 1)
}
And it produces the same result:
scala> walktree2(TreeNode("0","-1"))
scala> treeNodes.map(n => "(%s,%s)".format(n.left,n.right))
res32: List[String] = List((2,7), (3,4), (5,6))
That said, I would completely rewrite your code as follows:
case class TreeNode( // class is now immutable; `walktree` returns a new tree
id: String,
value: Int, // value to be set during `walktree`
left: Option[TreeNode], // recursively-defined structure
right: Option[TreeNode]) // makes traversal much simpler
def walktree(node: TreeNode) = {
def walk(nodeOption: Option[TreeNode], c: Int): (Option[TreeNode], Int) = {
nodeOption match {
case None => (None, c) // if this child doesn't exist, do nothing
case Some(node) => // if this child exists, recursively walk
val (newLeft, cLeft) = walk(node.left, c) // walk the left side
val newC = cLeft + 1 // update the value
val (newRight, cRight) = walk(node.right, newC) // walk the right side
(Some(TreeNode(node.id, newC, newLeft, newRight)), cRight)
}
}
walk(Some(node), 0)._1
}
Then you can use it like this:
walktree(
  TreeNode("1", -1,
    Some(TreeNode("2", -1,
      Some(TreeNode("3", -1, None, None)),
      Some(TreeNode("4", -1, None, None)))),
    Some(TreeNode("5", -1, None, None))))
To produce:
Some(TreeNode(1,4,
  Some(TreeNode(2,2,
    Some(TreeNode(3,1,None,None)),
    Some(TreeNode(4,3,None,None)))),
  Some(TreeNode(5,5,None,None))))
If I get your algorithm correctly:
def walktree(node: TreeNode, c: Int): Int = {
  node.left = c
  val c2 = treeNodes.filter(_.parentId == node.id).foldLeft(c + 1) {
    (cur, n) => walktree(n, cur)
  }
  node.right = c2 + 1
  c2 + 2
}

walktree(new TreeNode("", ""), 0)
Off-by-one errors are likely to occur.
A few random thoughts (better suited for http://codereview.stackexchange.com):
Try posting code that compiles... we have to guess that treeNodes is a sequence of TreeNode.
val is redundant for case class parameters:
case class TreeNode(val id: String, val parentId: String) {
Avoid explicit = and Unit for Unit functions:
def walktree(node: TreeNode) = {
def walk(node: TreeNode): Unit = {
Methods with side-effects should have ():
def increment = {c += 1; c}
This is terribly slow; consider storing the list of children in the actual node:
treeNodes filter (_.parentId == node.id) foreach (walk(_))
More concise syntax would be treeNodes foreach walk:
treeNodes foreach (walk(_))

Scala: Group an Iterable into an Iterable of Iterables by a predicate

I have very large Iterators that I want to split into pieces. I have a predicate that looks at an item and returns true if it is the start of a new piece. I need the pieces to be Iterators, because even the pieces will not fit into memory. There are so many pieces that I would be wary of a recursive solution blowing out the stack. The situation is similar to this question, but I need Iterators instead of Lists, and the "sentinels" (items for which the predicate is true) occur at (and should be included at) the beginning of a piece. The resulting iterators will only be used in order, though some may not be used at all, and they should only use O(1) memory. I imagine this means they should all share the same underlying iterator. Performance is important.
If I were to take a stab at a function signature, it would be this:
def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] = ...
I would have loved to use takeWhile, but it loses the last element. I investigated span, but it buffers results. My current best idea involves BufferedIterator, but maybe there is a better way.
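The takeWhile problem is easy to reproduce (illustrative):
// Iterator.takeWhile must consume the first failing element from the
// underlying iterator in order to test it, so that element is lost.
val it = Iterator(1, 2, 10, 3, 4)
println(it.takeWhile(_ < 10).toList) // List(1, 2)
println(it.toList)                   // List(3, 4) -- the 10 has vanished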
You'll know you've got it right because something like this doesn't crash your JVM:
groupby((1 to Int.MaxValue).iterator)(_ % (Int.MaxValue / 2) == 0).foreach(group => println(group.sum))
groupby((1 to Int.MaxValue).iterator)(_ % 10 == 0).foreach(group => println(group.sum))
You have an inherent problem. Iterable implies that you can get multiple iterators. Iterator implies that you can only pass through once. That means that your Iterable[Iterable[T]] should be able to produce Iterator[Iterable[T]]s. But when this returns an element--an Iterable[T]--and that asks for multiple iterators, the underlying single iterator can't comply without either caching the results of the list (too big) or calling the original iterable and going through absolutely everything again (very inefficient).
So, while you could do this, I think you should conceive of your problem in a different way.
If you could start with a Seq instead, you could grab subsets as ranges.
If you already know how you want to use your iterable, you could write a method
def process[T](source: Iterable[T])(starts: T => Boolean)(handlers: (T => Unit)*)
which advances through the set of handlers each time starts fires off a "true". If there's any way you can do your processing in one sweep, something like this is the way to go, as sketched below. (Your handlers will have to save state via mutable data structures or variables, however.)
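A minimal sketch of that one-sweep shape (the body is illustrative, not a tested implementation):
// One pass over the source: each time `starts` fires, control advances to the
// next handler, so handler i receives exactly the elements of the i-th group.
def process[T](source: Iterable[T])(starts: T => Boolean)(handlers: (T => Unit)*): Unit = {
  var h = -1 // index of the handler currently receiving elements
  for (t <- source) {
    if (starts(t) && h < handlers.length - 1) h += 1
    if (h >= 0) handlers(h)(t)
  }
}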
If you can permit iteration on the outer list to break the inner list, you could have an Iterable[Iterator[T]] with the additional constraint that once you iterate to a later sub-iterator, all previous sub-iterators are invalid.
Here's a solution of the last type (from Iterator[T] to Iterator[Iterator[T]]; one can wrap this to make the outer layers Iterable instead).
class GroupedBy[T](source: Iterator[T])(starts: T => Boolean)
    extends Iterator[Iterator[T]] {
  private val underlying = source
  private var saved: T = _
  private var cached = false
  private var starting = false

  private def cacheNext() {
    saved = underlying.next
    starting = starts(saved)
    cached = true
  }

  private def oops() { throw new java.util.NoSuchElementException("empty iterator") }

  // Comment the next line if you do NOT want the first element to always start a group
  if (underlying.hasNext) { cacheNext(); starting = true }

  def hasNext = {
    while (!(cached && starting) && underlying.hasNext) cacheNext()
    cached && starting
  }

  def next = {
    if (!(cached && starting) && !hasNext) oops()
    starting = false
    new Iterator[T] {
      var presumablyMore = true
      def hasNext = {
        if (!cached && !starting && underlying.hasNext && presumablyMore) cacheNext()
        presumablyMore = cached && !starting
        presumablyMore
      }
      def next = {
        if (presumablyMore && (cached || hasNext)) {
          cached = false
          saved
        }
        else oops()
      }
    }
  }
}
Here's my solution using BufferedIterator. It doesn't let you skip iterators correctly, but it's fairly simple and functional. The first element(s) go into a group even if !startsGroup(first).
def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] =
  new Iterator[Iterator[T]] {
    val base = iter.buffered
    override def hasNext = base.hasNext
    override def next() = Iterator(base.next()) ++ new Iterator[T] {
      override def hasNext = base.hasNext && !startsGroup(base.head)
      override def next() = if (hasNext) base.next() else Iterator.empty.next()
    }
  }
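Illustrative usage (remember that in this first version each inner iterator must be fully consumed before asking the outer iterator for the next one):
val groups = groupby(Iterator(1, 5, 10, 11, 20, 21))(_ % 10 == 0)
groups.foreach(g => println(g.toList))
// List(1, 5)
// List(10, 11)
// List(20, 21)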
Update: Keeping a little state lets you skip iterators and prevents people from messing with previous ones:
def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] =
  new Iterator[Iterator[T]] {
    val base = iter.buffered
    var prev: Iterator[T] = Iterator.empty
    override def hasNext = base.hasNext
    override def next() = {
      while (prev.hasNext) prev.next() // Exhaust previous iterator; take* and drop* do NOT always work!! (Jira SI-5002?)
      prev = Iterator(base.next()) ++ new Iterator[T] {
        var hasMore = true
        override def hasNext = { hasMore = hasMore && base.hasNext && !startsGroup(base.head); hasMore }
        override def next() = if (hasNext) base.next() else Iterator.empty.next()
      }
      prev
    }
  }
If you are concerned about memory, the following will work. You can only use it if your underlying iterable object supports views. This implementation iterates over the Iterable and then generates IterableViews, which can then be iterated over. This implementation does not care whether the very first element tests as a group start, since it starts a group regardless.
def groupby[T](iter: Iterable[T])(startsGroup: T => Boolean): Iterable[Iterable[T]] = new Iterable[Iterable[T]] {
  def iterator = new Iterator[Iterable[T]] {
    val i = iter.iterator
    var index = 0
    var nextView: IterableView[T, Iterable[T]] = getNextView()

    private def getNextView() = {
      val start = index
      var hitStartGroup = false
      while (i.hasNext && !hitStartGroup) {
        val next = i.next()
        index += 1
        hitStartGroup = (index > 1 && startsGroup(next))
      }
      if (hitStartGroup) {
        if (start == 0) iter.view(start, index - 1)
        else iter.view(start - 1, index - 1)
      } else { // hit end
        if (start == index) null
        else if (start == 0) iter.view(start, index)
        else iter.view(start - 1, index)
      }
    }

    def hasNext = nextView != null

    def next() = {
      if (nextView != null) {
        val next = nextView
        nextView = getNextView()
        next
      } else null
    }
  }
}
You can maintain a low memory footprint by using Streams. Use result.toIterator if you need an iterator again.
With streams there's no mutable state, only a single conditional, and it's nearly as concise as Jay Hacker's solution.
def batchBy[A, B](iter: Iterator[A])(f: A => B): Stream[(B, Iterator[A])] = {
  val base = iter.buffered
  val empty = Stream.empty[(B, Iterator[A])]

  def getBatch(key: B) = {
    Iterator(base.next()) ++ new Iterator[A] {
      def hasNext: Boolean = base.hasNext && (f(base.head) == key)
      def next(): A = base.next()
    }
  }

  def next(skipList: Option[Iterator[A]] = None): Stream[(B, Iterator[A])] = {
    skipList.foreach { _.foreach { _ => } } // drain any unconsumed batch before moving on
    if (base.isEmpty) empty
    else {
      val key = f(base.head)
      val batch = getBatch(key)
      Stream.cons((key, batch), next(Some(batch)))
    }
  }

  next()
}
I ran the tests:
scala> batchBy((1 to Int.MaxValue).iterator)(_ % (Int.MaxValue / 2) == 0)
         .foreach { case (_, group) => println(group.sum) }
-1610612735
1073741823
-536870909
2147483646
2147483647
The second test prints too much to paste to Stack Overflow.
import scala.collection.mutable.ArrayBuffer

object GroupingIterator {

  /**
   * Create a new GroupingIterator with a grouping predicate.
   *
   * @param it The original iterator
   * @param p  Predicate controlling the grouping
   * @tparam A Type of elements iterated
   * @return A new GroupingIterator
   */
  def apply[A](it: Iterator[A])(p: (A, IndexedSeq[A]) => Boolean): GroupingIterator[A] =
    new GroupingIterator(it)(p)
}

/**
 * Group elements in sequences of contiguous elements that satisfy a predicate. The predicate
 * tests each single potential next element of the group with the help of the elements grouped so far.
 * If it returns true, the potential next element is added to the group, otherwise
 * a new group is started with the potential next element as first element.
 *
 * @param self The original iterator
 * @param p    Predicate controlling the grouping
 * @tparam A   Type of elements iterated
 */
class GroupingIterator[+A](self: Iterator[A])(p: (A, IndexedSeq[A]) => Boolean) extends Iterator[IndexedSeq[A]] {
  private[this] val source = self.buffered
  private[this] val buffer: ArrayBuffer[A] = ArrayBuffer()

  def hasNext: Boolean = source.hasNext

  def next(): IndexedSeq[A] = {
    if (hasNext)
      nextGroup()
    else
      Iterator.empty.next()
  }

  private[this] def nextGroup(): IndexedSeq[A] = {
    assert(source.hasNext)
    buffer.clear()
    buffer += source.next
    while (source.hasNext && p(source.head, buffer)) {
      buffer += source.next
    }
    buffer.toIndexedSeq
  }
}
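Illustrative usage, grouping runs of consecutive equal elements (the predicate sees each candidate element together with the group built so far):
val runs = GroupingIterator(Iterator(1, 1, 2, 2, 2, 3))((next, sofar) => next == sofar.head)
runs.foreach(println)
// Vector(1, 1)
// Vector(2, 2, 2)
// Vector(3)   (the printed collection type may vary by Scala version)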
}