How to group a variable-length, repeating sequence in Scala

I have a collection of ints that repeat themselves in a pattern:
val repeatingSequence = List(1,2,3,1,2,3,4,1,2,1,2,3,4,5)
I'd like to section that List up when the pattern repeats itself; in this case, when the sequence goes back to 1:
val groupedBySequence = List(List(1,2,3), List(1,2,3,4), List(1,2), List(1,2,3,4,5))
Notice that I'm grouping when the sequence jumps back to 1, but that each run can be of arbitrary length. My colleague and I have solved it by adding an extra method called 'groupWhen':
class IteratorW[A](itr: Iterator[A]) {
def groupWhen(fn: A => Boolean): Iterator[Seq[A]] = {
val bitr = itr.buffered
new Iterator[Seq[A]] {
override def hasNext = bitr.hasNext
override def next = {
val xs = collection.mutable.ListBuffer(bitr.next)
while (bitr.hasNext && !fn(bitr.head)) xs += bitr.next
xs.toSeq
}
}
}
}
implicit def ToIteratorW[A](itr: Iterator[A]): IteratorW[A] = new IteratorW(itr)
> repeatingSequence.iterator.groupWhen(_ == 1).toSeq
List(List(1,2,3), List(1,2,3,4), List(1,2), List(1,2,3,4,5))
However, we both feel like there's a more elegant solution lurking in the collection library.

Given an iterator iter, this will do the trick:
val head = iter.next()
val out = (
Iterator continually {iter takeWhile (_ != head)}
takeWhile {!_.isEmpty}
map {head :: _.toList}
).toList

As we all know, fold can do everything... ;)
val rs = List(1,2,3,1,2,3,4,1,2,1,2,3,4,5)
val res = (rs++List(1)).foldLeft((List[List[Int]](),List[Int]()))((acc,e) => acc match {
case (res,subl) => {
if (e == 1) ((subl.reverse)::res,1::Nil) else (res, e::subl)
}
})
println(res._1.reverse.tail)
Please regard this as an entry for the obfuscated Scala contest rather than as a real answer.

Here's a not-exactly-elegant solution I bashed out using span:
def groupWhen[A](fn: A => Boolean)(xs: List[A]): List[List[A]] = {
xs.span(!fn(_)) match {
case (Nil, Nil) => Nil
case (Nil, z::zs) => groupWhen(fn)(zs) match {
case ys::yss => (z::ys) :: yss
case Nil => List(List(z))
}
case (ys, zs) => ys :: groupWhen(fn)(zs)
}
}
scala> groupWhen[Int](_==1)(List(1,2,3,1,2,3,4,1,2,3,4,5))
res39: List[List[Int]] = List(List(1, 2, 3), List(1, 2, 3, 4), List(1, 2, 3, 4, 5))
scala> groupWhen[Int](_==1)(List(5,4,3,2,1,2,3,1,2,3,4,1,2,3,4,5))
res40: List[List[Int]] = List(List(5, 4, 3, 2), List(1, 2, 3), List(1, 2, 3, 4), List(1, 2, 3, 4, 5))

import scala.collection.mutable.ListBuffer
import scala.collection.breakOut
val repeatingSequence = List(1,2,3,1,2,3,4,1,2,1,2,3,4,5)
val groupedBySequence: List[List[Int]] = repeatingSequence.foldLeft(ListBuffer[ListBuffer[Int]]()) {
case (acc, 1) => acc += ListBuffer(1)
case (acc, n) => acc.last += n; acc
}.map(_.toList)(breakOut)
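
As a hedged alternative to the answers above: scala.collection.breakOut is no longer available as of Scala 2.13, so here is a minimal sketch that uses only an immutable foldRight. The names GroupWhenSketch and startsGroup are made up for this illustration and do not come from any answer above.

```scala
object GroupWhenSketch {
  // Starts a new group at every element for which `startsGroup` holds.
  // Elements before the first such element form a leading group of their own.
  def groupWhen[A](xs: List[A])(startsGroup: A => Boolean): List[List[A]] = {
    // Walk from the right, collecting the group currently being built and
    // the groups that are already complete.
    val (pending, done) =
      xs.foldRight((List.empty[A], List.empty[List[A]])) {
        case (x, (current, groups)) =>
          if (startsGroup(x)) (Nil, (x :: current) :: groups)
          else (x :: current, groups)
      }
    if (pending.isEmpty) done else pending :: done
  }
}
```

With the list from the question, GroupWhenSketch.groupWhen(repeatingSequence)(_ == 1) yields the four groups asked for. Because foldRight walks from the right, no reversing is needed.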

Related

Why is Scala returning a ';' expected but ',' found error?

I am currently trying to write an inversion count algorithm in Scala, utilizing a working merge-sort algorithm.
The merge-sort functions as expected but when I try to add a count as one of the parameters I get back the error:
Error:(14, 29) ';' expected but ',' found.
case (_, Nil) => left, count
^
Here is the program:
object MergeSort {
def mergeSort(inputList: List[Int]): List[Int] = {
if (inputList.length <= 1) inputList
else {
val (left, right) = inputList.splitAt(inputList.size / 2)
merge(mergeSort(left), mergeSort(right), 0)
}
}
def merge(left: List[Int], right: List[Int], count: Int): List[Int] =
(left, right) match {
case (_, Nil) => left, count
case (Nil, _) => right, count
case (leftHead :: leftTail, rightHead :: rightTail) =>
if (leftHead < rightHead){
val (leftList, leftCount) = leftHead :: merge(leftTail, right, count)
return (leftList, leftCount)
}
else {
val (rightList, rightCount) = rightHead :: merge(left, rightTail, count)
val mergeInversions = leftCount + left.length
return (rightList, mergeInversions)
}
}
val inputList: List[Int] = List(10, 3, 5, 1, 7, 6)
val sorted_arr = mergeSort(inputList)
}

@sepp2k correctly pointed out in the comments that if you want to create a tuple, you need to wrap it in parentheses.
Here's the working solution:
object MergeSort {
def mergeSort(inputList: List[Int]): List[Int] = {
if (inputList.length <= 1) inputList
else {
val (left, right) = inputList.splitAt(inputList.size / 2)
merge(mergeSort(left), mergeSort(right), 0)._1
}
}
def merge(left: List[Int], right: List[Int], count: Int): (List[Int], Int) =
(left, right) match {
case (_, Nil) => (left, count)
case (Nil, _) => (right, count)
case (leftHead :: leftTail, rightHead :: rightTail) =>
if (leftHead < rightHead) {
val left = merge(leftTail, right, count)
val (leftList, leftCount) = (leftHead :: left._1, left._2)
(leftList, leftCount)
} else {
val right = merge(left, rightTail, count)
val (rightList, rightCount) = (rightHead :: right._1, right._2)
val mergeInversions = rightCount + left.length
(rightList, mergeInversions)
}
}
val inputList: List[Int] = List(10, 3, 5, 1, 7, 6, 0)
val sorted_arr = mergeSort(inputList)
}
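
The code above compiles, but mergeSort still discards the count via ._1. As a rough sketch of how the inversion count could be carried through both functions, assuming the usual definition of an inversion (a pair i < j with a(i) > a(j)), something like the following might work. The names InversionCountSketch, sortAndCount and loop are invented for this example.

```scala
object InversionCountSketch {
  // Returns the sorted list together with the number of inversions in `xs`.
  def sortAndCount(xs: List[Int]): (List[Int], Long) =
    if (xs.length <= 1) (xs, 0L)
    else {
      val (l, r) = xs.splitAt(xs.length / 2)
      val (sortedL, countL) = sortAndCount(l)
      val (sortedR, countR) = sortAndCount(r)
      val (merged, countM) = merge(sortedL, sortedR)
      (merged, countL + countR + countM)
    }

  // Merges two sorted lists and counts the pairs (x in left, y in right) with x > y.
  def merge(left: List[Int], right: List[Int]): (List[Int], Long) = {
    @scala.annotation.tailrec
    def loop(l: List[Int], r: List[Int], lLen: Int,
             acc: List[Int], count: Long): (List[Int], Long) =
      (l, r) match {
        case (Nil, _) => (acc.reverse ::: r, count)
        case (_, Nil) => (acc.reverse ::: l, count)
        case (lh :: lt, rh :: rt) =>
          if (lh <= rh) loop(lt, r, lLen - 1, lh :: acc, count)
          // rh is smaller than every element still left in l
          else loop(l, rt, lLen, rh :: acc, count + lLen)
      }
    loop(left, right, left.length, Nil, 0L)
  }
}
```

Note that every tuple result is written in parentheses, which is exactly what the original compile error was about.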

What is the efficient way to remove subsets from a List[List[String]]?

I have a ListBuffer, val tList = ListBuffer[TCount], where TCount is case class TCount(l: List[String], c: Long). I want to find those lists l in tList that are not a subset of any other element of tList and whose c value is less than the c value of their superset. The following program works, but I have to use two nested for loops, which makes the code inefficient. Is there a better approach I can use to make the code efficient?
val _arr = tList.toArray
for (i <- 0 to (_arr.length - 1)) {
val il = _arr(i).l.toSet
val ic = _arr(i).c
for (j <- 0 to (_arr.length - 1)) {
val jl = _arr(j).l.toSet
val jc = _arr(j).c
if (i != j && il.subsetOf(jl) && ic >= jc) {
tList.-=(_arr(i))
}
}
}

Inspired by the set-trie comment:
import scala.collection.SortedMap
class SetTrie[A](val flag: Boolean, val children: SortedMap[A, SetTrie[A]])(implicit val ord: Ordering[A]) {
def insert(xs: List[A]): SetTrie[A] = xs match {
case Nil => new SetTrie(true, children)
case a :: rest => {
val current = children.getOrElse(a, new SetTrie[A](false, SortedMap.empty))
val inserted = current.insert(rest)
new SetTrie(flag, children + (a -> inserted))
}
}
def containsSuperset(xs: List[A], strict: Boolean): Boolean = xs match {
case Nil => !children.isEmpty || (!strict && flag)
case a :: rest => {
children.get(a).map(_.containsSuperset(rest, strict)).getOrElse(false) ||
children.takeWhile(x => ord.lt(x._1, a)).exists(_._2.containsSuperset(xs, false))
}
}
}
def removeSubsets[A : Ordering](xss: List[List[A]]): List[List[A]] = {
val sorted = xss.map(_.sorted)
val setTrie = sorted.foldLeft(new SetTrie[A](false, SortedMap.empty)) { case (st, xs) => st.insert(xs) }
sorted.filterNot(xs => setTrie.containsSuperset(xs, true))
}

Here is a method that relies on a data structure somewhat similar to Set-Trie, but which stores more subsets explicitly. It provides worse compression, but is faster during lookup:
def findMaximal(lists: List[List[String]]): List[List[String]] = {
import collection.mutable.HashMap
class Node(
var isSubset: Boolean = false,
val children: HashMap[String, Node] = HashMap.empty
) {
def insert(xs: List[String], isSubs: Boolean): Unit = if (xs.isEmpty) {
isSubset |= isSubs
} else {
var isSubsSubs = false || isSubs
for (h :: t <- xs.tails) {
children.getOrElseUpdate(h, new Node()).insert(t, isSubsSubs)
isSubsSubs = true
}
}
def isMaximal(xs: List[String]): Boolean = xs match {
case Nil => children.isEmpty && !isSubset
case h :: t => children(h).isMaximal(t)
}
override def toString: String = {
if (children.isEmpty) "#"
else children.flatMap{
case (k,v) => {
if (v.children.isEmpty) List(k)
else (k + ":") :: v.toString.split("\n").map(" " + _).toList
}
}.mkString("\n")
}
}
val listsWithSorted = for (x <- lists) yield (x, x.sorted)
val root = new Node()
for ((x, s) <- listsWithSorted) root.insert(s, false)
// println(root)
for ((x, s) <- listsWithSorted; if root.isMaximal(s)) yield x
}
Note that I'm allowed to do any kind of mutable nonsense inside the body of the method, because the mutable trie data structure never escapes the scope of the method, and can therefore not be inadvertently shared with another thread.
Here is an example with sets of characters (converted to lists of strings):
println(findMaximal(List(
"ab", "abc", "ac", "abd",
"ade", "efd", "adf", "bafd",
"abd", "fda", "dba", "dbe"
).map(_.toList.map(_.toString))))
The output is:
List(
List(a, b, c),
List(a, d, e),
List(e, f, d),
List(b, a, f, d),
List(d, b, e)
)
so indeed, the non-maximal elements ab, ac, abd, adf, fda and dba are eliminated.
And here is what my not-quite-set-trie data structure looks like (child nodes are indented):
e:
f
b:
e
d:
e
f
c
f
d:
e:
f
f
a:
e
b:
d:
f
c
f
d:
e
f
c
f
c
f

Not sure if you can avoid the complexity, but, I guess I'd write like this:
val tList = List(List(1, 2, 3), List(3, 2, 1), List(9, 4, 7), List(3, 5, 6), List(1, 5, 6), List(6, 1, 5))
val tSet = tList.map(_.toSet)
// drop every set that is a subset of more than one set (itself plus at least one other)
def result = tSet.filterNot { sub => tSet.count(sub.subsetOf(_)) > 1 }
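
For illustration only, here is a small quadratic sketch of the "keep only maximal sets" idea. It assumes that lists with the same elements (such as List(1, 2, 3) and List(3, 2, 1)) should be treated as one set and kept once. MaximalSetsSketch and maximal are hypothetical names, not part of any answer here.

```scala
object MaximalSetsSketch {
  // Keeps each distinct set that is not a proper subset of another set.
  // This is still O(n^2) subset checks; it only illustrates the filter.
  def maximal[A](lists: List[List[A]]): List[Set[A]] = {
    val sets = lists.map(_.toSet).distinct
    sets.filterNot(s => sets.exists(other => other != s && s.subsetOf(other)))
  }
}
```

Since the sets are made distinct first, the check other != s is enough to skip the set itself.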

Here's one approach:
1. Create an indexed Map for identifying the original List elements.
2. Turn the Map of List elements into a Map of Sets (with index).
3. Generate combinations of the Map elements and use a custom filter to capture the elements that are subsets of others.
4. Remove those subset elements from the Map of Sets and retrieve the remaining elements from the Map of Lists via the index.
Sample code:
type TupIntSet = Tuple2[Int, Set[Int]]
def subsetFilter(ls: List[TupIntSet]): List[TupIntSet] =
if ( ls.size != 2 ) List.empty[TupIntSet] else
if ( ls(0)._2 subsetOf ls(1)._2 ) List[TupIntSet]((ls(0)._1, ls(0)._2)) else
if ( ls(1)._2 subsetOf ls(0)._2 ) List[TupIntSet]((ls(1)._1, ls(1)._2)) else
List.empty[TupIntSet]
val tList = List(List(1,2), List(1,2,3), List(3,4,5), List(5,4,3), List(2,3,4), List(6,7))
val listMap = (Stream from 1).zip(tList).toMap
val setMap = listMap.map{ case (i, l) => (i, l.toSet) }
val tSubsets = setMap.toList.combinations(2).toSet.flatMap(subsetFilter)
val resultList = (setMap.toSet -- tSubsets).map(_._1).map(listMap.getOrElse(_, ""))
// resultList: scala.collection.immutable.Set[java.io.Serializable] =
// Set(List(5, 4, 3), List(2, 3, 4), List(6, 7), List(1, 2, 3))

Scala generic "string split" method

If I were splitting a string, I would be able to do
"123,456,789".split(",")
to get
Seq("123","456","789")
Thinking of a string as a sequence of characters, how could this be generalized to other sequences of objects?
val x = Seq(One(),Two(),Three(),Comma(),Five(),Six(),Comma(),Seven(),Eight(),Nine())
x.split(
number=>{
case _:Comma => true
case _ => false
}
)
split doesn't exist in this case. It reminds me of span, partition and groupBy, but only span seems close, and it doesn't handle leading or trailing commas gracefully.

implicit class SplitSeq[T](seq: Seq[T]){
import scala.collection.mutable.ListBuffer
def split(sep: T): Seq[Seq[T]] = {
val buffer = ListBuffer(ListBuffer.empty[T])
seq.foreach {
case `sep` => buffer += ListBuffer.empty
case elem => buffer.last += elem
}; buffer.filter(_.nonEmpty)
}
}
It can be then used like x.split(Comma()).

The following is 'a' solution, not the most elegant -
def split[A](x: Seq[A], edge: A => Boolean): Seq[Seq[A]] = {
val init = (Seq[Seq[A]](), Seq[A]())
val (result, last) = x.foldLeft(init) { (cum, n) =>
val (total, prev) = cum
if (edge(n)) {
(total :+ prev, Seq.empty)
} else {
(total, prev :+ n)
}
}
result :+ last
}
Example result -
scala> split(Seq(1,2,3,0,4,5,0,6,7), (_:Int) == 0)
res53: Seq[Seq[Int]] = List(List(1, 2, 3), List(4, 5), List(6, 7))

This is how I've solved it in the past, but I suspect there is a better / more elegant way.
def break[A](xs:Seq[A], p:A => Boolean): (Seq[A], Seq[A]) = {
if (p(xs.head)) {
xs.span(p)
}
else {
xs.span(a => !p(a))
}
}
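
The break helper above performs a single span (and calls xs.head, so it assumes a non-empty input). As a hedged sketch of how span could be repeated to split a whole sequence on a predicate, dropping separators and empty segments, consider the following. SplitSketch, splitBy and isSep are names invented for this example.

```scala
object SplitSketch {
  // Splits `xs` at every element matching `isSep`; separators and empty
  // segments (from leading, trailing or repeated separators) are dropped.
  def splitBy[A](xs: Seq[A])(isSep: A => Boolean): List[Seq[A]] = {
    @scala.annotation.tailrec
    def loop(rest: Seq[A], acc: List[Seq[A]]): List[Seq[A]] = {
      val (chunk, tail) = rest.span(a => !isSep(a))
      val acc2 = if (chunk.isEmpty) acc else chunk :: acc
      if (tail.isEmpty) acc2.reverse
      else loop(tail.tail, acc2) // tail.head is the separator, so skip it
    }
    loop(xs, Nil)
  }
}
```

For the question's example this could be called as SplitSketch.splitBy(x)(_.isInstanceOf[Comma]), assuming the Comma class from the question.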

Use a Scala collection method to help convert a list of [0,0,0,1,1,1,1,0,0,1,1] into [3,4,2,2]

I have a list of 0's and 1's, and I want to find the length of each run of equal elements and output these lengths as a list. I can think of a recursive way to do it with functions, but are there any helper functions that can do this conversion?
I believe groupBy could be useful, but it seems to group all the elements into one partition or the other, which is not what I want.
I want a list of the counts of numbers up to each transition from 0 to 1 and from 1 to 0. That is, if we have 0,0,0, we counted 3 zeros, so we remember 3; then we have 1,1,1,1, so we counted 4 ones and remember 4. So far we have a list of [3,4,...], and so on.

Tail-recursive version of Yann Moisan's solution:
def pack[A](ls: Seq[A], prev: Seq[Int] = Seq.empty): Seq[Int] = {
if (ls.isEmpty) prev
else {
val (packed, next) = ls span {_ == ls.head }
pack(next, prev :+ packed.size)
}
}

def pack[A](ls: List[A]): List[Int] = {
if (ls.isEmpty) List(0)
else {
val (packed, next) = ls span { _ == ls.head }
if (next == Nil) List(packed.size)
else packed.size :: pack(next)
}
}

This might be a little complicated, but I'd go with it.
scala> implicit class ListHelper[A](ls:List[A]) {
def partitionBy(f: (A, A) => Boolean) = if (ls.isEmpty) List.empty[Int]
else (ls zip (ls.head :: ls)).foldLeft(List.empty[Int]){
case (Nil, _) => List(1)
case (x :: xs, (a, b)) => if (f(a, b)) (x + 1) :: xs else 1 :: x :: xs
}.reverse
}
defined class ListHelper
scala> List(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1).partitionBy(_ == _)
res27: List[Int] = List(3, 4, 2, 2)
This is based on the Clojure function partition-by.

This is groupBy.
val tuple = list.foldRight((0,0)) { (x, accum) =>
if (x == 0) (accum._1 +1, accum._2) else (accum._1, accum._2 +1)
}
List(tuple._1, tuple._2)
Along similar lines, here is fliptracker (for non-empty lists):
def fliptracker(list: List[Int]) = {
val initial = (list.head, 0, List.empty[Int])
val result = list.foldLeft(initial) {
(acc, x) =>
if (acc._1 == x) (acc._1, acc._2 + 1, acc._3)
else (x, 1, acc._3 ::: List(acc._2))
}
result._3 ::: List (result._2)
}
fliptracker(List(0,0,0,1,1,1,1,0,0,1,1)) // List(3, 4, 2, 2)

I would do
List(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1).foldLeft (List.empty[(Any,Int)]) {
(acc,a) => acc match {
case (`a`, occ) :: tail => (a,occ+1) :: tail
case _ => (a,1) :: acc
}}.reverse.map(_._2)
res1: List[Int] = List(3, 4, 2, 2)

Another alternative, based on takeWhile. Here 1 == black and 0 == white:
import scala.annotation.tailrec
case class IndexCount(index: Int, count: Int, black: Boolean)
@tailrec
def takeWhileSwitch(black: Boolean, index:Int, list: List[Boolean],
result: List[IndexCount]): List[IndexCount] = {
if (list == Nil) return result.reverse
val takenWhile = list.takeWhile(black == _)
val takenLength = takenWhile.length
val resultToBuild = if (takenLength != 0) {
val indexCount = IndexCount(index, takenLength, black)
indexCount :: result
} else result
takeWhileSwitch(!black, index + takenLength, list.drop(takenLength), resultToBuild)
}
val items = takeWhileSwitch(true, 0, rowsWithBlack, List[IndexCount]())
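
The snippet above uses rowsWithBlack without defining it. Below is a self-contained sketch of the same takeWhile idea with a made-up sample row, so the result can be compared with the [3,4,2,2] from the question. The object name, the shortened function name runs and the sample data are assumptions added for illustration.

```scala
object TakeWhileSwitchSketch {
  case class IndexCount(index: Int, count: Int, black: Boolean)

  // Alternates between looking for black and white runs, recording the
  // start index and length of every non-empty run.
  @scala.annotation.tailrec
  def runs(black: Boolean, index: Int, list: List[Boolean],
           result: List[IndexCount]): List[IndexCount] =
    if (list.isEmpty) result.reverse
    else {
      val taken = list.takeWhile(_ == black).length
      val next =
        if (taken != 0) IndexCount(index, taken, black) :: result else result
      runs(!black, index + taken, list.drop(taken), next)
    }

  // Hypothetical sample row: 1 == black (true), 0 == white (false).
  val rowsWithBlack: List[Boolean] =
    List(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1).map(_ == 1)

  val counts: List[Int] = runs(true, 0, rowsWithBlack, Nil).map(_.count)
}
```

If the row starts with white, the first pass takes nothing and simply switches colour, so the recursion still makes progress on the next call.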

Nobody expects the imperative solution:
def chunk[A](xs: List[A]) = {
val ys = collection.mutable.Buffer[Int]()
var prev = xs.head
var count = 1
for (x <- xs.tail) {
if (x != prev) {
ys += count
prev = x
count = 1
}
else count += 1
}
ys += count
ys.toList
}

MatchError when match receives an IndexedSeq but not a LinearSeq

Is there a reason that match written against Seq would work differently on IndexedSeq types than the way it does on LinearSeq types? To me it seems like the code below should do the exact same thing regardless of the input types. Of course it doesn't or I wouldn't be asking.
import collection.immutable.LinearSeq
object vectorMatch {
def main(args: Array[String]) {
doIt(Seq(1,2,3,4,7), Seq(1,4,6,9))
doIt(List(1,2,3,4,7), List(1,4,6,9))
doIt(LinearSeq(1,2,3,4,7), LinearSeq(1,4,6,9))
doIt(IndexedSeq(1,2,3,4,7), IndexedSeq(1,4,6,9))
doIt(Vector(1,2,3,4,7), Vector(1,4,6,9))
}
def doIt(a: Seq[Long], b: Seq[Long]) {
try {
println("OK! " + m(a, b))
}
catch {
case ex: Exception => println("m(%s, %s) failed with %s".format(a, b, ex))
}
}
@annotation.tailrec
def m(a: Seq[Long], b: Seq[Long]): Seq[Long] = {
a match {
case Nil => b
case firstA :: moreA => b match {
case Nil => a
case firstB :: moreB if (firstB < firstA) => m(moreA, b)
case firstB :: moreB if (firstB > firstA) => m(a, moreB)
case firstB :: moreB if (firstB == firstA) => m(moreA, moreB)
case _ => throw new Exception("Got here: a: " + a + " b: " + b)
}
}
}
}
Running this on 2.9.1 final, I get the following output:
OK! List(2, 3, 4, 7)
OK! List(2, 3, 4, 7)
OK! List(2, 3, 4, 7)
m(Vector(1, 2, 3, 4, 7), Vector(1, 4, 6, 9)) failed with scala.MatchError: Vector(1, 2, 3, 4, 7) (of class scala.collection.immutable.Vector)
m(Vector(1, 2, 3, 4, 7), Vector(1, 4, 6, 9)) failed with scala.MatchError: Vector(1, 2, 3, 4, 7) (of class scala.collection.immutable.Vector)
It runs fine for List-y things, but fails for Vector-y things. Am I missing something? Is this a compiler bug?
The scalac -print output for m looks like:
@scala.annotation.tailrec def m(a: Seq, b: Seq): Seq = {
<synthetic> val _$this: object vectorMatch = vectorMatch.this;
_m(_$this,a,b){
<synthetic> val temp6: Seq = a;
if (immutable.this.Nil.==(temp6))
{
b
}
else
if (temp6.$isInstanceOf[scala.collection.immutable.::]())
{
<synthetic> val temp8: scala.collection.immutable.:: = temp6.$asInstanceOf[scala.collection.immutable.::]();
<synthetic> val temp9: Long = scala.Long.unbox(temp8.hd$1());
<synthetic> val temp10: List = temp8.tl$1();
val firstA$1: Long = temp9;
val moreA: List = temp10;
{
<synthetic> val temp1: Seq = b;
if (immutable.this.Nil.==(temp1))
{
a
}
else
if (temp1.$isInstanceOf[scala.collection.immutable.::]())
{
<synthetic> val temp3: scala.collection.immutable.:: = temp1.$asInstanceOf[scala.collection.immutable.::]();
<synthetic> val temp4: Long = scala.Long.unbox(temp3.hd$1());
<synthetic> val temp5: List = temp3.tl$1();
val firstB: Long = temp4;
if (vectorMatch.this.gd1$1(firstB, firstA$1))
body%11(firstB){
_m(vectorMatch.this, moreA, b)
}
else
{
val firstB: Long = temp4;
val moreB: List = temp5;
if (vectorMatch.this.gd2$1(firstB, moreB, firstA$1))
body%21(firstB,moreB){
_m(vectorMatch.this, a, moreB)
}
else
{
val firstB: Long = temp4;
val moreB: List = temp5;
if (vectorMatch.this.gd3$1(firstB, moreB, firstA$1))
body%31(firstB,moreB){
_m(vectorMatch.this, moreA, moreB)
}
else
{
body%41(){
throw new java.lang.Exception("Got here: a: ".+(a).+(" b: ").+(b))
}
}
}
}
}
else
{
body%41()
}
}
}
else
throw new MatchError(temp6)
}
};

You can't use :: for anything other than List. The Vector is failing to match because :: is a case class that extends List, so its unapply method does not work for Vector.
val a :: b = List(1,2,3) // fine
val a :: b = Vector(1,2,3) // error
But you can define your own extractor that works for all sequences:
object +: {
def unapply[T](s: Seq[T]) =
s.headOption.map(head => (head, s.tail))
}
So you can do:
val a +: b = List(1,2,3) // fine
val a +: b = Vector(1,2,3) // fine

The following pattern match works for List, Seq, LinearSeq, IndexedSeq and Vector:
Vector(1,2) match {
case a +: as => s"$a + $as"
case _ => "empty"
}

In Scala 2.10 object +: was introduced at this commit. Since then, for every SeqLike, you can do:
@annotation.tailrec
def m(a: Seq[Long], b: Seq[Long]): Seq[Long] = {
a match {
case Nil => b
case firstA +: moreA => b match {
case Nil => a
case firstB +: moreB if (firstB < firstA) => m(moreA, b)
case firstB +: moreB if (firstB > firstA) => m(a, moreB)
case firstB +: moreB if (firstB == firstA) => m(moreA, moreB)
case _ => throw new Exception("Got here: a: " + a + " b: " + b)
}
}
}
Code run at Scastie.
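
As a small additional sketch, assuming Scala 2.10 or later: the empty case can also be written as Seq() instead of Nil, so that neither branch names a List-specific pattern. SeqExtractorSketch and describe are hypothetical names used only for this illustration.

```scala
object SeqExtractorSketch {
  // Matches any Seq, whether List-like or Vector-like, via Seq() and +:.
  def describe[A](xs: Seq[A]): String = xs match {
    case Seq()        => "empty"
    case head +: tail => s"head=$head, rest=${tail.size}"
  }
}
```

The same function then behaves identically for List(1, 2, 3) and Vector(1, 2, 3), which is the behaviour the question expected from ::.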
