Grouping fs2 streams into sub-streams based on predicate - scala

I need a combinator that solves the following problem:
test("groupUntil") {
val s = Stream(1, 2, 3, 4, 1, 2, 3, 3, 1, 2, 2).covary[IO]
val grouped: Stream[IO, Stream[IO, Int]] = s.groupUntil(_ == 1)
val result =
for {
group <- grouped
element <- group.fold(0)(_ + _)
} yield element
assertEquals(result.compile.toList.unsafeRunSync(), List(10, 9, 5))
}
The inner streams must also be lazy. (note, groupUntil is the imaginary combinator I'm asking for).
NOTE: I must deal with every element of the internal stream as soon as they arrive at the original stream, i.e. I cannot wait to chunk an entire group.

One way you can achieve laziness here is using Stream as container in fold function:
import cats.effect.IO
import fs2.Stream
val s = Stream(1, 2, 3, 4, 1, 2, 3, 3, 1, 2, 2).covary[IO]
val acc: Stream[IO, Stream[IO, Int]] = Stream.empty
val grouped: Stream[IO, Stream[IO, Int]] = s.fold(acc) {
case (streamOfStreams, nextInt) if nextInt == 1 =>
Stream(Stream(nextInt).covary[IO]).append(streamOfStreams)
case (streamOfStreams, nextInt) =>
streamOfStreams.head.map(_.append(Stream(nextInt).covary[IO])) ++
streamOfStreams.tail
}.flatten
val result: Stream[IO, IO[Int]] = for {
group <- grouped
element = group.compile.foldMonoid
} yield element
assertEquals(result.map(_.unsafeRunSync()).compile.toList.unsafeRunSync().reverse, List(10, 9, 5))
be careful, in result you will get reversed stream, because it's not good idea to work with the last element of the stream, better way is taking head but it requires us to reverse list in the end of our processing.
Another way is use groupAdjacentBy and group elements by some predicate:
val groupedOnceAndOthers: fs2.Stream[IO, (Boolean, Chunk[Int])] =
s.groupAdjacentBy(x => x == 1)
here you will get groups with pairs:
(true,Chunk(1)), (false,Chunk(2, 3, 4)),
(true,Chunk(1)), (false,Chunk(2, 3, 3)),
(true,Chunk(1)), (false,Chunk(2, 2))
to concat groups with 1 and without we can use chunkN (like grouped in scala List) and map result to get rid of boolean pairs and flatMap to flatten Chunks:
val grouped = groupedOnceAndOthers
.chunkN(2, allowFewer = true)
.map(ch => ch.flatMap(_._2).toList)
result grouped is:
List(1, 2, 3, 4) List(1, 2, 3, 3) List(1, 2, 2)
full working sample:
import cats.effect.IO
import fs2.Stream
val s = Stream(1, 2, 3, 4, 1, 2, 3, 3, 1, 2, 2).covary[IO]
val grouped: Stream[IO, Stream[IO, Int]] = s.groupAdjacentBy(x => x == 1)
.chunkN(2, allowFewer = true)
.map(ch => Stream.fromIterator[IO](ch.flatMap(_._2).iterator))
val result: Stream[IO, IO[Int]] = for {
group <- grouped
element = group.compile.foldMonoid
} yield element
assertEquals(result.map(_.unsafeRunSync()).compile.toList.unsafeRunSync(), List(10, 9, 5))

Related

Merge two collections by interleaving values

How can I merge two lists / Seqs so it takes 1 element from list 1, then 1 element from list 2, and so on, instead of just appending list 2 at the end of list 1?
E.g
[1,2] + [3,4] = [1,3,2,4]
and not [1,2,3,4]
Any ideas? Most concat methods I've looked at seem to do to the latter and not the former.
Another way:
List(List(1,2), List(3,4)).transpose.flatten
So maybe your collections aren't always the same size. Using zip in that situation would create data loss.
def interleave[A](a :Seq[A], b :Seq[A]) :Seq[A] =
if (a.isEmpty) b else if (b.isEmpty) a
else a.head +: b.head +: interleave(a.tail, b.tail)
interleave(List(1, 2, 17, 27)
,Vector(3, 4)) //res0: Seq[Int] = List(1, 3, 2, 4, 17, 27)
You can do:
val l1 = List(1, 2)
val l2 = List(3, 4)
l1.zip(l2).flatMap { case (a, b) => List(a, b) }
Try
List(1,2)
.zip(List(3,4))
.flatMap(v => List(v._1, v._2))
which outputs
res0: List[Int] = List(1, 3, 2, 4)
Also consider the following implicit class
implicit class ListIntercalate[T](lhs: List[T]) {
def intercalate(rhs: List[T]): List[T] = lhs match {
case head :: tail => head :: (rhs.intercalate(tail))
case _ => rhs
}
}
List(1,2) intercalate List(3,4)
List(1,2,5,6,6,7,8,0) intercalate List(3,4)
which outputs
res2: List[Int] = List(1, 3, 2, 4)
res3: List[Int] = List(1, 3, 2, 4, 5, 6, 6, 7, 8, 0)

Splitting List into List of List

How can I convert:
List(1,1,1,1,4,2,2,2,2)
into:
List(List(1,1,1,1), List(2,2,2,2))
Thought this would be the easiest way to show what I'm looking for. I am having a hard time trying to find the most functional way to do this with a large list that needs to be separated at a specific element. This element does not show up in the new list of lists. Any help would be appreciated!
If you want to support multiple instances of that separator, you can use foldRight with some list "gymnastics":
// more complex example: separator (4) appears multiple times
val l = List(1,1,1,1,4,2,2,2,2,4,5,6,4)
val separator = 4
val result = l.foldRight(List[List[Int]]()) {
case (`separator`, res) => List(Nil) ++ res
case (v, head :: tail) => List(v :: head) ++ tail
case (v, Nil) => List(List(v))
}
// result: List(List(1, 1, 1, 1), List(2, 2, 2, 2), List(5, 6))
This is the cleanest way to do this
val (l, _ :: r) = list.span( _ != 4)
The span function splits the list at the first value not matching the condition, and the de-structuring on the left-hand side removes the matching value from the second list.
This will fail if there is no matching value.
Given a list and a delimiter, in order to split the list in 2:
val list = List(1, 1, 1, 1, 4, 2, 2, 2, 2)
val delimiter = 4
you could use a combination of List.indexOf, List.take and List.drop:
val splitIdx = list.indexOf(delimiter)
List(list.take(splitIdx), list.drop(splitIdx + 1))
you could use List.span which splits the list into a tuple given a predicate:
list.span(_ != delimiter) match { case (l1, l2) => List(l1, l2.tail) }
in order to produce:
List(List(1, 1, 1, 1), List(2, 2, 2, 2))
scala> val l = List(1,1,1,1,4,2,2,2,2)
l: List[Int] = List(1, 1, 1, 1, 4, 2, 2, 2, 2)
scala> l.splitAt(l.indexOf(4))
res0: (List[Int], List[Int]) = (List(1, 1, 1, 1),List(4, 2, 2, 2, 2))
def convert(list: List[Int], separator: Int): List[List[Int]] = {
#scala.annotation.tailrec
def rec(acc: List[List[Int]], listTemp: List[Int]): List[List[Int]] = {
if (listTemp.isEmpty) acc
else {
val (l, _ :: r) = listTemp.span(_ != separator)
rec(acc ++ List(l), r)
}
}
rec(List(), list)
}

RDD intersections

I have a query about intersection between two RDDs.
My first RDD has a list of elements like this:
A = List(1,2,3,4), List(4,5,6), List(8,3,1),List(1,6,8,9,2)
And the second RDD is like this:
B = (1,2,3,4,5,6,8,9)
(I could store B in memory as a Set but not the first one.)
I would like to do an intersection of each element in A with B
List(1,2,3,4).intersect((1,2,3,4,5,6,8,9))
List(4,5,6).intersect((1,2,3,4,5,6,8,9))
List(8,3,1).intersect((1,2,3,4,5,6,8,9))
List(1,6,8,9,2).intersect((1,2,3,4,5,6,8,9))
How can I do this in Scala?
val result = rdd.map( x => x.intersect(B))
Both B and rdd have to have the same type (in this case List[Int]). Also, note that if B is big but fits in memory, you would want to probably broadcast it as documented here.
scala> val B = List(1,2,3,4,5,6,8,9)
B: List[Int] = List(1, 2, 3, 4, 5, 6, 8, 9)
scala> val rdd = sc.parallelize(Seq(List(1,2,3,4), List(4,5,6), List(8,3,1),List(1,6,8,9,2)))
rdd: org.apache.spark.rdd.RDD[List[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:21
scala> rdd.map( x => x.intersect(B)).collect.mkString("\n")
res3: String =
List(1, 2, 3, 4)
List(4, 5, 6)
List(8, 3, 1)
List(1, 6, 8, 9, 2)

How can I find repeated items in a Scala List?

I have a Scala List that contains some repeated numbers. I want to count the number of times a specific number will repeat itself. For example:
val list = List(1,2,3,3,4,2,8,4,3,3,5)
val repeats = list.takeWhile(_ == List(3,3)).size
And the val repeats would equal 2.
Obviously the above is pseudo-code and takeWhile will not find two repeated 3s since _ represents an integer. I tried mixing both takeWhile and take(2) but with little success. I also referred code from How to find count of repeatable elements in scala list but it appears the author is looking to achieve something different.
Thanks for your help.
This will work in this case:
val repeats = list.sliding(2).count(_.forall(_ == 3))
The sliding(2) method gives you an iterator of lists of elements and successors and then we just count where these two are equal to 3.
Question is if it creates the correct result to List(3, 3, 3)? Do you want that to be 2 or just 1 repeat.
val repeats = list.sliding(2).toList.count(_==List(3,3))
and more generally the following code returns tuples of element and repeats value for all elements:
scala> list.distinct.map(x=>(x,list.sliding(2).toList.count(_.forall(_==x))))
res27: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
which means that the element '3' repeats 2 times consecutively at 2 places and all others 0 times.
and also if we want element repeats 3 times consecutively we just need to modify the code as follows:
list.distinct.map(x=>(x,list.sliding(3).toList.count(_.forall(_==x))))
in SCALA REPL:
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 5)
scala> list.distinct.map(x=>(x,list.sliding(3).toList.count(_==List(x,x,x))))
res29: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
Even sliding value can be varied by defining a function as:
def repeatsByTimes(list:List[Int],n:Int) =
list.distinct.map(x=>(x,list.sliding(n).toList.count(_.forall(_==x))))
Now in REPL:
scala> val list = List(1,2,3,3,4,2,8,4,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 4, 2, 8, 4, 3, 3, 5)
scala> repeatsByTimes(list,2)
res33: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,2,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 2, 4, 3, 3, 3, 5)
scala> repeatsByTimes(list,3)
res34: List[(Int, Int)] = List((1,0), (2,0), (3,3), (4,0), (8,0), (5,0))
scala>
We can go still further like given a list of integers and given a maximum number
of consecutive repetitions that any of the element can occur in the list, we may need a list of 3-tuples representing (the element, number of repetitions of this element, at how many places this repetition occurred). this is more exhaustive information than the above. Can be achieved by writing a function like this:
def repeats(list:List[Int],maxRep:Int) =
{ var v:List[(Int,Int,Int)] = List();
for(i<- 1 to maxRep)
v = v ++ list.distinct.map(x=>
(x,i,list.sliding(i).toList.count(_.forall(_==x))))
v.sortBy(_._1) }
in SCALA REPL:
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,2,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 2, 4, 3, 3, 3, 5)
scala> repeats(list,3)
res38: List[(Int, Int, Int)] = List((1,1,1), (1,2,0), (1,3,0), (2,1,3),
(2,2,0), (2,3,0), (3,1,9), (3,2,6), (3,3,3), (4,1,3), (4,2,0), (4,3,0),
(5,1,1), (5,2,0), (5,3,0), (8,1,1), (8,2,0), (8,3,0))
scala>
These results can be understood as follows:
1 times the element '1' occurred at 1 places.
2 times the element '1' occurred at 0 places.
............................................
............................................
.............................................
2 times the element '3' occurred at 6 places..
.............................................
3 times the element '3' occurred at 3 places...
............................................and so on.
Thanks to Luigi Plinge I was able to use methods in run-length encoding to group together items in a list that repeat. I used some snippets from this page here: http://aperiodic.net/phil/scala/s-99/
var n = 0
runLengthEncode(totalFrequencies).foreach{ o =>
if(o._1 > 1 && o._2==subjectNumber) n+=1
}
n
The method runLengthEncode is as follows:
private def pack[A](ls: List[A]): List[List[A]] = {
if (ls.isEmpty) List(List())
else {
val (packed, next) = ls span { _ == ls.head }
if (next == Nil) List(packed)
else packed :: pack(next)
}
}
private def runLengthEncode[A](ls: List[A]): List[(Int, A)] =
pack(ls) map { e => (e.length, e.head) }
I'm not entirely satisfied that I needed to use the mutable var n to count the number of occurrences but it did the trick. This will count the number of times a number repeats itself no matter how many times it is repeated.
If you knew your list was not very long you could do it with Strings.
val list = List(1,2,3,3,4,2,8,4,3,3,5)
val matchList = List(3,3)
(matchList.mkString(",")).r.findAllMatchIn(list.mkString(",")).length
From you pseudocode I got this working:
val pairs = list.sliding(2).toList //create pairs of consecutive elements
val result = pairs.groupBy(x => x).map{ case(x,y) => (x,y.size); //group pairs and retain the size, which is the number of occurrences.
result will be a Map[List[Int], Int] so you can the count number like:
result(List(3,3)) // will return 2
I couldn't understand if you also want to check lists of several sizes, then you would need to change the parameter to sliding to the desired size.
def pack[A](ls: List[A]): List[List[A]] = {
if (ls.isEmpty) List(List())
else {
val (packed, next) = ls span { _ == ls.head }
if (next == Nil) List(packed)
else packed :: pack(next)
}
}
def encode[A](ls: List[A]): List[(Int, A)] = pack(ls) map { e => (e.length, e.head) }
val numberOfNs = list.distinct.map{ n =>
(n -> list.count(_ == n))
}.toMap
val runLengthPerN = runLengthEncode(list).map{ t => t._2 -> t._1}.toMap
val nRepeatedMostInSuccession = runLengthPerN.toList.sortWith(_._2 <= _._2).head._1
Where runLength is defined as below from scala's 99 problems problem 9 and scala's 99 problems problem 10.
Since numberOfNs and runLengthPerN are Maps, you can get the population count of any number in the list with numberOfNs(number) and the length of the longest repitition in succession with runLengthPerN(number). To get the runLength, just compute as above with runLength(list).map{ t => t._2 -> t._1 }.

How can I sort List[Int] objects?

What I want to do is sort List objects in Scala, not sort the elements in the list. For example If I have two lists of Ints:
val l1 = List(1, 2, 3, 7)
val l2 = List(1, 2, 3, 4, 10)
I want to be able to put them in order where l1 > l2.
I have created a case class that does what I need it to but the problem is that when I use it none of my other methods work. Do I need to implement all the other methods in the class i.e. flatten, sortWith etc.?
My class code looks like this:
class ItemSet(itemSet: List[Int]) extends Ordered[ItemSet] {
val iSet: List[Int] = itemSet
def compare(that: ItemSet) = {
val thisSize = this.iSet.size
val thatSize = that.iSet.size
val hint = List(thisSize, thatSize).min
var result = 0
var loop = 0
val ths = this.iSet.toArray
val tht = that.iSet.toArray
while (loop < hint && result == 0) {
result = ths(loop).compare(tht(loop))
loop += 1
}
if (loop == hint && result == 0 && thisSize != thatSize) {
thisSize.compare(thatSize)
} else
result
}
}
Now if I create an Array of ItemSets I can sort it:
val is1 = new ItemSet(List(1, 2, 5, 8))
val is2 = new ItemSet(List(1, 2, 5, 6))
val is3 = new ItemSet(List(1, 2, 3, 7, 10))
Array(is1, is2, is3).sorted.foreach(i => println(i.iSet))
scala> List(1, 2, 3, 7, 10)
List(1, 2, 5, 6)
List(1, 2, 5, 8)
The two methods that are giving me problems are:
def itemFrequencies(transDB: Array[ItemSet]): Map[Int, Int] = transDB.flatten.groupBy(x => x).mapValues(_.size)
The error I get is:
Expression of type Map[Nothing, Int] doesn't conform to expected type Map[Int, Int]
And for this one:
def sortListAscFreq(transDB: Array[ItemSet], itemFreq: Map[Int, Int]): Array[List[Int]] = {
for (l <- transDB) yield
l.sortWith(itemFreq(_) < itemFreq(_))
}
I get:
Cannot resolve symbol sortWith.
Is there a way I can just extend List[Int] so that I can sort a collection of lists without loosing the functionality of other methods?
The standard library provides a lexicographic ordering for collections of ordered things. You can put it into scope and you're done:
scala> import scala.math.Ordering.Implicits._
import scala.math.Ordering.Implicits._
scala> val is1 = List(1, 2, 5, 8)
is1: List[Int] = List(1, 2, 5, 8)
scala> val is2 = List(1, 2, 5, 6)
is2: List[Int] = List(1, 2, 5, 6)
scala> val is3 = List(1, 2, 3, 7, 10)
is3: List[Int] = List(1, 2, 3, 7, 10)
scala> Array(is1, is2, is3).sorted foreach println
List(1, 2, 3, 7, 10)
List(1, 2, 5, 6)
List(1, 2, 5, 8)
The Ordering type class is often more convenient than Ordered in Scala—it allows you to specify how some existing type should be ordered without having to change its code or create a proxy class that extends Ordered[Whatever], which as you've seen can get messy very quickly.