I have a sequence Seq[T] and I want to do partial reduce. For example for a Seq[Int] I want to get Seq[Int] consisting of the longest partial sums of monotonic regions. For example:
val s = Seq(1, 2, 4, 3, 2, -1, 0, 6, 8)
groupMonotionic(s) = Seq(1 + 2 + 4, 3 + 2 + (-1), 0 + 6 + 8)
I was looking for some method like conditional fold with the signature fold(z: B)((B, T) => B, (T, T) => Boolean) where the predicate states for where to terminate current sum aggregation, but it seems there is no something like that in the subtrait hierarchy of Seq.
What would be a solution using Scala Collection API and without using mutable variables?
Here is one way amongst many to do this (using Scala 2.13's List#unfold):
// val items = Seq(1, 2, 4, 3, 2, -1, 0, 6, 8)
items match {
case first :: _ :: _ => // If there are more than 2 items
List
.unfold(items.sliding(2).toList) { // We slid items to work on pairs of consecutive items
case Nil => // No more items to unfold
None // None signifies the end of the unfold
case rest # Seq(a, b) :: _ => // We span based on the sign of a-b
Some(rest.span(x => (x.head - x.last).signum == (a-b).signum))
}
.map(_.map(_.last)) // back from slided pairs
match { case head :: rest => (first :: head) :: rest }
case _ => // If there is 0 or 1 item
items.map(List(_))
}
// List(List(1, 2, 4), List(3, 2, -1), List(0, 6, 8))
List.unfold iterates as long as the unfolding function provides Some. It starts with an initial state which is the list of items to unfold. At each iteration, we span the state (remaining elements to unfold) based on the sign of the heading two elements difference. The unfolded elements are heading items sharing the same monotony and the unfolding state becomes the other remaining elements.
List#span splits a list into a tuple whose first part contains elements matching the predicate applied until the predicate stops being valid. The second part of the tuple contains the rest of the elements. Which fits perfectly the expected return type of List.unfold's unfolding function, which is Option[(A, S)] (In this case Option[(List[Int], List[Int])]).
Int.signum returns -1, 0 or 1 depending on the sign of the integer it's applied on.
Note that the first item has to be put back in the result as it hasn't an ancestor determining its signum (match { case head :: rest => (first :: head) :: rest }).
To apply the reducing function (in this case a sum), we can map the final result: .map(_.sum)
Works in Scala 2.13+ with cats
import scala.util.chaining._
import cats.data._
import cats.implicits._
val s = List(1, 2, 4, 3, 2, -1, 0, 6, 8)
def isLocalExtrema(a: List[Int]) =
a.max == a(1) || a.min == a(1)
implicit class ListOps[T](ls: List[T]) {
def multiSpanUntil(f: T => Boolean): List[List[T]] = ls.span(f) match {
case (h, Nil) => List(h)
case (h, t) => (h ::: t.take(1)) :: t.tail.multiSpanUntil(f)
}
}
def groupMonotionic(groups: List[Int]) = groups match {
case Nil => Nil
case x if x.length < 3 => List(groups.sum)
case _ =>
groups
.sliding(3).toList
.map(isLocalExtrema)
.pipe(false :: _ ::: List(false))
.zip(groups)
.multiSpanUntil(!_._1)
.pipe(Nested.apply)
.map(_._2)
.value
.map(_.sum)
}
println(groupMonotionic(s))
//List(7, 4, 14)
Here's one way using foldLeft to traverse the numeric list with a Tuple3 accumulator (listOfLists, prevElem, prevTrend) that stores the previous element and previous trend to conditionally assemble a list of lists in the current iteration:
val list = List(1, 2, 4, 3, 2, -1, 0, 6, 8)
val isUpward = (a: Int, b: Int) => a < b
val initTrend = isUpward(list.head, list.tail.head)
val monotonicLists = list.foldLeft( (List[List[Int]](), list.head, initTrend) ){
case ((lol, prev, prevTrend), curr) =>
val currTrend = isUpward(curr, prev)
if (currTrend == prevTrend)
((curr :: lol.head) :: lol.tail , curr, currTrend)
else
(List(curr) :: lol , curr, currTrend)
}._1.reverse.map(_.reverse)
// monotonicLists: List[List[Int]] = List(List(1, 2, 4), List(3, 2, -1), List(0, 6, 8))
To sum the individual nested lists:
monotonicLists.map(_.sum)
// res1: List[Int] = List(7, 4, 14)
I have an rdd and the structure of the RDD is as follows:
org.apache.spark.rdd.RDD[(String, Array[String])] = MappedRDD[40] at map at <console>:14
Here is x.take(1) looks like:
Array[(String, Array[String])] = Array((8239427349237423,Array(122641|2|2|1|1421990315711|38|6487985623452037|684|, 1229|2|1|1|1411349089424|87|462966136107937|1568|.....))
For each string in the array I want to split by "|" and take the 6th item and return it with the first element of the tuple as follows:
8239427349237423-6487985623452037
8239427349237423-4629661361079371
I started as follows:
def getValues(lines: Array[String]) {
for(line <- lines) {
line.split("|")(6)
}
I also tried following:
val b= x.map(a => (a._1, a._2.flatMap(y => y.split("|")(6))))
But that ended up giving me following:
Array[(String, Array[Char])] = Array((8239427349237423,Array(1, 2, 4, |, 9, |, 4, 1, 7, 6, |, 2, 9, 2, 7, 2, |, 7, |,....)))
If you want to do it for the whole x you can use flatMap:
def getValues(x: Array[(String, Array[String])]) =
x flatMap (line => line._2 map (line._1 + "-" + _.split("\\|")(6)))
Or, maybe a bit more clearly, with for-comprehension:
def getValues(x: Array[(String, Array[String])]) =
for {
(fst, snd) <- x
line <- snd
} yield fst + "-" + line.split("\\|")(6)
You have to call split with "\\|" argument, because it takes a regular expression and | is a special symbol, thus you need to escape it. (Edit: or you can use '|' (a Char), as suggested by #BenReich)
To answer your comment, you can modify getValues to take a single element from x as an argument:
def getValues(item: (String, Array[String])) =
item._2 map (item._1 + "-" + _.split('|')(6))
And then call it with
x flatMap getValues
I have a Scala List that contains some repeated numbers. I want to count the number of times a specific number will repeat itself. For example:
val list = List(1,2,3,3,4,2,8,4,3,3,5)
val repeats = list.takeWhile(_ == List(3,3)).size
And the val repeats would equal 2.
Obviously the above is pseudo-code and takeWhile will not find two repeated 3s since _ represents an integer. I tried mixing both takeWhile and take(2) but with little success. I also referred code from How to find count of repeatable elements in scala list but it appears the author is looking to achieve something different.
Thanks for your help.
This will work in this case:
val repeats = list.sliding(2).count(_.forall(_ == 3))
The sliding(2) method gives you an iterator of lists of elements and successors and then we just count where these two are equal to 3.
Question is if it creates the correct result to List(3, 3, 3)? Do you want that to be 2 or just 1 repeat.
val repeats = list.sliding(2).toList.count(_==List(3,3))
and more generally the following code returns tuples of element and repeats value for all elements:
scala> list.distinct.map(x=>(x,list.sliding(2).toList.count(_.forall(_==x))))
res27: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
which means that the element '3' repeats 2 times consecutively at 2 places and all others 0 times.
and also if we want element repeats 3 times consecutively we just need to modify the code as follows:
list.distinct.map(x=>(x,list.sliding(3).toList.count(_.forall(_==x))))
in SCALA REPL:
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 5)
scala> list.distinct.map(x=>(x,list.sliding(3).toList.count(_==List(x,x,x))))
res29: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
Even sliding value can be varied by defining a function as:
def repeatsByTimes(list:List[Int],n:Int) =
list.distinct.map(x=>(x,list.sliding(n).toList.count(_.forall(_==x))))
Now in REPL:
scala> val list = List(1,2,3,3,4,2,8,4,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 4, 2, 8, 4, 3, 3, 5)
scala> repeatsByTimes(list,2)
res33: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,2,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 2, 4, 3, 3, 3, 5)
scala> repeatsByTimes(list,3)
res34: List[(Int, Int)] = List((1,0), (2,0), (3,3), (4,0), (8,0), (5,0))
scala>
We can go still further like given a list of integers and given a maximum number
of consecutive repetitions that any of the element can occur in the list, we may need a list of 3-tuples representing (the element, number of repetitions of this element, at how many places this repetition occurred). this is more exhaustive information than the above. Can be achieved by writing a function like this:
def repeats(list:List[Int],maxRep:Int) =
{ var v:List[(Int,Int,Int)] = List();
for(i<- 1 to maxRep)
v = v ++ list.distinct.map(x=>
(x,i,list.sliding(i).toList.count(_.forall(_==x))))
v.sortBy(_._1) }
in SCALA REPL:
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,2,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 2, 4, 3, 3, 3, 5)
scala> repeats(list,3)
res38: List[(Int, Int, Int)] = List((1,1,1), (1,2,0), (1,3,0), (2,1,3),
(2,2,0), (2,3,0), (3,1,9), (3,2,6), (3,3,3), (4,1,3), (4,2,0), (4,3,0),
(5,1,1), (5,2,0), (5,3,0), (8,1,1), (8,2,0), (8,3,0))
scala>
These results can be understood as follows:
1 times the element '1' occurred at 1 places.
2 times the element '1' occurred at 0 places.
............................................
............................................
.............................................
2 times the element '3' occurred at 6 places..
.............................................
3 times the element '3' occurred at 3 places...
............................................and so on.
Thanks to Luigi Plinge I was able to use methods in run-length encoding to group together items in a list that repeat. I used some snippets from this page here: http://aperiodic.net/phil/scala/s-99/
var n = 0
runLengthEncode(totalFrequencies).foreach{ o =>
if(o._1 > 1 && o._2==subjectNumber) n+=1
}
n
The method runLengthEncode is as follows:
private def pack[A](ls: List[A]): List[List[A]] = {
if (ls.isEmpty) List(List())
else {
val (packed, next) = ls span { _ == ls.head }
if (next == Nil) List(packed)
else packed :: pack(next)
}
}
private def runLengthEncode[A](ls: List[A]): List[(Int, A)] =
pack(ls) map { e => (e.length, e.head) }
I'm not entirely satisfied that I needed to use the mutable var n to count the number of occurrences but it did the trick. This will count the number of times a number repeats itself no matter how many times it is repeated.
If you knew your list was not very long you could do it with Strings.
val list = List(1,2,3,3,4,2,8,4,3,3,5)
val matchList = List(3,3)
(matchList.mkString(",")).r.findAllMatchIn(list.mkString(",")).length
From you pseudocode I got this working:
val pairs = list.sliding(2).toList //create pairs of consecutive elements
val result = pairs.groupBy(x => x).map{ case(x,y) => (x,y.size); //group pairs and retain the size, which is the number of occurrences.
result will be a Map[List[Int], Int] so you can the count number like:
result(List(3,3)) // will return 2
I couldn't understand if you also want to check lists of several sizes, then you would need to change the parameter to sliding to the desired size.
def pack[A](ls: List[A]): List[List[A]] = {
if (ls.isEmpty) List(List())
else {
val (packed, next) = ls span { _ == ls.head }
if (next == Nil) List(packed)
else packed :: pack(next)
}
}
def encode[A](ls: List[A]): List[(Int, A)] = pack(ls) map { e => (e.length, e.head) }
val numberOfNs = list.distinct.map{ n =>
(n -> list.count(_ == n))
}.toMap
val runLengthPerN = runLengthEncode(list).map{ t => t._2 -> t._1}.toMap
val nRepeatedMostInSuccession = runLengthPerN.toList.sortWith(_._2 <= _._2).head._1
Where runLength is defined as below from scala's 99 problems problem 9 and scala's 99 problems problem 10.
Since numberOfNs and runLengthPerN are Maps, you can get the population count of any number in the list with numberOfNs(number) and the length of the longest repitition in succession with runLengthPerN(number). To get the runLength, just compute as above with runLength(list).map{ t => t._2 -> t._1 }.