Dynamic sliding window in Scala

Suppose I have a log file of events (page visits), each with a timestamp. I'd like to group the events into sessions, where I consider events to belong to the same session when they are no more than X minutes apart.
Currently I've ended up with this algorithm:
val s = List(1000, 501, 500, 10, 3, 2, 1) // timestamps
val n = 10 // time span
import scala.collection.mutable.ListBuffer
(s.head +: s).sliding(2).foldLeft(ListBuffer.empty[ListBuffer[Int]]) {
  case (acc, List(a, b)) if acc.isEmpty =>
    acc += ListBuffer(a)
    acc
  case (acc, List(a, b)) =>
    if (n >= a - b) {
      acc.last += b
      acc
    } else {
      acc += ListBuffer(b)
      acc
    }
}
The result
ListBuffer(ListBuffer(1000), ListBuffer(501, 500), ListBuffer(10, 3, 2, 1))
Is there any better/functional/efficient way to do it?

Slightly adapting this answer by altering the condition ...
s.foldRight[List[List[Int]]](Nil)((a, b) => b match {
  case (bh @ bhh :: _) :: bt if (bhh + n >= a) => (a :: bh) :: bt
  case _ => (a :: Nil) :: b
})
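For reference, the same idea wrapped as a reusable function (my own sketch; the name sessionize is just illustrative):
def sessionize(timestamps: List[Int], span: Int): List[List[Int]] =
  timestamps.foldRight[List[List[Int]]](Nil) { (a, b) =>
    b match {
      // prepend to the current session while the gap stays within the span
      case (bh @ bhh :: _) :: bt if bhh + span >= a => (a :: bh) :: bt
      // otherwise start a new session
      case _ => (a :: Nil) :: b
    }
  }

sessionize(List(1000, 501, 500, 10, 3, 2, 1), 10)
// List(List(1000), List(501, 500), List(10, 3, 2, 1))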

Related

Optimal way to find neighbors of element of collection in circular manner

I have a Vector and I'd like to find the neighbors of a given element.
Say if we have Vector(1, 2, 3, 4, 5) and then:
for element 2, result must be Some((1, 3))
for element 5, result must be Some((4, 1))
for element 1, result must be Some((5, 2))
for element 6, result must be None
and so on...
I have not found any solution in the standard lib (please point me to one if there is), so I came up with the following:
implicit class VectorOps[T](seq: Vector[T]) {
  def findNeighbors(elem: T): Option[(T, T)] = {
    val currentIdx = seq.indexOf(elem)
    val firstIdx = 0
    val lastIdx = seq.size - 1
    seq match {
      case _ if currentIdx == -1 || seq.size < 2 => None
      case _ if seq.size == 2 => seq.find(_ != elem).map(elem => (elem, elem))
      case _ if currentIdx == firstIdx => Some((seq(lastIdx), seq(currentIdx + 1)))
      case _ if currentIdx == lastIdx => Some((seq(currentIdx - 1), seq(firstIdx)))
      case _ => Some((seq(currentIdx - 1), seq(currentIdx + 1)))
    }
  }
}
The question is: how can this be simplified/optimized using the stdlib?
def neighbours[T](v: Seq[T], x: T): Option[(T, T)] =
  (v.last +: v :+ v.head)
    .sliding(3, 1)
    .find(_(1) == x)
    .map(x => (x(0), x(2)))
This uses sliding to create a 3-element window over the data and find to locate the window whose middle value matches. Prepending the last element and appending the first deals with the wrap-around case.
This will fail if the Vector is too short, so it needs some error checking.
This version is safe for all input:
def neighbours[T](v: Seq[T], x: T): Option[(T, T)] =
  (v.takeRight(1) ++ v ++ v.take(1))
    .sliding(3, 1)
    .find(_(1) == x)
    .map(x => (x(0), x(2)))
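For example (my own quick checks against the cases from the question, easy to verify in a REPL):
neighbours(Vector(1, 2, 3, 4, 5), 2)  // Some((1,3))
neighbours(Vector(1, 2, 3, 4, 5), 5)  // Some((4,1))
neighbours(Vector(1, 2, 3, 4, 5), 1)  // Some((5,2))
neighbours(Vector(1, 2, 3, 4, 5), 6)  // None
neighbours(Vector.empty[Int], 1)      // None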
Optimal when the number of calls with the same sequence is about seq.toSet.size or more:
val elementToPair = seq.indices.map { i =>
  seq(i) -> ((seq((i - 1 + seq.length) % seq.length), seq((i + 1 + seq.length) % seq.length)))
}.toMap
elementToPair.get(elem)
// other calls
Optimal when the number of calls with the same sequence is less than seq.toSet.size:
Some(seq.indexOf(elem)).filterNot(_ == -1).map { i =>
  (seq((i - 1 + seq.length) % seq.length), seq((i + 1 + seq.length) % seq.length))
}
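For example, the first variant can be wrapped so the table is built once per sequence and reused across calls (my own sketch; the name neighborTable is illustrative):
// Hypothetical helper (not from the answers above): precompute each element's neighbours once.
def neighborTable[T](seq: Vector[T]): Map[T, (T, T)] =
  seq.indices.map { i =>
    val left  = seq((i - 1 + seq.length) % seq.length)
    val right = seq((i + 1) % seq.length)
    (seq(i), (left, right))
  }.toMap

val table = neighborTable(Vector(1, 2, 3, 4, 5))
table.get(2) // Some((1,3))
table.get(6) // None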

Average for adjacent items in Scala

I have a seq
val seq = Seq(1, 9, 5, 4, 3, 5, 5, 5, 8, 2)
I want to get, for each element, the average of it and its adjacent (left and right) numbers, meaning in the above example to have the following calculations:
[(1+9)/2, (1+9+5)/3, (9+5+4)/3, (5+4+3)/3, (4+3+5)/3, (3+5+5)/3, (5+5+5)/3, (5+5+8)/3, (5+8+2)/3, (8+2)/2]
The other examples are:
Seq() shouldBe Seq()
Seq(3) shouldBe Seq(3.0d)
Seq(1, 4) shouldBe Seq(2.5d, 2.5d)
Seq(1, 9, 5, 4, 3, 5, 5, 5, 8, 2) shouldBe Seq(5.0, 5.0, 6.0, 4.0, 4.0, 13.0 / 3, 5.0, 6.0, 5.0, 5.0)
I was able to get: numbers.sliding(2, 1).map(nums => nums.sum.toDouble / nums.length).toSeq. But it doesn't consider the previous value.
I tried to do it with foldLeft - it is also cumbersome.
Is there an easy way to do this? What am I missing?
Being honest, this is the kind of problem that I believe is easier to solve using a simple (albeit a bit long) tail-recursive algorithm.
def adjacentAverage(data: List[Int]): List[Double] = {
  @annotation.tailrec
  def loop(remaining: List[Int], acc: List[Double], previous: Int): List[Double] =
    remaining match {
      case x :: y :: xs =>
        loop(
          remaining = y :: xs,
          acc = ((previous + x + y).toDouble / 3.0d) :: acc,
          previous = x
        )
      case x :: Nil =>
        (((previous + x).toDouble / 2.0d) :: acc).reverse
      case Nil =>
        acc.reverse // unreachable: loop is always called with a non-empty list
    }
  data match {
    case x :: y :: xs => loop(remaining = y :: xs, acc = ((x + y).toDouble / 2.0d) :: Nil, previous = x)
    case x :: Nil => x.toDouble :: Nil
    case Nil => Nil
  }
}
You can see it running here.
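For example, against the cases from the question (my own run; easy to re-check in a REPL):
adjacentAverage(List(1, 9, 5, 4, 3, 5, 5, 5, 8, 2))
// List(5.0, 5.0, 6.0, 4.0, 4.0, 13.0 / 3, 5.0, 6.0, 5.0, 5.0)
adjacentAverage(List(3)) // List(3.0)
adjacentAverage(Nil)     // List()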
What if you want a sliding window of a different size, like maybe 4 or 7 or ...? The challenge is getting the build-up, (1), (1,2), (1,2,3), (1,2,3,4), ... and the tail-off, ..., (6,7,8,9), (7,8,9), (8,9), (9).
def windowAvg(input: Seq[Int], windowSize: Int): Seq[Double] =
  if (input.isEmpty || windowSize < 1) Seq()
  else {
    val windows = input.sliding(windowSize).toSeq
    val buildUp = windows.head.inits.toSeq.tail.reverse.tail
    val tailOff = windows.last.tails.toSeq.tail.init
    (buildUp ++ windows ++ tailOff).map(x => x.sum.toDouble / x.length)
  }
If you really need to trim off the opening and ending single-number entries in the result, then I'll leave that as an exercise for the reader.
My cumbersome solution through foldLeft (no rocket science)
def adjacentAverage(numbers: Seq[Int]): Seq[Double] =
  numbers.foldLeft(("x", Seq[Double](), 0)) { (acc, num) =>
    acc._1 match {
      case "x" =>
        if (numbers.isEmpty) ("x", Seq(), acc._3 + 1)
        else if (numbers.length == 1) ("x", Seq(num.toDouble), acc._3 + 1)
        else (num.toString, acc._2 :+ ((num.toDouble + numbers(acc._3 + 1).toDouble) / 2.0), acc._3 + 1)
      case _ =>
        (num.toString,
          try {
            acc._2 :+ ((acc._1.toDouble + num.toDouble + numbers(acc._3 + 1).toDouble) / 3.0)
          } catch {
            case e: IndexOutOfBoundsException => acc._2 :+ ((acc._1.toDouble + num.toDouble) / 2.0)
          },
          acc._3 + 1)
    }
  }._2

"conditionalZip" operator in Akka Streams

Assume that I have two sources:
val first = Source(1 :: 2 :: 4 :: 6 :: Nil)
val second = Source(1 :: 2 :: 3 :: 4 :: 5 :: 6 :: 7 :: Nil)
Is it possible to create a zip that will pair only elements based on a condition? I mean something like:
first.conditionalZip(second, _ == _) // if that method existed
That code would take the element from the first source and drop elements from the second until there is an element that satisfies the condition, and then output a tuple. The result for the above call would be (1, 1), (2, 2), (4, 4), (6, 6).
Consider zipping the two Sources, followed by using statefulMapConcat to transform the zipped elements in accordance with the condition function, as shown below:
import akka.stream.scaladsl._
import akka.NotUsed
def popFirstMatch(ls: List[Int], condF: Int => Boolean): (Option[Int], List[Int]) = {
  ls.find(condF) match {
    case None =>
      (None, ls)
    case Some(e) =>
      val idx = ls.indexOf(e)
      if (idx < 0)
        (None, ls)
      else {
        val (l, r) = ls.splitAt(idx)
        (r.headOption, l ++ r.tail)
      }
  }
}
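// For reference (my own example, not part of the original answer):
// popFirstMatch(List(3, 9, 5), _ > 4) returns (Some(9), List(3, 5))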
def conditionalZip( first: Source[Int, NotUsed],
                    second: Source[Int, NotUsed],
                    filler: Int,
                    condFcn: (Int, Int) => Boolean ): Source[(Int, Int), NotUsed] = {
  first.zipAll(second, filler, filler).statefulMapConcat { () =>
    var prevList = List.empty[Int]
    tuple => tuple match {
      case (e1, e2) =>
        if (e2 != filler) {
          if (e1 != filler && condFcn(e1, e2))
            (e1, e2) :: Nil
          else {
            if (e1 != filler)
              prevList :+= e1
            val (opElem, rest) = popFirstMatch(prevList, condFcn(_, e2))
            prevList = rest
            opElem match {
              case None => Nil
              case Some(e) => (e, e2) :: Nil
            }
          }
        }
        else
          Nil
    }
  }
}
Test running:
import akka.actor.ActorSystem
implicit val system = ActorSystem("system")
implicit val ec = system.dispatcher
// Example 1:
val first = Source(1 :: 2 :: 4 :: 6 :: Nil)
val second = Source(1 :: 2 :: 3 :: 4 :: 5 :: 6 :: 7 :: Nil)
conditionalZip(first, second, Int.MinValue, _ == _).runForeach(println)
// (1,1)
// (2,2)
// (4,4)
// (6,6)
conditionalZip(first, second, Int.MinValue, _ > _).runForeach(println)
// (4,3)
// (6,4)
conditionalZip(first, second, Int.MinValue, _ < _).runForeach(println)
// (1,2)
// (2,3)
// (4,5)
// (6,7)
// Example 2:
val first = Source(3 :: 9 :: 5 :: 5 :: 6 :: Nil)
val second = Source(1 :: 3 :: 5 :: 2 :: 5 :: 6 :: Nil)
conditionalZip(first, second, Int.MinValue, _ == _).runForeach(println)
// (3,3)
// (5,5)
// (5,5)
// (6,6)
conditionalZip(first, second, Int.MinValue, _ > _).runForeach(println)
// (3,1)
// (9,3)
// (5,2)
// (6,5)
conditionalZip(first, second, Int.MinValue, _ < _).runForeach(println)
// (3,5)
// (5,6)
A few notes:
Method zipAll (available on Akka Stream 2.6+) zips the two Sources, padding the one with fewer elements with the provided "filler" value. The fillers themselves are of no interest, hence should be assigned a value distinct from any actual element.
An internal List, prevList, is used within statefulMapConcat to store elements from the 1st Source so they can be compared in subsequent iterations with elements from the 2nd Source. The List can be replaced with a Set for better lookup performance if elements within the Sources are distinct.
Method popFirstMatch extracts the first element in prevList that matches the partially applied condition function, returning a tuple of that element (as an Option) and the remaining List.
Note that this is just an illustration of how statefulMapConcat can address the described problem. The behavior of the sample code may not match the exact requirement in every case without further work to cover edge cases or to narrow the scope of the fairly broad condition function (Int, Int) => Boolean.
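As a minimal, self-contained illustration of the statefulMapConcat pattern used above (my own sketch, separate from the answer's code): the factory function () => ... is invoked once per materialization, so the mutable counter below is private to each run of the stream.
import akka.NotUsed
import akka.stream.scaladsl.Source

// Numbers the elements of a source; `index` lives inside the closure created per materialization.
val numbered: Source[(Long, Int), NotUsed] =
  Source(10 :: 20 :: 30 :: Nil).statefulMapConcat { () =>
    var index = 0L
    elem => {
      index += 1
      (index, elem) :: Nil
    }
  }
// numbered.runForeach(println) prints (1,10), (2,20), (3,30)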

How to reduce with a non-associative function via adjacent pairs

How can I reduce a collection via adjacent pairs?
For instance, let + be a non-associative operator:
(1, 2, 3, 4, 5, 6) => ((1+2) + (3+4)) + (5+6)
(a, b, c, d, e, f, g, h) => ((a+b) + (c+d)) + ((e+f) + (g+h))
This question is similar to this one; however, I don't think parallel collections apply, because their semantics require an associative operator for determinism. I'm not concerned so much about parallel execution as I am about how the operations associate, so that the reduction constructs a balanced expression tree.
Here is a version that does what you want, but treats the "remaining" elements somewhat arbitrarily (if the number of inputs in the current iteration is odd, one element is left as-is):
def nonassocPairwiseReduce[A](xs: List[A], op: (A, A) => A): A = {
  xs match {
    case Nil => throw new IllegalArgumentException
    case List(singleElem) => singleElem
    case sthElse => {
      val grouped = sthElse.grouped(2).toList
      val pairwiseOpd = for (g <- grouped) yield {
        g match {
          case List(a, b) => op(a, b)
          case List(x) => x
        }
      }
      nonassocPairwiseReduce(pairwiseOpd, op)
    }
  }
}
For example, if this is your non-associative operation on Strings:
def putInParentheses(a: String, b: String) = s"(${a} + ${b})"
then your examples
for {
  example <- List(
    ('1' to '6').toList.map(_.toString),
    ('a' to 'h').toList.map(_.toString)
  )
} {
  println(nonassocPairwiseReduce(example, putInParentheses))
}
are mapped to
(((1 + 2) + (3 + 4)) + (5 + 6))
(((a + b) + (c + d)) + ((e + f) + (g + h)))
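A quick numeric check with subtraction, which is non-associative (my own addition, not part of the original answer):
nonassocPairwiseReduce(List(1, 2, 3, 4, 5, 6), (a: Int, b: Int) => a - b)
// evaluates as ((1 - 2) - (3 - 4)) - (5 - 6) == 1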
Would be interesting to know why you want to do this.

How to elegantly extract range of list based on specific criteria?

I want to extract range of elements from a list, meeting the following requirements:
First element of range has to be an element previous to element matching specific condition
Last element of range has to be an element next to element matching specific condition
Example: For list (1,1,1,10,2,10,1,1,1) and condition x >= 10 I want to get (1,10,2,10,1)
This is very simple to program imperatively, but I am just wondering if there is some smart Scala-functional way to achieve it. Is there?
Keeping it in the Scala standard lib, I would solve this using recursion:
def f(_xs: List[Int])(cond: Int => Boolean): List[Int] = {
  def inner(xs: List[Int], res: List[Int]): List[Int] = xs match {
    case Nil => Nil
    case x :: y :: tail if cond(y) && res.isEmpty => inner(tail, res ++ (x :: y :: Nil))
    case x :: y :: tail if cond(x) && res.nonEmpty => res ++ (x :: y :: Nil)
    case x :: tail if res.nonEmpty => inner(tail, res :+ x)
    case x :: tail => inner(tail, res)
  }
  inner(_xs, Nil)
}
scala> f(List(1,1,1,10,2,10,1,1,1))(_ >= 10)
res3: List[Int] = List(1, 10, 2, 10, 1)
scala> f(List(2,10,2,10))(_ >= 10)
res4: List[Int] = List()
scala> f(List(2,10,2,10,1))(_ >= 10)
res5: List[Int] = List(2, 10, 2, 10, 1)
Maybe there is something I did not think of in this solution, or I misunderstood something, but I think you will get the basic idea.
Good functional algorithm design practice is all about breaking complex problems into simpler ones.
The principle is called Divide and Conquer.
It's easy to extract two simpler subproblems from the subject problem:
Get a list of all elements after the first matching one, preceded with that matching element, preceded with the element before it.
Get a list of all elements up to the last matching one, followed by that matching element and the element after it.
These subproblems are simple enough to implement directly, so no further subdivision is required.
Here's the implementation of the first function:
def afterWithPredecessor
  [ A ]
  ( elements : List[ A ] )
  ( test : A => Boolean )
  : List[ A ]
  = elements match {
    case Nil => Nil
    case a :: tail if test( a ) => Nil // since there is no predecessor
    case a :: b :: tail if test( b ) => a :: b :: tail
    case a :: tail => afterWithPredecessor( tail )( test )
  }
Since the second problem can be seen as a direct inverse of the first one, it can be easily implemented by reversing the input and output:
def beforeWithSuccessor
  [ A ]
  ( elements : List[ A ] )
  ( test : A => Boolean )
  : List[ A ]
  = afterWithPredecessor( elements.reverse )( test ).reverse
But here's an optimized version of this:
def beforeWithSuccessor
  [ A ]
  ( elements : List[ A ] )
  ( test : A => Boolean )
  : List[ A ]
  = elements match {
    case Nil => Nil
    case a :: b :: tail if test( a ) =>
      a :: b :: beforeWithSuccessor( tail )( test )
    case a :: tail =>
      beforeWithSuccessor( tail )( test ) match {
        case Nil => Nil
        case r => a :: r
      }
  }
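Applied to the intermediate result above, the second step trims the tail in the same way (again my own check):
beforeWithSuccessor( List(1, 10, 2, 10, 1, 1, 1) )( _ >= 10 )
// List(1, 10, 2, 10, 1)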
Finally, composing the above functions together to produce the function solving your problem becomes quite trivial:
def range[ A ]( elements : List[ A ] )( test : A => Boolean ) : List[ A ]
  = beforeWithSuccessor( afterWithPredecessor( elements )( test ) )( test )
Tests:
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ >= 10 )
res0: List[Int] = List(1, 10, 2, 10, 1)
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ >= 1 )
res1: List[Int] = List()
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ == 2 )
res2: List[Int] = List(10, 2, 10)
The second test returns an empty list since the outermost elements satisfying the predicate have no predecessors (or successors).
def range[T](elements: List[T], condition: T => Boolean): List[T] = {
  val first = elements.indexWhere(condition)
  val last = elements.lastIndexWhere(condition)
  elements.slice(first - 1, last + 2)
}
scala> range[Int](List(1,1,1,10,2,10,1,1,1), _ >= 10)
res0: List[Int] = List(1, 10, 2, 10, 1)
scala> range[Int](List(2,10,2,10), _ >= 10)
res1: List[Int] = List(2, 10, 2, 10)
scala> range[Int](List(), _ >= 10)
res2: List[Int] = List()
Zip and map to the rescue
val l = List(1, 1, 1, 10, 2, 1, 1, 1)
def test (i: Int) = i >= 10
((l.head :: l) zip (l.tail :+ l.last)) zip l filter {
case ((a, b), c) => (test (a) || test (b) || test (c) )
} map { case ((a, b), c ) => c }
That should work. I only have my smartphone and am miles from anywhere I could test this, so apologies for any typos or minor syntax errors.
Edit: works now. I hope it's obvious that my solution shuffles the list to the right and to the left to create two new lists. When these are zipped together and zipped again with the original list, the result is a list of tuples, each containing the original element and a tuple of its neighbours. This is then trivial to filter and map back to a simple list.
Making this into a more general function (and using collect rather than filter -> map)...
def filterWithNeighbours[E](l: List[E])(p: E => Boolean) = l match {
  case Nil => Nil
  case li if li.size < 3 => if (l exists p) l else Nil
  case _ => ((l.head :: l) zip (l.tail :+ l.last)) zip l collect {
    case ((a, b), c) if (p (a) || p (b) || p (c) ) => c
  }
}
This is less efficient than the recursive solution but makes the test much simpler and clearer. It can be difficult to match the right sequence of patterns in a recursive solution, as the patterns often express the shape of the chosen implementation rather than the original data. With the simple functional solution, each element is clearly and simply compared to its neighbours.
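For example, against the list from the original question (my own check):
filterWithNeighbours(List(1, 1, 1, 10, 2, 10, 1, 1, 1))(_ >= 10)
// List(1, 10, 2, 10, 1)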