How to reduce with a non-associative function via adjacent pairs - scala

How can I reduce a collection via adjacent pairs?
For instance, let + be a non-associative operator:
(1, 2, 3, 4, 5, 6) => ((1+2) + (3+4)) + (5+6)
(a, b, c, d, e, f, g, h) => ((a+b) + (c+d)) + ((e+f) + (g+h))
This question is similar to this one, but I don't think parallel collections apply, because their semantics require an associative operator for determinism. I'm not concerned so much with parallel execution as with the actual associating, so that it constructs a balanced expression tree.

Here is a version that does what you want, but treats the "remaining" elements somewhat arbitrarily (if the number of inputs in the current iteration is odd, one element is left as-is):
def nonassocPairwiseReduce[A](xs: List[A], op: (A, A) => A): A = {
  xs match {
    case Nil => throw new IllegalArgumentException
    case List(singleElem) => singleElem
    case sthElse => {
      val grouped = sthElse.grouped(2).toList
      val pairwiseOpd = for (g <- grouped) yield {
        g match {
          case List(a, b) => op(a, b)
          case List(x) => x
        }
      }
      nonassocPairwiseReduce(pairwiseOpd, op)
    }
  }
}
For example, if this is your non-associative operation on Strings:
def putInParentheses(a: String, b: String) = s"(${a} + ${b})"
then your examples
for {
  example <- List(
    ('1' to '6').toList.map(_.toString),
    ('a' to 'h').toList.map(_.toString)
  )
} {
  println(nonassocPairwiseReduce(example, putInParentheses))
}
are mapped to
(((1 + 2) + (3 + 4)) + (5 + 6))
(((a + b) + (c + d)) + ((e + f) + (g + h)))
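As a quick sanity check (my own addition, not from the original answer), the same function applied with plain integer subtraction, which is also non-associative, reproduces the balanced grouping worked out by hand:
// ((1 - 2) - (3 - 4)) - (5 - 6) = (-1 - (-1)) - (-1) = 1
nonassocPairwiseReduce(List(1, 2, 3, 4, 5, 6), (a: Int, b: Int) => a - b) // 1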
It would be interesting to know why you want to do this.

Related

Optimal way to find neighbors of element of collection in circular manner

I have a Vector and I'd like to find the neighbors of a given element.
Say if we have Vector(1, 2, 3, 4, 5) and then:
for element 2, result must be Some((1, 3))
for element 5, result must be Some((4, 1))
for element 1, result must be Some((5, 2))
for element 6, result must be None
and so on...
I have not found any solution in the standard library (please point me to one if it exists), so I came up with the following:
implicit class VectorOps[T](seq: Vector[T]) {
  def findNeighbors(elem: T): Option[(T, T)] = {
    val currentIdx = seq.indexOf(elem)
    val firstIdx = 0
    val lastIdx = seq.size - 1
    seq match {
      case _ if currentIdx == -1 || seq.size < 2 => None
      case _ if seq.size == 2 => seq.find(_ != elem).map(elem => (elem, elem))
      case _ if currentIdx == firstIdx => Some((seq(lastIdx), seq(currentIdx + 1)))
      case _ if currentIdx == lastIdx => Some((seq(currentIdx - 1), seq(firstIdx)))
      case _ => Some((seq(currentIdx - 1), seq(currentIdx + 1)))
    }
  }
}
The question is: how can this be simplified/optimized using the stdlib?
def neighbours[T](v: Seq[T], x: T): Option[(T, T)] =
  (v.last +: v :+ v.head)
    .sliding(3, 1)
    .find(_(1) == x)
    .map(x => (x(0), x(2)))
This uses sliding to create a 3-element window over the data and find to match the middle value of each window. Prepending the last element and appending the first deals with the wrap-around case.
This will fail if the Vector is too short, so it needs some error checking.
This version is safe for all input:
def neighbours[T](v: Seq[T], x: T): Option[(T, T)] =
  (v.takeRight(1) ++ v ++ v.take(1))
    .sliding(3, 1)
    .find(_(1) == x)
    .map(x => (x(0), x(2)))
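A quick check against the examples from the question (my own addition, using the safe version above):
val v = Vector(1, 2, 3, 4, 5)
neighbours(v, 2) // Some((1, 3))
neighbours(v, 5) // Some((4, 1))
neighbours(v, 1) // Some((5, 2))
neighbours(v, 6) // None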
Optimal when the number of calls with the same sequence is about equal to or greater than seq.toSet.size:
val elementToPair = seq.indices.map(i =>
  seq(i) -> (seq((i - 1 + seq.length) % seq.length), seq((i + 1 + seq.length) % seq.length))
).toMap
elementToPair.get(elem)
// other calls
Optimal when the number of calls with the same sequence is less than seq.toSet.size:
Some(seq.indexOf(elem)).filterNot(_ == -1).map { i =>
  (seq((i - 1 + seq.length) % seq.length), seq((i + 1 + seq.length) % seq.length))
}
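For illustration (my addition), here is how the precomputed-map variant behaves on the Vector from the question:
val seq = Vector(1, 2, 3, 4, 5)
val elementToPair = seq.indices.map(i =>
  seq(i) -> (seq((i - 1 + seq.length) % seq.length), seq((i + 1 + seq.length) % seq.length))
).toMap
elementToPair.get(2) // Some((1, 3))
elementToPair.get(5) // Some((4, 1))
elementToPair.get(6) // None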

Scala: Does reduceLeft and reduceRight have accumulator at different positions?

I am a little bit confused about the methods reduceLeft and reduceRight in Scala. Here's a snippet that I am testing:
val intList = List(5, 4, 3, 2, 1)
println(intList.reduceRight((curr, acc) => { // in reduceRight, the accumulator is the second parameter
  println(s"First Curr = $curr, Acc = $acc")
  curr - acc
}))
println(intList.reduceLeft((curr, acc) => { // the accumulator starts as the first element
  println(s"Second Curr = $curr, Acc = $acc")
  acc - curr
}))
And the Output is as shown below:
First Curr = 2, Acc = 1
First Curr = 3, Acc = 1
First Curr = 4, Acc = 2
First Curr = 5, Acc = 2
3
Second Curr = 5, Acc = 4
Second Curr = -1, Acc = 3
Second Curr = 4, Acc = 2
Second Curr = -2, Acc = 1
3
In both iterations, what I observe is that in the case of reduceRight we have (curr, acc) [meaning curr is passed as the first argument and the accumulator as the second], whereas in the case of reduceLeft we have (acc, curr).
Is there any specific reason behind this inconsistency in the arguments?
Let's use a little visualization:
reduceLeft (folding from the left over a, b, c, d, e):
  acc1 = f(a, b)
  acc2 = f(acc1, c)
  acc3 = f(acc2, d)
  acc4 = f(acc3, e)
reduceRight (folding from the right over a, b, c, d, e):
  acc1 = f(d, e)
  acc2 = f(c, acc1)
  acc3 = f(b, acc2)
  acc4 = f(a, acc3)
The order of the arguments helps us remember where they came from and in which order they were evaluated. This matters because it is important not to confuse left-associative and right-associative operations:
any visual help matters, and writing acc on the correct side of the function makes reasoning easier
lambdas in coll.reduceLeft(_ operation _) and coll.reduceRight(_ operation _) would have very confusing behavior if acc were on the left in both cases
Imagine you defined a right-associative operator, e.g. a ** b (a to the power of b). Exponentiation in math is right-associative, so a ** b ** c should behave like a ** (b ** c). Someone decides to implement such a use case using reduceRight, because it implements exactly this behavior. They do
List(a, b, c).reduceRight(_ ** _)
and then they learn that someone decided acc should be on the left side because they wanted it to be "consistent" with reduceLeft. That would be far more inconsistent with the intuition you carry over from mathematics.
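As a concrete sketch (my own example, with a made-up pow helper on Ints, not from the original answer):
def pow(a: Int, b: Int): Int = math.pow(a, b).toInt

List(2, 3, 2).reduceRight(pow) // pow(2, pow(3, 2)) = 2^9 = 512
List(2, 3, 2).reduceLeft(pow)  // pow(pow(2, 3), 2) = 8^2 = 64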

Dynamic sliding window in Scala

Suppose I have a log file of events (page visits) with a timestamp. I'd like to group events into sessions, where I consider events to belong to the same session when they are no more than X minutes apart.
Currently, I ended up with this algorithm.
val s = List(1000, 501, 500, 10, 3, 2, 1) // timestamps
val n = 10 // time span
import scala.collection.mutable.ListBuffer

(s.head +: s).sliding(2).foldLeft(ListBuffer.empty[ListBuffer[Int]]) {
  case (acc, List(a, b)) if acc.isEmpty =>
    acc += ListBuffer(a)
    acc
  case (acc, List(a, b)) =>
    if (n >= a - b) {
      acc.last += b
      acc
    } else {
      acc += ListBuffer(b)
      acc
    }
}
The result
ListBuffer(ListBuffer(1000), ListBuffer(501, 500), ListBuffer(10, 3, 2, 1))
Is there any better/functional/efficient way to do it?
Slightly adapting this answer by altering the condition ...
s.foldRight[List[List[Int]]](Nil)((a, b) => b match {
  case (bh @ bhh :: _) :: bt if (bhh + n >= a) => (a :: bh) :: bt
  case _ => (a :: Nil) :: b
})
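A quick check (my addition, reusing s and n from the question) that this foldRight produces the same grouping:
val s = List(1000, 501, 500, 10, 3, 2, 1)
val n = 10
val sessions = s.foldRight[List[List[Int]]](Nil)((a, b) => b match {
  case (bh @ bhh :: _) :: bt if (bhh + n >= a) => (a :: bh) :: bt
  case _ => (a :: Nil) :: b
})
// sessions == List(List(1000), List(501, 500), List(10, 3, 2, 1))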

Parallel Aggregates in Scala using associative operators

I want to perform an aggregate of a list of values in Scala. Here are a few considerations:
the aggregate function [1] is associative as well as commutative: examples are plus and multiply
this function is applied to the list in parallel so as to utilize all the cores of the CPU
Here is an implementation:
package com.example.reactive

import scala.concurrent.Future
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object AggregateParallel {

  private def pm[T](l: List[Future[T]])(zero: T)(fn: (T, T) => T): Future[T] = {
    val l1 = l.grouped(2)
    val l2 = l1.map { sl =>
      sl match {
        case x :: Nil => x
        case x :: y :: Nil =>
          for (a <- x; b <- y) yield fn(a, b)
        case _ => Future(zero)
      }
    }.toList
    l2 match {
      case x :: Nil => x
      case x :: xs => pm(l2)(zero)(fn)
      case Nil => Future(zero)
    }
  }

  def parallelAggregate[T](l: List[T])(zero: T)(fn: (T, T) => T): T = {
    val n = pm(l.map(Future(_)))(zero)(fn)
    Await.result(n, 1000 millis)
    n.value.get.get
  }

  def main(args: Array[String]) {
    // multiply empty list: zero value is 1
    println(parallelAggregate(List[Int]())(1)((x, y) => x * y))
    // multiply a list: zero value is 1
    println(parallelAggregate(List(1, 2, 3, 4, 5))(1)((x, y) => x * y))
    // sum a list: zero value is 0
    println(parallelAggregate(List(1, 2, 3, 4, 5))(0)((x, y) => x + y))
    // sum a list: zero value is 0
    val bigList1 = List(1, 2, 3, 4, 5).map(BigInt(_))
    println(parallelAggregate(bigList1)(0)((x, y) => x + y))
    // sum a list of BigInt: zero value is 0
    val bigList2 = (1 to 100).map(BigInt(_)).toList
    println(parallelAggregate(bigList2)(0)((x, y) => x + y))
    // multiply a list of BigInt: zero value is 1
    val bigList3 = (1 to 100).map(BigInt(_)).toList
    println(parallelAggregate(bigList3)(1)((x, y) => x * y))
  }
}
OUTPUT:
1
120
15
15
5050
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
How else can I achieve the same objective or improve this code in Scala?
EDIT1:
I have implemented a bottom-up aggregate. I think I am quite close to the aggregate method in Scala (below). The difference is that I am only splitting into sublists of two elements:
Scala implementation:
def aggregate[S](z: S)(seqop: (S, T) => S, combop: (S, S) => S): S = {
  executeAndWaitResult(new Aggregate(z, seqop, combop, splitter))
}
With this implementation I assume that the aggregate happens in parallel like so:
List(1,2,3,4,5,6)
-> split parallel -> List(List(1,2), List(3,4), List(5,6) )
-> execute in parallel -> List( 3, 7, 11 )
-> split parallel -> List(List(3,7), List(11) )
-> execute in parallel -> List( 10, 11)
-> Result is 21
Is it correct to assume that Scala's aggregate is also doing bottom-up aggregation in parallel?
[1] http://www.mathsisfun.com/associative-commutative-distributive.html
The parallel collections in Scala already have an aggregate method that does just what you're asking for:
http://markusjais.com/scalas-parallel-collections-and-the-aggregate-method/
It works like foldLeft but takes an extra parameter:
def foldLeft[B](z: B)(f: (B, A) ⇒ B): B
def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
When called on a parallel collection, aggregate splits the collection into N parts, uses foldLeft on each part in parallel, and uses combop to combine all the results.
But when called on a non-parallel collection, aggregate just works like foldLeft and ignores the combop.
To get consistent results you need associative and commutative operators, because you don't control how the list will be split.
A short example:
List(1, 2, 3, 4, 5).par.aggregate(1)(_ * _, _ * _)
res0: Int = 120
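For illustration (not part of the original answer), seqop and combop can also use a result type different from the element type, e.g. summing string lengths:
List("foo", "ab", "c").par.aggregate(0)(_ + _.length, _ + _) // Int = 6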
Answer to Edit1 (improved according to comment):
I don't think it's the right approach: for an N-item list you'll create N Futures, which creates a big scheduling overhead. Unless the seqop is really expensive, I'd avoid creating a Future for every single element.
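A minimal sketch of that idea (my own illustration, not the answerer's code; chunkedAggregate and the chunk count are hypothetical), creating one Future per chunk instead of one per element:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Fold each chunk inside a single Future, then combine the partial results sequentially.
def chunkedAggregate[T](l: List[T], chunks: Int)(zero: T)(fn: (T, T) => T): T =
  if (l.isEmpty) zero
  else {
    val chunkSize = math.max(1, l.size / chunks)
    val partials = l.grouped(chunkSize).map(c => Future(c.reduce(fn))).toList
    Await.result(Future.sequence(partials), 10.seconds).fold(zero)(fn)
  }

chunkedAggregate((1 to 100).toList, 4)(0)(_ + _) // 5050, with only ~4 Futures scheduled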

Expand a Set[Set[String]] into Cartesian Product in Scala

I have the following set of sets. I don't know ahead of time how long it will be.
val sets = Set(Set("a","b","c"), Set("1","2"), Set("S","T"))
I would like to expand it into a Cartesian product:
Set("a&1&S", "a&1&T", "a&2&S", ..., "c&2&T")
How would you do that?
I think I figured out how to do that.
def combine(acc: Set[String], set: Set[String]) = for (a <- acc; s <- set) yield {
  a + "&" + s
}
val expanded = sets.reduceLeft(combine)
expanded: scala.collection.immutable.Set[java.lang.String] = Set(b&2&T, a&1&S,
a&1&T, b&1&S, b&1&T, c&1&T, a&2&T, c&1&S, c&2&T, a&2&S, c&2&S, b&2&S)
Nice question. Here's one way:
scala> val seqs = Seq(Seq("a","b","c"), Seq("1","2"), Seq("S","T"))
seqs: Seq[Seq[java.lang.String]] = List(List(a, b, c), List(1, 2), List(S, T))
scala> val seqs2 = seqs.map(_.map(Seq(_)))
seqs2: Seq[Seq[Seq[java.lang.String]]] = List(List(List(a), List(b), List(c)), List(List(1), List(2)), List(List(S), List(T)))
scala> val combined = seqs2.reduceLeft((xs, ys) => for {x <- xs; y <- ys} yield x ++ y)
combined: Seq[Seq[java.lang.String]] = List(List(a, 1, S), List(a, 1, T), List(a, 2, S), List(a, 2, T), List(b, 1, S), List(b, 1, T), List(b, 2, S), List(b, 2, T), List(c, 1, S), List(c, 1, T), List(c, 2, S), List(c, 2, T))
scala> combined.map(_.mkString("&"))
res11: Seq[String] = List(a&1&S, a&1&T, a&2&S, a&2&T, b&1&S, b&1&T, b&2&S, b&2&T, c&1&S, c&1&T, c&2&S, c&2&T)
Arriving after the battle ;) but here's another one:
sets.reduceLeft((s0,s1)=>s0.flatMap(a=>s1.map(a+"&"+_)))
Expanding on dsg's answer, you can write it more clearly (I think) this way, if you don't mind the curried function:
def combine[A](f: A => A => A)(xs: Iterable[Iterable[A]]) =
  xs reduceLeft { (x, y) => x.view flatMap { a => y map f(a) } }
Another alternative (slightly longer, but much more readable):
def combine[A](f: (A, A) => A)(xs: Iterable[Iterable[A]]) =
  xs reduceLeft { (x, y) => for (a <- x.view; b <- y) yield f(a, b) }
Usage:
combine[String](a => b => a + "&" + b)(sets) // curried version
combine[String](_ + "&" + _)(sets) // uncurried version
Expanding on #Patrick's answer.
Now it's more general and lazier:
def combine[A](f: (A, A) => A)(xs: Iterable[Iterable[A]]) =
  xs.reduceLeft { (x, y) => x.view.flatMap { a => y.map(f(a, _)) } }
Having it be lazy allows you to save space, since you don't store the exponentially many items in the expanded set; instead, you generate them on the fly. But, if you actually want the full set, you can still get it like so:
val expanded = combine{(x:String, y:String) => x + "&" + y}(sets).toSet