I can do elementwise operations like sum using the zipped method. Suppose I have two Lists L1 and L2 as shown below
val L1 = List(1,2,3,4)
val L2 = List(5,6,7,8)
I can take the elementwise sum in the following way
(L1,L2).zipped.map(_+_)
and result is
List(6, 8, 10, 12)
as expected.
I am using zipped in my actual code, but it takes too much time. In reality my list size is more than 1000, I have more than 1000 lists, and my algorithm is iterative, with up to one billion iterations.
In code I have to do following stuff
list = ((L1, L2).zipped.map(_ + _).map(_ * math.random), L3).zipped.map(_ + _)
L1, L2, and L3 all have the same size. Moreover, I have to execute my actual code on a cluster.
What is the fastest way to take elementwise sum of Lists in Scala?
One option would be to use a streaming implementation; taking advantage of laziness may improve performance.
An example using LazyList (introduced in Scala 2.13).
def usingLazyList(l1: LazyList[Double], l2: LazyList[Double], l3: LazyList[Double]): LazyList[Double] =
((l1 zip l2) zip l3).map {
case ((a, b), c) =>
((a + b) * math.random()) + c
}
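A minimal usage sketch (with made-up values), assuming the function above; nothing is evaluated until the LazyList is forced, e.g. by converting it to a strict List:
val result: List[Double] =
  usingLazyList(
    LazyList(1.0, 2.0, 3.0),
    LazyList(4.0, 5.0, 6.0),
    LazyList(7.0, 8.0, 9.0)
  ).toList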
And an example using fs2.Stream (provided by the fs2 library).
import fs2.Stream
import cats.effect.IO
def usingFs2Stream(s1: Stream[IO, Double], s2: Stream[IO, Double], s3: Stream[IO, Double]): Stream[IO, Double] =
s1.zipWith(s2) {
case (a, b) =>
(a + b) * math.random()
}.zipWith(s3) {
case (acc, c) =>
acc + c
}
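A hedged usage sketch, assuming cats-effect 3 (whose default runtime provides unsafeRunSync); the input values are made up:
import cats.effect.unsafe.implicits.global // cats-effect 3 default runtime

val res: List[Double] =
  usingFs2Stream(
    Stream.emits(List(1.0, 2.0, 3.0)),
    Stream.emits(List(4.0, 5.0, 6.0)),
    Stream.emits(List(7.0, 8.0, 9.0))
  ).compile.toList.unsafeRunSync()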
However, if those are still too slow, the best alternative would be to use plain arrays.
Here is an example using ArraySeq (also introduced in Scala 2.13), which at least preserves immutability. You may use raw arrays if you prefer, but take care.
(If you want, you may also use the scala-parallel-collections module to be even more performant.)
import scala.collection.immutable.ArraySeq
import scala.collection.parallel.CollectionConverters._
def usingArraySeq(a1: ArraySeq[Double], a2: ArraySeq[Double], a3: ArraySeq[Double]): ArraySeq[Double] = {
val length = a1.length
val arr = Array.ofDim[Double](length)
(0 until length).par.foreach { i =>
arr(i) = ((a1(i) + a2(i)) * math.random()) + a3(i)
}
ArraySeq.unsafeWrapArray(arr)
}
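Usage is then direct; for example (made-up values):
val out: ArraySeq[Double] =
  usingArraySeq(
    ArraySeq(1.0, 2.0, 3.0),
    ArraySeq(4.0, 5.0, 6.0),
    ArraySeq(7.0, 8.0, 9.0)
  )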
Related
Suppose that I use a sequence of various maps and/or flatMaps to generate a sequence of collections. Is it possible to access information about the "current" collection from within any of those methods? For example, without knowing anything specific about the functions used in the previous maps or flatMaps, and without using any intermediate declarations, how can I get the maximum value (or length, or first element, etc.) of the collection upon which the last map acts?
List(1, 2, 3)
.flatMap(x => f(x) /* some unknown function */)
.map(x => x + ??? /* what is the max element of the collection? */)
Edit for clarification:
In the example, I'm not looking for the max (or whatever) of the initial List. I'm looking for the max of the collection after the flatMap has been applied.
By "without using any intermediate declarations" I mean that I do not want to use any temporary collections en route to the final result. So, the example by Steve Waldman below, while giving the desired result, is not what I am seeking. (I include this condition is mostly for aesthetic reasons.)
Edit for clarification, part 2:
The ideal solution would be some magic keyword or syntactic sugar that lets me reference the current collection:
List(1, 2, 3)
.flatMap(x => f(x))
.map(x => x + theCurrentList.max)
I'm prepared to accept the fact, however, that this simply is not possible.
Maybe just define the list as a val, so you can name it? I don't know of any facility built into map(...) or flatMap(...) that would help.
val myList = List(1, 2, 3)
myList
.flatMap(x => f(x) /* some unknown function */)
.map(x => x + myList.max /* what is the max element of the List? */)
Update: By this approach at least, if you have multiple transformations and want to see the transformed version, you'd have to name it. You could get away with
val myList = List(1, 2, 3).flatMap(x => f(x) /* some unknown function */)
myList.map(x => x + myList.max /* what is the max element of the List? */)
Or, if there will be multiple transformations, get in the habit of naming the stages.
val rawList = List(1, 2, 3)
val smordified = rawList.flatMap(x => f(x) /* some unknown function */)
val maxified = smordified.map(x => x + smordified.max /* what is the max element of the List? */)
maxified
Update 2: Watch it work in the REPL, even with heterogeneous types:
scala> def f( x : Int ) : Vector[Double] = Vector(x * math.random, x * math.random )
f: (x: Int)Vector[Double]
scala> val rawList = List(1, 2, 3)
rawList: List[Int] = List(1, 2, 3)
scala> val smordified = rawList.flatMap(x => f(x) /* some unknown function */)
smordified: List[Double] = List(0.40730853571901315, 0.15151641399798665, 1.5305929709857609, 0.35211231420067435, 0.644241939254793, 0.15530230501048903)
scala> val maxified = smordified.map(x => x + smordified.max /* what is the max element of the List? */)
maxified: List[Double] = List(1.937901506704774, 1.6821093849837476, 3.0611859419715217, 1.8827052851864352, 2.1748349102405538, 1.6858952759962498)
scala> maxified
res3: List[Double] = List(1.937901506704774, 1.6821093849837476, 3.0611859419715217, 1.8827052851864352, 2.1748349102405538, 1.6858952759962498)
It is possible, but not pretty, and not likely something you want if you are doing it for "aesthetic reasons."
import scala.math.max
def f(x: Int): Seq[Int] = ???
List(1, 2, 3).
flatMap(x => f(x) /* some unknown function */).
foldRight((List[Int](),List[Int]())) {
case (x, (xs, Nil)) => ((x :: xs), List.fill(xs.size + 1)(x))
case (x, (xs, xMax :: _)) => ((x :: xs), List.fill(xs.size + 1)(max(x, xMax)))
}.
zipped.
map {
case (x, xMax) => x + xMax
}
// Or alternatively, a slightly more efficient version using Streams.
List(1, 2, 3).
flatMap(x => f(x) /* some unknown function */).
foldRight((List[Int](),Stream[Int]())) {
case (x, (xs, Stream())) =>
((x :: xs), Stream.continually(x))
case (x, (xs, curXMax #:: _)) =>
val newXMax = max(x, curXMax)
((x :: xs), Stream.continually(newXMax))
}.
zipped.
map {
case (x, xMax) => x + xMax
}
Seriously though, I just took this on to see if I could do it. While the code didn't turn out as bad as I expected, I still don't think it's particularly readable. I'd discourage using this over something similar to Steve Waldman's answer. Sometimes, it's simply better to just introduce a val, rather than being dogmatic about it.
You could define a mapWithSelf (resp. flatMapWithSelf) operation along these lines and add it as an implicit enrichment to the collection. For List it might look like:
// Scala 2.13 APIs
object Enrichments {
implicit class WithSelfOps[A](val lst: List[A]) extends AnyVal {
def mapWithSelf[B](f: (A, List[A]) => B): List[B] =
lst.map(f(_, lst))
def flatMapWithSelf[B](f: (A, List[A]) => IterableOnce[B]): List[B] =
lst.flatMap(f(_, lst))
}
}
The enrichment basically fixes the value of the collection before the operation and threads it through. It should be possible to generify this (at least for the strict collections), though it would look a little different in 2.12 vs. 2.13+.
Usage would look like
import Enrichments._
val someF: Int => IterableOnce[Int] = ???
List(1, 2, 3)
.flatMap(someF)
.mapWithSelf { (x, lst) =>
x + lst.max
}
So at the usage site, it's aesthetically pleasant. Note that if you're computing something which traverses the list, you'll be traversing the list every time (leading to a quadratic runtime). You can get around that with some mutability or by just saving the intermediate list after the flatMap, as sketched below.
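For instance, a sketch of the second workaround, reusing someF from the usage example above:
val intermediate = List(1, 2, 3).flatMap(someF) // built once
val cachedMax = intermediate.max                // computed once, not per element
val result = intermediate.map(_ + cachedMax)    // linear rather than quadratic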
One somewhat-simple way of referencing prior output within the current map/collect operation is to use a named reference outside the map, then reference it from within the map block:
var prevOutput = ... // starting value of whatever is referenced within the map
myValues.map { x =>
  prevOutput = ... // expression that references the prior `prevOutput` (and `x`)
  prevOutput       // return the computed value for the map to collect
}
This draws attention to the fact that we're referencing prior elements while building the new sequence.
This would be more messy, though, if you wanted to reference arbitrarily previous values, not just the previous one.
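For the simple previous-value case, here is a concrete, runnable instance of the pattern (a running maximum over made-up input):
var prevMax = Int.MinValue
val runningMax = List(3, 1, 4, 1, 5).map { x =>
  prevMax = math.max(prevMax, x) // references the prior output
  prevMax
}
// runningMax == List(3, 3, 4, 4, 5)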
What's the difference between reduceLeft and reduceRight in Scala?
val list = List(1, 0, 0, 1, 1, 1)
val sum1 = list reduceLeft {_ + _}
val sum2 = list reduceRight {_ + _}
println { sum1 == sum2 }
In my snippet sum1 = sum2 = 4, so the order does not matter here.
When do they produce the same result
As Lionel already pointed out, reduceLeft and reduceRight only produce the same result if the function you are using to combine the elements is associative (and even that isn't always sufficient; see my note at the bottom). For instance, when running reduceLeft and reduceRight on Seq(1, 2, 3) with the function (a: Int, b: Int) => a - b, you get different results.
scala> Seq(1,2,3)
res0: Seq[Int] = List(1, 2, 3)
scala> res0.reduceLeft(_ - _)
res5: Int = -4
scala> res0.reduceRight(_ - _)
res6: Int = 2
Why this happens can be made clear if we look at how each of the functions is applied over the list.
For reduceRight this is what the calls look like if we were to unwrap them.
(1 - (2 - 3))
(1 - (-1))
2
For reduceLeft the sequence is built up starting from the left,
((1 - 2) - 3)
((-1) - 3)
(-4)
Tail Recursion
Furthermore, because reduceLeft is implemented using tail recursion, it will not overflow the stack when operating on very large collections (possibly even infinite ones). reduceRight is not tail recursive, so given a collection of sufficient size, it will produce a stack overflow.
For instance, on my machine, if I run the following I get an out-of-memory error:
scala> (0 to 100000000).reduceRight(_ - _)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Integer.valueOf(Integer.java:832)
at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:65)
at scala.collection.immutable.Range.apply(Range.scala:61)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.reversed(TraversableOnce.scala:99)
at scala.collection.AbstractIterator.reversed(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.reduceRight(TraversableOnce.scala:197)
at scala.collection.AbstractIterator.reduceRight(Iterator.scala:1194)
at scala.collection.IterableLike$class.reduceRight(IterableLike.scala:85)
at scala.collection.AbstractIterable.reduceRight(Iterable.scala:54)
... 20 elided
But if I compute with reduceLeft I don't get the OOM,
scala> (0 to 100000000).reduceLeft(_ - _)
res16: Int = -987459712
You might get slightly different results on your system, depending on your JVM's default memory settings.
Prefer left versions
So, because of tail recursion, if you know that reduceLeft and reduceRight will produce the same value, you should prefer the reduceLeft variant. This generally holds true of the other left/right functions, such as foldRight and foldLeft (which are just more general versions of reduceRight and reduceLeft).
When do they really always produce the same result
A small note about reduceLeft and reduceRight and the associative property of the function you are using. I said that reduceRight and reduceLeft only produce the same results if the operator is associative, but that isn't sufficient for all collection types. That is somewhat of another topic, so consult the ScalaDoc, but in short: the function you are reducing with needs to be both commutative and associative in order to get the same results for all collection types.
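As a small illustration of why commutativity matters: string concatenation is associative but not commutative, so the two reduces agree on an ordered collection, while on an unordered one the result depends on the unspecified iteration order:
List("a", "b", "c").reduceLeft(_ + _)  // always "abc"
List("a", "b", "c").reduceRight(_ + _) // always "abc"
// On a HashSet the iteration order, and hence the result, is unspecified:
scala.collection.immutable.HashSet("a", "b", "c").reduceLeft(_ + _)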
reduceLeft doesn't always give the same result as reduceRight. Consider a non-associative function on your collection.
Assuming the same result, performance is one obvious difference. See the performance-characteristics page of the collections documentation.
The data structure is built with constant access time to the head and tail. Iterating backwards will perform worse for large lists.
The best way to understand the difference is to read the source code in library/scala/collection/LinearSeqOptimized.scala:
def reduceLeft[B >: A](f: (B, A) => B): B =
......
tail.foldLeft[B](head)(f)
def reduceRight[B >: A](op: (A, B) => B): B =
......
op(head, tail.reduceRight(op))
def foldLeft[B](z: B)(f: (B, A) => B): B = {
var acc = z
var these = this
while (!these.isEmpty) {
acc = f(acc, these.head)
these = these.tail
}
acc
}
The above is the key part of the code; you can see that reduceLeft is based on foldLeft, while reduceRight is implemented via recursion.
I'd guess that reduceLeft has better performance.
I'm new to Scala and trying to figure out the best way to filter & map a collection. Here's a toy example to explain my problem.
Approach 1: This is pretty bad, since I'm iterating through the list twice and calculating x * x in both passes.
val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums filter { x: Int => (x * x) > N } map { x: Int => (x * x).toString }
Approach 2: This is slightly better but I still need to calculate (x * x) twice.
val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums collect { case x: Int if (x * x) > N => (x * x).toString }
So, is it possible to calculate this without iterating through the collection twice and avoid repeating the same calculations?
You could use a foldRight:
nums.foldRight(List.empty[Int]) {
case (i, is) =>
val s = i * i
if (s > N) s :: is else is
}
A foldLeft would also achieve a similar goal, but the resulting list would be in reverse order, since folding from the left prepends each element; a sketch follows.
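A sketch of that foldLeft variant, reversing at the end to restore the order:
nums.foldLeft(List.empty[Int]) { (is, i) =>
  val s = i * i
  if (s > N) s :: is else is
}.reverse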
Alternatively, if you'd like to play with Scalaz:
import scalaz.std.list._
import scalaz.syntax.foldable._
nums.foldMap { i =>
val s = i * i
if (s > N) List(s) else List()
}
The typical approach is to use an iterator (if possible) or view (if iterator won't work). This doesn't exactly avoid two traversals, but it does avoid creation of a full-sized intermediate collection. You then map first and filter afterwards and then map again if needed:
xs.iterator.map(x => x*x).filter(_ > N).map(_.toString)
The advantage of this approach is that it's really easy to read and, since there are no intermediate collections, it's reasonably efficient.
If you are asking because this is a performance bottleneck, then the answer is usually to write a tail-recursive function or use the old-style while loop method. For instance, in your case
def sumSqBigN(xs: Array[Int], N: Int): Array[String] = {
val ysb = Array.newBuilder[String]
def inner(start: Int): Array[String] = {
if (start >= xs.length) ysb.result
else {
val sq = xs(start) * xs(start)
if (sq > N) ysb += sq.toString
inner(start + 1)
}
}
inner(0)
}
You can also pass a parameter forward in inner instead of using an external builder (especially useful for sums).
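For example, a hypothetical variant that passes the accumulator forward instead of using an external builder, summing the qualifying squares:
def sumOfBigSquares(xs: Array[Int], N: Int): Long = {
  def inner(i: Int, acc: Long): Long =
    if (i >= xs.length) acc
    else {
      val sq = xs(i).toLong * xs(i)              // widen to avoid overflow
      inner(i + 1, if (sq > N) acc + sq else acc)
    }
  inner(0, 0L)
}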
I have yet to confirm that this is truly a single pass, but:
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
if (square > N) Some(square.toString) else None
}
You can use collect, which applies a partial function to every value of the collection at which it is defined. Your example could be rewritten as follows:
val sqNumsLargerThanN = nums collect {
case (x: Int) if (x * x) > N => (x * x).toString
}
A very simple approach that only performs the multiplication once. It's also lazy, so it executes code only when needed.
nums.view.map(x => x * x).withFilter(x => x > N).map(_.toString)
Take a look here for differences between filter and withFilter.
Consider this for comprehension,
for (x <- 0 until 10; v = x*x if v > N) yield v.toString
which desugars to a map over the range with a (lazy) withFilter applied to the once-only calculated square, yielding a collection with the filtered results. Note that only one iteration and one calculation of the square are required (in addition to creating the range); a rough desugaring is sketched below.
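Roughly (the exact tuple shapes are compiler details), the comprehension is rewritten as:
(0 until 10)
  .map { x => val v = x * x; (x, v) }  // the mid-stream definition
  .withFilter { case (_, v) => v > N } // lazy filter, no extra pass
  .map { case (_, v) => v.toString }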
You can use flatMap.
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
if (square > N) Some(square.toString) else None
}
Or with Scalaz,
import scalaz.Scalaz._
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
(square > N).option(square.toString)
}
This solves the asked question of how to do this in one iteration. It can be useful when streaming data, as with an Iterator.
However, if you instead want the absolute fastest implementation, this is not it. In fact, I suspect you would use a mutable ArrayList and a while loop. But only after profiling would you know for sure. In any case, that's for another question.
Using a for comprehension would work:
val sqNumsLargerThanN = for { x <- nums if x * x > N } yield (x * x).toString
Also, I'm not sure, but I think the Scala compiler is smart about a filter before a map and will only do one pass if possible.
I am also a beginner; I did it as follows
for (y <- nums.map(x => x * x) if y > N) { println(y) }
The documentation for /: includes this note:
Note: might return different results for different runs, unless the underlying collection type is ordered or the operator is associative and commutative.
Does this note apply only when the par version of the collection is used? Otherwise the result is deterministic (the same as foldLeft), right?
Also, this function calls foldLeft under the hood: def /:[B](z: B)(op: (B, A) => B): B = foldLeft(z)(op)
Their signatures are the same (except for the parameter name, "op" instead of "f"):
def /:[B](z: B)(op: (B, A) ⇒ B): B
def foldLeft[B](z: B)(f: (B, A) ⇒ B): B
For these reasons, what is the point of the /: function, and when should it be used in favour of foldLeft?
Is my reasoning incorrect?
It's just alternative syntax. Methods whose names end in : are invoked on the operand to their right.
Instead of
list.foldLeft(0) { op(_, _) }
or
list./:(0) { op(_, _) }
you can write
( z /: list ) { op(_, _) }
For example,
scala> val a = List(1,2,3,4)
a: List[Int] = List(1, 2, 3, 4)
scala> ( 0 /: a ) { _ + _ }
res5: Int = 10
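For completeness, foldRight has the symmetric alias :\ (both symbolic aliases are deprecated as of Scala 2.13):
val sum = ( a :\ 0 ) { _ + _ } // foldRight: 1 + (2 + (3 + (4 + 0))) = 10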
Yes, those are aliases originating from dark times when people liked their operators like this:
val x = y |#<#|: z.
The point of the note is to remind you that for collections with an unspecified iteration order, the results of folds might differ. Consider a Set {1,2,3} that doesn't guarantee the same access order even if left unmodified, and apply an operation that is not associative (like /). Even without a par call, this might result in the following (pseudocode):
{1,2,3} foldLeft / ==> (1 / 2) / 3 ==> 1/6 = 0.1(6)
{3,1,2} foldLeft / ==> (3 / 1) / 2 ==> 3/2 = 1.5
In terms of consistency this is similar to applying non-parallelizable operations to parallel collections, though.
There's a great tutorial here that seems to suggest that the Writer monad is basically a special-case tuple object that performs operations on behalf of (A, B). The writer accumulates values on the left (which is A), and that A has a corresponding Monoid (hence it can accumulate or update state). If A is a collection, then it accumulates.
The State monad is also an object that deals with an internal tuple. Both can be flatMap'd, map'd, etc., and the operations seem the same to me. How are they different? (Please respond with a Scala example; I'm not familiar with Haskell.) Thanks!
Your intuition that these two monads are closely related is exactly right. The difference is that Writer is much more limited, in that it doesn't allow you to read the accumulated state (until you cash out at the end). The only thing you can do with the state in a Writer is tack more stuff onto the end.
More concisely, State[S, A] is a kind of wrapper for S => (S, A), while Writer[W, A] is a wrapper for (W, A).
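A simplified sketch of those two shapes (hypothetical names, not the actual scalaz definitions, which carry more structure):
final case class MyState[S, A](run: S => (S, A))
final case class MyWriter[W, A](run: (W, A))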
Consider the following usage of Writer:
import scalaz._, Scalaz._
def addW(x: Int, y: Int): Writer[List[String], Int] =
Writer(List(s"$x + $y"), x + y)
val w = for {
a <- addW(1, 2)
b <- addW(3, 4)
c <- addW(a, b)
} yield c
Now we can run the computation:
scala> val (log, res) = w.run
log: List[String] = List(1 + 2, 3 + 4, 3 + 7)
res: Int = 10
We could do exactly the same thing with State:
def addS(x: Int, y: Int) =
State((log: List[String]) => (log |+| List(s"$x + $y"), x + y))
val s = for {
a <- addS(1, 2)
b <- addS(3, 4)
c <- addS(a, b)
} yield c
And then:
scala> val (log, res) = s.run(Nil)
log: List[String] = List(1 + 2, 3 + 4, 3 + 7)
res: Int = 10
But this is a little more verbose, and we could also do lots of other things with State that we couldn't do with Writer.
So the moral of the story is that you should use Writer whenever you can—your solution will be cleaner, more concise, and you'll get the satisfaction of having used the appropriate abstraction.
Very often Writer won't give you all the power you need, though, and in those cases State will be waiting for you.
tl;dr State is Read and Write while Writer is, well, only write.
With State you have access to the data stored previously, and you can use it in your current computation:
def myComputation(x: A) =
  State((myState: List[A]) => {
    val newValue = calculateNewValueBasedOnState(x, myState)
    (myState |+| List(newValue), newValue)
  })
With Writer, you can store data in some object you don't have access to; you can only write to that object.
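A minimal sketch, reusing the scalaz Writer from the first answer: the log can only be appended to, never inspected mid-computation.
def logOnly(x: Int): Writer[List[String], Int] =
  Writer(List(s"saw $x"), x) // write-only: later steps cannot read this log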