Scala: apply map to a list of tuples

Very simple question: I want to do something like this:
var arr1: Array[Double] = ...
var arr2: Array[Double] = ...
var arr3: Array[(Double,Double)] = arr1.zip(arr2)
arr3.foreach(x => {if (x._1 > threshold) {x._2 = x._2 * factor}})
I tried a lot of different syntax versions, but all of them failed. How can I solve this? It can't be very difficult...
Thanks!

There are multiple approaches. Consider, for instance, collect, which returns a new collection arr4 and leaves arr3 unchanged:
val arr4 = arr3.collect {
case (x, y) if x > threshold => (x ,y * factor)
case v => v
}
Or with a for comprehension:
for ((x, y) <- arr3)
yield (x, if (x > threshold) y * factor else y)

I think you want to do something like this:
scala> val arr1 = Array(1.1, 1.2)
arr1: Array[Double] = Array(1.1, 1.2)
scala> val arr2 = Array(1.1, 1.2)
arr2: Array[Double] = Array(1.1, 1.2)
scala> val arr3 = arr1.zip(arr2)
arr3: Array[(Double, Double)] = Array((1.1,1.1), (1.2,1.2))
scala> arr3.filter(_._1 > 1.1).map(_._2 * 2)
res0: Array[Double] = Array(2.4)

I think there are two problems:
You're using foreach, which returns Unit, where you want to use map, which returns an Array[B].
You're trying to update an immutable value, when you want to return a new, updated value. This is the difference between _._2 = _._2 * factor and _._2 * factor.
To filter the values not meeting the threshold:
arr1.zip(arr2).filter(_._1 > threshold).map(_._2 * factor)
To keep all values, but only multiply the ones meeting the threshold:
arr1.zip(arr2).map {
case (x, y) if x > threshold => y * factor
case (_, y) => y
}
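A runnable sketch of the two variants above, using made-up sample arrays with threshold = 1.15 and factor = 2.0 (these values are illustrative assumptions, not from the question). It makes the difference visible: filter followed by map drops the pairs below the threshold, while the pattern-matching map keeps one output per input.

```scala
object ThresholdDemo {
  val arr1: Array[Double] = Array(1.1, 1.2, 1.3)
  val arr2: Array[Double] = Array(10.0, 20.0, 30.0)
  val threshold = 1.15 // illustrative value
  val factor = 2.0     // illustrative value

  // Drops every pair whose first element is not above the threshold.
  val filtered: Array[Double] =
    arr1.zip(arr2).filter(_._1 > threshold).map(_._2 * factor)

  // Keeps every pair, scaling the second element only where the first is above the threshold.
  val scaledAll: Array[Double] = arr1.zip(arr2).map {
    case (x, y) if x > threshold => y * factor
    case (_, y)                  => y
  }
}
```

Here filtered contains two values while scaledAll still contains three.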

You can do it like this:
arr3.map(x => if (x._1 > threshold) (x._1, x._2 * factor) else x)

How about this?
arr3.map { case (x1, x2) => // extract the first and second values
  if (x1 > threshold) (x1, x2 * factor) // if the first value is greater than the threshold, 'change' x2
  else (x1, x2) // otherwise leave the pair as it is
}
Scala generally favors a functional style: you do not change values in place, you create new ones. For example, you do not write x._2 = …, because a tuple is immutable (you can't change it); instead you create a new tuple.
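To make the point about immutability concrete: Tuple2 is a case class, so besides building a new tuple by hand you can use its copy method, which returns a new tuple and leaves the original untouched. The threshold and factor below are illustrative values only.

```scala
object TupleCopyDemo {
  val original: (Double, Double) = (1.5, 10.0)
  val threshold = 1.0 // illustrative value
  val factor = 3.0    // illustrative value

  // copy builds a new tuple with a replaced second element; `original` is never modified.
  val updated: (Double, Double) =
    if (original._1 > threshold) original.copy(_2 = original._2 * factor)
    else original
}
```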

This will do what you need.
arr3.map(x => if (x._1 > threshold) (x._1, x._2 * factor) else x)
The key here is that you can return a tuple from the map lambda by wrapping the two values in parentheses: (..).
Edit: If you want to change the elements of the array in place, without creating a new array, assign to each index instead:
arr3.indices.foreach(i => if (arr3(i)._1 > threshold) arr3(i) = (arr3(i)._1, arr3(i)._2 * factor))
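A self-contained sketch of the in-place variant, wrapped in a helper named scaleInPlace (a hypothetical name, not from the answer). Because Array is mutable, assigning to arr(i) replaces the tuple stored at that index; the tuples themselves stay immutable, so each updated slot receives a new tuple.

```scala
object InPlaceDemo {
  // Replaces, in place, every pair whose first element exceeds `threshold`.
  def scaleInPlace(arr: Array[(Double, Double)], threshold: Double, factor: Double): Unit =
    arr.indices.foreach { i =>
      val (x, y) = arr(i)
      if (x > threshold) arr(i) = (x, y * factor) // new tuple stored in the same slot
    }
}
```

After calling scaleInPlace, the same array object holds the updated pairs; no new array is allocated.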

Related

How to sum up pair elements individually in Scala

I have the following method to sum up the pair elements in an array of pairs. I am new to Scala and feel like there must be a better way than the following piece of code:
def accumulate(results: Array[(Int, Int)]): (Int, Int) = {
var x: Int = 0
var y: Int = 0
for (elem <- results) {
x = x + elem._1
y = y + elem._2
}
(x, y)
}
Yes, you can use foldLeft.
(By the way, I would also use List instead of Array.)
results.foldLeft((0, 0)) {
case ((accX, accY), (x, y)) =>
(accX + x, accY + y)
}
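A runnable version of the foldLeft answer, taking a Seq so it works with the List the answer recommends. The object name AccumulateDemo is only for illustration.

```scala
object AccumulateDemo {
  // Sums the first and second components of every pair in a single pass.
  def accumulate(results: Seq[(Int, Int)]): (Int, Int) =
    results.foldLeft((0, 0)) { case ((accX, accY), (x, y)) =>
      (accX + x, accY + y)
    }
}
```

For an empty input the fold simply returns the initial value (0, 0).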
All of the operations in scala.collection.ArrayOps are available on Array[T]. In particular, you can unzip an array of pairs into a pair of arrays:
val (xs, ys) = results.unzip
Summing a container is a standard use of fold:
val x = xs.fold(0)(_ + _)
val y = ys.fold(0)(_ + _)
And then you can return the pair of values:
(x, y)
https://scalafiddle.io/sf/meEKv6T/0 has a complete working example.

Efficient computation of haversine distance between elements of collections

I have two collections. Each collection is made up of inner collections containing a latitude, a longitude, and an epoch.
val arr1 = Seq(
  Seq(34.464, -115.341, 1486220267.0),
  Seq(34.473, -115.452, 1486227821.0),
  Seq(35.572, -116.945, 1486217300.0),
  Seq(37.843, -115.874, 1486348520.0),
  Seq(35.874, -115.014, 1486349803.0),
  Seq(34.345, -116.924, 1486342752.0))
val arr2 = Seq(
  Seq(35.573, -116.945, 1486217300.0),
  Seq(34.853, -114.983, 1486347321.0))
I want to determine how many times points in the two collections are within 0.5 miles of each other and have the same epoch. I have two functions:
def haversineDistance_single(pointA: (Double, Double), pointB: (Double, Double)): Double = {
val deltaLat = math.toRadians(pointB._1 - pointA._1)
val deltaLong = math.toRadians(pointB._2 - pointA._2)
val a = math.pow(math.sin(deltaLat / 2), 2) + math.cos(math.toRadians(pointA._1)) * math.cos(math.toRadians(pointB._1)) * math.pow(math.sin(deltaLong / 2), 2)
val greatCircleDistance = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
3958.761 * greatCircleDistance
}
def location_time(col_2:Seq[Seq[Double]], col_1:Seq[Seq[Double]]): Int={
val arr=col_1.map(x=> col_2.filter(y=> (haversineDistance_single((y(0), y(1)), (x(0),x(1)))<=.5) &
(math.abs(y(2)-x(2))<=0)).flatten).filter(x=> x.length>0)
arr.length
}
location_time(arr1, arr2) // = 1
My actual collections are very large. Is there a more efficient way to compute this than my location_time function?
I would consider revising location_time from:
def location_time(col_mobile: Seq[Seq[Double]], col_laptop: Seq[Seq[Double]]): Int = {
val arr = col_laptop.map( x => col_mobile.filter( y =>
(haversineDistance_single((y(0), y(1)), (x(0), x(1))) <= .5) & (math.abs(y(2) - x(2)) <= 0)
).flatten
).filter(x => x.length > 0)
arr.length
}
to:
def location_time(col_mobile: Seq[Seq[Double]], col_laptop: Seq[Seq[Double]]): Int = {
val arr = col_laptop.flatMap( x => col_mobile.filter( y =>
((math.abs(y(2) - x(2)) <= 0 && haversineDistance_single((y(0), y(1)), (x(0), x(1))) <= .5))
)
)
arr.length
}
Changes made:
Revised col_mobile.filter(y => ...) from:
filter(_ => costlyCond1 & lessCostlyCond2)
to:
filter(_ => lessCostlyCond2 && costlyCond1)
Assuming haversineDistance_single is more costly to run than math.abs, replacing & with && (&& short-circuits, & does not) and testing math.abs first might improve filtering performance, because the distance is then computed only for points whose epochs match.
Simplified map/filter/flatten/filter using flatMap, replacing:
col_laptop.map(x => col_mobile.filter(y => ...).flatten).filter(_.length > 0)
with:
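col_laptop.flatMap( x => col_mobile.filter( y => ... ))
Note that the two versions are not strictly equivalent: the original counts laptop points with at least one match, while the flatMap version counts matching (mobile, laptop) pairs. They agree only when each laptop point matches at most one mobile point.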
If you have access to, say, an Apache Spark cluster and your collections are really large, consider converting them to RDDs and computing with transformations similar to the ones above.
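One possible further speedup, sketched here as a different technique from the answer above: group one collection by epoch first (a hash join on the epoch), so the haversine distance is computed only for points that share an exact epoch instead of for every pair. This assumes epochs compare exactly as Double values, as the original math.abs(y(2) - x(2)) <= 0 test implies. EpochJoin, haversineMiles and countMatches are hypothetical names, and like the flatMap version the sketch counts matching pairs.

```scala
object EpochJoin {
  // Great-circle distance in miles, same formula as haversineDistance_single in the question.
  def haversineMiles(a: (Double, Double), b: (Double, Double)): Double = {
    val dLat = math.toRadians(b._1 - a._1)
    val dLon = math.toRadians(b._2 - a._2)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a._1)) * math.cos(math.toRadians(b._1)) *
        math.pow(math.sin(dLon / 2), 2)
    3958.761 * 2 * math.atan2(math.sqrt(h), math.sqrt(1 - h))
  }

  // Counts (mobile, laptop) pairs with the same epoch that lie within maxMiles of each other.
  // Each inner Seq is assumed to be (latitude, longitude, epoch).
  def countMatches(mobile: Seq[Seq[Double]], laptop: Seq[Seq[Double]], maxMiles: Double = 0.5): Int = {
    // Build the lookup once: epoch -> all mobile points with that epoch.
    val mobileByEpoch: Map[Double, Seq[Seq[Double]]] = mobile.groupBy(p => p(2))
    laptop.iterator.map { x =>
      // Only points sharing x's epoch are ever passed to the distance function.
      mobileByEpoch.getOrElse(x(2), Seq.empty).count { y =>
        haversineMiles((y(0), y(1)), (x(0), x(1))) <= maxMiles
      }
    }.sum
  }
}
```

Building the map costs one pass over the mobile points; after that, each laptop point is compared only with its own epoch group rather than with the whole collection.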

How to optionally return value from map function

The map function on collections requires returning a value for each element. But I'm trying to find a way to return a value only for the input elements that match some predicate.
What I want looks something like this:
(1 to 10).map { x =>
val res: Option[Int] = service.getById(x)
if (res.isDefined) (x, res.get) // no else part
}
I think something like the collect function could do it, but it seems that with collect I would need to write a lot of code in guard blocks (case x if {...// too much code here})
If you are returning an Option you can flatMap it and get only the values that are present (that is, are not None).
(1 to 10).flatMap { x =>
val res: Option[Int] = service.getById(x)
res.map { y => (x, y) }
}
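A runnable version of the flatMap approach. The question's service is not shown, so getById below is a hypothetical stand-in that returns Some only for even ids; a plain tuple (x, y) is used instead of Pair, which has been deprecated since Scala 2.11.

```scala
object OptionalMapDemo {
  // Hypothetical stand-in for service.getById: only even ids are "found".
  def getById(id: Int): Option[Int] =
    if (id % 2 == 0) Some(id * 100) else None

  // flatMap flattens each Option: Some contributes one pair, None contributes nothing.
  val found: IndexedSeq[(Int, Int)] = (1 to 10).flatMap { x =>
    getById(x).map(y => (x, y))
  }
}
```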
As you suggest, another way to combine map and filter is to use collect with a partial function. Here is a simplified example:
(1 to 10).collect{ case x if x > 5 => x*2 }
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(12, 14, 16, 18, 20)
You can use the collect function to do exactly what you want. Your example would then look like this:
(1 to 10) map (x => (x, service.getById(x))) collect {
case (x, Some(res)) => (x, res)
}
Using a for comprehension, like this:
for (x <- 1 to 10; res <- service.getById(x)) yield (x, res)
Here res is already the Int inside the Option, so no .get is needed. This yields pairs only for the ids where getById does not return None.
Getting the first element:
(1 to 10).flatMap { x =>
val res: Option[Int] = service.getById(x)
res.map { y => (x, y) }
}.head
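Calling .head on the result of (1 to 10).flatMap still calls getById for all ten ids, because flatMap on a Range is strict. If only the first match is wanted, an iterator stops at the first hit. The sketch below uses a hypothetical getById that counts its calls, so the number of lookups is observable; FirstMatchDemo and firstFound are illustrative names.

```scala
object FirstMatchDemo {
  var calls = 0 // counts lookups, for illustration only

  // Hypothetical stand-in for service.getById: ids 4 and above are "found".
  def getById(id: Int): Option[Int] = {
    calls += 1
    if (id >= 4) Some(id * 10) else None
  }

  // The iterator is lazy, so getById is only called until the first Some appears.
  def firstFound(): Option[(Int, Int)] = {
    val it = (1 to 10).iterator.flatMap(x => getById(x).map(y => (x, y)))
    if (it.hasNext) Some(it.next()) else None
  }
}
```

Returning an Option also avoids the exception .head would throw when nothing is found.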

Scala: Best way to filter & map in one iteration

I'm new to Scala and trying to figure out the best way to filter & map a collection. Here's a toy example to explain my problem.
Approach 1: This is pretty bad, since I'm iterating through the list twice and calculating the same value twice for each element.
val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums filter { x: Int => (x * x) > N } map { x: Int => (x * x).toString }
Approach 2: This is slightly better but I still need to calculate (x * x) twice.
val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums collect { case x: Int if (x * x) > N => (x * x).toString }
So, is it possible to calculate this without iterating through the collection twice and avoid repeating the same calculations?
You could use foldRight:
nums.foldRight(List.empty[Int]) {
case (i, is) =>
val s = i * i
if (s > N) s :: is else is
}
A foldLeft would also achieve a similar goal, but the resulting list would be in reverse order (foldLeft processes the elements from left to right, so prepending each one with :: reverses them).
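A runnable comparison of both folds, using N = 5 and nums = 0 until 10 from the question. The foldLeft version needs a final reverse to restore the original order; in both, the square is computed once per element.

```scala
object FoldSquaresDemo {
  val N = 5
  val nums = 0 until 10

  // foldRight visits elements from the right, so prepending keeps the original order.
  val viaFoldRight: List[Int] = nums.foldRight(List.empty[Int]) { (i, acc) =>
    val s = i * i // square computed once per element
    if (s > N) s :: acc else acc
  }

  // foldLeft visits elements from the left, so prepending builds the list reversed.
  val viaFoldLeft: List[Int] = nums.foldLeft(List.empty[Int]) { (acc, i) =>
    val s = i * i
    if (s > N) s :: acc else acc
  }.reverse
}
```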
Alternatively if you'd like to play with Scalaz
import scalaz.std.list._
import scalaz.syntax.foldable._
nums.foldMap { i =>
val s = i * i
if (s > N) List(s) else List()
}
The typical approach is to use an iterator (if possible) or view (if iterator won't work). This doesn't exactly avoid two traversals, but it does avoid creation of a full-sized intermediate collection. You then map first and filter afterwards and then map again if needed:
xs.iterator.map(x => x*x).filter(_ > N).map(_.toString)
The advantage of this approach is that it's really easy to read and, since there are no intermediate collections, it's reasonably efficient.
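A runnable form of the iterator pipeline, assuming xs = 0 until 10 and N = 5 as in the question. Elements flow through the stages one at a time, and a collection is only materialized by the final toList.

```scala
object IteratorPipelineDemo {
  val N = 5
  val xs = 0 until 10

  // map, filter and map are applied lazily, element by element;
  // no full-size intermediate collection is built before toList.
  val result: List[String] =
    xs.iterator.map(x => x * x).filter(_ > N).map(_.toString).toList
}
```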
If you are asking because this is a performance bottleneck, the answer is usually to write a tail-recursive function or use an old-style while loop. For instance, in your case:
import scala.annotation.tailrec
def sumSqBigN(xs: Array[Int], N: Int): Array[String] = {
  val ysb = Array.newBuilder[String]
  @tailrec // the compiler verifies that inner is tail-recursive
  def inner(start: Int): Array[String] = {
    if (start >= xs.length) ysb.result()
    else {
      val sq = xs(start) * xs(start) // square computed once
      if (sq > N) ysb += sq.toString
      inner(start + 1)
    }
  }
  inner(0)
}
You can also pass a parameter forward in inner instead of using an external builder (especially useful for sums).
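Following that suggestion, here is a sketch that passes the running total forward as a parameter instead of using an external builder, annotated with @tailrec so compilation fails if the recursive call is not in tail position. The helper sumSquaresAbove is a hypothetical example: it sums the squares greater than n rather than collecting them.

```scala
import scala.annotation.tailrec

object TailRecSumDemo {
  // Sums x * x over the elements of xs whose square exceeds n, in one pass.
  def sumSquaresAbove(xs: Array[Int], n: Int): Long = {
    @tailrec
    def loop(i: Int, acc: Long): Long =
      if (i >= xs.length) acc
      else {
        val sq = xs(i).toLong * xs(i) // Long avoids overflow for large elements
        loop(i + 1, if (sq > n) acc + sq else acc)
      }
    loop(0, 0L)
  }
}
```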
I have yet to confirm that this is truly a single pass, but:
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
if (square > N) Some(square.toString) else None
}
You can use collect, which applies a partial function to every element of the collection at which it is defined. Your example could be rewritten as follows (note that this still computes x * x twice per matching element, as in Approach 2):
val sqNumsLargerThanN = nums collect {
case (x: Int) if (x * x) > N => (x * x).toString
}
A very simple approach that performs the multiplication only once. It's also lazy, so code is executed only when the results are needed.
nums.view.map(x=>x*x).withFilter(x => x> N).map(_.toString)
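See the Scala collections documentation for the differences between filter and withFilter; in short, withFilter does not build an intermediate collection but applies the predicate during the subsequent map, flatMap, or foreach.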
Consider this for comprehension:
for (x <- 0 until 10; v = x*x if v > N) yield v.toString
It desugars to a map over the range that pairs each x with its square, followed by a lazy withFilter and a final map. The square is calculated only once per element, although the first map does build an intermediate collection of (x, v) pairs (in addition to the range itself).
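To check how this for comprehension is rewritten, the sketch below compares it with a hand-written desugared form. The desugared version is an approximation of what the compiler generates for a value definition followed by a guard: a map that pairs each x with its square, a lazy withFilter, and a final map.

```scala
object ForDesugarDemo {
  val N = 5

  // The for comprehension from the answer above.
  val sugared = for (x <- 0 until 10; v = x * x if v > N) yield v.toString

  // Roughly what it is rewritten to; the square is computed once, inside the first map.
  val desugared = (0 until 10)
    .map { x => val v = x * x; (x, v) }
    .withFilter { case (_, v) => v > N }
    .map { case (_, v) => v.toString }
}
```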
You can use flatMap.
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
if (square > N) Some(square.toString) else None
}
Or with Scalaz,
import scalaz.Scalaz._
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
(square > N).option(square.toString)
}
This solves the question as asked: doing it in one iteration. It can be useful when streaming data, as with an Iterator.
However, if you instead want the absolute fastest implementation, this is not it. I suspect you would use a mutable ArrayList and a while loop, but only profiling would tell you for sure. In any case, that's a topic for another question.
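A sketch of the while-loop style mentioned above, using scala.collection.mutable.ArrayBuffer as the Scala counterpart of a mutable ArrayList. It has not been benchmarked; as the answer says, only profiling would show whether it actually beats the alternatives. WhileLoopDemo and squaresAbove are illustrative names.

```scala
import scala.collection.mutable.ArrayBuffer

object WhileLoopDemo {
  // Collects the squares greater than n as strings, computing each square once.
  def squaresAbove(nums: Array[Int], n: Int): ArrayBuffer[String] = {
    val out = ArrayBuffer.empty[String]
    var i = 0
    while (i < nums.length) {
      val sq = nums(i) * nums(i)
      if (sq > n) out += sq.toString
      i += 1
    }
    out
  }
}
```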
Using a for comprehension would work:
val sqNumsLargerThanN = for {x <- nums if x*x > N } yield (x*x).toString
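Also, the for comprehension desugars to nums.withFilter(...).map(...). Because withFilter is lazy, this is a single traversal without an intermediate collection, although x*x is still computed twice for each matching element.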
I am also a beginner and did it as follows:
for (y <- nums.map(x => x * x) if y > 5) { println(y) }

Find min and max elements of array

I want to find the min and max elements of an array using a for comprehension. Is it possible to find both the minimum and the maximum in a single pass over the array?
I am looking for a solution that doesn't use Scala's built-in array.min or array.max.
You can get the min and max values of an Array[Int] with the reduceLeft function (note that this traverses the array twice, once per call):
scala> val a = Array(20, 12, 6, 15, 2, 9)
a: Array[Int] = Array(20, 12, 6, 15, 2, 9)
scala> a.reduceLeft(_ min _)
res: Int = 2
scala> a.reduceLeft(_ max _)
res: Int = 20
See this link for more information and examples of reduceLeft method: http://alvinalexander.com/scala/scala-reduceleft-examples
Here is a concise and readable solution that avoids the ugly if statements:
def minMax(a: Array[Int]): (Int, Int) = {
  if (a.isEmpty) throw new java.lang.UnsupportedOperationException("array is empty")
  a.foldLeft((a(0), a(0))) { case ((min, max), e) =>
    (math.min(min, e), math.max(max, e))
  }
}
Explanation: foldLeft is a standard method on many Scala collections. It threads an accumulator through a function that is called once for each element of the array.
See the Scaladoc for further details.
def findMinAndMax(array: Array[Int]): (Int, Int) = { // assumes a non-empty array
  val initial = (array.head, array.head) // a tuple representing (min, max)
  // foldLeft takes an initial value of the result type, in this case a tuple.
  // foldLeft also takes a function of two parameters:
  // the 'left' parameter is the accumulator (foldLeft -> accumulator on the left),
  // and the other parameter is a value from the collection.
  // The function returns a value that replaces the accumulator in the next iteration,
  // when the next value from the collection is picked,
  // and so on until all values are processed; at the end the accumulator is returned.
  array.foldLeft(initial) { case (acc @ (min, max), x) =>
    if (x < min) (x, max)
    else if (x > max) (min, x)
    else acc
  }
}
Following on from the other answers - a more general solution is possible, that works for other collections as well as Array, and other contents as well as Int:
def minmax[B >: A, A](xs: Iterable[A])(implicit cmp: Ordering[B]): (A, A) = {
if (xs.isEmpty) throw new UnsupportedOperationException("empty.minmax")
val initial = (xs.head, xs.head)
xs.foldLeft(initial) { case ((min, max), x) =>
(if (cmp.lt(x, min)) x else min, if (cmp.gt(x, max)) x else max) }
}
For example:
minmax(List(4, 3, 1, 2, 5)) //> res0: (Int, Int) = (1,5)
minmax(Vector('Z', 'K', 'B', 'A')) //> res1: (Char, Char) = (A,Z)
minmax(Array(3.0, 2.0, 1.0)) //> res2: (Double, Double) = (1.0,3.0)
(It's also possible to write this a bit more concisely using cmp.min() and cmp.max(), but only if you remove the B >: A type bound, which makes the function less general).
Consider this (for non-empty orderable arrays). It is short, but sorting costs O(n log n) and is not a single pass:
val ys = xs.sorted
val (min,max) = (ys.head, ys.last)
A plain for loop with two mutable variables also finds both values in a single pass:
val xs: Array[Int] = ???
var min: Int = Int.MaxValue
var max: Int = Int.MinValue
for (x <- xs) {
if (x < min) min = x
if (x > max) max = x
}
I'm super late to the party on this one, but I'm new to Scala and thought I'd contribute anyway. Here is a solution using tail recursion:
import scala.annotation.tailrec
@tailrec
def max(list: List[Int], currentMax: Int = Int.MinValue): Int = {
if(list.isEmpty) currentMax
else if ( list.head > currentMax) max(list.tail, list.head)
else max(list.tail,currentMax)
}
Of all the answers I reviewed to this question, DNA's solution was the closest to idiomatic Scala. However, it can be slightly improved by:
Performing as few comparisons as needed (important for very large collections)
Providing ideal ordering consistency by only using the Ordering.lt method
Avoiding throwing an exception
Making the code more readable for those new to and learning Scala
The comments should help clarify the changes.
def minAndMax[B>: A, A](iterable: Iterable[A])(implicit ordering: Ordering[B]): Option[(A, A)] =
if (iterable.nonEmpty)
Some(
iterable.foldLeft((iterable.head, iterable.head)) {
case (minAndMaxTuple, element) =>
val (min, max) =
minAndMaxTuple //decode reference to tuple
if (ordering.lt(element, min))
(element, max) //if replacing min, it isn't possible max will change so no need for the max comparison
else
if (ordering.lt(max, element))
(min, element)
else
minAndMaxTuple //use original reference to avoid instantiating a new tuple
}
)
else
None
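Going further in the direction of fewer comparisons, the classic pairwise technique (a different algorithm from the fold above) takes elements two at a time, compares them with each other first, and then compares only the smaller one with the current minimum and only the larger one with the current maximum. That needs about 3n/2 comparisons instead of up to 2n. A sketch for Array[Int] that, like minAndMax, returns None for an empty input; PairwiseMinMax is an illustrative name.

```scala
object PairwiseMinMax {
  // Returns Some((min, max)), or None for an empty array, with about 3n/2 comparisons.
  def minMax(a: Array[Int]): Option[(Int, Int)] =
    if (a.isEmpty) None
    else {
      var lo = a(0)
      var hi = a(0)
      var i = 1
      // For an even length, seed the bounds from the first pair so the rest splits into pairs.
      if (a.length % 2 == 0) {
        if (a(1) < a(0)) lo = a(1) else hi = a(1)
        i = 2
      }
      // Three comparisons per two elements: the pair itself, then small vs lo, large vs hi.
      while (i < a.length) {
        val (small, large) = if (a(i) < a(i + 1)) (a(i), a(i + 1)) else (a(i + 1), a(i))
        if (small < lo) lo = small
        if (large > hi) hi = large
        i += 2
      }
      Some((lo, hi))
    }
}
```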
And here's the solution expanded to return the lower and upper bounds of a 2d space in a single pass, again using the above optimizations:
def minAndMax2d[B >: A, A](iterable: Iterable[(A, A)])(implicit ordering: Ordering[B]): Option[((A, A), (A, A))] =
if (iterable.nonEmpty)
Some(
iterable.foldLeft(((iterable.head._1, iterable.head._1), (iterable.head._2, iterable.head._2))) {
case ((minAndMaxTupleX, minAndMaxTupleY), (elementX, elementY)) =>
val ((minX, maxX), (minY, maxY)) =
(minAndMaxTupleX, minAndMaxTupleY) //decode reference to tuple
(
if (ordering.lt(elementX, minX))
(elementX, maxX) //if replacing minX, it isn't possible maxX will change so no need for the maxX comparison
else
if (ordering.lt(maxX, elementX))
(minX, elementX)
else
minAndMaxTupleX //use original reference to avoid instantiating a new tuple
, if (ordering.lt(elementY, minY))
(elementY, maxY) //if replacing minY, it isn't possible maxY will change so no need for the maxY comparison
else
if (ordering.lt(maxY, elementY))
(minY, elementY)
else
minAndMaxTupleY //use original reference to avoid instantiating a new tuple
)
}
)
else
None
You could also write the fold yourself with foldLeft; that guarantees a single pass with predictable performance.
val array = Array(3,4,62,8,9,2,1)
if(array.isEmpty) throw new IllegalArgumentException // Just so we can safely call array.head
val (minimum, maximum) = array.foldLeft((array.head, array.head)) { // We start off with the first element as both min and max
case ((min, max), next) =>
if(next > max) (min, next)
else if(next < min) (next, max)
else (min, max)
}
println((minimum, maximum)) // (1,62)
You could use the min and max methods of Vector (although the question asks for a solution without them):
scala> val v = Vector(1,2)
scala> v.max
res0: Int = 2
scala> v.min
res1: Int = 1