Find min and max elements of array - scala

I want to find the min and max elements of an array using for comprehension. Is it possible to do that with one iteration of array to find both min element and max element?
I am looking for a solution without using scala provided array.min or max.

You can get min and max values of an Array[Int] with reduceLeft function.
scala> val a = Array(20, 12, 6, 15, 2, 9)
a: Array[Int] = Array(20, 12, 6, 15, 2, 9)
scala> a.reduceLeft(_ min _)
res: Int = 2
scala> a.reduceLeft(_ max _)
res: Int = 20
See this link for more information and examples of reduceLeft method: http://alvinalexander.com/scala/scala-reduceleft-examples

Here is a concise and readable solution, that avoids the ugly if statements :
def minMax(a: Array[Int]) : (Int, Int) = {
if (a.isEmpty) throw new java.lang.UnsupportedOperationException("array is empty")
a.foldLeft((a(0), a(0)))
{ case ((min, max), e) => (math.min(min, e), math.max(max, e))}
}
Explanation : foldLeft is a standard method in Scala on many collections. It allows to pass an accumulator to a callback function that will be called for each element of the array.
Take a look at scaladoc for further details

def findMinAndMax(array: Array[Int]) = { // a non-empty array
val initial = (array.head, array.head) // a tuple representing min-max
// foldLeft takes an initial value of type of result, in this case a tuple
// foldLeft also takes a function of 2 parameters.
// the 'left' parameter is an accumulator (foldLeft -> accum is left)
// the other parameter is a value from the collection.
// the function2 should return a value which replaces accumulator (in next iteration)
// when the next value from collection will be picked.
// so on till all values are iterated, in the end accum is returned.
array.foldLeft(initial) { ((min, max), x) =>
if (x < min) (x, max)
else if (x > max) (min, x)
else acc
}
}

Following on from the other answers - a more general solution is possible, that works for other collections as well as Array, and other contents as well as Int:
def minmax[B >: A, A](xs: Iterable[A])(implicit cmp: Ordering[B]): (A, A) = {
if (xs.isEmpty) throw new UnsupportedOperationException("empty.minmax")
val initial = (xs.head, xs.head)
xs.foldLeft(initial) { case ((min, max), x) =>
(if (cmp.lt(x, min)) x else min, if (cmp.gt(x, max)) x else max) }
}
For example:
minmax(List(4, 3, 1, 2, 5)) //> res0: (Int, Int) = (1,5)
minmax(Vector('Z', 'K', 'B', 'A')) //> res1: (Char, Char) = (A,Z)
minmax(Array(3.0, 2.0, 1.0)) //> res2: (Double, Double) = (1.0,3.0)
(It's also possible to write this a bit more concisely using cmp.min() and cmp.max(), but only if you remove the B >: A type bound, which makes the function less general).

Consider this (for non-empty orderable arrays),
val ys = xs.sorted
val (min,max) = (ys.head, ys.last)

val xs: Array[Int] = ???
var min: Int = Int.MaxValue
var max: Int = Int.MinValue
for (x <- xs) {
if (x < min) min = x
if (x > max) max = x
}

I'm super late to the party on this one, but I'm new to Scala and thought I'd contribute anyway. Here is a solution using tail recursion:
#tailrec
def max(list: List[Int], currentMax: Int = Int.MinValue): Int = {
if(list.isEmpty) currentMax
else if ( list.head > currentMax) max(list.tail, list.head)
else max(list.tail,currentMax)
}

Of all of the answers I reviewed to this questions, DNA's solution was the closest to "Scala idiomatic" I could find. However, it can be slightly improved by...:
Performing as few comparisons as needed (important for very large collections)
Provide ideal ordering consistency by only using the Ordering.lt method
Avoiding throwing an Exception
Making the code more readable for those new to and learning Scala
The comments should help clarify the changes.
def minAndMax[B>: A, A](iterable: Iterable[A])(implicit ordering: Ordering[B]): Option[(A, A)] =
if (iterable.nonEmpty)
Some(
iterable.foldLeft((iterable.head, iterable.head)) {
case (minAndMaxTuple, element) =>
val (min, max) =
minAndMaxTuple //decode reference to tuple
if (ordering.lt(element, min))
(element, max) //if replacing min, it isn't possible max will change so no need for the max comparison
else
if (ordering.lt(max, element))
(min, element)
else
minAndMaxTuple //use original reference to avoid instantiating a new tuple
}
)
else
None
And here's the solution expanded to return the lower and upper bounds of a 2d space in a single pass, again using the above optimizations:
def minAndMax2d[B >: A, A](iterable: Iterable[(A, A)])(implicit ordering: Ordering[B]): Option[((A, A), (A, A))] =
if (iterable.nonEmpty)
Some(
iterable.foldLeft(((iterable.head._1, iterable.head._1), (iterable.head._2, iterable.head._2))) {
case ((minAndMaxTupleX, minAndMaxTupleY), (elementX, elementY)) =>
val ((minX, maxX), (minY, maxY)) =
(minAndMaxTupleX, minAndMaxTupleY) //decode reference to tuple
(
if (ordering.lt(elementX, minX))
(elementX, maxX) //if replacing minX, it isn't possible maxX will change so no need for the maxX comparison
else
if (ordering.lt(maxX, elementX))
(minX, elementX)
else
minAndMaxTupleX //use original reference to avoid instantiating a new tuple
, if (ordering.lt(elementY, minY))
(elementY, maxY) //if replacing minY, it isn't possible maxY will change so no need for the maxY comparison
else
if (ordering.lt(maxY, elementY))
(minY, elementY)
else
minAndMaxTupleY //use original reference to avoid instantiating a new tuple
)
}
)
else
None

You could always write your own foldLeft function - that will guarantee one iteration and known performance.
val array = Array(3,4,62,8,9,2,1)
if(array.isEmpty) throw new IllegalArgumentException // Just so we can safely call array.head
val (minimum, maximum) = array.foldLeft((array.head, array.head)) { // We start of with the first element as min and max
case ((min, max), next) =>
if(next > max) (min, next)
else if(next < min) (next, max)
else (min, max)
}
println(minimum, maximum) //1, 62

scala> val v = Vector(1,2)
scala> v.max
res0: Int = 2
scala> v.min
res1: Int = 2
You could use the min and max methods of Vector

Related

How to create an Iterable of semi-homogenous Integers that sum to a max in Scala?

In scala; given a maximum sum value and a maximum element value, how can an Iterable of elements be created such that the elements add up to the maximum sum? The Iterable should have the smallest size possible.
As an example, given
val maxSum = 47
val maxElementValue = 10
How can the following Iterable be created:
Iterable(10, 10, 10, 10, 7) //sum to 47
Other examples:
val maxSum = 9
val maxElementValue = 10
Iterable(9)
val maxSum = 11
val maxElementValue = 5
Iterable(5, 5, 1)
Thank you in advance for your consideration and response.
If you are in 2.13 you can use unfold
def elementGenerator(maxSum: Int, maxElementValue: Int): List[Int] =
Iterator.unfold(maxSum) { remainingSum =>
if (remainingSum == 0)
None
else if (remainingSum <= maxElementValue)
Some(remainingSum -> 0)
else
Some(maxElementValue -> (remainingSum - maxElementValue))
}.toList
You may also consider just returning the Iterator or using a LazyList if you do not need to keep all the elements right now but just know how to generate them.
How about
((maxSum % maxElementValue) :: List.fill(maxSum / maxElementValue)(maxElementValue)).filterNot(_ == 0)
You can use .unfold for this:
// Use `Iterator.unfold` if you want an Iterator instead of a List
List.unfold(maxSum) { remaining =>
if (remaining <= 0) None
else {
val nextElement = math.min(maxElementValue, remaining)
Some(nextElement -> (remaining - nextElement))
}
}
returns
List(10, 10, 10, 10, 7)
See it on scastie.
A recursive function can do the trick
#scala.annotation.tailrec
def elementGenerator(maxSum: Int, maxElementValue: Int, currentElements: Seq[Int] = Seq.empty[Int]) : Seq[Int] =
if(maxSum == 0 || maxValue <= 0)
currentElements
else if(maxSum < maxElementValue)
currentElements :+ maxSum
else
elementGenerator(maxSum - maxElementValue, maxElementValue, currentElements :+ maxElementValue)
Presented mostly as a "well, it works but you really don't want to do it this way" alternative :)
Iterator.fill(maxSum)(1).grouped(maxElementValue).map(_.sum)

Getting the mode from an RDD

I would like to get the mode (the most common number) from an rdd using Spark + Scala.
I can get it doing the following but I think it could be a better way to calculate this. The most important thing is if more than one value has the same number of repetition, I need to return both of them.
Let's see my example code:
val l = List(3,4,4,3,3,7,7,7,9)
val rdd = spark.sparkContext.parallelize(l)
val grouped = rdd.map (e => (e, 1)).groupBy(_._1).map(e=> (e._1, e._2.size))
val maxRep = grouped.collect().maxBy(_._2)._2
val mode = grouped.filter(e => e._2 == maxRep).map(e => e._1).collect
And the result is right:
Array[Int] = Array(3, 7)
but is there a better way to do this? I mean considering the performance because the original RDD would be much bigger than this.
This should work and be a little bit more efficient.
(only if you are sure the total number of elements is small)
val counted = rdd.countByValue()
val max = counted.valuesIterator.max
val maxElements = count.collect { case (k, v) if (v == max) => k }
If there could be many elements, consider this alternative which is memory safe.
val counted = rdd.map(x => (x, 1L)).reduceByKey(_ + _).cache()
val max = counted.values.max
val maxElements = counted.map { case (k, v) => (v, k) }.lookup(max)
How about get the max key-value pair from a double groupBy? This works even better for bigger data size.
rdd.groupBy(identity).mapValues(_.size).groupBy(_._2).max
// res1: (Int, Iterable[(Int, Int)]) = (3,CompactBuffer((3,3), (7,3)))
To get the element
rdd.groupBy(identity).mapValues(_.size).groupBy(_._2).max._2.map(_._1)
// res4: Iterable[Int] = List(3, 7)
The first groupBy will get element into (element -> count) with type Map[Int, Long], the second groupBy will group (element -> count) by count, like (count -> Iterable((element, count)), then simply max to get the key-value pair with the maximum key value, which is the count.

Refactoring a small Scala function

I have this function to compute the distance between two n-dimensional points using Pythagoras' theorem.
def computeDistance(neighbour: Point) = math.sqrt(coordinates.zip(neighbour.coordinates).map {
case (c1: Int, c2: Int) => math.pow(c1 - c2, 2)
}.sum)
The Point class (simplified) looks like:
class Point(val coordinates: List[Int])
I'm struggling to refactor the method so it's a little easier to read, can anybody help please?
Here's another way that makes the following three assumptions:
The length of the list is the number of dimensions for the point
Each List is correctly ordered, i.e. List(x, y) or List(x, y, z). We do not know how to handle List(x, z, y)
All lists are of equal length
def computeDistance(other: Point): Double = sqrt(
coordinates.zip(other.coordinates)
.flatMap(i => List(pow(i._2 - i._1, 2)))
.fold(0.0)(_ + _)
)
The obvious disadvantage here is that we don't have any safety around list length. The quick fix for this is to simply have the function return an Option[Double] like so:
def computeDistance(other: Point): Option[Double] = {
if(other.coordinates.length != coordinates.length) {
return None
}
return Some(sqrt(coordinates.zip(other.coordinates)
.flatMap(i => List(pow(i._2 - i._1, 2)))
.fold(0.0)(_ + _)
))
I'd be curious if there is a type safe way to ensure equal list length.
EDIT
It was politely pointed out to me that flatMap(x => List(foo(x))) is equivalent to map(foo) , which I forgot to refactor when I was originally playing w/ this. Slightly cleaner version w/ Map instead of flatMap :
def computeDistance(other: Point): Double = sqrt(
coordinates.zip(other.coordinates)
.map(i => pow(i._2 - i._1, 2))
.fold(0.0)(_ + _)
)
Most of your problem is that you're trying to do math with really long variable names. It's almost always painful. There's a reason why mathematicians use single letters. And assign temporary variables.
Try this:
class Point(val coordinates: List[Int]) { def c = coordinates }
import math._
def d(p: Point) = {
val delta = for ((a,b) <- (c zip p.c)) yield pow(a-b, dims)
sqrt(delta.sum)
}
Consider type aliases and case classes, like this,
type Coord = List[Int]
case class Point(val c: Coord) {
def distTo(p: Point) = {
val z = (c zip p.c).par
val pw = z.aggregate(0.0) ( (a,v) => a + math.pow( v._1-v._2, 2 ), _ + _ )
math.sqrt(pw)
}
}
so that for any two points, for instance,
val p = Point( (1 to 5).toList )
val q = Point( (2 to 6).toList )
we have that
p distTo q
res: Double = 2.23606797749979
Note method distTo uses aggregate on a parallelised collection of tuples, and combines the partial results by the last argument (summation). For high dimensional points this may prove more efficient than the sequential counterpart.
For simplicity of use, consider also implicit classes, as suggested in a comment above,
implicit class RichPoint(val c: Coord) extends AnyVal {
def distTo(d: Coord) = Point(c) distTo Point(d)
}
Hence
List(1,2,3,4,5) distTo List(2,3,4,5,6)
res: Double = 2.23606797749979

How to capture inner matched value in indexWhere vector expression?

Using a Vector[Vector[Int]] reference v, and the expression to find a given number num:
val posX = v.indexWhere(_.indexOf(num) > -1)
Is there any way to capture the value of _.indexOf(num) to use after the expression (i.e. the posY value)? The following signals an error 'Illegal start of simple expression':
val posX = v.indexWhere((val posY = _.indexOf(num)) > -1)
If we do not mind using a variable then we can capture indexOf() value of inner Vector (_ in the below code) in a var and use it later to build the y position:
val posX = v.indexWhere(_.indexOf(num) > -1)
val posY = v(posX).indexOf(num)
There are lots of nice functional ways to do this. The following is probably one of the more concise:
val v = Vector(Vector(1, 2, 3), Vector(4, 5, 6), Vector(7, 8, 9))
val num = 4
val Some((posY, posX)) = v.map(_ indexOf num).zipWithIndex.find(_._1 > -1)
// posY: Int = 0
// posX: Int = 1
Note that there's a lot of extra work going on here, though—we're creating a couple of intermediate collections, parts of which we don't need, etc. If you're calling this thing a lot or on very large collections, you unfortunately may need to take a more imperative approach. In that case I'd suggest bundling up all the unpleasantness:
def locationOf(v: Vector[Vector[Int]])(num: Int): Option[(Int, Int)] = {
var i, j = 0
var found = false
while (i < v.size && !found) {
j = 0
while (j < v(i).size && !found)
if (v(i)(j) == num) found = true else j += 1
if (!found) i += 1
}
if (!found) None else Some(i, j)
}
Not as elegant, but this method is probably going to be a lot faster and more memory efficient. It's small enough that it isn't likely to contain any of the bugs that this kind of programming is so prone to, and it's referentially transparent—all the mutation is local.
From my armchair,
scala> val v = Vector(Vector(1, 2, 3), Vector(4, 5, 6), Vector(7, 8, 9))
scala> v.zipWithIndex collectFirst {
| case (e, i) if (e indexOf num) >= 0 =>
| (i, e indexOf num)
| }
res7: Option[(Int, Int)] = Some((1,0))
I haven't done the armchair math, but that's one intermediate collection compared to Travis's. But see Travis's comment that the result inner index is computed twice here, and the whole point was not to do that.
Here is a solution that will only evaluate up until it finds the required element. I personally find it more readable and you can reuse it across programs. You can obviously make this more general if need be.
val v = Vector(Vector(1, 2, 3), Vector(4, 5, 6))
def findElem(i: Int, vs: Vector[Vector[Int]]): (Int, Int) =
(for {
row <- vs.indices.toStream
col <- vs(row).indices.toStream
if vs(row)(col) == i
} yield (row, col)).head
findElem(5, v) // (1, 1)
You could remove the .toStream methods if you want all positions. Using the .toStream just means that you will only evaluate up until the first occurrence.

Finding character in 2 dimensional scala list

So this might not be the best way to tackle it but my initial thought was a for expression.
Say I have a List like
List(List('a','b','c'),List('d','e','f'),List('h','i','j'))
I would like to find the row and column for a character, say 'e'.
def findChar(letter: Char, list: List[List[Char]]): (Int, Int) =
for {
r <- (0 until list.length)
c <- (0 until list(r).length)
if list(r)(c) == letter
} yield (r, c)
If there is a more elegant way I'm all ears but I would also like to understand what's wrong with this. Specifically the error the compiler gives me here is
type mismatch; found : scala.collection.immutable.IndexedSeq[(Int, Int)] required: (Int, Int)
on the line assigning to r. It seems to be complaining that my iterator doesn't match the return type but I don't quite understand why this is or what to do about it ...
In the signature of findChar you are telling the compiler that it returns (Int, Int). However, the result of your for expression (as inferred by Scala) is IndexedSeq[(Int, Int)] as the error message indicates. The reason is that (r, c) after yield is produced for every "iteration" in the for expression (i.e., you are generating a sequence of results, not just a single result).
EDIT: As for findChar, you could do:
def findChar(letter: Char, list: List[List[Char]]) = {
val r = list.indexWhere(_ contains letter)
val c = list(r).indexOf(letter)
(r, c)
}
It is not the most efficient solution, but relatively short.
EDIT: Or reuse your original idea:
def findAll(letter: Char, list: List[List[Char]]) =
for {
r <- 0 until list.length
c <- 0 until list(r).length
if list(r)(c) == letter
} yield (r, c)
def findChar(c: Char, xs: List[List[Char]]) = findAll(c, xs).head
In both cases, be aware that an exception occurs if the searched letter is not contained in the input list.
EDIT: Or you write a recursive function yourself, like:
def findPos[A](c: A, list: List[List[A]]) = {
def aux(i: Int, xss: List[List[A]]) : Option[(Int, Int)] = xss match {
case Nil => None
case xs :: xss =>
val j = xs indexOf c
if (j < 0) aux(i + 1, xss)
else Some((i, j))
}
aux(0, list)
}
where aux is a (locally defined) auxiliary function that does the actual recursion (and remembers in which sublist we are, the index i). In this implementation a result of None indicates that the searched element was not there, whereas a successful result might return something like Some((1, 1)).
For your other ear, the question duplicates
How to capture inner matched value in indexWhere vector expression?
scala> List(List('a','b','c'),List('d','e','f'),List('h','i','j'))
res0: List[List[Char]] = List(List(a, b, c), List(d, e, f), List(h, i, j))
scala> .map(_ indexOf 'e').zipWithIndex.find(_._1 > -1)
res1: Option[(Int, Int)] = Some((1,1))