Scala: Rewrite code duplication with closures

I have this code:
val arr: Array[Int] = ...
val largestIndex = {
  var i = arr.length-2
  while (arr(i) > arr(i+1)) i -= 1
  i
}
val smallestIndex = {
  var k = arr.length-1
  while (arr(largestIndex) > arr(k)) k -= 1
  k
}
But there is too much code duplication. I tried to rewrite this with closures, but I failed. I tried something like this:
def index(sub: Int, f: => Boolean): Int = {
  var i = arr.length-sub
  while (f) i -= 1
  i
}
val largest = index(2, i => arr(i) > arr(i+1))
val smallest = index(1, i => arr(largest) > arr(i))
The problem is that I can't use the local variable i of the method index() inside the closure. Is there a way to avoid this problem?

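Make f a function of the index (Int => Boolean) instead of a by-name Boolean, and call it with the current value of i:
val arr = Array(1,2,4,3,3,4,5)
def index(sub: Int, f: Int => Boolean): Int = {
  var i = arr.length-sub
  while (f(i)) i -= 1
  i
}
val largest = index(2, i => arr(i) > arr(i+1))
val smallest = index(1, i => arr(largest) > arr(i))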
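Both loops in the question have the same shape: scan from the right for the last index where a condition holds. The standard library already provides that as lastIndexWhere, so a sketch like the following (object and method names are hypothetical) removes the duplication without a hand-written helper. Unlike the while loops, it returns -1 instead of throwing ArrayIndexOutOfBoundsException when no index qualifies.

```scala
object SearchFromRight {
  // Largest i with arr(i) <= arr(i + 1), i.e. where the original
  // `while (arr(i) > arr(i+1)) i -= 1` loop stops; -1 if there is none.
  def largestIndex(arr: Array[Int]): Int =
    (0 until arr.length - 1).lastIndexWhere(i => arr(i) <= arr(i + 1))

  // Largest k with arr(pivot) <= arr(k), i.e. where the second loop stops.
  def smallestIndex(arr: Array[Int], pivot: Int): Int =
    arr.lastIndexWhere(x => arr(pivot) <= x)
}
```

For Array(1,2,4,3,3,4,5) this gives 5 and 6, the same values as the closure-based index above.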

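If you only need the indices of the largest and smallest elements, zipWithIndex with an Ordering on the element value does it:
scala> val arr = Array(1,2,4,3,3,4,5)
arr: Array[Int] = Array(1, 2, 4, 3, 3, 4, 5)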
scala> arr.zipWithIndex.max(Ordering.by((x: (Int, Int)) => x._1))._2
res0: Int = 6
scala> arr.zipWithIndex.min(Ordering.by((x: (Int, Int)) => x._1))._2
res1: Int = 0
or
scala> val pairOrdering = Ordering.by((x: (Int, Int)) => x._1)
pairOrdering: scala.math.Ordering[(Int, Int)] = scala.math.Ordering$$anon$4@145ad3d
scala> arr.zipWithIndex.max(pairOrdering)._2
res2: Int = 6
scala> arr.zipWithIndex.min(pairOrdering)._2
res3: Int = 0
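A shorter variant of the same idea, sketched here with hypothetical helper names, iterates over the indices and lets maxBy/minBy compare the corresponding values. Like max and min, it throws on an empty array.

```scala
object MinMaxIndex {
  // Index of the largest element (the first one if several are equal).
  def indexOfMax(arr: Array[Int]): Int = arr.indices.maxBy(i => arr(i))

  // Index of the smallest element (the first one if several are equal).
  def indexOfMin(arr: Array[Int]): Int = arr.indices.minBy(i => arr(i))
}
```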

Related

Spark in Scala - Map with Function with Extra Arguments

Is there a way in Scala to define an explicit function for an RDD Transformation with additional/extra arguments?
For example, the Python code below uses a lambda expression to apply the transformation map (requiring a function with one argument) with the function my_power (actually having 2 arguments).
def my_power(a, b):
    res = a ** b
    return res
def my_main(sc, n):
    inputRDD = sc.parallelize([1, 2, 3, 4])
    powerRDD = inputRDD.map(lambda x: my_power(x, n))
    resVAL = powerRDD.collect()
    for item in resVAL:
        print(item)
However, when I attempt an equivalent implementation in Scala, I get a "Task not serializable" exception:
val myPower: (Int, Int) => Int = (a: Int, b: Int) => {
  val res: Int = math.pow(a, b).toInt
  res
}
def myMain(sc: SparkContext, n: Int): Unit = {
  val inputRDD: RDD[Int] = sc.parallelize(Array(1, 2, 3, 4))
  val squareRDD: RDD[Int] = inputRDD.map( (x: Int) => myPower(x, n) )
  val resVAL: Array[Int] = squareRDD.collect()
  for (item <- resVAL){
    println(item)
  }
}
Defined this way, inside an object that extends App, it worked for me:
package examples
import org.apache.log4j.Level
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
object RDDTest extends App {
  val logger = org.apache.log4j.Logger.getLogger("org")
  logger.setLevel(Level.WARN)
  val spark = SparkSession.builder()
    .appName(this.getClass.getName)
    .config("spark.master", "local[*]").getOrCreate()
  val myPower: (Int, Int) => Int = (a: Int, b: Int) => {
    val res: Int = math.pow(a, b).toInt
    res
  }
  val scontext = spark.sparkContext
  myMain(scontext, 10);
  def myMain(sc: SparkContext, n: Int): Unit = {
    val inputRDD: RDD[Int] = sc.parallelize(Array(1, 2, 3, 4))
    val squareRDD: RDD[Int] = inputRDD.map((x: Int) => myPower(x, n))
    val resVAL: Array[Int] = squareRDD.collect()
    for ( item <- resVAL ) {
      println(item)
    }
  }
}
Result:
1
1024
59049
1048576
Another option is to broadcast n with sc.broadcast and read the broadcast value inside the closure passed to map.
Simply adding a local variable as an alias for the function made it work. Inside the closure, myPower really means this.myPower, so Spark has to serialize the whole enclosing instance, which is not serializable; with the local alias, the closure captures only the function value and n:
val myPower: (Int, Int) => Int = (a: Int, b: Int) => {
  val res: Int = math.pow(a, b).toInt
  res
}
def myMain(sc: SparkContext, n: Int): Unit = {
  val inputRDD: RDD[Int] = sc.parallelize(Array(1, 2, 3, 4))
  val myPowerAlias = myPower
  val squareRDD: RDD[Int] = inputRDD.map( (x: Int) => myPowerAlias(x, n) )
  val resVAL: Array[Int] = squareRDD.collect()
  for (item <- resVAL){
    println(item)
  }
}
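To see why the alias helps without starting Spark, the sketch below runs both kinds of closure through plain Java serialization, which is what Spark uses for task closures. PowerJob and ClosureCheck are hypothetical names, and the sketch assumes Scala 2.12 or 2.13, where lambdas are serializable and capture the enclosing instance only when they actually use it.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the (non-Serializable) class that holds
// myPower and myMain in the question; Spark itself is not needed here.
class PowerJob {
  val myPower: (Int, Int) => Int = (a: Int, b: Int) => math.pow(a, b).toInt

  // Like the failing version: myPower means this.myPower, so the
  // lambda has to capture the enclosing PowerJob instance.
  def closureUsingField(n: Int): Int => Int =
    (x: Int) => myPower(x, n)

  // Like the working version: the lambda captures only the local
  // alias (a function value) and n, not the PowerJob instance.
  def closureUsingAlias(n: Int): Int => Int = {
    val myPowerAlias = myPower
    (x: Int) => myPowerAlias(x, n)
  }
}

object ClosureCheck {
  // True if plain Java serialization of the object succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val job = new PowerJob
    println(serializes(job.closureUsingField(10))) // expected on 2.12/2.13: false
    println(serializes(job.closureUsingAlias(10))) // expected on 2.12/2.13: true
    println(job.closureUsingAlias(10)(2))          // 1024
  }
}
```

The same reasoning explains the object-based version above: inside an object, myPower is reached through the object's static module reference rather than through a captured instance.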

None if no clear winner in Scala Map maxBy

import scala.collection.mutable
val valueCountsMap: mutable.Map[String, Int] = mutable.Map[String, Int]()
valueCountsMap("a") = 1
valueCountsMap("b") = 1
valueCountsMap("c") = 1
val maxOccurredValueNCount: (String, Int) = valueCountsMap.maxBy(_._2)
// maxOccurredValueNCount: (String, Int) = (b,1)
How can I get None when there is no clear winner, i.e. when more than one key shares the maximum value? I am wondering whether a native solution is already implemented for Scala mutable Maps.
No, there's no native solution for what you've described.
Here's how I might go about it.
// scala.collection.Map, so it also applies to the mutable map in the question
implicit class UniqMax[K,V:Ordering](m: scala.collection.Map[K,V]) {
  def uniqMaxByValue: Option[(K,V)] = {
    m.headOption.fold(None:Option[(K,V)]){ hd =>
      val ev = implicitly[Ordering[V]]
      val (count, max) = m.tail.foldLeft((1,hd)) {case ((c, x), v) =>
        if (ev.gt(v._2, x._2)) (1, v)
        else if (v._2 == x._2) (c+1, x)
        else (c, x)
      }
      if (count == 1) Some(max) else None
    }
  }
}
Usage:
Map("a"->11, "b"->12, "c"->11).uniqMaxByValue //res0: Option[(String, Int)] = Some((b,12))
Map(2->"abc", 1->"abx", 0->"ab").uniqMaxByValue //res1: Option[(Int, String)] = Some((1,abx))
Map.empty[Long,Boolean].uniqMaxByValue //res2: Option[(Long, Boolean)] = None
Map('c'->2.2, 'w'->2.2, 'x'->2.1).uniqMaxByValue //res3: Option[(Char, Double)] = None
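If two passes over the map are acceptable, a simpler sketch finds the largest value first and then checks that exactly one entry has it. The name uniqueMaxByValue is hypothetical; taking scala.collection.Map means it accepts both the mutable map from the question and immutable maps.

```scala
object UniqueMax {
  // Returns the entry with the largest value, or None if the map is empty
  // or if more than one key shares that largest value.
  def uniqueMaxByValue[K, V: Ordering](m: scala.collection.Map[K, V]): Option[(K, V)] =
    if (m.isEmpty) None
    else {
      val best = m.values.max // first pass: the maximum value
      // second pass: every entry whose value equals the maximum
      val winners = m.filter { case (_, v) => Ordering[V].equiv(v, best) }
      if (winners.size == 1) winners.headOption else None
    }
}
```

The single-fold version above does the same work in one pass; this one trades that for fewer moving parts.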

Partially applied function as object

Given
def f(i:Int)(j:Int) = i + j
partially applying it with
f(1) _
yields a function value:
Int => Int = <function1>
However, declaring the same thing as a val
val f: (Int)(Int) => Int = (a:Int)(b:Int) => a + b // wrong
fails with error: ';' expected but '(' found. How do I declare val f?
Is this what you were looking for?
scala> val f: Int => Int => Int = a => b => a + b
f: Int => (Int => Int) = <function1>
scala> f(1)
res7: Int => Int = <function1>
scala> f(1)(2)
res8: Int = 3
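The same curried shape can also be obtained from existing definitions rather than by writing the nested lambdas by hand: eta-expand the method with two parameter lists, or call curried on an ordinary two-argument function. The object and value names below are illustrative.

```scala
object CurriedFunctions {
  // Method with two parameter lists, as in the question.
  def f(i: Int)(j: Int): Int = i + j

  // Eta-expansion turns the method into a curried function value.
  val fromMethod: Int => Int => Int = f _

  // An uncurried Function2 can be converted with .curried.
  val add: (Int, Int) => Int = (a, b) => a + b
  val fromFunction2: Int => Int => Int = add.curried
}
```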

How to find two elements satisfying two predicates in one pass with Scala?

Suppose there is a List[A] and two predicates p1: A => Boolean and p2: A => Boolean.
I need to find two elements in the list: the first element a1 satisfying p1 and the first element a2 satisfying p2 (in my case a1 != a2)
Obviously, I can run find twice, but I would like to do it in one pass. How would you do it in one pass in Scala?
So, here's an attempt. It's fairly straightforward to generalise it to take a list of predicates (and return a list of elements found)
def find2[A](xs: List[A], p1: A => Boolean, p2: A => Boolean): (Option[A], Option[A]) = {
  def find2helper(xs: List[A], p1: A => Boolean, p2: A => Boolean, soFar: (Option[A], Option[A])): (Option[A], Option[A]) = {
    if (xs == Nil) soFar
    else {
      val a1 = if (soFar._1.isDefined) soFar._1 else if (p1(xs.head)) Some(xs.head) else None
      val a2 = if (soFar._2.isDefined) soFar._2 else if (p2(xs.head)) Some(xs.head) else None
      if (a1.isDefined && a2.isDefined) (a1, a2) else find2helper(xs.tail, p1, p2, (a1, a2))
    }
  }
  find2helper(xs, p1, p2, (None, None))
} //> find2: [A](xs: List[A], p1: A => Boolean, p2: A => Boolean)(Option[A], Option[A])
val foo = List(1, 2, 3, 4, 5) //> foo : List[Int] = List(1, 2, 3, 4, 5)
find2[Int](foo, { x: Int => x > 2 }, { x: Int => x % 2 == 0 })
//> res0: (Option[Int], Option[Int]) = (Some(3),Some(2))
find2[Int](foo, { x: Int => x > 2 }, { x: Int => x % 7 == 0 })
//> res1: (Option[Int], Option[Int]) = (Some(3),None)
find2[Int](foo, { x: Int => x > 7 }, { x: Int => x % 2 == 0 })
//> res2: (Option[Int], Option[Int]) = (None,Some(2))
find2[Int](foo, { x: Int => x > 7 }, { x: Int => x % 7 == 0 })
//> res3: (Option[Int], Option[Int]) = (None,None)
Generalised version (which is actually slightly clearer, I think)
def findN[A](xs: List[A], ps: List[A => Boolean]): List[Option[A]] = {
  def findNhelper(xs: List[A], ps: List[A => Boolean], soFar: List[Option[A]]): List[Option[A]] = {
    if (xs == Nil) soFar
    else {
      val as = ps.zip(soFar).map {
        case (p, e) => if (e.isDefined) e else if (p(xs.head)) Some(xs.head) else None
      }
      if (as.forall(_.isDefined)) as else findNhelper(xs.tail, ps, as)
    }
  }
  findNhelper(xs, ps, List.fill(ps.length)(None))
} //> findN: [A](xs: List[A], ps: List[A => Boolean])List[Option[A]]
val foo = List(1, 2, 3, 4, 5) //> foo : List[Int] = List(1, 2, 3, 4, 5)
findN[Int](foo, List({ x: Int => x > 2 }, { x: Int => x % 2 == 0 }))
//> res0: List[Option[Int]] = List(Some(3), Some(2))
findN[Int](foo, List({ x: Int => x > 2 }, { x: Int => x % 7 == 0 }))
//> res1: List[Option[Int]] = List(Some(3), None)
findN[Int](foo, List({ x: Int => x > 7 }, { x: Int => x % 2 == 0 }))
//> res2: List[Option[Int]] = List(None, Some(2))
findN[Int](foo, List({ x: Int => x > 7 }, { x: Int => x % 7 == 0 }))
//> res3: List[Option[Int]] = List(None, None)
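For comparison, here is a sketch of find2 written with pattern matching, @tailrec and Option.orElse. It still makes a single pass and stops as soon as both elements have been found; the object name FindTwo is hypothetical.

```scala
import scala.annotation.tailrec

object FindTwo {
  def find2[A](xs: List[A], p1: A => Boolean, p2: A => Boolean): (Option[A], Option[A]) = {
    @tailrec
    def loop(rest: List[A], a1: Option[A], a2: Option[A]): (Option[A], Option[A]) =
      (rest, a1, a2) match {
        case (_, Some(_), Some(_)) => (a1, a2) // both found: stop early
        case (Nil, _, _)           => (a1, a2) // list exhausted
        case (x :: tail, _, _) =>
          // orElse takes its argument by name, so a predicate is no longer
          // evaluated once its element has been found.
          loop(tail, a1.orElse(Some(x).filter(p1)), a2.orElse(Some(x).filter(p2)))
      }
    loop(xs, None, None)
  }
}
```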
scala> val l = List(1,2,3)
l: List[Int] = List(1, 2, 3)
scala> val p1 = {x:Int => x % 2 == 0}
p1: Int => Boolean = <function1>
scala> val p2 = {x:Int => x % 3 == 0}
p2: Int => Boolean = <function1>
scala> val pp = {x:Int => p1(x) || p2(x) }
pp: Int => Boolean = <function1>
scala> l.find(pp)
res2: Option[Int] = Some(2)
scala> l.filter(pp)
res3: List[Int] = List(2, 3)
Does this work for you?
def predFilter[A](lst: List[A], p1: A => Boolean, p2: A => Boolean): List[A] =
  lst.filter(x => p1(x) || p2(x)) // or p1(x) && p2(x) depending on your need
This returns a new list containing the elements that match either predicate.
val a = List(1,2,3,4,5)
val b = predFilter[Int](a, _ % 2 == 0, _ % 3 == 0) // b is now List(2, 3, 4)

Scala, extending the iterator

I'm looking to extend Iterator with a new method, takeWhileInclusive, which works like takeWhile but also includes the first element that fails the predicate.
My question is about best practice: how should I extend Iterator so that the new method returns a new, lazily evaluated iterator? Coming from a C# background, I normally use IEnumerable and the yield keyword, but no such option appears to exist in Scala.
For example, I could have
List(0,1,2,3,4,5,6,7).iterator.map(complex time consuming algorithm).takeWhileInclusive(_ < 6)
In this case takeWhileInclusive should evaluate the predicate only until it reaches the first result that is not less than 6, and it should include that result.
So far I have:
object ImplicitIterator {
  implicit def extendIterator(i : Iterator[Any]) = new IteratorExtension(i)
}
class IteratorExtension[T <: Any](i : Iterator[T]) {
  def takeWhileInclusive(predicate:(T) => Boolean) = ?
}
You can use the span method of Iterator to do this pretty cleanly:
class IteratorExtension[A](i : Iterator[A]) {
  def takeWhileInclusive(p: A => Boolean) = {
    val (a, b) = i.span(p)
    a ++ (if (b.hasNext) Some(b.next()) else None)
  }
}
object ImplicitIterator {
  implicit def extendIterator[A](i : Iterator[A]) = new IteratorExtension(i)
}
import ImplicitIterator._
Now (0 until 10).iterator.takeWhileInclusive(_ < 4).toList gives List(0, 1, 2, 3, 4), for example.
This is one case where I find the mutable solution superior:
class InclusiveIterator[A](ia: Iterator[A]) {
  def takeWhileInclusive(p: A => Boolean) = {
    var done = false
    val p2 = (a: A) => !done && { if (!p(a)) done=true; true }
    ia.takeWhile(p2)
  }
}
implicit def iterator_can_include[A](ia: Iterator[A]) = new InclusiveIterator(ia)
The following requires scalaz to get fold on a tuple (A, B)
scala> implicit def Iterator_Is_TWI[A](itr: Iterator[A]) = new {
| def takeWhileIncl(p: A => Boolean)
| = itr span p fold (_ ++ _.toStream.headOption)
| }
Iterator_Is_TWI: [A](itr: Iterator[A])java.lang.Object{def takeWhileIncl(p: A => Boolean): Iterator[A]}
Here it is at work:
scala> List(1, 2, 3, 4, 5).iterator takeWhileIncl (_ < 4)
res0: Iterator[Int] = non-empty iterator
scala> res0.toList
res1: List[Int] = List(1, 2, 3, 4)
You can roll your own fold over a pair like this:
scala> implicit def Pair_Is_Foldable[A, B](pair: (A, B)) = new {
| def fold[C](f: (A, B) => C): C = f.tupled(pair)
| }
Pair_Is_Foldable: [A, B](pair: (A, B))java.lang.Object{def fold[C](f: (A, B) => C): C}
You can also implement it directly as a new Iterator that stops after the first element failing the predicate:
class IteratorExtension[T](i : Iterator[T]) {
  def takeWhileInclusive(predicate: T => Boolean) = new Iterator[T] {
    val it = i
    var isLastRead = false
    def hasNext: Boolean = it.hasNext && !isLastRead
    def next(): T = {
      val res = it.next()
      isLastRead = !predicate(res)
      res
    }
  }
}
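There is also an error in your implicit: because it takes an Iterator[Any], T is inferred as Any, so the predicate receives Any and something like _ < 6 does not compile. Here it is fixed:
object ImplicitIterator {
  implicit def extendIterator[T](i : Iterator[T]) = new IteratorExtension(i)
}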
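In current Scala the wrapper class and the implicit def are usually combined into a single implicit class. Here is a sketch along the lines of the hand-written Iterator answer above; the names IteratorSyntax and TakeWhileInclusiveOps are hypothetical.

```scala
object IteratorSyntax {
  // One implicit class replaces the separate wrapper class plus implicit def.
  implicit class TakeWhileInclusiveOps[A](it: Iterator[A]) {
    // Lazily yields elements while p holds, plus the first element for which
    // p fails; the predicate runs once per element actually consumed.
    def takeWhileInclusive(p: A => Boolean): Iterator[A] = new Iterator[A] {
      private var done = false
      def hasNext: Boolean = !done && it.hasNext
      def next(): A = {
        if (!hasNext) throw new NoSuchElementException("next on empty iterator")
        val a = it.next()
        if (!p(a)) done = true // keep this element, then stop
        a
      }
    }
  }
}
```

With a counting map in front of it, taking (_ < 4) from Iterator(0, 1, 2, 3, 4, 5, 6, 7) consumes exactly five elements, so the expensive step runs only as often as needed.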
scala> List(0,1,2,3,4,5,6,7).toStream.filter (_ < 6).take(2)
res8: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> res8.toList
res9: List[Int] = List(0, 1)
After your update:
scala> def timeConsumeDummy (n: Int): Int = {
| println ("Time flies like an arrow ...")
| n }
timeConsumeDummy: (n: Int)Int
scala> List(0,1,2,3,4,5,6,7).toStream.filter (x => timeConsumeDummy (x) < 6)
Time flies like an arrow ...
res14: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> res14.take (4).toList
Time flies like an arrow ...
Time flies like an arrow ...
Time flies like an arrow ...
res15: List[Int] = List(0, 1, 2, 3)
timeConsumeDummy is called 4 times. Am I missing something?