Efficiency of getOrElse vs match - scala

I've been told that the following pattern is more efficient than the simple match paradigm. I'm unsure if this is true, and I can't find any sources to corroborate.
Is using
values.headOption.map { x }.getOrElse(y)
more efficient than
values.headOption match { /* case statements */ }

Non-rigorous, non-scientific, meaningless micro-benchmark:
def measureTime[U](repeats: Long)(block: => U): Double = {
val start = System.currentTimeMillis
var iteration = 0
while (iteration < repeats) {
iteration += 1
block
}
val end = System.currentTimeMillis
(end - start).toDouble / repeats
}
val n: Long = 2000000000L
val emptyTime = measureTime(n) {
/* do nothing */
}
val mapTime = {
val list = List(1, 2)
var sum = 0L
val mapTime = measureTime(n) {
sum += list.headOption.map(x => x * 42).getOrElse(0)
}
assert(sum == 42 * n)
mapTime
}
val matchTime = {
val list = List(1, 2)
var sum = 0L
val t = measureTime(n) {
sum += (list.headOption match {
case Some(x) => x * 42
case None => 0
})
}
assert(sum == 42 * n)
t
}
println("empty: " + emptyTime)
println("Map : " + mapTime)
println("match: " + matchTime)
println("map-empty: " + (mapTime - emptyTime))
println("match-empty: " + (matchTime - emptyTime))
println("(map / match): " + (mapTime / matchTime))
println("((mp - e) / (mt - e)): " + ((mapTime - emptyTime) / (matchTime - emptyTime)))
Output:
empty: 1.2675E-6
Map : 1.10745E-5
match: 1.65855E-5
map-empty: 9.807000000000001E-6
match-empty: 1.5317999999999998E-5
(map / match): 0.6677218051912817
((mp - e) / (mt - e)): 0.6402271837054447
The map + getOrElse version seemed to be 35% faster (or: the garbage collector kicked in at the wrong moment, and messed up measurements for the contrahent).
There is no reason why getOrElse shouldn't be faster, maybe it can spare some cycles here and there because it doesn't need to call Some.unapply and None.unapply and then construct additional Options to signal whether a match succeeded. But: I don't see any significant order-of-magnitude-difference. Use what looks more readable. Your software project will most likely not fail because of a 30% slower O(1) operation, you can use your time and energy better if you optimize those parts of the code that actually matter.

Theoretically the best version is
values.headOption.fold(y)(x)
because it only tests the state of the option once and does not create an intermediate Some() value (though this may well be optimised out by the compiler).
Using fold also adds some type checking which is not there with getOrElse, though this is sometimes a curse rather than a blessing!

Related

Is it faster to create a new Map or clear it and use again?

I need to use many Maps in my project so I wonder which way is more efficient:
val map = mutable.Map[Int, Int] = mutable.Map.empty
for (_ <- 0 until big_number)
{
// do something with map
map.clear()
}
or
for (_ <- 0 until big_number)
{
val map = mutable.Map[Int, Int] = mutable.Map.empty
// do something with map
}
to use in terms of time and memory?
Well, my formal answer would always be depends. As you need to benchmark your own scenario, and see what fits better for your scenario. I'll provide an example how you can try benchmarking your own code. Let's start with writing a measuring method:
def measure(name: String, f: () => Unit): Unit = {
val l = System.currentTimeMillis()
println(name + ": " + (System.currentTimeMillis() - l))
f()
println(name + ": " + (System.currentTimeMillis() - l))
}
Let's assume that in each iteration we need to insert into the map one key-value pair, and then to print it:
Await.result(Future.sequence(Seq(Future {
measure("inner", () => {
for (i <- 0 until 10) {
val map2 = mutable.Map.empty[Int, Int]
map2(i) = i
println(map2)
}
})
},
Future {
measure("outer", () => {
val map1 = mutable.Map.empty[Int, Int]
for (i <- 0 until 10) {
map1(i) = i
println(map1)
map1.clear()
}
})
})), 10.seconds)
The output in this case, is almost always equal between the inner and the outer. Please note that in this case I run the two options in parallel, as if I wouldn't the first one always takes significantly more time, no matter which one of then is first.
Therefore, we can conclude, that in this case they are almost the same.
But, if for example I add an immutable option:
Future {
measure("immutable", () => {
for (i <- 0 until 10) {
val map1 = Map[Int, Int](i -> i)
println(map1)
}
})
}
it always ends up first. This makes sense because immutable collections are much more performant than the mutables.
For better performance tests you probably need to use some third parties, such as scalameter, or others that exists.

How to efficiently combine future results as a future

I have a lot of calculations contributing to a final result, with no restrictions on the order of the contributions. Seems like Futures should be able to speed this up, and they do, but not in the way I thought they would. Here is code comparing performance of a very silly way to divide integers:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
object scale_me_up {
def main(args: Array[String]) {
val M = 500 * 1000
val N = 5
Thread.sleep(3210) // let launcher settle down
for (it <- 0 until 15) {
val method = it % 3
val start = System.currentTimeMillis()
val result = divide(M, N, method)
val elapsed = System.currentTimeMillis() - start
assert(result == M / N)
if (it >= 6) {
val methods = Array("ordinary", "fast parallel", "nice parallel")
val name = methods(method)
println(f"$name%15s: $elapsed ms")
}
}
}
def is_multiple_of(m: Int, n: Int): Boolean = {
val result = !(1 until n).map(_ + (m / n) * n).toSet.contains(m)
assert(result == (m % n == 0)) // yes, a less crazy implementation exists
result
}
def divide(m: Int, n: Int, method: Int): Int = {
method match {
case 0 =>
(1 to m).count(is_multiple_of(_, n))
case 1 =>
(1 to m)
.map { x =>
Future { is_multiple_of(x, n) }
}
.count(Await.result(_, Duration.Inf))
case 2 =>
Await.result(divide_futuristically(m, n), Duration.Inf)
}
}
def divide_futuristically(m: Int, n: Int): Future[Int] = {
val futures = (1 to m).map { x =>
Future { is_multiple_of(x, n) }
}
Future.foldLeft(futures)(0) { (count, flag) =>
{ if (flag) { count + 1 } else { count } }
}
/* much worse performing alternative:
Future.sequence(futures).map(_.count(identity))
*/
}
}
When I run this, the parallel case 1 is somewhat faster than the ordinary case 0 code (hurray), but case 2 takes twice as long. Of course, it depends on the system and whether enough work needs to be done per future (which here grows with the denominator N) to offset the concurrency overhead. [PS] As expected, decreasing N gives case 0 the lead and increasing N enough makes both case 1 and case 2 about twice as fast as case 0 on my two core CPU.
I'm lead to believe that divide_futuristically is a better way to express this kind of calculation: returning a future with the combined result. Blocking is just something we need here to measure performance. But in reality, the more blocking, the faster everyone is finished. What am I doing wrong? Several alternatives to summarize the futures (like sequence) all take the same penalty.
[PPS] this was with Scala 2.12 running on Java 11 on a 2 core CPU. With Java 12 on a 6 core CPU, the difference is much less significant (though the alternative with sequence still drags its feet). With Scala 2.13, the difference is even less and as you increase the amount of work per iteration, divide_futuristically starts outperforming the competition. The future is here at last...
Seems you did everything right. I've tried by myself different approaches even .par but got the same or worse result.
I've dived into Future.foldLeft and tried to analyze what caused the latency:
/** A non-blocking, asynchronous left fold over the specified futures,
* with the start value of the given zero.
* The fold is performed asynchronously in left-to-right order as the futures become completed.
* The result will be the first failure of any of the futures, or any failure in the actual fold,
* or the result of the fold.
*
* Example:
* {{{
* val futureSum = Future.foldLeft(futures)(0)(_ + _)
* }}}
*
* #tparam T the type of the value of the input Futures
* #tparam R the type of the value of the returned `Future`
* #param futures the `scala.collection.immutable.Iterable` of Futures to be folded
* #param zero the start value of the fold
* #param op the fold operation to be applied to the zero and futures
* #return the `Future` holding the result of the fold
*/
def foldLeft[T, R](futures: scala.collection.immutable.Iterable[Future[T]])(zero: R)(op: (R, T) => R)(implicit executor: ExecutionContext): Future[R] =
foldNext(futures.iterator, zero, op)
private[this] def foldNext[T, R](i: Iterator[Future[T]], prevValue: R, op: (R, T) => R)(implicit executor: ExecutionContext): Future[R] =
if (!i.hasNext) successful(prevValue)
else i.next().flatMap { value => foldNext(i, op(prevValue, value), op) }
and this part:
else i.next().flatMap { value => foldNext(i, op(prevValue, value), op) }
.flatMap produces a new Future which is submitted to executor. In other words, every
{ (count, flag) =>
{ if (flag) { count + 1 } else { count } }
}
is executed as new Future.
I suppose this part causes the experimentally proved latency.

Is this correct use of .par in Scala?

Below code computes a distance metric between two Users as specified by case class :
case class User(name: String, features: Vector[Double])
val ul = for (a <- 1 to 100) yield (User("a", Vector(1, 2, 4)))
var count = 0;
def distance(userA: User, userB: User) = {
val subElements = (userA.features zip userB.features) map {
m => (m._1 - m._2) * (m._1 - m._2)
}
val summed = subElements.sum
val sqRoot = Math.sqrt(summed)
count += 1;
println("count is " + count)
((userA.name, userB.name), sqRoot)
}
val distances = ul.par.map(m => ul.map(m2 => {
(distance(m, m2))
})).toList.flatten
val sortedDistances = distances.groupBy(_._1._1).map(m => (m._1, m._2.sortBy(s => s._2)))
println(sortedDistances.get("a").get.size);
This performs a Cartesian product of comparison 100 users : 10000 comparisons. I'm counting each comparison, represented bu var count
Often the count value will be less than 10000, but the amount of items iterated over is always 10000. Is reason for this that as par spawns multiple threads some of these will finish before the println statement is executed. However all will finish within par code block - before distances.groupBy(_._1._1).map(m => (m._1, m._2.sortBy(s => s._2))) is evaluated.
In your example you have a single un-synchronized variable that you're mutating from multiple threads like you said. This means that each thread, at any time, may have a stale copy of count, so when they increment it they will squash any other writes that have occurred, resulting in a count less than it should be.
You can solve this using the synchronized function,
...
val subElements = (userA.features zip userB.features) map {
m => (m._1 - m._2) * (m._1 - m._2)
}
val summed = subElements.sum
val sqRoot = Math.sqrt(summed)
this.synchronized {
count += 1;
}
println("count is " + count)
((userA.name, userB.name), sqRoot)
...
Using 'this.synchronized' will use the containing object as the lock object. For more information on Scala synchronization I suggest reading Twitter's Scala School.

Union-Find (or Disjoint Set) data structure in Scala

I am looking for an existing implementation of a union-find or disjoint set data structure in Scala before I attempt to roll my own as the optimisations look somewhat complicated.
I mean this kind of thing - where the two operations union and find are optimised.
Does anybody know of anything existing? I've obviously tried googling around.
I had written one for myself some time back which I believe performs decently. Unlike other implementations, the find is O(1) and union is O(log(n)). If you have a lot more union operations than find, then this might not be very useful. I hope you find it useful:
package week2
import scala.collection.immutable.HashSet
import scala.collection.immutable.HashMap
/**
* Union Find implementaion.
* Find is O(1)
* Union is O(log(n))
* Implementation is using a HashTable. Each wrap has a set which maintains the elements in that wrap.
* When 2 wraps are union, then both the set's are clubbed. O(log(n)) operation
* A HashMap is also maintained to find the Wrap associated with each node. O(log(n)) operation in mainitaining it.
*
* If the input array is null at any index, it is ignored
*/
class UnionFind[T](all: Array[T]) {
private var dataStruc = new HashMap[T, Wrap]
for (a <- all if (a != null))
dataStruc = dataStruc + (a -> new Wrap(a))
var timeU = 0L
var timeF = 0L
/**
* The number of Unions
*/
private var size = dataStruc.size
/**
* Unions the set containing a and b
*/
def union(a: T, b: T): Wrap = {
val st = System.currentTimeMillis()
val first: Wrap = dataStruc.get(a).get
val second: Wrap = dataStruc.get(b).get
if (first.contains(b) || second.contains(a))
first
else {
// below is to merge smaller with bigger rather than other way around
val firstIsBig = (first.set.size > second.set.size)
val ans = if (firstIsBig) {
first.set = first.set ++ second.set
second.set.foreach(a => {
dataStruc = dataStruc - a
dataStruc = dataStruc + (a -> first)
})
first
} else {
second.set = second.set ++ first.set
first.set.foreach(a => {
dataStruc = dataStruc - a
dataStruc = dataStruc + (a -> second)
})
second
}
timeU = timeU + (System.currentTimeMillis() - st)
size = size - 1
ans
}
}
/**
* true if they are in same set. false if not
*/
def find(a: T, b: T): Boolean = {
val st = System.currentTimeMillis()
val ans = dataStruc.get(a).get.contains(b)
timeF = timeF + (System.currentTimeMillis() - st)
ans
}
def sizeUnion: Int = size
class Wrap(e: T) {
var set = new HashSet[T]
set = set + e
def add(elem: T) {
set = set + elem
}
def contains(elem: T): Boolean = set.contains(elem)
}
}
Here is a simple, short and somewhat efficient mutable implementation of UnionFind:
import scala.collection.mutable
class UnionFind[T]:
private val map = new mutable.HashMap[T, mutable.HashSet[T]]
private var size = 0
def distinct = size
def addFresh(a: T): Unit =
assert(!map.contains(a))
val set = new mutable.HashSet[T]
set += a
map(a) = set
size += 1
def setEqual(a: T, b: T): Unit =
val ma = map(a)
val mb = map(b)
if !ma.contains(b) then
// redirect the elements of the smaller set to the bigger set
if ma.size > mb.size
then
ma ++= mb
mb.foreach { x => map(x) = ma }
else
mb ++= ma
ma.foreach { x => map(x) = mb }
size = size - 1
def isEqual(a: T, b: T): Boolean =
map(a).contains(b)
Remarks:
An immutable implementation of UnionFind can be useful when rollback or backtracking or proofs are necessary
An mutable implementation can avoid garbage collection for speedup
One could also consider a persistent datastructure -- works like an immutable implementation, but is using internally some mutable state for speed

Palindromes using Scala

I came across this problem from CodeChef. The problem states the following:
A positive integer is called a palindrome if its representation in the
decimal system is the same when read from left to right and from right
to left. For a given positive integer K of not more than 1000000
digits, write the value of the smallest palindrome larger than K to
output.
I can define a isPalindrome method as follows:
def isPalindrome(someNumber:String):Boolean = someNumber.reverse.mkString == someNumber
The problem that I am facing is how do I loop from the initial given number and break and return the first palindrome when the integer satisfies the isPalindrome method? Also, is there a better(efficient) way to write the isPalindrome method?
It will be great to get some guidance here
If you have a number like 123xxx you know, that either xxx has to be below 321 - then the next palindrom is 123321.
Or xxx is above, then the 3 can't be kept, and 124421 has to be the next one.
Here is some code without guarantees, not very elegant, but the case of (multiple) Nines in the middle is a bit hairy (19992):
object Palindrome extends App {
def nextPalindrome (inNumber: String): String = {
val len = inNumber.length ()
if (len == 1 && inNumber (0) != '9')
"" + (inNumber.toInt + 1) else {
val head = inNumber.substring (0, len/2)
val tail = inNumber.reverse.substring (0, len/2)
val h = if (head.length > 0) BigInt (head) else BigInt (0)
val t = if (tail.length > 0) BigInt (tail) else BigInt (0)
if (t < h) {
if (len % 2 == 0) head + (head.reverse)
else inNumber.substring (0, len/2 + 1) + (head.reverse)
} else {
if (len % 2 == 1) {
val s2 = inNumber.substring (0, len/2 + 1) // 4=> 4
val h2 = BigInt (s2) + 1 // 5
nextPalindrome (h2 + (List.fill (len/2) ('0').mkString)) // 5 + ""
} else {
val h = BigInt (head) + 1
h.toString + (h.toString.reverse)
}
}
}
}
def check (in: String, expected: String) = {
if (nextPalindrome (in) == expected)
println ("ok: " + in) else
println (" - fail: " + nextPalindrome (in) + " != " + expected + " for: " + in)
}
//
val nums = List (("12345", "12421"), // f
("123456", "124421"),
("54321", "54345"),
("654321", "654456"),
("19992", "20002"),
("29991", "29992"),
("999", "1001"),
("31", "33"),
("13", "22"),
("9", "11"),
("99", "101"),
("131", "141"),
("3", "4")
)
nums.foreach (n => check (n._1, n._2))
println (nextPalindrome ("123456678901234564579898989891254392051039410809512345667890123456457989898989125439205103941080951234566789012345645798989898912543920510394108095"))
}
I guess it will handle the case of a one-million-digit-Int too.
Doing reverse is not the greatest idea. It's better to start at the beginning and end of the string and iterate and compare element by element. You're wasting time copying the entire String and reversing it even in cases where the first and last element don't match. On something with a million digits, that's going to be a huge waste.
This is a few orders of magnitude faster than reverse for bigger numbers:
def isPalindrome2(someNumber:String):Boolean = {
val len = someNumber.length;
for(i <- 0 until len/2) {
if(someNumber(i) != someNumber(len-i-1)) return false;
}
return true;
}
There's probably even a faster method, based on mirroring the first half of the string. I'll see if I can get that now...
update So this should find the next palindrome in almost constant time. No loops. I just sort of scratched it out, so I'm sure it can be cleaned up.
def nextPalindrome(someNumber:String):String = {
val len = someNumber.length;
if(len==1) return "11";
val half = scala.math.floor(len/2).toInt;
var firstHalf = someNumber.substring(0,half);
var secondHalf = if(len % 2 == 1) {
someNumber.substring(half+1,len);
} else {
someNumber.substring(half,len);
}
if(BigInt(secondHalf) > BigInt(firstHalf.reverse)) {
if(len % 2 == 1) {
firstHalf += someNumber.substring(half, half+1);
firstHalf = (BigInt(firstHalf)+1).toString;
firstHalf + firstHalf.substring(0,firstHalf.length-1).reverse
} else {
firstHalf = (BigInt(firstHalf)+1).toString;
firstHalf + firstHalf.reverse;
}
} else {
if(len % 2 == 1) {
firstHalf + someNumber.substring(half,half+1) + firstHalf.reverse;
} else {
firstHalf + firstHalf.reverse;
}
}
}
This is most general and clear solution that I can achieve:
Edit: got rid of BigInt's, now it takes less than a second to calculate million digits number.
def incStr(num: String) = { // helper method to increment number as String
val idx = num.lastIndexWhere('9'!=, num.length-1)
num.take(idx) + (num.charAt(idx)+1).toChar + "0"*(num.length-idx-1)
}
def palindromeAfter(num: String) = {
val lengthIsOdd = num.length % 2
val halfLength = num.length / 2 + lengthIsOdd
val leftHalf = num.take(halfLength) // first half of number (including central digit)
val rightHalf = num.drop(halfLength - lengthIsOdd) // second half of number (also including central digit)
val (newLeftHalf, newLengthIsOdd) = // we need to calculate first half of new palindrome and whether it's length is odd or even
if (rightHalf.compareTo(leftHalf.reverse) < 0) // simplest case - input number is like 123xxx and xxx < 321
(leftHalf, lengthIsOdd)
else if (leftHalf forall ('9'==)) // special case - if number is like '999...', then next palindrome will be like '10...01' and one digit longer
("1" + "0" * (halfLength - lengthIsOdd), 1 - lengthIsOdd)
else // other cases - increment first half of input number before making palindrome
(incStr(leftHalf), lengthIsOdd)
// now we can create palindrome itself
newLeftHalf + newLeftHalf.dropRight(newLengthIsOdd).reverse
}
According to your range-less proposal: the same thing but using Stream:
def isPalindrome(n:Int):Boolean = n.toString.reverse == n.toString
def ints(n: Int): Stream[Int] = Stream.cons(n, ints(n+1))
val result = ints(100).find(isPalindrome)
And with iterator (and different call method, the same thing you can do with Stream, actually):
val result = Iterator.from(100).find(isPalindrome)
But as #user unknown stated, it is direct bruteforce and not practical with large numbers.
To check if List of Any Type is palindrome using slice, without any Loops
def palindrome[T](list: List[T]): Boolean = {
if(list.length==1 || list.length==0 ){
false
}else {
val leftSlice: List[T] = list.slice(0, list.length / 2)
var rightSlice :List[T]=Nil
if (list.length % 2 != 0) {
rightSlice = list.slice(list.length / 2 + 1, list.length).reverse
} else {
rightSlice = list.slice(list.length / 2, list.length).reverse
}
leftSlice ==rightSlice
}
}
Though the simplest solution would be
def palindrome[T](list: List[T]): Boolean = {
list == list.reverse
}
You can simply use the find method on collections to find the first element matching a given predicate:
def isPalindrome(n:Int):Boolean = n.toString.reverse == n.toString
val (start, end) = (100, 1000)
val result: Option[Int] = (start to end).find(isPalindrome)
result foreach println
>Some(101)
Solution to verify if a String is a palindrome
This solution doesn't reverse the String. However I am not sure that it will be faster.
def isPalindrome(s:String):Boolean = {
s.isEmpty ||
((s.last == s.head) && isPalindrome(s.tail.dropRight(1)))
}
Solution to find next palindrome given a String
This solution is not the best for scala (pretty the same as Java solution) but it only deals with Strings and is suitable for large numbers
You just have to mirror the first half of the number you want, check if it is higher than the begin number, otherwise, increase by one the last digit of the first half and mirror it again.
First, a function to increment the string representation of an int by 1:
def incrementString(s:String):String = {
if(s.nonEmpty){
if (s.last == '9')
incrementString(s.dropRight(1))+'0'
else
s.dropRight(1) + (s.last.toInt +1).toChar
}else
"1"
}
Then, a function to compare to string representation of ints: (the function 'compare' doesn't work for that case)
/* is less that 0 if x<y, is more than 0 if x<y, is equal to 0 if x==y */
def compareInts(x:String, y:String):Int = {
if (x.length !=y.length)
(x.length).compare(y.length)
else
x.compare(y)
}
Now the function to compute the next palindrome:
def nextPalindrome(origin_ :String):String = {
/*Comment if you want to have a strictly bigger number, even if you already have a palindrome as input */
val origin = origin_
/* Uncomment if you want to have a strictly bigger number, even if you already have a palindrome as input */
//val origin = incrementString(origin_)
val length = origin.length
if(length % 2 == 0){
val (first, last) = origin.splitAt(length/2);
val reversed = first.reverse
if (compareInts(reversed,last) > -1)
first ++ reversed
else{
val firstIncr = incrementString(first)
firstIncr ++ firstIncr.reverse
}
} else {
val (first,last) = origin.splitAt((length+1)/2)
val reversed = first.dropRight(1).reverse
if (compareInts(reversed,last) != -1)
first ++ reversed
else{
val firstIncr = incrementString(first)
firstIncr ++ firstIncr.dropRight(1).reverse
}
}
}
You could try something like this, I'm using basic recursive:
object Palindromo {
def main(args: Array[String]): Unit = {
var s: String = "arara"
println(verificaPalindromo(s))
}
def verificaPalindromo(s: String): String = {
if (s.length == 0 || s.length == 1)
"true"
else
if (s.charAt(0).toLower == s.charAt(s.length - 1).toLower)
verificaPalindromo(s.substring(1, s.length - 1))
else
"false"
}
}
#tailrec
def palindrome(str: String, start: Int, end: Int): Boolean = {
if (start == end)
true
else if (str(start) != str(end))
false
else
pali(str, start + 1, end - 1)
}
println(palindrome("arora", 0, str.length - 1))