I want to compose numbers in base 3, represented as fixed-length Array[Byte].
Here are some attempts:
val byteBoard = Array.fill(9)(1.toByte)
val cache: Seq[(Int, Int)] = (0 to 8).map(i => (i, math.pow(3d, i.toDouble).toInt))
@Benchmark
def composePow(): Unit = {
  val _ = (0 to 8).foldLeft(0) { case (acc, i) => acc + math.pow(3d, i.toDouble).toInt * byteBoard(i) }
}

@Benchmark
def composeCachedPowWithFold(): Unit = {
  val _ = cache.foldLeft(0) { case (acc, (i, k)) => acc + k * byteBoard(i).toInt }
}

@Benchmark
def composeCachedPowWithForeach(): Unit = {
  var acc = 0
  cache.foreach { case (i, k) => acc = acc + k * byteBoard(i) }
}

@Benchmark
def composeUnrolled(): Unit = {
  val _ = byteBoard(0) +
    3 * byteBoard(1) +
    3 * 3 * byteBoard(2) +
    3 * 3 * 3 * byteBoard(3) +
    3 * 3 * 3 * 3 * byteBoard(4) +
    3 * 3 * 3 * 3 * 3 * byteBoard(5) +
    3 * 3 * 3 * 3 * 3 * 3 * byteBoard(6) +
    3 * 3 * 3 * 3 * 3 * 3 * 3 * byteBoard(7) +
    3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 * byteBoard(8)
}
Can you confirm the following conclusions:
1. composePow: boxing + type conversions to use math.pow
2. composeCachedPowWithFold: boxing because fold is a parameterized method
3. composeCachedPowWithForeach: no boxing, but less idiomatic Scala (local mutation)
4. composeUnrolled: no boxing
And can you explain why 4 is so much faster than 3?
PS: Here are the results of the JMH benchmark:
[info] IndexBenchmark.composeCachedPowWithFold thrpt 10 7180844,823 ± 1015310,847 ops/s
[info] IndexBenchmark.composeCachedPowWithForeach thrpt 10 14234192,613 ± 1449571,042 ops/s
[info] IndexBenchmark.composePow thrpt 10 1515312,179 ± 34892,700 ops/s
[info] IndexBenchmark.composeUnrolled thrpt 10 152297653,110 ± 2237446,053 ops/s
I mostly agree with your analysis of cases 1, 2, and 4, but the third variant is really funny!
I agree with you about the first two versions: foldLeft is not @specialized, so, yes, there is some boxing and unboxing. But math.pow is evil for integer arithmetic anyway, and all those conversions only incur additional overhead.
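If you do need integer powers, a plain loop avoids the Double round trip entirely. A minimal sketch (ipow is a hypothetical helper, not part of your code):

def ipow(base: Int, exp: Int): Int = {
  // repeated multiplication: fine for small non-negative exponents such as 0..8
  var result = 1
  var i = 0
  while (i < exp) {
    result *= base
    i += 1
  }
  result
}

With that, math.pow(3d, i.toDouble).toInt becomes ipow(3, i), with no conversions and no boxing.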
Now let's take a closer look at the third variant. It is so slow because you are constructing a closure over mutable state. Look at the output of scala -print. This is what your method is rewritten into:
private def composeCachedPowWithForeach(): Unit = {
  var acc: runtime.IntRef = scala.runtime.IntRef.create(0);
  anon$1.this.cache().foreach({
    ((x0$3: Tuple2) =>
      anon$1.this.$anonfun$composeCachedPowWithForeach$1(acc, x0$3))
  })
};
And here is the function used in foreach:
final <artifact> private[this] def $anonfun$composeCachedPowWithForeach$1(
    acc$1: runtime.IntRef, x0$3: Tuple2
): Unit = {
  case <synthetic> val x1: Tuple2 = x0$3;
  case4(){
    if (x1.ne(null))
      {
        val i: Int = x1._1$mcI$sp();
        val k: Int = x1._2$mcI$sp();
        matchEnd3({
          acc$1.elem = acc$1.elem.+(k.*(anon$1.this.byteBoard().apply(i)));
          scala.runtime.BoxedUnit.UNIT
        })
      }
    else
      case5()
  };
  case5(){
    matchEnd3(throw new MatchError(x1))
  };
  matchEnd3(x: scala.runtime.BoxedUnit){
    ()
  }
};
You can see that pattern matching apparently generates quite a lot of code. I'm not sure whether it contributes much to the overhead. What I personally find much more interesting is the runtime.IntRef part. This is an object that holds the mutable variable corresponding to var acc in your code. Even though it looks like a simple local variable, it must be referenced from the closure somehow, so it is wrapped into an object and evicted to the heap. I assume that accessing this mutable variable on the heap causes most of the overhead.
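As an aside, if you want the foreach-style accumulation without paying for the closure, one option is an index-based while loop, so that acc stays a plain local Int. A sketch (assuming the cache is an IndexedSeq; composeNoClosure is my name, not from the original code):

def composeNoClosure(cache: IndexedSeq[(Int, Int)], byteBoard: Array[Byte]): Int = {
  var acc = 0  // no closure captures it, so no runtime.IntRef
  var idx = 0
  while (idx < cache.length) {
    val (i, k) = cache(idx)  // still reads a Tuple2, but i and k are plain local Ints
    acc += k * byteBoard(i)
    idx += 1
  }
  acc
}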
In contrast to that, if the byteBoard were passed as an argument, nothing in the fourth variant would ever leave the function's stack frame:
private def composeUnrolled(): Unit = {
  val _: Int =
    anon$1.this.byteBoard().apply(0).+
      (3.*(anon$1.this.byteBoard().apply(1))).+
      (9.*(anon$1.this.byteBoard().apply(2))).+
      (27.*(anon$1.this.byteBoard().apply(3))).+
      (81.*(anon$1.this.byteBoard().apply(4))).+
      (243.*(anon$1.this.byteBoard().apply(5))).+
      (729.*(anon$1.this.byteBoard().apply(6))).+
      (2187.*(anon$1.this.byteBoard().apply(7))).+
      (6561.*(anon$1.this.byteBoard().apply(8)));
  ()
};
There is essentially no control flow to speak of, and hardly any method invocations (the apply calls are just array element accesses, which don't count); overall it's just one very simple arithmetic computation that might even fit entirely into the registers of your processor. That's why it is so fast.
While you are at it, you might want to benchmark these two methods:
def ternaryToInt5(bytes: Array[Byte]): Int = {
  var acc = 0
  val n = bytes.size
  var i = n - 1
  while (i >= 0) {
    acc *= 3
    acc += bytes(i)
    i -= 1
  }
  acc
}

def ternaryToInt6(bytes: Array[Byte]): Int = {
  bytes(0) +
    3 * (bytes(1) +
      3 * (bytes(2) +
        3 * (bytes(3) +
          3 * (bytes(4) +
            3 * (bytes(5) +
              3 * (bytes(6) +
                3 * (bytes(7) +
                  3 * (bytes(8)))))))))
}
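Both compute the same polynomial as your unrolled version, just via Horner's scheme. A quick sanity check with a board of all ones, where the result must be 3^0 + 3^1 + ... + 3^8:

val board = Array.fill(9)(1.toByte)
assert(ternaryToInt5(board) == 9841)  // (3^9 - 1) / 2
assert(ternaryToInt6(board) == 9841)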
Also, if you are working with byte arrays frequently, you might find this syntactic sugar useful.
The goal is to code this sum into a recursive function.
[image: the summation formula]
So far I have tried to code it like this:
def under(u: Int): Int = {
  var i1 = u/2
  var i = i1+1
  if ( u/2 == 1 ) then u + 1 - 2 * 1
  else (u + 1 - 2 * i) + under(u-1)
}
It seems like I am running into an issue with the recursive part, but I am not able to figure out what goes wrong.
In theory, under(5) should produce 10.
Your logic is wrong. It should iterate (whether through a loop, recursion, or a collection is irrelevant) from i=1 to i=n/2, using n and the current i as they are:
(1 to (n/2)).map(i => n + 1 - 2 * i).sum
What you are (more or less) doing instead is running the computation from i=1 to i=n (or rather from n down to 1), but instead of n you use i/2 and instead of i you use i/2+1 (a sum from i=1 to i=n of (n/2 + 1 - 2 * i)):
// actually what you do is more like (1 to n).toList.reverse
// rather than (1 to n)
(1 to n).map(i => i/2 + 1 - 2 * (i/2 + 1)).sum
It's a different formula: it has twice as many elements to sum, part of each term changes where it should stay constant, and another part has the wrong value.
To implement the same logic with recursion you would have to do something like:
// as one function with default args
// tail-recursive version
def under(n: Int, i: Int = 1, sum: Int = 0): Int =
  if (i > n/2) sum
  else under(n, i+1, sum + (n + 1 - 2 * i))

// not tail-recursive
def under(n: Int, i: Int = 1): Int =
  if (i > n/2) 0
  else (n + 1 - 2 * i) + under(n, i + 1)

// with nested functions, without default args
def under(n: Int): Int = {
  // tail-recursive
  def helper(i: Int, sum: Int): Int =
    if (i > n/2) sum
    else helper(i + 1, sum + (n + 1 - 2 * i))
  helper(1, 0)
}

def under(n: Int): Int = {
  // not tail-recursive
  def helper(i: Int): Int =
    if (i > n/2) 0
    else (n + 1 - 2 * i) + helper(i + 1)
  helper(1)
}
As a side note: there is no need to use any iteration / recursion at all. Here is an explicit formula:
def g(n: Int) = n / 2 * (n - n / 2)
that gives the same results as
def h(n: Int) = (1 to n / 2).map(i => n + 1 - 2 * i).sum
Both assume that you want floored n / 2 in the case that n is odd, i.e. both of the functions above behave the same as
def j(n: Int) = (math.ceil(n / 2.0) * math.floor(n / 2.0)).toInt
(at least until rounding errors kick in).
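If you want to convince yourself, here is a quick brute-force check (a sketch) that the three versions agree on small inputs:

(0 to 100).foreach { n =>
  assert(g(n) == h(n) && h(n) == j(n), s"mismatch at n=$n")
}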
I have a code snippet in Java that loops through a file byte by byte and blanks out the byte at the 3rd position of every 20 bytes. This is done using a for-each loop.
logic:
int pos = 0;
for (byte b : raw) {
  if (pos == 3) b = 32;  // conceptually blanks the 3rd byte of each 20-byte block
  pos++;
  if (pos >= 20) pos = 0;
}
Since I am learning Scala, I would like to know if there is a better way of looping byte by byte in Scala.
I have read the file into a byte array in Scala as below:
val result = IOUtils.toByteArray(new FileInputStream(new File(fileDir)))
Thanks.
Here is a diametrically opposite solution to that of Tzach Zohar:
def parallel(ba: Array[Byte], blockSize: Int = 2048): Unit = {
  val n = ba.size
  val numJobs = (n + blockSize - 1) / blockSize
  (0 until numJobs).par.foreach { i =>
    val startIdx = i * blockSize
    val endIdx = n min ((i + 1) * blockSize)
    // first index >= startIdx that is congruent to 3 modulo 20
    var j = startIdx + ((3 - startIdx) % 20 + 20) % 20
    while (j < endIdx) {
      ba(j) = 32
      j += 20
    }
  }
}
You see a lot of mutable variables, scary imperative while-loops, and some strange tricks with modular arithmetic. That's actually not idiomatic Scala at all. But the interesting thing about this solution is that it processes blocks of the byte array in parallel. I've compared the time needed by this solution to your naive solution, using various block sizes:
Naive: 38.196
Parallel( 16): 11.676000
Parallel( 32): 7.260000
Parallel( 64): 4.311000
Parallel( 128): 2.757000
Parallel( 256): 2.473000
Parallel( 512): 2.462000
Parallel(1024): 2.435000
Parallel(2048): 2.444000
Parallel(4096): 2.416000
Parallel(8192): 2.420000
At least in this not very thorough microbenchmark (1000 repetitions on a 10MB array), the more-or-less efficiently implemented parallel version outperformed the for-loop in your question by a factor of about 15.
The question is now: What do you mean by "better"?
My proposal was slightly faster than your naive approach.
@TzachZohar's functional solution could generalize better should the code be moved to a cluster like Apache Spark.
I would usually prefer something closer to @TzachZohar's solution, because it's easier to read.
So, it depends on what you are optimizing for: performance? generality? readability? maintainability? For each notion of "better", you could get a different answer. I've tried to optimize for performance. @TzachZohar optimized for readability and maintainability. That led to two rather different solutions.
Full code of the microbenchmark, just in case someone is interested:
val array = Array.ofDim[Byte](10000000)
def naive(ba: Array[Byte]): Unit = {
  var pos = 0
  for (i <- 0 until ba.size) {
    if (pos == 3) ba(i) = 32
    pos += 1
    if (pos == 20) pos = 0
  }
}

def parallel(ba: Array[Byte], blockSize: Int): Unit = {
  val n = ba.size
  val numJobs = (n + blockSize - 1) / blockSize
  (0 until numJobs).par.foreach { i =>
    val startIdx = i * blockSize
    val endIdx = n min ((i + 1) * blockSize)
    var j = startIdx + ((3 - startIdx) % 20 + 20) % 20
    while (j < endIdx) {
      ba(j) = 32
      j += 20
    }
  }
}

def measureTime[U](repeats: Long)(block: => U): Double = {
  val start = System.currentTimeMillis
  var iteration = 0
  while (iteration < repeats) {
    iteration += 1
    block
  }
  val end = System.currentTimeMillis
  (end - start).toDouble / repeats
}

println("Basic sanity check (did I get the modulo arithmetic right?):")
{
  val testArray = Array.ofDim[Byte](50)
  naive(testArray)
  println(testArray.mkString("[", ",", "]"))
}

{
  for (blockSize <- List(3, 7, 13, 16, 17, 32)) {
    val testArray = Array.ofDim[Byte](50)
    parallel(testArray, blockSize)
    println(testArray.mkString("[", ",", "]"))
  }
}

val Reps = 1000
val naiveTime = measureTime(Reps)(naive(array))
println("Naive: " + naiveTime)

for (blockSize <- List(16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192)) {
  val parallelTime = measureTime(Reps)(parallel(array, blockSize))
  println("Parallel(%4d): %f".format(blockSize, parallelTime))
}
Here's one way to do this:
val updated = result.grouped(20).flatMap { arr =>
  if (arr.length > 3) arr.update(3, 32)  // guard the last group, which may be shorter than 20
  arr
}
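Note that grouped returns an Iterator, so the transformation above is lazy; nothing is touched until you consume it. A sketch of forcing it back into an array:

val updatedArray: Array[Byte] = updated.toArray  // materializes the lazy Iterator[Byte]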
Here's some data:
val data = List(1,1,2,2,3)
I would like to write a function filteredSum which supports the following:
/*1*/ filteredSum(data) // Outputs 0
/*2*/ filteredSum(data, 1) // Outputs 1 + 1 = 2
/*3*/ filteredSum(data, 1, 3) // Outputs 1 + 1 + 3 = 5
/*4*/ filteredSum(data, None) // Outputs 1 + 1 + 2 + 2 + 3 = 9
There are a couple of close misses; for instance, * notation supports the first three calls:
def filteredSum(data: Seq[Int], filterValues: Int*): Int = {
  data.intersect(filterValues).sum
}
And options give you the fourth:
def filteredSum(data: Seq[Int], filterValues: Option[Seq[Int]]): Int = {
  if (filterValues.nonEmpty) data.intersect(filterValues.get).sum
  else data.sum
}
But with this implementation the first three calls look a lot clunkier: filteredSum(data, Some(Seq(1))), for instance.
Any other ideas? (Obviously my actual use case is much more complicated than just adding some numbers together, so I'm not looking for answers that are too closely tied to the intersect or sum functions.)
Make two functions:
def filteredSum(data: Seq[Int], filterValues: Int*): Int =
  data.filter(filterValues.toSet).sum

def filteredSum(data: Seq[Int], all: Boolean): Int =
  if (all) data.sum else 0
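For what it's worth, a quick check that the two overloads cover all four call shapes from the question (the fourth call takes a Boolean argument instead of None):

val data = List(1, 1, 2, 2, 3)
filteredSum(data)              // 0: empty varargs, nothing passes the filter
filteredSum(data, 1)           // 2: both 1s are kept
filteredSum(data, 1, 3)        // 5
filteredSum(data, all = true)  // 9: sums everything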
Is it possible to modify the precedence of any self-defined operators?
For example, I implemented elementary arithmetic with entirely self-defined operators.
case class Memory(name: String) {
  var num: Num = null
  def <<=(x: Num): Unit = {
    println(s"assign ${x.value}")
    this.num = x
  }
}

case class Num(var value: Int) {
  def +|+(x: Num) = {
    println("%d + %d = %d".format(value, x.value, x.value + value))
    Num(value + x.value)
  }
  def *|*(x: Num) = {
    println("%d * %d = %d".format(value, x.value, x.value * value))
    Num(value * x.value)
  }
}
val m1 = Memory("R")
val a = Num(1)
val b = Num(3)
val c = Num(9)
val d = Num(12)
m1 <<= a *|* b +|+ c +|+ d *|* a
println("final m1 ",m1.num.value)
The results are
1 * 3 = 3
3 + 9 = 12
12 * 1 = 12
12 + 12 = 24
assign 24
(final m1 ,24)
Apparently the precedence is correct. I want *|* to have the same precedence as *, +|+ the same as +, and <<= to behave like =. How does Scala figure this out?
Answering the question about modifying operator precedence: to change it you basically just have to change the first character of your custom operator. This is how Scala figures out precedence for infix operators: by looking only at the first character. So if you, e.g., add an operator *|+:
It will have the same precedence as *|*, because both start with *.
It will have higher precedence than +|+, just like * binds tighter than +.
Unfortunately there is no other way to deal with it right now; there are no custom annotations, weights, and so on, which avoids making code too hard to read.
The precedence rules are very well summarized here: Operator precedence in Scala.
About your issue though: you do get the right results.
*|*, like *, is a left-associative operator whose first character is *, so the two have equal precedence.
Your operation:
a *|* b +|+ c +|+ d *|* a
Translates to
a * b + c + d * a, which is 1 * 3 + 9 + 12 * 1.
Applying standard precedence rules, it groups as (a*b) + c + (d*a), and the result is 24.
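You can check the grouping directly with the println instrumentation already in your operators: the operand of the tighter-binding operator is evaluated as a unit first. A small sketch with made-up values:

val p = Num(2) +|+ Num(3) *|* Num(4)
// parses as Num(2) +|+ (Num(3) *|* Num(4)), so it prints
// "3 * 4 = 12" first and then "2 + 12 = 14"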
NOTE: I am on Scala 2.8—can that be a problem?
Why can't I use the fold function the same way as foldLeft or foldRight?
In the Set scaladoc it says that:
The result of folding may only be a supertype of this parallel collection's type parameter T.
But I see no type parameter T in the function signature:
def fold [A1 >: A] (z: A1)(op: (A1, A1) ⇒ A1): A1
What is the difference between the foldLeft-Right and fold, and how do I use the latter?
EDIT: For example how would I write a fold to add all elements in a list? With foldLeft it would be:
val foo = List(1, 2, 3)
foo.foldLeft(0)(_ + _)
// now try fold:
foo.fold(0)(_ + _)
<console>:7: error: value fold is not a member of List[Int]
       foo.fold(0)(_ + _)
           ^
Short answer:
foldRight associates to the right. I.e. elements will be accumulated in right-to-left order:
List(a,b,c).foldRight(z)(f) = f(a, f(b, f(c, z)))
foldLeft associates to the left. I.e. an accumulator will be initialized and elements will be added to the accumulator in left-to-right order:
List(a,b,c).foldLeft(z)(f) = f(f(f(z, a), b), c)
fold leaves the order in which the elements are combined undefined, so the operation must be associative and z must be neutral for it. I.e. the arguments to fold must form a monoid.
fold, contrary to foldRight and foldLeft, does not offer any guarantee about the order in which the elements of the collection will be processed. You'll probably want to use fold, with its more constrained signature, with parallel collections, where the lack of a guaranteed processing order lets the collection fold in parallel. The reason for changing the signature is similar: with the additional constraints, it's easier to make a parallel fold.
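One way to see why the monoid constraint matters: on a parallel collection the zero element may be used once per chunk, so a zero that is not neutral for the operation gives a wrong, and possibly nondeterministic, result. A small sketch (on Scala 2.13+ this assumes the separate scala-parallel-collections module):

import scala.collection.parallel.CollectionConverters._  // Scala 2.13+ only

List(1, 2, 3, 4).par.fold(0)(_ + _)    // always 10: 0 is neutral for +
List(1, 2, 3, 4).par.fold(100)(_ + _)  // 110? 210? depends on how the work is split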
You're right about the old version of Scala being a problem. If you look at the scaladoc page for Scala 2.8.1, you'll see no fold defined there (which is consistent with your error message). Apparently, fold was introduced in Scala 2.9.
For your particular example you would code it the same way you would with foldLeft.
val ns = List(1, 2, 3, 4)
val s0 = ns.foldLeft (0) (_+_) //10
val s1 = ns.fold (0) (_+_) //10
assert(s0 == s1)
I agree with the other answers; I thought I'd give a simple illustrative example:
object MyClass {
  def main(args: Array[String]) {
    val numbers = List(5, 4, 8, 6, 2)

    val a = numbers.fold(0) { (z, i) =>
      println("fold val1 " + z + " val2 " + i)
      z + i
    }
    println(a)

    val b = numbers.foldLeft(0) { (z, i) =>
      println("foldleft val1 " + z + " val2 " + i)
      z + i
    }
    println(b)

    val c = numbers.foldRight(0) { (z, i) =>
      println("fold right val1 " + z + " val2 " + i)
      z + i
    }
    println(c)
  }
}
The result is self-explanatory:
fold val1 0 val2 5
fold val1 5 val2 4
fold val1 9 val2 8
fold val1 17 val2 6
fold val1 23 val2 2
25
foldleft val1 0 val2 5
foldleft val1 5 val2 4
foldleft val1 9 val2 8
foldleft val1 17 val2 6
foldleft val1 23 val2 2
25
fold right val1 2 val2 0
fold right val1 6 val2 2
fold right val1 8 val2 8
fold right val1 4 val2 16
fold right val1 5 val2 20
25
There are two ways to solve problems: iteratively and recursively. Let's understand this with a simple example: let's write a function that sums the numbers up to a given number.
For example, for input 5 the output should be 15, as shown below.
Input: 5
Output: (1+2+3+4+5) = 15
Iterative solution
Iterate from 1 to 5 and add up each element.
def sumNumber(num: Int): Long = {
  var sum = 0
  for (i <- 1 to num) {
    sum += i
  }
  sum
}
Recursive solution
Break the bigger problem down into smaller problems and solve those.
def sumNumberRec(num: Int, sum: Int = 0): Long = {
  if (num == 0) {
    sum
  } else {
    val newNum = num - 1
    val newSum = sum + num
    sumNumberRec(newNum, newSum)
  }
}
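Both versions agree with the closed form n * (n + 1) / 2; a quick check:

assert(sumNumber(5) == 15L)
assert(sumNumberRec(5) == 15L)  // 5 * 6 / 2 == 15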
foldLeft is an iterative solution; foldRight is a recursive solution. Neither of them memoizes anything to improve the complexity, so if you run foldRight and foldLeft on a small list, both will give you a result with similar performance. However, if you try to run foldRight on a long list, it might throw a StackOverflowError (depending on your stack size), while foldLeft on the same list runs without error.
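A quick way to see the difference on a large list (the exact limit depends on your stack size; note that newer Scala versions implement List.foldRight via reverse-and-foldLeft, so it no longer overflows there):

val big = (1 to 1000000).toList
big.foldLeft(0L)(_ + _)      // fine: one loop iteration per element
// big.foldRight(0L)(_ + _)  // on older Scala versions: StackOverflowError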
fold() on a parallel collection does its processing in parallel, so it does not guarantee the processing order, whereas foldLeft and foldRight process the items sequentially, from left to right (in the case of foldLeft) or from right to left (in the case of foldRight).
Examples of summing the list:
val numList = List(1, 2, 3, 4, 5)
val r1 = numList.par.fold(0)((acc, value) => {
  println("adding accumulator=" + acc + ", value=" + value + " => " + (acc + value))
  acc + value
})
println("fold(): " + r1)
println("#######################")
/*
 * You can see from the output that
 * fold processes the elements of the parallel collection in parallel,
 * so it is a parallel, not a linear, operation.
 *
 * adding accumulator=0, value=4 => 4
 * adding accumulator=0, value=3 => 3
 * adding accumulator=0, value=1 => 1
 * adding accumulator=0, value=5 => 5
 * adding accumulator=4, value=5 => 9
 * adding accumulator=0, value=2 => 2
 * adding accumulator=3, value=9 => 12
 * adding accumulator=1, value=2 => 3
 * adding accumulator=3, value=12 => 15
 * fold(): 15
 */

val r2 = numList.par.foldLeft(0)((acc, value) => {
  println("adding accumulator=" + acc + ", value=" + value + " => " + (acc + value))
  acc + value
})
println("foldLeft(): " + r2)
println("#######################")
/*
 * You can see that foldLeft
 * picks elements from left to right,
 * i.e. foldLeft is a sequential operation.
 *
 * adding accumulator=0, value=1 => 1
 * adding accumulator=1, value=2 => 3
 * adding accumulator=3, value=3 => 6
 * adding accumulator=6, value=4 => 10
 * adding accumulator=10, value=5 => 15
 * foldLeft(): 15
 * #######################
 */

// Note: in foldRight the second argument is the accumulator.
val r3 = numList.par.foldRight(0)((value, acc) => {
  println("adding value=" + value + ", acc=" + acc + " => " + (value + acc))
  acc + value
})
println("foldRight(): " + r3)
println("#######################")
/*
 * You can see that foldRight
 * picks elements from right to left,
 * i.e. foldRight is a sequential operation.
 *
 * adding value=5, acc=0 => 5
 * adding value=4, acc=5 => 9
 * adding value=3, acc=9 => 12
 * adding value=2, acc=12 => 14
 * adding value=1, acc=14 => 15
 * foldRight(): 15
 * #######################
 */