The following Scala code completes in 1.5 minutes, while the equivalent Go code finishes in 2.5 minutes.
Up to fib(40) both take about 2 seconds; the gap appears at fib(50).
I was under the impression that Go, being natively compiled, should be faster than Scala.
Scala
def fib(n: Int): Long = {
  n match {
    case 0 => 0
    case 1 => 1
    case _ => fib(n - 1) + fib(n - 2)
  }
}
Go
func fib(n int) (ret int) {
    if n > 1 {
        return fib(n-1) + fib(n-2)
    }
    return n
}
Scala optimization?
Golang limitation?
As "My other car is a cadr" said the question is "how come Scala is faster than GO in this particular microbenchmark?"
Forget the Fibonacci lets say I do have a function that require recursion.
Is Scala superior in recursion situations?
Its probably an internal compiler implementation or even Scala specific optimization.
Please answer just if you know.
Go, using a loop, does 15000000000 in 12 sec:
func fib(n int) (two int) {
    one := 0
    two = 1
    for i := 1; i != n; i++ {
        one, two = two, (one + two)
    }
    return
}
For Go, use iteration, not recursion. Recursion can always be replaced by iteration with an explicit stack, which avoids the overhead of function calls and call-stack management. For example, using iteration and increasing n from 50 to 1000, it takes almost no time:
package main

import "fmt"

func fib(n int) (f int64) {
    if n < 0 {
        n = 0
    }
    a, b := int64(0), int64(1)
    for i := 0; i < n; i++ {
        f = a
        a, b = b, a+b
    }
    return
}

func main() {
    n := 1000
    fmt.Println(n, fib(n))
}
Output:
$ time ./fib
1000 8261794739546030242
real 0m0.001s
user 0m0.000s
sys 0m0.000s
Use appropriate algorithms. Avoid exponential time complexity. Don't use recursion for Fibonacci numbers when performance is important.
Reference:
Recursive Algorithms in Computer Science Courses: Fibonacci Numbers
and Binomial Coefficients
We observe that the computational inefficiency of branched recursive
functions was not appropriately covered in almost all textbooks for
computer science courses in the first three years of the curriculum.
Fibonacci numbers and binomial coefficients were frequently used as
examples of branched recursive functions. However, their exponential
time complexity was rarely claimed and never completely proved in the
textbooks. Alternative linear time iterative solutions were rarely
mentioned. We give very simple proofs that these recursive functions
have exponential time complexity.
Recursion is an efficient technique for definitions and algorithms
that make only one recursive call, but can be extremely inefficient if
it makes two or more recursive calls. Thus the recursive approach is
frequently more useful as a conceptual tool rather than as an
efficient computational tool. The proofs presented in this paper were
successfully taught (over a five-year period) to first year students
at the University of Ottawa. It is suggested that recursion as a
problem solving and defining tool be covered in the second part of the
first computer science course. However, recursive programming should
be postponed for the end of the course (or perhaps better at the
beginning of the second computer science course), after iterative
programs are well mastered and stack operation well understood.
The Scala solution will consume stack, since it's not tail recursive (the addition happens after the recursive call), but it shouldn't be creating any garbage at all.
Most likely, whichever HotSpot compiler you're using (probably the server compiler) is simply a better compiler for this code pattern than the Go compiler.
If you're really curious, you can download a debug build of the JVM, and have it print out the assembly code.
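If the stack consumption matters, the recursion can be rewritten with an accumulator so the Scala compiler turns it into a loop. A minimal sketch (fibTail and loop are illustrative names, not part of the original code); note it also changes the algorithm from exponential to linear time, so it no longer measures the same thing as the benchmark above:
import scala.annotation.tailrec

// Tail-recursive variant: @tailrec makes the compiler verify the call is in
// tail position and compile it down to a loop, so it runs in linear time
// and constant stack space.
def fibTail(n: Int): Long = {
  @tailrec
  def loop(i: Int, prev: Long, curr: Long): Long =
    if (i == 0) prev
    else loop(i - 1, curr, prev + curr)
  loop(n, 0L, 1L)
}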
Related
I am trying to implement RSA prime generation for P and Q based on the FIPS 186-4 specification. The specification describes two different implementations: Section 3.2 Provable Prime Construction vs. Section 3.3 Probable Prime Construction. Initially, I tried implementing the probable prime approach because it is easier to understand and implement, but I discovered it is very slow because of the number of iterations needed to find the P and Q primes (in the worst case it takes 15 minutes). Next, I decided to try the provable prime approach, but I found that the algorithm is much more complex and might be slow as well. Below are my two issues:
In Section C.10, Step 12, how do I eliminate the sqrt(2) from the expression x = floor(sqrt(2) * 2^(L-1)) + (x mod (2^L - floor(sqrt(2) * 2^(L-1)))) so that I can represent it with whole numbers using a BigNum representation?
In Section C.10, Step 14, is there a fast way to compute y in the interval [1, p2] such that 0 = (y * p0 * p1 - 1) mod p2? The specification doesn't describe a method for this. My initial thought was to perform a linear search starting from the integer 1 and going up, but that can be very slow because p2 can be a very large number.
I tried searching online for help on this issue, but I discovered that a lot of examples don't even comply with FIPS 186-4. I assume it is because these two methods are too slow.
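For what it's worth, both steps reduce to exact integer arithmetic. In Step 12, floor(sqrt(2) * 2^(L-1)) equals the integer square root of 2^(2L-1), and in Step 14, 0 = (y * p0 * p1 - 1) mod p2 just says that y is the modular inverse of p0 * p1 modulo p2, which the extended Euclidean algorithm finds without any search. A hedged Scala sketch using java.math.BigInteger (the helper names are made up; BigInteger.sqrt requires Java 9+):
import java.math.BigInteger

// Step 12: floor(sqrt(2) * 2^(L-1)) with no floating point.
// sqrt(2) * 2^(L-1) = sqrt(2^(2L-1)), so take the integer square root of 2^(2L-1).
def floorSqrt2TimesPow2(L: Int): BigInteger =
  BigInteger.ONE.shiftLeft(2 * L - 1).sqrt()

// Step 14: 0 = (y * p0 * p1 - 1) mod p2  <=>  y = (p0 * p1)^(-1) mod p2.
// modInverse runs the extended Euclidean algorithm; the result lies in [1, p2 - 1].
def stepFourteenY(p0: BigInteger, p1: BigInteger, p2: BigInteger): BigInteger =
  p0.multiply(p1).modInverse(p2)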
I want to print integer numbers from 1 up to a random integer number (e.g. up to a random integer < 10, as in the following case). I can use, e.g., one of the following ways:
1)
val randomNumber=r.nextInt(10)
for (i <- 0 to randomNumber) {println(i)}
2)
for (i <- 0 to r.nextInt(10)) {println(i)}
My question is the following: is there a difference between 1) and 2) in terms of computation? It is clear that in the first case the random number r.nextInt(10) is computed only once and then assigned to
the variable randomNumber, but what about the second case? Is the part r.nextInt(10) computed only once at the beginning of the loop, or is it computed for each iteration of the loop? If the latter, is the first variant better from a computational point of view? I know that this is an easy example, but there can be much more complex for loops where such an optimization can be very helpful. If the expression r.nextInt(10) in the second case is computed only once, what about expressions that are functions of the variable i, something like for (i <- 0 to fce(i)) {println(i)}? I guess that fce(i) would have to be evaluated on every iteration of the loop.
Thanks for help.
Andrew
No, there are no differences between 1) and 2).
The bound for the for loop will be evaluated once, and then the for loop will repeat this now fixed number of times.
Things would have been very different had you compared these two pieces of code:
val randomNumber=r.nextInt(10)
for (i <- 0 to 10) {println(randomNumber)}
and
for (i <- 0 to 10) {println(r.nextInt(10))}
The first for loop will print the same value on every iteration; the second will print a fresh random value each time.
But you could still rewrite the second as
def randomNumber=r.nextInt(10)
for (i <- 0 to 10) {println(randomNumber)}
Changing the val to a def causes randomNumber to be re-evaluated every time it is referenced in the loop body.
Now, about
for (i <- 0 to function(i)) {doSomething()}
This doesn't compile: the variable i is only in scope inside the body of the for loop, and hence cannot be used to compute the loop bound.
I hope this clarifies things a bit!
The two versions do the same thing, and to see why you need to realise that for in Scala works very differently from how it works in other languages like C/C++.
In Scala, for is just a bit of syntax that makes it easier to chain map/foreach, flatMap, and withFilter methods together. It makes functional code appear more like imperative code, but it is still executed in a functional way.
So your second chunk of code
for (i <- 0 to r.nextInt(10)) {println(i)}
is changed by the compiler into something like
(0 to r.nextInt(10)).foreach{ i => println(i) }
Looking at it this way, you can see that the range is computed first (using a single call to nextInt) and then foreach is called on that range. Therefore nextInt is called the same number of times as in your first chunk of code.
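A quick way to convince yourself of this (a throwaway sketch, not part of either answer) is to put a side effect in the bound and watch it fire only once:
def bound(): Int = {
  println("bound evaluated") // printed exactly once, before the loop starts
  3
}

for (i <- 0 to bound()) println(i)
// prints: bound evaluated, then 0, 1, 2, 3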
Does the Swift compiler use fusion to optimise code?
Let's say we want to write code to calculate the sum of the square roots of all positive numbers in a list. In Haskell you can write
sumOfSquareRoots xs = sum (allSquareRoots (filterPositives xs))
  where
    allSquareRoots []     = []
    allSquareRoots (x:xs) = sqrt x : allSquareRoots xs

    filterPositives [] = []
    filterPositives (x:xs)
      | x > 0     = x : filterPositives xs
      | otherwise = filterPositives xs
This code is quite easy to read (the first line is very neat, almost English; the parts after the where are local). This style also makes use of powerful built-in functions such as sum, and we could make the other functions public and have them reused. So, good style.
However, we might be concerned that it is less efficient than a one-pass function. (It passes through the list once for filterPositives, again to take allSquareRoots of the result, and finally to sum everything up.) Due to Haskell's so-called lazy evaluation strategy, however, the overhead is significantly smaller than in most other languages. Moreover, a good Haskell compiler can usually derive the one-traversal version from the more elegant multiple-traversal version using a process called fusion.
My question - does the Swift compiler deploy such optimisation strategies when compiling recursive functions?
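As an aside rather than an answer about Swift: in Scala, which appears elsewhere on this page, a comparable single-pass pipeline can be obtained explicitly with lazy views instead of relying on compiler fusion. A minimal sketch:
// .view makes filter and map lazy, so filtering, square-rooting and summing
// happen in one pass over the list, without building intermediate collections.
// (This only illustrates the idea of fusion; it says nothing about Swift.)
def sumOfSquareRoots(xs: List[Double]): Double =
  xs.view.filter(_ > 0).map(math.sqrt).sum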
I've been trying to find a way to count the number of times sets of Strings occur in a transaction database (implementing the Apriori algorithm in a distributed fashion). The code I have currently is as follows:
val cand_br = sc.broadcast(cand)
transactions
  .flatMap(trans => freq(trans, cand_br.value))
  .reduceByKey(_ + _)
def freq(trans: Set[String], cand: Array[Set[String]]): Array[(Set[String], Int)] = {
  val res = ArrayBuffer[(Set[String], Int)]()
  for (c <- cand) {
    if (c.subsetOf(trans)) {
      res += ((c, 1))
    }
  }
  res.toArray
}
transactions starts out as an RDD[Set[String]], and I'm trying to convert it to an RDD[(K, V)], with K every element in cand and V the number of occurrences of that element in the transaction list.
When watching performance on the UI, the flatMap stage takes about 3 min to finish, whereas the rest takes < 1 ms.
transactions.count() ~= 88000 and cand.length ~= 24000, to give an idea of the data I'm dealing with. I've tried different ways of persisting the data, but I'm pretty positive that it's an algorithmic problem I'm faced with.
Is there a better solution to this subproblem?
PS: I'm fairly new to Scala and the Spark framework, so there might be some strange constructions in this code.
Probably, the right question to ask in this case is: "what is the time complexity of this algorithm?" I think it is largely unrelated to Spark's flatMap operation.
Rough O-complexity analysis
Given two collections of Sets, of sizes m and n, this algorithm counts how many elements of one collection are a subset of elements of the other collection, so it looks like complexity m x n. Looking one level deeper, we also see that subsetOf is linear in the number of elements of the subset (x subsetOf y is essentially x forall y.contains), so the complexity is actually m x n x s, where s is the cardinality of the subsets being checked.
In other words, this flatMap operation has a lot of work to do.
Going Parallel
Now, going back to Spark, we can also observe that this algorithm is embarrassingly parallel, so we can use Spark's capabilities to our advantage.
To compare some approaches, I loaded the 'retail' dataset [1] and ran the algorithm on val cand = transactions.filter(_.size<4).collect. The data size is close to that in the question:
Transactions.count = 88162
cand.size = 15451
Some comparative runs on local mode:
Vanilla: 1.2 minutes
Increasing the number of transactions partitions to the # of cores (8): 33 secs
I also tried an alternative implementation, using cartesian instead of flatMap:
transactions
  .cartesian(candRDD)
  .map { case (tx, cd) => (cd, if (cd.subsetOf(tx)) 1 else 0) }
  .reduceByKey(_ + _)
  .collect
But that resulted in much longer runs, as seen in the top 2 lines of the Spark UI (cartesian, and cartesian with a higher number of partitions): 2.5 min.
Given I only have 8 logical cores available, going above that does not help.
Sanity checks:
Is there any added "Spark flatMap time complexity"? Probably some, as it involves serializing closures and unpacking collections, but it is negligible in comparison with the function being executed.
Let's see if we can do a better job: I implemented the same algorithm in plain Scala:
val resLocal = reduceByKey(transLocal.flatMap(trans => freq(trans, cand)))
Where the reduceByKey operation is a naive implementation taken from [2]
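Reference [2] is not reproduced here, but a naive, sequential reduceByKey along the following lines would do for the comparison; this is only a sketch, not necessarily the exact code used:
// Naive local reduceByKey: group the (key, value) pairs by key,
// then sum the values within each group. Purely sequential.
def reduceByKey[K](pairs: Iterable[(K, Int)]): Map[K, Int] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }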
Execution time: 3.67 seconds.
Spark gives you parallelism out of the box. This implementation is totally sequential and therefore takes longer to complete.
Last sanity check: a trivial flatMap operation:
transactions
  .flatMap(trans => Seq((trans, 1)))
  .reduceByKey(_ + _)
  .collect
Execution time: 0.88 secs
Conclusions:
Spark buys you parallelism and clustering, and this algorithm can take advantage of both. Use more cores and partition the input data accordingly.
There's nothing wrong with flatMap. The time complexity prize goes to the function inside it.
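Concretely, the partitioning tweak mentioned above is a one-line change before the expensive stage. A sketch (the partition count of 8 and the counts name are just for illustration):
// Spread the transactions over as many partitions as there are cores,
// so the costly freq/flatMap stage runs in parallel on all of them.
val counts = transactions
  .repartition(8)
  .flatMap(trans => freq(trans, cand_br.value))
  .reduceByKey(_ + _)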
I'm trying to implement a Count-Min Sketch algorithm in Scala, and so I need to generate k pairwise independent hash functions.
This is lower-level than anything I've ever programmed before, and I don't know much about hash functions beyond what I learned in algorithms classes, so my question is: how do I generate these k pairwise independent hash functions?
Am I supposed to use a hash function like MD5 or MurmurHash? Do I just generate k hash functions of the form f(x) = ax + b (mod p), where p is a prime and a and b are random integers? (i.e., the universal hashing family everyone learns in algorithms 101)
I'm looking more for simplicity than raw speed (e.g., I'll take something 5x slower if it's simpler to implement).
Scala already has MurmurHash implemented (it's scala.util.MurmurHash). It's very fast and very good at distributing values. A cryptographic hash is overkill--you'll just take tens or hundreds of times longer than you need to. Just pick k different seeds to start with and, since it's nearly cryptographic in quality, you'll get k largely independent hash codes. (In 2.10, you should probably switch to using scala.util.hashing.MurmurHash3; the usage is rather different but you can still do the same thing with mixing.)
If you only need near values to be mapped to randomly far values this will work; if you want to avoid collisions (i.e. if A and B collide using hash 1 they will probably not also collide using hash 2), then you'll need to go at least one more step and hash not the whole object but subcomponents of it so there's an opportunity for the hashes to start out different.
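As a concrete illustration of the "k different seeds" idea, here is a hedged sketch against the scala.util.hashing.MurmurHash3 API (makeHashes and its parameters are made-up names):
import scala.util.hashing.MurmurHash3
import scala.util.Random

// Build k hash functions from one algorithm by fixing k different seeds;
// each function maps a String to a bucket index in [0, width).
def makeHashes(k: Int, width: Int, rng: Random = new Random(42)): Seq[String => Int] = {
  val seeds = Seq.fill(k)(rng.nextInt())
  seeds.map { seed => (s: String) =>
    ((MurmurHash3.stringHash(s, seed) % width) + width) % width
  }
}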
Probably the simplest approach is to take some cryptographic hash function and "seed" it with different sequences of bytes. For most practical purposes, the results should be independent, as this is one of the key properties a cryptographic hash function should have (if you replace any part of a message, the hash should be completely different).
I'd do something like:
// for each 0 <= i < k generate a sequence of random numbers
val randomSeeds: Array[Array[Byte]] = ... // initialize by random sequences

def hash(i: Int, value: Array[Byte]): Array[Byte] = {
  val dg = java.security.MessageDigest.getInstance("SHA-1")
  // "seed" the digest by a random value based on the index
  dg.update(randomSeeds(i))
  // if you need integer hash values, just take 4 bytes
  // of the result and convert them to an int
  dg.digest(value)
}
Edit:
I don't know the precise requirements of the Count-Min Sketch; maybe a simple hash function would suffice, but it doesn't seem to be the simplest solution.
I suggested a cryptographic hash function, because there you have quite strong guarantees that the resulting hash functions will be very different, and it's easy to implement, just use the standard libraries.
On the other hand, if you have two hash functions of the form f1(x) = ax + b (mod p) and f2(x) = cx + d (mod p), then you can compute one from the other (without knowing x) using a simple linear formula, f2(x) = c / a * (f1(x) - b) + d (mod p), which suggests that they aren't very independent. So you could run into unexpected problems here.
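For reference, the ax + b family mentioned in the question looks like the following when every hash function gets its own random (a, b). This is only a sketch of the textbook construction (the names are made up), and it does not settle whether it is the right choice here:
import scala.util.Random

// h(x) = ((a*x + b) mod p) mod m with p = 2^31 - 1 (a Mersenne prime) and a
// fresh random (a, b) per function. Keys are assumed to be non-negative Ints.
val p = 2147483647L // 2^31 - 1

def makeUniversalHashes(k: Int, m: Int, rng: Random = new Random(1)): Seq[Int => Int] =
  Seq.fill(k) {
    val a = 1L + rng.nextInt(Int.MaxValue - 1) // a in [1, p - 1]
    val b = rng.nextInt(Int.MaxValue).toLong   // b in [0, p - 2]
    (x: Int) => (((a * x + b) % p) % m).toInt
  }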