I have two arrays (that i have pulled out of a matrix (Array[Array[Int]]) and I need to subtract one from the other.
At the moment I am using this method however, when I profile it, it is the bottleneck.
def subRows(a: Array[Int], b: Array[Int], sizeHint: Int): Array[Int] = {
val l: Array[Int] = new Array(sizeHint)
var i = 0
while (i < sizeHint) {
l(i) = a(i) - b(i)
i += 1
}
l
}
I need to do this billions of times so any improvement in speed is a plus.
I have tried using a List instead of an Array to collect the differences and it is MUCH faster but I lose all benefit when I convert it back to an Array.
I did modify the downstream code to take a List to see if that would help but I need to access the contents of the list out of order so again there is loss of any gains there.
It seems like any conversion of one type to another is expensive and I am wondering if there is some way to use a map etc. that might be faster.
Is there a better way?
EDIT
Not sure what I did the first time!?
So the code I used to test it was this:
def subRowsArray(a: Array[Int], b: Array[Int], sizeHint: Int): Array[Int] = {
val l: Array[Int] = new Array(sizeHint)
var i = 0
while (i < sizeHint) {
l(i) = a(i) - b(i)
i += 1
}
l
}
def subRowsList(a: Array[Int], b: Array[Int], sizeHint: Int): List[Int] = {
var l: List[Int] = Nil
var i = 0
while (i < sizeHint) {
l = a(i) - b(i) :: l
i += 1
}
l
}
val a = Array.fill(100, 100)(scala.util.Random.nextInt(2))
val loops = 30000 * 10000
def runArray = for (i <- 1 to loops) subRowsArray(a(scala.util.Random.nextInt(100)), a(scala.util.Random.nextInt(100)), 100)
def runList = for (i <- 1 to loops) subRowsList(a(scala.util.Random.nextInt(100)), a(scala.util.Random.nextInt(100)), 100)
def optTimer(f: => Unit) = {
val s = System.currentTimeMillis
f
System.currentTimeMillis - s
}
The results I thought I got the first time I did this were the exact opposite... I must have misread or mixed up the methods.
My apologies for asking a bad question.
That code is the fastest you can manage single-threaded using a standard JVM. If you think List is faster, you're either fooling yourself or not actually telling us what you're doing. Putting an Int into List requires two object creations: one to create the list element, and one to box the integer. Object creations take about 10x longer than an array access. So it's really not a winning proposition to do it any other way.
If you really, really need to go faster, and must stay with a single thread, you should probably switch to C++ or the like and explicitly use SSE instructions. See this question, for example.
If you really, really need to go faster and can use multiple threads, then the easiest is to package up a chunk of work like this (i.e. a sensible number of pairs of vectors that need to be subtracted--probably at least a few million elements per chunk) into a list as long as the number of processors on your machine, and then call list.par.map(yourSubtractionRoutineThatActsOnTheChunkOfWork).
Finally, if you can be destructive,
a(i) -= b(i)
in the inner loop is, of course, faster. Likewise, if you can reuse space (e.g. with System.arraycopy), you're better off than if you have to keep allocating it. But that changes the interface from what you've shown.
You can use Scalameter to try a benchmark the two implementations which requires at least JRE 7 update 4 and Scala 2.10 to be run. I used scala 2.10 RC2.
Compile with scalac -cp scalameter_2.10-0.2.jar RangeBenchmark.scala.
Run with scala -cp scalameter_2.10-0.2.jar:. RangeBenchmark.
Here's the code I used:
import org.scalameter.api._
object RangeBenchmark extends PerformanceTest.Microbenchmark {
val limit = 100
val a = new Array[Int](limit)
val b = new Array[Int](limit)
val array: Array[Int] = new Array(limit)
var list: List[Int] = Nil
val ranges = for {
size <- Gen.single("size")(limit)
} yield 0 until size
measure method "subRowsArray" in {
using(ranges) curve("Range") in {
var i = 0
while (i < limit) {
array(i) = a(i) - b(i)
i += 1
}
r => array
}
}
measure method "subRowsList" in {
using(ranges) curve("Range") in {
var i = 0
while (i < limit) {
list = a(i) - b(i) :: list
i += 1
}
r => list
}
}
}
Here's the results:
::Benchmark subRowsArray::
Parameters(size -> 100): 8.26E-4
::Benchmark subRowsList::
Parameters(size -> 100): 7.94E-4
You can draw your own conclusions. :)
The stack blew up on larger values of limit. I'll guess it's because it's measuring the performance many times.
Related
I am not asking should i use recursion or iteration, or which is faster between them. I was trying to understand the iteration and recursion time taken, and I come up with an interesting pattern in the time taken of the both, which was what ever was on the top of the file is taking more time than the other.
For example: If I am writing for loop in the beginning it is taking more time than recursion and vice versa. The difference between time taken in both of the process are significently huge aprox 30 to 40 times.
My questions are:-
Is order of a loop and recursion matters?
Is there something related to print?
What could be the possible reason for such behaviour?
following is the code I have in the same file and the language I am using is scala?
def count(x: Int): Unit = {
if (x <= 1000) {
print(s"$x ")
count(x + 1)
}
}
val t3 = System.currentTimeMillis()
count(1)
val t4 = System.currentTimeMillis()
println(s"\ntime taken by the recursion look = ${t4 - t3} mili second")
var c = 1
val t1 = System.currentTimeMillis()
while(c <= 1000)
{
print(s"$c ")
c+=1
}
val t2 = System.currentTimeMillis()
println(s"\ntime taken by the while loop = ${t2 - t1} mili second")
In this situation the time taken for recursion and while loop are 986ms, 20ms respectively.
When I switch the position of loop and recursion which means first loop then recursion, time taken for recursion and while loop are 1.69 sec and 28 ms respectively.
Edit 1:
I can see the same behaviour with bufferWriter if the recursion code is on the top. But not the case when recursion is below the loop. When recursion is below the loop it is taking almost same time with the difference of 2 to 3 ms.
If you wanted to convince yourself that the tailrec-optimization works, without relying on any profiling tools, here is what you could try:
Use way more iterations
Throw away the first few iterations to give the JIT the time to wake up and do the hotspot-optimizations
Throw away all unpredictable side effects like printing to stdout
Throw away all costly operations that are the same in both approaches (formatting numbers etc.)
Measure in multiple rounds
Randomize the number of repetitions in each round
Randomize the order of variants within each round, to avoid any "catastrophic resonance" with the cycles of the garbage collector
Preferably, don't run anything else on the computer
Something along these lines:
def compare(
xs: Array[(String, () => Unit)],
maxRepsPerBlock: Int = 10000,
totalRounds: Int = 100000,
warmupRounds: Int = 1000
): Unit = {
val n = xs.size
val times: Array[Long] = Array.ofDim[Long](n)
val rng = new util.Random
val indices = (0 until n).toList
var totalReps: Long = 0
for (round <- 1 to totalRounds) {
val order = rng.shuffle(indices)
val reps = rng.nextInt(maxRepsPerBlock / 2) + maxRepsPerBlock / 2
for (i <- order) {
var r = 0
while (r < reps) {
r += 1
val start = System.currentTimeMillis
(xs(i)._2)()
val end = System.currentTimeMillis
if (round > warmupRounds) {
times(i) += (end - start)
}
}
}
if (round > warmupRounds) {
totalReps += reps
}
}
for (i <- 0 until n) {
println(f"${xs(i)._1}%20s : ${times(i) / totalReps.toDouble}")
}
}
def gaussSumWhile(n: Int): Long = {
var acc: Long = 0
var i = 0
while (i <= n) {
acc += i
i += 1
}
acc
}
#annotation.tailrec
def gaussSumRec(n: Int, acc: Long = 0, i: Int = 0): Long = {
if (i <= n) gaussSumRec(n, acc + i, i + 1)
else acc
}
compare(Array(
("while", { () => gaussSumWhile(1000) }),
("#tailrec", { () => gaussSumRec(1000) })
))
Here is what it prints:
while : 6.737733046257334E-5
#tailrec : 6.70325653896487E-5
Even the simple hints above are sufficient for creating a benchmark that shows that the while loop and the tail-recursive function take roughly the same time.
Scala does not compile into machine code but into bytecode for the "Java Virtual Machine"(JVM) which then interprets that code on the native processor. The JVM uses multiple mechanisms to optimise code that is run frequently, eventually converting the frequently-called functions ("hotspots") into pure machine code.
This means that testing the first run of a function does not give a good measure of eventual performance. You need to "warm up" the JIT compiler by running the test code many times before attempting to measure the time taken.
Also, as noted in the comments, doing any kind of I/O is going to make timings very unreliable because there is a danger that the I/O will block. Write a test case that does not do any blocking, if possible.
I am using a for comprehension on a stream and I would like to know how many iterations took to get o the final results.
In code:
var count = 0
for {
xs <- xs_generator
x <- xs
count = count + 1 //doesn't work!!
if (x prop)
yield x
}
Is there a way to achieve this?
Edit: If you don't want to return only the first item, but the entire stream of solutions, take a look at the second part.
Edit-2: Shorter version with zipWithIndex appended.
It's not entirely clear what you are attempting to do. To me it seems as if you are trying to find something in a stream of lists, and additionaly save the number of checked elements.
If this is what you want, consider doing something like this:
/** Returns `x` that satisfies predicate `prop`
* as well the the total number of tested `x`s
*/
def findTheX(): (Int, Int) = {
val xs_generator = Stream.from(1).map(a => (1 to a).toList).take(1000)
var count = 0
def prop(x: Int): Boolean = x % 317 == 0
for (xs <- xs_generator; x <- xs) {
count += 1
if (prop(x)) {
return (x, count)
}
}
throw new Exception("No solution exists")
}
println(findTheX())
// prints:
// (317,50403)
Several important points:
Scala's for-comprehension have nothing to do with Python's "yield". Just in case you thought they did: re-read the documentation on for-comprehensions.
There is no built-in syntax for breaking out of for-comprehensions. It's better to wrap it into a function, and then call return. There is also breakable though, but it works with Exceptions.
The function returns the found item and the total count of checked items, therefore the return type is (Int, Int).
The error in the end after the for-comprehension is to ensure that the return type is Nothing <: (Int, Int) instead of Unit, which is not a subtype of (Int, Int).
Think twice when you want to use Stream for such purposes in this way: after generating the first few elements, the Stream holds them in memory. This might lead to "GC-overhead limit exceeded"-errors if the Stream isn't used properly.
Just to emphasize it again: the yield in Scala for-comprehensions is unrelated to Python's yield. Scala has no built-in support for coroutines and generators. You don't need them as often as you might think, but it requires some readjustment.
EDIT
I've re-read your question again. In case that you want an entire stream of solutions together with a counter of how many different xs have been checked, you might use something like that instead:
val xs_generator = Stream.from(1).map(a => (1 to a).toList)
var count = 0
def prop(x: Int): Boolean = x % 317 == 0
val xsWithCounter = for {
xs <- xs_generator;
x <- xs
_ = { count = count + 1 }
if (prop(x))
} yield (x, count)
println(xsWithCounter.take(10).toList)
// prints:
// List(
// (317,50403), (317,50721), (317,51040), (317,51360), (317,51681),
// (317,52003), (317,52326), (317,52650), (317,52975), (317,53301)
// )
Note the _ = { ... } part. There is a limited number of things that can occur in a for-comprehension:
generators (the x <- things)
filters/guards (if-s)
value definitions
Here, we sort-of abuse the value-definition syntax to update the counter. We use the block { counter += 1 } as the right hand side of the assignment. It returns Unit. Since we don't need the result of the block, we use _ as the left hand side of the assignment. In this way, this block is executed once for every x.
EDIT-2
If mutating the counter is not your main goal, you can of course use the zipWithIndex directly:
val xsWithCounter =
xs_generator.flatten.zipWithIndex.filter{x => prop(x._1)}
It gives almost the same result as the previous version, but the indices are shifted by -1 (it's the indices, not the number of tried x-s).
I'm trying to write some code as below -
def kthSmallest(matrix: Array[Array[Int]], k: Int): Int = {
val pq = new PriorityQueue[Int]() //natural ordering
var count = 0
for (
i <- matrix.indices;
j <- matrix(0).indices
) yield {
pq += matrix(i)(j)
count += 1
} //This would yield Any!
pq.dequeue() //kth smallest.
}
My question is, that I only want to loop till the time count is less than k (something like takeWhile(count != k)), but as I'm also inserting elements into the priority queue in the yield, this won't work in the current state.
My other options are to write a nested loop and return once count reaches k. Is it possible to do with yield? I could not find a idiomatic way of doing it yet. Any pointers would be helpful.
It's not idiomatic for Scala to use vars or break loops. You can go for recursion, lazy evaluation or duct tape a break, giving up on some performance (just like return, it's implemented as an Exception, and won't perform well enough). Here are the options broken down:
Use recursion - recursive algorithms are the analog of loops in functional languages
def kthSmallest(matrix: Array[Array[Int]], k: Int): Int = {
val pq = new PriorityQueue[Int]() //natural ordering
#tailrec
def fillQueue(i: Int, j: Int, count: Int): Unit =
if (count >= k || i >= matrix.length) ()
else {
pq += matrix(i)(j)
fillQueue(
if (j >= matrix(i).length - 1) i + 1 else i,
if (j >= matrix(i).length - 1) 0 else j + 1,
count + 1)
}
fillQueue(0, 0, 0)
pq.dequeue() //kth smallest.
}
Use a lazy structure, as chengpohi suggested - this doesn't sound very much like a pure function though. I'd suggest to use an Iterator instead of a Stream in this case though - as iterators don't memoize the steps they've gone through (might spare some memory for large matrices).
For those desperately willing to use break, Scala supports it in an attachable fashion (note the performance caveat mentioned above):
import scala.util.control.Breaks
breakable {
// loop code
break
}
There is a way using the Stream lazy evaluation to do this. Since for yield is equal to flatMap, you can convert for yield to flatMap with Stream:
matrix.indices.toStream.flatMap(i => {
matrix(0).indices.toStream.map(j => {
pq += matrix(i)(j)
count += 1
})
}).takeWhile(_ => count <= k)
Use toStream to convert the collection to Stream, and Since Stream is lazy evaluation, so we can use takeWhile to predicate count to terminate the less loops without init others.
I would like to know the complexity of converting scala collection operations like following ones :
List.fill(n)(1).toArray
Array.fill(n)(1).toList
ArrayBuffer( Array.fill(n)(1):_* )
I suppose that for those exemples we need to loop over all elements so it will be O(n), unfortunately i don't know the subroutines under those conversions so the complexity may be optimized.
Don't hesitate to add complexity for others kind of scala conversions.
I took a quick look at the source code, and they all appear to be O(n) as you thought.
Here's for example the subroutine copyToArray (used by toArray):
override /*TraversableLike*/ def copyToArray[B >: A](xs: Array[B], start: Int, len: Int) {
var i = start
val end = (start + len) min xs.length
val it = iterator
while (i < end && it.hasNext) {
xs(i) = it.next()
i += 1
}
}
source
As you can see it simply iterates over the collection linearly.
I have following simple code
def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)
And it throws OutOfMemmoryError exception.
I can not understand why, because I think all the parts use constant memmory i.e. lazy evaluation streams and foldLeft...
Those code also don't work
fib(1,1).take(10000000).sum or max, min e.t.c.
How to correctly implement infinite streams and do iterative operations upon it?
Scala version: 2.9.0
Also scala javadoc said, that foldLeft operation is memmory safe for streams
/** Stream specialization of foldLeft which allows GC to collect
* along the way.
*/
#tailrec
override final def foldLeft[B](z: B)(op: (B, A) => B): B = {
if (this.isEmpty) z
else tail.foldLeft(op(z, head))(op)
}
EDIT:
Implementation with iterators still not useful, since it throws ${domainName} exception
def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++ fib(j, i + j)
How to define correctly infinite stream/iterator in Scala?
EDIT2:
I don't care about int overflow, I just want to understand how to create infinite stream/iterator etc in scala without side effects .
The reason to use Stream instead of Iterator is so that you don't have to calculate all the small terms in the series over again. But this means that you need to store ten million stream nodes. These are pretty large, unfortunately, so that could be enough to overflow the default memory. The only realistic way to overcome this is to start with more memory (e.g. scala -J-Xmx2G). (Also, note that you're going to overflow Long by an enormous margin; the Fibonacci series increases pretty quickly.)
P.S. The iterator implementation I have in mind is completely different; you don't build it out of concatenated singleton Iterators:
def fib(i: Long, j: Long) = Iterator.iterate((i,j)){ case (a,b) => (b,a+b) }.map(_._1)
Now when you fold, past results can be discarded.
The OutOfMemoryError happens indenpendently from the fact that you use Stream. As Rex Kerr mentioned above, Stream -- unlike Iterator -- stores everything in memory. The difference with List is that the elements of Stream are calculated lazily, but once you reach 10000000, there will be 10000000 elements, just like List.
Try with new Array[Int](10000000), you will have the same problem.
To calculate the fibonacci number as above you may want to use different approach. You can take into account the fact that you only need to have two numbers, instead of the whole fibonacci numbers discovered so far.
For example:
scala> def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++ fib(j, i + j)
fib: (i: Long,j: Long)Iterator[Long]
And to get, for example, the index of the first fibonacci number exceeding 1000000:
scala> fib(1, 1).indexWhere(_ > 1000000)
res12: Int = 30
Edit: I added the following lines to cope with the StackOverflow
If you really want to work with 1 millionth fibonacci number, the iterator definition above will not work either for StackOverflowError. The following is the best I have in mind at the moment:
class FibIterator extends Iterator[BigDecimal] {
var i: BigDecimal = 1
var j: BigDecimal = 1
def next = {val temp = i
i = i + j
j = temp
j }
def hasNext = true
}
scala> new FibIterator().take(1000000).foldLeft(0:BigDecimal)(_ + _)
res49: BigDecimal = 82742358764415552005488531917024390424162251704439978804028473661823057748584031
0652444660067860068576582339667553466723534958196114093963106431270812950808725232290398073106383520
9370070837993419439389400053162345760603732435980206131237515815087375786729469542122086546698588361
1918333940290120089979292470743729680266332315132001038214604422938050077278662240891771323175496710
6543809955073045938575199742538064756142664237279428808177636434609546136862690895665103636058513818
5599492335097606599062280930533577747023889877591518250849190138449610994983754112730003192861138966
1418736269315695488126272680440194742866966916767696600932919528743675517065891097024715258730309025
7920682881137637647091134870921415447854373518256370737719553266719856028732647721347048627996967...
#yura's problem:
def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)
besides using a Long which can't possibly hold the Fibonacci of 10,000,000, it does work. That is, if the foldLeft is written as:
fib(1,1).take(10000000).foldLeft(0L)(_+_)
Looking at the Streams.scala source, foldLeft() is clearly designed for Garbage Collection, but /: is not def'd.
The other answers alluded to another problem. The Fibonacci of 10 million is a big number and if BigInt is used, instead of just overflowing like with a Long, absolutely enormous numbers are being added to each over and over again.
Since Stream.foldLeft is optimized for GC it does look like the way to solve for really big Fibonacci numbers, rather than using a zip or tail recursion.
// Fibonacci using BigInt
def fib(i:BigInt,j:BigInt):Stream[BigInt] = i #:: fib(j, i+j)
fib(1,0).take(10000000).foldLeft(BigInt("0"))(_+_)
Results of the above code: 10,000,000 is a 8-figure number. How many figures in fib(10000000)? 2,089,877
fib(1,1).take(10000000) is the "this" of the method /:, it is likely that the JVM will consider the reference alive as long as the method runs, even if in this case, it might get rid of it.
So you keep a reference on the head of the stream all along, hence on the whole stream as you build it to 10M elements.
You could just use recursion, which is about as simple:
def fibSum(terms: Int, i: Long = 1, j: Long = 1, total: Long = 2): Long = {
if (terms == 2) total
else fibSum(terms - 1, j, i + j, total + i + j)
}
With this, you can "fold" a billion elements in only a couple of seconds, but as Rex points out, summing the Fibbonaci sequence overflows Long very quickly.
If you really wanted to know the answer to your original problem and don't mind sacrificing some accuracy you could do this:
def fibSum(terms: Int, i: Double = 1, j: Double = 1, tot: Double = 2,
exp: Int = 0): String = {
if (terms == 2) "%.6f".format(tot) + " E+" + exp
else {
val (i1, j1, tot1, exp1) =
if (tot + i + j > 10) (i/10, j/10, tot/10, exp + 1)
else (i, j, tot, exp)
fibSum(terms - 1, j1, i1 + j1, tot1 + i1 + j1, exp1)
}
}
scala> fibSum(10000000)
res54: String = 2.957945 E+2089876