Scala stream to find primes in specific interval

Problem:
I need to create a Scala program which uses the Stream class and finds the n-th prime number in the interval [i, j] (where 1 <= i < j).
More information:
I am completely new to Scala, but I've looked at various examples of how to find primes in Scala using Stream. None of them helped me achieve my goal.
I can't seem to understand how to turn the stream into a finite list over the interval [i, j] and how to take the n-th prime number from that interval.
My code so far:
def primeStream(args: Array[String], s: Stream[Int]): Stream[Int] =
Stream.cons(s.head, primeStream(args,s.tail filter {_ % s.head != 0 }))
if (args(0).toInt < 1) {
println("Entered parameter i does not meet requirements 1<=i<j (1<=" + args(0) + "<" + args(1) + ")")
sys.exit(1)
} else if (args(1).toInt < args(0).toInt) {
println("Entered parameter j does not meet requirements 1<=i<j (1<=" + args(0) + "<" + args(1) + ")")
sys.exit(1)
}
val primes = primeStream(args,Stream.from(args(0).toInt)) // start from i element
primes take args(1).toInt foreach println //Take j elements
Any help would be appreciated!
SOLUTION:
def primeStream(s: Stream[Int]): Stream[Int] =
Stream.cons(s.head, primeStream(s.tail filter {_ % s.head != 0 }))
if (args(0).toInt < 1) {
println("Entered parameter i does not meet requirements 1<=i<j (1<=" + args(0) + "<" + args(1) + ")")
sys.exit(1)
} else if (args(1).toInt < args(0).toInt) {
println("Entered parameter j does not meet requirements 1<=i<j (1<=" + args(0) + "<" + args(1) + ")")
sys.exit(1)
} else if (args(0).toInt == 1) {
println("1 is not a prime by definition!")
sys.exit(1) // if args(0) is 1 then function hangs up - didn't come up with a better solution for this
}
val primes = primeStream(Stream.from(args(0).toInt)) // get primes starting from given parameter
println(primes.takeWhile( _ < args(1).toInt).take(args(2).toInt).last) // get n-th prime and print it out

You just need to have your stream generate values while a certain condition holds:
primes takeWhile(_ < j) take n foreach println
and, of course, you need to get the primeStream function right.
For the algorithmic part, it is worth searching Stack Overflow, for example:
Calculating prime numbers in Scala: how does this code work?
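To make the takeWhile/take idea concrete, here is a minimal sketch that returns the n-th prime in the closed interval [i, j]. The names PrimeInterval, isPrime and nthPrimeIn are made up for this illustration and do not come from the question. It tests candidates by trial division against the primes found so far instead of the filter-chaining primeStream above. Note that Stream is deprecated since Scala 2.13 in favour of LazyList; the sketch keeps Stream to match the question.

```scala
object PrimeInterval {
  // Infinite lazy stream of primes: 2, then every odd number that passes isPrime.
  lazy val primes: Stream[Int] = 2 #:: Stream.from(3, 2).filter(isPrime)

  // Trial division by the primes whose square does not exceed n.
  def isPrime(n: Int): Boolean =
    n >= 2 && primes.takeWhile(p => p.toLong * p <= n).forall(n % _ != 0)

  // n-th prime (1-based) in [i, j], or None if the interval holds fewer than n primes.
  def nthPrimeIn(i: Int, j: Int, n: Int): Option[Int] = {
    require(n >= 1, "n must be at least 1")
    primes.dropWhile(_ < i).takeWhile(_ <= j).drop(n - 1).headOption
  }
}
```

For example, PrimeInterval.nthPrimeIn(10, 30, 3) looks at 11, 13, 17, 19, 23, 29 and yields Some(17). Returning an Option avoids the exception that .last would throw on an empty stream.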

Is this a learning exercise, or do you need this in production? For the latter, I would suggest using spire.math.prime.stream from the spire library. It uses a Segmented Stream of Eratosthenes implementation, which is probably better than what you will come up with yourself in a short time. It also uses arbitrary precision integers, so it works for numbers larger than 2^64.
scala> import spire.math._
import spire.math._
scala> prime.stream.drop(10).take(10).toArray
res16: Array[spire.math.SafeLong] = Array(31, 37, 41, 43, 47, 53, 59, 61, 67, 71)

Related

scala: Loop through a file to read 20 bytes at a time and blank out bytes at 3rd position

I have a code snippet in Java that loops through a file byte by byte and blanks out the byte at the 3rd position in every block of 20 bytes. This is done using a for-each loop.
logic:
for(byte b: raw){
if (pos is 3) b = 32;
if (i > 20) i = 0;
i++
}
Since I am learning Scala, I would like to know if there is a better way of looping byte by byte in Scala.
I have read the file into a byte array in Scala as follows:
val result = IOUtils.toByteArray(new FileInputStream (new File(fileDir)))
Thanks.
Here is a diametrically opposite solution to that of Tzach Zohar:
def parallel(ba: Array[Byte], blockSize: Int = 2048): Unit = {
val n = ba.size
val numJobs = (n + blockSize - 1) / blockSize
(0 until numJobs).par.foreach { i =>
val startIdx = i * blockSize
val endIdx = n min ((i + 1) * blockSize)
var j = startIdx + ((3 - startIdx) % 20 + 20) % 20
while (j < endIdx) {
ba(j) = 32
j += 20
}
}
}
You see a lot of mutable variables, scary imperative while-loops, and some strange tricks with modular arithmetic. That's actually not idiomatic Scala at all. But the interesting thing about this solution is that it processes blocks of the byte array in parallel. I've compared the time needed by this solution to your naive solution, using various block sizes:
Naive: 38.196
Parallel( 16): 11.676000
Parallel( 32): 7.260000
Parallel( 64): 4.311000
Parallel( 128): 2.757000
Parallel( 256): 2.473000
Parallel( 512): 2.462000
Parallel(1024): 2.435000
Parallel(2048): 2.444000
Parallel(4096): 2.416000
Parallel(8192): 2.420000
At least in this not very thorough microbenchmark (1000 repetitions on a 10 MB array), the more-or-less efficiently implemented parallel version outperformed the for-loop in your question by a factor of about 15.
The question is now: What do you mean by "better"?
My proposal was slightly faster than your naive approach
@TzachZohar's functional solution could generalize better should the code be moved to a cluster framework like Apache Spark.
I would usually prefer something closer to @TzachZohar's solution, because it's easier to read.
So, it depends on what you are optimizing for: performance? generality? readability? maintainability? For each notion of "better", you could get a different answer. I've tried to optimize for performance. @TzachZohar optimized for readability and maintainability. That led to two rather different solutions.
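The "strange trick with modular arithmetic" in parallel deserves a closer look, since it is the part most likely to be wrong. Each block has to find the first index at or after startIdx whose position within its 20-byte group is 3. A small sketch of just that calculation, with the hypothetical names BlockOffsets and firstHit chosen for this illustration:

```scala
object BlockOffsets {
  // Smallest index >= start that is congruent to `offset` modulo `period`.
  // (offset - start) % period may be negative in Scala, so adding `period`
  // and reducing again maps it into the range 0 until period.
  def firstHit(start: Int, offset: Int = 3, period: Int = 20): Int =
    start + ((offset - start) % period + period) % period
}
```

With a block size of 2048, the second block starts at index 2048 and firstHit(2048) gives 2063, which is 103 * 20 + 3. A block that starts at 0 begins at index 3, as expected.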
Full code of the microbenchmark, just in case someone is interested:
val array = Array.ofDim[Byte](10000000)
def naive(ba: Array[Byte]): Unit = {
var pos = 0
for (i <- 0 until ba.size) {
if (pos == 3) ba(i) = 32
pos += 1
if (pos == 20) pos = 0
}
}
def parallel(ba: Array[Byte], blockSize: Int): Unit = {
val n = ba.size
val numJobs = (n + blockSize - 1) / blockSize
(0 until numJobs).par.foreach { i =>
val startIdx = i * blockSize
val endIdx = n min ((i + 1) * blockSize)
var j = startIdx + ((3 - startIdx) % 20 + 20) % 20
while (j < endIdx) {
ba(j) = 32
j += 20
}
}
}
def measureTime[U](repeats: Long)(block: => U): Double = {
val start = System.currentTimeMillis
var iteration = 0
while (iteration < repeats) {
iteration += 1
block
}
val end = System.currentTimeMillis
(end - start).toDouble / repeats
}
println("Basic sanity check (did I get the modulo arithmetic right?):")
{
val testArray = Array.ofDim[Byte](50)
naive(testArray)
println(testArray.mkString("[", ",", "]"))
}
{
for (blockSize <- List(3, 7, 13, 16, 17, 32)) {
val testArray = Array.ofDim[Byte](50)
parallel(testArray, blockSize)
println(testArray.mkString("[", ",", "]"))
}
}
val Reps = 1000
val naiveTime = measureTime(Reps)(naive(array))
println("Naive: " + naiveTime)
for (blockSize <- List(16,32,64,128,256,512,1024,2048,4096,8192)) {
val parallelTime = measureTime(Reps)(parallel(array, blockSize))
println("Parallel(%4d): %f".format(blockSize, parallelTime))
}
Here's one way to do this:
val updated = result.grouped(20).flatMap { arr => if (arr.length > 3) arr(3) = 32; arr } // the guard skips a final group shorter than 4 bytes
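The grouped version still mutates each group in place. If a purely non-mutating variant is wanted, one option is to decide per index whether a byte should be replaced. This is a sketch under that assumption, with the hypothetical names BlankThird and blankEvery20th; it has not been benchmarked against the other solutions.

```scala
object BlankThird {
  // Returns a new array in which every byte at index 3, 23, 43, ... is 32
  // (the ASCII space); the input array is left untouched.
  def blankEvery20th(bytes: Array[Byte]): Array[Byte] =
    bytes.zipWithIndex.map { case (b, i) => if (i % 20 == 3) 32.toByte else b }
}
```

Because the decision depends only on the index, a short final block needs no special handling.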

Collatz - maximum number of steps and the corresponding number

I am trying to write a Scala function that takes an upper bound as an argument and calculates the steps for the numbers in the range from 1 up to this bound. It has to return the maximum number of steps and the corresponding number that needs that many steps (as a pair: the first element is the number of steps and the second is the corresponding index).
I have already created a function called "collatz" which computes the number of steps. I am very new to Scala and I am a bit stuck because of the limitations. Here's how I thought to start the function:
def max(x:Int):Int = {
for (i<-(1 to x).toList) yield collatz(i)
The way I think to solve this problem is to: 1. iterate through the range and apply collatz to every element, collecting the step counts in a new list; 2. find the maximum of the new list using List.max; 3. use List.indexOf to find the index. However, I'm really stuck since I don't know how to do this without using var (only val). Thanks!
Something like this:
def collatzMax(n: Long): (Long, Long) = {
require(n > 0, "Collatz function is not defined for n <= 0")
def collatz(n: Long, steps: Long): Long = n match {
case n if (n <= 1) => steps
case n if (n % 2 == 0) => collatz(n / 2, steps + 1)
case n if (n % 2 == 1) => collatz(3 * n + 1, steps + 1)
}
def loop(n: Long, current: Long, acc: List[(Long, Long)]): List[(Long, Long)] =
if (current > n) acc
else {
loop(n, current + 1, collatz(current, 0) -> current :: acc)
}
loop(n, 1, Nil).sortBy(-_._1).head
}
Example:
collatzMax(12)
result: (Long, Long) = (19,9) // 19 steps for collatz(9)
Using for:
def collatzMax(n: Long) =
(for(i <- 1L to n) yield collatz(i) -> i).sortBy(-_._1).head
Or(continuing your idea):
def maximum(x: Long): (Long, Long) = {
val lst = for (i <- 1L to x) yield collatz(i)
val maxValue = lst.max
(maxValue, lst.indexOf(maxValue) + 1)
}
Try:
(1 to x).map(i => (collatz(i), i)).maxBy(_._1)
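Several of the snippets above rely on the asker's one-argument collatz, which the question does not show. For completeness, here is a self-contained sketch that pairs each number with its step count and picks the pair with the most steps, without any var. The object name CollatzMax is an assumption for this example, and collatz here is a stand-in for the asker's function.

```scala
import scala.annotation.tailrec

object CollatzMax {
  // Number of Collatz steps needed to reach 1 from n (assumes n >= 1).
  @tailrec
  def collatz(n: Long, steps: Long = 0): Long =
    if (n <= 1) steps
    else if (n % 2 == 0) collatz(n / 2, steps + 1)
    else collatz(3 * n + 1, steps + 1)

  // (maximum number of steps, number that needs them) for 1..bound.
  def collatzMax(bound: Long): (Long, Long) = {
    require(bound >= 1, "bound must be at least 1")
    (1L to bound).map(i => (collatz(i), i)).maxBy(_._1)
  }
}
```

CollatzMax.collatzMax(12) gives (19, 9), matching the example above. Using maxBy avoids sorting the whole list just to read its head.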

Why this function call in Scala is not optimized away?

I'm running this program with Scala 2.10.3:
object Test {
def main(args: Array[String]) {
def factorial(x: BigInt): BigInt =
if (x == 0) 1 else x * factorial(x - 1)
val N = 1000
val t = new Array[Long](N)
var r: BigInt = 0
for (i <- 0 until N) {
val t0 = System.nanoTime()
r = r + factorial(300)
t(i) = System.nanoTime()-t0
}
val ts = t.sortWith((x, y) => x < y)
for (i <- 0 to 10)
print(ts(i) + " ")
println("*** " + ts(N/2) + "\n" + r)
}
}
and the call to the pure function factorial with a constant argument is evaluated during each loop iteration (a conclusion based on the timing results). Shouldn't the optimizer reuse the function call result after the first call?
I'm using Scala IDE for Eclipse. Are there any optimization flags for the compiler that may produce more efficient code?
Scala is not a purely functional language, so without an effect system it cannot know that factorial is pure (for example, it doesn't "know" anything about the multiplication of big ints).
You need to add your own memoization here. The simplest approach is to add a val f300 = factorial(300) outside your loop.
Here is a question about memoization.
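As a rough illustration of what "your own memoization" could look like, the sketch below wraps a function in a cache backed by a mutable map. The names Memo, memoize and fastFactorial are hypothetical, and the cache is not thread-safe; treat it as a starting point, not as a library-grade solution.

```scala
import scala.collection.mutable

object Memo {
  def factorial(x: BigInt): BigInt =
    if (x == 0) 1 else x * factorial(x - 1)

  // Hypothetical helper: the returned function computes f(a) once per
  // distinct argument and serves later calls from the cache.
  def memoize[A, B](f: A => B): A => B = {
    val cache = mutable.Map.empty[A, B]
    a => cache.getOrElseUpdate(a, f(a))
  }

  // Only the outermost call is cached; the recursion inside factorial is not.
  val fastFactorial: BigInt => BigInt = memoize((x: BigInt) => factorial(x))
}
```

Inside the timing loop, r = r + Memo.fastFactorial(300) would then pay for the multiplication chain only on the first iteration.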

More elegant scala code

I am starting to learn Scala. I wonder if anyone has a better way to rewrite the code below in a more functional way. I know there must be one.
val buf = ((addr>>24)&0xff) + "." + ((addr>>16)&0xff) + "." + ((addr>>8)&0xff) + "." + ((addr)&0xff)
This generates the Range(24, 16, 8, 0) with (24 to 0 by -8) and then applies the function addr >> _ & 0xff to each number using map. Lastly, the mapped Range of numbers is "joined" with . to create a string.
The map is more functional than using the + operator but the rest is just syntactic sugar and a library call to mkString.
val addr = 1024
val buf = (24 to 0 by -8).map(addr >> _ & 0xff).mkString(".")
buf: java.lang.String = 0.0.4.0
val buf = List(24,16,8,0).map(addr >> _).map(_ & 0xff).mkString(".")
Here's how I would do it, similar to Brian's answer but with a short list of values and two simple map() methods using Scala's famous '_' operator. Great question!
Some would find the for comprehension a little bit more readable:
(for (pos <- 24 to 0 by -8) yield addr >> pos & 0xff) mkString "."
The advantage is that the input can be an integer of ANY length
// trick
implicit class When[F](fun: F) {
def when(cond: F => Boolean)(tail: F => F) = if (cond(fun)) tail(fun) else fun
}
// actual one-liner
12345678.toHexString.when(1 to 8 contains _.length % 8)
(s => "0" * (8 - s.length % 8) + s ).reverse.grouped(2).map
(Integer.parseInt(_, 16)).toList.reverse.mkString(".")
// 0.203.22.228
// a very big IPv7
BigInt("123456789012345678901").toString(16).when(1 to 8 contains _.length % 8)
(s => "0" * (8 - s.length % 8) + s ).reverse.grouped(2).map
(Integer.parseInt(_, 16)).toList.reverse.mkString(".")
// 0.0.0.96.27.228.249.24.242.99.198.83
EDIT
Explanation because of downvotes: the implicit class When can simply live in a library. It works in 2.10 and allows you to conditionally execute some of the functions in a call chain. I did not measure performance, and don't care, because the example itself is meant to be an illustration of what is possible, elegant or not.

Scala performance - Sieve

Right now, I am trying to learn Scala. I've started small, writing some simple algorithms. I've encountered some problems when I wanted to implement the Sieve algorithm for finding all prime numbers lower than a certain threshold.
My implementation is:
import scala.math
object Sieve {
// Returns all prime numbers until maxNum
def getPrimes(maxNum : Int) = {
def sieve(list: List[Int], stop : Int) : List[Int] = {
list match {
case Nil => Nil
case h :: list if h <= stop => h :: sieve(list.filterNot(_ % h == 0), stop)
case _ => list
}
}
val stop : Int = math.sqrt(maxNum).toInt
sieve((2 to maxNum).toList, stop)
}
def main(args: Array[String]) = {
val ap = printf("%d ", (_:Int));
// works
getPrimes(1000).foreach(ap(_))
// works
getPrimes(100000).foreach(ap(_))
// out of memory
getPrimes(1000000).foreach(ap(_))
}
}
Unfortunately it fails when I want to compute all the prime numbers smaller than 1000000 (1 million). I am receiving an OutOfMemory error.
Do you have any idea how to optimize the code, or how I can implement this algorithm in a more elegant fashion?
PS: I've done something very similar in Haskell, and there I didn't encounter any issues.
I would go with an infinite Stream. Using a lazy data structure lets you write code much as you would in Haskell, and the result reads more declaratively than the code you wrote.
import Stream._
val primes = 2 #:: sieve(3)
def sieve(n: Int) : Stream[Int] =
if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
else n #:: sieve(n + 2)
def getPrimes(maxNum : Int) = primes.takeWhile(_ < maxNum)
Obviously, this isn't the most performant approach. Read The Genuine Sieve of Eratosthenes for a good explanation (it's Haskell, but not too difficult). For really big ranges you should consider the Sieve of Atkin.
The code in question is not tail recursive, so Scala cannot optimize the recursion away. Also, Haskell is non-strict by default, so you can hardly compare it to Scala. For instance, whereas Haskell benefits from foldRight, Scala benefits from foldLeft.
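To show what a tail-recursive form of the asker's sieve might look like, here is a sketch that carries the primes found so far in an accumulator, so the recursive call is the last thing the function does and @tailrec can verify it. The object name ListSieve and the parameter names are assumptions for this example. Whether this alone avoids the OutOfMemory error for one million depends on the heap size and has not been verified here.

```scala
import scala.annotation.tailrec

object ListSieve {
  // Returns all primes up to and including maxNum.
  def getPrimes(maxNum: Int): List[Int] = {
    val stop = math.sqrt(maxNum.toDouble).toInt

    // `found` holds the primes confirmed so far, in reverse order.
    @tailrec
    def sieve(candidates: List[Int], found: List[Int]): List[Int] =
      candidates match {
        case h :: rest if h <= stop =>
          sieve(rest.filterNot(_ % h == 0), h :: found)
        case _ =>
          // Every remaining candidate is above sqrt(maxNum) and thus prime.
          found.reverse ::: candidates
      }

    sieve((2 to maxNum).toList, Nil)
  }
}
```

Compared with the original, no stack frame has to wait for h :: sieve(...) to complete, so each filtered-out intermediate list becomes garbage as soon as the next call starts.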
There are many Scala implementations of Sieve of Eratosthenes, including some in Stack Overflow. For instance:
(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps))
The following answer is about 100 times faster than the "one-liner" answer using a Set (and the results don't need sorting into ascending order). It is also more functional in form than the other answer using an array, although it uses a mutable BitSet as the sieving array:
object SoE {
def makeSoE_Primes(top: Int): Iterator[Int] = {
val topndx = (top - 3) / 2
val nonprms = new scala.collection.mutable.BitSet(topndx + 1)
def cullp(i: Int) = {
import scala.annotation.tailrec; val p = i + i + 3
@tailrec def cull(c: Int): Unit = if (c <= topndx) { nonprms += c; cull(c + p) }
cull((p * p - 3) >>> 1)
}
(0 to (Math.sqrt(top).toInt - 3) >>> 1).filterNot { nonprms }.foreach { cullp }
Iterator.single(2) ++ (0 to topndx).filterNot { nonprms }.map { i: Int => i + i + 3 }
}
}
It can be tested by the following code:
object Main extends App {
import SoE._
val top_num = 10000000
val strt = System.nanoTime()
val count = makeSoE_Primes(top_num).size
val end = System.nanoTime()
println(s"Successfully completed without errors. [total ${(end - strt) / 1000000} ms]")
println(f"Found $count primes up to $top_num" + ".")
println("Using one large mutable BitSet and functional code.")
}
With the results from the above as follows:
Successfully completed without errors. [total 294 ms]
Found 664579 primes up to 10000000.
Using one large mutable BitSet and functional code.
There is an overhead of about 40 milliseconds for even small sieve ranges, and there are various non-linear responses with increasing range as the size of the BitSet grows beyond the different CPU caches.
It looks like List isn't very efficient space-wise. You can get an out of memory exception by doing something like this:
1 to 2000000 toList
I "cheated" and used a mutable array. Didn't feel dirty at all.
def primesSmallerThan(n: Int): List[Int] = {
val nonprimes = Array.tabulate(n + 1)(i => i == 0 || i == 1)
val primes = new collection.mutable.ListBuffer[Int]
for (x <- nonprimes.indices if !nonprimes(x)) {
primes += x
for (y <- x * x until nonprimes.length by x if (x * x) > 0) {
nonprimes(y) = true
}
}
primes.toList
}