IntelliJ worksheet and classes defined in it - Scala

I'm following along the Coursera course on functional programming in Scala and came across some weird behavior of the worksheet REPL.
In the course, a worksheet with the following code should give the results shown on the right:
object rationals {
  val x = new Rational(1, 2)   > x: Rational = Rational@<hash_code>
  x.numer                      > res0: Int = 1
  x.denom                      > res1: Int = 2
}
class Rational(x: Int, y: Int) {
  def numer = x
  def denom = y
}
What I get is
object rationals {                 > defined module rationals
  val x = new Rational(1, 2)
  x.numer
  x.denom
}
class Rational(x: Int, y: Int) {   > defined class Rational
  def numer = x
  def denom = y
}
Only after moving the class into the object did I get the same result as in the course.
Is this caused by IntelliJ, or have there been changes in Scala?
Are there other ways around this?

In IntelliJ IDEA, the Scala worksheet handles values inside objects differently than the Eclipse/Scala IDE worksheet does.
Values inside objects are not evaluated in the linear, line-by-line mode; instead, the object is treated as a normal Scala object, and you see hardly any information about its members until you use them explicitly.
To actually see your vals and expressions, simply define or evaluate them outside any object/class, as in the sketch below.
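For example, a minimal worksheet sketch (top-level definitions; the values shown in the comments approximate the worksheet's right-hand output):
class Rational(x: Int, y: Int) {
  def numer = x
  def denom = y
}

val r = new Rational(1, 2)   // r: Rational = Rational@<hash>
r.numer                      // res0: Int = 1
r.denom                      // res1: Int = 2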
This behaviour can even be a saviour in some cases. Suppose you have these definitions:
val primes = 2L #:: Stream.from(3, 2).map(_.toLong).filter(isPrime)
val isPrime: Long => Boolean =
  n => primes.takeWhile(p => p * p <= n).forall(n % _ != 0)
Note that isPrime could be a simple def, but we chose to define it as a val for some reason.
Such code is fine in any normal Scala code, but it will fail in the worksheet, because the val definitions are cross-referencing.
But if you wrap such lines inside some object like
object Primes {
  val primes = 2L #:: Stream.from(3, 2).map(_.toLong).filter(isPrime)
  val isPrime: Long => Boolean =
    n => primes.takeWhile(p => p * p <= n).forall(n % _ != 0)
}
It will be evaluated with no problem
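To actually see the values, access the wrapped object's members explicitly, e.g. (a small check assuming the definitions above):

Primes.primes.take(5).toList   // List(2, 3, 5, 7, 11)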

Related

Scala Generic Type slow

I need to create a comparison method that works for Int, String, or Char. Using AnyVal did not make it possible, as there are no methods for < and > comparison on it.
However, typing it as Ordered shows a significant slowdown. Are there better ways to achieve this? The plan is to do a generic binary sort, and I found that generic typing decreases the performance.
def sample1[T <% Ordered[T]](x: T) = { x < (x) }
def sample2(x: Ordered[Int]) = { x < 1 }
def sample3(x: Int) = { x < 1 }
val start1 = System.nanoTime
sample1(5)
println(System.nanoTime - start1)
val start2 = System.nanoTime
sample2(5)
println(System.nanoTime - start2)
val start3 = System.nanoTime
sample3(5)
println(System.nanoTime - start3)
val start4 = System.nanoTime
sample3(5)
println(System.nanoTime - start4)
val start5 = System.nanoTime
sample2(5)
println(System.nanoTime - start5)
val start6 = System.nanoTime
sample1(5)
println(System.nanoTime - start6)
The results show:
Sample1:696122
Sample2:45123
Sample3:13947
Sample3:5332
Sample2:194438
Sample1:497992
Am I handling generics the wrong way? Or should I use the old Java approach of a Comparator in this case, as in:
import java.util.Comparator

object C extends Comparator[Int] {
  override def compare(a: Int, b: Int): Int = {
    a - b
  }
}
def sample4[T](a: T, b: T, x: Comparator[T]) { x.compare(a, b) }
The Scala equivalent of Java Comparator is Ordering. One of the main differences is that, if you don't provide one manually, a suitable Ordering can be injected implicitly by the compiler. By default, this will be done for Byte, Int, Float and other primitives, for any subclass of Ordered or Comparable, and for some other obvious cases.
Also, Ordering provides method definitions for all the main comparison methods as extension methods, so you can write the following:
import Ordering.Implicits._
def sample5[T : Ordering](a: T, b: T) = a < b
def run() = sample5(1, 2)
As of Scala 2.12, those extension operations (i.e., a < b) involve wrapping in a temporary Ordering#Ops object, so the code will be slower than with a Comparator. Not by much in most real cases, but still significant if you care about micro-optimisations.
But you can use an alternative syntax to define an implicit Ordering[T] parameter and invoke methods on the Ordering object directly.
Actually even the generated bytecode for the following two methods will be identical (except for the type of the third argument, and potentially the implementation of the respective compare methods):
def withOrdering[T](x: T, y: T)(implicit cmp: Ordering[T]) = {
  cmp.compare(x, y) // also supports other methods, like `cmp.lt(x, y)`
}

def withComparator[T](x: T, y: T, cmp: Comparator[T]) = {
  cmp.compare(x, y)
}
In practice the runtime on my machine is the same, when invoking these methods with Int arguments.
So, if you want to compare types generically in Scala, you should usually use Ordering.
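For instance, a quick sanity check (hypothetical usage of the two methods above; scala.math.Ordering extends java.util.Comparator, so an Ordering can be passed where a Comparator is expected):

withOrdering(1, 2)                  // negative, since 1 < 2
withComparator(1, 2, Ordering[Int]) // same result through the Comparator signature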
Do not write micro-benchmarks this way if you want results that are anywhere near what you will get in a production environment.
First of all, you need to warm up the JVM. After that, run your test as an average over many iterations. You also need to prevent possible JVM optimizations triggered by constant data. E.g.:
import scala.util.Random

def sample1[T <% Ordered[T]](x: T) = { x < (x) }
def sample2(x: Ordered[Int]) = { x < 1 }
def sample3(x: Int) = { x < 1 }

val r = new Random()

def measure(f: => Unit): Long = {
  val start = System.nanoTime
  f
  System.nanoTime - start
}

val n = 1000000
// warm-up passes (results discarded)
(1 to n).map(_ => measure { val k = r.nextInt(); sample1(k) })
(1 to n).map(_ => measure { val k = r.nextInt(); sample2(k) })
(1 to n).map(_ => measure { val k = r.nextInt(); sample3(k) })

// measured passes, averaged over n iterations
val avg1 = (1 to n).map(_ => measure { val k = r.nextInt(); sample1(k) }).sum / n
println(avg1)
val avg2 = (1 to n).map(_ => measure { val k = r.nextInt(); sample2(k) }).sum / n
println(avg2)
val avg3 = (1 to n).map(_ => measure { val k = r.nextInt(); sample3(k) }).sum / n
println(avg3)
I got results which look fairer to me:
134
92
83
This book could shed some light on performance testing.

Refactoring a small Scala function

I have this function to compute the distance between two n-dimensional points using Pythagoras' theorem.
def computeDistance(neighbour: Point) = math.sqrt(coordinates.zip(neighbour.coordinates).map {
  case (c1: Int, c2: Int) => math.pow(c1 - c2, 2)
}.sum)
The Point class (simplified) looks like:
class Point(val coordinates: List[Int])
I'm struggling to refactor the method so it's a little easier to read; can anybody help, please?
Here's another way that makes the following three assumptions:
The length of the list is the number of dimensions for the point
Each List is correctly ordered, i.e. List(x, y) or List(x, y, z). We do not know how to handle List(x, z, y)
All lists are of equal length
def computeDistance(other: Point): Double = sqrt(
  coordinates.zip(other.coordinates)
    .flatMap(i => List(pow(i._2 - i._1, 2)))
    .fold(0.0)(_ + _)
)
The obvious disadvantage here is that we don't have any safety around list length. The quick fix for this is to simply have the function return an Option[Double] like so:
def computeDistance(other: Point): Option[Double] = {
  if (other.coordinates.length != coordinates.length) {
    return None
  }
  return Some(sqrt(coordinates.zip(other.coordinates)
    .flatMap(i => List(pow(i._2 - i._1, 2)))
    .fold(0.0)(_ + _)
  ))
}
I'd be curious if there is a type safe way to ensure equal list length.
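In the meantime, a lightweight alternative is to fail fast with require instead of returning an Option; the sketch below assumes the same Point class and is still only a runtime check, not type safety:

import math.{pow, sqrt}

class Point(val coordinates: List[Int]) {
  def computeDistance(other: Point): Double = {
    // runtime dimension check; throws IllegalArgumentException on mismatch
    require(other.coordinates.length == coordinates.length, "dimension mismatch")
    sqrt(coordinates.zip(other.coordinates).map { case (a, b) => pow(b - a, 2) }.sum)
  }
}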
EDIT
It was politely pointed out to me that flatMap(x => List(foo(x))) is equivalent to map(foo), which I forgot to refactor when I was originally playing with this. A slightly cleaner version with map instead of flatMap:
def computeDistance(other: Point): Double = sqrt(
  coordinates.zip(other.coordinates)
    .map(i => pow(i._2 - i._1, 2))
    .fold(0.0)(_ + _)
)
Most of your problem is that you're trying to do math with really long variable names. That's almost always painful. There's a reason mathematicians use single letters; also, assign temporary variables.
Try this:
import math._

class Point(val coordinates: List[Int]) {
  def c = coordinates
  def d(p: Point) = {
    val delta = for ((a, b) <- c zip p.c) yield pow(a - b, 2)
    sqrt(delta.sum)
  }
}
Consider type aliases and case classes, like this,
type Coord = List[Int]

case class Point(c: Coord) {
  def distTo(p: Point) = {
    val z = (c zip p.c).par
    val pw = z.aggregate(0.0)((a, v) => a + math.pow(v._1 - v._2, 2), _ + _)
    math.sqrt(pw)
  }
}
so that for any two points, for instance,
val p = Point( (1 to 5).toList )
val q = Point( (2 to 6).toList )
we have that
p distTo q
res: Double = 2.23606797749979
Note that method distTo uses aggregate on a parallelised collection of tuples and combines the partial results with the last argument (summation). For high-dimensional points this may prove more efficient than the sequential counterpart.
For simplicity of use, consider also implicit classes, as suggested in a comment above,
implicit class RichPoint(val c: Coord) extends AnyVal {
  def distTo(d: Coord) = Point(c) distTo Point(d)
}
Hence
List(1,2,3,4,5) distTo List(2,3,4,5,6)
res: Double = 2.23606797749979

Why is it possible to assign recursive lambdas to non-lazy vals in Scala?

In the following statement the val f is defined as a lambda that references itself (it is recursive):
val f: Int => Int = (a: Int) =>
  if (a > 10) 3 else f(a + 1) + 1 // just some simple function
I've tried it in the REPL, and it compiles and executes correctly.
According to the specification, this seems like an instance of illegal forward referencing:
In a statement sequence s[1]...s[n] making up a block, if a simple
name in s[i] refers to an entity defined by s[j] where j >= i,
then for all s[k] between and including s[i] and s[j],
s[k] cannot be a variable definition.
If s[k] is a value definition, it must be lazy.
The assignment is a single statement, so it satisfies the j >= i criterion, and it is included in the interval of statements the two rules apply to (between and including s[i] and s[j]).
However, it seems that it violates the second rule, because f is not lazy.
How is that a legal statement (tried it in Scala 2.9.2)?
You probably tried this in the REPL, which wraps all contents in an object definition. This is important because on the JVM all instance values are initialized with a default value, which is null for all AnyRefs and 0, 0.0 or false for AnyVals. For values local to a method this default initialization does not happen, therefore you get an error message in that case:
scala> object x { val f: Int => Int = a => if (a > 10) 3 else f(a+1)+1 }
defined object x
scala> def x { val f: Int => Int = a => if (a > 10) 3 else f(a+1)+1 }
<console>:7: error: forward reference extends over definition of value f
def x { val f: Int => Int = a => if (a > 10) 3 else f(a+1)+1 }
^
This behavior can even lead to weird situations, therefore one should be careful with recursive instance values: on the right-hand side, the not-yet-initialized value is seen with its default (0 for Int, null for String):
scala> val m: Int = m+1
m: Int = 1
scala> val s: String = s+" x"
s: String = null x

Why can't I define a variable recursively in a code block?

Why can't I define a variable recursively in a code block?
scala> {
| val test: Stream[Int] = 1 #:: test
| }
<console>:9: error: forward reference extends over definition of value test
val test: Stream[Int] = 1 #:: test
^
scala> val test: Stream[Int] = 1 #:: test
test: Stream[Int] = Stream(1, ?)
The lazy keyword solves this problem, but I can't understand why it works without a code block yet throws a compilation error inside one.
Note that in the REPL
scala> val something = "a value"
is evaluated more or less as follows:
object REPL$1 {
  val something = "a value"
}
import REPL$1._
So, any val (or def, etc.) is a member of an internal REPL helper object.
Now the point is that classes (and objects) allow forward references on their members:
object ForwardTest {
  def x = y // val x would also compile but with a more confusing result
  val y = 2
}
ForwardTest.x == 2
This is not true for vals inside a block. In a block everything must be defined in linear order, so the vals are no longer members but plain local values. The following does not compile either:
def plainMethod = { // could as well be a simple block
  def x = y
  val y = 2
  x
}
<console>: error: forward reference extends over definition of value y
  def x = y
      ^
It is not recursion which makes the difference. The difference is that classes and objects allow forward references, whereas blocks do not.
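A minimal sketch of the three cases, reusing the question's Stream example (the names OK and inBlock are just for illustration):

object OK {
  val test: Stream[Int] = 1 #:: test // compiles: test is an object member, default-initialized to null first
}

def inBlock = {
  lazy val test: Stream[Int] = 1 #:: test // compiles: the spec's exception, a lazy value definition
  test.take(3).toList // List(1, 1, 1)
}

// def broken = {
//   val test: Stream[Int] = 1 #:: test // does not compile: forward reference in a block
//   test
// }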
I'll add that when you write:
object O {
  val x = y
  val y = 0
}
You are actually writing this:
object O {
  val x = this.y
  val y = 0
}
That little this is what is missing when you declare this stuff inside a definition.
The reason for this behavior lies in different val initialization times. If you type val x = 5 directly into the REPL, x becomes a member of an object, whose values can be initialized with default values (null, 0, 0.0, false). In contrast, values in a block cannot be initialized with default values.
This leads to different behavior:
scala> class X { val x = y+1; val y = 10 }
defined class X
scala> (new X).x
res17: Int = 1
scala> { val x = y+1; val y = 10; x } // compiles only with 2.9.0
res20: Int = 11
In Scala 2.10 the last example does not compile anymore. In 2.9.0 the values are reordered by the compiler to get it to compile. There is a bug report which describes the different initialization times.
I'd like to add that a Scala worksheet in the Eclipse-based Scala IDE (v4.0.0) does not behave like the REPL in this respect, as one might expect (e.g. https://github.com/scala-ide/scala-worksheet/wiki/Getting-Started says "Worksheets are like a REPL session on steroids"), but rather like the definition of one long method. That is, forward-referencing val definitions (including recursive val definitions) in a worksheet must be made members of some object or class.

Scala performance - Sieve

Right now, I am trying to learn Scala. I've started small, writing some simple algorithms. I've encountered some problems when I wanted to implement the Sieve algorithm for finding all prime numbers lower than a certain threshold.
My implementation is:
import scala.math

object Sieve {
  // Returns all prime numbers until maxNum
  def getPrimes(maxNum: Int) = {
    def sieve(list: List[Int], stop: Int): List[Int] = {
      list match {
        case Nil => Nil
        case h :: list if h <= stop => h :: sieve(list.filterNot(_ % h == 0), stop)
        case _ => list
      }
    }
    val stop: Int = math.sqrt(maxNum).toInt
    sieve((2 to maxNum).toList, stop)
  }

  def main(args: Array[String]) = {
    val ap = printf("%d ", (_: Int))
    // works
    getPrimes(1000).foreach(ap(_))
    // works
    getPrimes(100000).foreach(ap(_))
    // out of memory
    getPrimes(1000000).foreach(ap(_))
  }
}
Unfortunately it fails when I want to compute all the prime numbers smaller than 1000000 (1 million); I get an OutOfMemoryError.
Do you have any idea how to optimize the code, or how I can implement this algorithm in a more elegant fashion?
PS: I've done something very similar in Haskell, and there I didn't encounter any issues.
I would go with an infinite Stream. Using a lazy data structure allows one to code pretty much like in Haskell. It automatically reads as more "declarative" than the code you wrote.
import Stream._

val primes = 2 #:: sieve(3)

def sieve(n: Int): Stream[Int] =
  if (primes.takeWhile(p => p * p <= n).exists(n % _ == 0)) sieve(n + 2)
  else n #:: sieve(n + 2)

def getPrimes(maxNum: Int) = primes.takeWhile(_ < maxNum)
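For example, a quick check (assuming the definitions above):

getPrimes(20).toList   // List(2, 3, 5, 7, 11, 13, 17, 19)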
Obviously, this isn't the most performant approach. Read The Genuine Sieve of Eratosthenes for a good explanation (it's Haskell, but not too difficult). For really big ranges you should consider the Sieve of Atkin.
The code in question is not tail recursive, so Scala cannot optimize the recursion away. Also, Haskell is non-strict by default, so you can hardly compare it to Scala. For instance, whereas Haskell benefits from foldRight, Scala benefits from foldLeft.
There are many Scala implementations of Sieve of Eratosthenes, including some in Stack Overflow. For instance:
(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps))
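Note that |> here is a forward-pipe operator that is not part of the standard library; the snippet assumes a definition along these lines (a sketch):

implicit class Piper[A](private val a: A) extends AnyVal {
  def |>[B](f: A => B): B = f(a) // pipe the value on the left into the function on the right
}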
The following answer is about 100 times faster than the "one-liner" answer using a Set (and the results don't need sorting into ascending order) and is closer to a functional form than the other answer using an array, although it uses a mutable BitSet as the sieving array:
object SoE {
  import scala.annotation.tailrec

  def makeSoE_Primes(top: Int): Iterator[Int] = {
    val topndx = (top - 3) / 2
    val nonprms = new scala.collection.mutable.BitSet(topndx + 1)
    def cullp(i: Int) = {
      val p = i + i + 3
      @tailrec def cull(c: Int): Unit = if (c <= topndx) { nonprms += c; cull(c + p) }
      cull((p * p - 3) >>> 1)
    }
    (0 to (Math.sqrt(top).toInt - 3) >>> 1).filterNot { nonprms }.foreach { cullp }
    Iterator.single(2) ++ (0 to topndx).filterNot { nonprms }.map { i: Int => i + i + 3 }
  }
}
It can be tested by the following code:
object Main extends App {
  import SoE._
  val top_num = 10000000
  val strt = System.nanoTime()
  val count = makeSoE_Primes(top_num).size
  val end = System.nanoTime()
  println(s"Successfully completed without errors. [total ${(end - strt) / 1000000} ms]")
  println(s"Found $count primes up to $top_num.")
  println("Using one large mutable BitSet and functional code.")
}
With the results from the above as follows:
Successfully completed without errors. [total 294 ms]
Found 664579 primes up to 10000000.
Using one large mutable BitSet and functional code.
There is an overhead of about 40 milliseconds for even small sieve ranges, and there are various non-linear responses with increasing range as the size of the BitSet grows beyond the different CPU caches.
It looks like List isn't very efficient space-wise. You can get an out-of-memory exception by doing something like this:
(1 to 2000000).toList
I "cheated" and used a mutable array. Didn't feel dirty at all.
def primesSmallerThan(n: Int): List[Int] = {
  val nonprimes = Array.tabulate(n + 1)(i => i == 0 || i == 1)
  val primes = new collection.mutable.ListBuffer[Int]
  for (x <- nonprimes.indices if !nonprimes(x)) {
    primes += x
    // the (x * x) > 0 guard protects against Int overflow for large x
    for (y <- x * x until nonprimes.length by x if (x * x) > 0) {
      nonprimes(y) = true
    }
  }
  primes.toList
}
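A quick check of the sketch above (hypothetical usage):

primesSmallerThan(30)   // List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)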