/* I want to read the Rational code that computes ((1/2) + (2/3)). I found this code, but I have one question about it. */
object Rationals {
  val x = new Rational(1, 2) // 1/2
  x.numer // *
  x.denom // **
  /* * and ** are my questions: why do I have to use them? */
  val y = new Rational(2, 3) // 2/3
  x.add(y)
  /* my result must be equal to 7/6 */
}
class Rational(x: Int, y: Int) {
  def numer = x
  def denom = y
  def add(that: Rational) =
    new Rational(
      numer * that.denom + that.numer * denom, /* 1*3 + 2*2 */
      denom * that.denom) /* 2*3 */
  override def toString = numer + "/" + denom /* 7/6 */
}
The lines:
x.numer // *
x.denom // **
aren't doing anything: they are evaluated, but their results are discarded, and they have no side effects.
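To make that concrete, here is a small self-contained sketch (mine, not from the original post): the class compiles and `add` works exactly the same whether or not those two accessor lines are present, because evaluating a pure accessor and discarding its value does nothing.

```scala
class Rational(x: Int, y: Int) {
  def numer = x
  def denom = y
  def add(that: Rational) =
    new Rational(
      numer * that.denom + that.numer * denom,
      denom * that.denom)
  override def toString = numer + "/" + denom
}

val x = new Rational(1, 2)
val y = new Rational(2, 3)
x.numer // evaluated and discarded: has no effect on anything below
val sum = x.add(y)
// sum.toString == "7/6"
```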
Related
The goal is to code this sum into a recursive function.
Sum
So far I have tried to code it like this:
def under(u: Int): Int = {
  var i1 = u/2
  var i = i1 + 1
  if (u/2 == 1) then u + 1 - 2 * 1
  else (u + 1 - 2 * i) + under(u - 1)
}
It seems like I am running into an issue with the recursive part, but I am not able to figure out what goes wrong.
In theory, under(5) should produce 10.
Your logic is wrong. It should iterate (whether through a loop, recursion, or a collection is irrelevant) from i=1 to i=n/2, using n and the current i as they are:
(1 to (n/2)).map(i => n + 1 - 2 * i).sum
You are (more or less) running computations from i=1 to i=n (or rather from n down to 1), but instead of n you use i/2 and instead of i you use i/2+1 (a sum from i=1 to i=n of (n/2 + 1 - 2 * i)).
// actually what you do is more like (1 to n).toList.reverse
// rather than (1 to n)
(1 to n).map(i => i/2 + 1 - 2 * (i/2 + 1)).sum
It's a different formula: it has twice as many elements to sum, and part of each term changes where it should be constant, while another part has the wrong value.
To implement the same logic with recursion you would have to do something like:
// as one function with default args
// tail-recursive version
def under(n: Int, i: Int = 1, sum: Int = 0): Int =
  if (i > n/2) sum
  else under(n, i + 1, sum + (n + 1 - 2 * i))

// not tail-recursive
def under(n: Int, i: Int = 1): Int =
  if (i > n/2) 0
  else (n + 1 - 2 * i) + under(n, i + 1)

// with nested functions, without default args
def under(n: Int): Int = {
  // tail-recursive
  def helper(i: Int, sum: Int): Int =
    if (i > n/2) sum
    else helper(i + 1, sum + (n + 1 - 2 * i))
  helper(1, 0)
}

def under(n: Int): Int = {
  // not tail-recursive
  def helper(i: Int): Int =
    if (i > n/2) 0
    else (n + 1 - 2 * i) + helper(i + 1)
  helper(1)
}
As a side note: there is no need to use any iteration / recursion at all. Here is an explicit formula:
def g(n: Int) = n / 2 * (n - n / 2)
that gives the same results as
def h(n: Int) = (1 to n / 2).map(i => n + 1 - 2 * i).sum
Both assume that you want floored n / 2 in the case that n is odd, i.e. both of the functions above behave the same as
def j(n: Int) = (math.ceil(n / 2.0) * math.floor(n / 2.0)).toInt
(at least until rounding errors kick in).
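A quick cross-check (mine, not part of the original answer) that the three definitions really coincide for small inputs:

```scala
// Closed-form, iterative, and ceil/floor variants side by side.
def g(n: Int) = n / 2 * (n - n / 2)
def h(n: Int) = (1 to n / 2).map(i => n + 1 - 2 * i).sum
def j(n: Int) = (math.ceil(n / 2.0) * math.floor(n / 2.0)).toInt

// All three agree, e.g. g(5) == h(5) == j(5) == 6.
```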
I have the following method that computes the probability of a value in a DataSet:
/**
 * Computes the probability of each value in the given [[DataSet]].
 *
 * @param x single-column [[DataSet]]
 * @return sequence of probabilities, one for each value
 */
private[this] def probs(x: DataSet[Double]): Seq[Double] = {
  val counts = x.groupBy(_.doubleValue)
    .reduceGroup(_.size.toDouble)
    .name("X Probs")
    .collect
  val total = counts.sum
  counts.map(_ / total)
}
The problem is that when I submit my Flink job, which uses this method, Flink kills the job due to a task timeout. I am executing this method for each attribute of a DataSet with only 40,000 instances and 9 attributes.
Is there a way to make this code more efficient?
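For reference, this is what probs computes, sketched on a plain in-memory collection (no Flink; the helper name is mine): the relative frequency of each distinct value.

```scala
// Local equivalent of probs: relative frequency of each distinct value.
def probsLocal(xs: Seq[Double]): Seq[Double] = {
  val counts = xs.groupBy(identity).values.map(_.size.toDouble).toSeq
  val total = counts.sum
  counts.map(_ / total)
}

// probsLocal(Seq(1.0, 1.0, 2.0, 2.0)) == Seq(0.5, 0.5)
```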
After a few tries, I made it work with mapPartition. This method is part of a class InformationTheory, which does computations such as entropy and mutual information. So, for example, symmetrical uncertainty is computed like this:
/**
 * Computes 'symmetrical uncertainty' (SU) - a symmetric mutual information measure.
 *
 * It is defined as SU(X, Y) = 2 * (IG(X|Y) / (H(X) + H(Y)))
 *
 * @param xy [[DataSet]] with two features
 * @return SU value
 */
def symmetricalUncertainty(xy: DataSet[(Double, Double)]): Double = {
  val su = xy.mapPartitionWith {
    case in ⇒
      val x = in map (_._2)
      val y = in map (_._1)
      val mu = mutualInformation(x, y)
      val Hx = entropy(x)
      val Hy = entropy(y)
      Some(2 * mu / (Hx + Hy))
  }
  su.collect.head.head
}
With this, I can efficiently compute entropy, mutual information, etc. The catch is that it only works with a level of parallelism of 1; the problem resides in mapPartition.
Is there a way I could do something similar to what I am doing here with symmetricalUncertainty, but with whatever level of parallelism?
I finally did it. I don't know if it's the best solution, but it is working with n levels of parallelism:
def symmetricalUncertainty(xy: DataSet[(Double, Double)]): Double = {
  val su = xy.reduceGroup { in ⇒
    val invec = in.toVector
    val x = invec map (_._2)
    val y = invec map (_._1)
    val mu = mutualInformation(x, y)
    val Hx = entropy(x)
    val Hy = entropy(y)
    2 * mu / (Hx + Hy)
  }
  su.collect.head
}
You can check the entire code at InformationTheory.scala, and its tests at InformationTheorySpec.scala.
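For illustration only, the SU definition from the Scaladoc above can be sketched on local collections with standard plug-in (frequency-count) estimators. These entropyLocal / mutualInformationLocal helpers are simplified stand-ins I wrote for this sketch, not the actual implementations in InformationTheory.scala:

```scala
// Plug-in entropy in bits: H(X) = -sum over values of p * log2(p).
def entropyLocal[A](xs: Seq[A]): Double = {
  val n = xs.size.toDouble
  xs.groupBy(identity).values
    .map(g => g.size / n)
    .map(p => -p * math.log(p) / math.log(2))
    .sum
}

// Mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y),
// where H(X,Y) is the entropy of the value pairs.
def mutualInformationLocal(x: Seq[Double], y: Seq[Double]): Double =
  entropyLocal(x) + entropyLocal(y) - entropyLocal(x.zip(y))

// SU(X, Y) = 2 * I(X;Y) / (H(X) + H(Y)), as in the Scaladoc above.
def suLocal(x: Seq[Double], y: Seq[Double]): Double =
  2 * mutualInformationLocal(x, y) / (entropyLocal(x) + entropyLocal(y))
```

On toy inputs this behaves as expected: SU is 1 when the two features determine each other and 0 when they are independent.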
I want to compose numbers in base 3, represented as fixed-length Array[Byte].
Here are some attempts:
val byteBoard = Array.fill(9)(1.toByte)
val cache: Seq[(Int, Int)] = (0 to 8).map(i => (i, math.pow(3d, i.toDouble).toInt))
@Benchmark
def composePow(): Unit = {
val _ = (0 to 8).foldLeft(0) { case (acc, i) => acc + math.pow(3d, i.toDouble).toInt * byteBoard(i) }
}
@Benchmark
def composeCachedPowWithFold(): Unit = {
val _ = cache.foldLeft(0) { case (acc, (i, k)) => acc + k * byteBoard(i).toInt }
}
@Benchmark
def composeCachedPowWithForeach(): Unit = {
var acc = 0
cache.foreach { case (i, k) => acc = acc + k * byteBoard(i)}
}
@Benchmark
def composeUnrolled(): Unit = {
val _ = byteBoard(0) +
3 * byteBoard(1) +
3 * 3 * byteBoard(2) +
3 * 3 * 3 * byteBoard(3) +
3 * 3 * 3 * 3 * byteBoard(4) +
3 * 3 * 3 * 3 * 3 * byteBoard(5) +
3 * 3 * 3 * 3 * 3 * 3 * byteBoard(6) +
3 * 3 * 3 * 3 * 3 * 3 * 3 * byteBoard(7) +
3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 * byteBoard(8)
}
Can you confirm the following conclusions:
1. composePow: boxing + type conversions to use math.pow
2. composeCachedPowWithFold: boxing because fold is a parameterized method
3. composeCachedPowWithForeach: no boxing, less idiomatic Scala (local mutation)
4. composeUnrolled: no boxing
And explain why 4 is way faster than 3?
PS: Here are the results of the JMH benchmark:
[info] IndexBenchmark.composeCachedPowWithFold thrpt 10 7180844,823 ± 1015310,847 ops/s
[info] IndexBenchmark.composeCachedPowWithForeach thrpt 10 14234192,613 ± 1449571,042 ops/s
[info] IndexBenchmark.composePow thrpt 10 1515312,179 ± 34892,700 ops/s
[info] IndexBenchmark.composeUnrolled thrpt 10 152297653,110 ± 2237446,053 ops/s
I mostly agree with your analysis of cases 1, 2 and 4, but the third variant is really funny!
I agree with you about the first two versions: foldLeft is not @specialized, so, yes, there is some boxing-unboxing. But math.pow is evil for integer arithmetic anyway, and all those conversions only incur additional overhead.
Now let's take a closer look at the third variant. It is so slow because you are constructing a closure over mutable state. Look at the output of scala -print. This is what your method is rewritten into:
private def composeCachedPowWithForeach(): Unit = {
var acc: runtime.IntRef = scala.runtime.IntRef.create(0);
anon$1.this.cache().foreach({
((x0$3: Tuple2) =>
anon$1.this.
$anonfun$composeCachedPowWithForeach$1(acc, x0$3))
})
};
And here is the function used in foreach:
final <artifact> private[this] def
$anonfun$composeCachedPowWithForeach$1(
acc$1: runtime.IntRef, x0$3: Tuple2
): Unit = {
case <synthetic> val x1: Tuple2 = x0$3;
case4(){
if (x1.ne(null))
{
val i: Int = x1._1$mcI$sp();
val k: Int = x1._2$mcI$sp();
matchEnd3({
acc$1.elem = acc$1.elem.+(k.*(anon$1.this.byteBoard().apply(i)));
scala.runtime.BoxedUnit.UNIT
})
}
else
case5()
};
case5(){
matchEnd3(throw new MatchError(x1))
};
matchEnd3(x: scala.runtime.BoxedUnit){
()
}
};
You see that there is apparently lot of code generated by pattern matching. I'm not sure whether it contributes much to the overhead. What I personally find much more interesting is the runtime.IntRef part. This is an object that keeps a mutable variable that corresponds to var acc in your code. Even though it looks like a simple local variable in the code, it must be referenced from the closure somehow, and is therefore wrapped into an object, and evicted to the heap. I assume that accessing this mutable variable on the heap causes most of the overhead.
In contrast to that, if byteBoard were passed as an argument, then nothing in the fourth variant would ever leave the function's stack frame:
private def composeUnrolled(): Unit = {
val _: Int =
anon$1.this.byteBoard().apply(0).+
(3.*(anon$1.this.byteBoard().apply(1))).+
(9.*(anon$1.this.byteBoard().apply(2))).+
(27.*(anon$1.this.byteBoard().apply(3))).+
(81.*(anon$1.this.byteBoard().apply(4))).+
(243.*(anon$1.this.byteBoard().apply(5))).+
(729.*(anon$1.this.byteBoard().apply(6))).+
(2187.*(anon$1.this.byteBoard().apply(7))).+
(6561.*(anon$1.this.byteBoard().apply(8)));
()
};
There is essentially no control flow to speak of, hardly any method invocations (the apply is for accessing array elements, and that doesn't count), and overall it's just one very simple arithmetic computation that might even fit into the registers of your processor. That's why it is so fast.
While you are at it, you might want to benchmark these two methods:
def ternaryToInt5(bytes: Array[Byte]): Int = {
var acc = 0
val n = bytes.size
var i = n - 1
while (i >= 0) {
acc *= 3
acc += bytes(i)
i -= 1
}
acc
}
def ternaryToInt6(bytes: Array[Byte]): Int = {
bytes(0) +
3 * (bytes(1) +
3 * (bytes(2) +
3 * (bytes(3) +
3 * (bytes(4) +
3 * (bytes(5) +
3 * (bytes(6) +
3 * (bytes(7) +
3 * (bytes(8)))))))))
}
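A quick check (my addition, not part of the answer) that the two suggested helpers agree with each other and with the positional weights 3^0 … 3^8 used by composeUnrolled:

```scala
// Loop version: Horner's rule, processing the most significant digit first.
def ternaryToInt5(bytes: Array[Byte]): Int = {
  var acc = 0
  var i = bytes.length - 1
  while (i >= 0) {
    acc *= 3
    acc += bytes(i)
    i -= 1
  }
  acc
}

// Unrolled Horner version for fixed length 9.
def ternaryToInt6(bytes: Array[Byte]): Int =
  bytes(0) + 3 * (bytes(1) + 3 * (bytes(2) + 3 * (bytes(3) + 3 * (bytes(4) +
    3 * (bytes(5) + 3 * (bytes(6) + 3 * (bytes(7) + 3 * bytes(8))))))))

// An all-ones board encodes 3^0 + 3^1 + ... + 3^8 = 9841.
val allOnes = Array.fill(9)(1.toByte)
```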
Also, if you are working with byte arrays frequently, you might find this syntactic sugar useful.
I'm currently attempting to learn functional programming and Scala together. I'm porting some code from a FORTRAN routine for calculating a particular modified normalisation of the associated Legendre polynomials. My direct imperative translation of the original code is modpLgndr1 (which I have checked against the original algorithm). My initial attempt at writing the code in a functional form is modpLgndr2.
import math.{pow, abs, sqrt, Pi}
def xfact(m: Int): Double = {
  if (m <= 1) 1.0
  else {
    if (m % 2 == 1) m.toDouble / sqrt(m.toDouble) * xfact(m - 1)
    else 1.0 / sqrt(m.toDouble) * xfact(m - 1)
  }
}
//this is a very un-scala function....
def modpLgndr1(l: Int, m: Int, x: Double): Double = {
  assert(0 <= m && m <= l && abs(x) <= 1.0)
  val dl = l.toDouble
  val dm = m.toDouble
  val norm = sqrt(2.0 * dl + 1.0) / sqrt(4.0 * Pi)
  var pmm = norm
  if (m != 0) pmm = (pow(-1, m)).toDouble * pmm * xfact(2 * m) * pow((1.0 - x * x), (dm / 2.0))
  if (l == m) pmm
  else {
    var pmmp1 = x * pmm * sqrt(2.0 * m + 1.0)
    if (l == m + 1) pmmp1
    else {
      var pll = 0.0
      var dll = 0.0
      for (ll <- m + 2 to l) {
        dll = ll.toDouble
        pll = (x * (2.0 * dll - 1.0) * pmmp1 - sqrt(pow((dll - 1.0), 2.0) - dm * dm) * pmm) / sqrt(pow(dll, 2.0) - pow(dm, 2.0))
        pmm = pmmp1
        pmmp1 = pll
      }
      pll
    }
  }
}
def modpLgndr2(l: Int, m: Int, x: Double): Double = {
  assert(0 <= m && m <= l && abs(x) <= 1.0)
  val dl = l.toDouble
  val dm = m.toDouble
  val norm = sqrt(2.0 * dl + 1.0) / sqrt(4.0 * Pi)
  val pmm = if (m == 0) norm else (pow(-1, m)).toDouble * norm * xfact(2 * m) * pow((1.0 - x * x), (dm / 2.0))
  if (l == m) pmm
  else {
    val pmmp1 = x * pmm * sqrt(2.0 * m + 1.0)
    if (l == m + 1) pmmp1
    else {
      def mplacc(ll: Int, acc1: Double, acc2: Double): Double = {
        val dll = ll.toDouble
        val pll = (x * (2.0 * dll - 1.0) * acc2 - sqrt(pow((dll - 1.0), 2.0) - dm * dm) * acc1) / sqrt(pow(dll, 2.0) - pow(dm, 2.0))
        if (ll == m + 2) pll
        else mplacc(ll - 1, acc2, pll)
      }
      mplacc(l, pmm, pmmp1)
    }
  }
}
If I call the two functions I get output like this:
scala> for (i <- 0 to 10) println(modpLgndr1(10,i,0.2))
0.16685408398957746
-0.2769345073769805
-0.1575129272628402
0.2948210515201088
0.12578847877176355
-0.3292975894931367
-0.058267280378036426
0.37448134558730417
-0.08024600262585084
-0.40389602261165075
0.4424459249420354
scala> for (i <- 0 to 10) println(modpLgndr2(10,i,0.2))
0.16685408398957752
-0.2772969351441124
-0.1578618478786792
0.29654926805696474
0.1349402872678466
-0.33707342609134694
-0.06901634276825179
0.38912154672892657
-0.08024600262585084
-0.40389602261165075
0.4424459249420354
Essentially, for m = 0, l-2, l-1, l the code agrees; otherwise there is a significant discrepancy. That seems to tell me that the problem is in calling the mplacc function. To me, mplacc just looks like a recursive form of the for loop in modpLgndr1. Why am I wrong?
I'm trying to understand the add method in the class below, taken from the book 'Programming in Scala, Second Edition'.
Is this correct:
The method 'add' defines an operator that returns a Rational. I don't know what is occurring within the new Rational:
numer * that.denom + that.numer * denom,
denom * that.denom
How are numer & denom assigned here? Why is each expression separated by a comma?
Entire class:
class Rational(n: Int, d: Int) {
  require(d != 0)
  private val g = gcd(n.abs, d.abs)
  val numer = n / g
  val denom = d / g
  def this(n: Int) = this(n, 1)
  def add(that: Rational): Rational =
    new Rational(
      numer * that.denom + that.numer * denom,
      denom * that.denom
    )
  override def toString = numer + "/" + denom
  private def gcd(a: Int, b: Int): Int =
    if (b == 0) a else gcd(b, a % b)
}
Those two expressions are the arguments to the Rational constructor, so they become the constructor parameters n and d respectively. For example:
class C(a: Int, b: Int)
val c1 = new C(1, 2) // here 1, 2 is like the two expressions you mention
The add method implements rational number addition:
 a     c     a*d     c*b     a*d + c*b
--- + --- = ----- + ----- = -----------
 b     d     b*d     b*d       b*d
where
a = numer
b = denom
c = that.numer
d = that.denom
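Putting it together (my sketch, not from the book): the two comma-separated expressions are evaluated first and then passed positionally to the constructor, where they become n and d; the gcd reduction then happens inside the new instance.

```scala
class Rational(n: Int, d: Int) {
  require(d != 0)
  private val g = gcd(n.abs, d.abs)
  val numer = n / g
  val denom = d / g
  def add(that: Rational): Rational =
    new Rational(
      numer * that.denom + that.numer * denom, // first argument: becomes n
      denom * that.denom                       // second argument: becomes d
    )
  override def toString = numer + "/" + denom
  private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
}

val sum = new Rational(1, 2).add(new Rational(2, 3))
// n = 1*3 + 2*2 = 7, d = 2*3 = 6, gcd(7, 6) = 1
// sum.toString == "7/6"
```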