I am reading Functional Programming in Scala and am having trouble understanding a piece of code. I have checked the errata for the book and the passage in question does not have a misprint. (Actually, it does have a misprint, but the misprint does not affect the code that I have a question about.)
The code in question calculates a pseudo-random, non-negative integer that is less than some upper bound. The function that does this is called nonNegativeLessThan.
trait RNG {
def nextInt: (Int, RNG) // Should generate a random `Int`.
}
case class Simple(seed: Long) extends RNG {
def nextInt: (Int, RNG) = {
val newSeed = (seed * 0x5DEECE66DL + 0xBL) & 0xFFFFFFFFFFFFL // `&` is bitwise AND. We use the current seed to generate a new seed.
val nextRNG = Simple(newSeed) // The next state, which is an `RNG` instance created from the new seed.
val n = (newSeed >>> 16).toInt // `>>>` is right binary shift with zero fill. The value `n` is our new pseudo-random integer.
(n, nextRNG) // The return value is a tuple containing both a pseudo-random integer and the next `RNG` state.
}
}
type Rand[+A] = RNG => (A, RNG)
def nonNegativeInt(rng: RNG): (Int, RNG) = {
val (i, r) = rng.nextInt
(if (i < 0) -(i + 1) else i, r)
}
def nonNegativeLessThan(n: Int): Rand[Int] = { rng =>
val (i, rng2) = nonNegativeInt(rng)
val mod = i % n
if (i + (n-1) - mod >= 0) (mod, rng2)
else nonNegativeLessThan(n)(rng2)
}
I have trouble understanding the following code in nonNegativeLessThan that looks like this: if (i + (n-1) - mod >= 0) (mod, rng2), etc.
The book explains that this entire if-else expression is necessary because a naive implementation that simply takes the mod of the result of nonNegativeInt would be slightly skewed toward lower values since Int.MaxValue is not guaranteed to be a multiple of n. Therefore, this code is meant to check if the generated output of nonNegativeInt would be larger than the largest multiple of n that fits inside a 32 bit value. If the generated number is larger than the largest multiple of n that fits inside a 32 bit value, the function recalculates the pseudo-random number.
To elaborate, the naive implementation would look like this:
def naiveNonNegativeLessThan(n: Int): Rand[Int] = map(nonNegativeInt){_ % n}
where map is defined as follows
def map[A,B](s: Rand[A])(f: A => B): Rand[B] = {
rng =>
val (a, rng2) = s(rng)
(f(a), rng2)
}
To repeat, this naive implementation is not desirable because of a slight skew towards lower values when Int.MaxValue is not a perfect multiple of n.
So, to reiterate the question: what does the following code do, and how does it help us determine whether a number is smaller that the largest multiple of n that fits inside a 32 bit integer? I am talking about this code inside nonNegativeLessThan:
if (i + (n-1) - mod >= 0) (mod, rng2)
else nonNegativeLessThan(n)(rng2)
I have exactly the same confusion about this passage from the Functional Programming in Scala. And I absolutely agree with jwvh's analysis - the statement if (i + (n-1) - mod >= 0) will be always true.
In fact, if one tries the same example in Rust, the compiler warns about this (just an interesting comparison of how much static checking is being done). Of course the pencil and paper approach of jwvh is absolutely the right approach.
We first define some type aliases to make the code match closer to the Scala code (forgive my Rust if its not quite idiomatic).
pub type RNGType = Box<dyn RNG>;
pub type Rand<A> = Box<dyn Fn(RNGType) -> (A, RNGType)>;
pub fn non_negative_less_than_(n: u32) -> Rand<u32> {
let t = move |rng: RNGType| {
let (i, rng2) = non_negative_int(rng);
let rem = i % n;
if i + (n - 1) - rem >= 0 {
(rem, rng2)
} else {
non_negative_less_than(n)(rng2)
}
};
Box::new(t)
}
The compiler warning regarding if nn + (n - 1) - rem >= 0 is:
warning: comparison is useless due to type limits
Recently I had an interview for Scala Developer position. I was asked such question
// matrix 100x100 (content unimportant)
val matrix = Seq.tabulate(100, 100) { case (x, y) => x + y }
// A
for {
row <- matrix
elem <- row
} print(elem)
// B
val func = print _
for {
row <- matrix
elem <- row
} func(elem)
and the question was: Which implementation, A or B, is more efficent?
We all know that for comprehensions can be translated to
// A
matrix.foreach(row => row.foreach(elem => print(elem)))
// B
matrix.foreach(row => row.foreach(func))
B can be written as matrix.foreach(row => row.foreach(print _))
Supposedly correct answer is B, because A will create function print 100 times more.
I have checked Language Specification but still fail to understand the answer. Can somebody explain this to me?
In short:
Example A is faster in theory, in practice you shouldn't be able to measure any difference though.
Long answer:
As you already found out
for {xs <- xxs; x <- xs} f(x)
is translated to
xxs.foreach(xs => xs.foreach(x => f(x)))
This is explained in §6.19 SLS:
A for loop
for ( p <- e; p' <- e' ... ) e''
where ... is a (possibly empty) sequence of generators, definitions, or guards, is translated to
e .foreach { case p => for ( p' <- e' ... ) e'' }
Now when one writes a function literal, one gets a new instance every time the function needs to be called (§6.23 SLS). This means that
xs.foreach(x => f(x))
is equivalent to
xs.foreach(new scala.Function1 { def apply(x: T) = f(x)})
When you introduce a local function type
val g = f _; xxs.foreach(xs => xs.foreach(x => g(x)))
you are not introducing an optimization because you still pass a function literal to foreach. In fact the code is slower because the inner foreach is translated to
xs.foreach(new scala.Function1 { def apply(x: T) = g.apply(x) })
where an additional call to the apply method of g happens. Though, you can optimize when you write
val g = f _; xxs.foreach(xs => xs.foreach(g))
because the inner foreach now is translated to
xs.foreach(g())
which means that the function g itself is passed to foreach.
This would mean that B is faster in theory, because no anonymous function needs to be created each time the body of the for comprehension is executed. However, the optimization mentioned above (that the function is directly passed to foreach) is not applied on for comprehensions, because as the spec says the translation includes the creation of function literals, therefore there are always unnecessary function objects created (here I must say that the compiler could optimize that as well, but it doesn't because optimization of for comprehensions is difficult and does still not happen in 2.11). All in all it means that A is more efficient but B would be more efficient if it is written without a for comprehension (and no function literal is created for the innermost function).
Nevertheless, all of these rules can only be applied in theory, because in practice there is the backend of scalac and the JVM itself which both can do optimizations - not to mention optimizations that are done by the CPU. Furthermore your example contains a syscall that is executed on every iteration - it is probably the most expensive operation here that outweighs everything else.
I'd agree with sschaef and say that A is the more efficient option.
Looking at the generated class files we get the following anonymous functions and their apply methods:
MethodA:
anonfun$2 -- row => row.foreach(new anonfun$2$$anonfun$1)
anonfun$2$$anonfun$1 -- elem => print(elem)
i.e. matrix.foreach(row => row.foreach(elem => print(elem)))
MethodB:
anonfun$3 -- x => print(x)
anonfun$4 -- row => row.foreach(new anonfun$4$$anonfun$2)
anonfun$4$$anonfun$2 -- elem => func(elem)
i.e. matrix.foreach(row => row.foreach(elem => func(elem)))
where func is just another indirection before calling to print. In addition func needs to be looked up, i.e. through a method call on an instance (this.func()) for each row.
So for Method B, 1 extra object is created (func) and there are # of elem additional function calls.
The most efficient option would be
matrix.foreach(row => row.foreach(func))
as this has the least number of objects created and does exactly as you would expect.
Benchmark
Summary
Method A is nearly 30% faster than method B.
Link to code: https://gist.github.com/ziggystar/490f693bc39d1396ef8d
Implementation Details
I added method C (two while loops) and D (fold, sum). I also increased the size of the matrix and used an IndexedSeq instead. Also I replaced the print with something less heavy (sum all entries).
Strangely the while construct is not the fastest. But if one uses Array instead of IndexedSeq it becomes the fastest by a large margin (factor 5, no boxing anymore). Using explicitly boxed integers, methods A, B, C are all equally fast. In particular they are faster by 50% compared to the implicitly boxed versions of A, B.
Results
A
4.907797735
4.369745787
4.375195012000001
4.7421321800000005
4.35150636
B
5.955951859000001
5.925475619
5.939570085000001
5.955592247
5.939672226000001
C
5.991946029
5.960122757000001
5.970733164
6.025532582
6.04999499
D
9.278486201
9.265983922
9.228320372
9.255641645
9.22281905
verify results
999000000
999000000
999000000
999000000
>$ scala -version
Scala code runner version 2.11.0 -- Copyright 2002-2013, LAMP/EPFL
Code excerpt
val matrix = IndexedSeq.tabulate(1000, 1000) { case (x, y) => x + y }
def variantA(): Int = {
var r = 0
for {
row <- matrix
elem <- row
}{
r += elem
}
r
}
def variantB(): Int = {
var r = 0
val f = (x:Int) => r += x
for {
row <- matrix
elem <- row
} f(elem)
r
}
def variantC(): Int = {
var r = 0
var i1 = 0
while(i1 < matrix.size){
var i2 = 0
val row = matrix(i1)
while(i2 < row.size){
r += row(i2)
i2 += 1
}
i1 += 1
}
r
}
def variantD(): Int = matrix.foldLeft(0)(_ + _.sum)
I have recently been playing around on HackerRank in my down time, and am having some trouble solving this problem: https://www.hackerrank.com/challenges/functional-programming-the-sums-of-powers efficiently.
Problem statement: Given two integers X and N, find the number of ways to express X as a sum of powers of N of unique natural numbers.
Example: X = 10, N = 2
There is only one way get 10 using powers of 2 below 10, and that is 1^2 + 3^2
My Approach
I know that there probably exists a nice, elegant recurrence for this problem; but unfortunately I couldn't find one, so I started thinking about other approaches. What I decided on what that I would gather a range of numbers from [1,Z] where Z is the largest number less than X when raised to the power of N. So for the example above, I only consider [1,2,3] because 4^2 > 10 and therefore can't be a part of (positive) numbers that sum to 10. After gathering this range of numbers I raised them all to the power N then found the permutations of all subsets of this list. So for [1,2,3] I found [[1],[4],[9],[1,4],[1,9],[4,9],[1,4,9]], not a trivial series of operations for large initial ranges of numbers (my solution timed out on the final two hackerrank tests). The final step was to count the sublists that summed to X.
Solution
object Solution {
def numberOfWays(X : Int, N : Int) : Int = {
def candidates(num : Int) : List[List[Int]] = {
if( Math.pow(num, N).toInt > X )
List.range(1, num).map(
l => Math.pow(l, N).toInt
).toSet[Int].subsets.map(_.toList).toList
else
candidates(num+1)
}
candidates(1).count(l => l.sum == X)
}
def main(args: Array[String]) {
println(numberOfWays(readInt(),readInt()))
}
}
Has anyone encountered this problem before? If so, are there more elegant solutions?
After you build your list of squares you are left with what I would consider a kind of Partition Problem called the Subset Sum Problem. This is an old NP-Complete problem. So the answer to your first question is "Yes", and the answer to the second is given in the links.
This can be thought of as a dynamic programming problem. I still reason about Dynamic Programming problems imperatively, because that was how I was taught, but this can probably be made functional.
A. Make an array A of length X with type parameter Integer.
B. Iterate over i from 1 to Nth root of X. For all i, set A[i^N - 1] = 1.
C. Iterate over j from 0 until X. In an inner loop, iterate over k from 0 to (X + 1) / 2.
A[j] += A[k] * A[x - k]
D. A[X - 1]
This can be made slightly more efficient by keeping track of which indices are non-trivial, but not that much more efficient.
def numberOfWays(X: Int, N: Int): Int = {
def powerSumHelper(sum: Int, maximum: Int): Int = sum match {
case x if x < 1 => 0
case _ => {
val limit = scala.math.min(maximum, scala.math.floor(scala.math.pow(sum, 1.0 / N)).toInt)
(limit to 1 by -1).map(x => {
val y = scala.math.pow(x, N).toInt
if (y == sum) 1 else powerSumHelper(sum - y, x - 1)
}).sum
}
}
powerSumHelper(X, Integer.MAX_VALUE)
}
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Can you overload + in haskell?
Can you implement a Matrix class and an * operator that will work on two matrices?:
scala> val x = Matrix(3, 1,2,3,4,5,6)
x: Matrix =
[1.0, 2.0, 3.0]
[4.0, 5.0, 6.0]
scala> x*x.transpose
res0: Matrix =
[14.0, 32.0]
[32.0, 77.0]
and just so people don't say that it's hard, here is the Scala implementation (courtesy of Jonathan Merritt):
class Matrix(els: List[List[Double]]) {
/** elements of the matrix, stored as a list of
its rows */
val elements: List[List[Double]] = els
def nRows: Int = elements.length
def nCols: Int = if (elements.isEmpty) 0
else elements.head.length
/** all rows of the matrix must have the same
number of columns */
require(elements.forall(_.length == nCols))
/* Add to each elem of matrix */
private def addRows(a: List[Double],
b: List[Double]):
List[Double] =
List.map2(a,b)(_+_)
private def subRows(a: List[Double],
b: List[Double]):List[Double] =
List.map2(a,b)(_-_)
def +(other: Matrix): Matrix = {
require((other.nRows == nRows) &&
(other.nCols == nCols))
new Matrix(
List.map2(elements, other.elements)
(addRows(_,_))
)
}
def -(other: Matrix): Matrix = {
require((other.nRows == nRows) &&
(other.nCols == nCols))
new Matrix(
List.map2(elements, other.elements)
(subRows(_,_))
)
}
def transpose(): Matrix = new Matrix(List.transpose(elements))
private def dotVectors(a: List[Double],
b: List[Double]): Double = {
val multipliedElements =
List.map2(a,b)(_*_)
(0.0 /: multipliedElements)(_+_)
}
def *(other: Matrix): Matrix = {
require(nCols == other.nRows)
val t = other.transpose()
new Matrix(
for (row <- elements) yield {
for (otherCol <- t.elements)
yield dotVectors(row, otherCol)
}
)
override def toString(): String = {
val rowStrings =
for (row <- elements)
yield row.mkString("[", ", ", "]")
rowStrings.mkString("", "\n", "\n")
}
}
/* Matrix constructor from a bunch of numbers */
object Matrix {
def apply(nCols: Int, els: Double*):Matrix = {
def splitRowsWorker(
inList: List[Double],
working: List[List[Double]]):
List[List[Double]] =
if (inList.isEmpty)
working
else {
val (a, b) = inList.splitAt(nCols)
splitRowsWorker(b, working + a)
}
def splitRows(inList: List[Double]) =
splitRowsWorker(inList, List[List[Double]]())
val rows: List[List[Double]] =
splitRows(els.toList)
new Matrix(rows)
}
}
EDIT I understood that strictly speaking the answer is No: overloading * is not possible without side-effects of defining also a + and others or special tricks. The numeric-prelude package describes it best:
In some cases, the hierarchy is not finely-grained enough: Operations
that are often defined independently are lumped together. For
instance, in a financial application one might want a type "Dollar",
or in a graphics application one might want a type "Vector". It is
reasonable to add two Vectors or Dollars, but not, in general,
reasonable to multiply them. But the programmer is currently forced to
define a method for '(*)' when she defines a method for '(+)'.
It'll be perfectly safe with a smart constructor and stored dimensions. Of course there are no natural implementations for the operations signum and fromIntegral (or maybe a diagonal matrix would be fine for the latter).
module Matrix (Matrix(),matrix,matrixTranspose) where
import Data.List (transpose)
data Matrix a = Matrix {matrixN :: Int,
matrixM :: Int,
matrixElems :: [[a]]}
deriving (Show, Eq)
matrix :: Int -> Int -> [[a]] -> Matrix a
matrix n m vals
| length vals /= m = error "Wrong number of rows"
| any (/=n) $ map length vals = error "Column length mismatch"
| otherwise = Matrix n m vals
matrixTranspose (Matrix m n vals) = matrix n m (transpose vals)
instance Num a => Num (Matrix a) where
(+) (Matrix m n vals) (Matrix m' n' vals')
| m/=m' = error "Row number mismatch"
| n/=n' = error "Column number mismatch"
| otherwise = Matrix m n (zipWith (zipWith (+)) vals vals')
abs (Matrix m n vals) = Matrix m n (map (map abs) vals)
negate (Matrix m n vals) = Matrix m n (map (map negate) vals)
(*) (Matrix m n vals) (Matrix n' p vals')
| n/=n' = error "Matrix dimension mismatch in multiplication"
| otherwise = let tvals' = transpose vals'
dot x y = sum $ zipWith (*) x y
result = map (\col -> map (dot col) tvals') vals
in Matrix m p result
Test it in ghci:
*Matrix> let a = matrix 3 2 [[1,0,2],[-1,3,1]]
*Matrix> let b = matrix 2 3 [[3,1],[2,1],[1,0]]
*Matrix> a*b
Matrix {matrixN = 3, matrixM = 3, matrixElems = [[5,1],[4,2]]}
Since my Num instance is generic, it even works for complex matrices out of the box:
Prelude Data.Complex Matrix> let c = matrix 2 2 [[0:+1,1:+0],[5:+2,4:+3]]
Prelude Data.Complex Matrix> let a = matrix 2 2 [[0:+1,1:+0],[5:+2,4:+3]]
Prelude Data.Complex Matrix> let b = matrix 2 3 [[3:+0,1],[2,1],[1,0]]
Prelude Data.Complex Matrix> a
Matrix {matrixN = 2, matrixM = 2, matrixElems = [[0.0 :+ 1.0,1.0 :+ 0.0],[5.0 :+ 2.0,4.0 :+ 3.0]]}
Prelude Data.Complex Matrix> b
Matrix {matrixN = 2, matrixM = 3, matrixElems = [[3.0 :+ 0.0,1.0 :+ 0.0],[2.0 :+ 0.0,1.0 :+ 0.0],[1.0 :+ 0.0,0.0 :+ 0.0]]}
Prelude Data.Complex Matrix> a*b
Matrix {matrixN = 2, matrixM = 3, matrixElems = [[2.0 :+ 3.0,1.0 :+ 1.0],[23.0 :+ 12.0,9.0 :+ 5.0]]}
EDIT: new material
Oh, you want to just override the (*) function without any Num stuff. That's possible to o but you'll have to remember that the Haskell standard library has reserved (*) for use in the Num class.
module Matrix where
import qualified Prelude as P
import Prelude hiding ((*))
import Data.List (transpose)
class Multiply a where
(*) :: a -> a -> a
data Matrix a = Matrix {matrixN :: Int,
matrixM :: Int,
matrixElems :: [[a]]}
deriving (Show, Eq)
matrix :: Int -> Int -> [[a]] -> Matrix a
matrix n m vals
| length vals /= m = error "Wrong number of rows"
| any (/=n) $ map length vals = error "Column length mismatch"
| otherwise = Matrix n m vals
matrixTranspose (Matrix m n vals) = matrix n m (transpose vals)
instance P.Num a => Multiply (Matrix a) where
(*) (Matrix m n vals) (Matrix n' p vals')
| n/=n' = error "Matrix dimension mismatch in multiplication"
| otherwise = let tvals' = transpose vals'
dot x y = sum $ zipWith (P.*) x y
result = map (\col -> map (dot col) tvals') vals
in Matrix m p result
a = matrix 3 2 [[1,2,3],[4,5,6]]
b = a * matrixTranspose
Testing in ghci:
*Matrix> b
Matrix {matrixN = 3, matrixM = 3, matrixElems = [[14,32],[32,77]]}
There. Now if a third module wants to use both the Matrix version of (*) and the Prelude version of (*) it'll have to of course import one or the other qualified. But that's just business as usual.
I could've done all of this without the Multiply type class but this implementation leaves our new shiny (*) open for extension in other modules.
Alright, there's a lot of confusion about what's happening here floating around, and it's not being helped by the fact that the Haskell term "class" does not line up with the OO term "class" in any meaningful way. So let's try to make a careful answer. This answer starts with Haskell's module system.
In Haskell, when you import a module Foo.Bar, it creates a new set of bindings. For each variable x exported by the module Foo.Bar, you get a new name Foo.Bar.x. In addition, you may:
import qualified or not. If you import qualified, nothing more happens. If you do not, an additional name without the module prefix is defined; in this case, just plain old x is defined.
change the qualification prefix or not. If you import as Alias, then the name Foo.Bar.x is not defined, but the name Alias.x is.
hide certain names. If you hide name foo, then neither the plain name foo nor any qualified name (like Foo.Bar.foo or Alias.foo) is defined.
Furthermore, names may be multiply defined. For example, if Foo.Bar and Baz.Quux both export the variable x, and I import both modules without qualification, then the name x refers to both Foo.Bar.x and Baz.Quux.x. If the name x is never used in the resulting module, this clash is ignored; otherwise, a compiler error asks you to provide more qualification.
Finally, if none of your imports mention the module Prelude, the following implicit import is added:
import Prelude
This imports the Prelude without qualification, with no additional prefix, and without hiding any names. So it defines "bare" names and names prefixed by Prelude., and nothing more.
Here ends the bare basics you need to understand about the module system. Now let's discuss the bare basics you need to understand about typeclasses.
A typeclass includes a class name, a list of type variables bound by that class, and a collection of variables with type signatures that refer to the bound variables. Here's an example:
class Foo a where
foo :: a -> a -> Int
The class name is Foo, the bound type variable is a, and there is only one variable in the collection, namely foo, with type signature a -> a -> Int. This class declares that some types have a binary operation, named foo, which computes an Int. Any type may later (even in another module) be declared to be an instance of this class: this involves defining the binary operation above, where the bound type variable a is substituted with the type you are creating an instance for. As an example, we might implement this for integers by the instance:
instance Foo Int where
foo a b = (a `mod` 76) * (b + 7)
Here ends the bare basics you need to understand about typeclasses. We may now answer your question. The only reason the question is tricky is because it falls smack dab on the intersection between two name management techniques: modules and typeclasses. Below I discuss what this means for your specific question.
The module Prelude defines a typeclass named Num, which includes in its collection of variables a variable named *. Therefore, we have several options for the name *:
If the type signature we desire happens to follow the pattern a -> a -> a, for some type a, then we may implement the Num typeclass. We therefore extend the Num class with a new instance; the name Prelude.* and any aliases for this name are extended to work for the new type. For matrices, this would look like, for example,
instance Num Matrix where
m * n = {- implementation goes here -}
We may define a different name than *.
m |*| n = {- implementation goes here -}
We may define the name *. Whether this name is defined as part of a new type class or not is immaterial. If we do nothing else, there will then be at least two definitions of *, namely, the one in the current module and the one implicitly imported from the Prelude. We have a variety of ways of dealing with this. The simplest is to explicitly import the Prelude, and ask for the name * not to be defined:
import Prelude hiding ((*))
You might alternately choose to leave the implicit import of Prelude, and use a qualified * everywhere you use it. Other solutions are also possible.
The main point I want you to take away from this is: the name * is in no way special. It is just a name defined by the Prelude, and all of the tools we have available for namespace control are available.
You can implement * as matrix multiplication by defining an instance of Num class for Matrix. But the code won't be type-safe: * (and other arithmetic operations) on matrices as you define them is not total, because of size mismatch or in case of '/' non-existence of inverse matrices.
As for 'the hierarchy is not defined precisely' - there is also Monoid type class, exactly for the cases when only one operation is defined.
There are too many things to be 'added', sometimes in rather exotic ways (think of permutation groups). Haskell designers designed to reserve arithmetical operations for different representations of numbers, and use other names for more exotic cases.