Scala equivalent of Haskell's do-notation (yet again)

I know that Haskell's
do
  x <- [1, 2, 3]
  y <- [7, 8, 9]
  let z = (x + y)
  return z
can be expressed in Scala as
for {
  x <- List(1, 2, 3)
  y <- List(7, 8, 9)
  z = x + y
} yield z
But, especially with monads, Haskell often has statements inside the do block that don't correspond to either <- or =. For example, here's some code from Pandoc that uses Parsec to parse something from a string.
-- | Parse contents of 'str' using 'parser' and return result.
parseFromString :: GenParser tok st a -> [tok] -> GenParser tok st a
parseFromString parser str = do
  oldPos <- getPosition
  oldInput <- getInput
  setInput str
  result <- parser
  setInput oldInput
  setPosition oldPos
  return result
As you can see, it saves the position and input, runs the parser on the string, and then restores the input and position before returning the result.
I can't for the life of me figure out how to translate setInput str, setInput oldInput, and setPosition oldPos into Scala. I think it would work if I just put nonsense variables in so I could use <-, like
for {
  oldPos <- getPosition
  oldInput <- getInput
  whyAmIHere <- setInput str
  result <- parser
  ...
} yield result
but I'm not sure that's the case and, if it is correct, I'm sure that there must be a better way to do this.
Oh, and if you can answer this question, can you answer one more: how long do I have to stare at Monads before they don't feel like black magic? :-)
Thanks!
Todd

Yes, that translation is valid.
do { x <- m; n } is equivalent to m >>= \x -> n, and do { m; n } is equivalent to m >> n. Since m >> n is defined as m >>= \_ -> n (where _ means "don't bind this value to anything"), that is indeed a valid translation; do { m; n } is the same as do { _ <- m; n }, or do { unusedVariable <- m; n }.
A statement without a variable binding in a do block simply disregards the result, usually because there's no meaningful result to speak of. For instance, there's nothing interesting to do with the result of putStrLn "Hello, world!", so you wouldn't bind its result to a variable.
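To make that concrete on the Scala side, here is a minimal sketch. It assumes a hypothetical hand-rolled State type rather than Parsec or any real parser library; getInput, setInput and takeAll are stand-ins invented purely for illustration. The point is only that a statement without a binding becomes `_ <- ...` in a for-comprehension.

```scala
// Hypothetical minimal state monad, NOT a real parser library.
final case class State[S, A](run: S => (S, A)) {
  def map[B](f: A => B): State[S, B] =
    State { s =>
      val (s1, a) = run(s)
      (s1, f(a))
    }
  def flatMap[B](f: A => State[S, B]): State[S, B] =
    State { s =>
      val (s1, a) = run(s)
      f(a).run(s1)
    }
}

object Demo {
  // Illustrative stand-ins for Parsec's getInput / setInput.
  def getInput: State[String, String] = State(s => (s, s))
  def setInput(in: String): State[String, Unit] = State(_ => (in, ()))
  // Toy "parser" that consumes and returns the whole remaining input.
  def takeAll: State[String, String] = State(s => ("", s))

  def parseFromString[A](parser: State[String, A], str: String): State[String, A] =
    for {
      oldInput <- getInput
      _        <- setInput(str)      // result discarded, like `setInput str` in Haskell
      result   <- parser
      _        <- setInput(oldInput) // restore the saved input
    } yield result
}
```

Running `Demo.parseFromString(Demo.takeAll, "abc").run("rest")` parses "abc" and leaves "rest" as the remaining input, which is the save/run/restore behaviour described in the question.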
(As for monads being black magic, the best realisation you can have is that they're not really complicated at all; trying to find deeper meaning in them is not generally a productive way of learning how they work. They're simply an interface to compose computations that happen to be particularly common. I recommend reading the Typeclassopedia to get a solid grasp on Haskell's abstract typeclasses, though you'll need to have read a general Haskell introduction to get much out of it.)

Related

Swift-like "guard-let" in Scala

In Swift, one can do something like:
guard let x = z else {return} // z is "String?"
In this simple case, if z (an optional) is empty, the function will exit.
I really like this structure and recently started developing in Scala, and I was wondering if there's an equivalent in Scala.
What I found is this:
val x = if (z.isEmpty) return else z.get // z is "Option[String]"
It works - but I still wonder if there's a more "Scala-ish" way of doing it.
EDIT: the use case is that a configuration argument is allowed to be null in the app, but this specific function has nothing to do in that case. I'm looking for the correct way to avoid calling code that will not work.
Scala is functional so using return to break out early is almost always a bad idea (partly because return doesn't work the way it might appear).
In this case there are a couple of choices.
Using map allows the value inside an Option to be processed if it is there:
val xOpt = z.map { x =>
// process x
}
xOpt is a new Option that contains the results of the processing, or None if z was already None.
The alternative is to use a default value if z is None and then use that.
val x = z.getOrElse(defaultX)
// process x
In this case x is a bare value not an Option.
Admittedly I'm not familiar with Swift, but since you're not returning a value, this is presumably in a function being called for side-effect.
In Scala, the natural equivalent for "perform this side-effect if and only if this Option is non-empty" is foreach:
z.foreach { x =>
  // code that uses x
}
There is syntactic sugar for this; the above is equivalent to
for (
  x <- z
) { // the absence of yield is significant here
  // code that uses x
}
If z is empty, the foreach does nothing, so the "code that uses x" would be everything in the function after the guard-let.
It might be that there's an implicit null being returned in the early exit case (it's hard to think of another value that would make sense) if this is a function being called for value. In that case, the natural equivalent would be map:
z.map { x =>
  // code that computes a value from x
} // passes through None if z is empty
The for sugar would be
for (
  x <- z
) yield {
  // code that uses x
}
As above, "code that uses x" means everything after the guard-let.
If you happened to have two nullable arguments like
guard let x = z else { return }
guard let w = y else { return }
You could nest these or have multiple <-'s in your for expression:
for {
  x <- z
  w <- y
} yield {
  // code that uses x and w
}
Alternatively (and my personal preference), you can use zip:
z.zip(y).foreach {
  case (x, w) =>
    // code that uses x and w
}
// or in the called-for-value case
z.zip(y).map {
  case (x, w) =>
    // code that uses x and w
}.headOption // before Scala 2.13, zip creates an Iterable, so bring it back to Option (since 2.13, Option.zip already returns an Option)
For comprehensions are similar:
def f(z: Option[String]) = for {
  x <- z // x is String
  result = x + " okay"
} yield result
This is useful when you have several short-circuiting operations, but the downside (compared to Swift) is you can't use other control structures such as if without some difficulty.
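One partial workaround for the missing `if` is a guard clause inside the for-comprehension. The sketch below is only an illustration (the function name and the greeting text are made up); it relies on Option supporting withFilter, so a failing guard simply turns the result into None.

```scala
object GuardDemo {
  // A guard short-circuits to None, much like an extra early exit would.
  def greet(z: Option[String]): Option[String] =
    for {
      x <- z
      if x.trim.nonEmpty      // guard: bail out with None on blank input
      trimmed = x.trim
    } yield s"hello, $trimmed"
}
```

This covers simple conditions; anything that needs a real else-branch is still easier to write with a pattern match or with fold.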
However, for in Scala can be used for more than just Option, e.g. to process lists:
def products(xs: Vector[Int], ys: Vector[Int]) = for {
  x <- xs
  y <- ys
} yield x * y // Result is a Vector of each x multiplied by each y
It is important to understand that Option is a monad. Being a monad means that it supports map and flatMap and also many other collection operations like foreach. Being a monad, and specifically supporting map, flatMap, and foreach also means that Option can be used as a generator in a for-comprehension.
The best way of thinking about Option is as a collection that can hold either 0 or 1 values. (In fact, Option does inherit from IterableOnce, and it implements several of the methods of Iterable, so you can not just think of Option as a collection, it simply is a collection.) A lot of the knowledge you have about collections transfers directly to Option, and Option supports the most important of the standard Scala collections operations, including the afore-mentioned monad operations.
So, what happens if you iterate over an empty collection? Well, nothing! What happens if you map over an empty collection? Well, you get back an empty collection!
So, the correct way of working with Options is actually not to try to get the value out of them, but instead to treat the Option as a collection.
So, instead of trying to get x out of the Option and then manipulating x, you can manipulate the value inside of the Option.
E.g. if you want to print it:
z.foreach(println)
// or
for (x <- z) println(x)
Iterating over an empty collection doesn't do anything, so this will either do nothing (if the Option is empty) or will print the value.
If you want to downcase it:
val x = z.map(_.toLowerCase())
// or
val x = for (x <- z) yield x.toLowerCase()
Note that in my case, x is an Option[String] (i.e. either a Some[String] or None) whereas in your case, x is a String, i.e. in my case, we are staying in the land of Options and don't leave.
Only at the very boundary of your system should you try and extract the value out of the Option, and the best way to do that is the method getOrElse, which does exactly what it sounds like: it gets the value out of the option if there is one, otherwise it returns the default value that you supplied.
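As a small sketch of that boundary idea (the names normalise and render and the "<no value>" default are assumptions made up for this example): all the processing stays inside the Option, and getOrElse is called exactly once, at the edge.

```scala
object Boundary {
  // Stay inside Option while transforming the value...
  def normalise(z: Option[String]): Option[String] =
    z.map(_.trim.toLowerCase)

  // ...and only unwrap at the edge of the system, supplying a default.
  def render(z: Option[String]): String =
    normalise(z).getOrElse("<no value>")
}
```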

What does >>= mean in purescript?

I was reading the purescript wiki and found the following section, which explains do in terms of >>=.
What does >>= mean?
Do notation
The do keyword introduces simple syntactic sugar for monadic expressions.
Here is an example, using the monad for the Maybe type:
maybeSum :: Maybe Number -> Maybe Number -> Maybe Number
maybeSum a b = do
  n <- a
  m <- b
  let result = n + m
  return result
maybeSum takes two values of type Maybe Number and returns their sum if neither number is Nothing.
When using do notation, there must be a corresponding instance of the Monad type class for the return type. Statements can have the following form:
a <- x which desugars to x >>= \a -> ...
x which desugars to x >>= \_ -> ... or just x if this is the last statement.
A let binding let a = x. Note the lack of the in keyword.
The example maybeSum desugars to:
maybeSum a b =
  a >>= \n ->
    b >>= \m ->
      let result = n + m
      in return result
>>= is a function, nothing more. It resides in the Prelude module and has type (>>=) :: forall m a b. (Bind m) => m a -> (a -> m b) -> m b, being an alias for the bind function of the Bind type class. You can find the definitions of the Prelude module in this link, found in the Pursuit package index.
This is closely related to the Monad type class in Haskell, for which resources are a bit easier to find. There's a famous question on SO about this concept, which is a good starting point if you're looking to improve your knowledge of the bind function (if you're just starting out with functional programming, you can skip it for a while).
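If it helps to see the same desugaring in another language, here is a rough Scala analogue of maybeSum. It is a sketch under the assumption that Option stands in for Maybe, flatMap for >>=, and Some for return; the names viaFlatMap and viaFor are made up for the example.

```scala
object MaybeSum {
  // Explicit "bind" chain: mirrors  a >>= \n -> b >>= \m -> ...
  def viaFlatMap(a: Option[Int], b: Option[Int]): Option[Int] =
    a.flatMap { n =>
      b.flatMap { m =>
        val result = n + m
        Some(result)
      }
    }

  // The sugared form: mirrors the do block.
  def viaFor(a: Option[Int], b: Option[Int]): Option[Int] =
    for {
      n <- a
      m <- b
      result = n + m
    } yield result
}
```

Both versions give the sum when both arguments are present and None as soon as either one is missing.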

For comprehension and number of function creation

Recently I had an interview for a Scala Developer position. I was asked the following question:
// matrix 100x100 (content unimportant)
val matrix = Seq.tabulate(100, 100) { case (x, y) => x + y }
// A
for {
  row <- matrix
  elem <- row
} print(elem)
// B
val func = print _
for {
  row <- matrix
  elem <- row
} func(elem)
and the question was: Which implementation, A or B, is more efficient?
We all know that for comprehensions can be translated to
// A
matrix.foreach(row => row.foreach(elem => print(elem)))
// B
matrix.foreach(row => row.foreach(func))
B can be written as matrix.foreach(row => row.foreach(print _))
The supposedly correct answer is B, because A will create the print function 100 times more often.
I have checked Language Specification but still fail to understand the answer. Can somebody explain this to me?
In short:
Example A is faster in theory; in practice, though, you shouldn't be able to measure any difference.
Long answer:
As you already found out
for {xs <- xxs; x <- xs} f(x)
is translated to
xxs.foreach(xs => xs.foreach(x => f(x)))
This is explained in §6.19 SLS:
A for loop
for ( p <- e; p' <- e' ... ) e''
where ... is a (possibly empty) sequence of generators, definitions, or guards, is translated to
e .foreach { case p => for ( p' <- e' ... ) e'' }
Now when one writes a function literal, one gets a new instance every time the literal is evaluated (§6.23 SLS). This means that
xs.foreach(x => f(x))
is equivalent to
xs.foreach(new scala.Function1 { def apply(x: T) = f(x)})
When you introduce a local function value
val g = f _; xxs.foreach(xs => xs.foreach(x => g(x)))
you are not introducing an optimization because you still pass a function literal to foreach. In fact the code is slower because the inner foreach is translated to
xs.foreach(new scala.Function1 { def apply(x: T) = g.apply(x) })
where an additional call to the apply method of g happens. However, you can optimize by writing
val g = f _; xxs.foreach(xs => xs.foreach(g))
because the inner foreach is now translated to
xs.foreach(g)
which means that the function g itself is passed to foreach.
This would mean that B is faster in theory, because no anonymous function needs to be created each time the body of the for comprehension is executed. However, the optimization mentioned above (passing the function directly to foreach) is not applied to for comprehensions, because, as the spec says, the translation includes the creation of function literals, so unnecessary function objects are always created. (The compiler could optimize that as well, but it doesn't, because optimizing for comprehensions is difficult and still does not happen in 2.11.) All in all, this means that A is more efficient, but B would be more efficient if it were written without a for comprehension (so that no function literal is created for the innermost function).
Nevertheless, all of these rules can only be applied in theory, because in practice there is the backend of scalac and the JVM itself which both can do optimizations - not to mention optimizations that are done by the CPU. Furthermore your example contains a syscall that is executed on every iteration - it is probably the most expensive operation here that outweighs everything else.
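For reference, the three shapes discussed above can be put side by side. This is only a sketch: it uses a small 3x3 matrix and a StringBuilder sink (both assumptions made for the example, so the output can be compared) and it does not measure allocations or timing; it just shows that the forms are interchangeable in behaviour.

```scala
object ForDesugar {
  val matrix: Seq[Seq[Int]] = Seq.tabulate(3, 3) { case (x, y) => x + y }

  // Shape A: for comprehension with an inline body.
  def viaFor(): String = {
    val sb = new StringBuilder
    for {
      row  <- matrix
      elem <- row
    } sb.append(elem)
    sb.toString
  }

  // What the spec translates shape A into: nested foreach with function literals.
  def viaForeach(): String = {
    val sb = new StringBuilder
    matrix.foreach(row => row.foreach(elem => sb.append(elem)))
    sb.toString
  }

  // The function value itself is handed to the inner foreach, no wrapper literal.
  def viaValue(): String = {
    val sb = new StringBuilder
    val sink: Int => Unit = elem => { sb.append(elem); () }
    matrix.foreach(row => row.foreach(sink))
    sb.toString
  }
}
```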
I'd agree with sschaef and say that A is the more efficient option.
Looking at the generated class files we get the following anonymous functions and their apply methods:
MethodA:
anonfun$2 -- row => row.foreach(new anonfun$2$$anonfun$1)
anonfun$2$$anonfun$1 -- elem => print(elem)
i.e. matrix.foreach(row => row.foreach(elem => print(elem)))
MethodB:
anonfun$3 -- x => print(x)
anonfun$4 -- row => row.foreach(new anonfun$4$$anonfun$2)
anonfun$4$$anonfun$2 -- elem => func(elem)
i.e. matrix.foreach(row => row.foreach(elem => func(elem)))
where func is just another indirection before calling to print. In addition func needs to be looked up, i.e. through a method call on an instance (this.func()) for each row.
So for Method B, one extra object is created (func) and there is one additional function call per element.
The most efficient option would be
matrix.foreach(row => row.foreach(func))
as this has the least number of objects created and does exactly as you would expect.
Benchmark
Summary
Method A is nearly 30% faster than method B.
Link to code: https://gist.github.com/ziggystar/490f693bc39d1396ef8d
Implementation Details
I added method C (two while loops) and D (fold, sum). I also increased the size of the matrix and used an IndexedSeq instead. Also I replaced the print with something less heavy (sum all entries).
Strangely the while construct is not the fastest. But if one uses Array instead of IndexedSeq it becomes the fastest by a large margin (factor 5, no boxing anymore). Using explicitly boxed integers, methods A, B, C are all equally fast. In particular they are faster by 50% compared to the implicitly boxed versions of A, B.
Results
A
4.907797735
4.369745787
4.375195012000001
4.7421321800000005
4.35150636
B
5.955951859000001
5.925475619
5.939570085000001
5.955592247
5.939672226000001
C
5.991946029
5.960122757000001
5.970733164
6.025532582
6.04999499
D
9.278486201
9.265983922
9.228320372
9.255641645
9.22281905
verify results
999000000
999000000
999000000
999000000
>$ scala -version
Scala code runner version 2.11.0 -- Copyright 2002-2013, LAMP/EPFL
Code excerpt
val matrix = IndexedSeq.tabulate(1000, 1000) { case (x, y) => x + y }
def variantA(): Int = {
  var r = 0
  for {
    row <- matrix
    elem <- row
  } {
    r += elem
  }
  r
}
def variantB(): Int = {
  var r = 0
  val f = (x: Int) => r += x
  for {
    row <- matrix
    elem <- row
  } f(elem)
  r
}
def variantC(): Int = {
  var r = 0
  var i1 = 0
  while (i1 < matrix.size) {
    var i2 = 0
    val row = matrix(i1)
    while (i2 < row.size) {
      r += row(i2)
      i2 += 1
    }
    i1 += 1
  }
  r
}
def variantD(): Int = matrix.foldLeft(0)(_ + _.sum)

How to concisely express function iteration?

Is there a concise, idiomatic way to express function iteration? That is, given a number n and a function f :: a -> a, I'd like to express \x -> f(...(f(x))...) where f is applied n times.
Of course, I could make my own, recursive function for that, but I'd be interested if there is a way to express it shortly using existing tools or libraries.
So far, I have these ideas:
\n f x -> foldr (const f) x [1..n]
\n -> appEndo . mconcat . replicate n . Endo
but they all use intermediate lists, and aren't very concise.
The shortest one I found so far uses semigroups:
\n -> appEndo . times1p (n - 1) . Endo
but it works only for positive numbers (not for 0).
Primarily I'm focused on solutions in Haskell, but I'd be also interested in Scala solutions or even other functional languages.
Because Haskell is influenced by mathematics so much, the definition from the Wikipedia page you've linked to almost directly translates to the language.
Just check this out:
f^0 = id, f^(n+1) = f ∘ f^n
Now in Haskell:
iterateF 0 _ = id
iterateF n f = f . iterateF (n - 1) f
Pretty neat, huh?
So what is this? It's a typical recursion pattern. And how do Haskellers usually treat that? We treat that with folds! So after refactoring we end up with the following translation:
iterateF :: Int -> (a -> a) -> (a -> a)
iterateF n f = foldr (.) id (replicate n f)
or point-free, if you prefer:
iterateF :: Int -> (a -> a) -> (a -> a)
iterateF n = foldr (.) id . replicate n
As you see, the subject function's arguments appear neither in the Wikipedia definition nor in the solutions presented here. It is a function on another function, i.e. the subject function is being treated as a value. This is a higher-level approach to the problem than an implementation that involves the arguments of the subject function.
Now, concerning your worries about the intermediate lists: from the source code perspective this solution turns out to be very similar to the Scala solution posted by @jmcejuela, but the key difference is that GHC's optimizer throws away the intermediate list entirely, turning the function into a simple recursive loop over the subject function. I don't think it could be optimized any better.
To comfortably inspect the intermediate compiler results for yourself, I recommend using ghc-core.
In Scala:
Function chain Seq.fill(n)(f)
See scaladoc for Function. Lazy version: Function chain Stream.fill(n)(f)
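Spelled out as a reusable function, that one-liner might look like the sketch below (the wrapper name iterateN and its argument order are my own choice for the example, not part of the standard library). Since chaining an empty sequence gives the identity, n = 0 works too.

```scala
object Chain {
  // Compose n copies of f; Function.chain folds the sequence left to right.
  def iterateN[T](n: Int, f: T => T): T => T =
    Function.chain(Seq.fill(n)(f))
}
```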
Although this is not as concise as jmcejuela's answer (which I prefer), there is another way in scala to express such a function without the Function module. It also works when n = 0.
def iterate[T](f: T=>T, n: Int) = (x: T) => (1 to n).foldLeft(x)((res, n) => f(res))
To avoid creating a list, one can use explicit recursion, which in return requires more static typing.
def iterate[T](f: T=>T, n: Int): T=>T = (x: T) => (if(n == 0) x else iterate(f, n-1)(f(x)))
There is an equivalent solution using pattern matching like the solution in Haskell:
def iterate[T](f: T=>T, n: Int): T=>T = (x: T) => n match {
case 0 => x
case _ => iterate(f, n-1)(f(x))
}
Finally, I prefer the short way of writing it in Caml, where there is no need to define the types of the variables at all.
let rec iterate f n x = match n with 0 -> x | n -> iterate f (n-1) (f x);;
let f5 = iterate f 5 in ...
I like pigworker's/tauli's ideas the best, but since they only gave them as comments, I'm making a CW answer out of them.
\n f x -> iterate f x !! n
or
\n f -> (!! n) . iterate f
perhaps even:
\n -> ((!! n) .) . iterate
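For the Scala side of the question, roughly the same idea can be sketched with Iterator.iterate, which lazily produces x, f(x), f(f(x)), ... so that dropping n elements and taking the next one plays the role of `!! n`. The wrapper name iterateN is an assumption for the example.

```scala
object Iterate {
  // Scala analogue of  \n f -> (!! n) . iterate f
  def iterateN[T](n: Int, f: T => T): T => T =
    x => Iterator.iterate(x)(f).drop(n).next()
}
```

No intermediate collection is built here, since the iterator only keeps the current value.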

how to approach implementing TCO'ed recursion

I have been looking into recursion and TCO. It seems that TCO can make the code verbose and can also affect performance. For example, I have implemented code that takes a 7-digit phone number and returns all the possible words it can spell, e.g. 464-7328 can be "GMGPDAS ... IMGREAT ... IOIRFCU". Here is the code.
/*Generate the alphabet table*/
val alphabet = (for (ch <- 'a' to 'z') yield ch.toString).toList
/*Given the number, return the possible alphabet List of String(Instead of Char for convenience)*/
def getChars(num : Int) : List[String] = {
  if (num > 1) return List[String](alphabet((num - 2) * 3), alphabet((num - 2) * 3 + 1), alphabet((num - 2) * 3 + 2))
  List[String](num.toString)
}
/*Recursion without TCO*/
def getTelWords(input : List[Int]) : List[String] = {
  if (input.length == 1) return getChars(input.head)
  getChars(input.head).foldLeft(List[String]()) {
    (l, ch) => getTelWords(input.tail).foldLeft(List[String]()) { (ll, x) => ch + x :: ll } ++ l
  }
}
It is short and I didn't have to spend too much time on it. However, when I tried to rewrite it with tail-call recursion so that it gets TCO'ed, I had to spend a considerable amount of time and the code became very verbose. I won't be posting the whole code, to save space. Here is a link to the git repo. I'm sure that quite a lot of you can write better and more concise tail-recursive code than mine. I still believe that in general TCO is more verbose (e.g. tail-recursive factorial and Fibonacci have an extra accumulator parameter). Yet TCO is needed to prevent stack overflow. I would like to know how you would approach TCO and recursion. The Scheme implementation of Ackermann with TCO in this thread epitomizes my problem statement.
Is it possible that you're using the term "tail call optimization", when in fact you really either mean writing a function in iterative recursive style, or continuation passing style, so that all the recursive calls are tail calls?
Implementing TCO is the job of a language implementer; one paper that talks about how it can be done efficiently is the classic Lambda: the Ultimate GOTO paper.
Tail call optimization is something that your language's evaluator will do for you. Your question, on the other hand, sounds like you are asking how to express functions in a particular style so that the program's shape allows your evaluator to perform tail call optimization.
As sclv mentioned in the comments, tail recursion is pointless for this example in Haskell. A simple implementation of your problem can be written succinctly and efficiently using the list monad.
import Data.Char
getChars n | n > 1     = [chr (ord 'a' + 3*(n-2)+i) | i <- [0..2]]
           | otherwise = ""
getTelNum = mapM getChars
As said by others, I would not be worried about tail calls in this case, as it does not recurse very deeply (the length of the input) compared to the size of the output. You should be out of memory (or patience) before you are out of stack.
I would probably implement it with something like
def getTelWords(input: List[Int]): List[String] = input match {
  case Nil => List("")
  case x :: xs => {
    val heads = getChars(x)
    val tails = getTelWords(xs)
    for(c <- heads; cs <- tails) yield c + cs
  }
}
If you insist on a tail recursive one, that might be based on
def helper(reversedPrefixes: List[String], input: List[Int]): List[String]
  = input match {
    case Nil => reversedPrefixes.map(_.reverse)
    case (x :: xs) => helper(
      for(c <- getChars(x); rp <- reversedPrefixes) yield c + rp,
      xs)
  }
(the actual routine should call helper(List(""), input))
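Put together, the tail-recursive variant could look roughly like the sketch below. It assumes the simplified three-letters-per-digit getChars from the question, and it adds @tailrec (my addition, not part of the snippet above) so the compiler rejects the helper if it ever stops being a tail call.

```scala
import scala.annotation.tailrec

object TelWords {
  private val alphabet: List[String] = ('a' to 'z').map(_.toString).toList

  // Same simplified mapping as in the question: three letters per digit above 1.
  def getChars(num: Int): List[String] =
    if (num > 1) List.tabulate(3)(i => alphabet((num - 2) * 3 + i))
    else List(num.toString)

  def getTelWords(input: List[Int]): List[String] = {
    @tailrec
    def helper(reversedPrefixes: List[String], rest: List[Int]): List[String] =
      rest match {
        case Nil => reversedPrefixes.map(_.reverse)
        case x :: xs =>
          // Prepend each candidate char to every reversed prefix built so far.
          helper(for (c <- getChars(x); rp <- reversedPrefixes) yield c + rp, xs)
      }
    helper(List(""), input)
  }
}
```

The accumulator holds the prefixes in reverse, which is why the final step reverses each string once.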