Resources on managing, phasing, composing monads (in Scala or Haskell) - scala

I am looking for resources discussing good practice for composing monads. My most pressing problem is that I'm writing a system that makes use of a series of state monads over different state types. It seems as if the best way to handle this situation is to create one big product type (perhaps prettied up as a record/class) encompassing all the components I'm interested in, even though phase 1 is not interested in component B and phase 2 is only interested in component A.1.
I'd appreciate pointers to nice discussions of alternatives when writing code in this sort of area. My own code-base is in Scala, but I'm happy to read discussions about the same issues in Haskell.

It's a bit challenging to use stacks of StateT because it becomes confusing which layer you're talking to when you write get or put. If you use an explicit stack in transformers style then you have to use a bunch of lifts, and if you use the class-based method of mtl you get stuck entirely.
-- using transformers explicit stack style
import Control.Monad (replicateM_)
import Control.Monad.State

type Z a = StateT Int (StateT String IO) a

go :: Z ()
go = do int <- get
        str <- lift get
        replicateM_ int (liftIO $ putStrLn str)
We might want to avoid that mess with an explicit product type of states. Since we end up with functions from our product state to each individual component, it's easy to get at the individual components using gets:
data ZState = ZState { int :: Int, str :: String }

type Z a = StateT ZState IO a

go :: Z ()
go = do i <- gets int
        s <- gets str
        replicateM_ i (liftIO $ putStrLn s)
But this might still be considered ugly, for two reasons: (1) put and modification in general don't have anywhere near as nice a story, and (2) we can't easily look at the type of a function which only impacts, say, the int state and know that it doesn't touch str. We'd much prefer to maintain that type-ensured modularity.
If you're lens-savvy there's a solution called zoom
-- the real type is MUCH more general
zoom :: Lens' mother child -> StateT child m a -> StateT mother m a
which "lifts" a stateful computation on a subpart of some larger state space up to the entire state space. Or, pragmatically, we use it like this:
{-# LANGUAGE TemplateHaskell #-}

import Control.Lens
import Control.Monad (replicateM_)
import Control.Monad.State
import Data.Char (toUpper)

data ZState = ZState { _int :: Int, _str :: String }
makeLenses ''ZState

type Z a = StateT ZState IO a

inc :: MonadState Int m => m ()
inc = modify (+1)

yell :: MonadState String m => m ()
yell = modify (map toUpper)

go :: Z ()
go = do zoom int $ do inc
                      inc
                      inc
        zoom str yell
        i <- use int
        s <- use str
        replicateM_ i (liftIO $ putStrLn s)
And now most of the problems should have vanished—we can zoom in to isolate stateful operations which only depend upon a subset of the total state like inc and yell and determine their isolation in their type. We also can still get inner state components using use.
More than this, zoom can be used to zoom in on state buried deep inside various transformer stacks. The fully general type works just fine in a situation like this:
type Z a = EitherT String (ListT (StateT ZState IO)) a
>>> :t zoom int :: EitherT String (ListT (StateT Int IO)) a -> Z a
zoom int :: EitherT String (ListT (StateT Int IO)) a -> Z a
But while this is really nice, the fully general zoom requires some heavy trickery and you can only zoom over certain transformer layers. (It's currently rather unclear to me how you add that functionality to your own layer, though presumably it's possible.)

Related

Set of functions that are instances in a common way

I'm pretty new to Haskell and I think I'm falling into some OO traps. Here's a sketch of a structure (simplified) that I'm having trouble implementing:
A concept of an Observable that acts on a list of samples (Int) to produce a result (Int)
A concept SimpleObservable that achieves the result using a certain pattern (while there will be Observables that do it other ways), e.g. something like an average
A function instance, e.g. one that's just an average times a constant
My first thought was to use a subclass; something like (the below is kinda contrived but hopefully gets the idea across)
class Observable a where
  estimate :: a -> [Int] -> Int

class (Observable a) => SimpleObservable a where
  compute :: a -> Int -> Int
  simpleEstimate :: a -> [Int] -> Int
  simpleEstimate obs list = sum $ map (compute obs) list

data AveConst = AveConst Int

instance Observable AveConst where
  estimate = simpleEstimate

instance SimpleObservable AveConst where
  compute (AveConst c) x = c * x
However, even if something like the above compiles, it's ugly. Googling tells me DefaultSignatures might help, in that I wouldn't have to write estimate = simpleEstimate for each instance, but from discussions around it, doing it this way seems to be an antipattern.
Another option would be to have no subclass, but something like (with the same Observable class):
data AveConst = AveConst Int
instance Observable AveConst where
estimate (AveConst c) list = sum $ map (*c) list
But this way I'm not sure how to reuse the pattern; each Observable has to contain the complete estimate definition and there will be code repetition.
A third way is a type with a function field:
data SimpleObservable = SimpleObservable {
  compute :: Int -> Int
}

instance Observable SimpleObservable where
  estimate obs list = sum $ map (compute obs) list

aveConst :: Int -> SimpleObservable
aveConst c = SimpleObservable {
  compute = (*c)
}
But I'm not sure this is idiomatic either. Any advice?
I propose going even simpler:
type Observable = [Int] -> Int
Then, an averaging observable is:
average :: Observable
average ns = sum ns `div` length ns
If your Observable needs some data inside -- say, a constant to multiply by -- no problem; that's what closures are for. For example:
sumTimesConst :: Int -> Observable
sumTimesConst c = sum . map (c*)
You can abstract over the construction of Observables without trouble; e.g. if you want a SimpleObservable which only looks at elements, and then sums, you can:
type SimpleObservable = Int -> Int
timesConst :: Int -> SimpleObservable
timesConst = (*)
liftSimple :: SimpleObservable -> Observable
liftSimple f = sum . map f
Then liftSimple . timesConst is another perfectly fine way to spell sumTimesConst.
...but honestly, I'd feel dirty doing any of the above things. sum . map (c*) is a perfectly readable expression without introducing a questionable new name for its type.
I do not fully understand the question yet but I'll edit this answer as I learn more.
Something which acts on a list and produces a result can simply be a function. The interface (that is, the type) of this function can be [a] -> b. This says the function accepts a list of elements of some type and returns a result of a possibly different type.
Now, let's invent a small problem as an example. I want to take a list of lists and some function on lists which produces a number, apply this function to every list, and return the average of the numbers.
import Data.List (genericLength)

average :: (Fractional b) => ([a] -> b) -> [[a]] -> b
average f xs = sum (fmap f xs) / genericLength xs
For example, average genericLength will tell me the average length of the sub-lists. I do not need to define any type classes or new types. Simply, I use the function type [a] -> b for those functions which map a list to some result.

Efficient implementation of Catamorphisms in Scala

For a datatype representing the natural numbers:
sealed trait Nat
case object Z extends Nat
case class S(pred: Nat) extends Nat
In Scala, here is an elementary way of implementing the corresponding catamorphism:
def cata[A](z: A)(l: Nat)(f: A => A): A = l match {
  case Z => z
  case S(xs) => f( cata(z)(xs)(f) )
}
However, since the recursive call to cata isn't in tail position, this can easily trigger a stack overflow.
What are alternative implementation options that will avoid this? I'd rather not go down the route of F-algebras unless the interface ultimately presented by the code can look pretty much as simple as the above.
EDIT: Looks like this might be directly relevant: Is it possible to use continuations to make foldRight tail recursive?
If you were implementing a catamorphism on lists, that would be what in Haskell we call a foldr. We know that foldr does not have a tail-recursive definition, but foldl does. So if you insist on a tail-recursive program, the right thing to do is reverse the list argument (tail-recursively, in linear time), then use a foldl in place of the foldr.
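In Scala terms, the same trick reads as follows (a sketch: foldRightViaLeft is a made-up name, and the list must be finite):
def foldRightViaLeft[A, B](xs: List[A])(z: B)(f: (A, B) => B): B =
  xs.reverse.foldLeft(z)((acc, a) => f(a, acc)) // reverse and foldLeft both run as loops, in constant stack space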
Your example uses the simpler data type of naturals (and a truly "efficient" implementation would use machine integers, but we'll agree to leave that aside). What is the reverse of one of your natural numbers? Just the number itself, because we can think of it as a list with no data in each node, so we can't tell the difference when it is reversed! And what's the equivalent of the foldl? It's the program (forgive the pseudocode)
def cata(z, a, f) = {
  var x = a, y = z;
  while (x != Z) {
    y = f(y);
    x = pred(x)
  }
  return y
}
Or as a Scala tail-recursion,
@annotation.tailrec
def cata[A](z: A)(a: Nat)(f: A => A): A = a match {
  case Z => z
  case S(b) => cata( f(z) )(b)(f)
}
Will that do?
Yes, this is exactly the motivating example in the paper Clowns to the left of me, jokers to the right
(Dissecting Data Structures) (updated, better, but non-free version here http://dl.acm.org/citation.cfm?id=1328474).
The basic idea is that you want to turn your recursive function into a loop, so you need to figure out a data structure that keeps track of the state of the procedure, which is:
What you've calculated so far
What you have left to do.
The type of this state depends on the structure of the type you're doing the fold over, at any point in the fold you are at some node of the tree and you need to remember the tree structure of "the rest of the tree".
The paper shows how you can calculate that state type mechanically. If you do this for Lists, you get that the state you need to keep track of is
The operation run on all the previous values.
The list of elements left to process.
Which is exactly what foldl keeps track of, so it's kind of a coincidence that foldl and foldr can be given the same type.

Is Scala idiomatic coding style just a cool trap for writing inefficient code?

I sense that the Scala community has a bit of an obsession with writing "concise", "cool", "scala idiomatic", "one-liner" -if possible- code. This is immediately followed by a comparison to Java/imperative/ugly code.
While this (sometimes) leads to easy to understand code, it also leads to inefficient code for 99% of developers. And this is where Java/C++ is not easy to beat.
Consider this simple problem: Given a list of integers, remove the greatest element. Ordering does not need to be preserved.
Here is my version of the solution (It may not be the greatest, but it's what the average non-rockstar developer would do).
def removeMaxCool(xs: List[Int]) = {
val maxIndex = xs.indexOf(xs.max);
xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.
Here is my totally uncool, Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.
import scala.collection.mutable.ArrayBuffer

def removeMaxFast(xs: List[Int]) = {
  var res = ArrayBuffer[Int]()
  var max = xs.head
  var first = true
  for (x <- xs) {
    if (first) {
      first = false
    } else {
      if (x > max) {
        res.append(max)
        max = x
      } else {
        res.append(x)
      }
    }
  }
  res.toList
}
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge
obstacle to cross for greater Scala adoption. Is there a way out of this trap?
I am looking for practical advice to avoid such "inefficiency traps" while keeping the implementation clear and concise.
Clarification: This question comes from a real-life scenario: I had to write a complex algorithm. First I wrote it in Scala, then I "had to" rewrite it in Java. The Java implementation was twice as long, and not that clear, but at the same time it was twice as fast. Rewriting the Scala code to be efficient would probably take some time and a somewhat deeper understanding of scala internal efficiencies (for vs. map vs. fold, etc)
Let's discuss a fallacy in the question:
So, if 99% of Java developers write more efficient code than 99% of
Scala developers, this is a huge obstacle to cross for greater Scala
adoption. Is there a way out of this trap?
This is presumed, with absolutely no evidence backing it up. If false, the question is moot.
Is there evidence to the contrary? Well, let's consider the question itself -- it doesn't prove anything, but shows things are not that clear.
Totally non-Scala idiomatic, non-functional, non-concise, but it's
very efficient. It traverses the list only once!
Of the four claims in the first sentence, the first three are true, and the fourth, as shown by user unknown, is false! And why is it false? Because, contrary to what the second sentence states, it traverses the list more than once.
The code calls the following methods on it:
res.append(max)
res.append(x)
and
res.toList
Let's consider first append.
append takes a vararg parameter. That means max and x are first encapsulated into a sequence of some type (a WrappedArray, in fact), and then passed as parameter. A better method would have been +=.
Ok, append calls ++=, which delegates to +=. But, first, it calls ensureSize, which is the second mistake (+= calls that too -- ++= just optimizes it for multiple elements). This is because an Array is a fixed-size collection, which means that, at each resize, the whole Array must be copied!
So let's consider this. When you resize, Java first clears the memory by storing 0 in each element, then Scala copies each element of the previous array over to the new array. Since size doubles each time, this happens log(n) times, with the number of elements being copied increasing each time it happens.
Take for example n = 16. It does this four times, copying 1, 2, 4 and 8 elements respectively. Since Java has to clear each of these arrays, and each element must be read and written, each element copied represents 4 traversals of an element. Adding it all up, we have (n - 1) * 4, or, roughly, 4 traversals of the complete list. If you count read and write as a single pass, as people often erroneously do, then it's still three traversals.
One can improve on this by initializing the ArrayBuffer with an initial size equal to the list that will be read, minus one, since we'll be discarding one element. To get this size, we need to traverse the list once, though.
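That tweak might look like this (a sketch: removeMaxSized is a made-up name, and xs.length is itself the extra traversal just mentioned):
import scala.collection.mutable.ArrayBuffer

def removeMaxSized(xs: List[Int]): List[Int] = {
  val res = new ArrayBuffer[Int](xs.length - 1) // pre-sized, so no resize copies
  var max = xs.head
  for (x <- xs.tail) {
    if (x > max) { res.append(max); max = x } // append the old maximum, keep the new one
    else res.append(x)
  }
  res.toList
}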
Now let's consider toList. To put it simply, it traverses the whole list to create a new list.
So, we have 1 traversal for the algorithm, 3 or 4 traversals for resize, and 1 additional traversal for toList. That's 5 or 6 traversals in total.
The original algorithm is a bit difficult to analyse, because take, drop and ::: traverse a variable number of elements. Adding all together, however, it does the equivalent of 3 traversals. If splitAt was used, it would be reduced to 2 traversals. With 2 more traversals to get the maximum, we get 5 traversals -- the same number as the non-functional, non-concise algorithm!
So, let's consider improvements.
On the imperative algorithm, if one uses ListBuffer and +=, then all methods are constant-time, which reduces it to a single traversal.
On the functional algorithm, it could be rewritten as:
val max = xs.max
val (before, _ :: after) = xs span (max !=)
before ::: after
That reduces it to a worst case of three traversals. Of course, there are other alternatives presented, based on recursion or fold, that solve it in one traversal.
And, most interesting of all, all of these algorithms are O(n), and the only one which almost (accidentally) incurred worse complexity was the imperative one (because of array copying). On the other hand, the cache characteristics of the imperative one might well make it faster, because the data is contiguous in memory. That, however, is unrelated to either big-Oh or functional vs imperative; it is just a matter of the data structures that were chosen.
So, if we actually go to the trouble of benchmarking, analyzing the results, considering performance of methods, and looking into ways of optimizing it, then we can find faster ways to do this in an imperative manner than in a functional manner.
But all this effort is very different from saying the average Java programmer code will be faster than the average Scala programmer code -- if the question is an example, that is simply false. And even discounting the question, we have seen no evidence that the fundamental premise of the question is true.
EDIT
First, let me restate my point, because it seems I wasn't clear. My point is that the code the average Java programmer writes may seem to be more efficient, but actually isn't. Or, put another way, traditional Java style doesn't gain you performance -- only hard work does, be it Java or Scala.
Next, I have a benchmark and results too, including almost all solutions suggested. Two interesting points about it:
Depending on list size, the creation of objects can have a bigger impact than multiple traversals of the list. The original functional code by Adrian takes advantage of the fact that lists are persistent data structures by not copying the elements right of the maximum element at all. If a Vector was used instead, both left and right sides would be mostly unchanged, which might lead to even better performance.
Even though user unknown and paradigmatic have similar recursive solutions, paradigmatic's is way faster. The reason for that is that he avoids pattern matching. Pattern matching can be really slow.
The benchmark code is here, and the results are here.
def removeOneMax (xs: List [Int]) : List [Int] = xs match {
  case x :: Nil => Nil
  case a :: b :: xs => if (a < b) a :: removeOneMax (b :: xs) else b :: removeOneMax (a :: xs)
  case Nil => Nil
}
Here is a recursive method, which only iterates once. If you need performance, you have to think about it, if not, not.
You can make it tail-recursive in the standard way: give an extra parameter carry, which defaults to the empty List and collects the result while iterating. That is, of course, a bit longer, but if you need performance, you have to pay for it:
import annotation.tailrec

@tailrec
def removeOneMax (xs: List [Int], carry: List [Int] = List.empty) : List [Int] = xs match {
  case a :: b :: xs => if (a < b) removeOneMax (b :: xs, a :: carry) else removeOneMax (a :: xs, b :: carry)
  case x :: Nil => carry
  case Nil => Nil
}
I don't know what the chances are that later compilers will optimize slower map calls to be as fast as while loops. However: you rarely need high-speed solutions, but if you need them often, you will learn them fast.
Do you know how big your collection has to be to take a whole second with your solution on your machine?
As a one-liner, similar to Daniel C. Sobral's solution:
((Nil : List[Int], xs(0)) /: xs.tail) ((p, x)=> if (p._2 > x) (x :: p._1, p._2) else ((p._2 :: p._1), x))._1
but that is hard to read, and I didn't measure the effective performance. The normal pattern is (x /: xs) ((a, b) => /* something */). Here, x and a are pairs of List-so-far and max-so-far, which solves the problem of bringing everything into one line of code, but isn't very readable. However, you can earn reputation on CodeGolf this way, and maybe someone likes to make a performance measurement.
And now to our big surprise, some measurements:
An updated timing method (to get the garbage collection out of the way and let the HotSpot compiler warm up), a main, and many methods from this thread, together in an object named
object PerfRemMax {

  def timed (name: String, xs: List [Int]) (f: List [Int] => List [Int]) = {
    val a = System.currentTimeMillis
    val res = f (xs)
    val z = System.currentTimeMillis
    val delta = z - a
    println (name + ": " + (delta / 1000.0))
    res
  }

  def main (args: Array [String]) : Unit = {
    val n = args(0).toInt
    val funs : List [(String, List[Int] => List[Int])] = List (
      "indexOf/take-drop" -> adrian1 _,
      "arraybuf" -> adrian2 _, /* out of memory */
      "paradigmatic1" -> pm1 _,
      "paradigmatic2" -> pm2 _,
      // "match" -> uu1 _, /* oom */
      "tailrec match" -> uu2 _,
      "foldLeft" -> uu3 _,
      "buf-=buf.max" -> soc1 _,
      "for/yield" -> soc2 _,
      "splitAt" -> daniel1,
      "ListBuffer" -> daniel2
    )
    val r = util.Random
    val xs = (for (x <- 1 to n) yield r.nextInt (n)).toList
    // With 1 Mio. as param, it starts with 100 000, 200k, 300k, ... 1 Mio. cases:
    // a) warmup
    // b) look where the process gets linear to size
    funs.foreach (f => {
      (1 to 10) foreach (i => {
        timed (f._1, xs.take (n/10 * i)) (f._2)
        compat.Platform.collectGarbage
      })
      println ()
    })
  }
}
I renamed all the methods, and had to modify uu2 a bit to fit the common method declaration (List [Int] => List [Int]).
From the long result, I only provide the output for 1M invocations:
scala -Dserver PerfRemMax 2000000
indexOf/take-drop: 0.882
arraybuf: 1.681
paradigmatic1: 0.55
paradigmatic2: 1.13
tailrec match: 0.812
foldLeft: 1.054
buf-=buf.max: 1.185
for/yield: 0.725
splitAt: 1.127
ListBuffer: 0.61
The numbers aren't completely stable, depending on the sample size, and vary a bit from run to run. For example, for 100k to 1M runs, in steps of 100k, the timing for splitAt was as follows:
splitAt: 0.109
splitAt: 0.118
splitAt: 0.129
splitAt: 0.139
splitAt: 0.157
splitAt: 0.166
splitAt: 0.749
splitAt: 0.752
splitAt: 1.444
splitAt: 1.127
The initial solution is already pretty fast. splitAt is a modification from Daniel, often faster, but not always.
The measurement was done on a single-core 2 GHz Centrino, running xUbuntu Linux, Scala 2.8 with Sun Java 1.6 (desktop).
The two lessons for me are:
always measure your performance improvements; it is very hard to estimate them if you don't do it on a daily basis
it is not only fun to write functional code -- sometimes the result is even faster
Here is a link to my benchmark code, if somebody is interested.
First of all, the behavior of the methods you presented is not the same. The first one keeps the element ordering, while the second one doesn't.
Second, among all the possible solutions which could be qualified as "idiomatic", some are more efficient than others. Staying very close to your example, you can for instance use tail recursion to eliminate variables and manual state management:
def removeMax1( xs: List[Int] ) = {
  def rec( max: Int, rest: List[Int], result: List[Int]): List[Int] = {
    if( rest.isEmpty ) result
    else if( rest.head > max ) rec( rest.head, rest.tail, max :: result)
    else rec( max, rest.tail, rest.head :: result )
  }
  rec( xs.head, xs.tail, List() )
}
or fold the list:
def removeMax2( xs: List[Int] ) = {
  val result = xs.tail.foldLeft( xs.head -> List[Int]() ) { (acc, x) =>
    val (max, res) = acc
    if( x > max ) x -> ( max :: res )
    else max -> ( x :: res )
  }
  result._2
}
If you want to keep the original insertion order, you can (at the expense of having two passes, rather than one) without any effort write something like:
def removeMax3( xs: List[Int] ) = {
  val max = xs.max
  xs.filterNot( _ == max )
}
which is more clear than your first example.
The biggest inefficiency when you're writing a program is worrying about the wrong things, and this kind of micro-performance is usually the wrong thing to worry about. Why?
Developer time is generally much more expensive than CPU time — in fact, there is usually a dearth of the former and a surplus of the latter.
Most code does not need to be very efficient because it will never be running on million-item datasets multiple times every second.
Most code does need to be bug-free, and less code means less room for bugs to hide.
The example you gave is not very functional, actually. Here's what you are doing:
// Given a list of Int
def removeMaxCool(xs: List[Int]): List[Int] = {
  // Find the index of the biggest Int
  val maxIndex = xs.indexOf(xs.max);
  // Then take the ints before and after it, and concatenate them
  xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
Mind you, it is not bad, but functional code is at its best when it describes what you want, instead of how you want it done. As a minor criticism, if you used splitAt instead of take and drop you could improve it slightly, as sketched below.
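For instance, the splitAt version might look like this (a sketch; removeMaxSplit is a made-up name):
def removeMaxSplit(xs: List[Int]): List[Int] = {
  val (before, rest) = xs.splitAt(xs.indexOf(xs.max))
  before ::: rest.tail // rest starts with the maximum itself
}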
Another way of doing it is this:
def removeMaxCool(xs: List[Int]): List[Int] = {
  // the result is the folding of the tail over the head
  // and an empty list
  xs.tail.foldLeft(xs.head -> List[Int]()) {
    // Where the accumulated list is increased by the
    // lesser of the current element and the accumulated
    // element, and the accumulated element is the maximum between them
    case ((max, ys), x) =>
      if (x > max) (x, max :: ys)
      else (max, x :: ys)
  // and of which we return only the accumulated list
  }._2
}
Now, let's discuss the main issue. Is this code slower than the Java one? Most certainly! Is the Java code slower than a C equivalent? You can bet it is, JIT or no JIT. And if you write it directly in assembler, you can make it even faster!
But the cost of that speed is that you get more bugs, you spend more time trying to understand the code to debug it, and you have less visibility of what the overall program is doing as opposed to what a little piece of code is doing -- which might result in performance problems of its own.
So my answer is simple: if you think the speed penalty of programming in Scala is not worth the gains it brings, you should program in assembler. If you think I'm being radical, then I counter that you just chose the familiar as being the "ideal" trade off.
Do I think performance doesn't matter? Not at all! I think one of the main advantages of Scala is leveraging gains often found in dynamically typed languages with the performance of a statically typed language! Performance matters, algorithm complexity matters a lot, and constant costs matters too.
But, whenever there is a choice between performance and readability and maintainability, the latter is preferable. Sure, if performance must be improved, then there isn't a choice: you have to sacrifice something for it. And if there's no loss in readability/maintainability -- such as Scala vs dynamically typed languages -- sure, go for performance.
Lastly, to gain performance out of functional programming you have to know functional algorithms and data structures. Sure, 99% of Java programmers with 5-10 years experience will beat the performance of 99% of Scala programmers with 6 months experience. The same was true for imperative programming vs object oriented programming a couple of decades ago, and history shows it didn't matter.
EDIT
As a side note, your "fast" algorithm suffers from a serious problem: you use ArrayBuffer. That collection does not have constant-time append, and has linear-time toList. If you use ListBuffer instead, you get constant-time append and toList, as sketched below.
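A sketch of that change (removeMaxBuffer is a made-up name):
import scala.collection.mutable.ListBuffer

def removeMaxBuffer(xs: List[Int]): List[Int] = {
  val res = ListBuffer[Int]()
  var max = xs.head
  for (x <- xs.tail) {
    if (x > max) { res += max; max = x } // append the old maximum, keep the new one
    else res += x
  }
  res.toList // constant time on ListBuffer
}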
For reference, here's how splitAt is defined in TraversableLike in the Scala standard library,
def splitAt(n: Int): (Repr, Repr) = {
  val l, r = newBuilder
  l.sizeHintBounded(n, this)
  if (n >= 0) r.sizeHint(this, -n)
  var i = 0
  for (x <- this) {
    (if (i < n) l else r) += x
    i += 1
  }
  (l.result, r.result)
}
It's not unlike your example code of what a Java programmer might come up with.
I like Scala because, where performance matters, mutability is a reasonable way to go. The collections library is a great example; especially how it hides this mutability behind a functional interface.
Where performance isn't as important, such as some application code, the higher order functions in Scala's library allow great expressivity and programmer efficiency.
Out of curiosity, I picked an arbitrary large file in the Scala compiler (scala.tools.nsc.typechecker.Typers.scala) and counted something like 37 for loops, 11 while loops, 6 concatenations (++), and 1 fold (it happens to be a foldRight).
What about this?
def removeMax(xs: List[Int]) = {
  val buf = xs.toBuffer
  buf -= (buf.max)
}
A bit uglier, but faster:
def removeMax(xs: List[Int]) = {
  var max = xs.head
  for ( x <- xs.tail )
  yield {
    if (x > max) { val result = max; max = x; result }
    else x
  }
}
Try this:
myList.foldLeft((List[Int](), None: Option[Int])) {
  case ((_, None), x)      => (List(), Some(x))
  case ((Nil, Some(m)), x) => (List(Math.min(x, m)), Some(Math.max(x, m)))
  case ((l, Some(m)), x)   => (Math.min(x, m) :: l, Some(Math.max(x, m)))
}._1
Idiomatic, functional, traverses only once. Maybe somewhat cryptic if you are not used to functional-programming idioms.
Let's try to explain what is happening here. I will try to make it as simple as possible, lacking some rigor.
A fold is an operation on a List[A] (that is, a list that contains elements of type A) that takes an initial state s0: S (that is, an instance of a type S) and a function f: (S, A) => S (that is, a function that takes the current state and an element from the list, and gives the next state, i.e., it updates the state according to the next element).
The operation will then iterate over the elements of the list, using each one to update the state according to the given function. In Java, it would be something like:
interface Function<T, R> { R apply(T t); }
class Pair<A, B> { ... }

<A, State> State fold(List<A> list, State s0, Function<Pair<A, State>, State> f) {
  State s = s0;
  for (A a : list) {
    s = f.apply(new Pair<A, State>(a, s));
  }
  return s;
}
For example, if you want to add all the elements of a List[Int], the state would be the partial sum, that would have to be initialized to 0, and the new state produced by a function would simply add the current state to the current element being processed:
myList.fold(0)((partialSum, element) => partialSum + element)
Try to write a fold to multiply the elements of a list, then another one to find extreme values (max, min).
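Possible answers to those exercises might look like this (a sketch; myList is just any List[Int], defined here so the snippet stands alone):
val myList = List(3, 1, 4, 1, 5)
val product = myList.foldLeft(1)(_ * _)                  // state = running product
val maximum = myList.tail.foldLeft(myList.head)(_ max _) // state = maximum so far
val minimum = myList.tail.foldLeft(myList.head)(_ min _) // state = minimum so far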
Now, the fold presented above is a bit more complex, since the state is composed of the new list being created along with the maximum element found so far. The function that updates the state is more or less straightforward once you grasp these concepts. It simply puts into the new list the minimum between the current maximum and the current element, while the other value goes to the current maximum of the updated state.
What is a bit more complex than to understand this (if you have no FP background) is to come up with this solution. However, this is only to show you that it exists, can be done. It's just a completely different mindset.
EDIT: As you see, the first and second cases in the solution I proposed are used to set up the fold. It is equivalent to what you see in other answers when they do xs.tail.fold((xs.head, ...)) {...}. Note that the solutions proposed so far using xs.tail/xs.head don't cover the case in which xs is List(), and will throw an exception. The solution above will return List() instead. Since you didn't specify the behavior of the function on empty lists, both are valid.
Another option would be:
package code.array

object SliceArrays {
  def main(args: Array[String]): Unit = {
    println(removeMaxCool(Vector(1,2,3,100,12,23,44)))
  }
  def removeMaxCool(xs: Vector[Int]) = xs.filter(_ < xs.max)
}
I'm using Vector instead of List; the reason is that Vector is more versatile and has better general performance and time complexity compared to List.
Consider the following collections operations:
head, tail, apply, update, prepend, append
Vector takes an amortized constant time for all operations, as per Scala docs:
"The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys"
While List takes constant time only for head, tail and prepend operations.
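A rough illustration of the difference (a sketch):
val v = Vector(1, 2, 3, 100, 12, 23, 44)
val l = v.toList

v(3)     // apply: effectively constant time
l(3)     // apply: linear, walks three cons cells first
v :+ 99  // append: effectively constant time
l :+ 99  // append: linear, rebuilds the whole list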
Using
scalac -print
generates:
package code.array {
  object SliceArrays extends Object {
    def main(args: Array[String]): Unit = scala.Predef.println(SliceArrays.this.removeMaxCool(scala.`package`.Vector().apply(scala.Predef.wrapIntArray(Array[Int]{1, 2, 3, 100, 12, 23, 44})).$asInstanceOf[scala.collection.immutable.Vector]()));
    def removeMaxCool(xs: scala.collection.immutable.Vector): scala.collection.immutable.Vector = xs.filter({
      ((x$1: Int) => SliceArrays.this.$anonfun$removeMaxCool$1(xs, x$1))
    }).$asInstanceOf[scala.collection.immutable.Vector]();
    final <artifact> private[this] def $anonfun$removeMaxCool$1(xs$1: scala.collection.immutable.Vector, x$1: Int): Boolean = x$1.<(scala.Int.unbox(xs$1.max(scala.math.Ordering$Int)));
    def <init>(): code.array.SliceArrays.type = {
      SliceArrays.super.<init>();
      ()
    }
  }
}
Another contender. This uses a ListBuffer, like Daniel's second offering, but shares the post-max tail of the original list, avoiding copying it.
import scala.collection.mutable.ListBuffer

def shareTail(xs: List[Int]): List[Int] = {
  var res = ListBuffer[Int]()
  var maxTail = xs
  var x = xs
  while (x != Nil) {
    if (x.head > maxTail.head) {
      while (!(maxTail.head == x.head)) {
        res += maxTail.head
        maxTail = maxTail.tail
      }
    }
    x = x.tail
  }
  res.prependToList(maxTail.tail)
}

Defining a function a -> String, which works for types without Show?

I'd like to define a function which can "show" values of any type, with special behavior for types which actually do define a Show instance:
magicShowCast :: ?

debugShow :: a -> String
debugShow x = case magicShowCast x of
    Just x' -> show x'
    Nothing -> "<unprintable>"
This would be used to add more detailed information to error messages when something goes wrong:
-- needs to work on non-Showable types
assertEq :: Eq a => a -> a -> IO ()
assertEq x y = when (x /= y)
    (throwIO (AssertionFailed (debugShow x) (debugShow y)))
data CanShow = CanShow1
             | CanShow2
             deriving (Eq, Show)

data NoShow = NoShow1
            | NoShow2
            deriving (Eq)
-- AssertionFailed "CanShow1" "CanShow2"
assertEq CanShow1 CanShow2
-- AssertionFailed "<unprintable>" "<unprintable>"
assertEq NoShow1 NoShow2
Is there any way to do this? I tried using various combinations of GADTs, existential types, and template haskell, but either these aren't enough or I can't figure out how to apply them properly.
The real answer: You can't. Haskell intentionally doesn't define a generic "serialize to string" function, and being able to do so without some type class constraint would violate parametricity all over town. Dreadful, just dreadful.
If you don't see why this poses a problem, consider the following type signature:
something :: (a, a) -> b -> a
How would you implement this function? The generic type means it has to be either const . fst or const . snd, right? Hmm.
something (x,y) z = if debugShow z == debugShow y then y else x
> something ('a', 'b') ()
'a'
> something ('a', 'b') 'b'
'b'
Oooooooops! So much for being able to reason about your program in any sane way. That's it, show's over, go home, it was fun while it lasted.
The terrible, no good, unwise answer: Sure, if you don't mind shamelessly cheating. Did I mention that example above was an actual GHCi session? Ha, ha.
import Control.Exception
import Control.Monad
import GHC.Vacuum

debugShow :: a -> String
debugShow = show . nameGraph . vacuumLazy

assertEq :: Eq a => a -> a -> IO ()
assertEq x y = when (x /= y) . throwIO . AssertionFailed $
    unlines ["assertEq failed:", '\t':debugShow x, "=/=", '\t':debugShow y]

data NoShow = NoShow1
            | NoShow2
            deriving (Eq)
> assertEq NoShow1 NoShow2
*** Exception: assertEq failed:
[("|0",["NoShow1|1"]),("NoShow1|1",[])]
=/=
[("|0",["NoShow2|1"]),("NoShow2|1",[])]
Oh. Ok. That looks like a fantastic idea, doesn't it.
Anyway, this doesn't quite give you what you want, since there's no obvious way to fall back to a proper Show instance when available. On the other hand, this lets you do a lot more than show can do, so perhaps it's a wash.
Seriously, though. Don't do this in actual, working code. Ok, error reporting, debugging, logging... that makes sense. But otherwise, it's probably very ill-advised.
I asked this question a while ago on the haskell-cafe list, and the experts said no. Here are some good responses,
http://www.haskell.org/pipermail/haskell-cafe/2011-May/091744.html
http://www.haskell.org/pipermail/haskell-cafe/2011-May/091746.html
The second one mentions GHC advanced overlap, but my experience was that it doesn't really work.
For your particular problem, I'd introduce a typeclass
class MaybeShow a where mshow :: a -> String
make anything that is showable do the logical thing
instance Show a => MaybeShow a where mshow = show
and then, if you have a fixed number of types which wouldn't be showable, say
instance MaybeShow NotShowableA where mshow _ = "<unprintable>"
of course you could abstract it a little,
class NotShowable a
instance NotShowable a => MaybeShow a where mshow _ = "<unprintable>"
instance NotShowable NotShowableA -- etc.
You shouldn't be able to. The simplest way to implement type classes is by having them compile into an extra parameter:
foo :: Show s => a -> s
turns into something like
foo :: ShowDict s -> a -> s
where ShowDict s is a hypothetical ordinary value holding the show implementation. The program just passes around the type class instances (like v-tables in C++) as ordinary data. This is why you can trivially use things that look not just like multiple dispatch in OO languages, but can even dispatch off the return type.
The problem is that a signature
foo :: a -> String
has no way of getting the implementation of Show that goes with a, even in cases where one exists.
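The same mechanism can be sketched in Scala (the language of the original poster's code-base), where the dictionary really is a plain value; the names below are made up for illustration:
// A type class as an ordinary value: a "dictionary" of functions.
trait Show[A] { def show(a: A): String }

// "Show a => a -> String" compiles to a function taking an extra
// dictionary argument; implicits just pass it along for you:
def showIt[A](a: A)(implicit s: Show[A]): String = s.show(a)

// A bare "a -> String" has no dictionary parameter, so there is
// nowhere to obtain a Show implementation from:
def debugShow[A](a: A): String = "<unprintable>"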
You might be able to get something like this to work in particular implementations, with the correct language extensions (overlapping instances, etc.) enabled, but I haven't tried it:
class MyShow a where
  myShow :: a -> String

instance (Show a) => MyShow a where
  myShow = show

instance MyShow a where
  myShow = ...
One trick that might help is to enable type families. It can let you write code like
instance (a' ~ a, Show a') => MyShow a
which can sometimes help you get code past the compiler that it doesn't think looks okay.

How to get the value of a Maybe in Haskell

I'm relatively new to Haskell and began to read "Real World Haskell".
I just stumbled over the type Maybe and have a question about how to get the actual value out of, for example, a Just 1.
I have written the following code:
combine a b c = (eliminate a, eliminate b, eliminate c)
  where eliminate (Just a) = a
        eliminate Nothing  = 0
This works fine if I use:
combine (Just 1) Nothing (Just 2)
But if I change, for example, 1 to a String it doesn't work.
I think I know why: eliminate has to give back one type, which is, in this case, an Int. But how can I change eliminate to deal at least with Strings (or maybe with all kinds of types)?
From the standard Prelude,
maybe :: b -> (a -> b) -> Maybe a -> b
maybe n _ Nothing = n
maybe _ f (Just x) = f x
Given a default value, and a function, apply the function to the value in the Maybe or return the default value.
Your eliminate could be written maybe 0 id, e.g. apply the identity function, or return 0.
From the standard Data.Maybe,
fromJust :: Maybe a -> a
fromJust Nothing = error "Maybe.fromJust: Nothing"
fromJust (Just x) = x
This is a partial function (does not return a value for every input, as opposed to a total function, which does), but extracts the value when possible.
[edit from Author, 6 years later] This is a needlessly long answer, and I'm not sure why it was accepted. Use maybe or Data.Maybe.fromMaybe as suggested in the highest upvoted answer. What follows is more of a thought experiment rather than practical advice.
So you're trying to create a function that works for a bunch of different types. This is a good time to make a class. If you've programmed in Java or C++, a class in Haskell is kind of like an interface in those languages.
class Nothingish a where
    nada :: a
This class defines a value nada, which is supposed to be the class's equivalent of Nothing. Now the fun part: making instances of this class!
instance Nothingish (Maybe a) where
    nada = Nothing
For a value of type Maybe a, the Nothing-like value is, well, Nothing! This will be a weird example in a minute. But before then, let's make lists an instance of this class too.
instance Nothingish [a] where
    nada = []
An empty list is kind of like Nothing, right? So for a String (which is a list of Char), it will return the empty string, "".
Numbers are also an easy implementation. You've already indicated that 0 obviously represents "Nothingness" for numbers.
instance (Num a) => Nothingish a where
    nada = 0
This one will actually not work unless you put a special line at the top of your file
{-# LANGUAGE FlexibleInstances, UndecidableInstances, OverlappingInstances #-}
Or when you compile it you can set the flags for these language pragmas. Don't worry about them, they're just magic that makes more stuff work.
So now you've got this class and these instances of it...now let's just re-write your function to use them!
eliminate :: (Nothingish a) => Maybe a -> a
eliminate (Just a) = a
eliminate Nothing = nada
Notice I only changed 0 to nada, and the rest is the same. Let's give it a spin!
ghci> eliminate (Just 2)
2
ghci> eliminate (Just "foo")
"foo"
ghci> eliminate (Just (Just 3))
Just 3
ghci> eliminate (Just Nothing)
Nothing
ghci> :t eliminate
eliminate :: (Nothingish t) => Maybe t -> t
ghci> eliminate Nothing
error! blah blah blah...**Ambiguous type variable**
Looks great for values and stuff. Notice the (Just Nothing) turns into Nothing, see? That was a weird example, a Maybe in a Maybe. Anyways...what about eliminate Nothing? Well, the resultant type is ambiguous. It doesn't know what we are expecting. So we have to tell it what type we want.
ghci> eliminate Nothing :: Int
0
Go ahead and try it out for other types; you'll see it gets nada for each one. So now, when you use this function with your combine function, you get this:
ghci> let combine a b c = (eliminate a, eliminate b, eliminate c)
ghci> combine (Just 2) (Just "foo") (Just (Just 3))
(2,"foo",Just 3)
ghci> combine (Just 2) Nothing (Just 4)
error! blah blah Ambiguous Type blah blah
Notice you still have to indicate what type your "Nothing" is, or indicate what return type you expect.
ghci> combine (Just 2) (Nothing :: Maybe Int) (Just 4)
(2,0,4)
ghci> combine (Just 2) Nothing (Just 4) :: (Int, Int, Int)
(2,0,4)
Or, you could restrict the types that your function allows by putting its type signature explicitly in the source. This makes sense if the logical use of the function would be that it is only used with parameters of the same type.
combine :: (Nothingish a) => Maybe a -> Maybe a -> Maybe a -> (a,a,a)
combine a b c = (eliminate a, eliminate b, eliminate c)
Now it only works if all three Maybe things are the same type. That way, it will infer that the Nothing is the same type as the others.
ghci> combine (Just 2) Nothing (Just 4)
(2,0,4)
No ambiguity, yay! But now it is an error to mix and match, like we did before.
ghci> combine (Just 2) (Just "foo") (Just (Just 3))
error! blah blah Couldn't match expected type blah blah
blah blah blah against inferred type blah blah
Well, I think that was a sufficiently long and overblown answer. Enjoy.
I'm new to Haskell too, so I don't know if this exists in the platform yet (I'm sure it does), but how about a "get or else" function to get a value if it exists, else return a default?
getOrElse :: Maybe a -> a -> a
getOrElse (Just v) d = v
getOrElse Nothing  d = d
This is the answer I was looking for when I came to this question:
https://hackage.haskell.org/package/base-4.9.0.0/docs/Data-Maybe.html#v:fromJust
...and similarly, for Either:
https://hackage.haskell.org/package/either-unwrap-1.1/docs/Data-Either-Unwrap.html
They provide functions I would have written myself which unwrap the value from its context.
The eliminate function's type signature is:
eliminate :: Maybe Int -> Int
That's because it returns 0 on Nothing, forcing the compiler to assume that a :: Int in your eliminate function. Hence, the compiler deduces the type signature of the combine function to be:
combine :: Maybe Int -> Maybe Int -> Maybe Int -> (Int, Int, Int)
and that's precisely why it doesn't work when you pass a String to it.
If you wrote it as:
combine a b c = (eliminate a, eliminate b, eliminate c)
  where eliminate (Just a) = a
        eliminate Nothing  = undefined
then it would have worked with String or with any other type. The reason relies on the fact that undefined :: a, which makes eliminate polymorphic and applicable to types other than Int.
Of course, that's not the aim of your code, i.e., to make the combine function total.
Indeed, even if an application of combine to some Nothing arguments would succeed (that's because Haskell is lazy by default), as soon as you try to evaluate the results you would get a runtime error as undefined can't be evaluated to something useful (to put it in simple terms).