What is meant by parallelism in this case?

What is meant by parallelism in this case? - scala

In this blog post about parallel collections in Scala http://beust.com/weblog/2011/08/15/scalas-parallel-collections/
This is mentioned on a comment by Daniel Spiewak:
Other people have already commented about the mkString example, so I’m
going to leave it alone. It does reflect a much larger point about
collection semantics in general. Basically, this is it: in the absence
of side-effects (in user code), parallel collections have the
exact same semantics as the sequential collections. Put another way:
forAll { (xs: Vector[A], f: A => B) =>
(xs.par map f) == (xs map f)
}
Does this mean that if there are no side effects parallelism is not achieved? If this is true can this point be expanded to explain why this the case?

Does this mean that if there are no side effects parallelism is not
achieved?
No, that's not what it means. When Daniel Spiewak says that
Basically, this is it: in the absence of side-effects (in user code),
parallel collections have the exact same semantics as the sequential
collections.
It means that if your function has no side-effects then using it to map over a simple collection or a parallel collection will yield the same outcome. Which is why:
Put another way: forAll { (xs: Vector[A], f: A => B) => (xs.par map f)
== (xs map f) }
if f is side-effect free.
So, it's actually the opposite: if there are side effects parallelism is not a good idea since the outcome will be inconsistent.

It will always be run in parallel, but the result may differ when side-effects are there.
Let's say A = Int and B = Int with following code:
var tmp = false
def f(i: Int) = if(!tmp) {tmp = true; 0} else i + 1
So here we have a function f with a side-effect. I assume that tmp is false before running the code.
Running Vector(1,2,3).map(f) will always result in Vector(0,3,4)
But Vector(1,2,3).par.map(f) can have different results. It can be Vector(0,3,4) but since its parallel, the second element maybe is mapped first etc. So something like this might happen Vector(2,0,4).
You can be sure that the result will be the same, when in this case f would not have side-effects.

Related

How to implement memoization in Scala without mutability?

I was recently reading Category Theory for Programmers and in one of the challenges, Bartosz proposed to write a function called memoize which takes a function as an argument and returns the same one with the difference that, the first time this new function is called, it stores the result of the argument and then returns this result each time it is called again.
def memoize[A, B](f: A => B): A => B = ???
The problem is, I can't think of any way to implement this function without resorting to mutability. Moreover, the implementations I have seen uses mutable data structures to accomplish the task.
My question is, is there a purely functional way of accomplishing this? Maybe without mutability or by using some functional trick?
Thanks for reading my question and for any future help. Have a nice day!

is there a purely functional way of accomplishing this?
No. Not in the narrowest sense of pure functions and using the given signature.
TLDR: Use mutable collections, it's okay!
Impurity of g
val g = memoize(f)
// state 1
g(a)
// state 2
What would you expect to happen for the call g(a)?
If g(a) memoizes the result, an (internal) state has to change, so the state is different after the call g(a) than before.
As this could be observed from the outside, the call to g has side effects, which makes your program impure.
From the Book you referenced, 2.5 Pure and Dirty Functions:
[...] functions that
always produce the same result given the same input and
have no side effects
are called pure functions.
Is this really a side effect?
Normally, at least in Scala, internal state changes are not considered side effects.
See the definition in the Scala Book
A pure function is a function that depends only on its declared inputs and its internal algorithm to produce its output. It does not read any other values from “the outside world” — the world outside of the function’s scope — and it does not modify any values in the outside world.
The following examples of lazy computations both change their internal states, but are normally still considered purely functional as they always yield the same result and have no side effects apart from internal state:
lazy val x = 1
// state 1: x is not computed
x
// state 2: x is 1
val ll = LazyList.continually(0)
// state 1: ll = LazyList(<not computed>)
ll(0)
// state 2: ll = LazyList(0, <not computed>)
In your case, the equivalent would be something using a private, mutable Map (as the implementations you may have found) like:
def memoize[A, B](f: A => B): A => B = {
val cache = mutable.Map.empty[A, B]
(a: A) => cache.getOrElseUpdate(a, f(a))
}
Note that the cache is not public.
So, for a pure function f and without looking at memory consumption, timings, reflection or other evil stuff, you won't be able to tell from the outside whether f was called twice or g cached the result of f.
In this sense, side effects are only things like printing output, writing to public variables, files etc.
Thus, this implementation is considered pure (at least in Scala).
Avoiding mutable collections
If you really want to avoid var and mutable collections, you need to change the signature of your memoize method.
This is, because if g cannot change internal state, it won't be able to memoize anything new after it was initialized.
An (inefficient but simple) example would be
def memoizeOneValue[A, B](f: A => B)(a: A): (B, A => B) = {
val b = f(a)
val g = (v: A) => if (v == a) b else f(v)
(b, g)
}
val (b1, g) = memoizeOneValue(f, a1)
val (b2, h) = memoizeOneValue(g, a2)
// ...
The result of f(a1) would be cached in g, but nothing else. Then, you could chain this and always get a new function.
If you are interested in a faster version of that, see #esse's answer, which does the same, but more efficient (using an immutable map, so O(log(n)) instead of the linked list of functions above, O(n)).

Let's try(Note: I have change the return type of memoize to store the cached data):
import scala.language.existentials
type M[A, B] = A => T forSome { type T <: (B, A => T) }
def memoize[A, B](f: A => B): M[A, B] = {
import scala.collection.immutable
def withCache(cache: immutable.Map[A, B]): M[A, B] = a => cache.get(a) match {
case Some(b) => (b, withCache(cache))
case None =>
val b = f(a)
(b, withCache(cache + (a -> b)))
}
withCache(immutable.Map.empty)
}
def f(i: Int): Int = { print(s"Invoke f($i)"); i }
val (i0, m0) = memoize(f)(1) // f only invoked at first time
val (i1, m1) = m0(1)
val (i2, m2) = m1(1)

Yes there is pure functional ways to implement polymorphic function memoization. The topic is surprisingly deep and even summons the Yoneda Lemma, which is likely what Bartosz had in mind with this exercise.
The blog post Memoization in Haskell gives a nice introduction by simplifying the problem a bit: instead of looking at arbitrary functions it restricts the problem to functions from the integers.
The following memoize function takes a function of type Int -> a and
returns a memoized version of the same function. The trick is to turn
a function into a value because, in Haskell, functions are not
memoized but values are. memoize converts a function f :: Int -> a
into an infinite list [a] whose nth element contains the value of f n.
Thus each element of the list is evaluated when it is first accessed
and cached automatically by the Haskell runtime thanks to lazy
evaluation.
memoize :: (Int -> a) -> (Int -> a)
memoize f = (map f [0 ..] !!)
Apparently the approach can be generalised to function of arbitrary domains. The trick is to come up with a way to use the type of the domain as an index into a lazy data structure used for "storing" previous values. And this is where the Yoneda Lemma comes in and my own understanding of the topic becomes flimsy.

How to model if-expressions with actor systems?

I'm trying to emulate a simple functional language using an actor based execution model where issues arose modelling if-expression.
Actor systems nowadays are used basically for speeding up all kind of stuff by avoiding OS locks and stalled threads or to make microservices less painful, but initially it was supposed to be an alternative model of computation in general [1][2], a contemporary take on it may be propagation networks. So this should be capable to cover any programming language construct and certainly an if, right?
While I'm aware that this is occasionally met with irritation, I saw one timid attempt to move towards recursive algorithms represented using akka actors (I've refurbished it and added further examples including the one given below). That attempt halted at function calls, but why not go further and also model operators and if conditions with actors? In fact the smalltalk language applies this model and yields a precursor of the actor concept, as has been pointed out in the accepted answers below.
Surprisingly recursive function calls aren't much of an issue, but if1 is, due to it's potentially stateful nature.
Given the clause C: if a then x else y, here's the problem:
My initial idea was, that C is an actor behaving like a function with 3 parameters (a,x,y) that returns either x or y depending on a. Being maximally parallel [2] a,x and y would be evaluated simultaneously and passed as messages to C. Now, this isn't really good, if C is the exit condition of a recursive function f, sending one branch of f in an infinite recursion. Also if x or y have side effects one can't just evaluate both of them. Let's take this recursive sum (which is not the usual factorial, stupid as such and could be made tail recursive, but that's not the point)
f(n) = if n <= 0
0
else
n + f(n -1)
Note, that I'd like to create an if-expression resembling the one of Scala, see the (spec, p. 88), or Haskell, for that matter, rather than an if-statement that relies on side-effects.
f(0) would cause 3 concurrent evaluations
n <= 0 (ok)
0 (ok)
n + f(n -1) (bad, introducing the weird behavior that the call to f(n) actually will return (yielding 0) but the evaluation of its branches continues infinitely)
I can see these options from here:
The whole computation becomes stateful and the evaluation of either x
or y only happens after a has been calculated (mandatory if x or y have side effects).
Some guarding mechanism gets introduced that renders x or y not
applicable for arguments outside a certain range upon the call of f. They might evaluate to some "not applicable" marker instead of a value, which will not be used in C anyway, since it comes from a branch which isn't relevant.
I'm not sure at this point, If I didn't miss out on the question at a fundamental level and there are obvious other approaches that I just don't see. Input appreciated :)
Btw. see this for an exhaustive list of conditional branching in different languages, without giving their semantics, and, not exhaustive, the wiki page on conditionals, with semantics, and this for a discussion how the question at hand is an issue till down to the level of hardware.
1 I'm aware that an if could be seen as a special case of pattern matching, but then the question is, how to model the different cases of a match expression using actors. But maybe that wasn't even intended in the first place, matching is just something that every actor can do without referring to other specialized "match-actors". On the other hand it has been stated that "everything is an actor", rather confusing[2]. Btw. does anybody have a clear notion what the [#message whatever] notation is meant to be in that paper? # is irritatingly undefined. Maybe smalltak gives a hint, there it indicates a symbol.

There is a little bit of a misconception in your question. In functional languages, if is not necessarily a function of three parameters. Rather, it is sometimes two functions of two parameters.
In particular, that is how the Church Encoding of Booleans works in λ-calculus: there are two functions, let's call them True and False. Both functions have two parameters. True simply returns the first argument, False simply returns the second argument.
First, let's define two functions called true and false. We could define them any way we want, they are completely arbitrary, but we will define them in a very special way which has some advantages as we will see later (I will use ECMAScript as a somewhat reasonable approximation of λ-calculus that is probably readable by a bigger portion of visitors to this site than λ-calculus itself):
const tru = (thn, _ ) => thn,
fls = (_ , els) => els;
tru is a function with two parameters which simply ignores its second argument and returns the first. fls is also a function with two parameters which simply ignores its first argument and returns the second.
Why did we encode tru and fls this way? Well, this way, the two functions not only represent the two concepts of true and false, no, at the same time, they also represent the concept of "choice", in other words, they are also an if/then/else expression! We evaluate the if condition and pass it the then block and the else block as arguments. If the condition evaluates to tru, it will return the then block, if it evaluates to fls, it will return the else block. Here's an example:
tru(23, 42);
// => 23
This returns 23, and this:
fls(23, 42);
// => 42
returns 42, just as you would expect.
There is a wrinkle, however:
tru(console.log("then branch"), console.log("else branch"));
// then branch
// else branch
This prints both then branch and else branch! Why?
Well, it returns the return value of the first argument, but it evaluates both arguments, since ECMAScript is strict and always evaluates all arguments to a function before calling the function. IOW: it evaluates the first argument which is console.log("then branch"), which simply returns undefined and has the side-effect of printing then branch to the console, and it evaluates the second argument, which also returns undefined and prints to the console as a side-effect. Then, it returns the first undefined.
In λ-calculus, where this encoding was invented, that's not a problem: λ-calculus is pure, which means it doesn't have any side-effects; therefore you would never notice that the second argument also gets evaluated. Plus, λ-calculus is lazy (or at least, it is often evaluated under normal order), meaning, it doesn't actually evaluate arguments which aren't needed. So, IOW: in λ-calculus the second argument would never be evaluated, and if it were, we wouldn't notice.
ECMAScript, however, is strict, i.e. it always evaluates all arguments. Well, actually, not always: the if/then/else, for example, only evaluates the then branch if the condition is true and only evaluates the else branch if the condition is false. And we want to replicate this behavior with our iff. Thankfully, even though ECMAScript isn't lazy, it has a way to delay the evaluation of a piece of code, the same way almost every other language does: wrap it in a function, and if you never call that function, the code will never get executed.
So, we wrap both blocks in a function, and at the end call the function that is returned:
tru(() => console.log("then branch"), () => console.log("else branch"))();
// then branch
prints then branch and
fls(() => console.log("then branch"), () => console.log("else branch"))();
// else branch
prints else branch.
We could implement the traditional if/then/else this way:
const iff = (cnd, thn, els) => cnd(thn, els);
iff(tru, 23, 42);
// => 23
iff(fls, 23, 42);
// => 42
Again, we need some extra function wrapping when calling the iff function and the extra function call parentheses in the definition of iff, for the same reason as above:
const iff = (cnd, thn, els) => cnd(thn, els)();
iff(tru, () => console.log("then branch"), () => console.log("else branch"));
// then branch
iff(fls, () => console.log("then branch"), () => console.log("else branch"));
// else branch
Now that we have those two definitions, we can implement or. First, we look at the truth table for or: if the first operand is truthy, then the result of the expression is the same as the first operand. Otherwise, the result of the expression is the result of the second operand. In short: if the first operand is true, we return the first operand, otherwise we return the second operand:
const orr = (a, b) => iff(a, () => a, () => b);
Let's check out that it works:
orr(tru,tru);
// => tru(thn, _) {}
orr(tru,fls);
// => tru(thn, _) {}
orr(fls,tru);
// => tru(thn, _) {}
orr(fls,fls);
// => fls(_, els) {}
Great! However, that definition looks a little ugly. Remember, tru and fls already act like a conditional all by themselves, so really there is no need for iff, and thus all of that function wrapping at all:
const orr = (a, b) => a(a, b);
There you have it: or (plus other boolean operators) defined with nothing but function definitions and function calls in just a handful of lines:
const tru = (thn, _ ) => thn,
fls = (_ , els) => els,
orr = (a , b ) => a(a, b),
nnd = (a , b ) => a(b, a),
ntt = a => a(fls, tru),
xor = (a , b ) => a(ntt(b), b),
iff = (cnd, thn, els) => cnd(thn, els)();
Unfortunately, this implementation is rather useless: there are no functions or operators in ECMAScript which return tru or fls, they all return true or false, so we can't use them with our functions. But there's still a lot we can do. For example, this is an implementation of a singly-linked list:
const cons = (hd, tl) => which => which(hd, tl),
car = l => l(tru),
cdr = l => l(fls);
You may have noticed something peculiar: tru and fls play a double role, they act both as the data values true and false, but at the same time, they also act as a conditional expression. They are data and behavior, bundled up into one … uhm … "thing" … or (dare I say) object! Does this idea of identifying data and behavior remind us of anything?
Indeed, tru and fls are objects. And, if you have ever used Smalltalk, Self, Newspeak or other pure object-oriented languages, you will have noticed that they implement booleans in exactly the same way: two objects true and false which have method named if that takes two blocks (functions, lambdas, whatever) as arguments and evaluates one of them.
Here's an example of what it might look like in Scala:
sealed abstract trait Buul {
def apply[T, U <: T, V <: T](thn: ⇒ U)(els: ⇒ V): T
def &&&(other: ⇒ Buul): Buul
def |||(other: ⇒ Buul): Buul
def ntt: Buul
}
case object Tru extends Buul {
override def apply[T, U <: T, V <: T](thn: ⇒ U)(els: ⇒ V): U = thn
override def &&&(other: ⇒ Buul) = other
override def |||(other: ⇒ Buul): this.type = this
override def ntt = Fls
}
case object Fls extends Buul {
override def apply[T, U <: T, V <: T](thn: ⇒ U)(els: ⇒ V): V = els
override def &&&(other: ⇒ Buul): this.type = this
override def |||(other: ⇒ Buul) = other
override def ntt = Tru
}
object BuulExtension {
import scala.language.implicitConversions
implicit def boolean2Buul(b: ⇒ Boolean) = if (b) Tru else Fls
}
import BuulExtension._
(2 < 3) { println("2 is less than 3") } { println("2 is greater than 3") }
// 2 is less than 3
Given the very close relationship between OO and actors (they are pretty much the same thing, actually), which is not historically surprising (Alan Kay based Smalltalk on Carl Hewitt's PLANNER; Carl Hewitt based Actors on Alan Kay's Smalltalk), I wouldn't be surprised if this turned out to be a step in the right direction to solve your problem.

Q : didn't ( I ) miss out on the question at a fundamental level?
Yes, you happened to have missed a cardinal point: even the functional-languages, that may otherwise enjoy the forms of AND- and/or OR-based fine-grain sorts of parallelism, do not grow as wild as not to respect the strictly [SERIAL] nature of the if ( expression1 ) expression2 [ else expression3 ]
You have spend many efforts on argumentation about recursion-case(s) whereas the principal property was left out of your view. State-fullness is the Mother Nature of the computing ( these toys are nothing but finite state automata, no matter how large the state-space might be, it is and always will principally remain finite ).
Even the cited Scala p.88 confirms this: "The conditional expression is evaluated by evaluating first e1. If this evaluates to true, the result of evaluating e2 is returned, otherwise the result of evaluating e3 is returned." - which is a pure-[SERIAL] process-recipe ( one step after another ).
One may remember, that even the evaluation of expression1 may have (and does have ) state-change effects ( not only "side-effects" ), but indeed state-change effects ( PRNG-steps into a new state whenever a random-number was asked to get generated and many similar situations )
Thus the if e1 then e2 else e3 has to obey a pure-[SERIAL] implementation, no matter what benefits may be brought into action from fine-grain {AND|OR}-based-parallelism ( may see many working examples thereof in languages that can use 'em right since late 70-ies early 80-ies )

Efficient implementation of Catamorphisms in Scala

For a datatype representing the natural numbers:
sealed trait Nat
case object Z extends Nat
case class S(pred: Nat) extends Nat
In Scala, here is an elementary way of implementing the corresponding catamorphism:
def cata[A](z: A)(l: Nat)(f: A => A): A = l match {
case Z => z
case S(xs) => f( cata(z)(xs)(f) )
}
However, since the recursive call to cata isn't in tail position, this can easily trigger a stack overflow.
What are alternative implementation options that will avoid this? I'd rather not go down the route of F-algebras unless the interface ultimately presented by the code can look pretty much as simple as the above.
EDIT: Looks like this might be directly relevant: Is it possible to use continuations to make foldRight tail recursive?

If you were implementing a catamorphism on lists, that would be what in Haskell we call a foldr. We know that foldr does not have a tail-recursive definition, but foldl does. So if you insist on a tail-recusive program, the right thing to do is reverse the list argument (tail-recursively, in linear time), then use a foldl in place of the foldr.
Your example uses the simpler data type of naturals (and a truly "efficient" implementation would use machine integers, but we'll agree to leave that aside). What is the reverse of one of your natural numbers? Just the number itself, because we can think of it as a list with no data in each node, so we can't tell the difference when it is reversed! And what's the equivalent of the foldl? It's the program (forgive the pseudocode)
def cata(z, a, f) = {
var x = a, y = z;
while (x != Z) {
y = f(y);
x = pred(x)
}
return y
}
Or as a Scala tail-recursion,
def cata[A](z: A)(a: Nat)(f: A => A): A = a match {
case Z => z
case S(b) => cata( f(z) )(b)(f)
}
Will that do?

Yes, this is exactly the motivating example in the paper Clowns to the left of me, jokers to the right
(Dissecting Data Structures) (updated, better, but non-free version here http://dl.acm.org/citation.cfm?id=1328474).
The basic idea is that you want to turn your recursive function into a loop, so you need to figure out a data structure that keeps track of the state of the procedure, which is
What you've calculuated so far
What you have left to do.
The type of this state depends on the structure of the type you're doing the fold over, at any point in the fold you are at some node of the tree and you need to remember the tree structure of "the rest of the tree".
The paper shows how you can calculate that state type mechanically. If you do this for Lists, you get that the state you need to keep track of is
The operation run on all the previous values.
The list of elements left to process.
Which is exactly what foldl keeps track of, so it's kind of a coincidence that foldl and foldr can be given the same type.

Efficiently iterate over one Set, and then another, in one for loop in Scala

I want to iterate over all the elements of one Set, and then all the elements of another Set, using a single loop. (I don't care about duplicates, because I happen to know the two Sets are disjoint.)
The reason I want to do it in a single loop is because I have some additional code to measure progress, which requires it to be in a single loop.
This doesn't work in general, because it might intermix the two Sets arbitrarily:
for(x <- firstSet ++ secondSet) {
...
}
This works, but builds 3 intermediate Seqs in memory, so it's far too inefficient in terms of both time and space usage:
for(x <- firstSet.toSeq ++ secondSet.toSeq) {
...
}

for(x <- firstSet.toIterator ++ secondSet.toIterator) {
...
}
This doesn't build any intermediate data structures, so I think it's the most efficient way.

If you just want a traversal, and you want maximum performance, this is the best way even though it is ugly:
val s1 = Set(1,2,3)
val s2 = Set(4,5,6)
val block : Int => Unit = x => { println(x) }
s1.foreach(block)
s2.foreach(block)
Since this is pretty ugly, you can just define a class for it:
def traverse[T](a:Traversable[T], b:Traversable[T]) : Traversable[T] =
new Traversable[T] {
def foreach[U](f:T=>U) { a.foreach(f); b.foreach(f) }
}
And then use it like this:
for(x<-traverse(s1, s2)) println(x)
However, unless this is extremely performance-critical, the solution posted by Robin Green is better. The overhead is creation of two iterators and concatenation of them. If you have deeper nested data structures, concatenating iterators can be quite expensive though. For example a tree iterator that is defined by concatenating the iterators of the subtrees will be painfully slow, whereas a tree traversable where you just call foreach on each subtree will be close to optimal.

Is Scala idiomatic coding style just a cool trap for writing inefficient code?

I sense that the Scala community has a little big obsession with writing "concise", "cool", "scala idiomatic", "one-liner" -if possible- code. This is immediately followed by a comparison to Java/imperative/ugly code.
While this (sometimes) leads to easy to understand code, it also leads to inefficient code for 99% of developers. And this is where Java/C++ is not easy to beat.
Consider this simple problem: Given a list of integers, remove the greatest element. Ordering does not need to be preserved.
Here is my version of the solution (It may not be the greatest, but it's what the average non-rockstar developer would do).
def removeMaxCool(xs: List[Int]) = {
val maxIndex = xs.indexOf(xs.max);
xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.
Here is my totally uncool, Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.
def removeMaxFast(xs: List[Int]) = {
var res = ArrayBuffer[Int]()
var max = xs.head
var first = true;
for (x <- xs) {
if (first) {
first = false;
} else {
if (x > max) {
res.append(max)
max = x
} else {
res.append(x)
}
}
}
res.toList
}
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge
obstacle to cross for greater Scala adoption. Is there a way out of this trap?
I am looking for practical advice to avoid such "inefficiency traps" while keeping implementation clear ans concise.
Clarification: This question comes from a real-life scenario: I had to write a complex algorithm. First I wrote it in Scala, then I "had to" rewrite it in Java. The Java implementation was twice as long, and not that clear, but at the same time it was twice as fast. Rewriting the Scala code to be efficient would probably take some time and a somewhat deeper understanding of scala internal efficiencies (for vs. map vs. fold, etc)

Let's discuss a fallacy in the question:
So, if 99% of Java developers write more efficient code than 99% of
Scala developers, this is a huge obstacle to cross for greater Scala
adoption. Is there a way out of this trap?
This is presumed, with absolutely no evidence backing it up. If false, the question is moot.
Is there evidence to the contrary? Well, let's consider the question itself -- it doesn't prove anything, but shows things are not that clear.
Totally non-Scala idiomatic, non-functional, non-concise, but it's
very efficient. It traverses the list only once!
Of the four claims in the first sentence, the first three are true, and the fourth, as shown by user unknown, is false! And why it is false? Because, contrary to what the second sentence states, it traverses the list more than once.
The code calls the following methods on it:
res.append(max)
res.append(x)
and
res.toList
Let's consider first append.
append takes a vararg parameter. That means max and x are first encapsulated into a sequence of some type (a WrappedArray, in fact), and then passed as parameter. A better method would have been +=.
Ok, append calls ++=, which delegates to +=. But, first, it calls ensureSize, which is the second mistake (+= calls that too -- ++= just optimizes that for multiple elements). Because an Array is a fixed size collection, which means that, at each resize, the whole Array must be copied!
So let's consider this. When you resize, Java first clears the memory by storing 0 in each element, then Scala copies each element of the previous array over to the new array. Since size doubles each time, this happens log(n) times, with the number of elements being copied increasing each time it happens.
Take for example n = 16. It does this four times, copying 1, 2, 4 and 8 elements respectively. Since Java has to clear each of these arrays, and each element must be read and written, each element copied represents 4 traversals of an element. Adding all we have (n - 1) * 4, or, roughly, 4 traversals of the complete list. If you count read and write as a single pass, as people often erroneously do, then it's still three traversals.
One can improve on this by initializing the ArrayBuffer with an initial size equal to the list that will be read, minus one, since we'll be discarding one element. To get this size, we need to traverse the list once, though.
Now let's consider toList. To put it simply, it traverses the whole list to create a new list.
So, we have 1 traversal for the algorithm, 3 or 4 traversals for resize, and 1 additional traversal for toList. That's 4 or 5 traversals.
The original algorithm is a bit difficult to analyse, because take, drop and ::: traverse a variable number of elements. Adding all together, however, it does the equivalent of 3 traversals. If splitAt was used, it would be reduced to 2 traversals. With 2 more traversals to get the maximum, we get 5 traversals -- the same number as the non-functional, non-concise algorithm!
So, let's consider improvements.
On the imperative algorithm, if one uses ListBuffer and +=, then all methods are constant-time, which reduces it to a single traversal.
On the functional algorithm, it could be rewritten as:
val max = xs.max
val (before, _ :: after) = xs span (max !=)
before ::: after
That reduces it to a worst case of three traversals. Of course, there are other alternatives presented, based on recursion or fold, that solve it in one traversal.
And, most interesting of all, all of these algorithms are O(n), and the only one which almost incurred (accidentally) in worst complexity was the imperative one (because of array copying). On the other hand, the cache characteristics of the imperative one might well make it faster, because the data is contiguous in memory. That, however, is unrelated to either big-Oh or functional vs imperative, and it is just a matter of the data structures that were chosen.
So, if we actually go to the trouble of benchmarking, analyzing the results, considering performance of methods, and looking into ways of optimizing it, then we can find faster ways to do this in an imperative manner than in a functional manner.
But all this effort is very different from saying the average Java programmer code will be faster than the average Scala programmer code -- if the question is an example, that is simply false. And even discounting the question, we have seen no evidence that the fundamental premise of the question is true.
EDIT
First, let me restate my point, because it seems I wasn't clear. My point is that the code the average Java programmer writes may seem to be more efficient, but actually isn't. Or, put another way, traditional Java style doesn't gain you performance -- only hard work does, be it Java or Scala.
Next, I have a benchmark and results too, including almost all solutions suggested. Two interesting points about it:
Depending on list size, the creation of objects can have a bigger impact than multiple traversals of the list. The original functional code by Adrian takes advantage of the fact that lists are persistent data structures by not copying the elements right of the maximum element at all. If a Vector was used instead, both left and right sides would be mostly unchanged, which might lead to even better performance.
Even though user unknown and paradigmatic have similar recursive solutions, paradigmatic's is way faster. The reason for that is that he avoids pattern matching. Pattern matching can be really slow.
The benchmark code is here, and the results are here.

def removeOneMax (xs: List [Int]) : List [Int] = xs match {
case x :: Nil => Nil
case a :: b :: xs => if (a < b) a :: removeOneMax (b :: xs) else b :: removeOneMax (a :: xs)
case Nil => Nil
}
Here is a recursive method, which only iterates once. If you need performance, you have to think about it, if not, not.
You can make it tail-recursive in the standard way: giving an extra parameter carry, which is per default the empty List, and collects the result while iterating. That is, of course, a bit longer, but if you need performance, you have to pay for it:
import annotation.tailrec
#tailrec
def removeOneMax (xs: List [Int], carry: List [Int] = List.empty) : List [Int] = xs match {
case a :: b :: xs => if (a < b) removeOneMax (b :: xs, a :: carry) else removeOneMax (a :: xs, b :: carry)
case x :: Nil => carry
case Nil => Nil
}
I don't know what the chances are, that later compilers will improve slower map-calls to be as fast as while-loops. However: You rarely need high speed solutions, but if you need them often, you will learn them fast.
Do you know how big your collection has to be, to use a whole second for your solution on your machine?
As oneliner, similar to Daniel C. Sobrals solution:
((Nil : List[Int], xs(0)) /: xs.tail) ((p, x)=> if (p._2 > x) (x :: p._1, p._2) else ((p._2 :: p._1), x))._1
but that is hard to read, and I didn't measure the effective performance. The normal pattern is (x /: xs) ((a, b) => /* something */). Here, x and a are pairs of List-so-far and max-so-far, which solves the problem to bring everything into one line of code, but isn't very readable. However, you can earn reputation on CodeGolf this way, and maybe someone likes to make a performance measurement.
And now to our big surprise, some measurements:
An updated timing-method, to get the garbage collection out of the way, and have the hotspot-compiler warm up, a main, and many methods from this thread, together in an Object named
object PerfRemMax {
def timed (name: String, xs: List [Int]) (f: List [Int] => List [Int]) = {
val a = System.currentTimeMillis
val res = f (xs)
val z = System.currentTimeMillis
val delta = z-a
println (name + ": " + (delta / 1000.0))
res
}
def main (args: Array [String]) : Unit = {
val n = args(0).toInt
val funs : List [(String, List[Int] => List[Int])] = List (
"indexOf/take-drop" -> adrian1 _,
"arraybuf" -> adrian2 _, /* out of memory */
"paradigmatic1" -> pm1 _, /**/
"paradigmatic2" -> pm2 _,
// "match" -> uu1 _, /*oom*/
"tailrec match" -> uu2 _,
"foldLeft" -> uu3 _,
"buf-=buf.max" -> soc1 _,
"for/yield" -> soc2 _,
"splitAt" -> daniel1,
"ListBuffer" -> daniel2
)
val r = util.Random
val xs = (for (x <- 1 to n) yield r.nextInt (n)).toList
// With 1 Mio. as param, it starts with 100 000, 200k, 300k, ... 1Mio. cases.
// a) warmup
// b) look, where the process gets linear to size
funs.foreach (f => {
(1 to 10) foreach (i => {
timed (f._1, xs.take (n/10 * i)) (f._2)
compat.Platform.collectGarbage
});
println ()
})
}
I renamed all the methods, and had to modify uu2 a bit, to fit to the common method declaration (List [Int] => List [Int]).
From the long result, i only provide the output for 1M invocations:
scala -Dserver PerfRemMax 2000000
indexOf/take-drop: 0.882
arraybuf: 1.681
paradigmatic1: 0.55
paradigmatic2: 1.13
tailrec match: 0.812
foldLeft: 1.054
buf-=buf.max: 1.185
for/yield: 0.725
splitAt: 1.127
ListBuffer: 0.61
The numbers aren't completly stable, depending on the sample size, and a bit varying from run to run. For example, for 100k to 1M runs, in steps of 100k, the timing for splitAt was as follows:
splitAt: 0.109
splitAt: 0.118
splitAt: 0.129
splitAt: 0.139
splitAt: 0.157
splitAt: 0.166
splitAt: 0.749
splitAt: 0.752
splitAt: 1.444
splitAt: 1.127
The initial solution is already pretty fast. splitAt is a modification from Daniel, often faster, but not always.
The measurement was done on a single core 2Ghz Centrino, running xUbuntu Linux, Scala-2.8 with Sun-Java-1.6 (desktop).
The two lessons for me are:
always measure your performance improvements; it is very hard to estimate it, if you don't do it on a daily basis
it is not only fun, to write functional code - sometimes the result is even faster
Here is a link to my benchmarkcode, if somebody is interested.

First of all, the behavior of the methods you presented is not the same. The first one keeps the element ordering, while the second one doesn't.
Second, among all the possible solution which could be qualified as "idiomatic", some are more efficient than others. Staying very close to your example, you can for instance use tail-recursion to eliminate variables and manual state management:
def removeMax1( xs: List[Int] ) = {
def rec( max: Int, rest: List[Int], result: List[Int]): List[Int] = {
if( rest.isEmpty ) result
else if( rest.head > max ) rec( rest.head, rest.tail, max :: result)
else rec( max, rest.tail, rest.head :: result )
}
rec( xs.head, xs.tail, List() )
}
or fold the list:
def removeMax2( xs: List[Int] ) = {
val result = xs.tail.foldLeft( xs.head -> List[Int]() ) {
(acc,x) =>
val (max,res) = acc
if( x > max ) x -> ( max :: res )
else max -> ( x :: res )
}
result._2
}
If you want to keep the original insertion order, you can (at the expense of having two passes, rather than one) without any effort write something like:
def removeMax3( xs: List[Int] ) = {
val max = xs.max
xs.filterNot( _ == max )
}
which is more clear than your first example.

The biggest inefficiency when you're writing a program is worrying about the wrong things. This is usually the wrong thing to worry about. Why?
Developer time is generally much more expensive than CPU time — in fact, there is usually a dearth of the former and a surplus of the latter.
Most code does not need to be very efficient because it will never be running on million-item datasets multiple times every second.
Most code does need to bug free, and less code is less room for bugs to hide.

The example you gave is not very functional, actually. Here's what you are doing:
// Given a list of Int
def removeMaxCool(xs: List[Int]): List[Int] = {
// Find the index of the biggest Int
val maxIndex = xs.indexOf(xs.max);
// Then take the ints before and after it, and then concatenate then
xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
Mind you, it is not bad, but you know when functional code is at its best when it describes what you want, instead of how you want it. As a minor criticism, if you used splitAt instead of take and drop you could improve it slightly.
Another way of doing it is this:
def removeMaxCool(xs: List[Int]): List[Int] = {
// the result is the folding of the tail over the head
// and an empty list
xs.tail.foldLeft(xs.head -> List[Int]()) {
// Where the accumulated list is increased by the
// lesser of the current element and the accumulated
// element, and the accumulated element is the maximum between them
case ((max, ys), x) =>
if (x > max) (x, max :: ys)
else (max, x :: ys)
// and of which we return only the accumulated list
}._2
}
Now, let's discuss the main issue. Is this code slower than the Java one? Most certainly! Is the Java code slower than a C equivalent? You can bet it is, JIT or no JIT. And if you write it directly in assembler, you can make it even faster!
But the cost of that speed is that you get more bugs, you spend more time trying to understand the code to debug it, and you have less visibility of what the overall program is doing as opposed to what a little piece of code is doing -- which might result in performance problems of its own.
So my answer is simple: if you think the speed penalty of programming in Scala is not worth the gains it brings, you should program in assembler. If you think I'm being radical, then I counter that you just chose the familiar as being the "ideal" trade off.
Do I think performance doesn't matter? Not at all! I think one of the main advantages of Scala is leveraging gains often found in dynamically typed languages with the performance of a statically typed language! Performance matters, algorithm complexity matters a lot, and constant costs matters too.
But, whenever there is a choice between performance and readability and maintainability, the latter is preferable. Sure, if performance must be improved, then there isn't a choice: you have to sacrifice something to it. And if there's no lost in readability/maintainability -- such as Scala vs dynamically typed languages -- sure, go for performance.
Lastly, to gain performance out of functional programming you have to know functional algorithms and data structures. Sure, 99% of Java programmers with 5-10 years experience will beat the performance of 99% of Scala programmers with 6 months experience. The same was true for imperative programming vs object oriented programming a couple of decades ago, and history shows it didn't matter.
EDIT
As a side note, your "fast" algorithm suffer from a serious problem: you use ArrayBuffer. That collection does not have constant time append, and has linear time toList. If you use ListBuffer instead, you get constant time append and toList.

For reference, here's how splitAt is defined in TraversableLike in the Scala standard library,
def splitAt(n: Int): (Repr, Repr) = {
val l, r = newBuilder
l.sizeHintBounded(n, this)
if (n >= 0) r.sizeHint(this, -n)
var i = 0
for (x <- this) {
(if (i < n) l else r) += x
i += 1
}
(l.result, r.result)
}
It's not unlike your example code of what a Java programmer might come up with.
I like Scala because, where performance matters, mutability is a reasonable way to go. The collections library is a great example; especially how it hides this mutability behind a functional interface.
Where performance isn't as important, such as some application code, the higher order functions in Scala's library allow great expressivity and programmer efficiency.
Out of curiosity, I picked an arbitrary large file in the Scala compiler (scala.tools.nsc.typechecker.Typers.scala) and counted something like 37 for loops, 11 while loops, 6 concatenations (++), and 1 fold (it happens to be a foldRight).

What about this?
def removeMax(xs: List[Int]) = {
val buf = xs.toBuffer
buf -= (buf.max)
}
A bit more ugly, but faster:
def removeMax(xs: List[Int]) = {
var max = xs.head
for ( x <- xs.tail )
yield {
if (x > max) { val result = max; max = x; result}
else x
}
}

Try this:
(myList.foldLeft((List[Int](), None: Option[Int]))) {
case ((_, None), x) => (List(), Some(x))
case ((Nil, Some(m), x) => (List(Math.min(x, m)), Some(Math.max(x, m))
case ((l, Some(m), x) => (Math.min(x, m) :: l, Some(Math.max(x, m))
})._1
Idiomatic, functional, traverses only once. Maybe somewhat cryptic if you are not used to functional-programming idioms.
Let's try to explain what is happening here. I will try to make it as simple as possible, lacking some rigor.
A fold is an operation on a List[A] (that is, a list that contains elements of type A) that will take an initial state s0: S (that is, an instance of a type S) and a function f: (S, A) => S (that is, a function that takes the current state and an element from the list, and gives the next state, ie, it updates the state according to the next element).
The operation will then iterate over the elements of the list, using each one to update the state according to the given function. In Java, it would be something like:
interface Function<T, R> { R apply(T t); }
class Pair<A, B> { ... }
<State> State fold(List<A> list, State s0, Function<Pair<A, State>, State> f) {
State s = s0;
for (A a: list) {
s = f.apply(new Pair<A, State>(a, s));
}
return s;
}
For example, if you want to add all the elements of a List[Int], the state would be the partial sum, that would have to be initialized to 0, and the new state produced by a function would simply add the current state to the current element being processed:
myList.fold(0)((partialSum, element) => partialSum + element)
Try to write a fold to multiply the elements of a list, then another one to find extreme values (max, min).
Now, the fold presented above is a bit more complex, since the state is composed of the new list being created along with the maximum element found so far. The function that updates the state is more or less straightforward once you grasp these concepts. It simply puts into the new list the minimum between the current maximum and the current element, while the other value goes to the current maximum of the updated state.
What is a bit more complex than to understand this (if you have no FP background) is to come up with this solution. However, this is only to show you that it exists, can be done. It's just a completely different mindset.
EDIT: As you see, the first and second case in the solution I proposed are used to setup the fold. It is equivalent to what you see in other answers when they do xs.tail.fold((xs.head, ...)) {...}. Note that the solutions proposed until now using xs.tail/xs.head don't cover the case in which xs is List(), and will throw an exception. The solution above will return List() instead. Since you didn't specify the behavior of the function on empty lists, both are valid.

Another option would be:
package code.array
object SliceArrays {
def main(args: Array[String]): Unit = {
println(removeMaxCool(Vector(1,2,3,100,12,23,44)))
}
def removeMaxCool(xs: Vector[Int]) = xs.filter(_ < xs.max)
}
Using Vector instead of List, the reason is that Vector is more versatile and has a better general performance and time complexity if compared to List.
Consider the following collections operations:
head, tail, apply, update, prepend, append
Vector takes an amortized constant time for all operations, as per Scala docs:
"The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys"
While List takes constant time only for head, tail and prepend operations.
Using
scalac -print
generates:
package code.array {
object SliceArrays extends Object {
def main(args: Array[String]): Unit = scala.Predef.println(SliceArrays.this.removeMaxCool(scala.`package`.Vector().apply(scala.Predef.wrapIntArray(Array[Int]{1, 2, 3, 100, 12, 23, 44})).$asInstanceOf[scala.collection.immutable.Vector]()));
def removeMaxCool(xs: scala.collection.immutable.Vector): scala.collection.immutable.Vector = xs.filter({
((x$1: Int) => SliceArrays.this.$anonfun$removeMaxCool$1(xs, x$1))
}).$asInstanceOf[scala.collection.immutable.Vector]();
final <artifact> private[this] def $anonfun$removeMaxCool$1(xs$1: scala.collection.immutable.Vector, x$1: Int): Boolean = x$1.<(scala.Int.unbox(xs$1.max(scala.math.Ordering$Int)));
def <init>(): code.array.SliceArrays.type = {
SliceArrays.super.<init>();
()
}
}
}

Another contender. This uses a ListBuffer, like Daniel's second offering, but shares the post-max tail of the original list, avoiding copying it.
def shareTail(xs: List[Int]): List[Int] = {
var res = ListBuffer[Int]()
var maxTail = xs
var first = true;
var x = xs
while ( x != Nil ) {
if (x.head > maxTail.head) {
while (!(maxTail.head == x.head)) {
res += maxTail.head
maxTail = maxTail.tail
}
}
x = x.tail
}
res.prependToList(maxTail.tail)
}

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse