Idiomatic way of dealing with an uninitialized var - Scala

I'm coding up my first Scala script to get a feel for the language, and I'm a bit stuck as to the best way to achieve something.
My situation is the following: I have a method which I need to call N times; it returns an Int on each run (the value may differ, since there's a random component to the execution), and I want to keep the best run (the smallest value returned across these runs).
Now, coming from a Java/Python background, I would simply initialize the variable with null/None, and compare in the if, something like:
best = None
for ...:
    result = executionOfThings()
    if best is None or result < best:
        best = result
And that's that (pardon the semi-Python pseudo-code).
Now, on Scala, I'm struggling a bit. I've read about the usage of Option and pattern matching to achieve the same effect, and I guess I could code up something like (this was the best I could come up with):
best match {
  case None => best = Some(res)
  case Some(x) if x > res => best = Some(res)
  case _ =>
}
I believe this works, but I'm not sure if it's the most idiomatic way of writing it. It's clear enough, but a bit verbose for such a simple "use-case".
Anyone that could shine a functional light on me?
Thanks.

For this particular problem, not in general, I would suggest initializing with Int.MaxValue as long as you're guaranteed that N >= 1. Then you just
if (result < best) best = result
You could also, with best as an option,
best = best.filter(_ <= result).orElse( Some(result) )
if the optionality is important (e.g. it is possible that N == 0, and you don't take a distinct path through the code in that case). This is a more general way to deal with optional values that may get replaced: use filter to keep the non-replaced cases, and orElse to fill in the replacement if needed.
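To see the whole pattern in one place, here is a minimal runnable sketch of the Option-based loop; executionOfThings() is just a stand-in for the asker's random computation:
import scala.util.Random

def executionOfThings(): Int = Random.nextInt(100) // stand-in computation

var best: Option[Int] = None
for (_ <- 1 to 10) {
  val result = executionOfThings()
  // keep best if it is already <= result, otherwise replace it
  best = best.filter(_ <= result).orElse(Some(result))
}
println(best) // Some(smallest result seen)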

Just use the min function:
(for (...) yield executionOfThings()).min
Example:
((1 to 5).map (x => 4 * x * x - (x * x * x))).min

edit: adjusted to @user-unknown's suggestion
I would suggest you rethink your whole computation to be more functional. You mutate state, which should be avoided. I can think of a recursive version of your code:
def calcBest[A](xs: List[A])(f: A => Int): Int = {
  def calcBest(xs: List[A], best: Int = Int.MaxValue): Int = xs match {
    // will match an empty list
    case Nil => best
    // x will hold the head of the list and rest the rest ;-)
    case x :: rest => calcBest(rest, math.min(f(x), best))
  }
  calcBest(xs)
}
callable with calcBest(List(7,5,3,8,2))(_*2) // => res0: Int = 4
With this you have no mutable state at all.
Another way would be to use foldLeft on the list:
list.foldLeft(Int.MaxValue) { case (best,x) => math.min(calculation(x),best) }
foldLeft takes an initial value of type B and a function (B, A) => B, and returns a B; the case block above is simply a pattern-matching anonymous function that destructures the (B, A) pair.
Both ways are equivalent. The first one is probably faster, the second is more readable. Both traverse the list, call a function on each value, and return the smallest result, which, judging from your snippet, is what you want, right?
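For instance, with the doubling function from the recursive example above (calculation is just a stand-in name):
def calculation(x: Int): Int = x * 2

List(7, 5, 3, 8, 2).foldLeft(Int.MaxValue) { case (best, x) => math.min(calculation(x), best) }
// => 4, the same result as calcBest(List(7,5,3,8,2))(_*2)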

I thought I would offer another idiomatic solution. You can use Iterator.continually to create an infinite-length iterator that's lazily evaluated, take(N) to limit the iterator to N elements, and use min to find the winner.
Iterator.continually { executionOfThings() }.take(N).min


Swift-like "guard-let" in Scala

In Swift, one can do something like:
guard let x = z else {return} // z is "String?"
In this simple case, if z (an optional) is empty, the function will exit.
I really like this structure and recently started developing in Scala, and I was wondering if there's an equivalent in Scala.
What I found is this:
val x = if (z.isEmpty) return else z.get // z is "Option[String]"
It works - but I still wonder if there's a more "Scala-ish" way of doing it.
EDIT: the use case is that a configuration argument is allowed to be null in the app, but this specific function has nothing to do in that case. I'm looking for the correct way to avoid calling code that will not work.
Scala is functional so using return to break out early is almost always a bad idea (partly because return doesn't work the way it might appear).
In this case there are a couple of choices.
Using map allows the value inside an Option to be processed if it is there:
val xOpt = z.map { x =>
  // process x
}
xOpt is a new Option that contains the results of the processing, or None if z was already None.
The alternative is to use a default value if z is None and then use that.
val x = z.getOrElse(defaultX)
// process x
In this case x is a bare value not an Option.
Admittedly I'm not familiar with Swift, but since you're not returning a value, this is presumably in a function being called for side-effect.
In Scala, the natural equivalent for "perform this side-effect if and only if this Option is non-empty" is foreach:
z.foreach { x =>
  // code that uses x
}
There is syntactic sugar around this, the above is equivalent to
for (
  x <- z
) { // the absence of yield is significant here
  // code that uses x
}
If z is empty, the foreach does nothing, so the "code that uses x" would be everything in the function after the guard-let.
It might be that there's an implicit null being returned in the early exit case (it's hard to think of another value that would make sense) if this is a function being called for value. In that case, the natural equivalent would be map:
z.map { x =>
  // code that computes a value from x
} // passes through None if z is empty
The for sugar would be
for (
  x <- z
) yield {
  // code that uses x
}
As above, "code that uses x" means everything after the guard-let.
If you happened to have two nullable arguments like
guard let x = z else { return }
guard let w = y else { return }
You could nest these or have multiple <-'s in your for expression:
for {
  x <- z
  w <- y
} yield {
  // code that uses x and w
}
Alternatively (and my personal preference), you can use zip:
z.zip(y).foreach {
  case (x, w) =>
    // code that uses x and w
}
// or in the called-for-value case
z.zip(y).map {
  case (x, w) =>
    // code that uses x and w
}.headOption // zip creates an Iterable, so bring it back to Option
For comprehensions are similar:
def f(z: Option[String]) = for {
  x <- z // x is String
  result = x + " okay"
} yield result
This is useful when you have several short-circuiting operations, but the downside (compared to Swift) is you can't use other control structures such as if without some difficulty.
However, for in Scala can be used for more than just Option, e.g. to process lists:
def products(xs: Vector[Int], ys: Vector[Int]) = for {
  x <- xs
  y <- ys
} yield x * y // Result is a Vector of each x multiplied by each y
It is important to understand that Option is a monad. Being a monad means that it supports map and flatMap, as well as many other collection operations like foreach. Supporting map, flatMap, and foreach in particular means that Option can be used as a generator in a for-comprehension.
The best way of thinking about Option is as a collection that can hold either 0 or 1 values. (In fact, Option does inherit from IterableOnce, and it implements several of the methods of Iterable, so you don't merely get to think of Option as a collection; it simply is one.) A lot of the knowledge you have about collections transfers directly to Option, and Option supports the most important of the standard Scala collection operations, including the aforementioned monad operations.
So, what happens if you iterate over an empty collection? Well, nothing! What happens if you map over an empty collection? Well, you get back an empty collection!
So, the correct way of working of Options is actually not to try and get the value out of it, but instead to treat the Option as a collection.
So, instead of trying to get x out of the Option and then manipulating x, you can manipulate the value inside of the Option.
E.g. if you want to print it:
z.foreach(println)
// or
for (x <- z) println(x)
Iterating over an empty collection doesn't do anything, so this will either do nothing (if the Option is empty) or will print the value.
If you want to downcase it:
val x = z.map(_.toLowerCase())
// or
val x = for (x <- z) yield x.toLowerCase()
Note that in my case, x is an Option[String] (i.e. either a Some[String] or None) whereas in your case, x is a String, i.e. in my case, we are staying in the land of Options and don't leave.
Only at the very boundary of your system should you try and extract the value out of the Option, and the best way to do that is the method getOrElse, which does exactly what it sounds like: it gets the value out of the option if there is one, otherwise it returns the default value that you supplied.

What's the reasoning behind adding the "case" keyword to Scala?

Apart from case classes (case class A), where case is quite useful...
Why do we need to use case in match? Wouldn't:
x match {
  y if y > 0 => y * 2
  _ => -1
}
... be much prettier and concise?
Or why do we need to use case when a function takes a tuple? Say, we have:
val z = List((1, -1), (2, -2), (3, -3)).zipWithIndex
Now, isn't:
z map { case ((a, b), i) => a + b + i }
... way uglier than just:
z map (((a, b), i) => a + b + i)
...?
First, as we know, it is possible to put several statements under the same case without any separation notation, just a line break, like:
x match {
  case y if y > 0 => y * 2
    println("test")
    println("test2") // these 3 statements belong to the same "case"
}
If case were not needed, the compiler would have to find a way to know when a line belongs to the next case.
For example:
x match {
  y if y > 0 => y * 2
  _ => -1
}
How would the compiler know whether _ => -1 belongs to the first case or represents the next one?
Moreover, how would the compiler know that the => sign doesn't introduce a function literal but rather the actual code for the current case?
The compiler would certainly need some kind of syntax allowing cases to be isolated:
(using curly braces, or anything else)
x match {
  {y if y > 0 => y * 2}
  {_ => -1} // confusing with function literal notation
}
And surely the solution Scala currently provides, using the case keyword, is a lot more readable and understandable than some separator such as the curly braces in my example.
Adding to @Mik378's answer:
When you write this: (a, b) => something, you are defining an anonymous Function2 - a function that takes two parameters.
When you write this: case (a, b) => something, you are defining an anonymous PartialFunction that takes one parameter and matches it against a pair.
So you need the case keyword to differentiate between these two.
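A minimal sketch of the difference (the types are spelled out only for illustration; with these expected types the case literal becomes a plain Function1 that destructures its single pair argument, but the same case syntax can also build a PartialFunction):
// an anonymous Function2: a function of two parameters
val add: (Int, Int) => Int = (a, b) => a + b

// a pattern-matching anonymous function: one parameter, matched against a pair
val addPair: ((Int, Int)) => Int = { case (a, b) => a + b }

add(1, 2)       // 3
addPair((1, 2)) // 3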
The second issue, anonymous functions that avoid the case, is a matter of debate:
https://groups.google.com/d/msg/scala-debate/Q0CTZNOekWk/z1eg3dTkCXoJ
Also: http://www.scala-lang.org/old/node/1260
For the first issue, the choice is whether you allow a block or an expression on the RHS of the arrow.
In practice, I find that shorter case bodies are usually preferable, so I can certainly imagine your alternative syntax resulting in crisper code.
Consider one-line methods. You write:
def f(x: Int) = 2 * x
then you need to add a statement. I don't know if the IDE is able to auto-add the braces.
def f(x: Int) = { val res = 2*x ; res }
That seems no worse than requiring the same syntax for case bodies.
To review, a case clause is case Pattern Guard => body.
Currently, body is a block, or a sequence of statements and a result expression.
If body were an expression, you'd need braces for multiple statements, like a function.
I don't think => results in ambiguities since function literals don't qualify as patterns, unlike literals like 1 or "foo".
One snag might be: { case foo => ??? } is a "pattern matching anonymous function" (SLS 8.5). Obviously, if the case is optional or eliminated, then { foo => ??? } is ambiguous. You'd have to distinguish case clauses for anon funs (where case is required) and case clauses in a match.
One counter-argument for the current syntax is that, in an intuition deriving from C, you always secretly hope that your match will compile to a switch table. In that metaphor, the cases are labels to jump to, and a label is just the address of a sequence of statements.
The alternative syntax might encourage a more inlined approach:
x match {
  C => c(x)
  D => d(x)
  _ => ???
}
@inline def c(x: X) = ???
// etc.
In this form, it looks more like a dispatch table, and the match body recalls the Map syntax, Map(a -> 1, b -> 2), that is, a tidy simplification of the association.
One of the key aspects of code readability is the words that grab your attention. For example,
return grabs your attention when you see it because you know that it is such a decisive action (breaking out of the function and possibly sending a value back to the caller).
Another example is break--not that I like break, but it gets your attention.
I would agree with @Mik378 that case in Scala is more readable than the alternatives. Besides the compiler confusion he mentions, it gets your attention.
I am all for concise code, but there is a line between concise and illegible. I will gladly make the trade of 4n characters (where n is the number of cases) for the substantial readability that I get in return.

Using foldLeft or some other operator to calculate point distances?

Ok, so I thought this would be a snap: I'm trying to practice Scala's collection operators, and my example is a list of points.
The Point class can calculate and return the distance to another point (as a Double).
However, foldLeft doesn't seem to be the right solution: considering elements e1, e2, e3, ..., I need a moving window; the last element looked at has to carry forward in the function, not just the running sum:
Sum {
  e1.dist(e2)
  e2.dist(e3)
  etc
}
Reading the API I noticed a function called sliding; perhaps that's the correct solution in conjunction with another operator. I know how to do this with loops of course, but I'm trying to learn the Scala way.
Thanks
import scala.math._

case class Point(x: Int, y: Int) {
  def dist(p: Point) = sqrt( (p.x-x)^2 + (p.y-y)^2 )
}

object Point {
  // Unsure how to define this?
  def dist(l: Seq[Point]) = l.foldLeft(0.0)((sum: Double, p: Point) => )
}
I'm not quite sure what you want to do, but assuming you want the sum of the distances:
l.zip(l.tail).map { case (x,y) => x.dist(y) }.sum
Or with sliding:
l.sliding(2).map {
  case List(fst, snd) => fst.dist(snd)
  case _ => 0
}.sum
If you want to do it as a fold, you can, but you need the accumulator to keep both the total and the previous element:
l.foldLeft((l.head, 0.0)) {
  case ((prev, sum), p) => (p, sum + p.dist(prev))
}._2
You finish with a tuple consisting of the last element and the sum, so use ._2 to get the sum part.
btw, ^ on Int is bitwise logical XOR, not power. Use math.pow.
The smartest way is probably using zipped, which avoids building the intermediate list of pairs that zip would create, so you don't traverse the list more than once:
(l, l.tail).zipped.map( _ dist _ ).sum
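Putting the pieces together, a runnable sketch (using math.pow instead of ^, per the XOR remark above, and assuming the list has at least one point; PathLength is my own name):
import scala.math.{pow, sqrt}

case class Point(x: Int, y: Int) {
  // Euclidean distance; pow, not ^ (which is bitwise XOR on Int)
  def dist(p: Point): Double = sqrt(pow(p.x - x, 2) + pow(p.y - y, 2))
}

object PathLength {
  // sum of the distances between consecutive points
  def dist(l: Seq[Point]): Double =
    l.zip(l.tail).map { case (a, b) => a.dist(b) }.sum

  def main(args: Array[String]): Unit = {
    val pts = List(Point(0, 0), Point(3, 4), Point(3, 0))
    println(dist(pts)) // 5.0 + 4.0 = 9.0
  }
}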

Is Scala idiomatic coding style just a cool trap for writing inefficient code?

I sense that the Scala community has a bit of an obsession with writing "concise", "cool", "Scala idiomatic", "one-liner" (if possible) code. This is immediately followed by a comparison to Java/imperative/ugly code.
While this (sometimes) leads to easy-to-understand code, it also leads to inefficient code for 99% of developers. And this is where Java/C++ is hard to beat.
Consider this simple problem: Given a list of integers, remove the greatest element. Ordering does not need to be preserved.
Here is my version of the solution (It may not be the greatest, but it's what the average non-rockstar developer would do).
def removeMaxCool(xs: List[Int]) = {
  val maxIndex = xs.indexOf(xs.max)
  xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.
Here is my totally uncool, Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.
import scala.collection.mutable.ArrayBuffer

def removeMaxFast(xs: List[Int]) = {
  val res = ArrayBuffer[Int]()
  var max = xs.head
  var first = true
  for (x <- xs) {
    if (first) {
      first = false
    } else {
      if (x > max) {
        res.append(max)
        max = x
      } else {
        res.append(x)
      }
    }
  }
  res.toList
}
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge obstacle to cross for greater Scala adoption. Is there a way out of this trap?
I am looking for practical advice to avoid such "inefficiency traps" while keeping the implementation clear and concise.
Clarification: This question comes from a real-life scenario: I had to write a complex algorithm. First I wrote it in Scala, then I "had to" rewrite it in Java. The Java implementation was twice as long, and not that clear, but at the same time it was twice as fast. Rewriting the Scala code to be efficient would probably take some time and a somewhat deeper understanding of scala internal efficiencies (for vs. map vs. fold, etc)
Let's discuss a fallacy in the question:
So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge obstacle to cross for greater Scala adoption. Is there a way out of this trap?
This is presumed, with absolutely no evidence backing it up. If false, the question is moot.
Is there evidence to the contrary? Well, let's consider the question itself -- it doesn't prove anything, but shows things are not that clear.
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
Of the four claims in the first sentence, the first three are true, and the fourth, as shown by user unknown, is false! And why is it false? Because, contrary to what the second sentence states, it traverses the list more than once.
The code calls the following methods on it:
res.append(max)
res.append(x)
and
res.toList
Let's consider first append.
append takes a vararg parameter. That means max and x are first encapsulated into a sequence of some type (a WrappedArray, in fact), and then passed as parameter. A better method would have been +=.
Ok, append calls ++=, which delegates to +=. But, first, it calls ensureSize, which is the second mistake (+= calls that too -- ++= just optimizes it for multiple elements). An Array is a fixed-size collection, which means that, at each resize, the whole Array must be copied!
So let's consider this. When you resize, Java first clears the memory by storing 0 in each element, then Scala copies each element of the previous array over to the new array. Since size doubles each time, this happens log(n) times, with the number of elements being copied increasing each time it happens.
Take for example n = 16. It does this four times, copying 1, 2, 4 and 8 elements respectively. Since Java has to clear each of these arrays, and each element must be read and written, each element copied represents 4 traversals of an element. Adding all we have (n - 1) * 4, or, roughly, 4 traversals of the complete list. If you count read and write as a single pass, as people often erroneously do, then it's still three traversals.
One can improve on this by initializing the ArrayBuffer with an initial size equal to the list that will be read, minus one, since we'll be discarding one element. To get this size, we need to traverse the list once, though.
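A sketch of that improvement (the xs.length call is itself the extra traversal):
import scala.collection.mutable.ArrayBuffer

// pre-size the buffer so the appends never trigger a resize-and-copy
val res = new ArrayBuffer[Int](xs.length - 1)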
Now let's consider toList. To put it simply, it traverses the whole list to create a new list.
So, we have 1 traversal for the algorithm, 3 or 4 traversals for resize, and 1 additional traversal for toList. That's 4 or 5 traversals.
The original algorithm is a bit difficult to analyse, because take, drop and ::: traverse a variable number of elements. Adding all together, however, it does the equivalent of 3 traversals. If splitAt was used, it would be reduced to 2 traversals. With 2 more traversals to get the maximum, we get 5 traversals -- the same number as the non-functional, non-concise algorithm!
So, let's consider improvements.
On the imperative algorithm, if one uses ListBuffer and +=, then all methods are constant-time, which reduces it to a single traversal.
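A sketch of that single-traversal version (same logic as the imperative algorithm above, but with ListBuffer and +=; the method name is mine):
import scala.collection.mutable.ListBuffer

def removeMaxOnePass(xs: List[Int]): List[Int] = {
  val res = ListBuffer[Int]()
  var max = xs.head
  for (x <- xs.tail) {
    if (x > max) { res += max; max = x } // the displaced old maximum goes to res
    else res += x
  }
  res.toList // constant time on ListBuffer
}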
On the functional algorithm, it could be rewritten as:
val max = xs.max
val (before, _ :: after) = xs span (max !=)
before ::: after
That reduces it to a worst case of three traversals. Of course, there are other alternatives presented, based on recursion or fold, that solve it in one traversal.
And, most interesting of all, all of these algorithms are O(n), and the only one which almost incurred (accidentally) in worst complexity was the imperative one (because of array copying). On the other hand, the cache characteristics of the imperative one might well make it faster, because the data is contiguous in memory. That, however, is unrelated to either big-Oh or functional vs imperative, and it is just a matter of the data structures that were chosen.
So, if we actually go to the trouble of benchmarking, analyzing the results, considering performance of methods, and looking into ways of optimizing it, then we can find faster ways to do this in an imperative manner than in a functional manner.
But all this effort is very different from saying the average Java programmer code will be faster than the average Scala programmer code -- if the question is an example, that is simply false. And even discounting the question, we have seen no evidence that the fundamental premise of the question is true.
EDIT
First, let me restate my point, because it seems I wasn't clear. My point is that the code the average Java programmer writes may seem to be more efficient, but actually isn't. Or, put another way, traditional Java style doesn't gain you performance -- only hard work does, be it Java or Scala.
Next, I have a benchmark and results too, including almost all solutions suggested. Two interesting points about it:
Depending on list size, the creation of objects can have a bigger impact than multiple traversals of the list. The original functional code by Adrian takes advantage of the fact that lists are persistent data structures by not copying the elements right of the maximum element at all. If a Vector was used instead, both left and right sides would be mostly unchanged, which might lead to even better performance.
Even though user unknown and paradigmatic have similar recursive solutions, paradigmatic's is way faster. The reason for that is that he avoids pattern matching. Pattern matching can be really slow.
The benchmark code is here, and the results are here.
def removeOneMax (xs: List [Int]) : List [Int] = xs match {
  case x :: Nil => Nil
  case a :: b :: xs => if (a < b) a :: removeOneMax (b :: xs) else b :: removeOneMax (a :: xs)
  case Nil => Nil
}
Here is a recursive method which iterates only once. If you need performance, you have to think about it; if not, not.
You can make it tail-recursive in the standard way: giving an extra parameter carry, which is per default the empty List, and collects the result while iterating. That is, of course, a bit longer, but if you need performance, you have to pay for it:
import annotation.tailrec

@tailrec
def removeOneMax (xs: List [Int], carry: List [Int] = List.empty) : List [Int] = xs match {
  case a :: b :: xs => if (a < b) removeOneMax (b :: xs, a :: carry) else removeOneMax (a :: xs, b :: carry)
  case x :: Nil => carry
  case Nil => Nil
}
I don't know what the chances are that later compilers will improve slower map calls to be as fast as while loops. However: you rarely need high-speed solutions, but if you need them often, you will learn them fast.
Do you know how big your collection has to be, to use a whole second for your solution on your machine?
As a one-liner, similar to Daniel C. Sobral's solution:
((Nil : List[Int], xs(0)) /: xs.tail) ((p, x)=> if (p._2 > x) (x :: p._1, p._2) else ((p._2 :: p._1), x))._1
but that is hard to read, and I didn't measure the effective performance. The normal pattern is (x /: xs)((a, b) => /* something */). Here, x and a are pairs of list-so-far and max-so-far, which solves the problem of bringing everything into one line of code, but isn't very readable. However, you can earn reputation on CodeGolf this way, and maybe someone likes to make a performance measurement.
And now to our big surprise, some measurements:
An updated timing method, to get garbage collection out of the way and let the HotSpot compiler warm up, with a main and many methods from this thread, together in an object named:
object PerfRemMax {

  def timed (name: String, xs: List [Int]) (f: List [Int] => List [Int]) = {
    val a = System.currentTimeMillis
    val res = f (xs)
    val z = System.currentTimeMillis
    val delta = z - a
    println (name + ": " + (delta / 1000.0))
    res
  }

  def main (args: Array [String]) : Unit = {
    val n = args(0).toInt
    val funs : List [(String, List[Int] => List[Int])] = List (
      "indexOf/take-drop" -> adrian1 _,
      "arraybuf" -> adrian2 _, /* out of memory */
      "paradigmatic1" -> pm1 _,
      "paradigmatic2" -> pm2 _,
      // "match" -> uu1 _, /* oom */
      "tailrec match" -> uu2 _,
      "foldLeft" -> uu3 _,
      "buf-=buf.max" -> soc1 _,
      "for/yield" -> soc2 _,
      "splitAt" -> daniel1,
      "ListBuffer" -> daniel2
    )
    val r = util.Random
    val xs = (for (x <- 1 to n) yield r.nextInt (n)).toList
    // With 1 Mio. as param, it starts with 100 000, 200k, 300k, ... 1 Mio. cases.
    // a) warmup
    // b) look, where the process gets linear to size
    funs.foreach (f => {
      (1 to 10) foreach (i => {
        timed (f._1, xs.take (n/10 * i)) (f._2)
        compat.Platform.collectGarbage
      });
      println ()
    })
  }
}
I renamed all the methods, and had to modify uu2 a bit to fit the common method declaration (List[Int] => List[Int]).
From the long result, I only provide the output for 1M invocations:
scala -Dserver PerfRemMax 2000000
indexOf/take-drop: 0.882
arraybuf: 1.681
paradigmatic1: 0.55
paradigmatic2: 1.13
tailrec match: 0.812
foldLeft: 1.054
buf-=buf.max: 1.185
for/yield: 0.725
splitAt: 1.127
ListBuffer: 0.61
The numbers aren't completely stable; they depend on the sample size, and vary a bit from run to run. For example, for 100k to 1M runs, in steps of 100k, the timing for splitAt was as follows:
splitAt: 0.109
splitAt: 0.118
splitAt: 0.129
splitAt: 0.139
splitAt: 0.157
splitAt: 0.166
splitAt: 0.749
splitAt: 0.752
splitAt: 1.444
splitAt: 1.127
The initial solution is already pretty fast. splitAt is a modification from Daniel, often faster, but not always.
The measurement was done on a single-core 2 GHz Centrino, running Xubuntu Linux, Scala 2.8 with Sun Java 1.6 (desktop).
The two lessons for me are:
always measure your performance improvements; it is very hard to estimate if you don't do it on a daily basis
it is not only fun to write functional code - sometimes the result is even faster
Here is a link to my benchmark code, if somebody is interested.
First of all, the behavior of the methods you presented is not the same. The first one keeps the element ordering, while the second one doesn't.
Second, among all the possible solutions which could be qualified as "idiomatic", some are more efficient than others. Staying very close to your example, you can for instance use tail recursion to eliminate variables and manual state management:
def removeMax1( xs: List[Int] ) = {
  def rec( max: Int, rest: List[Int], result: List[Int]): List[Int] = {
    if( rest.isEmpty ) result
    else if( rest.head > max ) rec( rest.head, rest.tail, max :: result)
    else rec( max, rest.tail, rest.head :: result )
  }
  rec( xs.head, xs.tail, List() )
}
or fold the list:
def removeMax2( xs: List[Int] ) = {
  val result = xs.tail.foldLeft( xs.head -> List[Int]() ) {
    (acc, x) =>
      val (max, res) = acc
      if( x > max ) x -> ( max :: res )
      else max -> ( x :: res )
  }
  result._2
}
If you want to keep the original insertion order, you can (at the expense of having two passes, rather than one) without any effort write something like:
def removeMax3( xs: List[Int] ) = {
  val max = xs.max
  xs.filterNot( _ == max )
}
which is more clear than your first example.
The biggest inefficiency when you're writing a program is worrying about the wrong things, and this is usually the wrong thing to worry about. Why?
Developer time is generally much more expensive than CPU time — in fact, there is usually a dearth of the former and a surplus of the latter.
Most code does not need to be very efficient because it will never be running on million-item datasets multiple times every second.
Most code needs to be bug-free, and less code means less room for bugs to hide.
The example you gave is not very functional, actually. Here's what you are doing:
// Given a list of Int
def removeMaxCool(xs: List[Int]): List[Int] = {
  // Find the index of the biggest Int
  val maxIndex = xs.indexOf(xs.max)
  // Then take the ints before and after it, and concatenate them
  xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
Mind you, it is not bad, but you know functional code is at its best when it describes what you want, instead of how you want it. As a minor criticism, if you used splitAt instead of take and drop you could improve it slightly.
Another way of doing it is this:
def removeMaxCool(xs: List[Int]): List[Int] = {
  // the result is the folding of the tail over the head
  // and an empty list
  xs.tail.foldLeft(xs.head -> List[Int]()) {
    // Where the accumulated list is increased by the
    // lesser of the current element and the accumulated
    // element, and the accumulated element is the maximum between them
    case ((max, ys), x) =>
      if (x > max) (x, max :: ys)
      else (max, x :: ys)
  // and of which we return only the accumulated list
  }._2
}
Now, let's discuss the main issue. Is this code slower than the Java one? Most certainly! Is the Java code slower than a C equivalent? You can bet it is, JIT or no JIT. And if you write it directly in assembler, you can make it even faster!
But the cost of that speed is that you get more bugs, you spend more time trying to understand the code to debug it, and you have less visibility of what the overall program is doing as opposed to what a little piece of code is doing -- which might result in performance problems of its own.
So my answer is simple: if you think the speed penalty of programming in Scala is not worth the gains it brings, you should program in assembler. If you think I'm being radical, then I counter that you just chose the familiar as being the "ideal" trade off.
Do I think performance doesn't matter? Not at all! I think one of the main advantages of Scala is leveraging gains often found in dynamically typed languages with the performance of a statically typed language! Performance matters, algorithm complexity matters a lot, and constant costs matters too.
But, whenever there is a choice between performance and readability/maintainability, the latter is preferable. Sure, if performance must be improved, then there isn't a choice: you have to sacrifice something for it. And if there's no loss in readability/maintainability -- such as Scala vs dynamically typed languages -- sure, go for performance.
Lastly, to gain performance out of functional programming you have to know functional algorithms and data structures. Sure, 99% of Java programmers with 5-10 years experience will beat the performance of 99% of Scala programmers with 6 months experience. The same was true for imperative programming vs object oriented programming a couple of decades ago, and history shows it didn't matter.
EDIT
As a side note, your "fast" algorithm suffers from a serious problem: it uses ArrayBuffer. That collection does not have constant-time append, and has linear-time toList. If you use ListBuffer instead, you get constant-time append and toList.
For reference, here's how splitAt is defined in TraversableLike in the Scala standard library,
def splitAt(n: Int): (Repr, Repr) = {
  val l, r = newBuilder
  l.sizeHintBounded(n, this)
  if (n >= 0) r.sizeHint(this, -n)
  var i = 0
  for (x <- this) {
    (if (i < n) l else r) += x
    i += 1
  }
  (l.result, r.result)
}
It's not unlike your example code of what a Java programmer might come up with.
I like Scala because, where performance matters, mutability is a reasonable way to go. The collections library is a great example; especially how it hides this mutability behind a functional interface.
Where performance isn't as important, such as some application code, the higher order functions in Scala's library allow great expressivity and programmer efficiency.
Out of curiosity, I picked an arbitrary large file in the Scala compiler (scala.tools.nsc.typechecker.Typers.scala) and counted something like 37 for loops, 11 while loops, 6 concatenations (++), and 1 fold (it happens to be a foldRight).
What about this?
def removeMax(xs: List[Int]) = {
  val buf = xs.toBuffer
  buf -= (buf.max)
}
A bit more ugly, but faster:
def removeMax(xs: List[Int]) = {
  var max = xs.head
  for ( x <- xs.tail )
  yield {
    if (x > max) { val result = max; max = x; result }
    else x
  }
}
Try this:
(myList.foldLeft((List[Int](), None: Option[Int])) {
  case ((_, None), x) => (List(), Some(x))
  case ((Nil, Some(m)), x) => (List(Math.min(x, m)), Some(Math.max(x, m)))
  case ((l, Some(m)), x) => (Math.min(x, m) :: l, Some(Math.max(x, m)))
})._1
Idiomatic, functional, traverses only once. Maybe somewhat cryptic if you are not used to functional-programming idioms.
Let's try to explain what is happening here. I will try to make it as simple as possible, at the cost of some rigor.
A fold is an operation on a List[A] (that is, a list that contains elements of type A) that takes an initial state s0: S (that is, an instance of a type S) and a function f: (S, A) => S (that is, a function that takes the current state and an element from the list, and gives the next state, i.e., it updates the state according to the next element).
The operation will then iterate over the elements of the list, using each one to update the state according to the given function. In Java, it would be something like:
interface Function<T, R> { R apply(T t); }
class Pair<A, B> { ... }

<A, State> State fold(List<A> list, State s0, Function<Pair<A, State>, State> f) {
  State s = s0;
  for (A a : list) {
    s = f.apply(new Pair<A, State>(a, s));
  }
  return s;
}
For example, if you want to add all the elements of a List[Int], the state would be the partial sum, which would have to be initialized to 0, and the new state produced by the function would simply add the current state to the current element being processed:
myList.fold(0)((partialSum, element) => partialSum + element)
Try to write a fold to multiply the elements of a list, then another one to find extreme values (max, min).
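One possible set of answers to those exercises, as foldLeft sketches (assuming a non-empty myList):
val myList = List(3, 1, 4, 1, 5)

val product = myList.foldLeft(1)((acc, x) => acc * x)     // 60
val maximum = myList.tail.foldLeft(myList.head)(math.max) // 5
val minimum = myList.tail.foldLeft(myList.head)(math.min) // 1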
Now, the fold presented above is a bit more complex, since the state is composed of the new list being created along with the maximum element found so far. The function that updates the state is more or less straightforward once you grasp these concepts. It simply puts into the new list the minimum between the current maximum and the current element, while the other value goes to the current maximum of the updated state.
What is a bit harder than understanding this (if you have no FP background) is coming up with such a solution on your own. However, this is only to show you that it exists and can be done. It's just a completely different mindset.
EDIT: As you can see, the first and second cases in the solution I proposed are used to set up the fold. It is equivalent to what you see in other answers when they do xs.tail.fold((xs.head, ...)) {...}. Note that the solutions proposed so far using xs.tail/xs.head don't cover the case in which xs is List(), and will throw an exception. The solution above will return List() instead. Since you didn't specify the behavior of the function on empty lists, both are valid.
Another option would be:
package code.array

object SliceArrays {
  def main(args: Array[String]): Unit = {
    println(removeMaxCool(Vector(1,2,3,100,12,23,44)))
  }
  def removeMaxCool(xs: Vector[Int]) = xs.filter(_ < xs.max)
}
This uses Vector instead of List; the reason is that Vector is more versatile and has better general performance and time complexity compared to List.
Consider the following collection operations:
head, tail, apply, update, prepend, append
Vector takes amortized constant time for all of these operations, as per the Scala docs:
"The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys"
While List takes constant time only for head, tail and prepend operations.
Using
scalac -print
generates:
package code.array {
  object SliceArrays extends Object {
    def main(args: Array[String]): Unit = scala.Predef.println(SliceArrays.this.removeMaxCool(scala.`package`.Vector().apply(scala.Predef.wrapIntArray(Array[Int]{1, 2, 3, 100, 12, 23, 44})).$asInstanceOf[scala.collection.immutable.Vector]()));
    def removeMaxCool(xs: scala.collection.immutable.Vector): scala.collection.immutable.Vector = xs.filter({
      ((x$1: Int) => SliceArrays.this.$anonfun$removeMaxCool$1(xs, x$1))
    }).$asInstanceOf[scala.collection.immutable.Vector]();
    final <artifact> private[this] def $anonfun$removeMaxCool$1(xs$1: scala.collection.immutable.Vector, x$1: Int): Boolean = x$1.<(scala.Int.unbox(xs$1.max(scala.math.Ordering$Int)));
    def <init>(): code.array.SliceArrays.type = {
      SliceArrays.super.<init>();
      ()
    }
  }
}
Another contender. This uses a ListBuffer, like Daniel's second offering, but shares the post-max tail of the original list, avoiding copying it.
import scala.collection.mutable.ListBuffer

def shareTail(xs: List[Int]): List[Int] = {
  val res = ListBuffer[Int]()
  var maxTail = xs
  var x = xs
  while (x != Nil) {
    if (x.head > maxTail.head) {
      // copy everything between the old maximum and the new one into res
      while (!(maxTail.head == x.head)) {
        res += maxTail.head
        maxTail = maxTail.tail
      }
    }
    x = x.tail
  }
  res.prependToList(maxTail.tail)
}

Infinite streams in Scala

Say I have a function, for example the old favourite
def factorial(n:Int) = (BigInt(1) /: (1 to n)) (_*_)
Now I want to find the biggest value of n for which factorial(n) fits in a Long. I could do
(1 to 100) takeWhile (factorial(_) <= Long.MaxValue) last
This works, but the 100 is an arbitrary large number; what I really want on the left hand side is an infinite stream that keeps generating higher numbers until the takeWhile condition is met.
I've come up with
val s = Stream.continually(1).zipWithIndex.map(p => p._1 + p._2)
but is there a better way?
(I'm also aware I could get a solution recursively but that's not what I'm looking for.)
Stream.from(1)
creates a stream starting from 1 and incrementing by 1. It's all in the API docs.
A Solution Using Iterators
You can also use an Iterator instead of a Stream. The Stream keeps references to all computed values. So if you plan to visit each value only once, an Iterator is the more efficient approach. The downside of the Iterator is its mutability, though.
There are some nice convenience methods for creating Iterators defined on its companion object.
Edit
Unfortunately there's no short (library supported) way I know of to achieve something like
Stream.from(1) takeWhile (factorial(_) <= Long.MaxValue) last
The approach I take to advance an Iterator by a certain number of elements is drop(n: Int) or dropWhile:
Iterator.from(1).dropWhile( factorial(_) <= Long.MaxValue).next - 1
The - 1 works for this special purpose but is not a general solution. Still, it should be no problem to implement a last method on an Iterator using "pimp my library". The problem is that taking the last element of an infinite Iterator could be problematic, so it should be implemented as a method like lastWith that integrates the takeWhile.
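A sketch of such a lastWith, via an implicit class ("pimp my library"); the name and shape are my own, and it assumes at least the first element satisfies the predicate:
implicit class RichIterator[A](it: Iterator[A]) {
  // last element satisfying p; reduceLeft keeps only the most recent element
  def lastWith(p: A => Boolean): A =
    it.takeWhile(p).reduceLeft((_, b) => b)
}

// Iterator.from(1).lastWith(factorial(_) <= Long.MaxValue)
// => the largest n with factorial(n) <= Long.MaxValue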
An ugly workaround can be done using sliding, which is implemented for Iterator:
scala> Iterator.from(1).sliding(2).dropWhile(_.tail.head < 10).next.head
res12: Int = 9
As @ziggystar pointed out, Streams keep the list of previously computed values in memory, so using an Iterator is a great improvement.
To further improve the answer, I would argue that "infinite streams" are usually computed (or can be computed) based on previously computed values. If this is the case (and in your factorial stream it definitely is), I would suggest using Iterator.iterate instead.
It would look roughly like this:
scala> val it = Iterator.iterate((1,BigInt(1))){case (i,f) => (i+1,f*(i+1))}
it: Iterator[(Int, scala.math.BigInt)] = non-empty iterator
then, you could do something like:
scala> it.find(_._2 >= Long.MaxValue).map(_._1).get - 1
res0: Int = 22
or use @ziggystar's sliding solution...
Another easy example that comes to mind would be Fibonacci numbers:
scala> val it = Iterator.iterate((1,1)){case (a,b) => (b,a+b)}.map(_._1)
it: Iterator[Int] = non-empty iterator
In these cases, you're not computing your new element from scratch every time, but rather doing O(1) work for every new element, which improves your running time even more.
The original factorial function is not optimal, since factorials are computed from scratch every time. The simplest immutable implementation using memoization is like this:
val f : Stream[BigInt] = 1 #:: (Stream.from(1) zip f).map { case (x,y) => x * y }
And now, the answer can be computed like this:
println( "count: " + (f takeWhile (_<Long.MaxValue)).length )
The following variant does not test the current but the next integer, in order to find and return the last valid number:
Iterator.from(1).find(i => factorial(i+1) > Long.MaxValue).get
Using .get here is acceptable, since find on an infinite sequence will never return None.