Scala - folding on values that result from object interaction

In Scala I have a list of objects that represent points and contain x and y values. The list describes a path that goes through all these points sequentially. My question is how to use folding on that list in order to find the total length of the path? Or maybe there is even a better functional or Scala way to do this?
What I have come up with is this:
def distance = (0 /: wps)(Waypoint.distance(_, _))
but of course this is totally wrong, because distance returns a Float but accepts two Waypoint objects.
UPDATE:
Thanks for the proposed solutions! They are definitely interesting, but I think they are too functional for real-time calculations that may become heavy. So far I have come up with these lines:
val distances = for(i <- 0 until wps.size - 1) yield wps(i).distanceTo(wps(i + 1))
val distance = (0f /: distances)(_ + _)
I feel this to be a fair imperative/functional mix that is both fast and also keeps the distance between each pair of waypoints around for possible later reference, which is also a benefit in my case.
UPDATE 2: Actually, to determine what is faster, I will have to benchmark all the proposed solutions on all types of sequences.

This should work.
(wps, wps drop 1).zipped.map(Waypoint.distance).sum

Don't know if fold can be used here, but try this:
wps.sliding(2).map(segment => Waypoint.distance(segment(0), segment(1))).sum
wps.sliding(2) returns an iterator over all consecutive pairs. Or if you prefer pattern matching:
wps.sliding(2).collect{case start :: end :: Nil => Waypoint.distance(start, end)}.sum
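For reference, a quick REPL check (with plain Ints standing in for waypoints) shows what sliding(2) produces:
scala> List(1, 2, 3, 4).sliding(2).toList
res0: List[List[Int]] = List(List(1, 2), List(2, 3), List(3, 4))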
BTW consider defining:
def distanceTo(to: Waypoint)
on Waypoint class directly, not on companion object as it looks more object-oriented and will allow you to write nice DSL-like code:
point1.distanceTo(point2)
or even:
point1 distanceTo point2
wps.sliding(2).collect{
case start :: end :: Nil => start distanceTo end
}.sum

Your comment "too much functional for real-time calculations that may become heavy" makes this interesting. Benchmarking and profiling are critical, since you don't want to write a bunch of hard-to-maintain code for the sake of performance, only to find out that it's not a performance critical part of your application in the first place! Or, even worse, find out that your performance optimizations makes things worse for your specific workload.
The best performing implementation will depend on your specifics (How long are the paths? How many cores are on the system?) But I think blending imperative and functional approaches may give you the worst-of-both worlds. You could lose out on both readability and performance if you're not careful!
I would very slightly modify missingfaktor's answer to allow you to have performance gains from parallel collections. The fact that simply adding .par could give you a tremendous performance boost demonstrates the power of sticking with functional programming!
def distancePar(wps: collection.GenSeq[Waypoint]): Double = {
  val parwps = wps.par
  // distance here stands for Waypoint.distance(_, _)
  parwps.zip(parwps drop 1).map(Function.tupled(distance)).sum
}
My guess is that this would work best if you have several cores to throw at the problem, and wps tends to be somewhat long. If you have few cores or short paths, then parallelism will probably hurt more than it helps.
The other extreme would be a fully imperative solution. Writing imperative implementations of individual, performance critical, functions is usually acceptable, so long as you avoid shared mutable state. But once you get used to FP, you'll find this sort of function more difficult to write and maintain. And it's also not easy to parallelize.
def distanceImp(wps: collection.GenSeq[Waypoint]): Double = {
  if (wps.size <= 1) {
    0.0
  } else {
    var r = 0.0
    var here = wps.head
    var remaining = wps.tail
    while (!remaining.isEmpty) {
      r += distance(here, remaining.head)
      here = remaining.head
      remaining = remaining.tail
    }
    r
  }
}
Finally, if you're looking for a middle ground between FP and imperative, you might try recursion. I haven't profiled it, but my guess is that this will be roughly equivalent to the imperative solution in terms of performance.
def distanceRec(wps: collection.GenSeq[Waypoint]): Double = {
  @annotation.tailrec
  def helper(acc: Double, here: Waypoint, remaining: collection.GenSeq[Waypoint]): Double =
    if (remaining.isEmpty)
      acc
    else
      helper(acc + distance(here, remaining.head), remaining.head, remaining.tail)
  if (wps.size <= 1)
    0.0
  else
    helper(0.0, wps.head, wps.tail)
}

If you are doing indexing of any kind you want to be using Vector, not List:
scala> def timed(op: => Unit) = { val start = System.nanoTime; op; (System.nanoTime - start) / 1e9 }
timed: (op: => Unit)Double
scala> val l = List.fill(100000)(1)
scala> val v = Vector.fill(100000)(1)
scala> timed { var t = 0; for (i <- 0 until l.length - 1) yield t += l(i) + l(i + 1) }
res2: Double = 16.252194583
scala> timed { var t = 0; for (i <- 0 until v.length - 1) yield t += v(i) + v(i + 1) }
res3: Double = 0.047047654
ListBuffer offers fast appends; it doesn't offer fast random access.
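If you'd rather keep a List, you can also avoid indexing altogether by pairing each element with its successor; a minimal sketch of the same pairwise sum, reusing the l value from above (not benchmarked here):
// one linear pass, no random access into the List
val pairSum = l.zip(l.tail).map { case (a, b) => a + b }.sum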

Related

Scala: IndexedSeq.newBuilder vs. ArrayBuffer

I had accepted that building an IndexedSeq in a loop should use an ArrayBuffer, followed by a conversion to a Vector via ".toVector".
In one example, profiling showed the CPU hotspot was in this section, so I tried an alternative: use IndexedSeq.newBuilder() followed by conversion to an immutable collection via ".result()".
This change gave a significant performance improvement. The code looks almost the same, so it seems using IndexedSeq.newBuilder() is best practice. Is this correct? The example method is shown below, with the ArrayBuffer difference commented out.
def interleave[T](a: IndexedSeq[T], b: IndexedSeq[T]): IndexedSeq[T] = {
  val al = a.length
  val bl = b.length
  val buffer = IndexedSeq.newBuilder[T]
  //---> val buffer = new ArrayBuffer[T](al + bl)
  val commonLength = Math.min(al, bl)
  val aExtra = al - commonLength
  val bExtra = bl - commonLength
  var i = 0
  while (i < commonLength) {
    buffer += a(i)
    buffer += b(i)
    i += 1
  }
  if (aExtra > 0) {
    while (i < al) {
      buffer += a(i)
      i += 1
    }
  } else if (bExtra > 0) {
    while (i < bl) {
      buffer += b(i)
      i += 1
    }
  }
  buffer.result()
  //---> buffer.toVector
}
As to which is best practice, I guess it depends upon your requirements. Both approaches are acceptable and understandable. All things being equal, in this particular case, I would favor the IndexedSeq.newBuilder over ArrayBuilder (since the latter targets the creation of an Array, while the former's result is a Vector).
Just one point on benchmarking: this is a real art form, due to caching, JIT & HotSpot performance, garbage collection, etc. One piece of software you might consider using to do this is ScalaMeter. You will need to write both versions of the function to populate the final vector, and ScalaMeter will give you accurate statistics on both. ScalaMeter allows the code to warm-up before taking measurements, and can also look at memory requirements as well as CPU time.
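For illustration only, inline ScalaMeter usage looks roughly like this (a sketch; the exact configuration options and defaults depend on the ScalaMeter version you pick):
import org.scalameter._

// measure one call of the interleave method from the question,
// with ScalaMeter's default warm-up strategy
val time = withWarmer(new Warmer.Default) measure {
  interleave(Vector.fill(100000)("a"), Vector.fill(100000)("b"))
}
println(s"interleave took $time")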
In this example informal testing did not deceive, but ScalaMeter does provide a clearer picture of performance. Building the result in an ArrayBuffer (top orange line) is definitely slower than the more direct newBuilder (blue line).
Returning the ArrayBuffer as an IndexedSeq is the fastest (green line), but of course it does not give you the true protection of an immutable collection.
Building the intermediate result in an Array (red line) is intermediate between ArrayBuffer and newBuilder.
The "zipAll" collection method allows the interleave to be done in a more functional style:
def interleaveZipAllBuilderPat[T](a: IndexedSeq[T], b: IndexedSeq[T]): IndexedSeq[T] = {
  // note: only the (x, null) padding case is handled, so this assumes a is at least as long as b
  a.zipAll(b, null, null).foldLeft(Vector.newBuilder[T]) { (z, tp) =>
    tp match {
      case ((x: T, null)) => z += x
      case ((x: T, y: T)) => z += x += y
    }
  }.result()
}
The slowest are the functional methods; the top two are almost the same, and they differ only in that one does the pattern match and the other an if statement, so the pattern match itself is not slow.
Functional is marginally worse than the direct loop method if an ArrayBuffer is used to accumulate the result, but the direct loop using the newBuilder is significantly faster.
If "zipAll" could return a builder, and if the builder were iterable, the functional style could be faster - no need to produce the immutable result if the next step just requires an iteration over elements.
So for me newBuilder is the clear winner.

Selection Sort - Functional Style with recursion

I have only recently started learning Scala and am trying to delve into functional programming. I have seen many of the posts on selection sort in a functional style, but have not been totally able to understand all the solutions that have been given. My Scala skills are still nascent.
I have written a piece of Scala code using tail recursion and would appreciate any feedback on the style. Does it look like Functional Programming? Is there a way to make this better or make it more functional?
import scala.annotation.tailrec

object FuncSelectionSort {
  /**
   * Selection Sort - Trying Functional Style
   */
  def sort(a: Array[Int]) = {
    val b: Array[Int] = new Array[Int](a.size)
    Array.copy(a, 0, b, 0, a.size)

    // Function to swap elements
    def exchange(i: Int, j: Int): Unit = {
      val k = b(i)
      b(i) = b(j)
      b(j) = k
    }

    @tailrec
    def helper(b: Array[Int], n: Int): Array[Int] = {
      if (n == b.length - 1) return b
      else {
        val head = b(n)
        val minimumInTail = b.slice(n, b.length).min
        if (head > minimumInTail) {
          val minimumInTailIndex = b.slice(n, b.length).indexOf(minimumInTail)
          exchange(n, minimumInTailIndex + n)
        }
        helper(b, n + 1)
      }
    }
    helper(b, 0)
  }
}
The logic that I have tried to adopt is fairly simple. I start with the first index of the Array and find the minimum from the rest. But instead of passing the Array.tail for the next recursion; I pass in the full array and check a slice, where each slice is one smaller than the previous recursion slice.
For example,
If Array(10, 4, 6, 9, 3, 5)
First pass -> head = 10, slice = 4,6,9,3,5
Second pass -> head = 4, slice = 6,9,3,5
I feel it looks the same as passing the tail, but I wanted to try and slice and see if it works the same way.
Appreciate your help.
For detailed feedback on working code, you should rather go to Code Review; however, I can say one thing: in-place sorting of arrays is per se not a good example of functional programming. This is because we purists don't like mutability, as it doesn't fit together well with recursion over data -- and your mixing of recursion and mutation in particular is not really good style, I'd say (and it is hard to read).
One clean variant would be to copy the full original array, and use in-place selection sort implemented as normal imperative code (with loops and in-place swap). Encapsulated in a function, this is pure to the outside. This pattern is commonly used in the standard library; cf. List.scala.
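A minimal sketch of that first variant (copy the input, sort the copy with a plain imperative selection sort, return it; the mutation never escapes the function):
def selectionSorted(a: Array[Int]): Array[Int] = {
  val b = a.clone()                    // copy so the caller's array is untouched
  var i = 0
  while (i < b.length - 1) {
    // find the index of the minimum of b(i), ..., b(b.length - 1)
    var minIdx = i
    var j = i + 1
    while (j < b.length) {
      if (b(j) < b(minIdx)) minIdx = j
      j += 1
    }
    // swap it into position i
    val tmp = b(i); b(i) = b(minIdx); b(minIdx) = tmp
    i += 1
  }
  b
}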
The other variant, and probably more instructive for learning immutable programming, is to use an immutable recursive algorithm over linked lists:
def sorted(a: List[Int]): List[Int] = a match {
  case Nil => Nil
  case xs  => xs.min :: sorted(xs.diff(List(xs.min)))
}
From that style of programming, you'll learn much more about functional thinking (leaving aside efficiency though). Exercise: transform that code into tail-recursion.
(And actually, insertion sort works nicer with this pattern, since you don't have to "remove" at every step, but can build up a sorted linked list; you might try to implement that, too).
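For completeness, one possible answer to the tail-recursion exercise above, keeping the same min/diff idea and adding an accumulator (a sketch, not tuned for efficiency):
import scala.annotation.tailrec

def sortedTR(a: List[Int]): List[Int] = {
  @tailrec
  def loop(rest: List[Int], acc: List[Int]): List[Int] = rest match {
    case Nil => acc.reverse                      // acc holds the result in descending order
    case xs =>
      val m = xs.min
      loop(xs.diff(List(m)), m :: acc)           // remove one occurrence of the current minimum
  }
  loop(a, Nil)
}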

How to write an efficient groupBy-size filter in Scala, can be approximate

Given a List[Int] in Scala, I wish to get the Set[Int] of all Ints which appear at least thresh times. I can do this using groupBy or foldLeft, then filter. For example:
val thresh = 3
val myList = List(1,2,3,2,1,4,3,2,1)
myList.foldLeft(Map[Int,Int]()){case(m, i) => m + (i -> (m.getOrElse(i, 0) + 1))}.filter(_._2 >= thresh).keys
will give Set(1,2).
Now suppose the List[Int] is very large. How large is hard to say, but in any case this seems wasteful, as I don't care about each Int's exact frequency; I only care whether it reaches thresh. Once it has passed thresh there's no need to keep counting, just add the Int to the Set[Int].
The question is: can I do this more efficiently for a very large List[Int],
a) if I need a true, accurate result (no room for mistakes)
b) if the result can be approximate, e.g. by using some Hashing trick or Bloom Filters, where Set[Int] might include some false-positives, or whether {the frequency of an Int > thresh} isn't really a Boolean but a Double in [0-1].
First of all, you can't do better than O(N), as you need to check each element of your initial list at least once. Your current approach is O(N), presuming that operations with IntMap are effectively constant.
Now what you can try in order to increase efficiency:
update the map only when the current counter value is less than or equal to the threshold. This will eliminate a huge number of the most expensive operations — map updates
try a faster map instead of IntMap. If you know that the values of the initial List are in a fixed range, you can use an Array instead of IntMap (index as the key). Another possible option is a mutable HashMap with sufficient initial capacity. As my benchmark shows, it actually makes a significant difference
as @ixx proposed, after incrementing a value in the map, check whether it's equal to thresh and in that case add the element immediately to the result list. This will save you one linear traversal (it appears to be not that significant for large input)
I don't see how any approximate solution can be faster (only if you ignore some elements at random). Otherwise it will still be O(N).
Update
I created a microbenchmark to measure the actual performance of different implementations. For sufficiently large input and output, Ixx's suggestion about immediately adding elements to the result list doesn't produce a significant improvement. However, a similar approach can be used to eliminate unnecessary map updates (which appear to be the most expensive operation).
Results of benchmarks (avg run times on 1000000 elems with pre-warming):
Author's solution: 447 ms
Ixx solution: 412 ms
Ixx solution 2 (eliminated excessive map writes): 150 ms
My solution: 57 ms
My solution involves using mutable HashMap instead of immutable IntMap and includes all other possible optimizations.
Ixx's updated solution:
val tuple = (Map[Int, Int](), List[Int]())
val res = myList.foldLeft(tuple) {
  case ((m, s), i) =>
    val count = m.getOrElse(i, 0) + 1
    (if (count <= 3) m + (i -> count) else m, if (count == thresh) i :: s else s)
}
My solution:
import scala.collection.mutable
import scala.collection.mutable.ListBuffer

val map = new mutable.HashMap[Int, Int]()
val res = new ListBuffer[Int]
myList.foreach { i =>
  val c = map.getOrElse(i, 0) + 1
  if (c == thresh) {
    res += i
  }
  if (c <= thresh) {
    map(i) = c
  }
}
The full microbenchmark source is available here.
You could use foldLeft to collect the matching items, like this:
val tuple = (Map[Int, Int](), List[Int]())
myList.foldLeft(tuple) {
  case ((m, s), i) =>
    val count = m.getOrElse(i, 0) + 1
    (m + (i -> count), if (count == thresh) i :: s else s)
}
I could measure a performance improvement of about 40% with a small list, so it's definitely an improvement...
Edited to use List and prepend, which takes constant time (see comments).
If by "more efficiently" you mean the space efficiency (in extreme case when the list is infinite), there's a probabilistic data structure called Count Min Sketch to estimate the frequency of items inside it. Then you can discard those with frequency below your threshold.
There's a Scala implementation from Algebird library.
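If you'd rather see the idea without the Algebird dependency, here is a toy, hand-rolled Count-Min Sketch (the width/depth values are arbitrary and there is no error-bound tuning; it only illustrates the approximate-counting idea and is not Algebird's API):
import scala.util.hashing.MurmurHash3

class CountMinSketch(width: Int, depth: Int) {
  private val table = Array.ofDim[Long](depth, width)

  private def bucket(x: Int, row: Int): Int =
    (MurmurHash3.productHash((x, row)) & Int.MaxValue) % width

  def add(x: Int): Unit =
    for (row <- 0 until depth) table(row)(bucket(x, row)) += 1

  // never under-estimates a count, but may over-estimate it
  def estimate(x: Int): Long =
    (0 until depth).map(row => table(row)(bucket(x, row))).min
}

val thresh = 3
val myList = List(1, 2, 3, 2, 1, 4, 3, 2, 1)    // same values as in the question
val cms = new CountMinSketch(width = 1024, depth = 4)
myList.foreach(cms.add)
myList.toSet.filter(cms.estimate(_) >= thresh)  // Set(1, 2)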
You can change your foldLeft example a bit, using a mutable.Set that is built incrementally and at the same time used as a filter while iterating over your Seq, by using withFilter. However, because I'm using withFilter I cannot use foldLeft and have to make do with foreach and a mutable map:
import scala.collection.mutable
def getItems[A](in: Seq[A], threshold: Int): Set[A] = {
  val counts: mutable.Map[A, Int] = mutable.Map.empty
  val result: mutable.Set[A] = mutable.Set.empty
  in.withFilter(!result(_)).foreach { x =>
    counts.update(x, counts.getOrElse(x, 0) + 1)
    if (counts(x) >= threshold) {
      result += x
    }
  }
  result.toSet
}
So, this discards items that have already been added to the result set while running through the Seq the first time, because withFilter filters the Seq in the applied function (map, flatMap, foreach) rather than returning a filtered Seq.
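A quick check with the numbers from the question (the counts are 1 -> 3, 2 -> 3, 3 -> 2, 4 -> 1):
getItems(List(1, 2, 3, 2, 1, 4, 3, 2, 1), 3)  // Set(1, 2)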
EDIT:
I changed my solution to not use Seq.count, which was stupid, as Aivean correctly pointed out.
Using Aivean's microbenchmark I can see that it is still slightly slower than his approach, but still better than the author's first approach.
Author's solution: 377
Ixx solution: 399
Ixx solution 2 (eliminated excessive map writes): 110
Sascha Kolberg's solution: 72
Aivean's solution: 54

Why stream fold operation throws Out of memory exception?

I have following simple code
def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)
And it throws an OutOfMemoryError exception.
I cannot understand why, because I think all the parts use constant memory, i.e. lazy evaluation of streams and foldLeft...
This code also doesn't work:
fib(1,1).take(10000000).sum or max, min, etc.
How to correctly implement infinite streams and do iterative operations upon them?
Scala version: 2.9.0
Also, the Scala docs say that the foldLeft operation is memory-safe for streams:
/** Stream specialization of foldLeft which allows GC to collect
  * along the way.
  */
@tailrec
override final def foldLeft[B](z: B)(op: (B, A) => B): B = {
  if (this.isEmpty) z
  else tail.foldLeft(op(z, head))(op)
}
EDIT:
An implementation with iterators is still not useful, since it throws a StackOverflowError:
def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++ fib(j, i + j)
How to define correctly infinite stream/iterator in Scala?
EDIT2:
I don't care about int overflow, I just want to understand how to create infinite stream/iterator etc in scala without side effects .
The reason to use Stream instead of Iterator is so that you don't have to calculate all the small terms in the series over again. But this means that you need to store ten million stream nodes. These are pretty large, unfortunately, so that could be enough to overflow the default memory. The only realistic way to overcome this is to start with more memory (e.g. scala -J-Xmx2G). (Also, note that you're going to overflow Long by an enormous margin; the Fibonacci series increases pretty quickly.)
P.S. The iterator implementation I have in mind is completely different; you don't build it out of concatenated singleton Iterators:
def fib(i: Long, j: Long) = Iterator.iterate((i,j)){ case (a,b) => (b,a+b) }.map(_._1)
Now when you fold, past results can be discarded.
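For example, summing the first ten million terms with that iterator runs in constant memory (the Long sum still overflows, as noted elsewhere in this thread):
fib(1, 1).take(10000000).foldLeft(0L)(_ + _)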
The OutOfMemoryError happens independently of the fact that you use Stream. As Rex Kerr mentioned above, Stream -- unlike Iterator -- stores everything in memory. The difference from List is that the elements of a Stream are calculated lazily, but once you reach 10000000 of them, there will be 10000000 elements, just like in a List.
Try with new Array[Int](10000000); you will have the same problem.
To calculate the Fibonacci numbers as above you may want to use a different approach. You can take into account the fact that you only need to keep two numbers, instead of all the Fibonacci numbers discovered so far.
For example:
scala> def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++ fib(j, i + j)
fib: (i: Long,j: Long)Iterator[Long]
And to get, for example, the index of the first fibonacci number exceeding 1000000:
scala> fib(1, 1).indexWhere(_ > 1000000)
res12: Int = 30
Edit: I added the following lines to cope with the StackOverflowError.
If you really want to work with the 1 millionth Fibonacci number, the iterator definition above will not work either, because of a StackOverflowError. The following is the best I have in mind at the moment:
class FibIterator extends Iterator[BigDecimal] {
  var i: BigDecimal = 1
  var j: BigDecimal = 1
  def next = {
    val temp = i
    i = i + j
    j = temp
    j
  }
  def hasNext = true
}
scala> new FibIterator().take(1000000).foldLeft(0:BigDecimal)(_ + _)
res49: BigDecimal = 82742358764415552005488531917024390424162251704439978804028473661823057748584031
0652444660067860068576582339667553466723534958196114093963106431270812950808725232290398073106383520
9370070837993419439389400053162345760603732435980206131237515815087375786729469542122086546698588361
1918333940290120089979292470743729680266332315132001038214604422938050077278662240891771323175496710
6543809955073045938575199742538064756142664237279428808177636434609546136862690895665103636058513818
5599492335097606599062280930533577747023889877591518250849190138449610994983754112730003192861138966
1418736269315695488126272680440194742866966916767696600932919528743675517065891097024715258730309025
7920682881137637647091134870921415447854373518256370737719553266719856028732647721347048627996967...
@yura's problem:
def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)
besides using a Long which can't possibly hold the Fibonacci of 10,000,000, it does work. That is, if the foldLeft is written as:
fib(1,1).take(10000000).foldLeft(0L)(_+_)
Looking at the Streams.scala source, foldLeft() is clearly designed for garbage collection, but /: is not overridden there.
The other answers alluded to another problem. The Fibonacci of 10 million is a big number, and if BigInt is used, instead of just overflowing like with a Long, absolutely enormous numbers are being added to each other over and over again.
Since Stream.foldLeft is optimized for GC it does look like the way to solve for really big Fibonacci numbers, rather than using a zip or tail recursion.
// Fibonacci using BigInt
def fib(i:BigInt,j:BigInt):Stream[BigInt] = i #:: fib(j, i+j)
fib(1,0).take(10000000).foldLeft(BigInt("0"))(_+_)
Results of the above code: 10,000,000 is an 8-figure number. How many figures in fib(10000000)? 2,089,877
fib(1,1).take(10000000) is the "this" of the method /:, so it is likely that the JVM will consider the reference alive as long as the method runs, even if, in this case, it might get rid of it.
So you keep a reference on the head of the stream all along, hence on the whole stream as you build it to 10M elements.
You could just use recursion, which is about as simple:
def fibSum(terms: Int, i: Long = 1, j: Long = 1, total: Long = 2): Long = {
  if (terms == 2) total
  else fibSum(terms - 1, j, i + j, total + i + j)
}
With this, you can "fold" a billion elements in only a couple of seconds, but as Rex points out, summing the Fibonacci sequence overflows Long very quickly.
If you really wanted to know the answer to your original problem and don't mind sacrificing some accuracy you could do this:
def fibSum(terms: Int, i: Double = 1, j: Double = 1, tot: Double = 2,
           exp: Int = 0): String = {
  if (terms == 2) "%.6f".format(tot) + " E+" + exp
  else {
    val (i1, j1, tot1, exp1) =
      if (tot + i + j > 10) (i / 10, j / 10, tot / 10, exp + 1)
      else (i, j, tot, exp)
    fibSum(terms - 1, j1, i1 + j1, tot1 + i1 + j1, exp1)
  }
}
scala> fibSum(10000000)
res54: String = 2.957945 E+2089876

Is Scala idiomatic coding style just a cool trap for writing inefficient code?

I sense that the Scala community has a bit of an obsession with writing "concise", "cool", "Scala idiomatic", "one-liner" -if possible- code. This is immediately followed by a comparison to Java/imperative/ugly code.
While this (sometimes) leads to easy to understand code, it also leads to inefficient code for 99% of developers. And this is where Java/C++ is not easy to beat.
Consider this simple problem: Given a list of integers, remove the greatest element. Ordering does not need to be preserved.
Here is my version of the solution (It may not be the greatest, but it's what the average non-rockstar developer would do).
def removeMaxCool(xs: List[Int]) = {
  val maxIndex = xs.indexOf(xs.max)
  xs.take(maxIndex) ::: xs.drop(maxIndex + 1)
}
It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.
Here is my totally uncool, Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.
import scala.collection.mutable.ArrayBuffer

def removeMaxFast(xs: List[Int]) = {
  var res = ArrayBuffer[Int]()
  var max = xs.head
  var first = true
  for (x <- xs) {
    if (first) {
      first = false
    } else {
      if (x > max) {
        res.append(max)
        max = x
      } else {
        res.append(x)
      }
    }
  }
  res.toList
}
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge
obstacle to cross for greater Scala adoption. Is there a way out of this trap?
I am looking for practical advice to avoid such "inefficiency traps" while keeping the implementation clear and concise.
Clarification: This question comes from a real-life scenario: I had to write a complex algorithm. First I wrote it in Scala, then I "had to" rewrite it in Java. The Java implementation was twice as long, and not that clear, but at the same time it was twice as fast. Rewriting the Scala code to be efficient would probably take some time and a somewhat deeper understanding of Scala's internal efficiencies (for vs. map vs. fold, etc.).
Let's discuss a fallacy in the question:
So, if 99% of Java developers write more efficient code than 99% of
Scala developers, this is a huge obstacle to cross for greater Scala
adoption. Is there a way out of this trap?
This is presumed, with absolutely no evidence backing it up. If false, the question is moot.
Is there evidence to the contrary? Well, let's consider the question itself -- it doesn't prove anything, but shows things are not that clear.
Totally non-Scala idiomatic, non-functional, non-concise, but it's
very efficient. It traverses the list only once!
Of the four claims in the first sentence, the first three are true, and the fourth, as shown by user unknown, is false! And why it is false? Because, contrary to what the second sentence states, it traverses the list more than once.
The code calls the following methods on it:
res.append(max)
res.append(x)
and
res.toList
Let's consider first append.
append takes a vararg parameter. That means max and x are first encapsulated into a sequence of some type (a WrappedArray, in fact), and then passed as parameter. A better method would have been +=.
Ok, append calls ++=, which delegates to +=. But, first, it calls ensureSize, which is the second mistake (+= calls that too -- ++= just optimizes it for multiple elements). Because an Array is a fixed-size collection, each resize means the whole Array must be copied!
So let's consider this. When you resize, Java first clears the memory by storing 0 in each element, then Scala copies each element of the previous array over to the new array. Since size doubles each time, this happens log(n) times, with the number of elements being copied increasing each time it happens.
Take for example n = 16. It does this four times, copying 1, 2, 4 and 8 elements respectively. Since Java has to clear each of these arrays, and each element must be read and written, each element copied represents 4 traversals of an element. Adding all we have (n - 1) * 4, or, roughly, 4 traversals of the complete list. If you count read and write as a single pass, as people often erroneously do, then it's still three traversals.
One can improve on this by initializing the ArrayBuffer with an initial size equal to the list that will be read, minus one, since we'll be discarding one element. To get this size, we need to traverse the list once, though.
Now let's consider toList. To put it simply, it traverses the whole list to create a new list.
So, we have 1 traversal for the algorithm, 3 or 4 traversals for resize, and 1 additional traversal for toList. That's 4 or 5 traversals.
The original algorithm is a bit difficult to analyse, because take, drop and ::: traverse a variable number of elements. Adding all together, however, it does the equivalent of 3 traversals. If splitAt was used, it would be reduced to 2 traversals. With 2 more traversals to get the maximum, we get 5 traversals -- the same number as the non-functional, non-concise algorithm!
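A sketch of that splitAt variant (like the original, it assumes a non-empty list):
def removeMaxSplit(xs: List[Int]): List[Int] = {
  val (before, atAndAfter) = xs.splitAt(xs.indexOf(xs.max))
  before ::: atAndAfter.tail   // atAndAfter starts at the maximum, so drop it
}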
So, let's consider improvements.
On the imperative algorithm, if one uses ListBuffer and +=, then all methods are constant-time, which reduces it to a single traversal.
On the functional algorithm, it could be rewritten as:
val max = xs.max
val (before, _ :: after) = xs span (max !=)
before ::: after
That reduces it to a worst case of three traversals. Of course, there are other alternatives presented, based on recursion or fold, that solve it in one traversal.
And, most interesting of all, all of these algorithms are O(n), and the only one which almost (accidentally) incurred worse complexity was the imperative one (because of array copying). On the other hand, the cache characteristics of the imperative one might well make it faster, because the data is contiguous in memory. That, however, is unrelated to either big-Oh or functional vs. imperative, and is just a matter of the data structures that were chosen.
So, if we actually go to the trouble of benchmarking, analyzing the results, considering performance of methods, and looking into ways of optimizing it, then we can find faster ways to do this in an imperative manner than in a functional manner.
But all this effort is very different from saying the average Java programmer code will be faster than the average Scala programmer code -- if the question is an example, that is simply false. And even discounting the question, we have seen no evidence that the fundamental premise of the question is true.
EDIT
First, let me restate my point, because it seems I wasn't clear. My point is that the code the average Java programmer writes may seem to be more efficient, but actually isn't. Or, put another way, traditional Java style doesn't gain you performance -- only hard work does, be it Java or Scala.
Next, I have a benchmark and results too, including almost all solutions suggested. Two interesting points about it:
Depending on list size, the creation of objects can have a bigger impact than multiple traversals of the list. The original functional code by Adrian takes advantage of the fact that lists are persistent data structures by not copying the elements right of the maximum element at all. If a Vector was used instead, both left and right sides would be mostly unchanged, which might lead to even better performance.
Even though user unknown and paradigmatic have similar recursive solutions, paradigmatic's is way faster. The reason for that is that he avoids pattern matching. Pattern matching can be really slow.
The benchmark code is here, and the results are here.
def removeOneMax(xs: List[Int]): List[Int] = xs match {
  case x :: Nil => Nil
  case a :: b :: xs => if (a < b) a :: removeOneMax(b :: xs) else b :: removeOneMax(a :: xs)
  case Nil => Nil
}
Here is a recursive method, which only iterates once. If you need performance, you have to think about it, if not, not.
You can make it tail-recursive in the standard way: giving an extra parameter carry, which is per default the empty List, and collects the result while iterating. That is, of course, a bit longer, but if you need performance, you have to pay for it:
import annotation.tailrec

@tailrec
def removeOneMax(xs: List[Int], carry: List[Int] = List.empty): List[Int] = xs match {
  case a :: b :: xs => if (a < b) removeOneMax(b :: xs, a :: carry) else removeOneMax(a :: xs, b :: carry)
  case x :: Nil => carry
  case Nil => Nil
}
I don't know what the chances are that later compilers will improve slower map calls to be as fast as while loops. However: you rarely need high-speed solutions, but if you need them often, you will learn them fast.
Do you know how big your collection has to be, to use a whole second for your solution on your machine?
As a one-liner, similar to Daniel C. Sobral's solution:
((Nil : List[Int], xs(0)) /: xs.tail) ((p, x)=> if (p._2 > x) (x :: p._1, p._2) else ((p._2 :: p._1), x))._1
but that is hard to read, and I didn't measure the effective performance. The normal pattern is (x /: xs) ((a, b) => /* something */). Here, x and a are pairs of List-so-far and max-so-far, which solves the problem to bring everything into one line of code, but isn't very readable. However, you can earn reputation on CodeGolf this way, and maybe someone likes to make a performance measurement.
And now to our big surprise, some measurements:
An updated timing method (to get garbage collection out of the way and let the HotSpot compiler warm up), a main, and many methods from this thread, together in an object named PerfRemMax:
object PerfRemMax {

  def timed(name: String, xs: List[Int])(f: List[Int] => List[Int]) = {
    val a = System.currentTimeMillis
    val res = f(xs)
    val z = System.currentTimeMillis
    val delta = z - a
    println(name + ": " + (delta / 1000.0))
    res
  }

  def main(args: Array[String]): Unit = {
    val n = args(0).toInt
    val funs: List[(String, List[Int] => List[Int])] = List(
      "indexOf/take-drop" -> adrian1 _,
      "arraybuf" -> adrian2 _, /* out of memory */
      "paradigmatic1" -> pm1 _,
      "paradigmatic2" -> pm2 _,
      // "match" -> uu1 _, /* oom */
      "tailrec match" -> uu2 _,
      "foldLeft" -> uu3 _,
      "buf-=buf.max" -> soc1 _,
      "for/yield" -> soc2 _,
      "splitAt" -> daniel1,
      "ListBuffer" -> daniel2
    )
    val r = util.Random
    val xs = (for (x <- 1 to n) yield r.nextInt(n)).toList
    // With 1 Mio. as param, it starts with 100 000, 200k, 300k, ... 1 Mio. cases.
    // a) warmup
    // b) look where the process gets linear to size
    funs.foreach(f => {
      (1 to 10) foreach (i => {
        timed(f._1, xs.take(n / 10 * i))(f._2)
        compat.Platform.collectGarbage
      })
      println()
    })
  }
}
I renamed all the methods, and had to modify uu2 a bit, to fit to the common method declaration (List [Int] => List [Int]).
From the long result, I only provide the output for 1M invocations:
scala -Dserver PerfRemMax 2000000
indexOf/take-drop: 0.882
arraybuf: 1.681
paradigmatic1: 0.55
paradigmatic2: 1.13
tailrec match: 0.812
foldLeft: 1.054
buf-=buf.max: 1.185
for/yield: 0.725
splitAt: 1.127
ListBuffer: 0.61
The numbers aren't completely stable, depending on the sample size, and vary a bit from run to run. For example, for 100k to 1M runs, in steps of 100k, the timing for splitAt was as follows:
splitAt: 0.109
splitAt: 0.118
splitAt: 0.129
splitAt: 0.139
splitAt: 0.157
splitAt: 0.166
splitAt: 0.749
splitAt: 0.752
splitAt: 1.444
splitAt: 1.127
The initial solution is already pretty fast. splitAt is a modification from Daniel, often faster, but not always.
The measurement was done on a single core 2Ghz Centrino, running xUbuntu Linux, Scala-2.8 with Sun-Java-1.6 (desktop).
The two lessons for me are:
always measure your performance improvements; it is very hard to estimate it, if you don't do it on a daily basis
it is not only fun, to write functional code - sometimes the result is even faster
Here is a link to my benchmark code, if somebody is interested.
First of all, the behavior of the methods you presented is not the same. The first one keeps the element ordering, while the second one doesn't.
Second, among all the possible solution which could be qualified as "idiomatic", some are more efficient than others. Staying very close to your example, you can for instance use tail-recursion to eliminate variables and manual state management:
def removeMax1(xs: List[Int]) = {
  def rec(max: Int, rest: List[Int], result: List[Int]): List[Int] = {
    if (rest.isEmpty) result
    else if (rest.head > max) rec(rest.head, rest.tail, max :: result)
    else rec(max, rest.tail, rest.head :: result)
  }
  rec(xs.head, xs.tail, List())
}
or fold the list:
def removeMax2(xs: List[Int]) = {
  val result = xs.tail.foldLeft(xs.head -> List[Int]()) { (acc, x) =>
    val (max, res) = acc
    if (x > max) x -> (max :: res)
    else max -> (x :: res)
  }
  result._2
}
If you want to keep the original insertion order, you can (at the expense of having two passes, rather than one) without any effort write something like:
def removeMax3(xs: List[Int]) = {
  val max = xs.max
  xs.filterNot(_ == max)
}
which is more clear than your first example.
The biggest inefficiency when you're writing a program is worrying about the wrong things. This is usually the wrong thing to worry about. Why?
Developer time is generally much more expensive than CPU time — in fact, there is usually a dearth of the former and a surplus of the latter.
Most code does not need to be very efficient because it will never be running on million-item datasets multiple times every second.
Most code needs to be bug-free, and less code means less room for bugs to hide.
The example you gave is not very functional, actually. Here's what you are doing:
// Given a list of Int
def removeMaxCool(xs: List[Int]): List[Int] = {
  // Find the index of the biggest Int
  val maxIndex = xs.indexOf(xs.max)
  // Then take the ints before and after it, and concatenate them
  xs.take(maxIndex) ::: xs.drop(maxIndex + 1)
}
Mind you, it is not bad, but you know functional code is at its best when it describes what you want, instead of how you want it. As a minor criticism, if you used splitAt instead of take and drop you could improve it slightly.
Another way of doing it is this:
def removeMaxCool(xs: List[Int]): List[Int] = {
  // the result is the folding of the tail over the head
  // and an empty list
  xs.tail.foldLeft(xs.head -> List[Int]()) {
    // Where the accumulated list is increased by the
    // lesser of the current element and the accumulated
    // element, and the accumulated element is the maximum between them
    case ((max, ys), x) =>
      if (x > max) (x, max :: ys)
      else (max, x :: ys)
    // and of which we return only the accumulated list
  }._2
}
Now, let's discuss the main issue. Is this code slower than the Java one? Most certainly! Is the Java code slower than a C equivalent? You can bet it is, JIT or no JIT. And if you write it directly in assembler, you can make it even faster!
But the cost of that speed is that you get more bugs, you spend more time trying to understand the code to debug it, and you have less visibility of what the overall program is doing as opposed to what a little piece of code is doing -- which might result in performance problems of its own.
So my answer is simple: if you think the speed penalty of programming in Scala is not worth the gains it brings, you should program in assembler. If you think I'm being radical, then I counter that you just chose the familiar as being the "ideal" trade off.
Do I think performance doesn't matter? Not at all! I think one of the main advantages of Scala is leveraging gains often found in dynamically typed languages with the performance of a statically typed language! Performance matters, algorithm complexity matters a lot, and constant costs matters too.
But, whenever there is a choice between performance and readability and maintainability, the latter is preferable. Sure, if performance must be improved, then there isn't a choice: you have to sacrifice something for it. And if there's no loss in readability/maintainability -- such as Scala vs. dynamically typed languages -- sure, go for performance.
Lastly, to gain performance out of functional programming you have to know functional algorithms and data structures. Sure, 99% of Java programmers with 5-10 years experience will beat the performance of 99% of Scala programmers with 6 months experience. The same was true for imperative programming vs object oriented programming a couple of decades ago, and history shows it didn't matter.
EDIT
As a side note, your "fast" algorithm suffer from a serious problem: you use ArrayBuffer. That collection does not have constant time append, and has linear time toList. If you use ListBuffer instead, you get constant time append and toList.
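A sketch of that swap, i.e. the question's loop with ListBuffer in place of ArrayBuffer (not benchmarked here):
import scala.collection.mutable.ListBuffer

def removeMaxBuffer(xs: List[Int]): List[Int] = {
  val res = ListBuffer[Int]()
  var max = xs.head
  for (x <- xs.tail) {
    if (x > max) { res += max; max = x }   // keep the smaller value, carry the larger forward
    else res += x
  }
  res.toList                               // constant time for ListBuffer
}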
For reference, here's how splitAt is defined in TraversableLike in the Scala standard library,
def splitAt(n: Int): (Repr, Repr) = {
  val l, r = newBuilder
  l.sizeHintBounded(n, this)
  if (n >= 0) r.sizeHint(this, -n)
  var i = 0
  for (x <- this) {
    (if (i < n) l else r) += x
    i += 1
  }
  (l.result, r.result)
}
It's not unlike your example code of what a Java programmer might come up with.
I like Scala because, where performance matters, mutability is a reasonable way to go. The collections library is a great example; especially how it hides this mutability behind a functional interface.
Where performance isn't as important, such as some application code, the higher order functions in Scala's library allow great expressivity and programmer efficiency.
Out of curiosity, I picked an arbitrary large file in the Scala compiler (scala.tools.nsc.typechecker.Typers.scala) and counted something like 37 for loops, 11 while loops, 6 concatenations (++), and 1 fold (it happens to be a foldRight).
What about this?
def removeMax(xs: List[Int]) = {
  val buf = xs.toBuffer
  buf -= (buf.max)
}
A bit more ugly, but faster:
def removeMax(xs: List[Int]) = {
  var max = xs.head
  for (x <- xs.tail) yield {
    if (x > max) { val result = max; max = x; result }
    else x
  }
}
Try this:
(myList.foldLeft((List[Int](), None: Option[Int])) {
  case ((_, None), x)      => (List(), Some(x))
  case ((Nil, Some(m)), x) => (List(Math.min(x, m)), Some(Math.max(x, m)))
  case ((l, Some(m)), x)   => (Math.min(x, m) :: l, Some(Math.max(x, m)))
})._1
Idiomatic, functional, traverses only once. Maybe somewhat cryptic if you are not used to functional-programming idioms.
Let's try to explain what is happening here. I will try to make it as simple as possible, lacking some rigor.
A fold is an operation on a List[A] (that is, a list that contains elements of type A) that will take an initial state s0: S (that is, an instance of a type S) and a function f: (S, A) => S (that is, a function that takes the current state and an element from the list, and gives the next state, ie, it updates the state according to the next element).
The operation will then iterate over the elements of the list, using each one to update the state according to the given function. In Java, it would be something like:
interface Function<T, R> { R apply(T t); }
class Pair<A, B> { ... }

<A, State> State fold(List<A> list, State s0, Function<Pair<A, State>, State> f) {
  State s = s0;
  for (A a : list) {
    s = f.apply(new Pair<A, State>(a, s));
  }
  return s;
}
For example, if you want to add all the elements of a List[Int], the state would be the partial sum, that would have to be initialized to 0, and the new state produced by a function would simply add the current state to the current element being processed:
myList.fold(0)((partialSum, element) => partialSum + element)
Try to write a fold to multiply the elements of a list, then another one to find extreme values (max, min).
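For reference, those two exercises come out roughly as follows (minimal sketches; the max one assumes a non-empty list):
// multiply all elements of the list
val product = myList.foldLeft(1)(_ * _)
// largest element of the list
val maximum = myList.tail.foldLeft(myList.head)((a, b) => math.max(a, b))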
Now, the fold presented above is a bit more complex, since the state is composed of the new list being created along with the maximum element found so far. The function that updates the state is more or less straightforward once you grasp these concepts. It simply puts into the new list the minimum between the current maximum and the current element, while the other value goes to the current maximum of the updated state.
What is a bit more complex than to understand this (if you have no FP background) is to come up with this solution. However, this is only to show you that it exists, can be done. It's just a completely different mindset.
EDIT: As you see, the first and second case in the solution I proposed are used to setup the fold. It is equivalent to what you see in other answers when they do xs.tail.fold((xs.head, ...)) {...}. Note that the solutions proposed until now using xs.tail/xs.head don't cover the case in which xs is List(), and will throw an exception. The solution above will return List() instead. Since you didn't specify the behavior of the function on empty lists, both are valid.
Another option would be:
package code.array
object SliceArrays {
  def main(args: Array[String]): Unit = {
    println(removeMaxCool(Vector(1, 2, 3, 100, 12, 23, 44)))
  }
  def removeMaxCool(xs: Vector[Int]) = xs.filter(_ < xs.max)
}
I used Vector instead of List; the reason is that Vector is more versatile and has better general performance and time complexity than List.
Consider the following collections operations:
head, tail, apply, update, prepend, append
Vector takes an amortized constant time for all operations, as per Scala docs:
"The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys"
While List takes constant time only for head, tail and prepend operations.
Using
scalac -print
generates:
package code.array {
object SliceArrays extends Object {
def main(args: Array[String]): Unit = scala.Predef.println(SliceArrays.this.removeMaxCool(scala.`package`.Vector().apply(scala.Predef.wrapIntArray(Array[Int]{1, 2, 3, 100, 12, 23, 44})).$asInstanceOf[scala.collection.immutable.Vector]()));
def removeMaxCool(xs: scala.collection.immutable.Vector): scala.collection.immutable.Vector = xs.filter({
((x$1: Int) => SliceArrays.this.$anonfun$removeMaxCool$1(xs, x$1))
}).$asInstanceOf[scala.collection.immutable.Vector]();
final <artifact> private[this] def $anonfun$removeMaxCool$1(xs$1: scala.collection.immutable.Vector, x$1: Int): Boolean = x$1.<(scala.Int.unbox(xs$1.max(scala.math.Ordering$Int)));
def <init>(): code.array.SliceArrays.type = {
SliceArrays.super.<init>();
()
}
}
}
Another contender. This uses a ListBuffer, like Daniel's second offering, but shares the post-max tail of the original list, avoiding copying it.
import scala.collection.mutable.ListBuffer

def shareTail(xs: List[Int]): List[Int] = {
  var res = ListBuffer[Int]()
  var maxTail = xs
  var x = xs
  while (x != Nil) {
    if (x.head > maxTail.head) {
      // copy everything between the old maximum and the new one into the buffer
      while (!(maxTail.head == x.head)) {
        res += maxTail.head
        maxTail = maxTail.tail
      }
    }
    x = x.tail
  }
  // share the tail after the maximum instead of copying it
  res.prependToList(maxTail.tail)
}