Streams vs. tail recursion for iterative processes - scala

This is a follow-up to my previous question.
I understand that we can use streams to generate an approximation of 'pi' (and other numbers), the n-th Fibonacci number, etc. However, I doubt that streams are the right approach to do that.
The main drawback (as I see it) is memory consumption: e.g. the stream will retain all Fibonacci numbers for i < n while I need only the n-th one. Of course, I can use drop, but it makes the solution a bit more complicated. Tail recursion looks like a more suitable approach to tasks like that.
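For example, here is a sketch of what I mean by the tail-recursive approach:

import scala.annotation.tailrec

def fib(n: Int): BigInt = {
  // carries only the current pair (a, b), so it runs in constant space
  @tailrec
  def loop(i: Int, a: BigInt, b: BigInt): BigInt =
    if (i == n) a else loop(i + 1, b, a + b)
  loop(0, 0, 1)
}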
What do you think?

If you need to go fast, travel light. That means: avoid allocating any unnecessary memory. If you need memory, use the fastest collections available. If you know how much memory you need, preallocate. Allocation is the absolute performance killer... for calculation. Your code may not look nice anymore, but it will go fast.
However, if you're working with IO (disk, network) or any user interaction, then allocation pales. It's then better to shift priority from code performance to maintainability.

Use Iterator. It does not retain intermediate values.
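For instance, a minimal sketch for the n-th Fibonacci number (my own illustration):

def fib(n: Int): BigInt =
  Iterator
    .iterate((BigInt(0), BigInt(1))) { case (a, b) => (b, a + b) } // successive pairs
    .drop(n)  // consumed elements are discarded, not retained
    .next()
    ._1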

If you want the n-th Fibonacci number and use a stream just as a temporary data structure (i.e. you do not hold references to previously computed elements of the stream), then your algorithm will run in constant space.
Previously computed elements of a Stream (which are not used anymore) are going to be garbage collected. And as they were allocated in the youngest generation and immediately collected, almost all allocations might stay in cache.
Update:
It seems that the current implementation of Stream is not as space-efficient as it could be, mainly because it inherits the implementation of the apply method from the LinearSeqOptimized trait, where it is defined as:
def apply(n: Int): A = {
  val rest = drop(n)
  if (n < 0 || rest.isEmpty) throw new IndexOutOfBoundsException("" + n)
  rest.head
}
A reference to the head of the stream is held here by this, which prevents the stream from being GC'ed. So the combination of the drop and head methods (as in f.drop(100).head) may be better for situations where dropping intermediate results is feasible (thanks to Sebastien Bocq for explaining this stuff on scala-user).
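For example, a minimal sketch (the fibs definition is my own illustration, not from the discussion on scala-user):

// Returning a fresh Stream from a def means nothing outside the call
// pins the head, so the prefix consumed by drop(n) can be collected.
def fibs: Stream[BigInt] = {
  def loop(a: BigInt, b: BigInt): Stream[BigInt] = a #:: loop(b, a + b)
  loop(0, 1)
}

val n100 = fibs.drop(100).head // preferable to fibs(100) here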

Scala objects and thread safety

I am new to Scala.
I am trying to figure out how to ensure thread safety with functions in a Scala object (aka singleton)
From what I have read so far, it seems that I should keep visibility to function scope (or below) and use immutable variables wherever possible. However, I have not seen examples of where thread safety is violated, so I am not sure what other precautions should be taken.
Can someone point me to a good discussion of this issue, preferably with examples of where thread safety is violated?
Oh man. This is a huge topic. Here's a Scala-based intro to concurrency, and Oracle's Java lessons actually have a pretty good intro as well. Here's a brief intro that motivates why concurrent reading and writing of shared state (of which Scala objects are a particular case) is a problem and provides a quick overview of common solutions.
There are two (fundamentally related) classes of problems when it comes to thread safety and state mutation:
Clobbering (missing) writes
Inaccurate (changing out from under you) reads
Let's look at each of these in turn.
First clobbering writes:
object WritesExample {
  var myList: List[Int] = List.empty
}
Imagine we had two threads concurrently accessing WritesExample, each of which executes the following updateList:
def updateList(x: WritesExample.type): Unit =
  WritesExample.myList = 1 :: WritesExample.myList
You'd probably hope that when both threads are done, WritesExample.myList has a length of 2. Unfortunately, that might not be the case if both threads read WritesExample.myList before either thread has finished its write. If both threads read WritesExample.myList while it is empty, then both will write back a list of length 1, with one write overwriting the other, so that in the end WritesExample.myList has a length of only one. Hence we've effectively lost a write we were supposed to execute. Not good.
Now let's look at inaccurate reads.
object ReadsExample {
  val myMutableList = collection.mutable.MutableList.empty[Int]
}
Once again, let's say we had two threads concurrently accessing ReadsExample. This time each of them executes updateList2 repeatedly.
def updateList2(x: ReadsExample.type): Unit =
  ReadsExample.myMutableList += ReadsExample.myMutableList.length
In a single-threaded context, you would expect updateList2, when repeatedly called, to simply generate an ordered list of incrementing numbers, e.g. 0, 1, 2, 3, 4, .... Unfortunately, when multiple threads are accessing ReadsExample.myMutableList with updateList2 at the same time, it's possible that between when ReadsExample.myMutableList.length is read and when the write is finally persisted, ReadsExample.myMutableList has already been modified by another thread. So in theory you could see something like 0, 0, 1, 1 or potentially, if one thread takes longer to write than another, 0, 1, 2, 1 (where the slower thread finally writes to the list after the other thread has already accessed and written to the list three times).
What happened is that the read was inaccurate/out-of-date; the actual data structure that was updated was different from the one that was read, i.e. was changed out from under you in the middle of things. This is also a huge source of bugs because many invariants you might expect to hold (e.g. every number in the list corresponds exactly to its index or every number appears only once) hold in a single-threaded context, but fail in a concurrent context.
Now that we've motivated some of the problems, let's dive into some of the solutions. You mentioned immutability so let's talk about that first. You might notice that in my example of clobbering writes I use an immutable data structure whereas in my inconsistent reads example I use a mutable data structure. That is intentional. They are in a sense dual to one another.
With immutable data structures you cannot have an "inaccurate" read in the sense I laid out above because you never mutate data structures, but rather place a new copy of a data structure in the same location. The data structure cannot change out from under you because it cannot change! However you can lose a write in the process by placing a version of a data structure back to its original location that does not incorporate a change made previously by another process.
With mutable data structures on the other hand, you cannot lose a write because all writes are in-place mutations of the data structure, but you can end up executing a write to a data structure whose state differs from when you analyzed it to formulate the write.
If it's a "pick your poison" kind of scenario, why do you often hear advice to go with immutable data structures to help with concurrency? Well, immutable data structures make it easier to ensure invariants about the state being modified hold even if writes are lost. For example, if I rewrote the ReadsExample code to use an immutable List (and a var instead), then I could confidently say that the integer elements of the list will always correspond to the indices of the list. This means that your program is much less likely to enter an inconsistent state (e.g. it's not hard to imagine that a naive mutable set implementation could end up with non-unique elements when mutated concurrently). And it turns out that modern techniques for dealing with concurrency usually are pretty good at dealing with missing writes.
Let's look at some of those approaches that deal with shared state concurrency. At their hearts they can all be summed up as various ways of serializing read/write pairs.
Locks (a.k.a. directly try to serialize read/write pairs): This is usually the one you'll hear first as a fundamental way of dealing with concurrency. Every process that wants to access state first places a lock on it. Any other process is now excluded from accessing that state. The process then writes to that state and on completion releases the lock. Other processes are now free to repeat the process. In our WritesExample, updateList would first acquire the lock before executing and then release it on completion; this would prevent other processes from reading WritesExample.myList until the write was completed, thereby preventing them from seeing old versions of myList that would lead to clobbering writes (note that there are more sophisticated locking procedures that allow for simultaneous reads, but let's stick with the basics for now).
Locks often do not scale well to multiple pieces of state. With multiple locks, you often need to acquire and release locks in a certain order; otherwise you can end up deadlocking or livelocking.
The Oracle and Twitter docs linked at the beginning have good overviews of this approach.
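As a minimal sketch (my own illustration, using the object's intrinsic monitor):

object WritesExample {
  private var myList: List[Int] = List.empty

  // The read and the write now execute as one atomic unit, so concurrent
  // callers can no longer clobber each other's updates.
  def updateList(): Unit = this.synchronized {
    myList = 1 :: myList
  }
}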
Describe Your Action, Don't Execute It (a.k.a. build up a serial representation of your actions and have someone else process it): Instead of accessing and modifying state directly, you describe an action of how to do this and then give it to someone else to actually execute the action. For example, you might pass messages to an object (e.g. actors in Scala) that queues up these requests and then executes them one-by-one on some internal state that it never directly exposes to anyone else. In the particular case of actors, this improves the situation over locks by removing the need to explicitly acquire and release locks. As long as you encapsulate all the state you need to access at once in a single object, message passing works great. Actors break down when you distribute state across multiple objects (and as such this is heavily discouraged in this paradigm).
Akka actors are one good example of this in Scala.
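For instance, a minimal sketch with Akka classic actors (this assumes the akka-actor dependency; the names are illustrative):

import akka.actor.{Actor, ActorSystem, Props}

// The actor owns its state and never exposes it; the mailbox serializes
// all requests, so no explicit locks are needed.
class ListOwner extends Actor {
  private var myList: List[Int] = List.empty
  def receive: Receive = {
    case "prepend" => myList = 1 :: myList
  }
}

val system = ActorSystem("example")
val owner = system.actorOf(Props[ListOwner], "list-owner")
owner ! "prepend" // describe the action; the actor executes it one at a time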
Transactions (a.k.a. temporarily isolate some reads and writes from others and let the isolation system serialize things for you): Wrap all your read/writes in transactions that ensure during the course of your reads and writes your view of the world is isolated from any other changes. There are usually two ways of achieving this. Either you go for an approach similar to locks where you prevent other people from accessing the data while a transaction is running, or you restart a transaction from the very beginning whenever you detect that a change has occurred to the shared state and throw away any progress you've made (usually the latter, for performance reasons). On the one hand, transactions, unlike locks and actors, scale to disparate pieces of state very well. Just wrap all your accesses in transactions and you're good to go. On the other hand, your reads and writes have to be side-effect-free because they might be thrown away and retried many times, and you can't really undo most side effects.
And if you're really unlucky, although you usually can't truly deadlock with a good implementation of transactions, a long-lived transaction can constantly be interrupted by other short-lived transactions such that it keeps getting thrown away and retried and never actually succeeds (which amounts to something like livelocking). In effect you're giving up direct control of serialization order and hoping your transaction system orders things sensibly.
Scala's STM library is a good example of this approach.
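For instance, a minimal sketch (assuming the scala-stm library):

import scala.concurrent.stm._

val myList = Ref(List.empty[Int])

// The atomic block is retried automatically if another transaction commits
// a conflicting change between our read and our write.
def updateList(): Unit = atomic { implicit txn =>
  myList() = 1 :: myList()
}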
Remove Shared State: The final "solution" is to rethink the problem altogether and try to think about whether you truly need global, shared state that is writable. If you don't need writable shared state, then concurrency problems go away altogether!
Everything in life is about trade-offs and concurrency is no exception. When thinking about concurrency first understand what state you have and what invariants you want to preserve about that state. Then use that to guide your decision as to what kind of tools you want to use to tackle the problem.
The Thread Safety Problem section within this Scala concurrency article might be of interest to you. In essence, it illustrates the thread safety problem using a simple example and outlines 3 different approaches to tackle the problem, namely synchronization, volatile and AtomicReference:
When you enter synchronized points, access volatile references, or dereference AtomicReferences, Java forces the processor to flush their cache lines and provide a consistent view of data.
There is also a brief overview comparing the cost of the 3 approaches:
AtomicReference is the most costly of these two choices since you have to go through method dispatch to access values. volatile and synchronized are built on top of Java's built-in monitors. Monitors cost very little if there's no contention. Since synchronized allows you more fine-grained control over when you synchronize, there will be less contention, so synchronized tends to be the cheapest option.
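To make the three approaches concrete, here is a small sketch of my own (not from the article; the AtomicReference case assumes Scala 2.12+ for the lambda-to-UnaryOperator conversion):

import java.util.concurrent.atomic.AtomicReference

object ThreeApproaches {
  // 1. synchronized: serialize the read/write pair via the intrinsic monitor
  private var count: Int = 0
  def increment(): Unit = this.synchronized { count += 1 }

  // 2. volatile: guarantees visibility of writes across threads
  //    (but a read-modify-write on a volatile is still not atomic!)
  @volatile private var ready: Boolean = false
  def markReady(): Unit = ready = true

  // 3. AtomicReference: lock-free atomic updates via compare-and-set
  private val items = new AtomicReference(List.empty[Int])
  def prepend(x: Int): Unit = items.updateAndGet(xs => x :: xs)
}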
This is not specific to Scala. If your object contains state that can be modified concurrently, thread safety can be violated depending on the implementation. For example:
object BankAccount {
  private var balance: Long = 0L

  def deposit(amount: Long): Unit = balance += amount
}
In this case the object is not thread safe. There are many approaches to make it thread safe, for example using Akka or synchronized blocks. For simplicity, I will write it using synchronized blocks:
object BankAccount {
  private var balance: Long = 0L

  def deposit(amount: Long): Unit =
    this.synchronized {
      balance += amount
    }
}

Measure the function-level memory usage in Scala in my Application Code during runtime

First of all, this is not an "off-line" profiling task!
I am working on a Scala codebase, and what I am currently trying to do is: if a function foo consumes too much memory (let's say over 10G), kill this function and return a default value.
So it should look like:
monitor {
  foo() // <--- if foo has used over 10G memory, just cut it off
} catch {
  case MemoryUsageError => default_value
}
Note that currently foo is running in the same process with my main function.
Is it possible to do so? I quickly googled for such materials and only found a way to show the current memory usage of a Scala application; it is not as fine-grained as what I am looking for.
Am I clear on this? Could anyone shed some light here? Thanks a lot!
========================================================================
Note that what I am looking for is an "online" method! It is not like off-line profiling. My application itself should determine the memory usage of the foo function, and if it goes too high, just cut it off.
Is it possible?
In general, the JVM doesn't track the creator of objects allocated on the heap or their place of creation. Tracking this would be very costly, and it doesn't matter for GC.
How to live with it
Termination
Self-controlled program. If you want to terminate some continuous computation, then the computation shouldn't be monolithic. What you need are checkpoints where the condition can be validated, for example at the start of every iteration in a loop or at the beginning of every recursive call. Obviously the computation could consist of several different stages instead of a simple loop, but the approach is the same.
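A minimal sketch of the checkpoint idea (my own illustration; note that Runtime's memory figures are JVM-wide, not per-function, so this only approximates what foo itself consumed):

class MemoryBudgetExceeded extends RuntimeException

def fooWithCheckpoints(n: Int, budgetBytes: Long): BigInt = {
  val rt = Runtime.getRuntime
  var acc = BigInt(1)
  for (i <- 1 to n) {
    // checkpoint: validate the condition at the start of every iteration
    if (rt.totalMemory - rt.freeMemory > budgetBytes)
      throw new MemoryBudgetExceeded
    acc *= i // stand-in for the real work
  }
  acc
}

// the "monitor" from the question, expressed as an ordinary try/catch:
val result =
  try fooWithCheckpoints(100000, 10L * 1024 * 1024 * 1024)
  catch { case _: MemoryBudgetExceeded => BigInt(-1) } // default_value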
Separation of computation and control. For example, execute the function as a Future on a dedicated thread and interrupt it if needed, or use a ForkJoinTask and its cancel() method.
Measurement
Usually only one or a couple of classes account for most of the memory. If instances are about the same size, then memory control can be implemented with a counter of objects. The classes of 'heavy' objects can be found by inspecting the algorithm or by using jvisualvm. Increment the counter during instance creation. Decrementing is harder: update the counter when references are released (counting instances that couldn't yet be removed by GC), or use PhantomReference (counting all instances that existed in the VM), as sketched below. But don't use finalize()!
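A sketch of the PhantomReference variant (my own illustration; track would be called wherever the 'heavy' instances are created):

import java.lang.ref.{PhantomReference, ReferenceQueue}
import java.util.concurrent.atomic.AtomicLong

object HeavyCounter {
  val live = new AtomicLong(0) // current count of tracked instances
  private val queue = new ReferenceQueue[AnyRef]
  // the phantom references themselves must stay strongly reachable
  private val refs =
    java.util.concurrent.ConcurrentHashMap.newKeySet[PhantomReference[AnyRef]]()

  def track(obj: AnyRef): Unit = { // increment at instance creation
    live.incrementAndGet()
    refs.add(new PhantomReference[AnyRef](obj, queue))
  }

  def drain(): Unit = { // call periodically, e.g. at a checkpoint
    var ref = queue.poll() // non-null once the GC has reclaimed an instance
    while (ref != null) {
      live.decrementAndGet()
      refs.remove(ref)
      ref = queue.poll()
    }
  }
}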
A second method is the Java instrumentation package. It allows you to measure object sizes (and probably offers methods for determining the consumption of all objects of a certain class). You could also try measuring the available memory. The flaw is that you measure all objects, not just those of a certain function.
For time control, write down a timestamp at the beginning of the computation and measure the elapsed duration at every checkpoint.

What is the smallest unit of work that is sensible to parallelize with actors?

Scala actors are described as light-weight, and Akka actors even more so, but there is obviously some overhead to using them.
So my question is: what is the smallest unit of work that is worth parallelising with Actors (assuming it can be parallelized)? Is it only worth it if there is some potential latency or there are a lot of heavy calculations?
I'm looking for a general rule of thumb that I can easily apply in my everyday work.
EDIT: The answers so far have made me realise that what I'm interested in is perhaps actually the inverse of the question that I originally asked. So:
Assuming that structuring my program with actors is a very good fit, and therefore incurs no extra development overhead (or even incurs less development overhead than a non-actor implementation would), but the units of work it performs are quite small - is there a point at which using actors would be damaging in terms of performance and should be avoided?
Whether to use actors is not primarily a question of the unit of work; their main benefit is to make concurrent programs easier to get right. In exchange for this, you need to model your solution according to a different paradigm.
So, you need to decide first whether to use concurrency at all (which may be due to performance or correctness) and then whether to use actors. The latter is very much a matter of taste, although with Akka 2.0 I would need good reasons not to, since you get distributability (up & out) essentially for free with very little overhead.
If you still want to decide the other way around, a rule of thumb from our performance tests might be that the target message processing rate should not be higher than a few million per second.
My rule of thumb, for everyday work, is that if it takes milliseconds then it's potentially worth parallelizing. Although the transaction rates are higher than that (usually no more than a few tens of microseconds of overhead), I like to stay well away from overhead-dominated cases. Of course, it may need to take much longer than a few milliseconds to actually be worth parallelizing. You always have to balance the time taken writing more code against the time saved running it.
If no side effects are expected in work units, then it is better to make the work-splitting decision at run time:
protected T compute() {
  if (r - l <= T1 || getSurplusQueuedTaskCount() >= T2)
    return problem.solve(l, r);
  // decompose
}
Where:
T1 = N / (L * Runtime.getRuntime.availableProcessors())
N - Size of work in units
L = 8..16 - Load factor, configured manually
T2 = 1..3 - Max length of work queue after all stealings
Here is a presentation with many more details and figures:
http://shipilev.net/pub/talks/jeeconf-May2012-forkjoin.pdf

Asymptotic behaviour of Scala methods

Is there somewhere I can find out the expected time and space complexities of operations on collections like HashSet, TreeSet, List and so on?
Is one just expected to know these from the properties of the abstract-data-types themselves?
I know of Performance characteristics for Scala collections, but this only mentions some very basic operations. Perhaps the rest of the operations for these collections are built purely from a small base-set, but then, it seems I am just expected to know that they have implemented them in this way?
The guide for the other methods should be: just think about what an efficient implementation would look like.
Most other bulk-operations on collections (operations that process each element in the collection) are O(n), so they are not mentioned there. Examples are filter, map, foreach, indexOf, reverse, find ...
Methods returning iterators or streams like combinations and permutations are usually O(1).
Methods involving 2 collections are usually O(max(n, m)) or O(min(n, m)). These are zip, zipAll, sameElements, corresponds, ...
Methods union, diff, and intersect are O(n + m).
Sort variants are, naturally, O(n log n). groupBy is O(n log n) in the current implementation. indexOfSlice uses the KMP algorithm and is O(m + n), where m and n are the lengths of the sequences.
Methods such as +:, :+ or patch are generally O(n) as well, unless you are dealing with a specific case of an immutable collection for which the operation in question is more efficient - for example, prepending an element on a functional List or appending an element to a Vector.
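For instance:

val xs = List(2, 3, 4)
val ys = 1 +: xs        // O(1): prepend on an immutable List
val v = Vector(1, 2, 3)
val w = v :+ 4          // effectively constant time: append on a Vector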
Methods toX are generally O(n), as they have to iterate all the elements and create a new collection. An exception is toStream which builds the collection lazily - so it's O(1). Also, whenever X is the type of the collection toX just returns this, being O(1).
Iterator implementations should have O(1) (amortized) next and hasNext operations. Iterator creation should be worst-case O(log n), but O(1) in most cases.
The performance characteristics of the other methods are really difficult to assert. Consider the following:
These methods are all implemented in terms of foreach or iterator, usually at very high levels in the hierarchy. Vector's map is implemented on collection.TraversableLike, for example.
To add insult to injury, which method implementation is used depends on the linearization of the class inheritance. This also applies to any method called as a helper. It has happened before that changes here caused unforeseen performance problems.
Since foreach and iterator are both O(n), any improved performance depends on specialization at other methods, such as size and slice.
For many of them, there's further dependency on the performance characteristics of the builder that was provided, which depends on the call site instead of the definition site.
So the result is that the place where the method is defined -- and documented -- does not have near enough information to state its performance characteristics, and may depend not only on how other methods are implemented by the inheriting collection, but even by the performance characteristics of an object, Builder, obtained from CanBuildFrom, that is passed at the call site.
At best, any such documentation would be described in terms of other methods. Which doesn't mean it isn't worthwhile, but it isn't easily done -- and hard tasks on open source projects depend on volunteers, who usually work at what they like, not what is needed.

Is this scala parallel array code threadsafe?

I want to use parallel arrays for a task, and before I start with the coding, I'd be interested in knowing if this small snippet is threadsafe:
import collection.mutable._
var listBuffer = ListBuffer[String]("one","two","three","four","five","six","seven","eight","nine")
var jSyncList = java.util.Collections.synchronizedList(new java.util.ArrayList[String]())
listBuffer.par.foreach { e =>
  println("processed :" + e)
  // using sleep here to simulate a random delay
  Thread.sleep((scala.math.random * 1000).toLong)
  jSyncList.add(e)
}
jSyncList.toArray.foreach(println)
Are there better ways of processing something with parallel collections, and accumulating the results elsewhere?
The code you posted is perfectly safe; I'm not sure about the premise though: why do you need to accumulate the results of a parallel collection in a non-parallel one? One of the whole points of the parallel collections is that they look like other collections.
I think that parallel collections will also provide a seq method to switch to sequential ones. So you should probably use this!
For this pattern to be safe:
listBuffer.par.foreach { e => f(e) }
f has to be able to run concurrently in a safe way. I think the same rules that you need for safe multi-threading apply (access to shared state needs to be thread safe, the order of the f calls for different e won't be deterministic, and you may run into deadlocks as you start synchronizing your statements in f).
Additionally I'm not clear what guarantees the parallel collections gives you about the underlying collection being modified while being processed, so a mutable list buffer which can have elements added/removed is possibly a poor choice. You never know when the next coder will call something like foo(listBuffer) before your foreach and pass that reference to another thread which may mutate the list while it's being processed.
Other than that, I think for any f that will take a long time, can be called concurrently and where e can be processed out of order, this is a fine pattern.
immutCol.par.foreach { e => threadSafeOutOfOrderProcessingOf(e) }
disclaimer: I have not tried // colls myself, but I'm looking forward at having SO questions/answers show us what works well.
The synchronizedList should be safe, though the printlns may give unexpected results - you have no guarantees of the order in which items will be printed, or even that your printlns won't be interleaved mid-character.
A synchronized list is also unlikely to be the fastest way you can do this. A safer solution is to map over an immutable collection (Vector is probably your best bet here), then print all the lines (in order) afterwards:
val input = Vector("one","two","three","four","five","six","seven","eight","nine")
val output = input.par.map { e =>
  val msg = "processed :" + e
  // using sleep here to simulate a random delay
  Thread.sleep((math.random * 1000).toLong)
  msg
}
println(output mkString "\n")
You'll also note that this code has about as much practical usefulness as your example :)
This code is plain weird -- why add stuff in parallel to something that needs to be synchronized? You'll add contention and gain absolutely nothing in return.
The principle of the thing, accumulating results from parallel processing, is better achieved with operations like fold, reduce or aggregate.
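For example, a sketch using aggregate:

val input = Vector("one", "two", "three", "four", "five")

// Each worker folds elements into its own local list (no shared state, no
// synchronization); the per-thread partial results are then combined.
val processed: List[String] = input.par.aggregate(List.empty[String])(
  (acc, e) => ("processed :" + e) :: acc,
  (left, right) => left ::: right
)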
The code you've posted is safe - there will be no errors due to inconsistent state of your array list, because access to it is synchronized.
However, parallel collections process items concurrently (at the same time) AND out of order. Out of order means that the 54th element may be processed before the 2nd element - so your synchronized array list will contain items in a non-predefined order.
In general it's better to use map, filter and other functional combinators to transform a collection into another collection - these will ensure that the ordering guarantees are preserved if a collection has some (like Seqs do). For example:
ParArray(1, 2, 3, 4).map(_ + 1)
always returns ParArray(2, 3, 4, 5).
However, if you need a specific thread-safe collection type such as a ConcurrentSkipListMap or a synchronized collection to be passed to some method in some API, modifying it from a parallel foreach is safe.
Finally, a note - parallel collections provide parallel bulk operations on data. Mutable parallel collections are not thread-safe in the sense that you can add elements to them from different threads. Mutable operations like inserting into a map or appending to a buffer still have to be synchronized.