Update: this question is about how we can use amortized analysis for immutable collections; Scala's immutable queue is just an example. How this immutable queue is implemented is clearly visible from the sources, and, as was pointed out in the answers, Scala's sources do not mention amortized time for it at all. But guides and internet podcasts do. And, as I saw, C# has a similar immutable queue with similar statements about its amortized time.
Amortized analysis was originally invented for mutable collections. How to apply it to Scala's mutable Queue is clear. But how can we apply it to this sequence of operations, for example?
val q0 = collection.immutable.Queue[Int]()
val q1 = q0.enqueue(1)
val h1 = q1.head
val q2 = q1.enqueue(2)
val h2 = q2.head
val (d2, q3) = q2.dequeue
val (d1, q4) = q3.dequeue
We have different immutable queues in the sequence q0-q4. May we consider them as one single queue or not? How can we use the O(1) enqueue operations to amortize both the heavy head and the first dequeue? Which method of amortized analysis can we use for this sequence? I don't know. I cannot find anything in textbooks.
Final answer:
(Thanks to all who answered my question!)
In short: yes, and no!
"No", because a data structure may be used as immutable but not persistent. A data structure is ephemeral if, on making a change, we forget (or destroy) all old versions of it; mutable data structures are an example. Dequeuing an immutable queue built from two strict lists can be called "amortized O(1)" in such ephemeral contexts. But full persistence, with forking of the data structure's history, is desirable for many algorithms, and for those algorithms the expensive O(N) operations of the two-strict-list queue are not amortized O(1) operations. So the guide authors should add an asterisk and print in a 6pt footnote: * for specially selected sequences of operations.
In the answers I was given an excellent reference: Amortization, Lazy Evaluation, and Persistence: Lists with Catenation via Lazy Linking:
We first review the basic concepts of lazy evaluation, amortization, and persistence. We next discuss why the traditional approach to amortization breaks down in a persistent setting. Then we outline our approach to amortization, which can often avoid such problems through judicious use of lazy evaluation.
We can create a fully persistent immutable queue with amortized O(1) operations, but it must be specially designed and use lazy evaluation. Without such a framework, with lazy evaluation of parts of the structure and memoization of results, we cannot apply amortized analysis to fully persistent immutable data structures. (It is also possible to create a double-ended queue with all operations having worst-case constant time and without lazy evaluation, but my question was about amortized analysis.)
Original question:
According to the original definition, amortized time complexity is the worst-case complexity averaged over allowed sequences of operations: "we average the running time per operation over a (worst-case) sequence of operations" (https://courses.cs.duke.edu/fall11/cps234/reading/Tarjan85_AmortizedComplexity.pdf). See also the textbooks ("Introduction to Algorithms" by Cormen et al., for example).
The Scala Collections Library guide states that two collection.immutable.Queue methods (head and tail) have amortized O(1) time complexity: https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html This table does not mention the complexity of the enqueue and dequeue operations, but another, unofficial guide states O(1) time complexity for enqueue and amortized O(1) for dequeue: https://www.waitingforcode.com/scala-collections/collections-complexity-scala-immutable-collections/read
But what do those statements about amortized time complexity really mean? Intuitively, they let us make predictions for algorithms that use the data structure, such as: any sequence of N amortized O(1) operations allowed by the data structure itself has no worse than O(N) total complexity. Unfortunately, this intuitive meaning is clearly broken for immutable collections. For example, the following function has O(n^2) time complexity for 2n operations that are, according to the guides, amortized O(1):
def quadraticInTime(n: Int) = {
  var q = collection.immutable.Queue[Int]()
  for (i <- 1 to n) q = q.enqueue(i)
  List.fill(n)(q.head)
}
val tryIt = quadraticInTime(100000)
The second parameter of the List.fill method is a by-name parameter, so it is evaluated n times in sequence. After the loop all n elements sit in the queue's rear list, and since q is never replaced by a dequeued version, every q.head call pays the full O(n) traversal of that rear list again. We can also use q.dequeue._1 instead of q.head, of course, with the same result.
Also, we can read in "Programming in Scala" by M. Odersky et al.: "Assuming that head, tail, and enqueue operations appear with about the same frequency, the amortized complexity is hence constant for each operation. So functional queues are asymptotically just as efficient as mutable ones." This contradicts the worst-case property of amortized complexity from the textbooks, and it is wrong for the quadraticInTime method above.
But if a data structure has O(1) cloning, we can break the assumptions of amortized analysis simply by executing N worst-case "heavy" operations on N copies of the data structure in sequence. And, generally speaking, any immutable collection has O(1) cloning: the "copy" is just another reference to the same structure.
Question: is there a good formal definition of amortized time complexity for operations on immutable structures? Clearly, to be useful, the definition must further restrict the allowed sequences of operations.
Chris Okasaki described how to solve this problem with lazy evaluation in Amortization, Lazy Evaluation, and Persistence: Lists with Catenation via Lazy Linking (1995). The main idea is that you can guarantee that some action is done only once by hiding it in a thunk and letting the language runtime ensure it is evaluated exactly once.
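The essence of the trick can be sketched in Scala with a lazy val, which the runtime guarantees to evaluate at most once (the class here is my own illustration, much simpler than Okasaki's actual structures):
class LazyPivot[A](front: List[A], rear: List[A]) {
  // The O(n) reversal is hidden in a memoized thunk: no matter how many
  // logical futures of this queue force it, the work is done at most once.
  lazy val pivoted: List[A] =
    if (front.nonEmpty) front else rear.reverse
  def head: A = pivoted.head
}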
The docs for Queue give tighter conditions under which the asymptotic complexity holds:
Adding items to the queue always has cost O(1). Removing items has cost O(1), except in the case where a pivot is required, in which case, a cost of O(n) is incurred, where n is the number of elements in the queue. When this happens, n remove operations with O(1) cost are guaranteed. Removing an item is on average O(1).
Note that removing from an immutable queue implies that, when dequeueing, subsequent operations must be performed on the returned Queue. Not doing this also means that it's not actually being used as a queue:
val q = Queue.empty[Int].enqueue(0).enqueue(1)
q.dequeue._1 // 0
q.dequeue._1 // 0 again: q itself never changes
In your code, you're not actually using the Queue as a queue. Addressing this:
def f(n: Int): Unit = {
  var q = Queue.empty[Int]
  (1 to n).foreach { i => q = q.enqueue(i) }
  List.fill(n) {
    val (head, nextQ) = q.dequeue
    q = nextQ
    head
  }
}
def time(block: => Unit): Unit = {
  val startNanos = System.nanoTime()
  block
  println(s"Block took ${System.nanoTime() - startNanos}ns")
}
scala> time(f(10000))
Block took 2483235ns
scala> time(f(20000))
Block took 5039420ns
Note also that if we do an approximately equal number of enqueue and head operations on the same scala.collection.immutable.Queue, the head operations are in fact constant time even without amortization:
import scala.util.Try

val n = 100000
val q = Queue.empty[Int]
(1 to n).foreach { i => q.enqueue(i) } // each enqueue result is discarded, so q itself stays empty
List.fill(n)(Try(q.head))              // hence every head is O(1) (and throws inside the Try)
Related
I have a question about structural sharing of List in Scala. I have read somewhere on the Internet this sentence:
List implements structural sharing of the tail list. This means that many operations are either zero- or constant-memory cost.
but I don't really understand how the time and memory cost of list operations would be reduced. For example:
val mainList = List(3, 2, 1)
val with4 = 4 :: mainList // O(1)
If we want to create another list, with4, the time would just be O(1) and the memory cost one cell. But for the other operations of the list, how would it be different? I mean with length() or reverse()... would they still be O(n) as normal? Can anyone please explain, and maybe give an example? Thank you!
The operations on List that run in constant time (O(1)) due to structural sharing are prepend (::), head, and tail. Most other operations are linear time (O(n)).
Your example is correct: 4 :: mainList is constant time, as are mainList.head and mainList.tail. Other things, like length and reverse, are linear time.
This is why List is not a particularly good collection to use in most cases unless you only use those operations, because everything else is O(n).
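You can observe the sharing directly:
val mainList = List(3, 2, 1)
val with4 = 4 :: mainList

// The new list reuses mainList as its tail: one new cons cell, no copying.
println(with4.tail eq mainList) // true
// reverse, by contrast, must rebuild every cell, so it stays O(n):
println(with4.reverse) // List(1, 2, 3, 4)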
You can refer to http://docs.scala-lang.org/overviews/collections/performance-characteristics.html for an overview of the runtime characteristics of different collections.
I have been reading some posts and was wondering if someone can present a situation where a TrieMap would be preferable to a HashMap.
So essentially what architecture decision should motivate the use of a TrieMap?
As per the documentation, it's a mutable collection that can be safely used in multithreaded applications.
A concurrent hash-trie or TrieMap is a concurrent thread-safe lock-free
implementation of a hash array mapped trie. It is used to implement the
concurrent map abstraction. It has particularly scalable concurrent insert
and remove operations and is memory-efficient. It supports O(1), atomic,
lock-free snapshots which are used to implement linearizable lock-free size,
iterator and clear operations. The cost of evaluating the (lazy) snapshot is
distributed across subsequent updates, thus making snapshot evaluation horizontally scalable.
For details, see: http://lampwww.epfl.ch/~prokopec/ctries-snapshot.pdf
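For instance, the O(1) snapshot mentioned above can be tried directly (a minimal sketch):
import scala.collection.concurrent.TrieMap

val live = TrieMap(1 -> "a", 2 -> "b")
val frozen = live.snapshot() // O(1) and lock-free; copying is deferred to later updates
live.put(3, "c")
println(frozen.size) // 2: the snapshot is unaffected by later writes
println(live.size)   // 3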
Also, it has a really nice API for caching. So, for example, suppose you have to calculate factorials of different numbers and sometimes reuse the results:
import scala.collection.concurrent.TrieMap

object o {
  val factorialsCache = new TrieMap[Int, Int]()

  def factorial(num: Int): Int = ??? // really heavy operation

  def doWorkWithFactorial(num: Int) = {
    val factRes = factorialsCache.getOrElseUpdate(num, {
      // this function is executed only when there is no record in the map for
      // this key, so we avoid invoking the heavy factorial more than necessary
      factorial(num)
    })
    // start doing some work with factRes
    factRes
  }
}
Pay attention: the function above uses global state (the cache) for write operations, but it is absolutely safe to use from concurrent threads. You will not lose any data.
This is a follow-up to my previous question.
I understand that we can use streams to generate an approximation of pi (and other numbers), the n-th Fibonacci number, etc. However, I doubt that streams are the right approach for that.
The main drawback (as I see it) is memory consumption: e.g. the stream will retain all Fibonacci numbers for i < n while I need only the n-th one. Of course, I can use drop, but that makes the solution a bit more complicated. Tail recursion looks like a more suitable approach to tasks like this.
What do you think?
If you need to go fast, travel light. That means: avoid allocating any unnecessary memory. If you do need memory, use the fastest collections available. If you know how much memory you need, preallocate. Allocation is the absolute performance killer... for calculation. Your code may not look nice anymore, but it will go fast.
However, if you're working with IO (disk, network) or any user interaction, then allocation pales in comparison. It's then better to shift the priority from code performance to maintainability.
Use Iterator. It does not retain intermediate values.
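For example, a constant-space n-th Fibonacci with Iterator might look like this (a sketch, counting from fib(0) = 0):
def fib(n: Int): BigInt =
  Iterator.iterate((BigInt(0), BigInt(1))) { case (a, b) => (b, a + b) }
    .drop(n)   // consumed pairs become garbage immediately
    .next()
    ._1

fib(10) // 55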
If you want the n-th Fibonacci number and use a stream just as a temporary data structure (that is, you do not hold references to previously computed elements of the stream), then your algorithm will run in constant space.
Previously computed elements of a Stream (which are no longer used) are going to be garbage collected. And since they were allocated in the youngest generation and immediately collected, almost all of the allocations might stay in cache.
Update:
It seems that the current implementation of Stream is not as space-efficient as it could be, mainly because it inherits the implementation of the apply method from the LinearSeqOptimized trait, where it is defined as:
def apply(n: Int): A = {
  val rest = drop(n)
  if (n < 0 || rest.isEmpty) throw new IndexOutOfBoundsException("" + n)
  rest.head
}
A reference to the head of the stream is held here by this, which prevents the stream from being gc'ed. So the combination of the drop and head methods (as in f.drop(100).head) may be better for situations where dropping intermediate results is feasible. (Thanks to Sebastien Bocq for explaining this on scala-user.)
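So a constant-space use of Stream could look like this (a sketch; the point is that nothing retains a reference to the head):
def fibs: Stream[BigInt] = {
  def go(a: BigInt, b: BigInt): Stream[BigInt] = a #:: go(b, a + b)
  go(0, 1)
}

// No val captures the head, and drop discards cells as it advances,
// so the first 100 cells can be garbage collected during the traversal.
fibs.drop(100).head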
Is there somewhere I can find the expected time and space complexities of operations on collections like HashSet, TreeSet, List, and so on?
Is one just expected to know these from the properties of the abstract data types themselves?
I know of "Performance characteristics for Scala collections", but it only mentions some very basic operations. Perhaps the rest of the operations for these collections are built purely from a small base set, but then it seems I am just expected to know that they have been implemented that way?
The guide for the other methods should be: just think about what an efficient implementation would look like.
Most other bulk-operations on collections (operations that process each element in the collection) are O(n), so they are not mentioned there. Examples are filter, map, foreach, indexOf, reverse, find ...
Methods returning iterators or streams like combinations and permutations are usually O(1).
Methods involving 2 collections are usually O(max(n, m)) or O(min(n, m)). These are zip, zipAll, sameElements, corresponds, ...
Methods union, diff, and intersect are O(n + m).
Sort variants are, naturally, O(n log n). groupBy is O(n log n) in the current implementation. indexOfSlice uses the KMP algorithm and is O(m + n), where m and n are the lengths of the two sequences.
Methods such as +:, :+ or patch are generally O(n) as well, unless you are dealing with a specific case of an immutable collection for which the operation in question is more efficient - for example, prepending an element on a functional List or appending an element to a Vector.
Methods toX are generally O(n), as they have to iterate over all the elements and create a new collection. An exception is toStream, which builds the collection lazily, so it is O(1). Also, whenever X is the type of the collection itself, toX just returns this, which is O(1).
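For example (a quick sketch of the toStream case):
val xs = (1 to 1000000).toList
val s = xs.toStream // O(1): only wraps the head; the rest is built on demand
val v = xs.toVector // O(n): must copy every element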
Iterator implementations should have O(1) (amortized) next and hasNext operations. Iterator creation should be worst-case O(log n), but O(1) in most cases.
The performance characteristics of the other methods are really difficult to assert. Consider the following:
These methods are all implemented based on foreach or iterator, usually at very high levels in the hierarchy. Vector's map is implemented on collection.TraversableLike, for example.
To add insult to injury, which method implementation is used depends on the linearization of the class inheritance. This also applies to any method called as a helper. It has happened before that changes here caused unforeseen performance problems.
Since foreach and iterator are both O(n), any improved performance depends on specialization at other methods, such as size and slice.
For many of them, there's further dependency on the performance characteristics of the builder that was provided, which depends on the call site instead of the definition site.
So the result is that the place where the method is defined and documented does not have nearly enough information to state its performance characteristics; they may depend not only on how other methods are implemented by the inheriting collection, but even on the performance characteristics of an object, Builder, obtained from CanBuildFrom, that is passed at the call site.
At best, any such documentation would have to be described in terms of other methods. Which doesn't mean it isn't worthwhile, but it isn't easily done, and hard tasks on open source projects depend on volunteers, who usually work on what they like, not on what is needed.
I want to use parallel arrays for a task, and before I start coding, I'd be interested to know if this small snippet is thread-safe:
import collection.mutable._

val listBuffer = ListBuffer[String]("one", "two", "three", "four", "five", "six", "seven", "eight", "nine")
val jSyncList = java.util.Collections.synchronizedList(new java.util.ArrayList[String]())

listBuffer.par.foreach { e =>
  println("processed :" + e)
  // using sleep here to simulate a random delay
  Thread.sleep((scala.math.random * 1000).toLong)
  jSyncList.add(e)
}

jSyncList.toArray.foreach(println)
Are there better ways of processing something with parallel collections and accumulating the results elsewhere?
The code you posted is perfectly safe; I'm not sure about the premise, though: why do you need to accumulate the results of a parallel collection in a non-parallel one? One of the whole points of the parallel collections is that they look like other collections.
I think parallel collections also provide a seq method to switch to the sequential version, so you should probably use that!
For this pattern to be safe:
listBuffer.par.foreach { e => f(e) }
f has to be able to run concurrently in a safe way. The same rules you need for safe multi-threading apply: access to shared state needs to be thread-safe, the order of the f calls for different e won't be deterministic, and you may run into deadlocks once you start synchronizing the statements in f.
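A minimal illustration of the shared-state rule (the counters here are hypothetical):
// Unsafe: an unsynchronized var mutated from many threads is a data race.
var unsafeCount = 0
(1 to 100000).par.foreach { _ => unsafeCount += 1 } // result varies between runs

// Safe: an atomic counter makes the shared update thread-safe.
val safeCount = new java.util.concurrent.atomic.AtomicInteger(0)
(1 to 100000).par.foreach { _ => safeCount.incrementAndGet() } // always 100000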
Additionally, I'm not clear on what guarantees the parallel collections give about the underlying collection being modified while it is being processed, so a mutable ListBuffer, which can have elements added or removed, is possibly a poor choice. You never know when the next coder will call something like foo(listBuffer) before your foreach and pass the reference to another thread, which may mutate the list while it's being processed.
Other than that, I think for any f that takes a long time, can be called concurrently, and where e can be processed out of order, this is a fine pattern.
immutCol.par.foreach { e => threadSafeOutOfOrderProcessingOf(e) }
Disclaimer: I have not tried parallel collections myself, but I'm looking forward to having SO questions/answers show us what works well.
The synchronizedList should be safe, though the println may give unexpected results: you have no guarantees of the order in which items will be printed, or even that your printlns won't be interleaved mid-character.
A synchronized list is also unlikely to be the fastest way to do this. A safer solution is to map over an immutable collection (Vector is probably your best bet here), then print all the lines (in order) afterwards:
val input = Vector("one", "two", "three", "four", "five", "six", "seven", "eight", "nine")

val output = input.par.map { e =>
  val msg = "processed :" + e
  // using sleep here to simulate a random delay
  Thread.sleep((math.random * 1000).toLong)
  msg
}

println(output mkString "\n")
You'll also note that this code has about as much practical usefulness as your example :)
This code is plain weird: why add stuff in parallel to something that needs to be synchronized? You'll add contention and gain absolutely nothing in return.
The principle of the thing, accumulating results from parallel processing, is better achieved with operations like fold, reduce or aggregate.
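For example, a sketch of the same accumulation with aggregate, using no shared mutable state:
val processed = (1 to 9).par.aggregate(Vector.empty[String])(
  (acc, i) => acc :+ s"processed: $i", // fold elements within one chunk
  (left, right) => left ++ right       // merge per-thread partial results
)
// For parallel sequences the combine step preserves element order.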
The code you've posted is safe: there will be no errors due to an inconsistent state of your array list, because access to it is synchronized.
However, parallel collections process items both concurrently (at the same time) and out of order. Out of order means that the 54th element may be processed before the 2nd element, so your synchronized array list will contain items in no predefined order.
In general, it's better to use map, filter and other functional combinators to transform one collection into another: these ensure that ordering guarantees are preserved if the collection has them (as Seqs do). For example:
ParArray(1, 2, 3, 4).map(_ + 1)
always returns ParArray(2, 3, 4, 5).
However, if you need a specific thread-safe collection type, such as a ConcurrentSkipListMap or a synchronized collection, to be passed to some method in some API, modifying it from a parallel foreach is safe.
Finally, a note: parallel collections provide parallel bulk operations on data. Mutable parallel collections are not thread-safe in the sense that you can add elements to them from different threads. Mutable operations like inserting into a map or appending to a buffer still have to be synchronized.