Is Guava's ImmutableList.Builder thread safe?

What are the thread safety guarantees for Guava's ImmutableList.Builder? The javadocs don't say.

While the Guava Immutable classes are thread-safe, their builders are not. For most applications, only one thread will interact with any particular Builder instance.
While the absence of thread-safety usually doesn't need to be documented, such Javadoc might make sense for the immutable collection builders: people may be surprised that ImmutableList is thread-safe while ImmutableList.Builder isn't.
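For illustration, here is the typical single-threaded usage, sketched in Scala against the Guava API (the builder never escapes the thread that creates it):

import com.google.common.collect.ImmutableList

// Safe: the builder is a local value, confined to one thread.
val list: ImmutableList[String] =
  ImmutableList.builder[String]()
    .add("a")
    .add("b")
    .build()

// The built list, unlike its builder, is safe to share between threads.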

If thread-safety is not mentioned in the javadocs, don't assume it!
More seriously, "no".
I would also prefer that the javadocs of ImmutableList and friends included such a (rather obvious, yes) remark, so you wouldn't have to assume it yourself, because the "obvious" is not always the case. Just the other day I was discussing scala.List, an immutable list, and some surprising issues it may cause if exchanged between threads inappropriately (via a data race), which people didn't think about because they see the word "immutable" on the tin and equate "immutable == thread-safe". So it pays off to be on the safe side, even when documenting "obvious" thread-safety aspects.

Agree with @Dimitris Andreou: definitely do not assume thread safety if it's not documented as such. When you go to the effort of making a non-trivial class thread-safe, you want users to know it.
Beyond that, I think the most common use case for a builder will be thread-confined, i.e. as a local variable in some method. If you need multiple threads to build a List, is it really immutable yet?
If you have multiple threads feeding into a list, but you want to snapshot it at some point and say "no more changes going forward, it's immutable", then I'd write something that takes the elements from those threads and freezes the contents into a new ImmutableList when you know it's ready, as sketched below.
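A minimal sketch of that freeze pattern in Scala (the queue and the names are illustrative, not part of Guava):

import com.google.common.collect.ImmutableList
import java.util.concurrent.ConcurrentLinkedQueue

// Producer threads feed a concurrent queue while the data is still "live".
val pending = new ConcurrentLinkedQueue[String]()
// ... multiple threads call pending.add(...) concurrently ...

// Once you know no more changes are coming, freeze the contents:
val frozen: ImmutableList[String] = ImmutableList.copyOf(pending)

ImmutableList.copyOf takes a snapshot, so later writes to the queue cannot affect the frozen list.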

Related

Message ordering of ReactiveKafkaConsumerTemplate receiveAutoAck

I am wondering whether the ReactiveKafkaConsumerTemplate of the spring-kafka project guarantees the correct ordering of messages. I read the documentation of the reactor-kafka project, and it states that messages should be consumed using the concatMap operator, but the ReactiveKafkaConsumerTemplate uses the flatMap operator, at least in the case of the receiveAutoAck method here:
https://github.com/spring-projects/spring-kafka/blob/master/spring-kafka/src/main/java/org/springframework/kafka/core/reactive/ReactiveKafkaConsumerTemplate.java#L69
Reference documentation of the reactor-kafka project:
https://projectreactor.io/docs/kafka/release/reference/#_auto_acknowledgement_of_batches_of_records
I am interested in using receiveAutoAck, as it seems to be the simplest and most convenient approach, and it suffices for my use case. The only way to overcome this behaviour of the receiveAutoAck method seems to be to subclass ReactiveKafkaConsumerTemplate and override this behaviour. Is this correct?
I don't think it really matters here, because internally the source of data for us is Flux.fromIterable(consumerRecords), which cannot lose its order because of the iterator: however hard we might try to process the records in parallel, we would still get the order within one iterator. Yes, the order between the iterators we flatten is really unpredictable, but this doesn't matter for us, since we only care about ordering within a single partition, nothing more.
Nevertheless, I think we definitely need to change this to the mentioned concatMap() to avoid such confusion in the future. Feel free to provide a contribution on the matter!
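To see the general difference between the two operators that the question is about, here is a small reactor-core sketch (written in Scala; the inner delays are contrived so that fast elements can overtake the slow one):

import java.time.Duration
import reactor.core.publisher.Flux

// "a" completes last; "b" and "c" complete quickly.
def inner(s: String): Flux[String] =
  Flux.just(s).delayElements(Duration.ofMillis(if (s == "a") 100 else 10))

// concatMap subscribes to one inner publisher at a time: prints a, b, c.
Flux.just("a", "b", "c").concatMap(s => inner(s)).doOnNext(s => println(s)).blockLast()

// flatMap subscribes to all inner publishers eagerly, so fast ones can
// overtake the slow one: typically prints b, c, a.
Flux.just("a", "b", "c").flatMap(s => inner(s)).doOnNext(s => println(s)).blockLast()

As the answer notes, receiveAutoAck is not affected within a partition, because each inner publisher is a Flux.fromIterable over a single batch, which is consumed in iterator order.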

Scala immutable collections cannot be shared without synchronization?

From the «Learning concurrent programming in Scala» book:
In current versions of Scala (2.11.1), however, certain collections that are
deemed immutable, such as List and Vector, cannot be shared without
synchronization. Although their external API does not allow you to
modify them, they contain non-final fields.
Could anyone demonstrate this with a small example? And does this still apply to 2.11.7?
The behavior of changes made in one thread when viewed from another is governed by the Java Memory Model. In particular, these rules are extremely weak when it comes to something like building a collection and then passing the built-and-now-immutable collection to another thread. The JMM does not guarantee that the other thread won't see an earlier view where the collection was not fully built!
Since synchronized blocks enforce an ordering, they can be used to get a consistent view if they're used on every single operation.
In practice, though, this is rarely actually necessary. On the CPU side, there is typically a memory barrier operation that can be used to enforce memory consistency (i.e. if you write the tail of your list and then pass a memory barrier, no other thread can see the tail un-set). And in practice, JVMs usually have to implement synchronized by using memory barriers. So one could hope that you could just pass the created list within a synchronized block, trusting that a memory barrier would be issued, and everything thereafter would be fine.
Unfortunately, the JMM doesn't require that it be implemented in this way (and you can't assume that the memory-barrier-like behavior of object creation will actually be a full memory barrier that applies to everything in that thread as opposed to simply the final fields of that object), which is both why the recommendation is what it is, and why it's not fixed (yet, anyway) in the library.
For what it's worth, on x86 architectures, I've never observed a problem if you hand off the immutable object within a synchronized block. I have observed problems if you try to do it with CAS (e.g. by using the java.util.concurrent.atomic classes).
As an addition to the excellent answer from Rex Kerr:
it should be noted that most common use cases of immutable collections in a multithreading context are not affected by this problem. The only situation where this might affect you is when you do something that you probably should not do in the first place.
E.g. you have a variable var x: Vector[Int], which you write from one thread A and read from another thread B.
If you mark x with @volatile, there will be no problem, since the volatile write introduces a memory barrier: you will never be able to observe the Vector in an inconsistent state. The same is true when using a synchronized { } block when writing and reading, or when using java.util.concurrent.atomic.AtomicReference. (See the sketch after the list below.)
If you don't mark x with @volatile, you might observe the vector in an inconsistent state (not just wrong elements, but internally inconsistent!). But in that case your code is arguably broken to begin with: it is completely undefined when you will see the changes from A in B.
You might see them
immediately
after there is a memory barrier somewhere else in your program
not at all
depending on the architecture you're running on, the phase of the moon, whatever. So, as Viktor Klang put it: "Unsafe publication is unsafe..."
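A small sketch of the safe and unsafe variants described above (the class and method names are just for illustration):

class Box {
  @volatile private var safe: Vector[Int] = Vector.empty // volatile write = safe publication
  private var unsafe: Vector[Int] = Vector.empty         // plain field = data race

  // Called from thread A:
  def publish(v: Vector[Int]): Unit = { safe = v; unsafe = v }

  // Thread B is guaranteed to see a fully constructed Vector here...
  def readSafe(): Vector[Int] = safe

  // ...but here it may see a stale or even internally inconsistent one.
  def readUnsafe(): Vector[Int] = unsafe
}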
Note that if you use a higher-level concurrency framework such as Akka actors, it is also guaranteed that receivers of messages cannot see immutable collections in an inconsistent state.

Atomic function/method in Scala (without introducing actor system overheads)

I currently use an Akka actor to establish a code block that is executed atomically and in a thread safe manner (Akka mailbox semantics impose atomicity by virtue of processing one message at a time).
However this introduces the need for an actor system, and additional side-effects or bloat (having to manually propagate exceptions to the caller, losing type safety on ask, and in general using message semantics rather than function calls).
Can a thread-safe atomic code block be accomplished in Scala in a simpler way? Would you apply @volatile to a function?
It depends on what kind of shared state you want to protect here:
The easiest and most universal choice is the same old synchronized. However, unlike Akka, it's completely blocking, so it may easily kill your performance, and of course the code style, as it's hard to control messy side effects. It may also allow deadlocks.
Java's locks are the same approach, but they might be a little better for performance.
Another option is Java's same old AtomicReference (which implements CAS operations) and related classes. The positive thing about them is that they're non-blocking; developers actually use them to build high-performance collections. The ways of using locks and CAS are described here. They are both pretty low-level mechanisms, so I would not recommend using them much, especially for business logic (any actor implementation would be better).
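For example, a non-blocking update of an immutable value with AtomicReference might look like this (a sketch, not production code):

import java.util.concurrent.atomic.AtomicReference

val state = new AtomicReference[List[Int]](Nil)

// Classic CAS retry loop: read, compute a new immutable value, and swap
// it in only if no other thread has changed the state in the meantime.
@annotation.tailrec
def add(i: Int): Unit = {
  val old = state.get()
  if (!state.compareAndSet(old, i :: old)) add(i)
}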
If your shared state is a collection, you may want to use Java's concurrent collections (they have atomic operations like putIfAbsent). Scala has the interesting non-blocking TrieMap, for instance.
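As a sketch of that option, TrieMap from the standard library offers the usual atomic operations (the key and values here are arbitrary):

import scala.collection.concurrent.TrieMap

val counters = TrieMap.empty[String, Int]
counters.putIfAbsent("hits", 0) // atomic: insert only if the key is absent
counters.replace("hits", 0, 1)  // atomic: replace only if the current value is 0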
Scala STM is also an alternative.
Finally, this question is dedicated to lightweight actor model implementations.
P.S. The @volatile annotation is nothing more than the analog of the volatile keyword from Java. It belongs on a var (a field), and putting it on a method won't make the method atomic.
Depending on what you're trying to achieve, the simplest might be old synchronized:
// your mutable state
private var x = 0
// better than locking on 'this' is to have a dedicated lock
private val lock = new Object
def add(i: Int): Unit = lock.synchronized { x += i }
This is the 'old Java' way, but it might work for you depending on what you're doing. Of course, this is also the fastest way to deadlocks if your synchronized operation is more complex and/or you need high throughput.

What happened to Scala.React?

I read the paper co-written by Odersky, "Deprecating the Observer Pattern with Scala.React".
The GitHub repository looks abandoned:
https://github.com/ingoem/scala-react
Also, the recent Reactive Programming Coursera class used the RxJava Observable library (with Scala support, of course).
Is there a story behind this? I can presume scala.react just didn't make it very far. Is the RxJava library based on Observable advisable? Or can we expect something similar or better from Typesafe?
Citing Li Haoyi, who has used Scala.React, his observations are:
"it is extremely difficult to set up and get started."
"It requires a fair amount of global configuration"
"It took several days to get a basic dataflow graph (..,) working."
He had a lot of questions but did not manage to contact the author of the publication...
Li also implemented Scala.Rx, addressing these and other issues.
The code is in good shape, but I cannot observe any push to get it into the standard Scala library. Also, Li is the driver behind the ongoing Scala & JavaScript effort, so he is mostly occupied with that project.
Answering your questions:
Is the RxJava library based on Observable advisable?
RxJava is based on the Observer pattern Martin Odersky tried to deprecate...
https://github.com/Netflix/RxJava/blob/master/rxjava-core/src/main/java/rx/Observer.java
https://github.com/Netflix/RxJava/blob/master/rxjava-core/src/main/java/rx/Observable.java
While every issue Martin pointed out in the paper is true and valid, Netflix exploited a major property of Observables: Futures and Observables share an isomorphism, and are thus composable.
In RxJava, an Observable returns a stream of events. A Future, on the other hand, can be seen as a specialized Observable that emits only a single value. Futures and Observables can therefore be composed asynchronously whenever it makes sense.
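The idea can be sketched in a few lines of Scala (this shows the concept only, not the actual RxJava API):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// An "observable" is anything that can push zero or more values to a callback.
trait Obs[+T] { def subscribe(onNext: T => Unit): Unit }

// A Future fits the same shape as a stream that emits exactly one value,
// which is what makes the two composable.
def fromFuture[T](f: Future[T]): Obs[T] = new Obs[T] {
  def subscribe(onNext: T => Unit): Unit = f.foreach(onNext)
}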
Is there a story behind this?
No idea, but maybe Netflix did some sponsoring. You may have noticed the Netflix logo appearing in the Rx marble diagram examples...
Or can we expect something similar or better from Typesafe?
I honestly doubt that. Why should they? Typesafe is busy pushing their stack into industry and advancing Akka further. Scala.React is a neat idea, but it does not produce any cash, whereas Akka brings them paying customers...
Instead, I would ask what exactly Scala.React is trying to solve, after all. IMHO, RxJava already does a good job and is in production, and the improvements Scala.React could possibly add are most likely not enough for a major change.
RxJava (Reactive Extensions) has very little in common with scala.react. RxJava deals with observers and concurrency but helps very little regarding correctness of evaluation order. Basically it is just streams of events, and if an event is split into several effects, those will never be coherent again. Basically it's a mess, and it can only be used for GUIs, where precision in computation is not so critical: you never know when you'll get an extra update or an extra refresh.
scala.react is a single-threaded computation model that deals with the order of computation via a strict evaluation order defined by the functional dependencies between computations.
Akka, or actors, again, is a third model and a completely different thing. It is just threads with some fancy syntax and scheduling, really.
No wonder everyone is confused. Sadly, scala.react has not moved anywhere, which is bad, as it's the only innovative model of the three.

Why do Eclipse APIs use Arrays instead of Collections?

In the Eclipse APIs, the return and argument types are mostly arrays instead of collections. An example is the members method on IContainer, which returns IResource[].
I am interested in why this is the case. Maybe it is one of the following:
The APIs were designed before generics were available, so IResource[] was better than just Collection or List
Memory concerns, e.g. ArrayList internally holds an array which has more space than is needed (to offer an efficient implementation of add), whereas an array is always constructed for just the needed target size
It's not possible to add/remove elements on an array, so it is safe for iterating (but defensive copying is still necessary, because one can still change elements, e.g. set them to null)
Does anyone have any insights or other ideas why the API was developed that way?
Posting this as an answer, so it can be accepted.
Eclipse predates generics, and they are really serious about API stability. Also, at the low level of SWT, passing arrays seems to be used to reflect the operating system APIs that are being wrapped. Once you have a bunch of tooling using arrays, I guess it makes sense to keep things consistent. Also note that arrays aren't subject to all of the type erasure issues that arise when using reflection.
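The erasure point is easy to see from reflection (a Scala sketch; the same holds in Java):

// Arrays are reified: the element type is recoverable at runtime.
val arr = Array("a", "b")
println(arr.getClass.getComponentType) // class java.lang.String

// Generic collections are erased: at runtime it is just an ArrayList.
val list = new java.util.ArrayList[String]()
println(list.getClass) // class java.util.ArrayList (the String is gone)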
Yeah, I hear you as far as the collections API being generally much easier to work with for dynamic lists of items.