Should I make a single NSLock instance in the application delegate, to be used by all classes? Or is it advisable to have each class instantiate its own NSLock instance as needed?
Would the locking work in the second case, if I, for example, had access to a managed object context that is spread across two view controllers?
If multiple objects access your object only to read its contents, then you do not need a lock at all. If at least one of the objects accesses your object to write/update its contents, then it does not matter if the other objects access your object to read or write/update it: in this case you need a lock.
Now, in order to correctly protect your object (in a critical section of code where multiple objects may access it), you must use the SAME LOCK INSTANCE which must then be shared by ALL of the possible objects accessing the object you are willing to protect.
If your application need to protect an object that may be accessed simultaneously by the majority of the classes, then having a single lock instance is fine. If you want better performances (especially if the number of simultaneous accesses to your object is high), then you can have multiple locks. Each lock will be responsible for allowing/denying access to a specific attribute/field of your object. This way, several objects may access your object changing a different attribute/field simultaneously. You are basically incrementing the number of concurrent operations on your object. However, each lock MUST STILL be shared among the other objects that will access the object you are protecting.
Having a lock instance for each controller simply does NOT work; this will NOT protect your object from concurrent accesses from other objects in different threads. NSLock is implemented using POSIX pthread mutexes, so it must be used in exactly the same way. This is also clearly stated in the NSLock documentation:
Warning: The NSLock class uses POSIX threads to implement its locking behavior. When sending an unlock message to an NSLock object, you must be sure that message is sent from the same thread that sent the initial lock message. Unlocking a lock from a different thread can result in undefined behavior.
So, in order to preserve the critical section semantics, it is the same thread that acquired the lock that is responsible for releasing it when done. Note also that the locking mechanism is intended for fast operations only, i.e. you should acquire a lock only for a short period of time before releasing it. If you need to wait for an unpredictable amount of time, then you need a different synchronization mechanism, namely a condition variable which is available through the NSCondition class.
Hope this helps.
You should not use locks with Core Data. That documentation is probably out of date. Ideally you should have one context per thread and let the context handle the locking of its underlying NSPersistentStoreCoordinator. This is considered the only safe way to use Core Data in a multi-threaded application currently.
Related
Most POSIX named objects (or all?) have unlink functions. e.g:
shm_unlink
mq_unlink
They all have in common, that they remove the name of the object from the system, causing next opens to fail or create a new object.
Why is this designed like this? I know, this is connected to the "everything is a file" policy, but why not delete the file on close? Would you do this the same if you create a new interface?
I think, this has a big drawback. Say, we have a server process and several client processes. If any process unlinks the object (by mistake) all new clients would not find the server. (This can be prohibited by user permissions on the according file, but still...)
Would it not be better, if it had reference counting and the name would be removed automatically when the last object is closed? Why would you want to keep it open?
Because they are low level tools that could be used when performance matters. Deleting the object when it is not used to create it again on next use has a (slight) performance penalty against keeping it alive.
I once used a named semaphore that I used to synchronize accesses to a spool with various producers and consumers. I used an init module to create the named semaphore that was called as part of the boot process, and all other processes knew that the well known semaphore should exist.
If you want a more programmer friendly way that creates the object on demand and destroys it when it is no longer used, you can build a higher level library and encapsulate the creation/unlink operations in it. But if the system call included it, it would not be possible to build a user level library avoiding it.
Would it not be better, if it had reference counting and the name would be removed automatically when the last object is closed?
No.
Because unlink() can fail and because always unlinking a resource that can be shared between processes when all processes merely close that resource simply doesn't fit the paradigm of a shared resource.
You don't demolish a roller coaster just because there's no one waiting in line to ride it again at that moment in time.
I am new to Scala.
I am trying to figure out how to ensure thread safety with functions in a Scala object (aka singleton)
From what I have read so far, it seems that I should keep visibility to function scope (or below) and use immutable variables wherever possible. However, I have not seen examples of where thread safety is violated, so I am not sure what other precautions should be taken.
Can someone point me to a good discussion of this issue, preferably with examples of where thread safety is violated?
Oh man. This is a huge topic. Here's a Scala-based intro to concurrency and Oracle's Java lessons actually have a pretty good intro as well. Here's a brief intro that motivates why concurrent reading and writing of shared state (of which Scala objects are particular specific case) is a problem and provides a quick overview of common solutions.
There's two (fundamentally related) classes of problems when it comes to thread safety and state mutation:
Clobbering (missing) writes
Inaccurate (changing out from under you) reads
Let's look at each of these in turn.
First clobbering writes:
object WritesExample {
var myList: List[Int] = List.empty
}
Imagine we had two threads concurrently accessing WritesExample, each of executes the following updateList
def updateList(x: WritesExample.type): Unit =
WritesExample.myList = 1 :: WritesExample.myList
You'd probably hope when both threads are done that WritesExample.myList has a length of 2. Unfortunately, that might not be the case if both threads read WritesExample.myList before the other thread has finished a write. If when both threads read WritesExample.myList it is empty, then both will write back a list of length 1, with one write overwriting the other, so that in the end WritesExample.myList only has a length of one. Hence we've effectively lost a write we were supposed to execute. Not good.
Now let's look at inaccurate reads.
object ReadsExample {
val myMutableList: collection.mutable.MutableList[Int]
}
Once again, let's say we had two threads concurrently accessing ReadsExample. This time each of them executes updateList2 repeatedly.
def updateList2(x: ReadsExample.type): Unit =
ReadsExample.myMutableList += ReadsExample.myMutableList.length
In a single-threaded context, you would expect updateList2, when repeatedly called, to simply generate an ordered list of incrementing numbers, e.g. 0, 1, 2, 3, 4,.... Unfortunately, when multiple threads are accessing ReadsExample.myMutableList with updateList2 at the same time, it's possible that between when ReadsExample.myMutableList.length is read and when the write is finally persisted, ReadsExample.myMutableList has already been modified by another thread. So in theory you could see something like 0, 0, 1, 1 or potentially if one thread takes longer to write than another 0, 1, 2, 1 (where the slower thread finally writes to the list after the other thread has already accessed and written to the list three times).
What happened is that the read was inaccurate/out-of-date; the actual data structure that was updated was different from the one that was read, i.e. was changed out from under you in the middle of things. This is also a huge source of bugs because many invariants you might expect to hold (e.g. every number in the list corresponds exactly to its index or every number appears only once) hold in a single-threaded context, but fail in a concurrent context.
Now that we've motivated some of the problems, let's dive into some of the solutions. You mentioned immutability so let's talk about that first. You might notice that in my example of clobbering writes I use an immutable data structure whereas in my inconsistent reads example I use a mutable data structure. That is intentional. They are in a sense dual to one another.
With immutable data structures you cannot have an "inaccurate" read in the sense I laid out above because you never mutate data structures, but rather place a new copy of a data structure in the same location. The data structure cannot change out from under you because it cannot change! However you can lose a write in the process by placing a version of a data structure back to its original location that does not incorporate a change made previously by another process.
With mutable data structures on the other hand, you cannot lose a write because all writes are in-place mutations of the data structure, but you can end up executing a write to a data structure whose state differs from when you analyzed it to formulate the write.
If it's a "pick your poison" kind of scenario, why do you often hear advice to go with immutable data structures to help with concurrency? Well immutable data structures make it easier to ensure invariants about the state being modified hold even if writes are lost. For example, if I rewrote the ReadsList example to use an immutable List (and a var instead), then I could confidently say that the integer elements of the list will always correspond to the indices of the list. This means that your program is much less likely to enter an inconsistent state (e.g. it's not hard to imagine that a naive mutable set implementation could end up with non-unique elements when mutated concurrently). And it turns out that modern techniques for dealing with concurrency usually are pretty good at dealing with missing writes.
Let's look at some of those approaches that deal with shared state concurrency. At their hearts they can all be summed up as various ways of serializing read/write pairs.
Locks (a.k.a. directly try to serialize read/write pairs): This is usually the one you'll hear first as a fundamental way of dealing with concurrency. Every process that wants to access state first places a lock on it. Any other process is now excluded from accessing that state. The process then writes to that state and on completion releases the lock. Other processes are now free to repeat the process. In our WritesExample, updateList would first acquire the lock before executing and releasing the lock; this would prevent other processes from reading WritesExample.myList until the write was completed, thereby preventing them from seeing old versions of myList that would lead to clobbering writes (note that are more sophisticated locking procedures that allow for simultaneous reads, but let's stick with the basics for now).
Locks often do not scale well to multiple pieces of state. With multiple locks, often you need to acquire and release locks in a certain order otherwise you can end up deadlocking or livelocking.
The Oracle and Twitter docs linked a the beginning have good overviews of this approach.
Describe Your Action, Don't Execute It (a.k.a. build up a serial representation of your actions and have someone else process it): Instead of accessing and modifying state directly, you describe an action of how to do this and then give it to someone else to actually execute the action. For example, you might pass messages to an object (e.g. actors in Scala) that queues up these requests and then executes them one-by-one on some internal state that it never directly exposes to anyone else. In the particular case of actors, this improves the situation over locks by removing the need to explicitly acquire and release locks. As long as you encapsulate all the state you need to access at once in a single object, message passing works great. Actors break down when you distribute state across multiple objects (and as such this is heavily discouraged in this paradigm).
Akka actors are one good example of this in Scala.
Transactions (a.k.a. temporarily isolate some reads and writes from others and let the isolation system serialize things for you): Wrap all your read/writes in transactions that ensure during the course of your reads and writes your view of the world is isolated from any other changes. There's usually two ways of achieving this. Either you go for an approach similar to locks where you prevent other people from accessing the data while a transaction is running or you restart a transaction from the very beginning whenever you detect that a change has occurred to the shared state and throw away any progress you've made (usually the latter for performance reasons). On the one hand, transactions, unlike locks and actors, scale to disparate pieces of state very well. Just wrap all your accesses in transactions and you're good to go. On the other hand, your reads and writes have to be side-effect-free because they might be thrown away and retried many times and you can't really undo most side effects.
And if you're really unlucky, although you usually can't truly deadlock with a good implementation of transactions, a long-lived transaction can constantly be interrupted by other short-lived transactions such that it keeps getting thrown away and retried and never actually succeeds (which amounts to something like livelocking). In effect you're giving up direct control of serialization order and hoping your transaction system orders things sensibly.
Scala's STM library is a good example of this approach.
Remove Shared State: The final "solution" is to rethink the problem altogether and try to think about whether you truly need global, shared state that is writable. If you don't need writable shared state, then concurrency problems go away altogether!
Everything in life is about trade-offs and concurrency is no exception. When thinking about concurrency first understand what state you have and what invariants you want to preserve about that state. Then use that to guide your decision as to what kind of tools you want to use to tackle the problem.
The Thread Safety Problem section within this Scala concurrency article might be of interest to you. In essence, it illustrates the thread safety problem using a simple example and outlines 3 different approaches to tackle the problem, namely synchronization, volatile and AtomicReference:
When you enter synchronized points, access volatile references, or
deference AtomicReferences, Java forces the processor to flush their
cache lines and provide a consistent view of data.
There is also a brief overview comparing the cost of the 3 approaches:
AtomicReference is the most costly of these two choices since you
have to go through method dispatch to access values. volatile and
synchronized are built on top of Java’s built-in monitors. Monitors
cost very little if there’s no contention. Since synchronized allows
you more fine-grained control over when you synchronize, there will be
less contention so synchronized tends to be the cheapest option.
This is not specific to Scala, if your object contains a state that can be modified concurrently thread safety can be violated depending on the implementation. For example:
object BankAccount {
private var balance: Long = 0L
def deposit(amount: Long): Unit = balance += amount
}
In this case the the object is not thread safe, there are a lot of approachs to make it thread safe, for example using Akka, or synchronized blocks. For simplicity I will write it using synchronized blocks
object BankAccount {
private var balance: Long = 0L
def deposit(amount: Long): Unit =
this.synchronized {
balance += amount
}
}
I'm passing user information as part of each call to a stateful service. I use this information for audit purposes in the service.
Do I have to pass this information around within the service, or is there some other mechanism such as a user context to keep the data in that I can access globally? I have used thread storage (data slots) in the past to hold data, but since the code is async I assume that won't work?
There's a concept called CallContext, that could help. Read more here: http://blog.stephencleary.com/2013/04/implicit-async-context-asynclocal.html
You can't keep that information in some member in the service since you have no control over how concurrent calls are being executed async on multiple threads. The way the runtime allows Tasks and async methods to 'jump' between means that ThreadStorage is not a safe way to keep that context.
What you need is something that can carry all the way from your ServiceRemotingDispatcher down to the execution of your actual service methods and is not affected by concurrent calls (to potentially the same method). The way the underlying Service Fabric implementation executes your method means that there is multiple Tasks and async methods that are being called, this also means that simply based on this the Thread id is out of the question as an option since it is almost guaranteed to jump threads at least once before you reach your code.
I wrote this answer to a similar question (that was deleted, so I added this q back myself). It basically solves your question and it is based on CallContext as mentioned by #LoekD
From the «Learning concurrent programming in Scala» book:
In current versions of Scala (2.11.1), however, certain collections that are
deemed immutable, such as List and Vector, cannot be shared without
synchronization. Although their external API does not allow you to
modify them, they contain non-final fields.
Could anyone demonstrate this with a small example? And does this still apply to 2.11.7?
The behavior of changes made in one thread when viewed from another is governed by the Java Memory Model. In particular, these rules are extremely weak when it comes to something like building a collection and then passing the built-and-now-immutable collection to another thread. The JMM does not guarantee that the other thread won't see an earlier view where the collection was not fully built!
Since synchronized blocks enforce an ordering, they can be used to get a consistent view if they're used on every single operation.
In practice, though, this is rarely actually necessary. On the CPU side, there is typically a memory barrier operation that can be used to enforce memory consistency (i.e. if you write the tail of your list and then pass a memory barrier, no other thread can see the tail un-set). And in practice, JVMs usually have to implement synchronized by using memory barriers. So one could hope that you could just pass the created list within a synchronzied block, trusting that a memory barrier would be issued, and everything thereafter would be fine.
Unfortunately, the JMM doesn't require that it be implemented in this way (and you can't assume that the memory-barrier-like behavior of object creation will actually be a full memory barrier that applies to everything in that thread as opposed to simply the final fields of that object), which is both why the recommendation is what it is, and why it's not fixed (yet, anyway) in the library.
For what it's worth, on x86 architectures, I've never observed a problem if you hand off the immutable object within a synchronized block. I have observed problems if you try to do it with CAS (e.g. by using the java.util.concurrent.atomic classes).
As an addition to the excellent answer from Rex Kerr:
it should be noted that most common use cases of immutable collections in a multithreading context are not affected by this problem. The only situation where this might affect you is when you do something that you probably should not do in the first place.
E.g. you have a variable var x: Vector[Int], which you write from one thread A and read from another thread B.
If you mark x with #volatile, there will be no problem, since the volatile write introduces a memory barrier. So you will never be able to observe the Vector in an inconsistent state. The same is true when using a synchronized { } block when writing and reading, or when using java.util.concurrent.atomic.AtomicReference.
If you don't mark x with #volatile, you might observe the vector in an inconsistent state (not just wrong elements, but internally inconsistent!). But in that case your code is arguably broken to begin with. It is completely undefined when you will see the changes from A in B.
You might see them
immediately
after there is a memory barrier somewhere else in your program
not at all
depending on the architecture you`re running on, the phase of the moon, whatever. So as Viktor Klang put it: "Unsafe publication is unsafe..."
Note that if you use a higher level concurrency framework such as akka actors, it is also guaranteed that receivers of messages can not see immutable collections in an inconsistent state.
In Apple's Core Data documentation for Concurrency with Core Data, they list the preferred method for thread safety as using a seperate NSManagedObjectContext per thread, with a shared NSPersistentStoreCoordinator.
If I have a number of NSOperations running one after the other on an NSOperationQueue, will there be a large overhead creating the context with each task?
With the NSOperationQueue having a max concurrent operation count of 1, many of my operations will be using the same thread. Can I use the thread dictionary to create one NSManagedObjectContext per thread? If I do so, will I have problems cleaning up my contexts later?
What’s the correct way to use Core Data in this instance?
The correct way to use Core Data in this case is to create a separate NSManagedObjectContext per operation or to have a single context which you lock (via -[NSManagedObjectContext lock] before use and -[NSManagedObjectContext unlock] after use). The locked approach might make sense if the operations are serial and there are no other threads using the context.
Which approach to use is an empirical question that can't be asnwered without data. There are too many variables to have a general rule. Hard numbers from performance testing are the only way to make an informed decision.
Operations started using NSOperationQueue using a maximum concurrent operation count of 1 will not run all operations on the same thread. The operations will be executed one after the other, but a new thread will be created every time.
So creating objects in the thread dictionary will be of little use.
While this question is old, it's actually at the top of Google's search results on 'NSMangedObjectContext threading', so, I'll just drop in a new answer.
The new 'preferred' method is to use the initWithConcurrencyType: and tell the MOC whether it's a main thread MOC or a secondary thread moc. You can then use the new performBlock: and performBlockAndWait: methods on it and the MOC will take care of serializing operations on it's 'native' thread.
The issue then becomes how do you intelligently handle merging the data between the various MOCs your application may spawn, along with a thousand other details that make life 'fun' as a programmer.