Strategy for reading and writing to threaded data structure

Strategy for reading and writing to threaded data structure - iphone

I have a class that contains an NSDictionary and periodically, I have a thread writing data into this NSDictionary. Then at other times, I have another view controller reading data out of the class's NSDictionary.
What's the best objective-c way to make the data in this class thread-safe so that if you were to ask data for 'read', you're getting the correct version aka, the last written version and not the one that maybe getting written to currently?

As Carl mentioned, #synchronized is one option.
If you are targeting iOS 4.0+, another one is using a Grand Central Dispatch queue to regulate access to a shared data structure from multiple threads/queues. The WWDC 2010 Session 211 video has a good explanation of this technique.
In a nutshell: you create a custom GCD queue (dispatch_queue_create()) whose single responsibility is to regulate access to the shared data structure. All code that accesses the shared structure then must do so from inside this queue. Because queues only execute one block of code at a time, no two threads can access the data structure at the same time.

You are looking for #synchronize, I think.

Related

How to store sagas’ data?

From what I read aggregates must only contain properties which are used to protect their invariants.
I also read sagas can be aggregates which makes sense to me.
Now I modeled a registration process using a saga: on RegistrationStarted event it sends a ReserveEmail command which will trigger an EmailReserved or EmailReservationFailed given if the email is free or not. A listener will then either send a validation link or a message telling an account already exists.
I would like to use data from the RegistrationStarted event in this listener (say the IP and user-agent). How should I do it?
Storing these data in the saga? But they’re not used to protect invariants.
Pushing them through ReserveEmail command and the resulting event? Sounds tedious.
Project the saga to the read model? What about eventual consistency?
Another way?

Rinat Abdullin wrote a good overview of sagas / process managers.
The usual answer is that the saga has copies of the events that it cares about, and uses the information in those events to compute the command messages to send.
List[Command] processManager(List[Event] events)
Pushing them through ReserveEmail command and the resulting event?
Yes, that's the usual approach; we get a list [RegistrationStarted], and we use that to calculate the result [ReserveEmail]. Later on, we'll get [RegistrationStarted, EmailReserved], and we can use that to compute the next set of commands (if any).
Sounds tedious.
The data has to travel between the two capabilities somehow. So you are either copying the data from one message to another, or you are copying a correlation identifier from one message to another and then allowing the consumer to decide how to use the correlation identifier to fetch a copy of the data.
Storing these data in the saga? But they’re not used to protect invariants.
You are typically going to be storing events in the sagas (to keep track of what has happened). That gives you a copy of the data provided in the event. You don't have an invariant to protect because you are just caching a copy of a decision made somewhere else. You won't usually have the process manager running queries to collect additional data.
What about eventual consistency?
By their nature, sagas are always going to be "eventually consistent"; the "state" of an instance of a saga is just cached copies of data controlled elsewhere. The data is probably nanoseconds old by the time the saga sees it, there's no point in pretending that the data is "now".
If I understand correctly I could model my saga as a Registration aggregate storing all the events whose correlation identifier is its own identifier?
Udi Dahan, writing about CQRS:
Here’s the strongest indication I can give you to know that you’re doing CQRS correctly: Your aggregate roots are sagas.

iOS: using GCD with Core Data

at the heart of it, my app will ask the user for a bunch of numbers, store them via core data, and then my app is responsible for showing the user the average of all these numbers.
So what I figure I should do is that after the user inputs a new number, I could fire up a new thread, fetch all the objects in a NSFetchDescription instance and call it on my NSManagedObjectContext, do the proper calculations, and then update the UI on the main thread.
I'm aware that the rule for concurrency in Core Data is one thread per NSManagedObjectContext instance so what I want to know is, do you I think can what I just described without having my app explode 5 months down the line? I just don't think it's necessary to instantiate a whole a new context just to do some measly calculations...

Based on what you have described, why not just store the numbers as they are entered into a CoreData model and also into an NSMutableArray? It seems as though you are storing these for future retrieval in case someone needs to look at (and maybe modify) a previous calculation. Under that scenario, there is no need to do a fetch after a current set of numbers is entered. Just use a mutable array and populate it with all the numbers for the current calculation. As a number is entered, save it to the model AND to the array. When the user is ready to see the average, do the math on the numbers in the already populated array. If the user wants to modify a previous calculation, retrieve those numbers into an array and work from there.
Bottom line is that you shouldn't need to work with multiple threads and merging Contexts unless you are populating a model from a large data set (like initial seeding of a phonebook, etc). Modifying a Context and calling save on that context is a very fast thing for such a small change as you are describing.

I would say you may want to do some testing, especially in regard to the size of the data set. if it is pretty small, the sqlite calls are pretty fast so you may get away with doing in on the main queue. But if it is going to take some time, then it would be wise to get it off the main thread.
Apple introduced the concept of parent and child managed object contexts in 2011 to make using MO contexts on different threads easier. you may want to check out the WWDC videos on Core Data.
You can use NSExpression with you fetch to get really high performance functions like min, max, average, etc. here is a good link. There are examples on SO
http://useyourloaf.com/blog/2012/01/19/core-data-queries-using-expressions.html
Good luck!

Cocoa app layout with Core Data and lots of business logic

For those who have seen my other questions: I am making progress but I haven't yet wrapped my head around this aspect. I've been pouring over stackoverflow answers and sites like Cocoa With Love but I haven't found an app layout that fits (why such a lack of scientific or business app examples? recipe and book examples are too simplistic).
I have a data analysis app that is laid out like this:
Communication Manager (singleton, manages the hardware)
DataController (tells Comm.mgr what to do, and checks raw data it receives)
Model (receives data from datacontroller, cleans, analyzes and stores it)
MainViewController (skeleton right now, listens to comm.mgr to present views and alerts)
Now, never will my data be directly shown on a view (like a simple table of entities and attributes), I'll probably use core plot to plot the analyzed results (once I figure that out). The raw data saved will be huge (10,000's of points), and I am using a c++ vector wrapped in an ObjC++ class to access it. The vector class also has the encodeWithCoder and initWithCoder functions which use NSData as a transport for the vector. I'm trying to follow proper design practices, but I'm lost on how to get persistent storage into my app (which will be needed to store and review old data sets).
I've read several sources that say the "business logic" should go into the model class. This is how I have it right now, I send it the raw data, and it parses, cleans and analyzes the results and then saves those into ivar arrays (of the vector class). However, I haven't seen a Core Data example yet that has a Managed Object that is anything but a simple storage of very basic attributes (strings, dates) and they never have any business logic. So I wonder, how can I meld these two aspects? Should all of my analysis go into the data controller and have it manage the object context? If so, where is my model? (seems to break the MVC architecture if my data is stored in my controller - read: since these are vector arrays, I can't be constantly encoding and decoding them into NSData streams, they need a place to exist before I save them to disk with Core Data, and they need a place to exist after I retrieve them from storage and decode them for review).
Any suggestions would be helpful (even on the layout I've already started). I just drew some of the communication between objects to give you an idea. Also, I don't have any of the connections between the model and view/view controllers yet (using NSLog for now).

While vector<> is great for handling your data that you are sampling (because of its support for dynamically resizing underlying storage), you may find that straight C arrays are sufficient (even better) for data that is already stored. This does add a level of complexity but it avoids a copy for data arrays that are already of a known and static size.
NSData's -bytes returns a pointer to the raw data within an NSData object. Core Data supports NSData as one its attribute types. If you know the size of each item in data, then you can use -length to calculate the number of elements, etc.
On the sampling side, I would suggest using vector<> as you collect data and, intermittently, copy data to an NSData attribute and save. Note: I ran into a bit of problem with this approach (Truncated Core Data NSData objects) that I attribute to Core Data not recognizing changes made to NSData attribute when it is backed by an NSMutableData object and that mutable object's data is changed.
As for MVC question. I would suggest that data (model) is managed in by Model. Views and Controllers can ask Model for data (or subsets of data) in order to display. But ownership of data is with the Model. In my case, which may be similar to yours, there were times when the Model returns abridged data sets (using Douglas-Peucker algorithm). The views and controllers were none the wiser that points were being dropped - even though their requests to the Model may have played in a role in that (graph scaling factors, etc.).
Update
Here is a snippet of code from my Data class which extends NSManagedObject. For a filesystem solution, NSFileHandle's -writeData: and methods for monitoring file offset might allow similar (better) management controls.
// Exposed interface for adding data point to stored data
- (void) addDatum:(double_t)datum
{
[self addToCache:datum];
}
- (void) addToCache:(double_t)datum
{
if (cache == nil)
{
// This is temporary. Ideally, cache is separate from main store, but
// is appended to main store periodically - and then cleared for reuse.
cache = [NSMutableData dataWithData:[self dataSet]];
[cache retain];
}
[cache appendBytes:&datum length:sizeof(double_t)];
// Periodic copying of cache to dataSet could happen here...
}
// Called at end of sampling.
- (void) wrapup
{
[self setDataSet:[NSData dataWithData:cache]]; // force a copy to alert Core Data of change
[cache release];
cache = nil;
}

I think you may be reading into Core Data a bit too much. I'm not that experienced with it, so I speak as a non-expert, but there are basically two categories of data storage systems.
First is the basic database, like SQLite, PostgreSQL, or any number of solutions. These are meant to store data and retrieve it; there's no logic so you have to figure out how to manage the tables, etc. It's a bit tricky at times but they're very efficient. They're best at managing lots of data in raw form; if you want objects with that data you have to create and manage them yourself.
Then you have something like Core Data, which shouldn't be considered a database as much as an "object persistence" framework. With Core Data, you create objects, store them, and then retrieve them as objects. It's great for applications where you have large numbers of objects that each contain several pieces of data and relationships with other objects.
From my limited knowledge of your situation, I would venture a guess that a true database may be better suited to your needs, but you'll have to make the decision there (like I said, I don't know much about your situation).
As for MVC, my view is that the view should only contain display code, the model should only contain code for managing the data itself and its storage, and the controller should contain the logic that processes the data. In your case it sounds like you're gathering raw data and processing it before storing it, in which case you'd want to have another object to process the data before storing it in the model, then a separate controller to sort, manage, and otherwise prepare the data before the view receives it. Again, this may not be the best explanation (or the best methods for your situation) so take it with a grain of salt.
EDIT: Also, if you're looking on getting into Core Data more, I like this book. It explains the whole object-persistence-vs-database concept a lot better than I can.

Multithreaded use of Core Data (NSOperationQueue and NSManagedObjectContext)

In Apple's Core Data documentation for Concurrency with Core Data, they list the preferred method for thread safety as using a seperate NSManagedObjectContext per thread, with a shared NSPersistentStoreCoordinator.
If I have a number of NSOperations running one after the other on an NSOperationQueue, will there be a large overhead creating the context with each task?
With the NSOperationQueue having a max concurrent operation count of 1, many of my operations will be using the same thread. Can I use the thread dictionary to create one NSManagedObjectContext per thread? If I do so, will I have problems cleaning up my contexts later?
What’s the correct way to use Core Data in this instance?

The correct way to use Core Data in this case is to create a separate NSManagedObjectContext per operation or to have a single context which you lock (via -[NSManagedObjectContext lock] before use and -[NSManagedObjectContext unlock] after use). The locked approach might make sense if the operations are serial and there are no other threads using the context.
Which approach to use is an empirical question that can't be asnwered without data. There are too many variables to have a general rule. Hard numbers from performance testing are the only way to make an informed decision.

Operations started using NSOperationQueue using a maximum concurrent operation count of 1 will not run all operations on the same thread. The operations will be executed one after the other, but a new thread will be created every time.
So creating objects in the thread dictionary will be of little use.

While this question is old, it's actually at the top of Google's search results on 'NSMangedObjectContext threading', so, I'll just drop in a new answer.
The new 'preferred' method is to use the initWithConcurrencyType: and tell the MOC whether it's a main thread MOC or a secondary thread moc. You can then use the new performBlock: and performBlockAndWait: methods on it and the MOC will take care of serializing operations on it's 'native' thread.
The issue then becomes how do you intelligently handle merging the data between the various MOCs your application may spawn, along with a thousand other details that make life 'fun' as a programmer.

Should an NSLock instance be "global"?

Should I make a single NSLock instance in the application delegate, to be used by all classes? Or is it advisable to have each class instantiate its own NSLock instance as needed?
Would the locking work in the second case, if I, for example, had access to a managed object context that is spread across two view controllers?

If multiple objects access your object only to read its contents, then you do not need a lock at all. If at least one of the objects accesses your object to write/update its contents, then it does not matter if the other objects access your object to read or write/update it: in this case you need a lock.
Now, in order to correctly protect your object (in a critical section of code where multiple objects may access it), you must use the SAME LOCK INSTANCE which must then be shared by ALL of the possible objects accessing the object you are willing to protect.
If your application need to protect an object that may be accessed simultaneously by the majority of the classes, then having a single lock instance is fine. If you want better performances (especially if the number of simultaneous accesses to your object is high), then you can have multiple locks. Each lock will be responsible for allowing/denying access to a specific attribute/field of your object. This way, several objects may access your object changing a different attribute/field simultaneously. You are basically incrementing the number of concurrent operations on your object. However, each lock MUST STILL be shared among the other objects that will access the object you are protecting.
Having a lock instance for each controller simply does NOT work; this will NOT protect your object from concurrent accesses from other objects in different threads. NSLock is implemented using POSIX pthread mutexes, so it must be used in exactly the same way. This is also clearly stated in the NSLock documentation:
Warning: The NSLock class uses POSIX threads to implement its locking behavior. When sending an unlock message to an NSLock object, you must be sure that message is sent from the same thread that sent the initial lock message. Unlocking a lock from a different thread can result in undefined behavior.
So, in order to preserve the critical section semantics, it is the same thread that acquired the lock that is responsible for releasing it when done. Note also that the locking mechanism is intended for fast operations only, i.e. you should acquire a lock only for a short period of time before releasing it. If you need to wait for an unpredictable amount of time, then you need a different synchronization mechanism, namely a condition variable which is available through the NSCondition class.
Hope this helps.

You should not use locks with Core Data. That documentation is probably out of date. Ideally you should have one context per thread and let the context handle the locking of its underlying NSPersistentStoreCoordinator. This is considered the only safe way to use Core Data in a multi-threaded application currently.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse