copy to a list, pool, set of threads (1:n) / each thread local storage - threadpool

I'm currently exploring boost::thread/threadpool and thread local storage basically to achieve a copy of one datagram to the job-queues for a pool of threads.
The current setup uses a 1:1 setup to copy loosely coupled one datagram from one thread to another using a custom notification queue. That queue does support multiple reader threads but every reader consumes the message.
I'm currently exploring to extend the queue by a thread_local_storage variant to get each datagram copied to the job-queue of each thread in the pool.
But I was wondering if boost might already support this kind of operation despite I coudln't find anything. Does boost out-of-the-box support a single copy to the local storage of every thread in a pool?
Thank you very much!

Why would you want a copy of the data to every thread? The only reason I can think of is, because the thread is making changes to the data that should not be visible to other threads. If no changes are made, simply pass a reference (aka pointer) to the data to every thread. If changes have to be made, pass a reference to every thread too and let the thread make a copy of the data. If you need the data to be released, when it's not used any more, use a shared pointer.
Thread local storage is mostly used to emulate old style static variables.

Related

Asynchronouos Socket Communication & Heap fragmentation

I wrote a multithreaded Socket Server application which accepts over a 1000 concurrent connections. Recently we had application crash; after analyzing the dump files came to know app has crash due to heap corruption. I found the same issue discussed in following links.
.NET Does NOT Have Reliable Asynchronouos Socket Communication?
http://support.microsoft.com/kb/947862
And also discussion suggest 3 solutions.
The network application should have an upper bound on the number of outstanding asynchronous IO that it posts.
Use Microsoft CCR
Use TPL
Due to the time factor, I thought to stick with #1, but I don't have a clear picture how to implement this. Can some one give a good starting point please?
And also has anyone used Async with TPL to solve this issue?
You mean a better starting point than the blog posting that I linked to in the answer that you refer to?
The issue is this:
Memory and other per-operation resources that are used during an async write are often "in use" until the remote peer's TCP stack acks the data and the local stack can complete your async write operation to tell you that you can reuse your buffer.
The local peer has no control over this as it's all governed by the speed at which the remote peer reads data from its socket and the congestion on the link between the two peers.
Because of the above you need to have a hard limit on the amount of async writes that you have outstanding at any one time. You can track this by incrementing a counter just before you issue an async write and decrementing it in the completion handler.
What you do once you hit that limit is up to you. In the original article I favour a queue that data to be written is placed into. This queue can then be used as a source of data as write completions occur. Once the queue is empty you can send normally again. Of course this simply moves the problem - you still have a memory resource that's controlled by the remote peer (the queued data) but you don't also have other OS resources used too (non-paged pool, I/O page lock limit, etc).
You could simply stop your peer sending when you reach your limit - and now the API that you build over the async API needs to have a 'can't sent at the moment, try again later' return from a send which previously used to always "work".
If you're doing this I would also seriously look at avoiding the pinned memory issue by allocating a large block of buffers in one contiguous block and using them from the pool.
First, that's a very old KB article. How are you sure you have that particular problem?
Then, as Hans Passant answers in the SO question, if you write bad async code, it will bite you. If you don't take care of your resources (and memory buffers are resources), a concurrent program will face memory errors
It's very hard to write good concurrent code using raw Threads and TPL does make it easier but it won't fix the bugs you already have. In fact, unless you identify your current problems you are likely to transfer them to the version that uses TPL.
Without knowing the specific problem that caused your application to crash, I can only make some suggestions:
Use BufferManager to reuse memory buffers instead of allocating new ones.
Use a queue to store requests and process them asynchronously instead of starting a new thread for each request.
There are other techniques you can use as well, depending on the type of application you are building. Eg you could use TPL DataFlow to break processing in independent steps.
As for CCR, there is not much point in using it outside Robotics Studio. TPL contains most of the relevant functionality you need to write concurrent apps.

iphone - Should I use NSOperationQueue and NSOperation instead of NSThread?

I am facing a design problem of my app.
Basically, the followings are what I am going to do in my app.
A single task is like this:
Read custom object from the underlying CoreData databse
Download a json from a url
Parse the json to update the custom object or create a new one (parsing may take 1 - 3 secs, big data)
Analyse the custom object (some calculations will be involved, may take 1 - 5 sec)
Save the custom object into CoreData database.
There may be a number of tasks being executed concurrently.
The steps within one task obviously are ordered (i.e., without step 2 downloading the json, step 3 cannot continue), but they also can be discrete. I mean, for example, task2's step 4 can be executed before task1's step 3 (if maybe task2's downloading is faster than task1's)
Tasks have priorities. User can start a task with higher priority so all the task's steps will be tried to be executed before all others.
I hope the UI can be responsive as much as possible.
So I was going to creating a NSThread with lowest priority.
I put a custom priority event queue in that thread. Every step of a task becomes an event (work unit). So, for example, step 1 downloading a json becomes an event. After downloading, the event generates another event for step 3 and be put into the queue. every event has its own priority set.
Now I see this article: Concurrency and Application Design. Apple suggests that we Move Away from Threads and use GCD or NSOperation.
I find that NSOperation match my draft design very much. But I have following questions:
In consideration of iPhone/iPad cpu cores, should I just use one NSOperationQueue or create multiple ones?
Will the NSOperationQueue or NSOperation be executed with lowest thread priority? Will the execution affect the UI response (I care because the steps involve computations)?
Can I generate a NSOpeartion from another one and put it to the queue? I don't see a queue property in NSOperation, how do I know the queue?
How do I cooperate NSOperationQueue with CoreData? Each time I access the CoreData, should I create a new context? Will that be expensive?
Each step of a task become a NSOperation, is this design correct?
Thanks
In consideration of iPhone/iPad cpu cores, should I just use one NSOperationQueue or create multiple ones?
Two (CPU, Network+I/O) or Three (CPU, Network, I/O) serial queues should work well for most cases, to keep the app responsive and your programs streaming work by what they are bound to. Of course, you may find another combination/formula works for your particular distribution of work.
Will the NSOperationQueue or NSOperation be executed with lowest thread priority? Will the execution affect the UI response (I care because the steps involve computations)?
Not by default. see -[NSOperation setThreadPriority:] if you want to reduce the priority.
Can I generate a NSOpeartion from another one and put it to the queue? I don't see a queue property in NSOperation, how do I know the queue?
Sure. If you use the serial approach I outlined, locating the correct queue is easy enough -- or you could use an ivar.
How do I cooperate NSOperationQueue with CoreData? Each time I access the CoreData, should I create a new context? Will that be expensive?
(no comment)
Each step of a task become a NSOperation, is this design correct?
Yes - dividing your queues to the resource it is bound to is a good idea.
By the looks, NSOperationQueue is what you're after. You can set the number of concurrent operations to be run at the same time. If using multiple NSOperation, they will all run at the same time ... unless you handle a queue on your own, which will be the same as using NSOperationQueue
Thread priority ... I'm not sure what you mean, but in iOS, the UI drawing, events and user interaction are all run on the main thread. If you are running things on the background thread, the interface will still be responsive, no matter how complicated or cpu-heavy operations you are running
Generating and handling of operations you should do it on the main thread, as it won't take any time, you just run them in a background thread so that your main thread doesn't get locked
CoreData, I haven't worked much with it specifically, but so far every Core~ I've worked with it works perfectly on background threads, so it shouldn't be a problem
As far as design goes, it's just a point of view ... As for me, I would've gone with having one NSOperation per task, and have it handle all the steps. Maybe write callbacks whenever a step is finished if you want to give some feedback or continue with another download or something
The affection of computation when multithreading is not going to be different just because you are using NSThread instead of NSOperation. However keep in mind that must current iOS devices are using dual core processors.
Some of the questions you have are not very specific. You may or may not want to use multiple NSOperationQueue. It all depends on how you want to approach it. if you have different NSOperation subclasses, or different NSBlockOperations, you can manage order of execution by using priorities, or you might want to have different queues for different types of operations (especially when working with serial queues). I personally prefer to use 1 operation queue when dealing with the same type of operation, and have a different operation queue when the operations are not related/dependable. This gives me the flexibility to cancel and stop the operations within a queue based on something happening (network dropping, app going to the background).
I have never found a good reason to add an operation based on something happening during the execution of a current operation. Should you need to do so, you can use NSOperationQueue's class method, currentQueue, which will give you the operation queue in which the current operation is operating.
If you are doing core data work using NSOperation, i would recommend to create a context for each particular operation. Make sure to initialize the context inside the main method, since this is where you are on the right thread of the NSOperation background execution.
You do not necessarily need to have one NSOperation object for each task. You can download the data and parse it inside the NSOperation. You can also do the data download abstractly and do the data manipulation of the content downloaded using the completion block property of NSOperation. This will allow you to use the same object to get the data, but have different data manipulation.
My recommendation would be to read the documentation for NSOperation, NSBlockOperation and NSOperationQueue. Check your current design to see how you can adapt these classes with your current project. I strongly suggest you to go the route of the NSOperation family instead of the NSThread family.
Good luck.
Just to add to #justin's answer
How do I cooperate NSOperationQueue with CoreData? Each time I access
the CoreData, should I create a new context? Will that be expensive?
You should be really careful when using NSOperation with Core Data.
What you always have to remember here is that if you want to run CoreData operations on a separate thread you have to create a new NSManagedObjectContext for that thread, and share the main's Managed Object Context persistant store coordinator (the "main" MOC is the one in the app delegate).
Also, it's very important that the new Managed Object Context for that thread is create from that thread.
So if you plan to use Core Data with NSOperation make sure you initialize the new MOC in NSOperation's main method instead of init.
Here's a really good article about Core Data and threading
Use GCD - its a much better framework than NS*
Keep all your CoreData access on one queue and dispatch_async at the end of your routines to save back to your CoreData database.
If you have a developer account, check this WWDC video out: https://developer.apple.com/videos/wwdc/2012/?id=712

Core Data - How to force NSPrivateQueueConcurrencyType context save in serial?

I was excited about the newly supported concurrent functions of CoreData since iOS 5.
A private queue is maintained and all save or fetch requests can be done via that queue.
However, can I set up the private queue for CoreData so that it executes request one by one?
My app is downloading news items from a number of feeds. Each time after downloading and parsing from one feed are finished, I just save the feed's items into CoreData via the private queue.
However, since I am downloading and parsing from multiple feeds simultaneously, I always have multiple groups of items, i.e., multiple save requests, for the CoreData.
Now the situation is that I guess CoreData just have a number of threads and each one is saving a group of items into the db. My UI got stuck in the mean time.
Do you think I can control the private queue so that no matter how many simultaneous save requests are, they will be done one by one?
Core Data is (probably) only using one serial queue or thread since its serial. I recently converted my app from using a serial queue I had created (app was 4.3) to use this new option in iOS 5. In all cases when you 'performBlock' the method is handled in a serial fashion. Also, you can now call '[moc performBlocK:...]' from any queue as that call is thread safe!
I believe what you want to do is have your background threads, which are most likely adding options, to use 'performBlock:' (without the wait). The block you provide is then queued and processed in a FIFO fashion. Later on, if your table wants to get objects, it can issue a 'performBlockAndWait:', or optionally your code can ask for the latest objects using performBlock, and at the end of the supplied block message back to your app the set of objects you need.
Also, I only ever save often in development builds, to verify validity. Once you are pretty sure things are working OK, you can then just perform a background save once all the data is downloaded.
EIDT: To reiterate - if you are downloading and also using images or other data while loading a viewController, and lots of things are going on, this is the WORST time to do a save. Use a timer or dispatch_after, and many seconds after everything seems stable THEN do the save.

Class which handles tasks in thread

I want to make a class which as long as an instance of it is alive, keeps a thread (worker) going and when someone calls a method on it - performTaskWithData:(NSData*)data - then it should process this data in its worker thread.
If additional data is sent while an operation is taking place, then this new data/operation should be queued until the previous processing is done.
I need each instance of this helper class to hold one single worker thread (i.e. the same thread should handle all the processing).
How should I go about doing this?
NSRunLoop? Synchronize access to data block being passed?
Starting in iOS4, Grand Central Dispatch provides by far the simplest and most powerful interface to multithreaded programming.
If you're a registered developer, go watch some of the WWDC videos from 2010 about it. It's intimidating at first, but it's actually really simple and good.
You can do this directly with NSThreads and run loops. However, I would consider using NSOperationQueues, one per instance of your class and set the maximum concurrency of the queue to 1. Your performTaskWithData: would simply add a new instance of a subclass of NSOperation to the queue and that's it.

How to prevent NSManagedObjectContext from corrupting the database when a thread gets killed while saving?

In this question, octy wrote:
BTW, if you save on a background
thread, you also need to consider what
happens when your app is terminated
while a save operation is in progress.
Background threads get killed right
away, whereas the watchdog waits 5
seconds for the main thread to finish
up.
Now I've spent all day implementing NSOperation and creating NSManagedObjectContext instances directly inside the NSOperation subclasses so every NSOperation owns it's own non-shared MOC. But now this is very bad news since a scenario like this, which likely happens all the time, would corrupt the Core Data database. I mean it can't start to write a half byte of something in the sqlite3 file and then just stop right away.
And then there's another problem: In my NSOperations I also do File I/O with NSFileManager.
So what can I do about this? Must I keep track of all running NSOperations and NSOperationQueues in my app und take care of them quickly in the App Delegate when the app gets terminated, so that I can tell the NSOperations to SFF (Save F*****g Fast) or cancel all operations, grab their MOC's and "hard-save" them immediately? What's best practice to solve this problem?
And why am I hearing about this the first time in my career? I mean none of the NSOperationQueue and Core Data mentioning books even talks about this but it seems it's a random app killer that forces the user to re-install (and possibly lose tons of data) if we don't take explicit care of this.
The main power of MOC is that represents an abstract storage that is independent from the format. It stores all it's objects in the memory and uses the mechanism of transactions to commit any changes. So when you're inserting/deleting/editing some objects that in MOC they are only changed in memory and not in persistent store (whether it is SQLite database, XML file or whatever). The changes are only committed when you call save: method.
As for NSOperation and file handling: if you want to stop some operation, you should call cancel for that. From docs:
This method does not force your
operation code to stop. Instead, it
updates the object’s internal flags to
reflect the change in state. If the
operation has already finished
executing, this method has no effect.
Canceling an operation that is
currently in an operation queue, but
not yet executing, makes it possible
to remove the operation from the queue
sooner than usual.
It means that if there is some IO operation running when you're canceling an operation operation would wait until it is finished.
Also despite the Core Data transaction mechanism you should implement your own one for any data that is not managed by it (if needed).