i want to render 30 different images. Each task has to merge different image layers to just one final image- 30 final images.
Currently i use a GCD serial queue. Now i want to know if this approach uses the CPU power of all available cores automatically?
Or can i improve the rendertime for all these tasks when using a GCD concurrent queue instead?
Thanks for clarification..
Serial queue = 1 task = 1 core. But the real problem in your use case is I/O contention. What happens if you spawn a concurrent queue to read from one resource? you end up with the CPU(s) sitting idle on each block while they take turns reading the disk. GCD reacts to idle CPU increasing the thread pool. In this case that results in too many threads and even more contention.
The solution is to use dispatch_io functions for the reading, and do the image processing on a different concurrent queue, which will be free to grow as needed.
dispatch_queue_t imageProcessing = dispatch_queue_create("com.yourReverseDomainHere", DISPATCH_QUEUE_CONCURRENT);
for (NSURL *url in ...){
dispatch_io_t io = dispatch_io_create_with_path(DISPATCH_IO_RANDOM,[[url path] fileSystemRepresentation], O_RDONLY, 0, NULL, NULL);
dispatch_io_set_low_water(io, SIZE_MAX);
dispatch_io_read(io, 0, SIZE_MAX, dispatch_get_main_queue(),^(bool done, dispatch_data_t data, int error){
// convert the file from dispatch_data_t to NSData
const void *buffer = NULL;
size_t size = 0;
dispatch_data_t tmpData = dispatch_data_create_map(data, &buffer, &size);
NSData *nsdata = [[NSData alloc] initWithBytes:buffer length:size];
dispatch_release(tmpData);
free(buffer);
// send this nsdata elsewhere for processing
dispatch_async(imageProcessing, ^{
// ...image processing code...
});
});
}
A serial queue runs one task at a time and thus only uses one core at a time per serial queue (though which core is used at any time is not defined and can change).
Related
I am observing high CPU usage in my UDP server implementation which runs an infinite loop expecting 15 1.5KB packets every milliseconds. It looks like below:
struct RecvContext
{
enum { BufferSize = 1600 };
RecvContext()
{
senderSockAddrLen = sizeof(sockaddr_storage);
memset(&overlapped, 0, sizeof(OVERLAPPED));
overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
memset(&sendersSockAddr, 0, sizeof(sockaddr_storage));
buffer.clear();
buffer.resize(BufferSize);
wsabuf.buf = (char*)buffer.data();
wsabuf.len = ULONG(buffer.size());
}
void CloseEventHandle()
{
if (overlapped.hEvent != INVALID_HANDLE_VALUE)
{
CloseHandle(overlapped.hEvent);
overlapped.hEvent = INVALID_HANDLE_VALUE;
}
}
OVERLAPPED overlapped;
int senderSockAddrLen;
sockaddr_storage sendersSockAddr;
std::vector<uint8_t> buffer;
WSABUF wsabuf;
};
void Receive()
{
DWORD flags = 0, bytesRecv = 0;
SOCKET sockHandle =...;
while (//stopping condition//)
{
std::shared_ptr<RecvContext> _recvContext = std::make_shared<IO::RecvContext>();
if (SOCKET_ERROR == WSARecvFrom(sockHandle, &_recvContext->wsabuf, 1, nullptr, &flags, (sockaddr*)&_recvContext->sendersSockAddr,
(LPINT)&_recvContext->senderSockAddrLen, &_recvContext->overlapped, nullptr))
{
if (WSAGetLastError() != WSA_IO_PENDING)
{
//error
}
else
{
if (WSA_WAIT_FAILED == WSAWaitForMultipleEvents(1, &_recvContext->overlapped.hEvent, FALSE, INFINITE, FALSE))
{
//error
}
if (!WSAGetOverlappedResult(sockHandle, &_recvContext->overlapped, &bytesRecv, FALSE, &flags))
{
//error
}
}
}
_recvContext->CloseEventHandle();
// async task to process _recvContext->buffer
}
}
The cpu consumption for this udp server is very high even when the packets are not being processed post receipt. How can the cpu consumption be improved here?
You've chosen about the most inefficient combination of mechanisms imaginable.
Why use overlapped I/O if you're only going to pend one operation and then wait for it complete?
Why use an event, which is about the slowest notification scheme that Windows has.
Why do you only pend one operation at a time? You're forcing the implementation to stash datagrams in its own buffers and then copy them into yours.
Why do you post the receive operation right before you're going to wait for it to complete rather than right after the previous one completes?
Why do you create a new receive context each time instead of re-using the existing buffer, event, and so on?
Use IOCP. Windows events are very slow and heavy.
Post lots of operations. You want the operating system to be able to put the datagram right in your buffer rather than having to allocate another buffer that it copies data into and out of.
Re-use your buffers and allocate all your receive buffers from a contiguous pool rather than fragmenting them throughout process memory. The memory used for your buffers has to be pinned and you want to minimize the amount of pinning needed.
Re-post operations as soon as they complete. Don't process them and then re-post. There's no reason to delay starting the operation. You can probably ignore this if you followed all the other suggestions because you wouldn't have a "spare" buffer to post anyway.
Alternatively, you can probably get away with having a thread that spins on a blocking receive operation. Just make sure your code has a loop that is as tight as possible, posting a different (already-allocated) buffer as soon as it returns after dispatching another thread to process the buffer it just filled with the receive operation.
The problem is I need to modify (update/create/delete) from 0 to 10000 NSManagedObject's subclasses. Of course if it's <= 1000 everything works fine. I'm using this code:
+ (void)saveDataInBackgroundWithBlock:(void (^)(NSManagedObjectContext *))saveBlock completion:(void (^)(void))completion {
NSManagedObjectContext *tempContext = [self newMergableBackgroundThreadContext];
[tempContext performBlock:^{
if (saveBlock) {
saveBlock(tempContext);
}
if ([tempContext hasChanges]) {
[tempContext saveWithCompletion:completion];
} else {
dispatch_async(dispatch_get_main_queue(), ^{
if (completion) {
completion();
}
});
}
}];
}
- (void)saveWithCompletion:(void(^)(void))completion {
[self performBlock:^{
NSError *error = nil;
if ([self save:&error]) {
NSNumber *contextID = [self.userInfo objectForKey:#"contextID"];
if (contextID.integerValue == VKCoreDataManagedObjectContextIDMainThread) {
dispatch_async(dispatch_get_main_queue(), ^{
if (completion) {
completion();
}
});
}
[[self class] logContextSaved:self];
if (self.parentContext) {
[self.parentContext saveWithCompletion:completion];
}
} else {
[VKCoreData handleError:error];
dispatch_async(dispatch_get_main_queue(), ^{
if (completion) {
completion();
}
});
}
}];
}
completion will be fired only when main-thread context will be saved. This solution works just perfect, but
When I get more than 1000 entities from server I would like to parallel objects processing, cause update operation takes too much time (for example, 4500 updating about 90 seconds and less than 1/3 of this time takes JSON receiving process, so about 60 seconds I just drilling NSManagedObjects). Without CoreData it's pretty easy by using dispatch_group_t to divide data into subarray and process it in different threads at the same time, but... is somebody knows how to make something similar with CoreData and NSManagedObjectContexts? Is it possible to working with NSManagedObjectContext with NSPrivateQueueConcurrencyType (iOS 5 style) without performBlock: ? And what is the best way to save and merge about 10 contexts? Thanks!
By your description, it appears you are grasping at straws to recover performance.
Core Data file I/O performance is dominated by the single threaded nature of SQLite. Having multiple contexts beating on the same store coordinator is not going to make things go faster.
To improve performance, you need to do things differently. For example, you could batch your background writes into larger operations. (How? You need to do more in each GCD block before the save.) You can use Core Data's debugging tools to see what kind of SQL is being emitted by your fetches and saves. (There are lots of ways to improve CD fetch performance, fewer to improve saving.)
ok people, after I finish implementing all I want I discovered the following:
dispatch_group_t with different PrivateQueues and NSManagedObjectContexts results:
format is "number of entities/secs":
333 /6
1447/27
3982/77
Single background thread (NSManagedObjectContext + NSPrivateQueueConcurrencyType + performBlock:)
333 /1
1447/8
3982/47
So think I shouldn't try it again, also there is a lot of another issues like app freezes while merging a great number of context (even in background). I will try something else to improve performance.
You can create multiple contexts and process a slice of your data on each one...?
This question already has answers here:
dispatch_async vs. dispatch_sync using Serial Queues in Grand Central Dispatch
(3 answers)
Closed 10 years ago.
When I submit two blocks into a serial queue by dispatch_async, does it assure the second one runs after the first one:
dispatch_async(serial_queue, b1);
dispatch_async(serial_queue, b2);
can we assure b1 runs before b2?
Below is full source code section:
#define COUNTER 10000
m_value = 0;
dispatch_queue_t globalQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_queue_t queue = dispatch_queue_create("myqueue", NULL);
dispatch_group_t group = dispatch_group_create();
for (int i = 0; i < COUNTER; ++i) {
dispatch_group_async(group, globalQueue, ^() {
dispatch_async(queue, ^() {
++m_value;
});
});
}
dispatch_group_notify(group, queue, ^() {
NSLog(#"m_value Actual: %d Expected: %d", m_value, COUNTER);
});
dispatch_release(queue);
dispatch_release(group);
queue = nil;
group = nil;
return YES;
Can we assure m_value == COUNTER always? Thanks
Blocks submitted to concurrent queues may execute concurrently, however from the Apple docs "Blocks submitted to a serial queue are executed one at a time in FIFO order." The main queue is serial as are any queues that you create with dispatch_create_queue.
according the documentation the blocks (on global queue) could be executed concurrently
"Blocks submitted to these global concurrent queues may be executed concurrently with respect to each other."
this may sound a newbie question anyway Im new to GCD,
I'm creating and running these two following threads. The first one puts data into ivar mMutableArray and the second one reads from it. How do i lock and unclock the threads to avoid crashes and keep the code thread safe ?
// Thread for writing data into mutable array
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_source_t timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);
if (timer) {
dispatch_source_set_timer(timer, dispatch_time(DISPATCH_TIME_NOW, NSEC_PER_SEC * interval), interval * NSEC_PER_SEC, leeway);
dispatch_source_set_event_handler(timer, ^{
...
// Put data into ivar
[mMutableArray addObject:someObject];
...
});
dispatch_resume(timer);
}
// Thread for reading from mutable array
dispatch_source_t timer1 = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);
if (timer1) {
dispatch_source_set_timer(timer1, dispatch_time(DISPATCH_TIME_NOW, NSEC_PER_SEC * interval), interval * NSEC_PER_SEC, leeway);
dispatch_source_set_event_handler(timer1, ^{
...
if (mMutableArray) {
// Read data from ivar
SomeObject* someobject = [mMutableArray lastObject];
}
...
});
dispatch_resume(timer1);
}
You are using it wrong, by locking access to the variables you are simply losing any benefit from GCD. Create a single serial queue which is associated to the variables you want to modify (in this case the mutable array). Then use that queue to both write and read to it, which will happen in guaranteed serial sequence and with minimum locking overhead. You can read more about it in "Asynchronous setters" at http://www.fieryrobot.com/blog/2010/09/01/synchronization-using-grand-central-dispatch/. As long as your access to the shared variable happens through its associated dispatch queue, you won't have concurrency problems ever.
I've used mutexes in my project and I'm quite happy with how it works at the moment.
Create the mutex and initialise it
pthread_mutex_t *mutexLock;
pthread_mutex_init(&_mutex, NULL);
Then put the lock around your code, once the second thread tries to get the lock, it will wait until the lock is freed again by the first thread. Note that you still might want to check if the first thread is actually the one writing to it.
pthread_mutex_lock(self_mutex);
{
** Code here
}
pthread_mutex_unlock(self_mutex);
You can still use #synchronized on your critical sections with GCD
I have two methods that I need to run, lets call them metA and metB.
When I start coding this app, I called both methods without using threads, but the app started freezing, so I decided to go with threads.
metA and metB are called by touch events, so they can occur any time in any order. They don't depend on each other.
My problem is the time it takes to either threads start running. There's a lag between the time the thread is created with
[NSThread detachNewThreadSelector:#selector(.... bla bla
and the time the thread starts running.
I suppose this time is related to the amount of time required by iOS to create the thread itself. How can I speed this? If I pre create both threads, how do I make them just do their stuff when needed and never terminate? I mean, a kind of sleeping thread that is always alive and works when asked and sleeps after that?
thanks.
If you want to avoid the expensive startup time of creating new threads, create both threads at startup as you suggested. To have them only run when needed, you can have them wait on a condition variable. Since you're using the NSThread class for threading, I'd recommend using the NSCondition class for condition variables (an alternative would be to use the POSIX threading (pthread) condition variables, pthread_cond_t).
One thing you'll have to be careful of is if you get another touch event while the thread is still running. In that case, I'd recommend using a queue to keep track of work items, and then the touch event handler can just add the work item to the queue, and the worker thread can process them as long as the queue is not empty.
Here's one way to do this:
typedef struct WorkItem
{
// information about the work item
...
struct WorkItem *next; // linked list of work items
} WorkItem;
WorkItem *workQueue = NULL; // head of linked list of work items
WorkItem *workQueueTail = NULL; // tail of linked list of work items
NSCondition *workCondition = NULL; // condition variable for the queue
...
-(id) init
{
if((self = [super init]))
{
// Make sure this gets initialized before the worker thread starts
// running
workCondition = [[NSCondition alloc] init];
// Start the worker thread
[NSThread detachNewThreadSelector:#selector(threadProc:)
toTarget:self withObject:nil];
}
return self;
}
// Suppose this function gets called whenever we receive an appropriate touch
// event
-(void) onTouch
{
// Construct a new work item. Note that this must be allocated on the
// heap (*not* the stack) so that it doesn't get destroyed before the
// worker thread has a chance to work on it.
WorkItem *workItem = (WorkItem *)malloc(sizeof(WorkItem));
// fill out the relevant info about the work that needs to get done here
...
workItem->next = NULL;
// Lock the mutex & add the work item to the tail of the queue (we
// maintain that the following invariant is always true:
// (workQueueTail == NULL || workQueueTail->next == NULL)
[workCondition lock];
if(workQueueTail != NULL)
workQueueTail->next = workItem;
else
workQueue = workItem;
workQueueTail = workItem;
[workCondition unlock];
// Finally, signal the condition variable to wake up the worker thread
[workCondition signal];
}
-(void) threadProc:(id)arg
{
// Loop & wait for work to arrive. Note that the condition variable must
// be locked before it can be waited on. You may also want to add
// another variable that gets checked every iteration so this thread can
// exit gracefully if need be.
while(1)
{
[workCondition lock];
while(workQueue == NULL)
{
[workCondition wait];
// The work queue should have something in it, but there are rare
// edge cases that can cause spurious signals. So double-check
// that it's not empty.
}
// Dequeue the work item & unlock the mutex so we don't block the
// main thread more than we have to
WorkItem *workItem = workQueue;
workQueue = workQueue->next;
if(workQueue == NULL)
workQueueTail = NULL;
[workCondition unlock];
// Process the work item here
...
free(workItem); // don't leak memory
}
}
If you can target iOS4 and higher, consider using blocks with Grand Central Dispatch asynch queue, which operates on background threads which the queue manages... or for backwards compatibility, as mentioned use NSOperations inside an NSOperation queue to have bits of work performed for you in the background. You can specify exactly how many background threads you want to support with an NSOperationQueue if both operations have to run at the same time.