How to save objects using Multi-Threading in Core Data? - iphone

I'm getting some data from a web service and saving it in Core Data. The workflow looks like this:
1. get the XML feed
2. go over every item in that feed and create a new managed object for each feed item
3. download some big binary data for every item and save it into the managed object
4. call [managedObjectContext save:]
Now, the problem is of course performance: everything runs on the main thread. I'd like to move as much as possible to another thread, but I'm not sure where I should start. Is it OK to put everything (steps 1-4) on a separate thread?

Yes. I recommend reviewing both Apple's docs on multi-threaded Core Data and my article on the MDN (Mac Developer Network), http://www.mac-developer-network.com/columns/coredata/may2009/, which discuss what you need to avoid and how to set everything up.
BTW, saving a lot of binary data into a Core Data object is generally a bad idea. The rule of thumb goes:
- less than 100 KB: save it in the entity itself
- less than 1 MB: save it in a separate entity hanging off a relationship
- more than 1 MB: save it to disk and store its path in the managed object
Therefore you could spin off the download of the binary data onto separate threads, save the files to disk, and then hand the main thread the NSManagedObjectID of the referencing object along with the path so the main thread can do the very quick and easy linking. That would let your Core Data implementation stay single threaded and only spin off the data downloads.
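A rough sketch of that hand-off (assuming GCD is available and ARC; finishImportForObjectID:path: is a made-up method on your main-thread controller that fetches the object and sets its file path):

- (void)downloadDataForObjectID:(NSManagedObjectID *)objectID fromURL:(NSURL *)url {
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // Blocking download, but safely off the main thread
        NSData *data = [NSData dataWithContentsOfURL:url];
        if (data == nil) return; // real code would report the error

        // Write the binary payload to disk instead of into Core Data
        NSString *cachesDir = [NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) lastObject];
        NSString *path = [cachesDir stringByAppendingPathComponent:[[NSProcessInfo processInfo] globallyUniqueString]];
        [data writeToFile:path atomically:YES];

        dispatch_async(dispatch_get_main_queue(), ^{
            // Back on the main thread: pass the objectID and path so the main MOC does the linking
            [self finishImportForObjectID:objectID path:path];
        });
    });
}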

What are the best practices to cache the data?

What are the best practices for caching data in iOS apps connected to a data source via a web service?
You should look at NSCache:
http://developer.apple.com/library/mac/#documentation/Cocoa/Reference/NSCache_Class/Reference/Reference.html
An NSCache object is a collection-like container, or cache, that stores key-value pairs, similar to the NSDictionary class. Developers often incorporate caches to temporarily store objects with transient data that are expensive to create. Reusing these objects can provide performance benefits, because their values do not have to be recalculated. However, the objects are not critical to the application and can be discarded if memory is tight. If discarded, their values will have to be recomputed again when needed.
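For example, a minimal NSCache sketch (the key scheme and the expensive renderThumbnailForKey: method are made up for illustration):

- (UIImage *)thumbnailForKey:(NSString *)key {
    static NSCache *thumbnailCache = nil;
    if (thumbnailCache == nil) {
        thumbnailCache = [[NSCache alloc] init];
        [thumbnailCache setCountLimit:100]; // lets the cache evict entries when memory is tight
    }
    UIImage *image = [thumbnailCache objectForKey:key];
    if (image == nil) {
        image = [self renderThumbnailForKey:key]; // expensive to create, cheap to throw away
        if (image != nil) [thumbnailCache setObject:image forKey:key];
    }
    return image;
}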
It depends on the type of data.
For binary data (files):
- cache your files in the Caches folder using NSFileManager and NSData's writeToFile:atomically: (see the sketch after this list)
For small amounts of data (ASCII/UTF-8):
- use NSUserDefaults
For large amounts of data (ASCII/UTF-8):
- use an sqlite3 database
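Something like this for the file case (untested; the fileName parameter is just an example):

- (NSString *)cacheData:(NSData *)data withFileName:(NSString *)fileName {
    // Write the payload into the Caches directory, which the system may purge when space is low
    NSString *cachesDir = [NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) lastObject];
    NSString *path = [cachesDir stringByAppendingPathComponent:fileName];
    BOOL ok = [data writeToFile:path atomically:YES];
    return ok ? path : nil;
}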
It depends on how much data you want to cache and how you'll be accessing it once you have it cached, and a bunch of other cache management issues.
If you have a small amount of data, you could store that in a dictionary or array, and simply write it out and read it in. But this kind of solution can become slow if you have a lot of data; those reads and writes can take a long time. And flushing a dirty cache to disk means writing the whole object.
You could write individual files, but again, if you have a lot of files that might become a performance issue as well.
Another alternative is to use CoreData. If you have a lot of data (say, many objects) it may make sense to define what those look like as CoreData entities. Then you just store and fetch objects as you need them, falling back to fetching from your web service (and then caching) if the data is not local. You can also optimize other cache management tasks (like expiring unused entries) easily and efficiently using CoreData.
I actually went down this road, with a couple different apps. I started with an NSDictionary, and that became quite slow. I switched to CoreData, which not only simplified a lot of my code for cache initialization and management, but gave the apps quite a performance boost in the process.
If you're using NSURLConnection, or anything that uses NSURLRequest, caching is already taken care of for you:
http://developer.apple.com/library/ios/#documentation/Cocoa/Conceptual/URLLoadingSystem/Tasks/UsingNSURLConnection.html#//apple_ref/doc/uid/20001836-169425
http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/URLLoadingSystem/Concepts/CachePolicies.html#//apple_ref/doc/uid/20001843-BAJEAIEE
By default these use the caching policies of the protocol, which for a web service would be the HTTP headers it returns. This is also true, IIRC, of ASIHTTPRequest.
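If you ever need to override the protocol's policy for a single request, it's just a parameter on the request (the URL here is a placeholder, and self is assumed to be your connection delegate):

NSURLRequest *request = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://example.com/feed.xml"]
                                         cachePolicy:NSURLRequestReturnCacheDataElseLoad
                                     timeoutInterval:30.0];
[NSURLConnection connectionWithRequest:request delegate:self];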
Core Data also implements its own row and object caching, which works pretty well. So in practice you really don't need to worry about caching when it comes to these things; it's optimizing your use of objects like NSDateFormatter that starts to become important (they're expensive to create, not thread-safe, etc.).
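For instance, create the formatter once and reuse it for every row instead of allocating a new one each time (fine as long as you only call this from the main thread, since NSDateFormatter isn't thread-safe):

- (NSString *)displayStringForDate:(NSDate *)date {
    static NSDateFormatter *sharedFormatter = nil;
    if (sharedFormatter == nil) {
        // Created once, reused for every call
        sharedFormatter = [[NSDateFormatter alloc] init];
        [sharedFormatter setDateStyle:NSDateFormatterMediumStyle];
    }
    return [sharedFormatter stringFromDate:date];
}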
And when in doubt, use Instruments to find bottlenecks and latency.

Core Data with only one Data Context. Is it right?

I'm trying to make my first application using Objective C + Core Data, but I'm not sure it's the correct way, as it feels really weird to me.
I have only one data context, which I create at launch time, in the Application Delegate. This data context is used for all the operations (read, write). In another environment (C# and LINQ, for example), I try to make these operations as unitary as possible. Here it seems I just have to create the data context once and work with it without ever closing it (except when the application exits).
I also have an asynchronous operation in which I update this data. Of course, it uses the same data context again. It works, but doesn't feel right.
My Application Delegate keeps an NSArray of the objects contained in Core Data. I use this same NSArray in all my views.
I would actually naturally close the data context once I got all the objects I require, but... aren't the objects always attached to the data context? If I close or release the data context, all these objects will get released as well, right?
As you can notice, there is something I'm missing here :) Thanks for your help.
The NSManagedObjectContext to which you refer is more of a "scratchpad" than a database connection. Objects are created, amended, destroyed in this working area, and only persisted ("written to the database" if you prefer) when you tell the MOC to save state. You can (and should) init and release MOCs if you are working in separate threads, but the App Delegate makes a MOC available so that all code executing on the main thread can use the same context. This is both convenient, and saves you from having to ensure that multiple MOCs are kept in sync with each other.
By keeping an NSArray of Core Data objects, you are in effect duplicating its functionality. Is there any reason for not working with an NSSet of Core Data objects provided by the MOC?
If you are working asynchronously, then you should not be sharing an NSManagedObjectContext object across threads, as they are not thread-safe. Instead, create one for each thread, but set them to use same NSPersistentStoreCoordinator. This will serialise their access to the persisted data, but you'll need to use notifications to make them each aware of the others changes.
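A bare-bones version of that setup might look like this (memory management omitted; appDelegate, mainContext and the selector names are illustrative):

// On the background thread: a private context pointed at the shared coordinator
- (NSManagedObjectContext *)newBackgroundContext {
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
    [context setPersistentStoreCoordinator:[appDelegate persistentStoreCoordinator]];
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(backgroundContextDidSave:)
                                                 name:NSManagedObjectContextDidSaveNotification
                                               object:context];
    return context;
}

// Merge the background thread's saves into the main-thread context
- (void)backgroundContextDidSave:(NSNotification *)notification {
    [mainContext performSelectorOnMainThread:@selector(mergeChangesFromContextDidSaveNotification:)
                                  withObject:notification
                               waitUntilDone:YES];
}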
There is a good tutorial/description on how to use Core Data on multiple threads here:
http://www.duckrowing.com/2010/03/11/using-core-data-on-multiple-threads/
1) CORE DATA AND THREADS, WITHOUT THE HEADACHE
http://www.cimgf.com/2011/05/04/core-data-and-threads-without-the-headache/
2) Concurrency with Core Data
http://developer.apple.com/library/ios/#documentation/cocoa/conceptual/CoreData/Articles/cdConcurrency.html
3) Multi-Context CoreData
http://www.cocoanetics.com/2012/07/multi-context-coredata/

How to sync CoreData and a REST web service asynchronously and at the same time properly propagate any REST errors into the UI

Hey, I'm working on the model layer for our app here.
Some of the requirements are like this:
It should work on iPhone OS 3.0+.
The source of our data is a RESTful Rails application.
We should cache the data locally using Core Data.
The client code (our UI controllers) should have as little knowledge about any network stuff as possible and should query/update the model with the Core Data API.
I've checked out the WWDC10 Session 117 on Building a Server-driven User Experience, spent some time checking out the Objective Resource, Core Resource, and RestfulCoreData frameworks.
The Objective Resource framework doesn't talk to Core Data on its own and is merely a REST client implementation. Core Resource and RestfulCoreData both assume you talk to Core Data in your code, and they solve all the nuts and bolts in the background on the model layer.
All looks okay so far, and initially I thought either Core Resource or RestfulCoreData would cover all of the above requirements, but... there's a couple of things none of them seems to solve correctly:
The main thread should not be blocked while saving local updates to the server.
If the saving operation fails the error should be propagated to the UI and no changes should be saved to the local Core Data storage.
Core Resource happens to issue all of its requests to the server when you call - (BOOL)save:(NSError **)error on your managed object context and is therefore able to provide a correct NSError instance if the underlying requests to the server fail somehow. But it blocks the calling thread until the save operation finishes. FAIL.
RestfulCoreData keeps your -save: calls intact and doesn't introduce any additional waiting time for the client thread. It merely watches for the NSManagedObjectContextDidSaveNotification and then issues the corresponding requests to the server in the notification handler. But this way the -save: call always completes successfully (well, given Core Data is okay with the saved changes) and the client code that actually called it has no way to know the save might have failed to propagate to the server because of some 404 or 421 or whatever server-side error occurred. And what's more, the local storage ends up with the updated data, but the server never knows about the changes. FAIL.
So, I'm looking for a possible solution / common practices in dealing with all these problems:
I don't want the calling thread to block on each -save: call while the network requests happen.
I want to somehow get notifications in the UI that some sync operation went wrong.
I want the actual Core Data save to fail as well if the server requests fail.
Any ideas?
You should really take a look at RestKit (http://restkit.org) for this use case. It is designed to solve the problems of modeling and syncing remote JSON resources to a local Core Data backed cache. It supports an offline mode for working entirely from the cache when there is no network available. All syncing occurs on a background thread (network access, payload parsing, and managed object context merging) and there is a rich set of delegate methods so you can tell what is going on.
There are three basic components:
1. the UI action and persisting the change to Core Data
2. persisting that change up to the server
3. refreshing the UI with the response from the server
An NSOperation + NSOperationQueue will help keep the network requests orderly. A delegate protocol will help your UI classes understand what state the network requests are in, something like:
@protocol NetworkOperationDelegate <NSObject>
- (void)operation:(NSOperation *)op willSendRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entityID;
- (void)operation:(NSOperation *)op didSuccessfullySendRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entityID;
- (void)operation:(NSOperation *)op encounteredAnError:(NSError *)error afterSendingRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entityID;
@end
The protocol format will of course depend on your specific use case but essentially what you're creating is a mechanism by which changes can be "pushed" up to your server.
Next there's the UI loop to consider. To keep your code clean it would be nice to call save: and have the changes automatically pushed up to the server. You can use the NSManagedObjectContextDidSaveNotification for this.
- (void)managedObjectContextDidSave:(NSNotification *)saveNotification {
    NSSet *inserted = [[saveNotification userInfo] objectForKey:NSInsertedObjectsKey];
    for (NSManagedObject *obj in inserted) {
        // create a new NSOperation for this entity which will invoke the appropriate REST API
        // and add it to the operation queue
    }
    // do the same thing for deleted and updated objects
}
The computational overhead of creating the network operations should be rather low; however, if it creates a noticeable lag in the UI, you could simply grab the entity IDs out of the save notification and create the operations on a background thread.
If your REST API supports batching, you could even send the entire array across at once and then notify your UI that multiple entities were synchronized.
The only issue I foresee, and for which there is no "real" solution is that the user will not want to wait for their changes to be pushed to the server to be allowed to make more changes. The only good paradigm I have come across is that you allow the user to keep editing objects, and batch their edits together when appropriate, i.e. you do not push on every save notification.
This becomes a sync problem and not one that's easy to solve. Here's what I'd do: use one context in your iPhone UI, and then use another context (on another thread) to download the data from your web service. Once it's all there, go through the sync/importing processes recommended below and then refresh your UI after everything has imported properly. If things go bad while accessing the network, just roll back the changes in the non-UI context. It's a bunch of work, but I think it's the best way to approach it.
Core Data: Efficiently Importing Data
Core Data: Change Management
Core Data: Multi-Threading with Core Data
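In outline, the roll-back idea looks something like this (appDelegate and importChangesFromResponse:intoContext:error: stand in for whatever import routine those guides describe):

- (void)importResponse:(NSData *)response {
    // Separate scratchpad context for the background import
    NSManagedObjectContext *importContext = [[NSManagedObjectContext alloc] init];
    [importContext setPersistentStoreCoordinator:[appDelegate persistentStoreCoordinator]];

    NSError *error = nil;
    if ([self importChangesFromResponse:response intoContext:importContext error:&error] &&
        [importContext save:&error]) {
        // success: the did-save notification lets the UI context merge and refresh
    } else {
        [importContext rollback]; // throw away everything staged in this context
    }
}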
You need a callback that runs on the other thread (the one where the actual server interaction happens) and then puts the result code/error info into a semi-global structure which the UI thread checks periodically. Make sure that the write of the value that serves as the flag is atomic, or you'll have a race condition: say your error response is 32 bytes, you need an int (which does have atomic access) that you keep in the off/false/not-ready state until the larger data block has been written, and only then write "true" to flip the switch, so to speak.
For the correlated saving on the client side, you have to either hold on to that data and not save it until you get an OK from the server, or make sure you have some kind of rollback option, say a way to delete if the server call failed.
Beware that it's never going to be 100% safe unless you do a full two-phase commit (the client save or delete can fail after the signal from the server), but that costs you at least 2 trips to the server (and might cost you 4 if your only rollback option is delete).
Ideally, you'd do the whole blocking version of the operation on a separate thread, but you'd need iOS 4.0 for that.

Good strategies for REST -> XML -> Core Data -> UITableView?

What are good practices for asynchronously pulling large amounts of XML from a RESTful service into a Core Data store, and from this store, populating a UITableView on the fly?
I'm thinking of using libxml2's xmlParseChunk() function to parse chunks of incoming XML and translate a node and its children into the relevant managed objects, as nodes come in.
At the same time that these XML nodes are turned into managed objects, I want to generate UITableView rows, in turn. Say, 50 rows at a time. Is this realistic?
In your experience, what do you do to accomplish this task, to maintain performance and handle, potentially, thousands of rows? Are there different, simpler approaches that work as well or better?
Sure, this is a pretty standard thing. The easiest solution is to do the loading in a background thread with one MOC, and have the UI running on the main thread with its own MOC. Whenever you get a chunk of data you want to have appear (say 50 entries), you have the background MOC save:.
Assuming you have the foreground MOC rigged to merge changes (via mergeChangesFromContextDidSaveNotification:) then whenever you save the background MOC the foreground MOC will get all of those changes. Assuming you are using NSFetchedResultsController it has delegate methods to cope with changes in its MOC, and if you are using Apple's sample code then you probably already have everything setup correctly.
In general CoreData is going to be faster than anything you roll yourself unless you really know what you are doing and are willing to spend a ton of time tuning for your specific case. The biggest thing you can do is make sure that slow things (like XML processing and synchronous flash I/O caused by save:) are not on the main thread blocking user interaction.
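The NSFetchedResultsController delegate glue mentioned above can be as simple as this (Apple's template code does a finer-grained, row-by-row version):

- (void)controllerDidChangeContent:(NSFetchedResultsController *)controller {
    // Fires after merged changes land in the UI context; just refresh the table
    [self.tableView reloadData];
}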
Joe Hewitt (Facebook app developer) has released much of his code as open source. It is called Three20. There is a class there that is great for fetching internet data and populating it into a table, without needing the data beforehand. The classes used for this are called TTTableViewController and TTTableViewDataSource.
From here, it would not be much of a stretch to store as CoreData, just subclass the classes as you see fit with the supplied hooks.
If you are worried about too much data, 50 at a time does sound reasonable. These classes have a built in "More" button to help you out.
From the Three20 readme:
Internet-aware table view controllers
TTTableViewController and TTTableViewDataSource help you to build tables which load their content from the Internet. Rather than just assuming you have all the data ready to go, like UITableView does by default, TTTableViewController lets you communicate when your data is loading, and when there is an error or nothing to display. It also helps you to add a "More" button to load the next page of data, and optionally supports reloading the data by shaking the device.
No one has mentioned RestKit yet? My friends ... seriously, you have to check this out. If you are doing anything with REST on iOS (and now on OS X) and particularly if you're wanting to work with Core Data ... PLEASE have a look at RestKit. I've saved countless hours implementing some pretty complex data synchronization between a server and my Core Data models on iOS. RestKit made it so damned easy, it almost makes you sick.

Core data, file downloads, and thread-safety

What's the preferred approach for constantly sharing data across threads when using Core Data? I am downloading a large file and want to show progress of the download in a UIProgressBar. The actual download is happening in a background thread created by NSOperation.
The download information (local path, total bytes, bytes received) is modeled as a Core Data managed object, and the actual file is stored in the Documents/ directory. One solution that came to my mind was to create a separate managed object context in the background thread and pass it the objectID and pull it up using the objectWithID: method. Whenever the background thread does a save, the main thread gets a notification and the main context merges those changes and the table view is subsequently updated.
This approach works, but the save can't be done too frequently or the UI freezes up. So the UI is updated after every X KB of data is received, where X has to be at least 500 KB for the UI to be somewhat responsive. Is there a better approach to pass the download progress data to the main thread as it is received?
EDIT: Would using KVO be of any help? If yes, do you know of any good tutorials on the topic?
I know you already built your own system, but I use ASIHTTPRequest for all my network operations. It is very robust and has tons of goodies like file resuming, saving directly to disk, upload progress monitoring, download progress monitoring, and the kitchen sink. If you don't use it, you can look at the source to see how they do it, because the UI never freezes when I use the progress reporting in this framework.
Although I am going to use ASIHTTPRequest for my project, it's still good to mention my solution to the problem for completeness. It is kind of obvious, but saving the Core Data context as frequently as every couple of seconds is a terrible mistake.
Instead, I added a progress delegate to the download operation, which gets update notifications on the main thread:
NSNumber *bytesDownloaded = [NSNumber numberWithLongLong:[data length]];
[downloadDelegate performSelectorOnMainThread:@selector(updateProgress:) withObject:bytesDownloaded waitUntilDone:NO];
The important thing was to pass the download progress information to the delegate on the main thread. The delegate updates the progress, keeps accumulating changes and either saves when the download completes or at much bigger intervals.
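On the delegate side it could look something like this (assumes ARC; bytesReceived, expectedTotalBytes, progressBar, downloadInfo, lastSaveDate and managedObjectContext are illustrative ivars):

- (void)updateProgress:(NSNumber *)bytesDownloaded {
    // Runs on the main thread, so the progress bar can be touched directly
    bytesReceived += [bytesDownloaded longLongValue];
    progressBar.progress = (float)bytesReceived / (float)expectedTotalBytes;

    // Persist progress only occasionally (or on completion), never on every chunk
    if ([[NSDate date] timeIntervalSinceDate:lastSaveDate] > 30.0) {
        downloadInfo.bytesReceived = [NSNumber numberWithLongLong:bytesReceived];
        [managedObjectContext save:NULL];
        lastSaveDate = [NSDate date];
    }
}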