Patterns for accessing remote data with Core Data? - iphone

I am trying to write a Core Data application for the iPhone that uses an external data source. I'm not really using Core Data to persist my objects but rather for the object life-cycle management. I have a pretty good idea on how to use Core Data for local data, but have run into a few issues with remote data. I'll just use Flickr's API as an example.
The first thing is that if I need, say, a list of the recent photos, I need to grab them from an external data source. After I've retrieved the list, it seems like I should iterate and create managed objects for each photo. At this point, I can continue in my code and use the standard Core Data API to set up a fetch request and retrieve a subset of photos about, say, dogs.
But what if I then want to continue and retrieve a list of the user's photos? Since there's a possibility that these two data sets might intersect, do I have to perform a fetch request on the existing data, update what's already there, and then insert the new objects?
--
In the older pattern, I would simply have separate data structures for each of these data sets and access them appropriately: a recentPhotos set and a usersPhotos set. But since the general pattern of Core Data seems to be to use one managed object context, it seems (I could be wrong) that I have to merge my data with the main pool of data. That seems like a lot of overhead just to grab a list of photos. Should I create a separate managed object context for the different sets? Should Core Data even be used here?
I think that what I find appealing about Core Data is that before (for a web service) I would make a request for certain data and either filter it in the request or filter it in code and produce a list I would use. With Core Data, I can just get a list of objects, add them to my pool (updating old objects as necessary), and then query against it. One problem I can see with this approach, however, is that if objects are externally deleted, I can't know, since I'm keeping my old data.
Am I way off base here? Are there any patterns people follow for dealing with remote data and Core Data? :) I've found a few posts of people saying they've done it, and that it works for them, but little in the way of examples. Thanks.

You might try a combination of two things. This strategy will give you an interface where you get the results of an NSFetchRequest twice: once synchronously, and once again when data has been loaded from the network.
1. Create your own subclass of NSFetchRequest that takes an additional block property to execute when the fetch is finished. This is for your asynchronous request to the network. Let's call it FLRFetchRequest.
2. Create a class to which you pass this request. Let's call it FLRPhotoManager. FLRPhotoManager has a method executeFetchRequest: which takes an instance of the FLRFetchRequest and:
- Queues your network request based on the fetch request and passes along the retained fetch request to be processed again when the network request is finished.
- Executes the fetch request against your Core Data cache and immediately returns the results.
Now when the network request finishes, update your core data cache with the network data, run the fetch request again against the cache, and this time, pull the block from the FLRFetchRequest and pass the results of this fetch request into the block, completing the second phase.
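A minimal sketch of what that might look like, assuming ARC; the block property's signature, the manager's context property, and the refreshFromNetworkForRequest:completion: helper are assumptions rather than anything from the original answer:

    #import <CoreData/CoreData.h>

    // FLRFetchRequest: an NSFetchRequest that carries a completion block.
    @interface FLRFetchRequest : NSFetchRequest
    @property (nonatomic, copy) void (^completionBlock)(NSArray *results, NSError *error);
    @end

    // FLRPhotoManager's executeFetchRequest: (sketch) -- cached results now,
    // fresh results later.
    - (NSArray *)executeFetchRequest:(FLRFetchRequest *)request {
        // Second phase: queue the network request; when it finishes, update
        // the Core Data cache, re-run the same fetch, and hand the fresh
        // results to the block. refreshFromNetworkForRequest:completion: is
        // a hypothetical wrapper around your Flickr (or other) API call.
        [self refreshFromNetworkForRequest:request completion:^{
            NSError *error = nil;
            NSArray *fresh = [self.context executeFetchRequest:request error:&error];
            if (request.completionBlock) request.completionBlock(fresh, error);
        }];
        // First phase: synchronously return whatever the cache already holds.
        NSError *error = nil;
        return [self.context executeFetchRequest:request error:&error];
    }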
This is the best pattern I have come up with, but like you, I'm interested in others' opinions.

It seems to me that your first instincts are right: you should use fetch requests to update your existing store. The approach I used for an importer was the following: get a list of all the files that are eligible for importing and store it somewhere. I'm assuming here that getting that list is fast and lightweight (just a name and a URL or unique id), but that really importing something will take a bit more time and effort and the user may quit the program or want to do something else before all the importing is done.
Then, on a separate background thread (this is not as hard as it sounds thanks to NSRunLoop and NSTimer; google "Core Data: Efficiently Importing Data"), get the first item of that list, get the object from Flickr or wherever, and search for it in the Core Data database (carefully read Apple's Predicate Programming Guide on setting up efficient, cached NSFetchRequests). If the remote object already lives in Core Data, update the information as necessary; if not, insert it. When that is done, remove the item from the to-be-imported list and move on to the next one.
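A find-or-insert sketch of that inner step; the "Photo" entity, its "flickrID"/"title" attributes, and the remoteID/remoteTitle variables are illustrative assumptions:

    // Look for an existing local copy of the remote item.
    NSFetchRequest *fetch = [[NSFetchRequest alloc] init];
    fetch.entity = [NSEntityDescription entityForName:@"Photo"
                               inManagedObjectContext:context];
    fetch.predicate = [NSPredicate predicateWithFormat:@"flickrID == %@", remoteID];
    fetch.fetchLimit = 1;

    NSError *error = nil;
    NSManagedObject *photo =
        [[context executeFetchRequest:fetch error:&error] lastObject];
    if (photo == nil) {
        // Not in the store yet: insert a new managed object.
        photo = [NSEntityDescription insertNewObjectForEntityForName:@"Photo"
                                              inManagedObjectContext:context];
        [photo setValue:remoteID forKey:@"flickrID"];
    }
    // Existing or freshly inserted, refresh its attributes from the remote data.
    [photo setValue:remoteTitle forKey:@"title"];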
As for the problem of objects that have been deleted in the remote store, there are two solutions: periodic syncing or lazy, on-demand syncing. Does importing a photo from Flickr mean importing the original thing and all its metadata (I don't know what the policy is regarding ownership etc) or do you just want to import a thumbnail and some info?
If you store everything locally, you could just run a check every few days or weeks to see if everything in your local store is present remotely as well: if not, the user may decide to keep the photo anyway or delete it.
If you only store thumbnails or previews, then you will need to connect to Flickr each time the user wants to see the full picture. If it has been deleted, you can then inform the user and delete it locally as well, or mark it as not being accessible any more.

For a situation like this you could use Cocoa's archiving facilities to save the photo objects (and an index) to disk between sessions, and just overwrite it all every time the app calls home to Flickr.
But since you're already using Core Data, and like the features it provides, why not modify your data model to include a "source" or "callType" attribute? At the moment you're implicitly creating a bunch of objects with source "Flickr API", but you can just as easily treat the different API calls as unique sources and then store that explicitly.
To handle deletion, the simplest way would be to clear the data store each time it's refreshed. Otherwise you'd need to iterate over everything and only delete the photo objects with filenames that weren't included in the new results.
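The iterate-and-delete variant could look roughly like this; the "Photo" entity, its "filename" attribute, and the freshFilenames array are assumptions:

    // Delete photos whose filename wasn't in the latest API results.
    NSFetchRequest *stale = [[NSFetchRequest alloc] init];
    stale.entity = [NSEntityDescription entityForName:@"Photo"
                               inManagedObjectContext:context];
    stale.predicate = [NSPredicate predicateWithFormat:@"NOT (filename IN %@)",
                                                       freshFilenames];
    NSError *error = nil;
    for (NSManagedObject *photo in [context executeFetchRequest:stale error:&error]) {
        [context deleteObject:photo];
    }
    [context save:&error];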
I'm planning to do something similar to this myself so I hope this helps.
PS: If you're not storing the photo objects between sessions at all, you could just use two different contexts and query them separately. As long as they're never saved, and the central store doesn't have anything in it already, it would work just like you describe.

Related

Best way to store dynamic data on iOS App from Web Service

I want to know what is the best way to store data on the iPhone from a web service.
I want the information to be stored on the device so the person doesn't need to access the web service every time he/she needs it. The current information isn't much and contains fewer than 150 records. The records might update from time to time and a few new ones will be added. What is the best way to go about storing the data?
Thanks
If you use ASIHTTPRequest for your network stuff (and if you don't already, I can't sing its praises highly enough), you will find it has a cache layer built in which is perfect for situations like this.
You can activate it with a single line:

    [ASIHTTPRequest setDefaultCache:[ASIDownloadCache sharedCache]];
And you have full control over the cache policy etc - just read the documentation.
The other simple approach of course is - on the assumption that your web service is returning JSON or XML - simply to store the response in a local file against a hash of the request parameters, then when you request the data again, you can first look to see if the file exists and if it does, return that data rather than going back to the website. You can roll your own cache policies etc too.
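A sketch of that roll-your-own approach, using an MD5 of the request URL as a stand-in for "a hash of the request parameters"; the url, cachesDir, and responseData variables are assumed to come from your surrounding networking code:

    #import <CommonCrypto/CommonDigest.h>

    // Derive a cache filename from the full request URL.
    static NSString *CacheKeyForURL(NSURL *url) {
        const char *bytes = [[url absoluteString] UTF8String];
        unsigned char digest[CC_MD5_DIGEST_LENGTH];
        CC_MD5(bytes, (CC_LONG)strlen(bytes), digest);
        NSMutableString *key = [NSMutableString string];
        for (int i = 0; i < CC_MD5_DIGEST_LENGTH; i++) {
            [key appendFormat:@"%02x", digest[i]];
        }
        return key;
    }

    // On each request: serve the cached file if it exists, otherwise hit
    // the web service and persist the raw response for next time.
    NSString *path = [cachesDir stringByAppendingPathComponent:CacheKeyForURL(url)];
    NSData *cached = [NSData dataWithContentsOfFile:path];
    if (cached == nil) {
        // ... fetch responseData from the web service, then:
        [responseData writeToFile:path atomically:YES];
    }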
Since I discovered ASIHTTPRequest had a cache though, I've not needed to roll my own again.
I find that using Core Data or SQLite3 is just overkill for 99% of my requirements, and a simple cache works very well.
If the data is relational, an SQLite3 database would be the best storage option you have.
Also, this helps by allowing you to retrieve from the server and to update only the records that have changed, thus saving time and bandwidth.
This is the best option from a scalability point of view as well: you stated that the "current information isn't much", which suggests this is only the current situation, subject to further change, probably with more records being added over time.
SQLite3 also gives you more control and better performance than using, for instance, Core Data. Here's an article explaining some of the details. Moreover, if you work through an Objective-C wrapper, such as FMDB, you get all the advantages without managing the complexity yourself.

When and How Much Data to Load into Model Objects?

I come to iPhone programming from a web development paradigm and am having a bit of a problem understanding how to design my iPhone application.
The crux of my question is: How much data do you load into your model and when do you load it with data from the database?
In the web apps I've created, the objects on the server-side are filled by the database based off form values supplied by each request. Take the example of a simple list. You click a list value, the id for the list is sent to the server (query string), the server loads an object for just that list item, server-side code uses the object, and then destroys it before the page is returned to the user.
With iPhone apps (or I guess any app where objects persist), you could load all the list item objects into a singleton dictionary from the database before the user ever interacts with them. Then you never have to go back to the database when the user clicks on a link. You just load the object from the dictionary.
Alternatively, you could design it like a web app and just go back to the database each time and fill the object with the data requested.
Can you give me any guidance on when to use one way over the other? When do I load the data? I'm tempted to just load a bunch of data when the application starts up so that I never have to go back to the database. But this feels dirty.
For static data that isn't too large, loading it all at startup works.
In one of our products, we do this for simplicity on one of the tables (we don't expect more than a few thousand rows) and load the other table lazily (high-res images). This is a reasonable option if you don't have background threads also accessing the database.
Core Data does batched lazy loading (i.e. it will load a batch of result rows at once).
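For example, setting a batch size makes Core Data fault rows in incrementally as you iterate the results; the "Image" entity name here is an assumption:

    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    request.entity = [NSEntityDescription entityForName:@"Image"
                                 inManagedObjectContext:context];
    request.fetchBatchSize = 20; // rows are faulted in 20 at a time

    NSError *error = nil;
    NSArray *results = [context executeFetchRequest:request error:&error];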
Sidenote: Writes using Core Data and an SQLite store seem exceptionally slow, to the extent that we moved processing to a background thread to avoid blocking the UI (and this is for not very much data at all) and gained some annoying concurrency issues as a result. Sigh.

Core Data with Web Services recommended pattern?

I am writing an app for iOS that uses data provided by a web service. I am using core data for local storage and persistence of the data, so that some core set of the data is available to the user if the web is not reachable.
In building this app, I've been reading lots of posts about core data. While there seems to be lots out there on the mechanics of doing this, I've seen less on the general principles/patterns for this.
I am wondering if there are some good references out there for a recommended interaction model.
For example, the user will be able to create new objects in the app. Let's say the user creates a new employee object: the user will typically create it, update it, and then save it. I've seen recommendations that push each of these steps to the server as it happens: when the user creates the object, and when the user changes a field. And if the user cancels at the end, a delete is sent to the server. A different recommendation for the same operation is to keep everything locally and only send the complete update to the server when the user saves.
This example aside, I am curious if there are some general recommendations/patterns on how to handle CRUD operations and ensure they are synced between the web server and Core Data.
Thanks much.
I think the best approach in the case you mention is to store data only locally until the point the user commits the adding of the new record. Sending every field edit to the server is somewhat excessive.
A general idiom of iPhone apps is that there isn't such a thing as "Save". The user generally will expect things to be committed at some sensible point, but it isn't presented to the user as saving per se.
So, for example, imagine you have a UI that lets the user edit some sort of record that will be saved to local Core Data and also be sent to the server. At the point the user exits the UI for creating a new record, they will perhaps hit a button called "Done" (N.B. not usually called "Save"). At the point they hit "Done", you'll want to kick off a Core Data write and also start a push to the remote server. The server push won't necessarily hog the UI or make them wait till it completes -- it's nicer to allow them to continue using the app -- but it is happening. If the push of the update to the server failed, you might want to signal it to the user or do something appropriate.
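A rough sketch of such a "Done" handler; the record property and the pushRecordToServer: helper are hypothetical:

    - (IBAction)done:(id)sender {
        // Commit locally first so the data survives a crash or exit.
        NSError *error = nil;
        if (![self.managedObjectContext save:&error]) {
            // Report or recover from the local save failure.
        }
        // Push to the server off the main thread so the UI stays responsive.
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            BOOL pushed = [self pushRecordToServer:self.record];
            if (!pushed) {
                dispatch_async(dispatch_get_main_queue(), ^{
                    // Signal the failed push to the user, or queue a retry.
                });
            }
        });
    }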
A good question to ask yourself when planning the granularity of writes to core data and/or a remote server is: what would happen if the app crashed out, or the phone ran out of power, at any particular spots in the app? How much loss of data could possibly occur? Good apps lower the risk of data loss and can re-launch in a very similar state to what they were previously in after being exited for whatever reason.
Be prepared to tear your hair out quite a bit. I've been working on this, and the problem is that the Core Data samples are quite simple. The minute you move to a complex model and you try to use the NSFetchedResultsController and its delegate, you bump into all sorts of problems with using multiple contexts.
I use one context to populate data from the web service in a background "block", and a second for the table view to use -- you'll most likely end up using a table view for a master list and a detail view.
Brush up on using blocks in Cocoa if you want to keep your app responsive whilst receiving or sending data to/from a server.
You might want to read about 'transactions' - which is basically the grouping of multiple actions/changes as a single atomic action/change. This helps avoid partial saves that might result in inconsistent data on server.
Ultimately, this is a very big topic - especially if server data is shared across multiple clients. At the simplest, you would want to decide on basic policies. Does last save win? Is there some notion of remotely held locks on objects in server data store? How is conflict resolved, when two clients are, say, editing the same property of the same object?
With respect to how things are done on the iPhone, I would agree with occulus that "Done" provides a natural point for persisting changes to server (in a separate thread).

How to Sync iPhone Core Data with web server, and then push to other devices? [closed]

I have been working on a method to sync core data stored in an iPhone application between multiple devices, such as an iPad or a Mac. There are not many (if any at all) sync frameworks for use with Core Data on iOS. However, I have been thinking about the following concept:
A change is made to the local core data store, and the change is saved. (a) If the device is online, it tries to send the changeset to the server, including the device ID of the device which sent the changeset. (b) If the changeset does not reach the server, or if the device is not online, the app will add the change set to a queue to send when it does come online.
The server, sitting in the cloud, merges the specific change sets it receives with its master database.
After a change set (or a queue of change sets) is merged on the cloud server, the server pushes all of those change sets to the other devices registered with the server using some sort of polling system. (I thought to use Apple's Push services, but apparently according to the comments this is not a workable system.)
Is there anything fancy that I need to be thinking about? I have looked at REST frameworks such as ObjectiveResource, Core Resource, and RestfulCoreData. Of course, these are all working with Ruby on Rails, which I am not tied to, but it's a place to start. The main requirements I have for my solution are:
Any changes should be sent in the background without pausing the main thread.
It should use as little bandwidth as possible.
I have thought about a number of the challenges:
Making sure that the object IDs for the different data stores on different devices are mapped on the server. That is to say, I will have a table of object IDs and device IDs, tied via a reference to the object stored in the database. I will have a record (DatabaseId [unique to this table], ObjectId [unique to the item in the whole database], Datafield1, Datafield2); the ObjectId field will reference another table, AllObjects: (ObjectId, DeviceId, DeviceObjectId). Then, when the device pushes up a change set, it will pass along the device ID and the objectId from the Core Data object in the local data store. My cloud server will then check the objectId and device ID against the AllObjects table and find the record to change in the initial table.
All changes should be timestamped, so that they can be merged.
The device will have to poll the server, without using up too much battery.
The local devices will also need to update anything held in memory if/when changes are received from the server.
Is there anything else I am missing here? What kinds of frameworks should I look at to make this possible?
I've done something similar to what you're trying to do. Let me tell you what I've learned and how I did it.
I assume you have a one-to-one relationship between your Core Data object and the model (or db schema) on the server. You simply want to keep the server contents in sync with the clients, but clients can also modify and add data. If I got that right, then keep reading.
I added four fields to assist with synchronization:
sync_status - Add this field to your Core Data model only. It's used by the app to determine whether you have a pending change on the item. I use the following codes: 0 means no changes, 1 means it's queued to be synchronized to the server, and 2 means it's a temporary object and can be purged.
is_deleted - Add this to the server and Core Data model. A delete event shouldn't actually remove a row from the database or from your client model, because that leaves you with nothing to synchronize back. By having this simple boolean flag, you can set is_deleted to 1, synchronize it, and everyone will be happy. You must also modify the code on the server and client to query non-deleted items with "is_deleted=0".
last_modified - Add this to the server and Core Data model. This field should automatically be updated with the current date and time by the server whenever anything changes on that record. It should never be modified by the client.
guid - Add a globally unique ID (see http://en.wikipedia.org/wiki/Globally_unique_identifier) field to the server and Core Data model. This field becomes the primary key and becomes important when creating new records on the client. Normally your primary key is an incrementing integer on the server, but we have to keep in mind that content could be created offline and synchronized later. The GUID allows us to create a key while offline.
On the client, add code to set sync_status to 1 on your model object whenever something changes and needs to be synchronized to the server. New model objects must generate a GUID.
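Generating the GUID on the client can be done with CFUUID (the era-appropriate API; NSUUID arrived with iOS 6). A minimal sketch, assuming ARC:

    // Create a globally unique ID for a record created offline.
    CFUUIDRef uuid = CFUUIDCreate(NULL);
    NSString *guid = (__bridge_transfer NSString *)CFUUIDCreateString(NULL, uuid);
    CFRelease(uuid);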
Synchronization is a single request. The request contains:
The MAX last_modified time stamp of your model objects. This tells the server you only want changes after this time stamp.
A JSON array containing all items with sync_status=1.
The server gets the request and does this:
It takes the contents from the JSON array and modifies or adds the records it contains. The last_modified field is automatically updated.
The server returns a JSON array containing all objects with a last_modified time stamp greater than the time stamp sent in the request. This will include the objects it just received, which serves as an acknowledgment that the record was successfully synchronized to the server.
The app receives the response and does this:
It takes the contents from the JSON array and modifies or adds the records it contains. Each record gets its sync_status set to 0.
I used the word record and model interchangeably, but I think you get the idea.
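Put together, the client side of one sync round might look roughly like this; every helper here (maxLastModifiedDate, objectsWithSyncStatus:, JSONArrayForObjects:, postJSON:toPath:completion:, applyServerRecords:) is a hypothetical stand-in, and only the field names come from the answer:

    // Build the single sync request from local state.
    NSDate *maxModified = [self maxLastModifiedDate];    // MAX(last_modified)
    NSArray *pending = [self objectsWithSyncStatus:1];   // queued local changes
    NSDictionary *body = @{ @"last_modified" : [self.dateFormatter stringFromDate:maxModified],
                            @"changes"       : [self JSONArrayForObjects:pending] };

    // POST to the sync endpoint. The response contains every server record
    // with last_modified greater than the time stamp we sent -- including
    // our own changes, which doubles as the acknowledgment.
    [self postJSON:body toPath:@"/sync" completion:^(NSArray *serverRecords) {
        [self applyServerRecords:serverRecords];          // insert/update by guid
        for (NSManagedObject *object in pending) {
            [object setValue:@0 forKey:@"sync_status"];   // mark clean
        }
    }];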
I suggest carefully reading and implementing the sync strategy discussed by Dan Grover at iPhone 2009 conference, available here as a pdf document.
This is a viable solution and is not that difficult to implement (Dan implemented this in several of his applications), overlapping the solution described by Chris. For an in-depth, theoretical discussion of syncing, see the paper by Russ Cox (MIT) and William Josephson (Princeton):
File Synchronization with Vector Time Pairs
which applies equally well to core data with some obvious modifications. This provides an overall much more robust and reliable sync strategy, but requires more effort to be implemented correctly.
EDIT:
It seems that Grover's PDF file is no longer available (broken link, March 2015). UPDATE: the link is available through the Wayback Machine here
The Objective-C framework called ZSync and developed by Marcus Zarra has been deprecated, given that iCloud finally seems to support correct core data synchronization.
If you are still looking for a way to go, look into Couchbase Mobile. This basically does all you want. (http://www.couchbase.com/nosql-databases/couchbase-mobile)
Similar to @Chris, I've implemented a class for synchronization between client and server and solved all known problems so far (send/receive data to/from the server, merge conflicts based on timestamps, remove duplicate entries in unreliable network conditions, synchronize nested data and files, etc.).
You just tell the class which entity and which columns it should sync, and where your server is.
    M3Synchronization *syncEntity = [[M3Synchronization alloc] initForClass:@"Car"
                                      andContext:context
                                      andServerUrl:kWebsiteUrl
                                      andServerReceiverScriptName:kServerReceiverScript
                                      andServerFetcherScriptName:kServerFetcherScript
                                      ansSyncedTableFields:@[@"licenceNumber", @"manufacturer", @"model"]
                                      andUniqueTableFields:@[@"licenceNumber"]];

    syncEntity.delegate = self; // delegate should implement onComplete and onError methods
    syncEntity.additionalPostParamsDictionary = ... // add some POST params to authenticate current user
    [syncEntity sync];
You can find source, working example and more instructions here: github.com/knagode/M3Synchronization.
Notify the user to update data via push notifications. Use a background thread in the app to compare the local data with the data on the cloud server; when a change happens on the server, update the local data, and vice versa.
So I think the most difficult part is to determine which side's data is stale.
Hope this can help you.
I have just posted the first version of my new Core Data Cloud Syncing API, known as SynCloud.
SynCloud has a lot of differences from iCloud because it allows for a multi-user sync interface. It is also different from other syncing APIs because it allows for multi-table, relational data.
Please find out more at http://www.syncloudapi.com
Built with the iOS 6 SDK, it is very up to date as of 9/27/2012.
I think a good solution to the GUID issue is "distributed ID system". I'm not sure what the correct term is, but I think that's what MS SQL server docs used to call it (SQL uses/used this method for distributed/sync'ed databases). It's pretty simple:
The server assigns all IDs. Each time a sync is done, the first thing that is checked are "How many IDs do I have left on this client?" If the client is running low, it asks the server for a new block of IDs. The client then uses IDs in that range for new records. This works great for most needs, if you can assign a block large enough that it should "never" run out before the next sync, but not so large that the server runs out over time. If the client ever does run out, the handling can be pretty simple, just tell the user "sorry you cannot add more items until you sync"... if they are adding that many items, shouldn't they sync to avoid stale data issues anyway?
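A sketch of the client-side bookkeeping, with every name (kIDLowWaterMark, the server proxy, the counters) invented for illustration:

    // Before creating records, make sure we still own enough reserved IDs.
    if (self.remainingIDs < kIDLowWaterMark) {
        // Ask the server to reserve the next block, e.g. 1000 IDs.
        NSRange block = [self.server requestIDBlockOfSize:1000];
        self.nextID = block.location;
        self.remainingIDs = block.length;
    }

    // Hand out IDs locally from the reserved block.
    int64_t newRecordID = self.nextID;
    self.nextID += 1;
    self.remainingIDs -= 1;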
I think this is superior to using random GUIDs because random GUIDs are not 100% safe, and usually need to be much longer than a standard ID (128-bits vs 32-bits). You usually have indexes by ID and often keep ID numbers in memory, so it is important to keep them small.
Didn't really want to post as answer, but I don't know that anyone would see as a comment, and I think it's important to this topic and not included in other answers.
First you should rethink how much data, how many tables and how many relations you will have. In my solution I've implemented syncing through Dropbox files. I observe changes in the main MOC and save these data to files (each row is saved as gzipped JSON). If there is a working internet connection, I check if there are any changes on Dropbox (Dropbox gives me delta changes), download them and merge (latest wins), and finally upload the changed files. Before syncing I put a lock file on Dropbox to prevent other clients from syncing incomplete data. When downloading changes, it's safe if only partial data is downloaded (e.g. lost internet connection). When downloading is finished (fully or partially) it starts to load files into Core Data. When there are unresolved relations (not all files are downloaded yet) it stops loading files and tries to finish downloading later. Relations are stored only as GUIDs, so I can easily check which files to load to have full data integrity.
Syncing starts after changes to Core Data are made. If there are no changes, it checks for changes on Dropbox every few minutes and on app startup. Additionally, when changes are sent to the server I send a broadcast to other devices to inform them about the changes, so they can sync faster.
Each synced entity has a GUID property (the GUID is also used as the filename for its exchange file). I also have a Sync database where I store the Dropbox revision of each file (so I can compare it when the Dropbox delta resets its state). Files also contain the entity name, state (deleted/not deleted), GUID (same as the filename), database revision (to detect data migrations or to avoid syncing with newer app versions) and of course the data (if the row is not deleted).
This solution works for thousands of files and about 30 entities. Instead of Dropbox I could use a key/value store as a REST web service, which I want to do later, but have no time for this :) For now, in my opinion, my solution is more reliable than iCloud and, which is very important, I have full control over how it works (mainly because it's my own code).
Another solution is to save MOC changes as transactions - there will be far fewer files exchanged with the server, but it's harder to do the initial load in the proper order into an empty Core Data store. iCloud works this way, and other syncing solutions have a similar approach, e.g. TICoreDataSync.
--
UPDATE
After a while, I migrated to Ensembles - I recommend this solution over reinventing the wheel.

Core Data questions around typical usage

I have some basic questions about Core Data (which I am new to) and I would like some points of view on current standards and implementations.
Basically I have an app on the iPhone (supporting iOS 3.0 and above) which gets a lot of data from web calls over HTTP. I'm looking at moving the results into local storage for fast retrieval the next time the user might load the same data again (the data doesn't change, which is why I can rely on the cached version being accurate).
I just wanted to know a few things first:
Do people these days treat the managed objects that extend NSManagedObject as domain objects, or do you create separate classes strictly for storage and create helper methods to turn them into domain objects? I sometimes find keeping all persistence logic out of the domain to be a good thing.
What about clean up? How does one typically delete all the data when the app closes, or perhaps expire data in the local storage? I certainly don't want to hold the data on the user's phone at all times.
Is there any type of atomicity with Core Data? My implementation will first check for data locally before hitting the web services; I would like to make sure that there is never half a dataset committed to local storage, giving funny results.
I would like to run a fair few background threads to fetch data in the background, are there any things I would need to consider when persisting objects on a background thread?
In relation to the above question, what is the best way to create a "background fetching" loop? In the app delegate? Per view, depending on the view? etc...?
I hope these are not too basic :)
Thanks for any help you can give.
Do people these days treat the managed objects that extend NSManagedObject as domain objects, or do you create separate classes strictly for storage and create helper methods to turn them into domain objects? I sometimes find keeping all persistence logic out of the domain to be a good thing.
If you create totally independent domain objects, you have the cost of keeping them in sync with your Core Data model, and keeping the translation between Core Data and these objects working - plus you have duplicate objects in memory; depending on how many objects you have, this might be a concern.
However the benefit side of using separate domain objects is that you are no longer wedded to a managed object context. One case where something like that can hurt you is if you maintain references to managed objects and then some background operation causes the main managed object context to remove objects - now if you access any property in the deleted managed object, you trigger a fault exception (even if you have explicitly had the object loaded with no faulted data).
One thing I have tried with moderate success is occasional very lightweight separate data objects for specific uses - what I did was to define a protocol to represent the data object accessors, with the same names as the core data accessors. Then I had both the core data object and a custom standalone data object implement this protocol, and had a mechanism to automatically copy properties from one to the other. So I didn't do every object as custom, and could treat objects either coming from the local store or standalone interchangeably.
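A sketch of that protocol trick, with invented names (implementations omitted; note that protocol properties must be @synthesize'd or redeclared in each adopting class):

    #import <CoreData/CoreData.h>

    // Shared accessor surface; property names mirror the Core Data entity.
    @protocol PhotoData <NSObject>
    @property (nonatomic, copy) NSString *title;
    @property (nonatomic, copy) NSString *flickrID;
    @end

    // Both the managed object and the standalone object adopt it...
    @interface PhotoMO : NSManagedObject <PhotoData>
    @end

    @interface StandalonePhoto : NSObject <PhotoData>
    @end

    // ...so the copying mechanism can be written once against the protocol.
    static void CopyPhotoData(id<PhotoData> from, id<PhotoData> to) {
        to.title = from.title;
        to.flickrID = from.flickrID;
    }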
I don't have a clear preference on this one yet, but I lean toward using the managed objects directly, because of the lack of duplication. You can mitigate bad side effects by listening for changes or using the Core Data controller class.
One thing that helps to keep domain objects and data objects sort of the same yet not, is using mogenerator to generate data objects. It generates very nice object representations of the objects in your Core Data store, plus front-end objects you are meant to edit by adding custom accessors or complex methods. On changing the data store, mogenerator regenerates the data object but leaves your custom code alone.
http://rentzsch.github.com/mogenerator/
What about clean up? How does one typically delete all the data when the app closes, or perhaps expire data in the local storage? I certainly don't want to hold the data on the user's phone at all times.
The data is generally small enough that I just leave it there, with an expiration timestamp for use so that you know when the data is too old to use directly. There is a ton of value to keeping data around since users close and reopen applications so frequently, and with data already there you can present results instantly while still fetching content updates.
Is there any type of atomicity with Core Data? My implementation will first check for data locally before hitting the web services; I would like to make sure that there is never half a dataset committed to local storage, giving funny results.
The atomicity comes in that you perform operations in a context and then tell the context to save. So true atomicity means avoiding other components issuing a save before you are ready, which generally means doing something in its own context and merging back into a master context.
I would like to run a fair few background threads to fetch data in the background, are there any things I would need to consider when persisting objects on a background thread?
Every background thread needs its own context; you should listen for the save notification and merge into the master context at that time.
You should strive mightily to avoid duplicate requests that might be saving to the same objects at nearly the same time; this can sometimes cause Core Data errors on merge. Related to that, set a merge policy on your main context, as the default policy is to throw an exception.
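A sketch of both points, assuming a data-manager object that owns the main context (the names are assumptions):

    - (void)registerBackgroundContext:(NSManagedObjectContext *)backgroundContext {
        // Per the note above: set a merge policy so conflicts don't throw.
        self.mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy;
        [[NSNotificationCenter defaultCenter] addObserver:self
               selector:@selector(backgroundContextDidSave:)
                   name:NSManagedObjectContextDidSaveNotification
                 object:backgroundContext];
    }

    - (void)backgroundContextDidSave:(NSNotification *)note {
        // Merge on the thread that owns the main context.
        [self.mainContext performSelectorOnMainThread:
                @selector(mergeChangesFromContextDidSaveNotification:)
                                           withObject:note
                                        waitUntilDone:YES];
    }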
That also means that in doing modeling, go for as many separate objects as you possibly can rather than one large object that aggregates data from a lot of different sources.
For more information on saving and merging into other contexts, see this question:
CoreData and mergeChangesFromContextDidSaveNotification
In relation to the above question, what is the best way to create a "background fetching" loop? In the app delegate? Per view, depending on the view? etc...?
I like to do this from a separate singleton class (after all, the AppDelegate itself is a singleton...), that I can ask for the main managed object context in addition to a context specific to a thread.
This is also useful in that when starting a new Core Data project, you don't have to use the Core Data template and can just re-use this core data manager.
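A minimal sketch of such a singleton; the class name and context property are assumptions:

    #import <CoreData/CoreData.h>

    @interface DataManager : NSObject
    @property (nonatomic, strong) NSManagedObjectContext *mainContext;
    + (instancetype)sharedManager;
    @end

    @implementation DataManager
    + (instancetype)sharedManager {
        static DataManager *shared = nil;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{ shared = [[self alloc] init]; });
        return shared;
    }
    @end

Each background thread would then ask this manager for a fresh context confined to that thread, as described above.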