How to Sync iPhone Core Data with web server, and then push to other devices? [closed] - iphone

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I have been working on a method to sync core data stored in an iPhone application between multiple devices, such as an iPad or a Mac. There are not many (if any at all) sync frameworks for use with Core Data on iOS. However, I have been thinking about the following concept:
A change is made to the local core data store, and the change is saved. (a) If the device is online, it tries to send the changeset to the server, including the device ID of the device which sent the changeset. (b) If the changeset does not reach the server, or if the device is not online, the app will add the change set to a queue to send when it does come online.
The server, sitting in the cloud, merges the specific change sets it receives with its master database.
After a change set (or a queue of change sets) is merged on the cloud server, the server pushes all of those change sets to the other devices registered with the server using some sort of polling system. (I thought to use Apple's Push services, but apparently according to the comments this is not a workable system.)
Is there anything fancy that I need to be thinking about? I have looked at REST frameworks such as ObjectiveResource, Core Resource, and RestfulCoreData. Of course, these are all working with Ruby on Rails, which I am not tied to, but it's a place to start. The main requirements I have for my solution are:
Any changes should be sent in the background without pausing the main thread.
It should use as little bandwidth as possible.
I have thought about a number of the challenges:
Making sure that the object IDs for the different data stores on different devices are attached on the server. That is to say, I will have a table of object IDs and device IDs, which are tied via a reference to the object stored in the database. I will have a record (DatabaseId [unique to this table], ObjectId [unique to the item in the whole database], Datafield1, Datafield2), the ObjectId field will reference another table, AllObjects: (ObjectId, DeviceId, DeviceObjectId). Then, when the device pushes up a change set, it will pass along the device Id and the objectId from the core data object in the local data store. Then my cloud server will check against the objectId and device Id in the AllObjects table, and find the record to change in the initial table.
All changes should be timestamped, so that they can be merged.
The device will have to poll the server, without using up too much battery.
The local devices will also need to update anything held in memory if/when changes are received from the server.
Is there anything else I am missing here? What kinds of frameworks should I look at to make this possible?

I've done something similar to what you're trying to do. Let me tell you what I've learned and how I did it.
I assume you have a one-to-one relationship between your Core Data object and the model (or db schema) on the server. You simply want to keep the server contents in sync with the clients, but clients can also modify and add data. If I got that right, then keep reading.
I added four fields to assist with synchronization:
sync_status - Add this field to your core data model only. It's used by the app to determine if you have a pending change on the item. I use the following codes: 0 means no changes, 1 means it's queued to be synchronized to the server, and 2 means it's a temporary object and can be purged.
is_deleted - Add this to the server and core data model. Delete event shouldn't actually delete a row from the database or from your client model because it leaves you with nothing to synchronize back. By having this simple boolean flag, you can set is_deleted to 1, synchronize it, and everyone will be happy. You must also modify the code on the server and client to query non deleted items with "is_deleted=0".
last_modified - Add this to the server and core data model. This field should automatically be updated with the current date and time by the server whenever anything changes on that record. It should never be modified by the client.
guid - Add a globally unique id (see http://en.wikipedia.org/wiki/Globally_unique_identifier) field to the server and core data model. This field becomes the primary key and becomes important when creating new records on the client. Normally your primary key is an incrementing integer on the server, but we have to keep in mind that content could be created offline and synchronized later. The GUID allows us to create a key while being offline.
On the client, add code to set sync_status to 1 on your model object whenever something changes and needs to be synchronized to the server. New model objects must generate a GUID.
Synchronization is a single request. The request contains:
The MAX last_modified time stamp of your model objects. This tells the server you only want changes after this time stamp.
A JSON array containing all items with sync_status=1.
The server gets the request and does this:
It takes the contents from the JSON array and modifies or adds the records it contains. The last_modified field is automatically updated.
The server returns a JSON array containing all objects with a last_modified time stamp greater than the time stamp sent in the request. This will include the objects it just received, which serves as an acknowledgment that the record was successfully synchronized to the server.
The app receives the response and does this:
It takes the contents from the JSON array and modifies or adds the records it contains. Each record get set a sync_status of 0.
I used the word record and model interchangeably, but I think you get the idea.

I suggest carefully reading and implementing the sync strategy discussed by Dan Grover at iPhone 2009 conference, available here as a pdf document.
This is a viable solution and is not that difficult to implement (Dan implemented this in several of its applications), overlapping the solution described by Chris. For an in-depth, theoretical discussion of syncing, see the paper from Russ Cox (MIT) and William Josephson (Princeton):
File Synchronization with Vector Time Pairs
which applies equally well to core data with some obvious modifications. This provides an overall much more robust and reliable sync strategy, but requires more effort to be implemented correctly.
EDIT:
It seems that the Grover's pdf file is no longer available (broken link, March 2015). UPDATE: the link is available through the Way Back Machine here
The Objective-C framework called ZSync and developed by Marcus Zarra has been deprecated, given that iCloud finally seems to support correct core data synchronization.

If you are still looking for a way to go, look into the Couchbase mobile. This basically does all you want. (http://www.couchbase.com/nosql-databases/couchbase-mobile)

Similar like #Cris I've implemented class for synchronization between client and server and solved all known problems so far (send/receive data to/from server, merge conflicts based on timestamps, removed duplicate entries in unreliable network conditions, synchronize nested data and files etc .. )
You just tell the class which entity and which columns should it sync and where is your server.
M3Synchronization * syncEntity = [[M3Synchronization alloc] initForClass: #"Car"
andContext: context
andServerUrl: kWebsiteUrl
andServerReceiverScriptName: kServerReceiverScript
andServerFetcherScriptName: kServerFetcherScript
ansSyncedTableFields:#[#"licenceNumber", #"manufacturer", #"model"]
andUniqueTableFields:#[#"licenceNumber"]];
syncEntity.delegate = self; // delegate should implement onComplete and onError methods
syncEntity.additionalPostParamsDictionary = ... // add some POST params to authenticate current user
[syncEntity sync];
You can find source, working example and more instructions here: github.com/knagode/M3Synchronization.

Notice user to update data via push notification.
Use a background thread in the app to check the local data and the data on the cloud server,while change happens on server,change the local data,vice versa.
So I think the most difficult part is to estimate data in which side is invalidate.
Hope this can help u

I have just posted the first version of my new Core Data Cloud Syncing API, known as SynCloud.
SynCloud has a lot of differences with iCloud because it allows for Multi-user sync interface. It is also different from other syncing api's because it allows for multi-table, relational data.
Please find out more at http://www.syncloudapi.com
Build with iOS 6 SDK, it is very up to date as of 9/27/2012.

I think a good solution to the GUID issue is "distributed ID system". I'm not sure what the correct term is, but I think that's what MS SQL server docs used to call it (SQL uses/used this method for distributed/sync'ed databases). It's pretty simple:
The server assigns all IDs. Each time a sync is done, the first thing that is checked are "How many IDs do I have left on this client?" If the client is running low, it asks the server for a new block of IDs. The client then uses IDs in that range for new records. This works great for most needs, if you can assign a block large enough that it should "never" run out before the next sync, but not so large that the server runs out over time. If the client ever does run out, the handling can be pretty simple, just tell the user "sorry you cannot add more items until you sync"... if they are adding that many items, shouldn't they sync to avoid stale data issues anyway?
I think this is superior to using random GUIDs because random GUIDs are not 100% safe, and usually need to be much longer than a standard ID (128-bits vs 32-bits). You usually have indexes by ID and often keep ID numbers in memory, so it is important to keep them small.
Didn't really want to post as answer, but I don't know that anyone would see as a comment, and I think it's important to this topic and not included in other answers.

First you should rethink how many data, tables and relations you will have. In my solution I’ve implemented syncing through Dropbox files. I observe changes in main MOC and save these data to files (each row is saved as gzipped json). If there is an internet connection working, I check if there are any changes on Dropbox (Dropbox gives me delta changes), download them and merge (latest wins), and finally put changed files. Before sync I put lock file on Dropbox to prevent other clients syncing incomplete data. When downloading changes it’s safe that only partial data is downloaded (eg lost internet connection). When downloading is finished (fully or partial) it starts to load files into Core Data. When there are unresolved relations (not all files are downloaded) it stops loading files and tries to finish downloading later. Relations are stored only as GUID, so I can easly check which files to load to have full data integrity.
Syncing is starting after changes to core data are made. If there are no changes, than it checks for changes on Dropbox every few minutes and on app startup. Additionaly when changes are sent to server I send a broadcast to other devices to inform them about changes, so they can sync faster.
Each synced entity has GUID property (guid is used also as a filename for exchange files). I have also Sync database where I store Dropbox revision of each file (I can compare it when Dropbox delta resets it’s state). Files also contain entity name, state (deleted/not deleted), guid (same as filename), database revision (to detect data migrations or to avoid syncing with never app versions) and of course the data (if row is not deleted).
This solution is working for thousands of files and about 30 entities. Instead of Dropbox I could use key/value store as REST web service which I want to do later, but have no time for this :) For now, in my opinion, my solution is more reliable than iCloud and, which is very important, I have full control on how it’s working (mainly because it’s my own code).
Another solution is to save MOC changes as transactions - there will be much less files exchanged with server, but it’s harder to do initial load in proper order into empty core data. iCloud is working this way, and also other syncing solutions have similar approach, eg TICoreDataSync.
--
UPDATE
After a while, I migrated to Ensembles - I recommend this solution over reinventing the wheel.

Related

Is there any value in using core data for iPhone apps?

Can people give me examples of why they would use coreData in an application?
I ask this because most apps are just clients to a central server where an API of some sort gives you the information you need.
In my case I'm writing a timesheet application for a web app which has an API and I'm debating if there is any value in replicating the data structure on my server in core data(Sqlite)
e.g
Project has many timesheets
employee has many timesheets
It seems to me that I can just connect to the API on every call for lists of projects or existing timesheets for example.
I realize for some kind of offline mode you could store locally in core data but this creates way more problems because you now have a big problem with syncing that data back to the web server when you get connection again.. e.g. the project selected for a timesheet no longer exists.
Can any experienced developer shed some light on there experiences on when core data is best practice approach?
EDIT
I realise of course there is value in storing local persistance but the key value of user defaults seems to cover most applications I can think of.
You shouldn't think of CoreData simply as an SQLite database. It's not JUST an SQLite database. Sure, SQLite is an option, but there are other options as well, such as in-memory and, as of iOS5, a whole slew of custom data stores. The biggest benefit with CoreData is persistence, obviously. But even if you are using an in-memory data store, you get the benefits of a very well structured object graph, and all of the heavy lifting with regards to pulling information out of or putting information into the data store is handled by CoreData for you, without you necessarily needing to concern yourself with what is backing that data store. Sure, today you don't care too much about persistence, so you could use an in-memory data store. What happens if tomorrow, or in a month, or a year, you decide to add a feature that would really benefit from persistence? With CoreData, you simply change or add a persistent data store, and all of your methods to get information out or in remain unchanged. The overhead for that sort of addition is minimal in comparison to if you were trying to access SQLite or some other data store directly. IMHO, that's the biggest benefit: abstraction. And, in essence, abstraction is one of the most powerful things behind OOP. Granted, building the Data Model just for in-memory storage could be overkill for your app, depending on how involved the app is. But, just as a side note, you may want to consider what is faster: Requesting information from your web service every time you want to perform some action, or requesting the information once, storing it in memory, and acting on that stored value for the remainder of the session. An in-memory data store wouldn't persistent beyond that particular session.
Additionally, with CoreData you get a lot of other great features like saving, fetching, and undo-redo.
There are basically two kinds of apps. Those that provide you with local functionality (games, professional applications, navigation systems...) and those that grant access to a remote service.
Your app seems to be in the second category. If you access remote services, your users will want to access new or real-time data (you don't want to read 2 week old Facebook posts) but in some cases, local caching makes sense (e.g. reading your mails when you're on the train with unstable network).
I assume that the value of accessing cached entries when not connected to a network is pretty low for your customers (internal or external) compared to the importance of accessing real-time-data. So local storage might be not necessary at all.
If you don't have hundreds of entries in your timetable, "normal" serialization (NSCoding-protocol) might be enough. If you only access some "dashboard-data", you will be able to get along with simple request/response-caching (NSURLCache can do a lot of things...).
Core Data does make more sense if you have complex data structures which should be synchronized with a server. This adds a lot of synchronization logic to your project as well as complexity from Core Data integration (concurrency, thread-safety, in-app-conflicts...).
If you want to create a "client"-app with a server driven user experience, local storage is not necessary at all so my suggestion is: Keep it as simple as possible unless there is a real need for offline storage.
It's ideal for if you want to store data locally on the phone.
Seriously though, if you can't see a need for it for your timesheet app, then don't worry about it and don't use it.
Solving the sync problems that you would have with an "offline" mode would be detailed in your design of your app. For example - don't allow projects to be deleted. Why would you? Wouldn't you want to go back in time and look at previous data for particular projects? Instead just have a marker on the project to show it as inactive and a date/time that it was made inactive. If the data that is being synced from the device is for that project and is before the date/time that it was marked as inactive, then it's fine to sync. Otherwise display a message and the user will have to sort it.
It depends purely on your application's design whether you need to store some data locally or not, if it is a real problem or a thin GUI client around your web service. Apart from "offline" mode the other reason to cache server data on client side might be to take traffic load from your server. Just think what does it mean for your server to send every time the whole timesheet data to the client, or just the changes. Yes, it means more implementation on both side, but in some cases it has serious advantages.
EDIT: example added
You have 1000 records per user in your timesheet application and one record is cca 1 kbyte. In this case every time a user starts your application, it has to fetch ~1Mbyte data from your server. If you cache the data locally, the server can tell you that let's say two records were updated since your last update, so you'll have to download only 2 kbyte. Now you should scale up this for several tens of thousands of user and you will immediately notice the difference of the server bandwidth and CPU usage.

Core Data Sync - Tracking Deleted Objects

I'm setting up a basic sync service for an iPad application I'm developing. The goal is to have data consistent throughout several instances of the iPad app, as well as having a read-only version of the data on the web, hence rolling a custom solution.
The current flow is this:
Each entity has a 'created', 'modified' and 'UUID' field which are automatically updated by Core Data
On sync, each entity with a created or modified date after the last sync date is serialised into JSON and sent to the server
The server persists any changes to a MySQL database using the client-generated UUIDs as PKs (if there's a conflict, it just uses the most recently modified entity as the 'true' version, nothing fancy there) and sends back any updated entities to the client
The client then merges these changes back into its Core Data DB
This all seems to be working fine. My problem is how to track deleted objects using this method? I'm guessing I can add a 'deleted' flag to each entity and set this whenever a client deletes something, I can then push that change to the server with the rest of the sync data. Once the sync is complete then the client can actually delete these entities. My questions are:
Can I override Core Data's delete methods to automatically set this flag?
Will this require keeping all deleted entities indefinitely on the server? We'll have no way of knowing when every client has synced and actually deleted each entity (I'm not currently tracking client instances)
Is there a better way of doing this?
How about you keep a delta history table with UUID and created/updated/deleted field, maybe with a revision number for each update? So you keep a small check list of changes since your last successful sync.
That way, if you delete an object you could add an entry in the delta history table with the deleted UUID and mark it deleted. Same with created and updated objects, you only need to check the delta table to see what items you the server needs to delete, update, create, etc. You could even store every revision on the server to support rolling back to a previous version in the future if you feel like it.
I think a revision number is better than relying on client's clock that could potentially be changed manually.
You could use NSManagedObjectContext's insertedObjects, updatedObjects, deletedObjects methods to create the delta objects before every save procedure :)
My 2 cents
Whether or not you have to keep deleted objects on the server or not totally depends on your needs. You will need a deleted flag locally to mark as deleted for the sync, maybe also on the server depending on your desire to roll back.
I have taken care of this problem a few ways before. Here is one possibility:
When a client deletes something, just mark it to be deleted locally and delete from the server during the sync (at which point you can purge from core data). When other clients request to access that data, send back an HTTP 404 because you dont have the object any more. At that point the client can delete the entity locally. Now if a client requests a list of things and this object has been deleted, it will just be missing from the list of things he gets back so you can detect that and delete it. I do that in a client by creating an array of object IDs when I get a response from the server and deleting any local objects that don't have those IDs.
We have a deleted field on the server, but just to have the ability to roll back in case something is deleted by accident.
Of course you could return deleted objects to the client so they know to delete but if you don't want to keep a copy on the server, you would have to make some assumption that the clients would all update within a time frame. Then you could garbage collect after that time frame has expired.
I don't really like that solution though. If your data is too heavy to ask for all the objects for a complete sync, you could use your current merge strategy for creating and updating, and then run a separate call to check for deleted items. That call could simply ask for all IDs that the client should have on the device. It could delete the ones that don't exist. OR it could send all IDs on the client and get back a list of IDs to delete.
I think you have to provide more details about the nature of the data if you want a more opinionated suggestion.
Regarding your second question: You can design this so that the server doesn't have to keep deleted records around, if you want to. Let each app know if a given piece of data (based on its UUID) is stored on the server (e.g. add an existsOnServer property or similar). This starts out false when a new item is created in the app, but is set to true once it has been synced to the server for the first time. That way, if the app tries to sync later, but the UUID is not found, you can differentiate the two cases: If existsOnServer is false, then then this item is newly created and should be synced to the server, but if it is true then it can be taken to mean that it was already on the sever before, but has now been deleted, so you can delete it in the app too.
I'd probably argue against this approach, since it seems more error prone to me (I imagine a database or connection error incorrectly being interpreted as a deletion) and keeping records around on your server would usually not be a big deal, but it is possible. The "delta-approach" suggested by dzeikei could be used at the same time, so an update to a record that does not exist on the server signifies that it was deleted, while an insert does not.
You may take a look at Cross-Platform Data Synchronization by Dan Grover if you haven't. It's a very well written paper regarding synchronization and iOS.
About your questions:
You can avoid deleting a file in Core Data and set a 'deleted flag': just update the file instead of deleting it. You could make your own 'deleting' method that actually would call and update the flag on the record.
Keep always a last_sync and a last_updated for each record on the server and on each client. This way you'll always know when someone did change something anywhere and if that change was synced or not against the 'truth database'.
Keeping track of deleted files is a hard thing to do, I guess the best way to do it is keeping track of the history of syncs for each table, but is a difficult task. The easiest way, using this 'truth-database' kind of configuration is to flag the files, so that way yes, you should keep the data on the server as well as on the client.
during synchronization of data between tow table some records or deleted when the table rows are same. and when the rows are different the correctly synchronized, i used this Code click here on image

Core Data syncronization procedure with Web service

I'm developing an application that needs to be syncronized with remote database. The database is connected to the a web-based application that user able to modify some records on the web page.(add/remove/modify) User also able to modify the same records in mobile application. So each side (server - client) must be keep the SAME latest records when an user press the sync button in mobile app. Communication between server and client is provided by Web Serives.(SOAP) and i am not able to change it because of it is strict requirements. (i know this is the worst way that can be used). And another requirement is the clients are not able to delete the server records.
I already be familiar with communicating web service (NSURLConnection), receiving data (NSData) and parsing it. But i could not figure out how the syncronization procedure should be. I have already read this answer which about how i can modify server and client sides with some extra attributes (last_updated_date and is_sync)
Then i could imagine to solve the issue like:
As a first step, client keeps try to modify the server records with sending unsyncronized ones. New recoords are directly added into DB but modified records shoud be compared depending on last_updated_date. At the end of this step, server has the latest data.
But the problem is how can manage to modify the records in mobile app. I thought it in a two way:
is the dummiest way that create a new MOC, download all records into this and change with existing one.
is the getting all modified records which are not in client side, import them into a new MOC and combine these two. But in this point i have some concerns like
There could be two items which are replicated (old version - updated version)
Deleted items could be still located in the main MOCs.
I have to connect multiple relationships among the MOCs. (the new record could have more than 4 relationships with old records)
So i guess you guys can help me to have another ideas which is the best ??
Syncing data is a non-trivial task.
There are several levels of synchronization. Based on your question I am guessing you just need to push changes back to a server. In that case I would suggest catching it during the -save: of the NSManagedObjectContext. Just before the -save: you can query the NSManagedObjectContext and ask it for what objects have been created, updated and deleted. From there you can build a query to post back to your web service.
Dealing with merges, however, is far more complicated and I suggest you deal with them on the server.
As for your relationship question; I suggest you open a second question for that so that there is no confusion.
Update
Once the server has finished the merge it pushes the new "truth" to the client. The client should take these updated records and merge them into its own changes. This merge is fairly simple:
Look for an existing record using a uniqueID.
If the record exists then update it.
If the record does not exist then create it.
Ignoring performance for the moment, this is fairly straight forward:
Set up a loop over the new data coming in.
Set up a NSPredicate to identify the record to be updated/created.
Run your fetch request.
If the record exists update it.
If it doesn't then create it.
Once you get this working with a full round trip then you can start looking at performance, etc. Step one is to get it to work :)

Core Data with Web Services recommended pattern?

I am writing an app for iOS that uses data provided by a web service. I am using core data for local storage and persistence of the data, so that some core set of the data is available to the user if the web is not reachable.
In building this app, I've been reading lots of posts about core data. While there seems to be lots out there on the mechanics of doing this, I've seen less on the general principles/patterns for this.
I am wondering if there are some good references out there for a recommended interaction model.
For example, the user will be able to create new objects on the app. Lets say the user creates a new employee object, the user will typically create it, update it and then save it. I've seen recommendations that updates each of these steps to the server --> when the user creates it, when the user makes changes to the fields. And if the user cancels at the end, a delete is sent to the server. Another different recommendation for the same operation is to keep everything locally, and only send the complete update to the server when the user saves.
This example aside, I am curious if there are some general recommendations/patterns on how to handle CRUD operations and ensure they are sync'd between the webserver and coredata.
Thanks much.
I think the best approach in the case you mention is to store data only locally until the point the user commits the adding of the new record. Sending every field edit to the server is somewhat excessive.
A general idiom of iPhone apps is that there isn't such a thing as "Save". The user generally will expect things to be committed at some sensible point, but it isn't presented to the user as saving per se.
So, for example, imagine you have a UI that lets the user edit some sort of record that will be saved to local core data and also be sent to the server. At the point the user exits the UI for creating a new record, they will perhaps hit a button called "Done" (N.B. not usually called "Save"). At the point they hit "Done", you'll want to kick off a core data write and also start a push to the remote server. The server pus h won't necessarily hog the UI or make them wait till it completes -- it's nicer to allow them to continue using the app -- but it is happening. If the update push to server failed, you might want to signal it to the user or do something appropriate.
A good question to ask yourself when planning the granularity of writes to core data and/or a remote server is: what would happen if the app crashed out, or the phone ran out of power, at any particular spots in the app? How much loss of data could possibly occur? Good apps lower the risk of data loss and can re-launch in a very similar state to what they were previously in after being exited for whatever reason.
Be prepared to tear your hair out quite a bit. I've been working on this, and the problem is that the Core Data samples are quite simple. The minute you move to a complex model and you try to use the NSFetchedResultsController and its delegate, you bump into all sorts of problems with using multiple contexts.
I use one to populate data from your webservice in a background "block", and a second for the tableview to use - you'll most likely end up using a tableview for a master list and a detail view.
Brush up on using blocks in Cocoa if you want to keep your app responsive whilst receiving or sending data to/from a server.
You might want to read about 'transactions' - which is basically the grouping of multiple actions/changes as a single atomic action/change. This helps avoid partial saves that might result in inconsistent data on server.
Ultimately, this is a very big topic - especially if server data is shared across multiple clients. At the simplest, you would want to decide on basic policies. Does last save win? Is there some notion of remotely held locks on objects in server data store? How is conflict resolved, when two clients are, say, editing the same property of the same object?
With respect to how things are done on the iPhone, I would agree with occulus that "Done" provides a natural point for persisting changes to server (in a separate thread).

Best strategy for synching data in iPhone app

I am working on a regular iPhone app which pulls data from a server (XML, JSON, etc...), and I'm wondering what is the best way to implement synching data. Criteria are speed (less network data exchange), robustness (data recovery in case update fails), offline access and flexibility (adaptable when the structure of the database changes slightly, like a new column). I know it varies from app to app, but can you guys share some of your strategy/experience?
For me, I'm thinking of something like this:
1) Store Last Modified Date in iPhone
2) Upon launching, send a message like getNewData.php?lastModifiedDate=...
3) Server will process and send back only modified data from last time.
4) This data is formatted as so:
<+><data id="..."></data></+> // add this to SQLite/CoreData
<-><data id="..."></data></-> // remove this
<%><data id="..."><attribute>newValue</attribute></data></%> // new modified value
I don't want to make <+>, <->, <%>... for each attribute as well, because it would be too complicated, so probably when receive a <%> field, I would just remove the data with the specified id and then add it again (assuming id here is not some automatically auto-incremented field).
5) Once everything is downloaded and updated, I will update the Last Modified Date field.
The main problem with this strategy is: If the network goes down when I am updating something => the Last Modified Date is not yet updated => next time I relaunch the app, I will have to go through the same thing again. Not to mention potential inconsistent data. If I use a temporary table for update and make the whole thing atomic, it would work, but then again, if the update is too long (lots of data change), the user has to wait a long time until new data is available. Should I use Last-Modified-Date for each of the data field and update data gradually?
I would start by making the update routine atomic, since you'll have enough on your hands figuring out how to get the client-server communication working properly.
After that is a good time to consider tweaking it to be incremental, but only after you do some testing to figure out if it's really necessary. If you're tuning your update protocol to be as low bandwidth as possible, you might discover that even a "big" update is downloaded fast enough.
Another way to look at it is to ask yourself, how often is there going to be network trouble when an average user is doing a sync? You probably don't want to tune for unlikely scenarios.
If you are trying to optimize (minimize) the data transfer you may want to consider a different format than XML, since XML is fairly verbose. Or at least you may want to trade in XML readability for space by making each element name and attribute as small as possible, and eliminate all unnecessary whitespace.
Your basic scheme is good. The thing you need to do is to somehow make your updates idempotent so that you can restart a partially-completed transfer without risk. This is a better way to go than to try to implement some sort of true atomic commit (though you could do that too, using, eg, the SQLite database).
In our experience fairly large updates (10s of KB) can be downloaded quite rapidly, if the server is fast enough. No great need to break updates up into tiny bits. But certainly it won't hurt to try to minimize the amount of data transferred by keeping more granular info on "last update".
(And definitely you should use JSON rather than XML as your transmitted data representation.)
Wonder if you have considered using a Sync Framework to manage the synchronization. If that interests you can take a look at the open source project, OpenMobster's Sync service. You can do the following sync operations
two-way
one-way client
one-way device
bootup
Besides that, all modifications are automatically tracked and synced with the Cloud. You can have your app offline when network connection is down. It will track any changes and automatically in the background synchronize it with the cloud when the connection returns. It also provides synchronization like iCloud across multiple devices
Also, modifications in the Cloud are synched using Push notifications, so the data is always current even if it is stored locally.
In your case,
Criteria are speed (less network data exchange), robustness (data recovery in case update fails), offline access
Speed: Only the changes are sent across the network in both directions
Robustness: It stores data in a transactional store like sqlite and any failed updates are communicated in the SyncML payload. Only the successful operations are processed while the failed operations are re-tried during the next sync
Here is a link to the open source project: http://openmobster.googlecode.com
Here is a link to iPhone App Sync: http://code.google.com/p/openmobster/wiki/iPhoneSyncApp