I am using iCloud with Core Data, based on the SQLite "Library-style" application design as specified by Apple. While the basic functionality works very well, I am concerned about the transaction logs and how they are managed.
While the database for my app is not large, it is very active and the core data stack is saved many times while the app is in use. I have noticed that a new transaction log is created for every core data save. The end result is that I have a TON of transaction logs and they take up much more space than the actual database.
1) Do the transaction logs ever get automatically pruned / coalesced, or will they continue to grow indefinitely, eventually numbering in the thousands and taking up many megabytes? It seems that the only way to manually purge the transaction logs and recreate a .baseline archive would be to disable and then re-enable iCloud (removing the ubiquity container and starting fresh). But this is obviously not a good solution.
2) My current architecture saves the core data stack often, even for minor changes. In general this makes sense, as my database is small and saving often ensures that the database file is always up to date. However, given the above issues with transaction logs, I am thinking that I should minimize saves, perhaps by saving on a timed basis and/or on app state transitions (one approach is sketched below).
3) Even if I minimize the number of transaction logs by reducing how often I save the database, there seems to be an issue here, as the logs will continue to grow in number over time. Eventually the benefit of the "transaction log" design will become a burden in terms of the amount of iCloud storage used and the initial iCloud sync as a new device is added.
As Apple has provided very sparse information on iCloud and almost nothing in the form of "best practices", I would appreciate any insight from the community.
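For reference, the kind of save coalescing I have in mind would look roughly like this (just a sketch; the 30-second interval is arbitrary and `context` is assumed to be my main NSManagedObjectContext):

```swift
import CoreData
import UIKit

// Sketch: coalesce saves instead of saving after every small change, so each
// interval produces at most one transaction log entry.
final class SaveCoalescer {
    private let context: NSManagedObjectContext
    private var timer: Timer?
    private var observer: NSObjectProtocol?

    init(context: NSManagedObjectContext) {
        self.context = context
        // Periodic save: any number of edits collapse into one save per interval.
        timer = Timer.scheduledTimer(withTimeInterval: 30, repeats: true) { [weak self] _ in
            self?.saveIfNeeded()
        }
        // Also save when the app is backgrounded, so nothing is lost on termination.
        observer = NotificationCenter.default.addObserver(
            forName: UIApplication.didEnterBackgroundNotification,
            object: nil, queue: .main) { [weak self] _ in
            self?.saveIfNeeded()
        }
    }

    private func saveIfNeeded() {
        guard context.hasChanges else { return }
        try? context.save()
    }

    deinit {
        timer?.invalidate()
        if let observer = observer { NotificationCenter.default.removeObserver(observer) }
    }
}
```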
I filed a radar on this issue and received the following reply. They noted that it should be working correctly in iOS 5.1, though I have not yet verified this myself.
A clarification for anyone who might misunderstand the following: the transaction logs will be cleaned up by the core data internals. This is NOT something that should be performed by the application itself.
Engineering has provided the following feedback regarding this issue:
The transaction logs are intended to be deleted after all the active peers have had a chance to read them, and they exceed a threshold of space consumed. There was a previous issue that prevented the devices from correctly doing so.
I could not find detailed information in the documentation, so I have several questions regarding the offline persistence of Firestore.
I understand that Firestore caches everything locally and syncs back once the device is online. My questions:
If I attach an onCompleteListener to my setDocument call, it only fires when the device is online and has network access. But with offline persistence enabled, how can I detect that data has successfully been written to the cache (is it always successful?!)? I see the data is immediately there without any listener ever triggering.
What if I write data to the cache while the device is offline, the device then comes back online, and everything gets synced, but at that point some sort of error happens (so the onSuccessListener would report an error, yet the persistence cache already has the data)? How do I know that offline and online data are ALWAYS in sync once the network connection is restored on all devices?
What about race conditions? Let's say two users update a document at the "same time" while the device is offline. What happens once it comes back online?
But the most pressing question is: right now I continue with my program flow when the onSuccessListener fires, but it never does as long as the device is offline (showing an indefinite progress bar forever). I still need to continue with my program (that's why we have offline persistence). How do I do this?
How can I detect that data has successfully been written to the cache
This is the case when the statement that writes the data has completed. If writing to the local cache fails, an exception is thrown from that write statement.
Your second point is hard to summarize, but:
Firestore keeps the pending writes separate from the snapshots it returns for local reads, and will update the cached snapshot correctly both for successful and for rejected writes.
If you want to know whether the snapshot you read contains any pending writes, you can check the hasPendingWrites field in its metadata.
What about race conditions? Let's say two users update a document at the "same time" while the device is offline. What happens once it comes back online?
The last write wins. If that's not what you need, use security rules to enforce your requirements on the server.
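To make the first two points concrete, here is a rough sketch using the iOS Swift SDK (the Android API has the same shape; the collection, document, and field names here are invented):

```swift
import FirebaseFirestore

func demoOfflineWrites() {
    // Offline persistence is enabled by default on the mobile SDKs, so a plain
    // write already lands in the local cache as soon as setData returns.
    let db = Firestore.firestore()
    let doc = db.collection("notes").document("note-1")

    // The completion handler only fires once the server has acknowledged (or
    // rejected) the write; while offline it simply does not fire.
    doc.setData(["text": "hello"]) { error in
        if let error = error {
            print("server rejected or failed the write: \(error)")
        } else {
            print("write committed on the server")
        }
    }

    // To see whether a snapshot still contains uncommitted local writes, listen
    // with metadata changes included and inspect hasPendingWrites.
    doc.addSnapshotListener(includeMetadataChanges: true) { snapshot, _ in
        guard let snapshot = snapshot else { return }
        if snapshot.metadata.hasPendingWrites {
            print("showing locally cached data that has not reached the server yet")
        } else {
            print("data is confirmed by the server")
        }
    }
}
```

The snapshot listener fires immediately from the cache with hasPendingWrites set, and fires again with it cleared once the server acknowledges the write.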
I have a project that I'm working on that uses a background thread to insert data into the database from a different source. One problem I occasionally have is that once a user interacts with the database, an error is triggered: "could not serialize access due to concurrent update".
Because my data source generates a lot of data, my background thread is always busy keeping up and inserting it. I've written the algorithm to basically "retry" whenever a row is locked, so I can live with the occasional error that occurs on the background thread.
The real problem is that my users basically interact with the database via a website that has ORM mapping in various layers and it seems that occasionally the user might be modifying the same record that is being worked on by the background thread.
Is there any recommendation for what I could do so that my users are not confronted with these serialization errors?
Presumably your users have been presented with some data from the database, and based on what they saw have decided to change something. If the data they looked at is obsolete, why would you expect that decision based on obsolete information to still stand? In general, they need to look at the more-current data and decide if they still want to make the change.
If those decisions are "easy" to make, then why do you have humans making them in the first place? And if they are hard to make, what choice is there but to have the people re-make them? There may be particular answers to this, but I don't see any general answers, and you haven't given us any particular details to work with.
I am writing an app for iOS that uses data provided by a web service. I am using core data for local storage and persistence of the data, so that some core set of the data is available to the user if the web is not reachable.
In building this app, I've been reading lots of posts about core data. While there seems to be lots out there on the mechanics of doing this, I've seen less on the general principles/patterns for this.
I am wondering if there are some good references out there for a recommended interaction model.
For example, the user will be able to create new objects in the app. Let's say the user creates a new employee object: the user will typically create it, update it, and then save it. One recommendation I've seen is to send each of these steps to the server as it happens --> when the user creates the object, and again when the user changes its fields; if the user cancels at the end, a delete is sent to the server. A different recommendation for the same operation is to keep everything local and only send the complete update to the server when the user saves.
This example aside, I am curious whether there are some general recommendations/patterns on how to handle CRUD operations and ensure they are synced between the web server and Core Data.
Thanks much.
I think the best approach in the case you mention is to store data only locally until the point where the user commits adding the new record. Sending every field edit to the server is somewhat excessive.
A general idiom of iPhone apps is that there isn't such a thing as "Save". The user generally will expect things to be committed at some sensible point, but it isn't presented to the user as saving per se.
So, for example, imagine you have a UI that lets the user edit some sort of record that will be saved to local core data and also be sent to the server. At the point the user exits the UI for creating a new record, they will perhaps hit a button called "Done" (N.B. not usually called "Save"). At the point they hit "Done", you'll want to kick off a core data write and also start a push to the remote server. The server push won't necessarily hog the UI or make them wait till it completes -- it's nicer to allow them to continue using the app -- but it is happening. If the push to the server fails, you might want to signal it to the user or do something appropriate.
A good question to ask yourself when planning the granularity of writes to core data and/or a remote server is: what would happen if the app crashed out, or the phone ran out of power, at any particular spot in the app? How much loss of data could possibly occur? Good apps lower the risk of data loss and can re-launch in a very similar state to the one they were in before being exited for whatever reason.
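A rough sketch of that "Done" flow (the endpoint URL, the Employee entity, and its name attribute are all invented placeholders):

```swift
import CoreData
import Foundation

// Commit locally first, then push to the server in the background without
// blocking the UI.
func doneTapped(employee: NSManagedObject, context: NSManagedObjectContext) {
    // 1. Local commit: the data survives even if the app is killed right after.
    do {
        try context.save()
    } catch {
        // Local save failed; nothing has been sent anywhere yet.
        return
    }

    // 2. Fire off the server push; the user can keep using the app meanwhile.
    guard let url = URL(string: "https://example.com/employees") else { return }
    let name = (employee.value(forKey: "name") as? String) ?? ""

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try? JSONSerialization.data(withJSONObject: ["name": name])

    URLSession.shared.dataTask(with: request) { _, _, error in
        if error != nil {
            // The local record is still safe; flag it as "needs upload" and
            // retry later, or quietly tell the user the sync failed.
        }
    }.resume()
}
```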
Be prepared to tear your hair out quite a bit. I've been working on this, and the problem is that the Core Data samples are quite simple. The minute you move to a complex model and you try to use the NSFetchedResultsController and its delegate, you bump into all sorts of problems with using multiple contexts.
I use one context to populate data from your web service in a background "block", and a second for the table view to use - you'll most likely end up using a table view for a master list and a detail view.
Brush up on using blocks in Cocoa if you want to keep your app responsive whilst receiving or sending data to/from a server.
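With the newer NSPersistentContainer API, that two-context setup looks roughly like this (the model, entity, and attribute names are placeholders):

```swift
import CoreData

// A background context does the web-service import; the main-queue viewContext
// backs the table view / NSFetchedResultsController.
let container = NSPersistentContainer(name: "Model")

func setUpStack() {
    container.loadPersistentStores { _, error in
        if let error = error { fatalError("store failed to load: \(error)") }
    }
    // Saves made on the background context are merged into the UI context.
    container.viewContext.automaticallyMergesChangesFromParent = true
}

func importRecords(_ records: [[String: Any]]) {
    // performBackgroundTask hands you a private-queue context on a background block.
    container.performBackgroundTask { background in
        for record in records {
            let object = NSEntityDescription.insertNewObject(forEntityName: "Employee",
                                                             into: background)
            object.setValue(record["name"], forKey: "name")
        }
        try? background.save()   // picked up by viewContext via the flag above
    }
}
```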
You might want to read about 'transactions' - which is basically the grouping of multiple actions/changes into a single atomic action/change. This helps avoid partial saves that might result in inconsistent data on the server.
Ultimately, this is a very big topic - especially if server data is shared across multiple clients. At the simplest, you would want to decide on basic policies. Does last save win? Is there some notion of remotely held locks on objects in server data store? How is conflict resolved, when two clients are, say, editing the same property of the same object?
With respect to how things are done on the iPhone, I would agree with occulus that "Done" provides a natural point for persisting changes to server (in a separate thread).
I am working on a regular iPhone app which pulls data from a server (XML, JSON, etc...), and I'm wondering what the best way is to implement data syncing. Criteria are speed (less network data exchange), robustness (data recovery in case an update fails), offline access, and flexibility (adaptable when the structure of the database changes slightly, like a new column). I know it varies from app to app, but can you guys share some of your strategies/experiences?
For me, I'm thinking of something like this:
1) Store Last Modified Date in iPhone
2) Upon launching, send a message like getNewData.php?lastModifiedDate=...
3) Server will process and send back only modified data from last time.
4) This data is formatted like so:
<+><data id="..."></data></+> // add this to SQLite/CoreData
<-><data id="..."></data></-> // remove this
<%><data id="..."><attribute>newValue</attribute></data></%> // new modified value
I don't want to make <+>, <->, <%>... for each attribute as well, because it would be too complicated, so probably when I receive a <%> field, I would just remove the data with the specified id and then add it again (assuming id here is not an auto-incremented field).
5) Once everything is downloaded and updated, I will update the Last Modified Date field.
The main problem with this strategy is: if the network goes down while I am updating something => the Last Modified Date is not yet updated => next time I relaunch the app, I will have to go through the same thing again. Not to mention potentially inconsistent data. If I use a temporary table for the update and make the whole thing atomic, it would work, but then again, if the update is too long (lots of data changes), the user has to wait a long time until new data is available. Should I use a Last-Modified-Date for each of the data fields and update the data gradually?
I would start by making the update routine atomic, since you'll have enough on your hands figuring out how to get the client-server communication working properly.
After that is a good time to consider tweaking it to be incremental, but only after you do some testing to figure out if it's really necessary. If you're tuning your update protocol to be as low bandwidth as possible, you might discover that even a "big" update is downloaded fast enough.
Another way to look at it is to ask yourself, how often is there going to be network trouble when an average user is doing a sync? You probably don't want to tune for unlikely scenarios.
If you are trying to optimize (minimize) the data transfer you may want to consider a different format than XML, since XML is fairly verbose. Or at least you may want to trade in XML readability for space by making each element name and attribute as small as possible, and eliminate all unnecessary whitespace.
Your basic scheme is good. The thing you need to do is to somehow make your updates idempotent so that you can restart a partially-completed transfer without risk. This is a better way to go than to try to implement some sort of true atomic commit (though you could do that too, using, eg, the SQLite database).
In our experience fairly large updates (10s of KB) can be downloaded quite rapidly, if the server is fast enough. No great need to break updates up into tiny bits. But certainly it won't hurt to try to minimize the amount of data transferred by keeping more granular info on "last update".
(And definitely you should use JSON rather than XML as your transmitted data representation.)
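As a sketch of what "idempotent" means here (the items and sync_state tables and the delta shape are invented; real code should use prepared statements rather than string interpolation):

```swift
import SQLite3

// Re-running this after an interrupted sync is harmless: INSERT OR REPLACE and
// DELETE keyed by id converge to the same end state, and the "last modified"
// marker only advances inside the same transaction as the data it describes.
func applyDelta(db: OpaquePointer?,
                added: [(id: String, payload: String)],
                removed: [String],
                newLastModified: String) -> Bool {
    var ok = sqlite3_exec(db, "BEGIN IMMEDIATE;", nil, nil, nil) == SQLITE_OK

    for row in added where ok {
        let sql = "INSERT OR REPLACE INTO items(id, payload) VALUES('\(row.id)', '\(row.payload)');"
        ok = sqlite3_exec(db, sql, nil, nil, nil) == SQLITE_OK
    }
    for id in removed where ok {
        ok = sqlite3_exec(db, "DELETE FROM items WHERE id = '\(id)';", nil, nil, nil) == SQLITE_OK
    }
    if ok {
        // Visible only if every change above commits along with it.
        ok = sqlite3_exec(db, "UPDATE sync_state SET last_modified = '\(newLastModified)';",
                          nil, nil, nil) == SQLITE_OK
    }

    sqlite3_exec(db, ok ? "COMMIT;" : "ROLLBACK;", nil, nil, nil)
    return ok
}
```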
I wonder if you have considered using a sync framework to manage the synchronization. If that interests you, you can take a look at the open source project, OpenMobster's Sync service. You can do the following sync operations:
two-way
one-way client
one-way device
bootup
Besides that, all modifications are automatically tracked and synced with the Cloud. You can have your app offline when the network connection is down. It will track any changes and automatically synchronize them with the Cloud in the background when the connection returns. It also provides iCloud-like synchronization across multiple devices.
Also, modifications in the Cloud are synced using push notifications, so the data is always current even though it is stored locally.
In your case,
Criteria are speed (less network data exchange), robustness (data recovery in case update fails), offline access
Speed: Only the changes are sent across the network in both directions
Robustness: It stores data in a transactional store like SQLite, and any failed updates are communicated in the SyncML payload. Only the successful operations are processed, while the failed operations are retried during the next sync.
Here is a link to the open source project: http://openmobster.googlecode.com
Here is a link to iPhone App Sync: http://code.google.com/p/openmobster/wiki/iPhoneSyncApp
Short version
If my process is terminated in the middle of a transaction, or while SQLite is committing a transaction, what are the chances that the database file will be corrupted?
Long version
My application uses an SQLite database for storage (directly, not via Core Data). I'm working on a new version of the application which will require an update to the database schema. On launch, the app will check the database and, if it needs updating, execute a series of SQL statements to do so.
Depending on the amount of data in the database, the update may be long-running (on the order of seconds), so I need to consider the possibility that the process may be terminated before the update is completed. (For context, this is on an iPhone, where the processor is slow and the app may be terminated by an incoming phone call.) I will, of course, wrap the upgrade SQL statements in a transaction. Will that be enough to guarantee that the database will not be corrupted?
I'm assuming that transactions work as advertised, and that if the process is terminated in the middle of the transaction, the file will be OK. But I'm also assuming there is a window of time during the COMMIT where something can go wrong.
To play it safe, I could create a backup copy of the database file before starting the update, but if the transactions are safe then that would be overkill. It would also make the update process take longer, which increases the chance it would be interrupted, and then I'd have to consider that the file copy operation might be interrupted... I'd like to keep the code as simple as possible (but no simpler).
In the course of researching this question I've started reading "Atomic Commit In SQLite", which is more detail than I probably need to know, but is giving me faith that I don't need to second-guess SQLite's ability to protect the database file. But I'd still like to hear from Stack Overflow: is a transaction good enough, or should I be more cautious?
I have read the Atomic Commit in SQLite document. It may not be overkill if you really want to understand what's going on, but in a nutshell, a transaction goes like this:
1) Lock the database file
2) Create the rollback journal
3) Determine what portions of the database file are going to be changing
4) Write copies of those pages to the journal file
5) Write the journal file header
6) Write your intended changes to the database file
7) Delete the rollback journal (THIS IS THE COMMIT)
When the user is done talking to mom and restarts your app, when it tries to open the database file, if there is a rollback journal present, it will write the original data back to the database file using a similarly safe process. Even if you lose your transaction, and lose a rollback, it will eventually be taken care of once mom's nervous breakdown is properly thwarted and the user can run the app for more than a couple of seconds at a time.
If it were me, I would trust the transactions. With so many users of SQLite, even in embedded apps, I think transaction commit failures would be a very hot topic all over the net if they weren't working properly.
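As a concrete illustration (the table, column, and target version are made up), the whole upgrade can run as a single script; if the process dies partway through, the rollback journal restores the old pages, including user_version, on the next open, and the upgrade simply runs again:

```swift
import SQLite3

// Read the current schema version from the database header.
func schemaVersion(db: OpaquePointer?) -> Int32 {
    var stmt: OpaquePointer?
    var version: Int32 = 0
    if sqlite3_prepare_v2(db, "PRAGMA user_version;", -1, &stmt, nil) == SQLITE_OK,
       sqlite3_step(stmt) == SQLITE_ROW {
        version = sqlite3_column_int(stmt, 0)
    }
    sqlite3_finalize(stmt)
    return version
}

// Run the whole upgrade as one transaction; the version bump commits (or rolls
// back) together with the schema changes.
func upgradeIfNeeded(db: OpaquePointer?) -> Bool {
    guard schemaVersion(db: db) < 2 else { return true }
    let script = """
    BEGIN;
    ALTER TABLE notes ADD COLUMN created_at REAL;
    PRAGMA user_version = 2;
    COMMIT;
    """
    return sqlite3_exec(db, script, nil, nil, nil) == SQLITE_OK
}
```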
Are you using CoreData with a SQLite backend? If so, I actually find that the best way to handle this problem is to create two separate NSManagedObjectContexts (a read-only and an editing). When the process completes, just save the "editing" context and then the two contexts will be in sync. If something happens during your operation, the editing context won't get saved, so you'll be fine.
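One way to set that up, assuming an NSPersistentContainer-based stack, is to make the editing context a child of the viewContext:

```swift
import CoreData

// The viewContext stays "read-only" as far as this operation is concerned; all
// edits happen on a child scratch-pad context and only reach the store if the
// whole operation succeeds.
func performEdit(in container: NSPersistentContainer,
                 changes: (NSManagedObjectContext) throws -> Void) {
    let editing = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    editing.parent = container.viewContext

    do {
        try changes(editing)               // run the whole operation on the scratch pad
        try editing.save()                 // pushes the edits into viewContext
        try container.viewContext.save()   // persists them to the SQLite store
    } catch {
        editing.rollback()                 // discard everything; viewContext is untouched
    }
}
```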