I am using a SQLite-backed Core Data store for an iPhone app I am currently developing.
I was wondering how database maintenance is handled with SQLite on the iPhone, if at all. Server-based RDBMSs usually require regular defragmentation, statistics updates, and so on to ensure consistent performance, and I'm not sure how this is handled on mobile devices.
How is this handled in iOS and SQLite? Is there anything that needs to be done on the part of the developer, or is it all handled automatically?
My database will contain at most 100,000 records, with approximately 500 inserts or deletes a day.
SQLite has the ANALYZE command to update statistics, but it is unlikely to have much of an effect on a database as small as yours. You could check with EXPLAIN QUERY PLAN whether it makes any difference. (In any case, it wouldn't hurt.)
To defragment, you can use the VACUUM command. However, on flash-based storage, fragmentation is much less of a problem than the time during which the database would be blocked by a complete reorganization.
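If you are going through Core Data rather than talking to SQLite directly, you cannot easily issue these commands yourself, but the SQLite store type accepts options that run them for you when the store is added. A minimal sketch, assuming a manually assembled stack (the function name and parameters are just illustrative):

    import CoreData

    // Sketch only: builds a coordinator whose SQLite store is analyzed and
    // vacuumed once, at the moment the store is added.
    func addMaintainedStore(model: NSManagedObjectModel,
                            storeURL: URL) throws -> NSPersistentStoreCoordinator {
        let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
        let options: [AnyHashable: Any] = [
            NSSQLiteAnalyzeOption: true,       // run ANALYZE when the store is added
            NSSQLiteManualVacuumOption: true   // run VACUUM when the store is added
        ]
        try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                           configurationName: nil,
                                           at: storeURL,
                                           options: options)
        return coordinator
    }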
In practice, database maintenance is almost never worth the bother on handheld devices.
I am running various tests that spend a lot of time in the database.
I'd like to keep it all in memory and not touch the disk at all, in the hope that this would speed things up, much like sqlite3's in-memory option. I don't need persistence/durability/whatnot; everything is immediately discarded after the test.
Is that possible? I tried tweaking my Postgres memory-related settings (as in the solution below), but that doesn't seem to affect the number of database writes it performs, and I couldn't find anything that looks like an 'in-memory' option.
https://dba.stackexchange.com/questions/18484/tuning-postgresql-for-large-amounts-of-ram
I wrote a detailed post on this some time ago:
Optimise PostgreSQL for fast testing
You may find it informative; it covers the options for making PostgreSQL run without durability, and other tweaks that are useful for running tests.
You do not actually need in-memory operation. If PostgreSQL is set not to flush changes to disk, then in practice there will be little difference for databases that fit in RAM, and for databases that don't fit in RAM it won't crash.
You should test with the same database engine you're using in production. Testing with SQLite, Derby, H2, etc. and then deploying live on PostgreSQL doesn't make much sense... as any Heroku/Rails user can tell you from experience.
Can people give me examples of why they would use Core Data in an application?
I ask this because most apps are just clients to a central server where an API of some sort gives you the information you need.
In my case I'm writing a timesheet application for a web app which has an API, and I'm debating whether there is any value in replicating the server's data structure in Core Data (SQLite),
e.g.
Project has many timesheets
Employee has many timesheets
It seems to me that I can just connect to the API on every call for lists of projects or existing timesheets for example.
I realize that for some kind of offline mode you could store data locally in Core Data, but this creates far more problems, because you now have the big problem of syncing that data back to the web server when you get a connection again, e.g. the project selected for a timesheet no longer exists.
Can any experienced developer shed some light on their experiences of when Core Data is the best-practice approach?
EDIT
I realise, of course, that there is value in local persistence, but the key-value store of user defaults seems to cover most applications I can think of.
You shouldn't think of Core Data simply as an SQLite database. It's not JUST an SQLite database. Sure, SQLite is an option, but there are other options as well, such as in-memory stores and, as of iOS 5, a whole slew of custom data stores.
The biggest benefit of Core Data is persistence, obviously. But even if you are using an in-memory data store, you get the benefits of a very well structured object graph, and all of the heavy lifting of pulling information out of, or putting information into, the data store is handled by Core Data for you, without you necessarily needing to concern yourself with what is backing that data store. Sure, today you don't care too much about persistence, so you could use an in-memory data store. What happens if tomorrow, or in a month, or a year, you decide to add a feature that would really benefit from persistence? With Core Data, you simply change or add a persistent data store, and all of your methods for getting information in and out remain unchanged. The overhead of that sort of addition is minimal compared to accessing SQLite or some other data store directly. IMHO, that's the biggest benefit: abstraction. And, in essence, abstraction is one of the most powerful things behind OOP.
Granted, building the data model just for in-memory storage could be overkill for your app, depending on how involved the app is. But, just as a side note, you may want to consider what is faster: requesting information from your web service every time you want to perform some action, or requesting the information once, storing it in memory, and acting on that stored value for the remainder of the session. An in-memory data store wouldn't persist beyond that particular session.
Additionally, with Core Data you get a lot of other great features, such as saving, fetching, and undo/redo support.
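To make the abstraction point concrete, here is a minimal sketch (the makeContext helper is purely illustrative) of how switching between an in-memory store and a persistent SQLite store can be a one-line decision, while the rest of the app keeps talking to the same NSManagedObjectContext:

    import CoreData

    // Minimal sketch: the rest of the app only ever sees the returned
    // NSManagedObjectContext, so swapping the in-memory store for a persistent
    // SQLite store is a change here, not a rewrite of the data access code.
    func makeContext(inMemory: Bool,
                     model: NSManagedObjectModel,
                     storeURL: URL) throws -> NSManagedObjectContext {
        let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
        if inMemory {
            try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                               configurationName: nil, at: nil, options: nil)
        } else {
            try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                               configurationName: nil, at: storeURL, options: nil)
        }
        let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
        context.persistentStoreCoordinator = coordinator
        return context
    }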
There are basically two kinds of apps. Those that provide you with local functionality (games, professional applications, navigation systems...) and those that grant access to a remote service.
Your app seems to be in the second category. If you access remote services, your users will want to access new or real-time data (you don't want to read two-week-old Facebook posts), but in some cases local caching makes sense (e.g. reading your mail when you're on a train with an unstable network).
I assume that the value of accessing cached entries when not connected to a network is pretty low for your customers (internal or external) compared to the importance of accessing real-time data. So local storage might not be necessary at all.
If you don't have hundreds of entries in your timetable, "normal" serialization (the NSCoding protocol) might be enough. If you only access some dashboard data, you may be able to get along with simple request/response caching (NSURLCache can do a lot of things...).
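As an illustration of the request/response-caching route, here is a minimal sketch using NSURLCache/URLCache; the capacities, path, and cache policy are placeholders rather than recommendations:

    import Foundation

    // Minimal sketch: cache dashboard-style API responses with URLCache.
    let cache = URLCache(memoryCapacity: 4 * 1024 * 1024,   // 4 MB in memory
                         diskCapacity: 20 * 1024 * 1024,    // 20 MB on disk
                         diskPath: "api-cache")
    URLCache.shared = cache

    let configuration = URLSessionConfiguration.default
    configuration.requestCachePolicy = .returnCacheDataElseLoad  // fall back to cached data
    let session = URLSession(configuration: configuration)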
Core Data makes more sense if you have complex data structures which should be synchronized with a server. This adds a lot of synchronization logic to your project, as well as complexity from the Core Data integration itself (concurrency, thread safety, in-app conflicts...).
If you want to create a "client" app with a server-driven user experience, local storage is not necessary at all, so my suggestion is: keep it as simple as possible unless there is a real need for offline storage.
It's ideal if you want to store data locally on the phone.
Seriously though, if you can't see a need for it for your timesheet app, then don't worry about it and don't use it.
How you solve the sync problems you would have with an "offline" mode comes down to the design of your app. For example: don't allow projects to be deleted. Why would you? Wouldn't you want to go back in time and look at previous data for particular projects? Instead, just put a marker on the project to show it as inactive, along with the date/time it was made inactive. If the data being synced from the device is for that project and predates the date/time it was marked inactive, then it's fine to sync; otherwise display a message and let the user sort it out.
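A tiny sketch of that rule; Project and TimesheetEntry are made-up types for this example, not part of any framework:

    import Foundation

    struct Project {
        let id: Int
        var inactivatedAt: Date?    // nil while the project is still active
    }

    struct TimesheetEntry {
        let projectID: Int
        let recordedAt: Date
    }

    // A synced entry is accepted only if it predates the project's deactivation;
    // otherwise the app shows a message and lets the user resolve it.
    func canSync(_ entry: TimesheetEntry, against project: Project) -> Bool {
        guard let inactivatedAt = project.inactivatedAt else { return true }
        return entry.recordedAt < inactivatedAt
    }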
Whether you need to store some data locally depends purely on your application's design: is it a real local application, or a thin GUI client around your web service? Apart from an "offline" mode, the other reason to cache server data on the client side might be to take traffic load off your server. Just think about what it means for your server to send the whole timesheet data to the client every time, versus only the changes. Yes, it means more implementation on both sides, but in some cases it has serious advantages.
EDIT: example added
Say you have 1,000 records per user in your timesheet application and one record is roughly 1 KB. In that case, every time a user starts your application it has to fetch ~1 MB of data from your server. If you cache the data locally, the server can tell you that, say, two records were updated since your last sync, so you only have to download 2 KB. Now scale this up to several tens of thousands of users and you will immediately see the difference in server bandwidth and CPU usage.
I am using iCloud with Core Data, based on the SQLite "Library-style" application design as specified by Apple. While the basic functionality works very well, I am concerned about the transaction logs and how they are managed.
While the database for my app is not large, it is very active, and the Core Data stack is saved many times while the app is in use. I have noticed that a new transaction log is created for every Core Data save. The end result is that I have a ton of transaction logs, and they take up much more space than the actual database.
1) Do the transaction logs ever get automatically pruned / coalesced, or will they continue to grow indefinitely, eventually numbering in the thousands and taking up many megabytes? It seems that the only way to manually purge the transaction logs and recreate a .baseline archive would be to disable and then re-enable iCloud (removing the ubiquity container and starting fresh). But this is obviously not a good solution.
2) My current architecture saves the Core Data stack often, even for minor changes. In general this makes sense, as my database is small and saving often ensures that the database file is always up to date. However, given the above issues with transaction logs, I am thinking that I should perhaps minimize saves to the database, for example by saving on a timed basis and/or on app state transitions (roughly as in the sketch after this list).
3) Even if I minimize the number of transaction logs by reducing how often I save the database, there seems to be an issue here, as the logs will continue to grow in number over time. Eventually the benefit of the "transaction log" design will become a burden in terms of the amount of iCloud storage used and the initial iCloud sync as a new device is added.
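Something along these lines is what I have in mind for point 2; the DeferredSaver name and the 30-second interval are just placeholders:

    import CoreData
    import UIKit

    // Instead of calling save() after every small change, check hasChanges and
    // flush the context on a timer and when the app is backgrounded. This
    // coalesces many edits into far fewer saves, and therefore far fewer
    // iCloud transaction logs.
    final class DeferredSaver {
        private let context: NSManagedObjectContext
        private var timer: Timer?

        init(context: NSManagedObjectContext, interval: TimeInterval = 30) {
            self.context = context
            timer = Timer.scheduledTimer(withTimeInterval: interval, repeats: true) { [weak self] _ in
                self?.saveIfNeeded()
            }
            NotificationCenter.default.addObserver(forName: UIApplication.didEnterBackgroundNotification,
                                                   object: nil, queue: .main) { [weak self] _ in
                self?.saveIfNeeded()
            }
        }

        func saveIfNeeded() {
            guard context.hasChanges else { return }
            try? context.save()
        }
    }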
As Apple has provided very sparse information on iCloud and almost nothing in the form of "best practices", I would appreciate any insight from the community.
I filed a radar on this issue and received the following reply. They noted that it should be working correctly in iOS 5.1, though I have not yet verified this myself.
A clarification for anyone who might misunderstand the following: the transaction logs are cleaned up by the Core Data internals. This is NOT something that should be performed by the application itself.
Engineering has provided the following feedback regarding this issue:
The transaction logs are intended to be deleted after all the active peers have had a chance to read them and they exceed a threshold of space consumed. There was a previous issue that prevented the devices from correctly doing so.
I am working on a regular iPhone app which pulls data from a server (XML, JSON, etc.), and I'm wondering what the best way is to implement data syncing. The criteria are speed (less network data exchange), robustness (data recovery in case an update fails), offline access, and flexibility (adaptable when the structure of the database changes slightly, e.g. a new column). I know it varies from app to app, but can you share some of your strategies and experience?
For me, I'm thinking of something like this:
1) Store the Last Modified Date on the iPhone.
2) Upon launching, send a request like getNewData.php?lastModifiedDate=...
3) The server processes the request and sends back only the data modified since that time.
4) The data is formatted like so:
<+><data id="..."></data></+> // add this to SQLite/CoreData
<-><data id="..."></data></-> // remove this
<%><data id="..."><attribute>newValue</attribute></data></%> // new modified value
I don't want to introduce <+>, <->, <%>, ... for each attribute as well, because it would be too complicated, so when I receive a <%> element I would probably just remove the record with the specified id and then add it again (assuming the id here is not some automatically incremented field).
5) Once everything is downloaded and updated, I will update the Last Modified Date field.
The main problem with this strategy is: if the network goes down while I am updating something, the Last Modified Date is not yet updated, so the next time I relaunch the app I will have to go through the same thing again, not to mention the potentially inconsistent data. If I use a temporary table for the update and make the whole thing atomic, it would work, but then again, if the update is too long (lots of data changes), the user has to wait a long time until the new data is available. Should I keep a Last-Modified-Date for each data field and update the data gradually?
I would start by making the update routine atomic, since you'll have enough on your hands figuring out how to get the client-server communication working properly.
After that is a good time to consider tweaking it to be incremental, but only after you do some testing to figure out if it's really necessary. If you're tuning your update protocol to be as low bandwidth as possible, you might discover that even a "big" update is downloaded fast enough.
Another way to look at it is to ask yourself, how often is there going to be network trouble when an average user is doing a sync? You probably don't want to tune for unlikely scenarios.
If you are trying to optimize (minimize) the data transfer you may want to consider a different format than XML, since XML is fairly verbose. Or at least you may want to trade in XML readability for space by making each element name and attribute as small as possible, and eliminate all unnecessary whitespace.
Your basic scheme is good. The thing you need to do is somehow make your updates idempotent so that you can restart a partially completed transfer without risk. This is a better way to go than trying to implement some sort of true atomic commit (though you could do that too, using, e.g., the SQLite database).
In our experience, fairly large updates (tens of KB) can be downloaded quite rapidly if the server is fast enough. There is no great need to break updates up into tiny pieces, but it certainly won't hurt to minimize the amount of data transferred by keeping more granular "last update" information.
(And definitely you should use JSON rather than XML as your transmitted data representation.)
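For illustration, here is a rough sketch of what an idempotent JSON delta and its application could look like. The DeltaPayload/DeltaRecord types and the upsert/delete/saveLastModified hooks are hypothetical; the important properties are that changes are keyed by id (so replaying a partially applied batch is harmless) and that the Last Modified Date is only persisted after the whole batch has succeeded:

    import Foundation

    struct DeltaRecord: Codable {
        let id: Int
        let attributes: [String: String]
    }

    struct DeltaPayload: Codable {
        let added: [DeltaRecord]
        let modified: [DeltaRecord]
        let deletedIDs: [Int]
        let serverTimestamp: Date
    }

    func apply(_ delta: DeltaPayload,
               upsert: (DeltaRecord) -> Void,
               delete: (Int) -> Void,
               saveLastModified: (Date) -> Void) {
        // Upserts and deletes by primary key can be repeated safely.
        (delta.added + delta.modified).forEach(upsert)
        delta.deletedIDs.forEach(delete)
        // Persist the Last Modified Date only once everything above succeeded.
        saveLastModified(delta.serverTimestamp)
    }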
I wonder if you have considered using a sync framework to manage the synchronization. If that interests you, take a look at the open-source project OpenMobster's Sync service. You can do the following sync operations:
two-way
one-way client
one-way device
bootup
Besides that, all modifications are automatically tracked and synced with the cloud. You can have your app work offline when the network connection is down; it will track any changes and automatically synchronize them with the cloud in the background when the connection returns. It also provides iCloud-like synchronization across multiple devices.
Also, modifications in the cloud are synced to the device using push notifications, so the data is always current even though it is stored locally.
In your case,
Criteria are speed (less network data exchange), robustness (data recovery in case update fails), offline access
Speed: Only the changes are sent across the network in both directions
Robustness: Data is stored in a transactional store like SQLite, and any failed updates are reported in the SyncML payload. Only the successful operations are processed, while the failed operations are retried during the next sync.
Here is a link to the open source project: http://openmobster.googlecode.com
Here is a link to iPhone App Sync: http://code.google.com/p/openmobster/wiki/iPhoneSyncApp
We are currently working on migrating our software from a general PC server to a kind of embedded system that uses a Disk-on-Module (DOM) instead of a hard disk drive.
My colleague insists that, as the DOM can only support about one million write operations, we should run our database entirely on a RAM disk and back it up to the DOM.
There are three ways to trigger the backup:
User trigger
Every 30 minutes
Every time when there is some add/update/delete operation in database
As we expect that users will only modify the database when the system is installed, I think PostgreSQL might not write that often.
But I don't know much about PostgreSQL, so I cannot judge whether it is worth all this trouble, or which approach is better.
What do you think about it?
The problem of wearing out SSDs can be alleviated by the wear levelling done in the SSD's firmware. Some controllers don't do it well, or leave the responsibility to the host. In that case, you can use a filesystem that is designed to do wear levelling by itself; UBIFS and LogFS are suitable filesystems.
Assuming that the claim about the DOM's write cycles is true, which I can't comment on, this won't work very well. PostgreSQL assumes that it can write whatever it wants whenever it wants (even if no logical updates are happening), and you have no real chance of making it go along with the three backup triggers that you mention.
What you could do is have the entire thing run on a RAM disk and have some operating system process back this up atomically to permanent storage. This needs careful file system and kernel support. This could work if your device is on most of the time, but probably not so well if it's the sort of thing that you switch on and off like a TV, because the recovery times could be annoying.
Alternatives are using a more embedded-friendly RDBMS such as SQLite, or using a storage system that can handle PostgreSQL, like recent solid-state drives, although some SSDs have bogus cache settings that might make them unsuitable for PostgreSQL.