SQLite versus CSV file for iPhone

We have about 10 SQLite files downloaded in our app, each containing about 4000 rows. We process that data and display it in a table view. We are running into speed and memory issues when scrolling through the table view.
We were wondering whether we could get better performance from CSV files or some other format instead of SQLite. I have read that XML or JSON won't help, since the number of records is huge and parsing time would go up.
Please suggest.

First, don't assume that SQLite is your bottleneck. I made that same assumption in my own application and spent days trying to optimize the database access, only to run Instruments against it and find that I had a slow string-processing routine in my interface that was bogging things down.
Use Time Profiler and Object Allocations first to verify where your hotspots are in code. SQLite is ridiculously fast.
That said, with 4000 rows, you will probably run into memory issues at the least if you try to load all of them into an array for display to the screen. My recommendation would be to import that data into a Core Data SQLite database and use an NSFetchedResultsController with a batch size set for its fetch request to be slightly larger than the number of rows displayed onscreen.
Core Data will handle the loading / unloading of batched data this way, meaning that only a small part of the database is loaded into memory at once. This can lead to a tremendous speedup (particularly on the initial load) and will significantly reduce memory usage. It also requires only a trivial amount of code.
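To make that concrete, here is a minimal sketch of such a setup. The entity name "Record", its "name" attribute, and self.managedObjectContext are assumptions for illustration, not part of the original question:

- (NSFetchedResultsController *)resultsController {
    // Hypothetical entity/attribute names; adjust to your own model.
    NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
    [request setEntity:[NSEntityDescription entityForName:@"Record"
                                   inManagedObjectContext:self.managedObjectContext]];
    NSSortDescriptor *byName = [[[NSSortDescriptor alloc] initWithKey:@"name"
                                                            ascending:YES] autorelease];
    [request setSortDescriptors:[NSArray arrayWithObject:byName]];

    // Slightly more than one screenful of rows; Core Data then faults rows in
    // and out in batches of this size instead of loading everything up front.
    [request setFetchBatchSize:20];

    NSFetchedResultsController *frc =
        [[[NSFetchedResultsController alloc] initWithFetchRequest:request
                                              managedObjectContext:self.managedObjectContext
                                                sectionNameKeyPath:nil
                                                         cacheName:@"Records"] autorelease];
    NSError *error = nil;
    if (![frc performFetch:&error]) {
        NSLog(@"Fetch failed: %@", error);
    }
    return frc;
}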

A properly indexed SQLite database will run circles around any flat file, especially if you have a lot of records. Also try consolidating those 10 files into 1 database, so you can perform joins on indexed columns and use clever tricks such as views. Right now it seems like you're pulling data from 10 different databases and manually comparing/processing them, which would of course take a lot of time and memory.
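As a rough sketch of that consolidation step (the file paths, the records table, and the category column are all made up for illustration, and each downloaded file is assumed to contain an identically structured table with no quotes in its path), you could ATTACH each file to one main database, copy the rows across, and index the columns you filter or join on:

#import <sqlite3.h>

static void ConsolidateDatabases(NSString *mainPath, NSArray *otherPaths) {
    sqlite3 *db = NULL;
    if (sqlite3_open([mainPath UTF8String], &db) != SQLITE_OK) return;

    for (NSString *path in otherPaths) {
        // ATTACH lets one connection see another database file under an alias.
        NSString *sql = [NSString stringWithFormat:
            @"ATTACH DATABASE '%@' AS src;"
            @"INSERT INTO records SELECT * FROM src.records;"
            @"DETACH DATABASE src;", path];
        char *errMsg = NULL;
        if (sqlite3_exec(db, [sql UTF8String], NULL, NULL, &errMsg) != SQLITE_OK) {
            NSLog(@"Consolidation failed: %s", errMsg);
            sqlite3_free(errMsg);
        }
    }
    // Index the columns you filter and join on.
    sqlite3_exec(db, "CREATE INDEX IF NOT EXISTS idx_records_category "
                     "ON records(category);", NULL, NULL, NULL);
    sqlite3_close(db);
}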

It is going to depend on the application and how you are using and querying the data. Profile it and confirm whether SQLite is or isn't the problem. Then attack whatever the profiling turns up.
Profilers: Shark, or some other profiling solution.

Related

Efficient storage of large amounts of data in iOS

I'm building an application which has a "record" feature which records user interaction over time. As time progresses, I fill an array in memory with "state" objects representing the current state of the user input. A typical recording will result in about 5k of these objects.
I then archive this data using NSKeyedArchiver's archiveRootObject:toFile:. This works fine; however, the file size is very large (3.5 MB or so). My question is this:
Is there any inherent file-size overhead involved in archiving files? Would I be able to save this data using much less disk space if I were to use SQLite, or even roll my own file format? Or is the only way to reduce the disk size of the data going to be to reduce the bit depth of the numbers I'm storing?
If your concern is performance, Core Data gives you more granularity. You can lazy-load and save in parts during app execution instead of loading/saving the whole 3.5 MB object graph.
If your concern is file size, compare the overhead of the binary plist format with that of the SQLite file format. But more important than the overhead is how complex the translation between your object graph and the Core Data model would be.
You may also be interested in this comparison of speed and performance for several file formats: https://github.com/eishay/jvm-serializers/wiki/ Not sure if everything there has a C, C++, or Objective-C implementation.
3.5 MB isn't a very large file. However, if your app has to load or save a 3.5 MB file all the time, then using Core Data is a lot smarter as this allows you to save only the data that has changed and retrieve only the parts that you're interested in -- not the whole thing every time.
If storage is the main concern, there would be little difference between SQLite and Core Data.
I had to store UIViewControllers with state in an app; I ended up not saving the serialized objects but saving only the most specific properties, and creating a class which read that data and re-allocated those objects.
The property map was then stored in a CSV (admittedly very difficult to manage, but tiny) and then compressed.

Speed issue with SQLite and Core Data on the iPhone

I have 40,000+ records in an SQLite db table and am using Core Data to model the data.
When deployed to a device (iPhone 3G), the data is very slow to load (it takes 5 seconds for the data to load into the table view). I was wondering if anyone had any tips on how to improve this. I've heard about indexing the data, but am not sure how this is done.
Thanks for any help.
...the 40K records are broken up into 70+ categories; the most any tableview would show is 2000 records. The categories are in a plist which then points to the sqlite db using NSFetchedResultsController.
That sounds like a bottleneck. Firstly, the categories all have to be loaded into memory at once as the plist is read in. Depending on how big the category objects/data are, that could eat quite a bit of memory.
More importantly, though, it suggests your data model is not well configured. There should be no need for any significant data external to the Core Data model. The category data should be part of the data model. If you are using a lot of external data to configure the fetched results controller, then you probably end up with complex, slow predicates for the fetch request. That will bog everything down.
Well configured, Core Data can handle very large and complex data sets without any apparent effort because the data is read only in smallish chunks.
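For example (a sketch only, with hypothetical "Record" and "Category" entities where Record has a to-one "category" relationship and both have a "name" attribute), keeping the categories inside the model lets the fetched results controller run on a simple predicate:

- (NSFetchedResultsController *)resultsControllerForCategoryNamed:(NSString *)categoryName {
    NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
    [request setEntity:[NSEntityDescription entityForName:@"Record"
                                   inManagedObjectContext:self.managedObjectContext]];
    // The category lives in the model, so the predicate stays trivial.
    [request setPredicate:[NSPredicate predicateWithFormat:@"category.name == %@",
                                                           categoryName]];
    [request setSortDescriptors:[NSArray arrayWithObject:
        [[[NSSortDescriptor alloc] initWithKey:@"name" ascending:YES] autorelease]]];
    [request setFetchBatchSize:25];

    return [[[NSFetchedResultsController alloc] initWithFetchRequest:request
                                                managedObjectContext:self.managedObjectContext
                                                  sectionNameKeyPath:nil
                                                           cacheName:nil] autorelease];
}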

Has anyone used an object database with a large amount of data?

Object databases like MongoDB and db4o are getting lots of publicity lately. Everyone that plays with them seems to love them. I'm guessing that they are dealing with about 640K of data in their sample apps.
Has anyone tried to use an object database with a large amount of data (say, 50GB or more)? Are you able to still execute complex queries against it (like from a search screen)? How does it compare to your usual relational database of choice?
I'm just curious. I want to take the object database plunge, but I need to know if it'll work on something more than a sample app.
Someone just went into production with 12 terabytes of data in MongoDB. The largest I knew of before that was 1 TB. Lots of people are keeping really large amounts of data in Mongo.
It's important to remember that Mongo works a lot like a relational database: you need the right indexes to get good performance. You can use explain() on queries and contact the user list for help with this.
When I started db4o back in 2000, I didn't have huge databases in mind. The key goal was to store any complex object very simply with one line of code, and to do that well and fast with low resource consumption, so it can run embedded and on mobile devices.
Over time we had many users that used db4o for webapps and with quite large amounts of data, going close to today's maximum database file size of 256 GB (with a configured block size of 127 bytes). So to answer your question: yes, db4o will work with 50 GB, but you shouldn't plan to use it for terabytes of data (unless you can nicely split your data over multiple db4o databases; the setup costs for a single database are negligible, you can just call #openFile()).
db4o was acquired by Versant in 2008, because its capabilities (embedded, low resource consumption, lightweight) make it a great complementary product to Versant's high-end object database VOD. VOD scales for huge amounts of data, and it does so much better than relational databases. I think it will merely chuckle over 50 GB.
MongoDB powers SourceForge, The New York Times, and several other large databases...
You should read the MongoDB use cases. People who are just playing with technology are often just looking at how it works and are not yet at the point where they understand the limitations. For the right sorts of datasets and access patterns, 50 GB is nothing for MongoDB running on the right hardware.
These non-relational systems look at the trade-offs which RDBMSs made and change them a bit. Consistency is not as important as other things in some situations, so these solutions let you trade it off for something else. The trade-off is still relatively minor: milliseconds, or maybe seconds, in some situations.
It is worth reading about the CAP theorem too.
I was looking at moving the API behind the Stack Overflow iPhone app I wrote a while back from where it currently sits in a MySQL database to MongoDB. In raw form the SO CC dump is in the multi-gigabyte range, and the way I constructed the documents for MongoDB resulted in a 10 GB+ database. It is arguable that I didn't construct the documents well, but I didn't want to spend a ton of time doing this.
One of the very first things you will run into if you start down this path is the lack of 32-bit support. Of course everything is moving to 64-bit now, but it's something to keep in mind. I don't think any of the major document databases support paging in 32-bit mode, and that is understandable from a code-complexity standpoint.
To test what I wanted to do, I used a 64-bit EC2 instance. The second thing I ran into is that even though this machine had 7 GB of memory, once the physical memory was exhausted things went from fast to not so fast. I'm not sure I didn't have something set up incorrectly at this point, because the lack of 32-bit support killed what I wanted to use it for, but I still wanted to see what it looked like. Loading the same data dump into MySQL takes about 2 minutes on a much less powerful box, but the scripts I used to load the two databases work differently, so I can't make a good comparison. Running only a subset of the data into MongoDB was much faster, as long as it resulted in a database that was less than 7 GB.
I think my takeaway from it was that large databases will work just fine, but you may have to think about how the data is structured more than you would with a traditional database if you want to maintain high performance. I see a lot of people using MongoDB for logging, and I can imagine that a lot of those databases are massive, but at the same time they may not be doing a lot of random access, so that may mask what performance would look like for more traditional applications.
A recent resource that might be helpful is the visual guide to NoSQL systems. There are a decent number of choices outside of MongoDB. I have used Redis as well, although not with as large a database.
Here are some benchmarks on db4o:
http://www.db4o.com/about/productinformation/benchmarks/
I think it ultimately depends on a lot of factors, including the complexity of the data, but db4o seems to certainly hang with the best of them.
Perhaps worth a mention.
The European Space Agency's Planck mission is running on the Versant Object Database.
http://sci.esa.int/science-e/www/object/index.cfm?fobjectid=46951
It is a satellite with 74 onboard sensors, launched last year, which is mapping the infrared spectrum of the universe and storing the information in a map segment model. It has been getting a ton of hype these days because it's producing some of the coolest images ever seen of the universe.
Anyway, it has generated 25 TB of information stored in Versant and replicated across 3 continents. When the mission is complete next year, it will be a total of 50 TB.
Probably also worth noting: object databases tend to be a lot smaller when holding the same information. That is because they are truly normalized: no data duplication for joins, no empty wasted column space, and a few indexes rather than hundreds of them. You can find public information about the testing ESA did to compare storage in a multi-column relational database format versus using a proper object model and storing in the Versant object database. They found they could save 75% disk space by using Versant.
Here is the implementation:
http://www.planck.fr/Piodoc/PIOlib_Overview_V1.0.pdf
Here they talk about the 3 TB vs 12 TB found in the testing:
http://newscenter.lbl.gov/feature-stories/2008/12/10/cosmic-data/
Also ... there are benchmarks which show Versant orders of magnitude faster on the analysis side of the mission.
Cheers,
-Robert

XML and SQLite memory utilization and performance on the iPhone

How do the memory utilization and performance for XML or SQLite compare on the iPhone?
The initial data set for our application is 500 records with no more than 750 characters each.
How well would XML compare with SQLite for accessing, say, record 397 without going through the first 396? I know SQLite3 would have better methods for that, but how is the memory utilization?
When dealing with XML, you'll probably need to read the entire file into memory to parse it, as well as write out the entire file when you want to save. With SQLite and Core Data, you can query the database to extract only certain records, and can write only the records that have been changed or added. Additionally, Core Data makes it easy to do batched fetching.
These limited reads and writes can make your application much faster if it is using SQLite or Core Data for its data store, particularly if you take advantage of Core Data's batched fetching. As Graham says, specific numbers on performance can only be obtained by testing under your specific circumstances, but in general XML is significantly slower for all but the smallest data sets. Memory usage can also be much greater, due to the need to load and parse records you do not need at that instant.
To find out how the memory usage for your application fares, you need to measure your application :). The Instruments tool will help you.

What's the fastest way to save data and read it next time in an iPhone app?

In my dictionary iPhone app I need to save an array of strings which actually contains about 125,000 distinct words; this translates to approximately 3.2 MB of data.
The first time I run the app I get this data from an SQLite db. As it takes ages for this query to run, I need to save the data somehow, to read it faster each time the app launches.
Until now I've tried serializing the array and writing it to a file, and afterward I tested writing directly to NSUserDefaults to see if there was any speed gain, but there was none. Either way it takes about 7 seconds on the device to load the data. It seems that it's not reading from the file (or NSUserDefaults) that takes all that time, but the deserialization:
objectsForCharacters = [[NSKeyedUnarchiver unarchiveObjectWithData:data] retain];
Do you have any ideas about how I could write this data structure so that I could read it / load it into memory faster?
The UITableView is not really designed to handle tens of thousands of records. It would take a long time for a user to find what they want.
It would be better to load a portion of the table, perhaps a few hundred rows, as the user enters data, so that it appears they have all the records available to them. (Perhaps provide a label which shows the number of records left in their filtered view.)
The SQLite db should be perfect for this job. Add an index to the words table and then select a limited number of rows from it to show the user some progress. Adding an index makes a big difference to the performance of even this simple table.
For example, I created two tables in an SQLite db and populated them with around 80,000 words:
-- Create and populate the indexed table
create table words(word);
.import dictionary.txt words
create unique index words_index on words(word DESC);
-- Create and populate the unindexed table
create table unindexed_words(word);
.import dictionary.txt unindexed_words
Then I ran the following query and got the CPU Time taken for each query
.timer ON
select * from words where word like 'sn%' limit 5000;
...
>CPU Time: user 0.031250 sys 0.015625
select * from unindexed_words where word like 'sn%' limit 5000;
...
>CPU Time: user 0.062500 sys 0.0312
The results vary, but the indexed version was consistently faster than the unindexed one.
With fast access to parts of the dictionary through an indexed table, you can bind the UITableView to the database using NSFetchedResultsController. This class takes care of fetching records as required, caches results to improve performance, and allows predicates to be easily specified.
An example of how to use the NSFetchedResultsController is included in the iPhone Developers Cookbook. See main.m
Just keep the strings in a file on the disk, and do the binary search directly in the file.
So: you say the file is 3.2 MB. Suppose the format of the file is like this:
key DELIMITER value PAIRDELIMITER
where key is a string, and value is the value you want to associate. The DELIMITER and PAIRDELIMITER must be chosen such that they don't occur in the value or key.
Furthermore, the file must be sorted on the key.
With this file you can just do the binary search in the file itself.
Suppose one types a letter: you go to the middle of the file and search (forwards or backwards) for the first PAIRDELIMITER. Then check the key and see if you have to search upwards or downwards, and repeat until you find the key you need.
I'm betting this will be fast enough.
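A sketch of what that lookup could look like (not the poster's actual code; it assumes '\t' as DELIMITER, '\n' as PAIRDELIMITER, and a file sorted by key in plain byte order, e.g. via LC_ALL=C sort):

#import <Foundation/Foundation.h>
#include <string.h>

static NSString *ValueForKeyInSortedFile(NSString *path, NSString *key) {
    NSData *data = [NSData dataWithContentsOfMappedFile:path];
    const char *bytes = (const char *)[data bytes];
    const char *target = [key UTF8String];
    size_t targetLen = strlen(target);

    NSUInteger lo = 0, hi = [data length];
    while (lo < hi) {
        NSUInteger mid = lo + (hi - lo) / 2;

        // Back up to the start of the line that contains `mid`.
        NSUInteger lineStart = mid;
        while (lineStart > lo && bytes[lineStart - 1] != '\n') lineStart--;

        // Find the end of the key and the end of the line.
        NSUInteger keyEnd = lineStart;
        while (keyEnd < hi && bytes[keyEnd] != '\t') keyEnd++;
        NSUInteger lineEnd = keyEnd;
        while (lineEnd < hi && bytes[lineEnd] != '\n') lineEnd++;

        // Compare this line's key with the target key.
        size_t keyLen = keyEnd - lineStart;
        size_t minLen = (keyLen < targetLen) ? keyLen : targetLen;
        int cmp = memcmp(target, bytes + lineStart, minLen);
        if (cmp == 0) cmp = (targetLen < keyLen) ? -1 : (targetLen > keyLen) ? 1 : 0;

        if (cmp == 0) {
            // Found: the value runs from just after the '\t' to the '\n'.
            return [[[NSString alloc] initWithBytes:bytes + keyEnd + 1
                                             length:lineEnd - keyEnd - 1
                                           encoding:NSUTF8StringEncoding] autorelease];
        } else if (cmp < 0) {
            hi = lineStart;      // target sorts before this line
        } else {
            lo = lineEnd + 1;    // target sorts after this line
        }
    }
    return nil; // not found
}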
Store your dictionary in Core Data and use NSFetchedResultsController to manage the display of these dictionary entries in your table view. Loading all 125,000 words into memory at once is a terrible idea, both performance- and memory-wise. Using the -setFetchBatchSize: method on your fetch request for loading the words for your table, you can limit NSFetchedResultsController to only handling the small subset of words that are visible at any given moment, plus a little buffer. As the user scrolls up and down the list of words, new batches of words are fetched in transparently.
A case like yours is exactly why this class (and Core Data) was added to iPhone OS 3.0.
Do you need to store/load all data at once?
Maybe you can just load the chunk of strings you need to display and load all other strings in the background.
Perhaps you can load data into memory in one thread and search from it in another? You may not get search results instantly, but making some searches feel snappier may be better than making the user wait until all the data is loaded.
Are some words searched more frequently or repeatedly than others? Perhaps you can cache frequently searched terms in a separate database or other store. Load it in a separate thread as a searchable store, while you are loading the main store.
As for a data structure solution, you might look into a suffix trie to search for substrings in linear time. This will probably increase your storage requirements, though, which may affect your ability to implement this with an iPhone's limited memory and disk storage capabilities.
I really don't think you're on the right path trying to load everything at once.
You've already determined that your bottleneck is the deserialization.
Regardless what the UI does, the user only sees a handful (literally) of search results at a time.
SQLite already has a robust indexing mechanism; there is likely no need to re-invent that wheel with your own indexing, etc.
IMHO, you need to rethink how you are using UITableView. It only needs a few screenfuls of data at a time, and you should reuse cell objects as they scroll out of view rather than creating a ton of them to begin with.
So, use SQLite's indexing and grab only the first x rows (LIMIT x), where x is the right balance between giving the user some immediately available rows to scroll through and not spending too much time loading them. Set the table's scroll bar scaling using a separate SELECT COUNT(*) query, which only needs to be updated when the user types something different.
You can always go back and cache aggressively after you deserialize enough to get something up on-screen. A slight lag after the first flick or typing a letter is more acceptable than a 7-second delay just starting the app.
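As a sketch of that pattern with the SQLite C API (the words table and word column follow the earlier example; the prefix and limit are illustrative):

#import <sqlite3.h>

static NSArray *WordsWithPrefix(sqlite3 *db, NSString *prefix, int limit, int *totalCount) {
    NSMutableArray *results = [NSMutableArray array];
    sqlite3_stmt *stmt = NULL;

    // Fetch just enough rows to fill the table view.
    const char *sql = "SELECT word FROM words WHERE word LIKE ? || '%' "
                      "ORDER BY word LIMIT ?;";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) == SQLITE_OK) {
        sqlite3_bind_text(stmt, 1, [prefix UTF8String], -1, SQLITE_TRANSIENT);
        sqlite3_bind_int(stmt, 2, limit);
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            [results addObject:[NSString stringWithUTF8String:
                (const char *)sqlite3_column_text(stmt, 0)]];
        }
    }
    sqlite3_finalize(stmt);

    // Separate COUNT(*) so the UI can report how many matches exist in total.
    if (totalCount) {
        *totalCount = 0;
        if (sqlite3_prepare_v2(db, "SELECT COUNT(*) FROM words WHERE word LIKE ? || '%';",
                               -1, &stmt, NULL) == SQLITE_OK) {
            sqlite3_bind_text(stmt, 1, [prefix UTF8String], -1, SQLITE_TRANSIENT);
            if (sqlite3_step(stmt) == SQLITE_ROW) {
                *totalCount = sqlite3_column_int(stmt, 0);
            }
        }
        sqlite3_finalize(stmt);
    }
    return results;
}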
I have currently a somewhat similar coding problem with a large amount of searchable strings.
My solution is to store the prepared data in one large memory array, containing both the textual data and offsets as links, meaning I do not allocate objects for each item. This makes the data use less memory and also allows me to load and save it to a file without further processing.
Not sure if this is an option for you, since this is quite an obvious solution once you've realized that the object tree is causing the slowdown.
I use a large NSData memory block, then search through it. Well, there's more to it; it took me about two days to get it well optimized.
In your case I suspect you have a dictionary with a lot of words that have similar beginnings. You could prepare them on another computer in a format that both compacts the data and facilitates fast lookup. As a first step, the words should be sorted. With that, you can already perform a binary search on them for a fast lookup. If you store it all in one large memory area, you can do the search quite fast compared to how SQLite would search, I think.
Another way would be to see the words as a kind of tree: you have many thousands that begin with the same letter. So you divide your data accordingly: you have an SQL table for each beginning letter of your set of words. That way, if you look up a word, you select one of the now-smaller tables depending on the first letter. This already makes the amount that has to be searched much smaller, and you can do the same for the 2nd and 3rd letter as well, giving you quite fast access.
Did this give you some ideas?
Well, actually I figured it out myself in the end, but of course I thank you all for your quick and pertinent answers. To be concise, I will just say that Objective-C, like any other object-based programming language, is significantly slower than a procedural approach here, due to introspection and other object overhead.
The solution was in fact to load all my data into a continuous chunk of memory using malloc (a char **), search it on demand, and transform matches into objects only then. This resulted in a 0.5-second loading time (from file to memory) and reasonable (read: fast) operations during execution. Thank you all again, and if you have any questions, I'm here for you. Thanks.
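To illustrate the idea (a sketch under assumptions of my own, not the poster's actual code; it expects a sorted word list with one word per line and a trailing newline): read the whole file into a single malloc'd buffer, build a char ** index into it, and create NSString objects only for the rows that are actually displayed.

#include <stdlib.h>
#include <string.h>
#import <Foundation/Foundation.h>

typedef struct {
    char  *buffer;   // one contiguous block holding every word, '\0'-terminated
    char **words;    // pointers into `buffer`, one per word
    size_t count;
} WordStore;

static WordStore LoadWords(NSString *path) {
    WordStore store = {NULL, NULL, 0};
    NSData *data = [NSData dataWithContentsOfFile:path];
    size_t length = [data length];

    store.buffer = malloc(length + 1);
    memcpy(store.buffer, [data bytes], length);
    store.buffer[length] = '\0';

    // First pass: count lines so the index can be sized exactly.
    for (size_t i = 0; i < length; i++) {
        if (store.buffer[i] == '\n') store.count++;
    }
    store.words = malloc(store.count * sizeof(char *));

    // Second pass: terminate each word in place and record where it starts.
    size_t w = 0;
    char *start = store.buffer;
    for (size_t i = 0; i < length; i++) {
        if (store.buffer[i] == '\n') {
            store.buffer[i] = '\0';
            store.words[w++] = start;
            start = store.buffer + i + 1;
        }
    }
    return store;
}

// Objects are created lazily, only for the rows that end up on screen.
static NSString *WordAtIndex(const WordStore *store, size_t index) {
    return [NSString stringWithUTF8String:store->words[index]];
}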