Load and perform search on large amount of data - iphone

I need a suggest how to operate with large amount of data on iPhone. Let say I have xml file with ~120k text records. I need to perform search on this data. The solution i have tried is to use Core Data to store information in sorted order in caches. And then use binary search which works fast. But the problem is to build this caches. On first launch application takes about 15-25 seconds to build this caches. Maybe I need to use different approach to search the data?
Thanks in advance.

If you're using an XML file with the requirement that you can't cache, then you're not going to succeed unless you somehow carefully format your XML file to have useful data traversal properties -- but then you may as well use a binary file that's more useful unless you have some very esoteric requirements.
Really what you want is one of the typical indexing algorithms (on disk hash, B-tree, etc) from the get-go.
However...
If you have to read in and parse your XML text file, then you can skirt using a typical big and slow generic XML parser and write a fast hackish version since most of the data records you'll need to recognize are probably formatted the same way over and over. Nothing special, just find where the relevant data fields start, grab the data until it ends, move on to the next data field.
Honestly, 120k of text isn't very much-- sounds like whatever XML parser you're using is just slow. (I use this trick all the time for autogenerated XML data that just represents things like tables or simple data records -- my own parser is faster than any generic XML parser.)
This is probably the solution you actually want since you sound fairly attached to the XML file format. It won't be as error-proof as a generic XML parser if you're not careful, however it will eat that 120KB file up like nobody's business. And it's entry level CS work -- read in a file with certain specific formatting and grab the data values from it. Regexps are your friend if you have access to them.

Try storing and doing your searches in the cloud. (using a database stored on a server somewhere)
Unless you specifically need ALL of the information on the device..

Related

Will using Core Data speed up key-value parsing?

Currently I am downloading and parsing my data from an XML file that is on my server. At one point, I have to check my first XML key value against all of the values from a second XML file and store the matching values into an array of NSDictionarys then display them in a tableview. This can take up to about 5 seconds and I want to speed this up.
I am wondering if I download my XML files into a Core Data structure first if this will speed up the this process, so the load time doesn't take as long when checking these values against one another.
No, Core Data won't speed up parsing the XML data -- you'll need to parse the XML before adding the data to your Core Data store. It may or may not speed up the next step, where you apparently are looking for matches, but as you haven't really described what's going on there it's difficult to say one way or the other.
Sounds like you are effectively doing a comparison of the keys in the two XML documents. If you're using XML-based API calls to do that, you may be doing a linear search in XML doc 1 for every key in doc 2. If there are N keys in doc 1 and M keys in doc 2, that's on the order of N * M operations.
Walking through each document just once to get all the keys and add them to something like Core Data (or just an NSDictionary) that's optimized for retrieval by key seems like it could be an improvement, assuming that looking for matches is what's slowing things down. (If most of your time is being spent simply parsing the XML in the first place, you won't gain much by speeding up the matching.)

How to load data into Core Data?

thanks for you help.
I'm attempting to add core data to my project and I'm stuck at where and how to add the actual data into the persistent store (I'm assuming this is the place for the raw data).
I will have 1000 < objects so I don't want to use a plist approach. From my searches, there seems to be xml and csv approaches. Is there a way I can use SQL for input?
The data will not be changed by the user and the data file will be typed in by hand, so I won't need to update these files during runtime, and at this point I am not limited in any type of file - the lightest on syntax is preferred.
Thanks again for any help.
You could load your data from an xml/csv/json file and create the DB on the first lunch of your application (if the DB is not there, then read the data and create it).
A better/faster approach might be to ship your sqllite DB within your application. You can parse the file in any format you want on the simulator, create a DB with all your entities, then take it from the ApplicationData and just add it to your app as a resource.
Although I'm sure there are lighter file types that could be used, I would include a JSON file into the app bundle from which you import the initial dataset.
Update: some folks are recommending XML. NSXMLParser is almost as fast as JSONKit (but much faster than most other parsers), but the XML syntax is heavier than JSON. So an XML bundled file that holds the initial dataset would weight more than if it was in JSON.
Considering Apple considers the format of its persistent stores implementation details, shipping a prefabricated SQLite database is not a very good idea. I.e. the names of fields and tables may change between iOS versions/phones/whatever hidden variable you can think of. You should, in general, not concern yourself with how this serialization of your data is formatted.
There's a brief article about importing data on Apple's developer site: Efficiently Importing Data
You should ship initial data in whatever format you're comfortable with (XML allows you to do incremental parsing efficiently, which reduces memory footprint) and write an import routine to run if you need to import data.
Edit: With EliBud's comment in mind, I still consider the approach a bit "iffy"... The format of the SQLite database used by Core Data is not something you'd want to generate by yourself (it's weird, simply put, and still not something you should really rely on).
So you'd want to use a mock app running on the Simulator and use Core Data to create the database (as per EliBud's answer). But you'd still have to import the data into that mock-app! And while it might make sense to do this once on a "real" computer instead of a lot of times on a mobile device (i.e. copying a file is easy, importing data is hard), you're essentially using the Simulator as an administration tool.
But hey, if it works...

Performance issue - plist vs sqlite in iOS

I need to keep track of some variables and to save them very frequently. I don't need complex search and sort, just simple read/write.
What is the difference in read/write performance between plist and sqlite ?
Besides the above two methods, should I use core data ?
Please give me some hints.
Thanks.
In SQlite you can perform all functions related SQL like create,delete..and also store large amount of data.But in Plist its you jst store .
Plist and SQLite have different use as below..
PList is a file format used to store a small amount of structural data (less than a few hundred kilobytes), typically a dictionary. PList doesn't have sorting capabilities in and of itself, although code can easily be written to sort it.
A property list is probably the easiest to maintain, but it will be loaded into memory all at once. This could eat up a lot of the device's memory
SQLite is a full-fledged database. File sizes (on an iphone) are essentially unlimited. Sorting capabilities are built in. Querying and relational table designs are possible. Performance should be as good as any sorting algorithm you might come up with.
An sqlite database, on the other hand, will load only the data you request. I'm not sure how your data is structured, but you could quite easily create key-value pairs with a single database table. (A single table with a key column and a value column) Then, if it were me, I'd write an Objective-C class to wrap the database queries so I can write easy statements like:
NSString *welcomeText = [[MyData sharedData] dataWithKey:#"WelcomeText"];
Getting the data into the database in the first place doesn't have to be difficult. You can use the command line sqlite3 utility to bulk load your data. There's a command called .import that will import your data from a text file.
From the answer provided by #Robert Harvey in this previous SO question plist or sqlite
PList is a file format used to store a
small amount of structural data (less
than a few hundred kilobytes),
typically a dictionary. PList doesn't
have sorting capabilities in and of
itself, although code can easily be
written to sort it.A property list is
probably the easiest to maintain, but
it will be loaded into memory all at
once. This could eat up a lot of the
device's memory.
SQLite is a full-fledged database.
File sizes (on an iphone) are
essentially unlimited. Sorting
capabilities are built in. Querying
and relational table designs are
possible. Performance should be as
good as any sorting algorithm you
might come up with.An sqlite database,
on the other hand, will load only the
data you request. I'm not sure how
your data is structured, but you could
quite easily create key-value pairs
with a single database table.
If you are storing "huge data" then
you will benefit from sqlite.
Especially if you are going to be
performing complex queries,
extraction, searching and sorting
etc..
If you are going to do operations like search, sort you have to use sqlite. In sqlite we can able to store large amount of data but it is not possible in plist.I dont know about the performace between this.

Saving fetched data on disk

I'm creating an iPhone app, which fetches information from a server every time it is started. However, i'm planning on using the fetched data of the last month/few months/year to calculate some averages.
I had been thinking about saving them to NSUserDefaults using dictionaries (associating a date with a value), but i just remembered there also exists something like core data. Seeing that i do not have any experience with core data, i don't know if it's better. If it wouldn't, i could save the
time i'd use learning it otherwise.
The data comes in in XML format, and i get several sets of the same response each time (for different locations on a map). The amount of sets can change, as the user can add more locations. I currently only save the raw data to the disk to load if the load fails next time it starts. However, i also want to save some specific values from that XML in a way that i can easily access it. What would be the best way to do this?
Edit: I actually also need to know how fast/efficient core data is. I'm currently passing around NSArrays with NSDictionaries for the sets of data during that session. For saving the data that last longer than the session core data is ideal, i found out that much (just need to find a nice way to associate an entity with a date), i just need some advice on the efficiency.
If you're going to be working with larger amounts of data, it's probably anyway better to give Core Data a try, it's after all not that complicated and there's a plenty of good tutorials where you can learn it. There are different settings for the storage type, you can either use a sqlite database or an xml file.
According to the guys from Apple, it should be fast and memory effective to use Core Data in contrast to self-made solutions, so it's a preferred way to go.
Core data would be easier to manipulate the data and query the data using predicates. Core Data supports dates so you can even find items in date ranges.

What's the fastest way to save data and read it next time in a IPhone App?

In my dictionary IPhone app I need to save an array of strings which actually contains about 125.000 distinct words; this transforms in aprox. 3.2Mb of data.
The first time I run the app I get this data from an SQLite db. As it takes ages for this query to run, I need to save the data somehow, to read it faster each time the app launches.
Until now I've tried serializing the array and write it to a file, and afterword I've tested if writing directly to NSUserDefaults to see if there's any speed gain but there's none. In both ways it takes about 7 seconds on the device to load the data. It seems that not reading from the file (or NSUserDefaults) actually takes all that time, but the deserialization does:
objectsForCharacters = [[NSKeyedUnarchiver unarchiveObjectWithData:data] retain];
Do you have any ideeas about how I could write this data structure somehow that I could read/put in memory it faster?
The UITableView is not really designed to handle 10s of thousands of records. If would take a long time for a user to find what they want.
It would be better to load a portion of the table, perhaps a few hundred rows, as the user enters data so that it appears they have all the records available to them (Perhaps providing a label which shows the number of records that they have got left in there filtered view.)
The SQLite db should be perfect for this job. Add an index to the words table and then select a limited number of rows from it to show the user some progress. Adding an index makes a big difference to the performance of the even this simple table.
For example, I created two tables in a sqlite db and populated them with around 80,000 words
#Create and populate the indexed table
create table words(word);
.import dictionary.txt words
create unique index on words_index on word DESC;
#Create and populate the unindexed table
create table unindexed_words(word);
.import dictionary.txt unindexed_words
Then I ran the following query and got the CPU Time taken for each query
.timer ON
select * from words where word like 'sn%' limit 5000;
...
>CPU Time: user 0.031250 sys 0.015625;
select * from unindex_words where word like 'sn%' limit 5000;
...
>CPU Time: user 0.062500 sys 0.0312
The results vary but the indexed version was consistently faster that the unindexed one.
With fast access to parts of the dictionary through an indexed table, you can bind the UITableView to the database using NSFecthedResultsController. This class takes care of fecthing records as required, caches results to improve performance and allows predicates to be easily specified.
An example of how to use the NSFetchedResultsController is included in the iPhone Developers Cookbook. See main.m
Just keep the strings in a file on the disk, and do the binary search directly in the file.
So: you say the file is 3.2mb. Suppose the format of the file is like this:
key DELIMITER value PAIRDELIMITER
where key is a string, and value is the value you want to associate. The DELIMITER and PAIRDELIMITER must be chosen as such that they don't occur in the value and key.
Furthermore, the file must be sorted on the key
With this file you can just do the binary search in the file itself.
Suppose one types a letter, you go to the half of the file, and search(forwards or backwards) to the first PAIRDELIMITER. Then check the key and see if you have to search upwards or downwards. And repeat untill you find the key you need,
I'm betting this will be fast enough.
Store your dictionary in Core Data and use NSFetchedResultsController to manage the display of these dictionary entries in your table view. Loading all 125,000 words into memory at once is a terrible idea, both performance- and memory-wise. Using the -setFetchBatchSize: method on your fetch request for loading the words for your table, you can limit NSFetchedResultsController to only handling the small subset of words that are visible at any given moment, plus a little buffer. As the user scrolls up and down the list of words, new batches of words are fetched in transparently.
A case like yours is exactly why this class (and Core Data) was added to iPhone OS 3.0.
Do you need to store/load all data at once?
Maybe you can just load the chunk of strings you need to display and load all other strings in the background.
Perhaps you can load data into memory in one thread and search from it in another? You may not get search results instantly, but having some searches feel snappier may be better than none at all, by waiting until all data are loaded.
Are some words searched more frequently or repeatedly than others? Perhaps you can cache frequently searched terms in a separate database or other store. Load it in a separate thread as a searchable store, while you are loading the main store.
As for a data structure solution, you might look into a suffix trie to search for substrings in linear time. This will probably increase your storage requirements, though, which may affect your ability to implement this with an iPhone's limited memory and disk storage capabilities.
I really don't think you're on the right path trying to load everything at once.
You've already determined that your bottleneck is the deserialization.
Regardless what the UI does, the user only sees a handful (literally) of search results at a time.
SQLlite already has a robust indexing mechanism, there is likely no need to re-invent that wheel with your own indexing, etc.
IMHO, you need to rethink how you are using UITableView. It only needs a few screenfuls of data at a time, and you should reuse cell objects as they scroll out of view rather than creating a ton of them to begin with.
So, use SQLlite's indexing and grab "TOP x" rows, where x is the right balance between giving the user some immediately-available rows to scroll through without spending too much time loading them. Set the table's scroll bar scaling using a separate SELECT COUNT(*) query, which only needs to be updated when the user types something different.
You can always go back and cache aggressively after you deserialize enough to get something up on-screen. A slight lag after the first flick or typing a letter is more acceptable than a 7-second delay just starting the app.
I have currently a somewhat similar coding problem with a large amount of searchable strings.
My solution is to store the prepared data in one large memory array, containing both the texttual data and offsets as links. Meaning I do not allocate objects for each item. This makes the data use less memory and also allows me to load & save it to a file without further processing.
Not sure if this is an option for you, since this is quite an obvious solution once you've realized that the object tree is causing the slowdown.
I use a large NSData memory block, then search through it. Well, there's more to it, it took me about two days to get it well optimized.
In your case I suspect you have a dictionary with a lot of words that have similar beginnings. You could prepare them on another computer in a format the both compacts the data and also facilitates fast lookup. As a first step, the words should be sorted. With that, you can already perform a binary search on them for a fast lookup. If you store it all in one large memory area, you can do the search quite fast, compared to how sqlite would search, I think.
Another way would be to see the words as a kind of tree: You have many thousands that begin with the same letter. So you divide your data accordingly: You have a sql table for each beginning letter of your set of words. that way, if you look up a word, you'd select one of the now-smaller tables depening on the first letter. This makes the amount that has to be searched already much smaller. and you can do this for the 2nd and 3rd letter as well, and you already could have quite a fast access.
Did this give you some ideas?
Well actually I figured it out myself in the end, but of course I thank you all for your quick and pertinent answers. To be concise I will just say that, the fact that Objective-C, just like any other object-based programming language, due to introspection and other objective requirements is significantly slower than procedural programming languages.
The solution was in fact to load all my data in a continuous chunk of memory using malloc (a char **) and search on-demand in it and transform to objects. This concluded in a .5 sec loading time (from file to memory) and resonable (should be read "fast") operations during execution. Thank you all again and if you have any questions I'm here for you. Thanks