What's the fastest way to save data and read it next time in an iPhone app?

In my dictionary iPhone app I need to save an array of strings that contains about 125,000 distinct words, which comes to approximately 3.2 MB of data.
The first time I run the app I get this data from an SQLite db. As this query takes ages to run, I need to save the data somehow so that I can read it faster each time the app launches.
Until now I've tried serializing the array and writing it to a file, and afterwards I tested writing directly to NSUserDefaults to see if there's any speed gain, but there's none. Either way it takes about 7 seconds on the device to load the data. It seems that it isn't reading from the file (or NSUserDefaults) that takes all that time, but the deserialization:
objectsForCharacters = [[NSKeyedUnarchiver unarchiveObjectWithData:data] retain];
Do you have any ideas about how I could write this data structure so that I can read it into memory faster?

UITableView is not really designed to handle tens of thousands of records. It would take a long time for a user to find what they want.
It would be better to load a portion of the table, perhaps a few hundred rows, as the user enters data, so that it appears they have all the records available to them (perhaps providing a label that shows how many records are left in their filtered view).
The SQLite db should be perfect for this job. Add an index to the words table and then select a limited number of rows from it to show the user some progress. Adding an index makes a big difference to performance, even for this simple table.
For example, I created two tables in an SQLite db and populated each with around 80,000 words:
-- Create and populate the indexed table
create table words(word);
.import dictionary.txt words
create unique index words_index on words(word DESC);
-- Create and populate the unindexed table
create table unindexed_words(word);
.import dictionary.txt unindexed_words
Then I ran the following queries and recorded the CPU time taken by each:
.timer ON
select * from words where word like 'sn%' limit 5000;
...
>CPU Time: user 0.031250 sys 0.015625;
select * from unindexed_words where word like 'sn%' limit 5000;
...
>CPU Time: user 0.062500 sys 0.0312
The results vary, but the indexed version was consistently faster than the unindexed one.
With fast access to parts of the dictionary through an indexed table, you can bind the UITableView to the data using NSFetchedResultsController (this requires the words to be stored in Core Data). This class takes care of fetching records as required, caches results to improve performance and makes it easy to specify predicates.
An example of how to use NSFetchedResultsController is included in The iPhone Developer's Cookbook. See main.m.

Just keep the strings in a file on the disk, and do the binary search directly in the file.
So: you say the file is 3.2 MB. Suppose the format of the file is like this:
key DELIMITER value PAIRDELIMITER
where key is a string and value is the value you want to associate with it. DELIMITER and PAIRDELIMITER must be chosen such that they don't occur in any key or value.
Furthermore, the file must be sorted on the key.
With this file you can just do the binary search in the file itself.
When the user types a letter, go to the middle of the file and scan (forwards or backwards) to the nearest PAIRDELIMITER. Then check the key there to see whether you have to search upwards or downwards, and repeat until you find the key you need.
I'm betting this will be fast enough.
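A minimal C sketch of that lookup, assuming the file has already been read or memory-mapped (for example with mmap) into one buffer, that DELIMITER is a tab and PAIRDELIMITER is a newline, and that keys are sorted in plain byte order. The function name lookup is hypothetical. It works on byte offsets: jump to the middle byte, back up to the start of that record, compare its key, and discard the half that cannot contain the target.

```c
#include <stddef.h>
#include <string.h>

/* Compares a key of klen bytes (not NUL-terminated) with a C string. */
static int compare_key(const char *key, size_t klen, const char *target) {
    size_t tlen = strlen(target);
    int c = memcmp(key, target, klen < tlen ? klen : tlen);
    if (c != 0) return c;
    return (klen > tlen) - (klen < tlen);  /* shorter key sorts first */
}

/* Hypothetical helper: binary search over "key\tvalue\n" records sorted by key.
   buf/len is the whole file (e.g. mapped with mmap). On a hit, *value points
   into buf, *vlen is the value length, and 1 is returned; otherwise 0. */
int lookup(const char *buf, size_t len, const char *target,
           const char **value, size_t *vlen) {
    size_t lo = 0, hi = len;                      /* always record boundaries */
    while (lo < hi) {
        size_t s = lo + (hi - lo) / 2;            /* jump to the middle... */
        while (s > lo && buf[s - 1] != '\n') s--; /* ...back up to record start */
        const char *rec = buf + s;
        const char *nl = memchr(rec, '\n', hi - s);
        size_t rlen = nl ? (size_t)(nl - rec) : hi - s;
        const char *tab = memchr(rec, '\t', rlen);
        size_t klen = tab ? (size_t)(tab - rec) : rlen;
        int c = compare_key(rec, klen, target);
        if (c == 0) {
            *value = tab ? tab + 1 : rec + rlen;
            *vlen = tab ? rlen - klen - 1 : 0;
            return 1;
        }
        if (c < 0) lo = s + rlen + (nl ? 1 : 0);  /* target is further down */
        else hi = s;                              /* target is further up */
    }
    return 0;
}
```

With a memory-mapped file, only the pages the search actually touches need to be read, so no up-front loading or deserialization step is involved.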

Store your dictionary in Core Data and use NSFetchedResultsController to manage the display of these dictionary entries in your table view. Loading all 125,000 words into memory at once is a terrible idea, both performance- and memory-wise. Using the -setFetchBatchSize: method on your fetch request for loading the words for your table, you can limit NSFetchedResultsController to only handling the small subset of words that are visible at any given moment, plus a little buffer. As the user scrolls up and down the list of words, new batches of words are fetched in transparently.
A case like yours is exactly why this class (and Core Data) was added to iPhone OS 3.0.
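The batching idea can be illustrated outside Core Data. The following is a minimal C sketch of the concept, not how NSFetchedResultsController is implemented: rows are requested one at a time, a whole batch is loaded on a miss, and only a few batches stay resident. BATCH_SIZE, MAX_BATCHES, array_source and row_cache are hypothetical names; array_source stands in for the real store (in a SQLite-backed app, a query with LIMIT and OFFSET).

```c
#include <stddef.h>

/* Hypothetical sizes: BATCH_SIZE plays the role of -setFetchBatchSize:,
   MAX_BATCHES bounds how many batches stay in memory at once. */
#define BATCH_SIZE 20
#define MAX_BATCHES 3

/* Hypothetical stand-in for the real store. */
typedef struct {
    const char *const *items;
    size_t count;
} array_source;

/* Fills out[0..BATCH_SIZE) starting at row offset; NULL past the end. */
static void fetch_batch(const array_source *src, size_t offset, const char **out) {
    for (size_t i = 0; i < BATCH_SIZE; i++)
        out[i] = (offset + i < src->count) ? src->items[offset + i] : NULL;
}

typedef struct {
    const array_source *src;
    long batch_id[MAX_BATCHES];               /* batch held by each slot, -1 = empty */
    const char *rows[MAX_BATCHES][BATCH_SIZE];
    size_t next_slot;                         /* round-robin eviction */
    size_t fetches;                           /* batches loaded so far */
} row_cache;

void row_cache_init(row_cache *c, const array_source *src) {
    c->src = src;
    c->next_slot = 0;
    c->fetches = 0;
    for (size_t i = 0; i < MAX_BATCHES; i++) c->batch_id[i] = -1;
}

/* Returns the requested row, loading its whole batch only if not resident. */
const char *row_cache_get(row_cache *c, size_t row) {
    long b = (long)(row / BATCH_SIZE);
    for (size_t i = 0; i < MAX_BATCHES; i++)
        if (c->batch_id[i] == b) return c->rows[i][row % BATCH_SIZE];
    size_t slot = c->next_slot;               /* evict the oldest batch */
    c->next_slot = (slot + 1) % MAX_BATCHES;
    fetch_batch(c->src, (size_t)b * BATCH_SIZE, c->rows[slot]);
    c->batch_id[slot] = b;
    c->fetches++;
    return c->rows[slot][row % BATCH_SIZE];
}
```

A table view's data source would call row_cache_get for each visible row; scrolling within a resident batch costs nothing, and memory stays bounded at MAX_BATCHES * BATCH_SIZE rows.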

Do you need to store/load all data at once?
Maybe you can just load the chunk of strings you need to display and load all other strings in the background.

Perhaps you can load data into memory in one thread and search from it in another? You may not get search results instantly, but having some searches feel snappier may be better than none at all, by waiting until all data are loaded.

Are some words searched more frequently or repeatedly than others? Perhaps you can cache frequently searched terms in a separate database or other store. Load it in a separate thread as a searchable store, while you are loading the main store.
As for a data structure solution, you might look into a suffix trie, which lets you search for substrings in time linear in the length of the search term. This will probably increase your storage requirements, though, which may affect your ability to implement this with an iPhone's limited memory and disk storage capabilities.
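A naive suffix trie can be sketched in C as follows, assuming words contain only the lowercase letters a-z (any other character ends the current suffix). Every suffix of every word is inserted, so checking whether a term occurs anywhere inside any word walks at most one node per character of the term. The function names are hypothetical. The node count returned by trie_add_suffixes makes the storage concern concrete: a word of length L can add up to L(L+1)/2 nodes, and this version only answers "does some word contain this term", not which one.

```c
#include <stdlib.h>

/* One node per character; children indexed by letter a-z. */
typedef struct trie_node {
    struct trie_node *child[26];
} trie_node;

trie_node *trie_new(void) {
    trie_node *n = calloc(1, sizeof *n);
    if (!n) abort();                     /* sketch: no recovery from OOM */
    return n;
}

/* Inserts every suffix of word; returns how many nodes were created. */
size_t trie_add_suffixes(trie_node *root, const char *word) {
    size_t created = 0;
    for (const char *s = word; *s; s++) {
        trie_node *n = root;
        for (const char *p = s; *p; p++) {
            int i = *p - 'a';
            if (i < 0 || i >= 26) break; /* non a-z character ends this suffix */
            if (!n->child[i]) { n->child[i] = trie_new(); created++; }
            n = n->child[i];
        }
    }
    return created;
}

/* 1 if pattern is a substring of some inserted word; O(strlen(pattern)). */
int trie_contains(const trie_node *root, const char *pattern) {
    const trie_node *n = root;
    for (const char *p = pattern; *p; p++) {
        int i = *p - 'a';
        if (i < 0 || i >= 26 || !n->child[i]) return 0;
        n = n->child[i];
    }
    return 1;
}

void trie_free(trie_node *n) {
    if (!n) return;
    for (int i = 0; i < 26; i++) trie_free(n->child[i]);
    free(n);
}
```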

I really don't think you're on the right path trying to load everything at once.
You've already determined that your bottleneck is the deserialization.
Regardless what the UI does, the user only sees a handful (literally) of search results at a time.
SQLite already has a robust indexing mechanism; there is likely no need to reinvent that wheel with your own indexing, etc.
IMHO, you need to rethink how you are using UITableView. It only needs a few screenfuls of data at a time, and you should reuse cell objects as they scroll out of view rather than creating a ton of them to begin with.
So, use SQLite's indexing and grab the first x rows (LIMIT x in SQLite, rather than "TOP x"), where x is the right balance between giving the user some immediately available rows to scroll through and not spending too much time loading them. Set the table's scroll bar scaling using a separate SELECT COUNT(*) query, which only needs to be updated when the user types something different.
You can always go back and cache aggressively after you deserialize enough to get something up on-screen. A slight lag after the first flick or typing a letter is more acceptable than a 7-second delay just starting the app.

I currently have a somewhat similar coding problem with a large number of searchable strings.
My solution is to store the prepared data in one large memory array, containing both the textual data and offsets as links. This means I do not allocate an object for each item. The data uses less memory, and I can load and save it to a file without further processing.
Not sure if this is an option for you; it's quite an obvious solution once you've realized that the object tree is causing the slowdown.
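A minimal C sketch of that idea: all strings go into a single buffer, preceded by a count and a table of offsets relative to the start of the buffer. Because the buffer contains no pointers, it can be written to a file and later read back with a single fread and used immediately. The layout and function names are assumptions for illustration; the sketch also assumes the file is read on a machine with the same byte order as the one that wrote it and that the total size stays under 4 GB (32-bit offsets).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: [uint32 count][uint32 offset[count]][NUL-terminated strings]
   Offsets are relative to the blob start, so the blob is position-independent. */
void *pack_words(const char *const *words, uint32_t n, size_t *out_size) {
    size_t header = sizeof(uint32_t) * ((size_t)n + 1);
    size_t size = header;
    for (uint32_t i = 0; i < n; i++) size += strlen(words[i]) + 1;
    char *blob = malloc(size);
    if (!blob) return NULL;
    uint32_t *hdr = (uint32_t *)blob;      /* malloc memory is suitably aligned */
    hdr[0] = n;
    size_t pos = header;
    for (uint32_t i = 0; i < n; i++) {
        size_t len = strlen(words[i]) + 1; /* include the terminating NUL */
        hdr[i + 1] = (uint32_t)pos;
        memcpy(blob + pos, words[i], len);
        pos += len;
    }
    *out_size = size;
    return blob;
}

uint32_t packed_count(const void *blob) { return ((const uint32_t *)blob)[0]; }

/* Word i is just an offset away: no objects, no parsing. */
const char *packed_word(const void *blob, uint32_t i) {
    return (const char *)blob + ((const uint32_t *)blob)[i + 1];
}

/* Writes the blob as-is; returns 0 on success. */
int save_blob(const char *path, const void *blob, size_t size) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(blob, 1, size, f);
    int closed = fclose(f);
    return (closed == 0 && written == size) ? 0 : -1;
}

/* Reads the blob back in one go. No header validation: trusted files only. */
void *load_blob(const char *path, size_t *size) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    long n = (fseek(f, 0, SEEK_END) == 0) ? ftell(f) : -1;
    void *blob = (n > 0 && fseek(f, 0, SEEK_SET) == 0) ? malloc((size_t)n) : NULL;
    if (blob && fread(blob, 1, (size_t)n, f) != (size_t)n) { free(blob); blob = NULL; }
    fclose(f);
    if (blob) *size = (size_t)n;
    return blob;
}
```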

I use a large NSData memory block and search through it. There's more to it than that, though; it took me about two days to get it well optimized.
In your case I suspect you have a dictionary with a lot of words that have similar beginnings. You could prepare them on another computer in a format that both compacts the data and facilitates fast lookup. As a first step, the words should be sorted. With that, you can already perform a binary search on them for a fast lookup. If you store it all in one large memory area, I think you can search quite fast compared to how SQLite would search.
Another way would be to see the words as a kind of tree: you have many thousands that begin with the same letter, so you divide your data accordingly, with an SQL table for each first letter in your set of words. That way, when you look up a word, you select one of the now-smaller tables depending on its first letter. This already makes the amount that has to be searched much smaller, and you can do the same for the 2nd and 3rd letters to get quite fast access.
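An in-memory analogue of the per-letter split, sketched in C under the assumption that the word array is sorted with strcmp: a 257-entry table records where each first byte starts, so a prefix lookup only binary-searches inside that letter's bucket. The names word_index, index_build and index_prefix are hypothetical. The result is the contiguous range of words starting with the typed prefix, which is what a search-as-you-type list needs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical index over a strcmp-sorted array of words. */
typedef struct {
    const char *const *words;
    size_t n;
    size_t start[257];  /* start[c] = first index whose first byte is >= c */
} word_index;

void index_build(word_index *ix, const char *const *words, size_t n) {
    ix->words = words;
    ix->n = n;
    size_t i = 0;
    for (int c = 0; c <= 256; c++) {
        while (i < n && (unsigned char)words[i][0] < c) i++;
        ix->start[c] = i;
    }
}

/* First index in [lo, hi) whose first plen bytes are >= prefix
   (or > prefix when upper is nonzero). */
static size_t bound(const word_index *ix, size_t lo, size_t hi,
                    const char *prefix, size_t plen, int upper) {
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strncmp(ix->words[mid], prefix, plen);
        if (c < 0 || (upper && c == 0)) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Sets [*first, *last) to the words starting with prefix; returns the count. */
size_t index_prefix(const word_index *ix, const char *prefix,
                    size_t *first, size_t *last) {
    size_t plen = strlen(prefix);
    size_t lo = 0, hi = ix->n;
    if (plen > 0) {                         /* narrow to the first-letter bucket */
        unsigned char c0 = (unsigned char)prefix[0];
        lo = ix->start[c0];
        hi = ix->start[c0 + 1];
    }
    *first = bound(ix, lo, hi, prefix, plen, 0);
    *last = bound(ix, *first, hi, prefix, plen, 1);
    return *last - *first;
}
```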
Did this give you some ideas?

Well, actually I figured it out myself in the end, but of course I thank you all for your quick and pertinent answers. To be concise, I will just say that Objective-C, like any other object-based programming language, is significantly slower than procedural programming languages, due to introspection and other object-related requirements.
The solution was in fact to load all my data into a contiguous chunk of memory using malloc (a char **), search it on demand, and convert only the results to objects. This resulted in a 0.5 s loading time (from file to memory) and reasonable (read: "fast") operations during execution. Thank you all again, and if you have any questions, I'm here for you.
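A minimal C sketch of that approach, assuming a text file with one word per line separated by '\n': the whole file is read into one malloc'd buffer, the newlines are replaced with NUL bytes in place, and a single char ** array points at each word. Only two allocations are made regardless of the word count. The names are hypothetical, and the on-demand conversion of matches to NSString objects is not shown.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: one buffer for all text, one array of pointers. */
typedef struct {
    char *buf;
    char **words;
    size_t count;
} word_list;

/* Splits buf (len bytes, with buf[len] == '\0') on '\n' in place. */
int word_list_from_buffer(word_list *wl, char *buf, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) if (buf[i] == '\n') count++;
    if (len > 0 && buf[len - 1] != '\n') count++;  /* last line has no newline */
    char **words = malloc((count ? count : 1) * sizeof *words);
    if (!words) return -1;
    size_t k = 0;
    char *start = buf;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';                         /* terminate the word in place */
            words[k++] = start;
            start = buf + i + 1;
        }
    }
    if (start < buf + len) words[k++] = start;     /* ended by buf[len] == '\0' */
    wl->buf = buf;
    wl->words = words;
    wl->count = k;
    return 0;
}

/* Reads the whole file with a single fread; returns 0 on success. */
int word_list_load(word_list *wl, const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    long n = (fseek(f, 0, SEEK_END) == 0) ? ftell(f) : -1;
    rewind(f);
    char *buf = (n >= 0) ? malloc((size_t)n + 1) : NULL;
    if (!buf || fread(buf, 1, (size_t)n, f) != (size_t)n) {
        free(buf);
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[n] = '\0';
    if (word_list_from_buffer(wl, buf, (size_t)n) != 0) { free(buf); return -1; }
    return 0;
}

void word_list_free(word_list *wl) {
    free(wl->words);
    free(wl->buf);
}
```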

Related

Redis GET vs. SQL SELECT

I am pretty new to NoSQL, but I've always liked the idea of it. I took a look at Redis and have a few questions about the best way of storing and retrieving multiple hashes.
Assuming the following scenario:
Store a list of objects (redis 'Hashes') and select them by their timestamp.
To achieve this in SQL would require one table and two simple queries (INSERT & SELECT).
Trying to do this in Redis, I ended up creating the following structure:
Key object:$id (hash) containing the object
Key index:timestamp:$id (sorted set)
score equals timestamp and value includes id
While I can live with the additional maintenance work of two keys instead of one table (SQL), I am curious about the process of selecting multiple objects:
ZRANGEBYSCORE index:timestamp:$id timestampStart timestampEnd
This returns an array of all IDs which got created between timestampStart and timestampEnd. To get the object itself I am requesting every single one by:
GET object:$id
Is this the right way of doing it?
In comparison with an SQL Database: Is it still appreciably faster or might it even become slower caused by the high number of GETs?
A ZRANGEBYSCORE costs O(log(N) + M), where N = |items in your set| and M = |items you're selecting|. So doing the ZRANGEBYSCORE and then M GET operations is just O(log(N)+M+M) = O(log(N)+M), and would be at most about twice as slow. The network round trips could have been a major slowdown, but since each of your GETs is an independent operation, you can just pipeline them. You can also put the whole thing in a Lua script and have a single round trip, which would be optimal. I'd say with 99% certainty this would be faster than doing the same thing in SQL.
Also, if this is a very frequent operation for you, you can get even more speed by storing the entire object in your sorted set instead of just the id: member = object encoded as JSON, score = timestamp. This saves you the O(M) part of the operation, since no GETs are needed.
Whether or not this is a good way of doing things really depends on your use case. How much speed do you really need, and how important are the other features of a traditional database to you? Remember, Redis is much more a set of data structures accessible by clients than a traditional database, and it must store everything in RAM. To know whether it's the right thing for you, we'd need more information.

UITableView alphabetical index with large JSON data

I have a table that loads in data from a webservice that returns a bunch of JSON data. I load more data when the user scrolls down as the DB I am querying holds quite a bit of data.
The question I have is: would it be feasible to implement the alphabetical index on the right side of such a table, and how could this be done? It is definitely possible if I load ALL the data, sort it locally, populate the index and cache the data for subsequent launches. But what if this is going to be 10K rows of data or more? Loading this data on the application's first launch might be one option.
So in terms of performance and usability, does anyone have any recommendations of what is possible to do?
I don't think you should download all the data just to build those indexes; it would increase refresh time and might cause memory problems.
But if you think the indexes could make a real difference, then you can add some features to your server API: either a separate API call such as get_indexes, or a POST parameter get_indexes that adds an array of indexes to the response of any call that sets it.
And you should be ready to handle cases where the user taps an index entry for which no data has been downloaded yet, or simply stresses your app by scrolling quickly up and down the index.
First see how big the data download is. If the server can gzip the data, it may be surprisingly small - JSON zips very well because of the duplicated keys.
If it's too big, I would recommend modifying the server if possible to let you specify a starting letter. That way, if the user hits the "W" in the index you should be able to request all items that begin with "W".
It would also be helpful to get a total record count from the server so you can know how many rows are in the table ahead of time. I would also return a "loading..." string for each unknown row until the actual data comes down.

How to create a quick search-as-you-type mechanism for a very large Core Data database?

I am trying to implement a quick search as-you-type mechanism.
In my current implementation, when the user launches the app for the first time, he has to wait a little bit for a downloading process to complete. During that time, information about the 20,000 products that the app sells is being downloaded. Each product is represented by an instance of NSManagedObject and is added to a Core Data database.
The real problem is the way to use those products. After the user launches the app once again (not the first time), the products need to be loaded to memory so the search would be quick.
In order to do that, I loop over the entire database and create an instance of NSDictionary for each product that contains its information, because it is much easier to use dictionary objects in my program to retrieve information about the product.
Because the dictionaries are stored in memory, the search itself is very quick, but iterating over the 20,000 objects (once per launch) and creating the dictionaries takes a lot of time (about a minute), so that solution is not good.
I thought about another way to reach the quick-search goal: Fetching objects from the database after each letter has been typed. But I do not know how fast it would be.
What is the recommended way to do that?
Thanks,
Sagiftw
I have a similar feature in my app, but with considerably fewer records. I have indices on all search fields and build SQL queries (NSPredicate) that are as simple (inexpensive) as possible from the input, with a second fetchedResultsController used only for searching. The result set contains the 'search items'. This is fast enough for at least around 1,000 entries (my test data size) with a random distribution of text-type search keys. It's probably a good idea to fetch in the background to keep the GUI responsive.

Memory Efficient and Speedy iPhone/Android Dictionary Storage/Access

I'm having trouble with memory on older-generation iPhones (iPod touch 1st gen, 2nd gen, etc.). This is due to the amount of memory allocated when I load and store a 170k-word dictionary.
This is the code (very simple):
string[] words = dictionaryRef.text.Split("\n"[0]);
_words = new List<string>(words);
This allocates around 12 MB on startup, and I think the iPhone has around 43 MB available. So that plus textures, sounds and the OS tends to break it.
Speed-wise, accessing it with a binary search is fine. The problem is storing it in memory (and loading it) more efficiently.
The text.Split call appears to take up a lot of heap memory.
Any advice?
You can't count too much on how much memory these pre-3.0 devices have available on startup; 43 MB is rather optimistic. Is your app just checking whether the word is in the list or not? You might want to roll your own hash table instead of using a binary search. I'd search some of the literature and Stack Overflow for efficient ways to store a large dictionary with the particular word sizes you have. A Google search for hash tables might give you a better implementation.
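A minimal sketch of such a hash table in C (the question's code is C#, so this only illustrates the data structure), for the case where the app just needs to know whether a word is in the list. It assumes the words already live in one buffer and are not copied, uses the FNV-1a hash with linear probing, and keeps the table at most half full. The names word_set, word_set_init and so on are hypothetical. On a 32-bit device each slot is one 4-byte pointer, so 170k words would need 2^19 slots, about 2 MB, on top of the text itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

/* Hypothetical open-addressing set of borrowed strings. */
typedef struct {
    const char **slots;   /* NULL marks an empty slot; strings are not copied */
    size_t mask;          /* capacity - 1 (capacity is a power of two) */
    size_t used;
} word_set;

/* Allocates room for up to n words with the table at most half full. */
int word_set_init(word_set *ws, size_t n) {
    size_t cap = 2;
    while (cap < 2 * n) cap <<= 1;
    ws->slots = calloc(cap, sizeof *ws->slots);
    ws->mask = cap - 1;
    ws->used = 0;
    return ws->slots ? 0 : -1;
}

/* Returns 0 if added or already present, -1 if the table is full. */
int word_set_add(word_set *ws, const char *w) {
    size_t i = fnv1a(w) & ws->mask;
    while (ws->slots[i]) {
        if (strcmp(ws->slots[i], w) == 0) return 0;
        i = (i + 1) & ws->mask;               /* linear probing */
    }
    if (ws->used >= (ws->mask + 1) / 2) return -1;
    ws->slots[i] = w;
    ws->used++;
    return 0;
}

int word_set_contains(const word_set *ws, const char *w) {
    size_t i = fnv1a(w) & ws->mask;
    while (ws->slots[i]) {                    /* an empty slot ends the probe */
        if (strcmp(ws->slots[i], w) == 0) return 1;
        i = (i + 1) & ws->mask;
    }
    return 0;
}

void word_set_free(word_set *ws) {
    free(ws->slots);
    ws->slots = NULL;
}
```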
Use SQLite. It will use less memory and be faster. Create an index on your words column and voila, you have binary search, without having the whole dictionary loaded in memory.
First, if dictionaryRef.text is a string (and it looks like it is), then you already have something huge allocated (2 bytes per character). Check this: it might well account for a large share (nearly half) of the total memory allocated. You should think about caching this (the database idea is a good one, but a file would also do, read with File.ReadAllLines on later runs).
Next, you can try to do a bit better than Mono's Split method. It creates a List and then turns it into an array (calling ToArray) at the end, from which you then create a new List. Since your requirement (splitting only on '\n') is fairly basic, I suggest rolling your own Split method (or copying and trimming the one from Mono) to avoid the temporary memory allocations.
In any case, take a lot of (memory) measurements, since allocations, especially of strings, often occur where we don't look ;-)
I would have to agree with Morningstar that using a SQLite backend for your word storage sounds like the best solution to what you are trying to do.
However, if you insist on using a word list, here's a suggestion:
It looks to me like dictionaryRef.text is constructed by reading a text file in its entirety (File.ReadAllText() or some such).
Instead of doing that, why not use TextReader.ReadLine() to read 1 word at a time from the file into a List, thus avoiding the need to use String.Split() and using tons of temporary storage space?
Ultimately that seems to be what you want anyway... and ReadLine() will "split" on \n for you.

iphone sdk sqlite lookup performance for +40k records

What is the best way to get this thing done:
I have a huge table with 40k+ records (TV show titles) in SQLite and I want to do real-time lookups on this table. For example, as the user enters search terms for a show, I query SQLite and filter records after every keystroke (like Google search suggestions).
My performance target is 100 milliseconds. A few things I have thought of are creating indexes and splitting the data into multiple tables.
However, I would really appreciate any suggestions to achieve this in the fastest possible time so I can avoid any ui refresh delays - it would be awesome to have feedback from coders who have already done something similar.
Things to do:
Index fields appropriately.
Limit yourself to only 10-15 records on the initial query—that should be enough to populate the top of the table view.
If you don't need to sort, don't. If you do need to sort, sort on an indexed field.
Do as much as you can in SQLite rather than your own code.
Do as little as you can overall.
You'll likely find what I have: SQLite and the iPhone are actually amazingly capable as long as you don't do anything really dumb.
Keep "perceived performance" in mind: doing lookups right after each keypress could be somewhat expensive. But how many milliseconds does it take a user to hit a key? You can probably get away with not updating the result list until the user hasn't typed anything for several hundred milliseconds (for really fast typists, perhaps update every X hundred milliseconds while they're still typing).
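That timing rule can be sketched in C without any UI code, assuming the caller reports each keystroke and polls with timestamps in milliseconds (for example from a repeating timer). The struct and function names are hypothetical, and the quiet period and maximum wait are parameters; values such as 300 ms and 800 ms are only placeholders to tune on the device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debouncer: run the search once the user has paused for
   quiet_ms, or at least every max_wait_ms while they keep typing. */
typedef struct {
    int64_t quiet_ms;          /* pause required after the last keystroke */
    int64_t max_wait_ms;       /* upper bound on delay during fast typing */
    int64_t last_key_ms;       /* time of the most recent keystroke */
    int64_t first_pending_ms;  /* first unsearched keystroke, -1 if none */
} debouncer;

void debounce_init(debouncer *d, int64_t quiet_ms, int64_t max_wait_ms) {
    d->quiet_ms = quiet_ms;
    d->max_wait_ms = max_wait_ms;
    d->last_key_ms = 0;
    d->first_pending_ms = -1;
}

/* Record a keystroke at time now_ms. */
void debounce_keystroke(debouncer *d, int64_t now_ms) {
    if (d->first_pending_ms < 0) d->first_pending_ms = now_ms;
    d->last_key_ms = now_ms;
}

/* Poll periodically; true means "run the query now". */
bool debounce_should_search(debouncer *d, int64_t now_ms) {
    if (d->first_pending_ms < 0) return false;            /* nothing new typed */
    if (now_ms - d->last_key_ms >= d->quiet_ms ||          /* user paused */
        now_ms - d->first_pending_ms >= d->max_wait_ms) {  /* typing for too long */
        d->first_pending_ms = -1;                          /* input consumed */
        return true;
    }
    return false;
}
```

The max-wait bound is what keeps results moving for very fast typists; without it, a user who never pauses would never see an update.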
How do you know the performance will be bad? 40k rows is not that much, even for an iPhone... try it on the phone before you optimize.
Avoid doing any joins, and try to use paging so that you keep the amount of data returned to a minimum. Perhaps you should try loading the whole thing into memory, then sorting it and doing a binary search? If it is just a list of show titles, wouldn't it fit?