I'm searching a large plist file that contains tens of thousands of dictionaries, each with 2 key/string pairs. My search algorithm goes through the dictionaries, and when it finds a text match in either of the strings in a dictionary, the contents of that dictionary are inserted into glossaryDictionary. Here is how it works:
NSDictionary *eachEntry;
NSArray *rawGlossaryArray = [[NSArray alloc] initWithContentsOfFile:thePath]; // this contains the contents of the plist
for (eachEntry in rawGlossaryArray)
{
    GlossaryEntry *anEntry = [[GlossaryEntry alloc] initWithDictionary:eachEntry];
    NSRange titleResultsRange = [anEntry.title rangeOfString:filterString options:NSCaseInsensitiveSearch];
    NSRange defResultsRange = [anEntry.definition rangeOfString:filterString options:NSCaseInsensitiveSearch];
    if (titleResultsRange.length > 0 || defResultsRange.length > 0) {
        // store that item in the glossary dictionary with the name as the key
        [glossaryDictionary setObject:anEntry forKey:anEntry.title];
    }
    [anEntry release];
}
Each time a search is performed, there is a delay of around 3-4 seconds in my iPhone app (on the device at least; everything runs pretty quickly in the simulator). Can anyone advise on how I might optimize this search?
Without looking at the data set I can't be sure, but if you profile it I expect you will find you are spending the vast majority of your time in -rangeOfString:options:. If that is the case, you will not be able to improve performance without fundamentally changing the data structure you use to store your data.
You might want to construct some sort of trie, with strings and substrings pointing to the objects. It is a much more complicated thing to set up, and insertions into it will be more expensive, but lookups would be very fast. Given that you are serializing the structure out anyway, expensive inserts should not be much of an issue.
That just cries out for using a database, that you pre-populate and put into the application.
A few suggestions:
You're doing a lot of allocating and releasing in that loop. Could you create a single GlossaryEntry before the loop, then just reload its contents inside the loop? This would avoid a bunch of alloc/release calls.
Rather than loading the file each time, could you lazy load it once and keep it cached in memory (maybe in a singleton type object)? Generally this isn't a good idea on the iPhone, but you could have some code in your "didReceiveMemoryWarning" handler that would free the cache if it became an issue.
You should run your application in Instruments and see what the bottleneck really is. Optimizing blind is really difficult, and we have good tools that make the bottlenecks clear.
There's also the possibility that this isn't optimizable. I'm not sure if it's actually hanging the UI in your app or just taking a long time. If it's blocking the UI you need to get out of the main thread to do this work. Same with any significant work to keep an app responsive.
Try the following, and see if you get any improvement:
1) use
- (NSRange)rangeOfString:(NSString *)aString options:(NSStringCompareOptions)mask
and include NSLiteralSearch in the mask (the options are bit flags, so it can be OR'ed with NSCaseInsensitiveSearch if you still need case-insensitive matching). This may speed up the search considerably, as described in the Apple documentation (String Programming Guide for Cocoa):
NSLiteralSearch Performs a byte-for-byte comparison. Differing literal sequences (such as composed character sequences) that would otherwise be considered equivalent are considered not to match. Using this option can speed some operations dramatically.
2) From the documentation (String Programming Guide for Cocoa):
If you simply want to determine whether a string contains a given pattern, you can use a predicate:
BOOL match = [myPredicate evaluateWithObject:myString];
For more about predicates, see Predicate Programming Guide.
You're probably getting the best performance you're likely to get, given your current data structures. You need to change how you're accessing the data, in order to get better performance.
Suggestions, in no particular order:
Don't create your GlossaryEntry objects in a loop while you're filtering them. Rather than storing the data in a Property List, just archive your array of GlossaryEntry objects. See the NSCoding documentation.
Rather than searching through tens of thousands of strings at every keystroke, generate an index of common substrings (maybe 2 or 3 letters), and create an NSDictionary that maps from that common substring to the set of results to use as an index. You can create the index at build time, rather than at run-time. If you can slice up your data set into several smaller pieces, the linear search for matching strings will be considerably faster.
Store your data in an SQLite database, and use SQL to query it - probably overkill for just this problem, but allows for more sophisticated searches in the future, if you'll need them.
If creating a simple index doesn't work well enough, you'll need to create a search tree style data structure.
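The substring-index suggestion above can be sketched in plain C (which compiles unchanged inside an Objective-C file). This is only an illustration under stated assumptions: the names `BigramIndex`, `index_add` and `index_search` are hypothetical, matching is ASCII-only, and every non-letter is lumped into one class. The index is used only to narrow the candidates; each candidate is still confirmed with a real substring test.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define CLASSES 27                 /* 'a'..'z' plus one class for everything else */
#define BUCKETS (CLASSES * CLASSES)

/* One bucket per two-character sequence: ids of the entries containing it. */
typedef struct { int *ids; size_t count, cap; } Bucket;
typedef struct { Bucket buckets[BUCKETS]; } BigramIndex;

static unsigned char_class(char ch) {
    int c = tolower((unsigned char)ch);
    return (c >= 'a' && c <= 'z') ? (unsigned)(c - 'a' + 1) : 0u;
}

static unsigned bigram_key(const char *s) {
    return char_class(s[0]) * CLASSES + char_class(s[1]);
}

/* Case-insensitive substring test (ASCII only), used to confirm candidates. */
static int contains_nocase(const char *hay, const char *needle) {
    size_t n = strlen(needle);
    if (n == 0) return 1;
    for (; *hay; hay++) {
        size_t i = 0;
        while (i < n && hay[i] &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n) return 1;
    }
    return 0;
}

static BigramIndex *index_create(void) {
    return calloc(1, sizeof(BigramIndex));
}

/* Records `id` under every two-character sequence of `text`.
   Call once per entry. */
static void index_add(BigramIndex *ix, int id, const char *text) {
    assert(ix != NULL && text != NULL);
    for (; text[0] && text[1]; text++) {
        Bucket *b = &ix->buckets[bigram_key(text)];
        if (b->count > 0 && b->ids[b->count - 1] == id) continue; /* already listed */
        if (b->count == b->cap) {
            size_t cap = b->cap ? b->cap * 2 : 8;
            int *grown = realloc(b->ids, cap * sizeof *grown);
            assert(grown != NULL);   /* sketch: no recovery from out-of-memory */
            b->ids = grown;
            b->cap = cap;
        }
        b->ids[b->count++] = id;
    }
}

/* Writes up to max_out matching ids to `out`; the query needs >= 2 characters. */
static size_t index_search(const BigramIndex *ix, const char *const *texts,
                           const char *query, int *out, size_t max_out) {
    assert(ix != NULL && strlen(query) >= 2);
    const Bucket *b = &ix->buckets[bigram_key(query)];
    size_t found = 0;
    for (size_t i = 0; i < b->count && found < max_out; i++)
        if (contains_nocase(texts[b->ids[i]], query))
            out[found++] = b->ids[i];
    return found;
}

static void index_free(BigramIndex *ix) {
    if (!ix) return;
    for (size_t i = 0; i < BUCKETS; i++) free(ix->buckets[i].ids);
    free(ix);
}
```

Only the bucket for the first two characters of the query is scanned, so the expensive comparison runs on a fraction of the entries. Whether that fraction is small enough for tens of thousands of glossary entries is something to measure rather than assume.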
You should profile it in Instruments to find where the bottleneck actually is. If I had to guess, I would say the bottleneck would be [[NSArray alloc] initWithContentsOfFile:thePath].
Having said that, you'd probably get the best performance by storing the data in an SQLite database (which you would search with SQL) instead of using a plist.
Related
I was looking at the TableSearch example code from Apple. It looks like they have an NSArray for all the content and an NSMutableArray for the filtered content. If the filter is on, they show the NSMutableArray; if it is off, they show the NSArray that has all the data.
1) I was wondering if this is a common implementation for filters since I haven't done much filtering before.
2) To add to that question, if I had a filter of four different categories, would I still use one NSMutableArray that shows the filtered content when the filter is on? Or do I create four different NSMutableArrays for each different type of filter, and then show that list depending on which filter is on.
Assuming the common implementation is to keep one NSArray for the full list, I'm unsure which is better: creating all four filtered NSMutableArrays up front (is that expensive?), or creating a single NSMutableArray on the fly when the user selects a filter option and then calling [tableView reloadData].
Thanks.
I don't have that sample app in front of me, but you typically would filter using a predicate, so it would be helpful for you to review the docs on NSPredicate.
So when you want to change the filter, you do so by changing the predicate. You don't have to create all filtered results. You only create the one you need at any given moment.
With arrays, you can filter using code like that shown in this example. The key lines are
NSPredicate *predicate;
predicate = [NSPredicate predicateWithFormat:@"length == 9"];
NSArray *myArray2 = [myArray filteredArrayUsingPredicate:predicate];
Filtering is not always done with arrays. It can be done with NSFetchedResultsControllers if using Core Data. Predicates are used there also, in very much the same way. Predicates can be used for other things, too, including regular expression filtering. It's worth looking at, if you aren't familiar with it.
It really depends. If your underlying data is in Core Data, use NSFetchedResultsController and give it NSPredicates. If you have an array of data, it may be easiest to traverse it and create another array of data.
In general, the filter itself is not likely to be as expensive as the overall drawing process (which includes instantiating or recycling table cells). You can do what's easy and profile with Instruments.
Keeping four different arrays is normally not a good idea in terms of memory, which is a scarce resource.
No matter what though, reloadData is going to be involved. (Depending on OS version, perhaps — see the NSFetchedResultsController docs.)
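Building only the filtered list that is currently needed, as both answers suggest, might look like the following in plain C. It is an analogue of the predicate idea and does not use the NSPredicate API itself; `Item`, `ItemPredicate`, `filter_items` and `in_category` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row type: a title plus one of the four filter categories. */
typedef struct {
    const char *title;
    int category;
} Item;

/* A predicate decides whether one item passes the current filter. */
typedef int (*ItemPredicate)(const Item *item, const void *context);

/* Fills `out` with pointers to the items that satisfy `pred` and returns how
   many were kept. `out` must have room for `count` pointers. */
static size_t filter_items(const Item *items, size_t count,
                           ItemPredicate pred, const void *context,
                           const Item **out) {
    assert(items != NULL && pred != NULL && out != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (pred(&items[i], context)) {
            out[kept++] = &items[i];
        }
    }
    return kept;
}

/* Example predicate: keep items whose category equals *(const int *)context. */
static int in_category(const Item *item, const void *context) {
    return item->category == *(const int *)context;
}
```

Switching filters then means calling `filter_items` again with a different predicate or context and reloading the table, so one scratch array serves all four categories.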
First off, I've seen this, but it doesn't quite seem to suit my needs.
I've got a situation where I need a sparse array. Some situations where I could have, say 3000 potential entries with only 20 allocated, other situations where I could have most or all of the 3000 allocated. Using an NSMutableDictionary (with NSString representations of the integer index values) would appear to work well for the first case, but would seemingly be inefficient for the second, both in storage and lookup speed. Using an NSMutableArray with NSNull objects for the empty entries would work fairly well for the second case, but it seems a bit wasteful (and it could produce an annoying delay at the UI) to insert most of 3000 NSNull entries for the first case.
The referenced article mentions using an NSMapTable, since it supposedly allows integer keys, but apparently that class is not available on iPhone (and I'm not sure I like having an object that doesn't retain, either).
So, is there another option?
Added 9/22
I've been looking at a custom class that embeds an NSMutableSet, with set entries consisting of a custom class with integer (ie, element#) and element pointer, and written to mimic an NSMutableArray in terms of adds/updates/finds (but not inserts/removals). This seems to be the most reasonable approach.
An NSMutableDictionary probably will not be slow; dictionaries generally use hashing and are rather fast. Benchmark it.
Another option is a C array of pointers. Allocating a large array only reserves virtual memory until the real memory is accessed (use calloc, not malloc followed by memset). The downside is that memory is allocated in 4KB pages, which can be wasteful for small numbers of entries; for large numbers of entries, many may fall in the same page.
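A minimal sketch of the C-array approach, assuming a fixed upper bound on the index (3000 in the question). The `SparseArray` type and its functions are hypothetical names; the array does not retain or release the objects it points to, so ownership stays with the caller. Whether calloc actually defers committing memory depends on the allocator and the allocation size.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void **slots;     /* NULL means "no entry at this index" */
    size_t capacity;  /* number of potential entries */
    size_t used;      /* number of non-NULL slots */
} SparseArray;

static SparseArray *sparse_create(size_t capacity) {
    assert(capacity > 0);
    SparseArray *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    /* calloc hands back zeroed memory; all-zero bytes are treated as NULL
       pointers here, which holds on mainstream platforms. */
    a->slots = calloc(capacity, sizeof *a->slots);
    if (a->slots == NULL) {
        free(a);
        return NULL;
    }
    a->capacity = capacity;
    a->used = 0;
    return a;
}

/* Returns the stored pointer, or NULL if the slot is empty or out of range. */
static void *sparse_get(const SparseArray *a, size_t index) {
    return index < a->capacity ? a->slots[index] : NULL;
}

/* Stores `value` (NULL clears the slot). Returns 0 if index is out of range. */
static int sparse_set(SparseArray *a, size_t index, void *value) {
    if (index >= a->capacity) return 0;
    if (a->slots[index] == NULL && value != NULL) a->used++;
    else if (a->slots[index] != NULL && value == NULL) a->used--;
    a->slots[index] = value;
    return 1;
}

static void sparse_free(SparseArray *a) {
    if (a == NULL) return;
    free(a->slots);
    free(a);
}
```

For 3000 slots the pointer table is only a few pages, so both the 20-entry and the nearly-full case use the same structure with constant-time lookup.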
What about CFDictionary (or actually CFMutableDictionary)? In the documentation, it says that you can use any C data type as a key, so perhaps that would be closer to what you need?
I've got the custom class going and it works pretty well so far. It's 322 lines of code in the h+m files, including the inner class stuff, a lot of blank lines, comments, description formatter (currently giving me more trouble than anything else) and some LRU management code unrelated to the basic concept. Performance-wise it seems to be working faster than another scheme I had that only allowed "sparseness" on the tail end, presumably because I was able to eliminate a lot of special-case logic.
One nice thing about the approach was that I could make much of the API identical to NSMutableArray, so I only needed to change maybe 25% of the lines that somehow reference the class.
I also needed a sparse array and have put mine on GitHub.
If you need a sparse array feel free to grab https://github.com/LavaSlider/DSSparseArray
I am wondering what the best approach would be to check whether or not a common first name is contained within an NSString on an iPhone app. I've got a sorted flat text file of ~5500 common American first names delimited by new lines. The NSString I am searching within for a name is not very long, most likely the size of a normal sentence.
My original plan was to load the sorted list into memory and then iterate over every word in the NSString performing a binary search of the list to determine whether or not that word was a common name.
Am I better off trying to put this name list into CoreData or a SQLite table and performing a query with that? My understanding is I would not have to load the entire list into memory if I went that route.
I am guessing this situation is a common problem with word dictionaries for word games, so I'm just wondering what the best practice is for fast lookups. Thanks!
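The plan described above (a binary search of the sorted list for each word of the sentence) could be sketched in plain C roughly as follows. It assumes the names are already in memory as a byte-order-sorted array of lowercase ASCII strings; `contains_common_name` and `cmp_name` are hypothetical names, and words are lowercased before the lookup so that "Paul" matches "paul".

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* bsearch comparator: `key` is the word, `elem` points at one array element. */
static int cmp_name(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns 1 if any word of `sentence` appears in `names`, else 0.
   `names` must be sorted with strcmp ordering and contain lowercase names. */
static int contains_common_name(const char *sentence,
                                const char *const *names, size_t count) {
    assert(sentence != NULL && names != NULL);
    char word[64];
    size_t len = 0;  /* letters seen in the current word, even if not stored */
    for (const char *p = sentence;; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c)) {
            if (len < sizeof word - 1) word[len] = (char)tolower(c);
            len++;
        } else {
            /* End of a word: look it up unless it was too long to store. */
            if (len > 0 && len < sizeof word) {
                word[len] = '\0';
                if (bsearch(word, names, count, sizeof *names, cmp_name))
                    return 1;
            }
            len = 0;
            if (c == '\0') break;
        }
    }
    return 0;
}
```

With about 5500 names, each word costs roughly 13 string comparisons, so a sentence-length input needs very little work once the list is loaded.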
SQLite sounds ideal for this in terms of both speed of lookup and minimising memory usage. It would also make it potentially possible to update the first name list over the internet if so desired.
Using Core Data (which is in effect an elaborate wrapper around SQLite) would be overkill in this instance, especially as you don't require the ORM-like capabilities.
An NSSet might be useful as well. Dave DeLong's answer for another question demonstrates that NSSets have constant look-up times, i.e. O(1).
Load your names into an NSMutableSet one by one. This will be the slowest part but will only need to be done once. If your file is a simple line-delimited file of names, it may be easier to use the standard C library for reading the file, since line-by-line input is not well-supported by Cocoa.
After that, simply use [nameSet containsObject:name] to check whether it is in the list.
A couple of drawbacks to this approach:
The name you want to test must be in the same case as the name in the set, that is “paul” and “Paul” are different strings. You can circumvent this by converting all names to lowercase before inserting them into the set, and then also converting the name you want to check into lowercase before checking it against the set.
It might be easier just to go with the already-accepted answer.
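Reading the line-delimited file with the standard C library and lowercasing each name, as this answer suggests, might look like the sketch below. `read_names` and `free_names` are hypothetical helpers, and lines are assumed to be shorter than 255 bytes. In the app, each returned string would then be wrapped in an NSString and added to the set.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one name per line from `fp`, lowercases it, and returns a malloc'ed
   array of malloc'ed strings. The number of names is stored in *out_count. */
static char **read_names(FILE *fp, size_t *out_count) {
    assert(fp != NULL && out_count != NULL);
    char line[256];
    char **names = NULL;
    size_t count = 0, cap = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strcspn(line, "\r\n");  /* length without the line ending */
        if (len == 0) continue;              /* skip blank lines */
        if (count == cap) {
            size_t new_cap = cap ? cap * 2 : 64;
            char **grown = realloc(names, new_cap * sizeof *grown);
            if (grown == NULL) break;        /* keep what was read so far */
            names = grown;
            cap = new_cap;
        }
        char *copy = malloc(len + 1);
        if (copy == NULL) break;
        for (size_t i = 0; i < len; i++)
            copy[i] = (char)tolower((unsigned char)line[i]);
        copy[len] = '\0';
        names[count++] = copy;
    }
    *out_count = count;
    return names;
}

static void free_names(char **names, size_t count) {
    for (size_t i = 0; i < count; i++) free(names[i]);
    free(names);
}
```

The name being tested has to be lowercased the same way before the lookup, otherwise the case mismatch described above comes back.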
From the docs:
To summarize, though, if you execute a fetch directly, you should typically not add Objective-C-based predicates or sort descriptors to the fetch request. Instead you should apply these to the results of the fetch. If you use an array controller, you may need to subclass NSArrayController so you can have it not pass the sort descriptors to the persistent store and instead do the sorting after your data has been fetched.
I don't get it. What's wrong with using them on fetch requests? Isn't it stupid to get back a whole big bunch of managed objects just to pick out a 1% of them in memory, leaving 99% garbage floating around? Isn't it much better to only fetch from the persistent store what you really need, in the order you need it? Probably I did get that wrong...
The documentation refers to Objective-C-based predicates or sort descriptors. This is NOT the same thing as a standard predicate or sort descriptor you see in the example available in the same page of the documentation you are quoting.
For instance, using
+ (NSPredicate *)predicateWithBlock:(BOOL (^)(id evaluatedObject, NSDictionary *bindings))block;
to build a predicate allows you to use Objective-C to implement the block that selects the objects. Since the complexity of the block may be arbitrarily high, in this case Apple recommends first fetching all of the objects and then applying these filters.
I would appreciate some help with something I'm working on. I have not done this before and I'm having some problems, because I don't think I understand exactly how to do it. What I want to do is, I'm sure, simple to most of you, and will be to me as soon as I do it correctly the first time. Anyway: I have a tableview that I need to populate with two things, a username and a number with a count of items (the username could be a primary key). Currently I have a tableview that is populated from an array and is editable. No problem there; I know how to do that.
The two parts I need help understanding are:
Read a plist with these two values into a dictionary, and read them into two different arrays that I can use with my tables.
Save the arrays back to the dictionary and then back to a plist.
I think I'm most confused about how to store these two things in dictionary keys and values. I've looked that over but I'm just not "getting it".
I would appreciate some short code examples of how to do this or a better way to accomplish the same thing.
As always, thanks for your awesome help....
You can use the NSArray method writeToFile:atomically: to dump your data into a file, and then use initWithContentsOfFile: to retrieve the information from that file just as you dumped it previously. I believe that if you have dictionaries in your array, you should be able to get them back this way. You can always use Core Data for storage as well, if the structures you need to store are getting complex and dumping them into a file and reading them back to recreate objects is becoming messy.
The approach that would perhaps be the simplest is to store the data as an array of dictionaries. This has the issue that recreating the array from a plist with mutable leaves is convoluted at best.
But if you can tolerate the performance hit of replacing dictionaries when updating the list instead of modifying them, it may well be the simplest course of action.
This also has the added benefit that your datasource only needs to deal with one array, and that the whole shebang would be Key-Value Compliant, which might further simplify your code.