iPhone Objective-C: detecting a 'real' word

I need a quick-and-dirty way to detect whether a given NSString is a 'real' word, that is, whether it's in the dictionary: in effect, a very simplistic spell checker. Does anyone know of a way to do this? I either need a file containing all the words in the English dictionary (which I've searched for, to no avail), or a way to interface with the iPhone's spell-checking service. Ideally I would interface with the spell-checking service in a similar way to NSSpellChecker on OS X, so that my app would work with other languages, but at this point I'll take what I can get.
Lastly, here's some pseudo-code to better illustrate my needs:
-(BOOL)isDictionaryWord:(NSString*)word; // returns TRUE when word=@"greetings"; returns FALSE when word=@"slkfjsdkl"

Use UITextChecker instead. The code below might not be perfect but should give you a good idea.
-(BOOL)isDictionaryWord:(NSString*)word {
UITextChecker *checker = [[UITextChecker alloc] init];
NSLocale *currentLocale = [NSLocale currentLocale];
NSString *currentLanguage = [currentLocale objectForKey:NSLocaleLanguageCode];
NSRange searchRange = NSMakeRange(0, [word length]);
NSRange misspelledRange = [checker rangeOfMisspelledWordInString:word range:searchRange startingAt:0 wrap:NO language:currentLanguage];
return misspelledRange.location == NSNotFound;
}

You can make UITextChecker work accurately without needing to add a new dictionary.
I use a two-step process because I need the first step to be fast (but not accurate). You may only need step two, which is the accurate check. Note that step two uses UITextChecker's completionsForPartialWordRange:inString:language: method, which is why it's more accurate than rangeOfMisspelledWordInString:range:startingAt:wrap:language:.
// Step one: quickly check whether a combination of letters passes the spell check. This is not very accurate, but it's very fast, so I can quickly exclude lots of letter combinations (brute-force approach).
// checker is assumed to be an instance variable (UITextChecker *checker;) shared with step two
checker = [[UITextChecker alloc] init];
NSString *wordToCheck = @"whatever"; // The combination of letters you wish to check
// Set the range to the full length of the word
NSRange range = NSMakeRange(0, wordToCheck.length);
NSRange misspelledRange = [checker rangeOfMisspelledWordInString:wordToCheck range:range startingAt:0 wrap:NO language:@"en_US"];
BOOL isRealWord = misspelledRange.location == NSNotFound;
// Call step two to confirm that this is a real word
if (isRealWord) {
    isRealWord = [self isRealWordOK:wordToCheck];
}
return isRealWord; // if YES we found a real word; if not, move on to the next combination of letters
// Step two: extra check to make sure the word really is a real word. Returns YES if we have a real word.
-(BOOL)isRealWordOK:(NSString *)wordToCheck {
    // An empty string has no final letter to strip, and length - 1 would underflow
    if (wordToCheck.length == 0) {
        return NO;
    }
    // We don't want to use any words that the lexicon has learned.
    if ([UITextChecker hasLearnedWord:wordToCheck]) {
        return NO;
    }
    // Use the word-completion function to see whether this word really exists: drop the final letter, ask autocomplete to complete the word, then look through the results. If the word is not among them, it's not a real word. Note that autocomplete is very accurate, unlike the spell checker.
    // (checker is assumed to be the UITextChecker instance from step one, e.g. an instance variable.)
    NSRange range = NSMakeRange(0, wordToCheck.length - 1);
    NSArray *guesses = [checker completionsForPartialWordRange:range inString:wordToCheck language:@"en_US"];
    // Confirm that the word is found in the autocomplete list
    for (NSString *guess in guesses) {
        if ([guess isEqualToString:wordToCheck]) {
            // We found the word in the autocomplete list, so it's real :-)
            return YES;
        }
    }
    // If we get here then it's not a real word :-(
    NSLog(@"Word not found in second dictionary check: %@", wordToCheck);
    return NO;
}

Related

IOS isEqualToString Not Working

The following code logs two names that look identical, yet isEqualToString: reports them as different.
NSDirectoryEnumerator *directoryEnumerator = [[NSFileManager defaultManager]
    enumeratorAtPath:kDocdir];
for (NSString *pathi in directoryEnumerator)
{
    NSString *fileName_Manager = [pathi lastPathComponent];
    NSLog(@"fileName_Manager = %@",fileName_Manager);
    Artist *name_Databse = [self.fetchedResultsController
        objectAtIndexPath:IndexPath];
    NSLog(@"name_Databse = %@",name_Databse.name);
    if ([fileName_Manager isEqualToString:name_Databse.name]) {
        NSLog(@"Same Name");
    }else{
        NSLog(@"Different Name");
    }
}
Outputs:
2013-04-25 15:37:43.256 Player[36436:907] fileName_Manager = alizée - mèxico - final j'en
2013-04-25 15:37:43.272 Player[36436:907] name_Databse = alizée - mèxico - final j'en
2013-04-25 15:37:44.107 Player[36436:907] Different Name
The comparison fails when the names contain special characters. Why is this happening?
Thanks ...
I have the same problem here:
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"name == %@",[pathi lastPathComponent]];
How do I fix this one?
The documentation for isEqualToString: suggests you might have a problem:
The comparison uses the canonical representation of strings, which for
a particular string is the length of the string plus the Unicode
characters that make up the string. When this method compares two
strings, if the individual Unicodes are the same, then the strings are
equal, regardless of the backing store. “Literal” when applied to
string comparison means that various Unicode decomposition rules are
not applied and Unicode characters are individually compared. So,
for instance, “Ö” represented as the composed character sequence “O”
and umlaut would not compare equal to “Ö” represented as one Unicode
character.
Try using (NSOrderedSame == [string1 localizedCompare:string2])
Also, if you haven't already, look into the Apple sample code 'International Mountains' which deals with numerous localization issues.
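As a rough sketch of the decomposition issue described in the quoted documentation, both strings can be normalized to the same Unicode form before isEqualToString: is called. The helper name UAStringsAreCanonicallyEqual is made up for illustration and is not part of Foundation. The sketch assumes the file name and the database value differ only in composed versus decomposed accents, which is plausible for names read back from the file system but is not confirmed by the question.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not a Foundation API: normalizes both strings to
// Unicode Normalization Form C (precomposed) before the literal comparison.
static BOOL UAStringsAreCanonicallyEqual(NSString *a, NSString *b) {
    NSString *normalizedA = [a precomposedStringWithCanonicalMapping];
    NSString *normalizedB = [b precomposedStringWithCanonicalMapping];
    return [normalizedA isEqualToString:normalizedB];
}
```

With "alizée" stored once with é as a single character (U+00E9) and once as e followed by a combining acute accent (U+0301), isEqualToString: alone returns NO, while the helper returns YES. The same normalization could be applied to the value passed to the predicate, provided the stored names use the same form.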
Have you tried converting both strings to UTF-8 and then doing the comparison? I don't know whether that works; it's just an idea.

How to cut out parts of NSString?

@"/News/some news text/"
@"/News/some other news text/"
@"/About/Some about text/"
@"/Abcdefg/Some abcdefg text/some more abcdefg text"
How do I cut out the first part of the strings, so that I end up with the following strings?
@"/News/"
@"/News/"
@"/About/"
@"/Abcdefg/"
Use componentsSeparatedByString: to break the string up:
NSArray *components=[string componentsSeparatedByString:@"/"];
if ([components count]>=2) {
    // Text after the first slash is second item in the array
    return [NSString stringWithFormat:@"/%@/",[components objectAtIndex:1]];
} else {
    return nil; // Up to you what happens in this situation
}
If these are pathnames, you may want to look into the path-related methods of NSString, such as pathComponents and pathByDeletingLastPathComponent.
While it's pretty unlikely that the path separator is ever going to change, it's nonetheless a good habit to not rely on such things and use dedicated path-manipulation methods in preference to assuming that the path separator will be a certain character.
EDIT from the year 2013: Or use URLs (more specifically, NS/CFURL objects), which Apple has made pretty clear are the proper way to refer to files from now on, and are necessary for some tasks in a sandbox.
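The path-method suggestion could look roughly like the sketch below, which uses pathComponents instead of splitting on a literal slash. The function name UAFirstPathSection is hypothetical, and the sketch assumes the inputs are absolute paths like the ones in the question.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (name chosen for illustration): returns @"/<first component>/"
// for an absolute path, or nil when there is no such component.
static NSString *UAFirstPathSection(NSString *path) {
    NSArray *components = [path pathComponents];
    // For an absolute path the first component is the separator itself,
    // so the first real component sits at index 1.
    if ([components count] < 2 ||
        ![[components objectAtIndex:0] isEqualToString:@"/"]) {
        return nil;
    }
    return [NSString stringWithFormat:@"/%@/", [components objectAtIndex:1]];
}
```

Relative paths and a bare "/" return nil here; what should happen in those cases is up to the caller, as in the answer above.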

NSString stringWithFormat swizzled to allow missing format numbered args

Based on this SO question asked a few hours ago, I have decided to implement a swizzled method that will allow me to pass a formatted NSString as the format arg to stringWithFormat, and have it not break when one of the numbered arg references (%1$@, %2$@) is omitted.
I have it working, but this is the first draft, and since this method could be called hundreds of thousands of times per app run, I need to bounce it off some experts to see whether it has any red flags, major performance hits, or possible optimizations.
#define NUMARGS(...) (sizeof((int[]){__VA_ARGS__})/sizeof(int))
@implementation NSString (UAFormatOmissions)
+ (id)uaStringWithFormat:(NSString *)format, ... {
    if (format != nil) {
        va_list args;
        va_start(args, format);
        // $@ is an ordered variable (%1$@, %2$@...)
        if ([format rangeOfString:@"$@"].location == NSNotFound) {
            // call Apple's method
            NSString *s = [[[NSString alloc] initWithFormat:format arguments:args] autorelease];
            va_end(args);
            return s;
        }
        NSMutableArray *newArgs = [NSMutableArray arrayWithCapacity:NUMARGS(args)];
        id arg = nil;
        int i = 1;
        while (arg = va_arg(args, id)) {
            NSString *f = [NSString stringWithFormat:@"%%%d\$\@", i];
            i++;
            if ([format rangeOfString:f].location == NSNotFound) continue;
            else [newArgs addObject:arg];
        }
        va_end(args);
        char *newArgList = (char *)malloc(sizeof(id) * [newArgs count]);
        [newArgs getObjects:(id *)newArgList];
        NSString* result = [[[NSString alloc] initWithFormat:format arguments:newArgList] autorelease];
        free(newArgList);
        return result;
    }
    return nil;
}
The basic algorithm is:
search the format string for the %1$@, %2$@ variables by searching for $@
if not found, call the normal stringWithFormat and return
else, loop over the args
if the format has a position variable (%i$@) for position i, add the arg to the new arg array
else, don't add the arg
take the new arg array, convert it back into a va_list, and call initWithFormat:arguments: to get the correct string.
The idea is that I would run all [NSString stringWithFormat:] calls through this method instead.
This might seem unnecessary to many, but click on to the referenced SO question (first line) to see examples of why I need to do this.
Ideas? Thoughts? Better implementations? Better Solutions?
Whoa there!
Instead of screwing with a core method, which will very probably introduce subtle bugs, just turn on the "Static Analyzer" in your project options. It will then run on every build, and if you get the arguments wrong it will issue a warning for you.
I appreciate your desire to make the application more robust, but I think rewriting this method is more likely to break your application than to save it.
How about defining your own interim method instead of using format specifiers and stringWithFormat:? For example, you could define your own method replaceIndexPoints: to look for ($1) instead of %1$@. You would then format your string and insert translated replacements independently. This method could also take an array of strings, with NSNull or empty strings at the indexes that don't exist in the “untranslated” string.
Your method could look like this (if it were a category method for NSMutableString):
- (void) replaceIndexPointsWithStrings:(NSArray *) replacements
{
// 1. look for largest index in "self".
// 2. loop from the beginning to the largest index, replacing each
// index with corresponding string from replacements array.
}
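One way the outline above might be filled in is sketched here as a plain function rather than an NSMutableString category method, so that it can be tried in isolation. The name UAReplaceIndexPoints and the exact placeholder syntax ($1), ($2), ... are assumptions for illustration. The string is scanned once from left to right, so replacement text is never re-scanned for placeholders.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces "($1)", "($2)", ... in `format` with the
// corresponding entries of `replacements` (1-based). NSNull entries insert
// nothing; placeholders with no matching entry are left unchanged.
static NSString *UAReplaceIndexPoints(NSString *format, NSArray *replacements) {
    NSMutableString *result = [NSMutableString string];
    NSScanner *scanner = [NSScanner scannerWithString:format];
    [scanner setCharactersToBeSkipped:nil]; // keep whitespace as-is
    while (![scanner isAtEnd]) {
        // Copy literal text up to the next candidate placeholder
        NSString *literal = nil;
        if ([scanner scanUpToString:@"($" intoString:&literal]) {
            [result appendString:literal];
        }
        if ([scanner isAtEnd]) break;
        NSUInteger start = [scanner scanLocation];
        NSInteger index = 0;
        if ([scanner scanString:@"($" intoString:NULL] &&
            [scanner scanInteger:&index] &&
            [scanner scanString:@")" intoString:NULL] &&
            index >= 1 && (NSUInteger)index <= [replacements count]) {
            id value = [replacements objectAtIndex:(NSUInteger)index - 1];
            if ([value isKindOfClass:[NSString class]]) {
                [result appendString:value];
            }
        } else {
            // Not a usable placeholder: copy the "(" and continue after it
            [scanner setScanLocation:start + 1];
            [result appendString:@"("];
        }
    }
    return result;
}
```

Because the placeholders are looked up by index, a translated string is free to reorder them or omit some of them without any change to the argument list.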
Here are a few issues that I see with your current implementation (at a glance):
The __VA_ARGS__ thingy explained in the comments.
When you use while (arg = va_arg(args, id)), you are assuming that the arguments are nil-terminated (as for arrayWithObjects:), but stringWithFormat: has no such requirement.
I don't think you're required to escape the $ and @ in your format string in your arg-loop.
I'm not sure this would work well if uaStringWithFormat: was passed something larger than a pointer (i.e. long long if pointers are 32-bit). This may only be an issue if your translations also require inserting unlocalised numbers of long long magnitude.

Non US characters in section headers for a UITableView

I have added a section list for a simple Core Data iPhone app.
I followed this SO question to create it: "How to use the first character as a section name". However, my list also contains items starting with characters outside A-Z, specifically Å, Ä and Ö, which are used here in Sweden.
The problem now is that when the table view shows the section list, the last three characters are drawn wrong. See the image below.
Image: http://img.skitch.com/20100130-jkt6e55pgyjwptgix1q8mwt7md.jpg
It seems like my best option right now is to let those items be sorted under 'Z':
if ([letter isEqual:@"Å"] ||
    [letter isEqual:@"Ä"] ||
    [letter isEqual:@"Ö"])
    letter = @"Z";
Has anyone figured this one out?
And while I'm at it: 'Å', 'Ä' and 'Ö' should be sorted in that order, but Core Data's NSSortDescriptor sorts them as 'Ä', 'Å' and 'Ö'. I have tried setting the selector to localizedCaseInsensitiveCompare:, but that gives an "out of order section name 'Ä. Objects must be sorted by section name" error. Has anyone seen that too?
So I could not let go of this one and found the following:
What you need in this case is called 'Unicode Normalization Form D'. It is explained in more detail at http://unicode.org/reports/tr15/ (warning: long and dry document).
Here is a function that does the decomposition and then filters out all diacritics. You can use this to convert Äpple to Apple and then use the first letter to build an index.
- (NSString*) decomposeAndFilterString: (NSString*) string
{
NSMutableString *decomposedString = [[string decomposedStringWithCanonicalMapping] mutableCopy];
NSCharacterSet *nonBaseSet = [NSCharacterSet nonBaseCharacterSet];
NSRange range = NSMakeRange([decomposedString length], 0);
while (range.location > 0) {
range = [decomposedString rangeOfCharacterFromSet:nonBaseSet
options:NSBackwardsSearch range:NSMakeRange(0, range.location)];
if (range.length == 0) {
break;
}
[decomposedString deleteCharactersInRange:range];
}
return [decomposedString autorelease];
}
(I found this code on a mailing list, forgot the source, but I took it and fixed it up a little)
I am experiencing the same issue reported in the original question, i.e. extended characters (outside Unicode page 0) not properly displayed in the index bar.
Although I seem to feed NSFetchedResultsController with correct single unicode character strings, what I get in return accessing the 'sectionIndexTitles' property are those same characters with the high byte set to 0; for example character \u30a2 (KATAKANA LETTER A) becomes \u00a2 (CENT SIGN).
I am not sure whether this is a bug in the method sectionIndexTitleForSectionName: of NSFetchedResultsController or my own fault somewhere else, however my workaround consists in overriding it and performing what the documentation says it does internally:
- (NSString *)sectionIndexTitleForSectionName:(NSString *)sectionName
{
NSString *outName;
if ( [sectionName length] )
outName = [[sectionName substringToIndex:1] uppercaseString];
else
outName = [super sectionIndexTitleForSectionName:sectionName];
return outName;
}
This produces the expected output.
Just adding this to my controller solved the original problem for me:
- (NSString *)controller:(NSFetchedResultsController *)controller sectionIndexTitleForSectionName:(NSString *)sectionName {
    return sectionName;
}
Å, Ä and Ö now display properly in the list.
When I encounter a situation like this, the first thing I ask myself is 'what does Apple do?'
As an experiment I just added 'Joe Äpple' to my iPhone address book and he shows up under the plain A. I think that makes a lot of sense.
So instead of throwing them under Z or Ä you should do the same. There must be some way to get the 'base' letter of a unicode character for the grouping.
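One possible way to get that 'base' letter, sketched under the assumption that diacritic folding is acceptable for the app's languages, is NSString's stringByFoldingWithOptions:locale: with NSDiacriticInsensitiveSearch. The function name UASectionLetterForName is hypothetical, and nil is passed for the locale, which selects the system locale.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the uppercase first letter of `name` with
// diacritics folded away, or @"" for an empty name.
static NSString *UASectionLetterForName(NSString *name) {
    if ([name length] == 0) {
        return @"";
    }
    // Fold away diacritical marks using the system locale (locale:nil)
    NSString *folded = [name stringByFoldingWithOptions:NSDiacriticInsensitiveSearch
                                                 locale:nil];
    if ([folded length] == 0) {
        return @"";
    }
    // Take the first composed character sequence, not just the first UTF-16 unit
    NSRange first = [folded rangeOfComposedCharacterSequenceAtIndex:0];
    return [[folded substringWithRange:first] uppercaseString];
}
```

Whether grouping Ä under A is the right choice depends on the language: it matches the address-book behaviour described above, but it gives up the separate Å, Ä and Ö sections that the question originally asked for.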
You can use UTF-8 encoding
const char *cLetter = (const char *)[ tmp cStringUsingEncoding:NSUTF8StringEncoding];
NSString *original = [NSString stringWithCString:cLetter encoding:NSUTF8StringEncoding];
Did you follow my answer to that other question?
I am doing some tests (refetching my objects) and seeing odd intermittent behavior. It definitely sorts them correctly (with 'Å' right after 'A', but before 'Ä') sometimes. Other times it puts the characters outside of A-Z all at the end (after 'Z') even though I didn't do anything special.
Also, I noticed that 'Å' is drawn correctly if you return its lowercase form. Have you tried overriding sectionIndexTitleForSectionName: to keep the order, but change the drawn character?
Did you ever solve this issue?
I've been able to get the section title index to display correctly by implementing sectionIndexTitlesForTableView: to build my own array of section titles:
- (NSArray *)sectionIndexTitlesForTableView:(UITableView *)tableView {
    NSMutableArray *indexKeys = [NSMutableArray arrayWithCapacity:30];
    NSArray *fetchedResults = [fetchedResultsController fetchedObjects];
    NSString *currKey = @"DEFAULT";
    for (NSManagedObject *managedObject in fetchedResults) {
        NSString *indexKey = [managedObject valueForKey:@"indexKey"];
        if (![indexKey isEqualToString:currKey]) {
            [indexKeys addObject:indexKey];
            currKey = indexKey;
        }
    }
    return indexKeys;
}
Here, indexKey is the first letter of the name.
However, this creates one of two issues in sectionForSectionIndexTitle: instead:
If I simply return the index for the section, this is now the unsorted index and no longer corresponds to the sort order in the fetchedResultsController:
- (NSInteger)tableView:(UITableView *)tableView sectionForSectionIndexTitle:(NSString *)title atIndex:(NSInteger)index {
return index;
}
Alternatively, if I pass the call on to the fetchedResultsController, it breaks on the non-US index titles because these are no longer the weird characters used by the fetchedResultsController:
- (NSInteger)tableView:(UITableView *)tableView sectionForSectionIndexTitle:(NSString *)title atIndex:(NSInteger)index {
return [fetchedResultsController sectionForSectionIndexTitle:title atIndex:index];
}
The latter code generates an error of the following kind when navigating to the "Ø" index title:
Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Index title at 24 is not equal to 'Ø''
A workaround for this is to translate the offending characters back to their weird selves:
- (NSInteger)tableView:(UITableView *)tableView sectionForSectionIndexTitle:(NSString *)title atIndex:(NSInteger)index {
    if ([title isEqualToString:@"Æ"]) {
        title = @"\u2206";
    } else if ([title isEqualToString:@"Ø"]) {
        title = @"\u0178";
    } else if ([title isEqualToString:@"Å"]) {
        title = @"\u2248";
    }
    return [fetchedResultsController sectionForSectionIndexTitle:title atIndex:index];
}
You can find the Unicode values in the debugger with the action "Print Description to Console".
However, the good solution would be to figure out why this weird encoding happens and prevent it.

Finding a string in a string

Does anyone know a nice, efficient way of finding a string within a string (if it exists) in Objective-C for iPhone development? I need to find the part of the string between two words. For example, here I need to find the CO2 rating number in the string, where z is the value I'm looking for:
xxxxxco_2zendxxxxxxx
Ideally, I'd use a regular expression for this, probably something like co_2(.*?)end, so I'd take a look at RegexKitLite as stimms suggests.
If that is not suitable, you could extract the string you're looking for with something like this:
NSString* src = @"xxxxxco_2zendxxxxxxx";
NSRange startMarker = [src rangeOfString:@"co_2"];
if (startMarker.location != NSNotFound) {
    NSScanner* scanner = [NSScanner scannerWithString:src];
    [scanner setScanLocation:startMarker.location + startMarker.length];
    NSString* co2Value = @"";
    [scanner scanUpToString:@"end" intoString:&co2Value];
    NSLog(@"co_2 value is %@", co2Value);
} else {
    NSLog(@"co_2 marker not found");
}
Here we look for @"co_2", failing if it's not found, then use an NSScanner to grab everything from just after that string to the next occurrence of @"end". Note that if @"end" is missing this code will silently grab the rest of the string.
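The regular-expression route mentioned at the start of this answer can also be sketched with NSRegularExpression, which Foundation gained in iOS 4.0 and OS X 10.7, so no third-party library is assumed. The function name UAExtractCO2Value is hypothetical; the pattern is the co_2(.*?)end suggested above.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between "co_2" and the next "end",
// or nil when either marker is missing.
static NSString *UAExtractCO2Value(NSString *src) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"co_2(.*?)end"
                                                  options:0
                                                    error:NULL];
    if (regex == nil) {
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:src
                          options:0
                            range:NSMakeRange(0, [src length])];
    if (match == nil) {
        return nil;
    }
    // Capture group 1 holds the lazily matched text between the markers
    return [src substringWithRange:[match rangeAtIndex:1]];
}
```

Unlike the scanner version, this returns nil when "end" is missing instead of grabbing the rest of the string.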
This might be of interest to you (in particular the rangeOfString function):
- (NSRange)rangeOfString:(NSString *)aString
Unfortunately, Cocoa didn't have any built-in regex support when this was written (NSRegularExpression arrived later, in iOS 4.0 and OS X 10.7).
String matching is a well-explored domain, especially for algorithms dealing with genetic material. You could check out The Art of Computer Programming for 10x more than you ever wanted to know about string matching.
Most of that is overkill and you would be fine using a regular expression. Check out http://regexkit.sourceforge.net/RegexKitLite/, a regex library which runs on the iPhone.