iPhone - Comparing strings with a German umlaut - iphone

I have a few German strings (with characters like å, ä, etc.) in an NSArray.
For example, suppose the array contains a word like "gënder".
The user enters "gen" in a text field.
I want to find the words in the array that match the characters "gen".
How can I compare the strings so that umlauts are treated as their plain English letters?
So in the example above, when the user enters "gen", the search has to return "gënder".
Is there a solution for this type of comparison?

Use the NSDiacriticInsensitiveSearch option of the various NSString compare methods. As described in the documentation:
Search ignores diacritic marks.
For example, ‘ö’ is equal to ‘o’.
For example:
NSString *text = @"gënder";
NSString *searchString = @"ender";
NSRange rng = [text rangeOfString:searchString
                          options:NSDiacriticInsensitiveSearch];
if (rng.location != NSNotFound)
{
    NSLog(@"Match at %@", NSStringFromRange(rng));
}
else
{
    NSLog(@"No match");
}
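To cover the case in the question (the user types "gen" and the array contains "gënder"), the same option can be applied to every element of the array. The following is only a minimal sketch: the helper name WordsMatchingPrefix is hypothetical, and combining the option with NSCaseInsensitiveSearch and NSAnchoredSearch is an assumption about the desired behavior (match at the start of the word, ignoring case).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original answer): returns the words
// that start with `prefix`, ignoring diacritics and case.
static NSArray *WordsMatchingPrefix(NSArray *words, NSString *prefix)
{
    // Assumption: only matches at the beginning of a word are wanted.
    NSStringCompareOptions options = NSDiacriticInsensitiveSearch
                                   | NSCaseInsensitiveSearch
                                   | NSAnchoredSearch;
    NSMutableArray *matches = [NSMutableArray array];
    for (NSString *word in words) {
        NSRange range = [word rangeOfString:prefix options:options];
        if (range.location != NSNotFound) {
            [matches addObject:word];
        }
    }
    return matches;
}
```

Dropping NSAnchoredSearch from the options would also return words that contain the typed characters anywhere, not only at the start.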

Related

ios - making accentuated characters to be well displayed in a file path

I have this code to get all files from a folder :
- (NSMutableArray*) allFilesAtPath:(NSString *)startPath
{
NSMutableArray* listing = [NSMutableArray array];
NSArray* fileNames = [self contentsOfDirectoryAtPath:startPath error:nil];
if (!fileNames) return listing;
for (NSString* file in fileNames) {
NSString* absPath = [startPath stringByAppendingPathComponent:file];
BOOL isDir = NO;
if ([self fileExistsAtPath:absPath isDirectory:&isDir]) {
[listing addObject:absPath];
if (isDir) [listing addObjectsFromArray:[self allFilesAtPath:absPath]];
}
}
return listing;
}
In one test folder, I have a file that is named yahoéo.jpg
When NSLogged, it is displayed as yahoe\U0301o.jpg
Of course, that works fine for any other file without such an accentuated character in the file name.
So, when I try to delete it from the array with :
[theFilesArray removeObject:fileName];
fileName is yahoéo.jpg
it is not removed, because it is not found in the array.
Why does this character replacement happen? I cannot find anything in the documentation about it. Which characters are treated this way, and how was I supposed to know?
And most of all, how can I get the é character in the array of file names?
EDIT
fileName variable used in the removeObject method is constructed by getting a string from a PList file, and giving it to the following method :
+ (NSString*) fileNameWithString:(NSString*)str
{
NSString* fileName = str;
NSCharacterSet* charactersToRemove = [NSCharacterSet characterSetWithCharactersInString:@".:/\\"];
fileName = [[fileName componentsSeparatedByCharactersInSet:charactersToRemove] componentsJoinedByString:@"#"];
fileName = [fileName stringByAppendingString:@".jpg"];
return fileName;
}
The NSLog output of an NSArray shows all non-ASCII characters in \Unnnn escaped form. But that is only the way NSLog prints it, so that should not be the problem.
I assume that is a problem of "precomposed" vs "decomposed" characters. The HFS filesystem uses decomposed characters in the filenames, so é is stored as two Unicode characters:
U+0065 + U+0301 = "e" + COMBINING ACUTE ACCENT
(and NSLog prints that as e\U0301).
This is different from the single Unicode character (precomposed form)
U+00E9 = "é"
Therefore, the string yahoéo.jpg will not be found in the array if its own
characters are stored in the precomposed form.
If that is really the problem, you can solve it by
normalizing all file names to either precomposed or decomposed form, using the precomposedStringWithCanonicalMapping or decomposedStringWithCanonicalMapping method of NSString.
Remarks:
Both precomposed and decomposed version of a string will be displayed in the same way (e.g. é).
The compare: method of NSString considers both versions of the string as equal (unless you call it with the NSLiteralSearch option).
The isEqual: method of NSString considers the two versions of the string as different,
and that is used by removeObject: to find the object to remove.
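The normalization suggested above could be sketched as follows. This is an illustration under assumptions, not the asker's code: the helper name PrecomposedNames is hypothetical, and the choice of the precomposed form over the decomposed one is arbitrary, as long as both the array and the lookup string use the same form.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a mutable copy of `names` in which every
// string is in precomposed form, so that isEqual:-based methods such as
// removeObject: compare like with like.
static NSMutableArray *PrecomposedNames(NSArray *names)
{
    NSMutableArray *normalized = [NSMutableArray arrayWithCapacity:[names count]];
    for (NSString *name in names) {
        [normalized addObject:[name precomposedStringWithCanonicalMapping]];
    }
    return normalized;
}

// Usage sketch (variable names taken from the question):
//   NSMutableArray *theFilesArray = PrecomposedNames(listing);
//   [theFilesArray removeObject:[fileName precomposedStringWithCanonicalMapping]];
```

Normalizing the lookup string as well keeps the removal correct regardless of which form the PList value happens to use.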

IOS isEqualToString Not Working

The following code logs two strings that look identical, but the comparison reports them as different.
NSDirectoryEnumerator *directoryEnumerator = [[NSFileManager defaultManager]
                                              enumeratorAtPath:kDocdir];
for (NSString *pathi in directoryEnumerator)
{
    NSString *fileName_Manager = [pathi lastPathComponent];
    NSLog(@"fileName_Manager = %@",fileName_Manager);
    Artist *name_Databse = [self.fetchedResultsController
                            objectAtIndexPath:IndexPath];
    NSLog(@"name_Databse = %@",name_Databse.name);
    if ([fileName_Manager isEqualToString:name_Databse.name]) {
        NSLog(@"Same Name");
    }else{
        NSLog(@"Different Name");
    }
}
Outputs:
2013-04-25 15:37:43.256 Player[36436:907] fileName_Manager = alizée - mèxico - final j'en
2013-04-25 15:37:43.272 Player[36436:907] name_Databse = alizée - mèxico - final j'en
2013-04-25 15:37:44.107 Player[36436:907] Different Name
The comparison fails when the names contain special characters. Why is this happening?
Thanks ...
I have the same problem here:
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"name == %@",[pathi lastPathComponent]];
How should I change this?
The documentation for isEqualToString: suggests you might have a problem:
The comparison uses the canonical representation of strings, which for
a particular string is the length of the string plus the Unicode
characters that make up the string. When this method compares two
strings, if the individual Unicodes are the same, then the strings are
equal, regardless of the backing store. “Literal” when applied to
string comparison means that various Unicode decomposition rules are
not applied and Unicode characters are individually compared. So,
for instance, “Ö” represented as the composed character sequence “O”
and umlaut would not compare equal to “Ö” represented as one Unicode
character.
Try using (NSOrderedSame == [string1 localizedCompare:string2])
Also, if you haven't already, look into the Apple sample code 'International Mountains' which deals with numerous localization issues.
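The distinction drawn in the quoted documentation can be sketched as below. Both helper names are hypothetical. The predicate helper additionally assumes that the names in the store were saved in precomposed form, which the question does not confirm, so treat it as one possible approach rather than a verified fix.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the two strings are canonically equivalent,
// for example "é" as U+00E9 versus "e" followed by U+0301.
static BOOL StringsAreCanonicallyEqual(NSString *a, NSString *b)
{
    if (a == nil || b == nil) {
        return a == b; // messaging nil would otherwise report "same"
    }
    // compare: is not a literal comparison unless NSLiteralSearch is passed.
    return [a compare:b] == NSOrderedSame;
}

// Hypothetical helper for the predicate from the question.
// Assumption: the stored names are in precomposed form.
static NSPredicate *NamePredicateForFileName(NSString *fileName)
{
    NSString *normalized = [fileName precomposedStringWithCanonicalMapping];
    return [NSPredicate predicateWithFormat:@"name == %@", normalized];
}
```

In the loop from the question, the isEqualToString: call would then become StringsAreCanonicallyEqual(fileName_Manager, name_Databse.name).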
Have you tried converting both strings to UTF-8 and then do the comparison? I don't know if that works, it's just an idea.

StringByAddingPercentEscapes not working on ampersands, question marks etc

I'm sending a request from my iPhone application, where some of the arguments are text that the user can enter into text fields. Therefore, I need to URL-encode (percent-escape) them.
Here's the problem I'm running into:
NSLog(@"%@", testText); // Test & ?
testText = [testText stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", testText); // Test%20&%20?
As you can see, only the spaces are encoded, making the server disregard everything past the ampersand for the argument.
Is this the advertised behaviour of stringByAddingPercentEscapes? Do I have to manually replace every special character with corresponding hex code?
Thankful for any contributions.
They are not encoded because they are valid URL characters.
The documentation for stringByAddingPercentEscapesUsingEncoding: says
See CFURLCreateStringByAddingPercentEscapes for more complex transformations.
I encode my query string parameters using the following method (added to a NSString category)
- (NSString *)urlEncodedString {
CFStringRef buffer = CFURLCreateStringByAddingPercentEscapes(kCFAllocatorDefault,
(CFStringRef)self,
NULL,
CFSTR("!*'();:@&=+$,/?%#[]"),
kCFStringEncodingUTF8);
NSString *result = [NSString stringWithString:(NSString *)buffer];
CFRelease(buffer);
return result;
}
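On newer SDKs the same idea can be expressed with stringByAddingPercentEncodingWithAllowedCharacters:, which takes the set of characters to leave untouched instead of the set to escape. The sketch below is an alternative to the category method above, not a replacement endorsed by the original answer. The function name EncodedQueryValue is hypothetical, and the allowed set is an assumption: only the RFC 3986 "unreserved" characters are kept, so everything else, including & and ?, is escaped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: percent-encodes a single query parameter value.
static NSString *EncodedQueryValue(NSString *value)
{
    // Assumption: keep only RFC 3986 unreserved characters unescaped.
    NSCharacterSet *allowed = [NSCharacterSet characterSetWithCharactersInString:
        @"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        @"abcdefghijklmnopqrstuvwxyz"
        @"0123456789-._~"];
    return [value stringByAddingPercentEncodingWithAllowedCharacters:allowed];
}
```

With the string from the question, EncodedQueryValue(@"Test & ?") yields Test%20%26%20%3F, so the ampersand no longer terminates the argument.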

iPhone objective-c: detecting a 'real' word

I need a (quick and dirty) way to detect whether a given NSString is a 'real' word, that is, whether it is in the dictionary. In other words, a very simplistic spell checker. Does anyone know of a way to do this? I need either a file containing all the words in the English dictionary (which I've searched for, to no avail), or a way to interface with the iPhone's spell-checking service. Ideally I would interface with the iPhone's spell checker in a way similar to NSSpellChecker on OS X, so that my app works with other languages, but at this point I'll take what I can get.
Lastly, here's some pseudo-code to better illustrate my needs:
-(BOOL)isDictionaryWord:(NSString*)word; //returns TRUE when word=@"greetings". returns FALSE when word=@"slkfjsdkl";
Use UITextChecker instead. The code below might not be perfect but should give you a good idea.
-(BOOL)isDictionaryWord:(NSString*)word {
UITextChecker *checker = [[UITextChecker alloc] init];
NSLocale *currentLocale = [NSLocale currentLocale];
NSString *currentLanguage = [currentLocale objectForKey:NSLocaleLanguageCode];
NSRange searchRange = NSMakeRange(0, [word length]);
NSRange misspelledRange = [checker rangeOfMisspelledWordInString:word range:searchRange startingAt:0 wrap:NO language:currentLanguage];
return misspelledRange.location == NSNotFound;
}
You can make UITextChecker work accurately without needing to add a new dictionary.
I use a two-step process because I need the first step to be fast (but not accurate). You may only need step two which is the accurate check. Note this makes use of the UITextChecker's completionsForPartialWordRange function which is why it's more accurate than the MisspelledWord function.
// Step one: quickly check whether a combination of letters passes the spell check. This is not very accurate, but it is very fast, so I can quickly exclude lots of letter combinations (brute-force approach).
UITextChecker *checker = [[UITextChecker alloc] init];
NSString *wordToCheck = @"whatever"; // The combination of letters you wish to check
// Set the range to the full length of the word
NSRange range = NSMakeRange(0, wordToCheck.length);
NSRange misspelledRange = [checker rangeOfMisspelledWordInString:wordToCheck range:range startingAt:0 wrap:NO language:@"en_US"];
BOOL isRealWord = misspelledRange.location == NSNotFound;
// Call step two to confirm that this is a real word
if (isRealWord) {
    isRealWord = [self isRealWordOK:wordToCheck];
}
return isRealWord; // if YES we found a real word; if not, move on to the next combination of letters
// Step Two: Extra check to make sure the word is really a real word. returns true if we have a real word.
-(BOOL)isRealWordOK:(NSString *)wordToCheck {
// we dont want to use any words that the lexicon has learned.
if ([UITextChecker hasLearnedWord:wordToCheck]) {
return NO;
}
// Now use the word completion function to see whether this word really exists: remove the final letter, ask auto-complete to complete the word, then look through all the results. If the word is not among them, it is not a real word. Note that auto-complete is very accurate, unlike the spell checker.
// checker is assumed to be the UITextChecker instance created in step one (for example, stored in an instance variable).
NSRange range = NSMakeRange(0, wordToCheck.length - 1);
NSArray *guesses = [checker completionsForPartialWordRange:range inString:wordToCheck language:@"en_US"];
// confirm that the word is found in the auto-complete list
for (NSString *guess in guesses) {
if ([guess isEqualToString:wordToCheck]) {
// we found the word in the auto complete list so it's real :-)
return YES;
}
}
// if we get to here then it's not a real word :-(
NSLog(@"Word not found in second dictionary check:%@",wordToCheck);
return NO;
}

Non US characters in section headers for a UITableView

I have added a section list for a simple Core Data iPhone app.
I followed this SO question to create it (How to use the first character as a section name), but my list also contains items starting with characters outside A-Z, specifically Å, Ä and Ö, which are used here in Sweden.
The problem now is that when the table view shows the section index, the last three characters are drawn wrong. See the image below:
http://img.skitch.com/20100130-jkt6e55pgyjwptgix1q8mwt7md.jpg
It seems like my best option right now is to let those items be sorted under 'Z'
if ([letter isEqual:@"Å"] ||
    [letter isEqual:@"Ä"] ||
    [letter isEqual:@"Ö"])
    letter = @"Z";
Has anyone figured this one out?
And while I'm at it: 'Å', 'Ä' and 'Ö' should be sorted in that order, but Core Data's NSSortDescriptor sorts them as 'Ä', 'Å' and 'Ö'. I have tried setting the selector to localizedCaseInsensitiveCompare:, but that gives an "out of order section name 'Ä'. Objects must be sorted by section name" error. Has anyone seen that too?
So I could not let go of this one and found the following:
What you need to do in this case is called 'Unicode Normalization Form D'. It is explained in more detail at http://unicode.org/reports/tr15/ (warning: long and dry document).
Here is a function that does the decomposition and then filters out all diacritics. You can use this to convert Äpple to Apple and then use the first letter to build an index.
- (NSString*) decomposeAndFilterString: (NSString*) string
{
NSMutableString *decomposedString = [[string decomposedStringWithCanonicalMapping] mutableCopy];
NSCharacterSet *nonBaseSet = [NSCharacterSet nonBaseCharacterSet];
NSRange range = NSMakeRange([decomposedString length], 0);
while (range.location > 0) {
range = [decomposedString rangeOfCharacterFromSet:nonBaseSet
options:NSBackwardsSearch range:NSMakeRange(0, range.location)];
if (range.length == 0) {
break;
}
[decomposedString deleteCharactersInRange:range];
}
return [decomposedString autorelease];
}
(I found this code on a mailing list, forgot the source, but I took it and fixed it up a little)
I am experiencing the same issue reported in the original question, i.e. extended characters (outside Unicode page 0) not properly displayed in the index bar.
Although I seem to feed NSFetchedResultsController with correct single unicode character strings, what I get in return accessing the 'sectionIndexTitles' property are those same characters with the high byte set to 0; for example character \u30a2 (KATAKANA LETTER A) becomes \u00a2 (CENT SIGN).
I am not sure whether this is a bug in the method sectionIndexTitleForSectionName: of NSFetchedResultsController or my own fault somewhere else, however my workaround consists in overriding it and performing what the documentation says it does internally:
- (NSString *)sectionIndexTitleForSectionName:(NSString *)sectionName
{
NSString *outName;
if ( [sectionName length] )
outName = [[sectionName substringToIndex:1] uppercaseString];
else
outName = [super sectionIndexTitleForSectionName:sectionName];
return outName;
}
This produces the expected output.
Just adding this to my controller solved the original problem for me:
- (NSString *)controller:(NSFetchedResultsController *)controller sectionIndexTitleForSectionName:(NSString *)sectionName {
    return sectionName;
}
Å, Ä and Ö are displayed properly in the list.
When I encounter a situation like this, the first thing I ask myself is 'what does Apple do?'.
As an experiment I just added 'Joe Äpple' to my iPhone address book and he shows up under the plain A. I think that makes a lot of sense.
So instead of throwing them under Z or Ä you should do the same. There must be some way to get the 'base' letter of a unicode character for the grouping.
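One way to get that 'base' letter is NSString's stringByFoldingWithOptions:locale: with the NSDiacriticInsensitiveSearch option. The sketch below is an assumption about how the grouping key could be built; the function name BaseIndexLetter is hypothetical and not taken from any answer here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first letter of `name`, uppercased and
// with its diacritic marks removed, for use as a section index key.
static NSString *BaseIndexLetter(NSString *name)
{
    if ([name length] == 0) {
        return @"";
    }
    // Take the whole first composed character sequence, so that a decomposed
    // "A" followed by a combining mark is treated as one letter.
    NSRange first = [name rangeOfComposedCharacterSequenceAtIndex:0];
    NSString *letter = [name substringWithRange:first];
    // Folding with NSDiacriticInsensitiveSearch strips the diacritic marks.
    NSString *folded = [letter stringByFoldingWithOptions:NSDiacriticInsensitiveSearch
                                                   locale:nil];
    return [folded uppercaseString];
}
```

With this helper, "Äpple" is grouped under "A", which matches the address book behavior described above. Whether that grouping is appropriate for Swedish, where Å, Ä and Ö are separate letters of the alphabet, is a design decision for the app.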
You can use UTF-8 encoding
const char *cLetter = (const char *)[ tmp cStringUsingEncoding:NSUTF8StringEncoding];
NSString *original = [NSString stringWithCString:cLetter encoding:NSUTF8StringEncoding];
Did you follow my answer to that other question?
I am doing some tests (refetching my objects) and seeing odd intermittent behavior. It definitely sorts them correctly (with 'Å' right after 'A', but before 'Ä') sometimes. Other times it puts the characters outside of A-Z all at the end (after 'Z') even though I didn't do anything special.
Also, I noticed that 'Å' is drawn correctly if you return its lowercase form. Have you tried overriding sectionIndexTitleForSectionName: to keep the order, but change the drawn character?
Did you ever solve this issue?
I've been able to get the section title index to display correctly by implementing sectionIndexTitlesForTableView: to build my own array of section titles:
- (NSArray *)sectionIndexTitlesForTableView:(UITableView *)tableView {
NSMutableArray *indexKeys = [NSMutableArray arrayWithCapacity:30];
NSArray *fetchedResults = [fetchedResultsController fetchedObjects];
NSString *currKey = @"DEFAULT";
for (NSManagedObject *managedObject in fetchedResults) {
NSString *indexKey = [managedObject valueForKey:@"indexKey"];
if (![indexKey isEqualToString:currKey]) {
[indexKeys addObject:indexKey];
currKey = indexKey;
}
}
return indexKeys;
}
Here, indexKey is the first letter of the name.
However, this creates one of two issues in sectionForSectionIndexTitle: instead:
If I simply return the index for the section, it is now the unsorted index and no longer corresponds to the sort order in the fetchedResultsController:
- (NSInteger)tableView:(UITableView *)tableView sectionForSectionIndexTitle:(NSString *)title atIndex:(NSInteger)index {
return index;
}
Alternatively, if I pass the call on to the fetchedResultsController, it breaks on the non-US index titles because these are no longer the weird characters used by the fetchedResultsController:
- (NSInteger)tableView:(UITableView *)tableView sectionForSectionIndexTitle:(NSString *)title atIndex:(NSInteger)index {
return [fetchedResultsController sectionForSectionIndexTitle:title atIndex:index];
}
The latter code generates an error of the following kind when navigating to the "Ø" index title:
Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Index title at 24 is not equal to 'Ø''
A workaround for this is to translate the offending characters back to their weird selves:
- (NSInteger)tableView:(UITableView *)tableView sectionForSectionIndexTitle:(NSString *)title atIndex:(NSInteger)index {
    if ([title isEqualToString:@"Æ"]) {
        title = @"\u2206";
    } else if ([title isEqualToString:@"Ø"]) {
        title = @"\u0178";
    } else if ([title isEqualToString:@"Å"]) {
        title = @"\u2248";
    }
return [fetchedResultsController sectionForSectionIndexTitle:title atIndex:index];
}
You can find the Unicode values in the debugger with the action "Print Description to Console".
However, the good solution would be to figure out why this weird encoding happens and prevent it.