I have this code to get all files in a folder (recursively):
- (NSMutableArray*) allFilesAtPath:(NSString *)startPath
{
    NSMutableArray* listing = [NSMutableArray array];
    NSArray* fileNames = [self contentsOfDirectoryAtPath:startPath error:nil];
    if (!fileNames) return listing;
    for (NSString* file in fileNames) {
        NSString* absPath = [startPath stringByAppendingPathComponent:file];
        BOOL isDir = NO;
        if ([self fileExistsAtPath:absPath isDirectory:&isDir]) {
            [listing addObject:absPath];
            if (isDir) [listing addObjectsFromArray:[self allFilesAtPath:absPath]];
        }
    }
    return listing;
}
In one test folder, I have a file named yahoéo.jpg.
When NSLogged, it is displayed as yahoe\U0301o.jpg.
Of course, everything works fine for any other file whose name has no such accented character.
So, when I try to delete it from the array with:
[theFilesArray removeObject:fileName];
where fileName is yahoéo.jpg, it is not removed because it is not found in the array.
Why does this character replacement happen? I can't find anything about it in the documentation. Which characters get the same treatment? How was I supposed to know that?
And most of all, how can I get the é character into the file names array?
EDIT
The fileName variable used with removeObject: is built by reading a string from a plist file and passing it to the following method:
+ (NSString*) fileNameWithString:(NSString*)str
{
    NSString* fileName = str;
    NSCharacterSet* charactersToRemove = [NSCharacterSet characterSetWithCharactersInString:@".:/\\"];
    fileName = [[fileName componentsSeparatedByCharactersInSet:charactersToRemove] componentsJoinedByString:@"#"];
    fileName = [fileName stringByAppendingString:@".jpg"];
    return fileName;
}
The NSLog output of an NSArray shows all non-ASCII characters in \Unnnn escaped form. But that is only the way NSLog prints it, so that should not be the problem.
I assume this is a problem of "precomposed" vs. "decomposed" characters. The HFS+ file system stores file names in decomposed form, so é is stored as two Unicode characters:
U+0065 + U+0301 = "e" + COMBINING ACUTE ACCENT
(and NSLog prints that as e\U0301).
This is different from the single Unicode character (precomposed form)
U+00E9 = "é"
Therefore, the string yahoéo.jpg will not be found in the array if its characters are stored in the precomposed form.
If that is really the problem, you can solve it by normalizing all file names to either precomposed or decomposed form, using the precomposedStringWithCanonicalMapping or decomposedStringWithCanonicalMapping method of NSString.
Remarks:
- Both the precomposed and the decomposed version of a string are displayed the same way (e.g. é).
- The compare: method of NSString considers both versions of the string equal (unless you call it with the NSLiteralSearch option).
- The isEqual: method of NSString considers the two versions different, and isEqual: is what removeObject: uses to find the object to remove.
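A minimal sketch of that normalization fix, assuming Foundation and ARC. The helper names NormalizedFileName and RemoveFileNameNormalized are hypothetical; the sketch brings both sides to the precomposed form with precomposedStringWithCanonicalMapping before comparing with isEqualToString:, which is the same kind of exact comparison that removeObject: relies on.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the precomposed (NFC) form of a name, so that
// "e" + U+0301 (as the file system returns it) and U+00E9 become identical.
static NSString *NormalizedFileName(NSString *name) {
    return [name precomposedStringWithCanonicalMapping];
}

// Hypothetical helper: removes every entry of `files` that is canonically
// equal to `fileName`. Returns YES if at least one entry was removed.
static BOOL RemoveFileNameNormalized(NSMutableArray<NSString *> *files, NSString *fileName) {
    NSString *target = NormalizedFileName(fileName);
    NSIndexSet *matches = [files indexesOfObjectsPassingTest:^BOOL(NSString *obj, NSUInteger idx, BOOL *stop) {
        // isEqualToString: is reliable here because both sides are now NFC.
        return [NormalizedFileName(obj) isEqualToString:target];
    }];
    [files removeObjectsAtIndexes:matches];
    return matches.count > 0;
}
```

Normalizing once, when the names are read from disk and from the plist, is cheaper than normalizing on every comparison; the per-comparison version just keeps the example self-contained.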
The following example program prints the same name for both strings, but the comparison does not work correctly.
NSDirectoryEnumerator *directoryEnumerator = [[NSFileManager defaultManager] enumeratorAtPath:kDocdir];
for (NSString *pathi in directoryEnumerator)
{
    NSString *fileName_Manager = [pathi lastPathComponent];
    NSLog(@"fileName_Manager = %@", fileName_Manager);
    Artist *name_Databse = [self.fetchedResultsController objectAtIndexPath:IndexPath];
    NSLog(@"name_Databse = %@", name_Databse.name);
    if ([fileName_Manager isEqualToString:name_Databse.name]) {
        NSLog(@"Same Name");
    } else {
        NSLog(@"Different Name");
    }
}
Outputs:
2013-04-25 15:37:43.256 Player[36436:907] fileName_Manager = alizée - mèxico - final j'en
2013-04-25 15:37:43.272 Player[36436:907] name_Databse = alizée - mèxico - final j'en
2013-04-25 15:37:44.107 Player[36436:907] Different Name
It does not work correctly when the names contain special characters. Why is this happening?
I have the same problem here:
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"name == %@", [pathi lastPathComponent]];
How should I change this?
The documentation for isEqualToString: suggests you might have a problem:
The comparison uses the canonical representation of strings, which for a particular string is the length of the string plus the Unicode characters that make up the string. When this method compares two strings, if the individual Unicodes are the same, then the strings are equal, regardless of the backing store. “Literal” when applied to string comparison means that various Unicode decomposition rules are not applied and Unicode characters are individually compared. So, for instance, “Ö” represented as the composed character sequence “O” and umlaut would not compare equal to “Ö” represented as one Unicode character.
Try using (NSOrderedSame == [string1 localizedCompare:string2])
Also, if you haven't already, look into the Apple sample code 'International Mountains' which deals with numerous localization issues.
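A small sketch of the difference the documentation describes, assuming Foundation. CanonicallyEqual is a hypothetical helper name; it relies on compare: (without NSLiteralSearch) treating a decomposed "é" and a precomposed "é" as equal, whereas isEqualToString: compares the individual UTF-16 code units and does not.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the two strings differ at most in
// precomposed vs. decomposed representation.
static BOOL CanonicallyEqual(NSString *a, NSString *b) {
    // compare: without NSLiteralSearch applies the Unicode decomposition
    // rules, so "e" + U+0301 and U+00E9 compare as NSOrderedSame.
    return [a compare:b] == NSOrderedSame;
}
```

For the NSPredicate case, one option (assuming the stored names are precomposed) is to pass [[pathi lastPathComponent] precomposedStringWithCanonicalMapping] to predicateWithFormat: instead of the raw file name.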
Have you tried converting both strings to UTF-8 and then doing the comparison? I don't know if that works; it's just an idea.
I want to remove specific characters or groups of characters (substrings) from an NSString. For example:
NSString *str = @" hello I am #39;doing Parsing So $#39;I get many symbols in &my response";
I want to remove #39;, $#39; and & (these three strings are what mostly appear in the response).
The output should be: hello I am doing Parsing So I get many symbols in my response
Side note: I can't write & #39; without a space here, because it gets converted into ' (an apostrophe), so I use $ in place of & in my question.
You should use [str stringByReplacingOccurrencesOfString:@"#39" withString:@""]
Or do you need to replace strings of a specific format, like "#number"?
Try the code below; I think it does what you want. Simply change the character set as needed.
NSString *string = @"hello I am #39;doing Parsing So $#39;I get many symbols in &my response";
NSCharacterSet *trim = [NSCharacterSet characterSetWithCharactersInString:@"#39;$&"];
NSString *result = [[string componentsSeparatedByCharactersInSet:trim] componentsJoinedByString:@""];
NSLog(@"%@", result);
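The character-set approach removes every #, 3, 9, ;, $ and & character wherever it appears, so a number like 1939 elsewhere in the response would lose its digits. A sketch that removes only the exact substrings instead, assuming the three tokens from the question (RemoveTokens is a hypothetical name). Longer tokens are removed first so that "$#39;" is not partially consumed by the "#39;" rule.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes whole tokens rather than individual characters.
static NSString *RemoveTokens(NSString *input) {
    // Order matters: "$#39;" must go before "#39;", or a stray "$" would remain.
    NSArray<NSString *> *tokens = @[@"$#39;", @"#39;", @"&"];
    NSString *result = input;
    for (NSString *token in tokens) {
        result = [result stringByReplacingOccurrencesOfString:token withString:@""];
    }
    return result;
}
```

If the real data contains "&#39;" rather than the "$#39;" placeholder, that token would go first in the list instead.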
I need to display subscripts and superscripts (only Arabic numerals) in a UILabel. The data is taken from an XML file. Here is a snippet of the XML file:
<text><![CDATA[Hello World X\u00B2 World Hello]]></text>
It's supposed to display X2 (with the 2 as a superscript). When I read the string with NSXMLParser and display it in the UILabel, it shows up as X\u00B2. Any ideas on how to make it work?
I think you can do something like this, assuming the CDATA contents have been read into an NSString and passed into this function:
-(NSString *)removeUnicodeEscapes:(NSString *)stringWithUnicodeEscapes {
    NSMutableString *result = [stringWithUnicodeEscapes mutableCopy];
    NSRange unicodeLocation = [result rangeOfString:@"\\u"];
    while (unicodeLocation.location != NSNotFound) {
        NSUInteger next = unicodeLocation.location + 2;
        // Get the 4-character hex code (skip it if the string ends too early)
        NSRange charCodeRange = NSMakeRange(unicodeLocation.location + 2, 4);
        if (NSMaxRange(charCodeRange) <= result.length) {
            unsigned int codeValue = 0; // scanHexInt: needs an unsigned int, not a unichar
            NSScanner *scanner = [NSScanner scannerWithString:[result substringWithRange:charCodeRange]];
            if ([scanner scanHexInt:&codeValue] && [scanner isAtEnd]) {
                // Convert it to an NSString and replace the 6-character escape
                NSString *unicodeChar = [NSString stringWithFormat:@"%C", (unichar)codeValue];
                [result replaceCharactersInRange:NSMakeRange(unicodeLocation.location, 6) withString:unicodeChar];
                next = unicodeLocation.location + 1;
            }
        }
        // Continue searching after the current position
        unicodeLocation = [result rangeOfString:@"\\u" options:0 range:NSMakeRange(next, result.length - next)];
    }
    return result;
}
I haven't had a chance to try this out, but I think the basic approach would work.
\u00B2 is not any sort of XML encoding for characters. Apparently your data source has defined their own encoding scheme (which, frankly, is pretty stupid as XML is capable of encoding these directly, using entities outside of CDATA blocks).
In any case, you'll have to write your own parser that handles \u#### and converts that to the correct character.
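A swapped-in alternative to writing the parser by hand: CoreFoundation's CFStringTransform with the ICU transform "Any-Hex/Java", run in reverse, turns Java-style \uXXXX escapes back into characters. This is a sketch assuming the escapes always take the backslash-u-plus-four-hex-digits form shown in the XML and that ARC is on; DecodeJavaEscapes is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes \uXXXX escapes into the characters they name.
static NSString *DecodeJavaEscapes(NSString *input) {
    NSMutableString *result = [input mutableCopy];
    // "Any-Hex/Java" encodes characters as \uXXXX; reverse = true runs it
    // the other way, from escape sequence back to character.
    CFStringTransform((__bridge CFMutableStringRef)result, NULL, CFSTR("Any-Hex/Java"), true);
    return result;
}
```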
I asked my colleague, and he gave me a nice and simple workaround. I'm describing it here in case others also get stuck on this.
First, go to this link. It has a list of all subscripts and superscripts. For example, in my case I clicked on "superscript 0". On the page for "superscript 0", go to the "Java Data" section and copy the "⁰". You can either place this character directly in the XML or write a simple regex in Obj-C to replace the escape sequence with the character (for superscript 0, replace \u2070 with "⁰"), and you will get a nice X⁰. Do the same for any other superscript or subscript you want to display.
I have a few German strings (with accented characters such as å, ä, etc.) in an NSArray.
For example, consider that a word like "gënder" is in the array.
The user enters "gen" in a text field.
I want to find the words in the array that match the characters "gen".
How can I compare the strings while treating the accented letters as plain English letters?
So in the above example, when the user enters "gen", it has to return "gënder".
Is there any solution for this type of comparison?
Use the NSDiacriticInsensitiveSearch option of the various NSString compare methods. As described in the documentation:
Search ignores diacritic marks.
For example, ‘ö’ is equal to ‘o’.
For example:
NSString *text = @"gënder";
NSString *searchString = @"ender";
NSRange rng = [text rangeOfString:searchString
                          options:NSDiacriticInsensitiveSearch];
if (rng.location != NSNotFound)
{
    NSLog(@"Match at %@", NSStringFromRange(rng));
}
else
{
    NSLog(@"No match");
}
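Applied to the question's scenario, a sketch that filters an array with the same option, assuming Foundation. WordsMatching is a hypothetical helper; it also adds NSCaseInsensitiveSearch, and adding NSAnchoredSearch to the options would restrict matches to words that start with the query.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the words that contain `query`,
// ignoring diacritics and case.
static NSArray<NSString *> *WordsMatching(NSArray<NSString *> *words, NSString *query) {
    NSStringCompareOptions options = NSDiacriticInsensitiveSearch | NSCaseInsensitiveSearch;
    NSMutableArray<NSString *> *matches = [NSMutableArray array];
    for (NSString *word in words) {
        if ([word rangeOfString:query options:options].location != NSNotFound) {
            [matches addObject:word];
        }
    }
    return matches;
}
```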
When I fetch the source of any web page, no matter which encoding I use, I always end up with &#-style entity references instead of the actual characters (for example, for © or ®). The same goes for foreign characters (such as åäö in Swedish), which I have to convert back from entity references (e.g. for Å) myself.
I'm using +stringWithContentsOfURL:encoding:error: to fetch the source and have tried several different encodings, such as NSUTF8StringEncoding and NSASCIIStringEncoding, but nothing seems to affect the resulting string.
Any ideas, tips, or solutions are greatly appreciated! I'd rather not have to implement the entire ASCII table and replace all occurrences of every character...
You're misunderstanding the purpose of that encoding: argument. The method needs to convert bytes into characters somehow; the encoding tells it what sequences of bytes describe which characters. You need to make sure the encoding matches that of the resource data.
The entity references are an SGML/XML thing. SGML and XML are not encodings; they are markup language syntaxes. stringWithContentsOfURL:encoding:error: and its cousins do not attempt to parse sequences of characters (syntax) in any way, which is what they would have to do to convert one sequence of characters (an entity reference) into a different one (the entity that is referenced, which in practice means a single character).
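To illustrate the distinction, a sketch assuming Foundation (DecodeBytes is a hypothetical helper): the same two bytes give different characters depending on the encoding argument, while the entity reference &#169; comes out as the same six literal characters whatever encoding is used.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: interprets a NUL-terminated byte string using the
// given encoding. The encoding only maps bytes to characters; it never
// interprets markup such as entity references.
static NSString *DecodeBytes(const char *bytes, NSStringEncoding encoding) {
    return [NSString stringWithCString:bytes encoding:encoding];
}
```

Decoding the bytes C3 A9 as UTF-8 gives one character, é; decoding them as ISO Latin 1 gives two, Ã and ©. Decoding "&#169;" gives "&#169;" in both cases.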
You can convert the entity references to un-escaped characters using the CFXMLCreateStringByUnescapingEntities function. It takes a CFString, which an NSString is (toll-free bridging), and returns a CFString, which is an NSString.
Are you sure they are not already entity references in the original page? Try viewing the page source in a browser first.
That really, really sucks. I wanted to convert the text directly, and the above solution isn't really a good one, so I wrote my own static "ASCII table" converter class. It works the way this should have worked natively (though I have to fill in the table myself...).
Ideas for optimization? ("ASCII" is a static NSDictionary)
@implementation InternetHelper

+(NSString *)HTMLSourceFromUrlWithString:(NSString *)str convertASCII:(BOOL)state
{
    NSURL *url = [NSURL URLWithString:str];
    NSString *source = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];
    if (state)
        source = [InternetHelper ConvertASCIICharactersInString:source];
    return source;
}

+(NSString *)ConvertASCIICharactersInString:(NSString *)str
{
    // stringWithString: throws on nil, e.g. when the download failed
    if (!str) return nil;
    NSString *ret = [NSString stringWithString:str];
    // Lazily load the entity table from the app bundle
    if (!ASCII)
    {
        NSString *path = [[NSBundle mainBundle] pathForResource:kASCIICharacterTableFilename ofType:kFileFormat];
        ASCII = [[NSDictionary alloc] initWithContentsOfFile:path];
    }
    // One full replacement pass over the string per table entry
    for (id key in ASCII)
    {
        ret = [ret stringByReplacingOccurrencesOfString:key withString:[ASCII objectForKey:key]];
    }
    return ret;
}

@end
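On the optimization question: the loop above scans the whole string once per table entry, and because NSDictionary enumeration order is unspecified, the result can depend on that order (if &amp; happens to be replaced before &lt;, the text "&amp;lt;" ends up as "<" instead of "&lt;"). A single left-to-right pass avoids both problems. This is a sketch assuming ARC; DecodeEntities is hypothetical, and the table of named entities is passed in rather than read from the static ASCII dictionary. Numeric references such as &#169; are decoded directly, so they need no table entries.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes numeric references (&#169;, &#xA9;) and the
// named entities found in `named` (e.g. @"Aring" -> @"\u00C5") in one pass.
// Unknown or invalid references are copied through unchanged.
static NSString *DecodeEntities(NSString *input, NSDictionary<NSString *, NSString *> *named) {
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);"
                             options:0
                               error:NULL];
    NSMutableString *output = [NSMutableString string];
    __block NSUInteger last = 0;
    [regex enumerateMatchesInString:input
                            options:0
                              range:NSMakeRange(0, input.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // Copy the plain text between the previous reference and this one.
        [output appendString:[input substringWithRange:NSMakeRange(last, match.range.location - last)]];
        NSString *body = [input substringWithRange:[match rangeAtIndex:1]];
        NSString *replacement = nil;
        if ([body hasPrefix:@"#"]) {
            BOOL hex = [body hasPrefix:@"#x"] || [body hasPrefix:@"#X"];
            NSString *digits = [body substringFromIndex:hex ? 2 : 1];
            unsigned long value = strtoul(digits.UTF8String, NULL, hex ? 16 : 10);
            // Accept only valid Unicode scalar values (no NUL, no surrogates).
            if (value > 0 && value <= 0x10FFFF && (value < 0xD800 || value > 0xDFFF)) {
                uint32_t le = CFSwapInt32HostToLittle((uint32_t)value);
                replacement = [[NSString alloc] initWithBytes:&le
                                                       length:sizeof(le)
                                                     encoding:NSUTF32LittleEndianStringEncoding];
            }
        } else {
            replacement = named[body];
        }
        // Fall back to the original text for unknown entities.
        [output appendString:replacement ?: [input substringWithRange:match.range]];
        last = NSMaxRange(match.range);
    }];
    [output appendString:[input substringFromIndex:last]];
    return output;
}
```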