Objective C read file wrong encoding - iphone

Hi all, I have a problem with a file I download from the internet and need to mine some data from. I open it and try to buffer it, but I get the wrong characters because the file is in Czech...
My code:
- (void) sync {
NSString * path = @"/Users/syky/Documents/stats.csv";
NSFileHandle * fileHandle = [NSFileHandle fileHandleForReadingAtPath:path];
NSData * buffer = nil;
while ((buffer = [fileHandle readDataOfLength:1024])) {
//do something with the buffer
NSString * s = [[NSString alloc]initWithData:buffer encoding:NSUTF8StringEncoding]; // tried many different encodings here
NSLog(@"%@", s);
break;
}
}
No matter which encoding I choose, I always get broken characters, such as
"Poø.";"Jméno"
I need to get:
"Příjmení";"Jméno"
The file was originally generated by Microsoft Excel as a *.csv export...
When I open the file in any Mac OS X text editor I get broken characters as well, but when I open it on a Windows machine with Microsoft Excel it works just fine...
Thank you for your help
Solution:
- (void) sync {
NSString * path = @"/Users/syky/Documents/stats.csv";
NSFileHandle * fileHandle = [NSFileHandle fileHandleForReadingAtPath:path];
// the Excel CSV export is Windows-1250 (Central European), so decode with that
NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingWindowsLatin2);
NSData * buffer = nil;
// readDataOfLength: returns empty data (not nil) at end of file
while ((buffer = [fileHandle readDataOfLength:1024]) && [buffer length] > 0) {
NSString *string = [[NSString alloc] initWithData:buffer encoding:encoding];
NSLog(@"%@", string);
break;
}
}
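Because Windows-1250 is a single-byte encoding and the solution only looks at the first chunk anyway, a file of modest size can simply be read and decoded in one call. A minimal sketch, assuming ARC and assuming the file really is Windows-1250 (as the solution above found for this Excel export); `ReadWindowsLatin2File` is a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reads a whole file and decodes it as Windows-1250
// (Central European), the encoding the solution above uses for the Excel CSV.
static NSString *ReadWindowsLatin2File(NSString *path, NSError **outError) {
    // NSString has no named constant for Windows-1250, so convert the CF one.
    NSStringEncoding encoding =
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingWindowsLatin2);
    return [NSString stringWithContentsOfFile:path
                                     encoding:encoding
                                        error:outError];
}
```

In Windows-1250 the byte 0xF8 is "ř"; read with a Western encoding, the same byte shows up as "ø", which is exactly the "Poø" garbage in the question.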

First, I'm not a Czech speaker. Second, I think "use UTF-8" is akin to saying "throw a barrel at it." It's heavy-handed in the same way.
From what I've researched, you could use ISO Latin 2 or Apple's Central European Roman encoding. You'll find the former represented among NSStringEncodings, but not the latter, so look to Core Foundation's support:
NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingMacCentralEurRoman);
NSString *string = [[NSString alloc] initWithData:buffer encoding:encoding];
Otherwise, you could (and probably already have, from what you've said) use:
NSString *string = [[NSString alloc] initWithData:buffer encoding:NSISOLatin2StringEncoding];
I'm really curious to see if using CFStringEncoding encodings improves your situation.
EDIT:
If your source was generated by Microsoft Excel, perhaps kCFStringEncodingWindowsLatin2 will work instead of kCFStringEncodingMacCentralEurRoman. Like before, you'll need to convert it using CFStringConvertEncodingToNSStringEncoding.
There's one other approach you might want to try. Since CFStringRef is toll-free bridged to NSString (as CFDataRef is to NSData), you could do the conversion entirely in Core Foundation:
CFStringRef stringRef = CFStringCreateFromExternalRepresentation(kCFAllocatorDefault, (CFDataRef)buffer, kCFStringEncodingMacCentralEurRoman);
NSString *string = (NSString *)stringRef;
In this case, don't forget to release stringRef: CFStringCreateFromExternalRepresentation follows Core Foundation's Create rule, so you own the returned string.
Good luck to you in your endeavors.
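Under ARC the Core Foundation route needs a bridged cast on the way in and an ownership transfer on the way out. A minimal sketch, assuming ARC; `StringFromDataWithCFEncoding` is a hypothetical name, and the encoding constant is whichever CFStringEncoding you are testing (kCFStringEncodingWindowsLatin2, kCFStringEncodingMacCentralEurRoman, ...):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes raw bytes with a Core Foundation encoding
// constant and hands ownership of the result to ARC.
static NSString *StringFromDataWithCFEncoding(NSData *data, CFStringEncoding cfEncoding) {
    CFStringRef stringRef = CFStringCreateFromExternalRepresentation(
        kCFAllocatorDefault, (__bridge CFDataRef)data, cfEncoding);
    if (stringRef == NULL) {
        return nil; // bytes could not be interpreted in that encoding
    }
    // Create rule: we own stringRef; CFBridgingRelease transfers it to ARC,
    // so no separate CFRelease is needed.
    return CFBridgingRelease(stringRef);
}
```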

Related

CHCSV Error : unable to allocate memory for length

I want to parse a .csv file. For this I use the CHCSV Parser. But when I push into the view where the parser should start parsing, the app crashes.
Terminating app due to uncaught exception 'NSMallocException', reason:
'* -[NSConcreteMutableData appendBytes:length:]: unable to allocate
memory for length (4294967295)'
NSString *filePath = @"http://somewhere.com/test.csv";
NSString *fileContent = [NSString stringWithContentsOfURL:[NSURL URLWithString:filePath] encoding:NSUTF8StringEncoding error:nil];
self.csvParser = [[CHCSVParser alloc] initWithContentsOfCSVFile:fileContent];
Edit:
I'm developing for iOS 6+. Thanks for the great comments and answers. I hope to get the right solution.
Input Stream
It doesn't work. When I try to use the input stream, I get an error about the encoding:
Incompatible integer to pointer conversion sending 'int' to
parameter of type 'NSStringEncoding *' (aka 'unsigned int *')
NSData *downloadData = [NSData dataWithContentsOfURL:[NSURL URLWithString:@"http://example.com/test.csv"]];
NSInputStream *stream = [NSInputStream inputStreamWithData:downloadData];
self.csvParser = [[CHCSVParser alloc] initWithInputStream:stream usedEncoding:NSUTF8StringEncoding delimiter:@";"];
self.csvParser.delegate = self;
[self.csvParser parse];
CSV-String
NSString *filePath = @"http://example.com/test.csv";
NSString *fileContent = [NSString stringWithContentsOfURL:[NSURL URLWithString:filePath] encoding:NSUTF8StringEncoding error:nil];
self.csvParser = [[CHCSVParser alloc] initWithCSVString:fileContent];
self.csvParser.delegate = self;
[self.csvParser parse];
This only parses (null).
Final Edit: Dave, the author of CHCSVParser, updated his code on GitHub, so this problem should be solved if you use the most recent version. Get it now!
Okay, here we go:
First, change the following code in CHCSVParser.m.
At the very beginning of the method - (void)_sniffEncoding you have:
uint8_t bytes[CHUNK_SIZE];
NSUInteger readLength = [_stream read:bytes maxLength:CHUNK_SIZE];
[_stringBuffer appendBytes:bytes length:readLength];
[self setTotalBytesRead:[self totalBytesRead] + readLength];
change it to:
uint8_t bytes[CHUNK_SIZE];
// read:maxLength: returns an NSInteger: -1 on error, 0 at end of stream
NSInteger readLength = [_stream read:bytes maxLength:CHUNK_SIZE];
if (readLength > 0) {
    [_stringBuffer appendBytes:bytes length:readLength];
    [self setTotalBytesRead:[self totalBytesRead] + readLength];
}
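The reported length 4294967295 is 2^32 - 1, which is most likely -1 stored in a 32-bit NSUInteger: -[NSInputStream read:maxLength:] returns an NSInteger that is positive for bytes read, 0 at the end of the stream, and -1 on error. A minimal sketch of a read loop that keeps that value signed, assuming ARC; `ReadAllBytes` is a hypothetical helper, not part of CHCSVParser:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reads every byte from `stream`, never appending
// anything when read:maxLength: reports an error (-1).
static NSData *ReadAllBytes(NSInputStream *stream, NSError **outError) {
    enum { kChunkSize = 4096 };
    uint8_t bytes[kChunkSize];
    NSMutableData *result = [NSMutableData data];
    [stream open];
    for (;;) {
        NSInteger readLength = [stream read:bytes maxLength:kChunkSize];
        if (readLength < 0) {              // error: report it, append nothing
            if (outError != NULL) *outError = [stream streamError];
            [stream close];
            return nil;
        }
        if (readLength == 0) break;        // end of stream
        [result appendBytes:bytes length:(NSUInteger)readLength];
    }
    [stream close];
    return result;
}
```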
After that change I got only null values, so I changed the file path (in the sample project it is set in main(), but I did the parsing in viewDidLoad).
Make sure you copied the file into your bundle for that to work!
file = [NSBundle pathForResource:@"Test" ofType:@"scsv" inDirectory:[[NSBundle mainBundle] bundlePath]];
Edit:
If you need to download the file, you can do the following (but note that this is a quick and dirty solution, especially on mobile devices, because dataWithContentsOfURL: blocks the calling thread until the download finishes):
NSData *downloadData = [NSData dataWithContentsOfURL:[NSURL URLWithString:@"http://www.yourdomain.tld/Test.scsv"]];
NSInputStream *stream = [NSInputStream inputStreamWithData:downloadData];
The last line, which creates the stream from the downloaded data, is the one you need to change.
Hope that solves your issue.
Edit 2:
I've just created a repository with a demo project for you where the code actually works. Perhaps you can find out what you are doing wrong (or at least differently). Here is the link.
Edit 3:
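Change
self.csvParser = [[CHCSVParser alloc] initWithInputStream:stream usedEncoding:NSUTF8StringEncoding delimiter:@";"];
to
NSStringEncoding encoding = NSUTF8StringEncoding;
self.csvParser = [[CHCSVParser alloc] initWithInputStream:stream usedEncoding:&encoding delimiter:';'];
usedEncoding: expects a pointer to an NSStringEncoding variable (hence the "integer to pointer conversion" warning), and delimiter: takes a single character rather than an NSString.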

NSString stringWithContentsOfFile:fileName non-english letters

I have a file with many lines separated by "\n". One of the lines is:
Christian Grundekjøn
I can't read the file unless I delete the line. I use the following code to read line by line:
for (NSString *line in [[NSString stringWithContentsOfFile:fileName encoding:NSUTF8StringEncoding error:NULL] componentsSeparatedByString:@"\n"])
If I don't delete the line, the code doesn't even enter the for loop; nothing is read. How do I handle the non-English letters?
If you are generating the text file from within iOS, make sure you encode it with NSUTF8StringEncoding. But given the problem you are reporting, I suspect you are pulling in data from another source that hasn't encoded the text as UTF-8. When the bytes are not valid UTF-8, stringWithContentsOfFile:encoding:error: returns nil, and sending componentsSeparatedByString: to nil also returns nil, so the for loop has nothing to iterate over. If this is the case, you may be able to fix the problem outside your app by converting the source file to UTF-8.
If you don't know what encoding is used, e.g. because the user has supplied the file, iOS can try to guess it for you. A pattern that I have used successfully is to first try to get the string using UTF-8 encoding, for example with the same approach you use. Assuming you write a method that takes a file path and returns the string, it might look something like this:
- (NSString*) stringFromFile: (NSString*) filePath
{
NSError* error = nil;
NSString* stringFromFile = [NSString stringWithContentsOfFile: filePath
encoding: NSUTF8StringEncoding
error: &error];
if (stringFromFile) return stringFromFile; // success
NSLog(@"String is not UTF8 encoded. Error: %@", [error localizedDescription]);
NSStringEncoding encoding = 0;
NSError* usedEncodingError = nil;
// let Foundation guess the encoding and report which one it used
stringFromFile = [NSString stringWithContentsOfFile: filePath
usedEncoding: &encoding
error: &usedEncodingError];
if (stringFromFile)
{
NSLog(@"Retrieved string using an alternative encoding. Encoding was: %lu", (unsigned long)encoding);
return stringFromFile;
}
// either handle the error or attempt further explicit decodings here
return nil;
}
In many cases, usedEncoding works very well. But there are edge cases where trying to figure out an encoding can be very tricky. It all depends on the source file.
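For those edge cases, an explicit list of fallback encodings is a predictable alternative to guessing. A sketch, assuming ARC; `DecodeWithFallback` is a hypothetical helper, and the order of candidates is an assumption you should adapt to where your files actually come from:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tries strict UTF-8 first, then a short explicit list
// of single-byte encodings, reporting which one succeeded.
static NSString *DecodeWithFallback(NSData *data, NSStringEncoding *usedEncoding) {
    const NSStringEncoding candidates[] = {
        NSUTF8StringEncoding,          // returns nil for invalid byte sequences
        NSWindowsCP1252StringEncoding, // common for files saved on Western Windows
        NSISOLatin1StringEncoding,     // ISO 8859-1 assigns a character to every byte
    };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        NSString *string = [[NSString alloc] initWithData:data encoding:candidates[i]];
        if (string != nil) {
            if (usedEncoding != NULL) *usedEncoding = candidates[i];
            return string;
        }
    }
    return nil;
}
```

With the question's line saved as Latin-1 or Windows-1252, "ø" is the single byte 0xF8, which can never appear in valid UTF-8, so the first attempt fails and a single-byte fallback succeeds.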
I had a problem with Japanese characters. My solution was, when saving the file to the Documents directory:
NSString *fileData = [NSString stringWithFormat:@"%@", noteContent];
BOOL isWriteToFile = [fileData writeToFile:notePath atomically:YES encoding:NSUTF8StringEncoding error:nil];
and when reading the file's contents:
NSStringEncoding usedEncoding = 0;
[[NSString alloc] initWithContentsOfFile:fullNotePath usedEncoding:&usedEncoding error:nil];
In other words, store the file's data in a Unicode encoding (such as UTF-8) so that special characters survive the round trip.
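A minimal sketch of that round trip, assuming ARC: write with UTF-8 and read back naming UTF-8 explicitly, so nothing has to be guessed on the way in. `SaveNote` and `LoadNote` are hypothetical names:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: writes the note as UTF-8.
static BOOL SaveNote(NSString *text, NSString *path, NSError **outError) {
    return [text writeToFile:path atomically:YES
                    encoding:NSUTF8StringEncoding error:outError];
}

// Hypothetical helper: reads the note back with the same, known encoding.
static NSString *LoadNote(NSString *path, NSError **outError) {
    return [NSString stringWithContentsOfFile:path
                                     encoding:NSUTF8StringEncoding error:outError];
}
```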

Add data to NSMutableArray from a text file separated by line break?

I have a txt file with some URLs like this
http://url1.com
http://url1.com
http://url1.com
Separated by a line break. How could I add those as different entries separated by line breaks to an NSMutableArray? Thanks :)
Try something like this:
NSMutableArray *txtLines = [NSMutableArray array];
[txtFile enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
if ([line length] > 0) {
[txtLines addObject:line];
}
}];
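A self-contained version of the block-based approach, assuming ARC and blocks (iOS 4.0+); `NonEmptyLines` is a hypothetical name. enumerateLinesUsingBlock: recognizes \n, \r and \r\n as line terminators, so a file saved on Windows doesn't leave stray \r characters in the entries:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects every non-empty line of `text`.
static NSMutableArray *NonEmptyLines(NSString *text) {
    NSMutableArray *lines = [NSMutableArray array];
    [text enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
        if ([line length] > 0) {   // skip blank lines
            [lines addObject:line];
        }
    }];
    return lines;
}
```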
Update
@Evan is right: the above only works if blocks are available on your platform. A preprocessor conditional around that code should take care of this limitation, e.g.:
#if NS_BLOCKS_AVAILABLE
// iOS 4.0+ solution
#else
// iOS 2.0+ solution
#endif
NSString *myListString = /* load / download file */
NSMutableArray *myList = [[myListString componentsSeparatedByString:@"\n"] mutableCopy];
You may have to use <br/> if it's HTML.
@octy's solution is only available in iOS 4.0 or later. This solution works on iOS 2.0 or later. You can check the iOS version and choose which one to use:
BOOL useEnumeratedLineParsing = FALSE;
NSString *reqSysVer = #"4.0";
NSString *currSysVer = [[UIDevice currentDevice] systemVersion];
if ([currSysVer compare:reqSysVer options:NSNumericSearch] != NSOrderedAscending)
useEnumeratedLineParsing = TRUE;
Then check the value of useEnumeratedLineParsing.
NSString *textFilePath = [[NSBundle mainBundle] pathForResource:@"urls" ofType:@"txt"];
NSString *fileContentsUrls = [NSString stringWithContentsOfFile:textFilePath encoding:NSUTF8StringEncoding error:nil];
NSArray *myArray = [fileContentsUrls componentsSeparatedByString:@"\n"];
As long as it isn't mega-large, you could read the whole file into an NSString.
NSString *text = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
Then split the lines:
NSArray *lines = [text componentsSeparatedByString:@"\n"];
And make it mutable:
NSMutableArray *mutableLines = [lines mutableCopy];
Now, depending on where your text file is coming from, you probably need to be more careful. It could be separated by \r\n instead of just \n, in which case your lines will contain a bunch of extra \r characters. You could clean this up after the fact, using something to remove extra whitespace (your file also might have blank lines which the above will turn into empty strings).
On the other hand, if you're in control of the file, you won't have to worry about that. (But in that case, why not read a plist instead of parsing a plain text file...)
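For the cleanup described above, a minimal sketch, assuming ARC: split on every newline character, trim whitespace, and drop empty entries. `CleanLines` is a hypothetical helper. componentsSeparatedByCharactersInSet: treats \r and \n in a \r\n pair as two separators, which is why the resulting empty pieces must be filtered out:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: newline-agnostic split with whitespace cleanup.
static NSMutableArray *CleanLines(NSString *text) {
    NSArray *pieces = [text componentsSeparatedByCharactersInSet:
                               [NSCharacterSet newlineCharacterSet]];
    NSMutableArray *lines = [NSMutableArray arrayWithCapacity:[pieces count]];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    for (NSString *piece in pieces) {
        NSString *trimmed = [piece stringByTrimmingCharactersInSet:whitespace];
        if ([trimmed length] > 0) {   // drop blank lines and \r\n leftovers
            [lines addObject:trimmed];
        }
    }
    return lines;
}
```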

Convert unicode string to utf8

When I get a string of the form \u043F\u043F (Unicode), how do I convert it to a readable NSString? Here is my code (it fails when the characters are non-English):
- (void)connectionDidFinishLoading:(NSURLConnection *)connection{
NSString *theStr = [[NSString alloc] initWithBytes:[receivedData bytes]
length:[receivedData length] encoding: NSUTF8StringEncoding];
NSLog(@"%@", theStr);
}
When the string is in English characters everything is fine - but when it is in Unicode format it fails to give me a readable string (but remains in a Unicode format).
What do you think?
EDIT:
I realized I didn't give enough info on what I'm trying to do. I am trying to use YouTube's way of getting auto-suggested keywords when you use the search box (nothing official, I just used a sniffer to find out). Here it is:
http://suggestqueries.google.com/complete/search?hl=en&client=youtube&hjson=t&ds=yt&jsonp=window.yt.www.suggest.handleResponse&q=*******&cp=******
q is your query and cp is the length of q.
So basically, when q is in English it works fine. But when q has non-English characters (Russian, for example), this is what I get (from NSLog):
window.yt.www.suggest.handleResponse(["\u043F\u0440",[["\u043F\u0440\u0438\u043A\u043E\u043B\u044B","","0"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D","","1"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 87","","2"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 88","","3"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 86","","4"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 85","","5"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 89","","6"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 84","","7"],["\u043F\u0440\u0438\u043A\u043E\u043B\u044B \u0432 \u043F\u0440\u044F\u043C\u043E\u043C \u044D\u0444\u0438\u0440\u0435","","8"],["\u043F\u0440\u043E\u0436\u0435\u043A\u0442\u043E\u0440\u043F\u0435\u0440\u0438\u0441\u0445\u0438\u043B\u0442\u043E\u043D 90","","9"]],{}])
You can use NSString's UTF8String method, declared in NSString.h as:
- (__strong const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
I think this may help:
NSString *yourString = @"\\u043F\\u0440\\u0438\\u043A\\u043E\\u043B\\u044B";
NSArray *unicodeArray = [yourString componentsSeparatedByString:@"\\u"];
NSMutableString *finalString = [[NSMutableString alloc] initWithString:@""];
for (NSString *unicodeString in unicodeArray) {
if (![unicodeString isEqualToString:@""]) {
unsigned int codeValue = 0; // scanHexInt: needs an unsigned int, not a unichar
[[NSScanner scannerWithString:unicodeString] scanHexInt:&codeValue];
unichar character = (unichar)codeValue;
NSString* betaString = [NSString stringWithCharacters:&character length:1];
[finalString appendString:betaString];
}
}
// finalString now holds the decoded characters
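Since the response is JSONP, another option (a swapped-in technique, not the one above) is to strip the callback wrapper and let NSJSONSerialization (iOS 5+) decode the \uXXXX escapes as part of ordinary JSON parsing. A sketch, assuming ARC and assuming the payload sits between the first "(" and the last ")"; `ObjectFromJSONP` is a hypothetical name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips a "callback(...)" wrapper and parses the JSON.
// Returns nil (without setting outError) if no wrapper parentheses are found.
static id ObjectFromJSONP(NSString *jsonp, NSError **outError) {
    NSRange openParen = [jsonp rangeOfString:@"("];
    NSRange closeParen = [jsonp rangeOfString:@")" options:NSBackwardsSearch];
    if (openParen.location == NSNotFound || closeParen.location == NSNotFound ||
        closeParen.location <= openParen.location) {
        return nil;
    }
    NSUInteger start = NSMaxRange(openParen);
    NSRange body = NSMakeRange(start, closeParen.location - start);
    NSData *json = [[jsonp substringWithRange:body] dataUsingEncoding:NSUTF8StringEncoding];
    // NSJSONSerialization turns "\u043F\u0440" into the real characters.
    return [NSJSONSerialization JSONObjectWithData:json options:0 error:outError];
}
```

For the YouTube response, element 0 of the resulting array is the query and element 1 is the array of suggestion triples.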

Load remote csv into CHCSVParser

I am using Dave DeLong's CHCSVParser to parse a csv. I can parse the csv locally, but I cannot get it load a remote csv file. I have been staring at my MacBook way too long today and the answer is right in front of me. Here is my code:
NSString *urlStr = [[NSString alloc] initWithFormat:@"http://www.somewhere.com/LunchSpecials.csv"];
NSURL *lunchFileURL = [NSURL URLWithString:urlStr];
NSStringEncoding encoding = 0;
CHCSVParser *p = [[CHCSVParser alloc] initWithContentsOfCSVFile:[lunchFileURL path] usedEncoding:&encoding error:nil];
[p setParserDelegate:self];
[p parse];
[p release];
Thanks for any help that someone can give me.
-[NSURL path] is not doing what you're expecting.
If I have the URL http://stackoverflow.com/questions/4636428, then its -path is /questions/4636428. When you pass that path to CHCSVParser, it's going to try to open that path on the local file system. Since that file doesn't exist, the parser won't be able to open it.
What you need to do (as Walter points out) is download the CSV file locally, and then open it. You can download the file in several different ways (+[NSString stringWithContentsOfURL:...], NSURLConnection, etc). Once you've got either the file saved locally to disk or the string of CSV in memory, you can then pass it to the parser.
If this is a very big file, then you'll want to alloc/init a CHCSVParser with the path to the local copy of the CSV file. The parser will then read through it bit by bit and tell you what it finds via the delegate callbacks.
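Once the data has been downloaded (by whatever means; the networking part is left out here), saving it to a local file gives you a real filesystem path to hand to the parser's path-based initializer, such as the initWithContentsOfCSVFile:usedEncoding:error: call from the question. A minimal sketch, assuming ARC; `SaveToTemporaryFile` is a hypothetical helper:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: writes downloaded bytes into the temporary directory
// and returns the local path, or nil if the write fails.
static NSString *SaveToTemporaryFile(NSData *data, NSString *fileName, NSError **outError) {
    NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:fileName];
    if (![data writeToFile:path options:NSDataWritingAtomic error:outError]) {
        return nil;
    }
    return path; // a local path, unlike -[NSURL path] of a remote URL
}
```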
If the CSV file isn't very big, then you can do:
NSString * csv = ...; //the NSString containing the contents of the CSV file
NSArray * rows = [csv CSVComponents];
That will return an NSArray of NSArrays of NSStrings.
Similar to this last approach is using the NSArray category method:
NSString * csv = ...;
NSError * error = nil;
NSArray * rows = [NSArray arrayWithContentsOfCSVString:csv encoding:[csv fastestEncoding] error:&error];
This will return the same structure (an NSArray of NSArrays of NSStrings), but it will also provide you with an NSError object if it encounters a syntax error in the CSV file (ie, malformed CSV).
I think you need to pass an NSString, not an NSURL object, to the parser, so the extra step of converting the NSString to an NSURL is the issue. Looking at the CHCSVParser documentation, it looks like the initializer wants an NSString.
So maybe you could do something like:
NSError *err = nil;
NSString *lunchFileURL = [@"http://www.somewhere.com/LunchSpecials.csv" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSString *lunchFile = [NSString stringWithContentsOfURL:[NSURL URLWithString:lunchFileURL] encoding:NSUTF8StringEncoding error:&err];
CHCSVParser *p = [[CHCSVParser alloc] initWithContentsOfCSVString:lunchFile usedEncoding:&encoding error:nil];