NSXMLParser rss issue NSXMLParserInvalidCharacterError - iphone

NSXMLParserInvalidCharacterError # 9
This is the error I get when I hit a weird character (like quotes copied and pasted from Word into the web form, which then end up in the feed). The feed I am using does not declare an encoding, and there is no hope of getting them to change that. This is all I get in the header:
<?xml version="1.0"?>
<rss version="2.0">
What can I do about illegal characters when parsing feeds? Do I sweep the data prior to the parse? Is there something I am missing in the API? Has anyone dealt with this issue?

NSString *dataString = [[[NSString alloc] initWithData:webData encoding:NSASCIIStringEncoding] autorelease];
NSData *data = [dataString dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
Fixed my problems...

The NSString -initWithData:encoding: method returns nil if it fails, so you can try one encoding after another until you find one that converts. This doesn't guarantee that you'll convert all the characters correctly, but if your feed source isn't sending you correctly encoded XML, then you'll probably have to live with it.
The basic idea is:
// try the most likely encoding
NSString *xmlString = [[NSString alloc] initWithData:xmlData
encoding:NSUTF8StringEncoding];
if (xmlString == nil) {
// try the next likely encoding
xmlString = [[NSString alloc] initWithData:xmlData
encoding:NSWindowsCP1252StringEncoding];
}
if (xmlString == nil) {
// etc...
}
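The question's bad characters are Word's curly quotes, so here is the same "try UTF-8 first, then fall back" idea as a plain C sketch (C functions can be called directly from Objective-C). It rests on some assumptions. The fallback encoding is taken to be Windows-1252, where Word's curly quotes sit at bytes 0x91 to 0x94. Bytes that Windows-1252 leaves undefined become U+FFFD. The helper names are hypothetical, not Foundation API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8 (no overlongs, no surrogates, max U+10FFFF). */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes */
        unsigned long cp;  /* decoded code point */
        if (c < 0x80) { i++; continue; }
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;               /* stray continuation byte or invalid lead byte */
        if (len - i <= n) return 0;  /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;                /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
        i += n + 1;
    }
    return 1;
}

/* Unicode values of Windows-1252 bytes 0x80..0x9F; 0 marks an undefined byte. */
static const unsigned short cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Writes code point cp (< 0x10000) to out as UTF-8; returns the byte count. */
static size_t put_utf8(unsigned cp, unsigned char *out)
{
    if (cp < 0x80)  { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) { out[0] = (unsigned char)(0xC0 | (cp >> 6));
                      out[1] = (unsigned char)(0x80 | (cp & 0x3F)); return 2; }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Copies in to out unchanged if it is already valid UTF-8, otherwise
 * transcodes it from Windows-1252. out must hold 3 * len + 1 bytes.
 * Returns the output length (out is also NUL-terminated). */
static size_t to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    if (is_valid_utf8(in, len)) {
        memcpy(out, in, len);
        o = len;
    } else {
        for (size_t i = 0; i < len; i++) {
            unsigned cp = in[i];     /* 0x00..0x7F and 0xA0..0xFF match Latin-1 */
            if (cp >= 0x80 && cp <= 0x9F)
                cp = cp1252_high[cp - 0x80] ? cp1252_high[cp - 0x80] : 0xFFFD;
            o += put_utf8(cp, out + o);
        }
    }
    out[o] = '\0';
    return o;
}
```

The resulting UTF-8 bytes could then be wrapped in an NSData and handed to NSXMLParser. Trying UTF-8 first is safe because Windows-1252 text containing non-ASCII bytes is very rarely valid UTF-8 by accident.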
To be generic and robust, you could do the following until successful:
1.) Try the encoding specified in the Content-Type header of the HTTP response (if any)
2.) Check the start of the response data for a byte order mark and if found, try the indicated encoding
3.) Look at the first two bytes; if you find a whitespace character or '<' paired with a nul/zero character, try UTF-16 (similarly, you can check the first four bytes to see if you have UTF-32)
4.) Scan the start of the data looking for the <?xml ... ?> processing instruction and look for encoding='something' inside it; try that encoding.
5.) Try some common encodings. Definitely check Windows Latin-1, Mac Roman, and ISO Latin-1 if your data source is in English.
6.) If none of the above work, you could try removing all bytes greater than 127 (or substitute '?' or another ASCII character) and convert the data using the ASCII encoding.
If you don't have an NSString by this point, you should fail. If you do have an NSString, you should look for the encoding declaration in the <?xml ... ?> processing instruction (if you didn't already in step 4). If it's there, you should convert the NSString back to NSData using that encoding; if it's not there, you should convert back using UTF-8 encoding.
Also, the CFStringConvertIANACharSetNameToEncoding() and CFStringConvertEncodingToNSStringEncoding() functions can help get the NSStringEncoding that goes with the encoding name from the Content-Type header or the <?xml ... ?> processing instruction.
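Steps 2 to 4 above only need the first few bytes of the response, so they can be sketched in plain C as well. The helper below is hypothetical and deliberately minimal. It returns an IANA charset name, which could then go through CFStringConvertIANACharSetNameToEncoding(). In step 3 it only looks for '<', not whitespace. It also does not allow the spaces around '=' in the declaration that XML permits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Step 4: copies the value of encoding="..." (or '...') from an <?xml ... ?>
 * declaration at the start of d into name (cap > 0); returns 1 on success. */
static int xml_decl_encoding(const unsigned char *d, size_t len, char *name, size_t cap)
{
    if (len < 5 || memcmp(d, "<?xml", 5) != 0) return 0;
    size_t end = 5;                              /* index of the closing "?>" */
    while (end + 1 < len && !(d[end] == '?' && d[end + 1] == '>')) end++;
    for (size_t i = 5; i + 9 < end; i++) {       /* 9 == strlen("encoding=") */
        if (memcmp(d + i, "encoding=", 9) != 0) continue;
        unsigned char q = d[i + 9];              /* opening quote */
        if (q != '"' && q != '\'') return 0;
        size_t j = i + 10, n = 0;
        while (j < end && d[j] != q && n + 1 < cap) name[n++] = (char)d[j++];
        name[n] = '\0';
        return j < end && d[j] == q;             /* fails if unterminated or too long */
    }
    return 0;
}

/* Returns a guessed IANA charset name for XML data, or NULL if none of
 * steps 2 to 4 matched (then fall back to trying common encodings). */
static const char *sniff_xml_encoding(const unsigned char *d, size_t len, char *buf, size_t cap)
{
    /* Step 2: byte order marks (UTF-32 first, since FF FE 00 00 also starts with FF FE) */
    if (len >= 4 && !memcmp(d, "\x00\x00\xFE\xFF", 4)) return "UTF-32BE";
    if (len >= 4 && !memcmp(d, "\xFF\xFE\x00\x00", 4)) return "UTF-32LE";
    if (len >= 3 && !memcmp(d, "\xEF\xBB\xBF", 3))     return "UTF-8";
    if (len >= 2 && d[0] == 0xFE && d[1] == 0xFF)      return "UTF-16BE";
    if (len >= 2 && d[0] == 0xFF && d[1] == 0xFE)      return "UTF-16LE";
    /* Step 3: '<' padded with nul bytes means UTF-32/UTF-16 without a BOM */
    if (len >= 4 && !memcmp(d, "\x00\x00\x00<", 4)) return "UTF-32BE";
    if (len >= 4 && !memcmp(d, "<\x00\x00\x00", 4)) return "UTF-32LE";
    if (len >= 2 && d[0] == 0 && d[1] == '<')       return "UTF-16BE";
    if (len >= 2 && d[0] == '<' && d[1] == 0)       return "UTF-16LE";
    /* Step 4: encoding attribute in the XML declaration */
    if (xml_decl_encoding(d, len, buf, cap)) return buf;
    return NULL;
}
```

For the feed in the question, which has a bare <?xml version="1.0"?>, this returns NULL, so step 5 (trying common encodings) is where the work happens.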

You can also remove the encoding declaration from the XML like this (str is the XML as an NSString; only the first 100 characters are searched):
NSUInteger length = str.length > 100 ? 100 : str.length;
NSString *mystr = [str stringByReplacingOccurrencesOfString:@"encoding=\".*?\""
withString:@""
options:NSRegularExpressionSearch
range:NSMakeRange(0, length)];

Related

Subscript and Superscripts in CDATA of an xml file. Using UILabel to display the parsed XML contents

I need to display subscripts and superscripts (only arabic numerals) within a UILabel. The data is taken from an XML file. Here is the snippet of XML file:
<text><![CDATA[Hello World X\u00B2 World Hello]]></text>
It's supposed to display X2 (with the 2 as a superscript). When I read the string from the NSXMLParser and display it in the UILabel, it displays it as X\u00B2. Any ideas on how to make it work?
I think you can do something like this, assuming the CDATA contents have been read into an NSString and passed into this function:
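-(NSString *)removeUnicodeEscapes:(NSString *)stringWithUnicodeEscapes {
unsigned int codeValue; // scanHexInt: expects an unsigned int *
NSMutableString *result = [stringWithUnicodeEscapes mutableCopy];
NSRange unicodeLocation = [result rangeOfString:@"\\u"];
// Stop when no escape is left or there aren't 4 characters after the "\u"
while (unicodeLocation.location != NSNotFound && unicodeLocation.location + 6 <= result.length) {
// Get the 4-character hex code
NSRange charCodeRange = NSMakeRange(unicodeLocation.location + 2, 4);
NSString *charCode = [result substringWithRange:charCodeRange];
if ([[NSScanner scannerWithString:charCode] scanHexInt:&codeValue]) {
// Convert it to an NSString and replace the 6-character escape in the original string
NSString *unicodeChar = [NSString stringWithFormat:@"%C", (unichar)codeValue];
NSRange replacementRange = NSMakeRange(unicodeLocation.location, 6);
[result replaceCharactersInRange:replacementRange withString:unicodeChar];
}
// Continue searching after this position so an invalid escape can't loop forever
NSUInteger searchStart = unicodeLocation.location + 1;
unicodeLocation = [result rangeOfString:@"\\u" options:0 range:NSMakeRange(searchStart, result.length - searchStart)];
}
return result;
}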
I haven't had a chance to try this out, but I think the basic approach would work
\u00B2 is not any sort of XML encoding for characters. Apparently your data source has defined their own encoding scheme (which, frankly, is pretty stupid as XML is capable of encoding these directly, using entities outside of CDATA blocks).
In any case, you'll have to write your own parser that handles \u#### and converts that to the correct character.
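Here is a minimal sketch of such a parser in plain C. It assumes the CDATA text is available as UTF-8 bytes (for example from -UTF8String) and that only \uXXXX escapes in the Basic Multilingual Plane occur. The function name is hypothetical. The result could be turned back into an NSString with +stringWithUTF8String:.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Replaces each literal \uXXXX escape in the NUL-terminated UTF-8 string in
 * with the UTF-8 bytes of that code point; other text is copied unchanged.
 * out needs strlen(in) + 1 bytes, since a 6-byte escape becomes at most 3 bytes.
 * Surrogate pairs (e.g. \uD83D\uDE00) are not combined. */
static void decode_u_escapes(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '\\' && in[1] == 'u' &&
            isxdigit((unsigned char)in[2]) && isxdigit((unsigned char)in[3]) &&
            isxdigit((unsigned char)in[4]) && isxdigit((unsigned char)in[5])) {
            char hex[5] = { in[2], in[3], in[4], in[5], '\0' };
            unsigned long cp = strtoul(hex, NULL, 16);
            if (cp < 0x80) {
                *out++ = (char)cp;
            } else if (cp < 0x800) {
                *out++ = (char)(0xC0 | (cp >> 6));
                *out++ = (char)(0x80 | (cp & 0x3F));
            } else {
                *out++ = (char)(0xE0 | (cp >> 12));
                *out++ = (char)(0x80 | ((cp >> 6) & 0x3F));
                *out++ = (char)(0x80 | (cp & 0x3F));
            }
            in += 6;                /* skip the whole \uXXXX escape */
        } else {
            *out++ = *in++;         /* ordinary byte, or an incomplete escape */
        }
    }
    *out = '\0';
}
```

With the question's input, "X\u00B2" becomes "X²" (U+00B2 is SUPERSCRIPT TWO).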
I asked my colleague and he gave me a nice, simple workaround. I'm describing it here in case others also get stuck on this.
First, go to this link. It has a list of all subscripts and superscripts. For example, in my case I clicked on "superscript 0"; on the page describing "superscript 0", go to the "Java Data" section and copy the "⁰". You can either put that character directly in the XML or write a simple regex in Objective-C to replace the escape with it. For \u00B2 you would copy "superscript two" ("²") the same way, replace \u00B2 with "²", and get a nice X². Do the same for any other superscript or subscript you want to display.

Encode and Decode using UTF-8 in iPhone

I'm looking for an example demonstrating how I can encode and then decode the same string using UTF-8. By "encode and then decode" I mean implementing the two steps in separate places: one encodes the string and the other decodes it.
I have looked at these APIs but haven't had much success:
stringWithCString:encoding:
stringWithUTF8String:
stringWithCString:(const char *)cString encoding:(NSStringEncoding)enc;
EDITED
I have the string øæ-test-2.txt which I am encoding as follows:
char *s = "øæ-test-2.txt";
NSString *enc = [NSString stringWithCString:s encoding:NSASCIIStringEncoding];
but am getting Ã¸Ã¦-test-2.txt as output.
Now I want to get the original string back, i.e. øæ-test-2.txt
EDITED
I am getting Ã¸Ã¦-test-2.txt from the server and I need to decode it to øæ-test-2.txt. This link gives me the correct output: http://www.cafewebmaster.com/online_tools/utf_decode
Please try to use the link and you will understand my concern.
It would be highly appreciated if anyone can give some hint, tutorial or point me in the right direction.
Regards
To turn an NSString object into a UTF8 C-string, use UTF8String
char *utf8string = [@"A string with ümläuts" UTF8String];
To turn a UTF8 C-string into an NSString object, use stringWithUTF8String: or initWithUTF8String:
NSString *string = [NSString stringWithUTF8String:utf8string];
Note that NSString objects are implemented as UTF-16, so you can't really have a "UTF-8 NSString" (and the encoding should be treated as an implementation detail, anyway).
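Suppose the text from the server is UTF-8 that was decoded as Latin-1 somewhere along the way. Then "øæ" (UTF-8 bytes C3 B8 C3 A6) shows up as "Ã¸Ã¦". To undo that, turn each character back into the single Latin-1 byte it came from and read those bytes as UTF-8. In Foundation that is roughly: encode the garbled NSString with NSISOLatin1StringEncoding, then decode the resulting NSData with NSUTF8StringEncoding. The C sketch below does the same on raw UTF-8 bytes. It assumes exactly one round of Latin-1 mis-decoding, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Undoes one round of "UTF-8 bytes read as Latin-1, then re-encoded as UTF-8":
 * every code point in in (all must be <= U+00FF) becomes one output byte.
 * Returns 0 if in contains a larger code point, i.e. it is not this kind of
 * mojibake (out is then unspecified). out needs strlen(in) + 1 bytes. */
static int latin1_mojibake_to_utf8(const char *in, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    while (*s) {
        if (*s < 0x80) {
            *out++ = (char)*s++;                        /* ASCII is unchanged */
        } else if ((s[0] == 0xC2 || s[0] == 0xC3) && (s[1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF are encoded as C2 xx or C3 xx: recover the byte */
            *out++ = (char)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
            s += 2;
        } else {
            return 0;
        }
    }
    *out = '\0';
    return 1;
}
```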
Instead of
char *utf8string = [@"A string with ümläuts" UTF8String];
This should be
const char *utf8string = [@"A string with ümläuts" UTF8String];
Otherwise you have an incompatible type issue.

Compress/Decompress NSString in objective-c (iphone) using GZIP or deflate

I have a web-service running on Windows Azure which returns JSON that I consume in my iPhone app.
Unfortunately, Windows Azure doesn't seem to support the compression of dynamic responses yet (long story) so I decided to get around it by returning an uncompressed JSON package, which contains a compressed (using GZIP) string.
e.g
{"Error":null,"IsCompressed":true,"Success":true,"Value":"vWsAAB+LCAAAAAAAB..etc.."}
... where value is the compressed string of a complex object represented in JSON.
This was really easy to implement on the server, but for the life of me I can't figure out how to decompress a gzipped NSString into an uncompressed NSString, all the examples I can find for zlib etc are dealing with files etc.
Can anyone give me any clues on how to do this? (I'd also be happy for a solution that used deflate as I could change the server-side implementation to use deflate too).
Thanks!!
Steven
Edit 1: Aaah, I see that ASIHTTPRequest uses the following function in its source code:
//uncompress gzipped data with zlib
+ (NSData *)uncompressZippedData:(NSData*)compressedData;
... and I'm aware that I can convert NSString to NSData, so I'll see if this leads me anywhere!
Edit 2: Unfortunately, the method described in Edit 1 didn't lead me anywhere.
Edit 3: Following the advice below regarding base64 encoding/decoding, I came up with the following code. The encodedGzippedString is, as you can guess, the string "Hello, my name is Steven Elliott", gzipped and then converted to a base64 string. Unfortunately, the result printed by NSLog is just blank.
NSString *encodedGzippedString = @"GgAAAB+LCAAAAAAABADtvQdgHEmWJSYvbcp7f0r1StfgdKEIgGATJNiQQBDswYjN5pLsHWlHIymrKoHKZVZlXWYWQMztnbz33nvvvffee++997o7nU4n99//P1xmZAFs9s5K2smeIYCqyB8/fnwfPyK+uE6X2SJPiyZ93eaX+TI9Lcuiatvx/wOwYc0HGgAAAA==";
NSData *decodedGzippedData = [NSData dataFromBase64String:encodedGzippedString];
NSData* unGzippedJsonData = [ASIHTTPRequest uncompressZippedData:decodedGzippedData];
NSString* unGzippedJsonString = [[NSString alloc] initWithData:unGzippedJsonData encoding:NSASCIIStringEncoding];
NSLog(@"Result: %@", unGzippedJsonString);
After all this time, I finally found a solution to this problem!
None of the answers above helped me, as promising as they all looked. In the end, I was able to compress the string on the server with gzip using the Chilkat framework for .NET ... and then decompress it on the iPhone using the Chilkat framework for iOS (not yet released, but available if you email the guy directly).
The Chilkat framework made this super easy to do, so big thumbs up to the developer!
Your "compressed" string is not raw GZIP'd data, it's in some encoding that allows those bytes to be stored in a string-- looks like base-64 or something like it. To get an NSData out of this, you'll need to decode it into the NSData.
If it's really base-64, check out this blog post and its accompanying code:
http://cocoawithlove.com/2009/06/base64-encoding-options-on-mac-and.html
which will do what you want.
Once you have an NSData object, the ASIHTTPRequest method will probably do as you like.
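Base64 decoding itself is small enough to sketch in plain C (callable from Objective-C). This is a minimal, hypothetical helper, not the code from the linked post. Decoding the first characters of the string in Edit 3 ("GgAAAB+LCAAA") with it gives the bytes 1A 00 00 00 1F 8B 08 00 00. The gzip magic number 1F 8B therefore starts only at offset 4. That suggests the server prepends a 4-byte value before the gzip stream, which a gzip decoder would likely reject unless those bytes are skipped first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Maps one character of the standard base64 alphabet to its 6-bit value, or -1. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes NUL-terminated base64 text into out, stopping at '=' padding and
 * skipping characters outside the alphabet (e.g. line breaks).
 * out needs at least 3 * strlen(in) / 4 bytes. Returns the number of bytes written. */
static size_t base64_decode(const char *in, unsigned char *out)
{
    unsigned long acc = 0;  /* bit accumulator */
    int bits = 0;           /* number of pending bits in acc */
    size_t n = 0;
    for (; *in && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0) continue;
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {                       /* a full byte is available */
            bits -= 8;
            out[n++] = (unsigned char)(acc >> bits);
            acc &= (1UL << bits) - 1;          /* keep only the unused low bits */
        }
    }
    return n;
}
```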
This worked for me, going from a string that was gzipped and then base64-encoded back to the un-gzipped string (all UTF-8):
#import "base64.h"
#import "NSData+Compression.h"
...
+(NSString *)gunzipBase64StrToStr:(NSString *)stringValue {
//now we decode from Base64
Byte inputData[[stringValue lengthOfBytesUsingEncoding:NSUTF8StringEncoding]];//prepare a Byte[]
[[stringValue dataUsingEncoding:NSUTF8StringEncoding] getBytes:inputData];//get the pointer of the data
size_t inputDataSize = (size_t)[stringValue length];
size_t outputDataSize = EstimateBas64DecodedDataSize(inputDataSize);//calculate the decoded data size
Byte outputData[outputDataSize];//prepare a Byte[] for the decoded data
Base64DecodeData(inputData, inputDataSize, outputData, &outputDataSize);//decode the data
NSData *theData = [[NSData alloc] initWithBytes:outputData length:outputDataSize];//create a NSData object from the decoded data
//NSLog(@"DATA: %@ \n",[theData description]);
//And now we gunzip:
theData=[theData gzipInflate];//make bigger==gunzip
return [[NSString alloc] initWithData:theData encoding:NSUTF8StringEncoding];
}
@end
I needed to compress data on the iPhone using Objective-C and decompress it in PHP. Here is what I used with Xcode 11.5 and iOS 12.4:
iOS Objective-C compression/decompression test
Add libcompression.tbd under Build Phases -> Link Binary With Libraries, then include the header.
#include <compression.h>
NSLog(@"START META DATA COMPRESSION");
NSString *testString = @"THIS IS A COMPRESSION TESTTHIS IS A COMPRESSION TESTTHIS IS A COMPRESSION TESTTHIS IS A COMPRESSION TESTTHIS IS A COMPRESSION TESTTHIS IS A COMPRESSION TEST";
NSData *theData = [testString dataUsingEncoding:NSUTF8StringEncoding];
size_t src_size = theData.length;
const uint8_t *src_buffer = (const uint8_t *)[theData bytes];
size_t dst_capacity = src_size + 4096;
uint8_t *dst_buffer = (uint8_t *)malloc(dst_capacity);
// COMPRESSION_ZLIB writes raw DEFLATE (RFC 1951) without a zlib header; a result of 0 means failure
size_t dst_size = compression_encode_buffer(dst_buffer, dst_capacity, src_buffer, src_size, NULL, COMPRESSION_ZLIB);
NSLog(@"originalsize:%zu compressed:%zu", src_size, dst_size);
// Use the compressed length, not sizeof(dst_buffer), which is only the size of the pointer
NSData *dataData = [NSData dataWithBytes:dst_buffer length:dst_size];
NSString *compressedDataBase64String = [dataData base64EncodedStringWithOptions:0];
NSLog(@"Compressed Data %@", compressedDataBase64String);
NSLog(@"START META DATA DECOMPRESSION");
// Decode into a new buffer: the bytes behind src_buffer belong to an immutable NSData
uint8_t *decoded_buffer = (uint8_t *)malloc(src_size);
size_t decoded_size = compression_decode_buffer(decoded_buffer, src_size, dst_buffer, dst_size, NULL, COMPRESSION_ZLIB);
NSData *decompressed = [[NSData alloc] initWithBytes:decoded_buffer length:decoded_size];
NSString *decTestString = [[NSString alloc] initWithData:decompressed encoding:NSUTF8StringEncoding];
NSLog(@"DECOMPRESSED DATA %@", decTestString);
free(decoded_buffer);
free(dst_buffer);
On the PHP side I used the following function to decompress the data:
function decompressString($compressed_string) {
//NEED RAW GZINFLATE FOR COMPATIBILITY WITH IOS COMPRESSION_ZLIB WITH IETF RFC 1951
// $compressed_string must be the raw bytes, so base64_decode() the string sent from iOS first
$full_string = gzinflate($compressed_string);
return $full_string;
}
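Much of the confusion in this thread comes from three different wrappers around the same DEFLATE data: gzip, zlib and raw DEFLATE. On the PHP side they correspond to gzdecode(), gzuncompress() and gzinflate() respectively. That is why, as the PHP comment above notes, the raw output of COMPRESSION_ZLIB needs gzinflate(). When it's unclear what a server or library produced, the first two bytes are usually enough to tell. Here is a minimal heuristic sketch in C with a hypothetical function name. Raw DEFLATE data can occasionally look like a zlib header by coincidence, so treat the answer as a hint.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guesses the container around a DEFLATE stream from its first two bytes.
 * gzip (RFC 1952) starts with 1F 8B. A zlib header (RFC 1950) has compression
 * method 8 in the low nibble, a window size field <= 7 in the high nibble,
 * and (b0 * 256 + b1) divisible by 31. Anything else is assumed to be raw
 * DEFLATE (RFC 1951). */
static const char *deflate_container(const unsigned char *b, size_t len)
{
    if (len < 2) return "too short";
    if (b[0] == 0x1F && b[1] == 0x8B) return "gzip";
    if ((b[0] & 0x0F) == 8 && (b[0] >> 4) <= 7 && ((b[0] << 8) | b[1]) % 31 == 0)
        return "zlib";
    return "raw deflate";
}
```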

iPhone SDK - stringWithContentsOfUrl ASCII characters in HTML source

When I fetch the source of any web page, no matter which encoding I use, I always end up with &#-style character references (such as &#169; or &#174;) instead of the actual characters (© or ®). The same goes for foreign characters (such as åäö in Swedish), which I have to parse from &Aring; and the like.
I'm using
+stringWithContentsOfURL:encoding:error:
to fetch the source and have tried several different encodings such as NSUTF8StringEncoding and NSASCIIStringEncoding, but nothing seems to affect the resulting string.
Any ideas, tips or solutions are greatly appreciated! I'd rather not have to implement the entire ASCII table and replace all occurrences of every character... Thanks in advance!
Regards
You're misunderstanding the purpose of that encoding: argument. The method needs to convert bytes into characters somehow; the encoding tells it what sequences of bytes describe which characters. You need to make sure the encoding matches that of the resource data.
The entity references are an SGML/XML thing. SGML and XML are not encodings; they are markup language syntaxes. stringWithContentsOfURL:encoding:error: and its cousins do not attempt to parse sequences of characters (syntax) in any way, which is what they would have to do to convert one sequence of characters (an entity reference) into a different one (the entity, in practice meaning single character, that is referenced).
You can convert the entity references to un-escaped characters using the CFXMLCreateStringByUnescapingEntities function. It takes a CFString, which an NSString is (toll-free bridging), and returns a CFString, which is an NSString.
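Unescaping is a parsing step rather than an encoding choice. To show what it involves, here is a minimal C sketch with a hypothetical function name. It handles numeric references such as &#169; and the five predefined XML entities. It leaves HTML's other named entities (such as &Aring;) alone, since those need a full lookup table.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Writes code point cp (<= U+10FFFF) to out as UTF-8; returns the byte count. */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80)    { out[0] = (char)cp; return 1; }
    if (cp < 0x800)   { out[0] = (char)(0xC0 | (cp >> 6));
                        out[1] = (char)(0x80 | (cp & 0x3F)); return 2; }
    if (cp < 0x10000) { out[0] = (char)(0xE0 | (cp >> 12));
                        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
                        out[2] = (char)(0x80 | (cp & 0x3F)); return 3; }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Replaces numeric references (&#169; or &#xAE;) and &amp; &lt; &gt; &quot; &apos;
 * with the characters they stand for, as UTF-8. Other references are copied
 * unchanged. out needs strlen(in) + 1 bytes: a reference is never shorter
 * than the UTF-8 bytes it turns into. */
static void unescape_entities(const char *in, char *out)
{
    static const struct { const char *name; char ch; } named[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    const size_t count = sizeof named / sizeof named[0];
    while (*in) {
        if (in[0] == '&' && in[1] == '#') {
            int hex = (in[2] == 'x' || in[2] == 'X');
            const char *digits = in + 2 + hex;
            char *end;
            unsigned long cp = strtoul(digits, &end, hex ? 16 : 10);
            /* accept only digits followed by ';' naming a valid, non-surrogate code point */
            if (isxdigit((unsigned char)*digits) && end != digits && *end == ';' &&
                cp > 0 && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF)) {
                out += put_utf8(cp, out);
                in = end + 1;
                continue;
            }
        } else if (in[0] == '&') {
            size_t k;
            for (k = 0; k < count; k++)
                if (strncmp(in, named[k].name, strlen(named[k].name)) == 0) break;
            if (k < count) {
                *out++ = named[k].ch;
                in += strlen(named[k].name);
                continue;
            }
        }
        *out++ = *in++;   /* ordinary character, or a reference not handled here */
    }
    *out = '\0';
}
```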
Are you sure they aren't already in &Aring; form in the original page? Try viewing the source in a browser first.
That really, really sucks. I wanted to convert it directly and the above solution isn't really a good one, so I just wrote my own ascii-table converter (static) class. Works as it should have worked natively (though I have to fill in the ascii table myself...)
Ideas for optimization? ("ASCII" is a static NSDictionary)
@implementation InternetHelper
+(NSString *)HTMLSourceFromUrlWithString:(NSString *)str convertASCII:(BOOL)state
{
NSURL *url = [NSURL URLWithString:str];
NSString *source = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];
if (state)
source = [InternetHelper ConvertASCIICharactersInString:source];
return source;
}
+(NSString *)ConvertASCIICharactersInString:(NSString *)str
{
NSString *ret = [NSString stringWithString:str];
if (!ASCII)
{
NSString *path = [[NSBundle mainBundle] pathForResource:kASCIICharacterTableFilename ofType:kFileFormat];
ASCII = [[NSDictionary alloc] initWithContentsOfFile:path];
}
for (id key in ASCII)
{
ret = [ret stringByReplacingOccurrencesOfString:key withString:[ASCII objectForKey:key]];
}
return ret;
}
@end

How do I encode "&" in a URL in an HTML attribute value?

I'd like to make a URL clickable in the email app. The problem is that a parameterized URL breaks this because of the "&" in the URL. The body variable below is the problem line. Both versions of "body" are incorrect: once the email app opens, the text stops at "...link:". What is needed to encode the ampersand?
NSString *subject = @"This is a test";
NSString *encodedSubject =
[subject stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
//NSString *body = @"This is a link: <a href='http://somewhere.com/two.woa/wa?id=000&param=0'>click me</a>"; //original
NSString *body = @"This is a link: <a href='http://somewhere.com/two.woa/wa?id=000&amp;param=0'>click me</a>"; //have also tried &#38;
NSString *encodedBody =
[body stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSString *formattedURL = [NSString stringWithFormat: @"mailto:myname@somedomain.com?subject=%@&body=%@", encodedSubject, encodedBody];
NSURL *url = [[NSURL alloc] initWithString:formattedURL];
[[UIApplication sharedApplication] openURL:url];
In URL (percent) encoding, the ampersand is %26, its hexadecimal value.
I've been using -[NSString gtm_stringByEscapingForURLArgument], which is provided in Google Toolbox for Mac, specifically in GTMNSString+URLArguments.h and GTMNSString+URLArguments.m.
You can use a hex representation of the character, in this case %26.
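Percent-encoding a query value means escaping every reserved character, not only '&'. Here is a minimal C sketch of RFC 3986 percent-encoding with a hypothetical function name. Everything except the unreserved characters A-Z, a-z, 0-9, '-', '.', '_' and '~' is escaped. The whole body, including the '&' inside the href, could therefore be encoded once before it is appended after "&body=" in the mailto: URL.

```c
#include <assert.h>
#include <string.h>

/* Percent-encodes every byte of the NUL-terminated UTF-8 string in except the
 * RFC 3986 unreserved characters, so reserved characters such as & = ? : / <
 * and spaces are escaped too. out needs 3 * strlen(in) + 1 bytes. */
static void percent_encode(const char *in, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    for (const unsigned char *s = (const unsigned char *)in; *s; s++) {
        if ((*s >= 'A' && *s <= 'Z') || (*s >= 'a' && *s <= 'z') ||
            (*s >= '0' && *s <= '9') || strchr("-._~", *s)) {
            *out++ = (char)*s;              /* unreserved: copy as-is */
        } else {
            *out++ = '%';                   /* everything else: %XX */
            *out++ = hex[*s >> 4];
            *out++ = hex[*s & 0x0F];
        }
    }
    *out = '\0';
}
```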
You can simply use CFURLCreateStringByAddingPercentEscapes with CFBridgingRelease for ARC support:
NSString *subject = @"This is a test";
// Encode all the reserved characters, per RFC 3986
// (<http://www.ietf.org/rfc/rfc3986.txt>)
NSString *encodedSubject =
(NSString *) CFBridgingRelease(CFURLCreateStringByAddingPercentEscapes(kCFAllocatorDefault,
(CFStringRef)subject,
NULL,
(CFStringRef)@"!*'();:@&=+$,/?%#[]",
kCFStringEncodingUTF8));
You use stringByAddingPercentEscapesUsingEncoding, exactly like you are doing.
The problem is that you aren't using it enough. The format into which you're inserting the encoded body also has an ampersand, which you have not encoded. Tack the unencoded string onto it instead, and encode them (using stringByAddingPercentEscapesUsingEncoding) together.
<a href='http://somewhere.com/two.woa/wa?id=000&amp;param=0'>click me</a>
is correct, although ‘&amp;’ is more commonly used than ‘&#38;’ or ‘&#x26;’.
If the ‘stringByAddingPercentEscapesUsingEncoding’ method does what it says on the tin, it should work(*), but the NSString documentation looks a bit unclear on which characters exactly are escaped. Check what you are ending up with, the URL should be something like:
mailto:bob@example.com?subject=test&body=Link%3A%3Ca%20href%3D%22http%3A//example.com/script%3Fp1%3Da%26amp%3Bp2%3Db%22%3Elink%3C/a%3E
(*: modulo the usual disclaimer that mailto: link parameters like ‘subject’ and ‘body’ are non-standard, will fail in many situations, and should generally be avoided.)
Once the email app opens, text stops at "...link:".
If ‘stringByAddingPercentEscapesUsingEncoding’ is not escaping ‘<’ to ‘%3C’, that could be the problem. Otherwise, it might not be anything to do with escapes, but a deliberate mailer-level restriction to disallow ‘<’. As previously mentioned, ?body=... is not a reliable feature.
In any case, you shouldn't expect the mailer to recognise the HTML and try to send an HTML mail; very few will do that.
Here is an example of using %26 instead of &. Without it, the attributes arrived in PHP as an array!
var urlb='/tools/lister.php?type=101%26ID='+ID; // %26 instead of &
window.location.href=urlb;