How do I use a specific encoding with NSXMLParser?
It defaults to UTF-8, but I want to use TIS-620.
Any ideas?
If there is a correct encoding info inside the xml (first line, e.g. <?xml version="1.0" encoding="UTF-8"?>) everything should work out of the box.
Related
I need to convert an ISO-8859-1 file to UTF-8 encoding without losing any content.
I have a file which looks like this:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<HelloEncodingWorld>Üöäüßßß Test!!!</HelloEncodingWorld>
Now I want to encode it as UTF-8.
I tried the following:
f=new File('c:/temp/myiso88591.xml').getText('ISO-8859-1')
ts=new String(f.getBytes("UTF-8"), "UTF-8")
g=new File('c:/temp/myutf8.xml').write(ts)
That didn't work due to String incompatibilities.
Then I read something about byte stream readers/writers, StreamingMarkupBuilder and others,
so I tried:
f=new File('c:/temp/myiso88591.xml').getText('ISO-8859-1')
mb = new groovy.xml.StreamingMarkupBuilder()
mb.encoding = "UTF-8"
new OutputStreamWriter(new FileOutputStream('c:/temp/myutf8.xml'),'utf-8') << mb.bind {
mkp.xmlDeclaration()
out << f
}
This was not at all what I wanted.
I just want to read the content of an XML file with an ISO-8859-1 reader and then write it to a new (or the old) file. Why is this so complicated? :-/
The result should simply be the following, and the file should really be encoded in UTF-8:
<?xml version="1.0" encoding="UTF-8" ?>
<HelloEncodingWorld>Üöäüßßß Test!!!</HelloEncodingWorld>
Thanks for any answers
Cheers
def f=new File('c:/data/myiso88591.xml').getText('ISO-8859-1')
new File('c:/data/myutf8.xml').write(f,'utf-8')
(I just gave it a try, it works :-)
Same as in Java: the libraries do the conversion for you.
As deceze said: when you specify an encoding on read, the text is converted to the internal string format (UTF-16, as far as I know). When you specify another encoding on write, the string is converted to that encoding.
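The same read-in-one-encoding, write-in-another idea can be sketched in Python. This is only an illustration, not the Groovy code above: the function name transcode_xml and the regular expression are assumptions made for this example. Unlike the two-line Groovy version, the sketch also rewrites the encoding in the XML declaration, since a file transcoded to UTF-8 that still declares ISO-8859-1 would mislead a parser.

```python
import re
from pathlib import Path


def transcode_xml(src, dst, src_encoding="ISO-8859-1", dst_encoding="UTF-8"):
    """Re-encode a small XML file and update its declaration to match.

    Hypothetical helper for illustration; reads the whole file into memory.
    """
    # Decoding converts the bytes into Python's internal string type.
    text = Path(src).read_text(encoding=src_encoding)

    # Rewrite only the encoding value of a leading XML declaration, if present.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        lambda m: f"{m.group(1)}{m.group(2)}{dst_encoding}{m.group(2)}",
        text,
        count=1,
    )

    # Encoding converts the string into bytes of the target encoding.
    Path(dst).write_bytes(text.encode(dst_encoding))
```

Calling transcode_xml('myiso88591.xml', 'myutf8.xml') would leave the element content unchanged as text while the bytes on disk and the declaration both become UTF-8.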
But if you work with XML, you shouldn't have to worry about the encoding anyway, because the XML parser will take care of it. It reads the first characters (<?xml) and determines the basic encoding from them. After that, it can read the encoding information from your XML header and use it.
Making it a little more Groovy, and not requiring the whole file to fit in memory, you can use the readers and writers to stream the file. This was my solution when I had files too big for plain old Unix iconv(1).
new FileOutputStream('out.txt').withWriter('UTF-8') { writer ->
new FileInputStream('in.txt').withReader('ISO-8859-1') { reader ->
writer << reader
}
}
http://www.hjsoft.com/blog/link/A_Useful_Example_in_Java_Ruby_and_Groovy
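For comparison, a streaming conversion along the same lines might look like this in Python. It is a sketch under assumptions: the function name transcode_stream and the chunk size are made up for this example, and it does not touch any XML declaration inside the file.

```python
import shutil


def transcode_stream(src, dst, src_encoding="ISO-8859-1", dst_encoding="UTF-8",
                     chunk_chars=64 * 1024):
    """Copy src to dst chunk by chunk, decoding and re-encoding on the fly.

    Hypothetical helper; only one chunk of text is held in memory at a time.
    """
    # newline="" disables newline translation, so line endings pass through as-is.
    with open(src, "r", encoding=src_encoding, newline="") as reader, \
         open(dst, "w", encoding=dst_encoding, newline="") as writer:
        # copyfileobj reads up to chunk_chars characters per iteration.
        shutil.copyfileobj(reader, writer, chunk_chars)
```

The text-mode file objects play the role of the Groovy reader and writer: the reader decodes ISO-8859-1 bytes, the writer encodes UTF-8 bytes.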
I have a problem when I parse XML because it contains the character ö:
<?xml version="1.0" encoding="UTF-8"?>
<rsp stat="ok">
<mediaid>abösjdk3</mediaid>
<mediaurl>http://twitöic.com/abc123</mediaurl>
</rsp>
The parser reports:
parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0x9A 0x74 0x68 0x65
<mediaid>ab\232sjdk3</mediaid>
^
Another question, please: if I want to parse text such as > 6 < 12 month, will I have a problem? I don't want to replace the > character. Does someone have a solution?
You'll have this problem with any parser, not only with objective-c.
That character isn't encoded as UTF-8, and as such it will halt any parser.
Either remove the encoding information or change it to the correct value.
Edited to answer a comment
I use GDataXmlNode to parse, and my XML file does not contain <?xml version="1.0" encoding="UTF-8"?> – cs1.6
If the original XML file does not have the encoding attribute, then specify the proper encoding either when you instantiate the parser or when you load the XML file.
As for which encoding that is: the error output shows the character ö stored as the byte 0x9A, which the parser prints as the octal escape \232 (decimal 154). That is not ISO-8859-1, where ö is 0xF6 (decimal 246). In Mac OS Roman, however, 0x9A is ö, which suggests the file may have been saved in that encoding.
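The byte in the error message can be checked against a few candidate encodings. The following Python sketch is only a diagnostic aid, not part of any Objective-C parser; the list of candidate encodings is an assumption chosen for this example.

```python
# The element from the error output, with the offending byte 0x9A (octal \232).
raw = b"<mediaid>ab\x9asjdk3</mediaid>"

# A lone 0x9A is a UTF-8 continuation byte without a lead byte, so decoding fails.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Decode the same bytes under a few single-byte encodings to see which yields "ö".
candidates = {
    name: raw.decode(name, errors="replace")
    for name in ("iso-8859-1", "cp1252", "mac_roman")
}

for name, decoded in candidates.items():
    print(f"{name:12} {decoded!r}")
```

Of these three, only mac_roman turns the byte into ö. ISO-8859-1 maps it to a control character and Windows-1252 maps it to š.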
Go through this, it will help...
How do I access the encoding for an xml-file using TBXML?
To clarify: I would like to access the top row of an XML file and get the value of encoding from it using TBXML.
<?xml version="1.0" encoding="utf-8"?>
You can't, it's not part of the xml document.
Why do you want to know the encoding?
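If the declared encoding is really needed, one option that does not depend on the parser is to look at the raw bytes before parsing. The Python sketch below illustrates the idea and is not TBXML API: the function name declared_encoding and the regular expression are assumptions made for this example, and it only handles ASCII-compatible encodings (UTF-16 or UTF-32 input would need byte-order-mark sniffing first).

```python
import codecs
import re

# Matches the encoding value inside an XML declaration at the start of the data.
_DECLARATION = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def declared_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or default if absent."""
    # Skip a UTF-8 byte order mark if one precedes the declaration.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECLARATION.match(data[:200])
    return match.group(1).decode("ascii") if match else default
```

For example, declared_encoding(b'<?xml version="1.0" encoding="utf-8"?><a/>') returns the string 'utf-8'.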
How do I parse XML which contains data in Norwegian?
Do I need any particular encoding with NSXMLParser?
Thanks.
I guess you are worried about non-ASCII characters in the XML file. Well you don't need to. The first line of an XML file should look something like:
<?xml version="1.0" encoding="UTF-8"?>
where the encoding attribute tells you which character set was used to encode the characters in the file. NSXMLParser will use that line to determine which character set it will use. Once it gets to your methods, all the text will be in NSStrings which will be able to cope with your Norwegian characters automatically.
All you need to be concerned about is that the file really is encoded in the character set that the first line says it is.
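The same behaviour can be observed with any conforming parser. The sketch below uses Python's xml.etree.ElementTree rather than NSXMLParser, purely as an illustration; the sample text and the helper make_doc are made up for this example.

```python
import xml.etree.ElementTree as ET

text = "Blåbærsyltetøy og rømmegrøt"


def make_doc(encoding: str) -> bytes:
    """Build a tiny document whose bytes match the encoding it declares."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><dish>{text}</dish>'
    return doc.encode(encoding)


# The parser reads the declaration and decodes the bytes accordingly.
latin1_result = ET.fromstring(make_doc("ISO-8859-1")).text
utf8_result = ET.fromstring(make_doc("UTF-8")).text

# A document that declares UTF-8 but is actually stored as ISO-8859-1 is rejected.
mislabelled = f'<?xml version="1.0" encoding="UTF-8"?><dish>{text}</dish>'.encode("ISO-8859-1")
try:
    ET.fromstring(mislabelled)
    mislabelled_ok = True
except ET.ParseError:
    mislabelled_ok = False
```

Both correctly labelled documents yield the same string, while the mislabelled one fails to parse, which is the one case the answer warns about.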
XML itself does not care which human language the content is written in. Every start tag just needs its matching end tag; then you can parse the document with an XML parser.
Here is a tutorial for understanding XML, and
here is a link to a tutorial on parsing an XML file.
Maybe this will help with your problem.
I have an XML document that may have shift-jis encoded data in it and I'm trying to parse it using an NSXMLParser object.
Ordinarily I assume the document is UTF8 encoded and all is well - does anyone know if/how I can determine if an element is shift-jis encoded and then how to decode it?
Thanks
An XML document is UTF-8 encoded (or UTF-16, if it starts with a byte order mark) unless it has an XML declaration stating otherwise, for example:
<?xml version="1.0" encoding="shift_jis"?>
or:
<?xml version="1.0" encoding="cp932"?>
Any XML parser should detect the encoding given in the XML declaration. (Some parsers may not support some of the CJK codecs so will complain, but AIUI NSXMLParser should be fine.)
If you've got a file with Shift-JIS byte sequences that does not have such a stated encoding, or which contains Shift-JIS byte sequences in some elements and UTF-8 in others, what you have is not well-formed; it's not an XML document at all and no parser will read it.
If you've just got a missing encoding declaration, you really need to fix it at the source end, but in the meantime hacking in a suitable XML declaration or transcoding the bytes manually from Shift-JIS to UTF-8 before feeding it into the parser should help.
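The transcoding workaround might look like the following Python sketch. It uses xml.etree.ElementTree instead of NSXMLParser, the function name parse_undeclared is made up for this example, and it assumes the input really is Shift-JIS throughout and carries no XML declaration of its own.

```python
import xml.etree.ElementTree as ET


def parse_undeclared(data: bytes, actual_encoding: str = "shift_jis"):
    """Parse XML bytes that lack an encoding declaration.

    Hypothetical helper: decodes with the encoding the source is believed to
    use, then hands the parser UTF-8, which is what it assumes by default.
    """
    utf8_bytes = data.decode(actual_encoding).encode("utf-8")
    return ET.fromstring(utf8_bytes)


# Example input: Shift-JIS bytes with no <?xml ... encoding=...?> line.
sample = "<name>日本語テキスト</name>".encode("shift_jis")
element = parse_undeclared(sample)
print(element.text)
```

If the guess about the source encoding is wrong, decode() will either raise an error or silently produce the wrong characters, which is why fixing the declaration at the source remains the better long-term answer.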