Decode HTML from XML with NewLine - iphone

First I parse XML and retrieve this:
<p><strong>Berns Salonger - the City's
Then I decode it with MWFeedParser's stringByDecodingHTMLEntities and get this:
<p><strong>Berns Salonger - the City's Ideal Meeting Place
Note that this is only one of many lines, and they include a lot of tags.
Then I replace <br> with \n, and the console prints the text with newlines. Everything is great except that all the other HTML tags are still there.
So I then run stringByConvertingHTMLToPlainText, and all the HTML tags disappear, but so do my inserted newlines.
How can I strip the HTML tags and at the same time replace <br> with \n, so that I can display nicely formatted text in a UITextView?

Instead of replacing <br> with \n, try replacing it with the HTML entity for a newline, &#10;. Then, when you call stringByConvertingHTMLToPlainText, it will convert the entity to an actual newline character.
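To see why the entity survives where a literal \n does not, here is a rough Python analogue (not MWFeedParser's code). It assumes the conversion strips tags, collapses whitespace, and decodes entities last, which is the order this answer relies on. The function names html_to_plain_text and convert_keeping_breaks are hypothetical, and the sample markup is a made-up extension of the question's snippet.

```python
import html
import re

def html_to_plain_text(markup: str) -> str:
    """Hypothetical stand-in for stringByConvertingHTMLToPlainText (assumed order:
    strip tags, collapse whitespace, decode entities)."""
    text = re.sub(r"<[^>]+>", "", markup)   # drop every tag (naive, for illustration)
    text = re.sub(r"\s+", " ", text)        # collapse whitespace: a literal \n becomes a space
    return html.unescape(text.strip())      # decode entities last, so &#10; becomes \n here

def convert_keeping_breaks(markup: str) -> str:
    # Swap <br>, <br/> and <br /> for the newline entity before converting.
    protected = re.sub(r"<br\s*/?>", "&#10;", markup, flags=re.IGNORECASE)
    return html_to_plain_text(protected)

markup = "<p><strong>Berns Salonger - the City&#39;s</strong><br />Ideal Meeting Place</p>"
print(repr(html_to_plain_text(markup.replace("<br />", "\n"))))  # newline lost
print(repr(convert_keeping_breaks(markup)))                      # newline kept
```

The key design point is ordering: a real newline inserted before the whitespace-collapsing step is flattened, while an entity is only turned into a newline after that step has run.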

Related

\n\n to break lines in unicode

I'm parsing with Unicode strings in Python, and I'm using \n\n to break lines.
The parsing works fine: I can see that the result has the correct line breaks,
but the final HTML still shows no line breaks.
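HTML treats a newline character in ordinary text as plain whitespace, so it is not rendered as a break. To create a line break in HTML, use the <br> tag.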
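A minimal Python sketch of that conversion. It escapes the text first so characters such as & and < stay literal, then adds a <br> at every newline. text_to_html is a hypothetical name. If you would rather keep the newlines untouched, CSS white-space: pre-wrap or a <pre> element are alternatives.

```python
import html

def text_to_html(text: str) -> str:
    """Hypothetical helper: plain text -> HTML that keeps its line breaks."""
    escaped = html.escape(text)             # & < > " ' become entities first
    return escaped.replace("\n", "<br>\n")  # then every newline gets an explicit <br>

print(text_to_html("first paragraph\n\nsecond & last"))
```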

XML Parsing issue with special characters

I am trying to display “Administrative File & Express”, but it is displayed as “Express”. I can't show anything that comes before the “&”.
Characters like '&' must be escaped in XML (as &amp;). See the following link:
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
See also: What characters do I need to escape in XML documents?
Then check the XML you are receiving. If those characters are not escaped, you need to handle that in your code.
Post here if you need further details.
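A short Python illustration of the point, using the standard library's ElementTree and xml.sax.saxutils.escape rather than NSXMLParser: a bare & makes the document malformed, while the escaped form &amp; parses back to the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_title = "Administrative File & Express"

# An unescaped '&' is not well-formed XML, so a conforming parser rejects it.
try:
    ET.fromstring(f"<title>{raw_title}</title>")
    malformed = False
except ET.ParseError:
    malformed = True

# escape() turns '&' into '&amp;' (and '<', '>' into '&lt;', '&gt;');
# the parser decodes the entity back to '&'.
doc = f"<title>{escape(raw_title)}</title>"
parsed = ET.fromstring(doc).text

print(malformed, doc, parsed, sep="\n")
```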

How to parse special characters in XML for iPad?

I am having a problem parsing XML files that contain special characters such as single and double quotes (' and "). I am using NSXMLParser's parser:foundCharacters: method to collect the characters in my code.
<synctext type = "word" >They raced to the park Arthur pointed to a sign "Whats that say" he asked Zoo said DW Easy as pie</synctext>
When I parse and save the text from the tag above, the resulting string appears in GDB as
"\n\t\tThey raced to the park Arthur pointed to a sign \"Whats that say\" he asked Zoo said DW Easy as pie";
There are two issues:
1) Unwanted characters at the beginning of the string.
2) The double quotes around Whats that say.
Can anyone please help me get rid of these unwanted characters and read special characters properly?
// Trim the spaces, newlines and tabs around the element's text (note @"...", not #"...")
NSString *trimmed = [string stringByTrimmingCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@" \n\t"]];
The parser is apparently returning exactly what's in the string. That is, the XML was coded with the starting tag on one line, a newline, two tabs, and the start of the string. And quotes in the string are obviously there in the original (and it's not clear in at least this example why you'd want to delete them).
But if you want these characters gone, you need to post-process the string. You can use Rams' statement to eliminate the newline and tabs, and stringByReplacingOccurrencesOfString:withString: to zap the quotes.
(Note that some XML parsers can be instructed to return strings like this with the leading/trailing stuff stripped, but I'm not sure about this one. The quotes will always be there, though.)
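For comparison, a Python sketch with the standard xml.sax module, whose characters() callback plays the role of parser:foundCharacters:. Like it, characters() may be called more than once per element, so the handler appends chunks and joins them at the end tag. It shows that the leading \n\t\t comes from the document's layout and that the quotes are real text. SyncTextHandler is a hypothetical name, and the surrounding <page> element is assumed.

```python
import xml.sax

class SyncTextHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <synctext> element."""

    def __init__(self):
        super().__init__()
        self.in_synctext = False
        self.chunks = []
        self.results = []

    def startElement(self, name, attrs):
        if name == "synctext":
            self.in_synctext = True
            self.chunks = []

    def characters(self, content):
        # May be called several times per element: append, never overwrite.
        if self.in_synctext:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "synctext":
            # Trim the layout whitespace, like the NSString trimming call above.
            self.results.append("".join(self.chunks).strip(" \n\t"))
            self.in_synctext = False

xml_doc = ('<page>\n<synctext type="word">\n\t\tThey raced to the park Arthur pointed '
           'to a sign "Whats that say" he asked Zoo said DW Easy as pie</synctext>\n</page>')
handler = SyncTextHandler()
xml.sax.parseString(xml_doc.encode("utf-8"), handler)

text = handler.results[0]
print(text)                   # quotes are still there: they are part of the text
print(text.replace('"', ""))  # only if you really want them gone
```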

regex_replace to replace certain html tags

Is there a way to convert BR and/or DIV tags to newlines so the text will format correctly when I use an in a mailto? I was thinking I should look for any P, DIV, and BR tags and replace them with a newline character: wherever there is a closing tag, put the newline character, and remove the opening tag. After that I will remove the rest of the HTML with remove_html="1", but I want to keep the paragraph formatting.
I think it could be done with regex_replace, but I'm not sure how to write it. Does anyone know?
Do not parse HTML with regular expressions; use an HTML parser module instead (HTML::TreeBuilder or something similar that can make in-place changes), or, in this case, even better, use an XSLT transformation.
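The answer names Perl (HTML::TreeBuilder) and XSLT; as a stand-in, here is a Python sketch using the standard library's html.parser.HTMLParser, which walks the tags instead of matching them with a regex. It emits a newline at <br> and at each closing </p> or </div>, drops every other tag, and keeps the text. The mailto_body helper is hypothetical; it percent-encodes line breaks as %0D%0A, which is how RFC 6068 says line breaks in a mailto body should be written.

```python
from html.parser import HTMLParser
from urllib.parse import quote

class BlockToNewlines(HTMLParser):
    """Drops all tags, but emits a newline at <br> and at closing </p> and </div>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive already decoded
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names are reported in lowercase; <br/> also lands here
        # via the default handle_startendtag.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("p", "div"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(markup: str) -> str:
    parser = BlockToNewlines()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip("\n")

def mailto_body(text: str) -> str:
    # Hypothetical helper: mailto bodies need line breaks as %0D%0A (RFC 6068).
    return quote(text.replace("\n", "\r\n"))

text = html_to_text("<div>Hello &amp; welcome</div><p>Line one<br>Line two</p>")
print(text)
print("mailto:someone@example.com?body=" + mailto_body(text))
```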

query about xml parsing

I just want to know: are there any limitations in XML parsing regarding characters?
For example, can we parse a word containing characters like
"frühe", containing "ü"
"böser", containing "ö"
The XML I'm parsing contains a few different languages, and some characters are like the ones above.
When I look at the console, the output gets interrupted exactly when it reaches "ü":
the console prints only "fr".
Can someone give me some ideas about this?
regards
shishir
If you are using the standard NSXMLParser class and the XML file has the correct encoding= attribute, then you shouldn't have anything to worry about. The console output probably isn't Unicode-aware, so it is interpreting the multi-byte UTF-8 characters literally. Try showing the parsed text in a UIAlertView or some other UI element and see if you still have problems.
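A small Python check of the same idea, using the standard library's ElementTree rather than NSXMLParser. With a correct encoding declaration, the parser returns "frühe" and "böser" intact. The last lines show what a display that treats UTF-8 bytes as one character per byte would make of the same text. That this is what happened in the asker's console is the answer's guess, not something verified here.

```python
import xml.etree.ElementTree as ET

# Raw bytes as they would arrive from a file or the network.
data = ('<?xml version="1.0" encoding="UTF-8"?>'
        '<words><w>frühe</w><w>böser</w></words>').encode("utf-8")

root = ET.fromstring(data)
words = [w.text for w in root.iter("w")]
print(words)

# 'ü' is one character but two bytes in UTF-8 ...
utf8_bytes = "ü".encode("utf-8")
print(utf8_bytes)
# ... so reading those bytes as Latin-1 (one byte per character) garbles the word.
garbled = "frühe".encode("utf-8").decode("latin-1")
print(garbled)
```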