I can't use dashes in XPath on iPhone using libxml2

I'm trying to parse HTML data using KissXML for iPhone. I've noticed that I can't have dashes in the argument to id(); otherwise the expression won't evaluate. For example, to get the element whose id is foo, I would use
id("foo")
However, if I try to get the element whose id is foo-bar with
id("foo-bar")
the libxml2 XPath engine doesn't seem to return anything. The same expression works in the XPath checker for Firefox, though. Has anyone run into this issue and knows why it happens, or has a workaround (besides using the absolute XPath path)?

I don't know much about KissXML or your project. If you are only doing some light XPath queries, I suggest you just try this:
http://cocoawithlove.com/2008/10/using-libxml2-for-parsing-and-xpath.html
It basically comes down to two functions:
NSArray *PerformXMLXPathQuery(NSData *document, NSString *query);
NSArray *PerformHTMLXPathQuery(NSData *document, NSString *query);
I had a lot of easy success with this method.
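Another workaround worth trying is to skip id() and match the id attribute directly with a predicate such as //*[@id='foo-bar'], which selects by attribute value and does not depend on the attribute being declared as an ID. Whether this avoids the problem in KissXML is an assumption, not something confirmed in this thread. The helper below is a minimal sketch in plain C (which also compiles in an Objective-C file); the name xpath_for_id is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the XPath expression //*[@id='<id>'].
 * Returns 0 on success, -1 if the id contains a single quote (which would
 * break the quoted literal) or if the buffer is too small. */
int xpath_for_id(const char *id, char *out, size_t cap)
{
    assert(id != NULL && out != NULL);

    if (strchr(id, '\'') != NULL) {
        return -1;
    }

    int n = snprintf(out, cap, "//*[@id='%s']", id);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

The resulting string can then be wrapped in an NSString and passed as the query argument of PerformHTMLXPathQuery.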

Related

TBXML Parser with Special character getting EXC_BAD_ACCESS

I am using TBXML Parser ver. 1.4. When I parse a response like the following with TBXML, I get EXC_BAD_ACCESS:
<trainingOrganization xsi:type="xsd:string">~!##$%^&*()_+?> <,./;'{}|<":;'></trainingOrganization>
I'm stuck on this issue.
As far as I can tell, TBXML fails to parse data that contains < and >.
Thanks in advance.
One of the best ways is to use CDATA. Anything inside a CDATA section is treated as plain character data rather than markup, so if you have special characters like ';:,.''<>' the parser will not interpret them. I always prefer CDATA and advise you to use it.
I don't think this is specific to TBXML.
Characters like < > & " ' have to be escaped in XML text (with the entity references &lt; &gt; &amp; &quot; &apos;), so the problem is probably in how your XML file is encoded or in the settings used to parse it.
Otherwise the parser reads it as:
<trainingOrganization xsi:type="xsd:string">
~!##$%^&*()_+?
> *<-- closing the previous element*
<,./;'{}|
<":;'>
</trainingOrganization>
If you can't find a setting or encoding that does this automatically, try replacing those characters with their entity references before parsing.
Another possibility is to produce well-formed XML in the first place and not allow raw < and > in element content, which I think is the easiest way.
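As an illustration of escaping the element content before it is placed in the XML, here is a minimal sketch in plain C. It replaces the five characters that have predefined XML entity references. The function name xml_escape is hypothetical, and the sketch assumes you control the text before it is wrapped in tags; it is not meant to repair a document that is already malformed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, replacing & < > " ' with the
 * predefined XML entity references. Returns 0 on success, -1 if `out`
 * (capacity `cap`, including the terminator) is too small. */
int xml_escape(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);

    size_t n = 0;
    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep = single;

        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   break;
        }

        size_t len = strlen(rep);
        if (n + len >= cap) {
            return -1; /* no room left for this piece plus the terminator */
        }
        memcpy(out + n, rep, len);
        n += len;
    }

    if (n >= cap) {
        return -1;
    }
    out[n] = '\0';
    return 0;
}
```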

Libxml parser iphone issue

I am using libxml to parse XML, following Apple's XMLPerformance example.
The following part of the XML document causes a problem because it contains the "&nbsp" string. I cannot simply replace "&nbsp" with another string, because I receive the data in the didReceiveData delegate method and parse it as it arrives.
Is there any way to resolve this issue, which is caused by the special character?
<ParentTag>
<AUTHOR>Actavis"Totowa "LLC</AUTHOR>
<SPL_INACTIVE_ING>lactose"monohydrate"/"magnesium"stearate"/"starch"pregelatinized"/"talc</SPL_INACTIVE_ING>
</ParentTag>
Any help will be appreciated.
Thanks in advance
To make sure your XML is well-formed, test it first with an online XML validator and parse it only after that.
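XML predefines only the entities lt, gt, amp, quot and apos, so a bare &nbsp; is undefined unless a DTD declares it; the numeric reference &#160; names the same no-break space character. Because the data arrives in chunks, the entity may be split across two callbacks. The following is a minimal sketch in plain C of a streaming filter that handles that case; the names NbspFilter, nbsp_filter_feed and nbsp_filter_finish are hypothetical, and wiring the output into the parser is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOKEN_LEN = 6 };
static const char TOKEN[] = "&nbsp;";
static const char REPLACEMENT[] = "&#160;"; /* same length as TOKEN */

/* Hypothetical filter state: how many leading characters of TOKEN have
 * been seen but not yet written out. Initialize with {0}. */
typedef struct {
    size_t matched;
} NbspFilter;

/* Filters one chunk. `cap` must be at least len + TOKEN_LEN.
 * Returns the number of bytes written to `out` (not NUL-terminated). */
size_t nbsp_filter_feed(NbspFilter *f, const char *in, size_t len,
                        char *out, size_t cap)
{
    assert(f != NULL && in != NULL && out != NULL);
    assert(cap >= len + TOKEN_LEN);

    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (c == TOKEN[f->matched]) {
            f->matched++;
            if (f->matched == TOKEN_LEN) {
                memcpy(out + n, REPLACEMENT, TOKEN_LEN);
                n += TOKEN_LEN;
                f->matched = 0;
            }
            continue;
        }
        /* Mismatch: the held-back prefix was ordinary text after all. */
        memcpy(out + n, TOKEN, f->matched);
        n += f->matched;
        f->matched = 0;
        if (c == '&') {
            f->matched = 1; /* may be the start of a new token */
        } else {
            out[n++] = c;
        }
    }
    return n;
}

/* Call once after the last chunk to flush any held-back prefix. */
size_t nbsp_filter_finish(NbspFilter *f, char *out, size_t cap)
{
    assert(f != NULL && out != NULL && cap >= f->matched);
    size_t n = f->matched;
    memcpy(out, TOKEN, n);
    f->matched = 0;
    return n;
}
```

Each chunk received in didReceiveData would go through nbsp_filter_feed before being handed to the parser, with nbsp_filter_finish called once the connection finishes loading.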

NSXMLParser & Problem

My xml is
<categoryname>Baby</categoryname>
<id>244</id>
<categoryname>Boats & Watercraft</categoryname>
<id>1026</id>
I get the first two nodes. My problem is the third node: I get only "Boats" (in parser:foundCharacters:), and the & kills NSXMLParser. I have searched this forum and other websites, and most posts say to use &amp; instead of & in the XML. My XML comes from a server and I can't change it right now. Is there any other way to solve this issue?
If you insist on sending invalid XML from your server this should solve it:
xmlString = [xmlString stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
// parse xmlString
categoryName = [categoryName stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
If your XML comes from a PHP script, you can change the script so that it substitutes another character, such as $, whenever & occurs before sending. Then, when you parse the XML, change that symbol back to the one you need.
I have done the same thing myself.
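Replacing every & with &amp; also rewrites references that are already valid, turning an existing &amp; into &amp;amp;. A more cautious variant escapes only ampersands that do not start something shaped like an entity or character reference. The sketch below is plain C with hypothetical names, and the check is a heuristic: text such as AT&T; would be mistaken for a reference.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if `amp` (pointing at '&') starts something shaped like
 * &name; or &#digits; and 0 otherwise. Heuristic only. */
static int starts_reference(const char *amp)
{
    size_t i = 1;
    if (amp[i] == '#') {
        i++;
    }
    size_t start = i;
    while (isalnum((unsigned char)amp[i])) {
        i++;
    }
    return i > start && amp[i] == ';';
}

/* Hypothetical helper: copies `in` to `out`, turning bare '&' into "&amp;"
 * and leaving existing references alone. Returns 0 on success, -1 if
 * `out` (capacity `cap`, including the terminator) is too small. */
int escape_bare_ampersands(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);

    size_t n = 0;
    for (; *in != '\0'; in++) {
        int bare = (*in == '&' && !starts_reference(in));
        size_t len = bare ? 5 : 1;

        if (n + len >= cap) {
            return -1;
        }
        if (bare) {
            memcpy(out + n, "&amp;", 5);
        } else {
            out[n] = *in;
        }
        n += len;
    }

    if (n >= cap) {
        return -1;
    }
    out[n] = '\0';
    return 0;
}
```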

NSXML Parsing - Distinguishing Nodes

I am trying to extract the doc-num from the following xml using NSXML. At this point I am able to iterate through all the nodes using the NSXML parser event, but I am trying to distinguish between the doc-num in the input node from the one in the output node.
How can I do this? I am a bit lost on how to get this to work for my iphone app. Also, is there a simpler way than the event based NSXML?
<xmt:input>
<xmt:app-refer>
<doc-id doc-id-type="docdb">
<country>MD</country>
<doc-num>20050130</doc-num>
<kc>A</kc>
<date>20050130</date>
</doc-id>
</xmt:app-refer>
</xmt:input>
<xmt:output>
<xmt:app-refer>
<doc-id doc-id-type="epodoc">
<doc-num>MD20050000130</doc-num>
<date>20050130</date>
</doc-id>
</xmt:app-refer>
</xmt:output>
Here is a tutorial that shows XML parsing using GDataXMLParser.
how-to-read-and-write-xml-documents-with-gdataxml
GDataXMLParser is better than NSXMLParser, since the latter is slower.
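If you stay with an event-based parser, one way to tell the two doc-num elements apart is to keep a stack of the currently open element names: push in the start-element callback, pop in the end-element callback, and check the ancestors when doc-num opens. The sketch below shows only that bookkeeping in plain C; ElementPath and its functions are hypothetical names, and hooking them into the NSXMLParser delegate methods is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_DEPTH = 64 };

/* Stack of open element names. The pointers are borrowed, so the strings
 * must outlive the stack; a real delegate would copy or retain them. */
typedef struct {
    const char *names[MAX_DEPTH];
    size_t depth;
} ElementPath;

/* Call when an element starts. Returns -1 if the stack is full. */
int path_push(ElementPath *p, const char *name)
{
    assert(p != NULL && name != NULL);
    if (p->depth == MAX_DEPTH) {
        return -1;
    }
    p->names[p->depth++] = name;
    return 0;
}

/* Call when an element ends. */
void path_pop(ElementPath *p)
{
    assert(p != NULL);
    if (p->depth > 0) {
        p->depth--;
    }
}

/* Returns 1 if an element called `name` is currently open. */
int path_is_inside(const ElementPath *p, const char *name)
{
    assert(p != NULL && name != NULL);
    for (size_t i = 0; i < p->depth; i++) {
        if (strcmp(p->names[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if the innermost open element is a doc-num under xmt:input. */
int is_input_doc_num(const ElementPath *p)
{
    assert(p != NULL);
    return p->depth > 0
        && strcmp(p->names[p->depth - 1], "doc-num") == 0
        && path_is_inside(p, "xmt:input");
}
```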

Using regexlite to parse <a href src="">Links</a> out of a NSString

I am writing an iPhone app that has to pull raw HTML data off a website and grab the URL and the displayed text of each link.
For example, given a link with the text "Click Here to go to google",
it would grab
url = www.google.com
text = Click Here to go to google
I'm using the regexlite library, but I'm in no way an expert on regular expressions. I have tried several things to get this working.
I want to use the following code
NSString *searchString = @"$10.23, $1024.42, $3099";
NSString *regexString = @"\\$((\\d+)(?:\\.(\\d+)|\\.?))";
NSArray *capturesArray = nil;
capturesArray = [searchString arrayOfCaptureComponentsMatchedByRegex:regexString];
So my question is: can someone tell me what the regexString would be to parse HTML links, or point me to a clear tutorial on how regexlite works? I have tried reading the documentation at http://regexkit.sourceforge.net/RegexKitLite/ and I don't understand it.
Thanks in advance,
Zen_silence
In short, don't do that. Regular expressions are a horrible way to parse HTML. HTML documents are highly structured with a hierarchy of tags whose contents may span lines without said lines appearing in the rendered form.
Assuming well structured HTML, you can use an XML parser.
In particular, the iPhone offers the NSXMLParser and some good examples of usage therein.
searchString would be the whole raw HTML text, and regexString should be more like:
NSString *regexString = @"href=\"(.*?)\">(.*?)<";
Then you would use the capture groups to pull out match1 and match2, repeating the match through the HTML text with the range option so that each search skips past what you have already searched...
I don't know what you are trying to do with searchString and the numbers though.
In case anyone else has this same question, the regex string to match an HTML link is
NSString *regexString = @"<a href=([^>]*)>([^>]*) - ";
The O'Reilly book "Mastering Regular Expressions" helped me figure this out really quickly. I highly recommend reading it if you are trying to use regular expressions.
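To make the capture-group idea concrete, here is a minimal sketch that extracts the URL and text of the first link in a string. It uses POSIX regex.h in plain C as a stand-in, not RegexKitLite, whose ICU syntax differs in places; the pattern and the function name first_link are assumptions for illustration. It expects a double-quoted href and link text with no nested tags, so real-world HTML is still better served by a parser.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the first <a ... href="URL" ...>TEXT</a> in
 * `html` and copies URL and TEXT into the given buffers.
 * Returns 0 on success, -1 if nothing matches or a buffer is too small. */
int first_link(const char *html, char *url, size_t url_cap,
               char *text, size_t text_cap)
{
    assert(html != NULL && url != NULL && text != NULL);

    /* Group 1: href value. Group 2: link text up to the next '<'. */
    const char *pattern = "<a[^>]*href=\"([^\"]*)\"[^>]*>([^<]*)</a>";
    regex_t re;
    regmatch_t m[3];

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0) {
        return -1;
    }
    int rc = regexec(&re, html, 3, m, 0);
    regfree(&re);
    if (rc != 0) {
        return -1;
    }

    size_t url_len = (size_t)(m[1].rm_eo - m[1].rm_so);
    size_t text_len = (size_t)(m[2].rm_eo - m[2].rm_so);
    if (url_len >= url_cap || text_len >= text_cap) {
        return -1;
    }

    memcpy(url, html + m[1].rm_so, url_len);
    url[url_len] = '\0';
    memcpy(text, html + m[2].rm_so, text_len);
    text[text_len] = '\0';
    return 0;
}
```

To collect every link, the same call can be repeated on the remainder of the string after each match.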