I am using the ROME library to parse feeds. How can I tell whether a link points to an XML feed or to a normal page? Is there support for this in ROME?
It sounds like you are saying that you will be getting a bunch of links from somewhere (RSS feed content or something). Then you want to go through them and determine if they are RSS/Atom feeds or if they are regular HTML pages.
You could attempt to fetch each link with ROME and see whether it throws an exception. A better option might be to pull in the document at the end of each link and see what its header says it is.
Keep in mind that retrieving every link may cause problems if someone links to a very large file.
You might use an HTTP HEAD request to find out what type of file is at the end of a link before downloading the entire thing.
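As a rough illustration of the HEAD idea (in Python rather than Java, just to keep it short), the sketch below asks only for the headers and classifies the Content-Type value. The function names and the list of feed media types are my own assumptions, not anything ROME provides, and some servers mislabel feeds or reject HEAD requests, so treat the result as a hint.

```python
import urllib.request

# Media types that usually indicate a feed. This list is an assumption and
# not exhaustive; "text/xml" and "application/xml" may also be non-feed XML.
FEED_TYPES = {"application/rss+xml", "application/atom+xml",
              "application/xml", "text/xml"}


def classify_content_type(content_type):
    """Return 'feed', 'html', or 'other' for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type in FEED_TYPES:
        return "feed"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    return "other"


def head_content_type(url, timeout=10):
    """Send a HEAD request and return the Content-Type header (or None)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

A link would then be checked with something like `classify_content_type(head_content_type(link))`, and only the ones that come back as "feed" handed to the feed parser.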
I'm trying to create a web page using Perfect (perfect.org) where users can browse for and upload files. Can anyone tell me how I can get the progress of a file upload?
perfect.org-fileUploads
Refer to the link above and follow the usual approach used with HTML-JS-PHP, HTML-JS-JSP, or any other stack.
In other words, you can have the server report the upload status as a percentage and display it to the client, or show a loader while the file is uploading.
Thank you
Until PerfectlySoft Inc. releases an official solution for this feature request, you could try splitting the file into small pieces, uploading them one by one, and merging them back together on the server. There is no industry standard for this, so other web servers either provide their own solutions or simply stay away from it.
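The split-and-merge idea can be sketched independently of Perfect. The Python below is only an illustration: `send_chunk` and `on_progress` are hypothetical callbacks (in a real page, one HTTP request per chunk and a progress-bar update), and a dictionary stands in for the server.

```python
def split_into_chunks(data, chunk_size):
    """Split bytes into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_in_chunks(data, chunk_size, send_chunk, on_progress):
    """Send each chunk in order and report a percentage after each one."""
    sent = 0
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        send_chunk(index, chunk)  # hypothetical: one request per chunk
        sent += len(chunk)
        on_progress(100 * sent // len(data))


# Stand-in for the server: store chunks by index, then merge them in order.
received = {}
progress = []
payload = bytes(range(10))
upload_in_chunks(payload, 4, received.__setitem__, progress.append)
merged = b"".join(received[i] for i in sorted(received))
```

Because the client knows how many bytes it has already handed over, the percentage comes for free, without the server having to report anything mid-request.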
I have been looking around for a while and have not found anything useful. I am also not sure I have worded the question in the clearest fashion, so apologies.
I have a section of an app I am building called 'Company News'. The company in question has a news page on their website which displays a title, an excerpt of text and a read more option.
At the moment the iPhone application just has a UIWebView that loads that URL and displays an error if no connection is available. However, if the user taps a story to read it, the web view opens a new page. I want to avoid building 'back' and 'forward' buttons, and I don't want it to look like a browser within the app.
With that said, I am looking for a way to just extract that data from the website and just display it in my app as raw text. I am not particularly bothered about rich text formatting or anything fancy. I would just like the title and body of text.
Is this possible?
In essence, then, you are looking for an HTML parser.
Assuming the HTML you wish to parse has a predictable format, the approach I would take is to load the HTML via whatever URL loading system you want - e.g. NSURLConnection, ASIHTTPRequest, etc.
Then you will need to parse the raw HTML. I use XPath. It requires that you learn the syntax, but it should work.
For more details about how you might use XPath for parsing HTML, see the second response to this question. You will need to link to libxml2 in your project then use XPath to extract the nodes of interest.
Scraping web pages in this way is fragile, though, because it depends on the structure of a page you don't control and which could be changed unpredictably.
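To give a feel for the XPath step, here is a small sketch in Python (not Objective-C) using the standard library's limited XPath support. The markup, class names, and function name are made up for illustration, and it assumes the page is well-formed; real pages often are not, which is why a lenient HTML parser such as the one in libxml2 is the usual choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the company's news page.
PAGE = """
<html><body>
  <div class="news-item">
    <h2>New office opens</h2>
    <p class="excerpt">We have moved to larger premises.</p>
  </div>
  <div class="news-item">
    <h2>Quarterly results</h2>
    <p class="excerpt">Revenue grew this quarter.</p>
  </div>
</body></html>
"""


def extract_stories(markup):
    """Return (title, excerpt) pairs for every news item in the markup."""
    root = ET.fromstring(markup)
    stories = []
    # XPath: every div, at any depth, whose class attribute is "news-item".
    for item in root.findall(".//div[@class='news-item']"):
        title = item.findtext("h2", default="").strip()
        excerpt = item.findtext("p[@class='excerpt']", default="").strip()
        stories.append((title, excerpt))
    return stories


stories = extract_stories(PAGE)
```

The same two expressions carry over to libxml2's XPath API more or less unchanged; only the surrounding plumbing differs.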
I have an rss reader app which works perfectly on some feeds, but on others it just displays text and no images.
Is it the feed that should be adjusted to publish images, or is it something to do with the way I read the stream?
Why does it work for some streams (it shows entire posts, images, videos etc. from e.g. blogspot RSS feeds) but not for other RSS feeds?
I have read that the stream itself can be set to publish different content amounts.
How can I parse the feed so that all feeds will work correctly?
It's very likely due to the feeds themselves, not to the reader app.
Feed publishers can decide whether to include all or only part of their content in their feeds (RSS or Atom).
Then, even if they publish all the content, they may decide to publish it as raw text (just the text content) or as full XHTML, which allows images to be included.
Unfortunately, there is little you can do, except maybe ask the developer of your app to allow showing the original site/page and not just the feed entries.
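One way to see which case a given feed falls into is to check, per item, whether it carries a full-content element and whether that content contains any image markup. The Python sketch below assumes RSS 2.0 with the optional `content:encoded` element; the sample feed and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the RSS "content" module for full-text entries.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Hypothetical feed: the first item carries full HTML, the second only a summary.
FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
  <item>
    <title>Full post</title>
    <description>Short summary</description>
    <content:encoded><![CDATA[<p>Text <img src="a.png"/></p>]]></content:encoded>
  </item>
  <item>
    <title>Summary only</title>
    <description>Just text, no markup</description>
  </item>
</channel>
</rss>"""


def describe_items(feed_xml):
    """Return (title, source element, has_image) for each RSS item."""
    root = ET.fromstring(feed_xml)
    report = []
    for item in root.iter("item"):
        full = item.findtext(CONTENT_NS + "encoded")
        # Prefer the full content; fall back to the plain description.
        body = full if full is not None else item.findtext("description", default="")
        source = "content:encoded" if full is not None else "description"
        report.append((item.findtext("title"), source, "<img" in body.lower()))
    return report


report = describe_items(FEED)
```

If an item reports no image in either element, the publisher simply did not include one, and no amount of parsing on the reader side will bring it back.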
Ok, so what are the differences between the feeds that work and the feeds that don't? What assumptions do you make about the RSS content, and in what situations are those assumptions not satisfied? When you run the code in the debugger, what do you see when you encounter a feed where images don't work? Also, what code? ;)
I want to add PDF and Word versions of my resume to my portfolio page and make them downloadable. Does anyone have a simple script?
Add a link to the file and let the browser handle the download.
You may be over-complicating the problem. You can use an a element whose href points to the location of the .pdf or .doc file. When a user clicks it, the browser will generally ask whether to save or open the file, depending on their OS and configuration.
If this is still confusing, leave a comment and I'll explain anything you don't get.
Create the PDF. Upload it. Add a link.
Save yourself 30 minutes tossing around with PDFGEN code.
You will want to send the Content-Disposition HTTP header to force the download; otherwise some browsers may recognize the common file extensions and try to open the file contents automatically. It will feel more professional if the link actually downloads the file instead of launching an app, which I think is important for a resume.
As far as I know, Content-Disposition must be generated on the server side, as part of the response that delivers the file.
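For what it's worth, here is a minimal sketch of a server sending that header, written in Python only because its standard library makes it short. The file name, placeholder bytes, port, and the names `attachment_header`, `ResumeHandler`, and `serve` are all assumptions for illustration; on a real site you would set the same header in whatever server or framework you already use.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def attachment_header(filename):
    """Build a Content-Disposition value that asks the browser to save the file."""
    return 'attachment; filename="{}"'.format(filename)


class ResumeHandler(BaseHTTPRequestHandler):
    # Hypothetical file name and placeholder bytes, used only for illustration.
    filename = "resume.pdf"
    body = b"%PDF-1.4 placeholder"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        # This header is what turns "open in the browser" into "save as".
        self.send_header("Content-Disposition", attachment_header(self.filename))
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        self.wfile.write(self.body)


def serve(port=8000):
    """Run the demo server until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), ResumeHandler).serve_forever()
```

Calling `serve()` and visiting the address in a browser should trigger a save dialog rather than an inline PDF view, though the exact behaviour still depends on the browser.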
Option:
Upload your resume to Google Docs.
Add a link to the file on your portfolio page just as I do in the menu of my blog:
Use Google Docs Viewer passing to it the URL of the PDF as you can see in this link.
I am using the SIMPLE RSS reading example found at http://theappleblog.com/2008/08/04/tutorial-build-a-simple-rss-reader-for-iphone/
It uses parseXML to load the RSS feeds.
Here is the problem I am having: the following RSS feed fails to load, with an error saying that it cannot connect. However, it works fine in my Mac RSS reader, so I know the link is good.
Any ideas on why it cannot load this particular feed but it can load others fine?
http://www.okstate.com/rss.dbml?db_oem_id=200&media=news
Thanks.
I've just released an open source RSS/Atom Parser for iPhone and hopefully it might be of some use.
I'd love to hear your thoughts on it too!
In my experience, HTML markup causes an RSS parser to fail in most cases. I've experienced a problem like this with a lot of parser classes I've come across (in search of the ultimate one, which I didn't find)
My guess is that character entities, such as the encoded apostrophe in 's, are responsible for your crash. That was usually the case with my crashes. This also led to my decision to create a 'proxy server' to pre-parse the XML before sending it to the iPhone (which gives me the advantage of caching, scaling, and some other stuff). I do believe there are solid solutions out there, but it is always difficult to write a parser for so many RSS implementations.
P.S.: W3C validates this feed as 'valid', so it really is 'our' problem.
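A pre-parsing step of the kind mentioned above could, for example, rewrite HTML-only named entities as numeric character references before the XML parser sees them. This Python sketch is just one possible approach, not what my proxy actually does line for line; the function name and sample input are invented, and it would also rewrite entity-like text inside CDATA sections, which may or may not matter for your feeds.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML itself defines; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def replace_html_entities(xml_text):
    """Turn HTML-only named entities (e.g. &rsquo;) into numeric references."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own and unknown entities as-is
        return "&#{};".format(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", substitute, xml_text)


# A strict XML parser rejects &rsquo; because XML does not define it.
raw = "<item><title>Today&rsquo;s news &amp; views</title></item>"
cleaned = replace_html_entities(raw)
title = ET.fromstring(cleaned).findtext("title")
```

Numeric references are understood by every XML parser, so the cleaned text can be handed to the device without any changes to the parsing code there.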
Your problem could lie with:
Unicode characters (e.g. I see some o's with two dots above them in the feed)
The code you have doesn't respect CDATA sections correctly
To find out which is the case, save the feed file to your local disk and load it via your code to make sure the error happens.
Do a binary search on the file to find out whether a particular RSS entry is causing the problem: remove all but the first RSS entry and see if the problem still occurs. If it does, the problem is there; if it doesn't, put half of the RSS entries back in the file and repeat.
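If you would rather automate the search than edit the file by hand, the halving can be sketched like this. It is a Python illustration under two assumptions: at most one entry is bad, and `parses_ok` (a hypothetical helper) reports whether a feed built from the given entries loads cleanly. Here it wraps Python's XML parser; in your case it would call your own parsing code.

```python
import xml.etree.ElementTree as ET


def find_bad_entry(entries, parses_ok):
    """Return the index of the entry that makes parsing fail, or None.

    Assumes at most one entry is bad and that parses_ok(list_of_entries)
    returns True when a feed built from those entries loads cleanly.
    """
    if not entries or parses_ok(entries):
        return None
    low, high = 0, len(entries)  # the bad entry lies in entries[low:high]
    while high - low > 1:
        mid = (low + high) // 2
        if parses_ok(entries[low:mid]):
            low = mid   # first half is clean, so look in the second half
        else:
            high = mid  # the failure reproduces with the first half alone
    return low


def parses_ok(entries):
    """Stand-in check: wrap the entries in a channel and try to parse it."""
    try:
        ET.fromstring("<channel>" + "".join(entries) + "</channel>")
        return True
    except ET.ParseError:
        return False


# Hypothetical entries; the third uses an entity that XML does not define.
entries = ["<item>a</item>", "<item>b</item>",
           "<item>c &nbsp; d</item>", "<item>e</item>"]
bad_index = find_bad_entry(entries, parses_ok)
```

Each round halves the suspect range, so even a feed with a few hundred entries needs fewer than ten test loads.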
I've been experiencing a similar issue. I haven't yet pinned down the answer, but I've noticed that RSS 2 tends to parse more successfully than the rest.
There are many RSS feeds that contain invalid XML, usually because they were hacked together on the server side using HTML templates by somebody who didn't understand XML. I've seen improperly escaped (or non-escaped) HTML post contents, missing close tags, badly nested tags, and so on.
If you want to be able to parse arbitrary feeds, you have to clean up bad XML. The usual way is to use the "htmlTidy" library, which is included in the OS. This can clean up XML as well as HTML.
This example you're following uses NSXMLParser -- I have no idea why. It's a lower-level API and it doesn't support tidying. I would suggest using NSXMLDocument instead. There's a flag in that API that will tell it to use tidy when parsing the XML. This API also returns you the XML as a handy tree of elements that's easy to work with.