Delete certain parts of HTML code on the iPhone after downloading it - iphone

I'm wondering if it's possible to edit out certain parts of the HTML code. It's really long, and as I parse it (with element parser), the deeper the parser goes into the code, the slower it runs. Any ideas? I'm using a 3G as well.
Edit:
For example, on this site I'd want the posts and the usernames. Say there are 50 replies in the thread; assume it will take a long time for the 3G to parse thousands of lines.
I'd want to remove the links on the right, the ads, and the links at the top and bottom of the page too. Then I'd take the revised HTML and push it into the parser.

If you are downloading the page into a UIWebView, you can run ordinary JavaScript against it (via the method stringByEvaluatingJavaScriptFromString:) to hide or remove any elements you want to keep out of the user's view.
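A minimal sketch of that approach, run once the page has loaded. The element IDs and class names are hypothetical (inspect the real page for the right selectors), and parseHTML: stands in for whatever parsing code you already have:

- (void)webViewDidFinishLoad:(UIWebView *)webView {
    // Strip the ads, sidebar and navigation links before parsing.
    // The selectors here are made up; substitute the real ones.
    NSString *cleanup =
        @"var junk = document.querySelectorAll('#ads, #header, #footer, .sidebar');"
        @"for (var i = 0; i < junk.length; i++) {"
        @"  junk[i].parentNode.removeChild(junk[i]);"
        @"}";
    [webView stringByEvaluatingJavaScriptFromString:cleanup];

    // A second call pulls back the trimmed markup as a string.
    NSString *trimmedHTML = [webView
        stringByEvaluatingJavaScriptFromString:@"document.documentElement.outerHTML"];

    // Hand the much smaller document to the parser.
    [self parseHTML:trimmedHTML];
}

Since the JavaScript runs inside the web view, the removed elements are gone both from what the user sees and from the HTML string you get back for parsing.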

Related

How to get array of pixels from browser window without using canvas

I'm attempting to get an array of pixels of the screen (web page), but I know of no way of doing that without using canvas (either straight-up or by converting HTML DOM elements into canvas first). I need to capture every pixel on the screen, and I don't know what operating system is going to be used, so I can't request the display from the OS, either. Is there a third-party tool, possibly, or a way to do this from the window object in the DOM?
I have only one idea: maybe you should try to move this functionality to the server. You can use wkhtmltopdf (http://wkhtmltopdf.org/) to save the website as a PDF, then convert the PDF to an image and read the pixel array from that.
For web developers with no control over the client machine, there are two approaches to getting a screenshot of a webpage:
Open the webpage in a headless browser on the server and take the screenshot there. PhantomJS is a popular one.
(I'm including this for completeness, though you said you don't want to take that route.) Use the canvas element on the client. html2canvas is an interesting project that re-renders an entire HTML document into a canvas element so a screenshot can be made.
If your use case allows it, you could of course instruct your users to take a screenshot and paste it into an upload form that can handle images from the clipboard. On Windows, that's a matter of hitting "Print Screen" and CTRL+V.
Here is an API to generate images from online web pages: http://www.page2images.com/Create-Website-Screenshot-with-Javascript-API

Adding a hit counter to Desktop Intelligence/Xi 3/Business Objects webpage?

For my company, I am making a report in Xi3/Desktop Intelligence that pulls data via free-hand SQL and produces an HTML file displaying the data, updating every 20 minutes. We want to incorporate a hit counter that will show us the number of times this report is being viewed.
I found a couple of basic templates online. I tried copying and pasting them into a cell, but the output HTML page just displayed the full HTML (unrendered by my browser). I am decent at writing my own HTML, but I just do not understand how to embed my own HTML code in a dynamically updating report in Xi3.
Moreover, I doubt (for legal reasons) my company will be okay with me using a free hit-counter template I find online, especially considering they all seem to reference a third-party website to do the actual "counting." Any ideas on the best way to implement, or learn how to create, a visitor counter?
Thanks.
You can include HTML in a DeskI report. In the cell that contains the HTML, click Format Cell; on the "Number" tab, there is a checkbox for "Read as HTML". Make sure that's checked. Note that you won't see the rendered HTML within DeskI, but it will display when viewed in InfoView.

Getting images by parsing constantly changing HTML

I'm in the process of developing an iOS app that retrieves images from a URL (http://m.9gag.com). It so happens that the HTML at that URL is constantly changing, and whenever I have working code, the site's style changes.
Is there any way I could pull those images from the HTML without having to worry about webpage changes? There is no public API at the moment, so that's sadly not an option.
I look forward to hearing some options.
Also, if the page is set up so that it loads more content when the user scrolls to the bottom, how can I get more HTML to load based on how far down I've got in parsing? I'm not using a web view; I just need to update the HTML I initially retrieved.
It seems the simplest way in your case is to use a regular expression (for example http://[A-Za-z0-9_/\.]*\.jpg) to extract the URLs and keep track of the images you have already pulled.
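A short sketch of that idea using NSRegularExpression; the pattern is the one suggested above and will likely need tuning to the site's actual markup:

#import <Foundation/Foundation.h>

// Extract every .jpg URL from a blob of raw HTML.
NSArray *ImageURLsInHTML(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"http://[A-Za-z0-9_/\\.]*\\.jpg"
                             options:0
                               error:&error];
    if (!regex) return @[];   // bad pattern; error describes why

    NSMutableArray *urls = [NSMutableArray array];
    [regex enumerateMatchesInString:html
                            options:0
                              range:NSMakeRange(0, html.length)
                         usingBlock:^(NSTextCheckingResult *match,
                                      NSMatchingFlags flags, BOOL *stop) {
        [urls addObject:[html substringWithRange:match.range]];
    }];
    return urls;
}

To avoid re-downloading, you could keep the returned URLs in an NSMutableSet and skip any you have already fetched.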

How to extract data from a web site and format to raw text - iPhone Dev

I have been looking around for a while and haven't found anything useful. I'm also not sure if I have worded the question in the clearest fashion, so apologies.
I have a section of an app I am building called 'Company News'. The company in question has a news page on their website which displays a title, an excerpt of text, and a 'read more' option.
At the moment, the iPhone application just has a UIWebView which loads that URL and displays an error if no connection is available. However, if my user taps a story to read the news, it obviously opens a new page. I want to avoid having to build in 'back' and 'forward' buttons, and to keep it from looking like a browser within the app.
With that said, I am looking for a way to extract that data from the website and display it in my app as raw text. I am not particularly bothered about rich text formatting or anything fancy; I would just like the title and the body text.
Is this possible?
In essence, then, you are looking for an HTML parser.
Assuming the HTML you wish to parse has a predictable format, the approach I would take is to load the HTML via whatever URL loading system you want - e.g. NSURLConnection, ASIHTTPRequest, etc.
Then you will need to parse the raw HTML. I use XPath. It requires that you learn the syntax, but it should work.
For more details about how you might use XPath for parsing HTML, see the second response to this question. You will need to link to libxml2 in your project then use XPath to extract the nodes of interest.
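As a rough sketch of that approach (the wrapper function is mine, not part of any library), once libxml2 is linked and its headers are on the include path:

#import <Foundation/Foundation.h>
#import <libxml/HTMLparser.h>
#import <libxml/xpath.h>

// Return the text content of every node matching an XPath query.
NSArray *TextForXPathQuery(NSData *htmlData, NSString *query) {
    // libxml2's HTML parser is forgiving of real-world, invalid markup.
    htmlDocPtr doc = htmlReadMemory((const char *)htmlData.bytes, (int)htmlData.length,
                                    NULL, NULL,
                                    HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_RECOVER);
    if (!doc) return @[];

    xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
    if (!ctx) { xmlFreeDoc(doc); return @[]; }

    NSMutableArray *results = [NSMutableArray array];
    xmlXPathObjectPtr xpathObj =
        xmlXPathEvalExpression((const xmlChar *)query.UTF8String, ctx);

    if (xpathObj && xpathObj->nodesetval) {
        for (int i = 0; i < xpathObj->nodesetval->nodeNr; i++) {
            xmlChar *content = xmlNodeGetContent(xpathObj->nodesetval->nodeTab[i]);
            if (content) {
                [results addObject:[NSString stringWithUTF8String:(const char *)content]];
                xmlFree(content);
            }
        }
    }

    if (xpathObj) xmlXPathFreeObject(xpathObj);
    xmlXPathFreeContext(ctx);
    xmlFreeDoc(doc);
    return results;
}

A call like TextForXPathQuery(data, @"//div[@class='news-item']/h2") would then pull out the story titles; the element and class names here are guesses, so inspect the actual news page to find the right ones.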
Scraping web pages in this way is fragile, though, because it depends on the structure of a page you don't control and which could be changed unpredictably.

How can I return a text file and an error log from a webpage separately

I have a Perl script which, when run from the command line, generates a text file of data in a specific format for use by another application. The script also prints informational warning messages on stderr. I'm writing a web front end for this. In an ideal world, when the user clicks 'submit' on the associated form, a page would be displayed in the browser containing the informational messages, and simultaneously a pop-up would appear allowing the user to save the text file of data to disk. I would like this to work on browsers without JavaScript enabled, so I think exactly what I want is probably not possible.
Some sites I have seen deal with this kind of thing by displaying the page with the informational messages and a link to the file to be downloaded. This would seem to mean having to store the files and sort out some kind of security so that another user cannot download your file (not that this is a big deal for the application in question).
I'm wondering if there is a more elegant way of dealing with this. E.g., is it possible to use multipart messages to somehow return both pieces of information in one go? Is it possible to pop up a second window with the informational messages without using JavaScript? Apologies if these seem like basic questions - my programming knowledge is in the domain of DNA sequence manipulation algorithms rather than web page generation.
If (and only if) the data is quick and easy to generate, generate it once for the error-message page and a second time for the download. The link or button on the error-message page would regenerate the results and prompt for download.
This is a bit of a hack, since you need to consider what to do if the underlying data changes before the user hits the download link. Be careful to set the header correctly for a file download vs. a normal webpage, e.g.:
use CGI qw(:standard escapeHTML);
my $submit = param('submit');    # true when the download link/form was used

if ($submit) {
    # These headers make the browser download the response as a file
    # instead of rendering it as a page.
    print header(-type                => 'application/octet-stream',
                 -Content_disposition => 'attachment; filename=foobar.dat');
    Gen_Results();    # writes the data file to STDOUT
}
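# (Sketch, not part of the original answer: the other branch, which shows
# the warnings page with a link back to this script to trigger the
# download. $warnings is assumed to hold the captured stderr text.)
else {
    print header(-type => 'text/html'),
          start_html('Results'),
          pre(escapeHTML($warnings)),
          a({ -href => url() . '?submit=1' }, 'Download data file'),
          end_html;
}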
To be honest, I'd just use a little JavaScript anyway, since it's a pretty safe assumption nowadays. Otherwise, use a "noscript" tag for some alternative.