Perl: Scraping a website with frames and JavaScript

I have a website with two frames. Actions performed in one frame (entering data in a text box, selecting a radio button, clicking a link) cause the other frame to load data with JavaScript. I need to be able to enter data in the first frame and scrape the data in the second. What can I do?

Load the website in Firefox, then turn on the Firebug extension, enable the 'Net' tab, and have a look at the HTTP data being sent to and from the browser.
Sometimes it helps to forget what the webpage looks like and concentrate on the POST requests and responses you see in Firebug's Net tab: that is all you need to reproduce to get your data out.

You can either:
Reverse engineer the JS (monitoring HTTP traffic can help) to figure out what data actually gets sent to the server and then replicate that in your Perl.
Use WWW::Mechanize::Firefox to run a complete browser stack and interrogate it to read the results.
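The first option comes down to replaying the request that the second frame makes. Here is a minimal sketch in Python using only the standard library; the same idea maps onto LWP::UserAgent's post method in Perl. The URL, the form field names `query` and `mode`, and the extra headers are hypothetical placeholders: read the real ones from the request you see in Firebug's Net tab.

```python
"""Sketch: replay the HTTP request that the page's JavaScript sends.

Assumptions (hypothetical, take the real values from Firebug's Net tab):
the first frame's JavaScript POSTs the form fields 'query' and 'mode'
to FRAME_URL, and the response is what the second frame displays.
"""
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace it with the URL the second frame loads.
FRAME_URL = "http://example.com/frame2/data"


def build_request(query: str, mode: str) -> Request:
    """Build the same POST the browser would send, without a browser."""
    body = urlencode({"query": query, "mode": mode}).encode("ascii")
    return Request(
        FRAME_URL,
        data=body,  # supplying data makes urllib send a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Hypothetical extras: some servers check these, so copy
            # whatever the captured browser request actually sent.
            "Referer": "http://example.com/frame1",
            "X-Requested-With": "XMLHttpRequest",
        },
    )


def fetch(query: str, mode: str) -> str:
    """Send the request and return the body the second frame would load."""
    with urlopen(build_request(query, mode), timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

`build_request` can be inspected without touching the network, which makes it easy to compare against the captured request; `fetch` actually sends it, and you then parse the returned HTML or JSON as usual.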

Related

Browser plugin for cross-domain iframe communication

I would like to create a browser plugin/extension that would allow the browser to read the contents of a cross-domain iframe. I understand that this isn't possible with JavaScript, but perhaps someone could point me in the right direction on how to create a plugin that users could install. A cross-browser solution would be ideal.
Specifically, I am creating a navigation utility, and I want to know the URL of the iframe so that I can prevent the iframe from accidentally navigating to questionable sites. I would also like to detect the size of the contents.
Thanks in advance.
Option 1: file_get_contents:
You can try fetching the page on the server with PHP's file_get_contents function, then loading its CSS files, to get the contents and the size of the page.
Option 2: Headers:
You can start here: http://www.senocular.com/pub/adobe/crossdomain/policyfiles.html
See the "allow-access-from" section, which explains how to allow specific domains cross-domain access when they have specific headers.
Userscripts have access to cross-domain XMLHttpRequest, and they will even run on all browsers. They (or at least Kango's Content Scripts) can write and read stored values for cross-window communication.

How to refresh parts of page when database changes?

I want a better way to update images on a webpage instead of forcing the webpage to refresh every 60 seconds.
I wrote a little status page that monitors some of our websites. It uses web.py and displays a list of servers I have entered into a database.
Each server that is up gets a green PNG image displayed next to it.
When a server goes down, I update its status in a local MySQL database to false.
The next time the page refreshes, it gets a red PNG displayed next to it.
Right now that red PNG file does not display until I refresh the page.
Is there a way in web.py (Python) to make just that image dynamic without refreshing the whole page? Or do I have to use something else to make it work?
Well, the generic answer is that you use AJAX and put a script on each server that will return "up" or something every time your page checks.
Here's a tutorial on how to do AJAX with web.py specifically: http://kooneiform.wordpress.com/2010/02/28/python-and-ajax-for-beginners-with-webpy-and-jquery/
I don't know web.py.
But in my view, what you need is to split your code into two parts (a monitor service and a monitor display) and use JavaScript to update the data asynchronously.
Your monitor service will provide a URL that receives the monitored site ID and returns up or down. As you are using web.py, it will probably be 100% Python.
Your monitor display will use JavaScript to check that URL every X seconds and show the correct icon. It can be Python-based or not, but it will need JavaScript.
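The "monitor service" half described above can be sketched as a small endpoint that maps a site ID to "up" or "down". This sketch uses a plain standard-library WSGI app instead of web.py's own URL mapping, purely so it has no dependencies; the `/status/<site_id>` path and the `SERVER_STATUS` dict (standing in for the MySQL table) are hypothetical.

```python
"""Sketch of the monitor service: GET /status/<site_id> returns JSON
that the display page's JavaScript can poll every X seconds.

Assumptions (hypothetical): a plain WSGI app rather than web.py, and an
in-memory dict in place of the MySQL status table.
"""
import json
from wsgiref.simple_server import make_server

# Hypothetical stand-in for the MySQL table: site ID -> is the server up?
SERVER_STATUS = {"web1": True, "web2": False}


def status_app(environ, start_response):
    """Answer GET /status/<site_id> with {"id": ..., "status": "up"|"down"}."""
    path = environ.get("PATH_INFO", "")
    prefix = "/status/"
    site_id = path[len(prefix):] if path.startswith(prefix) else ""
    if site_id not in SERVER_STATUS:
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "unknown site"}).encode("utf-8")]
    status = "up" if SERVER_STATUS[site_id] else "down"
    body = json.dumps({"id": site_id, "status": status}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Cache-Control", "no-cache"),  # so each poll sees fresh data
    ])
    return [body]


if __name__ == "__main__":
    # Serve on localhost:8080 for local testing.
    make_server("localhost", 8080, status_app).serve_forever()
```

On the display side, a JavaScript timer would request this URL for each server and swap the green or red PNG based on the "status" field, so only the icons change and the page itself never reloads.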

How to show a User view in GWT app by typing in browser address bar

I have a GWT app that runs on, say, http://mygwtapp.com/ (which is actually http://mygwtapp.com/index.html).
The app hosts a database of users, which is queried by searching usernames in the search view; the results are shown in the user results view. That part works well enough. However, I need to add a way for a user view to be opened by just typing http://myapp.com/user123
I suspect the answer to this is a server-side solution. However, if there's a client-side solution, please let me know.
Someone here on Stack Overflow suggested that the format could be:
mygwtapp.com/index.html#user123
However, the format needs to be: http://myapp.com/user123
The 'something' in 'http://host/path#something' is a fragment identifier (FI). FIs have a special property: the page isn't reloaded if only the FI part of the URL changes, but the changes still take part in browser history.
FIs are a browser mechanism that GWT uses to create "pages", i.e. parts of a GWT application that are bookmarkable and have history support.
You can try to use a URL without # (the FI separator), but then you have a normal URL that reloads the page on every change, and it cannot easily be part of a normal GWT app.
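To see why the two forms behave differently, it helps to look at which part of each URL actually reaches the server: a browser's HTTP request carries only the path and query, while the fragment stays in the browser. That is why GWT can change the fragment without a reload, and why /user123 must be handled on the server. A small illustration with Python's urllib.parse; `split_for_request` is a hypothetical helper name.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_for_request(url: str) -> Tuple[str, str]:
    """Return (request target sent to the server, fragment kept by the browser)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment is never part of the HTTP request line.
    return target, parts.fragment


print(split_for_request("http://mygwtapp.com/index.html#user123"))
print(split_for_request("http://myapp.com/user123"))
```

For the first URL the server only sees /index.html, and "user123" is left for the GWT History mechanism; for the second, the server receives /user123 and has to know what to do with it.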
mygwtapp.com/index.html#user123
That uses the History mechanism (http://code.google.com/webtoolkit/doc/latest/DevGuideCodingBasicsHistory.html), which I would add is the recommended way of doing it.
However, if you insist on using something like http://myapp.com/user123, one possible way is to have a servlet that accepts this request (you might have to switch to something like http://myapp.com/details?id=user123). The servlet looks up the DB and returns your host HTML page. Before returning, it injects the required details into the page as a Dictionary entry (http://google-web-toolkit.googlecode.com/svn/javadoc/1.5/com/google/gwt/i18n/client/Dictionary.html). On the client, you can read this data and display it in the UI.
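The server-side flow described above (match the path, look up the user, inject a JavaScript object into the host page) can be sketched as follows. This is Python rather than a Java servlet, only to show the steps; the `userInfo` variable name, the `USERS` dict standing in for the database, and the host page markup are all hypothetical. GWT's Dictionary reads a JavaScript object by variable name and expects its values to be strings, so every value is converted with `str`.

```python
import json
import re

# Hypothetical stand-in for the user database.
USERS = {"user123": {"name": "Jane Doe", "email": "jane@example.com"}}

# Hypothetical host page; in practice this is the app's index.html.
HOST_PAGE = """<html><head>
<script src="mygwtapp/mygwtapp.nocache.js"></script>
</head><body></body></html>"""

# Accept only a single path segment such as /user123.
USER_PATH = re.compile(r"^/([A-Za-z0-9_]+)$")


def render_host_page(path):
    """Return the host page with user details injected, or None for a 404."""
    match = USER_PATH.match(path)
    if not match or match.group(1) not in USERS:
        return None
    user_id = match.group(1)
    # Dictionary values must be strings.
    values = {"id": user_id, **{k: str(v) for k, v in USERS[user_id].items()}}
    # Escape "</" so the data cannot close the <script> element early.
    payload = json.dumps(values).replace("</", "<\\/")
    script = f"<script>var userInfo = {payload};</script>\n"
    # Put the object before the GWT bootstrap script so it exists when the module loads.
    return HOST_PAGE.replace("<head>", "<head>\n" + script, 1)
```

On the GWT side, `Dictionary.getDictionary("userInfo").get("name")` would then return the injected value.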

Web Browser Plugin that Allows user to view Message Traffic

What is the name of the IE plug-in that someone can download (I think from Microsoft) that lets a developer (well, anyone who gets the plug-in, actually) view the message traffic that goes on behind the scenes between the browser and the server? I saw it in action but I forget its name. And I think that in the Firefox browser you can simply turn this on somehow without getting a plug-in.
It splits the browser window in half horizontally, the bottom half is divided vertically, and you can see the GET and POST messages as well as the complete header information that the browser sends to the server.
HttpWatch is a great plugin for IE, but it's not free. Microsoft also released a free tool called VRTA which works for all browsers, but isn't a plugin.
For Firefox, it's called Live HTTP Headers. Another option, of course, is Wireshark.
Fiddler is from Microsoft.
http://fiddler2.com/fiddler2/

See ASP.NET performance: page weight and time to load?

I have an AJAX postback on my site, and the same values come back via PageMethods.
I want to see the page weight, i.e. exactly what comes back. How can I do this?
The author of this blog has a nice tool:
http://encosia.com/2007/07/11/why-aspnet-ajax-updatepanels-are-dangerous/
Use Firebug with Firefox.
Firebug also shows you the page size along with the time taken under the Console tab. To see this, you need to enable the Net panel in Firebug.
Search Bing for Firebug and you will find plenty of tutorials on how to do it.
You can also try the Deep Tracing of Internet Explorer tool. It looks nice and seems to have a lot of options for profiling your pages.