mechanize print to pdf [duplicate] - perl

Possible Duplicate:
How do I grab a thumbnail screenshot of many websites?
I wrote a script using Perl's WWW::Mechanize to log in and fetch a page. How can I "print" that page to PDF directly from my Perl script? I'd like to save a snapshot of how it looks in the browser.
I can get the HTML using $mech->content();

Check out wkhtmltopdf - there are variants for PDF and images (PNG etc.). It's basically a command-line tool wrapping the WebKit HTML engine. It works quite nicely, and it's cross-platform too. Whether you can get it past your login form will depend on how the target site works.
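For example, a minimal sketch of shelling out to wkhtmltopdf from Perl, assuming the wkhtmltopdf binary is installed and on your PATH (note that stylesheets and images referenced by relative URLs may not resolve from a saved local file):
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $htmlfile) = tempfile(SUFFIX => '.html');
print {$fh} $mech->content;    # $mech is your logged-in WWW::Mechanize object
close $fh;

system('wkhtmltopdf', $htmlfile, 'snapshot.pdf') == 0
    or die "wkhtmltopdf failed: $?";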

There are a number of CPAN modules to convert HTML to PDF. Feed any of them the content from Mechanize.
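One such module is PDF::WebKit, itself a thin wrapper around the wkhtmltopdf binary; something along these lines should work, if I recall its interface correctly:
use strict;
use warnings;
use PDF::WebKit;    # requires the wkhtmltopdf binary to be installed

my $html = $mech->content;            # $mech is your logged-in WWW::Mechanize object
my $kit  = PDF::WebKit->new(\$html);  # pass a scalar reference for raw HTML
$kit->to_file('snapshot.pdf');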

The $mech object contains plain HTML, so you can't just print it to PDF. Check this thread: How do I grab a thumbnail screenshot of many websites?

Related

is it possible to view a question with a browser before importing it to Moodle?

I have created an XML file using R/exams out of just a single exercise to be imported into Moodle. I would like to view it before uploading it to the Moodle question bank. I tried to open it with Firefox and I can see some code but not the output, and a message appears saying that the XML file does not seem to have a style sheet associated with it. Is there a way to find this style sheet and to see how the question comes out using just a browser like Firefox or Chrome?
To emulate how the R/exams exercises are converted to HTML by exams2moodle() and how Moodle displays mathematical content, it's best to use
exams2html(..., converter = "pandoc-mathjax")
In recent versions of R/exams the resulting HTML file then automatically loads the MathJax JavaScript that enables correct rendering of mathematical content in all modern browsers (including Google Chrome). See also http://www.R-exams.org/tutorials/math/ for some general advice about math in HTML.
To the best of my knowledge there is no tool that would quickly display Moodle XML files in such a way that you can easily assess them.

How can I "drill down" into a website using Perl's WWW::Mechanize

I have used the WWW::Mechanize Perl module on a number of projects and it's helped me out a lot.
I am trying to use it on a different site and I can't "drill down" into the content of the site.
The site is https://customer.bookingbug.com/?client=hantsrecyclingcentres#/services
I have tried to figure out what the URL would be to get the content shown in the resulting HTML, such as bb.d570283b87c834518ba9.css, bb.d570283b87c834518ba9.js and version.js.
I tried to copy the resulting HTML into this posting, but whatever combination of quote and code-sample formatting I used, it wouldn't display properly.
Does anyone have any idea how I "navigate" this site using this Perl module please?
WWW::Mechanize is a web client with some HTML parsing capabilities. But as you clearly noticed, the information you want is not in the HTML document you requested. Either download the correct document (whatever that might be), or do what the browser does and execute the JavaScript. This would require a JavaScript engine. The simplest way to achieve that is to remote-control a web browser (e.g. using Selenium::Chrome).
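For instance, a rough sketch with Selenium::Chrome from the Selenium::Remote::Driver distribution (it assumes Chrome and chromedriver are installed locally; the five-second sleep is a crude stand-in for a proper wait):
use strict;
use warnings;
use Selenium::Chrome;

my $driver = Selenium::Chrome->new;
$driver->get('https://customer.bookingbug.com/?client=hantsrecyclingcentres#/services');
sleep 5;                              # crude wait for the JavaScript app to finish rendering
my $html = $driver->get_page_source;  # the DOM after the scripts have run
print $html;
$driver->shutdown_binary;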

Perl Mechanize module for scraping pdfs

I have a website into which many PDFs are uploaded. What I want to do is download all of those PDFs. To do so I first need to provide a username and password to the website. After searching for some time I found the WWW::Mechanize package, which does this job. Now the problem is that I want to search the website recursively, meaning that if a link does not contain a PDF, I should not simply discard it but should follow it and check whether the new page has links that contain PDFs. In this way I should exhaustively search the entire website to download all the uploaded PDFs. Any suggestion on how to do this?
I'd also go with wget, which runs on a variety of platforms.
If you want to do it in Perl, check CPAN for web crawlers.
You might want to decouple collecting PDF URLs from actually downloading them. Crawling is already a lengthy process, and it might be advantageous to be able to hand off downloading tasks to separate worker processes.
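A rough sketch of that split, assuming the crawler has already written the collected URLs one per line to a file (Parallel::ForkManager is a CPAN module, not core Perl; the filename handling is deliberately naive):
use strict;
use warnings;
use Parallel::ForkManager;
use WWW::Mechanize;

open my $urls, '<', 'pdf-urls.txt' or die $!;   # one URL per line, produced by the crawler
my $pm = Parallel::ForkManager->new(4);         # four download workers

while (my $url = <$urls>) {
    chomp $url;
    $pm->start and next;                        # child process from here on
    my $mech = WWW::Mechanize->new;
    (my $name = $url) =~ s{.*/}{};              # last path segment as the filename
    $mech->get($url, ':content_file' => $name);
    $pm->finish;
}
$pm->wait_all_children;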
You are right about using the WWW::Mechanize module. It has a method, find_all_links(), to which you can pass a regex matching the kind of links you want to grab or follow.
For example:
use strict;
use warnings;
use WWW::Mechanize;

my $obj = WWW::Mechanize->new;
# ... log in and fetch the page that contains the links ...
my @pdf_links = $obj->find_all_links( url_regex => qr/^.+?\.pdf/ );
This gives you all the links pointing to PDF files. Now iterate through these links and issue a get call on each of them.
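For instance, continuing from the snippet above ($obj is the same Mechanize object; the filename handling is just illustrative):
for my $link (@pdf_links) {
    my $url = $link->url_abs->as_string;    # absolute URL of the PDF
    (my $filename = $url) =~ s{.*/}{};      # naive: last path segment as the filename
    $obj->get($url);
    $obj->save_content($filename);          # write the fetched PDF to disk
}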
I suggest trying wget. Something like:
wget -r --no-parent -A.pdf --user=LOGIN --password=PASSWORD http://www.server.com/dir/

How do I view .asp Images?

I am trying to download images from a site using Perl, fetching and saving them with LWP::Simple's getstore.
Here's an example of the URL
http://www.aavinvc.com/_includes/blob.asp?Table=user&I=28&Width=100!&Height=100!
It turns out that the files I am getting with LWP are completely empty. I even tried cURL, and got the same thing: completely empty. Would there be another way to get these?
If the file really contains ASP, then you have to run it through an ASP engine.
If things worked properly, then the URL would return an image file with an appropriate content type. You've just saved it with a .asp extension.
The fix is simple: rename the file, preferably by looking at the Content-Type header returned (trivial with LWP, though you'll have to move beyond getstore) and doing it in Perl.
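A rough sketch with LWP::UserAgent instead of getstore, picking a file extension from the Content-Type header (the extension map here is deliberately minimal):
use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://www.aavinvc.com/_includes/blob.asp?Table=user&I=28&Width=100!&Height=100!';
my $ua  = LWP::UserAgent->new;
my $res = $ua->get($url);
die $res->status_line unless $res->is_success;

my %ext    = ('image/jpeg' => 'jpg', 'image/png' => 'png', 'image/gif' => 'gif');
my $type   = $res->header('Content-Type');
my $suffix = $ext{$type} // 'bin';          # fall back if the type is unexpected

open my $out, '>:raw', "image.$suffix" or die $!;
print {$out} $res->decoded_content(charset => 'none');   # raw image bytes
close $out;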
Regarding the update:
I just tried:
#!/usr/bin/perl
use Modern::Perl;
use LWP::Simple;
LWP::Simple::getstore(q{http://www.aavinvc.com/_includes/blob.asp?Table=user&I=28&Width=100!&Height=100}, 'foo.jpeg');
… and it just worked. The file opened without a hitch in my default image viewer.
.asp is not an image format.
Here are two explanations:
The images are simple JPEGs generated by .asp files, so just use them as if they were .jpegs: rename them.
You are actually downloading a page that says "LOL I trol U - we don't allow images to be downloaded with Simple.getstore".

How can I take a screenshot of website with Perl? [duplicate]

Possible Duplicate:
How can I take screenshots with Perl?
How can I take a screenshot of a site (in batch mode) using Perl? That is, the solution should produce an image file (say, .png) given a URL. It would be nice if no X Window System were required for the solution to work.
I'd use WWW::Mechanize::Firefox. Unfortunately it does need X (at least on non-OS X *NIX), but you can use xvfb to run it headless.
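Roughly like this (WWW::Mechanize::Firefox drives a running Firefox via the MozRepl extension; content_as_png renders the current page as PNG data):
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new;
$mech->get('http://example.com/');
my $png = $mech->content_as_png;    # screenshot of the rendered page

open my $out, '>:raw', 'screenshot.png' or die $!;
print {$out} $png;
close $out;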
In the past I needed to convert a web page to PDF.
I used http://code.google.com/p/wkhtmltopdf/ and it worked beautifully (it uses the excellent WebKit engine). The problem is that it's not Perl-based and it doesn't produce an image, but a PDF. Try it, it might suit your needs ('No longer requires an X server to be running (however the X11 client libs must be installed)').
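If you do want an image rather than a PDF, the same project also ships a wkhtmltoimage companion tool; a quick shell-out sketch from Perl, assuming the binary is on your PATH:
system('wkhtmltoimage', 'http://example.com/', 'page.png') == 0
    or die "wkhtmltoimage failed: $?";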
If you're going to go beyond screenshots, finding a binding for Watir would be my advice. The ability to get JavaScript, Java/Flash/ActiveX embedded scripting working is nice (for some value of nice).