What's the best method to generate Multi-Page PDFs with Perl and PDF::API2?

I have been using the PDF::API2 module to generate a PDF. I work at a warehousing company, and we are trying to switch from text packing slips to PDF packing slips. A packing slip lists the items needed for a single order. It works great, but I have run into a problem. Currently my program generates a single-page PDF, and that was all working fine. But now I realize that the PDF will need multiple pages if an order has more than 30 items. I tried to think of an easy(ish) way to do that but couldn't come up with one. The only approach I could think of involves creating another page and having logic that redefines the coordinates of the line items when there are multiple pages. So I wanted to see whether there was a different method, or something I was missing, but I wasn't really finding anything on CPAN.
Basically, I need to create a single-page PDF unless there are more than 30 items; then it will need multiple pages.
I hope that made sense. Any help at all would be greatly appreciated, as I am relatively new to programming.

Since you already have the code working for one-page PDFs, changing it to work for multi-page PDFs shouldn't be too hard.
Try something like this:
use strict;
use warnings;
use PDF::API2;
sub create_packing_list_pdf {
    my @items = @_;
    my $pdf  = PDF::API2->new();
    my $page = _add_pdf_page($pdf);
    my $max_items_per_page = 30;
    # Example layout values in PDF points (measured from the bottom of
    # the page); adjust them to match your own template.
    my $base_height = 700;
    my $line_height = 20;
    my $item_pos = 0;
    # foreach (rather than while/shift) also handles items that are false, e.g. "0"
    foreach my $item (@items) {
        $item_pos++;
        # Create a new page, if needed
        if ($item_pos > $max_items_per_page) {
            $page     = _add_pdf_page($pdf);
            $item_pos = 1;
        }
        # Add the item at the appropriate height for that position
        my $y = $base_height - ($item_pos - 1) * $line_height;
        # Your code to display the line here, using $y as needed
        # to get the right coordinates
    }
    return $pdf;
}
sub _add_pdf_page {
    my $pdf  = shift();
    my $page = $pdf->page();
    # Your code to display the page template here.
    #
    # Note: You can use a different template for additional pages by
    # looking at e.g. $pdf->pages(), which returns the page count.
    #
    # If you need to include a "Page 1 of 2", you can pass the total
    # number of pages in as an argument, rounding up so that exactly
    # 30 items still gives one page:
    #   int((scalar(@items) + $max_items_per_page - 1) / $max_items_per_page)
    return $page;
}
The main thing is to split up the page template from the line items so you can easily start a new page without having to duplicate code.
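The page-break arithmetic can also be kept completely separate from the drawing code, which makes it easy to check on its own. The sketch below is plain Perl with no PDF::API2 calls; `paginate` is a hypothetical helper, and the layout numbers (700 and 20 points) are placeholders, not values from any real template. It groups the items into pages of at most 30, computes each line's y coordinate, and builds a "Page X of Y" label, rounding the page count up.

```perl
use strict;
use warnings;

# Hypothetical helper: split @items into pages of at most $per_page entries.
# Returns the total page count followed by one array ref per page; each entry
# records the item, its y coordinate and a "Page X of Y" label.
# $base_height and $line_height are placeholder layout values (PDF points).
sub paginate {
    my ($per_page, $base_height, $line_height, @items) = @_;

    # Round up; an empty order still counts as one page.
    my $total_pages = int((@items + $per_page - 1) / $per_page) || 1;

    my @pages;
    for my $i (0 .. $#items) {
        my $page_no = int($i / $per_page) + 1;    # 1-based page number
        my $pos     = $i % $per_page;             # 0-based slot on that page
        push @{ $pages[$page_no - 1] }, {
            item  => $items[$i],
            y     => $base_height - $pos * $line_height,
            label => "Page $page_no of $total_pages",
        };
    }
    return ($total_pages, @pages);
}
```

Each returned page could then be drawn by calling `$pdf->page()` once and placing every entry at its `y` value, so the drawing loop never has to track positions itself.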

PDF::API2 is low-level. It doesn't provide most of what you would consider necessary for a document, such as margins, blocks, and paragraphs. Because of this, I'm afraid you're going to have to do things the hard way. You may want to look at PDF::API2::Simple; it might meet your criteria, and it's simple to use.

I use PDF::FromHTML for some similar work. It seems to be a reasonable choice; I'm not a big fan of positioning things by hand.

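The simplest method is to use PDF::API2::Simple:
use strict;
use warnings;
use PDF::API2::Simple;
my @content;    # the lines of text to print
my $pdf = PDF::API2::Simple->new(file => $name);
$pdf->add_font('Courier');
$pdf->add_page();
foreach my $line (@content) {
    # autoflow lets PDF::API2::Simple track the vertical position for you
    $pdf->text($line, autoflow => 'on');
}
$pdf->save();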

Related

Perl WWW::Mechanize Web Spider. How to find all links

I am currently attempting to create a Perl web spider using WWW::Mechanize.
What I am trying to do is create a web spider that will crawl the whole site at the URL entered by the user and extract all of the links from every page on the site.
What I have so far:
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
my $urlToSpider = $ARGV[0];
$mech->get($urlToSpider);
print "\nThe url that will be spidered is $urlToSpider\n";
print "\nThe links found on the url's starting page\n";
my @foundLinks = $mech->find_all_links();
foreach my $linkList (@foundLinks) {
    # Prefix relative links (e.g. /contact-us) with the starting URL
    unless ($linkList->[0] =~ m{^https?://}i) {
        $linkList->[0] = $urlToSpider . $linkList->[0];
    }
    print "$linkList->[0]\n";
}
What it does:
1. At present it will extract and list all links on the starting page
2. If a link found is relative, such as /contact-us or /help, it adds 'http://www.thestartingurl.com' to the front so it becomes 'http://www.thestartingurl.com/contact-us'.
The problem:
At the moment it also finds links to external sites, which I do not want. For example, if I want to spider 'http://www.tree.com', it will find links such as http://www.tree.com/find-us.
However it will also find links to other sites like http://www.hotwire.com.
How do I stop it finding these external urls?
After finding all the URLs on the page, I also want to save this new list of internal-only links to a new array called @internalLinks, but I cannot seem to get it working.
Any help is much appreciated, thanks in advance.
This should do the trick:
my #internalLinks = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/);
If you don't want links such as stylesheets (from <link> tags), restrict the search to <a> tags:
my #internalLinks = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/, tag => 'a');
Also, the regex you're using to add the domain to any relative links can be replaced with:
print $linkList->url_abs();
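The same filtering can be done on plain URL strings, which also makes the check easy to test. A minimal sketch, assuming you already have absolute URLs (for example from `url_abs` on each link); `internal_links` is a hypothetical helper. Unlike a bare prefix match, it also requires a `/`, `?`, `#` or the end of the string right after the base URL, so a host such as www.tree.com.example.org is not treated as internal.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the URLs that live under $base.
sub internal_links {
    my ($base, @urls) = @_;
    $base =~ s{/+\z}{};    # ignore a trailing slash on the base URL
    # \Q...\E quotes the base so its dots and slashes match literally;
    # the trailing group stops "www.tree.com.example.org" from matching.
    my $re = qr{^\Q$base\E(?:[/?#]|\z)}i;
    return grep { $_ =~ $re } @urls;
}

my @found = (
    'http://www.tree.com/find-us',
    'http://www.tree.com',
    'http://www.hotwire.com/',
    'http://www.tree.com.example.org/',
);
my @internalLinks = internal_links('http://www.tree.com/', @found);
print "$_\n" for @internalLinks;
```

With WWW::Mechanize, the input list could be `map { $_->url_abs } $mech->find_all_links(tag => 'a')`.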

In Perl, can I dynamically add methods to only one object of a package?

I'm working with WWW::Mechanize to automate the web-based back-office clicking I need to do to get my test e-commerce orders into the state required to test changes I have made to one part of a long, multi-part workflow. To process many orders in a batch, I need to click the Home link often. To make that shorter, I hacked a method into WWW::Mechanize at run time like this (based on an example in Mastering Perl by brian d foy):
{ # Shortcut to go back to the home page by calling $mech->go_home
    # I know I'll get a warning and do not want it!
    no warnings 'once';
    my $homeLink = $mech->find_link( text => 'Home' )->url_abs();
    $homeLink =~ s/system=0/system=1/;
    *WWW::Mechanize::go_home = sub {
        my ($self) = @_;
        return $self->get($homeLink);
    };
}
This works great, and does not hurt anyone because the script I'm using it in is only used by me and is not part of the larger system.
But now I wonder whether it is possible to tell only one $mech object that it has this method, while another WWW::Mechanize object that might be created later (to, say, do some cross-referencing without mixing up the one that has an active session in my back office) cannot use that method.
I'm not sure if that is possible at all, since, if I understand the way objects work in Perl, the -> operator tells it to look for the subroutine go_home inside the package WWW::Mechanize and pass the $mech as the first argument to it. Please correct me if this understanding is wrong.
I've experimented by adding a sort of hard-coded check that only lets the original $mech object use the function.
my $onlyThisMechMayAccessThisMethod = "$mech";
my $homeLink = $mech->find_link( text => 'Home' )->url_abs();
$homeLink =~ s/system=0/system=1/;
*WWW::Mechanize::go_home = sub {
    my ($self) = @_;
    return undef unless $self eq $onlyThisMechMayAccessThisMethod;
    return $self->get($homeLink);
};
Since "$mech" contains the address where the data is stored (e.g. WWW::Mechanize=HASH(0x2fa25e8)), another object will look different when stringified this way.
I am not convinced however that this is the way to go. So my question is: Is there a better way to only let one object of the WWW::Mechanize class have this method? I'm also glad about other suggestions regarding this code.
This is just
$mech->follow_link(text => 'Home')
and I don't think it's special enough to warrant a method of its own, or to need restricting to an exclusive club of objects.
It's also worth noting that there is no need to mess with typeglobs to declare a subroutine in a different package. You just have to write, for example
sub WWW::Mechanize::go_home {
    my ($self) = @_;
    return $self->get($homeLink);
}
But the general solution is to subclass WWW::Mechanize and create only those objects that should have the new method as instances of the subclass.
File MyMechanize.pm
package MyMechanize;
use strict;
use warnings;
use parent 'WWW::Mechanize';
sub go_home {
my $self = shift;
my $homeLink = $self->find_link(text => 'Home')->url_abs;
$homeLink =~ s/system=0/system=1/;
return $self->get($homeLink);
}
1;
File test.pl
use strict;
use warnings;
use MyMechanize;
my $mech = MyMechanize->new;
$mech->get('http://mydomain.com/path/to/site/page.html');
$mech->go_home;
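To answer the title question literally: since method lookup goes through the object's package, one way to give a single object an extra method is to rebless just that object into a freshly generated subclass. This is a different technique from the answer above (sometimes called a per-object or singleton class). The sketch uses a hypothetical stand-in class `Browser` rather than WWW::Mechanize itself, and `add_object_method` is likewise a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical stand-in class playing the role of WWW::Mechanize.
package Browser;
sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub name { $_[0]{name} }

package main;

# Give a single object its own method by reblessing it into a freshly
# generated package that inherits from the object's current class.
my $counter = 0;
sub add_object_method {
    my ($obj, $method, $code) = @_;
    my $parent = ref $obj;
    my $pkg    = "${parent}::__ONLY_ONE__" . ++$counter;
    no strict 'refs';
    @{"${pkg}::ISA"}     = ($parent);    # inherit everything else
    *{"${pkg}::$method"} = $code;        # the object-specific method
    return bless $obj, $pkg;
}

my $mech  = Browser->new(name => 'back office');
my $other = Browser->new(name => 'cross-reference');

add_object_method($mech, go_home => sub {
    my $self = shift;
    return 'home for ' . $self->name;
});

my $greeting = $mech->go_home;
print "$greeting\n";
my $has = $other->can('go_home') ? 'yes' : 'no';
print "other object has go_home: $has\n";
```

One trade-off: `ref($mech)` now returns the generated package name, so code that compares `ref($obj) eq 'WWW::Mechanize'` would stop matching, whereas `isa` checks keep working.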

What does this Lucene-related code actually do?

#!/usr/bin/perl
use Plucene::Document;
use Plucene::Document::Field;
use Plucene::Index::Writer;
use Plucene::Analysis::SimpleAnalyzer;
use Plucene::Search::HitCollector;
use Plucene::Search::IndexSearcher;
use Plucene::QueryParser;
my $content = "I am the law";
my $doc = Plucene::Document->new;
$doc->add(Plucene::Document::Field->Text(content => $content));
$doc->add(Plucene::Document::Field->Text(author => "Philip Johnson"));
my $analyzer = Plucene::Analysis::SimpleAnalyzer->new();
my $writer = Plucene::Index::Writer->new("my_index", $analyzer, 1);
$writer->add_document($doc);
undef $writer; # close
my $searcher = Plucene::Search::IndexSearcher->new("my_index");
my @docs;
my $hc = Plucene::Search::HitCollector->new(collect => sub {
    my ($self, $doc, $score) = @_;
    push @docs, $searcher->doc($doc);
});
$searcher->search_hc($query => $hc);
Try as I might, I don't understand what this code does. I understand the familiar Perl syntax and what's going on at that level, but what is a Lucene Document, an Index::Writer, etc.? Most importantly, when I run this code I expect something to be generated, yet I see nothing.
I know what an Analyzer is, thanks to this article linked from CPAN: http://onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=2. But I just don't see why, when I run this code, it doesn't seem to DO anything...
Lucene is a search engine designed to search huge amounts of text very fast.
My Perl is not strong, but from what I understand of Lucene objects:
my $content = "I am the law";
my $doc = Plucene::Document->new;
$doc->add(Plucene::Document::Field->Text(content => $content));
$doc->add(Plucene::Document::Field->Text(author => "Philip Johnson"));
This part creates a new document object and adds two text fields to it, content and author, in preparation for adding it to a Lucene index as searchable data.
my $analyzer = Plucene::Analysis::SimpleAnalyzer->new();
my $writer = Plucene::Index::Writer->new("my_index", $analyzer, 1);
$writer->add_document($doc);
undef $writer; # close
This part creates the index files and adds the previously created document to that index. At this point you should have a "my_index" folder containing several index files in your application directory, with the document's data in it as searchable text.
my $searcher = Plucene::Search::IndexSearcher->new("my_index");
my @docs;
my $hc = Plucene::Search::HitCollector->new(collect => sub {
    my ($self, $doc, $score) = @_;
    push @docs, $searcher->doc($doc);
});
$searcher->search_hc($query => $hc);
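This part attempts to search the index created above. Note, though, that $query is never defined in this snippet: the script loads Plucene::QueryParser but never uses it to build a query, so a query has to be created before the search can find anything (and under use strict, the undeclared $query would not even compile). Once it is, the matching documents end up in @docs, which you might then display to the user; this sample never prints them, which is why running it appears to do nothing.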
This seems to be a "hello world" application for using Lucene in Perl. In real-life applications, I don't see a scenario where you would create the index file and then search it from the same piece of code.
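To make the terms concrete, here is a toy, in-memory version of the same steps in plain Perl: documents made of named fields, a very simple analyzer, an inverted index, and a search that hands each hit to a callback the way a HitCollector does. This only illustrates the idea and is not how Plucene is implemented (Plucene stores its index on disk and scores hits); `analyze` and `search` are hypothetical names.

```perl
use strict;
use warnings;

# A "document" is just a set of named text fields.
my @documents = (
    { content => 'I am the law',       author => 'Philip Johnson' },
    { content => 'The law of the sea', author => 'Jane Doe' },
);

# "Analyzer": split text into words and lower-case them.
sub analyze {
    my ($text) = @_;
    return map { lc($_) } $text =~ /(\w+)/g;
}

# "Index writer": map each field:term pair to the ids of the documents
# containing it. This lookup table is an inverted index.
my %index;
for my $id (0 .. $#documents) {
    my $doc = $documents[$id];
    for my $field (keys %$doc) {
        $index{"$field:$_"}{$id} = 1 for analyze($doc->{$field});
    }
}

# "Searcher": look the term up and pass each matching document id to a
# collector callback, loosely mirroring search_hc with a HitCollector.
sub search {
    my ($field, $term, $collect) = @_;
    my $hits = $index{"$field:" . lc $term} || {};
    $collect->($_) for sort { $a <=> $b } keys %$hits;
}

my @docs;
search(content => 'law', sub { push @docs, $documents[ $_[0] ] });
print "$_->{author}\n" for @docs;    # prints both authors
```

Note that, like the original script, nothing is visible unless you print the collected documents yourself.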
Where did you get this code from? It is a copy of the code in the Synopsis at the start of the Plucene POD documentation.
I guess it was an attempt by someone to begin learning about Plucene. The code in a module's synopsis isn't necessarily meant to achieve something useful on its own.
As the documentation you refer to says, Lucene is a Java library that adds text indexing and searching capabilities to an application. It is not a complete application that one can just download, install, and run.
Where did you get the idea that you should run the code you show?

Multi-Page Form in Zend is Validating All Forms too early

I have been working through the Multi Page forms tutorial in the Zend Form Advanced Usage section of the documentation, http://framework.zend.com/manual/en/zend.form.advanced.html.
My first page loads fine, however when I submit it, the second page loads and it includes validation error messages. (Obviously I don't want to see validation errors for this page until the user has filled in the fields...)
I have tracked it down to the final line in the formIsValid() function. It seems that here validation is run for all elements in the three forms (not just the current one), so it's really no surprise that errors are showing on the second page.
I have tried the suggestion in the comments at the end of the tutorial, i.e. $data[$key] = $info[$key].
Have you had a crack at this tutorial? How did you solve the problem?
Any assistance is much appreciated!
I encountered the same problem; this is how I solved it.
By replacing
public function formIsValid()
{
$data = array();
foreach ($this->getSessionNamespace() as $key => $info) {
$data[$key] = $info;
}
return $this->getForm()->isValid($data);
}
With
public function formIsValid()
{
$data = array();
foreach ($this->getSessionNamespace() as $key => $info) {
$data[$key] = $info[$key];
}
return (count($this->getStoredForms()) < count($this->getPotentialForms()))? false : $this->getForm()->isValid($data);
}
The documentation reads:
Currently, Multi-Page forms are not officially supported in Zend_Form; however, most support for implementing them is available and can be utilized with a little extra tooling.
The key to creating a multi-page form is to utilize sub forms, but to display only one such sub form per page. This allows you to submit a single sub form at a time and validate it, but not process the form until all sub forms are complete.
Are you sure you have been validating a single sub-form instead of just whole form?

How can I write a Perl script to automatically take screenshots?

I want a platform independent utility to take screenshots (not just within the browser).
The utility should be able to take screenshots at fixed intervals and be easily configurable by the user in terms of
time between successive shots,
the format the shots are stored in,
until when (a time or an event) the script should run, etc.
Since I need platform independence, I think Perl is a good choice.
a. Before I start out, I want to know whether something similar already exists that I can start from.
Searching CPAN gives me these two relevant results:
Imager-Screenshot-0.009
Imager-Search-1.00
From those pages, the first one looks easier.
b. Which one of these Perl modules should I use?
Taking a look at the sources of both, Imager::Search isn't much more than a wrapper around Imager::Screenshot.
Here's the constructor:
sub new {
    my $class  = shift;
    my @params = ();
    # _ARRAY0 and _INSTANCE are validation helpers from Params::Util
    @params = @{shift()} if _ARRAY0($_[0]);
    my $image = Imager::Screenshot::screenshot( @params );
    unless ( _INSTANCE($image, 'Imager') ) {
        Carp::croak('Failed to capture screenshot');
    }
    # Hand off to the parent class
    return $class->SUPER::new( image => $image, @_ );
}
Given that Imager::Search does not extend Imager::Screenshot much, I'd say you're looking at two modules that are essentially the same.
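For the configurable part of the question (interval, format, when to stop), the scheduling loop can be written independently of whichever capture module you pick. Below is a minimal sketch, assuming the capture is supplied as a callback; `run_screenshots` and its option names are hypothetical, and the Imager::Screenshot callback shown in the comment is an untested illustration.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical scheduler. The capture is passed in as a callback so the loop
# can run (and be tested) without a display. With Imager::Screenshot, a
# callback might look like this (untested):
#   use Imager::Screenshot 'screenshot';
#   capture => sub { screenshot()->write(file => $_[0]) }
sub run_screenshots {
    my (%opt) = @_;
    my $interval = $opt{interval} // 60;       # seconds between shots
    my $format   = $opt{format}   // 'png';    # file extension / image type
    my $count    = $opt{count};                # stop after N shots (optional)
    my $until    = $opt{until};                # stop at this epoch time (optional)
    my $sleep    = $opt{sleep}    // sub { sleep $_[0] };
    my $now      = $opt{now}      // sub { time };
    my $capture  = $opt{capture} or die "capture callback required\n";

    my @files;
    # With neither count nor until, this runs until the process is stopped.
    while (1) {
        last if defined $count && @files >= $count;
        last if defined $until && $now->() >= $until;
        my $file = strftime('shot-%Y%m%d-%H%M%S', localtime $now->())
                 . sprintf('-%03d.%s', scalar(@files) + 1, $format);
        $capture->($file);
        push @files, $file;
        $sleep->($interval);
    }
    return @files;
}
```

A call such as `run_screenshots(interval => 300, format => 'png', count => 12, capture => ...)` would then take one shot every five minutes for an hour.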