Strange behavior using POST data in perl scripts - perl

Server is linux. I am having inexplicable problems when I send POST data to the script.
For example, I send the following POST data: choice=update
Here is the script:
#!/usr/bin/perl -w
print "Content-type: text/html\n\n";
if ( $ENV{'REQUEST_METHOD'} eq "GET" ) {
$in = $ENV{'QUERY_STRING'};
} elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
read(STDIN,$in,$ENV{'CONTENT_LENGTH'});
}
#in = split(/&/,$in);
foreach $i (0 .. $#in) {
# Convert plus's to spaces
$in[$i] =~ s/\+/ /g;
# Split into key and value.
($key, $val) = split(/=/,$in[$i],2); # splits on the first =.
# Convert %XX from hex numbers to alphanumeric
$key =~ s/%(..)/pack("c",hex($1))/ge;
$val =~ s/%(..)/pack("c",hex($1))/ge;
# Associate key and value
$in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple separator
$in{$key} .= $val;
}
print $in{'choice'};
The first time I access the script, it prints update
The second time I access it, it prints updateupdate
The third time, it prints updateupdateupdate
...and so on.
What on earth could be causing it to keep appending the string to itself between requests? I am sending exactly the same POST data every time by simply refreshing with my browser. Cookies are not being used. There is nothing else in the file that is not commented out.
Edit: Also, when I print <STDIN> it says choice=update every time. The other updates don't appear to be added to STDIN

My guess is that the script is kept running between requests. As %in is a global variable it is never cleared, so that $in{$key} .= $value ends up making the string longer and longer. You can probably evade the problem by using lexical variables.
This means you'll need to find out how the script is being run by the web server.
You'll also want to look at using modules to do all this parsing work for you, and learn about ways to write perl code avoid the problem you've encountered. I'd suggest taking a look at Modern Perl and working from there.

It sounds / looks like it's related to the web server's configuration and not the script itself.
However, at the beginning of the code, try adding:
my %in;
This would scope the variable you're printing.
Also, at the end of the code I would add: exit 0;
(Although usually not necessary).

Related

Perl Script, Web Scraper

I am new to Perl language and have this script which scrapes the amazon website for reviews. Everytime I run it I get an error about a compilation error. Was wondering if someone could shed some light as to whats wrong with it.
#!/usr/bin/perl
# get_reviews.pl
#
# A script to scrape Amazon, retrieve reviews, and write to a file
# Usage: perl get_reviews.pl <asin>
use strict;
use warnings;
use LWP::Simple;
# Take the asin from the command-line
my $asin = shift #ARGV or die "Usage: perl get_reviews.pl <asin>\n";
# Assemble the URL from the passed asin.
my $url = "http://amazon.com/o/tg/detail/-/$asin/?vi=customer-reviews";
# Set up unescape-HTML rules. Quicker than URI::Escape.
my %unescape = ('"'=>'"', '&'=>'&', ' '=>' ');
my $unescape_re = join '|' => keys %unescape;
# Request the URL.
my $content = get($url);
die "Could not retrieve $url" unless $content;
#Remove everything before the reviews
$content =~ s!.*?Number of Reviews:!!ms;
# Loop through the HTML looking for matches
while ($content =~ m!<img.*?stars-(\d)-0.gif.*?>.*?<b>(.*?)</b>, (.*?)[RETURN]
\n.*?Reviewer:\n<b>\n(.*?)</b>.*?</table>\n(.*?)<br>\n<br>!mgis) {
my($rating,$title,$date,$reviewer,$review) = [RETURN]
($1||'',$2||'',$3||'',$4||'',$5||'');
$reviewer =~ s!<.+?>!!g; # drop all HTML tags
$reviewer =~ s!\(.+?\)!!g; # remove anything in parenthesis
$reviewer =~ s!\n!!g; # remove newlines
$review =~ s!<.+?>!!g; # drop all HTML tags
$review =~ s/($unescape_re)/$unescape{$1}/migs; # unescape.
# Print the results
print "$title\n" . "$date\n" . "by $reviewer\n" .
"$rating stars.\n\n" . "$review\n\n";
}
The syntax errors seem to be caused by the "[RETURN]" that appears twice in your code. When I removed those, the code compiled without problems.
Amazon don't really like people scraping their web site. Which is why they provide an API that gives you access to their content. And there's a Perl module that for using that API - Net::Amazon. You should use that instead of fragile web scraping techniques.
Maybe you should try Web::Scraper (http://metacpan.org/pod/Web::Scraper).
It will get the job done in a much cleaner way.
[EDIT] Anyway, I checked the HTML code of a random review and it appears that your pattern is outdated. The reviewer's name, for instance, is introduced by 'By' and not by 'Reviewer'.

Use of uninitialized value $form_data{"BuildAcct"}

* SECOND UPDATE FOR SUNDAY*
Now at this point I have found some added success at displaying the desired sub
The new snippet of code which enabled me to actually invoke a specific sub as I wanted.
In researching I stumbled upon the following snippet which deals with reading incoming FORM data.
This snippet does enable the invocation of the sub of my choice from this script.
However from the CLI when I run perl -x against the script the system returns the following
nonfatal *warnings* that I would like to gain understanding of and resolve.
My research shows that (tr///) and $ENV{"REQUEST_METHOD"} and $buffer are returning empty
values "" OR 0.
How would I best resolve these following errors?
I realize I can just delete any
reference to (tr///) and $buffer to resolve those errors, however, I question removing
*$ENV{"REQUEST_METHOD"}* as it seems this imperative to the function of this snippet???
CLI ERROR
Use of uninitialized value in transliteration (tr///) at test.pl line 36 (#1)
Use of uninitialized value $ENV{"REQUEST_METHOD"} in string eq at test.pl line 37 (#1)
Use of uninitialized value $buffer in split at test.pl line 44 (#1)
#!/usr/bin/perl -w
# (test.pl)
use DBI;
use DBD::mysql;
use warnings;
use strict;
use diagnostics;
$| = 1;
# The script I am wanting to create, is to allow users at (NAS) HotSpot to create a user account
# which is to write into MySQL db TABLE's *radcheck* and *radreply*.
#
# Now at this point I have found some added success at displaying the desired *sub*
# The new snippet of code which enabled me to actually *invoke* a specific *sub* as I wanted
# from an HTML form.
# Please see below for solution which still has some questions.
print "Content-type: text/html\n\n";
sub BuildAcctNow {
print "<h1 style=\"color:blue;font-family:Arial;font-size:xx-large;\">TO BUILD YOUR ACCOUNT TODAY WE WILL NEED A SMALL AMOUNT OF INFORMATION</h1><br><br>\n\n";
}
sub PauseAcctNow {
print "<h2 style=\"color:red;font-family:Arial;font-size:xx-large;\">YOUR ACCOUNT HAS BEEN PAUSED PLEASE MAKE A PAYMENT HERE.</h2><br><br>\n\n";
}
# In researching I stumbled upon the fllowing snippet which deals with reading inward FORM data.
# This snippet *does* enable the *invocation* of the *sub* of my choice from this script.
# However from the CLI when I run perl -x against the script the system returns the following
# *nonfatal* *warnings* that I would like to gain understading of and resolve.
# My research shows that (tr///) and $ENV{"REQUEST_METHOD"} and $buffer are returning empty
# values, How would I best resolve these following errors? I realize I can just delete any
# reference to (tr///) and $buffer to resolve those errors, howerver I question removing
# $ENV{"REQUEST_METHOD"} as it seems this imperative to the function of th ssnippet???
#
#
# Use of uninitialized value in transliteration (tr///) at test.pl line 36 (#1)
# Use of uninitialized value $ENV{"REQUEST_METHOD"} in string eq at test.pl line 37 (#1)
# Use of uninitialized value $buffer in split at test.pl line 44 (#1)
my ($buffer, #pairs, $pair, $name, $value, %FORM);
# Read in text
$ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
if ($ENV{'REQUEST_METHOD'} eq "POST")
{
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
}else {
$buffer = $ENV{'QUERY_STRING'};
}
# Split information into name/value pairs
#pairs = split(/&/, $buffer);
foreach $pair (#pairs)
{
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
$value =~ s/%(..)/pack("C", hex($1))/eg;
$FORM{$name} = $value;
}
if ($FORM{PauseAcct}) {
PauseAcctNow();
exit;
}
elsif ($FORM{BuildAcct}) {
BuildAcctNow();
exit;
}
END SECOND SUNDAY UPDATE
SUNDAY UPDATE **
I have made a simple script to hopefully exhibit what I am trying to do to this point hopefully?
The script I am ultimately needing to create will write to MySQL db into radcheck and radreply to enable users to log onto a (NAS) HotSpot.
So I will have more than one subroutine within the script.
The script presently displays an empty screen VIA browser when I use a HTML doc with a properly named SUBMIT form within the doc named BuildAcct.
I am traditionally used to defining the sub within the script and then I would define if test(s) within the script which would wait for match(es) from any defined form name(s) when they are interacted with to then call a particular sub.
Below is a test script I have created just trying to get past the antiquated use of & when calling a sub, this is causing me some grief and I am hoping for some valuable input.
#!/usr/bin/perl -w
# (test.pl)
use DBI;
use DBD::mysql;
use warnings;
#use strict;
# Presently due to my errors below I have disabled *use strict*.
$| = 1;
# The script I am wanting to create, is to allow users at (NAS) HotSpot to create a user account
# which is to write into MySQL db TABLE's *radcheck* and *radreply*.
#
# Trying to bring myself up to speed with a very basic task which I have defined below, in my
# older scripts I would define the *sub* itself, then in the script I would use an *if* test
# which checks to see if any defined FORM value returns a hit such as $form_data{'BuildAcct'} ne ""
# to call the required *sub* _ThisOne_.
print "Content-type: text/plain\n\n";
sub ThisOne {
print "Trying to display this subroutine upon submission of BuildAcct\n";
}
# Following is the *if* test I am accustomed to using to make a call to a particular sub when
# the form NAME BuildAcct is interacted with, but this is unacceptable now I realize.
# CLI Return:
# Use of uninitialized value $form_data{"BuildAcct"} in string ne at test.pl line 32.
# Use of uninitialized value $form_data{"BuildAcct"} in string ne at test.pl line 41.
if ($form_data{'BuildAcct'} ne "")
{
&ThisOne;
exit;
}
# SO, I have Google'd, and looked over numerous methods of calling *subs*, I am just stuck though,
# Why can't the following *if* test work if use of & is no longer used?
if ($form_data{'BuildAcct'} ne "")
{
ThisOne();
exit;
}
Thanking you in advance for help...
Best Regards
UPDATE **
I have turned the -w switch off on the script, not sure if that poses a negative influence, pretty new to perl.
I also created some clunky code that is ugly.
The weird thing is that from the CLI when I execute the script the system returns:
Use of uninitialized value $form_data{"BuildAcct"} in string at acctmanager.pl line 211.
Use of uninitialized value $form_data{"Test"} in string at acctmanager.pl line 212.
Yet VIA browser from a HTML doc I can change the SUBMIT name value back and forth between BuildAcct and Test and the script successfully returns two different and correct subroutines when submitted.
The BuildAcct sub returns form fields I defined within that subroutine, whereas Test does a MySQL TABLE GROUP rowfetch and displays 3 different tables from a db and prints them to the browser.
Below is my present code :-(
local ($form_data{'BuildAcct'}) = "$form_data{'BuildAcct'}";
local ($form_data{'Test'}) = "$form_data{'Test'}";
#
# AddNewUser FORM definition.
if ($form_data{'BuildAcct'} ne "")
{
&AddNewUser;
exit;
}
#
# DispTest FORM definition.
elsif ($form_data{'Test'} ne "")
{
&DispTest;
exit;
}
Could someone give me a nudge into the right direction possibly?
Thanking you in advance
ORIGINAL POST
At this point I have a FORM on an HTML doc named BuildAcct, likewise within my script I have defined the following which is to make the call to the subroutine AddNewUser when the user submits the HTML FORM...
if ($form_data{'BuildAcct'} ne "")
{
&AddNewUser;
exit;
}
The script uses cgi-lib.pl
# Enable parsing of FORM_DATA VIA cgi-lib.pl.
&ReadParse(*form_data);
## FORM or IMG FORM Fix
foreach (keys %form_data)
{
## Fix incoming form data with Image buttons
$form_data{$1} = $form_data{$_} if (/(.*)\.x/);
}
The thing I can't understand is why does this work in another script I use but this new script returns the following upon execution at the CLI;
Use of uninitialized value $form_data{"BuildAcct"} in string ne at acctmanager.pl line 208.
Use of uninitialized value $form_data{"Test"} in string ne at acctmanager.pl line 215.
Help and suggestions are greatly appreciated.
Best Regards
My best guess is that %form_data hash is being populated via cgi-lib.pl, therefore when you run it via command line, cgi-lib.pl isn't getting any input from the web browser... having said this, you haven't included the code where you're using cgi-lib.pl, so I can't be sure.
p.s. don't turn off warnings. They're there for a reason. If nothing else, you can paste the warning into google. While you're at it, alway put use strict; at the top of your script, and fix all undeclared variables.
To fix the error "Use of uninitialized value $form_data{"BuildAcct"} in string",
You can check with defined($form_data{"BuildAcct"}) or defined $form_data{"BuildAcct"} like below, before making use of $form_data{"BuildAcct"} :
if (defined $form_data{"BuildAcct"}) { < other code based on
$form_data{"BuildAcct"} > }

Subroutine argument apparently lost in loop

I have a CGI script pulling bibliography data from a BibTeX file, building HTML from it. It uses CGI::Ajax to call the subroutine below with one or two arguments. Most of the time, it will be a search term that is passed as $s, but if I pass a string through my HTML form, the subroutine will not be entirely happy with it. There is a foreach loop checking the entries and jumping over the entries that do not match. Now I can print the argument outside this loop alright, but the loop itself won’t print anything for $s, nor will it find any entries matching it. If within the loop $s were simply empty, the subroutine would print the entire bibliography, but this is not the case.
Basically it is as if $s passed as an argument breaks the loop, whereas an explicit definition in the subroutine works fine.
Here is a simplified version of my code. Please excuse sloppy or ignorant coding, I’m just dabbling in Perl.
sub outputBib {
my ( $self,$s,$kw ) = #_;
my #k;
#k = ('foo','bar'); # this is fine
#k = keys (%{$self->{_bib}}); # this causes problems
foreach my $k (#k) {
$output .= "Key = $k<br/>";
$output .= "Search Term = $s<br/>";
}
return $output;
}
The problem seems to be the array built from the keys of the $self->{_bib} hash. It is odd that
the loop is fine when $s is not passed through CGI::Ajax. All elements are processed.
as soon as the subroutine is called with $s, the loop does not return anything.
if #k is defined as a straightforward array, the loop works and $s can be printed within the loop;
I build $self->{_bib} like so:
sub parseBib {
my ( $self ) = #_;
while (my $e = new Text::BibTeX::Entry $self->{_bibFileObject}) {
next unless $e->parse_ok;
my %entry_hash;
$entry_hash{'title'} = $e->get('title');
$entry_hash{'keywords'} = $e->get('keywords');
$self->{_bib}{$e->key} = \%entry_hash;
}
}
Any ideas? Thanks.
My first suggestion would be to use warn/print STDERR to verify on the live running copy that, when called via CGI::Ajax, all of your variables ($self, $s, $kw, $self->{_bib}) have the values that you're expecting. Although I'm a big fan of CGI::Ajax, it does a fair bit of magic behind your back and it may not be calling outputBib in quite the way you think it is.
Also keep in mind that CGI operates on a per-request model, not per-page. Are you perhaps populating $self->{_bib} when you send the initial page (and also doing all of your successful tests in that environment), then expecting it to still be there when the AJAX requests come in? If so, you're out of luck - you'll need to rebuild it in the AJAX handler, either within outputBib or earlier in your code, before you call ->build_html and hand it off to CGI::Ajax.

Why does my Perl CGI program fail when I include a file?

I'm trying to create a base template which then loads data depending on what actions are taken. I included ( required ) some pages which was fine but when I included another file which I got a 500 internal error. pasting the code straight in and it works fine:
Here's what I've got;
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
require LWP::UserAgent;
use DBI;
#deal with post requests
require "perl/post-sort.pl";
#loading stylesheets and javascripts
require "header.pl";
# bring in loggin js
if( $arg{REQUEST_KEY} eq "") {
require "javascript/js-main-login.pl";
}
print "</head> \n";
print " \n";
...
...
perl/post-sort.pl
my %arg = ();
for (split /\&/, <STDIN>) {
my ($key, $val) = split /=/;
$val =~ s/\+/ /g;
$val =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
$arg{$key} = $val;
}
Any help much appreciated.
A 500 internal server error often indicates a bad or missing header. Make sure that in the included code, the first thing that gets printed (to the browser) is the header, or make sure that nothing gets printed and the original code will print out the right header.
Another possibility is that a file you are require'ing does not "return true as the last statement" (i.e., doesn't end with a 1;), which would cause your script to fail at compile-time and produce a 500 error.
Also see this apropos discussion on debugging CGI scripts from earlier today.

How can I hook into Perl's print?

Here's a scenario. You have a large amount of legacy scripts, all using a common library. Said scripts use the 'print' statement for diagnostic output. No changes are allowed to the scripts - they range far and wide, have their approvals, and have long since left the fruitful valleys of oversight and control.
Now a new need has arrived: logging must now be added to the library. This must be done automatically and transparently, without users of the standard library needing to change their scripts. Common library methods can simply have logging calls added to them; that's the easy part. The hard part lies in the fact that diagnostic output from these scripts were always displayed using the 'print' statement. This diagnostic output must be stored, but just as importantly, processed.
As an example of this processing, the library should only record the printed lines that contain the words 'warning', 'error', 'notice', or 'attention'. The below Extremely Trivial and Contrived Example Code (tm) would record some of said output:
sub CheckPrintOutput
{
my #output = #_; # args passed to print eventually find their way here.
foreach my $value (#output) {
Log->log($value) if $value =~ /warning|error|notice|attention/i;
}
}
(I'd like to avoid such issues as 'what should actually be logged', 'print shouldn't be used for diagnostics', 'perl sucks', or 'this example has the flaws x y and z'...this is greatly simplified for brevity and clarity. )
The basic problem comes down to capturing and processing data passed to print (or any perl builtin, along those lines of reasoning). Is it possible? Is there any way to do it cleanly? Are there any logging modules that have hooks to let you do it? Or is it something that should be avoided like the plague, and I should give up on ever capturing and processing the printed output?
Additional: This must run cross-platform - windows and *nix alike. The process of running the scripts must remain the same, as must the output from the script.
Additional additional: An interesting suggestion made in the comments of codelogic's answer:
You can subclass http://perldoc.perl.org/IO/Handle.html and create your
own file handle which will do the logging work. – Kamil Kisiel
This might do it, with two caveats:
1) I'd need a way to export this functionality to anyone who uses the common library. It would have to apply automatically to STDOUT and probably STDERR too.
2) the IO::Handle documentation says that you can't subclass it, and my attempts so far have been fruitless. Is there anything special needed to make sublclassing IO::Handle work? The standard 'use base 'IO::Handle' and then overriding the new/print methods seem to do nothing.
Final edit: Looks like IO::Handle is a dead end, but Tie::Handle may do it. Thanks for all the suggestions; they're all really good. I'm going to give the Tie::Handle route a try. If it causes problems I'll be back!
Addendum: Note that after working with this a bit, I found that Tie::Handle will work, if you don't do anything tricky. If you use any of the features of IO::Handle with your tied STDOUT or STDERR, it's basically a crapshoot to get them working reliably - I could not find a way to get the autoflush method of IO::Handle to work on my tied handle. If I enabled autoflush before I tied the handle it would work. If that works for you, the Tie::Handle route may be acceptable.
There are a number of built-ins that you can override (see perlsub). However, print is one of the built-ins that doesn't work this way. The difficulties of overriding print are detailed at this perlmonk's thread.
However, you can
Create a package
Tie a handle
Select this handle.
Now, a couple of people have given the basic framework, but it works out kind of like this:
package IO::Override;
use base qw<Tie::Handle>;
use Symbol qw<geniosym>;
sub TIEHANDLE { return bless geniosym, __PACKAGE__ }
sub PRINT {
shift;
# You can do pretty much anything you want here.
# And it's printing to what was STDOUT at the start.
#
print $OLD_STDOUT join( '', 'NOTICE: ', #_ );
}
tie *PRINTOUT, 'IO::Override';
our $OLD_STDOUT = select( *PRINTOUT );
You can override printf in the same manner:
sub PRINTF {
shift;
# You can do pretty much anything you want here.
# And it's printing to what was STDOUT at the start.
#
my $format = shift;
print $OLD_STDOUT join( '', 'NOTICE: ', sprintf( $format, #_ ));
}
See Tie::Handle for what all you can override of STDOUT's behavior.
You can use Perl's select to redirect STDOUT.
open my $fh, ">log.txt";
print "test1\n";
my $current_fh = select $fh;
print "test2\n";
select $current_fh;
print "test3\n";
The file handle could be anything, even a pipe to another process that post processes your log messages.
PerlIO::tee in the PerlIO::Util module seems to allows you to 'tee' the output of a file handle to multiple destinations (e.g. log processor and STDOUT).
Lots of choices. Use select() to change the filehandle that print defaults to. Or tie STDOUT. Or reopen it. Or apply an IO layer to it.
This isn't the answer to your issue but you should be able to adopt the logic for your own use. If not, maybe someone else will find it useful.
Catching malformed headers before they happen...
package PsychicSTDOUT;
use strict;
my $c = 0;
my $malformed_header = 0;
open(TRUE_STDOUT, '>', '/dev/stdout');
tie *STDOUT, __PACKAGE__, (*STDOUT);
sub TIEHANDLE {
my $class = shift;
my $handles = [#_];
bless $handles, $class;
return $handles;
}
sub PRINT {
my $class = shift;
if (!$c++ && #_[0] !~ /^content-type/i) {
my (undef, $file, $line) = caller;
print STDERR "Missing content-type in $file at line $line!!\n";
$malformed_header = 1;
}
return 0 if ($malformed_header);
return print TRUE_STDOUT #_;
}
1;
usage:
use PsychicSTDOUT;
print "content-type: text/html\n\n"; #try commenting out this line
print "<html>\n";
print "</html>\n";
You could run the script from a wrapper script that captures the original script's stdout and writes the output somewhere sensible.