Perl Script stops running - perl

I am trying to get a perl script running (in a Windows cmd window) but it will allways just stop working at a certain point. How can I find out why it will not go on?
Here is the script: The last thing that I can see gets executed is "get_html_source()" in line 37
#!/usr/bin/perl
# Perl script that scrapes the members of the Hellenic Parliament
# Created by Kostas Ntonas, 03 May 2013 - http://ntonas.gr
# http://deixto.blogspot.gr/2013/05/scraping-members-of-greek-parliament.html
use strict;
use warnings;
use utf8;
use IO::File;
use POSIX qw(tmpnam);
use DEiXToBot;
use WWW::Selenium;
my $agent = DEiXToBot->new(); # create the DEiXToBot agent object
# launch a Firefox instance
my $sel = WWW::Selenium->new( host => "localhost",
port => 4444,
browser => "*firefox",
browser_url => "http://www.hellenicparliament.gr/"
);
$sel->start;
for my $i (1..30) {
my $url = "http://www.hellenicparliament.gr/en/Vouleftes/Viografika-Stoicheia?pageNo=$i";
$sel->open($url);
$sel->wait_for_page_to_load(5000);
$sel->pause(1);
print "$i) $url\n";
my $content = $sel->get_html_source();
my ($fh,$name); # create a temporary file containing the page's source code
do { $name = tmpnam() } until $fh = IO::File->new($name, O_RDWR|O_CREAT|O_EXCL);
binmode( $fh, ':utf8' );
print $fh $content;
close $fh;
$agent->get("file://$name"); # load the temporary file/page with the DEiXToBot agent using the file:// scheme
unlink $name; # delete the temporary file, it is not needed any more
if (! $agent->success) { die "Could not fetch the temp file!\n"; }
$agent->build_dom();
$agent->load_pattern('C:\Users\XXX\Documents\Privat\MyCase3\Deixto Patterns\parliament_CVs.xml');
$agent->extract_content();
if (! $agent->hits) {
die "Could not find any MPs/ records!\n";
}
else {
for my $record ($agent->records) {
my #rec = #$record;
my $party;
my $logo = $rec[0];
# deduce the party name from the logo in the first column of the table
if ($logo=~m#ND_Logo#) { $party = "N.D. (New Democracy)"; }
elsif ($logo=~m#COALITION#) { $party = "SYRIZA Unitary Social Front"; }
elsif ($logo=~m#PASOK#) { $party = "PA.SO.K. (Panhellenic Socialist Movement)"; }
elsif ($logo=~m#ANEKS_ELL#) { $party = "ANEXARTITOI ELLINES (Independent Hellenes)"; }
elsif ($logo=~m#xrisi#) { $party = "LAIKOS SYNDESMOS - CHRYSI AVGI (People's Association - Golden Dawn)"; }
elsif ($logo=~m#small#) { $party = "DHM.AR (Democratic Left)"; }
elsif ($logo=~m#KKE#) { $party = "K.K.E. (Communist Party of Greece)"; }
elsif ($logo=~m#INDEPENDENT#) { $party = "INDEPENDENT"; }
else { die "$logo => Unknown logo!\n"; }
$rec[0] = $party;
$rec[3]=~s#\s+# #g; # replace whitespace characters with a single space
# append the data in a tab delimited text file
open my $fh,">>:utf8","MPs.txt";
print $fh join("\t",#rec)."\n";
close $fh;
}
}
}
$sel->stop;

Do you know that the code is dying inside get_html_source, or is it actually dying immediately before or after (e.g. in the call to tmpnam, which seems to be missing a semi-colon)?
Another comment is that this seems like a lot of work just to scrape the list of MPs and their parties. If you look at the page source there is a huge block of base-64 encoded text that appears to have all the data you need. So you might find it quicker to load the page, decode the block and have everything you need.

The tmpnam function is provided by the POSIX Perl module. It should work fine on most variants of Unix/Linux but it seems to be broken under Windows.
I suggest replacing the "problematic" line containing the tmpnam call with the following:
use File::Temp qw/ tempfile /;
($fh,$name) = tempfile();
Hopefully this change will fix the issue and allow the script to complete.
This is also what the Perl tmpnam documentation (http://perldoc.perl.org/POSIX.html) suggests: "For security reasons, which are probably detailed in your system's documentation for the C library tmpnam() function, this interface should not be used; instead see File::Temp".

Related

Perl: Get key value

I'm trying to get key values from hash inside my module:
Module.pm
...
my $logins_dump = "tmp/logins-output.txt";
system("cat /var/log/secure | grep -n -e 'Accepted password for' > $logins_dump");
open (my $fh, "<", $logins_dump) or die "Could not open file '$logins_dump': $!";
sub UserLogins {
my %user_logins;
while (my $array = <$fh>) {
if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
$user_logins{$1}++;
}
}
return \%user_logins;
}
sub CheckUserLogins {
my $LoginCounter;
my $UsersToCheck = shift #_;
if (exists %{UserLogins()}{$UsersToCheck}){
$LoginCounter = %{UserLogins{$UsersToCheck}}; #How many logins?
}
else {
$LoginCounter = "0";
}
return \$LoginCounter;
}
Script.pl
$UserLoginCounter = Module::CheckUserLogins($UsersToPass);
I pass usernames to script and check if username is in hash, if it is, I need to return number of logins, which I'm trying to do with $LoginCounter. For some reason scripts returns only 0 or undef.
Well, for starters - you've got CheckUserLogins not CheckLoginAttempts.
Assuming that's just a typo - UserLogins returns a hash reference - a single scalar value. You're getting 0 if the exists check fails presumably.
If it does exist though, you're doing this:
$LoginCounter = %{UserLogins{$UsersToCheck}};
Which isn't valid. Do you have strict and warnings turned on? Because you're trying to assign a hash to a scalar, which isn't going to do what you want.
You probably mean:
$LoginCounter = ${UserLogins()} -> {$UsersToCheck};
Which dereferences the reference from UserLogins and then looks up a key.
I might however, approach your problem a little differently - it'll only work once when you do what you're doing, because each time you call UserLogins it creates a new hash, but you don't rewind $fh.
So I'd suggest:
use strict;
use warnings;
{
my %userlogins;
sub inituserlogins {
open( my $fh, "<", '/var/log/secure' )
or die "Could not open file: $!";
while ( my $array = <$fh> ) {
if ( $array =~ /Accepted\s+password\s+for\s+(\S+)/ ) {
$userlogins{$1}++;
}
}
close($fh);
}
sub CheckUserLogins {
my ($UsersToCheck) = #_;
inituserlogins() unless %userlogins;
return $userlogins{$UsersToCheck} ? $userlogins{$UsersToCheck} : 0;
}
}
You mustn't use capital letters in lexical identifiers as Perl reserves them for global identifiers like package names
One of the main problems is that you're using
exists %{UserLogins()}{$UsersToCheck}
which should be
exists UserLogins()->{$UsersToCheck}
or
exists ${UserLogins()}{$UsersToCheck}
Do you have use strict and use warnings in place as you should have?
Another problem is that you will read all the way through the file every time you call UserLogins. That means the second and later calls to CheckUserLogins (which calls UserLogins) will find nothing, as the end of the file has been reached
You should call your suibroutine user_logins and call it just once, storing the result as a scalar variable. This program shows how
use strict;
use warnings;
use v5.14; # For state variables
sub user_logins {
open my $fh, '<', '/var/log/secure' or die $!;
my %user_logins;
while ( <$fh> ) {
if ( /Accepted\s+password\s+for\s+(\S+)/ ) {
++$user_logins{$1};
}
}
\%user_logins;
}
sub check_user_logins {
my ($users_to_check) = #_;
state $user_logins = user_logins();
$user_logins->{$users_to_check} // 0;
}

using Perl to scrape a website

I am interested in writing a perl script that goes to the following link and extracts the number 1975: https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219
That website is the amount of white men born in the year 1923 who live in San Diego County, California in 1940. I am trying to do this in a loop structure to generalize over multiple counties and birth years.
In the file, locations.txt, I put the list of counties, such as San Diego County.
The current code runs, but instead of the # 1975, it displays unknown. The number 1975 should be in $val\n.
I would very much appreciate any help!
#!/usr/bin/perl
use strict;
use LWP::Simple;
open(L, "locations26.txt");
my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3A%22California%22%20%2Bevent_place_level_2%3A%22%LOCATION%%22%20%2Bbirth_year%3A%YEAR%-%YEAR%~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';
open(O, ">out26.txt");
my $oldh = select(O);
$| = 1;
select($oldh);
while (my $location = <L>) {
chomp($location);
$location =~ s/ /+/g;
foreach my $year (1923..1923) {
my $u = $url;
$u =~ s/%LOCATION%/$location/;
$u =~ s/%YEAR%/$year/;
#print "$u\n";
my $content = get($u);
my $val = 'unknown';
if ($content =~ / of .strong.([0-9,]+)..strong. /) {
$val = $1;
}
$val =~ s/,//g;
$location =~ s/\+/ /g;
print "'$location',$year,$val\n";
print O "'$location',$year,$val\n";
}
}
Update: API is not a viable solution. I have been in contact with the site developer. The API does not apply to that part of the webpage. Hence, any solution pertaining to JSON will not be applicbale.
It would appear that your data is generated by Javascript and thus LWP cannot help you. That said, it seems that the site you are interested in has a developer API: https://familysearch.org/developers/
I recommend using Mojo::URL to construct your query and either Mojo::DOM or Mojo::JSON to parse XML or JSON results respectively. Of course other modules will work too, but these tools are very nicely integrated and let you get started quickly.
You could use WWW::Mechanize::Firefox to process any site that could be loaded by Firefox.
http://metacpan.org/pod/WWW::Mechanize::Firefox::Examples
You have to install the Mozrepl plugin and you will be able to process the web page contant via this module. Basically you will "remotly control" the browser.
Here is an example (maybe working)
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new(
activate => 1, # bring the tab to the foreground
);
$mech->get('https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219',':content_file' => 'main.html');
my $retries = 10;
while ($retries-- and ! $mech->is_visible( xpath => '//*[#class="form-submit"]' )) {
print "Sleep until we find the thing\n";
sleep 2;
};
die "Timeout" if 0 > $retries;
#fill out the search form
my #forms = $mech->forms();
#<input id="census_bp" name="birth_place" type="text" tabindex="0"/>
#A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
$mech->field( birth_place => 'value_for_birth_place' );
# Click on the submit
$mech->click({xpath => '//*[#class="form-submit"]'});
If you use your browser's development tools, you can clearly see the JSON request that the page you link to uses to get the data you're looking for.
This program should do what you want. I've added a bunch of comments for readability and explanation, as well as made a few other changes.
use warnings;
use strict;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
# Create an LWP User-Agent object for sending HTTP requests.
my $ua = LWP::UserAgent->new;
# Open data files
open(L, 'locations26.txt') or die "Can't open locations: $!";
open(O, '>', 'out26.txt') or die "Can't open output file: $!";
# Enable autoflush on the output file handle
my $oldh = select(O);
$| = 1;
select($oldh);
while (my $location = <L>) {
# This regular expression is like chomp, but removes both Windows and
# *nix line-endings, regardless of the system the script is running on.
$location =~ s/[\r\n]//g;
foreach my $year (1923..1923) {
# If you need to add quotes around the location, use "\"$location\"".
my %args = (LOCATION => $location, YEAR => $year);
my $url = 'https://familysearch.org/proxy?uri=https%3A%2F%2Ffamilysearch.org%2Fsearch%2Frecords%3Fcount%3D20%26query%3D%252Bevent_place_level_1%253ACalifornia%2520%252Bevent_place_level_2%253A^LOCATION^%2520%252Bbirth_year%253A^YEAR^-^YEAR^~%2520%252Bgender%253AM%2520%252Brace%253AWhite%26collection_id%3D2000219';
# Note that values need to be doubly-escaped because of the
# weird way their website is set up (the "/proxy" URL we're
# requesting is subsequently loading some *other* URL which
# is provided to "/proxy" as a URL-encoded URL).
#
# This regular expression replaces any ^WHATEVER^ in the URL
# with the double-URL-encoded value of WHATEVER in %args.
# The /e flag causes the replacement to be evaluated as Perl
# code. This way I can look data up in a hash and do URL-encoding
# as part of the regular expression without an extra step.
$url =~ s/\^([A-Z]+)\^/escape(escape($args{$1}))/ge;
#print "$url\n";
# Create an HTTP request object for this URL.
my $request = HTTP::Request->new(GET => $url);
# This HTTP header is required. The server outputs garbage if
# it's not present.
$request->push_header('Content-Type' => 'application/json');
# Send the request and check for an error from the server.
my $response = $ua->request($request);
die "Error ".$response->code if !$response->is_success;
# The response should be JSON.
my $obj = from_json($response->content);
my $str = "$args{LOCATION},$args{YEAR},$obj->{totalHits}\n";
print O $str;
print $str;
}
}
What about this simple script without firefox ? I had investigated the site a bit to understand how it works, and I saw some JSON requests with firebug firefox addon, so I know which URL to query to get the relevant stuff. Here is the code :
use strict; use warnings;
use JSON::XS;
use LWP::UserAgent;
use HTTP::Request;
my $ua = LWP::UserAgent->new();
open my $fh, '<', 'locations2.txt' or die $!;
open my $fh2, '>>', 'out2.txt' or die $!;
# iterate over locations from locations2.txt file
while (my $place = <$fh>) {
# remove line ending
chomp $place;
# iterate over years
foreach my $year (1923..1925) {
# building URL with the variables
my $url = "https://familysearch.org/proxy?uri=https%3A%2F%2Ffamilysearch.org%2Fsearch%2Frecords%3Fcount%3D20%26query%3D%252Bevent_place_level_1%253ACalifornia%2520%252Bevent_place_level_2%253A%2522$place%2522%2520%252Bbirth_year%253A$year-$year~%2520%252Bgender%253AM%2520%252Brace%253AWhite%26collection_id%3D2000219";
my $request = HTTP::Request->new(GET => $url);
# faking referer (where we comes from)
$request->header('Referer', 'https://familysearch.org/search/collection/results');
# setting expected format header for response as JSON
$request->header('content_type', 'application/json');
my $response = $ua->request($request);
if ($response->code == 200) {
# this line convert a JSON to Perl HASH
my $hash = decode_json $response->content;
my $val = $hash->{totalHits};
print $fh2 "year $year, place $place : $val\n";
}
else {
die $response->status_line;
}
}
}
END{ close $fh; close $fh2; }
This seems to do what you need. Instead of waiting for the disappearance of the hourglass it waits - more obviously I think - for the appearance of the text node you're interested in.
use 5.010;
use warnings;
use WWW::Mechanize::Firefox;
STDOUT->autoflush;
my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';
my $mech = WWW::Mechanize::Firefox->new(tab => qr/FamilySearch\.org/, create => 1, activate => 1);
$mech->autoclose_tab(0);
$mech->get('about:blank');
$mech->get($url);
my $text;
while () {
sleep 1;
$text = $mech->xpath('//p[#class="num-search-results"]/text()', maybe => 1);
last if defined $text;
}
my $results = $text->{nodeValue};
say $results;
if ($results =~ /([\d,]+)\s+results/) {
(my $n = $1) =~ tr/,//d;
say $n;
}
output
1-20 of 1,975 results
1975
Update
This update is with special thanks to #nandhp, who inspired me to look at the underlying data server that produces the data in JSON format.
Rather than making a request via the superfluous https://familysearch.org/proxy this code accesses the server directly at https://familysearch.org/search/records, reencodes the JSON and dumps the required data out of the resulting structure. This has the advantage of both speed (the requests are served about once a second - more than ten times faster than with the equivalent request from the basic web site) and stability (as you note, the site is very flaky - in contrast I have never seen an error using this method).
use strict;
use warnings;
use LWP::UserAgent;
use URI;
use JSON;
use autodie;
STDOUT->autoflush;
open my $fh, '<', 'locations26.txt';
my #locations = <$fh>;
chomp #locations;
open my $outfh, '>', 'out26.txt';
my $ua = LWP::UserAgent->new;
for my $county (#locations[36, 0..2]) {
for my $year (1923 .. 1926) {
my $total = familysearch_info($county, $year);
print STDOUT "$county,$year,$total\n";
print $outfh "$county,$year,$total\n";
}
print "\n";
}
sub familysearch_info {
my ($county, $year) = #_;
my $query = join ' ', (
'+event_place_level_1:California',
sprintf('+event_place_level_2:"%s"', $county),
sprintf('+birth_year:%1$d-%1$d~', $year),
'+gender:M',
'+race:White',
);
my $url = URI->new('https://familysearch.org/search/records');
$url->query_form(
collection_id => 2000219,
count => 20,
query => $query);
my $resp = $ua->get($url, 'Content-Type'=> 'application/json');
my $data = decode_json($resp->decoded_content);
return $data->{totalHits};
}
output
San Diego,1923,1975
San Diego,1924,2004
San Diego,1925,1871
San Diego,1926,1908
Alameda,1923,3577
Alameda,1924,3617
Alameda,1925,3567
Alameda,1926,3464
Alpine,1923,1
Alpine,1924,2
Alpine,1925,0
Alpine,1926,1
Amador,1923,222
Amador,1924,248
Amador,1925,134
Amador,1926,67
I do not know how to post revised code from the solution above.
This code does not (yet) compile correctly. However, I have made some essential update to definitely head in that direction.
I would very much appreciate help on this updated code. I do not know how to post this code and this follow up such that it appease the lords who run this sight.
It get stuck at the sleep line. Any advice on how to proceed past it would be much appreciated!
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new(
activate => 1, # bring the tab to the foreground
);
$mech->get('https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219',':content_file' => 'main.html', synchronize => 0);
my $retries = 10;
while ($retries-- and $mech->is_visible( xpath => '//*[#id="hourglass"]' )) {
print "Sleep until we find the thing\n";
sleep 2;
};
die "Timeout while waiting for application" if 0 > $retries;
# Now the hourglass is not visible anymore
#fill out the search form
my #forms = $mech->forms();
#<input id="census_bp" name="birth_place" type="text" tabindex="0"/>
#A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
$mech->field( birth_place => 'value_for_birth_place' );
# Click on the submit
$mech->click({xpath => '//*[#class="form-submit"]'});
You should set the current form before accessing a field:
"Given the name of a field, set its value to the value specified. This applies to the current form (as set by the "form_name()" or "form_number()" method or defaulting to the first form on the page)."
$mech->form_name( 'census-search' );
$mech->field( birth_place => 'value_for_birth_place' );
Sorry, I am not able too try this code out and thanks for open a question for a new question.

Perl How to merge two or more excel files in one (multiple worksheets)?

I need to merge a few excel file into one, multiple sheets.
I do not care too much about the sheet name on the new file.
I do not have Excel on the computer I plan to run this. so I cannot use Win32 OLE.
I attempted to run this code https://sites.google.com/site/mergingxlsfiles/ but it is not working, I get a new empty excel file.
I attempt to run http://www.perlmonks.org/?node_id=743574 but I only obtained one of the file in the new excel file.
My input excel files have some french characters (é for e.g.) I believe these are cp1252.
Code used :
#!/usr/bin/perl -w
use strict;
use Spreadsheet::ParseExcel;
use Spreadsheet::WriteExcel;
use File::Glob qw(bsd_glob);
use Getopt::Long;
use POSIX qw(strftime);
GetOptions(
'output|o=s' => \my $outfile,
'strftime|t' => \my $do_strftime,
) or die;
if ($do_strftime) {
$outfile = strftime $outfile, localtime;
};
my $output = Spreadsheet::WriteExcel->new($outfile)
or die "Couldn't create '$outfile': $!";
for (#ARGV) {
my ($filename,$sheetname,$targetname);
my #files;
if (m!^(.*\.xls):(.*?)(?::([\w ]+))$!) {
($filename,$sheetname,$targetname) = ($1,qr($2),$3);
warn $filename;
if ($do_strftime) {
$filename = strftime $filename, localtime;
};
#files = glob $filename;
} else {
($filename,$sheetname,$targetname) = ($_,qr(.*),undef);
if ($do_strftime) {
$filename = strftime $filename, localtime;
};
push #files, glob $filename;
};
for my $f (#files) {
my $excel = Spreadsheet::ParseExcel::Workbook->Parse($f);
foreach my $sheet (#{$excel->{Worksheet}}) {
if ($sheet->{Name} !~ /$sheetname/) {
warn "Skipping '" . $sheet->{Name} . "' (/$sheetname/)";
next;
};
$targetname ||= $sheet->{Name};
#warn sprintf "Copying %s to %s\n", $sheet->{Name}, $targetname;
my $s = $output->add_worksheet($targetname);
$sheet->{MaxRow} ||= $sheet->{MinRow};
foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
my #rowdata = map {
$sheet->{Cells}->[$row]->[$_]->{Val};
} $sheet->{MinCol} .. $sheet->{MaxCol};
$s->write($row,0,\#rowdata);
}
}
};
};
$output->close;
I have 2 excel files named: 2.xls (only 1 sheet named 2 in it), 3.xls (only 1 sheet named 3)
I launched the script as this:
xlsmerge.pl -s -o results-%Y%m%d.xls 2.xls:2 3.xls:3
Results: results-20121024.xls empty nothing in it.
Then I tried
xlsmerge.pl -s -o results-%Y%m%d.xls 2.xls 3.xls
And it worked.
I am not sure why is it failing while adding the Sheetname
It appears that there is a bug in this line of the script:
if (m!^(.*\.xls):(.*?)(?::([\w ]+))$!) {
($filename,$sheetname,$targetname) = ($1,qr($2),$3);
...
It looks to me like the goal of that line is to allow arguments either in the form
spreadsheet.xls:source_worksheet
or in another form allowing the name of the target sheet to be specified:
spreadsheet.xls:source_worksheet:target_worksheet
The last grouping appears intended to capture that last, optional argument: (?::([\w ]+)). The only problem is, this grouping was not made optional. Thus, when you only specify the source sheet and not the target, the regex fails to match and it falls to the backup behavior, which is to treat the whole argument as the filename. But this fails, too, because you don't have a file called 2.xls:2.
The solution would be to introduce the ? modifier after the last group in the regex to make it optional:
if (m!^(.*\.xls):(.*?)(?::([\w ]+))?$!) {
($filename,$sheetname,$targetname) = ($1,qr($2),$3);
...
Of course, that may not be the only problem. If the script was posted with an error, there could be other errors, too. I don't have Perl available to test it at the moment.

just can't get perl working as expected ( conditionals and variable declaring )

EDIT:
I will try a better explication this time, this is the exact code from my script (sorry for all them coments, they are a result of your sugestions, and apear in the video below).
#use warnings;
#use Data::Dumper;
open(my $tmp_file, ">>", "/tmp/some_bad.log") or die "Can not open log file: $!\n";
#if( $id_client != "")
#allowed_locations = ();
#print $tmp_file "Before the if: ". Data::Dumper->Dump([\#allowed_locations, $id_client]) . "";
if( $id_client )
{
# print $tmp_file "Start the if: ". Data::Dumper->Dump([\#allowed_locations, $id_client]) . "";
# my $q = "select distinct id_location from locations inner join address using (id_db5_address) inner join zona_rural_detaliat using (id_city) where id_client=$id_client";
# my $st = &sql_special_transaction($sql_local_host, $sql_local_database, $sql_local_root, $sql_local_root_password, $q);
# print $tmp_file "Before the while loop: ref(st)='". ref($st) . "\n";
# while((my $id)=$st->fetchrow())
# {
# print $tmp_file "Row the while loop: ". Data::Dumper->Dump([$id]) . "";
# my $id = 12121212;
# push(#allowed_locations, $id);
# }
# print $tmp_file "After the while loop: ref(st)='". ref($st) . "\n";
# my($a) = 1;
#} else {
# my($a) = 0;
}
#print $tmp_file "After the if: ". Data::Dumper->Dump([\#allowed_locations, $id_client]) . "";
close($tmp_file) or die "Can not close file: $!\n";
#&html_error(#allowed_locations);
First off all, somebody said that I should try to run it in command line, the script works fine in command line (no warnings, It was uncommented then), but when triyng to load in via apache in the browser it fails, please see this video where I captured the script behavior, what I tried to show in the video:
I have opened 2 tabs the first doesn't define the variable $id_client, the second defines the variable $id_client that is read from GET: ?id_client=36124 => $id_client = 36124; , both of them include the library in the video "locallib.pl"
When running the script with all the
new code commented the page loads
when uncoment the line that defines
the #allowed_locations = (); the
script fails
leave this definition and uncoment
the if block, and the definition of
my $a; in the if block; Now the script works fine when $id_client is
defined, but fails when $id_client
is not defined
Uncoment the else block and the
definition of my $a; in the else
block. Now the script works fine
with or without $id_client
now comment all the my $a;
definisions and comment the else
block, the script fails
but if I'm using open() to open
a file before the IF, and
close() to close it after the if it does't fail even if the IF block
is empty and event if there is no
else block
I have replicated all the steps when running the script in the command line, and the script worked after each step.
I know it sounds like something that cannot be the behavior of the script, but please watch the video (2 minutes), maybe you will notice something that I'm doing wrong there.
Using perl version:
[root#db]# perl -v
This is perl, v5.8.6 built for i386-linux-thread-mult
Somebody asked if I don't have a test server, answer: NO, my company has a production server that has multiple purposes, not only the web interface, and I cannot risk to update the kernel or the perl version, and cannot risk instaling any debuger, as the company owners say: "If it works, leave it alone", and for them the solution with my ($a); is perfect beacause it works, I'm asking here just for me, to learn more about perl, and to understand what is going wrong and what can I do better next time.
Thank you.
P.S. hope this new approach will restore some of my -1 :)
EDIT:
I had success starting the error logging, and found this in the error log after each step that resulted in a failure I got this messages:
[Thu Jul 15 14:29:19 2010] [error] locallib.pl did not return a true value at /var/www/html/rdsdb4/cgi-bin/clients/quicksearch.cgi line 2.
[Thu Jul 15 14:29:19 2010] [error] Premature end of script headers: quicksearch.cgi
What I found is that this code is at the end of the main code in the locallib.pl after this there are sub definitions, and locallib.pl is a library not a program file, so it's last statement must returns true. , a simple 1; statement at the end of the library ensures that (I put it after sub definitions to ensure that noobody writes code in the main after the 1;) and the problem was fixed.
Don't know why in CLI it had no problem ...
Maybe I will get a lot of down votes now ( be gentle :) ) , but what can I do ...and I hope that some newbies will read this and learn something from my mistake.
Thank you all for your help.
You need to explicitly check for definedness.
If you want to enter the loop when $client is defined,
use if ( defined $client ).
If you want to enter the loop when $client is defined and a valid integer,
use if ( defined $client && $client =~ /^-?\d+$/ ).
I assume it's an integer from the context, if it can be a float, the regex needs to be enhanced - there's a standard Perl library containing pre-canned regexes, including ones to match floats. If you require a non-negative int, drop -? from regex's start.
If you want to enter the loop when $client is defined and a non-zero (and assuming it shouldn't ever be an empty string),
use if ( $client ).
If you want to enter the loop when $client is defined and a valid non-zero int,
use if ( $client && $client =~ /^-?\d+$/ ).
Your #ids is "undef" when if condition is false, which may break the code later on if it relies on #ids being an array. Since you didn't actually specify how the script breaks without an else, this is the most likely cause.
Please see if this version works (use whichever "if" condition from above you need, I picked the last one as it appears to match the closest witrh the original code's intent - only enter for non-zero integers):
UPDATED CODE WITH DEBUGGING
use Data::Dumper;
open(my $tmp_file, ">", "/tmp/some_bad.log") or die "Can not open log file: $!\n";
#ids = (); # Do this first so #ids is always an array, even for non-client!
print $tmp_file "Before the if: ". Data::Dumper->Dump([\#ids, $client]) . "\n";
if ( $client && $client =~ /^-?\d+$/ ) # First expression catches undef and zero
{
print $tmp_file "Start the if: ". Data::Dumper->Dump([\#ids, $client]) . "\n";
my $st = &sql_query("select id from table where client=$client");
print $tmp_file "Before the while loop: ref(st)='". ref($st) . "'\n";
while(my $row = $st->fetchrow())
{
print $tmp_file "Row the while loop: ". Data::Dumper->Dump([row]) . "'\n";
push(#ids, $row->[0]);
}
print $tmp_file "After the while loop: ref(st)='". ref($st) . "'\n";
# No need to undef since both variables are lexically in this block only
}
print $tmp_file "After the if\n";
close($tmp_file) or die "Can not close file: $!\n";
when checking against a string, == and != should be respectively 'eq' or 'ne'
if( $client != "" )
should be
if( $client ne "" )
Otherwise you don't get what you're expecting to get.
Always begin your script with :
use warnings;
use strict;
these will give you usefull informations.
Then you could write :
my #ids;
if (defined $client) {
#ids = (); # not necessary if you run this part only once
my $st = sql_query("select id from table where client=$client");
while( my ($id) = $st->fetchrow ) {
push #ids, $id;
}
} else {
warn '$client not defined';
}
if (#ids) { # Your query returned something
# do stuff with #ids
} else {
warn "client '$client' does not exist in database";
}
Note: this answer was deleted because I consider that this is not a real question. I am undeleting it to save other people repeating this.
Instead of
if( $client != "" )
try
if ($client)
Also, Perl debugging is easier if you
use warnings;
use strict;
What I found is that this code is at the end of the main code in the locallib.pl after this there are sub definitions, and locallib.pl is a library not a program file, so it's last statement must returns true, a simple 1; statement at the end of the library ensures that (put it after sub definitions to ensure that noobody writes code in the main after the 1;) and the problem was fixed.
The conclusion:
I have learned that every time you write a library or modify one, ensure that it's last statment returns true;
Oh my... Try this as an example instead...
# Move the logic into a subroutine
# Forward definition so perl knows func exists
sub getClientIds($);
# Call subroutine to find id's - defined later.
my #ids_from_database = &getClientIds("Joe Smith");
# If sub returned an empty list () then variable will be false.
# Otherwise, print each ID we found.
if (#ids_from_database) {
foreach my $i (#ids_from_database) {
print "Found ID $i \n";
}
} else {
print "Found nothing! \n";
}
# This is the end of the "main" code - now we define the logic.
# Here's the real work
sub getClientIds($) {
my $client = shift #_; # assign first parameter to var $client
my #ids = (); # what we will return
# ensure we weren't called with &getClientIds("") or something...
if (not $client) {
print "I really need you to give me a parameter...\n";
return #ids;
}
# I'm assuming the query is string based, so probably need to put it
# inside \"quotes\"
my $st = &sql_query("select id from table where client=\"$client\"");
# Did sql_query() fail?
if (not $st) {
print "Oops someone made a problem in the SQL...\n";
return #ids;
}
my #result;
# Returns a list, so putting it in a list and then pulling the first element
# in two steps instead of one.
while (#result = $st->fetchrow()) {
push #ids, $result[0];
}
# Always a good idea to clean up once you're done.
$st->finish();
return #ids;
}
To your specific questions:
If you want to test if $client is defined, you want "if ( eval { defined $client; } )", but that's almost certainly NOT what you're looking for! It's far easier to ensure $client has some definition early in the program (e.g. $client = "";). Also note Kaklon's answer about the difference between ne and !=
if (X) { stuff } else { } is not valid perl. You could do: if (X) { stuff } else { 1; } but that's kind of begging the question, because the real issue is the test of the variable, not an else clause.
Sorry, no clue on that - I think the problem's elsewhere.
I also echo Kinopiko in recommending you add "use strict;" at the start of your program. That means that any $variable #that %you use has to be pre-defined as "my $varable; my #that; my %you;" It may seem like more work, but it's less work than trying to deal with undefined versus defined variables in code. It's a good habit to get into.
Note that my variables only live within the squiggliez in which they are defined (there's implicit squiggliez around the whole file:
my $x = 1;
if ($x == 1)
{
my $x = 2;
print "$x \n"; # prints 2. This is NOT the same $x as was set to 1 above.
}
print "$x \n"; # prints 1, because the $x in the squiggliez is gone.

How can I handle template dependencies in Template Toolkit?

My static web pages are built from a huge bunch of templates which are inter-included using Template Toolkit's "import" and "include", so page.html looks like this:
[% INCLUDE top %]
[% IMPORT middle %]
Then top might have even more files included.
I have very many of these files, and they have to be run through to create the web pages in various languages (English, French, etc., not computer languages). This is a very complicated process and when one file is updated I would like to be able to automatically remake only the necessary files, using a makefile or something similar.
Are there any tools like makedepend for C files which can parse template toolkit templates and create a dependency list for use in a makefile?
Or are there better ways to automate this process?
Template Toolkit does come with its own command line script called ttree for building TT websites ala make.
Here is an ttree.cfg file I use often use on TT website projects here on my Mac:
# directories
src = ./src
lib = ./lib
lib = ./content
dest = ./html
# pre process these site file
pre_process = site.tt
# copy these files
copy = \.(png|gif|jpg)$
# ignore following
ignore = \b(CVS|RCS)\b
ignore = ^#
ignore = ^\.DS_Store$
ignore = ^._
# other options
verbose
recurse
Just running ttree -f ttree.cfg will rebuild the site in dest only updating whats been changed at source (in src) or in my libraries (in lib).
For more fine grained dependencies have a look a Template Dependencies.
Update - And here is my stab at getting dependency list by subclassing Template::Provider:
{
package MyProvider;
use base 'Template::Provider';
# see _dump_cache in Template::Provider
sub _dump_deps {
my $self = shift;
if (my $node = $self->{ HEAD }) {
while ($node) {
my ($prev, $name, $data, $load, $next) = #$node;
say {*STDERR} "$name called from " . $data->{caller}
if exists $data->{caller};
$node = $node->[ 4 ];
}
}
}
}
use Template;
my $provider = MyProvider->new;
my $tt = Template->new({
LOAD_TEMPLATES => $provider,
});
$tt->process( 'root.tt', {} ) or die $tt->error;
$provider->_dump_deps;
The code above displays all dependencies called (via INCLUDE, INSERT, PROCESS and WRAPPER) and where called from within the whole root.tt tree. So from this you could build a ttree dependency file.
/I3az/
In case all you care about are finding file names mentioned in directives such as INCLUDE, PROCESS, WRAPPER etc, one imagine even using sed or perl from the command line to generate the dependencies.
However, if there are subtler dependencies (e.g., you reference an image using <img> in your HTML document whose size is calculated using the Image plugin, the problem can become much less tractable.
I haven't really tested it but something like the following might work:
#!/usr/bin/perl
use strict; use warnings;
use File::Find;
use File::Slurp;
use Regex::PreSuf;
my ($top) = #ARGV;
my $directive_re = presuf qw( INCLUDE IMPORT PROCESS );
my $re = qr{
\[%-? \s+ $directive_re \s+ (\S.+) \s+ -?%\]
}x;
find(\&wanted => $top);
sub wanted {
return unless /\.html\z/i;
my $doc = read_file $File::Find::name;
printf "%s : %s\n", $_, join(" \\\n", $doc =~ /$re/g );
}
After reading the ttree documentation, I decided to create something myself. I'm posting it here in case it's useful to the next person who comes along. However, this is not a general solution, but one which works only for a few limited cases. It worked for this project since all the files are in the same directory and there are no duplicate includes. I've documented the deficiencies as comments before each of the routines.
If there is a simple way to do what this does with ttree which I missed, please let me know.
my #dependencies = make_depend ("first_file.html.tmpl");
# Bugs:
# Insists files end with .tmpl (mine all do)
# Does not check the final list for duplicates.
sub make_depend
{
my ($start_file) = #_;
die unless $start_file && $start_file =~ /\.tmpl/ && -f $start_file;
my $dir = $start_file;
$dir =~ s:/[^/]*$::;
$start_file =~ s:\Q$dir/::;
my #found_files;
find_files ([$start_file], \#found_files, $dir);
return #found_files;
}
# Bugs:
# Doesn't check for including the same file twice.
# Doesn't allow for a list of directories or subdirectories to find the files.
# Warning about files which aren't found switched off, due to
# [% INCLUDE $file %]
sub find_files
{
my ($files_ref, $foundfiles_ref, $dir) = #_;
for my $file (#$files_ref) {
my $full_name = "$dir/$file";
if (-f $full_name) {
push #$foundfiles_ref, $full_name;
my #includes = get_includes ($full_name);
if (#includes) {
find_files (\#includes, $foundfiles_ref, $dir);
}
} else {
# warn "$full_name not found";
}
}
}
# Only recognizes two includes, [% INCLUDE abc.tmpl %] and [% INCLUDE "abc.tmpl" %]
sub get_includes
{
my ($start_file) = #_;
my #includes;
open my $input, "<", $start_file or die "Can't open $start_file: $!";
while (<$input>) {
while (/\[\%-?\s+INCLUDE\s+(?:"([^"]+)"|(.*))\s+-?\%\]/g) {
my $filename = $1 ? $1 : $2;
push #includes, $filename;
}
}
close $input or die $!;
return #includes;
}