Perl GD::Graph Invalid data set: 0 at (pie) - perl

I am trying to write a small Perl script (I am a beginner!).
I took an example code and edited it to my needs.
The task is to read data from a CSV file, put it into an HTML table, and also show a pie chart of the data.
The table already works; only the pie chart is my problem. I have already looked at and tried many changes within the chart part of the code, but with no success.
Here is my code:
#!C:\Perl64\bin\perl.exe -w
### Variable declarations and module loading ###
use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);
use DBI;
my $DBH = DBI->connect('DBI:CSV:');
my $STH;
use CGI::Carp 'fatalsToBrowser';
### Statement preparation ###
$DBH->{'csv_tables'}->{'daten'} = { 'file' => 'daten.csv'}
or die "Konnte Datenbank nicht oeffnen:$!";
$STH = $DBH->prepare("SELECT * FROM daten")
or die "Konnte SQL-Statement nicht ausfuehren:$!";
$STH->execute()
or die "Ausfuehren der Datenbankabfrage nicht moeglich:$!";
print <<HERE_TEXT;
Content-type:text/html
<html>
<head>
<title>Datenanzeige CSV-File</title>
</head>
<body>
<center>
<h1>Folgende Umsatzdaten sind ausgelesen worden:</h1>
<hr>
<table border>
<tr>
<td width="200"><b>Filiale:</b></td>
<td width="100"><b>Leiter:</b></td>
<td width="200"><b>Mitarbeiter:</b></td>
<td width="100"><b>Umsatz:</b></td>
</tr>
HERE_TEXT
my @data;
my @diagarray;
while (@data = $STH->fetchrow_array()) {
my $filiale = $data[0];
my $leiter = $data[1];
my $mitarbeiter = $data[2];
my $umsatz = $data[3];
push (@diagarray, $umsatz);
print qq§<tr>\n<td><b>$filiale</b></td>\n<td>$leiter</td>\n<td>$mitarbeiter</td>\n<td>$umsatz</td>\n</tr>\n§;
}
print ("<br><br>");
use GD::Graph::pie;
my $graph = GD::Graph::pie->new(300, 300);
$graph->set(
title => 'Umsatzverteilung Filialen',
) or die $graph->error;
#my @diagram = (\@data,\@diagarray);
#Debug
#my $diagram;
# foreach $diagram(@diagram)
# {
# print ("$diagram\n");
# }
my $gd = $graph->plot(\@diagarray) or die $graph->error;
my $format = $graph->export_format;
print header("image/$format");
binmode STDOUT;
print $graph->plot(\@diagarray)->$format();
It would be great if anyone could give me the last hint I need.
Greetings

When debugging, always confirm your data and script flow; never assume anything to be correct.
Try
use Data::Dumper; # at the top of your script
[...]
print Dumper(\@diagarray); # just before your $graph->plot call
You'll probably notice that your data format differs from what is shown on http://search.cpan.org/~ruz/GDGraph-1.52/Graph.pm#USAGE
You're passing an ArrayRef to ->plot while the sample shows an ArrayRef of ArrayRefs:
[
['Desc1','Desc2'],
[250000, 350000],
]
I suggest extracting the drawing part and trying it with static data until you get a working result. Then copy it back into your script and replace the static data with your own, for example:
#!/usr/bin/perl
use GD::Graph::pie;
my $graph = GD::Graph::pie->new(300, 300);
$graph->set(
title => 'Umsatzverteilung Filialen',
) or die $graph->error;
my @diagarray = (
['Title1', 'Title2', ],
[ 100, 200 ],
);
my $gd = $graph->plot(\@diagarray) or die $graph->error;
my $format = $graph->export_format;
print header("image/$format");
binmode STDOUT;
print $graph->plot(\@diagarray)->$format();
Also check the line number reported in the error message; either of your ->plot calls could be the one that fails.
Two additional remarks:
No(!) code should sit among the use lines of your script, as use statements are processed at compile time while the rest of the code runs at run time. Mixing them doesn't harm your script, but it makes it look as if my $DBH = DBI->connect('DBI:CSV:'); would run before use CGI::Carp.
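For example, the top of the script could be rearranged so that all the use statements come first (only modules already present in the question are shown):
#!C:\Perl64\bin\perl.exe -w
use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);
use DBI;
use GD::Graph::pie;

# Run-time code starts below the compile-time use statements.
my $DBH = DBI->connect('DBI:CSV:');
my $STH;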
print'ing HTML source from a script is okay for testing and learning, but shouldn't be done in production environments as it makes maintenance harder. Try using Template::Toolkit or something similar.
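A rough, untested Template::Toolkit sketch of what that could look like; the row data and template text here are made up for illustration:
use strict;
use warnings;
use Template;

# Hypothetical row data, as it might come out of the CSV loop above.
my @rows = (
    { filiale => 'Filiale A', leiter => 'Meier', mitarbeiter => 'Schmidt', umsatz => 250000 },
);

my $template = <<'TT';
<table border>
[% FOREACH row IN rows %]
<tr><td>[% row.filiale %]</td><td>[% row.leiter %]</td><td>[% row.mitarbeiter %]</td><td>[% row.umsatz %]</td></tr>
[% END %]
</table>
TT

my $tt = Template->new;
$tt->process(\$template, { rows => \@rows }) or die $tt->error;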

Getting two array refs into @diagarray isn't any different from how you're pushing a scalar into it.
push(@diagarray,\@labels);
push(@diagarray,\@values);
But you want that to happen outside of the while loop. Inside the while loop is where you'd populate @labels and @values. Both arrays have to be the same size.
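For example, a rough, untested sketch based on the question's while loop (column order as in the question):
my (@labels, @values);
while (my @data = $STH->fetchrow_array()) {
    my ($filiale, $leiter, $mitarbeiter, $umsatz) = @data;
    push @labels, $filiale;   # one slice label per row
    push @values, $umsatz;    # one slice value per row
    # ... print the HTML table row here, as in the original loop ...
}
my @diagarray;
push(@diagarray, \@labels);
push(@diagarray, \@values);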
Also, your script is trying to output the HTML and the pie chart in one go, which won't work: your browser will just treat it all as one lump of HTML. Your HTML needs an "img" tag that points at another URL. That URL can be the same script, but with a different query string. For example:
use CGI;
my $query=new CGI;
if($query->param("piechart")) {
# print out the graph
} else {
print "<img src=\"",$ENV{"SCRIPT_NAME"},"?piechart=1\"/>";
}
Or alternatively you could split the piechart code into an entirely separate script, but that makes it less easy to maintain as you'd have to update two scripts if the code for reading in the data ever changed.
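Putting the two branches together with the GD::Graph::pie calls from above, an untested sketch might look like this; the label and value arrays are hard-coded placeholders standing in for the CSV data:
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);
use GD::Graph::pie;

if (param('piechart')) {
    # Second request: emit only the image.
    my @labels = ('Filiale A', 'Filiale B');   # placeholders for the CSV data
    my @values = (250_000, 350_000);
    my $graph  = GD::Graph::pie->new(300, 300);
    $graph->set(title => 'Umsatzverteilung Filialen') or die $graph->error;
    my $gd     = $graph->plot([ \@labels, \@values ]) or die $graph->error;
    my $format = $graph->export_format;
    print header("image/$format");
    binmode STDOUT;
    print $gd->$format();
}
else {
    # First request: emit the HTML page, with an <img> pointing back at this script.
    print header('text/html');
    print qq{<img src="$ENV{SCRIPT_NAME}?piechart=1"/>};
}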

Related

Submit multiple FASTA sequences using WWW::Mechanize

I want to submit multiple protein sequences in FASTA format to this server using the following Perl script:
use WWW::Mechanize;
# get the webpage with the form
my $mech = WWW::Mechanize->new();
$mech->show_progress(1);
my $url = "http://harrier.nagahama-i-bio.ac.jp/sosui/sosu/sosuigsubmit.html";
$mech->get($url);
# just checks to see if scripts call properly
sub validateInput{
my $file = shift;
my $inFh = IO::File->new( $file ) || die "can't open input file\n";
close($inFh);
return 1;
}
validateInput($ARGV[0]);
# fill the fields with the appropriate data and submit
my $fields = {
'in' => $ARGV[0],
'value' => 'Exec'
};
$mech->submit_form(
fields => $fields,
);
# print results
print $mech->content();
But every time I get a result like this:
<HTML>
<bR><bR><bR>
<TABLE>
<TR><TD ALIGN=left WIDTH=300>TOTAL PROTEINS</TD><TH>0</TH></TR>
<TR><TD ALIGN=left WIDTH=300>TOTAL MEMBRANE PROTEINS</TD><TH>0</TH></TR>
<TR><TD ALIGN=left WIDTH=300>PERCENTAGE</TD><TH> 0.0 %</TH></TR>
</TABLE>
</HTML>
which is the result page you get when you submit the form without input, so I suspect there is some problem with my sequence submission. My input file looks like this:
>ATCG00420
MQGTLSVWLAKRGLVHRSLGFDYQGIETLQIKPEDWHSIAVILYVYGYNYLRSQCAYDVAPGGLLASVYHLTRIEYGV NQAEEVCIKVFTHRSNPRIPSVFWVWKSTDFQERESYDMLGITYDSHPRLKRILMPESWIGWPLRKDYIAPNFYEIQDAY
>ATCG00360
MSAQSEGNYAEALQNYYEAMRLEIDPYDRSYILYNIGLIHTSNGEHTKALEYYFRALERNPFLPQAFNNMAVICHYRGEQAIQQGDSEMAEAWFAQAAEYWKQAITLTPGNYIEAQNWLTITRRFE
and I am calling my script like this
perl my_script input.seq >output
Thanks for helping me out.
For starters, this line:
'in' => $ARGV[0],
means that you're sending them a filename rather than the contents of the file. You'll need to read the contents first and send those. Modules like File::Slurper are handy for this.
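For example, an untested sketch; it assumes the form's field is named 'in' as in your script, and uses File::Slurper's read_text to load the file:
use strict;
use warnings;
use WWW::Mechanize;
use File::Slurper qw(read_text);

my $mech = WWW::Mechanize->new();
$mech->get('http://harrier.nagahama-i-bio.ac.jp/sosui/sosu/sosuigsubmit.html');

my $sequences = read_text($ARGV[0]);          # file contents, not the file name
$mech->submit_form(fields => { in => $sequences });
print $mech->content();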

CGI and Perl script one file, passing arguments

I have a script which fetches the summary file from the NCBI website using a command-line argument (an accession number).
Example:
./efetch.pl NM_000040
Now I am trying to fetch the same file using an HTML webpage which sends the form request to a CGI script.
My question: is it possible to combine the CGI handling and my Perl script in one file, and pass the HTML form argument from the CGI portion of the code to the rest of the script in a single run?
I have tried to do some scripting but it seems that the argument from the CGI is not getting passed to the Perl script.
Any help will be greatly appreciated.
CGI and Perl Script in one single file:
#!/usr/bin/perl -wT
use strict;
use warnings;
use LWP::Simple;
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
################### Environmental Variables ###################
my ($buffer, @pairs, $pair, $name, $value, %FORM);
# Read in text
$ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
if ($ENV{'REQUEST_METHOD'} eq "POST")
{
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
} else {
$buffer = $ENV{'QUERY_STRING'};
}
#print "$buffer\n";
# Split information into name/value pairs
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
#$value =~ s/%(..)/pack("C", hex($1))/eg;
$FORM{$name} = $value;
}
my $access = $FORM{accession};
if ($access =~ m{\A(\w+\d+)\z}) {
$access = $1;
}
print "Content-type:text/html\r\n\r\n";
print "<html>";
print "<head>";
print "<title> CGI Program</title>";
print "</head>";
print "<body>";
if ($access eq "") {
print "<h2> Please check the accession number</h2>";
exit;
}
print "<h2>$access</h2>";
print "</body>";
print "</html>";
print <<HEADING
<html>
<head>
<title> Output result of the program </title>
</head>
<body>
<h1> Summary result </h1>
<table border=1>
<tr>
<th>S.No.</th>
<th>Fragment</th>
<th>Position</th>
<th>Region</th>
<th>GC%</th>
</tr>
HEADING
;
######################## INPUT PARAMETERS #####################
my $utils = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
my $db = "nuccore";
my $query = $access; #"$ARGV[0]" or die "Please provide input for the accession number. $!";
############### END OF INPUT PARAMETERS ######################
############### FILE DOWNLOAD FROM NCBI ######################
my $report = "gb"; # downloads the summary text file
open (IN,">", $query.".summary");
my $esearch = "$utils/esearch.fcgi?" . "db=$db&retmax=1&usehistory=y&term=";
my $esearch_result = get($esearch . $query);
$esearch_result =~ m|<Count>(\d+)</Count>.*<QueryKey>(\d+)</QueryKey>.*<WebEnv>(\S+)</WebEnv>|s;
my $Count = $1; my $QueryKey = $2; my $WebEnv = $3;
my $retstart; my $retmax=3;
for($retstart = 0; $retstart < $Count; $retstart += $retmax) {
my $efetch = "$utils/efetch.fcgi?" .
"rettype=$report&retmode=text&retstart=$retstart&retmax=$retmax&" .
"db=$db&query_key=$QueryKey&WebEnv=$WebEnv";
my $efetch_result = get($efetch);
print IN $efetch_result, "\n";
}
close (IN);
The print statement in the script prints $access, but the value of $access never makes it into $query.
HTML form:
<form action="/cgi-bin/efetch.cgi" method="post" id="myform">
<div>
NCBI accession number:<label for="accession"> <input type="text" name="accession"> </label><br>
<input type="submit" value="Submit" form="myform">
</div>
</form>
Your script is much more complicated than it needs to be. Specifically - you're using the CGI module (which is deprecated, so you might want to consider something else*) but then you're trying to roll your own input handling in your script.
You can write a single script that sends 'POST' or 'GET' data to itself for processing. That's not too difficult at all.
A simple example might be
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
print "Content-Type: text/html\n\n";
my %param;
while ( <STDIN> ) {
my ( $key, $value ) = split ( "=" );
$param{$key} = $value;
}
print Dumper \%param;
print "<FORM METHOD=\"POST\">\n";
print " <INPUT TYPE=\"text\" NAME=\"access\">\n";
print " <INPUT TYPE=\"submit\">\n";
print "</FORM>\n";
This isn't a good example, but it'll work, and hopefully it'll give you an idea of what's going on - POSTed stuff comes on STDIN. GET stuff comes in the URL.
You can test for the existence of such input, and either render your basic form or process the input you got.
if ( $param{'access'} ) {
# process it
}
else {
# print form
}
There are many modules that make this easier (you're even using one already, in the form of CGI), so I wouldn't EVER suggest doing it this way 'for real' - this is purely an illustration of the basics.
With the CGI module, which is perhaps the thing that'd require least code alteration, you could use the 'CGI::param()' method to retrieve parameters:
use CGI;
print CGI::header;
print CGI::param('access');
#form stuff.
But a more complete fix would be a bit more of an in-depth rewrite, using one of the more up-to-date 'web handling' frameworks. There really are a lot of potential gotchas. (Although it does rather depend on how much control you have over your environment; I'm a lot more relaxed about internal/limited-user scripts than about internet-facing ones.)
* See: CGI::Alternatives
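For illustration only, here is a minimal, untested Mojolicious::Lite sketch (one of the frameworks CGI::Alternatives covers); the route and the 'accession' parameter simply mirror the CGI script above:
use Mojolicious::Lite;

# One route that reads the form parameter and renders a trivial page.
any '/' => sub {
    my $c      = shift;
    my $access = $c->param('accession') // '';
    $c->render(text => "<h2>$access</h2>", format => 'html');
};

app->start;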
Writing a CGI program in 2014 is a lot like using a typewriter. Sure, it'll work, but people will look at you very strangely.
But given that you already have a CGI program, let's look at what it might look like if you used techniques that weren't out of date in the last millennium.
There are basically two underlying problems with your code.
You open a file using a name that comes from user input. And that violates the taint mode rules, so your program dies. You would have seen this in your web server error log, had you looked there.
You don't actually need to write the data to a file, because you want to send the data to the user's web browser.
So here's an improved version of your code. It fixes the two problems I mentioned above but it also uses modern tools.
The CGI module has a param() method which makes it far easier for us to get the parameters passed to our program. We also use its header() method to output the CGI header (basically just the Content-type header).
We use the Template module to move all of the HTML out of our code and put it in a separate area. Here I've cheated slightly and have just put it in the DATA section of the CGI program. Usually we'd put it in a completely separate file. Notice how separating the Perl and the HTML makes the program look cleaner and easier to maintain.
It wasn't clear to me exactly how you wanted to format the data you're getting back from the other web site. So I've just stuck it in a "pre" tag. You'll need to work that out for yourself.
Here's the code:
#!/usr/bin/perl -T
use strict;
use warnings;
use LWP::Simple;
use Template;
use CGI ':cgi';
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
my $access = param('accession');
my $utils = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
my $db = "nuccore";
my $query = $access;
my $report = "gb"; # downloads the summary text file
my $esearch = "$utils/esearch.fcgi?" . "db=$db&retmax=1&usehistory=y&term=";
my $esearch_result = get($esearch . $query);
my $data = '';
if (my ($Count, $QueryKey, $WebEnv) = $esearch_result =~ m|<Count>(\d+)</Count>.*<QueryKey>(\d+)</QueryKey>.*<WebEnv>(\S+)</WebEnv>|s) {
my $retstart;
my $retmax=3;
for ($retstart = 0; $retstart < $Count; $retstart += $retmax) {
my $efetch = "$utils/efetch.fcgi?" .
"rettype=$report&retmode=text&retstart=$retstart&retmax=$retmax&" .
"db=$db&query_key=$QueryKey&WebEnv=$WebEnv";
my $efetch_result = get($efetch);
$data .= $efetch_result;
}
}
my $tt = Template->new;
print header;
$tt->process(\*DATA, { data => $data })
or die $tt->error;
__END__
<html>
<head>
<title> CGI Program</title>
</head>
<body>
<h1>Input</h1>
<form action="/cgi-bin/efetch.cgi" method="post" id="myform">
<div>NCBI accession number:<label for="accession"> <input type="text" name="accession"></label><br>
<input type="submit" value="Submit" form="myform"></div>
</form>
[% IF data -%]
<h1>Summary Result</h1>
<pre>
[% data %]
</pre>
[% END -%]
</body>
</html>

using Perl to scrape a website

I am interested in writing a Perl script that goes to the following link and extracts the number 1975: https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219
That page shows the number of white men born in 1923 who lived in San Diego County, California in 1940. I am trying to do this in a loop structure to generalize over multiple counties and birth years.
In the file, locations.txt, I put the list of counties, such as San Diego County.
The current code runs, but instead of the number 1975 it displays "unknown". The number 1975 should end up in $val.
I would very much appreciate any help!
#!/usr/bin/perl
use strict;
use LWP::Simple;
open(L, "locations26.txt");
my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3A%22California%22%20%2Bevent_place_level_2%3A%22%LOCATION%%22%20%2Bbirth_year%3A%YEAR%-%YEAR%~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';
open(O, ">out26.txt");
my $oldh = select(O);
$| = 1;
select($oldh);
while (my $location = <L>) {
chomp($location);
$location =~ s/ /+/g;
foreach my $year (1923..1923) {
my $u = $url;
$u =~ s/%LOCATION%/$location/;
$u =~ s/%YEAR%/$year/;
#print "$u\n";
my $content = get($u);
my $val = 'unknown';
if ($content =~ / of .strong.([0-9,]+)..strong. /) {
$val = $1;
}
$val =~ s/,//g;
$location =~ s/\+/ /g;
print "'$location',$year,$val\n";
print O "'$location',$year,$val\n";
}
}
Update: the API is not a viable solution. I have been in contact with the site developer; the API does not apply to that part of the webpage. Hence, any solution pertaining to JSON will not be applicable.
It would appear that your data is generated by Javascript and thus LWP cannot help you. That said, it seems that the site you are interested in has a developer API: https://familysearch.org/developers/
I recommend using Mojo::URL to construct your query and either Mojo::DOM or Mojo::JSON to parse XML or JSON results respectively. Of course other modules will work too, but these tools are very nicely integrated and let you get started quickly.
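As a rough, untested sketch, a Mojo::UserAgent request against the JSON endpoint that other answers below identify (the URL, the parameters and the totalHits key are taken from those answers, not verified here) could look like this:
use strict;
use warnings;
use Mojo::UserAgent;

# Query string mirrors the direct-API answer further down.
my $query = '+event_place_level_1:California '
          . '+event_place_level_2:"San Diego" '
          . '+birth_year:1923-1923~ +gender:M +race:White';

my $ua  = Mojo::UserAgent->new;
my $res = $ua->get(
    'https://familysearch.org/search/records',
    {'Content-Type' => 'application/json'},
    form => { collection_id => 2000219, count => 20, query => $query },
)->res;

print $res->json->{totalHits}, "\n";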
You could use WWW::Mechanize::Firefox to process any site that could be loaded by Firefox.
http://metacpan.org/pod/WWW::Mechanize::Firefox::Examples
You have to install the MozRepl plugin, and then you will be able to process the web page content via this module. Basically, you will "remotely control" the browser.
Here is an example (it may need tweaking):
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new(
activate => 1, # bring the tab to the foreground
);
$mech->get('https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219',':content_file' => 'main.html');
my $retries = 10;
while ($retries-- and ! $mech->is_visible( xpath => '//*[@class="form-submit"]' )) {
print "Sleep until we find the thing\n";
sleep 2;
};
die "Timeout" if 0 > $retries;
#fill out the search form
my @forms = $mech->forms();
#<input id="census_bp" name="birth_place" type="text" tabindex="0"/>
#A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
$mech->field( birth_place => 'value_for_birth_place' );
# Click on the submit
$mech->click({xpath => '//*[@class="form-submit"]'});
If you use your browser's development tools, you can clearly see the JSON request that the page you link to uses to get the data you're looking for.
This program should do what you want. I've added a bunch of comments for readability and explanation, as well as made a few other changes.
use warnings;
use strict;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
# Create an LWP User-Agent object for sending HTTP requests.
my $ua = LWP::UserAgent->new;
# Open data files
open(L, 'locations26.txt') or die "Can't open locations: $!";
open(O, '>', 'out26.txt') or die "Can't open output file: $!";
# Enable autoflush on the output file handle
my $oldh = select(O);
$| = 1;
select($oldh);
while (my $location = <L>) {
# This regular expression is like chomp, but removes both Windows and
# *nix line-endings, regardless of the system the script is running on.
$location =~ s/[\r\n]//g;
foreach my $year (1923..1923) {
# If you need to add quotes around the location, use "\"$location\"".
my %args = (LOCATION => $location, YEAR => $year);
my $url = 'https://familysearch.org/proxy?uri=https%3A%2F%2Ffamilysearch.org%2Fsearch%2Frecords%3Fcount%3D20%26query%3D%252Bevent_place_level_1%253ACalifornia%2520%252Bevent_place_level_2%253A^LOCATION^%2520%252Bbirth_year%253A^YEAR^-^YEAR^~%2520%252Bgender%253AM%2520%252Brace%253AWhite%26collection_id%3D2000219';
# Note that values need to be doubly-escaped because of the
# weird way their website is set up (the "/proxy" URL we're
# requesting is subsequently loading some *other* URL which
# is provided to "/proxy" as a URL-encoded URL).
#
# This regular expression replaces any ^WHATEVER^ in the URL
# with the double-URL-encoded value of WHATEVER in %args.
# The /e flag causes the replacement to be evaluated as Perl
# code. This way I can look data up in a hash and do URL-encoding
# as part of the regular expression without an extra step.
$url =~ s/\^([A-Z]+)\^/escape(escape($args{$1}))/ge;
#print "$url\n";
# Create an HTTP request object for this URL.
my $request = HTTP::Request->new(GET => $url);
# This HTTP header is required. The server outputs garbage if
# it's not present.
$request->push_header('Content-Type' => 'application/json');
# Send the request and check for an error from the server.
my $response = $ua->request($request);
die "Error ".$response->code if !$response->is_success;
# The response should be JSON.
my $obj = from_json($response->content);
my $str = "$args{LOCATION},$args{YEAR},$obj->{totalHits}\n";
print O $str;
print $str;
}
}
What about this simple script without Firefox? I investigated the site a bit to understand how it works, and I saw some JSON requests with the Firebug Firefox add-on, so I know which URL to query to get the relevant data. Here is the code:
use strict; use warnings;
use JSON::XS;
use LWP::UserAgent;
use HTTP::Request;
my $ua = LWP::UserAgent->new();
open my $fh, '<', 'locations2.txt' or die $!;
open my $fh2, '>>', 'out2.txt' or die $!;
# iterate over locations from locations2.txt file
while (my $place = <$fh>) {
# remove line ending
chomp $place;
# iterate over years
foreach my $year (1923..1925) {
# building URL with the variables
my $url = "https://familysearch.org/proxy?uri=https%3A%2F%2Ffamilysearch.org%2Fsearch%2Frecords%3Fcount%3D20%26query%3D%252Bevent_place_level_1%253ACalifornia%2520%252Bevent_place_level_2%253A%2522$place%2522%2520%252Bbirth_year%253A$year-$year~%2520%252Bgender%253AM%2520%252Brace%253AWhite%26collection_id%3D2000219";
my $request = HTTP::Request->new(GET => $url);
# faking referer (where we comes from)
$request->header('Referer', 'https://familysearch.org/search/collection/results');
# setting expected format header for response as JSON
$request->header('content_type', 'application/json');
my $response = $ua->request($request);
if ($response->code == 200) {
# this line convert a JSON to Perl HASH
my $hash = decode_json $response->content;
my $val = $hash->{totalHits};
print $fh2 "year $year, place $place : $val\n";
}
else {
die $response->status_line;
}
}
}
END{ close $fh; close $fh2; }
This seems to do what you need. Instead of waiting for the disappearance of the hourglass it waits - more obviously I think - for the appearance of the text node you're interested in.
use 5.010;
use warnings;
use WWW::Mechanize::Firefox;
STDOUT->autoflush;
my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';
my $mech = WWW::Mechanize::Firefox->new(tab => qr/FamilySearch\.org/, create => 1, activate => 1);
$mech->autoclose_tab(0);
$mech->get('about:blank');
$mech->get($url);
my $text;
while () {
sleep 1;
$text = $mech->xpath('//p[@class="num-search-results"]/text()', maybe => 1);
last if defined $text;
}
my $results = $text->{nodeValue};
say $results;
if ($results =~ /([\d,]+)\s+results/) {
(my $n = $1) =~ tr/,//d;
say $n;
}
output
1-20 of 1,975 results
1975
Update
This update is with special thanks to @nandhp, who inspired me to look at the underlying data server that produces the data in JSON format.
Rather than making a request via the superfluous https://familysearch.org/proxy, this code accesses the server directly at https://familysearch.org/search/records, decodes the JSON and pulls the required data out of the resulting structure. This has the advantage of both speed (the requests are served about once a second, more than ten times faster than the equivalent request to the basic web site) and stability (as you note, the site is very flaky; in contrast I have never seen an error using this method).
use strict;
use warnings;
use LWP::UserAgent;
use URI;
use JSON;
use autodie;
STDOUT->autoflush;
open my $fh, '<', 'locations26.txt';
my @locations = <$fh>;
chomp @locations;
open my $outfh, '>', 'out26.txt';
my $ua = LWP::UserAgent->new;
for my $county (@locations[36, 0..2]) {
for my $year (1923 .. 1926) {
my $total = familysearch_info($county, $year);
print STDOUT "$county,$year,$total\n";
print $outfh "$county,$year,$total\n";
}
print "\n";
}
sub familysearch_info {
my ($county, $year) = @_;
my $query = join ' ', (
'+event_place_level_1:California',
sprintf('+event_place_level_2:"%s"', $county),
sprintf('+birth_year:%1$d-%1$d~', $year),
'+gender:M',
'+race:White',
);
my $url = URI->new('https://familysearch.org/search/records');
$url->query_form(
collection_id => 2000219,
count => 20,
query => $query);
my $resp = $ua->get($url, 'Content-Type'=> 'application/json');
my $data = decode_json($resp->decoded_content);
return $data->{totalHits};
}
output
San Diego,1923,1975
San Diego,1924,2004
San Diego,1925,1871
San Diego,1926,1908
Alameda,1923,3577
Alameda,1924,3617
Alameda,1925,3567
Alameda,1926,3464
Alpine,1923,1
Alpine,1924,2
Alpine,1925,0
Alpine,1926,1
Amador,1923,222
Amador,1924,248
Amador,1925,134
Amador,1926,67
I do not know how to post revised code from the solution above.
This code does not (yet) compile correctly; however, I have made some essential updates that definitely head in that direction.
I would very much appreciate help with this updated code. I do not know how to post this code and this follow-up in a way that appeases the lords who run this site.
It gets stuck at the sleep line. Any advice on how to proceed past it would be much appreciated!
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new(
activate => 1, # bring the tab to the foreground
);
$mech->get('https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219',':content_file' => 'main.html', synchronize => 0);
my $retries = 10;
while ($retries-- and $mech->is_visible( xpath => '//*[@id="hourglass"]' )) {
print "Sleep until we find the thing\n";
sleep 2;
};
die "Timeout while waiting for application" if 0 > $retries;
# Now the hourglass is not visible anymore
#fill out the search form
my @forms = $mech->forms();
#<input id="census_bp" name="birth_place" type="text" tabindex="0"/>
#A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
$mech->field( birth_place => 'value_for_birth_place' );
# Click on the submit
$mech->click({xpath => '//*[@class="form-submit"]'});
You should set the current form before accessing a field:
"Given the name of a field, set its value to the value specified. This applies to the current form (as set by the "form_name()" or "form_number()" method or defaulting to the first form on the page)."
$mech->form_name( 'census-search' );
$mech->field( birth_place => 'value_for_birth_place' );
Sorry, I am not able to try this code out; thanks for opening a new question for the follow-up.

Downloading Text from Several Links using WWW::Mechanize

For an entire week I have been attempting to write code that will download links from a webpage and then loop through each link to dump the content written on each link's page. The original webpage I downloaded has 500 links to separate web pages that each contain important information for me. I only want to go one level down. However, I am having several issues.
RECAP: I want to download the links from a webpage and automatically have my program print the text contained on those linked pages. I would prefer to have it printed to a file.
1) When I download the links from the original website, the useful ones are not written out fully (i.e. they say "/festevents.nsf/all?openform", which is not a usable URL).
2) I have been unable to print the text content of the page. I have been able to print the font details, but that is useless.
#Download all the modules I used#
use LWP::UserAgent;
use HTML::TreeBuilder;
use HTML::FormatText;
use WWW::Mechanize;
use Data::Dumper;
#Download original webpage and acquire 500+ Links#
$url = "http://wx.toronto.ca/festevents.nsf/all?openform";
my $mechanize = WWW::Mechanize->new(autocheck => 1);
$mechanize->get($url);
my $title = $mechanize->title;
print "<b>$title</b><br />";
my @links = $mechanize->links;
foreach my $link (@links) {
# Retrieve the link URL
my $href = $link->url_abs;
#
# $URL1= get("$link");
#
my $ua = LWP::UserAgent->new;
my $response = $ua->get($href);
unless($response->is_success) {
die $response->status_line;
}
my $URL1 = $response->decoded_content;
die Dumper($URL1);
#This part of the code is just to "clean up" the text
$Format=HTML::FormatText->new;
$TreeBuilder=HTML::TreeBuilder->new;
$TreeBuilder->parse($URL1);
$Parsed=$Format->format($TreeBuilder);
open(FILE, ">TorontoParties.txt");
print FILE "$Parsed";
close (FILE);
}
Please help me! I am desperate! If possible, please explain the logic behind each step. I have been frying my brain on this for a week and I want help seeing other people's logic behind the problems.
Too much work. Study the WWW::Mechanize API to realise that almost all of that functionality is already built-in. Untested:
use strictures;
use WWW::Mechanize qw();
use autodie qw(:all);
open my $h, '>:encoding(UTF-8)', 'TorontoParties.txt';
my $mechanize = WWW::Mechanize->new;
$mechanize->get('http://wx.toronto.ca/festevents.nsf/all?openform');
foreach my $link (
$mechanize->find_all_links(url_regex => qr'/festevents[.]nsf/[0-9a-f]{32}/[0-9a-f]{32}[?]OpenDocument')
) {
$mechanize->get($link->url_abs);
print {$h} $mechanize->content(format => 'text');
}
close $h;

How can I dynamically generate HTML tables containing 100 rows each in Perl?

use POSIX;
my $test = "";
my $elements = scalar(@array);
my $tablecount = ($elements / 100);
my $tblnum = ceil($tablecount);
my @hundred;
foreach $test (@array) {
until ($tblcnt == $tblnum){
@hundred(@array, 0, 99);
print qq~<table id="$tblcnt"><tr><td>~;
foreach $test (@hundred){
print qq~<tr><td>$test</td></tr>~;
}
print qq~</table>~;
$tblcnt++;
}
}
Ugh!!! I am trying to learn this, but I just cannot get it right!!!
I am trying to dynamically generate "x" number of html tables filled with up to 100 lines of data each.
I can get the table count needed, I can loop, I can add, but one thing is for sure: I CANNOT WRITE CODE.
Here is the result wanted:
1- I have 1027 lines of data from an array.
2- I want my script to make 11 HTML tables with up to 100 lines each. (Layers, actually, which by default will not be visible via CSS. That way I can do some show/hide on the tables, like a "next/prev" navigation. I don't need help with the cross-browser CSS.)
3- IF there is a better way, a method that I can comprehend anyhow, other than using the visible= method, please elaborate!
I gave up trying to have Perl make pages of 100 with "next/prev" links to the data, so I resorted to using CSS/JavaScript onclick=yadayada to "display the data".
I thought it would be easier to shift off 100 lines of the array in a loop and put the lines in their own html tables. Not.
I have failed miserably and need assistance.
I think you need to spend more time learning the basics of Perl and CGI before writing any scripts.
It is useful to separate logic from presentation in CGI scripts. To that end, I find HTML::Template very useful.
The following script will generate an HTML document containing 100 tables with 100 rows of 10 cells each. It will take its sweet time doing that.
#!/usr/bin/perl
use strict; use warnings;
use CGI::Simple;
use HTML::Template;
my $tmpl = HTML::Template->new(scalarref => page_template() );
my @tables;
for my $t (1 .. 100) {
my @rows;
for my $r (1 .. 100) {
push @rows, { CELLS => [ map { CELL => $_ }, 1 .. 10 ] };
}
push @tables, { ID => "table_$t", ROWS => \@rows }
}
$tmpl->param(TABLES => \@tables);
my $cgi = CGI::Simple->new;
print $cgi->header('text/html');
$tmpl->output(print_to => \*STDOUT);
sub page_template {
return \ <<EO_TMPL
<!DOCTYPE HTML>
<html>
<head><title>Tables Example</title></head>
<body>
<TMPL_LOOP TABLES>
<table id="<TMPL_VAR ID>">
<TMPL_LOOP ROWS>
<tr>
<TMPL_LOOP CELLS>
<td><TMPL_VAR CELL></td>
</TMPL_LOOP>
</tr>
</TMPL_LOOP>
</table>
</TMPL_LOOP>
</body>
</html>
EO_TMPL
;
}
my $cnt = 0;
while (@array) {
my @rows = splice @array, 0, 100;
print qq(<table id="t$cnt">\n);
for my $row (@rows) {
print "<tr><td>$row</td></tr>\n";
}
print "</table>\n";
++$cnt;
}
You may want to use HTML::Table for generating HTML.
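As an untested sketch of that suggestion, combined with the splice loop above (the show/hide id goes on a wrapping div here, since setting arbitrary attributes on the table itself depends on the HTML::Table version):
use HTML::Table;

my $cnt = 0;
while (@array) {
    my @rows  = splice @array, 0, 100;          # next chunk of up to 100 lines
    my $table = HTML::Table->new(-border => 1);
    $table->addRow($_) for @rows;               # one row per data line
    print qq(<div id="t$cnt">\n), $table->getTable, "</div>\n";
    ++$cnt;
}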