Here is a simple example to illustrate the issue I am seeing when trying to use IO::Compress::Gzip:
use strict;
use warnings;
eval {
require IO::Compress::Gzip;
IO::Compress::Gzip->import();
1;
} or do {
my $error = $#;
die "\nERROR: Couldn't load IO::Compress::Gzip" if $error;
};
my $input = shift;
my $out = $input.".gz";
print "Defined!\n" if defined $out;
IO::Compress::Gzip::gzip $input => $out
or die "gzip failed: $!\n";
This generates the following error:
Defined!
Use of uninitialized value $_[1] in string eq at /home/statonse/perl/perlbrew/perls/perl-5.22.0/lib/5.22.0/IO/Compress/Base/Common.pm line 280.
IO::Compress::Gzip::gzip: output filename is undef or null string at test.pl line 17.
However, if I use the object interface:
use strict;
use warnings;
eval {
require IO::Compress::Gzip;
IO::Compress::Gzip->import();
1;
} or do {
my $error = $#;
die "\nERROR: Couldn't load IO::Compress::Gzip" if $error;
};
my $input = shift;
my $out = $input.".gz";
print "Defined!\n" if defined $out;
my $z = new IO::Compress::Gzip $out
or die "IO::Compress::Gzip failed: $!\n";
$z->print($input);
It works just fine. For some context, it would work as normal if I imported the module with use:
use strict;
use warnings;
use IO::Compress::Gzip;
my $input = shift;
my $out = $input.".gz";
IO::Compress::Gzip::gzip $input => $out
or die "gzip failed: $!\n";
but I am trying to avoid that since this library is rarely used in the application. Is there something obvious I am doing wrong or is this a behavior specific to this module?
This line:
IO::Compress::Gzip::gzip $input => $out
is parsed differently depending on whether the parser knows that there is a function called IO::Compress::Gzip::gzip or not.
When you load the library with use, its functions are known to the parser before the rest of your program is parsed (because use is a type of BEGIN). In this case the parser chooses the interpretation you want.
In the other case, it chooses the alternate interpretation: indirect object syntax, equivalent to $input->IO::Compress::Gzip::gzip, $out
You can see this for yourself by running perl -MO=Deparse on the different versions of your program.
The fix is to make the function call explicit with parentheses:
IO::Compress::Gzip::gzip($input, $out)
The parser can't misinterpret that.
I am unable to print the whole lines as i try and parse the ini file using Config:Ini operation, its the last part where I believed that the array will have the whole line and not only the key, I am surely missing something here
Input
[DomainCredentials]
broker=SERVER
domain=CUSTOMER1
[ProviderCredentials]
Class=A
Routine=B
Code
#!/sbin/perl -w
use lib "/usr/lib/perl5/site_perl";
use lib "/usr/lib/perl5/vendor_perl";
use strict;
use warnings;
use Config::IniFiles;
my $sPPFile="/tmp/config.txt";
my $sysSec="DomainCredentials";
my $cfg = Config::IniFiles->new(-file=> $sPPFile) || die "Could open file $sPPFile\n";
if ($#){
print "Error";
exit 1;
}
my #params_provider = $cfg->Parameters("ProviderCredentials");
foreach (#params_provider){
print $_."\n";
}
Output
Class
Routine
Expected Output
Class=A
Routine=B
You could use the tied hash option of Config::IniFiles to get the config.txt parameter/value pairs:
use strict;
use warnings;
use Config::IniFiles;
my %ini;
my $sPPFile = "/tmp/config.txt";
tie %ini, 'Config::IniFiles', ( -file => $sPPFile );
print "$_=$ini{ProviderCredentials}{$_}\n"
for keys %{ $ini{ProviderCredentials} };
Output on your dataset:
Class=A
Routine=B
You can change the value of a parameter, and then update the config file by doing this:
$ini{ProviderCredentials}{Class} = 'C';
tied(%ini)->RewriteConfig();
The last command actually writes out the entire config held in the tied hash.
Hope this helps!
It looks like Parameters only returns keys.
You then have to use val to get the values.
#!/sbin/perl -w
use lib "/usr/lib/perl5/site_perl";
use lib "/usr/lib/perl5/vendor_perl";
use strict;
use warnings;
use Config::IniFiles;
my $sPPFile="/tmp/config.txt";
my $sysSec="DomainCredentials";
my $cfg = Config::IniFiles->new(-file=> $sPPFile) || die "Could open file $sPPFile\n";
if ($#){
print "Error";
exit 1;
}
my #param_arr = ('broker','domain');
my %param_hash;
foreach my $p (#param_arr){
if (defined $cfg->val("$sysSec",$p)){
$param_hash{$p} = $cfg->val("$sysSec",$p);
}
else{
die "Could not get parameter $p\n";
}
}
#print $param_hash{broker};
#print $param_hash{domain};
my #params_provider = $cfg->Parameters("ProviderCredentials");
if (defined $cfg->Parameters("ProviderCredentials")){
my #params_provider = $cfg->Parameters("ProviderCredentials");
}else{
die "Could not get parameter ProviderCredentials\n";
}
foreach (#params_provider){
print "Key : ".$_."\t Value : ".$cfg->val("ProviderCredentials",$_)."\n";
}
I am interested in writing a perl script that goes to the following link and extracts the number 1975: https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219
That website is the amount of white men born in the year 1923 who live in San Diego County, California in 1940. I am trying to do this in a loop structure to generalize over multiple counties and birth years.
In the file, locations.txt, I put the list of counties, such as San Diego County.
The current code runs, but instead of the # 1975, it displays unknown. The number 1975 should be in $val\n.
I would very much appreciate any help!
#!/usr/bin/perl
use strict;
use LWP::Simple;
open(L, "locations26.txt");
my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3A%22California%22%20%2Bevent_place_level_2%3A%22%LOCATION%%22%20%2Bbirth_year%3A%YEAR%-%YEAR%~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';
open(O, ">out26.txt");
my $oldh = select(O);
$| = 1;
select($oldh);
while (my $location = <L>) {
chomp($location);
$location =~ s/ /+/g;
foreach my $year (1923..1923) {
my $u = $url;
$u =~ s/%LOCATION%/$location/;
$u =~ s/%YEAR%/$year/;
#print "$u\n";
my $content = get($u);
my $val = 'unknown';
if ($content =~ / of .strong.([0-9,]+)..strong. /) {
$val = $1;
}
$val =~ s/,//g;
$location =~ s/\+/ /g;
print "'$location',$year,$val\n";
print O "'$location',$year,$val\n";
}
}
Update: API is not a viable solution. I have been in contact with the site developer. The API does not apply to that part of the webpage. Hence, any solution pertaining to JSON will not be applicbale.
It would appear that your data is generated by Javascript and thus LWP cannot help you. That said, it seems that the site you are interested in has a developer API: https://familysearch.org/developers/
I recommend using Mojo::URL to construct your query and either Mojo::DOM or Mojo::JSON to parse XML or JSON results respectively. Of course other modules will work too, but these tools are very nicely integrated and let you get started quickly.
You could use WWW::Mechanize::Firefox to process any site that could be loaded by Firefox.
http://metacpan.org/pod/WWW::Mechanize::Firefox::Examples
You have to install the Mozrepl plugin and you will be able to process the web page contant via this module. Basically you will "remotly control" the browser.
Here is an example (maybe working)
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new(
activate => 1, # bring the tab to the foreground
);
$mech->get('https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219',':content_file' => 'main.html');
my $retries = 10;
while ($retries-- and ! $mech->is_visible( xpath => '//*[#class="form-submit"]' )) {
print "Sleep until we find the thing\n";
sleep 2;
};
die "Timeout" if 0 > $retries;
#fill out the search form
my #forms = $mech->forms();
#<input id="census_bp" name="birth_place" type="text" tabindex="0"/>
#A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
$mech->field( birth_place => 'value_for_birth_place' );
# Click on the submit
$mech->click({xpath => '//*[#class="form-submit"]'});
If you use your browser's development tools, you can clearly see the JSON request that the page you link to uses to get the data you're looking for.
This program should do what you want. I've added a bunch of comments for readability and explanation, as well as made a few other changes.
use warnings;
use strict;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
# Create an LWP User-Agent object for sending HTTP requests.
my $ua = LWP::UserAgent->new;
# Open data files
open(L, 'locations26.txt') or die "Can't open locations: $!";
open(O, '>', 'out26.txt') or die "Can't open output file: $!";
# Enable autoflush on the output file handle
my $oldh = select(O);
$| = 1;
select($oldh);
while (my $location = <L>) {
# This regular expression is like chomp, but removes both Windows and
# *nix line-endings, regardless of the system the script is running on.
$location =~ s/[\r\n]//g;
foreach my $year (1923..1923) {
# If you need to add quotes around the location, use "\"$location\"".
my %args = (LOCATION => $location, YEAR => $year);
my $url = 'https://familysearch.org/proxy?uri=https%3A%2F%2Ffamilysearch.org%2Fsearch%2Frecords%3Fcount%3D20%26query%3D%252Bevent_place_level_1%253ACalifornia%2520%252Bevent_place_level_2%253A^LOCATION^%2520%252Bbirth_year%253A^YEAR^-^YEAR^~%2520%252Bgender%253AM%2520%252Brace%253AWhite%26collection_id%3D2000219';
# Note that values need to be doubly-escaped because of the
# weird way their website is set up (the "/proxy" URL we're
# requesting is subsequently loading some *other* URL which
# is provided to "/proxy" as a URL-encoded URL).
#
# This regular expression replaces any ^WHATEVER^ in the URL
# with the double-URL-encoded value of WHATEVER in %args.
# The /e flag causes the replacement to be evaluated as Perl
# code. This way I can look data up in a hash and do URL-encoding
# as part of the regular expression without an extra step.
$url =~ s/\^([A-Z]+)\^/escape(escape($args{$1}))/ge;
#print "$url\n";
# Create an HTTP request object for this URL.
my $request = HTTP::Request->new(GET => $url);
# This HTTP header is required. The server outputs garbage if
# it's not present.
$request->push_header('Content-Type' => 'application/json');
# Send the request and check for an error from the server.
my $response = $ua->request($request);
die "Error ".$response->code if !$response->is_success;
# The response should be JSON.
my $obj = from_json($response->content);
my $str = "$args{LOCATION},$args{YEAR},$obj->{totalHits}\n";
print O $str;
print $str;
}
}
What about this simple script without firefox ? I had investigated the site a bit to understand how it works, and I saw some JSON requests with firebug firefox addon, so I know which URL to query to get the relevant stuff. Here is the code :
use strict; use warnings;
use JSON::XS;
use LWP::UserAgent;
use HTTP::Request;
my $ua = LWP::UserAgent->new();
open my $fh, '<', 'locations2.txt' or die $!;
open my $fh2, '>>', 'out2.txt' or die $!;
# iterate over locations from locations2.txt file
while (my $place = <$fh>) {
# remove line ending
chomp $place;
# iterate over years
foreach my $year (1923..1925) {
# building URL with the variables
my $url = "https://familysearch.org/proxy?uri=https%3A%2F%2Ffamilysearch.org%2Fsearch%2Frecords%3Fcount%3D20%26query%3D%252Bevent_place_level_1%253ACalifornia%2520%252Bevent_place_level_2%253A%2522$place%2522%2520%252Bbirth_year%253A$year-$year~%2520%252Bgender%253AM%2520%252Brace%253AWhite%26collection_id%3D2000219";
my $request = HTTP::Request->new(GET => $url);
# faking referer (where we comes from)
$request->header('Referer', 'https://familysearch.org/search/collection/results');
# setting expected format header for response as JSON
$request->header('content_type', 'application/json');
my $response = $ua->request($request);
if ($response->code == 200) {
# this line convert a JSON to Perl HASH
my $hash = decode_json $response->content;
my $val = $hash->{totalHits};
print $fh2 "year $year, place $place : $val\n";
}
else {
die $response->status_line;
}
}
}
END{ close $fh; close $fh2; }
This seems to do what you need. Instead of waiting for the disappearance of the hourglass it waits - more obviously I think - for the appearance of the text node you're interested in.
use 5.010;
use warnings;
use WWW::Mechanize::Firefox;
STDOUT->autoflush;
my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';
my $mech = WWW::Mechanize::Firefox->new(tab => qr/FamilySearch\.org/, create => 1, activate => 1);
$mech->autoclose_tab(0);
$mech->get('about:blank');
$mech->get($url);
my $text;
while () {
sleep 1;
$text = $mech->xpath('//p[#class="num-search-results"]/text()', maybe => 1);
last if defined $text;
}
my $results = $text->{nodeValue};
say $results;
if ($results =~ /([\d,]+)\s+results/) {
(my $n = $1) =~ tr/,//d;
say $n;
}
output
1-20 of 1,975 results
1975
Update
This update is with special thanks to #nandhp, who inspired me to look at the underlying data server that produces the data in JSON format.
Rather than making a request via the superfluous https://familysearch.org/proxy this code accesses the server directly at https://familysearch.org/search/records, reencodes the JSON and dumps the required data out of the resulting structure. This has the advantage of both speed (the requests are served about once a second - more than ten times faster than with the equivalent request from the basic web site) and stability (as you note, the site is very flaky - in contrast I have never seen an error using this method).
use strict;
use warnings;
use LWP::UserAgent;
use URI;
use JSON;
use autodie;
STDOUT->autoflush;
open my $fh, '<', 'locations26.txt';
my #locations = <$fh>;
chomp #locations;
open my $outfh, '>', 'out26.txt';
my $ua = LWP::UserAgent->new;
for my $county (#locations[36, 0..2]) {
for my $year (1923 .. 1926) {
my $total = familysearch_info($county, $year);
print STDOUT "$county,$year,$total\n";
print $outfh "$county,$year,$total\n";
}
print "\n";
}
sub familysearch_info {
my ($county, $year) = #_;
my $query = join ' ', (
'+event_place_level_1:California',
sprintf('+event_place_level_2:"%s"', $county),
sprintf('+birth_year:%1$d-%1$d~', $year),
'+gender:M',
'+race:White',
);
my $url = URI->new('https://familysearch.org/search/records');
$url->query_form(
collection_id => 2000219,
count => 20,
query => $query);
my $resp = $ua->get($url, 'Content-Type'=> 'application/json');
my $data = decode_json($resp->decoded_content);
return $data->{totalHits};
}
output
San Diego,1923,1975
San Diego,1924,2004
San Diego,1925,1871
San Diego,1926,1908
Alameda,1923,3577
Alameda,1924,3617
Alameda,1925,3567
Alameda,1926,3464
Alpine,1923,1
Alpine,1924,2
Alpine,1925,0
Alpine,1926,1
Amador,1923,222
Amador,1924,248
Amador,1925,134
Amador,1926,67
I do not know how to post revised code from the solution above.
This code does not (yet) compile correctly. However, I have made some essential update to definitely head in that direction.
I would very much appreciate help on this updated code. I do not know how to post this code and this follow up such that it appease the lords who run this sight.
It get stuck at the sleep line. Any advice on how to proceed past it would be much appreciated!
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new(
activate => 1, # bring the tab to the foreground
);
$mech->get('https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219',':content_file' => 'main.html', synchronize => 0);
my $retries = 10;
while ($retries-- and $mech->is_visible( xpath => '//*[#id="hourglass"]' )) {
print "Sleep until we find the thing\n";
sleep 2;
};
die "Timeout while waiting for application" if 0 > $retries;
# Now the hourglass is not visible anymore
#fill out the search form
my #forms = $mech->forms();
#<input id="census_bp" name="birth_place" type="text" tabindex="0"/>
#A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
$mech->field( birth_place => 'value_for_birth_place' );
# Click on the submit
$mech->click({xpath => '//*[#class="form-submit"]'});
You should set the current form before accessing a field:
"Given the name of a field, set its value to the value specified. This applies to the current form (as set by the "form_name()" or "form_number()" method or defaulting to the first form on the page)."
$mech->form_name( 'census-search' );
$mech->field( birth_place => 'value_for_birth_place' );
Sorry, I am not able too try this code out and thanks for open a question for a new question.
How can I check if a page contains a specific word. Example: I want to return true or false if the page contains the word "candybar". Notice that the "candybar" could be in between tags (candybar) sometimes and sometimes not. How do I accomplish this?
Here is my code for "grabing" the site (just dont now how to check through the site):
#!/usr/bin/perl -w
use utf8;
use RPC::XML;
use RPC::XML::Client;
use Data::Dumper;
use Encode;
use Time::HiRes qw(usleep);
print "Content-type:text/html\n\n";
use LWP::Simple;
$pageURL = "http://example.com";
$simplePage=get($pageURL);
if ($simplePage =~ m/candybar/) {
print "its there!";
}
I'd suggest that you use some kind of parser, if you're looking for words in HTML or anything else that's tagged in a known way [XML, for example]. I use HTML::Tokeparser but there's many parsing modules on CPAN.
I've left the explanation of the returns from the parser as comments, in case you use this parser. This is extracted from a live program that I use to machine translate the text in web pages, so I've taken out some bits and pieces.
The comment above about checking status and content of returns from LWP, is very sensible too, if the website is off-line, you need to know that.
open( my $fh, "<:utf8", $file ) || die "Can't open $file : $!";
my $p = HTML::TokeParser->new($fh) || die "Can't open: $!";
$p->empty_element_tags(1); # configure its behaviour
# put output into here and it's cumulated
while ( my $token = $p->get_token ) {
#["S", $tag, $attr, $attrseq, $text]
#["E", $tag, $text]
#["T", $text, $is_data]
#["C", $text]
#["D", $text]
#["PI", $token0, $text
my ($type,$string) = get_output($token) ;
# ["T", $text, $is_data] : rule for text
if ( $type eq 'T' && $string =~ /^candybar/ ) {
}
I was wondering if you could shed some lights regarding the code I've been doing for a couple of days.
I've been trying to convert a Perl-parsed hash back to XML using the XMLout() and XMLin() method and it has been quite successful with this format.
#!/usr/bin/perl -w
use strict;
# use module
use IO::File;
use XML::Simple;
use XML::Dumper;
use Data::Dumper;
my $dump = new XML::Dumper;
my ( $data, $VAR1 );
Topology:$VAR1 = {
'device' => {
'FOC1047Z2SZ' => {
'ChassisID' => '2009-09',
'Error' => undef,
'Group' => {
'ID' => 'A1',
'Type' => 'Base'
},
'Model' => 'CATALYST',
'Name' => 'CISCO-SW1',
'Neighbor' => {},
'ProbedIP' => 'TEST',
'isDerived' => 0
}
},
'issues' => [
'TEST'
]
};
# create object
my $xml = new XML::Simple (NoAttr=>1,
RootName=>'data',
SuppressEmpty => 'true');
# convert Perl array ref into XML document
$data = $xml->XMLout($VAR1);
#reads an XML file
my $X_out = $xml->XMLin($data);
# access XML data
print Dumper($data);
print "STATUS: $X_out->{issues}\n";
print "CHASSIS ID: $X_out->{device}{ChassisID}\n";
print "GROUP ID: $X_out->{device}{Group}{ID}\n";
print "DEVICE NAME: $X_out->{device}{Name}\n";
print "DEVICE NAME: $X_out->{device}{name}\n";
print "ERROR: $X_out->{device}{error}\n";
I can access all the element in the XML with no problem.
But when I try to create a file that will house the parsed hash, problem arises because I can't seem to access all the XML elements. I guess, I wasn't able to unparse the file with the following code.
#!/usr/bin/perl -w
use strict;
#!/usr/bin/perl
# use module
use IO::File;
use XML::Simple;
use XML::Dumper;
use Data::Dumper;
my $dump = new XML::Dumper;
my ( $data, $VAR1, $line_Holder );
#this is the file that contains the parsed hash
my $saveOut = "C:/parsed_hash.txt";
my $result_Holder = IO::File->new($saveOut, 'r');
while ($line_Holder = $result_Holder->getline){
print $line_Holder;
}
# create object
my $xml = new XML::Simple (NoAttr=>1, RootName=>'data', SuppressEmpty => 'true');
# convert Perl array ref into XML document
$data = $xml->XMLout($line_Holder);
#reads an XML file
my $X_out = $xml->XMLin($data);
# access XML data
print Dumper($data);
print "STATUS: $X_out->{issues}\n";
print "CHASSIS ID: $X_out->{device}{ChassisID}\n";
print "GROUP ID: $X_out->{device}{Group}{ID}\n";
print "DEVICE NAME: $X_out->{device}{Name}\n";
print "DEVICE NAME: $X_out->{device}{name}\n";
print "ERROR: $X_out->{device}{error}\n";
Do you have any idea how I could access the $VAR1 inside the text file?
Regards,
newbee_me
$data = $xml->XMLout($line_Holder);
$line_Holder has only the last line of your file, not the whole file, and not the perl hashref that would result from evaling the file. Try something like this:
my $ref = do $saveOut;
The do function loads and evals a file for you. You may want to do it in separate steps, like:
use File::Slurp "read_file";
my $fileContents = read_file( $saveOut );
my $ref = eval( $fileContents );
You might want to look at the Data::Dump module as a replacement for Data::Dumper; its output is already ready to re-eval back.
Basically to load Dumper data you eval() it:
use strict;
use Data::Dumper;
my $x = {"a" => "b", "c"=>[1,2,3],};
my $q = Dumper($x);
$q =~ s{\A\$VAR\d+\s*=\s*}{};
my $w = eval $q;
print $w->{"a"}, "\n";
The regexp (s{\A\$VAR\d+\s*=\s*}{}) is used to remove $VAR1= from the beginning of string.
On the other hand - if you need a way to store complex data structure, and load it again, it's much better to use Storable module, and it's store() and retrieve() functions.
This has worked for me, for hashes of hashes. Perhaps won't work so well with structures which contain references other structures. But works well enough for simple structures, like arrays, hashes, or hashes of hashes.
open(DATA,">",$file);
print DATA Dumper(\%g_write_hash);
close(DATA);
my %g_read_hash = %{ do $file };
Please use dump module as a replacement for Data::Dumper
You can configure the variable name used in Data::Dumper's output with $Data::Dumper::Varname.
Example
use Data::Dumper
$Data::Dumper::Varname = "foo";
my $string = Dumper($object);
eval($string);
...will create the variable $foo, and should contain the same data as $object.
If your data structure is complicated and you have strange results, you may want to consider Storable's freeze() and thaw() methods.