Would it be OK to keep the END block in this example (nobody wants a broken terminal), or shouldn't I put an END block in a module?
package My_Package;
use warnings;
use strict;
use Term::ReadKey;
sub _init_scr {
my ( $arg ) = @_;
$arg->{backup_flush} = $|;
$| = 1;
Term::ReadKey::ReadMode 'ultra-raw';
}
sub _end_win {
my ( $arg ) = @_;
print "\n\r";
Term::ReadKey::ReadMode 'restore';
$| = $arg->{backup_flush};
}
END {
Term::ReadKey::ReadMode 'restore';
}
sub my_function {
my $arg = {};
_init_scr( $arg );
while ( 1 ) {
my $c = ReadKey 0;
if ( ! defined $c ) {
_end_win( $arg );
warn "EOT";
return;
}
next if $c eq "\e";
given ( $c ) {
when ( $c ge 'a' && $c le 'z' ) {
print $c;
$arg->{string} .= $c;
}
when ( $c eq "\cC" ) {
_end_win( $arg );
print STDERR "^C";
kill( 'INT', $$ );
return;
}
when ( $c eq "\r" ) {
_end_win( $arg );
return $arg->{string};
}
}
}
}
If your module changes the terminal mode, then I would think the most polite thing to do would be for it to also install an END block to restore the terminal mode before the program exits.
No, it's polite and expected that you put things back as you found them.
However, it's unwelcome to tidy up someone else's workspace unless you've been asked to do so.
That is, your END routine shouldn't run unless it has reason to do so, and your module probably ought to allow a developer to disable the automatic cleanup. (E.g., use My_Package qw(:no_auto_restore).)
Failing that, the POD ought to explicitly document that the module fiddles with a system resource upon exit.
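A minimal sketch of that advice, assuming an opt-out import tag named :no_auto_restore as suggested above. The flag variables and the auto_restore_enabled helper are hypothetical names, not part of the original module. The END block only touches the terminal if the module actually changed the mode and the caller has not opted out:

```perl
package My_Package;
use strict;
use warnings;

my $auto_restore = 1;    # cleanup is on unless the caller opts out
my $mode_changed = 0;    # true only while this module has altered the terminal

# Called by "use My_Package qw(:no_auto_restore);"
sub import {
    my ( $class, @tags ) = @_;
    $auto_restore = 0 if grep { $_ eq ':no_auto_restore' } @tags;
    return;
}

# Hypothetical helper so callers can inspect the setting.
sub auto_restore_enabled { return $auto_restore }

sub _init_scr {
    require Term::ReadKey;
    Term::ReadKey::ReadMode('ultra-raw');
    $mode_changed = 1;
    return;
}

sub _end_win {
    return if !$mode_changed;    # nothing to put back
    Term::ReadKey::ReadMode('restore');
    $mode_changed = 0;
    return;
}

END {
    # Restore only if we changed the mode and cleanup was not disabled.
    _end_win() if $auto_restore && $mode_changed;
}

1;
```

With this layout a normal return path calls _end_win itself, and the END block is only a safety net for exits that skip it.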
I have to write a script that fetches some URLs in parallel and does some work. In the past I have always used Parallel::ForkManager for such things, but now I wanted to learn something new and try asynchronous programming with AnyEvent (and AnyEvent::HTTP or AnyEvent::Curl::Multi) ... but I'm having problems understanding AnyEvent and writing a script that should:
open a file (every line is a separate URL)
(from now on in parallel, but with a limit of, e.g., 10 concurrent requests)
read the file line by line (I don't want to load the whole file into memory - it might be big)
make an HTTP request for that URL
read the response
update the MySQL record accordingly
(next file line)
I have read many manuals and tutorials, but it's still hard for me to understand the differences between blocking and non-blocking code. I have found a similar script at http://perlmaven.com/fetching-several-web-pages-in-parallel-using-anyevent, where Mr. Szabo explains the basics, but I still can't understand how to implement something like:
...
open my $fh, "<", $file;
while ( my $line = <$fh> )
{
# http request, read response, update MySQL
}
close $fh
...
... and add a concurrency limit in this case.
I would be very grateful for help ;)
UPDATE
Following Ikegami's advice I gave Net::Curl::Multi a try. I'm very pleased with the results. After years of using Parallel::ForkManager just for concurrently grabbing thousands of URLs, Net::Curl::Multi seems to be awesome.
Here is my code with a while loop over the filehandle. It seems to work as it should, but considering it's my first time writing something like this, I would like to ask more experienced Perl users to take a look and tell me if there are any potential bugs, something I missed, etc.
Also, if I may ask: as I don't fully understand how Net::Curl::Multi's concurrency works, please tell me whether I should expect any problems with putting MySQL UPDATE command (via DBI) inside RESPONSE loop (besides higher server load obviously - I expect final script to run with about 50 concurrent N::C::M workers, maybe more).
#!/usr/bin/perl
use Net::Curl::Easy qw( :constants );
use Net::Curl::Multi qw( );
sub make_request {
my ( $url ) = @_;
my $easy = Net::Curl::Easy->new();
$easy->{url} = $url;
$easy->setopt( CURLOPT_URL, $url );
$easy->setopt( CURLOPT_HEADERDATA, \$easy->{head} );
$easy->setopt( CURLOPT_FILE, \$easy->{body} );
return $easy;
}
my $maxWorkers = 10;
my $multi = Net::Curl::Multi->new();
my $workers = 0;
my $i = 1;
open my $fh, "<", "urls.txt";
LINE: while ( my $url = <$fh> )
{
chomp( $url );
$url .= "?$i";
print "($i) $url\n";
my $easy = make_request( $url );
$multi->add_handle( $easy );
$workers++;
my $running = 0;
do {
my ($r, $w, $e) = $multi->fdset();
my $timeout = $multi->timeout();
select $r, $w, $e, $timeout / 1000
if $timeout > 0;
$running = $multi->perform();
RESPONSE: while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
$multi->remove_handle( $easy );
$workers--;
printf( "%s getting %s\n", $easy->getinfo( CURLINFO_RESPONSE_CODE ), $easy->{url} );
}
# dont max CPU while waiting
select( undef, undef, undef, 0.01 );
} while ( $workers == $maxWorkers || ( eof && $running ) );
$i++;
}
close $fh;
Net::Curl is a rather good library that's extremely fast. Furthermore, it can handle parallel requests too! I'd recommend using this instead of AnyEvent.
use Net::Curl::Easy qw( :constants );
use Net::Curl::Multi qw( );
sub make_request {
my ( $url ) = @_;
my $easy = Net::Curl::Easy->new();
$easy->{url} = $url;
$easy->setopt( CURLOPT_URL, $url );
$easy->setopt( CURLOPT_HEADERDATA, \$easy->{head} );
$easy->setopt( CURLOPT_FILE, \$easy->{body} );
return $easy;
}
my $max_running = 10;
my @urls = ( 'http://www.google.com/' );
my $multi = Net::Curl::Multi->new();
my $running = 0;
while (1) {
while ( @urls && $running < $max_running ) {
my $easy = make_request( shift( @urls ) );
$multi->add_handle( $easy );
++$running;
}
last if !$running;
my ( $r, $w, $e ) = $multi->fdset();
my $timeout = $multi->timeout();
select( $r, $w, $e, $timeout / 1000 )
if $timeout > 0;
$running = $multi->perform();
while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
$multi->remove_handle( $easy );
printf( "%s getting %s\n", $easy->getinfo( CURLINFO_RESPONSE_CODE ), $easy->{url} );
}
}
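To feed the loop above from a file instead of an in-memory array, the inner "top up" loop can read one line at a time. Below is a rough, self-contained sketch of just that control flow. The requests are simulated (no Net::Curl calls are made), the in-memory filehandle stands in for the real URL file, and the example.com URLs are placeholders:

```perl
use strict;
use warnings;

# Stand-in for a file of URLs; a real script would open the file instead.
my $data = join '', map { "http://example.com/$_\n" } 1 .. 25;
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my $max_running = 10;
my @in_flight;      # URLs currently "being fetched"
my $peak = 0;       # highest number of simultaneous requests seen
my $done = 0;       # number of completed requests

while (1) {
    # Top up the pool. The limit is tested first, so a line is only
    # read from the file when there is a free slot for it.
    while ( @in_flight < $max_running && defined( my $url = <$fh> ) ) {
        chomp $url;
        push @in_flight, $url;    # real code: $multi->add_handle( make_request($url) )
    }
    last if !@in_flight;

    $peak = @in_flight if @in_flight > $peak;

    # Simulate one request finishing; real code would call
    # $multi->perform() and drain $multi->info_read() here.
    shift @in_flight;
    ++$done;
}
close $fh;

print "processed $done URLs, at most $peak at once\n";
```

Because the limit is checked before the read, no line is consumed and then dropped when the pool is full, and the file is never loaded into memory as a whole.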
YADA does exactly what you want, asynchronously, and it does so by wrapping Net::Curl safely:
#!/usr/bin/env perl
package MyDownloader;
use strict;
use warnings qw(all);
use Moo;
extends 'YADA::Worker';
has '+use_stats'=> (default => sub { 1 });
has '+retry' => (default => sub { 10 });
after init => sub {
my ($self) = @_;
$self->setopt(
encoding => '',
verbose => 1,
);
};
after finish => sub {
my ($self, $result) = @_;
if ($self->has_error) {
print "ERROR: $result\n";
} else {
# do the interesting stuff here
printf "Finished downloading %s: %d bytes\n", $self->final_url, length ${$self->data};
}
};
around has_error => sub {
my $orig = shift;
my $self = shift;
return 1 if $self->$orig(@_);
return 1 if $self->getinfo('response_code') =~ m{^5[0-9]{2}$}x;
};
1;
package main;
use strict;
use warnings qw(all);
use Carp;
use YADA;
my $q = YADA->new(
max => 8,
timeout => 30,
);
open(my $fh, '<', 'file_with_urls_per_line.txt')
or croak "can't open queue: $!";
while (my $url = <$fh>) {
chomp $url;
$q->append(sub {
MyDownloader->new($url)
});
}
close $fh;
$q->wait;
I'm trying to write the grep results to a CSV file, but it shows the following error.
"Use of uninitialized value in concatenation (.) or string at"
code:
sub gen_csv {
my $db_ptr = shift @_;
my $cvs_file_name = shift @_;
open( FILE, ">$cvs_file_name" ) or die("Unable to open CSV FILE $cvs_file_name\n");
print FILE "Channel no, Page no, \n";
foreach my $s ( @{$db_ptr} ) {
my $tmp = "$s->{'ch_no'},";
$tmp .= "$s->{'pg_no'},";
print FILE $tmp;
}
close(FILE);
}
sub parse_test_logs {
my $chnl;
my $page;
my $log = "sample.log";
open my $log_fh, "<", $log;
while ( my $line = <$log_fh> ) {
if ( $line =~ /(.*):.*solo_(.*): queueing.*/ ) {
my $chnl = $1;
my $page = $2;
}
my %test_details = (
'ch_no' => $chnl,
'pg_no' => $page, # <- was missing closing single quote
);
push( @{$dba_ptr}, \%test_details );
}
close log_fh;
}
Any suggestions on what I'm missing?
(I'm getting the above error pointing to my $tmp = "$s->{'ch_no'},"; in the gen_csv sub.)
Most likely this is due to NULL values in your DB records, or the keys you are using are wrong. Either way, the warning is raised because the ch_no value is undefined.
If you don't care about NULL values, and you are fine with some of the values being missing, then you can suppress warnings for uninitialized values.
no warnings 'uninitialized';
Your problem involves this block:
if ( $line =~ /(.*):.*solo_(.*): queueing.*/ ) {
my $chnl = $1;
my $page = $2;
}
my %test_details = (
'ch_no' => $chnl,
'pg_no' => $page,
);
You're capturing your variables, but you have them declared with my within the if block. Those lexicals then go out of scope and are undef when used to initialize the hash.
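A small sketch of that scoping behaviour, using a made-up log line (the sample text is an assumption, not taken from the real log). One capture is stored in a new inner lexical, the other in the outer variable:

```perl
use strict;
use warnings;

# Hypothetical log line that matches the regex from the question.
my $line = "ch7: foo solo_12: queueing bar";

my ( $chnl, $page );
if ( $line =~ /(.*):.*solo_(.*): queueing.*/ ) {
    my $chnl = $1;    # new lexical, visible only inside this block
    $page = $2;       # assigns to the outer variable
}

# The inner $chnl is gone here; the outer one was never assigned.
print defined $chnl ? "chnl=$chnl\n" : "chnl is undef\n";
print defined $page ? "page=$page\n" : "page is undef\n";
```

Dropping the inner my (or declaring and assigning in the if condition, as in the rewrite that follows) makes both values survive the block.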
I recommend simplifying your parsing function to the following:
sub parse_test_logs {
my $log = "sample.log";
open my $log_fh, "<", $log;
while (<$log_fh>) {
if ( my ( $chnl, $page ) = /(.*):.*solo_(.*): queueing.*/ ) {
push @{$dba_ptr}, { 'ch_no' => $chnl, 'pg_no' => $page };
} else {
warn "regex did not match for line $.: $_";
}
}
close $log_fh;
}
Finally, it's possible that you already are, but I just want to pass on the ever necessary advice to always include use strict; and use warnings; at the top of EVERY Perl script.
Can I use both ways to localize $| or should I use one in favor of the other?
Way 1: backup old value of $| in "_init_scr" and set back $| to the old value when "_end_win" is called.
Way 2: calling local $| = 1 after "_init_scr" is called.
package Package_name;
# ...
sub _init_scr {
my ( $arg ) = @_;
$arg->{old_handle} = select( $arg->{handle_out} );
#$arg->{backup_flush} = $|; # way 1
$| = 1;
# ...
}
sub _end_win {
my ( $arg ) = @_;
# ...
#$| = $arg->{backup_flush}; # way 1
select( $arg->{old_handle} );
}
sub choose {
my $arg = ...;
# ...
_init_scr( $arg );
# way 2 - instead of setting `$|` in "_init_scr" set it here:
#local $| = 1;
# ...
while ( 1 ) {
my $c = _getch( $arg );
# ...;
given ( $c ) {
# ...
when ( $c == CONTROL_C ) {
_end_win( $arg );
print "^C";
kill( 'INT', $$ );
return;
}
when ( $c == KEY_ENTER ) {
# ...
_end_win( $arg );
return $result;
}
}
}
}
Use local. That way, $| is restored no matter how the sub is exited (exception, early return, etc).
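As a quick illustration (a sketch with a made-up sub name), the value set with local is undone even when the sub is left through an exception:

```perl
use strict;
use warnings;

# Hypothetical sub that turns on autoflush and then fails.
sub noisy {
    local $| = 1;                     # autoflush on for this dynamic scope only
    die "something went wrong\n";     # leave the sub via an exception
}

$| = 0;
eval { noisy() };                     # the exception is caught here
print "autoflush after exception: $|\n";
```

No explicit restore call is needed on any of the exit paths, which is the point of preferring local over a manual backup.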
By the way, you could use select()->flush; instead of toggling $| back and forth.
use IO::Handle qw( ); # Required in older versions of Perl.
print "^C";
select()->flush();
That said, the advantage of local $| is gone since you need to call _end_win to clean up anyway. So let's get rid of the need for _end_win.
use Sub::ScopeFinalizer qw( scope_finalizer );
sub _local_scr {
my ( $arg ) = @_;
my $old_autoflush = $|;
my $old_handle = select( $arg->{handle_out} );
$| = 1;
return scope_finalizer {
$| = $old_autoflush;
select($old_handle);
};
}
sub choose {
my $arg = ...;
my $guard = _local_scr( $arg );
while ( 1 ) {
...
print "^C";
kill( 'INT', $$ );
return;
...
}
}
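If pulling in Sub::ScopeFinalizer is not an option, the same guard idea can be sketched with core Perl only: an object whose DESTROY runs a callback when it goes out of scope. My::Guard and with_autoflush are hypothetical names, and this simplified version only handles $| on the currently selected handle (it needs Perl 5.14 or later for the package block syntax):

```perl
use strict;
use warnings;

# Tiny guard class: runs the stored callback when the object is destroyed.
package My::Guard {
    sub new {
        my ( $class, $code ) = @_;
        return bless { code => $code }, $class;
    }
    sub DESTROY { $_[0]{code}->() }
}

sub with_autoflush {
    my $old = $|;
    $| = 1;
    # The guard restores $| when $guard leaves scope, on any exit path.
    my $guard = My::Guard->new( sub { $| = $old } );
    return "autoflush inside: $|";
}
```

A call such as my $msg = with_autoflush(); leaves $| at whatever value it had before the call.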
If you want to localize the value, just use local. It will handle restoring the original value when the scope where it was localized is exited without any additional effort (or chance for mistakes) on your part.
Third way:
use IO::Handle;
# ...
$arg->{handle_out}->autoflush(1);
# ...
$arg->{handle_out}->autoflush(0);
There are various other convenient methods available in IO::Handle.
You should use local $| = 1;, as this is the idiomatic Perl way of doing this. It's simpler than keeping track of the value in another variable.
Use an extra set of braces (if needed) to create a scope so that it only applies to the part of the program that should have it set.
{
#Buffering is turned off only in here.
local $| = 1;
unbuffered_commands();
}
buffered_commands();
Based on the answer provided here, I am attempting to validate whether or not a perl module is installed.
For this, I have:
# &getYN and &prompt are only included here for completeness
sub getYN {
unless ( $autoyes =~ /[Yy]/ ) {
my ( $prompt, $default ) = @_;
my $defaultValue = $default ? "[$default]" : "";
print "$prompt $defaultValue: ";
chomp( my $input = <STDIN> );
return $input ? $input : $default;
} else {
return "Y";
}
}
sub prompt {
my ( $prompt, $default ) = @_;
my $defaultValue = $default ? "[$default]" : "";
print "$prompt $defaultValue: ";
chomp( my $input = <STDIN> );
return $input ? $input : $default;
}
&chklib("RRDTool::OO");
sub chklib {
my $lib = shift;
eval { require $lib; };
if ($@) {
print "You are missing a required Perl Module: $lib\n";
my $ok = &getYN( "Shall I attempt to install it for you?", "y" );
if ( $ok =~ /[Yy]/ ) {
require CPAN;
CPAN::install($lib);
} else {
print "Installation requires $lib\n";
exit;
}
}
}
This runs as expected, but for some reason, the eval returns that I don't have RRDTool::OO installed, when, in fact, I do.
If I create an empty file and run:
# File foo.pl
use strict;
$| = 1;
use RRDTool::OO;
Then I get no errors.
But when I run the first file with print $@;, it returns:
Can't locate RRDTool::OO in ...
What am I doing wrong?
You have to check the result of the eval, like
if (eval("require xxx;")) {
print "you have it\n";
} else {
print "you don't\n";
}
What is happening is that
$lib = "RRDTool::OO";
eval { require $lib }
is executed with the stringified expression
require "RRDTool::OO"
not the bareword style
require RRDTool::OO
so it is looking for a file called RRDTool::OO in your @INC path instead of a file called RRDTool/OO.pm.
If you want to use require at run-time with a variable expression, you'll want to either use the stringy form of eval
eval "require $lib"
or process the arg to require yourself
$lib = "RRDTool::OO";
$lib =~ s{::}{/}g;
eval { require "$lib.pm" }
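Putting the second approach together, here is a minimal sketch of a check that works on a module name held in a variable. The helper name module_available is made up for illustration, and No::Such::Module is assumed not to be installed:

```perl
use strict;
use warnings;

# Returns 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # RRDTool::OO -> RRDTool/OO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ( 'List::Util', 'No::Such::Module' ) {
    printf "%-20s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

Testing the return value of eval (the trailing 1) rather than $@ avoids depending on $@ still holding the error afterwards.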
I am new to object-oriented programming in Perl, so I have a silly question.
What --
I am writing a script which will do something and write result to stream ( STDOUT or NETWORK ).
How --
[main.pl]
#!/usr/bin/perl
use strict;
require const::StreamTypes;
require output;
my $out = output->new("output");
$out->writeLine("Sample output");
[output.pm]
#!/usr/bin/perl
use strict;
require const::StreamTypes;
package output;
my $OUTSTR;
sub new{
my $class = shift();
my $stream = shift();
if($stream eq const::StreamTypes->STDNET){
}elsif($stream eq const::StreamTypes->STDWEB){
}else{
*OUTSTR = *STDOUT;
}
my $self = {
"_outStream" => $stream,
"_outStreamPtr" => $OUTSTR
};
bless($self, $class);
}
sub writeLine{
my $msg = shift();
print(OUTSTR "$msg\n");
}
return 1;
So, can anyone help me understand what is going wrong here? The program runs without error but produces no output.
Thanks!
I changed a couple of things here:
the first parameter of a method is the invocant (instance or class) itself
indirect file handles are globals!
the autodie module comes in handy, if using open
consider using strict in your modules, too
I would not recommend using a file-scoped variable shared by all instances ( my $OUTSTR; ), because that's going to be messy with multiple instances that want to have different streams.
And I definitely got into the habit of using accessors for all attributes. You can use a lightweight system like Class::Accessor, or perhaps you are even lucky enough to use Moose or Mouse. Of course there are a couple of other modules that also provide accessors in different ways.
package output;
use strict;
use warnings;
use autodie;
use Class::Accessor "moose-like";
has "outStream" => ( is => 'rw' );
sub new{
my ( $class, $stream ) = @_;
my $self = bless( {}, $class );
if ( 0 ) {
# ...
} else {
open( my $outStream, '>&', \*STDOUT );
$self->outStream( $outStream );
}
return $self;
}
sub writeLine{
my ( $self, $msg ) = @_;
print { $self->outStream } "$msg\n";
}
return 1;
Moose would create a constructor for you, but you can insert your parameter processing as easily as follows:
use Moose;
has "outStream" => ( is => 'rw' );
sub BUILDARGS {
my ( $class, $stream ) = @_;
open( my $outStream, '>&', \*STDOUT );
return {
outStream => $outStream,
};
}
$OUTSTR and *OUTSTR are very different things -- you should clear up your misunderstanding about this before you worry about object oriented programming.
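A short sketch of the difference, run in a plain script rather than in the output package: assigning to the glob *OUTSTR makes the bareword handle OUTSTR an alias for STDOUT, but it does not touch the lexical scalar $OUTSTR at all.

```perl
use strict;
use warnings;

my $OUTSTR;            # a lexical scalar, unrelated to the glob below
*OUTSTR = *STDOUT;     # aliases the package glob OUTSTR to STDOUT

# The scalar was never assigned, so it is still undef.
print defined $OUTSTR ? "scalar is set\n" : "scalar is still undef\n";

# The bareword handle works, because the glob now points at STDOUT.
print OUTSTR "printing through the OUTSTR glob works\n";
```

So storing $OUTSTR in the object, as the constructor in the question does, stores undef rather than a handle.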
That said, you can probably fix this script by getting everything to refer to $OUTSTR:
...
}else{
$OUTSTR = *STDOUT;
}
...
print $OUTSTR "$msg\n";
How about just passing a file handle directly into the object's constructor?
package output;
sub new {
my ($class, $fh) = @_;
bless { file_handle => $fh }, $class;
}
sub writeLine {
my $self = shift;
my $line = shift;
print {$self->{file_handle}} $line;
}
1;
Example usage:
my $output = output->new(\*STDOUT); # write to stdout
my $socket = IO::Socket::INET->new('www.perl.org', PeerPort => 'http(80)', Proto => 'tcp');
my $output = output->new($socket); # write to a socket
Please don't use barewords for file handles. Use lexical file handles.
The following lines assume that there is a hash %type_handlers somewhere that looks something like this:
{ const::StreamTypes->STDNET => \&constructor_for_stdnet_handles
, const::StreamTypes->STDWEB => \&constructor_for_stdweb_handles
}
Then you can replace the bottom of your constructor with:
my $handler = $type_handlers{ $stream };
my $outstr
= $handler ? $handler->()
: do { my $h; open( $h, '>&', \*::STDOUT ) and $h; }
;
return bless( {
_outStream => $stream
, _outStreamPtr => $outstr
}
, $class
);
Then writeLine becomes:
sub writeLine {
my ( $self, $msg ) = @_;
( $self->{_outStreamPtr} || *::STDOUT{IO} )->say( $msg );
}
The method is a little more robust in cases where somebody just blessed themselves into your class.
my $q_and_d = bless {}, 'output';
If you don't want to allow "quick & dirty" instances, and want more precise messages from possible failures, you could do this:
Carp::croak( 'No outstream!' )
unless my $h = Params::Util::_HANDLE( $self->{_outStreamPtr} )
;