I would like to watch a set of files for changes, and do so without a large CPU and battery penalty. Ideally, my Perl code would run on both macOS and Linux, but the former is more important.
I tried Mac::FSEvents, which works on macOS and seems to handle directories nicely, but not individual files as far as I can tell:
use IO::Select;
use Mac::FSEvents;

my $fs  = Mac::FSEvents->new('try.txt');
my $fh  = $fs->watch;
my $sel = IO::Select->new($fh);
while ( $sel->can_read ) {
    my @events = $fs->read_events;
    for my $event ( @events ) {
        printf "File %s changed\n", $event->path;
    }
}
which simply does not respond; and the promisingly more OS-agnostic
use File::Monitor;

my $monitor = File::Monitor->new();
my @files = qw(try.txt);
foreach (@files) { $monitor->watch($_); }
which consumes 100% CPU. The $monitor->watch() call alone does not block. I also tried
use File::Monitor;

my $monitor = File::Monitor->new();
$monitor->watch('try.txt', sub {
    my ($name, $event, $change) = @_;
    print "file has changed\n";
});
but this immediately returns.
I found another,
use File::ChangeNotify;

my $watcher =
    File::ChangeNotify->instantiate_watcher(
        directories => [ './' ],
        filter      => qr/try\.txt/,
    );
# blocking
while ( my @events = $watcher->wait_for_events() ) {
    print "file has changed\n";
}
but the CPU utilization is again high (70%).
Maybe these are all the wrong CPAN modules, too. Could someone please give me advice on how I should do this?
regards,
/iaw
Partial (macOS-specific) example:
use IO::Select;
use Mac::FSEvents;

my $fs = Mac::FSEvents->new(
    path        => ['./try.txt', './try2.txt'],
    file_events => 1,
);
my $fh  = $fs->watch;
my $sel = IO::Select->new($fh);
while ( $sel->can_read ) {
    my @events = $fs->read_events;
    for my $event ( @events ) {
        printf "File %s changed\n", $event->path;
    }
}
(i.e., it needed the file_events flag.)
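For the Linux half of the original question, a roughly equivalent low-CPU watch can be written with Linux::Inotify2. This is only a sketch; it assumes that module is installed and that try.txt exists when the watch is created:
use strict;
use warnings;
use Linux::Inotify2;

my $inotify = Linux::Inotify2->new
    or die "unable to create inotify object: $!";

$inotify->watch( 'try.txt', IN_MODIFY | IN_CLOSE_WRITE, sub {
    my ($event) = @_;
    printf "File %s changed\n", $event->fullname;
} ) or die "watch creation failed: $!";

# poll() blocks in the kernel until events arrive, so there is no busy-waiting
1 while $inotify->poll;
Like the FSEvents version, this blocks until the kernel reports a change rather than polling, so CPU usage stays near zero.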
Related
I have to write a script that gets some URLs in parallel and does some work. In the past I have always used Parallel::ForkManager for such things, but now I wanted to learn something new and try asynchronous programming with AnyEvent (and AnyEvent::HTTP or AnyEvent::Curl::Multi) ... but I'm having problems understanding AnyEvent and writing a script that should:
open a file (every line is a separate URL)
(from now on in parallel, but with a limit of e.g. 10 concurrent requests)
read the file line after line (I don't want to load the whole file into memory - it might be big)
make an HTTP request for that URL
read the response
update the MySQL record accordingly
(next file line)
I have read many manuals and tutorials, but it's still hard for me to understand the differences between blocking and non-blocking code. I have found a similar script at http://perlmaven.com/fetching-several-web-pages-in-parallel-using-anyevent, where Mr. Szabo explains the basics, but I still can't understand how to implement something like:
...
open my $fh, "<", $file;
while ( my $line = <$fh> )
{
# http request, read response, update MySQL
}
close $fh;
...
... and add a concurrency limit in this case.
I would be very grateful for help ;)
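For reference, here is one minimal way the concurrency limit could be expressed with AnyEvent::HTTP: keep a counter of in-flight requests, and have every completion callback pull more lines from the file until the limit is reached again. This is a sketch under assumptions - the file name urls.txt and the printed status line are placeholders, and the MySQL step is left as a comment:
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my $max_concurrent = 10;
my $active         = 0;

open my $fh, '<', 'urls.txt' or die "can't open urls.txt: $!";

my $cv = AnyEvent->condvar;

sub start_more {
    while ( $active < $max_concurrent and defined( my $line = <$fh> ) ) {
        chomp( my $url = $line );
        $active++;
        $cv->begin;
        http_get $url, sub {
            my ( $body, $hdr ) = @_;
            # read response / update MySQL here (note: blocking DBI calls
            # will stall the event loop for their duration)
            print "$hdr->{Status} $url\n";
            $active--;
            start_more();    # refill the pool before releasing this slot
            $cv->end;
        };
    }
}

$cv->begin;      # keep the condvar alive while the initial batch is queued
start_more();
$cv->end;

$cv->recv;       # run the event loop until every request has finished
close $fh;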
UPDATE
Following ikegami's advice I gave Net::Curl::Multi a try. I'm very pleased with the results. After years of using Parallel::ForkManager just to grab thousands of URLs concurrently, Net::Curl::Multi seems to be awesome.
Here is my code with a while loop over the filehandle. It seems to work as it should, but considering it's my first time writing something like this, I would like to ask more experienced Perl users to take a look and tell me if there are potential bugs, something I missed, etc.
Also, if I may ask: as I don't fully understand how Net::Curl::Multi's concurrency works, please tell me whether I should expect any problems with putting a MySQL UPDATE command (via DBI) inside the RESPONSE loop (besides higher server load, obviously - I expect the final script to run with about 50 concurrent N::C::M workers, maybe more).
#!/usr/bin/perl
use strict;
use warnings;

use Net::Curl::Easy qw( :constants );
use Net::Curl::Multi qw( );

sub make_request {
    my ( $url ) = @_;
    my $easy = Net::Curl::Easy->new();
    $easy->{url} = $url;
    $easy->setopt( CURLOPT_URL,        $url );
    $easy->setopt( CURLOPT_HEADERDATA, \$easy->{head} );
    $easy->setopt( CURLOPT_FILE,       \$easy->{body} );
    return $easy;
}

my $maxWorkers = 10;

my $multi   = Net::Curl::Multi->new();
my $workers = 0;

my $i = 1;
open my $fh, "<", "urls.txt" or die "can't open urls.txt: $!";
LINE: while ( my $url = <$fh> ) {
    chomp( $url );
    $url .= "?$i";
    print "($i) $url\n";
    my $easy = make_request( $url );
    $multi->add_handle( $easy );
    $workers++;

    my $running = 0;
    do {
        my ($r, $w, $e) = $multi->fdset();
        my $timeout = $multi->timeout();
        select $r, $w, $e, $timeout / 1000
            if $timeout > 0;

        $running = $multi->perform();
        RESPONSE: while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
            $multi->remove_handle( $easy );
            $workers--;
            printf( "%s getting %s\n", $easy->getinfo( CURLINFO_RESPONSE_CODE ), $easy->{url} );
        }

        # don't max out the CPU while waiting
        select( undef, undef, undef, 0.01 );
    } while ( $workers == $maxWorkers || ( eof && $running ) );

    $i++;
}
close $fh;
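As a side note on the DBI question above, a minimal sketch of how an UPDATE might sit inside the RESPONSE loop could look like the following. The DSN, credentials, table and column names are made up for illustration. DBI calls are synchronous, so each UPDATE pauses all transfers for its duration; preparing the statement handle once, outside the loop, keeps that pause small.
use DBI;

# assumed DSN, credentials and schema -- adjust to your setup
my $dbh = DBI->connect(
    'DBI:mysql:database=mydb;host=localhost',
    'user', 'password',
    { RaiseError => 1, AutoCommit => 1 },
);
my $sth = $dbh->prepare('UPDATE urls SET http_status = ?, body = ? WHERE url = ?');

# ... and then, inside the RESPONSE loop, right after remove_handle():
RESPONSE: while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
    $multi->remove_handle( $easy );
    $workers--;
    $sth->execute(
        $easy->getinfo( CURLINFO_RESPONSE_CODE ),
        $easy->{body},
        $easy->{url},
    );
}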
Net::Curl is a rather good library that's extremely fast. Furthermore, it can handle parallel requests too! I'd recommend using this instead of AnyEvent.
use strict;
use warnings;

use Net::Curl::Easy qw( :constants );
use Net::Curl::Multi qw( );

sub make_request {
    my ( $url ) = @_;
    my $easy = Net::Curl::Easy->new();
    $easy->{url} = $url;
    $easy->setopt( CURLOPT_URL,        $url );
    $easy->setopt( CURLOPT_HEADERDATA, \$easy->{head} );
    $easy->setopt( CURLOPT_FILE,       \$easy->{body} );
    return $easy;
}

my $max_running = 10;
my @urls = ( 'http://www.google.com/' );

my $multi = Net::Curl::Multi->new();

my $running = 0;
while (1) {
    while ( @urls && $running < $max_running ) {
        my $easy = make_request( shift( @urls ) );
        $multi->add_handle( $easy );
        ++$running;
    }

    last if !$running;

    my ( $r, $w, $e ) = $multi->fdset();
    my $timeout = $multi->timeout();
    select( $r, $w, $e, $timeout / 1000 )
        if $timeout > 0;

    $running = $multi->perform();
    while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
        $multi->remove_handle( $easy );
        printf( "%s getting %s\n", $easy->getinfo( CURLINFO_RESPONSE_CODE ), $easy->{url} );
    }
}
This does exactly what you want, in an asynchronous fashion, and it does so by wrapping Net::Curl safely:
#!/usr/bin/env perl
package MyDownloader;
use strict;
use warnings qw(all);

use Moo;
extends 'YADA::Worker';

has '+use_stats' => (default => sub { 1 });
has '+retry'     => (default => sub { 10 });

after init => sub {
    my ($self) = @_;

    $self->setopt(
        encoding => '',
        verbose  => 1,
    );
};

after finish => sub {
    my ($self, $result) = @_;

    if ($self->has_error) {
        print "ERROR: $result\n";
    } else {
        # do the interesting stuff here
        printf "Finished downloading %s: %d bytes\n", $self->final_url, length ${$self->data};
    }
};

around has_error => sub {
    my $orig = shift;
    my $self = shift;

    return 1 if $self->$orig(@_);
    return 1 if $self->getinfo('response_code') =~ m{^5[0-9]{2}$}x;
};

1;

package main;
use strict;
use warnings qw(all);

use Carp;
use YADA;

my $q = YADA->new(
    max     => 8,
    timeout => 30,
);

open(my $fh, '<', 'file_with_urls_per_line.txt')
    or croak "can't open queue: $!";
while (my $url = <$fh>) {
    chomp $url;
    $q->append(sub {
        MyDownloader->new($url)
    });
}
close $fh;
$q->wait;
I need to query different LDAP servers in Perl via Net::LDAP. I have something that works well. However, in an attempt to speed things up, I tried to query the different servers in parallel, using Parallel::ForkManager - and things do not work when I do that.
I get the following types of errors:
decode error 02<=>30 0 8 at /Users/myname/perl5/perlbrew/perls/perl-5.18.2/lib/site_perl/5.18.2/Convert/ASN1/_decode.pm line 113, <> line 18.
decode error 43<=>30 0 8 at /Users/myname/perl5/perlbrew/perls/perl-5.18.2/lib/site_perl/5.18.2/Convert/ASN1/_decode.pm line 113, <> line 25.
at the line that gets the search response:
$mesg = $ldap->search( base => $dn, filter => '(CN=*)');
I am puzzled.
To put it in other words, why does this fail:
use Net::LDAP;
use Parallel::ForkManager;
use Try::Tiny;

...;    # create $pm and bind the LDAP servers into @servers

while (<>) {
    chop;
    my $dn = $_;
    foreach my $ldap (@servers) {
        my $pid;
        $pid = $pm->start and next;    # do the fork
        print $dn, $pid;
        my $mesg;
        try {
            $mesg = $ldap->search( base => $dn, filter => '(CN=*)' );
        } catch {
            ...;
        };
        $pm->finish;
    }
}
while this:
use Net::LDAP;
use Try::Tiny;

...;    # bind the LDAP servers into @servers

while (<>) {
    chop;
    my $dn = $_;
    foreach my $ldap (@servers) {
        print $dn;
        my $mesg;
        try {
            $mesg = $ldap->search( base => $dn, filter => '(CN=*)' );
        } catch {
            ...;
        };
    }
}
works perfectly?
Whilst forking doesn't have quite the same thread-safety problems as threading, there are still a few places where you hit gotchas. I think this is what's biting you - your Net::LDAP objects are created in the parent process, but then (effectively) cloned into each child when you fork.
Which means in your code there's a very real possibility that, if the list of names comes in fast enough, a new fork will try to reuse an existing Net::LDAP connection before a previous one is finished with it.
The easy way of preventing this is to call wait_all_children, to ensure all your parallel LDAP queries are finished before the next one starts.
If you put your LDAP bind within the ForkManager loop, do you still have the same problem? I appreciate that's a potential overhead, as you'll be binding on each iteration, but if that addresses it, I'd suggest it's because Net::LDAP is sharing the same file descriptors between forks.
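To make that suggestion concrete, here is a minimal sketch with the connection created after the fork; the host list and worker count are placeholders:
use strict;
use warnings;
use Net::LDAP;
use Parallel::ForkManager;

my @hosts = qw( ldap1.example.com ldap2.example.com );   # placeholder hostnames
my $pm    = Parallel::ForkManager->new( scalar @hosts );

while ( my $dn = <> ) {
    chomp $dn;
    foreach my $host (@hosts) {
        $pm->start and next;                 # child from here on
        my $ldap = Net::LDAP->new($host)
            or die "can't connect to $host: $@";
        $ldap->bind;                         # each child gets its own socket
        my $mesg = $ldap->search( base => $dn, filter => '(CN=*)' );
        # ... process $mesg ...
        $ldap->unbind;
        $pm->finish;
    }
    $pm->wait_all_children;                  # finish this DN before the next one
}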
The next best solution would be to adopt a 'worker' model, where you've got a bunch of workers, each with their own LDAP connection, doing the querying. That's easier with threading than with forking; it goes a bit like this:
#!/usr/bin/perl
use strict;
use warnings;

use threads;
use Thread::Queue;

my @list_of_servers = qw( ldap1 ldap2 );    # placeholder host names

sub server_worker {
    my ( $hostname, $q ) = @_;
    ## connect LDAP to $hostname
    while ( my $dn = $q->dequeue ) {
        # query $dn on this server's connection
    }
}

my @queues;
foreach my $server (@list_of_servers) {
    my $server_q = Thread::Queue->new();
    push( @queues, $server_q );
    threads->create( \&server_worker, $server, $server_q );
}

while ( my $dn = <STDIN> ) {
    chomp($dn);
    foreach my $q (@queues) {
        $q->enqueue($dn);
    }
}

foreach my $q ( @queues ) {
    $q->end;
}

foreach my $thr ( threads->list ) {
    $thr->join();
}
Doing something similar with forking should work:
#!/usr/bin/perl
use strict;
use warnings;

use IO::Pipe;
use Parallel::ForkManager;
use Net::LDAP;

my @list_of_servers = qw ( servername servenama anotherserver );

my $pm = Parallel::ForkManager->new( scalar @list_of_servers );

my %pipe_for;

foreach my $server ( @list_of_servers ) {
    my $pipe = IO::Pipe->new();
    my $pid  = $pm->start;
    if ( $pid ) {
        # parent: keep the write end of this child's pipe
        print "$$: parent\n";
        $pipe->writer->autoflush;
        $pipe_for{$server} = $pipe;
    }
    else {
        print "$$ child connecting to $server\n";
        $pipe->reader->autoflush;
        close( STDIN );    # because this is a child
        # drop write ends inherited from earlier iterations, so each child
        # sees EOF once the parent closes its pipes
        $_->close for values %pipe_for;
        # Net::LDAP setup
        while ( my $item = <$pipe> ) {
            chomp( $item );
            # ldap_search $item;
        }
        $pm->finish;       # only the child calls finish
    }
}
And then send stuff:
for my $number ( 1..10 ) {
    foreach my $pipe ( values %pipe_for ) {
        print {$pipe} "test$number\n";
    }
}

# close the write ends so the children see EOF, leave their loops and exit
$_->close for values %pipe_for;

$pm->wait_all_children();
Edit: Note - autoflush is important, otherwise the IO buffers and it doesn't look like it's working. I'm pretty sure closing STDIN in the child is a good idea, but perhaps not vitally necessary if the children don't use it.
I am trying to build a program which creates some forks and writes the results of the forks back to the main program. To do this I am trying to use IO::Pipe:
sub ForkRequests {
    my $pipe = IO::Pipe->new();
    my $pid;
    foreach my $feature ( @features ) {
        if ( $pid = fork() ) {
            $pipe->reader();
            while ( <$pipe> ) {
                print $_ . "\n";
            }
        }
        elsif ( defined $pid ) {
            # child
            $pipe->writer();
            # some calculations [...]
            print $pipe $calcresults;
        }
    }
}
I got my code for creating a pipe from the module's documentation.
If I now try to execute it, I get these error messages:
Can't locate object method "reader" via package "IO::Pipe::End" at lmtest3.pl line 56.
Can't locate object method "writer" via package "IO::Pipe::End" at lmtest3.pl line 63.
Can't locate object method "reader" via package "IO::Pipe::End" at lmtest3.pl line 56, <GEN0> line 1.
Can't locate object method "writer" via package "IO::Pipe::End" at lmtest3.pl line 63, <GEN0> line 1.
So, my code does not seem to hand me an IO::Pipe object, but an IO::Pipe::End.
So my question is: can anybody see the mistake in there? Why does it return the wrong object, and how would this be done correctly?
EDIT
I have some requests to make to some servers (most of the time, one request to seven servers).
Those request names are saved in @features, and will be executed at the point marked # calculations.
Because the server responses are pretty slow, I want those requests to start in parallel. They all have to get back to the main program and print their replies to the console.
I tried this code
sub ForkRequests {
    my $i = 0;
    my @pipes;
    my $pid;

    foreach my $feature ( @features ) {
        $pipes[$i] = IO::Pipe->new();
        if ( $pid = fork() ) {
            $pipes[$i]->reader();
        }
        elsif ( defined $pid ) {
            # child
            $pipes[$i]->writer();
            # calculations
            my $w = $pipes[$i];
            print $w $calc;
            print $w "end\n";
        }
        $i++;
    }

    if ( $pid == 1 ) {
        while ( 1 ) {
            foreach my $pipe ( @pipes ) {
                while ( <$pipe> ) {
                    unless ( $_ == "end" ) {
                        print $_ . "\n";
                    }
                    else { last; }
                }
            }
        }
    }
    else {
        exit;
    }
}
as said, to save those pipes, but I still have a problem reading from them, as the program exits before it gets the answers.
The problem is that you are forking multiple child processes but trying to use the same pipe for all of them.
The reader method converts $pipe into an IO::Pipe::End object that you can read data from, so the first child is connected correctly. But you then call reader again on the same $pipe, and the error is thrown because it is no longer an object of the right class.
You simply need to create a new pipe for each child process:
sub fork_requests {
    for my $feature ( @features ) {
        my $pipe = IO::Pipe->new;
        my $pid;
        if ( $pid = fork ) {
            $pipe->reader;
            print while <$pipe>;
        }
        elsif ( defined $pid ) {
            $pipe->writer;
            # some calculations . . .
            print $pipe $calcresults;
            exit;
        }
    }
}
Update
Okay I think I understand what it is you need. This complete program should show you.
I have written fork_requests so that it expects a list of features as parameters, and I have written the child code so that it sleeps for two seconds to emulate the processing time and then simply prints the name of the feature.
The parent code stores all the pipes in an array, as I suggested, and prints the output from each of them in the order they were queued. All five child processes complete after two seconds, so the parent is suspended for that time and then prints the features originally passed in.
use strict;
use warnings;

use IO::Pipe;

STDOUT->autoflush;

fork_requests('A' .. 'E');

sub fork_requests {
    my @pipes;

    for my $feature ( @_ ) {
        my $pipe = IO::Pipe->new;
        my $pid;
        if ( $pid = fork ) {
            $pipe->reader;
            push @pipes, $pipe;
        }
        elsif ( defined $pid ) {
            $pipe->writer;
            select $pipe;
            # some calculations . . .
            sleep 2;
            my $calcresults = $feature;
            print $calcresults, "\n";
            exit;
        }
    }

    for my $pipe ( @pipes ) {
        print while <$pipe>;
    }
}
output
A
B
C
D
E
I installed File::ChangeNotify on a Windows system and tried to run the following code:
my $watcher =
    File::ChangeNotify->instantiate_watcher(
        directories => [ 'C:\files' ],
        filter      => qr/\.txt$/,
    );

# blocking
while ( my @events = $watcher->wait_for_events() ) { print "new event" }
When I ran the script and then created a new .txt file or modified a .txt file under C:\files, the script didn't print anything.
It works for me (on Linux) if I add this line:
$| = 1;
Then I see new event.
Refer to perldoc perlvar: $| or $OUTPUT_AUTOFLUSH
Here is the complete code:
use warnings;
use strict;
use File::ChangeNotify;
$| = 1;
my $watcher =
File::ChangeNotify->instantiate_watcher
( directories => [ 'C:\files' ],
filter => qr/\.txt$/
);
# # blocking
while ( my #events = $watcher->wait_for_events() ) { print "new event"}
UPDATE: As cjm astutely points out, adding a newline works as an alternative to $|:
while ( my @events = $watcher->wait_for_events() ) { print "new event\n" }
Would it be OK to keep the END block in this example (because nobody wants a broken terminal), or shouldn't I put an END block in a module at all?
package My_Package;
use warnings;
use strict;
use feature 'switch';                                         # needed for given/when
no if $] >= 5.018, warnings => 'experimental::smartmatch';    # keep newer perls quiet
use Term::ReadKey;

sub _init_scr {
    my ( $arg ) = @_;
    $arg->{backup_flush} = $|;
    $| = 1;
    Term::ReadKey::ReadMode 'ultra-raw';
}

sub _end_win {
    my ( $arg ) = @_;
    print "\n\r";
    Term::ReadKey::ReadMode 'restore';
    $| = $arg->{backup_flush};
}

END {
    Term::ReadKey::ReadMode 'restore';
}

sub my_function {
    my $arg = {};
    _init_scr( $arg );
    while ( 1 ) {
        my $c = ReadKey 0;
        if ( ! defined $c ) {
            _end_win( $arg );
            warn "EOT";
            return;
        }
        next if $c eq "\e";
        given ( $c ) {
            when ( $c ge 'a' && $c le 'z' ) {
                print $c;
                $arg->{string} .= $c;
            }
            when ( $c eq "\cC" ) {
                _end_win( $arg );
                print STDERR "^C";
                kill( 'INT', $$ );
                return;
            }
            when ( $c eq "\r" ) {
                _end_win( $arg );
                return $arg->{string};
            }
        }
    }
}
If your module changes the terminal mode, then I would think the most polite thing to do would be for it to also install an END block to restore the terminal mode before the program exits.
No, it's polite and expected that you put things back as you found them.
However, it's unwelcome to tidy up someone else's workspace unless you've been asked to do so.
That is, your END routine shouldn't run unless it has reason to do so, and your module probably ought to allow a developer to disable the automatic cleanup. (E.g., use My_Package qw(:no_auto_restore).)
Failing that, the POD ought to explicitly document that the module fiddles with a system resource upon exit.
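One way such an opt-out could look is shown below; the :no_auto_restore tag and its handling are only an illustration, not an established convention:
package My_Package;
use strict;
use warnings;
use Term::ReadKey;

my $auto_restore = 1;

sub import {
    my ( $class, @args ) = @_;
    # caller opts out with: use My_Package qw(:no_auto_restore);
    $auto_restore = 0 if grep { $_ eq ':no_auto_restore' } @args;
}

END {
    # only restore the terminal if the caller did not opt out
    Term::ReadKey::ReadMode('restore') if $auto_restore;
}

1;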