Perl script: how to realize if a process has finished - perl

This is what I have created so far regarding your advice:
#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);
#some variables
my $message = "please wait, loading data...\n";
#First build the web page
print header;
print start_html('Hello World');
print "<H1>we need love, peace and harmony</H1>\n";
print "<p>$message</p>\n";
#Establish a pipeline between the bash and my script.
my $bash_command = '/love/peace/harmony/./lovepeace.bash';
open(my $pipe, '-|', $bash_command) or die $!;
while (my $line = <$pipe>){
# Do something with each line.
print "<p>$line</p>\n";
}
#when is the job done...?
print end_html;
When I call that .pl script in my browser, everything works nice :-) But a few questions are still on my mind:
When I call this website, it is busy loading some values from the pipe. Since there are about 10 Values its rather quick (2-4 seconds) But if I would have 100+ Values, the user has to wait a while. Since I cannot have a progress bar, I should provide an information to the user.
Like:"Loading data, please wait..."
And when the job is done, this message should say: "Job done" or something similar.
• How do I realize if the process is finnished?
• Can I reload the page if the job is done ?
• Is there any chance of using my own stylesheet wihtin this perl-CGI
Regards,
JJ

Randal Schwartz's Watching long processes through CGI might be helpful here.
As for using your own stylesheet, you can just specify that in the <head>...</head> section you are emitting.

It might make more sense to implement this as ajax. Allow the page to full render and then have an ajax request to watch the progress. Your page may not display nicely if it doesn't see the until the process is finished.
EDIT: Sorry, incomplete thought. A more thorough approach to see the line-by-line output would be to kick off that script as a background process writing output to a log, then having the ajax server code (a separate CGI) return the lines processed so far and a flag if the process has exited.

When your external process is complete, <$pipe> will return undef and your while loop
while (my $line = <$pipe>) { ... }
will finish (Until then, <$pipe> will block while waiting for input to be available). Therefore you can tell that the job is done when the while loop is done. If you can, you'll want to configure your external command (/love/peace/harmony/./lovepeace.bash) to not buffer its output.

Related

Perl crashing with Parallel::ForkManager and WWW::Mechanize

I have written a Perl Script using WWW::Mechanize which reads URLs from a text file and connects to them one by one. In each operation, it parses the content of the webpage looking for some specific keywords and if found, it will be written to the output file.
To speed up the process, I used Parallel::ForkManager with MAX_CHILDREN set to 3. Though I have observed an increase in the speed, the problem is that, after a while the script crashes. Perl.exe process gets killed and it does not display any specific error message.
I have run the script multiple times to see if it always fails at the same point, however the point of failure seems to be intermittent.
Please note that I have already taken care of any memory leaks in WWW::Mechanize and HTML::TreeBuilder::XPath as follows:
For WWW::Mechanize, I set stack_depth(0) so that it does not cache the history of visited pages.
HTML::TreeBuilder::XPath, I delete the root node once I am done with it. This approach helped me in resolving a memory leak issue in another similar script which does not use fork.
Here is the structure of the script, I have mentioned only the relevant parts here, please let me know if more details are required to troubleshoot:
#! /usr/bin/perl
use HTML::TreeBuilder::XPath;
use WWW::Mechanize;
use warnings;
use diagnostics;
use constant MAX_CHILDREN => 3;
open(INPUT,"<",$input) || die("Couldn't read from the file, $input with error: $!\n");
open(OUTPUT, ">>", $output) || die("Couldn't open the file, $output with error: $!\n");
$pm = Parallel::ForkManager->new(MAX_CHILDREN);
$mech=WWW::Mechanize->new();
$mech->stack_depth(0);
while(<INPUT>)
{
chomp $_;
$url=$_;
$pm->start() and next;
$mech->get($url);
if($mech->success)
{
$tree=HTML::TreeBuilder::XPath->new();
$tree->parse($mech->content);
# do some processing here on the content and print the results to OUTPUT file
# once done then delete the root node
$tree->delete();
}
$pm->finish();
print "Child Processing finished\n"; # it never reaches this point!
}
$pm->wait_all_children;
I would like to know, why does this Perl script keep failing after a while?
For understanding purpose, I added a print statement right after the finish method of fork manager, however it does not print that.
I have also used, wait_all_children method, since as per the document of the module on CPAN, it will wait for the processing to get over for all the children of the parent process.
I have not understood why, wait_all_children method is place outside the while or the for loop though (as observed in the documentation as well), since all the processing is taking place inside the loop.
Thanks.
As for why this code is written with a main job loop with the start and finish calls and then followed by a wait_all_children outside the loop. It works like this:
The parent process gets the next job from <INPUT> at the start of each loop.
The parent runs start, which causes the child process to fork. At this point, you have 2 processes, each of which is running the exact same code at the exact same point.
3a. The parent process hits that or next and jumps back to the top to read the next <INPUT> and start the process over.
3b. The child process does not hit the or next and continues running the code you give until it hits finish, where the child exits.
Meanwhile the parent process is busy going through the loop and creating a child each time through. After forking 3 children (or whatever you set the limit to) it blocks until one of the children exit. At which point, it immediately spawns a new child (resulting in step 3b for each child each time).
When the parent runs out of jobs, it jumps out the while loop (never having run anything in it itself) and then waits for all the remaining children to exit.
As you can see, any code in the loop after finish is called is never going to run in either the parent (because it doesn't do anything after or next within the loop) or the children (because they exit at finish).
I've never used Parallel::ForkManager, but it looks like you can put a run_on_finished hook to run some code at the finish if you want to put a print statement there at the end.
To find the problem, though, I'd suggest wrapping all the code between start and finish in an eval or use Try::Tiny and warn out the error to see if there's an exception happening in there that's breaking it. I'd expect such things to show up in STDERR when the child dies, though, so I'm not really sure that will help.
However, it's worth shot. Here's my suggestion in code, just showing the part I'd catch exceptions from:
# At the top add
use Try::Tiny;
# Later in your main loop
$pm->start() and next;
try {
$mech->get($url);
if($mech->success)
{
$tree=HTML::TreeBuilder::XPath->new();
$tree->parse($mech->content);
# do some processing here on the content and print the results to OUTPUT file
# once done then delete the root node
$tree->delete();
}
}
catch {
warn "Bad Stuff: ", $_;
};
$pm->finish();
That might help show you what's gone wrong.
If it does not help, you might try moving the try block to include more of the program (like nearly all of it after the use Try::Tiny line) and see if that elucidates anything.
The $pm->wait_all_children; function call waits for "ALL" the child processes to end and places a Blocking lock. I am not sure what kind of error handling you have done for $mech inside the if() statement, but you may want to re-visit that.

Perl - output from external process directly to stdout (avoid buffering)

I have a Perl script that has to wrap a PHP script that produces a lot of output, and takes about half an hour to run.
At moment I'm shelling out with:
print `$command`;
This works in the sense that the PHP script is called, and it does it's job, but, there is no output rendered by Perl until the PHP script finishes half an hour later.
Is there a way I could shell out so that the output from PHP is printed by perl as soon as it receives it?
The problem is that Perl's not going to finish reading until the PHP script terminates, and only when it finishes reading will it write. The backticks operator blocks until the child process exits, and there's no magic to make a read/write loop implicitly.
So you need to write one. Try a piped open:
open my $fh, '-|', $command or die 'Unable to open';
while (<$fh>) {
print;
}
close $fh;
This should then read each line as the PHP script writes it, and immediately output it. If the PHP script doesn't output in convenient lines and you want to do it with individual characters, you'll need to look into using read to get data from the file handle, and disable output buffering ($| = 1) on stdout for writing it.
See also http://perldoc.perl.org/perlipc.html#Using-open()-for-IPC
Are you really doing print `$command`?
If you are only running a command and not capturing any of its output, simply use system $command. It will write to stdout directly without passing through Perl.
You might want to investigate Capture::Tiny. IIRC something like this should work:
use strict;
use warnings;
use Capture::Tiny qw/tee/;
my ($stdout, $stderr, #result) = tee { system $command };
Actually, just using system might be good enough, YMMV.

CGI script's message not rendring in browser?

I'm trying to copy some files from one network share to another using File::Copy.
This is my code:
#!C:/strawberry/perl/bin/perl.exe
use File::Copy;
print "Content-type: text/html\n\n";
print "<H1>Hello World</H1>\n";
copy("s:\\nl\\cover\\config.jsp", "s:\\temp\\config.jsp")
or die "File cannot be copied.";
print "this is not displayed";
Why is the 'die' message not rendering?
If you are running this under a web server (I cannot imagine why, you are sending a "Content-Type" header), any error messages you emit using die and warn will go to the server's error log.
Further, if you are invoking this as CGI, note that you are lying to the browser by claiming you are sending HTML and not sending HTML.
Especially if you are just learning Perl, you should make an effort to dot all your is and cross all your ts:
#!C:/strawberry/perl/bin/perl.exe
use strict; # every time
use warnings; # every time
use CGI qw(:cgi);
use CGI::Carp qw(fatalsToBrowser); # only during debugging
use File::Copy;
use File::Spec::Functions qw(catfile);
$| = 1;
# prefer portable ways of dealing with filenames
# see http://search.cpan.org/perldoc/File::Spec
my $source = catfile(qw(S: n1 cover config.jsp));
my $target = catfile(qw(S: temp config.jsp));
print header('text/plain');
if ( copy $source => $target ) {
print "'$source' was copied to '$target'\n";
}
else {
print "'$source' was not copied to '$target'\n";
# you can use die if you want the error message to
# go to the error log and an "Internal Server Error"
# to be shown to the web site visitor.
# die "'$source' was not copied to '$target'\n";
}
See CGI for the function oriented interface import lists.
Are you sending your stderr to the stdout stream as well? All your prints will got to stdout which is presumably connected to a browser, given your HTML output.
However, die writes to the stderr stream. This is likely to go, not to the browser window, but to an error log of some sort. As to where it's going, it depends on what Perl is running within.
One way to check is to print something instead of dieing in the or clause.
So, some questions:
How are you running it?
If on the command line, show us the exact command.
If in a web server of some sort, tell us which one so we can find the logs for you.
die sends messages to STDERR, which will wind up in the web server's error logs, not on the screen. There are some CGI modules that offer you greater control over error-handling, or you could install a $SIG{__DIE__} handler (if you don't know what that is, then don't worry -- you don't need to), but when I want a quick-and-dirty way to debug my CGI scripts, I put this at the top of the script:
#! /usr/bin/perl
$src = join'',<DATA>;
eval $src;
print "Content-type: text/plain\n\n$#\n" if $#;
__END__
... my cgi script starts here ...
This loads the script into a variable, uses eval to run the Perl interpreter on that variable's contents, and prints any errors to standard output (the browser window) with a valid header.
copy("s:\\nl\\cover\\config.jsp", "s:\\temp\\config.jsp")
or die "File cannot be copied.";
print "this is not displayed";
Only one of these messages should ever be displayed and it's unclear which you're asking about.
The question says you're wondering why the die message isn't being displayed; to me, that implies that you're not seeing the message "File cannot be copied." and the most obvious reason for this is that the copy operation is succeeding, but see also the previous responses about looking in the error log if you're running this under CGI.
The text of the messages, though, suggests that you actually mean you're not seeing the message "this is not displayed". (Why else would you mention that it isn't displayed?) In that case, the reason you're not seeing it is because die causes the program to exit. After the copy fails and the die executes, your program is dead. Terminated. It has shuffled off this mortal CPU and joined the stack eternal. It wouldn't print "this is not displayed" if you put four million volts through it. It is an ex-process.
After editing your code, it's apparent that your die is seen as a command and probably needs to be escaped. Note how it is rendered on Stack Overflow in blue (indicating that it is a keyword). Try switching to a synonym like "shutdown" instead.

How can I quickly find the user's terminal PID in Perl?

The following snippet of code is used to find the PID of a user's terminal, by using ptree and grabbing the third PID from the results it returns. All terminal PID's are stored in a hash with the user's login as the key.
## If process is a TEMINAL.
## The command ptree is used to get the terminal's process ID.
## The user can then use this ID to peek the user's terminal.
if ($PID =~ /(\w+)\s+(\d+) .+basic/) {
$user = $1;
if (open(PTREE, "ptree $2 |")) {
while ($PTREE = <PTREE>) {
if ($PTREE =~ /(\d+)\s+-pksh-ksh/) {
$terminals{$user} = $terminals{$user} . " $1";
last;
}
next;
}
close(PTREE);
}
next;
}
Below is a sample ptree execution:
ares./home_atenas/lmcgra> ptree 29064
485 /usr/lib/inet/inetd start
23054 /usr/sbin/in.telnetd
23131 -pksh-ksh
26107 -ksh
29058 -ksh
29064 /usr/ob/bin/basic s=61440 pgm=/usr/local/etc/logon -q -nr trans
412 sybsrvr
I'd like to know if there is a better way to code this. This is the part of the script that takes longest to run.
Note: this code, along with other snippets, are inside a loop and are executed a couple of times.
I think the main problem is that this code is in a loop. You don't need to run ptree and parse the results more than once! You need to figure out a way to run ptree once and put it into a data structure that you can use later. Probably be some kind of simple hash will suffice. You may even be able to just keep around your %terminals hash and keep reusing it.
Some nitpicks...
Both of your "next" statements seem
unnecessary to me... you should be
able to just remove them.
Replace
$terminals{$user} = $terminals{$user} . " $1";
with:
$terminals{$user} .= " $1";
Replace the bareword PTREE which you
are using as a filehandle with
$ptreeF or some such... using
barewords became unnecessary for
filehandles about 10 years ago :)
I don't know why your $PID variable
is all caps... it could be confusing
to readers of your code because it
looks like there is something
special about that variable, and
there isn't.
I think you'll get the best performance improvement by avoiding the overhead of repeatedly executing an external command (ptree, in this case). I'd look for a CPAN module that provides a direct interface to the data structures that ptree is reading. Check the Linux:: namespace, maybe? (I'm not sure if ptree is setuid; that may complicate things.)
The above advice aside, some additional style and robustness notes based on the posted snippet only (forgive me if the larger code invalidates them):
I'd start by using strict, at the very least. Lexical filehandles would also be a good idea.
You appear to be silently ignoring the case when you cannot open() the ptree command. That could happen for many reasons, some of which I can't imagine you wanting to ignore, such as…
You're not using the full path to the ptree command, but rather assuming it's in your path—and that the one in your path is the right one.
How many users are on the system? Can you invert this? List all -pksh-ksh processes in the system along with their EUIDs, and build the map from that - that might be only one execution of ps/ptree.
I was thinking of using ps to get the parents pid, but I would need to loop this to get the great-grandparent's pid. That's the one I need. Thanks. – lamcro
Sorry, there are many users and each can have up to three terminals open. The whole script is used to find those terminals that are using a file. I use fuser to find the processes that use a file. Then use ptree to find the terminal's pid. – lamcro
If you have (or can get) a list of PIDs using a file, and just need all of the grand-parents of that PID, there's an easier way, for sure.
#!perl
use warnings;
use strict;
#***** these PIDs are gotten with fuser or some other method *****
my($fpids) = [27538, 31812, 27541];
#***** get all processes, assuming linux PS *****
my($cmd) = "ps -ef";
open(PS, "$cmd |") || die qq([ERROR] Cannot open pipe from "$cmd" - $!\n);
my($processlist) = {};
while (<PS>) {
chomp;
my($user, $pid, $ppid, $rest) = split(/ +/, $_, 4);
$processlist->{$pid} = $ppid;
}
close PS;
#***** lookup grandparent *****
foreach my $fpid (#$fpids) {
my($parent) = $processlist->{$fpid} || 0;
my($grandparent) = $processlist->{$parent} || 0;
if ($grandparent) {
#----- do something here with grandparent's pid -----
print "PID:GRANDPID - $fpid:$grandparent\n";
}
else {
#----- some error condition -----
print "ERROR - Cannot determine GrandPID: $fpid ($parent)\n";
}
}
Which for me produces:
ERROR - Cannot determine GrandPID: 27538 (1)
PID:GRANDPID - 31812:2804
PID:GRANDPID - 27541:27538
Have you considered using 'who -u' to tell you which process is the login shell for a given tty instead of using ptree? This would simplify your search - irrespective of the other changes you should also make.
I just did some trivial timings here based on your script (calling "cat ptree.txt" instead of ptree itself) and confirmed my thoughts that all of your time is spent creating new sub-processes and running ptree itself. Unless you can factor away the need to call ptree (maybe there's a way to open up the connection once and reuse it, like with nslookup), you won't see any real gains.

How do I run a Perl script from within a Perl script?

I've got a Perl script that needs to execute another Perl script. This second script can be executed directly on the command line, but I need to execute it from within my first program. I'll need to pass it a few parameters that would normally be passed in when it's run standalone (the first script runs periodically, and executes the second script under a certain set of system conditions).
Preliminary Google searches suggest using backticks or a system() call. Are there any other ways to run it? (I'm guessing yes, since it's Perl we're talking about :P ) Which method is preferred if I need to capture output from the invoked program (and, if possible, pipe that output as it executes to stdout as though the second program were invoked directly)?
(Edit: oh, now SO suggests some related questions. This one is close, but not exactly the same as what I'm asking. The second program will likely take an hour or more to run (lots of I/O), so I'm not sure a one-off invocation is the right fit for this.)
You can just do it.
{
local #ARGV = qw<param1 param2 param3>;
do '/home/buddy/myscript.pl';
}
Prevents the overhead of loading in another copy of perl.
The location of your current perl interpreter can be found in the special variable $^X. This is important if perl is not in your path, or if you have multiple perl versions available but which to make sure you're using the same one across the board.
When executing external commands, including other Perl programs, determining if they actually ran can be quite difficult. Inspecting $? can leave lasting mental scars, so I prefer to use IPC::System::Simple (available from the CPAN):
use strict;
use warnings;
use IPC::System::Simple qw(system capture);
# Run a command, wait until it finishes, and make sure it works.
# Output from this program goes directly to STDOUT, and it can take input
# from your STDIN if required.
system($^X, "yourscript.pl", #ARGS);
# Run a command, wait until it finishes, and make sure it works.
# The output of this command is captured into $results.
my $results = capture($^X, "yourscript.pl", #ARGS);
In both of the above examples any arguments you wish to pass to your external program go into #ARGS. The shell is also avoided in both of the above examples, which gives you a small speed advantage, and avoids any unwanted interactions involving shell meta-characters. The above code also expects your second program to return a zero exit value to indicate success; if that's not the case, you can specify an additional first argument of allowable exit values:
# Both of these commands allow an exit value of 0, 1 or 2 to be considered
# a successful execution of the command.
system( [0,1,2], $^X, "yourscript.pl", #ARGS );
# OR
capture( [0,1,2, $^X, "yourscript.pl", #ARGS );
If you have a long-running process and you want to process its data while it's being generated, then you're probably going to need a piped open, or one of the more heavyweight IPC modules from the CPAN.
Having said all that, any time you need to be calling another Perl program from Perl, you may wish to consider if using a module would be a better choice. Starting another program carries quite a few overheads, both in terms of start-up costs, and I/O costs for moving data between processes. It also significantly increases the difficulty of error handling. If you can turn your external program into a module, you may find it simplifies your overall design.
All the best,
Paul
I can think of a few ways to do this. You already mentioned the first two, so I won't go into detail on them.
backticks: $retVal = `perl somePerlScript.pl`;
system() call
eval
The eval can be accomplished by slurping the other file into a string (or a list of strings), then 'eval'ing the strings. Heres a sample:
#!/usr/bin/perl
open PERLFILE, "<somePerlScript.pl";
undef $/; # this allows me to slurp the file, ignoring newlines
my $program = <PERLFILE>;
eval $program;
4 . do: do 'somePerlScript.pl'
You already got good answers to your question, but there's always the posibility to take a different point of view: maybe you should consider refactoring the script that you want to run from the first script. Turn the functionality into a module. Use the module from the first and from the second script.
If you need to asynchronously call your external script -you just want to launch it and not wait for it to finish-, then :
# On Unix systems, either of these will execute and just carry-on
# You can't collect output that way
`myscript.pl &`;
system ('myscript.pl &');
# On Windows systems the equivalent would be
`start myscript.pl`;
system ('start myscript.pl');
# If you just want to execute another script and terminate the current one
exec ('myscript.pl');
Use backticks if you need to capture the output of the command.
Use system if you do not need to capture the output of the command.
TMTOWTDI: so there are other ways too, but those are the two easiest and most likely.
See the perlipc documentation for several options for interprocess communication.
If your first script merely sets up the environment for the second script, you may be looking for exec.
#!/usr/bin/perl
use strict;
open(OUTPUT, "date|") or die "Failed to create process: $!\n";
while (<OUTPUT>)
{
print;
}
close(OUTPUT);
print "Process exited with value " . ($? >> 8) . "\n";
This will start the process date and pipe the output of the command to the OUTPUT filehandle which you can process a line at a time. When the command is finished you can close the output filehandle and retrieve the return value of the process. Replace date with whatever you want.
I wanted to do something like this to offload non-subroutines into an external file to make editing easier. I actually made this into a subroutine. The advantage of this way is that those "my" variables in the external file get declared in the main namespace. If you use 'do' they apparently don't migrate to the main namespace. Note the presentation below doesn't include error handling
sub getcode($) {
my #list;
my $filename = shift;
open (INFILE, "< $filename");
#list = <INFILE>;
close (INFILE);
return \#list;
}
# and to use it:
my $codelist = [];
$codelist = getcode('sourcefile.pl');
eval join ("", #$codelist);