I am calling an external program from my perl code using backticks
print `<some long running program>`
The long running program prints detailed log messages onto standard output.
The problem I'm having is that due to buffering, the output from the long running program is printed all at once after it has finished its execution.
I tried making the STDOUT filehandle "hot" but that did not help.
Is there anyway I can have my program print continuously onto the screen?
Open as an exec pipe rather than using backticks.
open ( my $prog_stdout, "-|", "/your/program" ) or die $!;
This will fork and exec but give you access to $prog_stdout to do things with.
E.g.
while ( <$prog_stdout> ) {
print;
}
(It'll close if your external program exits, so the while will terminate).
You may also want to include autoflushing of the filehandle. http://perldoc.perl.org/IO/Handle.html
But that may not be necessary, as output won't be buffered indefinitely.
It might not be buffering but the fact that back ticks return when external program finishes.
You can however use reading pipe to read external output line by line,
use autodie;
open my $pipe, "-|", "<some long running program>";
# $pipe->autoflush();
while (<$pipe>) { .. }
Related
I have to kill a program that I am opening via
$pid = open(FH, "program|")
or
$pid = or open(FH, "-|", "program")
However, the program (mosquittto_sub, to be specific) still lingers around in the background, because open is returning the PID of the sh that perl is using to run the program, so I am only killing the sh wrapper instead of the actual program.
Is there a way to get the programs real PID? What is the point of getting the sh's PID?
There are a few ways to deal with this.
First, you can use a list form to open a process and then no shell is involved so the child process (with pid returned by open) is precisely the one with the program you need to stop
my #cmd = ('progname', '-arg1', ...);
my $pid = open my $fh, '-|', #cmd // die "Can't open \"#cmd\": $!";
...
my $num_signaled = kill 15, $pid;
This sketch needs some checks added. Please see the linked documentation (look for "pipe").
If this isn't suitable for some reason -- perhaps you need the shell to run that program -- then you can find the program's pid, and Proc::ProcessTable module is good for this. A basic demo
use Proc::ProcessTable;
my $prog_name = ...
my $pid;
my $pt = Proc::ProcessTable->new();
foreach my $proc (#{$pt->table}) {
if ($proc->cmndline =~ /\Q$prog_name/) { # is this enough to identify it?
$pid = $proc->pid;
last;
}
}
my $num_signaled = kill 15, $pid;
Please be careful with identifying the program by its name -- on a modern system there may be all kinds of processes running that contain the name of the program you want to terminate. For more detail and discussion please see this post and this post, for starters.
Finally, you can use a module to run your external programs and then you'll be able to manage and control them far more nicely. Here I'd recommend IPC::Run.
I'm trying to figure this out but haven't been able to wrap my head around it. I need to open a piped subprocess and read from its output. Originally I was using the standard open() call like this:
#!/usr/bin/perl;
use warnings;
use strict;
use Scalar::Util qw(openhandle);
use IPC::Open3;
my $fname = "/var/log/file.log.1.gz";
my $pid = open(my $fh, "-|:encoding(UTF-8)", "gunzip -c \"$fname\" | tac");
# Read one line from the file
while (my $row = <$fh>) {
print "Row: $row\n";
last; # Bail out early
}
# Check if the PID is valid and kill it if so
if (kill(0, $pid) == 1) {
kill(15, $pid);
waitpid($pid, 0);
$pid = 0;
}
# Close the filehandle if it is still open
if (openhandle($fh)) {
close $fh;
}
The above works, except that I get errors from tac in the logs saying:
tac: write error
From what I can tell from various testing and research that I've done, this is happening because killing the PID returned from open() just kills the first child process (but not the second) and so when I then close the filehandle, tac is still writing to it, thus the "write error" due to the broken pipe. The strange thing is, at times when I check ($? >> 8) if the close() call returns false, it will return 141, indicating it received a SIGPIPE (backing up my theory above). However, other times it returns 0 which is strange.
Furthermore, if I run the same command but without a double-pipe (only a single one), like this (everything else the same as above):
my $pid = open(my $fh, "-|:encoding(UTF-8)", "gunzip -c \"$fname\"");
...I'll get an error in the logs like this:
gzip: stdout: Broken pipe
...but in this case, gunzip/gzip was the only process (which I killed via the returned PID), so I'm not sure why it would still be writing to the pipe when I close the filehandle (since it was supposed to be killed already, AND waited for with waitpid()).
I'm trying to repro this in the Perl debugger but its difficult because I can't get the stderr of the child process with plain open() (the way I'm seeing the external process' stderr in prod is in the apache2 logs - this is a CGI script).
I understand from reading the docs that I can't get the PID of all child processes in a multi-piped open with open(), so I decided to try and resort to a different method so that I could close all processes cleanly. I tried open3(), and interestingly, without making any changes (literally running basically the same exact scenario as above but with open3() instead of open()):
my $pid = open3(my $in, my $fh, undef, "gunzip -c \"$fname\"");
...and then killing it just like I did above, I don't get any errors. This holds true for both the single piped process as shown above, as well as the double-piped process that involves piping to "tac".
Therefore, I'm wondering what I am missing here? I know there are differences in the way open() and open3() work, but are there differences in the way that child processes are spawned from them? In both cases I can see that the initial child (the PID returned) is itself a child of the Perl process. But its almost as if the process spawned by open(), is not getting properly killed and/or cleaned up (via waitpid()) while the same process spawned by open3() is, and that's the part I can't figure out.
And, more to the bigger picture and the issue at hand - what is the suggestion for the best way to cleanly close a multi-piped process in this sort of scenario? Am I spending more time than is warranted on this? The script itself works as it should aside from these errors, so if it turns out that the tac and gzip errors I'm seeing are inconsequential, should I just live with them and move on?
Any help is much appreciated!
If you just want to read the last line of a gzipped file, it's easy to do it in pure perl without calling an external program:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use IO::Uncompress::Gunzip qw/$GunzipError/;
my $fname = 'foo.txt.gz';
my $z = new IO::Uncompress::Gunzip $fname or die "Couldn't open file: $GunzipError\n";
my $row;
while (<$z>) {
$row = $_;
}
say "Row: $row";
This happens because either your perl script or its parent is ignoring the SIGPIPE signal, and the ignore signal dispositions are inherited by the children.
Here is a simpler testcase for your condition:
$ perl -e '$SIG{PIPE}="IGNORE"; open my $fh, "-|", "seq 100000 | tac; true"; print scalar <$fh>'
100000
tac: write error
$ (trap "" PIPE; perl -e 'open my $fh, "-|", "seq 100000 | tac"; print scalar <$fh>')
100000
tac: write error
$ (trap "" PIPE; perl -e 'my $pid = open my $fh, "-|", "seq 100000 | tac"; print scalar <$fh>; kill 15, $pid; waitpid $pid, 0')
100000
$ tac: write error
The latter version does the same kill as the version from the OP, which will not kill either the right or left side of the pipeline, but the shell running and waiting for both (some shells will exec through the left side of a pipeline; with such shells, a ; exit $? could be appended to the command in order to reproduce the example).
A case where SIGPIPE is ignored upon entering a perl script is when run via fastcgi -- which sets the SIGPIPE disposition to ignore, and expects the script to handle it. In that case simply setting an SIGPIPE handler instead of IGNORE (even an empty handler) would work, since in that case the signal disposition will be reset to default upon executing external commands:
$SIG{PIPE} = sub { };
open my $fh, '-|', 'trap - PIPE; ... | tac';
When run as a standalone script it could be some setup bug (I've see it happen in questions related to containerization on Linux), or someone trying to exploit buggy programs running with elevated privileges not bothering to handle write(2) errors (EPIPE in this case).
my $pid = open3(my $in, my $fh, undef, "gunzip -c \"$fname\"");
...and then killing it just like I did above, I don't get any errors.
Where should you get the errors from, if you're redirecting its stderr to the same $fh you only read the first line from?
The thing is absolutely no different with open3:
$ (trap "" PIPE; perl -MIPC::Open3 -e 'my $pid = open3 my $in, my $out, my $err, "seq 100000 | tac 2>/dev/tty"; print scalar <$out>')
100000
$ tac: write error
i find this in one sample script,
then i search from google and found the following words,
Note that you cannot simply open STDERR to be a dup of STDOUT in your Perl
program and avoid calling the shell to do the redirection. This doesn't
work:
open(STDERR, ">&STDOUT");
This fails because the open() makes STDERR go to where STDOUT was going at
the time of the open(). The backticks then make STDOUT go to a string, but
don't change STDERR (which still goes to the old STDOUT).
Now I am confused. What exactly is the meaning of open(STDERR, ">&STDOUT"); ?
With the & in the mode >& in the call
open STDERR, ">&STDOUT"; # or: open STDERR, ">&", \*STDOUT
the first given filehandle is made a copy of the second one. See open, and see man 2 dup2 since this goes via dup2 syscall. The notation follows the shell's I/O redirection.
Since here the first filehandle exists (STDERR)† it is first closed.
The effect is that prints to STDERR will go to where STDOUT was going before this was done, with the side effect of the original STDERR being closed.
This is legit and does not result in errors but is not a good way to redirect STDERR in general -- after that we cannot restore STDERR any more. See open for how to redirect STDERR.
The rest of the comment clearly refers to a situation where backticks (see qx), which redirect STDOUT of the executed command(s) to the program, are used after that open call. All this seems to refer to an idea of redirecting STDERR to STDOUT in this way.
Alas, the STDERR, made by that open call to go where STDOUT was going, doesn't get redirected by the backticks and thus still goes "there." In my case prints to STDERR wind up on the terminal as I see the warning (ls: cannot access...) with
perl -we'open STDERR, ">&STDOUT"; $out = qx(ls no_such)'
(unlike with perl -we'$out = qx(ls no_such 2>&1)'). Explicit prints to STDERR also go to the terminal as STDOUT (add such prints and redirect output to a file to see).
This may be expected since & made a copy of the filehandle, so the "new" one (the former STDERR) still goes where STDOUT was going, that is to the terminal. What is of course unintended in this case and thus an error.
† Every program in UNIX gets connected to standard streams stdin, stdout, and stderr, with file descriptors 0, 1, and 2 respectively. In a Perl program we then get ready filehandles for these, like the STDERR (for fd 2).
Some generally useful posts on manipulations of file descriptors in the shell:
What does “3>&1 1>&2 2>&3” do in a script?
In the shell, what does “ 2>&1 ” mean?
File descriptors & shell scripting
Order of redirections
Shell redirection i/o order
It's basically dup2(fileno(STDOUT), fileno(STDERR)). See your system's dup2 man page.
In short, it associates STDERR with the same stream as STDOUT at the system level. After the command is performed writing to either will be the same as writing to STDOUT before the change.
Unless someone's messed with STDOUT or STDERR, it's equivalent to the shell command
exec 2>&1
I want to run shell commands in a CGI program (written in Perl). My program doesn’t have root permission. It runs as nobody. I want to use this code:
use strict;
system <<'EEE';
awk '{a[$1]+=$2;b[$1]+=$3}END{for(i in a)print i, a[i], b[i]|"sort -nk 3"}' s.txt
EEE
I can run my code successfully with perl from the command line but not as a CGI program.
Based on the code in your question, there are at least four possibilities for failure.
The nobody user does not have permission to execute your program.
The Perl code in your question has no shebang (#!) line. You are trying to run awk, so I assume you are running on some form of Unix. If your code is missing this line, then your operating system does not know how to run your program.
The file s.txt is either not in the executing program’s working directory, or it is not readable by the nobody user.
For whatever reason, awk is not reachable via the PATH of your executing program’s environment.
To quickly diagnose such low-level problems, try to have all error output to show up in the browser. One way to do this is adding the following just after the shebang line in your code.
BEGIN {
print "Content-type: text/plain\n\n";
open STDERR, ">&", \*STDOUT or print "$0: dup: $!";
}
The output will render as plain text rather than HTML, but this is a temporary measure to see your program’s output. By wrapping it in a BEGIN block, the code executes as soon as it parses. Redirecting STDERR means your browser also gets anything written to the standard output.
Another way to do this is with the CGI::Carp module.
use CGI::Carp 'fatalsToBrowser';
This way, errors go to the browser and also to the web server’s error log.
If you still see 500-series errors from your server, the problem is happening at a lower level: probably some failure to start perl. Go examine your server’s error log. Once your program is executing, you can remove this temporary redirection of error output.
Finally, I recommend changing your program to
#! /usr/bin/perl -T
BEGIN { print "Content-type: text/plain\n\n"; }
use strict;
use warnings;
$ENV{PATH} = "/bin:/usr/bin";
my $input = "/path/to/your/s.txt";
my $buckets = <<'EOProgram'
{ a[$1] += $2; b[$1] += $3 }
END { for (i in a) print i, a[i], b[i] }
EOProgram
open STDIN, "-|", "awk", $buckets, $input or die "$0: open: $!";
exec "sort", "-nk", 3 or die "$0: exec: $!";
The -T switch enables a security dataflow analysis called taint mode that prevents you from using unsanitized input on system operations such as open, exec, and so on that an attacker (or benign user supplying unexpected input) could use to harm your system. You should always add -T to CGI programs and any other code that runs on behalf of another user.
Given the nature of your awk program, a content type of text/plain seems reasonable. Output it as soon as possible.
With taint mode enabled, be explicit about the value of your PATH environment variable. If instead you stick with whatever untrusted PATH your program inherits, attempting to run external programs will fail.
Nail down the full path of your input. This will eliminate surprises.
Using the multi-argument forms of open and exec eliminates the shell and its argument parsing. (For completeness, system also has a similar multi-argument form.) Yes, writing it this way can mean being a little more deliberate (such as breaking out the arguments and setting up the pipeline yourself), but it also avoids nasty surprises.
I'm sure nobody is allowed to run shell commands. The problem is that nobody doesn't have permission to open the file s.txt. Add read permission for everyone to s.txt, and add execute permission to everyone on every directory up to s.txt.
I would suggest finding out the full qualified path for awk and specifying it directly. Likely the nobody that launched httpd had a very minimal path in its $ENV{PATH}. Displaying the $ENV{PATH} I am guessing will show this.
This is a good thing, I wouldn't modify the path, but just specify the path /usr/bin/awk or what not.
If you have shell access and it works, type 'which awk' to find this out.
i can run my codes successfully in
perl file but not in cgi file.
What web server are you running under? For instance, apache requires printing a CGI header i.e. print "Content-type: text/plain; charset=utf-8\n\n", or
use CGI;
my $q = CGI->new();
print $q->header('text/html');
(See CGI)
Apache will conplain in the log (error.log) about "premature end of script headers" IF what I said is the case.
You could just do it inline without having to fork out to another process...
if ( open my $fh, '<', 's.txt' ) {
my %data;
while (<$fh>) {
my ($c1,$c2,$c3) = split;
$data{a}{$c1} += $c2;
$data{b}{$c1} += $c3;
}
foreach ( sort { $data{b}{$a} <=> $data{b}{$b} } keys %{ $data{b} } ) {
print "$_ $data{a}{$_} $data{b}{$_}\n";
}
} else {
warn "Unable to open s.txt: $!\n";
}
Can someone please help me? In Perl, what is the difference between:
exec "command";
and
system("command");
and
print `command`;
Are there other ways to run shell commands too?
exec
executes a command and never returns.
It's like a return statement in a function.
If the command is not found exec returns false.
It never returns true, because if the command is found it never returns at all.
There is also no point in returning STDOUT, STDERR or exit status of the command.
You can find documentation about it in perlfunc,
because it is a function.
system
executes a command and your Perl script is continued after the command has finished.
The return value is the exit status of the command.
You can find documentation about it in perlfunc.
backticks
like system executes a command and your perl script is continued after the command has finished.
In contrary to system the return value is STDOUT of the command.
qx// is equivalent to backticks.
You can find documentation about it in perlop, because unlike system and execit is an operator.
Other ways
What is missing from the above is a way to execute a command asynchronously.
That means your perl script and your command run simultaneously.
This can be accomplished with open.
It allows you to read STDOUT/STDERR and write to STDIN of your command.
It is platform dependent though.
There are also several modules which can ease this tasks.
There is IPC::Open2 and IPC::Open3 and IPC::Run, as well as
Win32::Process::Create if you are on windows.
In general I use system, open, IPC::Open2, or IPC::Open3 depending on what I want to do. The qx// operator, while simple, is too constraining in its functionality to be very useful outside of quick hacks. I find open to much handier.
system: run a command and wait for it to return
Use system when you want to run a command, don't care about its output, and don't want the Perl script to do anything until the command finishes.
#doesn't spawn a shell, arguments are passed as they are
system("command", "arg1", "arg2", "arg3");
or
#spawns a shell, arguments are interpreted by the shell, use only if you
#want the shell to do globbing (e.g. *.txt) for you or you want to redirect
#output
system("command arg1 arg2 arg3");
qx// or ``: run a command and capture its STDOUT
Use qx// when you want to run a command, capture what it writes to STDOUT, and don't want the Perl script to do anything until the command finishes.
#arguments are always processed by the shell
#in list context it returns the output as a list of lines
my #lines = qx/command arg1 arg2 arg3/;
#in scalar context it returns the output as one string
my $output = qx/command arg1 arg2 arg3/;
exec: replace the current process with another process.
Use exec along with fork when you want to run a command, don't care about its output, and don't want to wait for it to return. system is really just
sub my_system {
die "could not fork\n" unless defined(my $pid = fork);
return waitpid $pid, 0 if $pid; #parent waits for child
exec #_; #replace child with new process
}
You may also want to read the waitpid and perlipc manuals.
open: run a process and create a pipe to its STDIN or STDERR
Use open when you want to write data to a process's STDIN or read data from a process's STDOUT (but not both at the same time).
#read from a gzip file as if it were a normal file
open my $read_fh, "-|", "gzip", "-d", $filename
or die "could not open $filename: $!";
#write to a gzip compressed file as if were a normal file
open my $write_fh, "|-", "gzip", $filename
or die "could not open $filename: $!";
IPC::Open2: run a process and create a pipe to both STDIN and STDOUT
Use IPC::Open2 when you need to read from and write to a process's STDIN and STDOUT.
use IPC::Open2;
open2 my $out, my $in, "/usr/bin/bc"
or die "could not run bc";
print $in "5+6\n";
my $answer = <$out>;
IPC::Open3: run a process and create a pipe to STDIN, STDOUT, and STDERR
use IPC::Open3 when you need to capture all three standard file handles of the process. I would write an example, but it works mostly the same way IPC::Open2 does, but with a slightly different order to the arguments and a third file handle.
Let me quote the manuals first:
perldoc exec():
The exec function executes a system command and never returns-- use system instead of exec if you want it to return
perldoc system():
Does exactly the same thing as exec LIST , except that a fork is done first, and the parent process waits for the child process to complete.
In contrast to exec and system, backticks don't give you the return value but the collected STDOUT.
perldoc `String`:
A string which is (possibly) interpolated and then executed as a system command with /bin/sh or its equivalent. Shell wildcards, pipes, and redirections will be honored. The collected standard output of the command is returned; standard error is unaffected.
Alternatives:
In more complex scenarios, where you want to fetch STDOUT, STDERR or the return code, you can use well known standard modules like IPC::Open2 and IPC::Open3.
Example:
use IPC::Open2;
my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'some', 'cmd', 'and', 'args');
waitpid( $pid, 0 );
my $child_exit_status = $? >> 8;
Finally, IPC::Run from the CPAN is also worth looking at…
What's the difference between Perl's backticks (`), system, and exec?
exec -> exec "command"; ,
system -> system("command"); and
backticks -> print `command`;
exec
exec executes a command and never resumes the Perl script. It's to a script like a return statement is to a function.
If the command is not found, exec returns false. It never returns true, because if the command is found, it never returns at all. There is also no point in returning STDOUT, STDERR or exit status of the command. You can find documentation about it in perlfunc, because it is a function.
E.g.:
#!/usr/bin/perl
print "Need to start exec command";
my $data2 = exec('ls');
print "Now END exec command";
print "Hello $data2\n\n";
In above code, there are three print statements, but due to exec leaving the script, only the first print statement is executed. Also, the exec command output is not being assigned to any variable.
Here, only you're only getting the output of the first print statement and of executing the ls command on standard out.
system
system executes a command and your Perl script is resumed after the command has finished. The return value is the exit status of the command. You can find documentation about it in perlfunc.
E.g.:
#!/usr/bin/perl
print "Need to start system command";
my $data2 = system('ls');
print "Now END system command";
print "Hello $data2\n\n";
In above code, there are three print statements. As the script is resumed after the system command, all three print statements are executed.
Also, the result of running system is assigned to data2, but the assigned value is 0 (the exit code from ls).
Here, you're getting the output of the first print statement, then that of the ls command, followed by the outputs of the final two print statements on standard out.
backticks (`)
Like system, enclosing a command in backticks executes that command and your Perl script is resumed after the command has finished. In contrast to system, the return value is STDOUT of the command. qx// is equivalent to backticks. You can find documentation about it in perlop, because unlike system and exec, it is an operator.
E.g.:
#!/usr/bin/perl
print "Need to start backticks command";
my $data2 = `ls`;
print "Now END system command";
print "Hello $data2\n\n";
In above code, there are three print statements and all three are being executed. The output of ls is not going to standard out directly, but assigned to the variable data2 and then printed by the final print statement.
The difference between 'exec' and 'system' is that exec replaces your current program with 'command' and NEVER returns to your program. system, on the other hand, forks and runs 'command' and returns you the exit status of 'command' when it is done running. The back tick runs 'command' and then returns a string representing its standard out (whatever it would have printed to the screen)
You can also use popen to run shell commands and I think that there is a shell module - 'use shell' that gives you transparent access to typical shell commands.
Hope that clarifies it for you.