Run perl binary with autoflush enabled - perl

My program runs perl as a process in potentially many places with different scripts which I have inherited and would prefer not to modify needlessly if I can avoid it.
The root problem I'm facing is that my program cannot consume standard output as the perl script is executing unless autoflush is enabled (otherwise it will just get every log message instantly after the perl script has finished).
Therefore, what I'd like to do is to run perl with autoflush enabled by command line argument if possible. Something like this would be ideal:
perl -e "$| = 1" -e "foo.pl"
But obviously that doesn't work.

There is a CPAN module called Devel::Autoflush that does exactly this. You would invoke it from the command line:
perl -MDevel::Autoflush your-script-name-here.pl
...and it sets autoflush mode. Looking at the source code it's pretty easy to see how it works. You could just implement it yourself if you live in a world where CPAN modules are not permitted. Just create a module as follows:
package AutoFlush;
my $orig_fh = select STDOUT;
$| = 1;
select STDERR;
$| = 1;
select $orig_fh;
1;
And then from the command line invoke it just as I described above:
perl -MAutoFlush your-script-name-here.pl
This little example module is almost identical to how Devel::Autoflush does it.
Update: And as TLP correctly points out, the following would be even simpler syntax:
package AutoFlush
STDOUT->autoflush(1);
STDERR->autoflush(1);
1;
This may pull in more code, since the syntax relies on the implicit on-demand upgrading of the STDOUT and STDERR filehandles to IO::Handle objects, but when coding for clarity and programmer efficiency first, this is an obvious improvement.

Related

How to manually create TAP output

There is a great Perl module
Test::More that everybody uses for
unit testing. Here is the very simple script t/sample_1.t:
use Test::More tests => 1;
fail('This test fails');
I wanted to write script that does the same thing, but without
Test::More.
I've read several the docs about TAP (test anything protocol) to find out how to write the script. I've read:
Wikipedia article about TAP
TAP specification
Unfortunately the documentation wasn't enough. I had to examine the output of script that uses Test::More to find out that I need to output diagnostics to STDERR (there was nothing about this in the docs).
So, I have written a script that does completely the same things as the script with Test::More script. Here is the listing of t/sample_2.t:
$| = 1;
print "1..1\n";
print "not ok 1 - This test fails\n";
print STDERR "# Failed test 'This test fails'\n";
print STDERR "# at t/sample_1.t line 3.\n";
print STDERR "# Looks like you failed 1 test of 1.\n";
exit 1;
But when using prove these 2 scripts output different things. The line "# Failed test 'This test fails'" in prove is displayed on different lines for different tests. Here is the screenshot:
I've written a test scripts that uses Capture::Tiny to check that STDERR, STDOUT and exit code for both scripts a identical. And the script shows that both scripts output the same things.
I've stored all the test files and a test script at GitHub repo.
My question. How should I write Perl unit test without Test::More to have the same output as with Test::More.
PS If you are interested why I need this. I need this to solve the issue of my Perl module Test::Whitespaces.
While I've got absolutely no frickin idea what's going on, I can get the outputs to match (visually at least) by including the following before any other output to STDERR:
print STDERR "\r";
This makes them match visually when run through prove or plain old perl. However, this is NOT what Test::More is doing.
The TAP you're outputting is per spec; if prove wants to treat it differently from the TAP Test::More is outputting, I'd argue that's a bug (or at least an oddity) in prove. Personally when I've written Test modules, I've always used Test::Builder or wrapped Test::More to output the TAP. Each of these is a core module. This seems to be what the majority of Test modules tend to do.
At last I have found out what is going on.
hobbs has advised me to use Test::Builder. I created test script with
Test::Builder that worked exaclty as the script with Test::More (here
it is).
Then I started examinig source code of Test::Builder to find out why the
source of such behaviour. Here is the part of lib/TB2/Formatter/TAP/Base.pm
file:
# Emit old style comment failure diagnostics
sub _comment_diagnostics {
my($self, $result) = #_;
...
# Start on a new line if we're being output by Test::Harness.
# Makes it easier to read
$self->$out_method("\n") if ($out_method eq 'err') and $ENV{HARNESS_ACTIVE};
$self->$diag_method($msg);
return;
}
So, this is the answer. prove sets up special environment variable
HARNESS_ACTIVE and Test::More and friends puts additional line break symbol
"\n" before any diagnostics that are printed to STDERR.
At last I've created test script that outputs exactly the same as the script
written with Test::More. Source code of the script.
I really don't like this solution. It took me and outher peopler much time to
find out what is going on. I'm sure that the task of pretty output should be
solved in TAP parsers, and not in TAP producers.
=(

perl $|=1; What is this?

I am learning Writing CGI Application with Perl -- Kevin Meltzer . Brent Michalski
Scripts in the book mostly begin with this:
#!"c:\strawberry\perl\bin\perl.exe" -wT
# sales.cgi
$|=1;
use strict;
use lib qw(.);
What's the line $|=1; How to space it, eg. $| = 1; or $ |= 1; ?
Why put use strict; after $|=1; ?
Thanks
perlvar is your friend. It documents all these cryptic special variables.
$OUTPUT_AUTOFLUSH (aka $|):
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Mnemonic: when you want your pipes to be piping hot.
Happy coding.
For the other questions:
There is no reason that use strict; comes after $|, except by the programmers convention. $| and other special variables are not affected by strict in this way. The spacing is also not important -- just pick your convention and be consistent. (I prefer spaces.)
$| = 1; forces a flush after every write or print, so the output appears as soon as it's generated rather than being buffered.
See the perlvar documentation.
$| is the name of a special variable. You shouldn't introduce a space between the $ and the |.
Whether you use whitespace around the = or not doesn't matter to Perl. Personally I think using spaces makes the code more readable.
Why the use strict; comes after $| = 1; in your script I don't know, except that they're both the sort of thing you'd put right at the top, and you have to put them in one order or the other. I don't think it matters which comes first.
It does not matter where in your script you put a use statement, because they all get evaluated at compile time.
$| is the built-in variable for autoflush. I agree that in this case, it is ambiguous. However, a lone $ is not a valid statement in perl, so by process of elimination, we can say what it must mean.
use lib qw(.) seems like a silly thing to do, since "." is already in #INC by default. Perhaps it is due to the book being old. This statement tells perl to add "." to the #INC array, which is the "path environment" for perl, i.e. where it looks for modules and such.

What's the difference between system, exec, and backticks in Perl?

In Perl, to run another Perl script from my script, or to run any system commands like mv, cp, pkgadd, pkgrm, pkginfo, rpm etc, we can use the following:
system()
exec()
`` (Backticks)
Are all the three the same, or are they different? Do all the three give the same result in every case? Are they used in different scenarios, like to call a Perl program we have to use system() and for others we have to use ``(backticks).
Please advise, as I am currently using system() for all the calls.
They're all different, and the docs explain how they're different. Backticks capture and return output; system returns an exit status, and exec never returns at all if it's successful.
IPC::System::Simple is probably what you want.
It provides safe, portable alternatives to backticks, system() and other IPC commands.
It also allows you to avoid the shell for most of said commands, which can be helpful in some circumstances.
The best option is to use some module, either in the standard library or from CPAN, that does the job for you. It's going to be more portable, and possibly even faster for quick tasks (no forking to the system).
However, if that's not good enough for you, you can use one of those three, and no, they are not the same. Read the perldoc pages on system(), exec(), and backticks to see the difference.
Calling system is generally a mistake. For instance, instead of saying
system "mv $foo /tmp" == 0
or die "could not move $foo to /tmp";
system "cp $foo /tmp" == 0
or die "could not copy $foo to /tmp";
you should say
use File::Copy;
move $foo, "/tmp"
or die "could not move $foo to /tmp: $!";
copy $foo, "/tmp"
or die "could not copy $foo to /tmp: $!";
Look on CPAN for modules that handle other commands for you. If you find yourself writing a lot of calls to system, it may be time to drop back into a shell script instead.
Well, the more people the more answers.
My answer is to generally avoid external commands execution. If you can - use modules. There is no point executing "cp", "mv" and a lot of another command - there exist modules that do it. And the benefit of using modules is that they usually work cross-platform. While your system("mv") might not.
When put in situation that I have no other way, but to call external command, I prefer to use IPC::Run. The idea is that all simplistic methods (backticks, qx, system, open with pipe) are inherently insecure, and require attention to parameters.
With IPC::Run, I run commands like I would do with system( #array ), which is much more secure, and I can bind to stdin, stdout and stderr separately, using variables, or callbacks, which is very cool when you'll be in situation that you have to pass data to external program from long-running code.
Also, it has built-in handling of timeouts, which come handy more than once :)
If you don't want the shell getting involved (usually you don't) and if waiting for the system command is acceptable, I recommend using IPC::Run3. It is simple, flexible, and does the common task of executing a program, feeding it input and capturing its output and error streams right.

How do I run a Perl script from within a Perl script?

I've got a Perl script that needs to execute another Perl script. This second script can be executed directly on the command line, but I need to execute it from within my first program. I'll need to pass it a few parameters that would normally be passed in when it's run standalone (the first script runs periodically, and executes the second script under a certain set of system conditions).
Preliminary Google searches suggest using backticks or a system() call. Are there any other ways to run it? (I'm guessing yes, since it's Perl we're talking about :P ) Which method is preferred if I need to capture output from the invoked program (and, if possible, pipe that output as it executes to stdout as though the second program were invoked directly)?
(Edit: oh, now SO suggests some related questions. This one is close, but not exactly the same as what I'm asking. The second program will likely take an hour or more to run (lots of I/O), so I'm not sure a one-off invocation is the right fit for this.)
You can just do it.
{
local #ARGV = qw<param1 param2 param3>;
do '/home/buddy/myscript.pl';
}
Prevents the overhead of loading in another copy of perl.
The location of your current perl interpreter can be found in the special variable $^X. This is important if perl is not in your path, or if you have multiple perl versions available but which to make sure you're using the same one across the board.
When executing external commands, including other Perl programs, determining if they actually ran can be quite difficult. Inspecting $? can leave lasting mental scars, so I prefer to use IPC::System::Simple (available from the CPAN):
use strict;
use warnings;
use IPC::System::Simple qw(system capture);
# Run a command, wait until it finishes, and make sure it works.
# Output from this program goes directly to STDOUT, and it can take input
# from your STDIN if required.
system($^X, "yourscript.pl", #ARGS);
# Run a command, wait until it finishes, and make sure it works.
# The output of this command is captured into $results.
my $results = capture($^X, "yourscript.pl", #ARGS);
In both of the above examples any arguments you wish to pass to your external program go into #ARGS. The shell is also avoided in both of the above examples, which gives you a small speed advantage, and avoids any unwanted interactions involving shell meta-characters. The above code also expects your second program to return a zero exit value to indicate success; if that's not the case, you can specify an additional first argument of allowable exit values:
# Both of these commands allow an exit value of 0, 1 or 2 to be considered
# a successful execution of the command.
system( [0,1,2], $^X, "yourscript.pl", #ARGS );
# OR
capture( [0,1,2, $^X, "yourscript.pl", #ARGS );
If you have a long-running process and you want to process its data while it's being generated, then you're probably going to need a piped open, or one of the more heavyweight IPC modules from the CPAN.
Having said all that, any time you need to be calling another Perl program from Perl, you may wish to consider if using a module would be a better choice. Starting another program carries quite a few overheads, both in terms of start-up costs, and I/O costs for moving data between processes. It also significantly increases the difficulty of error handling. If you can turn your external program into a module, you may find it simplifies your overall design.
All the best,
Paul
I can think of a few ways to do this. You already mentioned the first two, so I won't go into detail on them.
backticks: $retVal = `perl somePerlScript.pl`;
system() call
eval
The eval can be accomplished by slurping the other file into a string (or a list of strings), then 'eval'ing the strings. Heres a sample:
#!/usr/bin/perl
open PERLFILE, "<somePerlScript.pl";
undef $/; # this allows me to slurp the file, ignoring newlines
my $program = <PERLFILE>;
eval $program;
4 . do: do 'somePerlScript.pl'
You already got good answers to your question, but there's always the posibility to take a different point of view: maybe you should consider refactoring the script that you want to run from the first script. Turn the functionality into a module. Use the module from the first and from the second script.
If you need to asynchronously call your external script -you just want to launch it and not wait for it to finish-, then :
# On Unix systems, either of these will execute and just carry-on
# You can't collect output that way
`myscript.pl &`;
system ('myscript.pl &');
# On Windows systems the equivalent would be
`start myscript.pl`;
system ('start myscript.pl');
# If you just want to execute another script and terminate the current one
exec ('myscript.pl');
Use backticks if you need to capture the output of the command.
Use system if you do not need to capture the output of the command.
TMTOWTDI: so there are other ways too, but those are the two easiest and most likely.
See the perlipc documentation for several options for interprocess communication.
If your first script merely sets up the environment for the second script, you may be looking for exec.
#!/usr/bin/perl
use strict;
open(OUTPUT, "date|") or die "Failed to create process: $!\n";
while (<OUTPUT>)
{
print;
}
close(OUTPUT);
print "Process exited with value " . ($? >> 8) . "\n";
This will start the process date and pipe the output of the command to the OUTPUT filehandle which you can process a line at a time. When the command is finished you can close the output filehandle and retrieve the return value of the process. Replace date with whatever you want.
I wanted to do something like this to offload non-subroutines into an external file to make editing easier. I actually made this into a subroutine. The advantage of this way is that those "my" variables in the external file get declared in the main namespace. If you use 'do' they apparently don't migrate to the main namespace. Note the presentation below doesn't include error handling
sub getcode($) {
my #list;
my $filename = shift;
open (INFILE, "< $filename");
#list = <INFILE>;
close (INFILE);
return \#list;
}
# and to use it:
my $codelist = [];
$codelist = getcode('sourcefile.pl');
eval join ("", #$codelist);

When is the right time (and the wrong time) to use backticks?

Many beginning programmers write code like this:
sub copy_file ($$) {
my $from = shift;
my $to = shift;
`cp $from $to`;
}
Is this bad, and why? Should backticks ever be used? If so, how?
A few people have already mentioned that you should only use backticks when:
You need to capture (or supress) the output.
There exists no built-in function or Perl module to do the same task, or you have a good reason not to use the module or built-in.
You sanitise your input.
You check the return value.
Unfortunately, things like checking the return value properly can be quite challenging. Did it die to a signal? Did it run to completion, but return a funny exit status? The standard ways of trying to interpret $? are just awful.
I'd recommend using the IPC::System::Simple module's capture() and system() functions rather than backticks. The capture() function works just like backticks, except that:
It provides detailed diagnostics if the command doesn't start, is killed by a signal, or returns an unexpected exit value.
It provides detailed diagnostics if passed tainted data.
It provides an easy mechanism for specifying acceptable exit values.
It allows you to call backticks without the shell, if you want to.
It provides reliable mechanisms for avoiding the shell, even if you use a single argument.
The commands also work consistently across operating systems and Perl versions, unlike Perl's built-in system() which may not check for tainted data when called with multiple arguments on older versions of Perl (eg, 5.6.0 with multiple arguments), or which may call the shell anyway under Windows.
As an example, the following code snippet will save the results of a call to perldoc into a scalar, avoids the shell, and throws an exception if the page cannot be found (since perldoc returns 1).
#!/usr/bin/perl -w
use strict;
use IPC::System::Simple qw(capture);
# Make sure we're called with command-line arguments.
#ARGV or die "Usage: $0 arguments\n";
my $documentation = capture('perldoc', #ARGV);
IPC::System::Simple is pure Perl, works on 5.6.0 and above, and doesn't have any dependencies that wouldn't normally come with your Perl distribution. (On Windows it depends upon a Win32:: module that comes with both ActiveState and Strawberry Perl).
Disclaimer: I'm the author of IPC::System::Simple, so I may show some bias.
The rule is simple: never use backticks if you can find a built-in to do the same job, or if their is a robust module on the CPAN which will do it for you. Backticks often rely on unportable code and even if you untaint the variables, you can still open yourself up to a lot of security holes.
Never use backticks with user data unless you have very tightly specified what is allowed (not what is disallowed -- you'll miss things)! This is very, very dangerous.
Backticks should be used if and only if you need to capture the output of a command. Otherwise, system() should be used. And, of course, if there's a Perl function or CPAN module that does the job, this should be used instead of either.
In either case, two things are strongly encouraged:
First, sanitize all inputs: Use Taint mode (-T) if the code is exposed to possible untrusted input. Even if it's not, make sure to handle (or prevent) funky characters like space or the three kinds of quote.
Second, check the return code to make sure the command succeeded. Here is an example of how to do so:
my $cmd = "./do_something.sh foo bar";
my $output = `$cmd`;
if ($?) {
die "Error running [$cmd]";
}
Another way to capture stdout(in addition to pid and exit code) is to use IPC::Open3 possibily negating the use of both system and backticks.
Use backticks when you want to collect the output from the command.
Otherwise system() is a better choice, especially if you don't need to invoke a shell to handle metacharacters or command parsing. You can avoid that by passing a list to system(), eg system('cp', 'foo', 'bar') (however you'd probably do better to use a module for that particular example :))
In Perl, there's always more than one way to do anything you want. The primary point of backticks is to get the standard output of the shell command into a Perl variable. (In your example, anything that the cp command prints will be returned to the caller.) The downside of using backticks in your example is you don't check the shell command's return value; cp could fail and you wouldn't notice. You can use this with the special Perl variable $?. When I want to execute a shell command, I tend to use system:
system("cp $from $to") == 0
or die "Unable to copy $from to $to!";
(Also observe that this will fail on filenames with embedded spaces, but I presume that's not the point of the question.)
Here's a contrived example of where backticks might be useful:
my $user = `whoami`;
chomp $user;
print "Hello, $user!\n";
For more complicated cases, you can also use open as a pipe:
open WHO, "who|"
or die "who failed";
while(<WHO>) {
# Do something with each line
}
close WHO;
From the "perlop" manpage:
That doesn't mean you should go out of
your way to avoid backticks when
they're the right way to get something
done. Perl was made to be a glue
language, and one of the things it
glues together is commands. Just
understand what you're getting
yourself into.
For the case you are showing using the File::Copy module is probably best. However, to answer your question, whenever I need to run a system command I typically rely on IPC::Run3. It provides a lot of functionality such as collecting the return code and the standard and error output.
Whatever you do, as well as sanitising input and checking the return value of your code, make sure you call any external programs with their explicit, full path. e.g. say
my $user = `/bin/whoami`;
or
my $result = `/bin/cp $from $to`;
Saying just "whoami" or "cp" runs the risk of accidentally running a command other than what you intended, if the user's path changes - which is a security vulnerability that a malicious attacker could attempt to exploit.
Your example's bad because there are perl builtins to do that which are portable and usually more efficient than the backtick alternative.
They should be used only when there's no Perl builtin (or module) alternative. This is both for backticks and system() calls. Backticks are intended for capturing output of the executed command.
Backticks are only supposed to be used when you want to capture output. Using them here "looks silly." It's going to clue anyone looking at your code into the fact that you aren't very familiar with Perl.
Use backticks if you want to capture output.
Use system if you want to run a command. One advantage you'll gain is the ability to check the return status.
Use modules where possible for portability. In this case, File::Copy fits the bill.
In general, it's best to use system instead of backticks because:
system encourages the caller to check the return code of the command.
system allows "indirect object" notation, which is more secure and adds flexibility.
Backticks are culturally tied to shell scripting, which might not be common among readers of the code.
Backticks use minimal syntax for what can be a heavy command.
One reason users might be temped to use backticks instead of system is to hide STDOUT from the user. This is more easily and flexibly accomplished by redirecting the STDOUT stream:
my $cmd = 'command > /dev/null';
system($cmd) == 0 or die "system $cmd failed: $?"
Further, getting rid of STDERR is easily accomplished:
my $cmd = 'command 2> error_file.txt > /dev/null';
In situations where it makes sense to use backticks, I prefer to use the qx{} in order to emphasize that there is a heavy-weight command occurring.
On the other hand, having Another Way to Do It can really help. Sometimes you just need to see what a command prints to STDOUT. Backticks, when used as in shell scripts are just the right tool for the job.
Perl has a split personality. On the one hand it is a great scripting language that can replace the use of a shell. In this kind of one-off I-watching-the-outcome use, backticks are convenient.
When used a programming language, backticks are to be avoided. This is a lack of error
checking and, if the separate program backticks execute can be avoided, efficiency is
gained.
Aside from the above, the system function should be used when the command's output is not being used.
Backticks are for amateurs. The bullet-proof solution is a "Safe Pipe Open" (see "man perlipc"). You exec your command in another process, which allows you to first futz with STDERR, setuid, etc. Advantages: it does not rely on the shell to parse #ARGV, unlike open("$cmd $args|"), which is unreliable. You can redirect STDERR and change user priviliges without changing the behavior of your main program. This is more verbose than backticks but you can wrap it in your own function like run_cmd($cmd,#args);
sub run_cmd {
my $cmd = shift #_;
my #args = #_;
my $fh; # file handle
my $pid = open($fh, '-|');
defined($pid) or die "Could not fork";
if ($pid == 0) {
open STDERR, '>/dev/null';
# setuid() if necessary
exec ($cmd, #args) or exit 1;
}
wait; # may want to time out here?
if ($? >> 8) { die "Error running $cmd: [$?]"; }
while (<$fh>) {
# Have fun with the output of $cmd
}
close $fh;
}