The documentation for IPC::Open3 states:
The CHLD_IN will have autoflush turned on
But nothing in the source code mentions IO::Handle::autoflush. What mechanism does the module use to turn on autoflush for CHLD_IN?
Buffering is disabled with the following line
select((select($handles[0]{parent}), $| = 1)[0]); # unbuffer pipe
which could be rewritten as
my $old_fh = select($handles[0]{parent});
$| = 1;
select($old_fh);
The traditional way to disable output buffering in Perl is via the $| variable. From man perlvar:
HANDLE->autoflush( EXPR )
$OUTPUT_AUTOFLUSH
$|
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Mnemonic: when you want your pipes to be piping hot.
Setting $| acts on "the currently selected output channel" which is set with the one-argument form of select.
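As a rough illustration of the select idiom described above, here is a minimal sketch that applies it to the write end of a pipe. The helper names make_hot and is_hot are made up for this example and are not part of IPC::Open3.

```perl
use strict;
use warnings;

# Hypothetical helper: turn on autoflush for $fh with the select idiom.
sub make_hot {
    my ($fh) = @_;
    my $old_fh = select($fh);   # make $fh the selected handle, remember the old one
    $| = 1;                     # $| now applies to $fh
    select($old_fh);            # restore the previously selected handle
    return;
}

# Hypothetical helper: report the autoflush setting of $fh (0 or 1).
sub is_hot {
    my ($fh) = @_;
    my $old_fh = select($fh);
    my $hot = $|;
    select($old_fh);
    return $hot;
}

pipe(my $reader, my $writer) or die "pipe failed: $!";
make_hot($writer);

print {$writer} "no newline";       # written to the pipe at once, $writer is hot
sysread($reader, my $buf, 1024);    # so this returns without blocking
print "got: $buf\n";
```

Without the make_hot call, the print would sit in Perl's buffer and the sysread would block until the writer was flushed or closed.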
Related
I am debugging a daemon and I'm trying to use print statements to output information to the terminal. The gist of my code is:
#!/usr/bin/env perl
use strict;
use warnings;
use Readonly;
Readonly my $TIMEOUT => ...;
...
while (1) {
print "DEBUG INFO";
...
sleep $TIMEOUT;
}
However, no output is getting printed to my terminal. Why is this?
Summary:
Use $| = 1 or add a newline, "\n", to the print.
Explanation:
This isn't printing to the terminal because perl buffers the output for efficiency. Once the print buffer fills up, it is flushed and the output appears in your terminal. You may want to force a flush, since depending on the length of $TIMEOUT you could be waiting a considerable time for output!
There are two main approaches to flushing the buffer:
1) Since you're printing to your terminal, your filehandle is most likely STDOUT. File handles attached to a terminal are line-buffered by default, so adding a newline character to your print statement flushes the buffer and forces output:
while (1) {
print "DEBUG INFO\n";
...
sleep $TIMEOUT;
}
2) The second approach is to use $| which when set to non-zero makes the current filehandle (STDOUT by default or the last to be selected) hot and forces a flush of the buffer immediately. Therefore, the following will also force printing of the debug information:
$| = 1;
while (1) {
print "DEBUG INFO";
...
sleep $TIMEOUT;
}
If using syntax such as this is confusing, then you may like to consider:
use IO::Handle;
STDOUT->autoflush(1);
while (1) {
print "DEBUG INFO";
...
sleep $TIMEOUT;
}
In many code examples where immediate flushing of the buffer is required, you may see $|++ used to make a file-handle hot and immediately flush the buffer, and --$| to make a file-handle cold and switch off auto-flushing. See these two answers for more details:
Perl operator: $|++; dollar sign pipe plus plus
How does --$| work in Perl?
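To see why $|++ and --$| behave this way, it helps to know that $| only ever holds 0 or 1, whatever is assigned to it. The following is a small sketch of that behaviour; the variable names are only for illustration.

```perl
use strict;
use warnings;

$| = 0;
$|++ for 1 .. 3;              # incrementing repeatedly never goes past 1
my $after_increments = $|;

$| = 0;
my @toggled;
for (1 .. 4) {
    my $value = --$|;         # 0 becomes -1, which is stored as 1; 1 becomes 0
    push @toggled, $value;
}

$| = 0;                       # put STDOUT back to the default setting
print "after three increments: $after_increments\n";   # prints 1
print "pre-decrement sequence: @toggled\n";            # prints 1 0 1 0
```

So $|++ always leaves autoflush on, while --$| flips it, which is easy to misread. An explicit $| = 1 or $| = 0 says the same thing more plainly.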
If you're interested in learning more about perl buffers, then I would suggest reading Suffering from Buffering, which gives great insight into why we have buffering and explains how to switch it on and off.
I'm re-factoring some perl code, and as seems to be the case, Perl has some weird constructs that are a pain to look up.
In this case I encountered the following...
$|++;
This is on a line by itself just after the "use" statements.
What does this command do?
From perldoc perlvar:
$|
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Therefore, as it always starts as 0, this increments it to 1, forcing a flush after every write/print.
You can replace it with the following to be much clearer.
use English '-no_match_vars';
$OUTPUT_AUTOFLUSH = 1;
Looking up variables is best done with perlvar (perldoc perlvar, or http://perldoc.perl.org/perlvar.html)
From that:
HANDLE->autoflush( EXPR )
$OUTPUT_AUTOFLUSH
$|
If set to nonzero,
forces a flush right away and after every write or print on the
currently selected output channel. Default is 0 (regardless of whether
the channel is really buffered by the system or not; $| tells you only
whether you've asked Perl explicitly to flush after each write).
STDOUT will typically be line buffered if output is to the terminal
and block buffered otherwise. Setting this variable is useful
primarily when you are outputting to a pipe or socket, such as when
you are running a Perl program under rsh and want to see the output as
it's happening. This has no effect on input buffering. See getc for
that. See select on how to select the output channel. See also
IO::Handle.
++ is the increment operator, which adds one to the variable.
So $|++ sets autoflush true (default 0 + 1 = 1, which evaluates as true in boolean context), so that writes to the currently selected output handle (STDOUT unless you have selected another) are flushed immediately.
$| is one of Perl's special variables.
According to perlvar:
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel.
If Google is your only source of information, I can understand how looking up special variables in Perl could cause consternation. Fortunately there is perldoc! Every machine with perl on it should also have perldoc. Use it without command line parameters to get a list of all the Core documentation that comes with your version of Perl.
To look up all special variables: perldoc perlvar
To look up a specific special variable: perldoc -v '$|' (on *nix; use double quotes on Windows)
To look up perl's list of functions: perldoc perlfunc
To look up a specific function: perldoc -f sprintf
To look up the operators (including precedence): perldoc perlop
Armed with that information, you'll know what happens when you post-increment the Output Autoflush variable.
As a special bonus, perldoc.perl.org can manage all of these jobs with the exception of the -v search...
As others have pointed out, it enables autoflush on the selected output filehandle (which is likely STDOUT). What nobody else has said, though, is that while you're generally refactoring and neatening up code, you really ought to replace it with the equivalent but much more obvious
STDOUT->autoflush(1);
I have a perl script, say "process_output.pl" which is used in the following context:
long_running_command | "process_output.pl"
The process_output script needs to behave like the unix "tee" command: it dumps the output of "long_running_command" to the terminal as it gets generated, captures that output to a text file, and, once "long_running_command" finishes, forks another process with the text file as input.
The behavior I am currently seeing is that the output of "long_running_command" gets dumped to the terminal only when the command completes, instead of as it gets generated. Do I need to do something special to fix this?
Based on my reading of a few other stackexchange posts, I tried the following in "process_output.pl", without much success:
select(STDOUT); $| =1;
select(STDIN); $| =1; # Not sure even if this is needed
use FileHandle; STDOUT->autoflush(1);
stdbuf -oL -eL long_running_command | "process_output.pl"
Any pointers on how to proceed further?
Thanks
AB
This is more likely an issue with the output of the first process being buffered, rather than the input of your script. The easiest solution would be to try using the unbuffer command (I believe it's part of the expect package), something like
unbuffer long_running_command | "process_output.pl"
The unbuffer command will disable the buffering that happens normally when output is directed to a non-interactive place.
This will be the output processing of long_running_command. More than likely it is using stdio, which looks at what the output file descriptor is connected to before it does any outputting. If it is a terminal (tty), it will generally output line by line, but in the case above it will notice it is writing to a pipe and will therefore buffer the output into larger chunks.
You can control the buffering in your own process by using, as you showed
select(STDOUT); $| =1;
This means that what your process prints to STDOUT is not buffered. It makes no sense to do this for input, since you control how much buffering is done there: if you use sysread() you are reading unbuffered, while with a construct like <$fh> perl waits until it has a whole line (it actually reads up to the next input record separator, defined in the variable $/, which is newline by default) before it returns data to you.
unbuffer can be used to "disable" the output buffering. What it actually does is make the outputting process think that it is talking to a tty (by using a pseudo tty), so that process does not buffer.
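For the receiving side, a tee-like script could look roughly like the sketch below. It is only an outline under stated assumptions: tee_lines is a made-up helper name, and the demo feeds it in-memory handles, whereas a real process_output.pl would pass \*STDIN, \*STDOUT and a handle opened on the capture file.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: copy every line from $in to both $out and $log,
# flushing $out after each print so the lines show up as they arrive.
sub tee_lines {
    my ($in, $out, $log) = @_;
    $out->autoflush(1);
    my $count = 0;
    while (my $line = <$in>) {
        print {$out} $line;
        print {$log} $line;
        $count++;
    }
    return $count;
}

# Demo with in-memory handles standing in for the real streams.
my $input = "first\nsecond\n";
open(my $in,  '<', \$input)       or die "cannot open input: $!";
open(my $out, '>', \my $shown)    or die "cannot open output: $!";
open(my $log, '>', \my $captured) or die "cannot open log: $!";

my $n = tee_lines($in, $out, $log);
close $_ for $out, $log;
print "copied $n lines\n";
```

Even with this in place, lines can only be echoed as fast as long_running_command emits them, so the buffering in that command still has to be dealt with as described above.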
I was working with a file parser in Perl that prints the name of every file it processes, and I noticed that these print outputs appeared out of order. After further digging, I found out that this is because Perl buffers output and releases the printed text only when the buffer is full. I also learned that there is a workaround: "making the filehandle hot". Whenever you print to a hot filehandle, Perl flushes the buffer immediately. So my questions are:
Are there any consequences of "making the filehandle hot" ?
Does leaving the buffer to get filled up before flushing vs flushing immediately have any effect on performance ?
Perl uses different output buffering modes depending on context: Writing to files etc. buffers in chunks (this is important for performance), while a handle is flushed after each line if perl has reason to believe that the output goes to a terminal. STDERR is unbuffered by default.
You can deactivate buffering for the currently selected file handle by setting the special $| variable to a true value. However, this is better expressed as:
use IO::File; # on older perls
...
$some_file_handle->autoflush(1);
print { $some_file_handle } "this isn't buffered";
which has the advantage that you don't have to use the annoying select function for handles other than STDOUT. Why is this method called autoflush? The file handle is still buffered, but the buffer is automatically flushed after each print or say call.
Careful: The autoflush method won't work on truly ancient perls where file handles aren't objects yet. In that case, do the select dance:
my $old_fh = select $my_fh;
$| = 1;
select $old_fh;
print { $my_fh } "this isn't buffered";
(select returns the currently selected file handle).
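To get a feel for the performance cost asked about in the question, one could time the same writes with and without autoflush. The sketch below is only a rough measurement under assumptions: time_writes is a made-up helper, the line count is arbitrary, and the numbers will vary widely between systems.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Hypothetical helper: write $lines short lines to a temporary file and
# return the elapsed wall-clock time in seconds.
sub time_writes {
    my ($autoflush, $lines) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);   # file is removed at exit
    $fh->autoflush($autoflush);                # 1 = flush after every print
    my $start = time;
    print {$fh} "line $_\n" for 1 .. $lines;
    close $fh or die "close failed: $!";
    return time - $start;
}

my $lines = 100_000;   # arbitrary size for the comparison
printf "buffered:  %.3f s\n", time_writes(0, $lines);
printf "autoflush: %.3f s\n", time_writes(1, $lines);
```

With autoflush on, every print typically turns into its own write to the operating system, which is where the extra time tends to go.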
I am learning from Writing CGI Application with Perl, by Kevin Meltzer and Brent Michalski.
Scripts in the book mostly begin with this:
#!"c:\strawberry\perl\bin\perl.exe" -wT
# sales.cgi
$|=1;
use strict;
use lib qw(.);
What does the line $|=1; do? How should it be spaced, e.g. $| = 1; or $ |= 1; ?
Why put use strict; after $|=1; ?
Thanks
perlvar is your friend. It documents all these cryptic special variables.
$OUTPUT_AUTOFLUSH (aka $|):
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Mnemonic: when you want your pipes to be piping hot.
Happy coding.
For the other questions:
There is no reason that use strict; comes after $|, other than the programmer's convention. $| and other special variables are not affected by strict in this way. The spacing is also not important -- just pick your convention and be consistent. (I prefer spaces.)
$| = 1; forces a flush after every write or print, so the output appears as soon as it's generated rather than being buffered.
See the perlvar documentation.
$| is the name of a special variable. You shouldn't introduce a space between the $ and the |.
Whether you use whitespace around the = or not doesn't matter to Perl. Personally I think using spaces makes the code more readable.
Why the use strict; comes after $| = 1; in your script I don't know, except that they're both the sort of thing you'd put right at the top, and you have to put them in one order or the other. I don't think it matters which comes first.
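It does not matter whether a use statement comes before or after a run-time statement such as $| = 1;, because use statements are all evaluated at compile time. Bear in mind, though, that use strict; is lexically scoped, so it only checks the code that follows it in the enclosing block or file.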
$| is the built-in variable for autoflush. I agree that in this case, it is ambiguous. However, a lone $ is not a valid statement in perl, so by process of elimination, we can say what it must mean.
use lib qw(.) may look redundant, since older perls put "." in @INC by default. However, "." is left out of @INC when taint checks are enabled (the -T on the shebang line here), and since Perl 5.26 it is no longer included by default at all. This statement tells perl to add "." to the @INC array, which is the "path environment" for perl, i.e. where it looks for modules and such.