Perl, what does $|++ do? - perl

I'm re-factoring some perl code, and as seems to be the case, Perl has some weird constructs that are a pain to look up.
In this case I encountered the following...
$|++;
This is on a line by itself just after the "use" statements.
What does this command do?

From perldoc perlvar:
$|
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Therefore, as it always starts as 0, this increments it to 1, forcing a flush after every write/print.
You can replace it with the following to be much clearer.
use English '-no_match_vars';
$OUTPUT_AUTOFLUSH = 1;

Looking up variables is best done with perlvar (perldoc perlvar, or http://perldoc.perl.org/perlvar.html)
From that:
HANDLE->autoflush( EXPR )
$OUTPUT_AUTOFLUSH
$|
If set to nonzero,
forces a flush right away and after every write or print on the
currently selected output channel. Default is 0 (regardless of whether
the channel is really buffered by the system or not; $| tells you only
whether you've asked Perl explicitly to flush after each write).
STDOUT will typically be line buffered if output is to the terminal
and block buffered otherwise. Setting this variable is useful
primarily when you are outputting to a pipe or socket, such as when
you are running a Perl program under rsh and want to see the output as
it's happening. This has no effect on input buffering. See getc for
that. See select on how to select the output channel. See also
IO::Handle.
++ is the increment operator, which adds one to the variable.
So $|++ sets autoflush true (default 0 + 1 = 1, which boolean evals as true), which forces writes to stdout to not be buffered.

$| is one of Perl's special variables.
According to perlvar:
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel.

If Google is your only source of information, I can understand how looking up special variables in Perl could cause consternation. Fortunately there is perldoc! Every machine with perl on it should also have perldoc. Use it without command line parameters to get a list of all the Core documentation that comes with your version of Perl.
To look up all special variables: perldoc perlvar
To look up a specific special variable:perldoc -v '$|' ( on *nix,
use double quotes on Windows)
To look up perl's list of functions: perldoc perlfunc
To look up a specific function: perldoc -f sprintf
To look up the operators (including precedence): perldoc perlop
Armed with that information, you'll know what happens when you post-increment the Output Autoflush variable.
As a special bonus, perldoc.perl.org can manage all of these jobs with the exception of the -v search...

As others have pointed out, it enables autoflush on the selected output filehandle (which is likely STDOUT). What nobody else has said, though, is that while you're generally refactoring and neatening up code, you really ought to replace it with the equivalent but much more obvious
STDOUT->autoflush(1);

Related

How does IPC::Open3 enable autoflush for CHLD_IN

The documentation for IPC::Open3 states:
The CHLD_IN will have autoflush turned on
But nothing in the source code mentions IO::Handle::autoflush. What mechanism does the module use to turn on autoflush for CHLD_IN?
Buffering is disabled with the following line
select((select($handles[0]{parent}), $| = 1)[0]); # unbuffer pipe
which could be rewritten as
my $old_fh = select($handles[0]{parent});
$| = 1;
select($old_fh);
The traditional way to disable output buffering in Perl is via the $| variable. From man perlvar:
HANDLE->autoflush( EXPR )
$OUTPUT_AUTOFLUSH
$|
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Mnemonic: when you want your pipes to be piping hot.
Setting $| acts on "the currently selected output channel" which is set with the one-argument form of select.

Line buffered reading in Perl

I have a perl script, say "process_output.pl" which is used in the following context:
long_running_command | "process_output.pl"
The process_output script, needs to be like the unix "tee" command, which dumps output of "long_running_command" to the terminal as it gets generated, and in addition captures output to a text file, and at the end of "long_running_command", forks another process with the text file as an input.
The behavior I am currently seeing is that, the output of "long_running_command" gets dumped to the terminal, only when it gets completed instead of, dumping output as it gets generated. Do I need to do something special to fix this?
Based on my reading in a few other stackexchange posts, i tried the following in "process_output.pl", without much help:
select(STDOUT); $| =1;
select(STDIN); $| =1; # Not sure even if this is needed
use FileHandle; STDOUT->autoflush(1);
stdbuf -oL -eL long_running_command | "process_output.pl"
Any pointers on how to proceed further.
Thanks
AB
This is more likely an issue with the output of the first process being buffered, rather than the input of your script. The easiest solution would be to try using the unbuffer command (I believe it's part of the expect package), something like
unbuffer long_running_command | "process_output.pl"
The unbuffer command will disable the buffering that happens normally when output is directed to a non-interactive place.
This will be the output processing of long_running_processing. More than likely it is using stdio - which will look to see what the output file descriptor is connected to before it does outputing. If it is a terminal (tty), then it will generally output line based, but in the above case - it will notice it is writing to a pipe and will therefore buffer the output into larger chunks.
You can control the buffering in your own process by using, as you showed
select(STDOUT); $| =1;
This means that things that your process prints to STDIO, are not buffered - it makes no sense doing this for input, as you control how much buffering is done - if you use sysread() then you are reading unbuffered, if you use a construct like <$fh> then perl will await until it has a "whole line" (it actually reads up to the next input line separator (as defined in variable $/ which is newline by default)) before it returns data to you.
unbuffer can be used to "disable" the output buffering, what it actually does is make the outputing process think that it is talking to a tty (by using a pseudo tty) so the output process does not buffer.

Perl operator: $|++; dollar sign pipe plus plus

I'm working on a new version of an already released code of perl, and found the line:
$|++;
AFAIK, $| is related with pipes, as explained in this link, and I understand this, but I cannot figure out what the ++ (plus plus) means here.
Thank you in advance.
EDIT: Found the answer in this link:
In short: It forces to print (flush) to your console before the next statement, in case the script is too fast.
Sometimes, if you put a print statement inside of a loop that runs really really quickly, you won’t see the output of your print statement until the program terminates. sometimes, you don’t even see the output at all. the solution to this problem is to “flush” the output buffer after each print statement; this can be performed in perl with the following command:
$|++;
[update]
as has been pointed out by r. schwartz, i’ve misspoken; the above command causes print to flush the buffer preceding the next output.
$| defaults to 0; doing $|++ thus increments it to 1. Setting it to nonzero enables autoflush on the currently-selected file handle, which is STDOUT by default, and is rarely changed.
So the effect is to ensure that print statements and the like output immediately. This is useful if you're outputting to a socket or the like.
$| is an abbreviation for $OUTPUT_AUTOFLUSH, as you had found out. The ++ increments this variable.
$| = 1 would be the clean way to do this (IMHO).
It's an old idiom, from the days before IO::Handle. In modern code this should be written as
use IO::Handle;
STDOUT->autoflush(1);
It increments autoflush, which is most probably equivalent to turning it on.

perl $|=1; What is this?

I am learning Writing CGI Application with Perl -- Kevin Meltzer . Brent Michalski
Scripts in the book mostly begin with this:
#!"c:\strawberry\perl\bin\perl.exe" -wT
# sales.cgi
$|=1;
use strict;
use lib qw(.);
What's the line $|=1; How to space it, eg. $| = 1; or $ |= 1; ?
Why put use strict; after $|=1; ?
Thanks
perlvar is your friend. It documents all these cryptic special variables.
$OUTPUT_AUTOFLUSH (aka $|):
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Mnemonic: when you want your pipes to be piping hot.
Happy coding.
For the other questions:
There is no reason that use strict; comes after $|, except by the programmers convention. $| and other special variables are not affected by strict in this way. The spacing is also not important -- just pick your convention and be consistent. (I prefer spaces.)
$| = 1; forces a flush after every write or print, so the output appears as soon as it's generated rather than being buffered.
See the perlvar documentation.
$| is the name of a special variable. You shouldn't introduce a space between the $ and the |.
Whether you use whitespace around the = or not doesn't matter to Perl. Personally I think using spaces makes the code more readable.
Why the use strict; comes after $| = 1; in your script I don't know, except that they're both the sort of thing you'd put right at the top, and you have to put them in one order or the other. I don't think it matters which comes first.
It does not matter where in your script you put a use statement, because they all get evaluated at compile time.
$| is the built-in variable for autoflush. I agree that in this case, it is ambiguous. However, a lone $ is not a valid statement in perl, so by process of elimination, we can say what it must mean.
use lib qw(.) seems like a silly thing to do, since "." is already in #INC by default. Perhaps it is due to the book being old. This statement tells perl to add "." to the #INC array, which is the "path environment" for perl, i.e. where it looks for modules and such.

What does the '`' character do in Perl?

I was using Perl to read through each line of a file. I used a command line tool to call a service, and I noticed some interesting functionality that I can't figure out how to search for. To the variable $cmd I assigned the command that invokes the service. If I refer to $cmd later in the code it prints out the command line argument, but if I refer to it as `$cmd`, however, it gives the output from running the service.
What is the explanation for this?
It works just like backquotes in the shell, which is why it is called that. See sh(1) for details. It captures the standard output alone, and nothing else. It sets the $? variable to the 16-bit wait status word.
This is all explained in the perlop(1) manpage:
qx/STRING/
`STRING`
A string which is (possibly) interpolated and then
executed as a system command with /bin/sh or its
equivalent. Shell wildcards, pipes, and redirections
will be honored. The collected standard output of the
command is returned; standard error is unaffected. In
scalar context, it comes back as a single (potentially
multi-line) string, or undef if the command failed.
In list context, returns a list of lines (however
you’ve defined lines with $/ or
$INPUT_RECORD_SEPARATOR), or the empty list if the
command failed.
Because backticks do not affect standard error: use
shell file descriptor syntax (assuming the shell
supports this) if you care to address this. To
capture a command’s STDERR and STDOUT merged together:
$output = `cmd 2>&1`;
To capture a command’s STDOUT but discard its STDERR:
$output = `cmd 2>/dev/null`;
To capture a command’s STDERR but discard its STDOUT
(ordering is important here):
$output = `cmd 2>&1 1>/dev/null`;
To exchange a command’s STDOUT and STDERR in order to
capture the STDERR but leave its STDOUT to come out
the old STDERR:
$output = `cmd 3>&1 1>&2 2>&3 3>&-`;
To read both a command’s STDOUT and its STDERR
separately, it’s easiest to redirect them separately
to files, and then read from those files when the
program is done:
system("program args 1>program.stdout 2>program.stderr");
The STDIN filehandle used by the command is inherited
from Perl’s STDIN. For example:
open(BLAM, "blam") || die "$0: can't open blam: $!";
open (STDIN, "<&BLAM") || die "$0: can't dup BLAM: $!";
print `sort`;
will print the sorted contents of the file blam.
Using single-quote as the delimiter protects the command
from Perl’s double-quote interpolation, passing the contents on
to the shell instead:
$perl_info = qx(ps $$); # that's Perl's $$
$shell_info = qx'ps $$'; # that's the new shell's $$
How that string gets evaluated is entirely subject to
the command interpreter on your system. On most
platforms, you will have to protect shell
metacharacters if you want them treated literally.
This is in practice difficult to do, as it’s unclear
which characters need escaping, or how. See perlsec for a
clean and safe example of a manual fork and exec
to emulate backticks safely.
On some platforms (notably DOS-like ones), the shell
may not be capable of dealing with multiline commands,
so putting newlines in the string may not get you what
you want. You may be able to evaluate multiple
commands in a single line by separating them with the
command separator character, if your shell supports
that (e.g. ; on many Unix shells; & on the Windows
NT CMD.COM shell).
Beginning with v5.6.0, Perl attempts to flush all
files opened for output before starting the child
process, but this may not be supported on some
platforms (see perlport(1)). To be safe, you may need to
set $| ($AUTOFLUSH in English) or call the
autoflush method of IO::Handle on any open
handles.
Beware that some command shells may place restrictions
on the length of the command line. You must ensure
your strings don’t exceed this limit after any
necessary interpolations. See the platform-specific
release notes for more details about your particular
environment.
Using this operator can lead to programs that are
difficult to port, because the shell commands called
vary between systems, and may in fact not be present
at all. As one example, the type command under the
POSIX shell is very different from the type command
under DOS. That doesn't mean you should go out of
your way to avoid backticks when they’re the right way
to get something done. Perl was made to be a glue
language, and one of the things it glues together is
commands. Just understand what you’re getting
yourself into.
See I/O Operators for more discussion.
Here’s a simple example of using backticks to get the exit status of the first element in a pipeline:
$device = q(/dev/rmt8);
$dd_noise = q(^[0-9]+\+[0-9]+ records (in|out)$);
$status = `exec 3>&1; ((dd if=$device ibs=64k 2>&1 1>&3 3>&- 4>&-; echo $? >&4) | egrep -v "$dd_noise" 1>&2 3>&- 4>&-) 4>&1`;
EDIT
Well ok then, so maybe that wasn’t that simple an example. :) But this one is.
I’d like to recommend the Capture::Tiny CPAN module as a simpler way to manage the output from external commands that you would normally run using backquotes. It has advantages and disadvantages, but I feel that for many people, the advantages outweigh any arguable disadvantageL
The advantage is that you get to do all this without requiring deep knowledge of arcane mysteries of file-descriptor redirection the way the previous example did.
The disadvantage is it’s yet another non-core dependency — something else you have to install from CPAN.
That’s really not bad for what you get.
Here’s an example of how easy it is:
NAME
Capture::Tiny - Capture STDOUT and STDERR from Perl, XS, or external programs
SYNOPSIS
use Capture::Tiny qw/capture tee capture_merged tee_merged/;
($stdout, $stderr) = capture {
# your code here
};
($stdout, $stderr) = tee {
# your code here
};
$merged = capture_merged {
# your code here
};
$merged = tee_merged {
# your code here
};
DESCRIPTION
Capture::Tiny provides a simple, portable way to capture anything sent to STDOUT or STDERR, regardless of whether it comes from Perl, from XS code
or from an external program. Optionally, output can be teed so that it is captured while being passed through to the original handles. Yes, it
even works on Windows. Stop guessing which of a dozen capturing modules to use in any particular situation and just use this one.
There, isn’t that a whole lot easier?
The back-quote in Perl does much the same as the back-quote in shell - it runs a command and captures the standard output.
See also qx//.
I think the backtick lets you run commands and store their output in a variable:
$listing=`ls -1 /tmp/`;