What does "select((select(s),$|=1)[0])" do in Perl? - perl

I've seen some horrific code written in Perl, but I can't make head nor tail of this one:
select((select(s),$|=1)[0])
It's in some networking code that we use to communicate with a server and I assume it's something to do with buffering (since it sets $|).
But I can't figure out why there are multiple select calls or what the array reference is for. Can anyone help me out?

It's a nasty little idiom for setting autoflush on a filehandle other than STDOUT.
select() takes the supplied filehandle and (basically) replaces STDOUT with it, and it returns the old filehandle when it's done.
So (select($s),$|=1) redirects the filehandle (remember select returns the old one), and sets autoflush ($| = 1). It does this in a list ((...)[0]) and returns the first value (which is the result of the select call - the original STDOUT), and then passes that back into another select to reinstate the original STDOUT filehandle. Phew.
But now you understand it (well, maybe ;)), do this instead:
use IO::Handle;
$fh->autoflush;

The way to figure out any code is to pick it apart. You know that stuff inside parentheses happens before stuff outside. This is the same way you'd figure out what code is doing in other languages.
The first bit is then:
( select(s), $|=1 )
That list has two elements, which are the results of two operations: one to select the s filehandle as the default then one to set $| to a true value. The $| is one of the per-filehandle variables which only apply to the currently selected filehandle (see Understand global variables at The Effective Perler). In the end, you have a list of two items: the previous default filehandle (the result of select), and 1.
The next part is a literal list slice to pull out the item in index 0:
( PREVIOUS_DEFAULT, 1 )[0]
The result of that is the single item that is previous default filehandle.
The next part takes the result of the slice and uses it as the argument to another call to select:
select( PREVIOUS_DEFAULT );
So, in effect, you've set $| on a filehandle and ended up back where you started with the default filehandle.
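The walkthrough above can be checked with a small script. This is only a sketch under some assumptions: an in-memory filehandle $fh stands in for the handle s, and autoflush_of is a hypothetical helper written for this example, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: report the $| setting that belongs to a given handle.
sub autoflush_of {
    my ($handle) = @_;
    my $old   = select($handle);   # make $handle the default for a moment
    my $state = $|;                # $| now reads that handle's setting
    select($old);                  # put the previous default back
    return $state;
}

# An in-memory filehandle stands in for the handle "s" (assumption).
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory handle: $!";

my $default_before = select();     # the current default handle

# The inner list: the previous default handle, then the value of $| = 1.
my @pair = (select($fh), $| = 1);

# Index 0 is the previous default; the outer select restores it.
select($pair[0]);

my $default_after = select();

printf "fh autoflush: %d, default restored: %s\n",
    autoflush_of($fh), ($default_before eq $default_after ? 'yes' : 'no');
```

Splitting the list out into @pair is just for inspection; the one-liner does the same thing without the named array.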

select($fh)
Select a new default file handle. See http://perldoc.perl.org/functions/select.html
(select($fh), $|=1)
Turn on autoflush. See http://perldoc.perl.org/perlvar.html
(select($fh), $|=1)[0]
Return the first value of this list.
select((select($fh), $|=1)[0])
select it, i.e. restore the old default file handle.
Equivalent to
$oldfh = select($fh);
$| = 1;
select($oldfh);
which means
use IO::Handle;
$fh->autoflush(1);
as demonstrated in the perldoc page.

In another venue, I once proposed that a more comprehensible version would be thus:
for ( select $fh ) { $| = 1; select $_ }
This preserves the compact idiom’s sole advantage that no variable needs be declared in the surrounding scope.
Or if you’re not comfortable with $_, you can write it like this:
for my $prevfh ( select $fh ) { $| = 1; select $prevfh }
The scope of $prevfh is limited to the for block. (But if you write Perl you really have no excuse to be skittish about $_.)
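A rough sketch of the for-loop form in use. The wrapper name enable_autoflush and the in-memory handle are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the for-loop form.
sub enable_autoflush {
    my ($fh) = @_;
    # select returns the previous default handle, which the loop aliases to $_.
    for ( select $fh ) { $| = 1; select $_ }
    return;
}

# In-memory handle used as a stand-in for a real file or socket (assumption).
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory handle: $!";

my $default_before = select();
enable_autoflush($fh);
my $default_after = select();

# Read the flag back by selecting $fh briefly.
my $state = do { my $old = select($fh); my $s = $|; select($old); $s };

print "autoflush on fh: $state\n";
```

No variable for the previous handle leaks into the caller's scope, which is the point of the idiom.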

It's overly clever code for turning on buffer flushing on handle s and then re-selecting the current handle.
See perldoc -f select for more.

please check perldoc -f select. For the meaning of $|, please check perldoc perlvar

It is overoptimization to skip loading IO::Handle.
use IO::Handle;
$fh->autoflush(1);
is much more readable.

Related

What does "[0]" mean in Perl? [duplicate]

What is the [0] doing in this code:
select((select(LOG_FILE),$!=1)[0]);
UPDATE: I answered this ten years ago! What does “select((select(s),$|=1)[0])” do in Perl?
You're looking at a single element access to a list. The expression inside the parentheses produces some sort of list and the [0] selects one item from the list.
This bit of code is a very old idiom to set a per-filehandle kinda-global variable. I think you probably meant $| (the autoflush setting) instead of $!.
First, remember that Perl has the concept of a "default filehandle". That starts out as standard output, but you can change it. That's what the select does.
Next, realize that each file handle knows its own settings for various things; these are represented by special variables such as $| (see perlvar's section on "Variables related to Filehandles"). When you change these variables, they apply to the current default filehandle.
So, what you see in this idiom is an inner select that changes the default filehandle. You change the default then set $| to whatever value you want. It looks a bit odd because you have two expressions separated by a comma instead of a semicolon, the usual statement separator:
(select(LOG_FILE), $|=1)
From this, the idiom wants the result of the select; that's the previous default filehandle. To get that you want the first item in that list. That's in index 0:
(select(LOG_FILE), $|=1)[0]
The result of that entire expression is the previous default filehandle, which you now want to restore. Do that with the outer select:
select((select(LOG_FILE), $|=1)[0]);
You could have written that with an intermediate variable:
my $previous = select LOG_FILE;
$| = 1;
select($previous);
If you're writing new stuff on your own, you might use a scalar variable for the filehandle then call its autoflush method:
open my $log_file_fh, '>', $log_filename or die ...;
$log_file_fh->autoflush(1);
( LIST1 )[ LIST2 ] is a list slice. In list context, it evaluates to the elements of LIST1 specified by LIST2.
In this case, it returns the result of the select.
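A few list slices on plain data, as a minimal illustration of the ( LIST1 )[ LIST2 ] syntax. The values and variable names are made up for this example.

```perl
use strict;
use warnings;

# A single index picks one element out of a literal list.
my $first = ('a', 'b', 'c', 'd')[0];

# Several indexes give back several elements, in the order requested.
my @middle = ('a', 'b', 'c', 'd')[1, 2];

# The list can come from any expression; negative indexes count from the end.
my $largest = (sort { $a <=> $b } 3, 1, 2)[-1];

print "first=$first middle=@middle largest=$largest\n";
```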
select((select(LOG_FILE),$!=1)[0]);
should be
select((select(LOG_FILE),$|=1)[0]);
The latter enables auto-flushing for the LOG_FILE file handle. It can be written more clearly as follows:
use IO::Handle (); # Only needed in older versions of Perl.
LOG_FILE->autoflush(1);
By the way, you shouldn't be using global variables like that. Instead of
open LOG_FILE, ...
you should be using
open my $LOG_FILE, ...
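To see why the $! versus $| typo matters, here is a small sketch that runs both spellings against the same handle. The helper autoflush_of and the in-memory handle $log are assumptions for illustration, not part of the question's code.

```perl
use strict;
use warnings;

# Hypothetical helper: read the $| setting that belongs to a handle.
sub autoflush_of {
    my ($handle) = @_;
    my $old   = select($handle);
    my $state = $|;
    select($old);
    return $state;
}

# In-memory handle standing in for LOG_FILE (assumption).
open(my $log, '>', \my $buffer) or die "Cannot open in-memory handle: $!";

# The typo: $! is the error variable (errno), so autoflush is left alone.
select((select($log), $! = 1)[0]);
my $after_typo = autoflush_of($log);

# The intended form: $| is the per-handle autoflush setting.
select((select($log), $| = 1)[0]);
my $after_fix = autoflush_of($log);

print "after typo: $after_typo, after fix: $after_fix\n";
```

Both lines run without complaint, which is why the typo is easy to miss.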

How is a Perl filehandle a scalar if it can return multiple lines?

I have kind of fundamental question about scalars in Perl. Everything I read says scalars hold one value:
A scalar may contain one single value in any of three different
flavors: a number, a string, or a reference. Although a scalar may not
directly hold multiple values, it may contain a reference to an array
or hash which in turn contains multiple values.
--from perldoc
Was curious how the code below works
open( $IN, "<", "phonebook.txt" )
    or die "Cannot open the file\n";
while ( my $line = <$IN> ) {
    chomp($line);
    my ( $name, $area, $phone ) = split /\|/, $line;
    print "$name $phone $phone\n";
}
close $IN;
Just to clarify the code above is opening a pipe delimited text file in the following format name|areacode|phone
It opens the file up and then it splits them into $name $area $phone; how does it go through the multiple lines of the file and print them out?
Going back to the perldoc quote from above "A scalar may contain a single value of a string, number, reference." I am assuming that it has to be a reference, but doesn't even really seem like a reference and if it is looks like it would a reference of a scalar? so I am wondering what is going on internally that allows Perl to iterate through all of the lines in the code?
Nothing urgent, just something I noticed and was curious about. Thanks.
It looks like Borodin zeroed in on the part you wanted, but I'll add to it.
There are variables, which store things for us, and there are operators, which do things for us. A file handle, the thing you have in $IN, isn't the file itself or the data in the file. It's a connection that the program uses to get information from the file.
When you use the line input operator, <>, you give it a file handle to tell it where to grab the next line from. By itself, it defaults to ARGV, but you can put any file handle in there. In this case, you have <$IN>. Borodin already explained the reference and bareword stuff.
So, when you use the line input operator, it looks at the connection you give it, then gets a line from that file and returns it. You might be able to grok this more easily with its function form:
my $line = readline( $IN );
The thing you get back doesn't come out of $IN, but the thing it points to. Along the way, $IN keeps track of where it is in the file. See seek and tell.
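A small sketch of that idea, assuming an in-memory "file" with made-up phonebook data in place of phonebook.txt. The scalar $in holds one value (a reference to the handle) the whole time; each readline call returns the next line and tell shows the handle's position moving along.

```perl
use strict;
use warnings;

# Made-up sample data in the name|areacode|phone format (assumption).
my $data = "Alice|212|555-0100\nBob|415|555-0101\n";

# Open the string as if it were a file.
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";

my $handle_type = ref $in;    # the scalar holds a single reference

my @records;
while ( defined( my $line = readline($in) ) ) {
    chomp $line;
    my ( $name, $area, $phone ) = split /\|/, $line;
    # tell reports where the handle currently is in the data.
    push @records, { name => $name, area => $area, phone => $phone, offset => tell($in) };
}
close $in;

printf "%s (%s) %s, handle was at byte %d\n", @{$_}{qw(name area phone offset)}
    for @records;
```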
Along the same lines are Perl's regexes. Many people call something like /foo.*bar/ a regular expression. They are slightly wrong. There's a regular expression inside the pattern match operator //. The pattern is the instructions, but it doesn't do anything by itself until the operator uses it.
I find in my classes if I emphasize the difference between the noun and verb parts of the syntax, people have a much easier time with this sort of stuff.
Old Answer
Through each iteration of the while loop, exactly one value is put into the scalar variables. When the loop is done with a line, everything is reset.
The value in $line is a single value: the entire line which you have not broken up yet. Perl doesn't care what that single value looks like. With each iteration, you deal with exactly one line and that's what's in $line. Remember, these are variables, which means you can modify and replace their values, so they can only hold one thing at a time, but there can be multiple times.
The scalars $name, $area, and $phone have single values, each produced by split. Those are lexical variables (my), so they are only visible inside the specific loop iteration where they are defined.
Beyond that, I'm not sure which scalar you might be confused about.
The old-fashioned way of opening files is to use a bare name for the file handle, like so
open IN, 'phonebook.txt'
A file handle is a special type of value, like scalar, hash, array etc. but it has no prefix symbol to differentiate it. (This isn't actually the full extent of the truth, but I am worried about confusing you if I add even more detail.)
Perl still works like this, but it is best avoided for a couple of reasons.
All such file handles are global, and there is no way to restrict access to them by scope
There is no way to pass the value to a subroutine or store it in a data structure
So Perl was enhanced several years ago so that you can use references to file handles. These can be stored in scalar variables, arrays, or hashes, and can be passed as subroutine parameters.
What happens now when you write
open my $in, '<', 'phonebook.txt'
is that perl autovivifies an anonymous file handle, and puts a reference to it in variable $in, so yes, you were right, it is a reference. (Another thing that was changed about the same time was the move to three-parameter open calls, which allow you to open a file called, say, >.txt for input.)
I hope that helps you to understand. It's an unnecessary level of detail, but it can often help you to remember the way Perl works to understand the underlying details.
Incidentally, it is best to keep to lower-case letters for lexical variables, even for file handle references. I often add fh to the end to indicate that the variable holds a file handle, like $in_fh. But there's no need to use capitals, which are generally reserved for global variables like Package::Names.
Update - The Rest of the Story
I thought I should add something to explain what I have missed out, for fear of misleading people who care about the gory detail.
Perl keeps a symbol table hash - a stash - that works very much like an ordinary Perl hash. There is one such stash for each package, including the default package main. Note that this hash has nothing to do with lexical variables - declared with my - which are stored entirely separately.
The keys of the stashes are the names of the package variables, without the initial symbol. So, for example, if you have
our $val;
our @val;
our %val;
then the stash will have only a single element, with a key of val and a value which is a reference to an intermediate structure called a typeglob. This is another hash structure, with one element for each different type of variable that has been declared. In this case our val typeglob will have three elements, for the scalar, array, and hash varieties of the val variables.
One of these elements may also be an IO variable type, which is where file handles are kept. But, for historical reasons, the value that is passed around as a file handle is in fact a reference to the typeglob that contains it. That is why, if you write open my $in, '<', 'phonebook.txt' and then print $in you will see something like GLOB(0x269581c) - the GLOB being short for typeglob.
Apart from that, the account above is accurate. Perl autovivifies an anonymous typeglob in the current package, and uses only its IO slot for the file handle.
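The stash and typeglob description can be poked at from Perl itself. This is a sketch for exploration only; the variable names are invented for the example, and the exact address printed for the handle will differ on every run.

```perl
use strict;
use warnings;

our $val = 'a scalar';
our @val = (1, 2, 3);
our %val = (key => 'value');

# Look up the current package's stash; one entry named "val" covers all three.
my $stash = do { no strict 'refs'; \%{ __PACKAGE__ . '::' } };
my $in_stash = exists $stash->{val} ? 1 : 0;

# Each slot of the typeglob holds a reference to one variable type.
my $scalar_ref = *val{SCALAR};
my $array_ref  = *val{ARRAY};
my $hash_ref   = *val{HASH};

# A lexical filehandle is a reference to an anonymous typeglob.
my $text = "one line\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";
my $printed = "$in";          # looks like GLOB(0x...)
my $io_slot = *{$in}{IO};     # the IO slot where the handle itself lives

print "stash entry: $in_stash, handle prints as: $printed\n";
```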
Scalars in Perl are denoted by a $, and they can indeed contain the types of values you mention in your question. In addition to those, they can also contain a file handle. You can create file handles in Perl in two ways. One way is lexical:
open my $filehandle, '>', '/path/to/file' or die $!;
and the other is global
open FILEHANDLE, '>', '/path/to/file' or die $!;
You should use the Lexical version which is what you're doing.
The while loop in your code uses the <> operator on your lexical filehandle which returns a line out of your file every time it's called, until it's out of lines (when End Of File is reached) in which case it returns false.
I went into a bit more detail on file handles as it seems it's a concept you're not completely clear on.

Perl, what does $|++ do?

I'm re-factoring some perl code, and as seems to be the case, Perl has some weird constructs that are a pain to look up.
In this case I encountered the following...
$|++;
This is on a line by itself just after the "use" statements.
What does this command do?
From perldoc perlvar:
$|
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Therefore, as it always starts as 0, this increments it to 1, forcing a flush after every write/print.
You can replace it with the following to be much clearer.
use English '-no_match_vars';
$OUTPUT_AUTOFLUSH = 1;
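A quick check that the English name and $| are the same variable. The in-memory handle and the variable names are assumptions for this sketch; it uses a scratch handle so STDOUT's own setting is left alone.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

# Scratch handle so the experiment does not touch STDOUT (assumption).
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory handle: $!";
my $old = select($fh);

# Set through the long name, read through the punctuation name.
$OUTPUT_AUTOFLUSH = 1;
my $seen_through_punctuation = $|;

# Set through the punctuation name, read through the long name.
$| = 0;
my $seen_through_english = $OUTPUT_AUTOFLUSH;

select($old);

print "via \$|: $seen_through_punctuation, via English: $seen_through_english\n";
```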
Looking up variables is best done with perlvar (perldoc perlvar, or http://perldoc.perl.org/perlvar.html)
From that:
HANDLE->autoflush( EXPR )
$OUTPUT_AUTOFLUSH
$|
If set to nonzero,
forces a flush right away and after every write or print on the
currently selected output channel. Default is 0 (regardless of whether
the channel is really buffered by the system or not; $| tells you only
whether you've asked Perl explicitly to flush after each write).
STDOUT will typically be line buffered if output is to the terminal
and block buffered otherwise. Setting this variable is useful
primarily when you are outputting to a pipe or socket, such as when
you are running a Perl program under rsh and want to see the output as
it's happening. This has no effect on input buffering. See getc for
that. See select on how to select the output channel. See also
IO::Handle.
++ is the increment operator, which adds one to the variable.
So $|++ sets autoflush true (default 0 + 1 = 1, which boolean evals as true), which forces writes to stdout to not be buffered.
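The arithmetic can be observed directly. In this sketch (scratch handle and variable names are assumptions) the value of $| is read before and after each increment; note that it reads back as 1 even after a second ++, since the variable only reports whether autoflush is on.

```perl
use strict;
use warnings;

# Scratch handle so STDOUT's own setting is not changed (assumption).
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory handle: $!";
my $old = select($fh);

my $start = $|;    # default for a fresh handle
$|++;
my $once  = $|;    # autoflush now on
$|++;
my $twice = $|;    # still reads as 1, not 2

select($old);

print "start=$start once=$once twice=$twice\n";
```

This is one reason $| = 1 is the clearer spelling: the increment suggests a counter that does not really exist.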
$| is one of Perl's special variables.
According to perlvar:
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel.
If Google is your only source of information, I can understand how looking up special variables in Perl could cause consternation. Fortunately there is perldoc! Every machine with perl on it should also have perldoc. Use it without command line parameters to get a list of all the Core documentation that comes with your version of Perl.
To look up all special variables: perldoc perlvar
To look up a specific special variable: perldoc -v '$|' (on *nix; use double quotes on Windows)
To look up perl's list of functions: perldoc perlfunc
To look up a specific function: perldoc -f sprintf
To look up the operators (including precedence): perldoc perlop
Armed with that information, you'll know what happens when you post-increment the Output Autoflush variable.
As a special bonus, perldoc.perl.org can manage all of these jobs with the exception of the -v search...
As others have pointed out, it enables autoflush on the selected output filehandle (which is likely STDOUT). What nobody else has said, though, is that while you're generally refactoring and neatening up code, you really ought to replace it with the equivalent but much more obvious
STDOUT->autoflush(1);

Perl operator: $|++; dollar sign pipe plus plus

I'm working on a new version of an already released code of perl, and found the line:
$|++;
AFAIK, $| is related with pipes, as explained in this link, and I understand this, but I cannot figure out what the ++ (plus plus) means here.
Thank you in advance.
EDIT: Found the answer in this link:
In short: It forces to print (flush) to your console before the next statement, in case the script is too fast.
Sometimes, if you put a print statement inside of a loop that runs really really quickly, you won’t see the output of your print statement until the program terminates. sometimes, you don’t even see the output at all. the solution to this problem is to “flush” the output buffer after each print statement; this can be performed in perl with the following command:
$|++;
[update]
as has been pointed out by r. schwartz, i’ve misspoken; the above command causes print to flush the buffer preceding the next output.
$| defaults to 0; doing $|++ thus increments it to 1. Setting it to nonzero enables autoflush on the currently-selected file handle, which is STDOUT by default, and is rarely changed.
So the effect is to ensure that print statements and the like output immediately. This is useful if you're outputting to a socket or the like.
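The effect is easiest to see on a file rather than a terminal. The following sketch assumes Perl's default buffered I/O layer and uses a temporary file; the variable names are made up. It checks the file's size on disk after a print without autoflush, then again after enabling it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

# Temporary file that is removed when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);

# Without autoflush, a small write sits in Perl's buffer.
print {$fh} "first\n";
my $size_buffered = (stat $path)[7];

# Turning autoflush on flushes what is pending, and every later print as well.
$fh->autoflush(1);
print {$fh} "second\n";
my $size_autoflush = (stat $path)[7];

close $fh;

print "on disk before autoflush: $size_buffered bytes, after: $size_autoflush bytes\n";
```

With a socket the same thing happens: without autoflush the peer may not see a short message until the buffer fills or the handle is closed.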
$| is an abbreviation for $OUTPUT_AUTOFLUSH, as you had found out. The ++ increments this variable.
$| = 1 would be the clean way to do this (IMHO).
It's an old idiom, from the days before IO::Handle. In modern code this should be written as
use IO::Handle;
STDOUT->autoflush(1);
It increments autoflush, which is most probably equivalent to turning it on.

perl $|=1; What is this?

I am learning from "Writing CGI Applications with Perl" by Kevin Meltzer and Brent Michalski.
Scripts in the book mostly begin with this:
#!"c:\strawberry\perl\bin\perl.exe" -wT
# sales.cgi
$|=1;
use strict;
use lib qw(.);
What does the line $|=1; do? How should it be spaced, e.g. $| = 1; or $ |= 1; ?
Why put use strict; after $|=1; ?
Thanks
perlvar is your friend. It documents all these cryptic special variables.
$OUTPUT_AUTOFLUSH (aka $|):
If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See getc for that. See select on how to select the output channel. See also IO::Handle.
Mnemonic: when you want your pipes to be piping hot.
Happy coding.
For the other questions:
There is no reason that use strict; comes after $|, except by the programmer's convention. $| and other special variables are not affected by strict in this way. The spacing is also not important -- just pick your convention and be consistent. (I prefer spaces.)
$| = 1; forces a flush after every write or print, so the output appears as soon as it's generated rather than being buffered.
See the perlvar documentation.
$| is the name of a special variable. You shouldn't introduce a space between the $ and the |.
Whether you use whitespace around the = or not doesn't matter to Perl. Personally I think using spaces makes the code more readable.
Why the use strict; comes after $| = 1; in your script I don't know, except that they're both the sort of thing you'd put right at the top, and you have to put them in one order or the other. I don't think it matters which comes first.
It does not matter where in your script you put a use statement, because they all get evaluated at compile time.
$| is the built-in variable for autoflush. I agree that in this case, it is ambiguous. However, a lone $ is not a valid statement in perl, so by process of elimination, we can say what it must mean.
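use lib qw(.) seems like a silly thing to do at first glance, since "." was in @INC by default in Perl versions before 5.26. Note, though, that "." is left out of @INC under taint mode (the -T switch on the script's first line), so here the line does have an effect. This statement tells perl to add "." to the @INC array, which is the "path environment" for perl, i.e. where it looks for modules and such.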
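A quick way to see what use lib does to @INC. This is a sketch only; the variable names are invented for the example and the rest of @INC depends on the local installation.

```perl
use strict;
use warnings;
use lib qw(.);    # processed at compile time, before the code below runs

# Count how many times "." appears in the module search path.
my $dot_count = grep { $_ eq '.' } @INC;

print "search path:\n";
print "  $_\n" for @INC;
print "entries equal to '.': $dot_count\n";
```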